Amity Journal of Computational Sciences (AJCS)
ISSN: 2456-6616 (Online)
Volume 3 Issue 2

Comprehensive Review of Data Visualization Techniques using Python

Ayush Kumar Rathore
Amity University Uttar Pradesh, Lucknow Campus

Dr. Ranjana Rajnish
Amity University Uttar Pradesh, Lucknow Campus

Abstract: Big Data has changed the way that data is handled, processed and leveraged in every sector. Healthcare is one of the most important fields where this transition can be made. Visualization techniques help to show the relationships within data in a visual or graphical manner. Data visualization primarily deals with the relationships between data through various statistical parameters and then shows them in the form of graphs. When data is presented visually, it becomes easy to understand, and the insights and patterns within the data become clearly visible. Big data refers to the vast amounts of information produced by digitizing everything that different technologies store and process. Data visualization finds many applications, with healthcare being an upcoming industry that is embracing technology to reap its benefits. A recent example of the use of data visualization in healthcare is the COVID-19 scenario, where many visuals have appeared to demonstrate various aspects of COVID-19. In the healthcare sector, the implementation of big data analytics has many positive, lifesaving outcomes. For an individual, a single dashboard may be created for each patient to show his/her entire history, which may help reduce the time of treatment. Similarly, examining the data of a larger group may help us identify patterns and make predictions regarding infectious diseases or epidemics. Data visualization is all about presenting the data to the right people at the right time.
It helps gain deep insights into the data, and data visualization has thus recently become a topic of interest for researchers, industry and academicians. In this paper, the objective is to have a comprehensive discussion of data visualization and the different techniques that are used for data visualisation.

Keywords: Data Visualization, Line Plots, Area Plots, Histograms, Bar Charts, Pie Charts, Box Plots and Scatter Plots, COVID-19

I. INTRODUCTION
In recent times there has been a steep rise in the data being produced by almost all areas, such as social media and healthcare. Not only is this data voluminous, it also possesses all the other characteristics of big data. Data visualization helps to identify hidden patterns in such huge datasets and shows them in graphical form with the help of different plots. Data visualization has many-fold benefits, such as better understanding of data, information sharing, better decision making, improved ROI and time saving (thanks to a quick glance at the data). Visualization can also be presented in the form of a dashboard, where quick links to important analyses are available and important information about the data can be seen at a glance.
This paper is organized into the following sections: Data Visualization, Tools used for data visualization, Python libraries used for data visualization, Jupyter Notebook - Beginners Toolbox for Visualization, and Basic plotting using Matplotlib.

II. DATA VISUALIZATION
In this section we discuss the visualization of data and take an example of how a visual can be converted into one that is more accurate, attractive and impactful.
Now, why should we learn how to visualize data, you may wonder? Data visualization is a way of displaying complex data in a graphical and easily understandable manner.
This can be particularly useful as you seek to explore and familiarize yourself with your own data. Since a picture is worth a thousand words, plots and graphs can be extremely effective, especially when communicating results to the public or sharing the data with fellow data experts. They can also be very helpful in supporting your recommendations to consumers, managers and other decision makers in your field.
In technical terms, data visualization can be defined as "The process of translating data and metrics into charts, graphs and other visual reports. These visualizations let viewers discover patterns and relationships in the data that they otherwise might not see, helping to turn the information into a cohesive story. Data visualization enables organizations and individuals to gain a clearer understanding of their performance and goals." [1]
www.amity.edu/ajcs

Existing data visualization techniques can be classified as 1D, 2D, 3D, multi-dimensional, temporal, tree and network graphs.
Some basic plots/charts used in data visualization are:
Line Plots
Area Plots
Histograms
Bar Charts
Pie Charts
Box Plots
Scatter Plots

III. TOOLS USED FOR VISUALIZATION
Many tools are available that make the visualization task easier. Just like in other fields, visualization tools are evolving and providing different ways to present data. We can have graphs, videos, infographics and even VR/AR components to make presentations more informative. Some of the commonly used tools for data visualization, according to Forbes [8], are listed below:
Tableau
Qlikview
FusionCharts
Highcharts
Datawrapper
Plotly
Sisense
Most of these are paid software, but they offer free trial versions. In this paper we have not used any of these tools; instead, we have used Python libraries to see how different types of plots can be used to visualize data.

IV. PYTHON LIBRARIES COMMONLY USED FOR DATA VISUALIZATION
Python provides many great graphics libraries packed with features. Whether you are interested in creating interactive or live visualizations, Python has some excellent libraries. Here are a few popular plotting libraries, to give a brief overview:
Matplotlib
Pandas
Seaborn
ggplot
Plotly

Darkhorse Analytics is a company that started out in 2008 and did ground-breaking work on data visualization in a research laboratory at the University of Alberta. Darkhorse Analytics specializes in quantitative consulting across several fields, including data analysis and geospatial analytics. Their visual approach revolves around three key points: less is more effective, more attractive and more impactful. In other words, any feature or design element that you add to your plot should support the message the plot is meant to convey, not distract from it.
Let's take a look at an example.
Figure 1 shows a pie chart of what people prefer when it comes to various kinds of pig meat. The chart shows that almost half of the preference goes to bacon, in contrast with the other types of pork meat. But I am sure almost everyone agrees that there is a lot going on in this pie chart, and that is before we even consider features like the blue background or the 3D orientation. These unwanted features simply distract from the main message and can confuse the viewer.
Figure 1: Pig Meat Preferences
So, let's apply the Darkhorse Analytics approach to transform this into a visual that is more effective, attractive and impactful. As mentioned earlier, the message here is that people are most likely to choose bacon over other types of pig meat, so let's get rid of everything that can distract from this core message. First, let's get rid of the blue background and the grey background. Let's also get rid of the borders, as they do not convey any extra information. Let's drop the redundant legend too, since the pie chart is already color coded. 3D isn't adding any extra information either, so let's say goodbye to it. Text bolding is also unnecessary, and let's get rid of the different colours and the wedges. But let's thicken the lines to make them more meaningful.
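As a rough illustration of the decluttering steps above, the sketch below draws the kind of horizontal bar chart the pie chart turns into. The helper `declutter_bar_chart` is hypothetical (it is not code from this paper): it sorts the categories, uses one neutral colour plus one accent colour for the category to emphasise, hides the frame and the x-axis, and labels the bars directly instead of using a legend. The `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; omit in Jupyter
import matplotlib.pyplot as plt


def declutter_bar_chart(preferences, highlight):
    """Horizontal bar chart in a 'less is more' style (hypothetical helper).

    preferences: dict mapping category -> value, e.g. share of respondents
    highlight:   the category whose bar should stand out
    """
    # Sort ascending so that the largest value becomes the top bar
    items = sorted(preferences.items(), key=lambda kv: kv[1])
    labels = [label for label, _ in items]
    values = [value for _, value in items]

    # One neutral colour for all bars, one accent colour for the highlight
    colors = ["tab:red" if label == highlight else "lightgray" for label in labels]

    fig, ax = plt.subplots(figsize=(6, 3))
    bars = ax.barh(labels, values, color=colors)

    # Drop elements that carry no extra information: frame, x-axis, tick marks
    for side in ("top", "right", "bottom", "left"):
        ax.spines[side].set_visible(False)
    ax.xaxis.set_visible(False)
    ax.tick_params(axis="y", length=0)

    # Label each bar directly instead of relying on a legend or gridlines
    for bar, value in zip(bars, values):
        ax.text(bar.get_width(), bar.get_y() + bar.get_height() / 2,
                f" {value}", va="center")
    return fig, ax
```

In a notebook you would call it as `declutter_bar_chart(preferences, highlight="Bacon")`, where `preferences` is a dictionary you build yourself from the survey values (not reproduced in this paper) mapping each kind of pig meat to its share.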

Figure 2: Bar Chart Pig Meat Preferences
It looks a bit familiar now. Yes! After all, this is a bar graph with horizontal bars. Finally, let's highlight bacon to differentiate it from the other kinds of pork meat. Now let's put the pie chart and the bar graph side by side and compare which is better and more comprehensible. I hope we all agree that the bar chart is the better of the two: it is simpler, cleaner, less distracting and more readable. Indeed, pie charts have recently come under fire from experts in data visualization, who argue that they are relevant only in the rarest of situations. Bar graphs and charts, on the other hand, are argued to be far better ways of getting a point across quickly. But don't worry about this for now; we will come back to this point when we learn how to create pie charts and bar graphs with Matplotlib.

V. JUPYTER NOTEBOOK - BEGINNERS TOOLBOX FOR VISUALIZATION
Here we learn how to create plots with Matplotlib, using the Jupyter notebook as our environment. Matplotlib is a well-established data visualization library that is well integrated with various environments, for example the IPython shell, web application servers, graphical user interface toolkits and the Jupyter notebook. The Jupyter notebook is an open source web application that allows you to create and share documents containing live code, visualizations and explanatory text. It is an IDE for Python development with specialized support for Matplotlib, so once you start a Jupyter notebook you can simply import Matplotlib and get going. It is a very good development environment for presenting your work, as well as for those who are beginning with data science. It supports the rich Python libraries used in data science and should be part of every data scientist's toolbox.

VI. BASIC PLOTTING WITH MATPLOTLIB
In this section, we learn how to use the Matplotlib scripting interface to produce nearly all of the common visualizations. Once you get the hang of it, you can build virtually all traditional visualizations, such as histograms, bar charts and box plots, using just one function: plot.

Dataset
In this section, we learn more about the dataset that we will use throughout the section. The dataset I have chosen is on airline safety; it was compiled by FiveThirtyEight using their large database of articles and news. It is distributed on GitHub under the Creative Commons Attribution 4.0 International License, and their code is licensed under the MIT License. To use the dataset, you can go to GitHub and download their data repository, or download the file directly using the link in reference [3].
To start creating plots, we need to import the data into a pandas dataframe, so we first import the pandas library. Since the dataset is a CSV file rather than an Excel file, we call the pandas read_csv function to read the data. As an argument, read_csv expects the name of the file that contains the dataset. We name the resulting dataframe df.
Figure 3: Importing Pandas and CSV
If you want to confirm that you have imported your data correctly, you can always use the pandas head() function, which displays the first five rows of the dataframe.
Figure 4: Top five rows of the CSV

Line Plots
As its name indicates, a line plot displays a set of data points connected by straight line segments. It is one of the simplest types of chart and is popular not only in data science but in many other fields. When to use a line plot is the more important question. A line plot is best used with a continuous dataset, to display how the data changes over a period of time. For example, say we want to know the available seat kilometres flown every week by the top 5 airlines in the dataframe. Based on this line plot, we can then research the safety measures that can be taken for these airlines. Now, how can we generate this line plot? Before we go over the code, let's do a quick recap of our dataset. Our dataset contains the following columns:
airline: Airline (an asterisk indicates that regional subsidiaries are included)
avail_seat_km_per_week: Available seat kilometres flown every week
fatal_accidents_85_99: Total number of fatal accidents, 1985–1999
fatalities_85_99: Total number of fatalities, 1985–1999
incidents_00_14: Total number of incidents, 2000–2014
fatal_accidents_00_14: Total number of fatal accidents, 2000–2014
fatalities_00_14: Total number of fatalities, 2000–2014
incidents_85_99: Total number of incidents, 1985–1999
Now let's process the dataframe. We already defined the dataframe df in the last section, so we use df to plot the line graph.
Figure 5: Graph representing Available seat kilometres flown every week
We can see in the graph that Air Canada has the highest value, so this airline should be given higher priority. So, this is how we plot a line graph.

Area Plots
An area plot, also called an area graph, is a type of graph that represents cumulated totals using numbers or percentages over time. It is based on the line plot and is typically used to compare two or more quantities. So how can we generate an area plot? It's very simple: we use the same dataframe df to plot the area plot.
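A minimal sketch of the line plot described above and of the area plot comparison that follows, using the pandas `plot` method (which draws with Matplotlib). It assumes the FiveThirtyEight column names `airline`, `avail_seat_km_per_week`, `incidents_85_99` and `incidents_00_14`, and reads "top 5 airlines" as the first five rows of `df`; the two helper functions are hypothetical, not the code behind Figures 5 and 6.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def plot_seat_km_line(df: pd.DataFrame, n: int = 5):
    """Line plot of weekly available seat kilometres for the first n airlines."""
    top = df.head(n).set_index("airline")
    fig, ax = plt.subplots(figsize=(8, 4))
    top["avail_seat_km_per_week"].plot(kind="line", marker="o", ax=ax)
    ax.set_xlabel("Airline")
    ax.set_ylabel("Available seat km per week")
    ax.set_title("Available seat kilometres flown every week")
    return ax


def plot_incidents_area(df: pd.DataFrame, n: int = 5):
    """Area plot comparing 1985-1999 and 2000-2014 incidents for the first n airlines."""
    top = df.head(n).set_index("airline")
    fig, ax = plt.subplots(figsize=(8, 4))
    # stacked=False overlays the two periods instead of piling one on top of the other
    top[["incidents_85_99", "incidents_00_14"]].plot(kind="area", stacked=False, alpha=0.4, ax=ax)
    ax.set_xlabel("Airline")
    ax.set_ylabel("Number of incidents")
    return ax
```

For example, after `df = pd.read_csv("airline-safety.csv")` (the file name is an assumption about your local copy), calling `plot_seat_km_line(df)` and `plot_incidents_area(df)` draws the two charts. With pandas' default stacking, the second area would be drawn on top of the first and show their sum, which is why `stacked=False` is used for a side-by-side comparison of the two periods.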
Here is an example: say we want to compare the incidents of the top 5 airlines in the file in 1985–1999 and in 2000–2014. We use the two fields incidents_85_99 and incidents_00_14, like this:
Figure 6: Comparing incidents of top 5 airlines in file from 1985-1999 and 2000-2014
From this graph we can infer that the number of incidents decreased drastically over time. And this is how we plot area graphs.

Histograms
A histogram is a way to represent the frequency distribution of a numerical dataset. It works by partitioning the range of the data into bins, assigning each data point in the dataset to a bin, and then counting the number of data points assigned to each bin. The vertical axis shows the frequency, that is, the number of data points in each bin.
We can understand this with an example: say we are interested in the distribution of fatalities between 1985 and 1999. To do that, we use the same dataframe to plot a histogram.
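A minimal sketch of this histogram that already applies the NumPy fix discussed next: `np.histogram` returns the count in each bin together with the bin edges, and those edges are reused both for drawing and as the x-axis tick positions, so every tick sits on a bin boundary. The function name and the default of 10 bins are illustrative assumptions; the y-axis label assumes each row of `df` is one airline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_fatalities_histogram(df: pd.DataFrame, column: str = "fatalities_85_99", bins: int = 10):
    """Histogram whose x-axis ticks sit exactly on the bin edges (hypothetical helper)."""
    # np.histogram returns the count in each bin and the (bins + 1) edge positions
    counts, bin_edges = np.histogram(df[column], bins=bins)

    fig, ax = plt.subplots(figsize=(8, 4))
    # Draw with the very same edges, then place one tick on every edge
    ax.hist(df[column], bins=bin_edges, edgecolor="white")
    ax.set_xticks(bin_edges)
    ax.set_xlabel(column)
    ax.set_ylabel("Number of airlines")  # assumes one row per airline
    return ax, counts, bin_edges
```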

Figure 7: Fatalities between 1985-1999
Note how the tick marks on the horizontal axis are not aligned with the bin edges. This makes the histogram difficult to read. Let us fix this so that our histogram is more effective. The NumPy histogram function is one way to solve this problem. We continue with Matplotlib and its scripting interface as usual, but this time we also import the NumPy library.
Figure 8: Using Numpy to plot the same graph
And this is how a histogram is plotted in Matplotlib.

Bar Charts
A bar chart, sometimes referred to as a bar graph, is a kind of plot in which the length of every bar is proportional to the value it represents. Bar charts are commonly used to compare the values of a variable at certain points in time.
For example, say we are interested in the fatalities of the airline Aeroflot* in 1985–1999 and 2000–2014. We use the same dataframe to plot a bar chart like this:
Figure 9: Fatalities by flight Aeroflot in 1985-1999 and 2000-2014
And this is how we plot a bar graph.

Pie Charts
A pie chart is a circular statistical graph divided into slices to display numerical proportions. We can understand this with an example: say we are interested in the incidents between 1985 and 1999 of the top 5 airlines in the file. We can do it like this:
Figure 10: Incidents between 1985 – 1999 by the top 5 flights
Note that we are using the same dataframe df. From the graph we can see that Aeroflot* has the highest number of incidents between 1985 and 1999. And this is how we plot pie charts.

Box Plots
A box plot is a statistical way of showing the distribution of data through five main values. The first is the minimum, the smallest value in the sorted data. The second is the first quartile, 25% of the way through the sorted data; in other words, a quarter of the data points are less than this value. The third is the median, the middle value of the sorted data. The fourth is the third quartile, 75% of the way through the sorted data; in other words, three quarters of the data points are less than this value. The last is the maximum, the largest value in the sorted data.
Figure 11: Components of Box Plot
Let's now see how Matplotlib is able to create a box plot. We can understand this with an example: say we are interested in the available seat kilometres flown every week by the top 5 airlines, in descending order. We can do it like this:
Figure 12: Available seat kilometres flown every week of top 5 flights in the descending order
And this is how we make a box plot. Please note that we are using the same dataframe df to plot the graph.

Scatter Plots
A scatter plot is a type of plot that shows the values of two different variables. Usually a dependent variable is plotted against an independent variable to determine whether the two variables are correlated. For instance, if salary and experience are measured, looking at the numbers it may be inferred that a person with more years of experience is likely to earn a higher salary than a person with fewer years of experience.
So, for example, say we are interested in the correlation between salary and years of experience. To do that, we make a new dataframe and use it to plot a scatter graph like this:
Figure 13: Correlation between salary and year of experience
We use a new dataset, which can be found on Kaggle (link in the references section). Now we do it like this [4]:
Figure 14: Correlation between salary and year of experience
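The bar chart, pie chart and box plot described above can be sketched in the same way. The helpers below are hypothetical: they assume the FiveThirtyEight column names (`fatalities_85_99`, `fatalities_00_14`, `incidents_85_99`, `avail_seat_km_per_week`) and the airline label `Aeroflot*` as it appears in the data, and they read "top 5 in descending order" as the five largest values of `avail_seat_km_per_week`. A small `five_number_summary` helper returns the five values a box plot is built from. Note that Matplotlib's box plot (which pandas uses) by default ends each whisker at the most extreme data point within 1.5 times the interquartile range and draws anything beyond as an outlier, so the whisker ends are not always the minimum and maximum.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def plot_airline_fatalities_bar(df: pd.DataFrame, airline: str = "Aeroflot*"):
    """Bar chart of one airline's fatalities in 1985-1999 and 2000-2014."""
    row = df.set_index("airline").loc[airline, ["fatalities_85_99", "fatalities_00_14"]]
    fig, ax = plt.subplots(figsize=(5, 4))
    row.astype(float).plot(kind="bar", ax=ax, rot=0)
    ax.set_ylabel("Fatalities")
    ax.set_title(f"Fatalities: {airline}")
    return ax


def plot_incidents_pie(df: pd.DataFrame, n: int = 5):
    """Pie chart of 1985-1999 incidents for the first n airlines."""
    incidents = df.head(n).set_index("airline")["incidents_85_99"]
    fig, ax = plt.subplots(figsize=(5, 5))
    incidents.plot(kind="pie", ax=ax, autopct="%1.1f%%", startangle=90)
    ax.set_ylabel("")  # pandas puts the series name here; a pie does not need it
    ax.set_title("Incidents, 1985-1999")
    return ax


def plot_seat_km_box(df: pd.DataFrame, n: int = 5):
    """Box plot of weekly available seat km for the n largest values."""
    top = df.sort_values("avail_seat_km_per_week", ascending=False).head(n)
    fig, ax = plt.subplots(figsize=(4, 5))
    top["avail_seat_km_per_week"].plot(kind="box", ax=ax)
    ax.set_ylabel("Available seat km per week")
    return ax


def five_number_summary(values: pd.Series):
    """Minimum, first quartile, median, third quartile and maximum."""
    return values.quantile([0.0, 0.25, 0.5, 0.75, 1.0]).tolist()
```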

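A minimal sketch of the salary scatter plot. The column names `YearsExperience` and `Salary` are assumptions about the Kaggle dataset in reference [4], and `plot_salary_scatter` is a hypothetical helper. Besides the scatter itself, it computes the Pearson correlation coefficient with pandas `Series.corr`, which puts a number on the visual impression of correlation.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def plot_salary_scatter(salary_df: pd.DataFrame, x: str = "YearsExperience", y: str = "Salary"):
    """Scatter plot of salary against experience, plus Pearson's r (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(6, 4))
    # kind="scatter" needs both an x and a y column; pandas raises an error otherwise
    salary_df.plot(kind="scatter", x=x, y=y, ax=ax)
    ax.set_xlabel("Years of experience")
    ax.set_ylabel("Salary")

    # Pearson correlation coefficient: a number to go with the visual impression
    r = salary_df[x].corr(salary_df[y])
    ax.set_title(f"Salary vs. experience (r = {r:.2f})")
    return ax, r
```

Load the data first, for example with `salary_df = pd.read_csv("Salary_Data.csv")` (the file name is an assumption), then call `plot_salary_scatter(salary_df)`. A value of r close to 1 would support the idea that salary tends to rise with experience; it does not by itself show causation.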
Note that for a scatter plot it is necessary to pass both the x and y columns to the plot function; otherwise it will raise an error. And this is how we make a scatter plot.

VII. DISCUSSION
In this paper we discussed the need for and importance of data visualization. We discussed the tools commonly used for data visualization and the different types of visualization techniques, such as 1D, 2D, 3D, multi-dimensional, temporal, tree and network graphs. We have explored the basic charts using Python libraries such as pandas and Matplotlib. We have used the Jupyter notebook and a dataset from a GitHub repository to demonstrate the various plots.

VIII. CONCLUSION
As seen from the above paragraphs, data visualisation plays an important role in providing visual images that make it easier to comprehend the information stored in a dataset. In this paper we have discussed the concept of data visualisation and the use of the Jupyter notebook for data visualisation. We also discussed different types of plots and the scenarios where each type of plot can be of help. Data visualization may not be an exact solution for analysing data, but by using visualization techniques suited to the data, companies, researchers and other users of these tools may find insights into their data and discover patterns within it. This may help in making quick decisions, leading to many-fold benefits for the user.

References
[1] [2] [3] [4] [5] [6] [7] [8]
