An Introduction To Data Visualization - GitHub Pages


An Introduction to Data Visualization Anamaria Crisan @amcrisan acrisan@cs.ubc.ca http://cs.ubc.ca/ acrisan 1

Background (timeline: 2008, 2010, 2013, 2015): Master of Science (Bioinformatics); British Columbia Centre for Disease Control; GenomeDX Biosciences; PhD Candidate (Computer Science), University of British Columbia 2

Webinar Learning Goals. Today: Have a high-level understanding of data visualization design and evaluation. Tomorrow: Have a basic understanding of different data visualization tools as well as their strengths and limitations 3

What we’ll talk about 4

Why should we visualize data? How do we use data visualizations? How should we visualize data? 5

A Comment on “How Should we Visualize Data?” There are two aspects of visualizations to think about: How do you make a visualization? Is it the right visualization? 6

Why should we visualize data? 7

Translating Numbers to Words It is not always easy to reason consistently with numbers http://bit.ly/1FxtT2z 8

Visualizing Data is Effective. From least to most understandable: Probability (60%), Frequency (6 in 10), Visualization. Numeracy: the ability to reason with numbers § Individuals with low numeracy have difficulty interpreting numbers and probabilities § Also true amongst educated professionals. Visualization can make data more accessible to individuals with lower numeracy skills. Whiting (2015) “How well do health professionals interpret diagnostic information? A systematic review” 9
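The three formats contrasted on this slide (a probability, a natural frequency, and a visualization) can be sketched in a few lines. This is a minimal illustration only: the function names `natural_frequency` and `icon_array` are hypothetical, and the text-based icon array is an assumed stand-in for whatever graphic the original slide showed.

```python
def natural_frequency(probability: float, denominator: int = 10) -> str:
    """Express a probability as a natural frequency, e.g. 0.60 -> '6 in 10'."""
    count = round(probability * denominator)
    return f"{count} in {denominator}"


def icon_array(probability: float, total: int = 10,
               filled: str = "#", empty: str = ".") -> str:
    """Draw a one-row text icon array: one symbol per individual."""
    count = round(probability * total)
    return filled * count + empty * (total - count)


p = 0.60
print(f"Probability:   {p:.0%}")                 # percentage format
print(f"Frequency:     {natural_frequency(p)}")  # natural-frequency format
print(f"Visualization: {icon_array(p)}")         # simple icon array
```

All three lines carry the same quantity; only the representation changes, which is the point the slide makes about accessibility for readers with lower numeracy.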

But... Visualization Design ALSO matters

Example: Communicating Survival Benefit of Cancer Therapy Baseline Visualization Alternative 1 Alternative 2 11 Zikmund-Fisher (2013). A demonstration of ''less can be more'' in risk graphics.

Example: Infection Transmission in a Hospital OPTION A OPTION B 12

Example: Visualizing Arteries of the Heart for Surgery Planning. Borkin (2011). “Evaluation of Artery Visualizations for Heart Disease Diagnosis”. Made with: Processing 13

Example: Visualizing Arteries of the Heart for Surgery Planning. EXISTING STANDARD: Accuracy 39%. REVISED VISUALIZATION: Accuracy 91%. Borkin (2011). “Evaluation of Artery Visualizations for Heart Disease Diagnosis”. Made with: Processing 14

How do we use data visualizations? 15

Role of Data visualization in the current paradigm of scientific research Communication 16

Do you have a research problem? Yes. Duh. -> Do all the Science! -> Inform the masses! No. -> But eventually you’ll have a problem, right? 17 https://www.ratbotcomics.com/comics/pgrc 2014/1/1.html

Do you have a research problem? Yes. Duh. -> Do all the Science! -> Inform the masses! (Infographics are pretty. Maybe data visualization?) No. -> But eventually you’ll have a problem, right? 18

Do you have a research problem? Yes. Duh. -> Do all the Science! -> Inform the masses! (Infographics are pretty. Maybe data visualization?) -> Did it work? No. -> But eventually you’ll have a problem, right? 19

Do you have a research problem? Yes. Duh. -> Do all the Science! -> Inform the masses! -> Did it work? No :( -> Different infographics? Maybe data visualization? No. -> But eventually you’ll have a problem, right? 20

Do you have a research problem? Yes. Duh. -> Do all the Science! -> Inform the masses! -> Did it work? Yes! (maybe?) -> Declare Victory. No :( -> Different infographics? Maybe data visualization? No. -> But eventually you’ll have a problem, right? 21

Limitation #1: Missed Opportunity in Exploration. Do all the Science! -> [Data Visualization! Missed opportunity for exploration] -> Inform the masses! § Exploration is looking at your data, trying different analysis methods, assessing whether there are outliers or missing data, etc. 22

Limitation #1 : Missed Opportunity in Exploration Same stats, different graphs (Anscombe’s quartet) 23
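The "same stats, different graphs" point can be checked numerically. The sketch below hard-codes the four Anscombe (1973) datasets and computes their summary statistics with the Python standard library; the helper names `pearson` and `summarize` are illustrative choices, not part of the original talk.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics that are nearly identical across the quartet."""
    return {
        "mean_x": round(mean(xs), 3),
        "mean_y": round(mean(ys), 3),
        "var_x": round(variance(xs), 3),   # sample variance
        "var_y": round(variance(ys), 3),
        "r": round(pearson(xs, ys), 3),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}
for name, stats in SUMMARIES.items():
    print(name, stats)
```

The printed summaries agree to about two decimal places, yet a scatter plot of each dataset looks entirely different (a linear trend, a curve, a line with one outlier, and a vertical cluster with one extreme point). Only plotting the data during exploration reveals this.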

Limitation #1 : Missed Opportunity in Exploration Same stats, different graphs Autodesk Research (2017). Same Stats, Different Graphs: stats 24

Limitation #1 : Missed Opportunity in Exploration Same stats, different graphs (Datasaurus) Autodesk Research (2017). Same Stats, Different Graphs: stats 25

Limitation #2: Identifying the Appropriate Vis. Selecting the appropriate data visualization is challenging § True for both exploration & communication applications. We’ll spend the rest of the talk on this subject 26

How should we visualize data ? 27

Cross-Cutting Disciplines in Information Visualization: Human Perception & Cognition; Computer Graphics; Data Analysis; Visualization Design & Analysis 28

Encoding and Decoding Information R. Kosara (EagerEyes) – https://eagereyes.org/basics/encoding-vs-decoding

A Small Digression 30

Concrete Examples of Perception in Action. Example 1: A Heat map, as seen by a non-colour blind individual vs. a colour blind individual. Colour Blind Simulator: ess-simulator/ Example 2: The Dress

And we’re back! 32

Putting it all Together for Visualization Design & Analysis § Non-trivial to condense knowledge across all these areas § Still an ongoing area of research § I will try to convey a simpler intuition about design & analysis 33

Breaking Down a Visualization in Three Questions Why? (Motivation) Why do you need to visualize data? How will you, or others, use the visualization? 34

Breaking Down a Visualization in Three Questions Why? (Motivation) Why do you need to visualize data? How will you, or others, use the visualization? What? (Data & Tasks) What kind of data is being visualized? What tasks are performed with the data? 35

Breaking Down a Visualization in Three Questions Why? (Motivation) Why do you need to visualize data? How will you, or others, use the visualization? What? (Data & Tasks) What kind of data is being visualized? What tasks are performed with the data? How? (Visual & Interactive Design) How do you make the visualization? Is it the right visualization? People tend to jump to this level and ignore why and what 36

Design & Evaluation with Three Questions. Why? Does the visualization address the intended need? What? Are you using the right data, or deriving the right data? How? Are the visual & interactive choices appropriate for the data and tasks? Does the visualization support the tasks using that data? If interactive / computer based, is the visualization easy to use and reliable (i.e., doesn’t crash all the time)? 37

A Nested-model for Visualization Design & Analysis Why? What? Design How? Evaluation T. Munzner (2014) – Visualization Design and Analysis

Thinking Systematically about Data Visualization. Infovis (Information Visualization) research advocates an iterative process of design and evaluation across nested levels: Domain Problem*, Data & Task, Visual & Interaction Design Choices, Algorithm. *Domain Problem = Motivation. T. Munzner (2014) – Visualization Design and Analysis 39

An Iterative Process An iterative approach to development allows us to get feedback before committing to ineffective design choices 40

Thinking Systematically about Data Visualization Domain Problem Data Task Visual Interaction Design Choices Algorithm 1. Identify a relevant problem that affects you or a group of stakeholders T. Munzner (2014) – Visualization Design and Analysis 41

Public Health Stakeholders § Multidisciplinary decision making teams § More data & diverse data types -> more informed decision making § BUT: stakeholders differ in their ability to interpret data and in their needs. Stakeholders: Medical Health Officers, Clinicians, Nurses, Researchers, Community Leaders, Patients, Politicians 42

Thinking Systematically about Data Visualization Domain Problem Data Task Visual Interaction Design Choices Algorithm 2. Ask what data stakeholders use (is it available)? 3. Ask what stakeholders do with the data [tasks] T. Munzner (2014) – Visualization Design and Analysis 43

Many Different Types of Data! T. Munzner (2014) – Visualization Design and Analysis 44

Don’t Just Visualize the Raw Data! Example: Original (Raw) Data vs. Derived Data (T. Munzner (2014) – Visualization Design and Analysis). Example when this advice is ignored: XKCD

People also Perform Different Tasks with Data
[Figure: matrix scoring how strongly each data type supports each task, with a total score per data type and its whole genome sequencing (WGS) equivalent. The individual cell values are not recoverable from the transcription.]
Data types (rows): Patient Identifier; Sample Collection Date; Patient Prior TB Results; Speciation; Sample Type (sputum, fine needle aspirate); Culture results; Sample Collection Site (lymph node, blood draw etc.); Acid Fast Bacilli Smear; Resistotype; Phenotype DST; Chest x-ray; Report Release Date; Requester IDs; Interpretation or comments from reviewer; Predicted DST; MIRU-VNTR; Cluster Assignment; SNP/variant distance; Phylogenetic Tree; Reviewer ID; TST results; IGRA results; Lab QC; Spoligotype; RFLP
DIAGNOSIS TASKS: Diagnose Latent TB; Diagnose Active TB
TREATMENT TASKS: Choose Meds; Choose Tx Duration; Assess Response to Tx
SURVEILLANCE TASKS: Guide Contact Tracing; Report to Public Health; Define a Cluster; Connect case to Existing Cluster; Guide Public Health Response
Other task labels in the figure: Reactive vs New; Characterize Acquisition; Transmission Risk
Degree of consensus: 3 (High), 2 (Some), 1 (Low), 0 (V. Low)
A Crisan (2017) – Evidence Base Design and Analysis of a whole genome sequence clinical report. 46

Thinking Systematically about Data Visualization Domain Problem Data Task Visual Interaction Design Choices Algorithm 4. Explore if other visualizations have addressed this problem and set of tasks & data 5. Implement your own solution (remember this includes interaction!) T. Munzner (2014) – Visualization Design and Analysis 47

Example of a more complex visualization https://www.youtube.com/watch?v=j4Ut4krp8GQ 48

A Small Digression 49

Marks & Channels : Basic Building Blocks Mark: Basic Graphical Element (basic building block) Channel: Controls the appearance of marks T. Munzner (2014) – Visualization Design and Analysis 49

Channels Vary in their Effectiveness. Example: Pie Chart (channels: Angle & Area) vs. Bar Chart (channel: Position on a Common Scale). J. Heer (2010) – Crowdsourcing Graphical Perception: Using Mechanical Turk 50

Marks & Channels: ggplot2 example. ggplot(data = mpg, aes(x = displ, y = cty, colour = class)) + geom_point() Channel: Position (x, y). Channel: Colour (colour). Mark: Point (geom_point). Note: Generally in ggplot2, aesthetics refer to channels and geoms refer to marks, but there are complex geoms that aren’t simple marks but chart types (e.g. geom_density) and there are aesthetics that have little to do with the visual channels directly (e.g. group). https://rpubs.com/hadley/ggplot-intro 51

And we’re back! 53

Thinking Systematically about Data Visualization Domain Problem Data Task Visual Interaction Design Choices Algorithm 4. Explore if other visualizations have addressed this problem and set of tasks 5. Implement your own solution (part or all of that solution could be a new algorithm) 54

Thinking Systematically about Data Visualization Domain Problem* Data Task Visual Interaction Design Choices Algorithm 6. Test multiple alternatives (including new ones you develop) with stakeholders 7. Gather qualitative & quantitative evaluation data 55

Thinking Systematically about Data Visualization
Design
1. Identify a relevant problem that affects you or a group of stakeholders
2. Ask what data stakeholders use (is it available)?
3. Ask what stakeholders do with the data [tasks]
4. Explore if other visualizations have addressed this problem and set of tasks & data
5. Implement your own solution (vis and/or algorithm)
Evaluation
6. Test multiple alternatives (including new ones you develop) with stakeholders
7. Gather qualitative & quantitative evaluation data
56

My Work: Evidence Based Design
Stages: Discovery (Information Gathering); Design (Design & Evaluation); Implement (Finalize Design)
Data Gathered: Expert Consults; TB Task & Data Workflow Questionnaire Map; Design Sprint; Design Choice Questionnaire
Study Design (Qualitative, Quantitative): Exploratory Sequential Model; Embedded Model
[Figure: mock-up of the resulting two-page report, "MYCOBACTERIUM TUBERCULOSIS GENOME SEQUENCING REPORT, NOT FOR DIAGNOSTIC USE", for a placeholder patient (JOHN DOE). Summary: the specimen was positive for Mycobacterium tuberculosis, lineage 2.2.1 (East-Asian Beijing); it is resistant to isoniazid (katG S315T) and rifampin (rpoB S531L); it belongs to a cluster, suggesting recent transmission. A drug susceptibility table lists first-line and second-line drugs, with "No mutation detected" for the remaining drugs. Resistance is reported when a high-confidence resistance-conferring mutation is detected; "No mutation detected" does not exclude the possibility of resistance.]
https://peerj.com/articles/4218/ 57

My Work: Exploring Vis for Genomic Epidemiology How do researchers visualize data? How can we systematically compare visualizations? OPTION A OPTION B 58

Wrapping up 59

DATA VISUALIZATION IS NOT JUST AN ART PROJECT 60

Revisiting Today’s Learning Goal: Have a high-level understanding of data visualization design and evaluation
§ Visualizations of data are useful
  § Helpful in instances of low numeracy
  § Can be used in communication and exploration
§ But... visualization design also matters
  § Many different alternatives, important to test
§ It’s possible to think systematically about visualizations
  § Many disciplines cross cut information visualization research
  § At the bare minimum think “Why”, “What”, “How”
§ Some small examples to get you started
  § https://peerj.com/articles/4218/ more to come
61

An Introduction to Data Visualization Anamaria Crisan @amcrisan acrisan@cs.ubc.ca http://cs.ubc.ca/ acrisan 62
