Where's My Data? Evaluating Visualizations with Missing Data

Hayeong Song & Danielle Albers Szafir

Fig. 1 (panel titles: Visualizations with High Data Quality; Visualizations with Low Data Quality): We measured factors influencing response accuracy, data quality, and confidence in interpretation for time series data with missing values. We found that visualizations that highlight missing values have higher perceived data quality, while those that break visual continuity decrease these perceptions and can bias interpretation.

Abstract—Many real-world datasets are incomplete due to factors such as data collection failures or misalignments between fused datasets. Visualizations of incomplete datasets should allow analysts to draw conclusions from their data while effectively reasoning about the quality of the data and the resulting conclusions. We conducted a pair of crowdsourced studies to measure how the methods used to impute and visualize missing data may influence analysts' perceptions of data quality and their confidence in their conclusions. Our experiments used different design choices for line graphs and bar charts to estimate averages and trends in incomplete time series datasets. Our results provide preliminary guidance for visualization designers to consider when working with incomplete data in different domains and scenarios.

Index Terms—Information Visualization, Graphical Perception, Time Series Data, Data Wrangling, Imputation

Hayeong Song is with the University of Colorado Boulder. E-mail: hayeong.song@colorado.edu. Danielle Albers Szafir is with the University of Colorado Boulder. E-mail: danielle.szafir@colorado.edu. Manuscript received xx xxx. 201x; accepted xx xxx. 201x. Date of Publication xx xxx. 201x; date of current version xx xxx. 201x. For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org. Digital Object Identifier: xx.xxxx/TVCG.201x.xxxxxxx

1 INTRODUCTION

Visualizations allow people to analyze and interpret data to understand current phenomena and guide informed decision-making. However, analysts often must make decisions using imperfect datasets. These datasets may be missing datapoints due to factors such as failures in the data collection pipeline or fusing data at different granularities. As part of the data wrangling process, visualizations have several choices for dealing with missing data, including not encoding missing elements or imputing new data (calculating substitute values) based on existing data. Prior studies show that the ways we represent data influence how accurately people interpret data and change their confidence in their data and results [16, 20, 37, 47]. We hypothesize that the ways we impute and visualize missing data may also bias analysts' perceptions of that data.

This study aims to provide a deeper empirical understanding of visualization for missing data. We measure how imputation and visualization techniques influence perceived confidence, data quality, and accuracy for visualizing incomplete datasets. We explore how four different categories of visualization designs employed in prior systems might manipulate perceived data quality: highlighting imputed data (e.g., making data more salient, as in highlighting), downplaying imputed data (e.g., making the data less salient, as in alpha blending), annotating imputed values (e.g., adding additional information about the imputation outcomes, such as error bars), and visually removing information (Fig. 2).
We measure the effects of existing techniques corresponding to these four categories on perceived data quality, result confidence, and response accuracy in two common visualizations: line graphs and bar charts. While this categorization is not exhaustive, we use it as a scaffold for exploring a subset of techniques used in existing visualization systems.

We also explore how methods of imputing missing values might additionally shift perceptions of data quality and bias responses. Systems use imputation to compute values that approximate missing datapoints to support analysis. As missing data is itself a type of data (it indicates no values are available), imputation allows systems to indicate where data is unexpectedly absent and provide principled approximations to avoid potential misinterpretation of absent data values [7]. Imputing values also allows systems to indicate potential threats to data quality by providing visual anchors analysts can use to readily enumerate and contextualize quality errors [5, 49]. We focus on three common imputation methods encountered in current visualization systems: ad-hoc zero-filling, local linear interpolation, and marginal means (Fig. 3).

While we commonly expect that missing data should degrade perceived quality, there are many cases that run counter to this assumption. For example, we may not wish to degrade perceived quality when we can closely approximate missing values or when quality concerns may interfere with decision speed in low-risk scenarios. We therefore evaluate how visualizations manipulate confidence relative to other techniques to provide designers an empirical basis for visualizing missing data.

Fig. 2 (conditions shown: Zero-Filling, Marginal Mean, and Linear Interpolation crossed with Highlight, Downplay, Annotation, and Information Removal): We examined three distinct categories of visualizations that could encode imputed values: highlight and downplay encodings that manipulate attention, annotation encodings that provide additional information, and encodings that use information removal.

We compare imputation and visualization methods in four crowdsourced studies measuring the effects of these factors on analysts' accuracy with and without imputed data, confidence in their conclusions, credibility, and perceived data quality. We found that highlighting and annotating imputed values lead to higher perceived data quality and more accurate interpretation. Downplaying imputed values or removing information associated with missing values significantly degraded perceived quality. Our findings suggest ways visualizations might leverage imputation and visualization to appropriately manipulate perceived data quality in different scenarios.

2 RELATED WORK

Missing data is typically a challenge associated with "dirty data"—datasets containing missing data, incorrect data, misalignments, and other such anomalies that may lead to erroneous conclusions [38]. Missing data can occur throughout the data lifecycle and has significant implications for analysts' trust in data [45]. These implications can be especially problematic for data visualizations, as little empirical understanding exists to guide how visualizations can balance between indicating the presence of dirty data and not distracting from or biasing the rest of the data [35]. Our research builds such knowledge by measuring the influence of various designs for visualizing missing data.

2.1 Methods for Analyzing Incomplete Data

Missing data can arise at all points in the data lifecycle, including during data capture, storage, updates, transmission, deletions, and purges [38]. A scraping process might fail due to an interrupted script, packet loss, or memory errors. Subsets of data may be withheld due to privacy considerations [21]. Part of the process of data wrangling [25, 35] is locating missing data and deciding how to manage it. In many cases, systems choose to impute—estimate a substitute value for—missing data to address potential anomalies affecting dataset coverage [42].

A broad variety of methods exist for data imputation (see Little & Rubin [40] and Lajeunesse [39] for surveys). For example, hot-deck imputation samples substitute values from the current signal, while cold-deck imputation uses values from other sources, such as related datasets [27] or domain heuristics [31]. Interpolation methods use weighted combinations of available data to infer missing values using methods like linear interpolation, regression, and adaptive interpolation [26]. More complex imputation methods can integrate information about the processes used to generate the dataset [46] or use machine learning and related techniques to holistically estimate missing values [2].
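To make the hot-deck and cold-deck families above concrete, the following sketch shows one simple way to implement them for a univariate time series; the function names, placeholder values, and use of pandas are our own illustration rather than code from any of the cited systems.

```python
import numpy as np
import pandas as pd

def hot_deck_impute(series, rng):
    """Hot-deck: fill each missing point with a value sampled from the
    observed portion of the same signal."""
    observed = series.dropna().to_numpy()
    filled = series.copy()
    missing = filled.index[filled.isna()]
    filled[missing] = rng.choice(observed, size=len(missing))
    return filled

def cold_deck_impute(series, reference):
    """Cold-deck: fill missing points from an external reference signal
    (e.g., a related dataset aligned to the same timestamps)."""
    return series.fillna(reference)

rng = np.random.default_rng(0)
signal = pd.Series([3.0, np.nan, 5.0, 4.0, np.nan, 6.0])
reference = pd.Series([3.5, 4.0, 4.5, 4.0, 5.0, 5.5])
print(hot_deck_impute(signal, rng))
print(cold_deck_impute(signal, reference))
```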
Fig. 3: We measured the effects of three different imputation methods on data interpretation: zero-filling (substituting missing values with zeros), marginal means (substituting with the mean of available data), and linear interpolation of adjacent datapoints.

While an exhaustive survey of imputation methods is beyond the scope of this paper, understanding the relationship between different imputation choices and perceived data quality is critical for visualizing missing data. As Babad & Hoffer note, even if data values can be inferred with reasonable accuracy, it is important for analysts to understand when and where missing data occurs [7]. Missing data can have a significant impact on inference and decision making and can lend context to analyses. Most significantly for our work, missing data is a key component of data quality, a measure of the trust and suitability of data for addressing a given problem [44].

Time series data has specific considerations for data quality (see Gschwandtner et al. for a survey [30]). For example, non-uniform sampling may force interpolations. Joining data across two temporal sources with different granularities can create misalignment [6]. Measures taken at the same time may conflict. Since the data is typically continuous, violations of trends may be especially salient. Due to these factors, we elect to use time series data for our study, as it is common in both real-world analysis and empirical studies for visualization, and these factors make it an important special case for understanding the implications of missing data for temporal analysis.

2.2 Visualizing Incomplete Data

Wong & Varga refer to missing data as black holes in a visualization: "a dark area of the cognitive workspace that by the absence of data indicates that one should take care" [54, p.5]. They argue that it is unclear when and how visualizations should replace missing data to support sensemaking, yet it is clear that people should be able to detect and reason about missing data. Many visualization systems support data quality analysis, including quality change over time [11], data preprocessing [8], and highlighting missing, incorrect, or imputed values [9, 10, 23]. For example, Visplause [5] supports quality inspection for temporal data to assist analysts in inferring potential causes of missing data. Wrangler [36] uses statistical methods to help analysts impute missing values. xGobi [49], MANET [53], and VIM [50] offer visualization suites that allow analysts to understand the amount of missing data and compare different imputation methods.

Many visualization systems oriented towards specific domains or datatypes automatically process missing data. Some visualizations provide little to no visual indication of imputed data. For example, Turkay et al. [51] substitute missing values with feature means. Systems in meteorology [19] and psychology [31] interpolate missing data based on domain heuristics. Other systems leverage visual saliency to manipulate whether analyst attention is drawn to imputed values. For example, TimeSearcher uses brightly colored marks to indicate missing values [14]. Restorer uses grayscale to reduce the salience of missing spatial data and luminance to interpolate imputed values [52]. However, the influence of the imputation and visualization methods used in these systems is not well understood. We ground our exploration of imputation in current practices for missing data visualization.

2.3 Graphical Perception

Prior studies in graphical perception show how the methods used to visualize data change our interpretation of that data. For example, studies show that visualization design changes our abilities to estimate and compare statistical values [3, 15, 24, 32] and shifts our confidence in those estimations [1].

As imputed values represent uncertain information, we draw on prior findings in uncertainty visualization to inform our study. Specific visual attributes, such as luminance, blurriness, and sketchiness, can indicate uncertainty in data and shift people's confidence in their conclusions [13, 16, 29, 37, 41]. Presenting data as "sketchy" additionally increases engagement with and willingness to critique data [55], which may have interesting ramifications for perceived data quality. Individual values can shift statistical perceptions of data [17], indicating that imputed points introducing variation may potentially bias analyses. As many imputation methods provide no quantifiable measure of uncertainty, we evaluate encodings that present either the level or the existence of uncertain information.

A handful of prior studies have explicitly evaluated the influence of visualization methods on perceptions of data quality. Xie et al. [56] measured how to communicate data quality in high-dimensional data using size, brightness, and hue, and found hue and size to be strong channels for encoding quality information. Eaton et al. [21] compared how different methods for visualizing missing data in line graphs influenced accuracy and confidence in response for point-comparison and trend estimation recall tasks. They substituted missing values with zero, rendered no marks for missing data (data absent), and used gapped circles to indicate missing data. They found that people interpreted data confidently even when critical data was missing, but found no significant differences between methods. Participants expressed an overall preference for visualizations that explicitly indicated missing data. Andreasson and Riveiro [4] conducted a similar study comparing the effects of absent data, fuzziness, and annotated absent data on analyst confidence in a decision-making task. Their results showed that people had a strong preference for conditions with annotated absent data and a strong dislike for fuzziness. Our work extends these findings by separating the effects of imputation methods (such as the zero-filling in Eaton et al. [21]) from visualization methods, considering variable numbers of missing values, and leveraging a wider variety of visualization methods. We also evaluate bar charts in addition to line graphs, as removing missing data from bar charts is indistinguishable from zero values. Prior studies indicate that the kinds of information people synthesize across bars and lines can vary [57], and these differences may significantly impact perceptions of missing data.

3 MOTIVATION & HYPOTHESES

Data quality concerns how suitable a given dataset is to solve a problem or make a decision. Dimensions of data quality include several factors related to a data source (e.g., accessibility, volume, and relevance) and others relating to perceptions of the dataset (e.g., completeness, credibility, and reliability) [44]. While analysts must consider factors of a data source when choosing a dataset, the visualizations used to analyze data directly influence perceptions of that data. In this study, we measure how imputation and visualization choices impact response bias and perceptions when data is incomplete. We measure quality as a combination of confidence (how confidently analysts can complete a task given the data), completeness (how much data is available), credibility (how true the data is), and reliability (how correct the data is). Following best practices, we use the results from these metrics to construct a data quality scale ([18, 22, 48], c.f. §4.4).

Our inspiration for this study comes from collaborations with public health analysts. These analysts fuse data from sources of both low (e.g., social media) and high (e.g., CDC and WHO reports) quality to develop holistic insights. Data collection errors and temporal misalignments after fusing these sources frequently lead to incomplete data. While our collaborators care about large-scale patterns in this data, their imputation methods and whether or not they need to include imputed data in assessing these patterns are less well defined: analysts want to analyze patterns in light of missing data, but can often generate reasonable approximations of that data or want to know when and where data is missing to temper their decision-making processes. As a result, we opt to evaluate missing data using methods similar to Jansen & Hornbæk, where participants naturally integrate imputed values without explicit instructions as to how to consider those values in their estimates [34]. We measure performance using two common tasks employed by our collaborators: average and trend analysis.
We tested four categories of visualization type for communicating missing data that we encountered in the systems discussed in §2. The first category highlights missing data by leveraging bright colors to attract attention to missing datapoints (e.g., [10, 14]). The second category downplays missing values by reducing the salience of imputed values relative to the rest of the data (e.g., [4, 9]). The third category uses encodings to annotate missing values with additional statistical information, such as error bars drawing on confidence from the imputation estimate (e.g., variance statistics for cold- or hot-deck methods) [9]. The fourth category uses information removal, physically removing some element of missing values from the visualization (e.g., [4, 21]). As these encodings are semantically related to incompleteness, we anticipate that they will also degrade data quality perceptions. Some tested manipulations were hybrids of these categories that examine dependencies across conditions. To mirror prior studies, we included a condition where missing data was entirely absent.

We draw our tested imputation methods from three methods we observed in existing visualization systems. Zero-filling substituted a single value (0) for all missing datapoints, as in many commercial systems. Linear interpolation interpolated linearly between adjacent available items (e.g., [31, 52]). Marginal means replaced each missing data value with the mean of all available signals (e.g., [23, 51]). For our data, zero-filling introduced the highest deviation from the original dataset, marginal means the second highest, and linear interpolation the lowest. While we experimented with more complex interpolation methods, we found no significant differences in our stimuli between those methods and the three selected. Figure 3 provides examples of the tested imputation and visualization categories.

Based on these conditions, we hypothesized that:

H1–Perceived data quality and response accuracy will both degrade as the amount of missing data increases.
H2–Highlighting methods will generate higher perceived data quality than downplaying and information removal methods.
H3–Linear interpolation will lead to higher perceived confidence and data quality than marginal means or zero-filling as it takes into account local trends in the dataset.
H4–Imputed values will lead to higher perceived data quality than removed values.
H1 stems from the idea that completeness is a key indicator of data quality and provides a quality check for our experiment. In our experiment, data quality is measured as a combination of perceived confidence, credibility, reliability, and completeness. We anticipated that people could effectively reason about missing values and therefore expected no change in accuracy beyond that introduced by the amount of missing data. H2 arises from certainty and completeness as aspects of data quality. As highlighting visualizations provide no visual indications associated with either incompleteness (as with information removal) or reduced visual weight (as in downplaying), we anticipate they will lead to higher perceived quality. This corresponds with observations from Andreasson & Riveiro [4], who found evidence that "fuzzy" visualizations, which correspond to downplaying, were not well liked for decision-making with missing data [13]. We predict H3 on the basis of potential biases introduced by zero-filled and mean values and because linear interpolation creates plausible variation in imputed values. This aligns with Correll & Heer's findings that values outside of a distribution can bias statistical perceptions of data [17]. H4 stems directly from Eaton et al. [21], who showed a preference for visualizations using explicit visual indications of missing data.

4 METHODS

We ran two 7 (visualization type) × 3 (imputation method) × 4 (percentage of missing data) full factorial within-participants studies to measure how visualization and imputation influence time series analysis, focusing on two conventional visualizations: line and bar graphs. Each study followed the same general procedure. Specific differences between the two studies are discussed in their respective sections.
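As a concrete check on the size of this design, the following sketch (our own labels, not the study code) enumerates the full factorial; 7 × 3 × 4 yields the 84 experimental trials each participant completes, which §4.2 then organizes into visualization blocks.

```python
from itertools import product

# Line-graph visualization labels drawn from Fig. 4; identifiers are our own.
visualizations = ["data_absent", "color_points", "color_points_line_gradient",
                  "connected_error_bars", "disconnected_error_bars",
                  "unfilled_points", "unfilled_points_line_gradient"]       # 7
imputations = ["zero_fill", "linear_interpolation", "marginal_mean"]         # 3
missing_pcts = [0.0, 0.1, 0.2, 0.3]                                          # 4

conditions = list(product(visualizations, imputations, missing_pcts))
assert len(conditions) == 84  # 84 factorial trials; 3 engagement checks bring the total to 87
```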

For each study, we had three independent variables—visualization type, imputation method, and percentage of missing data—and five dependent variables—accuracy, confidence in response, data credibility, data reliability, and data completeness—combined to measure quality using scale construction [18].

4.1 Stimuli & Tasks

We generated each graph as a 1000 × 400 pixel graph using D3 [12] and Plot.ly [33] (see Fig. 1 for examples). Each graph visualizes 60 values representing the frequency of Tweets collected per minute over an hour, providing a concrete problem scenario where we often find missing data in the real world. We simulated missing data completely at random (MCAR) by randomly removing a subset of values in each graph (0%, 10%, 20%, or 30%). We replaced these values with imputed values computed using one of the three imputation methods described in §3 (zero-filling, linear interpolation, or marginal means). The 0% condition provided a baseline for measuring changes to our dependent variables due to data removal. The imputed values were then rendered using one of the seven candidate visualization methods per graph type (Figs. 4 and 6). Above each graph, we provided a brief sentence contextualizing the data, a statement encouraging participants to complete the questions as quickly and accurately as possible, and a counter indicating current progress through the study. Below each graph, we enumerated five questions, answered using radio buttons.

We evaluated two tasks each for line graphs and bar charts: average and trend comparison. Each task required participants to answer five questions for each stimulus, with task language determined in piloting:

1. Were there more Tweets on average in the first or second half-hour? (Averaging) / Is the overall rate of change larger in the first or second half-hour? (Trend Detection)
2. How confident are you in your response? 1–Extremely Unconfident, 7–Extremely Confident
3. How credible is this data? 1–Extremely Uncredible, 7–Extremely Credible
4. How complete is this data? 1–Extremely Incomplete, 7–Extremely Complete
5. How reliable is this data? 1–Extremely Unreliable, 7–Extremely Reliable

We chose to use averaging and trend comparison tasks in our evaluation as they forced participants to consider information from all points in the dataset and mitigated changes to the correct response and task difficulty introduced by randomly removing values. Prior studies in missing data visualization have relied on trend detection tasks (e.g., [21]), while our public health collaborators noted the importance of averaging for comparing relative frequencies across datasets.

4.1.1 Data Generation

Both noise and task difficulty may influence data perceptions and performance: noisier signals may change the effects of different imputation methods, and confidence may correlate with difficulty. To control for these concerns, we used synthetic datasets to provide control over noise and difficulty. Each graph contained 60 y-values ranging from y = 0 to y = 100, uniformly spaced in time. We computed the y-values by first generating a signal from structured random noise and then adjusting each signal based on task constraints [43]. To assist with reproducing and extending our results and analyses, data and experimental infrastructures are available at http://cmci.colorado.edu/visualab/MissingData/.
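To make the stimulus manipulation above concrete, the sketch below shows one way to simulate MCAR removal and apply the three tested imputation methods with pandas; the function names and the placeholder signal are ours, not the released experimental code.

```python
import numpy as np
import pandas as pd

def remove_mcar(values, pct_missing, rng):
    """Blank a randomly chosen subset of points (missing completely at random)."""
    series = pd.Series(values, dtype=float)
    n_missing = int(round(len(series) * pct_missing))
    drop = rng.choice(len(series), size=n_missing, replace=False)
    series.iloc[drop] = np.nan
    return series

def impute(series, method):
    """Fill missing values with one of the three tested imputation methods."""
    if method == "zero_fill":             # ad-hoc zero-filling
        return series.fillna(0.0)
    if method == "marginal_mean":         # mean of all available values
        return series.fillna(series.mean())
    if method == "linear_interpolation":  # interpolate between adjacent available points
        return series.interpolate(method="linear", limit_direction="both")
    raise ValueError(f"unknown imputation method: {method}")

rng = np.random.default_rng(42)
tweets_per_minute = rng.uniform(0, 100, size=60)        # placeholder 60-minute signal
incomplete = remove_mcar(tweets_per_minute, 0.2, rng)   # 20% missing condition
stimulus_values = impute(incomplete, "linear_interpolation")
```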
Average Data: We generated signals using five different noise levels and considered noise as a random effect in our analyses. We then used a constraint-based optimization to adjust the mean difference between the first and last thirty points while minimizing deviation from the original random signal to control difficulty. We separated the means of the first and last half hour by 2.0, 4.0, or 6.0, randomly selecting which half hour was highest. We used these difference thresholds as they achieved desirable response accuracy in prior studies [3]. For the average task, each graph visualized a randomly selected dataset from 110 total datasets generated using this method.

Trend Data: We generated signals using four different noise levels and considered noise as a random effect in our analyses. We separated the difference in the slopes of the first and last half hour by 0.5 or 0.7, randomly selecting which half hour had the larger overall rate of change. For the trend task, each graph visualized a randomly selected dataset from 96 total datasets generated using this method.

4.2 Procedure

Our study consisted of five phases: (1) consent, (2) screening, (3) instructional tutorial, (4) formal study, and (5) demographic questionnaire. Each participant first provided informed consent to participate in the study in accordance with our IRB protocol. We then screened participants for color vision deficiencies using a set of four Ishihara plates. Participants then received instructions about the study and were serially shown examples of each of the seven visualization conditions with one missing value. Each stimulus in the tutorial explained that some data was missing, that we had "guessed" at the values, and how we visualized "guessed" values. Participants were not informed of specific imputation methods or subjective tasks. Participants correctly identified the half-hour with the highest average or trend for each condition before beginning the formal study.

The formal study consisted of 87 trials presented serially (84 from our factorial design and 3 engagement checks). To mitigate effects from changing the visualization paradigm, we blocked stimuli by visualization method and randomized the order of blocks. Within each block, participants saw all twelve combinations of missing data (0%, 10%, 20%, and 30%) and imputation method (zero-filling, linear interpolation, and marginal mean) presented in random order. Each stimulus visualized a random dataset, with each dataset occurring at most once per participant. For averaging, engagement checks had 0% missing data where the average between halves of the dataset differed by 20.0. For trends, engagement checks differed in slopes by 1.0. These engagement checks were added to blocks 2, 4, and 6. After completing the formal study, participants completed a demographic questionnaire, which included an opportunity for open-ended comments, and were compensated $1.00 for their participation.

4.3 Participants

We collected data from 303 U.S. participants on Amazon's Mechanical Turk (µ_age = 36.3, σ_age = 12.7; 150 female, 153 male). All participants reported normal or corrected-to-normal vision. To ensure honest participation and task understanding, we excluded any participants who answered two or more engagement checks incorrectly. Individual demographics and exclusions are reported in each Results section.
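The constraint-based optimization used to set task difficulty is not given in code in the paper; the sketch below is a simplified stand-in that shifts the two halves of a noisy signal so their means differ by the target amount, capturing the intent (a controlled mean difference with minimal other change) without reproducing the authors' exact procedure.

```python
import numpy as np

def make_average_stimulus(target_diff, noise_sd, rng, n=60, lo=0.0, hi=100.0):
    """Generate an n-point signal whose first- and second-half means differ by target_diff."""
    signal = rng.normal(50.0, noise_sd, size=n)
    first, second = signal[: n // 2], signal[n // 2 :]
    # Shift the halves symmetrically so their means differ by exactly target_diff.
    correction = (target_diff - (first.mean() - second.mean())) / 2.0
    first += correction
    second -= correction
    if rng.random() < 0.5:                 # randomly select which half hour is highest
        first, second = second, first
    return np.clip(np.concatenate([first, second]), lo, hi)

rng = np.random.default_rng(7)
stimulus = make_average_stimulus(target_diff=4.0, noise_sd=10.0, rng=rng)
```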
4.4 Measures & Analysis

We used three primary measures to analyze participant responses: perceived confidence in their answer (Question 2), credibility (Question 3), and a two-item scale describing perceived quality (Questions 4–5). We constructed the two-item scale by identifying correlations between the four quality questions at α = .7. We combined correlated dimensions to construct our data quality scale per best practices [18, 22, 48]. We used this scale in place of the component questions to increase measure validity and used only descriptive analyses with single-item scales to mitigate effects of participant interpretation on our results. Accuracy both with and without imputed values formed a secondary metric to detect performance biases.

Unless otherwise specified, our main analysis used a repeated measures analysis of covariance (ANCOVA) to test for main and interaction effects, with question order and noise treated as random effects and the actual (difference in means when datapoints are removed) and imputed difference between means as covariates to mitigate effects of task difficulty. In both experiments, our response data was normally distributed. To control for Type I errors in planned comparisons between independently distributed settings of visualization method and imputation, we used Tukey's Honest Significant Difference test with α = .05 for post-hoc analyses.
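To show how the scale construction and model specification above could look in practice, here is a hedged sketch using statsmodels; the column names, file name, and the random-effect structure (participant intercepts only, rather than also question order and noise) are our own simplifications.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical long-format trial table: one row per trial with columns
# participant, vis, imputation, pct_missing, completeness, reliability,
# actual_diff, imputed_diff, ...
trials = pd.read_csv("trials.csv")

# Two-item quality scale: verify the completeness and reliability ratings
# correlate strongly enough to combine (the paper uses a .7 threshold), then average.
r = trials["completeness"].corr(trials["reliability"])
assert r >= 0.7, "quality items do not correlate strongly enough to combine"
trials["quality"] = trials[["completeness", "reliability"]].mean(axis=1)

# ANCOVA-style mixed model: design factors plus difficulty covariates as fixed
# effects, with random intercepts per participant.
model = smf.mixedlm(
    "quality ~ C(vis) * C(imputation) * C(pct_missing) + actual_diff + imputed_diff",
    data=trials,
    groups=trials["participant"],
).fit()
print(model.summary())

# Tukey HSD post-hoc comparisons across visualization methods at alpha = .05.
print(pairwise_tukeyhsd(trials["quality"], trials["vis"], alpha=0.05))
```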

Fig. 4 (panels: (a) Data Absent, (b) Color Points, (c) Color Points & Line Gradients, (d) Connected Error Bars, (e) Disconnected Error Bars, (f) Unfilled Points, (g) Unfilled Points & Line Gradients): We tested seven different methods for visualizing missing values in line graphs, manipulating both point and line appearance: two highlighting missing values, two downplaying missing values, two annotating missing values, and one removing missing values.

We elected not to use response time as a measure. While understanding the effects of missing data on analysis speed is an interesting question, the inclusion of our subjective measures and use of crowdsourcing make it less reliable for our experiment.

Prior studies in missing data visualization focused on line graphs, one of the most common and ubiquitous methods for visualizing data. We tested three factors we hypothesized may affect missing data

