Data Visualisation for Analysis in Scholarly Research, April 2013


23/04/2013

Data Visualisation for Analysis in Scholarly Research
British Library Digital Scholarship Training Programme, April 2013
Mia Ridge, Open University
http://openobjects.org.uk/
@mia_out

While we're getting started
– Check that the mouse on your laptop works and that you can get online
– Open a web browser
– Go to this address: http://bit.ly/13pMaYa (or https://www.dropbox.com/s/12isaw33zmmty 0d/April2013 datasets and notes.zip )
– Save the file to the desktop for use later

Timetable
10am Start
11:30-11:45 Break
13:00-14:00 Lunch
15:30-15:45 Break
16:00 Conclude
Sources and further reading: http://bit.ly/UJwgEz

Overview
– Introductions; what is data visualisation?
– History and types of visualisations
– Visualisations for scholarly analysis
– Dealing with library, museum, humanities data
– Planning visualisations
– Creating your first visualisations

Who am I?
http://bit.ly/13JqWpr
Tool from http://neatline.org/

Introductions
– In a sentence or two, what's your interest in data visualisation?
– Do you have any potential users in mind?
– What form will your visualisation take? (e.g. print vs. digital, static or interactive?)
– What kinds of data do you work with?

What is data visualisation?
– 'the graphical display of abstract information for two purposes: sense-making (also called data analysis) and communication' (Stephen Few)
– 'fundamentally about showing quantitative and qualitative information so that a viewer can see patterns, trends, or anomalies, constancy or variation, in ways that other forms – text and tables – do not allow.' (Michael Friendly)
– 'interactive, visual representations of abstract data to amplify cognition' (Card et al)

Types of visualisations
– Intersections of format and purpose
– Data types: quantitative, qualitative, geographic, time series, media, entities (people, places, events, concepts, things)
– Static, interactive; print, digital; product, process
– Exploratory, explanatory: find new insights, or tell a story?
– Pragmatic, emotive?

Joseph Priestley, 1769

John Snow’s cholera map, 1854

Florence Nightingale's petal charts, 1857

Charles Minard’s figurative map, 1869
'Figurative Map of the successive losses in men of the French Army in the Russian campaign 1812-1813'. Drawn up by M. Minard, Inspector General of Bridges and Roads in retirement. Paris, November 20, 1869.

.translated poleon.html

Harry Beck, 1931

Harry Beck, 1931

Web 2.0 and the mashup, 2006

Visualising images and video
‘Mondrian vs. Rothko’, Lev Manovich, 2010. Images preparation: Xiaoda Wang

Visualisation types (based on Many Eyes)
/page/Visualization Options.html

Timelines

Maps
A sample of publication printing locations 1534-1831
http://bit.ly/W9VM7D

Infographics
resized at http://notes.husk.org/post/509063519/infographics

Network visualisations
ehind-my-ideal-bookshelf/

Exercise 1: network visualisations
1. In your browser, go to http://bit.ly/11qqXuj
2. Scroll down the page to the network graph.
3. Take a few minutes to explore it: try holding the cursor over items, clicking, dragging, etc.
4. You can see the same data represented in other graphs below.
5. Discuss with your neighbour: does interacting with the network graph give you more or less information than the other visualisations? Does it open up new questions?

N-grams
http://books.google.com/ngrams/

Exercise 2: comparing N-gram tools
1. Think of two words or phrases you’d like to compare over time (e.g. Burma, Burmah)
2. Open two browser windows
3. In one, go to http://books.google.com/ngrams
4. In the other, go to http://bookworm.culturomics.org
5. Enter your words or phrases in each and compare the results
6. Discuss with your neighbour: what differences did you find, and what might have caused them?

Bookworm tips
Click here to change options
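One reason the two tools in Exercise 2 may disagree is that each counts words in a different corpus and scales the counts in its own way. The sketch below is not how either tool is implemented; it only illustrates the general idea of a relative frequency per year. The function name, the per-1,000-words scaling and the two-year mini corpus are all assumptions made up for illustration.

```python
from collections import Counter


def relative_frequency(texts_by_year, term):
    """Return {year: occurrences of a one-word term per 1,000 words}."""
    result = {}
    for year, text in sorted(texts_by_year.items()):
        words = text.lower().split()
        counts = Counter(words)
        # Scale by corpus size so years with more text are comparable.
        result[year] = 1000 * counts[term.lower()] / len(words) if words else 0.0
    return result


# Hypothetical mini corpus, invented for illustration only.
corpus = {
    1850: "the road to burmah was long and burmah was far",
    1950: "the road to burma was long",
}

burmah = relative_frequency(corpus, "Burmah")
burma = relative_frequency(corpus, "Burma")
print(burmah)
print(burma)
```

Changing the texts behind each year changes the curve even when the search term stays the same, which is one possible answer to the discussion question.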

Topic modelling
http://discontents.com.au/mining-for-meanings/
e-24

Other forms of text analysis
Entity recognition: turning text into things

Scholarly data visualisations
– Visualisations as ‘distant reading’, where distance is ‘a specific form of knowledge: fewer elements, hence a sharper sense of their overall interconnection’ (Moretti, 2005)
– Inspiring curiosity and research questions
– But which questions do they privilege, and what do they leave out?

Exercise 3: exploring scholarly visualisations
In pairs, explore and discuss one of the following visualisations:
– University of Richmond, “Visualizing Emancipation” http://www.americanpast.org/emancipation/
– Stanford "Mapping the Republic of Letters" viz.swf
– GAPVis http://gap.alexandriaarchive.org/gapvis/index.html
– Digital Harlem :: Everyday Life 1915-1930 http://www.acl.arts.usyd.edu.au/harlem/

University of Richmond, “Visualizing Emancipation”
http://www.americanpast.org/emancipation/

Stanford "Mapping the Republic of Letters"
viz.swf

GAPVis
http://gap.alexandriaarchive.org/gapvis/index.html

Digital Harlem
http://www.acl.arts.usyd.edu.au/harlem/

Comments or questions?

Considerations for GLAM data
(GLAM: galleries, libraries, museums, archives)
– Commercial tools often assume complete, born-digital datasets (no missing fields, consistent data entry over time)
– GLAM records often contain uncertainty and fuzziness (e.g. date ranges, multiple values, uncertain or unavailable information)
– Includes metadata, data, digital surrogates

When were objects collected?
http://ibm.co/OS3HBa

Messiness in GLAM data
'Begun in Kiryu, Japan, finished in France'
'Bali? Java? Mexico?'
Variations on USA:
– U.S.
– U.S.A
– U.S.A.
– USA
– United States of America
– USA ?
– United States (case)
Inconsistency in uncertainty:
– U.S.A. or England
– U.S.A./England ?
– England & U.S.A.
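The variations listed above can be folded into one preferred label with a small lookup, while keeping a flag for values that express uncertainty. This is a minimal sketch under stated assumptions: the `PREFERRED` mapping, the function name and the rule that '?', 'or' and '/' signal uncertainty are all hypothetical choices for illustration, not part of any tool used in the workshop.

```python
import re

# Hypothetical mapping from a simplified key to a preferred label.
PREFERRED = {
    "us": "USA",
    "usa": "USA",
    "united states": "USA",
    "united states of america": "USA",
    "england": "England",
}


def normalise_place(raw):
    """Return (list of preferred labels, uncertain flag) for one field value."""
    # Assumption: '?', ' or ' and '/' mark uncertainty; '&' means both apply.
    uncertain = "?" in raw or " or " in raw.lower() or "/" in raw
    parts = re.split(r"\s+or\s+|/|&", raw, flags=re.IGNORECASE)
    labels = []
    for part in parts:
        # Drop punctuation and case so 'U.S.A.' and 'USA' give the same key.
        key = re.sub(r"[^a-z ]", "", part.lower()).strip()
        key = re.sub(r"\s+", " ", key)
        if key:
            # Unrecognised values are kept as entered, for manual review.
            labels.append(PREFERRED.get(key, part.strip(" ?")))
    return labels, uncertain


for value in ["U.S.", "USA ?", "U.S.A./England ?", "England & U.S.A."]:
    print(value, "->", normalise_place(value))
```

Keeping the uncertainty as a separate flag means the original doubt is not silently lost when the spelling is standardised.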

Computers don't cope

Cleaning data for visualisations
GLAM data often needs manual cleaning to:
– remove rows where vital information is missing
– tidy inconsistencies in term lists or spelling
– convert words to numbers (e.g. dates)
– remove hard returns and non-ASCII characters (or change data format)
– split multiple values in one field into other columns (e.g. author name, date in single field)
– expand coded values (e.g. countries, language)
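Several of the cleaning steps above can be sketched in a few lines of Python. This is an illustration only, assuming a hypothetical table with a `maker` column that holds a name and life dates in one field and a `country` column holding codes; the column names, the two-entry code list and the sample rows are invented.

```python
import re

# Hypothetical code list; a real project would use a complete standard list.
COUNTRY_CODES = {"GB": "United Kingdom", "FR": "France"}

# Matches values such as 'A. Maker, 1820-1910' (name and dates in one field).
MAKER_PATTERN = re.compile(
    r"^(?P<name>.*?)[,\s]+(?P<born>\d{4})\s*-\s*(?P<died>\d{4})\s*$"
)


def clean_rows(rows):
    """Apply a few of the cleaning steps to a list of dict records."""
    cleaned = []
    for row in rows:
        # Remove hard returns and collapse repeated whitespace.
        maker = " ".join((row.get("maker") or "").split())
        if not maker:
            continue  # remove rows where vital information is missing
        match = MAKER_PATTERN.match(maker)
        if match:
            # Split one field into columns and convert the years to numbers.
            name = match["name"]
            born, died = int(match["born"]), int(match["died"])
        else:
            name, born, died = maker, None, None
        code = row.get("country")
        cleaned.append({
            "name": name,
            "born": born,
            "died": died,
            # Expand coded values; unknown codes are left unchanged.
            "country": COUNTRY_CODES.get(code, code),
        })
    return cleaned


sample = [
    {"maker": "A. Maker, 1820-1910", "country": "GB"},
    {"maker": "", "country": "FR"},
    {"maker": "B.\nMaker, 1867-1934", "country": "FR"},
    {"maker": "Unknown maker", "country": "XX"},
]
result = clean_rows(sample)
print(result)
```

Rows that do not fit the expected pattern are kept with empty dates instead of being dropped, so they can still be checked by hand.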

Data Preparation
– Generally needs to be in tables, one row per item, one column per value
– Data cleaning tools:
  – Excel
  – OpenRefine http://openrefine.org
  – Wrangler: http://vis.stanford.edu/wrangler

Google Refine

but be careful

Getting started in Refine
– Free tool, runs on desktop
– Downloadable from http://openrefine.org/
– Should be installed on your laptops already

Exercise 4: data cleaning in Refine
– Load the prepared data: Inspiring Women through History April2013.xlsx
– If the preview looks ok, click 'Create Project'

Cleaning: click the down arrow in the Discipline column; select Facet; Text Facet

Select rows where Discipline is 'blank'

Remove the selected 'blank' rows

Automatically find mis-typed matches
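Finding near-duplicate values automatically can be approximated by giving each value a simplified key and grouping values that share one. The sketch below is written in the spirit of a key-collision 'fingerprint' approach and is not Refine's own code; it skips steps a full implementation would include, such as folding accented characters. It catches differences in case, punctuation, spacing and word order, but not genuine spelling mistakes. The sample values are invented.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Build a simplified key: lowercase, no punctuation, sorted unique words."""
    cleaned = value.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group distinct values that share a fingerprint; return groups of 2+."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical Discipline values, invented for illustration.
disciplines = [
    "Computer Science",
    "computer science",
    "Science, Computer",
    "Physics",
    "Physics ",
    "Art",
]
clusters = cluster(disciplines)
print(clusters)
```

Each group is only a suggestion: a person still needs to decide whether the values really mean the same thing before merging them.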

Merge Disciplines to reduce the complexity of the dataset

Remove rows with no birth year

Export your data

Exercise 5: trying entity recognition
– In your browser, go to http://nlp.stanford.edu:8080/corenlp/process
– Find a short paragraph or two of text (e.g. from a news site or digitised text) to paste into the box
– How many of the things you recognise did it pick up?
– Is any of the other information presented useful?

Reconciling and enhancing data
Test data quality and create new visualisation possibilities
What other data can you join to yours?
– Information from general sites like Wikipedia, Freebase, VIAF
– Information from other GLAMs and related disciplines
– Other information about the same event, place, person, object, etc
– General contextualising information: science, history, reviews, citations?
e.g. Google Refine: try reconciling against other datasets for geo-coding, entity extraction, etc

Linking Makers to the rest of the web
191877 | Versace, Gianni | Gianni Versace | 1946 | 1 | Designer
191881 | Versace, Gianni | Gianni Versace | 1946 | 1 | Designer
191882 | Versace, Gianni | Gianni Versace | 1946 | 1 | Maker
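The three maker records above appear to describe the same person under different record numbers. A first step before linking to outside sources could be to group records that agree on name and birth year. This is a minimal sketch; the field names (`id`, `sort_name`, `birth_year`, `role`) are an assumed reading of the columns, and the unexplained '1' column is left out.

```python
from collections import defaultdict

# Field names are assumptions; the values come from the records listed above.
records = [
    {"id": 191877, "sort_name": "Versace, Gianni", "birth_year": 1946, "role": "Designer"},
    {"id": 191881, "sort_name": "Versace, Gianni", "birth_year": 1946, "role": "Designer"},
    {"id": 191882, "sort_name": "Versace, Gianni", "birth_year": 1946, "role": "Maker"},
]


def group_makers(rows):
    """Group records sharing a name and birth year into candidate entities."""
    groups = defaultdict(lambda: {"ids": [], "roles": set()})
    for row in rows:
        key = (row["sort_name"].strip().lower(), row["birth_year"])
        groups[key]["ids"].append(row["id"])
        groups[key]["roles"].add(row["role"])
    return dict(groups)


makers = group_makers(records)
print(makers)
```

Matching on name and birth year alone can merge different people who share both, so the groups are candidates for review, not confirmed matches.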

Planning a visualisation is like planning for a meal:
– Is it a sit-down dinner or a picnic?
– Do you have all the ingredients you need?
– How do they need to be prepared?
– What prep needs to be done in advance?
– Who are you serving it to, and what do they like?
– Serve yourself or pre-prepared plates?

Types of visualisations (review)
– Intersections of format and purpose
– Data types: quantitative, qualitative, geographic, time series, media, entities (people, places, events, concepts, things)
– Static, interactive; print, digital; product, process
– Exploratory, explanatory: find new insights, or tell a story?
– Pragmatic, emotive?

Worst practice in data visualisations
Source: 3/deceptive-donut-chart/

Worst practice in data visualisations
Source: 3232

Best practice visual design
– How effectively does the visualisation support cognitive tasks?
– Most important and frequent visual queries/pattern finding should be supported with the most visually distinct objects (e.g. colour (hue, lightness), elementary shape (orientation, size, elongation), motion, spatial grouping and stereoscopic depth)

Dealing with complex data
Find a visualisation type that can harbour the data in a meaningful way or reduce the data in a meaningful way.
– e.g. go from individual values to distribution of values
– e.g. introduce interaction: overview, zoom and filter, details on demand (Ben Shneiderman)
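Going from individual values to a distribution can be as simple as counting how many items fall into each bucket. The sketch below groups years into decades; the function name and the sample years are hypothetical, chosen only to show the reduction.

```python
from collections import Counter


def by_decade(years):
    """Reduce individual years to a count of items per decade."""
    counts = Counter((year // 10) * 10 for year in years)
    return dict(sorted(counts.items()))


# Hypothetical publication years, invented for illustration.
sample_years = [1534, 1539, 1601, 1831, 1835, 1838]
print(by_decade(sample_years))
```

Six separate points become three bars, which is easier to read on a chart but hides the exact years, so the bucket size is itself a design decision.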

Do you really need a visualisation?
Use tables when:
– the document will be used to look up individual values
– the document will be used to compare individual values
– precise values are required
– the quantitative info to be communicated involves more than one unit of measure
Use graphs when:
– the message is contained in the shape of the values
– the document will be used to reveal relationships among values

Visualisation tools
http://selection.datavisualization.ch/

Tools that don’t require programming
– Excel
– Google Fusion Tables, Google Drive
– IBM Many Eyes
– Tableau Public

IBM Many Eyes

Tableau Public
alised

Exercise 6: create a pie chart using Google Fusion Tables
Instructions on the hand-out

Pre-aggregation view of a pie chart

Change options, aggregate by Discipline
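Aggregating by Discipline means replacing one slice per row with one slice per distinct value. The sketch below shows that step outside any charting tool; it is not how Fusion Tables works internally, and the function name and sample rows are assumptions for illustration.

```python
from collections import Counter


def pie_shares(rows, column="Discipline"):
    """Count rows per distinct value and return each value's share in percent."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    # most_common() orders the slices from largest to smallest.
    return {label: round(100 * n / total, 1) for label, n in counts.most_common()}


# Hypothetical rows, invented for illustration.
sample_rows = [
    {"Discipline": "Science"},
    {"Discipline": "Science"},
    {"Discipline": "Arts"},
    {"Discipline": "Politics"},
]
shares = pie_shares(sample_rows)
print(shares)
```

Without this step a pie chart of four rows has four equal slices; with it, the chart has three slices sized by how often each Discipline occurs.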

Sample pie chart results

Exercise 7: geocoding data and creating a map using Google Fusion Tables
Instructions on the hand-out

Geocode 'Combined birthplace location'

Add a new map

Update the source location field

Map results

Continuing...
Try:
– Scatter chart
– Bar chart
– Timeline
– Sparkline
– Motion chart
– Candlestick/box chart
– Treemap
Or try it with your own data.

Planning a visualisation
With a dataset in mind, consider:
– Exploratory or explanatory?
– Static or dynamic?
– Small- or large-scale?
– Choose a type of visualisation (map, timeline, chart, etc)
  – Is your dataset in a suitable format for your visualisation type? How can you clean it?
  – Optionally, sketch out your visualisation on paper to test it

Visualising British Library data
Google Docs dataset for Pin-a-tale: http://bit.ly/WT1Ai5
Choose a type of visualisation (map, timeline, chart, etc) and evaluate the results
– Is your dataset in a suitable format for your visualisation type? How can you clean it?
– Is more cleaning or transformation needed? You may need to iterate with different versions of your data
– Optionally, sketch out your visualisation on paper to test your ideas

Review: visualisation tools
– Any data cleaning tips?
– What did you learn about the data?
– What did the tool do well? Poorly?
– Were the tool and the data a good match for each other?
– What other data could you link to?

Review: publishing visualisations
How can you contextualise your visualisations and explain their limitations? e.g.
– provenance and qualities of original dataset;
– what you needed to do to it to get it into software (how transformed, how cleaned);
– what's left out of the visualisation?

References and finding out more
http://bit.ly/UJwgEz

Thank you!
Mia Ridge, Open University
http://openobjects.org.uk/
@mia_out
