Stat 470/670 Lecture 1

What is Exploratory Data Analysis?

We will be exploring numbers. We need to handle them easily and look at them effectively. Techniques for handling and looking--whether graphical, arithmetic, or intermediate--will be important.

Tukey, Exploratory Data Analysis (1977)

A first example: Heights of the highest points by state

    ## load required packages and data
    library(tidyverse)
    ## -- Attaching packages ---------------------- tidyverse 1.3.0 --
    ## v tibble  3.0.1     v dplyr   1.0.2
    ## v tidyr   1.1.0     v stringr 1.4.0
    ## v readr   1.3.1     v forcats 0.5.0
    ## v purrr   0.3.4
    ## -- Conflicts ----------------------- tidyverse_conflicts() --
    ## x dplyr::filter() masks stats::filter()
    ## x dplyr::lag()    masks stats::lag()
    options(tibble.print_min = 15)
    heights <- read_csv("highest-points-by-state.csv")
    ## Parsed with column specification:
    ## cols(
    ##   elevation = col_double(),
    ##   state = col_character()
    ## )

A first try at looking at the data

    heights
    ## # A tibble: 50 x 2
    ##    elevation state
    ##        <dbl> <chr>
    ##  1      733. Alabama
    ##  2     6168. Alaska
    ##  3     3851. Arizona
    ##  4      839. Arkansas
    ##  5     4418. California
    ##  6     4399. Colorado
    ##  7      725. Connecticut
    ##  8      137. Delaware
    ##  9      105. Florida
    ## 10     1458. Georgia
    ## 11     4205. Hawaii
    ## 12     3859. Idaho
    ## 13      376. Illinois
    ## 14      383. Indiana
    ## 15      509. Iowa
    ## # ... with 35 more rows

A second try at looking at the data

    arrange(heights, elevation)
    ## # A tibble: 50 x 2
    ##    elevation state
    ##        <dbl> <chr>
    ##  1      105. Florida
    ##  2      137. Delaware
    ##  3      163. Louisiana
    ##  4      246. Mississippi
    ##  5      247. Rhode Island
    ##  6      376. Illinois
    ##  7      383. Indiana
    ##  8      472. Ohio
    ##  9      509. Iowa
    ## 10      540. Missouri
    ## 11      550. New Jersey
    ## 12      595. Wisconsin
    ## 13      603. Michigan
    ## 14      701. Minnesota
    ## 15      725. Connecticut
    ## # ... with 35 more rows

    arrange(heights, desc(elevation))
    ## # A tibble: 50 x 2
    ##    elevation state
    ##        <dbl> <chr>
    ##  1     6168. Alaska
    ##  2     4418. California
    ##  3     4399. Colorado
    ##  4     4392. Washington
    ##  5     4207. Wyoming
    ##  6     4205. Hawaii
    ##  7     4123. Utah
    ##  8     4011. New Mexico
    ##  9     4005. Nevada
    ## 10     3901. Montana
    ## 11     3859. Idaho
    ## 12     3851. Arizona
    ## 13     3426. Oregon
    ## 14     2667. Texas
    ## 15     2207. South Dakota
    ## # ... with 35 more rows

Stem-and-leaf plots

Goals:
- Write down the set of numbers, keeping as much detail as possible
- Pack the numbers efficiently, so you can see all of them at once

These are in conflict!

Remedy: Notice that parts of the numbers (the beginnings) are repeated. The first digit of each number is printed at the beginning of the line, the remaining digits at the end. The first digit is the “stem”, the remaining digits are the “leaves”.

Stem-and-leaf plot example

Set of numbers: 16, 17, 17, 17, 17, 18

Stem-and-leaf display:

    1 | 677778
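The construction can be sketched in a few lines of base R. This is only an illustration for two-digit whole numbers: `stem_leaf` is a hypothetical helper written for this note, not part of the lecture code, and it assumes the tens digit is the stem and the units digit is the leaf.

```r
# Hypothetical helper (not from the lecture): build a stem-and-leaf display
# for two-digit whole numbers, using the tens digit as the stem and the
# units digit as the leaf.
stem_leaf <- function(x) {
  x <- sort(x)
  stems <- x %/% 10   # leading digit
  leaves <- x %% 10   # trailing digit
  # Collapse all leaves that share a stem into one string
  grouped <- tapply(leaves, stems, paste, collapse = "")
  paste(names(grouped), "|", grouped)
}

numbers <- c(16, 17, 17, 17, 17, 18)
display <- stem_leaf(numbers)
print(display)
```

Every original value can still be read back from the display, which is the point of the technique: the repeated leading digit is written once and only the part that varies is repeated.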

Stem-and-leaf plot for the elevations in meters:

    stem(heights$elevation)
    ##
    ##   The decimal point is 3 digit(s) to the right of the |
    ##
    ##   0 | 11222445555667778
    ##   1 | 0011123355566779
    ##   2 | 0027
    ##   3 | 4999
    ##   4 | 00122444
    ##   5 |
    ##   6 | 2
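To read the display: "the decimal point is 3 digit(s) to the right" means each stem counts thousands of meters and each leaf counts hundreds. The sketch below reproduces that bookkeeping by hand for a few of the elevations printed earlier. It assumes values are rounded to the nearest hundred before splitting, which matches the display here but is a simplification of what `stem()` does internally.

```r
# A few elevations (meters) copied from the tables printed earlier
elev <- c(Alaska = 6168, California = 4418, Oregon = 3426,
          Texas = 2667, Florida = 105)

# Assumption: round to the nearest hundred meters, then split
rounded <- round(elev / 100)   # elevation in hundreds of meters
stems <- rounded %/% 10        # thousands digit
leaves <- rounded %% 10        # hundreds digit

print(data.frame(elevation = elev, stem = stems, leaf = leaves))
```

Under these assumptions Alaska lands on stem 6 with leaf 2, which is the lone entry at the bottom of the display.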

The stem-and-leaf plot shows that there are three groups of states:
- Alaska
- The western and Rocky Mountain states (California, Colorado, Washington, Wyoming, Hawaii, Utah, New Mexico, Nevada, Montana, Idaho, Arizona, Oregon)
- All the other states
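One way to see where the groups separate is to look at the gaps between consecutive sorted elevations. The sketch below uses only the 15 highest values printed earlier (rounded to whole meters as displayed), so it is an illustration on partial data rather than an analysis of all 50 states.

```r
# The 15 highest elevations (meters), as printed by arrange(heights, desc(elevation))
top15 <- c(Alaska = 6168, California = 4418, Colorado = 4399,
           Washington = 4392, Wyoming = 4207, Hawaii = 4205, Utah = 4123,
           `New Mexico` = 4011, Nevada = 4005, Montana = 3901, Idaho = 3859,
           Arizona = 3851, Oregon = 3426, Texas = 2667, `South Dakota` = 2207)

sorted <- sort(top15, decreasing = TRUE)

# Drop in elevation from each state to the next lower one
gaps <- -diff(sorted)
names(gaps) <- paste(names(sorted)[-length(sorted)], "->", names(sorted)[-1])

# The two largest drops mark the group boundaries
largest <- sort(gaps, decreasing = TRUE)[1:2]
print(largest)
```

In this subset the largest drop is between Alaska and California and the next largest is between Oregon and Texas, which lines up with the three groups read off the stem-and-leaf plot.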

Note 1

Hoosier Hill: Elevation 1257 feet
Source: Google Street View

Note 2

Compare the stem-and-leaf plot with a density estimate

    ggplot(heights, aes(x = elevation)) + geom_density()

[Figure: density estimate of elevation; x-axis: elevation (0 to 6000), y-axis: density (0 to 3e-04)]

Where is Alaska?

    ggplot(heights, aes(x = elevation)) + geom_density() + geom_rug()

[Figure: the same density estimate with a rug of the individual elevations along the x-axis]

Where is Alaska?
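A single isolated point such as Alaska can disappear into the tail of a kernel density estimate when the bandwidth is wide. The sketch below compares the default bandwidth with a narrower one using base R's `density()`. It uses only the 30 elevations printed earlier (the 15 lowest and 15 highest), so the numbers are illustrative and will differ from a fit to all 50 states; the choice of `adjust = 0.25` is arbitrary.

```r
# 30 of the 50 elevations (meters): the 15 lowest and 15 highest printed earlier
lowest15 <- c(105, 137, 163, 246, 247, 376, 383, 472, 509, 540,
              550, 595, 603, 701, 725)
highest15 <- c(6168, 4418, 4399, 4392, 4207, 4205, 4123, 4011, 4005, 3901,
               3859, 3851, 3426, 2667, 2207)
elevation <- c(lowest15, highest15)

smooth <- density(elevation)                  # default bandwidth
rough <- density(elevation, adjust = 0.25)    # a quarter of the default

# Estimated density at Alaska's elevation under each bandwidth
at_alaska <- c(
  smooth = approx(smooth$x, smooth$y, xout = 6168)$y,
  rough = approx(rough$x, rough$y, xout = 6168)$y
)

print(c(smooth = smooth$bw, rough = rough$bw))
print(at_alaska)
```

With the narrower bandwidth the estimate near 6168 m should be noticeably higher, so Alaska appears as its own small bump instead of a flat tail. The rug plot sidesteps the bandwidth question entirely by drawing every observation.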

We have made an advance in understanding this set of numbers!

What would traditional statistics have to say about these numbers?

What if we have many more numbers, e.g. census data?
Source: US Census Bureau Public Information Office, via the National Geographic Society

Or a large matrix?
Source: Still from “The Matrix”

Or graph data?
Source: KEGG PATHWAY Database

Exploratory vs. Confirmatory Analyses

Confirmatory analysis
- Probability model for the data specified before analysis takes place
- Given the probability model, test hypotheses or infer parameter values

Exploratory analysis: everything else!

In particular:
- Check distributional assumptions
- Check for outliers
- Decide on variable transformations
- Decide on the form of the model: what variables to include

BUT: Not limited to the work done before fitting a model! In the highest points example, we had an EDA-based advance that wasn’t related to model fitting at all.

What does Tukey say?

1A. Quantitative detective work

Exploratory data analysis is detective work--numerical detective work--or counting detective work--or graphical detective work.

A detective investigating a crime needs both tools and understanding. If he has no fingerprint powder, he will fail to find fingerprints on most surfaces. If he does not understand where the criminal is likely to have put his fingers, he will not look in the right places. Equally, the analyst of data needs both tools and understanding. It is the purpose of this book to provide some of each.

Time will keep us from learning about many tools--we shall try to look at a few of the most general and powerful among the simple ones. We do not guarantee to introduce you to the "best" tools, particularly since we are not sure that there can be unique bests.

Understanding has different limitations. As many detective stories have made clear, one needs quite different sorts of detailed understanding to detect criminals in London's slums, in a remote Welsh village, among Parisian aristocrats, in the cattle-raising west, or in the Australian outback. We do not expect a Scotland Yard officer to do well trailing cattle thieves, or a Texas ranger to be effective in the heart of Birmingham. Equally, very different detailed understandings are needed if we are to be highly effective in dealing with data concerning earthquakes, data concerning techniques of chemical manufacturing, data concerning the sizes and profits of firms in a service industry, data concerning human hearing, data concerning suicide rates, data concerning population growth, data concerning fossil dinosaurs, data concern-

ones, there is likely to be nothing for confirmatory data analysis to consider.

Experiments and certain planned inquiries provide some exceptions and partial exceptions to this rule. They do this because one line of data analysis was planned as part of the experiment or inquiry. Even here, however, restricting one's self to the planned analysis--failing to accompany it with exploration--loses sight of the most interesting results too frequently to be comfortable.

As all detective stories remind us, many of the circumstances surrounding a crime are accidental or misleading. Equally, many of the indications to be discerned in bodies of data are accidental or misleading. To accept all appearances as conclusive would be destructively foolish, either in crime detection or in data analysis. To fail to collect all appearances because some--or even most--are only accidents would, however, be gross misfeasance deserving (and often receiving) appropriate punishment.

Exploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone--as the first step.

We will be exploring numbers. We need to handle them easily and look at them effectively. Techniques for handling and looking--whether graphical, arithmetic, or intermediate--will be important. The simpler we can make these techniques, the better--so long as they work, and work well. When details make an important difference, they deserve--and will get--emphasis.

review questions

What is exploratory data analysis? How is it related to confirmatory data

Tukey, Exploratory Data Analysis (1977) pp. 1-3

Exploratory: Collect everything that even seems to be true about the data; detective in character; “magical thinking”.

Confirmatory: Given one pre-planned hypothesis, infer parameter values or test hypotheses; judicial in character; set a high bar for what we are willing to believe about the data.

The never-ending data analysis cycle:
1. Get data.
2. Perform exploratory analysis to suggest a model.
3. Fit the model.
4. Perform exploratory analysis to critique the model and suggest a new model.
5. Return to step 3.

This workflow is dangerous!
- Using the data more than once
- Assiduous EDA means multiple comparison problems
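The multiple comparison problem can be made concrete with a small calculation. The sketch below assumes that each look at the data acts like an independent test at level 0.05 with no real effect present. Real exploratory looks are rarely independent, so treat this as an idealized illustration, not a description of any particular analysis.

```r
# Assumption: independent comparisons, each with a 5% false positive rate
alpha <- 0.05
n_comparisons <- c(1, 5, 20, 100)

# Probability that at least one comparison looks "significant" by chance alone
fwer <- 1 - (1 - alpha)^n_comparisons
names(fwer) <- n_comparisons

print(round(fwer, 3))
```

Under these assumptions, 20 looks at the data already give roughly a 64% chance of at least one spurious finding, which is why a model suggested by exploration deserves a higher bar before it is believed.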

Tukey’s EDA also emphasizes tools and best practices for the practice of data analysis, all pen-and-paper based.

Example: Tallying

The basis of stem-and-leaf technique, entering an additional digit--or digits--to mark each value, works well for batches of limited size. Once we have much more than 20 leaves on a stem, however, we are likely to feel cramped--and our stems begin to be hard to count. We ought to be able to escape to some other way of handling such information, whenever the other way gives us enough detail.

Standard method:

The fast methods involve one pencil (or pen) stroke per item. One method counts by fives in this style: I II III IIII [image]. This has been widely used. The writer finds it treacherous, especially when he tries to go fast. (It is too easy for him to do [image] or [image] for this approach to give satisfactory performance.)

Tukey’s proposal:

The recommended scheme uses first dots, then box lines, then crossed lines to make a final character for 10. Thus: 4 is [image], 8 is [image], 10 is [image].
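The tally-by-tens idea can be mimicked in R. The sketch below assumes each completed character is built from 4 dots, then 4 box lines, then 2 crossed lines, which is consistent with the 4, 8 and 10 examples in the excerpt. `tukey_tally` is a hypothetical helper written for this note, not something from Tukey's book or the lecture.

```r
# Hypothetical helper: break a count into the pieces of a tally-by-tens record.
# Assumption: one character = 4 dots + 4 box lines + 2 crossed lines = 10.
tukey_tally <- function(n) {
  remainder <- n %% 10
  c(
    full_tens = n %/% 10,                        # completed characters
    dots = min(remainder, 4),                    # strokes 1 to 4
    box_lines = min(max(remainder - 4, 0), 4),   # strokes 5 to 8
    crossed_lines = max(remainder - 8, 0)        # stroke 9
  )
}

print(tukey_tally(4))
print(tukey_tally(8))
print(tukey_tally(27))
```

In everyday R work the counting itself is usually left to `table()`, which tallies the values of a vector directly.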

Pen-and-paper methods are primarily of historical interest.

Philosophical descendants are the tidyverse packages in R.

What about this class?

Two categories of topics: what to do and how to do it.

For what to do, organize by type of data:
- Univariate data
- Bivariate data
- Trivariate/Hypervariate data
- Categorical data
- Distance data
- Graph data
- Other topics according to interest

In addition: Dangers of EDA and how to avoid them

In the “how to do it” bin, we will learn to work with:
- R
- ggplot2
- tidyverse packages

How is this class different from others?
- Machine learning: We put less emphasis on supervised learning.
- Data mining: More emphasis on visualization.
- Applied statistics: Less emphasis on p-values and inference, more flexibility in the methods used.

Texts:
- Cleveland, Visualizing Data
- Wickham, ggplot2: Elegant Graphics for Data Analysis
- Wickham and Grolemund, R for Data Science
- Other notes posted to the class website and Canvas as necessary

Assessment:
- Homeworks (30%)
- Two mini projects (30%)
- Final project (40%)

How to succeed:
- Practice! Follow along with the code examples, and actually type in the commands instead of copying and pasting.
- Start early on assignments and projects.
- Presentation matters: make your documents look nice enough that you would be happy to show them to potential employers as examples of your work.
