How To Lie With Statistics - Heidelberg University

How to Lie with Statistics: a book by Darrell Huff, 1954
Peter Hügel
Seminar “How do I lie with statistics?”
Supervisor: Prof. Dr. Ullrich Köthe
Heidelberg, 17.10.2019

Outline
- Introduction
- Simple Ways to Lie
  - Selection Bias
  - The Average
  - Missing Figures
  - Charts and Pictographs
  - Semi-Attached Figures
  - Correlation and Causation
- Causes for Lies
- Identifying Lies
Peter Hügel 1

Introduction
- Statistics are all around us:
  - Actual statistics
  - Assumptions we make based on the available information
- There can be a lot more, or a lot less, to the statistics we are exposed to
- They can distort reality while technically being correct, through:
  - Incompetence
  - Ill intent
  - Misinterpretation

Introduction – Nitrate Levels in Groundwater
- Germany monitors nitrate levels in accordance with an EU conservation directive from 1991
- “More and more nitrate in groundwater”[1]
- “From 2013 to 2017 the average nitrate concentration in the top 15 polluted regions has increased by 40 mg/l”
- What could be wrong with this statement?
[1] Rheinische Post, 8.8.2019, im-grundwasser-gefahr-fuer-mensch-und-natur aid-44825553

Simple Ways to Lie – Selection Bias
“The average Yale man, Class of 1924, makes $24,111 a year.” (Time magazine)
- How was this number derived?
- Graduates of that year had to be asked:
  - How were they located and contacted?
  - Are all of them equally likely to be found?
  - Are all of them equally likely to respond?
  - Do they answer honestly? Do they exaggerate? Understate?
→ Selection bias

Simple Ways to Lie – Selection Bias
Imagine a barrel of red and white beans. How can we find out the ratio of red to white beans?
- Count all of them
- Take a sample from the top:
  - This only works if the beans are uniformly mixed
  - Beans of different density may settle unevenly, so the top is not like the rest of the barrel
- The result of a sampling study is no better than the sample it is based on
- Is the sample representative of the whole distribution?
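The bean barrel can be simulated in a few lines. This is a minimal sketch under a deliberately extreme, hypothetical assumption: the denser red beans have settled completely below the white ones. Under that assumption, a scoop from the top misses the red beans entirely, while a uniformly random sample of the same size lands close to the true ratio.

```python
import random

def red_share(beans):
    """Fraction of red beans in a list of beans."""
    return beans.count("red") / len(beans)

# Hypothetical barrel, listed from top to bottom. Assumption: the denser red
# beans have settled completely, so every white bean lies above every red one.
barrel = ["white"] * 500 + ["red"] * 500

rng = random.Random(42)
top_sample = barrel[:100]                 # scoop 100 beans from the top
random_sample = rng.sample(barrel, 100)   # every bean equally likely to be drawn

print(f"true red share:  {red_share(barrel):.2f}")
print(f"top of barrel:   {red_share(top_sample):.2f}")
print(f"random sample:   {red_share(random_sample):.2f}")
```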

Simple Ways to Lie – the Average
“The average” is ambiguous:
- Mean: arithmetic average
- Median: middle value when sorted
- Mode: value that appears most often
Assume a company with workers and management:
- Mean salary of workers: 2,308
- Mean salary of management: 25,000
- Mean salary overall: 3,309
- Median: 2,400
- Mode: 2,000
Labor union: “Average salary of workers: 2,000; average salary of management: 25,000”
Management: “Average salary paid: 3,309”
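The three kinds of "average" can be computed with Python's standard `statistics` module. This is a sketch on a small, hypothetical salary list (nine workers and one manager), not the slide's data. It shows how a single large salary pulls the mean well above both the median and the mode.

```python
import statistics

# Hypothetical monthly salaries (not the slide's data): nine workers, one manager.
salaries = [2000] * 5 + [2400, 2600, 2800, 3000] + [25000]

mean = statistics.mean(salaries)      # pulled up by the single large salary
median = statistics.median(salaries)  # middle value of the sorted list
mode = statistics.mode(salaries)      # most frequent value

print("mean:", mean, "median:", median, "mode:", mode)
```

Each party in the slide's dispute can simply quote whichever of these three numbers suits its argument.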

Simple Ways to Lie – the Average
Time magazine, in “A Letter from the Publisher”, about its readers:
“Their median age is 34 years and their average family income is $7,270 a year”
- Why “median” for age but “average” for income? Was the mean used because it gives a bigger number?

Simple Ways to Lie – Missing Figures
“Users report 23% fewer cavities with toothpaste X!”
- From an independent laboratory, certified by a public accountant
- Are they lying? How was this number obtained?
- The fine print reveals 12 participants, a statistically inadequate sample size
- When a small group switches to toothpaste X, the result may be:
  1. Distinctly more cavities
  2. Distinctly fewer cavities
  3. About the same as before
- Observer selection: repeat the study, cherry-pick the favorable result, discard the rest
- We don’t know how often the experiment has been repeated
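Observer selection can be sketched as a simulation. The model is a hypothetical assumption: under the null hypothesis, the toothpaste has no effect, and each participant is equally likely to improve or not. With 12 participants, a study that looks impressive turns up after a handful of repeats. With 1200 participants, it essentially never does.

```python
import random

def run_study(n_participants, rng):
    """One hypothetical study under the null hypothesis (no effect):
    each participant 'improves' with probability 0.5, purely by chance."""
    improved = sum(rng.random() < 0.5 for _ in range(n_participants))
    return improved / n_participants

def cherry_pick(n_participants, target, rng, max_tries=1000):
    """Repeat the study until one run reaches `target`; return (share, attempts).
    Returns (None, max_tries) if no run ever looks good enough."""
    for attempt in range(1, max_tries + 1):
        share = run_study(n_participants, rng)
        if share >= target:
            return share, attempt  # publish this run, quietly discard the rest
    return None, max_tries

rng = random.Random(0)
share, attempts = cherry_pick(n_participants=12, target=0.75, rng=rng)
print(f"{share:.0%} of participants improved, found after {attempts} attempt(s)")

# With a large sample, chance alone practically never reaches 75%.
big_share, _ = cherry_pick(n_participants=1200, target=0.75, rng=rng, max_tries=50)
print("large study:", big_share)
```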

Simple Ways to Lie – Missing Figures
We can “show” that a coin toss results in tails 75% of the time.
Tossing a coin 8 times:
- Outcomes with 75% tails: $\binom{8}{6} = 28$
- Total possible outcomes: $2^{8} = 256$
- Probability of getting 75% tails: $\frac{28}{256} \approx 0.109$
Binomial coefficient: $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
Tossing a coin 128 times:
- Outcomes with 75% tails: $\binom{128}{96} \approx 1.48 \cdot 10^{30}$
- Total possible outcomes: $2^{128} \approx 3.4 \cdot 10^{38}$
- Probability of getting 75% tails: $\frac{1.48 \cdot 10^{30}}{3.4 \cdot 10^{38}} \approx 4.3 \cdot 10^{-9}$
Law of small numbers – unpredictable at the beginning
Law of large numbers – predictable in the long run
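The coin-toss probabilities above can be checked exactly with `math.comb` (Python 3.8+). This sketch computes $\binom{n}{k} / 2^n$ for exactly 6 tails out of 8 tosses and exactly 96 tails out of 128 tosses.

```python
from math import comb

def prob_exactly(k, n):
    """Probability of exactly k tails in n fair coin tosses: C(n, k) / 2**n."""
    return comb(n, k) / 2 ** n

p_small = prob_exactly(6, 8)      # 28 / 256
p_large = prob_exactly(96, 128)   # C(128, 96) / 2**128

print(comb(8, 6), 2 ** 8, p_small)
print(f"{comb(128, 96):.3g} {2 ** 128:.3g} {p_large:.2g}")
```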

Simple Ways to Lie – Statistical Significance
Statistical significance can be expressed as a number:
- Start with the null hypothesis, e.g. every toothpaste is the same, or a coin lands 50% tails, 50% heads
- Given the null hypothesis, the p-value is the probability of getting the observed or a more extreme result
- p-values of the coin results:
  - 6 / 8 tails or more: $1.45 \cdot 10^{-1}$
  - 96 / 128 tails or more: $6.42 \cdot 10^{-9}$
- The toothpaste example would probably have low statistical significance
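The p-values above count the observed number of tails "or more". That makes them one-sided tail probabilities under a fair-coin null hypothesis; reading the slide as a one-sided test is an assumption. This sketch sums the binomial tail exactly with `fractions.Fraction` before converting to a float.

```python
from fractions import Fraction
from math import comb

def p_value_at_least(k, n):
    """One-sided p-value under the null hypothesis of a fair coin:
    P(X >= k) for X ~ Binomial(n, 1/2), computed exactly as a fraction."""
    return Fraction(sum(comb(n, i) for i in range(k, n + 1)), 2 ** n)

p_8 = p_value_at_least(6, 8)       # (28 + 8 + 1) / 256
p_128 = p_value_at_least(96, 128)

print(p_8, float(p_8))
print(f"{float(p_128):.3g}")
```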

Simple Ways to Lie – Missing Figures
Knowing just an average can be worse than knowing nothing.
Example: American housing
- Mean of 3.6 people per family → builders mainly build houses for 3-4 people
- Some more information: 35% of families have 1-2 members, 45% have 3-4, 20% have 5 or more
[Chart: Family Sizes, share of families by size: 1-2, 3-4, 5 or more]
- Many families are small, some are large
- Just the mean of 3.6 can distort the picture
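Two populations can share a mean while looking completely different. This sketch uses two hypothetical lists of family sizes, both with a mean of 3.6, and sorts them into the slide's three size brackets. In one population every family has 3-4 members. In the other, only a fifth do.

```python
from statistics import mean

def size_shares(sizes):
    """Share of families in the slide's three size brackets: 1-2, 3-4, 5 or more."""
    n = len(sizes)
    return {
        "1-2": sum(1 <= s <= 2 for s in sizes) / n,
        "3-4": sum(3 <= s <= 4 for s in sizes) / n,
        "5+": sum(s >= 5 for s in sizes) / n,
    }

# Two hypothetical populations of family sizes, both with a mean of 3.6.
concentrated = [3, 4, 4, 4, 3] * 20
spread_out = [1, 2, 4, 5, 6] * 20

for name, sizes in [("concentrated", concentrated), ("spread out", spread_out)]:
    print(name, mean(sizes), size_shares(sizes))
```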

Simple Ways to Lie – Missing Figures
A study of the harmful substances in different tobacco brands:
- Virtually no difference between brands
- Still, one brand had to be at the bottom of the list
- Result: a huge advertising campaign, “The healthiest cigarette of them all!”
Later in this seminar: “The health effects of smoking”
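Even identical products produce a "winner" once they are ranked. This sketch rests on a hypothetical model: five brands share the same true level of harmful substances, and each lab measurement adds Gaussian noise. Some brand always comes out lowest, and repeating the test shows which one does is pure chance.

```python
import random

def measure_brands(brands, true_level, noise_sd, rng):
    """Hypothetical lab test: every brand has the SAME true level of harmful
    substances; each measurement only adds random noise."""
    return {b: rng.gauss(true_level, noise_sd) for b in brands}

rng = random.Random(7)
brands = ["A", "B", "C", "D", "E"]   # hypothetical brand names
results = measure_brands(brands, true_level=20.0, noise_sd=2.0, rng=rng)

ranking = sorted(results, key=results.get)   # lowest measured level first
print("ranking:", ranking, "'healthiest':", ranking[0])

# Repeat the whole test many times and count which brand comes out lowest.
wins = {b: 0 for b in brands}
for _ in range(1000):
    r = measure_brands(brands, 20.0, 2.0, rng)
    wins[min(r, key=r.get)] += 1
print("times ranked 'healthiest':", wins)
```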

Simple Ways to Lie – Indicators of Range
There are multiple ways of providing a range:
- Standard deviation: a measure of the variation or dispersion of a data set
  $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$
  - Tobacco study: the standard deviation when testing the same brand repeatedly is very high
- Box plots (box-and-whisker plots): display of quartiles
  - 50% of the data lies within the box
  - The median is displayed within the box
  - Whiskers can be limited in different ways
  - Outliers are drawn as points
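Both indicators of range can be computed with the standard `statistics` module. `pstdev` implements the population formula for $\sigma$ given above. The box-plot helper is a sketch that assumes Tukey's convention: whiskers end at the most extreme points within 1.5 times the interquartile range, and anything beyond is an outlier. It also relies on the default "exclusive" quartile method of `statistics.quantiles`. Other whisker limits and quartile methods exist.

```python
import statistics

def population_sd(xs):
    """sigma = sqrt(sum((x - mean)^2) / n), the formula on the slide."""
    return statistics.pstdev(xs)

def box_summary(xs, k=1.5):
    """Quartiles, whiskers and outliers for a box plot.

    Assumption: whiskers end at the most extreme data points within
    k * IQR of the box (Tukey's convention); other limits are possible.
    """
    q1, median, q3 = statistics.quantiles(xs, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in xs if low <= x <= high]
    outliers = sorted(x for x in xs if x < low or x > high)
    return {"q1": q1, "median": median, "q3": q3,
            "whiskers": (min(inside), max(inside)), "outliers": outliers}

print(population_sd([2, 4, 4, 4, 5, 5, 7, 9]))
print(box_summary([2, 4, 4, 4, 5, 5, 7, 9, 30]))
```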

Simple Ways to Lie – Missing Figures
“Electric power is available to more than 3/4 of U.S. farms.”
- Could have been expressed as “Almost 1/4 do not [.]”; which phrasing is chosen depends on motivation
- What counts as “available”? Do they have electricity in their homes? Power lines in the vicinity? Within meters? Within kilometers?
- The statement isn’t false, but little information is conveyed
The deceptive thing about missing figures is that their absence often goes unnoticed.

Simple Ways to Lie – Charts and pictographs
- “This version saves paper!” At a glance the value seems to double
- This seems newsworthy
- A small increase of 10% is perceived accurately
- At least the cut is made obvious here
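The "saves paper" effect of a truncated y-axis comes down to simple arithmetic. This sketch uses hypothetical values of 100 and 110, a 10% increase. With the axis starting at zero, the bars keep their true 1.1 ratio. Cut the axis at 90 and the drawn bars are 10 and 20 units tall, so the value appears to double.

```python
def apparent_ratio(old, new, baseline=0.0):
    """Ratio of bar heights as drawn when the y-axis starts at `baseline`.

    With baseline 0 this equals new / old; a truncated axis exaggerates it.
    """
    return (new - baseline) / (old - baseline)

old, new = 100.0, 110.0   # hypothetical values: a 10% increase
print("full axis:", apparent_ratio(old, new))
print("axis cut at 90:", apparent_ratio(old, new, baseline=90))
```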

Simple Ways to Lie – Charts and pictographs
- An increase of 100% can be presented as a pictograph
- Instead of two bags: one bag of twice the height, which has 4 times the area and 8 times the volume
- Additionally, humans are easily fooled when perceiving size (Sterzer, P. and Rees, G., 2006. Perceived size matters. Nature Neuroscience, 9(3), p.302.)
More about charts in the upcoming presentation next week: “How to Lie with Charts”
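The pictograph distortion is a matter of scaling. Multiplying every side of a picture by a factor $s$ multiplies its area by $s^2$ and the volume it suggests by $s^3$. This sketch prints the ratios for doubling the height. It also shows a side factor of $\sqrt{2}$, which doubles the area but still suggests about 2.83 times the volume. Drawing two bags of the original size avoids the problem altogether.

```python
import math

def pictograph_ratios(scale):
    """How much bigger a picture looks when every side is multiplied by `scale`."""
    return {"height": scale, "area": scale ** 2, "volume": scale ** 3}

print(pictograph_ratios(2))            # one bag of twice the height
area_doubling_scale = math.sqrt(2)     # side factor that doubles the area instead
print(pictograph_ratios(area_doubling_scale))
```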

Simple Ways to Lie – The Semi-Attached Figure
An advertisement for a new electric juicer: “Extracts 26% more juice!”
- 26% more juice than its competitors?
- The figure comes from a comparison with a hand juicer
- This information is almost irrelevant when buying an electric juicer
More people died on airplanes this year than 100 years ago:
- Does this mean airplanes are becoming more dangerous?
- There are more people
- There are more airplanes

Simple Ways to Lie – The Semi-Attached Figure
While the navy counted 9 deaths per 1000 personnel, for civilians in New York it was 16 per 1000.
- Is it safer to be in the navy?
- The navy consists mostly of young and healthy people
“If you can't prove what you want to prove, demonstrate something else and pretend that they are the same thing. In the daze that follows the collision of statistics with the human mind, hardly anybody will notice the difference.” – Darrell Huff, How to Lie with Statistics
Things may sound the same at first, but they are not.
- This is also known as the association fallacy
More about fallacies in this seminar in Topic 4: Fallacies of Thinking

Simple Ways to Lie – Correlation and Causation
Indigenous people of Vanuatu assumed that having lice causes good health.
The evidence:
- Everyone had lice
- Most sick people had no lice
However:
- A temperature rise of 4-5 degrees is fatal for lice
- People with fevers had no lice
The causation runs the other way: sickness (fever) drove the lice away.

Simple Ways to Lie – Correlation and Causation
Is there a correlation between the number of older siblings (birth order) and the presence of Down Syndrome?
- Naive reading: birth order → presence of Down Syndrome?
- More likely: high maternal age → birth order, and high maternal age → presence of Down Syndrome
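A confounder can create a correlation out of nothing. This sketch uses made-up numbers chosen purely for illustration, not real rates. The risk depends only on maternal age, yet later-born children are more often born to older mothers. Comparing birth-order groups without accounting for maternal age therefore shows a higher crude risk for later-born children, although within each age group birth order makes no difference by construction.

```python
# Hypothetical numbers, chosen only to illustrate confounding; not real rates.
# Risk depends on maternal age only, never on birth order.
RISK_PER_1000 = {"young mother": 1.0, "older mother": 10.0}

# Share of births to young vs. older mothers, by birth order (hypothetical).
AGE_MIX = {
    "first-born": {"young mother": 0.8, "older mother": 0.2},
    "fourth-born or later": {"young mother": 0.3, "older mother": 0.7},
}

def crude_risk(birth_order):
    """Risk per 1000 births for a birth-order group, ignoring maternal age."""
    mix = AGE_MIX[birth_order]
    return sum(share * RISK_PER_1000[age] for age, share in mix.items())

for order in AGE_MIX:
    print(f"{order}: {crude_risk(order):.1f} per 1000")
```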

Simple Ways to Lie – Types of Correlations
- Correlation by chance: an apparent correlation at first, but no correlation after multiple runs
  - Toothpaste example
- Real correlation, but which is the cause and which is the effect?
  - Income and ownership of stocks: more income → more stock ownership, or more stock ownership → more income?
- Real correlation, but a third factor (a confounder) is the cause of both
  - Birth order and presence of Down Syndrome, both driven by high maternal age

Simple Ways to Lie – Unlimited Extrapolation
Trends can be used to extrapolate from the data:
- Rainfall positively correlates with the quality of a harvest
- But too much rainfall ruins the crop
- A positive correlation may only hold up to a point
Example: estimates for the future world population
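Unlimited extrapolation can be sketched with a made-up crop model: quality rises with rain up to a point and then falls, following rain × (10 − rain). A least-squares line fitted only to moderate rainfall fits those points well. Carried far beyond them, it predicts a bumper harvest exactly where the model crop is ruined.

```python
def harvest(rain):
    """Hypothetical crop quality: rises with rain up to a point, then falls."""
    return rain * (10 - rain)

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

observed_rain = [1, 2, 3, 4]   # only moderate rainfall has been observed
slope, intercept = fit_line(observed_rain, [harvest(r) for r in observed_rain])

for rain in (4, 6, 9):
    predicted = slope * rain + intercept
    print(f"rain={rain}: trend predicts {predicted:.1f}, model gives {harvest(rain)}")
```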

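Causes for Lies
Cost of living: the milk price halves, the bread price doubles
- “Cost of living up!”: past prices are the base, so milk is at 50% and bread at 200%, an arithmetic average of 125%
- “Cost of living down?”: new prices are the base, so past milk prices were at 200% and past bread prices at 50% of today’s, again an arithmetic average of 125%
In this case the geometric mean is the appropriate average:
$\left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n}$
Here $\sqrt{0.5 \cdot 2} = 1$, i.e. 100%: the cost of living is unchanged.
→ Confusion of base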
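The confusion of base can be reproduced with the standard `statistics` module (`fmean` and `geometric_mean` need Python 3.8+). The prices are hypothetical; only the ratios (milk halves, bread doubles) come from the slide. The arithmetic mean of the price ratios gives 125% whichever base is chosen, so it supports both "up" and "down". The geometric mean gives 1, or 100%, from either side.

```python
from statistics import fmean, geometric_mean

old = {"milk": 1.00, "bread": 2.00}   # hypothetical prices
new = {"milk": 0.50, "bread": 4.00}   # milk halves, bread doubles

# Price ratios with the PAST prices as base (new / old): [0.5, 2.0]
rel_past_base = [new[g] / old[g] for g in old]
# Price ratios with the NEW prices as base (old / new): [2.0, 0.5]
rel_new_base = [old[g] / new[g] for g in old]

print("arithmetic, past base:", fmean(rel_past_base))  # "cost of living up!"
print("arithmetic, new base:", fmean(rel_new_base))    # past looked dearer: "down?"
print("geometric:", geometric_mean(rel_past_base), geometric_mean(rel_new_base))
```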

Causes for Lies – Confusion of base
“Buy your Christmas presents now and save 100%”
- The 100% is based on the new price; based on the old price, the item is 50% cheaper
- Today this counts as an unfair business practice
Pay cuts:
- “50% pay cut”: $100 \cdot 0.5 = 50$
- “50% pay cut restored”: $50 + \tfrac{1}{2}(100 \cdot 0.5) = 75$, or equivalently a 50% raise on the reduced pay: $50 \cdot 1.5 = 75$
Conveniently, both cases sound better than they really are.
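Both examples reduce to choosing which value to divide by. In this sketch, the Christmas prices of 200 and 100 are hypothetical. The pay figures (100 → 50 → 75) follow the slide. After the "restoration", the employee is still 25% below the original pay.

```python
def pct_change(base, value):
    """Change from `base` to `value`, as a percentage of `base`."""
    return (value - base) / base * 100

# Christmas sale (hypothetical prices): the price drops from 200 to 100.
old_price, new_price = 200, 100
print("saving relative to the old price:", pct_change(old_price, new_price))
print("'saving' quoted on the new price:", (old_price - new_price) / new_price * 100)

# Pay cut and "restoration", following the slide's figures.
salary = 100
after_cut = salary * 0.5    # "50% pay cut"
restored = after_cut * 1.5  # "50% restored" = a 50% raise on the reduced pay
print(after_cut, restored, pct_change(salary, restored))
```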

Causes for Lies – Motive
Are most lies a product of ill intent?
- Distortion and manipulation of statistics is not always the work of professional statisticians
- Legitimate work may be distorted, cherry-picked, and exaggerated for personal gain
- For most lies in statistics a motive can be found:
  - Sensationalize
  - Inflate
  - Confuse
  - Oversimplify
- “Mistakes” are one-sided

Identifying Lies in Statistics
Should only one measure be used to express averages?
- Statistical methods shouldn’t be rejected arbitrarily
- Each measure has its place
According to the author:
- Statistics should be taken with a grain of salt
- One should be able to recognize sound and usable data, which depends on:
  - Competence & integrity of the statistician
  - Competence & integrity of the writer
  - Competence of the reader
Maybe statistics can be changed in a way that removes human error from the equation: Topic 9, “How to do Better?”

Identifying Lies in Statistics
- Who says so? A reputable name does not imply proper representation of the data. Who is drawing the conclusions?
- How does he know? Bad sampling?
- What’s missing? The distribution might be very unnatural; averages may differ and don’t convey the underlying information well
- Did somebody change the subject? Association fallacy / semi-attached figure
- Does it make sense? One statistic judged readability based on average word length

Identifying Lies – Revisiting Nitrate Levels in Groundwater
“From 2013 to 2017 the average nitrate concentration in the top 15 polluted regions has increased by 40 mg/l”
“Average”:
- In 2013 the “average” was the mean over the whole year
- In 2017 the “average” was taken over maxima from multiple days
- → Unspecified and different averages
“Top 15 polluted regions”:
- Nitrate concentrations are measured to ensure safe levels
- Regions with low concentrations are not as interesting, so measurement devices are moved to regions with higher concentrations
- The top 15 regions, i.e. the compared samples, change over time
- When looking at the same regions, nitrate levels have decreased
- → Missing figure, selection bias
