Instructor Preview - Hawkes Learning

Upload by : Lilly Kaiser
Transcription

Discovering Statistics and Data
Third Edition
James S. Hawkes
Instructor Preview

Editor: Robin Hendrix
Assistant Editors: Wesley Duckett, Amber Widmer
Designers: Trudy Gove, D. Kanthi, E. Jeevan Kumar, U. Nagesh, James Smalls, Patrick Thompson, Rebekah Wagner, Tee Jay Zajac
Cover Design: James Smalls and Patrick Thompson
VP Research & Development: Marcel Prevuznak
Director of Content: Kara Roché

A division of Quant Systems, Inc.
546 Long Point Road, Mount Pleasant, SC 29464

Copyright 2017 by Hawkes Learning / Quant Systems, Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written consent of the publisher.
Printed in the United States of America
ISBN: 978-1-946158-72-7

Instructor Sample Contents

Letter from the Author
Discovering Statistics and Data Third Edition Table of Contents

Chapter 1: Statistics and Problem Solving
  1.1 The Meaning of Data
  1.2 Statistics as a Career
  1.3 The Data Explosion
  1.4 The Fusion of Data, Computing, and Statistics
  1.5 Big Data

Chapter 3: Visualizing Data
  3.4 Histograms and Other Graphical Displays of Quantitative Data
  3.5 Analyzing Graphs

Chapter 4: Describing Data From One Variable
  4.1 Measures of Location
  4.3 Measures of Relative Position, Boxplots, and Outliers

Preface

Letter from the Author

We strived to maintain the friendly conversational writing style of previous editions. The new edition pays homage to the technology-driven data explosion by the incorporation of larger real data sets. Additionally, we have a strong focus on data visualization. Some of the more significant improvements we have made in this edition include the following.

- Technology instructions, data sets, and interactive activities will now reside on our web resource for easier access and to allow us to update technology as new versions become available, as well as add other technologies requested by users in the future.
- The recommendations included in the Guidelines for Assessment and Instruction in Statistics Education (GAISE), published by the American Statistical Association, were carefully considered and incorporated whenever possible in this edition. As a result, you will find more real data sets, and hopefully more relevant data sets for students in this edition. In addition, we have incorporated more technology screen shots into the text so students can see the expected output from an analysis.
- Basic concept questions have been added to help students understand and reflect on what they have learned in each section (a GAISE guideline).
- Expansion of the solution content in our examples to better model the statistical thinking process for students and aid them in making valid conclusions from data (also a GAISE guideline).
- In this edition we have placed more emphasis on “Big Data” and the problems arising from having large data sets. You will find some unique visualization techniques that can be used on large data sets to reveal significant findings that might have remained hidden otherwise.
- One of the key feature requests from our customers was a modernization of our hypothesis testing procedures. To accomplish this, we streamlined our hypothesis testing steps from ten steps to six, revised the null hypothesis to reflect strict equality, incorporated both critical values and P-values for all tests, and established stricter checking of assumptions prior to conducting a hypothesis test. We also published the new ASA guidelines on hypothesis testing in the text.
- New web-based resources and apps have been developed to illustrate and provide interaction with significant statistical concepts often misunderstood by students. Along with our courseware, these apps will help students with the comprehension of difficult concepts such as:
  - The Central Limit Theorem and sampling distributions
  - Recognition of different distribution types
  - Multi-variable visualization techniques
  - Constructing confidence intervals using bootstrapping
  - Understanding the relationship between Type I and Type II errors in hypothesis testing
  - The Direct Mail Game: using confidence intervals and hypothesis tests

- We have added over 60 new examples and over 200 new exercises in this edition.
- The Table of Contents has also been streamlined to contain fewer individual sections in a chapter, while at the same time expanding the content to make each section more robust and complete.
- The regression chapter was split into two chapters to discuss Linear Regression and Multiple Regression separately.
- Important content has been summarized in procedure, formula, or definition boxes to enhance student learning and to aid students in reviewing the content for assessments.
- Updated the technology images and output to reflect the latest versions of Excel, Minitab, and the TI-83/84 Plus calculator. We have also incorporated geospatial data visualization examples using the R Statistical Programming Language.

Discovering Statistics and Data Third Edition Table of Contents
Note: Content subject to change

Chapter 1: Statistics and Problem Solving
  1.1 The Meaning of Data
  1.2 Statistics as a Career
  1.3 The Data Explosion
  1.4 The Fusion of Data, Computing, and Statistics
  1.5 Big Data
  1.6 Introduction to Statistical Thinking
  1.7 Descriptive vs. Inferential Statistics
  1.8 The Consequences of Statistical Illiteracy

Chapter 2: Data, Reality, and Problem Solving
  2.1 Collecting Data
  2.2 Data Classification
  2.3 Time Series Data vs. Cross-Sectional Data
  2.4 Data Resources

Chapter 3: Visualizing Data
  3.1 Frequency Distributions
  3.2 Displaying Qualitative Data Graphically
  3.3 Constructing Frequency Distributions for Quantitative Data
  3.4 Histograms and Other Graphical Displays of Quantitative Data
  3.5 Analyzing Graphs

Chapter 4: Describing and Summarizing Data from One Variable
  4.1 Measures of Location
  4.2 Measures of Dispersion
  4.3 Measures of Relative Position, Box Plots, and Outliers
  4.4 Data Subsetting
  4.5 Analyzing Grouped Data
  4.6 Proportions and Percentages

Chapter 5: Discovering Relationships
  5.1 Scatterplots and Correlation
  5.2 Fitting a Linear Model
  5.3 Evaluating the Fit of a Linear Model
  5.4 Fitting a Linear Time Trend
  5.5 Scatterplots for More Than Two Variables

Chapter 6: Probability, Randomness, and Uncertainty
  6.1 Introduction to Probability
  6.2 Addition Rules for Probability
  6.3 Multiplication Rules for Probability
  6.4 Combinations and Permutations
  6.5 Combining Probability and Counting Techniques
  6.6 Bayes’ Theorem

Chapter 7: Discrete Probability Distributions
  7.1 Types of Random Variables
  7.2 Discrete Random Variables
  7.3 The Discrete Uniform Distribution
  7.4 The Binomial Distribution
  7.5 The Poisson Distribution
  7.6 The Hypergeometric Distribution

Chapter 8: Continuous Probability Distributions
  8.1 The Uniform Distribution
  8.2 The Normal Distribution
  8.3 The Standard Normal Distribution
  8.4 Applications of the Normal Distribution
  8.5 Assessing Normality
  8.6 Approximations to Other Distributions

Chapter 9: Samples and Sampling Distributions
  9.1 Random Samples and Sampling Distributions
  9.2 The Distribution of the Sample Mean and the Central Limit Theorem
  9.3 The Distribution of the Sample Proportion
  9.4 Other Forms of Sampling

Chapter 10: Estimation: Single Samples
  10.1 Point Estimation of the Population Mean
  10.2 Interval Estimation of the Population Mean
  10.3 Estimating the Population Proportion
  10.4 Estimating the Population Standard Deviation or Variance
  10.5 Confidence Intervals Based on Resampling (Bootstrapping) (Courseware Only)

Chapter 11: Hypothesis Testing: Single Samples
  11.1 Introduction to Hypothesis Testing
  11.2 Testing a Hypothesis about a Population Mean
  11.3 The Relationship between Confidence Interval Estimation and Hypothesis Testing
  11.4 Testing a Hypothesis about a Population Proportion
  11.5 Testing a Hypothesis about a Population Standard Deviation or Variance
  11.6 Practical Significance vs. Statistical Significance

Chapter 12: Inferences about Two Samples
  12.1 Inference about Two Means: Independent Samples
  12.2 Inference about Two Means: Dependent Samples (Paired Difference)
  12.3 Inference about Two Population Proportions
  12.4 Inference about Two Population Standard Deviations or Variances

Chapter 13: Regression, Inference, and Model Building
  13.1 Assumptions of the Simple Linear Model
  13.2 Inference Concerning β1
  13.3 Inference Concerning the Model’s Prediction

Chapter 14: Multiple Regression
  14.1 The Multiple Regression Model
  14.2 The Coefficient of Determination and Adjusted R^2
  14.3 Interpreting the Coefficients of the Multiple Regression Model
  14.4 Inference Concerning the Multiple Regression Model and its Coefficients
  14.5 Inference Concerning the Model’s Prediction
  14.6 Multiple Regression Models with Qualitative Independent Variables

Chapter 15: Analysis of Variance (ANOVA)
  15.1 One-Way ANOVA
  15.2 Two-Way ANOVA: The Randomized Block Design
  15.3 Two-Way ANOVA: The Factorial Design

Chapter 16: Looking for Relationships in Qualitative Data
  16.1 The Chi-Square Distribution
  16.2 The Chi-Square Test for Goodness of Fit
  16.3 The Chi-Square Test for Association

Chapter 17: Nonparametric Tests
  17.1 The Sign Test
  17.2 The Wilcoxon Signed-Rank Test
  17.3 The Wilcoxon Rank-Sum Test
  17.4 The Rank Correlation Test
  17.5 The Runs Test for Randomness
  17.6 The Kruskal-Wallis Test

Chapter 18: Statistical Process Control (Courseware Only)

An Ocean of Data

Although you usually can’t see it, data is being created by everything electronic: phones, cars, computers, appliances, cameras, airplanes, medical equipment, telescopes, atom smashers, DNA sequencers, environmental monitors, manufacturing sensors, social media sites, email, text, and a multitude of other places. We live in a world where the amount of data being generated is incomprehensible, and it keeps growing. The phenomenal growth of stored data is one of the significant achievements of modern civilization and can be considered a measure of the technical advancement of any civilization.

Between 1986 and 2020 the data storage capacity worldwide will have increased by a factor of more than 15,000. Large amounts of data are impacting all the natural sciences and leading to new discoveries in physics, biology, astronomy, and cosmology. In addition, new online businesses are changing the way people shop, find jobs, find relationships, get directions, get recommendations, and find the answers to many questions. Our society is in the midst of a data revolution whose eventual impact may be greater than the industrial revolution. New wealth and convenience have already been created on a massive scale and there is much more to come. One of the companies that has participated in this technological revolution is Amazon.

Amazon has changed the way we shop. The story of Amazon’s success is a story in which statistics and data play a very large role. Originally, Amazon just sold books. Because they were born as an internet book store, keeping data about their customers was relatively easy and straightforward. Amazon tracked

- what customers purchased,
- what they looked at and didn’t buy,
- how they navigated through their site,
- whether they were affected by promotions, reviews, or web design layouts, and
- relationships between individuals and groups.

Amazon saw their data as an opportunity to understand the kind of books customers wanted to read. As Chinese general Sun Tzu said,

“Opportunities multiply as they are seized.”
- Sun Tzu

Wow, did the opportunities multiply for Amazon!

Amazon’s success is based on their connection to their customer through the data they have collected and analyzed. Basically, they have set the standard in online retail for the data a company needs to collect to compete. So far, Amazon is at the top of the online retail mountain. The story you will hear about as you read this book is that data represents opportunity to learn something. Because the amount of data being stored in the world is doubling every two years, it seems like there is going to be a lot of opportunity for individuals willing and able to tangle with it.

Chapter 1: Statistics and Problem Solving

1.1 The Meaning of Data

Historically we have associated data with measurements and numbers that were purposefully generated to help solve a problem. For example, in 3800 BC the Babylonian empire was interested in things that could be taxed or have some potential military value (especially the availability of adult males for their armies). Trying to solve their taxation and military problem caused Babylonians to perform the first census by counting people, livestock, butter, honey, milk, and other consumables in their territories. Obtaining data in those days must have been a time-consuming and rather expensive task, but the data came from measurements or counts, had a purpose, and there was some expectation the data would be examined later.

What constitutes data is changing. Presently, the average smartphone owner uses about 3,000,000,000 (3 billion) bytes of data per month, and this number is growing rapidly. The word “data” in this context is different from the historical notion of purposeful measurement to solve a problem. When you stream a video on your phone the data will never be analyzed by anyone. However, anytime you use your web browser your movements around the web are probably being recorded in a database at your browser provider, who in turn will sell the data on your browser activity to a digital marketing group. The marketing group will employ statistical methods to determine what and how they can market to you for their business clients. At least this data is still a measurement of sorts.

Another category of data comes from the desire to create artificial intelligence. As researchers confront the problem of reproducing human “intelligence” they must solve the same data problems we humans do: comprehending large volumes of visual and audio data. An enormous amount of what is considered data in the quest for artificial intelligence is not even a measurement in the traditional sense. For example, recently an artificial intelligence company taught a computer how to play an old Atari video game in the same way humans learn, by looking at the screen using a video camera as “eyes”. In other words, the pixels on the screen were the data for the machine learning model.

[Figure: The Human Brain, with labels for the cerebral cortex, parietal lobe, frontal lobe, occipital lobe, temporal lobe, cerebellum, brain stem, medulla oblongata, and spinal cord]

Fortunately, humans are born with the ability to perform powerful sensory and data analytic feats. The brain receives the equivalent of 100 million bytes of sensory information (data) for each second of sensory experience. The eyes alone generate the equivalent of about 90 million bytes of information per second. Assuming we are awake 16 hours a day, the eyes produce roughly the equivalent of 5.18 trillion (5,180,000,000,000) bytes of data per day. This data plus audio, olfactory, and tactile data are analyzed by the brain’s 100 billion neurons. Some neurons in the cerebellum are thought to have 200,000 inputs per neuron. Neurons in the cerebral cortex are thought to have around 10,000 connections per neuron.
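
As a quick check, the daily figure for the eyes quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: the rates (about 90 million bytes per second for the eyes, 100 million for all senses) and the 16 waking hours come from the text, while the function and variable names are illustrative and not from the book.

```python
# Back-of-the-envelope check of the sensory data figures quoted in the text.
# Rates and waking hours are the text's estimates; all names are illustrative.
EYES_BYTES_PER_SECOND = 90_000_000         # about 90 million bytes per second
ALL_SENSES_BYTES_PER_SECOND = 100_000_000  # about 100 million bytes per second
WAKING_HOURS_PER_DAY = 16
SECONDS_PER_HOUR = 3_600


def bytes_per_day(bytes_per_second: int, hours_awake: int) -> int:
    """Total bytes accumulated over the waking part of one day."""
    return bytes_per_second * hours_awake * SECONDS_PER_HOUR


eyes_daily = bytes_per_day(EYES_BYTES_PER_SECOND, WAKING_HOURS_PER_DAY)
senses_daily = bytes_per_day(ALL_SENSES_BYTES_PER_SECOND, WAKING_HOURS_PER_DAY)

print(f"Eyes:       {eyes_daily:,} bytes per day ({eyes_daily / 1e12:.2f} trillion)")
print(f"All senses: {senses_daily:,} bytes per day ({senses_daily / 1e12:.2f} trillion)")
```

Under these assumptions the eyes account for 5,184,000,000,000 bytes per day, which rounds to the 5.18 trillion stated in the text.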

At the other end of the connection spectrum are the neurons in the retina, which have only a few connections. Altogether, this means there are hundreds of trillions of neural connections in your brain. Essentially, your brain is a supercomputer that produces your model of physical reality that you call the “now”.

In addition to creating the “now”, our brains produce data-driven predictive reality models that are extremely useful in making decisions. Do I have enough time to safely cross the street now or should I wait until the oncoming car goes by? Should I make a left turn now or should I wait until the oncoming bus passes? The brain also designs experiments to gather relevant data for decision making. For example, before taking a morning shower, most people stick a hand or foot into the water to sense the temperature (gather relevant data) before deciding to jump in.

Statisticians do the same sort of things you do unconsciously as you go about your daily life. They analyze data using pictures and summary measurements, and they build data-driven predictive models. They develop methods of designing experiments and gathering data that are cost effective and diminish bias. Essentially, statistics is a “formal” way of thinking with data.

1.2 Statistics as a Career

We live in the information age, an economy and culture based on computers and information. While there is an overwhelming amount of new data being produced every year, there have not been large increases in the number of statisticians being produced annually until very recently. That is why Hal Varian, Google’s chief economist, said,

“I keep saying the sexy job in the next 10 years will be statisticians. People think I’m joking but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s? The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it, that’s going to be a hugely important skill in the next decades.”

[Figure: Statistics Degrees Conferred by Level, 1992-2015. Vertical axis: Number of Degrees Conferred; horizontal axis: Year; one series per degree level (Bachelor’s, Master’s, PhD’s)]

The chart depicts the number of statistics degrees conferred in the U.S. by year and by degree. It also illustrates the growing interest in the field of statistics. However, notice that the number of PhD statisticians isn’t growing nearly as fast as the number with Master’s and Bachelor’s degrees. Many industry experts are worried that the U.S. isn’t producing enough highly-trained data scientists. Therefore, companies are providing considerable incentives in the form of lucrative salaries to convince people that careers in statistics and data science are well worth their while.

We will begin our journey into statistics by first discussing some of the reasons there is so much data being produced.

1.3 The Data Explosion

To discuss the data explosion, we need to understand numbers of very large magnitudes. It is not uncommon to have a cell phone or computer that has 64 gigabytes (64 billion bytes) of memory. However, data is accumulating in databases so fast we need much larger numbers to describe the sizes of modern databases.

Table 1.3.1 - Quantifying

Bytes    Data Storage Quantities
10^9     1 gigabyte (1 billion bytes)
10^12    1 terabyte (1 trillion bytes)
10^15    1 petabyte (1 quadrillion bytes)
10^18    1 exabyte (1 quintillion bytes)
10^21    1 zettabyte (1 sextillion bytes)

As was mentioned earlier the eyes produce the equivalent o
