ADA1: Chapter 9: Introduction To The Bootstrap


Bootstrap

- The bootstrap as a statistical method was invented in 1979 by Bradley Efron.
- The idea is nonparametric, but it is not based on ranks, and it is very computationally intensive.
- The bootstrap simulates the sampling distribution for certain statistics when it is difficult to derive the distribution from theory.
- The sampling distribution is then usually used to get confidence intervals.

ADA1, December 5, 2017 (1 / 36)

Example: want a confidence interval for the median

To get a confidence interval for the median:
- the Wilcoxon test might be used
  - it is based on ranks, which is a simplification of the data
  - it doesn't take full advantage of the data

What are other ways to get a confidence interval for the population median?
- There isn't a Central Limit Theorem that applies to sample medians.
- If the sample median is used to estimate the population median, it is usually difficult to know what an appropriate standard error is, especially if the underlying distribution is unknown.

Bootstrap

The bootstrap is a way to get confidence intervals for quantities like odds, medians, quantiles, and other aspects of a distribution where the standard errors are difficult to derive.
- The bootstrap assumes that the data are representative of the population
  - if you sample from the data, this is similar to sampling from the population as a whole.
- Resampling: instead of sampling repeatedly from the population, we sample repeatedly from the sample itself, hoping that the sample is representative of the population. This procedure is called resampling.

Bootstrap Procedure

Suppose θ is the parameter of interest and θ̂ is the estimator of θ from the original sample.

1. Treat the original sample as the population, then draw "resamples" with replacement from the original sample.
2. Take R bootstrap resamples, obtaining θ̂_1, ..., θ̂_R.
3. Estimate the variance of θ̂ by

   \[ \hat V_B(\hat\theta) = \frac{1}{R-1}\sum_{r=1}^{R}\left(\hat\theta_r - \frac{1}{R}\sum_{r=1}^{R}\hat\theta_r\right)^2 \quad\text{or}\quad \hat V_B(\hat\theta) = \frac{1}{R-1}\sum_{r=1}^{R}\left(\hat\theta_r - \bar{\hat\theta}\right)^2 \]

   where θ̄̂ is the mean of the bootstrap estimates.
4. 95% CI of θ: [q_2.5%, q_97.5%], the 2.5th and 97.5th percentiles of the bootstrap distribution.
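The four steps above can be sketched in code. The slides use R; the following Python sketch (all names are mine, not from the slides) implements the percentile bootstrap for the sample mean:

```python
import random
import statistics

def bootstrap_ci(data, stat, R=1000, seed=0):
    """Percentile bootstrap: draw R resamples with replacement,
    compute `stat` on each, and take the middle 95% of the
    sorted bootstrap statistics as the confidence interval."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat([rng.choice(data) for _ in range(n)])
                  for _ in range(R))
    var_b = statistics.variance(reps)  # bootstrap variance, step 3
    lo = reps[int(0.025 * R) - 1]      # 25th of 1000 (step 4)
    hi = reps[int(0.975 * R)]          # 976th of 1000
    return var_b, (lo, hi)

data = [1.2, -0.4, 0.3, 2.1, -1.5, 0.8, 0.0, 1.1, -0.2, 0.5]
var_b, (lo, hi) = bootstrap_ci(data, statistics.mean)
```

The same function works for the median or standard deviation by passing `statistics.median` or `statistics.stdev` as `stat`.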

Bootstrap Example

- Get an estimate of the mean µ from a normal distribution with mean 0 and standard deviation 1; the sample size is n = 20.
- Compare the bootstrap CI and the t-based confidence interval.

> x <- rnorm(20, 0, 1)
> x <- sort(x)
> options(digits = 3)
> x
-3.2139 -0.6799 -0.6693 -0.2472 -0.2196
-0.1190 -0.0459 -0.0148  0.0733  0.1220
 0.1869  0.2759  0.3283  0.4984  0.5429
 0.9491  1.0510  1.4324  1.4534  1.7554

Bootstrap

To get a resample: sample with replacement.
- the resample will be similar to the original sample, but not exactly the same as the original sample.
- the resample should have approximately the same mean, median, and variance as the original.

> b <- sample(x, replace = TRUE)
> sort(b)
-0.6799 -0.6799 -0.6693 -0.2196 -0.0459
-0.0148 -0.0148  0.1220  0.1220  0.1869
 0.1869  0.2759  0.3283  0.4984  0.4984
 0.5429  0.5429  1.0510  1.0510  1.4324

The observation -0.6799 shows up twice in the resample, while -3.2139 doesn't show up at all.

Bootstrap

> mean(x)
[1] 0.173
> mean(b)
[1] 0.226
> median(x)
[1] 0.154
> median(b)
[1] 0.187
> sd(x)
[1] 1.05
> sd(b)
[1] 0.564

Bootstrap

Now repeat this procedure many times
- to see how variable the resampled statistics (the mean, median, and standard deviation) are.

I <- 1000
boot.mean <- 1:I
boot.median <- 1:I
boot.sd <- 1:I
for (i in 1:I) {
  b <- sample(x, replace = TRUE)
  boot.mean[i] <- mean(b)
  boot.median[i] <- median(b)
  boot.sd[i] <- sd(b)
}

[Figure: histograms of the bootstrapped means, medians, and standard deviations]

Bootstrap

There is an outlier, but the data really were simulated using x <- rnorm(20) in R.

Bootstrap CI

Look at the 2.5 and 97.5 percentiles of the bootstrap distribution.
- Sort the variables boot.mean, boot.median, and boot.sd and examine the appropriate values.
- The bootstrap distribution can be visualized by a histogram of the bootstrapped sample statistics.
- For I = 1000 bootstraps, the 25th and 976th observations can be used, since observations 26, 27, ..., 975 are exactly 950 observations, the middle 95% of the bootstrap distribution.

Bootstrap

boot.mean <- sort(boot.mean)
boot.median <- sort(boot.median)
boot.sd <- sort(boot.sd)
CI.mean <- c(boot.mean[25], boot.mean[976])
CI.median <- c(boot.median[25], boot.median[976])
CI.sd <- c(boot.sd[25], boot.sd[976])
CI.mean
#[1] -0.315 0.521
CI.median
#[1] -0.119 0.521
CI.sd
#[1] 0.522 1.563

Bootstrap

Compare to the t-based interval for the mean and the Wilcoxon-based interval for the median.

CI.mean from bootstrap
#[1] -0.315 0.521
> t.test(x)$conf.int
[1] -0.317 0.663
CI.median from bootstrap
#[1] -0.119 0.521
> wilcox.test(x, conf.int = TRUE)$conf.int
[1] -0.117 0.657

The bootstrap CI for the mean is quite similar to the t-based CI for the mean, and the bootstrap CI for the median is similar to the Wilcoxon-based CI for the median.

Bootstrap

In addition to means and medians, you can get intervals for other quantities, such as the 80th percentile of the distribution (here, sort each bootstrap data set and pick the 80th percentile, corresponding to observation 16 or 17 in the sorted sample).

For proportion data, you can get intervals for functions of proportions, such as risk ratios and odds ratios.

Bootstrap

Distribution of the risk ratio.
- Risk ratios are often used in medicine. For example, given either aspirin or placebo, the number of strokes is recorded for subjects in a study. The results are as follows:

             aspirin   placebo
stroke           119        98
no stroke      10918     10936
subjects       11037     11034

Proportions of strokes for aspirin versus placebo takers:

p̂1 = 119/11037 = 0.0108,   p̂2 = 98/11034 = 0.00888

where p1 is the proportion of aspirin takers who had a stroke and p2 is the proportion of placebo takers who experienced a stroke.

Bootstrap

The proportions can be compared using a test of proportions. However, an issue with this is that the proportions involved are very small:

> prop.test(c(119, 98), c(11037, 11034), correct = FALSE)

        2-sample test for equality of proportions without continuity correction

data:  c(119, 98) out of c(11037, 11034)
X-squared = 2, df = 1, p-value = 0.2
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.000703  0.004504
sample estimates:
 prop 1  prop 2
0.01078 0.00888

Bootstrap

For this type of problem, a risk ratio, or relative risk, is often reported instead.
- This gives you an idea of how much more risky one treatment is than another in relative terms, without giving an idea of the absolute risk.
- An estimate of the relative risk is p̂1/p̂2 = 1.21.

The relative risk of 1.21 indicates that a random person selected from the aspirin group was 21% more likely to experience a stroke than a person from the placebo group, even though both groups had a fairly low risk (both close to 1%) of experiencing a stroke. In medical examples, a relative risk of 1.21 is fairly large.

We'd also like to get an interval for the relative risk.
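As a quick arithmetic check, the proportions and the relative risk can be re-derived directly from the counts in the stroke table (a small Python sketch, not part of the slides):

```python
# Stroke counts from the aspirin/placebo table
x1, n1 = 119, 11037   # aspirin: strokes, subjects
x2, n2 = 98, 11034    # placebo: strokes, subjects

p1 = x1 / n1          # proportion of aspirin takers with a stroke
p2 = x2 / n2          # proportion of placebo takers with a stroke
rr = p1 / p2          # relative risk (risk ratio), about 1.21
```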

Bootstrap

The usual approach:
- Take the logarithm of the relative risk and get an interval for the logarithm of the relative risk.
- Then transform the interval back to the original scale.
- The reason for this is that the logarithm of a ratio is a difference, and for sums and differences it is much easier to derive reasonable standard errors.

Bootstrap

                        treatment   placebo (or control)
outcome (e.g., stroke)     x1            x2
no outcome               n1 - x1       n2 - x2
subjects                   n1            n2

Let RR̂ = p̂1/p̂2 be the estimated relative risk or risk ratio. The standard large-sample CI for the log is

\[ \log(\widehat{RR}) \pm z_{crit}\sqrt{\frac{(n_1 - x_1)/x_1}{n_1} + \frac{(n_2 - x_2)/x_2}{n_2}} = \log(\widehat{RR}) \pm z_{crit}\sqrt{\frac{1}{x_1} - \frac{1}{n_1} + \frac{1}{x_2} - \frac{1}{n_2}} \]

Bootstrap

To get the interval on the original scale, you then exponentiate both endpoints. In the stroke example,

\[ SE = \sqrt{\frac{1}{x_1} - \frac{1}{n_1} + \frac{1}{x_2} - \frac{1}{n_2}} = \sqrt{\frac{1}{119} - \frac{1}{11037} + \frac{1}{98} - \frac{1}{11034}} = 0.136 \]

The 95% interval for log RR is therefore (here, log 1.21 ≈ 0.191):

0.191 ± 1.96(0.136) = (−0.0756, 0.458)

This is an interval for the log of the relative risk.

Bootstrap

Exponentiating the interval, we get (0.927, 1.58). This is done using

> exp(.191 - 1.96*.136)
[1] 0.927
> exp(.191 + 1.96*.136)
[1] 1.58

The interval includes 1.0, which is the value that corresponds to equal risks. The value 0.927 corresponds to the risk for the aspirin group being 92.7% of the risk of the placebo group, while 1.58 corresponds to the aspirin group having a risk that is 58% higher than the placebo group.
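The full large-sample calculation (the SE, the interval for log RR, and the exponentiation) can be reproduced in a few lines; this Python sketch uses only the counts from the stroke table:

```python
import math

x1, n1 = 119, 11037   # aspirin: strokes, subjects
x2, n2 = 98, 11034    # placebo: strokes, subjects

log_rr = math.log((x1 / n1) / (x2 / n2))       # log relative risk, about 0.19
se = math.sqrt(1/x1 - 1/n1 + 1/x2 - 1/n2)      # about 0.136
rr_lo = math.exp(log_rr - 1.96 * se)           # lower endpoint, about 0.93
rr_hi = math.exp(log_rr + 1.96 * se)           # upper endpoint, about 1.58
```

The small differences from the slides' (0.927, 1.58) come from the slides rounding RR to 1.21 before taking the log.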

Bootstrap

How do we do bootstrapping for proportion data?

Here we create data sets of 0s and 1s and bootstrap those data sets.


Bootstrap

For the two-sample proportion case, we need two sets of 0s and 1s (i.e., red and blue) to represent the placebo group and the treatment (aspirin) group.

Bootstrap code

aspirin <- c(rep(1, 119), rep(0, 11037 - 119))
placebo <- c(rep(1, 98), rep(0, 11034 - 98))
boot.rr <- 1:1000
boot.or <- 1:1000
for (i in 1:1000) {
  aspirin.b <- sample(aspirin, replace = TRUE)
  placebo.b <- sample(placebo, replace = TRUE)
  p1hat <- mean(aspirin.b)
  p2hat <- mean(placebo.b)
  boot.rr[i] <- p1hat / p2hat
  boot.or[i] <- (p1hat / (1 - p1hat)) / (p2hat / (1 - p2hat))  # ratio of odds
}
> c(sort(boot.rr)[25], sort(boot.rr)[976])
[1] 0.9286731 1.6014550
> c(sort(boot.or)[25], sort(boot.or)[976])
[1] 0.929285 1.594332
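The same two-sample resampling scheme can be mimicked outside R. This Python sketch uses my own variable names and reduces the resample count to 200 for speed (the slides use 1000):

```python
import random

random.seed(1)
aspirin = [1] * 119 + [0] * (11037 - 119)   # 1 = stroke, 0 = no stroke
placebo = [1] * 98 + [0] * (11034 - 98)

R = 200
boot_rr = []
for _ in range(R):
    a = random.choices(aspirin, k=len(aspirin))  # resample with replacement
    p = random.choices(placebo, k=len(placebo))
    p1hat = sum(a) / len(a)
    p2hat = sum(p) / len(p)
    boot_rr.append(p1hat / p2hat)

boot_rr.sort()
ci = (boot_rr[4], boot_rr[195])  # 5th and 196th of 200: middle 95%
```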

Bootstrap

The bootstrap interval for the relative risk, [0.929, 1.601], is remarkably close to the interval obtained by exponentiating the interval for the log of the relative risk, [0.927, 1.58].

Bootstrap regression problems

Bootstrapping can also be applied to more complex data sets, such as regression problems.
- Bootstrap each row in the data set
  - this means that if x_i appears in the bootstrap sample, then so does the pair (x_i, y_i).
- To sample rows of the data set, randomly bootstrap the indices of the rows you want to include in the bootstrap sample, then apply those rows to a new, temporary data set, or just to new vectors for the x and y variables.

Bootstrap code

x <- read.table("couples.txt", header = TRUE)
attach(x)
a <- lm(HusbandAge ~ WifeAge)
plot(WifeAge, HusbandAge)
abline(a, lwd = 3)
for (i in 1:100) {
  boot.obs <- sample(1:length(WifeAge), replace = TRUE)
  boot.WifeAge <- WifeAge[boot.obs]
  boot.HusbandAge <- HusbandAge[boot.obs]
  atemp <- lm(boot.HusbandAge ~ boot.WifeAge)
  abline(atemp, col = "grey")
}
abline(a, lwd = 3)  # redraw: the original line is hidden by bootstrap lines
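The row-resampling idea carries over directly to any language. Since couples.txt is not reproduced here, this Python sketch runs on simulated stand-in data (all names and the data-generating choices are mine):

```python
import random

random.seed(0)

# Synthetic stand-in for the couples data: husband's age is roughly
# wife's age plus a couple of years of noise.
wife = [random.uniform(20, 60) for _ in range(50)]
husband = [w + random.gauss(2, 3) for w in wife]

def fit(xs, ys):
    """Least-squares slope and intercept for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return b, my - b * mx

boot_slopes = []
for _ in range(100):
    # bootstrap row indices, so (x_i, y_i) pairs stay together
    idx = [random.randrange(len(wife)) for _ in range(len(wife))]
    slope, intercept = fit([wife[i] for i in idx], [husband[i] for i in idx])
    boot_slopes.append(slope)
```

The spread of `boot_slopes` plays the role of the grey lines in the plot: it shows how variable the fitted line is under resampling.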

Bootstrap, 100 replicates

[Figure: scatterplot of HusbandAge vs. WifeAge with bootstrap regression lines]

Bootstrap, 100 replicates

[Figure: scatterplot of HusbandAge vs. WifeAge with bootstrap regression lines]

Bootstrap, about outliers

An interesting feature of the bootstrap is how it handles outliers. If a data set has an outlier, what is the probability that the outlier is included in a given bootstrap sample?

The probability that the outlier is not included is

\[ P(\text{no outlier}) = \left(1 - \frac{1}{n}\right)^n \]

where n is the number of observations. The reason is that each observation in the bootstrap sample is not the outlier with probability

\[ \frac{n-1}{n} = 1 - \frac{1}{n} \]

because there are n − 1 ways to get an observation other than the outlier, and each of the n observations is equally likely.

Bootstrap

If n is large, then

\[ P(\text{no outlier}) = \left(1 - \frac{1}{n}\right)^n \approx e^{-1} \approx 0.368 \]

How large is large?

  n    (1 - 1/n)^n
  2       0.250
  3       0.296
  6       0.335
 12       0.352
 20       0.358
 30       0.362
100       0.366
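The entries of this table are easy to verify numerically; a small Python check (my own, not from the slides):

```python
import math

# P(a particular observation never appears in a bootstrap resample of size n)
p_no = {n: (1 - 1/n) ** n for n in (2, 3, 6, 12, 20, 30, 100)}
limit = math.exp(-1)  # the large-n limit, about 0.368
```

The sequence increases toward e^{-1} from below, so every table entry sits just under 0.368.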

Bootstrap

Approximately 1 − e^{−1} ≈ 63% of bootstrap replicates DO have the outlier, but a substantial proportion do not have the outlier.
- This can lead to interesting bootstrap histograms: if the outlier is strong enough, the bootstrap samples can be bi- or multi-modal, where the number of modes corresponds to the number of times that the outlier was included in the bootstrap sample (recall that in a bootstrap sample, an original observation can occur 0, 1, 2, ..., n times in theory).
- The number of times the outlier appears in a bootstrap sample is a binomial random variable with parameters n and p = 1/n. For a data set with 100 regular observations and 1 outlier, the probability that the outlier occurs k times, for k = 0, ..., 4, is

> dbinom(0:4, 101, 1/101)
[1] 0.36605071 0.36971121 0.18485561 0.06100235 0.01494558
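The dbinom values above can be checked against the binomial pmf directly; a Python sketch using math.comb:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 101 observations; the outlier is picked with probability 1/101 per draw
probs = [binom_pmf(k, 101, 1/101) for k in range(5)]
```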

Bootstrap code

> x <- rnorm(100)
> x <- c(x, 10)  # add 10 as an outlier
> boot.sd <- 1:10000
> for (i in 1:10000) {
+   temp <- sample(x, replace = TRUE)
+   boot.sd[i] <- sd(temp)
+ }
> hist(boot.sd, nclass = 30)

[Figure: histogram of boot.sd]

