Advanced Computational Methods in Statistics, Lecture 4: Bootstrap


Axel Gandy
Department of Mathematics, Imperial College London
http://www2.imperial.ac.uk/~agandy
London Taught Course Centre for PhD Students in the Mathematical Sciences, Autumn 2015

Outline
- Introduction (Sample Mean/Median; Sources of Variability; An Example of Bootstrap Failure)
- Confidence Intervals
- Hypothesis Tests
- Asymptotic Properties
- Higher Order Theory
- Iterated Bootstrap
- Dependent Data
- Further Topics

Introduction
- Main idea: estimate properties of estimators (such as the variance, distribution, confidence intervals) by resampling the original data.
- Key paper: Efron (1979).

Slightly expanded version of the key idea
- Classical setup in statistics: X ~ F, F ∈ Θ, where X is the random object containing the entire observation (often Θ = {F_a : a ∈ A} with A ⊂ R^d).
- Tests, CIs, ... are often built on a real-valued test statistic T = T(X).
- Distributional properties of T under the "true" F (or under F from H0) are needed to do tests, construct CIs, ... (e.g. quantiles, sd, ...).
- Classical approach: construct T to be an (asymptotic) pivotal quantity, whose distribution does not depend on the unknown parameter. This is often not possible, or requires lengthy asymptotic analysis.
- Key idea of the bootstrap: replace F by (some) estimate F̂ and derive the distributional properties of T from F̂.

Mouse Data (Efron & Tibshirani, 1993, Ch. 2)
- 16 mice randomly assigned to treatment or control.
- Survival time in days following a test surgery:

  Group       Data                             Mean (SD)      Median (SD)
  Treatment   94 197 16 38 99 141 23           86.86 (25.24)  94 (?)
  Control     52 104 146 10 51 30 40 27 46     56.22 (14.14)  46 (?)
  Difference                                   30.63 (28.93)  48 (?)

- Did treatment increase survival time?
- A good estimator of the standard deviation of the mean x̄ = (1/n) Σ_{i=1}^n x_i is the standard error ŝ = √( 1/(n(n−1)) Σ_{i=1}^n (x_i − x̄)² ).
- What estimator should be used for the SD of the median? What estimator for the SD of other statistics?

Bootstrap Principle
- Test statistic T(x); interested in SD(T(X)).
- Resampling with replacement from x_1, ..., x_n gives a bootstrap sample x* = (x*_1, ..., x*_n) and a bootstrap replicate T(x*).
- Get B independent bootstrap replicates T(x*^1), ..., T(x*^B).
- Estimate SD(T(X)) by the empirical standard deviation of T(x*^1), ..., T(x*^B).
[Diagram: the dataset x = (x_1, ..., x_n) is resampled into bootstrap samples x*^1, ..., x*^B, each of which yields a bootstrap replication T(x*^1), ..., T(x*^B).]
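A minimal sketch of this principle in Python, estimating the SD of the sample median from the treatment-group survival times on the mouse-data slide. B = 10000 matches the later slide; the random seed is an arbitrary choice.

```python
# Bootstrap estimate of SD(median) for the mouse treatment group.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([94, 197, 16, 38, 99, 141, 23])  # treatment-group survival times
B = 10_000

# B bootstrap replicates of the median: resample n values with replacement
reps = np.array([np.median(rng.choice(x, size=x.size, replace=True))
                 for _ in range(B)])

# Bootstrap estimate of SD(median) = empirical SD of the replicates
print(reps.std(ddof=1))
```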

Back to the Mouse Example
- B = 10000
- Bootstrap SD estimates:

              Mean    bootstrap SD    Median   bootstrap SD
  Treatment   86.86   23.23           94       37.88
  Control     56.22   13.27           46       13.02
  Difference  30.63   26.75           48       40.06

Illustration
[Diagram: in the real world, an unknown probability model P yields the observed random sample x = (x_1, ..., x_n), from which the statistic of interest T(x) is computed; in the bootstrap world, the estimated probability model P̂ yields a bootstrap sample x* = (x*_1, ..., x*_n), from which the bootstrap replication T(x*) is computed.]

Sources of Variability
- Sampling variability (we only have a sample of size n).
- Bootstrap resampling variability (only B bootstrap samples).
[Diagram: the unknown probability measure P produces the sample x (sampling variability); the sample in turn produces the bootstrap samples x*^1, ..., x*^B with replicates T(x*^1), ..., T(x*^B) (bootstrap sampling variability).]

Parametric Bootstrap
- Suppose we have a parametric model P_θ, θ ∈ Θ ⊂ R^d.
- θ̂ is an estimator of θ.
- Resample from the estimated model P_θ̂.

Example: Problems with the (Nonparametric) Bootstrap
- X_1, ..., X_50 ~ U(0, θ) iid, θ > 0.
- MLE: θ̂ = max(X_1, ..., X_50) = 0.989.
- Nonparametric bootstrap: X*_1, ..., X*_50 sampled independently from X_1, ..., X_50 with replacement.
- Parametric bootstrap: X*_1, ..., X*_50 ~ U(0, θ̂).
[Figure: empirical CDFs of θ̂* = max(X*_1, ..., X*_50) under the nonparametric and the parametric bootstrap, plotted over the range 0.80 to 1.00.]
- In the nonparametric bootstrap there is a large probability mass at θ̂: in fact, P*(θ̂* = θ̂) = 1 − (1 − 1/n)^n → 1 − e^{−1} ≈ 0.632.
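A sketch reproducing this failure numerically. The true θ = 1, n = 50 and B = 10000 are assumed values matching the slide's setup; the seed is arbitrary.

```python
# Nonparametric vs. parametric bootstrap for the MLE max(X_1,...,X_n), U(0, theta).
import numpy as np

rng = np.random.default_rng(1)
n, B = 50, 10_000
x = rng.uniform(0, 1, size=n)     # true theta = 1
theta_hat = x.max()

# Nonparametric bootstrap: resample the data with replacement
np_reps = np.array([rng.choice(x, size=n, replace=True).max() for _ in range(B)])
# Parametric bootstrap: sample from the fitted model U(0, theta_hat)
p_reps = rng.uniform(0, theta_hat, size=(B, n)).max(axis=1)

# Nonparametric replicates put mass 1 - (1 - 1/n)^n at theta_hat itself
print((np_reps == theta_hat).mean(), 1 - (1 - 1 / n) ** n)
```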

Outline
- Introduction
- Confidence Intervals (Three Types of Confidence Intervals; Example: Exponential Distribution)
- Hypothesis Tests
- Asymptotic Properties
- Higher Order Theory
- Iterated Bootstrap
- Dependent Data
- Further Topics

Plug-in Principle I
- Many quantities of interest can be written as a functional T of the underlying probability measure P; e.g. the mean can be written as T(P) = ∫ x dP(x).
- Suppose we have iid observations X_1, ..., X_n from P. Based on this we get an estimated distribution P̂ (the empirical distribution, or a parametric distribution with estimated parameter).
- We can use T(P̂) as an estimator of T(P). For the mean and the empirical distribution P̂ of the observations X_i this is just the sample mean: T(P̂) = ∫ x dP̂(x) = (1/n) Σ_{i=1}^n X_i.

Plug-in Principle II
- To determine the variance of the estimator T(P̂), compute confidence intervals for T(P), or conduct tests, we need the distribution of T(P̂) − T(P).
- Bootstrap sample: sample X*_1, ..., X*_n from P̂; this gives a new estimated distribution P*.
- Main idea: approximate the distribution of T(P̂) − T(P) by the distribution of T(P*) − T(P̂) (conditional on the observed P̂).

Bootstrap Interval
- The quantity of interest is T(P).
- To construct a one-sided 1 − α CI we would need c such that P(T(P̂) − T(P) ≥ c) = 1 − α. Then a 1 − α CI would be (−∞, T(P̂) − c]. Of course, P and thus c are unknown.
- Instead of c, use c* given by P̂(T(P*) − T(P̂) ≥ c*) = 1 − α. This gives the (approximate) confidence interval (−∞, T(P̂) − c*].
- Similarly for two-sided confidence intervals.

Studentized Bootstrap Interval
- Improve the coverage probability by studentizing the estimate.
- Quantity of interest T(P); measure of standard deviation σ(P).
- Base the confidence interval on (T(P̂) − T(P)) / σ(P̂).
- Use quantiles from (T(P*) − T(P̂)) / σ(P*).

Efron's Percentile Method
- Use quantiles from T(P*) directly.
- Has less theoretical backing.
- Agrees with the simple bootstrap interval for symmetric resampling distributions, but does not work well for skewed distributions.
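A minimal sketch computing all three interval types above for the mean, with nonparametric resampling. The data, B, α and the plug-in standard deviation σ(P̂) = s/√n are assumptions for illustration, not from the slides.

```python
# Basic, studentized and percentile bootstrap CIs for the mean.
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=40)   # example data
n, B, alpha = x.size, 10_000, 0.05

t_hat = x.mean()
sigma_hat = x.std(ddof=1) / np.sqrt(n)    # plug-in SD of the mean

t_star, z_star = np.empty(B), np.empty(B)
for b in range(B):
    xb = rng.choice(x, size=n, replace=True)
    t_star[b] = xb.mean()
    z_star[b] = (xb.mean() - t_hat) / (xb.std(ddof=1) / np.sqrt(n))

# Basic bootstrap: quantiles of T(P*) - T(P_hat)
d = np.quantile(t_star - t_hat, [alpha / 2, 1 - alpha / 2])
basic = (t_hat - d[1], t_hat - d[0])
# Studentized bootstrap: quantiles of (T(P*) - T(P_hat)) / sigma(P*)
q = np.quantile(z_star, [alpha / 2, 1 - alpha / 2])
student = (t_hat - q[1] * sigma_hat, t_hat - q[0] * sigma_hat)
# Efron's percentile method: quantiles of T(P*) directly
perc = tuple(np.quantile(t_star, [alpha / 2, 1 - alpha / 2]))
print(basic, student, perc)
```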

Example - CI for Mean of Exponential Distribution I
- X_1, ..., X_n ~ Exp(θ) iid.
- Confidence interval for E X_1 = 1/θ.
- Nominal level 0.95; one-sided confidence intervals. Coverage probabilities:

  n                              10     20     40     80     160    320
  Normal Approximation           0.845  0.883  0.904  0.919  0.928  0.934
  Bootstrap                      0.817  0.858  0.892  0.922  0.917  0.94
  Bootstrap - Percentile Method  0.848  0.876  0.906  0.92   0.932  0.94
  Bootstrap - Studentized        0.902  0.922  0.942  0.949  0.946  0.944

- 100000 replications for the normal CI; bootstrap CIs based on 2000 replications with 500 bootstrap samples each.
- Substantial coverage error for small n.
- The coverage error decreases as n increases.
- The studentized bootstrap seems to be doing best.

Example - CI for Mean of Exponential Distribution II
- Two-sided confidence intervals. Coverage probabilities:

  n                              10     20     40     80     160    320
  Normal Approximation           0.876  0.914  0.93   0.947  0.949  0.95
  Bootstrap                      0.828  0.89   0.906  0.928  0.936  0.942
  Bootstrap - Percentile Method  0.854  0.896  0.921  0.926  0.923  0.93
  Bootstrap - Studentized        0.944  0.943  0.936  0.936  0.954  0.946

- Number of replications as before.
- Smaller coverage error than for the one-sided intervals.
- Again, the studentized bootstrap seems to be doing best.

Outline
- Introduction
- Confidence Intervals
- Hypothesis Tests (General Idea; Example; Choice of the Number of Resamples; Sequential Approaches)
- Asymptotic Properties
- Higher Order Theory
- Iterated Bootstrap
- Dependent Data
- Further Topics

Hypothesis Testing through Bootstrapping
- Setup: H0: θ ∈ Θ_0 vs. H1: θ ∉ Θ_0.
- Observed sample: x.
- Suppose we have a test with a test statistic T = T(X) that rejects for large values.
- p-value, in general: p = sup_{θ ∈ Θ_0} P_θ(T(X) ≥ T(x)).
- If we know that only θ_0 might be true: p = P_{θ_0}(T(X) ≥ T(x)).
- Using the sample, find an estimator P̂_0 of the distribution of X under H0.
- Generate iid X*^1, ..., X*^B from P̂_0.
- Approximate the p-value via p̂ = (1/B) Σ_{i=1}^B I(T(X*^i) ≥ T(x)).
- To improve finite-sample performance, it has been suggested to use p̂ = (1 + Σ_{i=1}^B I(T(X*^i) ≥ T(x))) / (B + 1).

Example - Two Sample Problem - Mouse Data
- Two samples: treatment y and control z, with cdfs F and G.
- H0: F = G vs. H1: G <_st F (G stochastically smaller than F).
- T(x) = T(y, z) = ȳ − z̄; reject for large values.
- Pooled sample: x = (y, z). Bootstrap sample x* = (y*, z*): sample from x with replacement.
- p-value: generate independent bootstrap samples x*^1, ..., x*^B and set p̂ = (1/B) Σ_{i=1}^B I{T(x*^i) ≥ T(x)}.
- Mouse data: t_obs = 30.63, B = 2000, p̂ = 0.134.
[Figure: empirical CDF of T(x*^1), ..., T(x*^B); the observed value T(x) sits at height 1 − p̂.]
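A sketch of this pooled two-sample bootstrap test in Python, using the mouse data from the earlier slide and the "+1" p-value correction from the previous slide. B = 2000 matches the slide; the seed is arbitrary, so p̂ will only be near the slide's 0.134.

```python
# Pooled two-sample bootstrap test: under H0 both groups come from the
# same distribution, so resample both from the pooled sample.
import numpy as np

rng = np.random.default_rng(3)
y = np.array([94, 197, 16, 38, 99, 141, 23])           # treatment
z = np.array([52, 104, 146, 10, 51, 30, 40, 27, 46])   # control
pooled = np.concatenate([y, z])
t_obs = y.mean() - z.mean()                            # approx. 30.63

B = 2_000
t_star = np.empty(B)
for b in range(B):
    xb = rng.choice(pooled, size=pooled.size, replace=True)
    t_star[b] = xb[:y.size].mean() - xb[y.size:].mean()

p_hat = (1 + np.sum(t_star >= t_obs)) / (B + 1)
print(t_obs, p_hat)
```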

How to Choose the Number of Resamples (i.e. B)? I
- (Davison & Hinkley, 1997, Section 4.25)
- Not using the ideal bootstrap (based on an infinite number of resamples) leads to a loss of power!
- Indeed, if π(u) is the power against a fixed alternative for a test of level u, then the power π_B(α) of a test based on B bootstrap resamples is
  π_B(α) = ∫_0^1 π(u) f_{(B+1)α, (B+1)(1−α)}(u) du,
  where f_{(B+1)α, (B+1)(1−α)} is the Beta density with parameters (B+1)α and (B+1)(1−α).

How to Choose the Number of Resamples (i.e. B)? II
- If one assumes that π_B(u) is concave, then one can obtain the approximate bound
  π_B(α) ≥ [1 − √( (1 − α) / (2π(B+1)α) )] π(α).
- A table of those bounds (the bracketed factor):

  B         19    39    99    199   499   999   9999
  α = 0.01  0.11  0.37  0.6   0.72  0.82  0.87  0.96
  α = 0.05  0.61  0.73  0.83  0.88  0.92  0.95  0.98

  (these bounds may be conservative)
- To be safe: use at least B = 999 for α = 0.05, and an even higher B for smaller α.
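A few lines reproducing the bracketed factor in this table directly from the bound's formula, which makes the table easy to extend to other (B, α) pairs:

```python
# Lower bound on the power ratio pi_B(alpha) / pi(alpha):
# 1 - sqrt((1 - alpha) / (2 * pi * (B + 1) * alpha)).
import math

for alpha in (0.01, 0.05):
    for B in (19, 39, 99, 199, 499, 999, 9999):
        factor = 1 - math.sqrt((1 - alpha) / (2 * math.pi * (B + 1) * alpha))
        print(f"alpha={alpha}, B={B}: {factor:.2f}")
```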

Sequential Approaches
- General idea: instead of a fixed number of resamples B, allow the number of resamples to be random.
- For example, one can stop sampling once the test decision is (almost) clear.
- Potential advantages:
  - Save computer time.
  - Get a decision with a bounded resampling error.
  - May avoid loss of power.

Saving Computational Time
- It is not necessary to estimate high values of the p-value p precisely.
- Stop if S_n = Σ_{i=1}^n I(T(X*^i) ≥ T(x)) is "large".
- Besag & Clifford (1991): stop after τ = min{n : S_n = h} ∧ m steps.
[Figure: sample paths of S_n against n, stopped either at the level h or after m steps.]
- Estimator: p̂ = h/τ if S_τ = h, and p̂ = (S_τ + 1)/(m + 1) otherwise.
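A minimal sketch of this sequential estimator. The `draw_stat` hook, which generates one bootstrap statistic T(X*), is a placeholder assumption, as are the default values of h and m:

```python
# Besag & Clifford (1991) sequential p-value estimator:
# stop once h exceedances are seen, or after m resamples.
def sequential_p(draw_stat, t_obs, h=20, m=10_000):
    s = 0
    for n in range(1, m + 1):
        s += draw_stat() >= t_obs   # I(T(X*_n) >= T(x))
        if s == h:
            return h / n            # stopped early: p_hat = h / tau
    return (s + 1) / (m + 1)        # ran to the end: p_hat = (S_tau + 1) / (m + 1)
```

Small p-values stop only at the cap m, while large p-values hit h quickly, which is where the computational savings come from.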

Uniform Bound on the Resampling Risk
- The boundaries below are constructed to give a uniform bound on the resampling risk, i.e. for some (small) ε > 0: sup_p P_p(wrong decision) ≤ ε.
[Figure: upper and lower stopping boundaries U_n and L_n plotted against n, for n up to 1000.]
- For details, see Gandy (2009).

Other Issues
- How to compute the power/level (rejection probability) of bootstrap tests? See Gandy & Rubin-Delanchy (2013) and references therein.
- How to use bootstrap tests in multiple testing corrections (e.g. FDR)? See Gandy & Hahn (2012) and references therein.

Outline
- Introduction
- Confidence Intervals
- Hypothesis Tests
- Asymptotic Properties (Main Idea; Asymptotic Properties of the Mean)
- Higher Order Theory
- Iterated Bootstrap
- Dependent Data
- Further Topics

Main Idea
- Asymptotic theory does not take the resampling error into account; it assumes the "ideal" bootstrap with an infinite number of replications.
- Observations X_1, X_2, ...
- Often: √n (T(P̂) − T(P)) →d F for some distribution F.
- Main asymptotic justification of the bootstrap: conditionally on the observed X_1, X_2, ..., also √n (T(P*) − T(P̂)) →d F.

Conditional Central Limit Theorem for the Mean
- Let X_1, X_2, ... be iid random vectors with mean µ and covariance matrix Σ.
- For every n, let X̄*_n = (1/n) Σ_{i=1}^n X*_i, where the X*_i are sampled from X_1, ..., X_n with replacement.
- Then, conditionally on X_1, X_2, ..., for almost every sequence X_1, X_2, ...:
  √n (X̄*_n − X̄_n) →d N(0, Σ) (n → ∞).
- Proof: the mean and covariance of X̄*_n are easy to compute in terms of X_1, ..., X_n; use the central limit theorem for triangular arrays (Lindeberg central limit theorem).
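A quick numerical check of this statement (univariate case, so Σ = Var(X_1)). The sample size, B and the exponential data-generating law are arbitrary choices:

```python
# The bootstrap distribution of sqrt(n) * (mean(X*) - mean(X)) should have
# variance close to the sample variance, matching the N(0, Sigma) limit.
import numpy as np

rng = np.random.default_rng(4)
n, B = 200, 10_000
x = rng.exponential(size=n)   # any iid sample with finite variance

boot = np.array([np.sqrt(n) * (rng.choice(x, size=n, replace=True).mean() - x.mean())
                 for _ in range(B)])
print(boot.var(ddof=1), x.var(ddof=1))  # both approximate Var(X_1)
```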

Delta Method
- Can be used to derive convergence results for derived statistics, in our case functions of the sample mean.
- Delta method: if φ is continuously differentiable, √n (θ̂_n − θ) →d T and √n (θ̂*_n − θ̂_n) →d T conditionally, then
  √n (φ(θ̂_n) − φ(θ)) →d φ′(θ) T and √n (φ(θ̂*_n) − φ(θ̂_n)) →d φ′(θ) T conditionally.
- Example: suppose θ = (E(X_i), E(X_i²)) and θ̂_n = ((1/n) Σ_{i=1}^n X_i, (1/n) Σ_{i=1}^n X_i²). The convergence of √n (θ̂_n − θ) can be established via the CLT. Using φ(µ, η) = η − µ² gives a limiting result for estimates of the variance.

Bootstrap and Empirical Process Theory
- Flexible and elegant theory based on expectations with respect to the empirical distribution P_n = (1/n) Σ_{i=1}^n δ_{X_i} (many test statistics can be constructed from this).
- Gives uniform CLTs/LLNs: Donsker theorems/Glivenko-Cantelli theorems.
- Can be used to derive asymptotic results for the bootstrap (e.g. for bootstrapping the sample median); use the bootstrap empirical distribution P*_n = (1/n) Σ_{i=1}^n δ_{X*_i}.
- For details see van der Vaart (1998, Section 23.1) and van der Vaart & Wellner (1996, Section 3.6).

Outline
- Introduction
- Confidence Intervals
- Hypothesis Tests
- Asymptotic Properties
- Higher Order Theory (Edgeworth Expansion; Higher Order of Convergence of the Bootstrap)
- Iterated Bootstrap
- Dependent Data
- Further Topics

Introduction
- It can be shown that the bootstrap has a faster convergence rate than simple normal approximations.
- Main tool: the Edgeworth expansion, a refinement of the central limit theorem.
- Main aim of this section: explain the Edgeworth expansion and then mention briefly how it yields the convergence rates of the bootstrap.
- (Reminder: this still does not take the resampling risk into account, i.e. we still assume B = ∞.)
- For details see Hall (1992).

Edgeworth Expansion
- θ_0 unknown parameter; θ̂_n estimator based on a sample of size n.
- Often, √n (θ̂_n − θ) →d N(0, σ²) (n → ∞), i.e. for all x,
  P(√n (θ̂_n − θ)/σ ≤ x) → Φ(x) (n → ∞),
  where Φ(x) = ∫_{−∞}^x φ(t) dt and φ(t) = (1/√(2π)) e^{−t²/2}.
- Often one can write this as a power series in n^{−1/2}:
  P(√n (θ̂_n − θ)/σ ≤ x) = Φ(x) + n^{−1/2} p_1(x)φ(x) + ... + n^{−j/2} p_j(x)φ(x) + ...
  This expansion is called the Edgeworth expansion.
- Note: p_j is usually an even/odd function for odd/even j.
- Edgeworth expansions exist in the sense that, for a fixed number of approximating terms, the remainder term is of lower order than the last included term.

Edgeworth Expansion - Arithmetic Mean I
- Suppose we have a sample X_1, ..., X_n, and θ̂_n = (1/n) Σ_{i=1}^n X_i.
- Then
  p_1(x) = −(1/6) κ_3 (x² − 1),
  p_2(x) = −x [ (1/24) κ_4 (x² − 3) + (1/72) κ_3² (x⁴ − 10x² + 15) ],
  where the κ_j are the cumulants of X; in particular
  κ_3 = E(X − E X)³ is the skewness, and κ_4 = E(X − E X)⁴ − 3 (Var X)² is the kurtosis.
- (In general, the jth cumulant κ_j of X is the coefficient of (1/j!)(it)^j in a power series expansion of the logarithm of the characteristic function of X.)

Edgeworth Expansion - Arithmetic Mean II
- The Edgeworth expansion exists if the following conditions are satisfied:
  - Cramér's condition: limsup_{|t| → ∞} |E exp(itX)| < 1 (satisfied if the observations are not discrete, i.e. possess a density with respect to Lebesgue measure).
  - A sufficient number of moments of the observations must exist.

Edgeworth Expansion - Arithmetic Mean - Example
- X_i ~ Exp(1) iid, θ̂ = (1/n) Σ_{i=1}^n X_i.
[Figure: for n = 1, 2, 4, 8, 16, 32, the true CDF of the standardized mean is compared with Φ(x), with the one-term expansion Φ(x) + n^{−1/2} p_1(x)φ(x), and with the two-term expansion Φ(x) + n^{−1/2} p_1(x)φ(x) + n^{−1} p_2(x)φ(x); the expansion terms visibly improve the normal approximation for small n.]
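A sketch reproducing one point of this comparison, assuming SciPy is available. For Exp(1) we have σ = 1, κ_3 = 2 and κ_4 = 6, and the exact CDF of the standardized mean follows from the Gamma(n, 1) law of the sum; n = 8 and x = 1.0 are arbitrary choices.

```python
# Edgeworth approximation vs. exact CDF for the mean of n iid Exp(1) variables.
import numpy as np
from scipy import stats

def edgeworth(x, n, k3=2.0, k4=6.0):
    p1 = -k3 * (x**2 - 1) / 6
    p2 = -x * (k4 * (x**2 - 3) / 24 + k3**2 * (x**4 - 10 * x**2 + 15) / 72)
    phi = stats.norm.pdf(x)
    return (stats.norm.cdf(x),                                         # CLT only
            stats.norm.cdf(x) + p1 * phi / np.sqrt(n),                 # one term
            stats.norm.cdf(x) + p1 * phi / np.sqrt(n) + p2 * phi / n)  # two terms

n, x = 8, 1.0
# P(sqrt(n) * (mean - 1) <= x) = P(sum <= n + sqrt(n) * x), sum ~ Gamma(n, 1)
exact = stats.gamma.cdf(n + np.sqrt(n) * x, a=n)
print(exact, edgeworth(x, n))
```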

Coverage Prob. of CIs based on Asymptotic Normality I
- Suppose we construct a confidence interval based on the standard normal approximation to S_n = √n (θ̂_n − θ_0)/σ, where σ² is the asymptotic variance of √n θ̂_n.
- One-sided nominal α-level confidence interval: I_1 = (−∞, θ̂ + n^{−1/2} σ z_α), where z_α is defined by Φ(z_α) = α. Then
  P(θ_0 ∈ I_1) = P(θ_0 < θ̂ + n^{−1/2} σ z_α) = P(S_n > −z_α)
  = 1 − (Φ(−z_α) + n^{−1/2} p_1(−z_α)φ(−z_α) + O(n^{−1}))
  = α − n^{−1/2} p_1(z_α)φ(z_α) + O(n^{−1})
  = α + O(n^{−1/2}).

Coverage Prob. of CIs based on Asymptotic Normality II
- Two-sided nominal α-level confidence interval: I_2 = (θ̂ − n^{−1/2} σ x_α, θ̂ + n^{−1/2} σ x_α), where x_α = z_{(1+α)/2}. Then
  P(θ_0 ∈ I_2) = P(S_n ≤ x_α) − P(S_n ≤ −x_α)
  = Φ(x_α) − Φ(−x_α) + n^{−1/2} [p_1(x_α)φ(x_α) − p_1(−x_α)φ(−x_α)]
    + n^{−1} [p_2(x_α)φ(x_α) − p_2(−x_α)φ(−x_α)]
    + n^{−3/2} [p_3(x_α)φ(x_α) − p_3(−x_α)φ(−x_α)] + O(n^{−2})
  = α + 2 n^{−1} p_2(x_α)φ(x_α) + O(n^{−2})
  = α + O(n^{−1}).
- To summarise: the coverage error is O(n^{−1/2}) for one-sided CIs and O(n^{−1}) for two-sided CIs.

Higher Order Convergence of the Bootstrap I
- We consider the studentized bootstrap first.
- Consider the following Edgeworth expansion of (θ̂_n − θ)/σ̂_n:
  P((θ̂_n − θ)/σ̂_n ≤ x) = Φ(x) + n^{−1/2} p_1(x)φ(x) + O(n^{−1}).
- The Edgeworth expansion usually remains valid in a conditional sense, i.e.
  P̂((θ̂*_n − θ̂_n)/σ̂*_n ≤ x) = Φ(x) + n^{−1/2} p̂_1(x)φ(x) + ... + n^{−j/2} p̂_j(x)φ(x) + ...
- Use the first expansion term only, i.e. (continued on the next slide):

Higher Order Convergence of the Bootstrap II
- P̂((θ̂*_n − θ̂_n)/σ̂*_n ≤ x) = Φ(x) + n^{−1/2} p̂_1(x)φ(x) + O(n^{−1}).
- Usually p̂_1(x) − p_1(x) = O(1/√n).
- Then P((θ̂_n − θ)/σ̂_n ≤ x) − P̂((θ̂*_n − θ̂_n)/σ̂*_n ≤ x) = O(1/n).
- Thus the studentized bootstrap results in a better rate of convergence than the normal approximation (which is O(1/√n) only).
- For a non-studentized bootstrap the rate of convergence is only O(1/√n).

Higher Order Convergence of the Bootstrap III
- This translates into improvements in the coverage probability of (one-sided) confidence intervals. The precise derivations also involve the so-called Cornish-Fisher expansions, expansions of quantile functions analogous to the Edgeworth expansion (which concerns distribution functions).

Outline
- Introduction
- Confidence Intervals
- Hypothesis Tests
- Asymptotic Properties
- Higher Order Theory
- Iterated Bootstrap (Introduction; Hypothesis Tests)
- Dependent Data
- Further Topics

Introduction
- Iterate the bootstrap to improve the statistical performance of bootstrap tests, confidence intervals, etc.
- If chosen correctly, the iterated bootstrap can have a higher rate of convergence than the non-iterated bootstrap.
- Can be computationally intensive.
- Some references: Davison & Hinkley (1997, Section 3.9), Hall (1992, Sections 1.4, 3.11).

Double Bootstrap Test (based on Davison & Hinkley, 1997, Section 4.5)
- Ideally, the p-value under the null distribution should be a realisation of U(0, 1).
- However, computing p-values via the bootstrap does not guarantee this (measures such as studentizing the test statistic may help, but there is no guarantee).
- Idea: use an iterated version of the bootstrap to correct the p-value.
- Observed data gives the fitted model P̂; let p be the p-value based on P̂.
- Let p* be the random variable obtained by recomputing the p-value on a resample from P̂.
- p_adj = P*(p* ≤ p | P̂).

Implementation of a Double Bootstrap Test
Suppose we have a test that rejects for large values of a test statistic. Algorithm:
- For r = 1, ..., R:
  - Generate X*_1, ..., X*_n from the fitted null distribution P̂ and calculate the test statistic t*_r from it.
  - Fit the null distribution to X*_1, ..., X*_n, obtaining P̂*_r.
  - For m = 1, ..., M: generate X**_1, ..., X**_n from P̂*_r and calculate the test statistic t**_rm from them.
  - Let p*_r = (1 + #{t**_rm ≥ t*_r}) / (M + 1).
- Let p_adj = (1 + #{p*_r ≤ p}) / (R + 1).
Effort: MR simulations. M can be chosen smaller than R, e.g. M = 99 or M = 249.
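A sketch of this algorithm. The hooks `fit_null` (returning a fitted null model with a `.sample(n, rng)` method) and `stat` are placeholder assumptions; the plain bootstrap p-value p is computed from the same outer-level statistics t*_r:

```python
# Double-bootstrap p-value adjustment for a test rejecting for large statistics.
import numpy as np

def double_bootstrap_p(data, fit_null, stat, R=999, M=249, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    n, t_obs = len(data), stat(data)
    model = fit_null(data)

    t_star, p_star = np.empty(R), np.empty(R)
    for r in range(R):
        xs = model.sample(n, rng)                 # X* from the fitted null
        t_star[r] = stat(xs)
        inner = fit_null(xs)                      # refit the null to X*
        t2 = np.array([stat(inner.sample(n, rng)) for _ in range(M)])
        p_star[r] = (1 + np.sum(t2 >= t_star[r])) / (M + 1)

    p = (1 + np.sum(t_star >= t_obs)) / (R + 1)   # ordinary bootstrap p-value
    return (1 + np.sum(p_star <= p)) / (R + 1)    # adjusted p-value
```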

Outline
- Introduction
- Confidence Intervals
- Hypothesis Tests
- Asymptotic Properties
- Higher Order Theory
- Iterated Bootstrap
- Dependent Data (Introduction; Block Bootstrap Schemes; Remarks)
- Further Topics

Dependent Data
- Often observations are not independent.
- Example: time series.
- The bootstrap needs to be adjusted.
- Main source for this chapter: Lahiri (2003).

Dependent Data - Example I
- (Lahiri, 2003, Example 1.1, p. 7)
- X_1, ..., X_n generated by a stationary ARMA(1,1) process:
  X_i = β X_{i−1} + ε_i + α ε_{i−1},
  where |α| < 1, |β| < 1, and (ε_i) is white noise, i.e. E ε_i = 0, Var ε_i = 1.
[Figure: a realisation of length n = 256 with α = 0.2, β = 0.3, ε_i ~ N(0, 1).]

Dependent Data - Example II
- Interested in the variance of X̄_n = (1/n) Σ_{i=1}^n X_i.
- Use the Nonoverlapping Block Bootstrap (NBB) with blocks of length l:
  B_1 = (X_1, ..., X_l), B_2 = (X_{l+1}, ..., X_{2l}), ..., B_{n/l} = (X_{n−l+1}, ..., X_n).
- Resample blocks B*_1, ..., B*_{n/l} with replacement; concatenate to get the bootstrap sample (X*_1, ..., X*_n).
- Bootstrap estimator of the variance: Var*((1/n) Σ_{i=1}^n X*_i) (can be computed explicitly in this case; no resampling necessary).
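A minimal sketch of the NBB variance estimator by Monte Carlo (the slide notes it can also be computed explicitly). The block length l and B are tuning choices, the ARMA parameters follow the previous slide, and for simplicity n is assumed to be a multiple of l:

```python
# Nonoverlapping block bootstrap estimate of Var(mean(X_1..X_n)).
import numpy as np

def nbb_var_of_mean(x, l, B=5_000, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    n = len(x)
    blocks = x[: (n // l) * l].reshape(-1, l)   # B_1, ..., B_{n/l}
    k = blocks.shape[0]
    means = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, k, size=k)        # draw k blocks with replacement
        means[b] = blocks[idx].mean()           # mean of the concatenated sample
    return means.var(ddof=1)

# Example series mimicking the slides' ARMA(1,1) with alpha=0.2, beta=0.3
rng = np.random.default_rng(5)
eps = rng.normal(size=257)
x, prev = np.empty(256), 0.0
for i in range(256):
    x[i] = 0.3 * prev + eps[i + 1] + 0.2 * eps[i]
    prev = x[i]
print(nbb_var_of_mean(x, l=8))
```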

Dependent Data - Example III
- Results for the above sample; true variance Var(X̄_n) = 0.0114 (based on 20000 simulations):

  l            1       2       4       8       16      32      64
  Var^(X̄_n)   0.0049  0.0063  0.0075  0.0088  0.0092  0.0013  0.0016

- Bias, standard deviation and MSE based on 1000 simulations:

  l      1        2        4        8        16       32       64
  bias   -0.0065  -0.0043  -0.0025  -0.0016  -0.0013  -0.0017  -0.0031
  sd     5e-04    0.001    0.0016   0.0024   0.0035   0.0052   0.0069
  MSE    0.0066   0.0044   0.003    0.0029   0.0038   0.0055   0.0076

- Note:
  - Block size 1 is the classical iid bootstrap.
  - The variance increases with the block size.
  - The bias decreases with the block size.
  - Bias-variance trade-off.

Moving Block Bootstrap (MBB)
- X_1, ..., X_n observations (realisations of a stationary process).
- l block length.
- B_i = (X_i, ..., X_{i+l−1}) is the block starting at X_i.
- To get a bootstrap sample:
  - Draw B*_1, ..., B*_k with replacement from B_1, ..., B_{n−l+1}.
  - Concatenate the blocks B*_1, ..., B*_k to give the bootstrap sample X*_1, ..., X*_{kl}.
- l = 1 corresponds to the classical iid bootstrap.

Nonoverlapping Block Bootstrap (NBB)
- Blocks in the MBB may overlap.
- X_1, ..., X_n observations (realisations of a stationary process).
- l block length.
- b = ⌊n/l⌋ blocks: B_i = (X_{il+1}, ..., X_{il+l}), i = 0, ..., b − 1.
- To get a bootstrap sample: draw with replacement from these blocks and concatenate the resulting blocks.
- Note: fewer blocks than in the MBB.

Other Types of Block Bootstraps
- Generalised block bootstrap:
  - Periodic extension of the data to avoid boundary effects: reuse the sample to form an infinite sequence (Y_k): X_1, ..., X_n, X_1, ..., X_n, X_1, ...
  - A block B(S, J) is described by its start S and its length J.
  - The bootstrap sample is chosen according to some probability measure on the sequences (S_1, J_1), (S_2, J_2), ...
- Circular block bootstrap (CBB): sample with replacement from {B(1, l), ..., B(n, l)}; every observation receives equal weight.
- Stationary block bootstrap (SB): S ~ Uniform(1, ..., n), J ~ Geometric(p) for some p; blocks are no longer of equal size.

Dependent Data - Remarks
- MBB and CBB outperform NBB and SB (see Lahiri, 2003, Chapter 5).
- Dependence in time series is a relatively simple example of dependent data.
- Further examples are spatial data and spatio-temporal data; there, boundary effects can be far more difficult to handle.

Outline
- Introduction
- Confidence Intervals
- Hypothesis Tests
- Asymptotic Properties
- Higher Order Theory
- Iterated Bootstrap
- Dependent Data
- Further Topics (Bagging; Boosting; Some Pointers to the Literature)

Bagging I
- Acronym for bootstrap aggregation.
- Data d = {(x^(j), y^(j)), j = 1, ..., n}; response y, predictor variables x ∈ R^p.
- Suppose we have a basic predictor m_0(x | d).
- Form R resampled data sets d*_1, ..., d*_R.
- Empirical bagged predictor: m̂_B(x | d) = (1/R) Σ_{r=1}^R m_0(x | d*_r).
  This is an approximation to m_B(x | d) = E*{m_0(x | D*)}, where D* is a resample from d.
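A minimal sketch of the empirical bagged predictor. The hook `m0` is a placeholder assumption: any function (X_train, y_train) -> fitted model with a `.predict(X)` method (e.g. a regression tree); R defaults to an arbitrary 200.

```python
# Bagging: average the basic predictor m0 over R resampled datasets.
import numpy as np

def bagged_predict(m0, X, y, X_new, R=200, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    n = len(y)
    preds = np.zeros((R, len(X_new)))
    for r in range(R):
        idx = rng.integers(0, n, size=n)          # resampled dataset d*_r
        preds[r] = m0(X[idx], y[idx]).predict(X_new)
    return preds.mean(axis=0)                     # (1/R) * sum_r m0(x | d*_r)
```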

Bagging II
- Example: linear regression with screening of predictors (hard thresholding),
  m_0(x | d) = Σ_{i=1}^p β̂_i I(|β̂_i| > c_i) x_i.
  The corresponding bagged estimator,
  m_B(x | d) = Σ_{i=1}^p E*(β̂*_i I(|β̂*_i| > c_i) | D*) x_i,
  corresponds to soft thresholding.
- Bagging can i…
