Bootstrap For Complex Survey Data


Survey bootstrap

Stas Kolenikov
Department of Statistics, University of Missouri-Columbia

JSM 2009, Washington, DC

Educational objectives

Upon completion of this course, you will
- become familiar with the main variance estimation methods for complex survey data, their strengths and weaknesses
- be able to identify appropriate variance estimation methods depending on the sample design, complexity of the problem, and confidentiality protection
- know how to utilize existing bootstrap weights
- know how to create bootstrap weights in Stata and R
- know how to choose the parameters of the bootstrap

Outline

1. Bootstrap for i.i.d. data
2. Variance estimation for complex surveys
3. Survey bootstraps
4. Software implementation
5. References

The bootstrap for i.i.d. data

1. Bootstrap principle
2. Bootstrap bias and variance estimates
3. Bootstrap confidence intervals
4. More bootstrap theory
5. Some extensions

Bootstrap principle

- Population: distribution F, parameter θ = T(F); both can be multivariate
- Sample: data X_1, ..., X_n i.i.d. F, empirical distribution F_n, parameter estimate θ̂_n = T(F_n)
- Inference: need to know the distribution D[θ̂_n], often in the asymptotic form Pr[√n(θ̂_n − θ) ≤ x]
- Bootstrap: use F_n as the distribution to take samples from
- Bootstrap samples: X*_1, ..., X*_n i.i.d. F_n, empirical distribution F*_n, parameter estimate θ̂*_n = T(F*_n)

Schematically:
  population   F    --T-->  θ
  sample       F_n  --T-->  θ̂_n
  bootstrap    F*_n --T-->  θ̂*_n

Aside: what is T?

T does something to a distribution F that results in a number or a vector: θ = T(F).
- T finds the point where F(x) = 1/2: θ is the median of the distribution
- T takes an expected value with respect to F: θ = E[X] = ∫ x F(dx)
- T finds a solution to ∫ (y − θx) F(dx, dy) = 0: θ = E[y]/E[x]

Bootstrap principle

Theoretical/ideal/complete bootstrap: sampling distributions over all n^n possible samples.

  Bias[θ̂_n] = E[θ̂_n − θ]             ≈  E*[θ̂*_n − θ̂_n | X]
  V[θ̂_n]    = E[(θ̂_n − E[θ̂_n])²]     ≈  E*[(θ̂*_n − E*[θ̂*_n])² | X]
  MSE[θ̂_n]  = E[(θ̂_n − θ)²]          ≈  E*[(θ̂*_n − θ̂_n)² | X]
  F_{θ̂_n − θ}(x) = Pr[θ̂_n − θ ≤ x]   ≈  Pr*[θ̂*_n − θ̂_n ≤ x | X]      (1)

Monte Carlo bootstrap

As taking n^n bootstrap samples is not feasible, use Monte Carlo simulation instead:
1. For the r-th bootstrap sample, take a simple random sample with replacement X*_1^{(r)}, ..., X*_n^{(r)} from X_1, ..., X_n.
2. Compute the parameter estimate of interest θ̂*_n^{(r)}.
3. Repeat Steps 1–2 for r = 1, ..., R.
4. Approximate the ideal bootstrap distribution with the distribution of θ̂*_n^{(1)}, ..., θ̂*_n^{(R)}.
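A minimal base-R sketch of Steps 1–4 (not part of the original slides); the data x, the choice of statistic (the median), and R = 1000 are made-up illustrations.

    # Monte Carlo bootstrap for an i.i.d. sample
    set.seed(1)
    x <- rgamma(50, shape = 2)          # hypothetical i.i.d. data
    R <- 1000                           # number of bootstrap replicates
    theta_hat  <- median(x)             # estimate on the original sample
    theta_star <- replicate(R, {
      xs <- sample(x, replace = TRUE)   # Step 1: SRS with replacement from the data
      median(xs)                        # Step 2: recompute the estimate
    })                                  # Steps 3-4: theta_star approximates the bootstrap distribution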

Estimates of bias and variance

Estimate of the bias:
  Bias[θ̂_n] = E[θ̂_n − θ] ≈ E*[θ̂*_n − θ̂_n | X] ≈ (1/R) Σ_{r=1}^R θ̂*_n^{(r)} − θ̂_n

Bias corrected estimate:
  θ̃_n = 2 θ̂_n − (1/R) Σ_{r=1}^R θ̂*_n^{(r)}

Variance estimate:
  V[θ̂_n] = E[(θ̂_n − E[θ̂_n])²] ≈ E*[(θ̂*_n − E*[θ̂*_n])² | X]
         ≈ (1/R) Σ_{r=1}^R ( θ̂*_n^{(r)} − (1/R) Σ_{l=1}^R θ̂*_n^{(l)} )²  =  v_BOOT[θ̂_n]
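The same quantities in base R, a self-contained sketch continuing the illustrative median example:

    # Bootstrap bias, bias-corrected estimate, and variance from R replicates
    set.seed(1)
    x <- rgamma(50, shape = 2)
    theta_hat  <- median(x)
    theta_star <- replicate(1000, median(sample(x, replace = TRUE)))
    bias_boot <- mean(theta_star) - theta_hat              # estimated bias
    theta_bc  <- 2 * theta_hat - mean(theta_star)          # bias-corrected estimate
    v_boot    <- mean((theta_star - mean(theta_star))^2)   # v_BOOT (divides by R, as on the slide)
    sqrt(v_boot)                                           # bootstrap standard error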

Number of samples

How to choose the number of bootstrap samples R?
- Stability of the standard errors:
  cv(s_R) ≈ √[ (κ̂ + 2) / (4R) ]
  where κ̂ is the kurtosis of θ̂*_n
- Confidence interval accuracy: the Monte Carlo error of a confidence bound CB_R with level 1 − α is of order 1/√R, with a coefficient of variation of the form (1/z_α) √[(1/R)(·)] whose ingredients are α(1 − α), φ(0), and φ(z_α)
- Estimation of moments: R = 50–200
- Estimation of quantiles/distribution functions: R ≥ 1000
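A one-line R illustration of the standard-error stability rule; the kurtosis and target cv values below are made up.

    # Solve cv(s_R) ~ sqrt((kappa + 2) / (4R)) for the number of replicates R
    replicates_needed <- function(kappa, cv_target) ceiling((kappa + 2) / (4 * cv_target^2))
    replicates_needed(kappa = 1, cv_target = 0.05)   # 300 replicates for these illustrative inputs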

Percentile confidence intervals

Idea:
  Pr[θ̂_n − θ ≤ x] ≈ Pr*[θ̂*_n − θ̂_n ≤ x | X]

Lower confidence bound of level α:
  K_BOOT^{-1}(α),  where  K_BOOT(x) = Pr*[θ̂*_n ≤ x]
is the (ideal or Monte Carlo) bootstrap distribution of θ̂*_n.
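In base R the percentile bound is just a quantile of the replicates; a sketch with the same illustrative data as above:

    # Percentile confidence bounds from the bootstrap replicates
    set.seed(1)
    x <- rgamma(50, shape = 2)
    theta_star <- replicate(1000, median(sample(x, replace = TRUE)))
    alpha <- 0.05
    quantile(theta_star, probs = alpha)                        # lower bound of level alpha
    quantile(theta_star, probs = c(alpha / 2, 1 - alpha / 2))  # equal-tailed 95% percentile interval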

Normal confidence intervals

Idea:
  θ̂_n ≈ N(θ, σ_n²)  and  θ̂*_n ≈ N(θ̂_n, σ*_n²)

Lower confidence bound of level α:
  θ̂_n + σ*_n Φ^{-1}(α)
where σ*_n² is the variance of the bootstrap distribution.

Bootstrap-t CI

Idea: the pivotal quantity
  t = (θ̂_n − θ)/σ̂_n
has an asymptotic distribution that is the same for all (F, θ).

Lower confidence bound of level α:
  θ̂_n − σ̂_n G_BOOT^{-1}(1 − α)
where
  G_BOOT(x) = Pr*[(θ̂*_n − θ̂_n)/σ̂*_n ≤ x]
is the bootstrap distribution of the above pivot.
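A sketch of the bootstrap-t interval for the mean, where σ̂ has the closed form s/√n; for statistics without such a formula, σ̂* can itself be estimated, e.g. by a nested bootstrap. Data and the number of replicates are made up.

    # Bootstrap-t confidence interval for the mean (illustrative data)
    set.seed(1)
    x <- rgamma(50, shape = 2)
    n <- length(x)
    theta_hat <- mean(x)
    se_hat    <- sd(x) / sqrt(n)
    t_star <- replicate(2000, {
      xs <- sample(x, replace = TRUE)
      (mean(xs) - theta_hat) / (sd(xs) / sqrt(n))   # studentized pivot in the bootstrap world
    })
    alpha <- 0.05
    c(lower = theta_hat - se_hat * quantile(t_star, 1 - alpha / 2),
      upper = theta_hat - se_hat * quantile(t_star, alpha / 2))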

Bias corrected CI

Idea: φ_n(·) is an increasing transformation (e.g., variance stabilizing, skewness reducing); assume
  Pr[φ_n(θ̂_n) − φ_n(θ) + z_0 ≤ x] = Φ(x)

Lower confidence bound of level α:
  K_BOOT^{-1}( Φ( z_α + 2 Φ^{-1}(K_BOOT(θ̂_n)) ) )

Accelerated bias corrected CI

Idea:
  Pr[ (φ_n(θ̂_n) − φ_n(θ)) / (1 + a φ_n(θ)) + z_0 ≤ x ] = Φ(x)
with a tuning parameter a correcting for the skewness of φ_n(θ̂_n).

Lower confidence bound of level α:
  K_BOOT^{-1}( Φ( z_0 + (z_α + z_0)/(1 − a(z_α + z_0)) ) )

The parameter a needs to be computed or estimated, e.g. via the jackknife.

Asymptotic justification of the bootstrap

Let us look at the diagram again:
  population   F    --T-->  θ
  sample       F_n  --T-->  θ̂_n
  bootstrap    F*_n --T-->  θ̂*_n

When would the relation between θ̂*_n and θ̂_n be similar to the one between θ̂_n and θ?

Asymptotic justification of the bootstrap

- The bootstrap can only be successful if F_n is sufficiently close to F for the bootstrap distribution D*[θ̂*_n] to resemble the sampling distribution D[θ̂_n].
- Small deviations of F_n from F must translate to small deviations of D*[θ̂*_n] from D[θ̂_n].
- Taylor series expansion/the delta method for θ = T(F):
  θ̂_n − θ    = T'_F(F_n − F) + o(‖F_n − F‖),
  θ̂*_n − θ̂_n = T'_{F_n}(F*_n − F_n) + o(‖F*_n − F_n‖)
- The functional T must satisfy some smoothness conditions, and its "derivative" should be bounded away from zero.
- F*_n must converge to F_n at the same rate as F_n converges to F.

Bootstrap failures

Sometimes, the simple bootstrap as described above produces a misleading answer.
- Non-i.i.d. data: time series, spatial data, clustered surveys, overdispersed count data (Canty, Davison, Hinkley & Ventura 2006)
- Non-regular problems (Shao & Tu 1995, Sec. 3.6)
- Certain heavy-tailed distributions (Canty, Davison, Hinkley & Ventura 2006)
- Zero derivatives (Andrews 2007): X̄_n² when µ = 0
- Non-smooth functions (Bickel & Freedman 1981): |X̄_n|, sample quantiles/extreme order statistics/min/max
- Different rates of convergence (Canty, Davison, Hinkley & Ventura 2006): sample mode, shrinkage and kernel estimators
- Constrained estimation (Andrews 2000): max(X̄_n, 0) when µ = 0

Bootstrap tests

  H_0: T(F) = θ_0   vs.   H_1: T(F) ≠ θ_0

To compute the p-values of the bootstrap distribution, one needs to sample from a distribution that satisfies H_0. For continuous problems, the data distribution won't satisfy H_0 with probability 1. The data need to be transformed prior to the bootstrap:
- shift?
- scale?
- rotation?
- reweighting?
The non-parametric flavor will likely be lost.

Balanced bootstrap

Motivation: if θ̂_n = T(F_n) = X̄_n, the complete bootstrap gives E*[θ̂*_n] = X̄_n and V*[θ̂*_n] = s²/n. Is it possible to match these moments in the simulated bootstrap?

Equality for the mean:
  (1/R) Σ_r X̄*_n^{(r)} = (1/(nR)) Σ_r Σ_i f_i^{(r)} X_i = X̄_n
where f_i^{(r)} = # times unit i is used in the r-th bootstrap sample.

First order balance (Davison, Hinkley & Schechtman 1986):
  Σ_r f_i^{(r)} = R for all i

Practical implementation: a random permutation of R concatenated copies of {1, ..., n}, cut into R samples of size n (Gleason 1988); a sketch follows.
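A base-R sketch of the permutation device, with made-up data and R: concatenate R copies of the indices 1..n, permute them, and cut into R columns, so each unit appears exactly R times overall.

    # First-order balanced bootstrap (permutation device)
    set.seed(1)
    x <- rgamma(30, shape = 2)
    n <- length(x); R <- 200
    idx <- matrix(sample(rep(seq_len(n), R)), nrow = n, ncol = R)  # column r = r-th bootstrap sample
    theta_star <- apply(idx, 2, function(id) mean(x[id]))
    all(tabulate(idx, nbins = n) == R)   # TRUE: each unit used exactly R times across replicates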

Balanced bootstrap

Equality for the variance:
  (1/(R n²)) Σ_r [ Σ_i (f_i^{(r)} − 1)² X_i² + Σ_{i≠j} (f_i^{(r)} − 1)(f_j^{(r)} − 1) X_i X_j ]
      = ((n − 1)/n³) Σ_i X_i² − (1/n³) Σ_{i≠j} X_i X_j

Second order balance (Graham, Hinkley, John & Shi 1990): for all i ≠ j,
  n Σ_r (f_i^{(r)})² = R(2n − 1),   n Σ_r f_i^{(r)} f_j^{(r)} = R(n − 1)

- Additional restriction: R must be a multiple of n
- Practical implementation: orthogonal arrays and incomplete block designs

Wild bootstrap

Special situation: heteroskedastic regression (Wu 1986) or non-parametric regression (Härdle 1990).
1. Fit the regression model y_i = f̂(x_i) + e_i.
2. Generate the bootstrap residual ε*_i for observation i from a distribution with
   E*[ε*_i] = 0,  E*[ε*_i²] = e_i²,  E*[ε*_i³] = e_i³
   Example: the two-point golden rule distribution,
   ε*_i = e_i (1 − √5)/2 with prob. (5 + √5)/10,  ε*_i = e_i (1 + √5)/2 with prob. (5 − √5)/10
3. Form the bootstrap samples as y*_i = f̂(x_i) + ε*_i.
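A sketch in R with simulated heteroskedastic data; the linear model, sample size, and number of replicates are made-up choices, and the two-point multiplier implements the golden rule distribution from the slide.

    # Wild bootstrap for a heteroskedastic linear regression
    set.seed(1)
    n <- 100
    x <- runif(n); y <- 1 + 2 * x + rnorm(n, sd = 0.5 + x)   # heteroskedastic errors
    fit  <- lm(y ~ x)
    e    <- resid(fit); yhat <- fitted(fit)

    # Two-point "golden rule" multipliers: mean 0, second and third moments equal to 1
    golden <- function(n) {
      p <- (5 + sqrt(5)) / 10
      ifelse(runif(n) < p, (1 - sqrt(5)) / 2, (1 + sqrt(5)) / 2)
    }

    beta_star <- replicate(1000, {
      ys <- yhat + e * golden(n)       # y*_i = f-hat(x_i) + e_i * v_i
      coef(lm(ys ~ x))["x"]            # slope re-estimated on the wild bootstrap sample
    })
    sd(beta_star)                      # bootstrap standard error of the slope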

Review questions

1. Explain how the bootstrap can be used to estimate CV[X̄_n].
2. Suggest a method to compute σ̂*_n for the bootstrap-t confidence interval method.
3. Given that the kurtosis of the bootstrap distribution is 0.5, find the number of replicates needed to make the CV of the bootstrap standard errors equal to 5%.
4. (requires calculus) Assuming all X_i's are distinct, find lim_{n→∞} Pr*[X*_(n) = X_(n)], where X_(n) is the maximum in the data and X*_(n) is the maximum in the bootstrap sample. Hint: find the probability of the complement of this event.

Notes

Variance estimation for complex surveys

1. Features of complex survey data
2. Linearization variance estimation
3. Replication methods: overview
4. Jackknife
5. BRR

Survey settings

- Complex survey designs include stratification, cluster samples, multiple stages of selection, unequal probabilities of selection, non-response and post-stratification adjustments, longitudinal and rotation features.
- Unless utmost precision is required (or sampling fractions are large), it suffices to approximate real designs by two-stage stratified designs with PSUs sampled with replacement.
- Notation:
  L = # strata
  n_h = # units in stratum h
  PSUs are indexed by i
  SSUs are indexed by j
  the generic datum is x_hij

Variance estimation goals

- Reporting and analytic purposes: a survey analyst needs standard errors to include in the report; an applied researcher needs standard errors to test their substantive models.
- Design purposes: a sample designer needs to know population variances to find efficient designs, strata allocations, and small area estimators.

Explicit variance formulae

For a (very) limited number of statistics, explicit variance formulae are available.

Horvitz-Thompson estimator:
  t_HT[x] = Σ_{i∈S} x_i/π_i

Design variance:
  V[t_HT[x]] = (1/2) Σ_{i≠j∈U} (π_i π_j − π_ij) (x_i/π_i − x_j/π_j)²

Yates-Grundy-Sen variance estimator:
  v_YGS = (1/2) Σ_{i≠j∈S} [(π_i π_j − π_ij)/π_ij] (x_i/π_i − x_j/π_j)²

Explicit variance formulae

Stratified sample:
  v_str[t_str[x]] = Σ_{h=1}^L (1 − f_h) [n_h/(n_h − 1)] Σ_{i=1}^{n_h} (t_hi − t̄_h)²
where
  t_hi = Σ_{j∈PSU hi} x_hij/π_hij,   t̄_h = (1/n_h) Σ_{i=1}^{n_h} t_hi
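A base-R sketch of v_str with the finite population corrections taken as negligible (f_h ≈ 0); the strata and the weighted PSU totals t_hi are made up.

    # Stratified between-PSU variance of an estimated total (fpc ignored)
    set.seed(1)
    d <- data.frame(stratum = rep(1:3, times = c(4, 5, 6)),   # 3 strata with 4, 5, 6 PSUs
                    t_hi    = rgamma(15, shape = 5))          # hypothetical weighted PSU totals
    v_str <- sum(tapply(d$t_hi, d$stratum, function(t) {
      nh <- length(t)
      nh / (nh - 1) * sum((t - mean(t))^2)                    # (1 - f_h) omitted
    }))
    v_str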

Linearization variance estimator

- θ = f(T[x_1], ..., T[x_k]) is a function of moments
- θ̂ = f(t[x_1], ..., t[x_k]) is its estimator
- Taylor series expansion/delta method:
  θ̂ − θ = ∇f · (t[x] − T[x]) + ...
- Hence
  v_L[θ̂] (an estimator of MSE[θ̂]) = v[ Σ_k (∂f/∂t_k) t_k ]
- Regularity conditions: ∂f/∂t_k evaluated at T[x] is ≠ 0
- Example: the ratio r = t[y]/t[x], with variance estimator
  v_L[r] = (1/t[x]²) v(e_i),   e_i = y_i − r x_i,   provided T[x] ≠ 0

Linearization variance estimator

- θ̂ solves the estimating equations
  g(x, θ̂) = Σ_{i∈S} g(x_i, θ̂)/π_i = 0
- Taylor series expansion:
  g(x, θ̂) = g(x, θ) + ∇g · (θ̂ − θ) + ...
- Invert it and account for g(x, θ̂) = 0 to obtain
  θ̂ − θ = −(∇g)^{-1} g(x, θ) + ...
- Take the variance and plug in the estimates:
  v_L[θ̂] (an estimator of MSE[θ̂]) = (∇g)^{-1} v[g(x, θ̂)] (∇g)^{-1T}
- Example: GLM (Binder 1983)

Replication methods

For a given estimation procedure (X_1, ..., X_n) ↦ θ̂:
1. To create data for replicate r, reshuffle PSUs, omitting some and/or repeating others, according to a certain replication scheme.
2. Using the original estimation procedure and the replicate data, obtain the parameter estimate θ̂^{(r)}.
3. Repeat Steps 1–2 for r = 1, ..., R.
4. Estimate the variance/MSE as
  v_m[θ̂] = (A/R) Σ_{r=1}^R (θ̂^{(r)} − θ̃)²      (2)
where A is a scaling parameter, θ̃ = Σ_r θ̂^{(r)}/R for variance estimation and θ̃ = θ̂ for MSE estimation.

Alternative implementation: replicate weights w^{(r)}_hij
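Formula (2) is a one-liner once the replicate estimates are in hand; a sketch with hypothetical arguments:

    # Replication variance/MSE estimator, formula (2)
    v_rep <- function(theta_r, A = 1, theta_hat = NULL) {
      center <- if (is.null(theta_hat)) mean(theta_r) else theta_hat  # replicate mean -> variance; full-sample estimate -> MSE
      A / length(theta_r) * sum((theta_r - center)^2)
    }
    # e.g. v_rep(theta_r, A = 1) for BRR replicates, or with A set as on the jackknife slides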

Pros and cons of resampling estimators

+ Only need software that does weighted estimation; no need to program specific estimators for each model
+ No need to release unit identifiers in public data sets
– Computationally intensive
– Post-stratification and non-response adjustments need to be performed on every set of weights
– Bulky data files with many weight variables

The jackknife

Kish & Frankel (1974), Krewski & Rao (1981)
- Replicates: omit only one PSU from the entire sample
- Replicate weights: if PSU k from stratum g is omitted,
  w^{(gk)}_hij = 0                        if h = g, i = k
               = [n_g/(n_g − 1)] w_hij    if h = g, i ≠ k
               = w_hij                    if h ≠ g
- Number of replicates: R = n
- Scaling factor in (2): A = n − 1 when L = 1; A = n_h − 1, applied within strata, when L > 1
A sketch of the replicate weights follows.
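A base-R sketch that builds the delete-one-PSU replicate weights above for a toy stratified file; the strata, PSU identifiers, and base weights are all made up.

    # Stratified delete-one-PSU jackknife replicate weights
    set.seed(1)
    d <- data.frame(stratum = rep(1:2, each = 6),
                    psu     = rep(1:6, each = 2),
                    w       = runif(12, 1, 3))               # base weights
    psus <- unique(d[, c("stratum", "psu")])
    rw <- sapply(seq_len(nrow(psus)), function(r) {
      g <- psus$stratum[r]; k <- psus$psu[r]
      ng  <- length(unique(d$psu[d$stratum == g]))           # PSUs in the stratum of the deleted PSU
      out <- d$w
      out[d$stratum == g & d$psu == k] <- 0                  # drop the deleted PSU
      out[d$stratum == g & d$psu != k] <- out[d$stratum == g & d$psu != k] * ng / (ng - 1)
      out
    })
    dim(rw)   # one column of replicate weights per deleted PSU (R = n)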

The jackknife

Variance estimators, all of the form Σ_h [(n_h − 1)/n_h] Σ_i (θ̂_(hi) − c)², differing in the centering c:
  v_J1: c = Σ_g Σ_k θ̂_(gk)/n   (the overall mean of the replicates)
  v_J2: c = Σ_g θ̂_g/L          (the mean of the stratum means)
  v_J3: c = θ̂                  (the full-sample estimate)
  v_J4: c = θ̂_h                (the stratum mean of the replicates)
where θ̂_h = Σ_i θ̂_(hi)/n_h.

The jackknife

Pseudo-values:
  θ̃_(hi) = n_h θ̂_h − (n_h − 1) θ̂_(hi)

More variance estimators:
  v_J5 = Σ_h [1/((n_h − 1) n_h)] Σ_i ( θ̃_(hi) − Σ_g Σ_k θ̃_(gk)/n )²
  v_J6 = Σ_h [1/((n_h − 1) n_h)] Σ_i ( θ̃_(hi) − (1/L) Σ_g (1/n_g) Σ_k θ̃_(gk) )²

Bias corrected point estimator:
  θ̂_J = (n + 1 − L) θ̂ − Σ_h (n_h − 1) θ̂_h

The jackknife/linearization failures

Linearization and the jackknife estimators are inconsistent for non-smooth parameters:
- Percentiles (including the median)
- Extreme order statistics: min, max
- Exotic estimation problems, e.g. matching estimators

Delete-k jackknife

If n_h > k ≥ 1 for all h, a variation of the jackknife is to delete k PSUs at a time rather than one.
- Replicate weight:
  w^{(r)}_hij = 0                        if unit hi is omitted
              = [n_h/(n_h − k)] w_hij    if units in the same stratum are omitted, but not hi
              = w_hij                    if units in a stratum other than h are omitted
- Number of replicates: R = Σ_h C(n_h, k)
- Scaling factor in (2): (n_h − k)/k, within strata
- Pros: better performance in non-smooth problems
- Cons: increased computational complexity

Balanced repeated replication (BRR)

- Design restriction: n_h = 2 PSUs/stratum
- Replicates (half-samples): omit one of the two PSUs from each stratum
- Replicate weights:
  w^{(r)}_hij = 2 w_hij   if PSU hi is retained
              = 0         if PSU hi is omitted
- (2nd order) balance conditions:
  each PSU is used R/2 times
  each pair of PSUs is used R/4 times
- Number of replicates: L ≤ R ≤ 2^L; McCarthy (1969): L ≤ R = 4m ≤ L + 3 using Hadamard matrices
- Scaling factor in (2): A = 1

Aside: Hadamard matrices

- n × n matrix with entries ±1
- Rows are orthogonal
- Special case of orthogonal arrays (Hedayat, Sloane & Stufken 1999)
- Hadamard conjecture: for every integer m, there exists a Hadamard matrix of order 4m
- Smallest order for which no matrix is known: 4m = 668
- Sylvester construction for orders 2^k: if H is Hadamard, so is
  [ H  H ]
  [ H −H ]
- BRR designs: w^{(r)}_hi = (1 + H_rh) w_hi
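A sketch combining the Sylvester construction with the weight rule above; the design (6 strata, 2 PSUs each) and base weights are made up, and giving the second PSU the complementary factor (1 − H_rh) is the usual BRR convention rather than something stated on this slide.

    # BRR replicate weights from a Sylvester-type Hadamard matrix
    sylvester <- function(k) {                  # Hadamard matrix of order 2^k
      H <- matrix(1, 1, 1)
      for (i in seq_len(k)) H <- rbind(cbind(H, H), cbind(H, -H))
      H
    }
    L <- 6                                      # number of strata
    H <- sylvester(3)                           # order 8 >= L; rows index replicates
    w <- matrix(runif(2 * L, 1, 3), nrow = L)   # columns 1/2 = base weights of PSUs 1/2 per stratum
    rep_w <- lapply(seq_len(nrow(H)), function(r) {
      cbind(w[, 1] * (1 + H[r, 1:L]),           # retained PSU doubled, omitted PSU zeroed
            w[, 2] * (1 - H[r, 1:L]))           # complementary factor for the second PSU
    })
    length(rep_w)                               # R = 8 replicates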

BRR

Complementary half-samples: swap the included/excluded units, obtain θ̂^{(rc)}.

Variance estimators:
  v_BRR1[θ̂] = v_BRR-H[θ̂] = (1/R) Σ_{r=1}^R (θ̂^{(r)}_BRR − θ̂)²
  v_BRR2[θ̂] = v_BRR-D[θ̂] = (1/(4R)) Σ_{r=1}^R (θ̂^{(r)}_BRR − θ̂^{(rc)}_BRR)²
  v_BRR3[θ̂] = v_BRR-S[θ̂] = (1/(2R)) Σ_{r=1}^R [ (θ̂^{(r)}_BRR − θ̃)² + (θ̂^{(rc)}_BRR − θ̃)² ]

Bias corrected estimate:
  θ̂_Bc = 2 θ̂ − (1/R) Σ_r θ̂^{(r)}   or   2 θ̂ − (1/(2R)) Σ_r ( θ̂^{(r)} + θ̂^{(rc)} )

Fay's modification

- Confidentiality protection: units that share a replicate weight of 0 can be identified as belonging to the same PSU, so zero weights are to be avoided.
- Modified weights:
  w^{(r)}_hi = (1 + k H_rh) w_hi   for some 0 < k < 1
- Scaling constant in (2): A = 1/k²
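A tiny sketch of the modified factors for a single stratum across a few replicates; the value of k, the Hadamard entries, and the weights are all made up.

    # Fay's modification: factors (1 +/- k*H_rh) instead of (1 +/- H_rh), so no weight is zero
    k <- 0.5
    H_rh <- c(1, -1, -1, 1)                # this stratum's entries across 4 replicates (illustrative)
    w_psu1 <- 1.8; w_psu2 <- 2.2           # base weights of the two PSUs in the stratum
    cbind(psu1 = w_psu1 * (1 + k * H_rh),
          psu2 = w_psu2 * (1 - k * H_rh))  # replicate weights; use A = 1/k^2 in (2)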

Extensions of BRR

What if n_h ≠ 2?
- Gurney & Jewett (1975): n_h = p for a prime p, R = (p^k − 1)/(p − 1) ≥ L
- Gupta & Nigam (1987) and Wu (1991): mixed orthogonal arrays for n_h ≥ 2, 1 PSU/stratum recycled, R = ?
- Sitter (1993): orthogonal multiarrays for n_h ≥ 2, about half of the PSUs/stratum recycled, R = ?
- The availability of a suitable orthogonal array needs to be established for each particular design

Approximate BRR

Since BRR is a common estimation technique, some publicly released data use design approximations that allow the end user to apply BRR techniques:
- strata collapse
- grouping of PSUs
- treating SSUs as PSUs for self-representing units

Caution: Shao (1996) gives an example where grouped BRR is inconsistent.
Remedies: repeated grouping, random subsampling.

Review questions

1. (requires calculus) If a variance estimator v[θ̂] is available for the parameter estimate θ̂, what is v_L[e^θ̂]?
2. True or false: in regression analysis, the linear model textbook variance estimator s²(X′X)^{-1} is appropriate for complex survey data.
3. For a design with 2 PSUs/stratum, which method will be faster, the jackknife or BRR?
4. If L = 45 and n_h = 2 for every stratum, can one construct BRR designs with R = 50? R = 60? What is the smallest number of replicates necessary?

Notes

Complex survey bootstraps

1. Naïve bootstrap
2. Rescaling bootstrap
3. Other survey bootstraps:
   - bootstrap without replacement
   - mirror-match bootstrap
   - mean bootstrap
   - bootstrap for imputed data
   - balanced bootstrap
   - variance components bootstrap
   - wild bootstrap
   - parametric bootstrap for small area estimation
4. Comparison of all methods

How about some theory?

Naïve bootstrap

1. Sample with replacement n_h units from stratum h.
2. For each replicate, compute θ̂^{(r)}.
3. Estimate the variance using (2).

Rao & Wu (1988):
  V*[x̄*] = Σ_h W_h² [(n_h − 1)/n_h] s_h²/n_h   rather than   v[x̄] = Σ_h W_h² s_h²/n_h

Scaling issue? Choice of A?

Rescaling bootstrap (RBS)

Rao & Wu (1988): for a parameter θ = f(x̄),
1. Sample with replacement m_h out of the n_h units in stratum h.
2. Compute the pseudo-values
  x̃_h^{(r)} = x̄_h + [m_h/(n_h − 1)]^{1/2} (x̄*_h^{(r)} − x̄_h),
  x̃^{(r)} = Σ_h W_h x̃_h^{(r)},   θ̃^{(r)} = f(x̃^{(r)})      (3)
3. Repeat Steps 1–2 for r = 1, ..., R.
4. Compute v_RBS[θ̂] using (2) with A = 1.

Scaling of weights

Rao, Wu & Yue (1992): the weights can be scaled instead of the values. For the r-th replicate,
  w^{(r)}_hik = { 1 − [m_h/(n_h − 1)]^{1/2} + [m_h/(n_h − 1)]^{1/2} (n_h/m_h) m*_hi^{(r)} } w_hik      (4)
where m*_hi^{(r)} = # times the i-th unit in stratum h is used in the r-th replicate.
- Equivalent to RBS for functions of moments
- Applicable to θ̂ obtained from estimating equations
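A base-R sketch of formula (4) at the PSU level with m_h = n_h − 1; the strata sizes, base weights, and R are made up. The resulting matrix of bootstrap weights can be plugged into formula (2) with A = 1, or handed to replicate-weight survey software (for example the svrepdesign() constructor in the R survey package).

    # Rao-Wu-Yue bootstrap replicate weights, formula (4), with m_h = n_h - 1
    set.seed(1)
    psu <- data.frame(stratum = rep(1:3, times = c(4, 5, 6)),  # PSU-level file
                      w       = runif(15, 1, 3))               # base weights
    R <- 500
    boot_w <- sapply(seq_len(R), function(r) {
      unlist(lapply(split(psu, psu$stratum), function(s) {
        nh <- nrow(s); mh <- nh - 1
        m_star <- tabulate(sample.int(nh, mh, replace = TRUE), nbins = nh)  # m*_hi
        f <- sqrt(mh / (nh - 1))                               # kept in full form for general m_h
        (1 - f + f * nh / mh * m_star) * s$w                   # formula (4) for every PSU in the stratum
      }))
    })
    dim(boot_w)   # 15 PSUs x 500 replicates of bootstrap weights; use (2) with A = 1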

Bootstrap scheme options

Choice of m_h:
- m_h ≤ n_h − 1 to ensure non-negative replicate weights
- m_h = n_h − 1: no need for internal scaling
- m_h = n_h − 3: matching third moments (Rao & Wu 1988)
- Simulation evidence (Kovar, Rao & Wu 1988): for n_h ≤ 5, the choice m_h = n_h − 1 leads to more stable estimators with better coverage than m_h = n_h − 3

Choice of R:
- No theoretical foundations
- Popular choices: R = 100, 200 or 500
- R ≥ design degrees of freedom n − L

Bootstrap without replacement (BWO)

BWO (Sitter 1992a) mimics sampling without replacement.
1. Let n'_h = n_h − (1 − f_h), k_h = (N_h/n_h) [1 − (1 − f_h)/n_h].
2. Create a pseudopopulation: in stratum h, replicate {y_hi} k_h times.
3. Take an SRSWOR of n'_h units from pseudopopulation stratum h; combine across h.
4. Compute θ̂^{(r)}.
5. Repeat Steps 3–4 for r = 1, ..., R.
6. Compute v_BWO using (2).
7. Randomize between bracketing integer values for non-integer n'_h, k_h.
An extension to two-stage samples is available.

Mirror-match bootstrap (MMB)

MMB (Sitter 1992b) for sampling-without-replacement designs:
1. Draw an SRSWOR of n*_h < n_h PSUs from stratum h.
2. Repeat Step 1 k_h = n_h(1 − f*_h)/[n*_h(1 − f_h)] times.
3. Repeat Steps 1–2 independently for each stratum to form the r-th replicate.
4. Compute θ̂^{(r)}.
5. Repeat Steps 1–4 for r = 1, ..., R.
6. Compute v_MMB using (2).

- f_h = n_h/N_h is the original sampling fraction
- f*_h = n*_h/n_h is the bootstrap sampling fraction
- Randomize if k_h is not an integer
- Rescaling bootstrap: special case with n*_h = 1

Mean bootstrap

