DATA-SMOOTHING AND BOOTSTRAP RESAMPLING

G.A. Young
Statistical Laboratory, University of Cambridge, 16 Mill Lane, Cambridge CB2 1SB, U.K.

1. INTRODUCTION

This paper reviews aspects of the smoothed bootstrap approach to statistical estimation. The basic problem underlying the bootstrap methodology is that of providing a simulation algorithm which produces realisations from an unknown distribution $F$, when all that is available is a sample from $F$. The bootstrap of Efron (1979) simulates, with replacement, from the observed sample. The smoothed bootstrap, discussed by Efron (1979, 1982) and Silverman and Young (1987), smooths the sample observations first and hence effectively simulates from a kernel estimate of the density $f$ underlying $F$. This is achieved, without construction of the kernel estimate itself, by resampling from the original data and then perturbing each sampled point appropriately.

The bootstrap and smoothed bootstrap will be considered as competing methods of estimating properties of an unknown distribution $F$. Given a general functional $\alpha$, which may relate to the sampling properties of a parameter estimate, it is required to estimate, on the basis of a set of sample data, the population value $\alpha(F)$ of this functional. The standard bootstrap estimates $\alpha(F)$ by $\alpha(F_n)$, $F_n$ denoting the empirical c.d.f. of the sample data. The smoothed bootstrap estimates $\alpha(F)$ by $\alpha(\tilde F)$, where $\tilde F$ is a smoothed version of $F_n$. The simple idea underlying bootstrap estimation, therefore, is that of using $F_n$ or $\tilde F$ as a surrogate or estimate for the unknown $F$. In many circumstances the bootstrap estimate will itself be estimated by resampling from $F_n$ or $\tilde F$, though as yet unpublished work by Davison and Hinkley points in the direction of 'bootstrap resampling without the resampling'.

Though the bootstrap was conceived by Efron (1979) as a means of tackling complex estimation problems, for a discussion of smoothing there is some advantage in studying the very simplest case, where the functional $\alpha$ is linear in $F$. Relevant questions to be considered are:
(i) When is it advantageous to use a smoothed bootstrap rather than the standard bootstrap?
(ii) How should the smoothing be performed? In particular, is there any advantage in simulating from a 'shrunk' version of the kernel estimator, with the same variance structure as the sample data?
(iii) Is it possible to define data-driven procedures which will choose the degree of smoothing to be applied automatically?
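To fix ideas, the contrast between the two resampling schemes can be sketched in a few lines of code. The following minimal sketch is illustrative and not part of the original paper; it assumes univariate data and a Gaussian kernel, with the bandwidth expressed, as in Section 2, as a multiple h of the sample standard deviation.

    import numpy as np

    def bootstrap_sample(x, rng):
        # Standard bootstrap of Efron (1979): resample the observed
        # points with replacement.
        return rng.choice(x, size=len(x), replace=True)

    def smoothed_bootstrap_sample(x, h, rng):
        # Smoothed bootstrap: resample with replacement, then perturb
        # each sampled point.  With a Gaussian kernel this is exactly a
        # draw from the kernel density estimate with bandwidth
        # h * sd(x), without ever constructing the estimate itself.
        y = rng.choice(x, size=len(x), replace=True)
        return y + h * x.std() * rng.standard_normal(len(x))

    rng = np.random.default_rng(0)
    data = rng.exponential(size=20)
    print(smoothed_bootstrap_sample(data, h=0.5, rng=rng))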

2. SMOOTHED BOOTSTRAP PROCEDURE

Suppose $X_1, \ldots, X_n$ are independent realisations of an $r$-variate quantity from an unknown distribution $F$. Assuming $F$ has a smooth underlying density $f$, a convenient smoothed bootstrap is obtained from the kernel estimator $\hat f_h$ of $f$ defined by

$$\hat f_h(x) = n^{-1} h^{-r} |V|^{-1/2} \sum_{i=1}^{n} K\{h^{-1} V^{-1/2} (x - X_i)\}. \qquad (2.1)$$

Here $K$ is a symmetric probability density function of an $r$-variate distribution with unit variance matrix. Operationally $V$ is taken as the variance matrix of the sample data and $h$ is a parameter defining the degree of smoothing. Realisations generated from $\hat f_h$ have expectation equal to $\bar X$, the mean of the observed sample, but smoothing inflates the marginal variances. Silverman and Young (1987) give a number of simple examples which show that smoothing of this type can have a deleterious effect on the bootstrap estimation: see also Section 3. The kernel estimator $\hat f_h$ is therefore 'shrunk' to give an estimator $\hat f_{h,s}$ with second-order moment properties the same as those in the observed sample: a realisation from $\hat f_{h,s}$ takes the form $(X_I + hV^{1/2}\varepsilon)/(1+h^2)^{1/2}$, where the index $I$ is uniform on $\{1, \ldots, n\}$ and $\varepsilon$ is a draw from $K$. Note that the mean of $\hat f_{h,s}$ is $\bar X/(1+h^2)^{1/2}$.
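In operational terms, then, a realisation from $\hat f_{h,s}$ is obtained by resampling an observation, perturbing it exactly as for $\hat f_h$, and then shrinking. The following sketch is not from the paper; it assumes a Gaussian kernel $K$ and takes $V$ as the empirical variance matrix with divisor $n$, so that the second moments of the simulated points match the sample exactly.

    import numpy as np

    def shrunk_smoothed_resample(X, h, rng):
        # X: (n, r) array of observations.  Returns n draws from the
        # shrunk kernel estimate f_{h,s}: resample a row, perturb it by
        # h V^{1/2} eps, then divide by sqrt(1 + h^2) so that the
        # variance matrix of the simulated points is exactly V.
        n, r = X.shape
        V = np.cov(X, rowvar=False, bias=True)  # divisor n, matches F_n
        L = np.linalg.cholesky(V)               # V^{1/2} as a Cholesky factor
        idx = rng.integers(0, n, size=n)        # resample indices with replacement
        eps = rng.standard_normal((n, r))       # unit-variance kernel draws
        return (X[idx] + h * (eps @ L.T)) / np.sqrt(1.0 + h * h)

Realisations from the unshrunk estimator $\hat f_h$ are obtained by simply omitting the division by $\sqrt{1+h^2}$.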

3. LINEAR FUNCTIONALS

For a linear functional $\alpha(F) = \int a(t)\,dF(t)$, the smoothed bootstrap estimator is $\hat\alpha_h(F) = \int a(t)\,\hat f_{h,s}(t)\,dt$. This estimator may be written

$$\hat\alpha_h(F) = n^{-1} \sum_{i=1}^{n} w^*(X_i), \qquad (3.1)$$

where

$$w^*(x) = \int a\{(1+h^2)^{-1/2}(x + hV^{1/2}u)\}\,K(u)\,du.$$

Using a Taylor expansion of $a$ and the assumptions on the kernel function $K$, the mean squared error of $\hat\alpha_h(F)$ may, for $h$ small, be expanded as

$$\mathrm{MSE}\{\hat\alpha_h(F)\} = C_0 + C_1 h^2 + C_2 h^4 + o(h^4). \qquad (3.2)$$

Here we have assumed that $V = [v_{ij}]$ is a fixed positive definite symmetric matrix and, writing $\alpha = \int a(t)\,dF(t)$,

$$C_0 = \frac{1}{n}\int \{a(t)-\alpha\}^2\,dF(t), \qquad C_1 = \frac{1}{n}\int \{a(t)-\alpha\}\,a^*(t)\,dF(t),$$

$$C_2 = \frac{1}{n}\left[\,2\int \{a(t)-\alpha\}\,a^{**}(t)\,dF(t) + \frac{1}{4}\int a^*(t)^2\,dF(t) + \frac{n-1}{4}\left\{\int a^*(t)\,dF(t)\right\}^2\,\right],$$

where $a^*(t) = D_V a(t) - t \cdot \nabla a(t)$, $a^{**}(t)$ is the corresponding $O(h^4)$ term in the expansion of $w^*$, and $D_V a(t) = \sum_{i,j} v_{ij}\,\partial^2 a(t)/\partial t_i\,\partial t_j$. See Silverman and Young (1987) for details of the manipulations.

The expansion (3.2) immediately gives the result:

Lemma. Provided $a(X)$ and $a^*(X)$ are negatively correlated, the mean squared error of the smoothed bootstrap estimator $\hat\alpha_h(F)$ of $\alpha(F)$ will, for some $h > 0$, be less than that of the unsmoothed estimate $\hat\alpha_0(F) = \int a(t)\,dF_n(t)$.

The corresponding result for the bootstrap estimator $\int a(t)\,\hat f_h(t)\,dt$, constructed from the unshrunk kernel estimator, requires $a(X)$ and $D_V a(X)$ to be negatively correlated.

As a simple example, suppose $F$ is the univariate standard Gaussian distribution and let $a(t) = t^5$. With $V = 1$ we have

$$\mathrm{cov}\{a(X), a^*(X)\} < 0, \qquad \mathrm{cov}\{a(X), D_V a(X)\} > 0,$$

so that smoothing, with shrinkage, is of potential value in bootstrap estimation of the fifth moment; a numerical check of these covariances is given below.

The lemma above states that if $C_1 < 0$ in (3.2), some small degree of smoothing at least is worthwhile. If also $C_2 < 0$ we might speculate that some larger degree of smoothing may be appropriate. If both $C_1 > 0$ and $C_2 > 0$ the appropriate bootstrap estimator is the unsmoothed estimator $\hat\alpha_0(F)$. Otherwise, the optimal smoothing parameter, in the sense of minimising the approximate MSE $C_0 + C_1 h^2 + C_2 h^4$, is given by $h = (2|C_1|/4C_2)^{1/2}$.

The quantities $C_1$ and $C_2$ depend on the unknown underlying distribution function $F$, and in general will be complicated functions of the moments of $F$.
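As a quick numerical illustration (not in the original paper), the two covariances in the fifth-moment example can be checked by Monte Carlo. With $a(t) = t^5$ and $V = 1$ we have $a^*(t) = a''(t) - t\,a'(t) = 20t^3 - 5t^5$, and the exact Gaussian-moment values are $\mathrm{cov}\{a, a^*\} = 20\mu_8 - 5\mu_{10} = -2625$ and $\mathrm{cov}\{a, D_V a\} = 20\mu_8 = 2100$.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.standard_normal(2_000_000)

    a = x**5                       # a(t) = t^5
    Dva = 20 * x**3                # D_V a(t) = a''(t), with V = 1
    a_star = Dva - x * (5 * x**4)  # a*(t) = D_V a(t) - t a'(t)

    # Negative: shrunk smoothing is of potential value (exact: -2625).
    print(np.cov(a, a_star)[0, 1])
    # Positive: unshrunk smoothing is not (exact: 2100).
    print(np.cov(a, Dva)[0, 1])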

A possible strategy would be to choose $h$ with reference to a standard distribution, such as the standard $r$-variate Gaussian. In circumstances where the sample data do not suggest any sensible statistical model, $C_1$ and $C_2$ can be estimated, for example by substitution of the sample moments. Given estimates $\hat C_1$, $\hat C_2$ for $C_1$, $C_2$, an entirely data-driven strategy for choosing the degree of smoothing would be to take $h = 0$ if $\hat C_1 \ge 0$, and $h = (2|\hat C_1|/4\hat C_2)^{1/2}$ otherwise. The case $h = \infty$ corresponds to Efron's 'parametric bootstrap' (Efron, 1979).

Rather than choosing $h$ by reference to (3.2), which gives an expansion valid for $h$ in the neighbourhood of zero, the representation (3.1) of the estimator can be used in conjunction with computer algebraic manipulation to obtain an exact expression for $\mathrm{MSE}\{\hat\alpha_h(F)\}$. This expression can then be minimised in $h$ to obtain the optimal value of the smoothing parameter.

4. EXTENSION TO NON-LINEAR FUNCTIONALS

When an explicit bootstrap procedure is being used the functional $\alpha$ is unlikely to be linear. The ideas of Section 3 can be applied to bootstrap estimation for more general $\alpha$, provided $\alpha$ admits a first-order von Mises expansion about $F$ of the form

$$\alpha(\tilde F) = \alpha(F) + A(\tilde F - F) \qquad (4.1)$$

for $\tilde F$ 'near' $F$, where $A$ is linear and representable as an integral, $A(\tilde F - F) = \int a(t)\,d(\tilde F - F)(t)$. Provided the error in the approximation (4.1) is $O_p(n^{-1})$, the sampling properties of the bootstrap estimator of $\alpha(F)$ are, to the same order $O_p(n^{-1})$, those of the bootstrap estimator of the linear functional $\int a(t)\,dF(t)$, and the analysis of Section 3 applies.

5. EXAMPLE

Let $F$ be an unknown univariate distribution and consider estimation of the skewness

$$\alpha(F) = \frac{E_F(X - E_F X)^3}{\{E_F(X - E_F X)^2\}^{3/2}}.$$

Simple manipulations, easily performed by computer algebra, show that the linear approximation (4.1) is defined by

$$a(t) = \frac{(t-\mu_1)^3 - \kappa - 3\sigma^2(t-\mu_1)}{\sigma^3} - \frac{3\kappa}{2\sigma^5}\left\{(t-\mu_1)^2 - \sigma^2\right\},$$

where $\sigma^2 = \mu_2 - \mu_1^2$ and $\kappa = \mu_3 - 3\mu_1\mu_2 + 2\mu_1^3$ are the variance and third central moment of $F$, written in terms of the moments $\mu_r = E_F X^r$; fully expanded in the $\mu_r$, this is the lengthy rational expression produced by the computer algebra.
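The expansion (4.1) can be checked numerically. The following sketch, not in the original paper, codes $a(t)$ in its central-moment form and compares the error of the sample skewness with the averaged linear term, for exponential data where all the required moments are known exactly.

    import numpy as np

    def skewness(x):
        # alpha(F_n): sample skewness.
        y = x - x.mean()
        return np.mean(y**3) / np.mean(y**2)**1.5

    def a_influence(t, mu, s2, k):
        # The linear term a(t) of (4.1) (the influence function of the
        # skewness) in central-moment form: mu, s2 and k are the mean,
        # variance and third central moment of F.
        s = np.sqrt(s2)
        return ((t - mu)**3 - k - 3*s2*(t - mu)) / s**3 \
            - 1.5 * (k / s**5) * ((t - mu)**2 - s2)

    # For F = Exp(1): mu = 1, s2 = 1, k = 2 and alpha(F) = 2.  The error
    # alpha(F_n) - alpha(F) should agree with the average influence
    # n^{-1} sum a(X_i), up to a remainder of order 1/n.
    rng = np.random.default_rng(2)
    x = rng.exponential(size=500)
    print(skewness(x) - 2.0)
    print(a_influence(x, 1.0, 1.0, 2.0).mean())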

The bootstrap estimator is given by

$$\hat\alpha_h(F) = \frac{n^{-1}\sum_{i=1}^{n}(X_i - \bar X)^3}{(1+h^2)^{3/2}\,V^{3/2}}, \qquad (5.1)$$

where $V$ is the sample variance: shrinkage leaves the variance of $\hat f_{h,s}$ equal to $V$ but deflates its third central moment by the factor $(1+h^2)^{3/2}$, so the smoothed bootstrap estimate of the skewness is simply the sample skewness divided by $(1+h^2)^{3/2}$.

In the special case of the standard Gaussian distribution, computer algebraic manipulation gives a closed form approximation for the MSE of $\hat\alpha_h(F)$,

$$\mathrm{MSE}\{\hat\alpha_h(F)\} \approx \frac{6}{n(1+h^2)^3}, \qquad (5.2)$$

and gives $C_1 = -18/n$, $C_2 = 36/n$. These formulae suggest, misleadingly, the choice $h = \frac{1}{2}$: the approximation (5.2) is in fact decreasing in $h$, so that in this special case larger values of $h$ continue to reduce the MSE. In the general case, the values for $C_1$ and $C_2$ are complicated functions of the moments of $F$. With a manipulation package such as REDUCE it is straightforward to write FORTRAN subroutines to evaluate these coefficients: the moments of the observed sample are then substituted to yield estimates $\hat C_1$, $\hat C_2$.

The formula for $\mathrm{MSE}\{\hat\alpha_h(F)\}$, of which (5.2) is a special case, amounts to hundreds of lines of code. If $\mu_1 = 0$ it reduces to the simpler form

$$\mathrm{MSE}\{\hat\alpha_h(F)\} = \frac{Q(h^2, n, \mu_2, \ldots, \mu_6)}{4n\,\mu_2^5\,(h^2+1)^4}, \qquad (5.3)$$

where the numerator $Q$ is a polynomial in $h^2$, $n$ and the moments $\mu_2, \ldots, \mu_6$, still too lengthy to be reproduced legibly here.

Invariance of the estimator (5.1) under the transformation $X_i \to X_i + c$ $(i = 1, \ldots, n)$ suggests the following procedure for the choice of $h$. Centre the observations by calculating $Y_i = X_i - \bar X$ $(i = 1, \ldots, n)$. Then substitute $\hat\mu_r = n^{-1}\sum_i Y_i^r$ $(r = 2, \ldots, 6)$ in (5.3). This gives an estimate of the mean squared error of the bootstrap estimator as a function of $h$. Use a numerical routine to minimise this, and use the minimising value of $h$ for the bootstrap estimation itself.
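The centring-and-minimisation procedure just described (Strategy D of the simulations below) is easy to sketch. The sketch is illustrative only: the function mse_53, standing in for the machine-generated formula (5.3) which is not reproduced above, must be supplied, and the search interval [0, 5] is an arbitrary choice.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def skewness_estimator(x, h):
        # Smoothed bootstrap estimator (5.1): the sample skewness,
        # deflated by the factor (1 + h^2)^(3/2) by which shrinkage
        # reduces the third central moment.
        y = x - x.mean()
        return np.mean(y**3) / ((1 + h*h)**1.5 * np.mean(y**2)**1.5)

    def strategy_D(x, mse_53):
        # Strategy D: centre the observations, substitute the sample
        # moments mu_r (r = 2,...,6) into the exact MSE formula (5.3),
        # supplied here as the stand-in function mse_53(h, n, mu), and
        # minimise numerically over h.
        y = x - x.mean()
        mu = {r: np.mean(y**r) for r in range(2, 7)}
        res = minimize_scalar(lambda h: mse_53(h, len(x), mu),
                              bounds=(0.0, 5.0), method="bounded")
        return res.x  # minimising h, used for the bootstrap estimate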

For each of four underlying distributions (standard Gaussian, uniform on $[-1,1]$, Beta(5,3) and standard exponential) and two sample sizes, $n = 5$ and $n = 50$, 1000 datasets were generated. Table 1 shows, for each combination, the mean squared error over the 1000 replications of the bootstrap estimators of $\alpha(F)$ when $h$ is chosen by various strategies. Strategy A takes $h = 0.0$ always, Strategy B takes $h = 0.5$ always, Strategy C chooses $h$ according to estimated values $\hat C_1$, $\hat C_2$ as described in Section 3, while Strategy D is the procedure based on (5.3) described above.

Table 1: MSE of bootstrap estimators of $\alpha(F)$, skewness example.

                                      Smoothing strategy
  n    Distribution   alpha(F)      A        B        C        D
  5    N(0,1)           0.0       0.3607   0.1847   0.2977   0.0912
       U[-1,1]          0.0       0.3566   0.1826   0.2950   0.0869
       Beta(5,3)       -0.310     0.3889   0.2341   0.3629   0.1554
       Exp(1)           2.0       2.4497   2.7557   2.5674   3.0748
  50   N(0,1)           0.0       0.1092   0.0559   0.1066   0.0596
       U[-1,1]          0.0       0.0450   0.0230   0.0446   0.0218
       Beta(5,3)       -0.310     0.0650   0.0435   0.0649   0.0589
       Exp(1)           2.0       0.4930   0.8661   0.5331   0.5490

The results of the simulation disappoint in that they do not provide concrete evidence in favour of any particular smoothing procedure. Automatic application of a small amount of smoothing can lead to substantially less accurate estimation: see the entries for the exponential simulation, $n = 50$. Strategy C is unlikely to make the estimation dramatically worse and generally leads to some improvement over the standard bootstrap. Strategy D can lead to considerably greater accuracy in the bootstrap estimation but, as the exponential simulation makes clear, may also lead to quite inappropriate choice of $h$. Errors in the linear expansion (4.1), which is the basis of strategies C and D, may, even for moderate sample size, be quite appreciable. Automatic procedures for choosing the degree of smoothing should be used with caution. It is probably advisable to examine the sample data, using an estimator of the form (2.1) say, and then to choose $h$ with reference to some suggested parametric family of distributions.

Acknowledgement

I am grateful to Bernard Silverman for permission to include details of our joint work.

REFERENCES

Efron, B. (1979). Bootstrap methods: another look at the jackknife. Ann. Statist., 7, 1-26.

Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: SIAM.

Silverman, B.W. and Young, G.A. (1987). The bootstrap: to smooth or not to smooth? Biometrika, 74. (To appear.)
