Modern Statistical Methods

Rajen D. Shah
r.shah@statslab.cam.ac.uk
Course webpage: http://www.statslab.cam.ac.uk/~rds37/modern_stat_methods.html

In this course we will study a selection of important modern statistical methods. This selection is heavily biased towards my own interests, but I hope it will nevertheless give you a flavour of some of the most important recent methodological developments in statistics.

Over the last 25 years, the sorts of datasets that statisticians have been challenged to study have changed greatly. Where in the past we were used to datasets with many observations and a few carefully chosen variables, we are now seeing datasets where the number of variables can run into the thousands and greatly exceed the number of observations. For example, with microarray data, we typically have gene expression values measured for several thousands of genes, but only for a few hundred tissue samples. The classical statistical methods are often simply not applicable in these "high-dimensional" situations.

The course is divided into 4 chapters (of unequal size). Our first chapter will start by introducing ridge regression, a simple generalisation of ordinary least squares. Our study of this will lead us to some beautiful connections with functional analysis and ultimately one of the most successful and flexible classes of learning algorithms: kernel machines.

The second chapter concerns the Lasso and its extensions. The Lasso has been at the centre of many of the developments that have occurred in high-dimensional statistics, and will allow us to perform regression in the seemingly hopeless situation when the number of parameters we are trying to estimate is larger than the number of observations.

In the third chapter we will study graphical modelling and provide an introduction to the exciting field of causal inference. Where the previous chapters consider methods for relating a particular response to a large collection of (explanatory) variables, graphical modelling will give us a way of understanding relationships between the variables themselves. Ultimately we would like to infer causal relationships between variables based on (observational) data. This may seem like a fundamentally impossible task, yet we will show how by developing the graphical modelling framework further, we can begin to answer such causal questions.

Statistics is not only about developing methods that can predict well in the presence of noise, but also about assessing the uncertainty in our predictions and estimates. In the final chapter we will tackle the problem of how to handle performing thousands of hypothesis tests at the same time and more generally the task of quantifying uncertainty in high-dimensional settings.

Before we begin the course proper, we will briefly review two key classical statistical methods: ordinary least squares and maximum likelihood estimation. This will help to set the scene and provide a warm-up for the modern methods to come later.

Classical statistics

Ordinary least squares

Imagine data are available in the form of observations $(Y_i, x_i) \in \mathbb{R} \times \mathbb{R}^p$, $i = 1, \ldots, n$, and the aim is to infer a simple regression function relating the average value of a response, $Y_i$, and a collection of predictors or variables, $x_i$. This is an example of regression analysis, one of the most important tasks in statistics.

A linear model for the data assumes that it is generated according to
\[ Y = X\beta^0 + \varepsilon, \qquad (0.0.1) \]
where $Y \in \mathbb{R}^n$ is the vector of responses; $X \in \mathbb{R}^{n \times p}$ is the predictor matrix (or design matrix) with $i$th row $x_i^T$; $\varepsilon \in \mathbb{R}^n$ represents random error; and $\beta^0 \in \mathbb{R}^p$ is the unknown vector of coefficients.

Provided $p \leq n$, a sensible way to estimate $\beta^0$ is by ordinary least squares (OLS). This yields an estimator $\hat{\beta}^{\mathrm{OLS}}$ with
\[ \hat{\beta}^{\mathrm{OLS}} := \arg\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2 = (X^T X)^{-1} X^T Y, \qquad (0.0.2) \]
provided $X$ has full column rank.

Under the assumptions that (i) $\mathbb{E}(\varepsilon_i) = 0$ and (ii) $\mathrm{Var}(\varepsilon) = \sigma^2 I$, we have that:
\[ \mathbb{E}_{\beta^0, \sigma^2}(\hat{\beta}^{\mathrm{OLS}}) = \mathbb{E}\{(X^T X)^{-1} X^T (X\beta^0 + \varepsilon)\} = \beta^0; \]
\[ \mathrm{Var}_{\beta^0, \sigma^2}(\hat{\beta}^{\mathrm{OLS}}) = (X^T X)^{-1} X^T \mathrm{Var}(\varepsilon) X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}. \]

The Gauss–Markov theorem states that OLS is the best linear unbiased estimator in our setting: for any other unbiased estimator $\tilde{\beta}$ that is linear in $Y$ (so $\tilde{\beta} = AY$ for some fixed matrix $A$), we have that
\[ \mathrm{Var}_{\beta^0, \sigma^2}(\tilde{\beta}) - \mathrm{Var}_{\beta^0, \sigma^2}(\hat{\beta}^{\mathrm{OLS}}) \]
is positive semi-definite.

Maximum likelihood estimation

The method of least squares is just one way to construct an estimator. A more general technique is that of maximum likelihood estimation. Here, given data $y \in \mathbb{R}^n$ that we take as a realisation of a random variable $Y$, we specify its density $f(y; \theta)$ up to some unknown vector of parameters $\theta \in \Theta \subseteq \mathbb{R}^d$, where $\Theta$ is the parameter space. The likelihood function is a function of $\theta$ for each fixed $y$ given by
\[ L(\theta) := L(\theta; y) = c(y) f(y; \theta), \]
where $c(y)$ is an arbitrary constant of proportionality. The maximum likelihood estimate of $\theta$ maximises the likelihood, or equivalently it maximises the log-likelihood
\[ \ell(\theta) := \ell(\theta; y) = \log f(y; \theta) + \log(c(y)). \]
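As a concrete example (added here purely for illustration; it is not part of the original notes), suppose $Y_1, \ldots, Y_n$ are i.i.d. $N(\theta, 1)$, so that up to an additive constant $\ell(\theta) = -\tfrac{1}{2}\sum_i (y_i - \theta)^2$ and the maximum likelihood estimate is the sample mean $\bar{y}$. The short Python sketch below checks this by maximising the log-likelihood numerically over a grid; the true value of $\theta$, the sample size and the grid are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    theta_true, n = 1.5, 200
    y = rng.normal(loc=theta_true, scale=1.0, size=n)   # i.i.d. N(theta, 1) sample

    # Log-likelihood of theta for i.i.d. N(theta, 1) data, up to an additive constant.
    def log_lik(theta):
        return -0.5 * np.sum((y - theta) ** 2)

    grid = np.linspace(0.0, 3.0, 10001)                 # crude grid search over theta
    theta_mle = grid[np.argmax([log_lik(t) for t in grid])]

    print(theta_mle, y.mean())   # the two values agree up to the grid spacing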

A very useful quantity in the context of maximum likelihood estimation is the Fisher information matrix with $jk$th ($1 \leq j, k \leq d$) entry
\[ i_{jk}(\theta) := -\mathbb{E}_\theta \left\{ \frac{\partial^2}{\partial \theta_j \, \partial \theta_k} \ell(\theta) \right\}. \]
It can be thought of as a measure of how hard it is to estimate $\theta$ when it is the true parameter value. The Cramér–Rao lower bound states that if $\tilde{\theta}$ is an unbiased estimator of $\theta$, then under regularity conditions,
\[ \mathrm{Var}_\theta(\tilde{\theta}) - i^{-1}(\theta) \]
is positive semi-definite.

A remarkable fact about maximum likelihood estimators (MLEs) is that (under quite general conditions) they are asymptotically normally distributed, asymptotically unbiased and asymptotically achieve the Cramér–Rao lower bound.

Assume that the Fisher information matrix when there are $n$ observations, $i^{(n)}(\theta)$ (where we have made the dependence on $n$ explicit), satisfies $i^{(n)}(\theta)/n \to I(\theta)$ for some positive definite matrix $I$. Then denoting the maximum likelihood estimator of $\theta$ when there are $n$ observations by $\hat{\theta}^{(n)}$, under regularity conditions, as the number of observations $n \to \infty$ we have
\[ \sqrt{n}(\hat{\theta}^{(n)} - \theta) \xrightarrow{d} N_d(0, I^{-1}(\theta)). \]

Returning to our linear model, if we assume in addition that $\varepsilon_i \sim N(0, \sigma^2)$, then the log-likelihood for $(\beta, \sigma^2)$ is
\[ \ell(\beta, \sigma^2) = -\frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - x_i^T \beta)^2. \]
We see that the maximum likelihood estimate of $\beta$ and OLS coincide. It is easy to check that
\[ i(\beta, \sigma^2) = \begin{pmatrix} \sigma^{-2} X^T X & 0 \\ 0 & n\sigma^{-4}/2 \end{pmatrix}. \]
The general theory for MLEs would suggest that approximately
\[ \sqrt{n}(\hat{\beta} - \beta) \sim N_p(0, n\sigma^2 (X^T X)^{-1}); \]
in fact it is straight-forward to show that this distributional result is exact.
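The exactness of this distributional result is easy to check empirically. Below is a minimal simulation sketch (added for illustration, not part of the notes; all constants are arbitrary): for a fixed Gaussian design it repeatedly draws new errors, recomputes $\hat{\beta}$ (which, under Gaussian errors, equals the OLS estimator from (0.0.2)), and compares the empirical mean and covariance of the estimates with $\beta^0$ and $\sigma^2 (X^T X)^{-1}$.

    import numpy as np

    rng = np.random.default_rng(1)
    n, p, sigma, reps = 50, 3, 2.0, 20000
    X = rng.normal(size=(n, p))             # fixed design matrix
    beta0 = np.array([1.0, -2.0, 0.5])      # true coefficient vector

    XtX_inv = np.linalg.inv(X.T @ X)
    estimates = np.empty((reps, p))
    for r in range(reps):
        Y = X @ beta0 + sigma * rng.normal(size=n)   # fresh Gaussian errors each replicate
        estimates[r] = XtX_inv @ X.T @ Y             # MLE of beta = OLS estimator

    print(estimates.mean(axis=0))           # close to beta0 (unbiasedness)
    print(np.cov(estimates, rowvar=False))  # close to the theoretical covariance below
    print(sigma**2 * XtX_inv)               # sigma^2 (X^T X)^{-1}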

Contents

1 Kernel machines
  1.1 Ridge regression
    1.1.1 The singular value decomposition and principal components analysis
  1.2 v-fold cross-validation
  1.3 The kernel trick
  1.4 Kernels
    1.4.1 Examples of kernels
    1.4.2 Reproducing kernel Hilbert spaces
    1.4.3 The representer theorem
  1.5 Kernel ridge regression
  1.6 Other kernel machines
    1.6.1 The support vector machine
    1.6.2 Logistic regression
  1.7 Large-scale kernel machines

2 The Lasso and beyond
  2.1 Model selection
  2.2 The Lasso estimator
    2.2.1 Prediction error of the Lasso with no assumptions on the design
    2.2.2 Basic concentration inequalities
    2.2.3 Some facts from optimisation theory and convex analysis
    2.2.4 Lasso solutions
    2.2.5 Variable selection
    2.2.6 Prediction and estimation
    2.2.7 Computation
  2.3 Extensions of the Lasso
    2.3.1 Structural penalties
    2.3.2 Reducing the bias of the Lasso

3 Graphical modelling and causal inference
  3.1 Graphs
  3.2 Conditional independence graphs
  3.3 Gaussian graphical models
    3.3.1 Normal conditionals
    3.3.2 Nodewise regression
    3.3.3 The precision matrix and conditional independence
    3.3.4 The Graphical Lasso
  3.4 Structural equation models
  3.5 Interventions
  3.6 The Markov properties on DAGs
  3.7 Causal structure learning
    3.7.1 Three obstacles
    3.7.2 The PC algorithm

4 Multiple testing and high-dimensional inference
  4.1 The closed testing procedure
  4.2 The False Discovery Rate
  4.3 Inference in high-dimensional regression
    4.3.1 Using the debiased Lasso in practice

Chapter 1

Kernel machines

Let us revisit the linear model with
\[ Y_i = x_i^T \beta^0 + \varepsilon_i. \]
For unbiased estimators of $\beta^0$, their variance gives a way of comparing their quality in terms of squared error loss. For a potentially biased estimator, $\tilde{\beta}$, the relevant quantity is
\[ \mathbb{E}_{\beta^0, \sigma^2}\{(\tilde{\beta} - \beta^0)(\tilde{\beta} - \beta^0)^T\} = \mathbb{E}[\{\tilde{\beta} - \mathbb{E}(\tilde{\beta}) + \mathbb{E}(\tilde{\beta}) - \beta^0\}\{\tilde{\beta} - \mathbb{E}(\tilde{\beta}) + \mathbb{E}(\tilde{\beta}) - \beta^0\}^T] \]
\[ = \mathrm{Var}(\tilde{\beta}) + \{\mathbb{E}(\tilde{\beta} - \beta^0)\}\{\mathbb{E}(\tilde{\beta} - \beta^0)\}^T, \]
a sum of squared bias and variance terms. A crucial part of the optimality arguments for OLS and MLEs was unbiasedness. Do there exist biased methods whose variance is reduced compared to OLS such that their overall prediction error is lower? Yes! In fact, the use of biased estimators is essential in dealing with settings where the number of parameters to be estimated is large compared to the number of observations. In the first two chapters we'll explore two important methods for variance reduction based on different forms of penalisation: rather than forming estimators via optimising a least squares or log-likelihood term, we will introduce an additional penalty term that encourages estimates to be shrunk towards 0 in some sense. This will allow us to produce reliable estimators that work well when classical MLEs are infeasible, and in other situations can greatly out-perform the classical approaches.

1.1 Ridge regression

One way to reduce the variance of $\hat{\beta}^{\mathrm{OLS}}$ is to shrink the estimated coefficients towards 0. Ridge regression [Hoerl and Kennard, 1970] does this by solving the following optimisation problem
\[ (\hat{\mu}^R_\lambda, \hat{\beta}^R_\lambda) = \arg\min_{(\mu, \beta) \in \mathbb{R} \times \mathbb{R}^p} \{\|Y - \mu 1 - X\beta\|_2^2 + \lambda \|\beta\|_2^2\}. \]
Here $1$ is an $n$-vector of 1's. We see that the usual OLS objective is penalised by an additional term proportional to $\|\beta\|_2^2$. The parameter $\lambda \geq 0$, which controls the severity of the penalty and therefore the degree of the shrinkage towards 0, is known as a regularisation parameter or tuning parameter. We have explicitly included an intercept term which is not penalised. The reason for this is that were the variables to have their origins shifted, so e.g. a variable representing temperature is given in units of Kelvin rather than Celsius, the fitted values would not change. However, $X\hat{\beta}$ is not invariant under scale transformations of the variables, so it is standard practice to centre each column of $X$ (hence making them orthogonal to the intercept term) and then scale them to have $\ell_2$-norm $\sqrt{n}$.

It is straightforward to show that after this standardisation of $X$, $\hat{\mu}^R_\lambda = \bar{Y} := \sum_{i=1}^n Y_i / n$, so we may assume that $\sum_{i=1}^n Y_i = 0$ by replacing $Y_i$ with $Y_i - \bar{Y}$, and then we can remove $\mu$ from our objective function. In this case
\[ \hat{\beta}^R_\lambda = (X^T X + \lambda I)^{-1} X^T Y. \]
In this form, we can see how the addition of the $\lambda I$ term helps to stabilise the estimator. Note that when $X$ does not have full column rank (such as in high-dimensional situations), we can still compute this estimator. On the other hand, when $X$ does have full column rank, we have the following theorem.

Theorem 1. For $\lambda > 0$ sufficiently small (depending on $\beta^0$ and $\sigma^2$),
\[ \mathbb{E}(\hat{\beta}^{\mathrm{OLS}} - \beta^0)(\hat{\beta}^{\mathrm{OLS}} - \beta^0)^T - \mathbb{E}(\hat{\beta}^R_\lambda - \beta^0)(\hat{\beta}^R_\lambda - \beta^0)^T \]
is positive definite.

Proof. First we compute the bias of $\hat{\beta}^R_\lambda$. We drop the subscript $\lambda$ and superscript $R$ for convenience.
\[ \mathbb{E}(\hat{\beta}) - \beta^0 = (X^T X + \lambda I)^{-1} X^T X \beta^0 - \beta^0 = (X^T X + \lambda I)^{-1} (X^T X + \lambda I - \lambda I)\beta^0 - \beta^0 = -\lambda (X^T X + \lambda I)^{-1} \beta^0. \]
Now we look at the variance of $\hat{\beta}$.
\[ \mathrm{Var}(\hat{\beta}) = \mathbb{E}\big[\{(X^T X + \lambda I)^{-1} X^T \varepsilon\}\{(X^T X + \lambda I)^{-1} X^T \varepsilon\}^T\big] = \sigma^2 (X^T X + \lambda I)^{-1} X^T X (X^T X + \lambda I)^{-1}. \]
Thus $\mathbb{E}(\hat{\beta}^{\mathrm{OLS}} - \beta^0)(\hat{\beta}^{\mathrm{OLS}} - \beta^0)^T - \mathbb{E}(\hat{\beta} - \beta^0)(\hat{\beta} - \beta^0)^T$ is equal to
\[ \sigma^2 (X^T X)^{-1} - \sigma^2 (X^T X + \lambda I)^{-1} X^T X (X^T X + \lambda I)^{-1} - \lambda^2 (X^T X + \lambda I)^{-1} \beta^0 \beta^{0T} (X^T X + \lambda I)^{-1}. \]
After some simplification, we see that this is equal to
\[ \lambda (X^T X + \lambda I)^{-1} \big[\sigma^2 \{2I + \lambda (X^T X)^{-1}\} - \lambda \beta^0 \beta^{0T}\big](X^T X + \lambda I)^{-1}. \]
Thus $\mathbb{E}(\hat{\beta}^{\mathrm{OLS}} - \beta^0)(\hat{\beta}^{\mathrm{OLS}} - \beta^0)^T - \mathbb{E}(\hat{\beta} - \beta^0)(\hat{\beta} - \beta^0)^T$ is positive definite for $\lambda > 0$ if and only if
\[ \sigma^2 \{2I + \lambda (X^T X)^{-1}\} - \lambda \beta^0 \beta^{0T} \]
is positive definite, which is true for $\lambda > 0$ sufficiently small (we can take $0 < \lambda < 2\sigma^2 / \|\beta^0\|_2^2$).
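To see Theorem 1 in action, here is a small Monte Carlo sketch (added for illustration and not part of the notes; the values of $n$, $p$, $\sigma$ and $\beta^0$ are arbitrary). It uses the centred, intercept-free form of the ridge estimator above with a value of $\lambda$ inside the range $0 < \lambda < 2\sigma^2/\|\beta^0\|_2^2$ identified in the proof, so the ridge estimator should show a smaller mean squared error than OLS.

    import numpy as np

    rng = np.random.default_rng(2)
    n, p, sigma, reps = 50, 10, 3.0, 5000
    X = rng.normal(size=(n, p))                # full column rank design
    beta0 = rng.normal(size=p)
    lam = sigma**2 / np.sum(beta0**2)          # inside (0, 2*sigma^2 / ||beta0||_2^2)

    err_ols, err_ridge = 0.0, 0.0
    for r in range(reps):
        Y = X @ beta0 + sigma * rng.normal(size=n)
        b_ols = np.linalg.solve(X.T @ X, X.T @ Y)
        b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)
        err_ols += np.sum((b_ols - beta0) ** 2) / reps
        err_ridge += np.sum((b_ridge - beta0) ** 2) / reps

    # Theorem 1 implies E||b_ridge - beta0||^2 < E||b_ols - beta0||^2 for this lambda.
    print("OLS MSE:  ", err_ols)
    print("ridge MSE:", err_ridge)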

The theorem says that $\hat{\beta}^R_\lambda$ outperforms $\hat{\beta}^{\mathrm{OLS}}$ provided $\lambda$ is chosen appropriately. To be able to use ridge regression in practice, we therefore need a way of choosing a suitable value of the regularisation parameter $\lambda$.
