
STAT260 Mean Field Asymptotics in Statistical Learning    Lecture 1 - 01/20/2021

Lecture 1: Introduction to the mean field asymptotics

Lecturer: Song Mei    Scribe: Kumar Krishna Agrawal    Proof reader: Tae Joo Ahn

In this course, we study the computational and statistical aspects of statistical models in the high-dimensional asymptotic limit (the mean-field asymptotics). We will introduce heuristic tools from physics, including the replica method and the cavity method. These tools can be made rigorous using approaches including the Gaussian comparison inequality, the leave-one-out analysis, and approximate message passing algorithms. Applications of these methods include the spiked matrix model, the LASSO problem, and the double-descent phenomenon.

1 Motivating example: The LASSO problem

We will get a flavor of the difference between the non-asymptotic theory and the asymptotic theory using the example of LASSO.

Let $x_0 \in \mathbb{R}^d$, $A \in \mathbb{R}^{n \times d}$, $w \in \mathbb{R}^n$, and $y = Ax_0 + w \in \mathbb{R}^n$. We consider the case $d \gg n$, but hope that $x_0$ is sparse in some sense (e.g., $x_0$ is $k$-sparse if $x_0$ has $k$ non-zero elements). To recover $x_0$ given $A$ and $y$, we solve the following LASSO problem
$$\hat x = \arg\min_x \frac{1}{2n}\|y - Ax\|_2^2 + \frac{\lambda}{n}\|x\|_1. \tag{1}$$
Figure 1 illustrates the loss landscape of linear regression with mean-squared error, where LASSO encourages solutions within some $\ell_1$ level set. Our objective is to quantify/bound the normalized mean squared error $\|\hat x - x_0\|_2^2 / \|x_0\|_2^2$. Note that different papers use different normalizations of the LASSO problem; the normalization used here keeps the presentation simple. When you read a paper on LASSO, you should first look at its normalization and then interpret the results.
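To make the setup concrete, here is a minimal simulation sketch of problem (1), not from the notes. It uses scikit-learn's Lasso, whose objective $(1/2n)\|y - Aw\|_2^2 + \alpha\|w\|_1$ matches our normalization with $\alpha = \lambda/n$; the dimensions, sparsity, and noise level are illustrative choices, and $\lambda$ is set to the order $\sigma\sqrt{n \log d}$ that appears in Corollary 4 below.

```python
# Minimal simulation sketch of the LASSO problem (1); all sizes are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, k, sigma = 500, 2000, 25, 0.5

A = rng.standard_normal((n, d))
x0 = np.zeros(d)
x0[:k] = rng.standard_normal(k)                 # k-sparse ground truth
y = A @ x0 + sigma * rng.standard_normal(n)

lam = 2.0 * sigma * np.sqrt(n * np.log(d))      # order sigma * sqrt(n log d)
# sklearn's objective (1/(2n))||y - Aw||^2 + alpha ||w||_1 equals (1) with alpha = lam / n
x_hat = Lasso(alpha=lam / n, fit_intercept=False, max_iter=100_000).fit(A, y).coef_
print("normalized MSE:", np.sum((x_hat - x0) ** 2) / np.sum(x0 ** 2))
```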

1.1 Non-asymptotic theory of LASSO

A line of papers studied the LASSO risk in the non-asymptotic regime. The following result is due to [NRWY12]. Theorem 2 is a fully deterministic statement: the result is satisfied by any deterministic $A$, $x_0$, $w$, and $y$.

Definition 1 (Restricted strong convexity). We say a matrix $A \in \mathbb{R}^{n \times d}$ satisfies the restricted strong convexity (RSC) property if there exist universal constants $c_1$ and $c_2$ such that for any $v \in \mathbb{R}^d$, we have
$$\frac{\|Av\|_2^2}{n} \ge c_1 \|v\|_2^2 - c_2 \frac{\log d}{n} \|v\|_1^2. \tag{2}$$

Why is this property called restricted strong convexity? If we define $f(x) = (1/2n)\|y - Ax\|_2^2$, the strong convexity property says that $\nabla^2 f(x) \succeq c_1 I_d$, so that for any direction $v$, we have
$$\frac{\|Av\|_2^2}{n} \ge c_1 \|v\|_2^2.$$
Restricted strong convexity simply says that $f$ is strongly convex in the direction $v$ whenever $\|v\|_1$ is small.

For a sensing matrix $A$ that satisfies the RSC property, we have the following control of the LASSO risk.

Theorem 2 ([NRWY12]). For any $A \in \mathbb{R}^{n \times d}$ satisfying the RSC property (2) with constants $c_1$ and $c_2$, there exists a universal constant $c$ (depending only on $c_1, c_2$) such that, as long as $\lambda \ge 2\|A^{\mathsf T} w\|_\infty$, for any $x_0 \in \mathbb{R}^d$ and $S \subseteq [d]$ with $|S| \le n/(c \log d)$, the LASSO estimator (1) satisfies
$$\|\hat x - x_0\|_2^2 \le \frac{c\lambda^2 |S|}{n^2} + \frac{c\lambda}{n}\|x_{0,S^c}\|_1 + \frac{c\log d}{n}\|x_{0,S^c}\|_1^2.$$

Theorem 2 does not tell us whether there exists a matrix that satisfies the RSC property. The following proposition tells us that, for a Gaussian random matrix $A$, the RSC property holds with high probability.

Proposition 3. For $A \in \mathbb{R}^{n \times d}$ with $A_{ij} \sim_{iid} N(0, 1)$, Eq. (2) is satisfied for some constants $c_1$ and $c_2$ with high probability as $n \to \infty$.

In the following, we make simpler assumptions to understand Theorem 2.

Corollary 4. Let $A \in \mathbb{R}^{n \times d}$ with $A_{ij} \sim_{iid} N(0, 1/\|x_0\|_2^2)$. Let $x_0 \in \mathbb{R}^d$ be $k$-sparse with the support of $x_0$ given by $S$. Let $w$ be $\sigma^2$-sub-Gaussian. Then for any $\delta > 0$, there exists a constant $C(\delta)$ such that, as long as we take $n \ge C(\delta) k \log d$ and $\lambda = C(\delta) \cdot \sigma \sqrt{n \log d}$, with probability at least $1 - \delta$ the LASSO estimator (1) satisfies
$$\frac{\|\hat x - x_0\|_2^2}{\|x_0\|_2^2} \le \frac{C(\delta)\sigma^2 k \log d}{n}.$$

The corollary tells us that, to well-estimate a $k$-sparse ground truth vector, it is enough to have sample size $n \gtrsim k \log d$.

Remark 5. In the non-asymptotic setting, everything is explicit, i.e., there are no limiting statements. Additionally, the assumptions on the distribution of $x_0$ are quite weak.

1.2 High dimensional asymptotics of LASSO

Note that the non-asymptotic theory of LASSO does not allow us to consider the proportional regime $n \asymp k \asymp d$. In many cases, however, this proportional regime is very interesting. It would be desirable to establish a theory that characterizes the performance of LASSO in this regime.

Theorem 6 ([BM11]). Consider the asymptotic limit where $n/d \to \delta \in (0, \infty)$ as $d \to \infty$. Let $A \in \mathbb{R}^{n \times d}$ with $A_{ij} \sim_{iid} N(0, 1/n)$. Let $x_0 \in \mathbb{R}^d$ with $x_{0,i} \sim_{iid} P_0$. Let $w \sim N(0, \sigma^2 I_n)$. Let $\hat x$ be the LASSO estimator (1). Then we have
$$\lim_{d,n \to \infty} \frac{1}{d}\|\hat x - x_0\|_2^2 = E_{(X_0, Z) \sim P_0 \times N(0,1)}\big[(\eta(X_0 + \tau_\star Z; \alpha_\star \tau_\star) - X_0)^2\big],$$
where $\eta(x; \theta) = \mathrm{sign}(x)\cdot(|x| - \theta)_+$ is the soft thresholding function and $\tau_\star = \tau_\star(\alpha_\star)$. Here we denote by $\tau_\star(\alpha)$ the function such that, for fixed $\alpha$, $\tau_\star(\alpha)$ is the largest solution of
$$\tau^2 = \sigma^2 + \delta^{-1} E_{(X_0, Z) \sim P_0 \times N(0,1)}\big\{[\eta(X_0 + \tau Z; \alpha\tau) - X_0]^2\big\},$$
and we denote by $\alpha_\star$ the unique non-negative solution of
$$\lambda = \alpha\,\tau_\star(\alpha) \cdot \Big[1 - \delta^{-1} E[\eta'(X_0 + \tau_\star(\alpha) Z; \alpha\,\tau_\star(\alpha))]\Big].$$
Moreover, for any Lipschitz function $\psi$, we have almost surely
$$\lim_{d,n \to \infty} \frac{1}{d}\sum_{i=1}^d \psi(\hat x_i, x_{0,i}) = E_{(X_0, Z) \sim P_0 \times N(0,1)}\big[\psi(\eta(X_0 + \tau_\star Z; \alpha_\star \tau_\star), X_0)\big].$$
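The two scalar equations in Theorem 6 can be solved numerically. The sketch below is a Monte Carlo approximation under illustrative choices (a three-point prior $P_0$, noise level $\sigma$, aspect ratio $\delta$, and a bisection step that relies on the unique non-negative crossing asserted in the theorem); it computes $\tau_\star$ and $\alpha_\star$ for a given $\lambda$ and evaluates the predicted limiting risk.

```python
# Numerical sketch of the state evolution in Theorem 6 (Monte Carlo).
# Illustrative prior: P0 = (1-eps) delta_0 + (eps/2) delta_{+1} + (eps/2) delta_{-1}.
import numpy as np

rng = np.random.default_rng(0)
eps, delta, sigma, lam = 0.1, 0.8, 0.3, 0.5

# Samples of (X0, Z) ~ P0 x N(0, 1)
n_mc = 200_000
x0 = rng.choice([0.0, 1.0, -1.0], p=[1 - eps, eps / 2, eps / 2], size=n_mc)
z = rng.standard_normal(n_mc)

def eta(x, theta):
    """Soft thresholding eta(x; theta) = sign(x) (|x| - theta)_+."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def tau_star(alpha):
    """Largest solution of tau^2 = sigma^2 + (1/delta) E[(eta(X0 + tau Z; alpha tau) - X0)^2],
    found by iterating the fixed-point map from a large initial tau."""
    tau = 10.0
    for _ in range(500):
        new = np.sqrt(sigma**2 + np.mean((eta(x0 + tau * z, alpha * tau) - x0) ** 2) / delta)
        if abs(new - tau) < 1e-10:
            break
        tau = new
    return tau

def lam_of_alpha(alpha):
    """Calibration: lambda = alpha tau_*(alpha) [1 - (1/delta) E eta'(X0 + tau_* Z; alpha tau_*)]."""
    tau = tau_star(alpha)
    if not np.isfinite(tau):                # state evolution diverged: push bisection upward
        return -np.inf
    active = np.mean(np.abs(x0 + tau * z) > alpha * tau)  # E[eta'] = P(|X0 + tau Z| > alpha tau)
    return alpha * tau * (1.0 - active / delta)

# Bisection for alpha_*, using the uniqueness of the non-negative solution.
lo, hi = 1e-3, 20.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if lam_of_alpha(mid) < lam else (lo, mid)
alpha_s = 0.5 * (lo + hi)
tau_s = tau_star(alpha_s)
risk = np.mean((eta(x0 + tau_s * z, alpha_s * tau_s) - x0) ** 2)
print(f"alpha_* = {alpha_s:.3f}, tau_* = {tau_s:.3f}, predicted limiting risk = {risk:.4f}")
```

Sweeping $\lambda$ over a grid with this routine traces out the U-shaped limiting risk curve of Figure 2 discussed below.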

Remark 7. The asymptotic error of the high-dimensional LASSO estimator is equivalent to
$$E_{\hat X, X_0}[(\hat X - X_0)^2],$$
where $(\hat X, X_0)$ follows the distribution
$$(X_0, Z) \sim P_0 \times N(0, 1), \qquad Y = X_0 + \tau_\star Z,$$
$$\hat X = \arg\min_v \Big\{\tfrac{1}{2}(Y - v)^2 + \tau_\star \alpha_\star |v|\Big\} = \eta(Y; \tau_\star \alpha_\star).$$
This can be interpreted as a one-dimensional LASSO problem.
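As a quick sanity check of Remark 7 (an illustrative sketch; the value standing in for $\theta = \tau_\star \alpha_\star$ and the grid are arbitrary), a grid search confirms that the minimizer of the scalar objective $\frac{1}{2}(Y - v)^2 + \theta|v|$ is exactly the soft threshold $\eta(Y; \theta)$:

```python
# Grid-search check that argmin_v (1/2)(Y - v)^2 + theta |v| = eta(Y; theta).
import numpy as np

def eta(x, theta):
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

theta = 0.7                                  # stands in for tau_* alpha_* (illustrative)
v = np.linspace(-5.0, 5.0, 400_001)          # fine grid of candidate minimizers
for y in np.linspace(-3.0, 3.0, 13):
    obj = 0.5 * (y - v) ** 2 + theta * np.abs(v)
    assert abs(v[np.argmin(obj)] - eta(y, theta)) < 1e-3
print("grid argmin matches soft thresholding at every tested Y")
```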

We can plot the limiting risk against the regularization parameter $\lambda$, as shown in Figure 2. This curve gives the precise U-shaped bias-variance tradeoff of the LASSO estimator. Note that this U-shaped curve cannot be completely captured by the non-asymptotic theory, since the non-asymptotic theory does not give lower and upper bounds that match up to $1 + o(1)$. The sharp characterization of the risk is an advantage of the high-dimensional asymptotic theory.

Figure 2: The risk of the LASSO estimator.

Figure 1: LASSO regularizer encourages sparsity.

1.3 Comparison of non-asymptotic theory and high dimensional asymptotics

Here we present a table that compares the non-asymptotic theory with the asymptotic theory.

Non-asymptotic theory:
- Typical regime: (relatively) strong signal-to-noise ratio ($n \gtrsim k \log d$).
- Advantages: fewer model assumptions; results hold for any finite parameter size.
- Limitations: a gap between upper and lower bounds, up to constant or logarithmic factors.
- When useful? To characterize the behavior of a model or an algorithm under general assumptions.
- Examples: statistical learning theory (bounding excess risk by uniform convergence); analyzing the non-convex landscape of empirical risk minimization.

High dimensional asymptotics:
- Typical regime: constant signal-to-noise ratio ($n \asymp d \asymp k$).
- Advantages: precise asymptotic formula; upper and lower bounds match sharply.
- Limitations: more detailed model assumptions; (sometimes) hard to control how large the parameters should be for the asymptotic regime to kick in.
- When useful? To identify the exact location of phase transitions.
- Examples: the phase transition phenomenon in compressed sensing; understanding the double descent phenomenon; the optimal loss function in machine learning.

2 Mean-field theory and statistical physics

2.1 The mean field theory

The following definition of mean field theory is adapted from Wikipedia.

In physics and probability theory, mean-field theory studies the behavior of high-dimensional random (stochastic) models by studying a simpler model that approximates the original by averaging over degrees of freedom.

In our example, the LASSO problem is a high-dimensional random model, while the one-dimensional model in Remark 7 is the simpler model that approximates the original one.

2.2 Methods from statistical physics

The focus of this course is to analyze statistical models through the high-dimensional asymptotic viewpoint. In many cases, we are interested in deriving the asymptotic formula rather than proving the formula rigorously, and statistical physics tools can be used to predict these formulas. The predicted formulas can then be verified through experiments. While some predictions have been made rigorous in some way, typically proving these formulas is much more complicated than deriving them.

Figure 3 sketches some of the connections between statistical physics and statistical learning. In this course, we will introduce the "replica method", introduced by physicists as early as the 1970s. We will show how it can be used to predict the behaviors of statistical models and algorithms in the asymptotic limit. Simple models will be used as examples in class: the spiked GOE matrix and the LASSO problem. We will revisit these models several times. We will first show how the replica method can be used to predict the behavior of these models. Then we will show how these predictions can be proved using rigorous tools. These rigorous tools include the Gaussian comparison theorem, the Stieltjes transform, and approximate message passing (AMP) algorithms.

3 Level of rigor of this course

In this course, we will sometimes adopt a physics level of rigor and sometimes a mathematics level of rigor. We will not get involved in measure-theoretic issues; that being said, we will assume every function is measurable and, most of the time, integrable. Sometimes we will assume differentiability, assume exchange of limits, and assume exchange of limits and differentiation. We will point out these heuristic steps when we use them.

Figure 3: Tools developed in statistical physics with applications to statistical learning. (The diagram links Gibbs measures of spin glass models, e.g. the Sherrington-Kirkpatrick model, to heuristic tools from the 1980s-1990s (the replica method, cavity method, TAP approach, Kac-Rice formula, and dynamical mean-field theory), to rigorous tools (CGMT, AMP, and leave-one-out), and to applications in optimization, Bayesian inference, coding theory, random combinatorial optimization, and statistical learning.)

The reason why we do not adopt a fully rigorous approach is that it can take a long time to explain every detail in checking these exchange-of-limits assumptions, which may make the audience lose the intuition and the main idea.

References

[BM11] Mohsen Bayati and Andrea Montanari, The LASSO risk for Gaussian matrices, IEEE Transactions on Information Theory 58 (2011), no. 4, 1997–2017.

[NRWY12] Sahand N. Negahban, Pradeep Ravikumar, Martin J. Wainwright, and Bin Yu, A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers, Statistical Science 27 (2012), no. 4, 538–557.

