
Introduction to Monte Carlo
Astro 542, Princeton University
Shirley Ho

Agenda
- Monte Carlo -- definition, examples
- Sampling methods (rejection, Metropolis, Metropolis-Hastings, exact sampling)
- Markov chains -- definition, examples
- Stationary distribution
- Markov chain Monte Carlo -- definition and examples

Monte Carlo -- a bit of history
Credit for inventing the Monte Carlo method often goes to Stanislaw Ulam, a Polish-born mathematician who worked for John von Neumann on the United States Manhattan Project during World War II. Ulam is primarily known for designing the hydrogen bomb with Edward Teller in 1951. He invented the Monte Carlo method in 1946 while pondering the probabilities of winning a card game of solitaire. (Rumor: that is why it is called Monte Carlo, after the city of Monte Carlo in Monaco, where a lot of gambling goes on.)

Monte Carlo method
Consider the game of solitaire: what is the chance of winning with a properly shuffled deck? This is hard to compute analytically, because winning or losing depends on a complex procedure of reorganizing cards.
Insight: why not just play a few hands, and see empirically how many do in fact win? (In the figure: five hands -- lose, lose, win, lose, lose -- so the estimated chance of winning is 1 in 5!)
More generally, can we approximate a probability density function using only samples from that density?

Monte Carlo principle
Given a very large set X and a distribution p(x) over it, we draw an i.i.d. (independent and identically distributed) set of N samples x^(1), ..., x^(N). We can then approximate the distribution using these samples:

    p_N(x) = (1/N) Σ_{i=1}^N 1(x^(i) = x) → p(x)  as N → ∞

Monte Carlo principle
We can also use these samples to compute expectations:

    E_N(f) = (1/N) Σ_{i=1}^N f(x^(i)) → E(f) = Σ_x f(x) p(x)  as N → ∞
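The estimator above is a one-liner in practice. A minimal sketch (the helper name `mc_expectation` is illustrative, not from the slides):

```python
import random

def mc_expectation(f, sampler, n):
    """Approximate E[f(X)] by the average of f over n i.i.d. samples."""
    return sum(f(sampler()) for _ in range(n)) / n

random.seed(0)
# E[X^2] for X uniform on [0, 1] is exactly 1/3; the estimate should be close.
est = mc_expectation(lambda x: x * x, random.random, 100_000)
```

The error of such an estimate shrinks like 1/sqrt(N), independently of dimension, which is the whole appeal of the method.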

Examples: Buffon's needle
More than 200 years before Metropolis coined the name Monte Carlo method, Georges-Louis Leclerc, Comte de Buffon, proposed the following problem. If a needle of length l is dropped at random on the middle of a horizontal surface ruled with parallel lines a distance d ≥ l apart, what is the probability that the needle will cross one of the lines? (In the figure, the lines are marked in green.) That outcome will occur only if A ≤ l sin(θ), where A is the distance from the needle to the nearest line and θ is the needle's angle.

Buffon's needle, continued
The positioning of the needle relative to nearby lines can be described by a random vector with components A and θ. The random vector (A, θ) is uniformly distributed on the region [0, d) × [0, π); accordingly, it has probability density function 1/(dπ). The probability that the needle will cross one of the lines is then given by the integral

    P(cross) = ∫_0^π (l sin θ)/(dπ) dθ = 2l/(πd)

Suppose Buffon's experiment is performed with the needle being dropped n times. Let M be the random variable for the number of times the needle crosses a line; then the probability of crossing is estimated by E(M)/n. Thus:

    π ≈ 2 l n / (E(M) d)
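The estimate of π above can be simulated directly. A sketch, assuming the common center-of-needle convention (distance from the needle's centre to the nearest line, crossing when a ≤ (l/2) sin θ), which gives the same 2l/(πd) crossing probability; the function name `buffon_pi` is illustrative:

```python
import math
import random

def buffon_pi(n, l=1.0, d=2.0):
    """Estimate pi by dropping n needles of length l on lines spaced d apart (l <= d)."""
    crossings = 0
    for _ in range(n):
        a = random.uniform(0.0, d / 2)         # distance from needle centre to nearest line
        theta = random.uniform(0.0, math.pi)   # needle angle
        if a <= (l / 2) * math.sin(theta):     # the needle crosses a line
            crossings += 1
    # P(cross) = 2l / (pi d)  =>  pi ~ 2 l n / (d * crossings)
    return 2 * l * n / (d * crossings)

random.seed(1)
pi_est = buffon_pi(200_000)
```

With 200,000 drops the estimate lands within a few hundredths of π, consistent with the 1/sqrt(n) convergence of Monte Carlo.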

Applications of Monte Carlo
Example 1: to understand the behavior of electrons in a semiconductor material, we need to solve the Boltzmann transport equation. To do this, we need to integrate some complicated functions, and that is where Monte Carlo methods come in. But before doing the hard stuff, let's watch the outcome of using the Monte Carlo method to understand the electrons in a pure silicon crystal at 300 K.

How did we integrate using the Monte Carlo method, then?
Pairs of random numbers can be transformed into coordinates uniformly distributed within a box enclosing the function. The fraction of coordinates that falls below the function, multiplied by the area of the limiting box, gives the value of the integral. The accuracy of the solution depends on the number of random numbers used: the exact solution will be found within some interval around the result obtained by the Monte Carlo method, and for an infinite number of coordinates the solution is exact.
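The hit-or-miss scheme just described can be sketched in a few lines; `hit_or_miss` is a hypothetical helper name, and the bound `f_max` is assumed to be supplied by the caller:

```python
import random

def hit_or_miss(f, a, b, f_max, n):
    """Integrate f on [a, b] by throwing n uniform points into the box
    [a, b] x [0, f_max] and counting the fraction that lands under the curve."""
    hits = 0
    for _ in range(n):
        x = random.uniform(a, b)
        y = random.uniform(0.0, f_max)
        if y < f(x):
            hits += 1
    return (hits / n) * (b - a) * f_max   # fraction under curve x box area

random.seed(2)
# Integral of x^2 on [0, 1] is exactly 1/3.
est = hit_or_miss(lambda x: x * x, 0.0, 1.0, 1.0, 100_000)
```

Note the box must contain the whole graph of f; a loose `f_max` still gives a correct answer, just with more variance.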

Sampling methods
Here we will talk about sampling methods: rejection, Metropolis, and exact sampling. Why do we need to know about sampling? Correct samples from P(x) will by definition tend to come from places in x-space where P(x) is big. How can we identify those places where P(x) is big, without evaluating P(x) everywhere (which takes a lot of time, especially in higher-dimensional systems)?

Rejection sampling
We would like to sample from p(x), but it is easier to sample from a proposal distribution q(x). A proposal distribution is a simpler distribution that we can sample from; q(x) must satisfy p(x) ≤ M q(x) for some constant M.
Procedure:
– Sample x^(i) from q(x)
– Accept with probability p(x^(i)) / (M q(x^(i)))
– Reject otherwise
The accepted x^(i) are samples from p(x)!
Problem: it works well only if p(x) and q(x) are similar, and we have no easy (and fast) way to make sure that these two distributions are similar. If M is too large, we will rarely accept samples. In a high-dimensional space, far too many proposals get rejected.
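The procedure above can be sketched directly. A minimal example (the helper `rejection_sample` is illustrative, not from the slides): target p(x) = 2x on [0, 1], a uniform proposal q(x) = 1, and M = 2 so that p(x) ≤ M q(x) everywhere:

```python
import random

def rejection_sample(p, q_sample, q_pdf, M):
    """Draw one sample from p via proposal q, requiring p(x) <= M * q(x)."""
    while True:
        x = q_sample()
        # Accept with probability p(x) / (M q(x)); otherwise propose again.
        if random.random() < p(x) / (M * q_pdf(x)):
            return x

random.seed(3)
samples = [rejection_sample(lambda x: 2 * x, random.random, lambda x: 1.0, 2.0)
           for _ in range(50_000)]
mean = sum(samples) / len(samples)   # E[X] = 2/3 under p(x) = 2x
```

Here half the proposals are rejected on average (acceptance rate 1/M); in high dimension M typically grows exponentially, which is exactly the problem the slide points out.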

Transition
Since we will need to understand state diagrams and transitions between states to talk about the following sampling methods (Metropolis, Metropolis-Hastings, and exact sampling), I will switch gears here and introduce Markov chains first, before we come back to the sampling methods.

Markov chains
Consider a three-state chain (states x1, x2, x3) with transition probability matrix

    T = [ 0.7  0.3  0
          0.3  0.4  0.3
          0    0.3  0.7 ]

(e.g. from x1 the chain stays at x1 with probability 0.7 and moves to x2 with probability 0.3).
A Markov chain on a space X with transitions T is a random process (an infinite sequence of random variables) x^(0), x^(1), ..., x^(t), ... in X that satisfies

    p(x^(t) | x^(t-1), ..., x^(1)) = T(x^(t-1), x^(t))

where T is the transition probability matrix. That is, the probability of being in a particular state at time t, given the state history, depends only on the state at time t-1 -- the chain is memoryless.

Markov chains for sampling
In order for a Markov chain to be useful for sampling from p(x), we require that for any starting state x^(1),

    p_t(x) → p(x)  as t → ∞

Stationary distribution π: for any pair of states i, j,

    π_i T_ij = π_j T_ji

Equivalently, the stationary distribution of the Markov chain must be p(x):

    (p T)(x) = p(x)

If this is the case, we can start in an arbitrary state, use the Markov chain to do a random walk for a while, then stop and output the current state x^(t). The resulting state will be a sample from p(x)!

Stationary distribution example
Consider the Markov chain given above, with transition matrix

    T = [ 0.7  0.3  0
          0.3  0.4  0.3
          0    0.3  0.7 ]

The stationary distribution is (0.33, 0.33, 0.33), since

    (0.33  0.33  0.33) T = (0.33  0.33  0.33)

A sample path of the chain: 1, 1, 1, 2, 2, 2, 3, 3, 3, 2, 2, 2, ...
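The stationary distribution above can be verified numerically by iterating the distribution update pi ← pi T until it stops changing; the helper `step_distribution` is an illustrative name, not from the slides:

```python
def step_distribution(pi, T):
    """One application of the transition matrix: (pi T)_j = sum_i pi_i * T[i][j]."""
    n = len(T)
    return [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]

T = [[0.7, 0.3, 0.0],
     [0.3, 0.4, 0.3],
     [0.0, 0.3, 0.7]]

pi = [1.0, 0.0, 0.0]          # start deterministically in state 1
for _ in range(200):          # iterate pi <- pi T toward the stationary distribution
    pi = step_distribution(pi, T)
```

Starting from any initial distribution, `pi` converges to (1/3, 1/3, 1/3), matching the slide's (0.33, 0.33, 0.33).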

Markov chains and sampling
Claim: to ensure that the stationary distribution of the Markov chain is p(x), it is sufficient for p and T to satisfy the detailed balance (reversibility) condition

    p(x) T(x, y) = p(y) T(y, x)

Proof: for all y we have

    (p T)(y) = Σ_x p(x) T(x, y) = Σ_x p(y) T(y, x) = p(y)

so p is a stationary distribution. Once we know that p is the stationary distribution, we can take samples from the stationary distribution, and they will reflect p(x) if we construct the Markov chain correctly.
Recall that we want to integrate some difficult functions efficiently using Monte Carlo integration, but we do not want to sample in regions where the probability of acceptance is low. With Markov chains, we can sample more efficiently!

Metropolis algorithm
Suppose our distribution p(x) is easy to compute up to a normalization constant, but hard to compute exactly and hard to sample from directly.
– We tried using rejection sampling to sample p(x), but in a high-dimensional space too many samples are rejected -- bad.
– So we can use a Markov chain constructed so that when we sample, we stay closely around where p(x) is high; then most of our samples will be accepted (when you sample from the Markov chain).
– How do we do that? We define a Markov chain with the following process:
– Sample a candidate point x* from a proposal distribution q(x* | x^(t)) which is symmetric: q(x | y) = q(y | x)
– Compute the ratio

    r = p(x*) / p(x^(t))

– With probability min(r, 1), transition to x*; otherwise stay in the same state.
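The steps above can be sketched as a random-walk Metropolis sampler. The symmetric uniform proposal and the Gaussian target below are illustrative choices, not from the slides, and `metropolis` is a hypothetical helper name; working with log p avoids underflow:

```python
import math
import random

def metropolis(log_p, x0, step, n_steps):
    """Random-walk Metropolis with a symmetric uniform proposal on [-step, step]."""
    x, chain = x0, []
    for _ in range(n_steps):
        x_star = x + random.uniform(-step, step)   # symmetric proposal
        r = math.exp(log_p(x_star) - log_p(x))     # r = p(x*) / p(x^(t))
        if random.random() < min(r, 1.0):
            x = x_star                             # accept the move
        chain.append(x)                            # otherwise stay at x
    return chain

# Target: unnormalized standard normal, log p(x) = -x^2 / 2 (Z is never needed).
random.seed(4)
chain = metropolis(lambda x: -x * x / 2, 0.0, 1.0, 200_000)
mean = sum(chain) / len(chain)
var = sum((x - mean) ** 2 for x in chain) / len(chain)
```

Note that only the ratio p(x*)/p(x^(t)) is ever used, so the normalization constant cancels -- exactly the property the slide highlights.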

How does Metropolis work?
Why does the Metropolis algorithm work?
– The proposal distribution can propose anything it likes (as long as it can jump back with the same probability).
– A proposal is always accepted if it jumps to a more likely state (acceptance probability r = 1.0).
– A proposal that jumps to a less likely state is accepted with probability r = p(x*)/p(x^(t)).
The acceptance policy, combined with the reversibility of the proposal distribution, makes sure that the algorithm explores states in proportion to p(x)!

Detailed balance
Looking at the Metropolis algorithm, and assuming p(y) ≤ p(x) without loss of generality:

    p(x) T(x, y) = p(x) q(y | x) [p(y)/p(x)] = p(y) q(y | x) = p(y) q(x | y) = p(y) T(y, x)

(the second step uses the acceptance probability p(y)/p(x); the third uses the symmetry of q; the last holds because a move from the less likely y to the more likely x is always accepted, so T(y, x) = q(x | y)). Detailed balance!

Other sampling methods
Metropolis-Hastings: we do not require the symmetric proposal distribution (q(x | y) = q(y | x)) that Metropolis needs, and instead use

    r = [p(x*) q(x^(t) | x*)] / [p(x^(t)) q(x* | x^(t))]

as the ratio that determines whether we accept or not.
Gibbs sampling: a special case of Metropolis-Hastings in which we use the conditional P(x_j | x_{i≠j}) instead.

Practicalities
Can we predict how long a Markov chain Monte Carlo simulation will take to equilibrate (to reach the stationary distribution)?
– By considering the random walks involved in an MCMC simulation, we can obtain simple lower bounds on the time required for convergence: if the length scale of the state space is L (set by the curvature of the pdf) and the step size is s, then you will need roughly T ≈ (L/s)^2 steps before the chain equilibrates.
– But predicting this time more precisely is a difficult problem.
Exact sampling offers a solution to this problem, and we will talk about it later.

More practicalities
Can we diagnose or detect convergence in an MCMC run? Very difficult.
Can we speed up the convergence?
– Hamiltonian Monte Carlo
– Overrelaxation
– Simulated annealing

How do we know the Markov chains have reached the equilibrated state? -- Exact sampling
For some Markov chains, we will know for sure that they have reached the stationary state by using exact sampling. Three big ideas:
1) When Markov chains (driven by the same random number seed) meet, the chains will not separate anymore.

Exact sampling
2) Coupling from the past: let the Markov chains run and then stop at time t = 0. If not all the chains have met, restart the simulation at t = -T using the same random number seeds (so that the same situation is modeled).
3) For some Markov chains that never cross each other, we only need to track the maximal and the minimal states and look for the time when they meet.

Exact sampling: Ising model
Exact sampling is very useful, especially for spins. Spins can be, for example, electrons with a simple +1/2 or -1/2 spin. Some spin systems have a maximal and a minimal state, so we can feasibly use exact sampling.
Example: a ferromagnetic system (4 spins only). One can order the states from the all-up state down to the all-down state. One can evolve the spin chain to the final equilibrated state in a way such that we only consider the maximal and minimal states, and then wait until the Markov chains started from both of these states converge; then we know that we will get exact samples from the equilibrated spin chain.

MCMC in action: WMAP!
WMAP used Markov chain Monte Carlo to get the cosmological parameters with their confidence levels. To get the confidence level of a parameter a_i, one has to marginalize over all the other parameters in the parameter space -- thus doing an integration (recall how we did Monte Carlo integration!). We need the area underneath a certain unknown function, so we sample the space around the curve (in the rectangular box), and in this case we use a Markov chain (produced using the Metropolis algorithm) Monte Carlo to get the samples.

Markov chain Monte Carlo analysis of WMAP
1) Start with a set of cosmological parameters {a_1} and compute the likelihood L_1 = L({a_1} | Ĉ_l), where Ĉ_l denotes the best estimator of the C_l of the sky.
2) Take a random step in parameter space to obtain a new set of cosmological parameters {a_2}. The probability distribution of the step is taken to be Gaussian in each direction i, with r.m.s. given by σ_i. We will refer below to σ_i as the "step size". The choice of the step size is important to optimize the chain efficiency.
3) Compute L_2 for the new set of cosmological parameters.

More on the WMAP analysis
4.a) If L_2 ≥ L_1, "take the step", i.e. save the new set of cosmological parameters {a_2} as part of the chain, then go to step 2 after the substitution {a_1} → {a_2}.
4.b) If L_2 < L_1, draw a random number x from a uniform distribution from 0 to 1.
– If x > L_2/L_1, "do not take the step", i.e. save the parameter set {a_1} as part of the chain and return to step 2.
– If x ≤ L_2/L_1, "take the step", i.e. do as in 4.a).
5) For each cosmological model, run four chains starting at randomly chosen, well-separated points in parameter space.
6) When the convergence criterion is satisfied and the chains have enough points to provide reasonable samples from the a posteriori distributions, stop the chains (we will talk about this step in detail on the next slide).

Convergence and mixing (in the WMAP analysis)
How to test convergence? Consider running M chains, each with 2N elements, and keep only the last N: {y_i^j} (i runs from 1 to N, j runs from 1 to M).
Mean of chain j:

    ȳ^j = (1/N) Σ_{i=1}^N y_i^j

Mean of the distribution:

    ȳ = (1/(NM)) Σ_{ij} y_i^j

Variance within a chain:

    W = (1/(M(N-1))) Σ_{ij} (y_i^j - ȳ^j)^2

Variance between chains:

    B_n = (1/(M-1)) Σ_{j=1}^M (ȳ^j - ȳ)^2

More on convergence and mixing
We can define the ratio

    R = [ ((N-1)/N) W + B_n (1 + 1/M) ] / W

The numerator is an estimate of the variance that is unbiased if the distribution is stationary (otherwise it is an overestimate); the denominator is an underestimate of the variance of the target distribution if the individual chains have not converged. When the ratio is nearly 1, the chain has basically achieved convergence. WMAP uses R < 1.1, and only samples the chain after it reaches this ratio, thus making sure that it has converged.
Mixing: WMAP also uses all the points after the first 200 points in each chain, and claims that using at least 30,000 points in each chain is good enough to produce marginalized likelihoods for all the parameters. This is tested (by David Spergel) by computing the 1-sigma contours of the chains independently and making sure that they agree with each other to within 10%.
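The W, B_n, and R defined above can be computed directly from stored chains. A sketch using the slide's formulas on synthetic data (the helper name `gelman_rubin` and the Gaussian test chains are illustrative, not from the slides):

```python
import random

def gelman_rubin(chains):
    """R statistic from the slide; chains is a list of M equal-length sample lists."""
    M, N = len(chains), len(chains[0])
    means = [sum(c) / N for c in chains]            # per-chain means
    grand = sum(means) / M                          # mean over all chains
    # Variance within chains: W = 1/(M(N-1)) sum_ij (y_ij - mean_j)^2
    W = sum(sum((y - m) ** 2 for y in c)
            for c, m in zip(chains, means)) / (M * (N - 1))
    # Variance between chains: B_n = 1/(M-1) sum_j (mean_j - grand)^2
    Bn = sum((m - grand) ** 2 for m in means) / (M - 1)
    return ((N - 1) / N * W + Bn * (1 + 1 / M)) / W

random.seed(5)
# Four chains already drawn from the same distribution: R should be close to 1.
chains = [[random.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(4)]
R = gelman_rubin(chains)
```

For chains that have converged to the same distribution, R is very close to 1; unconverged chains inflate B_n and push R well above the WMAP cut of 1.1.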

Analysis of WMAP, cont'd
Finally, after all the Markov chains have finished running, we need to find the marginalized likelihood for each of the parameters! First, we have now found an M-dimensional confidence region (say we have M parameters). Recall from previous talks: a confidence region (or confidence interval) is just a region of that M-dimensional space (hopefully a small region) that contains a certain (hopefully large) percentage of the total probability distribution.
Example: you point to a 99% confidence region and say, e.g., "there is a 99 percent chance that the true parameter values fall within this region around the measured value."

Analysis of WMAP, cont'd
Expectation value of each parameter:

    ⟨a_i⟩ = ∫ L a_i da

How do we do this integral? The underlying distribution function is complicated, and we basically want an integration over all the other parameters (say you only want the one parameter w) -- Monte Carlo integration! And we already have a nicely understood system of Markov chains, and we can sample from the states of the Markov chains. Recall that the Markov chains are designed to walk and stay around where P(a) is large, so when we sample the Markov chain it gives us a good sampling of the distribution function P(a).

Marginalized likelihood, cont'd
Marginalized expectation for one parameter:

    ∫ L a_i da ≈ (1/N) Σ_t a_{t,i}

Here N is the number of points in the Markov chain after it has stabilized, and a_{t,i} is the value of the parameter at the t-th step of the chain. This can be easily understood: the MCMC gives each point in parameter space a "weight" (probability) proportional to the number of steps the chain has spent at that particular location.
The 100(1-2p)% confidence interval [c_p, c_{1-p}] for a parameter is estimated by setting c_p to the p-th quantile of the a_{t,i} and c_{1-p} to the (1-p)-th quantile of the a_{t,i}. Done!
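Both the chain average and the quantile-based interval above are one-liners once the chain is stored. A sketch with a stand-in Gaussian chain playing the role of the a_{t,i}; the helper names (`chain_mean`, `quantile`, `confidence_interval`) are illustrative:

```python
import random

def chain_mean(samples):
    """Marginalized expectation of one parameter: the plain average (1/N) sum_t a_t."""
    return sum(samples) / len(samples)

def quantile(samples, p):
    """p-th quantile by nearest rank -- crude but enough for a demonstration."""
    s = sorted(samples)
    return s[min(int(p * len(s)), len(s) - 1)]

def confidence_interval(samples, p):
    """100(1 - 2p)% interval [c_p, c_(1-p)] from the p-th and (1-p)-th quantiles."""
    return quantile(samples, p), quantile(samples, 1 - p)

random.seed(6)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]
mean = chain_mean(samples)
lo, hi = confidence_interval(samples, 0.16)   # roughly the 68% (1-sigma) interval
```

For a unit Gaussian chain, this recovers a mean near 0 and a 68% interval close to [-1, 1], as expected.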

Thank you all for coming!
References:
– MacKay's book and website
– Mitzenmacher & Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis (not yet published)
– Verde et al. 2003; Spergel et al. 2003
– Jokes website: http://www.isye.gatech.edu/~brani/isyebayes/
– A lot of pictures from the web (not listed here)

More Monte Carlo applications
Making fake datasets to test an algorithm: in a high-energy physics experiment, one generates a fake dataset using Monte Carlo to test whether the algorithm is correct.
Getting the probability of some phenomenon: in a strong-lensing simulation, you repeatedly throw sources randomly onto the source plane and ray-trace each source through the lensing cluster to get the probability of how often the cluster lenses.

Efficient Monte Carlo methods
– Hamiltonian Monte Carlo
– Overrelaxation (discussed once Gibbs sampling has been introduced)
– Simulated annealing

Hamiltonian Monte Carlo
Write P(x) = exp(-E(x))/Z. We can augment the state space x with a momentum variable p and sample from the joint density

    P(x, p) = (1/Z_h) exp[-H(x, p)]

The algorithm hinges on the acceptance rule: when we go to a lower-energy state (dH ≤ 0), we always move to the new state; otherwise we move with probability exp(-dH), compared against a random number.

Simulated annealing
Introduce a parameter T (temperature):

    P_T(x) = (1/Z(T)) exp[-E_0(x) - E_1(x)/T]

so that at high temperature we have a well-behaved function defined by E_0.
Pros: avoids walking into a very unlikely region and getting stuck there (an unrepresentative probability island).
Cons: does not necessarily give you samples from the exact distribution.

Gibbs sampling
Used when we cannot draw from P(x) directly because P(x) is too complicated, but the conditionals P(x_j | x_{i≠j}) are tractable. It is a special case of Metropolis-Hastings in which we use the conditional P(x_j | x_{i≠j}) as the proposal.
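A classic toy case where the conditionals are known exactly is the bivariate normal with correlation ρ: x1 | x2 ~ N(ρ x2, 1-ρ²) and symmetrically for x2. A sketch (the example and helper name `gibbs_bivariate_normal` are illustrative, not from the slides):

```python
import math
import random

def gibbs_bivariate_normal(rho, n_steps):
    """Gibbs sampler for a bivariate normal with unit variances and correlation rho,
    alternately redrawing each coordinate from its exact conditional."""
    x1, x2 = 0.0, 0.0
    sd = math.sqrt(1 - rho * rho)
    chain = []
    for _ in range(n_steps):
        x1 = random.gauss(rho * x2, sd)   # x1 | x2 ~ N(rho * x2, 1 - rho^2)
        x2 = random.gauss(rho * x1, sd)   # x2 | x1 ~ N(rho * x1, 1 - rho^2)
        chain.append((x1, x2))
    return chain

random.seed(7)
chain = gibbs_bivariate_normal(0.8, 100_000)
# With zero means and unit variances, E[x1 x2] equals the correlation rho.
corr = sum(a * b for a, b in chain) / len(chain)
```

Note that every Gibbs move is accepted: viewed as Metropolis-Hastings with the exact conditional as proposal, the acceptance ratio is always 1.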

Metropolis-Hastings
The symmetry requirement of the Metropolis proposal distribution can be hard to satisfy. Metropolis-Hastings is the natural generalization of the Metropolis algorithm, and the most popular MCMC algorithm. We define a Markov chain with the following process:
– Sample a candidate point x* from a proposal distribution q(x* | x^(t)), which is *not* necessarily symmetric.
– Compute the ratio

    r = [p(x*) q(x^(t) | x*)] / [p(x^(t)) q(x* | x^(t))]

– With probability min(r, 1), transition to x*; otherwise stay in the same state x^(t).
– One can prove that this satisfies detailed balance too!
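An asymmetric proposal that makes the correction factor visible is an independence proposal, q(x* | x) = q(x*), so the ratio becomes r = p(x*) q(x^(t)) / (p(x^(t)) q(x*)). A sketch, with an Exp(1) target and an Exp(0.5) proposal as illustrative choices (the helper `mh_independence` is a hypothetical name, not from the slides):

```python
import math
import random

def mh_independence(log_p, log_q, q_sample, x0, n_steps):
    """Metropolis-Hastings with an independence proposal (asymmetric in general)."""
    x, chain = x0, []
    for _ in range(n_steps):
        x_star = q_sample()
        # log r = [log p(x*) + log q(x_t)] - [log p(x_t) + log q(x*)]
        log_r = (log_p(x_star) + log_q(x)) - (log_p(x) + log_q(x_star))
        if random.random() < math.exp(min(log_r, 0.0)):
            x = x_star
        chain.append(x)
    return chain

# Target: Exp(1), log p(x) = -x; proposal: Exp(0.5), log q(x) = log(0.5) - 0.5 x.
random.seed(8)
chain = mh_independence(lambda x: -x,
                        lambda x: math.log(0.5) - 0.5 * x,
                        lambda: random.expovariate(0.5),
                        1.0, 100_000)
mean = sum(chain) / len(chain)   # E[X] = 1 for Exp(1)
```

Dropping the q-correction here would bias the chain toward the proposal's heavier tail; the ratio is exactly what restores detailed balance.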

Break time again :)

Here x is the set of observed Ĉ_l and a is a set of cosmological parameters:

    P(a | x) = P(x | a) P(a) / ∫ P(x | a) P(a) da

Break time.
From a frequentist: a Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.
From a Bayesian(?): a Bayesian and a frequentist were to be executed. The judge asked them what their last wishes were. The Bayesian replied that he would like to give the frequentist one more lecture. The judge granted the Bayesian's wish and then turned to the frequentist for his last wish. The frequentist quickly responded that he wished to hear the lecture again and again and again and again.

