Introduction To Sequential Monte Carlo Methods


Introduction to Sequential Monte Carlo Methods
SMC for Static Problems
Online Bayesian Parameter Estimation

Prof. Nicholas Zabaras
Materials Process Design and Control Laboratory
Sibley School of Mechanical and Aerospace Engineering
101 Frank H. T. Rhodes Hall, Cornell University
Ithaca, NY 14853-3801
Email: nzabaras@gmail.com
URL: http://mpdc.mae.cornell.edu/

Bayesian Scientific Computing, Spring 2013
April 27, 2013

Contents

- References and SMC Resources
- SMC FOR STATIC PROBLEMS: Important Application Problems, Algorithm, Choice of Reversed Kernel
- ONLINE PARAMETER ESTIMATION: Using MCMC, Recursive MLE Parameter Estimation, Importance Sampling Estimation of Sensitivity, Degeneracy of the SMC Algorithm, Marginal Importance Sampling for Sensitivity Estimation, Examples, Gibbs Sampling Strategy, Marginal Metropolis-Hastings Algorithm
- OPEN PROBLEMS

References

- C.P. Robert & G. Casella, Monte Carlo Statistical Methods, Chapter 11
- J.S. Liu, Monte Carlo Strategies in Scientific Computing, Chapter 3, Springer-Verlag, New York
- A. Doucet, N. de Freitas & N. Gordon (eds), Sequential Monte Carlo Methods in Practice, Springer-Verlag, 2001
- A. Doucet, N. de Freitas & N.J. Gordon, An Introduction to Sequential Monte Carlo, in SMC in Practice, 2001
- D. Wilkinson, Stochastic Modelling for Systems Biology, Second Edition, 2006
- E. Ionides, Inference for Nonlinear Dynamical Systems, PNAS, 2006
- J.S. Liu and R. Chen, Sequential Monte Carlo methods for dynamic systems, JASA, 1998
- A. Doucet, Sequential Monte Carlo Methods, Short Course at SAMSI
- A. Doucet, Sequential Monte Carlo Methods & Particle Filters Resources
- Pierre Del Moral, Feynman-Kac models and interacting particle systems (SMC resources)
- A. Doucet, Sequential Monte Carlo Methods, Video Lectures, 2007
- N. de Freitas and A. Doucet, Sequential MC Methods, Video Lectures, 2010

References

- M.K. Pitt and N. Shephard, Filtering via Simulation: Auxiliary Particle Filter, JASA, 1999
- A. Doucet, S.J. Godsill and C. Andrieu, On Sequential Monte Carlo sampling methods for Bayesian filtering, Stat. Comp., 2000
- J. Carpenter, P. Clifford and P. Fearnhead, An Improved Particle Filter for Non-linear Problems, IEE, 1999
- A. Kong, J.S. Liu & W.H. Wong, Sequential Imputations and Bayesian Missing Data Problems, JASA, 1994
- O. Cappe, E. Moulines & T. Ryden, Inference in Hidden Markov Models, Springer-Verlag, 2005
- W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001
- G. Poyadjis, A. Doucet and S.S. Singh, Maximum Likelihood Parameter Estimation using Particle Methods, Joint Statistical Meeting, 2005
- N. Gordon, D.J. Salmond, A.F.M. Smith, Novel Approach to nonlinear non-Gaussian Bayesian state estimation, IEE, 1993
- S. Godsill, Particle Filters, 2009 (Video Lectures)
- R. Chen and J.S. Liu, Predictive Updating Methods with Application to Bayesian Classification, JRSS B, 1996

References

- C. Andrieu and A. Doucet, Particle Filtering for Partially Observed Gaussian State-Space Models, JRSS B, 2002
- R. Chen and J. Liu, Mixture Kalman Filters, JRSS B, 2000
- A. Doucet, S.J. Godsill, C. Andrieu, On SMC sampling methods for Bayesian Filtering, Stat. Comp., 2000
- N. Kantas, A. Doucet, S.S. Singh and J.M. Maciejowski, An overview of sequential Monte Carlo methods for parameter estimation in general state-space models, in Proceedings IFAC System Identification (SySid) Meeting, 2009
- C. Andrieu, A. Doucet & R. Holenstein, Particle Markov chain Monte Carlo methods, JRSS B, 2010
- C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999
- P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002
- G. Storvik, Particle filters for state-space models with the presence of unknown static parameters, IEEE Trans. Signal Processing, 2002

References

- C. Andrieu, A. Doucet and V.B. Tadic, Online EM for parameter estimation in nonlinear-non-Gaussian state-space models, Proc. IEEE CDC, 2005
- G. Poyadjis, A. Doucet and S.S. Singh, Particle Approximations of the Score and Observed Information Matrix in State-Space Models with Application to Parameter Estimation, Biometrika, 2011
- C. Caron, R. Gottardo and A. Doucet, On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis, Technical report, 2008
- R. Martinez-Cantin, J. Castellanos and N. de Freitas, Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm: Marginal-SLAM, International Conference on Robotics and Automation
- C. Andrieu, A. Doucet & R. Holenstein, Particle Markov chain Monte Carlo methods (with discussion), JRSS B, 2010
- A. Doucet, Sequential Monte Carlo Methods and Particle Filters: list of papers, codes, and video lectures on SMC and particle filters
- Pierre Del Moral, Feynman-Kac models and interacting particle systems

References

- P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo samplers, JRSS B, 2006
- P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006
- P. Del Moral, A. Doucet & S.S. Singh, Forward Smoothing using Sequential Monte Carlo, technical report, Cambridge University, 2009
- A. Doucet, Short Courses Lecture Notes (A, B, C)
- P. Del Moral, Feynman-Kac Formulae, Springer-Verlag, 2004
- M. Davy, Sequential MC Methods, 2007
- A. Doucet, A. Johansen, Particle Filtering and Smoothing: Fifteen years later, in Handbook of Nonlinear Filtering (eds D. Crisan and B. Rozovsky), Oxford Univ. Press, 2011
- A. Johansen and A. Doucet, A Note on Auxiliary Particle Filters, Stat. Proba. Letters, 2008
- A. Doucet, M. Briers & S. Senecal, Efficient Block Sampling Strategies for Sequential Monte Carlo, JCGS, 2006
- C. Caron, R. Gottardo and A. Doucet, On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis, Stat. Comput., 2011

SEQUENTIAL MONTE CARLO METHODS FOR STATIC PROBLEMS

- What if all the distributions \pi_n, n \ge 1, are defined on a common space X? This case appears often, for example:
  - Sequential Bayesian estimation: \pi_n(x) = p(x | y_{1:n})
  - Classical Bayesian inference: \pi_n(x) = \pi(x)
  - Global optimization: \pi_n(x) \propto [\pi(x)]^{\gamma_n}, with \gamma_n \to \infty
- It is not clear how SMC can be applied to this problem, since each \pi_n is defined on X rather than on the product space X^n.

- What if all the distributions \pi_n, n \ge 1, are defined on a common space X? This case appears often, for example:
  - Sequential Bayesian estimation: \pi_n(x) = p(x | y_{1:n})
  - Global optimization: \pi_n(x) \propto [\pi(x)]^{\eta_n}, with \eta_n \to \infty
  - Sampling from a fixed distribution: \pi_n(x) \propto (\mu(x))^{\eta_n} (\pi(x))^{1-\eta_n}, where \mu(x) is an easy-to-sample distribution. Use a decreasing sequence \eta_1 = 1 > \eta_2 > \dots > \eta_{Final} = 0, i.e. \pi_1(x) = \mu(x) and \pi_{Final}(x) = \pi(x).

- Rare event simulation: \pi(A) \ll 1. Take \pi_n(x) \propto \pi(x) 1_{E_n}(x), with the normalizing factor Z_1 known, and simulate with a nested sequence of sets E_1 = X \supset E_2 \supset \dots \supset E_{Final} = A. The required probability is then Z_{Final} = \pi(A).
- Other applications: the Boolean satisfiability problem, computing matrix permanents, computing volumes in high dimensions.

- Idea: apply SMC to an augmented sequence of distributions \{\tilde{\pi}_n\}_{n \ge 1} defined on X^n whose marginal is the target,

    \int \tilde{\pi}_n(x_{1:n-1}, x_n) dx_{1:n-1} = \pi_n(x_n).

  For example, \tilde{\pi}_n(x_{1:n}) = \pi_n(x_n) \pi_n(x_{1:n-1} | x_n), where any conditional distribution on X^{n-1} can be used.

- C. Jarzynski, Nonequilibrium Equality for Free Energy Differences, Phys. Rev. Lett. 78, 2690-2693, 1997
- G.E. Crooks, Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible Markovian Systems, Journal of Statistical Physics, 90(5-6), 1481-1487, 1998
- W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001
- R.M. Neal, Annealed importance sampling, Statistics and Computing, 11, 125-139, 2001
- N. Chopin, A sequential particle filter method for static models, Biometrika, 89(3), 539-551, 2002
- P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006

SMC For Static Models

- Let \{\pi_n(x)\}_{n \ge 1} be a sequence of probability distributions defined on X such that each \pi_n(x) is known up to a normalizing constant:

    \pi_n(x) = Z_n^{-1} \gamma_n(x),   with Z_n unknown and \gamma_n(x) known.

- We are interested in sampling from \pi_n(x) and computing Z_n sequentially.
- This is not the same as the standard SMC setting discussed earlier, where \pi_n(x_{1:n}) = p(x_{1:n} | y_{1:n}) is defined on X^n.

SMC For Static Models

- We construct an artificial distribution on X^n as the product of the target distribution from which we want to sample and a sequence of backward kernels L_k:

    \tilde{\pi}_n(x_{1:n}) = \tilde{Z}_n^{-1} \tilde{\gamma}_n(x_{1:n}),   where
    \tilde{\gamma}_n(x_{1:n}) = \gamma_n(x_n) \prod_{k=1}^{n-1} L_k(x_k | x_{k+1})

  (target times backward transitions), such that \int \tilde{\pi}_n(x_{1:n}) dx_{1:n-1} = \pi_n(x_n).

- The importance weights now become

    W_n \propto \frac{\tilde{\gamma}_n(x_{1:n})}{q_n(x_{1:n})}
        = W_{n-1} \frac{\tilde{\gamma}_n(x_{1:n})}{\tilde{\gamma}_{n-1}(x_{1:n-1}) q_n(x_n | x_{n-1})}
        = W_{n-1} \frac{\gamma_n(x_n) \prod_{k=1}^{n-1} L_k(x_k | x_{k+1})}{\gamma_{n-1}(x_{n-1}) \prod_{k=1}^{n-2} L_k(x_k | x_{k+1}) \, q_n(x_n | x_{n-1})}
        = W_{n-1} \frac{\gamma_n(x_n) L_{n-1}(x_{n-1} | x_n)}{\gamma_{n-1}(x_{n-1}) q_n(x_n | x_{n-1})}.

SMC For Static Models

    W_n = W_{n-1} \frac{\gamma_n(x_n) L_{n-1}(x_{n-1} | x_n)}{\gamma_{n-1}(x_{n-1}) q_n(x_n | x_{n-1})}

- Any MCMC kernel can be used for the proposal q(. | .).
- Since our interest is only in computing

    \pi_n(x_n) = Z_n^{-1} \int \tilde{\gamma}_n(x_{1:n}) dx_{1:n-1},

  there is no degeneracy problem.

- P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006

Algorithm: SMC For Static Problems

(1) Initialize at time n = 1.
(2) At time n \ge 2:
    - Sample X_n^{(i)} \sim q_n(\cdot | X_{n-1}^{(i)}) and augment the path X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)}).
    - Compute the importance weights

        W_n^{(i)} \propto W_{n-1}^{(i)} \frac{\gamma_n(X_n^{(i)}) L_{n-1}(X_{n-1}^{(i)} | X_n^{(i)})}{\gamma_{n-1}(X_{n-1}^{(i)}) q_n(X_n^{(i)} | X_{n-1}^{(i)})}.

    - The weighted approximation is then

        \hat{\pi}_n(x_n) = \sum_{i=1}^N W_n^{(i)} \delta_{X_n^{(i)}}(x_n).

    - Finally, resample from \hat{\pi}_n(x_n) to obtain

        \bar{\pi}_n(x_n) = \frac{1}{N} \sum_{i=1}^N \delta_{\bar{X}_n^{(i)}}(x_n).
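
The following is a minimal Python sketch of one step of this algorithm. The callables `log_gamma_n`, `log_gamma_prev`, `sample_q`, `log_q` and `log_L` (the unnormalized log targets, the proposal sampler and density, and the backward-kernel density) are illustrative placeholders assumed to act elementwise on NumPy arrays; they are not part of the original slides.

```python
import numpy as np

def smc_static_step(X_prev, logW_prev, log_gamma_n, log_gamma_prev,
                    sample_q, log_q, log_L, rng):
    """One step of the SMC sampler for static targets (hypothetical helper names).

    Conventions assumed here: log_q(x_new, x_prev) = log q_n(x_new | x_prev),
    log_L(x_prev, x_new) = log L_{n-1}(x_prev | x_new).
    """
    # Propose new particles X_n^(i) ~ q_n(. | X_{n-1}^(i))
    X_new = np.array([sample_q(x, rng) for x in X_prev])
    # Incremental weight: gamma_n(x_n) L_{n-1}(x_{n-1}|x_n) / (gamma_{n-1}(x_{n-1}) q_n(x_n|x_{n-1}))
    logW = (logW_prev
            + log_gamma_n(X_new) + log_L(X_prev, X_new)
            - log_gamma_prev(X_prev) - log_q(X_new, X_prev))
    W = np.exp(logW - logW.max())
    W /= W.sum()
    # Multinomial resampling to obtain an equally weighted particle set
    idx = rng.choice(len(X_new), size=len(X_new), p=W)
    return X_new[idx], np.zeros(len(X_new))  # log-weights reset after resampling
```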

SMC For Static Models: Choice of L

- A default choice is to first use a \pi_n-invariant MCMC kernel q_n and then the corresponding reversed kernel L_{n-1}:

    L_{n-1}(x_{n-1} | x_n) = \frac{\pi_n(x_{n-1}) q_n(x_n | x_{n-1})}{\pi_n(x_n)}.

- With this easy choice, the expression for the weights simplifies:

    W_n \propto W_{n-1} \frac{\gamma_n(x_n) L_{n-1}(x_{n-1} | x_n)}{\gamma_{n-1}(x_{n-1}) q_n(x_n | x_{n-1})}
        = W_{n-1} \frac{\gamma_n(x_n) \pi_n(x_{n-1}) q_n(x_n | x_{n-1})}{\gamma_{n-1}(x_{n-1}) q_n(x_n | x_{n-1}) \pi_n(x_n)}
        = W_{n-1} \frac{\gamma_n(X_{n-1}^{(i)})}{\gamma_{n-1}(X_{n-1}^{(i)})}.

- This is known as annealed importance sampling. This particular choice has been used in physics and statistics.

- W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001
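
As an illustration of this simplified (annealed importance sampling) weight, here is a small self-contained Python example that anneals from an easy distribution \mu = N(0, 1) to a target \pi = N(3, 0.5^2) with random-walk Metropolis moves. The toy target, annealing schedule and step size are choices made for this sketch, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy example (not from the slides): anneal from mu = N(0,1) to pi = N(3, 0.5^2)
log_mu = lambda x: -0.5 * x**2
log_pi = lambda x: -0.5 * ((x - 3.0) / 0.5)**2
log_gamma = lambda x, b: (1 - b) * log_mu(x) + b * log_pi(x)   # tempered target

betas = np.linspace(0.0, 1.0, 50)          # annealing schedule from mu to pi
N = 2000
X = rng.normal(size=N)                     # particles from pi_1 = mu
logW = np.zeros(N)

for b_prev, b in zip(betas[:-1], betas[1:]):
    # AIS weight update: W_n = W_{n-1} * gamma_n(x_{n-1}) / gamma_{n-1}(x_{n-1})
    logW += log_gamma(X, b) - log_gamma(X, b_prev)
    # Move particles with a gamma_b-invariant random-walk Metropolis kernel
    prop = X + 0.5 * rng.normal(size=N)
    accept = np.log(rng.uniform(size=N)) < log_gamma(prop, b) - log_gamma(X, b)
    X = np.where(accept, prop, X)

W = np.exp(logW - logW.max())
W /= W.sum()
print("weighted posterior mean ~", np.sum(W * X))   # roughly 3.0
```

In line with the weight formula above, each particle is reweighted by \gamma_n(x_{n-1}) / \gamma_{n-1}(x_{n-1}) before it is moved by the \pi_n-invariant kernel; resampling is omitted here for brevity.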

ONLINE PARAMETER ESTIMATION

Online Bayesian Parameter Estimation

- Assume that the state-space model depends on an unknown static parameter \theta with prior p(\theta):

    X_1 \sim \mu_\theta(\cdot),   X_n | (X_{n-1} = x_{n-1}) \sim f_\theta(x_n | x_{n-1}),
    Y_n | (X_n = x_n) \sim g_\theta(y_n | x_n).

- Given data y_{1:n}, inference is now based on

    p(\theta, x_{1:n} | y_{1:n}) = p(\theta | y_{1:n}) p_\theta(x_{1:n} | y_{1:n}),   where
    p(\theta | y_{1:n}) \propto p_\theta(y_{1:n}) p(\theta).

- We can use standard SMC, but on the extended space Z_n = (X_n, \theta_n):

    f(z_n | z_{n-1}) = f_{\theta_{n-1}}(x_n | x_{n-1}) \delta_{\theta_{n-1}}(\theta_n),   g(y_n | z_n) = g_{\theta_n}(y_n | x_n).

- Note that \theta is a static parameter: it does not evolve with n.

Online Bayesian Parameter Estimation

- For fixed \theta, our earlier error estimates give

    Var[\log \hat{p}_\theta(y_{1:n})] \le \frac{C n}{N}.

- In a Bayesian context the problem is even more severe, as p(\theta | y_{1:n}) \propto p_\theta(y_{1:n}) p(\theta).
- The exponential stability assumption cannot hold here, since \theta_n = \theta_1 for all n.
- To mitigate this problem, introduce MCMC steps on \theta.

- C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999
- P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002
- W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001

Online Bayesian Parameter Estimation

- When p(\theta | y_{1:n}, x_{1:n}) = p(\theta | s_n(x_{1:n}, y_{1:n})) for a fixed-dimensional sufficient statistic s_n, this yields an elegant algorithm, which however still suffers from the degeneracy problem since it relies on the SMC approximation of p(x_{1:n} | y_{1:n}).
- As dim(Z_n) = dim(X_n) + dim(\theta), such methods are not recommended for high-dimensional \theta, especially with vague priors.

Artificial Dynamics for \theta

- A solution consists of perturbing the locations of the particles \{\theta^{(i)}\} in a way that does not modify their distribution; i.e., if at time n

    \theta_n^{(i)} \sim p(\theta | y_{1:n}),

  then we would like a transition kernel M_n such that if \theta_n'^{(i)} | \theta_n^{(i)} \sim M_n(\theta_n^{(i)}, \cdot), then \theta_n'^{(i)} \sim p(\theta | y_{1:n}).
- In Markov chain language, we want M_n(\theta, \theta') to be p(\theta | y_{1:n})-invariant.

Using MCMC

- There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g. the Metropolis-Hastings algorithm.
- We cannot use these algorithms directly, as p(\theta | y_{1:n}) would need to be known up to a normalizing constant, but p(y_{1:n} | \theta) = p_\theta(y_{1:n}) is unknown.
- However, we can use a very simple Gibbs update: \theta_n^{(i)} \sim p(\theta | y_{1:n}, X_{1:n}^{(i)}).
- Indeed, if (X_{1:n}^{(i)}, \theta_n^{(i)}) \sim p(\theta, x_{1:n} | y_{1:n}) and \theta_n'^{(i)} \sim p(\theta | y_{1:n}, X_{1:n}^{(i)}), then (X_{1:n}^{(i)}, \theta_n'^{(i)}) \sim p(\theta, x_{1:n} | y_{1:n}), since

    \int p(\theta, x_{1:n} | y_{1:n}) p(\theta' | y_{1:n}, x_{1:n}) d\theta = p(\theta', x_{1:n} | y_{1:n}).

SMC with MCMC for Parameter Estimation

- Given an approximation at time n-1:

    \hat{p}(\theta, x_{1:n-1} | y_{1:n-1}) = \frac{1}{N} \sum_{i=1}^N \delta_{(\theta_{n-1}^{(i)}, X_{1:n-1}^{(i)})}(\theta, x_{1:n-1}).

- Sample X_n^{(i)} \sim f_{\theta_{n-1}^{(i)}}(x_n | X_{n-1}^{(i)}), set X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)}), and then approximate

    \hat{p}(\theta, x_{1:n} | y_{1:n}) = \sum_{i=1}^N W_n^{(i)} \delta_{(\theta_{n-1}^{(i)}, X_{1:n}^{(i)})}(\theta, x_{1:n}),   W_n^{(i)} \propto g_{\theta_{n-1}^{(i)}}(y_n | X_n^{(i)}).

- Resample X_{1:n}^{(i)} \sim \hat{p}(x_{1:n} | y_{1:n}), then sample \theta_n^{(i)} \sim p(\theta | y_{1:n}, X_{1:n}^{(i)}) to obtain

    \hat{p}(\theta, x_{1:n} | y_{1:n}) = \frac{1}{N} \sum_{i=1}^N \delta_{(\theta_n^{(i)}, X_{1:n}^{(i)})}(\theta, x_{1:n}).
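
A schematic Python rendering of one such update is given below, assuming user-supplied callables `sample_f` (prior dynamics f_\theta), `log_g` (observation log-density g_\theta) and `sample_theta_post` (the conditional p(\theta | y_{1:n}, x_{1:n}), assumed here to depend on the path only, as in the example on the next slide). All three names are placeholders.

```python
import numpy as np

def smc_gibbs_step(theta, X_paths, y_n, sample_f, log_g, sample_theta_post, rng):
    """One SMC step with a Gibbs refresh of the static parameter (schematic)."""
    N = len(theta)
    # Propagate each path with its own parameter value: X_n^(i) ~ f_{theta^(i)}(. | X_{n-1}^(i))
    x_new = [sample_f(theta[i], X_paths[i][-1], rng) for i in range(N)]
    X_paths = [X_paths[i] + [x_new[i]] for i in range(N)]
    # Weight by the likelihood of the new observation and resample whole paths
    logw = np.array([log_g(theta[i], y_n, x_new[i]) for i in range(N)])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(N, size=N, p=w)
    X_paths = [X_paths[i] for i in idx]
    # Gibbs step: refresh theta^(i) from the conditional given the resampled path
    theta = np.array([sample_theta_post(X_paths[i], rng) for i in range(N)])
    return theta, X_paths  # equally weighted after resampling
```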

SMC with MCMC for Parameter Estimation

- Consider the following model:

    X_n = \theta X_{n-1} + \sigma_v V_n,   V_n \sim N(0, 1),
    Y_n = X_n + \sigma_w W_n,   W_n \sim N(0, 1),
    X_1 \sim N(0, \sigma_0^2).

- We set the prior on \theta as \theta \sim U(-1, 1). In this case

    p(\theta | x_{1:n}, y_{1:n}) \propto N(\theta; m_n, \sigma_n^2) 1_{(-1,1)}(\theta),   with
    \sigma_n^2 = \frac{\sigma_v^2}{\sum_{k=2}^n x_{k-1}^2},   m_n = \frac{\sum_{k=2}^n x_{k-1} x_k}{\sum_{k=2}^n x_{k-1}^2}.
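
For this AR(1) example the Gibbs step amounts to drawing \theta from a normal distribution truncated to (-1, 1). A small Python sketch with a naive rejection sampler and a placeholder state path follows; assuming \sigma_v is known, the conditional depends only on the states.

```python
import numpy as np

def sample_theta_conditional(x, sigma_v, rng):
    """Draw theta ~ p(theta | x_{1:n}, y_{1:n}) = N(m_n, s_n^2) restricted to (-1, 1).

    s_n^2 = sigma_v^2 / sum x_{k-1}^2,  m_n = sum x_{k-1} x_k / sum x_{k-1}^2.
    Naive rejection on (-1, 1) is used for simplicity (illustrative choice only).
    """
    x = np.asarray(x)
    sxx = np.sum(x[:-1] ** 2)
    m = np.sum(x[:-1] * x[1:]) / sxx
    s = sigma_v / np.sqrt(sxx)
    while True:                      # fine when m is not far outside (-1, 1)
        th = rng.normal(m, s)
        if -1.0 < th < 1.0:
            return th

# Placeholder path, purely for illustration
rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(size=200)) * 0.1
print(sample_theta_conditional(x, sigma_v=1.0, rng=rng))
```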

SMC with MCMC for Parameter Estimation

- We use SMC with the Gibbs step. The degeneracy problem remains.
- The figure shows the SMC estimate of E[\theta | y_{1:n}] as n increases (from A. Doucet, lecture notes): the SMC/MCMC estimate is plotted against the true value, and the parameter converges to the wrong value.

SMC with MCMC for Parameter Estimation

- The problem with this approach is that although we move \theta according to p(\theta | x_{1:n}, y_{1:n}), we still rely implicitly on the SMC approximation of the joint distribution p(x_{1:n} | y_{1:n}).
- As n increases, the SMC approximation of p(x_{1:n} | y_{1:n}) deteriorates and the algorithm cannot converge towards the right solution.
- The figure compares the sufficient statistic \frac{1}{n} \sum_{k=1}^n E[X_k^2 | y_{1:n}] computed exactly through the Kalman filter (blue) with its SMC approximation (red) for a fixed value of \theta.

Recursive MLE Parameter Estimation

- The log-likelihood can be written as

    l_n(\theta) = \log p_\theta(Y_{1:n}) = \sum_{k=1}^n \log p_\theta(Y_k | Y_{1:k-1}).

- Here we compute

    p_\theta(Y_k | Y_{1:k-1}) = \int g_\theta(Y_k | x_k) p_\theta(x_k | Y_{1:k-1}) dx_k.

- Under regularity assumptions, \{X_n, Y_n, p_\theta(x_n | Y_{1:n-1})\} is an inhomogeneous Markov chain which converges towards its invariant distribution \lambda_{\theta^*, \theta}, and

    l(\theta) = \lim_{n \to \infty} \frac{1}{n} l_n(\theta) = \int \log \left( \int g_\theta(y | x) \mu(dx) \right) \lambda_{\theta^*, \theta}(dy, d\mu).

Robbins-Monro Algorithm for MLE

- We can maximize l(\theta) by using the gradient and the Robbins-Monro algorithm:

    \theta_n = \theta_{n-1} + \gamma_n \nabla \log p_\theta(Y_n | Y_{1:n-1}) \big|_{\theta = \theta_{1:n-1}},

  where

    p_\theta(Y_n | Y_{1:n-1}) = \int g_\theta(Y_n | x_n) p_\theta(x_n | Y_{1:n-1}) dx_n,
    \nabla p_\theta(Y_n | Y_{1:n-1}) = \int \nabla \left[ g_\theta(Y_n | x_n) p_\theta(x_n | Y_{1:n-1}) \right] dx_n.

- We thus need to approximate \{\nabla p_\theta(x_n | Y_{1:n-1})\}.
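
A hedged sketch of one Robbins-Monro update in Python follows. The gradient is approximated by central finite differences of a user-supplied log predictive likelihood (for instance a particle filter estimate); this is a simplification relative to the score estimators discussed on the next slides.

```python
import numpy as np

def robbins_monro_step(theta, n, log_pred_lik, eps=1e-3, a=0.1):
    """theta_n = theta_{n-1} + gamma_n * grad log p_theta(Y_n | Y_{1:n-1}) for scalar theta.

    `log_pred_lik(theta)` is a user-supplied estimate of log p_theta(Y_n | Y_{1:n-1});
    the gradient is approximated here by central finite differences (a simplification).
    """
    gamma_n = a / n                                    # step sizes with sum = inf, sum of squares < inf
    grad = (log_pred_lik(theta + eps) - log_pred_lik(theta - eps)) / (2 * eps)
    return theta + gamma_n * grad
```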

Importance Sampling Estimation of Sensitivity

- The various proposed approximations are based on the identity

    \nabla p_\theta(x_{1:n} | Y_{1:n-1}) = \frac{\nabla p_\theta(x_{1:n} | Y_{1:n-1})}{p_\theta(x_{1:n} | Y_{1:n-1})} p_\theta(x_{1:n} | Y_{1:n-1})
        = \left[ \nabla \log p_\theta(x_{1:n} | Y_{1:n-1}) \right] p_\theta(x_{1:n} | Y_{1:n-1}).

- We can thus use an SMC approximation of the form

    \nabla \hat{p}_\theta(x_{1:n} | Y_{1:n-1}) = \frac{1}{N} \sum_{i=1}^N \alpha_n^{(i)} \delta_{X_{1:n}^{(i)}}(x_{1:n}),   \alpha_n^{(i)} = \nabla \log p_\theta(X_{1:n}^{(i)} | Y_{1:n-1}).

- This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension and, in addition, relies on being able to obtain a good approximation of p_\theta(x_{1:n} | Y_{1:n-1}).

Degeneracy of the SMC Algorithm

- Score \nabla \log p_\theta(Y_{1:n}) for a linear Gaussian model: exact value (blue) vs SMC approximation (red).

Degeneracy of the SMC Algorithm

- Signed measure \nabla p_\theta(x_n | Y_{1:n}) for a linear Gaussian model: exact value (red) vs SMC (blue/green). Positive and negative particles are mixed.

Marginal Importance Sampling for Sensitivity Estimation

- An alternative identity is

    \nabla p_\theta(x_n | Y_{1:n-1}) = \left[ \nabla \log p_\theta(x_n | Y_{1:n-1}) \right] p_\theta(x_n | Y_{1:n-1}).

- It suggests using an SMC approximation of the form

    \nabla \hat{p}_\theta(x_n | Y_{1:n-1}) = \frac{1}{N} \sum_{i=1}^N \beta_n^{(i)} \delta_{X_n^{(i)}}(x_n),
    \beta_n^{(i)} = \nabla \log \hat{p}_\theta(X_n^{(i)} | Y_{1:n-1}) = \frac{\nabla \hat{p}_\theta(X_n^{(i)} | Y_{1:n-1})}{\hat{p}_\theta(X_n^{(i)} | Y_{1:n-1})}.

- Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is O(N^2), as

    \hat{p}_\theta(X_n^{(i)} | Y_{1:n-1}) = \int f_\theta(X_n^{(i)} | x_{n-1}) \hat{p}_\theta(x_{n-1} | Y_{1:n-1}) dx_{n-1} = \frac{1}{N} \sum_{j=1}^N f_\theta(X_n^{(i)} | X_{n-1}^{(j)}).

Marginal Importance Sampling for Sensitivity Estimation

- The optimal filter satisfies

    p_\theta(x_n | Y_{1:n}) = \frac{\xi_\theta(x_n, Y_{1:n})}{\int \xi_\theta(x_n, Y_{1:n}) dx_n},   where
    \xi_\theta(x_n, Y_{1:n}) = g_\theta(Y_n | x_n) \int f_\theta(x_n | x_{n-1}) p_\theta(x_{n-1} | Y_{1:n-1}) dx_{n-1}.

- The derivatives satisfy

    \nabla p_\theta(x_n | Y_{1:n}) = \frac{\nabla \xi_\theta(x_n, Y_{1:n})}{\int \xi_\theta(x_n, Y_{1:n}) dx_n}
        - p_\theta(x_n | Y_{1:n}) \frac{\int \nabla \xi_\theta(x_n, Y_{1:n}) dx_n}{\int \xi_\theta(x_n, Y_{1:n}) dx_n}.

- This way we obtain a simple recursion for \nabla p_\theta(x_n | Y_{1:n}) as a function of p_\theta(x_{n-1} | Y_{1:n-1}) and \nabla p_\theta(x_{n-1} | Y_{1:n-1}).

SMC Approximation of the Sensitivity

- Sample X_n^{(i)} \sim q_\theta(\cdot | Y_n) = \sum_{j=1}^N \eta_n^{(j)} q_\theta(\cdot | Y_n, X_{n-1}^{(j)}), with \eta_n^{(j)} \propto W_{n-1}^{(j)} \hat{p}_\theta(Y_n | X_{n-1}^{(j)}).
- Compute

    \alpha_n^{(i)} = \frac{\hat{\xi}_\theta(X_n^{(i)}, Y_{1:n})}{q_\theta(X_n^{(i)} | Y_n)},   \rho_n^{(i)} = \frac{\nabla \hat{\xi}_\theta(X_n^{(i)}, Y_{1:n})}{q_\theta(X_n^{(i)} | Y_n)}.

- We then have

    W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_{j=1}^N \alpha_n^{(j)}},
    W_n^{(i)} \beta_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_{j=1}^N \alpha_n^{(j)}} - W_n^{(i)} \frac{\sum_{j=1}^N \rho_n^{(j)}}{\sum_{j=1}^N \alpha_n^{(j)}},

  and the approximations

    \hat{p}_\theta(x_n | Y_{1:n}) = \sum_{i=1}^N W_n^{(i)} \delta_{X_n^{(i)}}(x_n),
    \nabla \hat{p}_\theta(x_n | Y_{1:n}) = \sum_{i=1}^N W_n^{(i)} \beta_n^{(i)} \delta_{X_n^{(i)}}(x_n).

Example: Linear Gaussian Model

- Consider the following model:

    X_n = \phi X_{n-1} + \sigma_v V_n,
    Y_n = X_n + \sigma_w W_n,

  with \theta = \{\phi, \sigma_v, \sigma_w\}.

- We compare next the SMC estimate of the score \nabla \log p_\theta(Y_{1:n}) for N = 1000 to its exact value given by the Kalman filter derivative (exact in cyan vs SMC in black).
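
For reference, the exact log-likelihood of this model can be computed with a scalar Kalman filter. The Python sketch below (with illustrative parameter values and a deterministic initial state, both assumptions of this sketch) provides the ground truth against which particle estimates of the score, e.g. via finite differences, can be checked.

```python
import numpy as np

def kalman_loglik(y, phi, sigma_v, sigma_w, m0=0.0, P0=0.0):
    """Exact log p_theta(y_{1:n}) for X_n = phi X_{n-1} + sigma_v V_n, Y_n = X_n + sigma_w W_n.

    P0 = 0 corresponds to a known initial state at m0 (illustrative choice).
    """
    m, P, ll = m0, P0, 0.0
    for yn in y:
        m, P = phi * m, phi**2 * P + sigma_v**2          # predict
        S = P + sigma_w**2                               # innovation variance
        ll += -0.5 * (np.log(2 * np.pi * S) + (yn - m)**2 / S)
        K = P / S                                        # Kalman gain
        m, P = m + K * (yn - m), (1 - K) * P             # update
    return ll

# Usage on simulated data (parameter values are illustrative)
rng = np.random.default_rng(2)
phi, sv, sw = 0.8, 1.0, 0.5
x, ys = 0.0, []
for n in range(200):
    x = phi * x + sv * rng.normal()
    ys.append(x + sw * rng.normal())
print(kalman_loglik(ys, phi, sv, sw))
```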

Example: Linear Gaussian Model

- Marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).

Example: Stochastic Volatility Model

- Consider the following model:

    X_n = \phi X_{n-1} + \sigma_v V_n,
    Y_n = \beta \exp(X_n / 2) W_n,

  with \theta = \{\phi, \sigma_v, \beta\}. We use SMC with N = 1000 for batch and on-line estimation.

- For batch estimation:

    \theta_n = \theta_{n-1} + \gamma_n \nabla \log p_\theta(Y_{1:n}) \big|_{\theta = \theta_{n-1}}.

- For online estimation:

    \theta_n = \theta_{n-1} + \gamma_n \nabla \log p_\theta(Y_n | Y_{1:n-1}) \big|_{\theta = \theta_{1:n-1}}.
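
A possible Python sketch of this setup is shown below: it simulates the SV model and computes a bootstrap particle filter estimate of log p_\theta(y_{1:n}), which could then drive the batch or online gradient updates above (e.g. through finite differences). The parameter values and the stationary initialization are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def sv_simulate(n, phi, sigma_v, beta, rng):
    """Simulate X_n = phi X_{n-1} + sigma_v V_n, Y_n = beta exp(X_n/2) W_n."""
    x = rng.normal() * sigma_v / np.sqrt(1 - phi**2)    # stationary initialization (assumption)
    xs, ys = [], []
    for _ in range(n):
        x = phi * x + sigma_v * rng.normal()
        xs.append(x)
        ys.append(beta * np.exp(x / 2) * rng.normal())
    return np.array(xs), np.array(ys)

def sv_bootstrap_loglik(y, phi, sigma_v, beta, N, rng):
    """Bootstrap particle filter estimate of log p_theta(y_{1:n})."""
    x = rng.normal(size=N) * sigma_v / np.sqrt(1 - phi**2)
    ll = 0.0
    for yn in y:
        x = phi * x + sigma_v * rng.normal(size=N)                   # propagate
        s = beta * np.exp(x / 2)                                     # observation std. dev.
        logw = -0.5 * np.log(2 * np.pi * s**2) - 0.5 * (yn / s)**2   # N(0, s^2) log-density
        m = logw.max()
        ll += m + np.log(np.mean(np.exp(logw - m)))                  # log p_hat(y_n | y_{1:n-1})
        w = np.exp(logw - m)
        w /= w.sum()
        x = x[rng.choice(N, size=N, p=w)]                            # multinomial resampling
    return ll

rng = np.random.default_rng(3)
_, y = sv_simulate(500, phi=0.95, sigma_v=0.3, beta=0.6, rng=rng)
print(sv_bootstrap_loglik(y, 0.95, 0.3, 0.6, N=1000, rng=rng))
```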

Example: Stochastic Volatility Model

- Batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.

Example: Stochastic Volatility Model

- Recursive parameter estimation for the SV model. The estimates converge towards the true values.

SMC with MCMC for Parameter Estimation

- Given data y_{1:n}, inference relies on

    p(\theta, x_{1:n} | y_{1:n}) = p(\theta | y_{1:n}) p_\theta(x_{1:n} | y_{1:n}),   where
    p(\theta | y_{1:n}) \propto p_\theta(y_{1:n}) p(\theta).

- We have seen that SMC is rather inefficient for sampling from p(\theta | y_{1:n}), so we look here at an MCMC approach.
- For a given parameter value \theta, SMC can estimate both p_\theta(x_{1:n} | y_{1:n}) and p_\theta(y_{1:n}).
- It would be useful if we could use SMC within MCMC to sample from p(\theta, x_{1:n} | y_{1:n}).

- C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010) 72, Part 3, pp. 269-342

Gibbs Sampling Strategy

- Using Gibbs sampling, we could sample iteratively from p(\theta | x_{1:n}, y_{1:n}) and p(x_{1:n} | y_{1:n}, \theta).
- However, it is impossible to sample from p(x_{1:n} | y_{1:n}, \theta) directly.
- We could sample from p(x_k | y_{1:n}, \theta, x_{1:k-1}, x_{k+1:n}) instead, but convergence will be slow.
- Alternatively, we would like to use a Metropolis-Hastings step to sample from p(x_{1:n} | y_{1:n}, \theta), i.e. sample X_{1:n}^* \sim q(x_{1:n} | y_{1:n}) and accept with probability

    \min \left\{ 1, \frac{p(X_{1:n}^* | y_{1:n}, \theta) \, q(X_{1:n} | y_{1:n})}{p(X_{1:n} | y_{1:n}, \theta) \, q(X_{1:n}^* | y_{1:n})} \right\}.

Gibbs Sampling Strategy

- We will use the output of an SMC method approximating p(x_{1:n} | y_{1:n}, \theta) as a proposal distribution.
- We know that the SMC approximation of p(x_{1:n} | y_{1:n}, \theta) degrades as n increases, but under mixing assumptions we also have

    \| Law(X_{1:n}^{(i)}) - p(x_{1:n} | y_{1:n}, \theta) \|_{TV} \le \frac{C n}{N}.

- Thus things degrade linearly rather than exponentially in n. These results are useful when C is small.

Configurational Biased Monte Carlo

- The idea is related to that of the Configurational Biased Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences.
- CBMC looks like an SMC method, but at each step only one particle is selected, which then gives N offspring. Thus, if the wrong selection is made, the algorithm cannot recover later on.

- D. Frenkel, G.C.A.M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076

MCMC

- We use as proposal distribution X_{1:n}^* \sim \hat{p}^N(x_{1:n} | y_{1:n}, \theta) and store the estimate \hat{p}^N(y_{1:n} | \theta).
- It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out (N-1)n variables.
- The solution inspired by CBMC is as follows: for the current state of the Markov chain X_{1:n}, run a virtual SMC method with only N-1 free particles. Ensure that at each time step k the current state X_k is deterministically selected, and compute the associated estimate \hat{p}^N(y_{1:n} | \theta). Finally, accept X_{1:n}^* with probability

    \min \left\{ 1, \frac{\hat{p}^{N,*}(y_{1:n} | \theta)}{\hat{p}^N(y_{1:n} | \theta)} \right\},

  and otherwise stay where you are.

- D. Frenkel, G.C.A.M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076

Example

- Consider the following model:

    X_k = \frac{X_{k-1}}{2} + \frac{25 X_{k-1}}{1 + X_{k-1}^2} + 8 \cos(1.2 k) + V_k,   V_k \sim N(0, \sigma_v^2),
    Y_k = \frac{X_k^2}{20} + W_k,   W_k \sim N(0, \sigma_w^2),   X_1 \sim N(0, 5).

- We take the same prior proposal distribution for both CBMC and SMC.
- Acceptance rates for CBMC (left) and SMC (right) are shown.
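
A short Python snippet for simulating this benchmark model is given below; the noise scales are passed as arguments rather than hard-coded, since the exact values used on the slide are not fully legible in the transcription.

```python
import numpy as np

def simulate_benchmark(n, sigma_v, sigma_w, rng):
    """Simulate the benchmark nonlinear state-space model used in this example."""
    xs = np.empty(n)
    xs[0] = rng.normal(scale=np.sqrt(5.0))              # X_1 ~ N(0, 5)
    for k in range(1, n):
        xs[k] = (0.5 * xs[k - 1] + 25 * xs[k - 1] / (1 + xs[k - 1]**2)
                 + 8 * np.cos(1.2 * (k + 1)) + sigma_v * rng.normal())
    ys = xs**2 / 20 + sigma_w * rng.normal(size=n)      # observations
    return xs, ys

# noise scales are illustrative placeholders
xs, ys = simulate_benchmark(100, sigma_v=np.sqrt(10.0), sigma_w=1.0,
                            rng=np.random.default_rng(4))
```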

Example

- We now consider the case where both variances \sigma_v^2 and \sigma_w^2 are unknown and are given inverse Gamma (conditionally conjugate) vague priors. We sample from p(x_{1:1000}, \theta | y_{1:1000}) using two strategies:
  - The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution. Average acceptance rate 0.43.
  - An algorithm updating the components one at a time, using an MH step of invariant distribution p(x_k | x_{k-1}, x_{k+1}, y_k) with the same proposal.
- In both cases we update the parameters according to p(\theta | x_{1:500}, y_{1:500}).
- Both algorithms are run with the same computational effort.

Example

- The one-at-a-time MH algorithm misses the mode and estimates E[\sigma_v^2 | y_{1:500}] \approx 13.7 (MH) versus E[\sigma_v^2 | y_{1:500}] \approx 9.9 (PF-MCMC), with true value \sigma_v^2 = 10.
- If X_{1:n} and \theta are strongly correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings algorithm.

Review of the Metropolis-Hastings Algorithm

- As a review of MH, the proposal distribution is a Markov chain with kernel density q(x, y) = q(y | x), and the target distribution is \pi(x).

Algorithm: Generic Metropolis-Hastings Sampler

- Initialization: choose an arbitrary starting value x^{(0)}.
- Iteration t (t \ge 1):
  1. Given x^{(t-1)}, generate \tilde{x} \sim q(x^{(t-1)}, \cdot).
  2. Compute

       \rho(x^{(t-1)}, \tilde{x}) = \min \left\{ 1, \frac{\pi(\tilde{x}) \, q(\tilde{x}, x^{(t-1)})}{\pi(x^{(t-1)}) \, q(x^{(t-1)}, \tilde{x})} \right\}.

  3. With probability \rho(x^{(t-1)}, \tilde{x}), accept \tilde{x} and set x^{(t)} = \tilde{x}; otherwise reject \tilde{x} and set x^{(t)} = x^{(t-1)}.

- It can easily be shown that, under weak assumptions, \pi is invariant for the transition kernel K,

    \pi(x') = \int \pi(x) K(x, x') dx,

  and X^{(i)} \sim \pi(x) as i \to \infty.
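
A generic Python implementation of this sampler might look as follows; the target, proposal and run length in the usage example are illustrative choices.

```python
import numpy as np

def metropolis_hastings(log_pi, sample_q, log_q, x0, n_iter, rng):
    """Generic Metropolis-Hastings sampler for a target known up to a constant.

    Convention: log_q(a, b) = log q(b | a), the log-density of proposing b from a.
    """
    x = x0
    chain = [x0]
    for _ in range(n_iter):
        x_prop = sample_q(x, rng)                        # x~ ~ q(x, .)
        log_ratio = (log_pi(x_prop) + log_q(x_prop, x)
                     - log_pi(x) - log_q(x, x_prop))     # pi(x~) q(x~, x) / (pi(x) q(x, x~))
        if np.log(rng.uniform()) < log_ratio:
            x = x_prop                                   # accept
        chain.append(x)                                  # otherwise keep the current state
    return np.array(chain)

# Usage: random-walk proposal targeting a standard normal (illustrative)
rng = np.random.default_rng(5)
draws = metropolis_hastings(lambda x: -0.5 * x**2,
                            lambda x, rng: x + rng.normal(),
                            lambda a, b: 0.0,            # symmetric proposal, ratio cancels
                            x0=0.0, n_iter=5000, rng=rng)
print(draws.mean(), draws.std())
```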

Marginal Metropolis-Hastings Algorithm

- Consider the target

    p(\theta, x_{1:n} | y_{1:n}) = p(\theta | y_{1:n}) p_\theta(x_{1:n} | y_{1:n}).

- We use the following proposal distribution:

    q\left( (x_{1:n}^*, \theta^*) \,|\, (x_{1:n}, \theta) \right) = q(\theta^* | \theta) \, p_{\theta^*}(x_{1:n}^* | y_{1:n}).

- Then the acceptance probability becomes

    \min \left\{ 1, \frac{p(\theta^*, x_{1:n}^* | y_{1:n}) \, q\left( (x_{1:n}, \theta) | (x_{1:n}^*, \theta^*) \right)}{p(\theta, x_{1:n} | y_{1:n}) \, q\left( (x_{1:n}^*, \theta^*) | (x_{1:n}, \theta) \right)} \right\}
    = \min \left\{ 1, \frac{p_{\theta^*}(y_{1:n}) p(\theta^*) q(\theta | \theta^*)}{p_\theta(y_{1:n}) p(\theta) q(\theta^* | \theta)} \right\}.

- We will use SMC approximations to compute p_\theta(y_{1:n}) and p_\theta(x_{1:n} | y_{1:n}).

Marginal Metropolis-Hastings Algorithm

- Step 1: Given \{\theta(i-1), X_{1:n}(i-1), \hat{p}_{\theta(i-1)}(y_{1:n})\}, sample \theta^* \sim q(\theta | \theta(i-1)) and run an SMC algorithm to obtain \hat{p}_{\theta^*}(y_{1:n}) and \hat{p}_{\theta^*}(x_{1:n} | y_{1:n}).

- Step 2: Sample X_{1:n}^* \sim \hat{p}_{\theta^*}(x_{1:n} | y_{1:n}).

- Step 3: With probability

    \min \left\{ 1, \frac{\hat{p}_{\theta^*}(y_{1:n}) p(\theta^*) q(\theta(i-1) | \theta^*)}{\hat{p}_{\theta(i-1)}(y_{1:n}) p(\theta(i-1)) q(\theta^* | \theta(i-1))} \right\},

  set \{\theta(i), X_{1:n}(i), \hat{p}_{\theta(i)}(y_{1:n})\} = \{\theta^*, X_{1:n}^*, \hat{p}_{\theta^*}(y_{1:n})\};
  otherwise set \{\theta(i), X_{1:n}(i), \hat{p}_{\theta(i)}(y_{1:n})\} = \{\theta(i-1), X_{1:n}(i-1), \hat{p}_{\theta(i-1)}(y_{1:n})\}.
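
The sketch below is one possible Python rendering of this particle marginal Metropolis-Hastings recursion. A bootstrap particle filter stands in for the generic SMC algorithm of Step 1, and the callables `sample_x1`, `sample_f`, `log_g`, `sample_q`, `log_q` and `log_prior` (initial/transition/observation model, \theta-proposal and prior) are placeholders, not names from the slides.

```python
import numpy as np

def bootstrap_pf(y, theta, sample_x1, sample_f, log_g, N, rng):
    """Bootstrap particle filter: returns log p_hat_theta(y_{1:n}) and one sampled path."""
    x = sample_x1(theta, N, rng)                         # X_1^(i), shape (N,)
    paths = x[:, None].copy()
    loglik = 0.0
    for k in range(len(y)):
        if k > 0:
            x = sample_f(theta, x, rng)                  # propagate with the prior dynamics
            paths = np.hstack([paths, x[:, None]])
        logw = log_g(theta, y[k], x)                     # weight by the observation density
        m = logw.max()
        loglik += m + np.log(np.mean(np.exp(logw - m)))  # log p_hat(y_k | y_{1:k-1})
        w = np.exp(logw - m)
        w /= w.sum()
        idx = rng.choice(N, size=N, p=w)                 # resample whole trajectories
        x, paths = x[idx], paths[idx]
    return loglik, paths[rng.integers(N)]                # equally weighted after resampling

def pmmh(y, theta0, log_prior, sample_q, log_q, sample_x1, sample_f, log_g,
         N, n_iter, rng):
    """Particle marginal Metropolis-Hastings (sketch); log_q(a, b) = log q(b | a)."""
    theta, (ll, xpath) = theta0, bootstrap_pf(y, theta0, sample_x1, sample_f, log_g, N, rng)
    out = []
    for _ in range(n_iter):
        theta_p = sample_q(theta, rng)                               # Step 1: propose theta*
        ll_p, xpath_p = bootstrap_pf(y, theta_p, sample_x1,
                                     sample_f, log_g, N, rng)        # Steps 1-2: SMC + sample X*
        log_acc = (ll_p + log_prior(theta_p) + log_q(theta_p, theta)
                   - ll - log_prior(theta) - log_q(theta, theta_p))  # Step 3 acceptance ratio
        if np.log(rng.uniform()) < log_acc:
            theta, ll, xpath = theta_p, ll_p, xpath_p
        out.append((theta, xpath))
    return out
```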

Marginal Metropolis-Hastings Algorithm

- This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(\theta | y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).
- For any N \ge 1, the algorithm admits exactly p(\theta, x_{1:n} | y_{1:n}) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.
- The higher N, the better the performance of the algorithm; N scales roughly linearly with n.
- The method is useful when X_n is moderate-dimensional and \theta is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

- J. Fernandez-Villaverde & J.F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), 549-553, 2007
- C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, JRSS B, 72(3), 269-342, June 2010
- E.L. Ionides, C. Breto and A.A. King, Inference for nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 103, 18438-18443, 2006

OPEN PROBLEMS

Open Problems

- SMC can thus be successfully used within MCMC.
- The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals.
- This is computationally expensive, but it is very helpful in challenging problems.
- Several open problems remain:
  - Developing efficient SMC methods when the space is, say, E = R^10000.
  - Developing a proper SMC method for recursive Bayesian parameter estimation.
  - Developing methods that avoid the O(N^2) cost for smoothing and sensitivity estimation.
  - Developing methods for solving optimal control problems.
  - Establishing weaker assumptions ensuring uniform convergence results.
