Introduction To Markov Chain Monte Carlo - Cornell University

Introduction to Markov Chain Monte Carlo
Monte Carlo: sample from a distribution
– to estimate the distribution
– to compute max, mean
Markov Chain Monte Carlo: sampling using "local" information
– a generic "problem-solving technique"
– decision/optimization/value problems
– generic, but not necessarily very efficient
Based on: Neal Madras, Lectures on Monte Carlo Methods, AMS 2002

Lecture Outline
– Markov Chains: notation & terminology
– Sampling from probability distributions using MCMC
  – uniform
  – desired target distribution
– Problem solving using MCMC
  – fundamental properties of Markov Chains
  – optimization
– Relevance to Bayesian Networks

Markov Chains: Notation & Terminology
– countable (finite) state space Ω (e.g. ℕ)
– sequence of random variables {X_t} on Ω for t = 0, 1, 2, ...
– Definition: {X_t} is a Markov Chain if
  P[X_{t+1} = y | X_t = x_t, ..., X_0 = x_0] = P[X_{t+1} = y | X_t = x_t]
– Notation: P[X_{t+1} = i | X_t = j] = p_{ji} (time-homogeneous)

Markov Chains: Examples
– Markov Chain: draw a number from {1, 2, 3} WITH replacement; X_t = last number seen at time t
– NOT a Markov Chain: draw a number from {1, 2, 3} WITHOUT replacement; X_t = last number seen at time t (the distribution of the next draw depends on all numbers seen so far, not only on the last one)

Markov Chains: Notation & Terminology
– let P = (p_{ij}) be the transition probability matrix, of dimension |Ω| × |Ω|
– let π_t(j) = P[X_t = j]; π_0 is the initial probability distribution
– then π_t(j) = Σ_i π_{t-1}(i) p_{ij} = (π_{t-1} P)(j) = (π_0 P^t)(j)
– Example: graph vs. matrix representation

Markov Chains: Fundamental Properties
Theorem:
– Under some conditions (irreducibility and aperiodicity), the limit lim_{t→∞} (P^t)_{ij} exists and is independent of i; call it π(j). If Ω is finite, then Σ_j π(j) = 1 and (πP)(j) = π(j), and such a π is the unique solution of xP = x (π is called the stationary distribution).
DEMO
Nice: no matter where we start, after some time we will be in state j with probability π(j).
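A minimal numpy sketch of the DEMO idea, using a made-up irreducible, aperiodic 3-state matrix (the numbers are assumptions for illustration): the rows of P^t all converge to the same stationary vector π.

```python
import numpy as np

# Made-up irreducible, aperiodic transition matrix on 3 states (rows sum to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

Pt = np.linalg.matrix_power(P, 50)  # P^t for large t
print(Pt)                           # all rows are (nearly) identical: each row is pi
pi = Pt[0]
print(pi @ P)                       # pi P = pi, so pi solves xP = x
```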

Markov Chains: Fundamental Properties
Proposition:
– Assume a Markov Chain with discrete state space Ω. Assume there exists a positive distribution π on Ω (π(i) > 0 and Σ_i π(i) = 1) such that for every i, j: π(i) p_{ij} = π(j) p_{ji} (the detailed balance property); then π is the stationary distribution of P.
Corollary:
– If the transition matrix P is symmetric and Ω is finite, then the stationary distribution is π(i) = 1/|Ω|.
DEMO

Markov Chain Monte Carlo
Random Walk on {0,1}^m:
– Ω = {0,1}^m
– generate the chain: pick J ∈ {1,...,m} uniformly at random and set X_t = (z_1,...,1-z_J,...,z_m), where (z_1,...,z_m) = X_{t-1}
Markov Chain Monte Carlo basic idea:
– Given a probability distribution π on a set Ω, the problem is to generate random elements of Ω with distribution π. MCMC does that by constructing a Markov Chain with stationary distribution π and simulating the chain.
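A minimal Python sketch of this random walk on {0,1}^m (the function name is my own). Its transition matrix is symmetric, so by the Corollary above its stationary distribution is uniform on {0,1}^m.

```python
import random

def hypercube_step(z):
    """One step of the random walk on {0,1}^m: flip one uniformly chosen coordinate."""
    j = random.randrange(len(z))
    z = list(z)
    z[j] = 1 - z[j]
    return tuple(z)

m = 5
x = (0,) * m          # arbitrary start state
for t in range(10):
    x = hypercube_step(x)
    print(t, x)
```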

MCMC: Uniform Sampler
Problem: sample elements uniformly at random from a (large but finite) set Ω
Idea: construct an irreducible symmetric Markov Chain with state space Ω and run it for sufficiently long
– by the Theorem and Corollary above, this will work
Example: generate, uniformly at random, a feasible solution to the Knapsack Problem

MCMC: Uniform Sampler Example
Knapsack Problem definition:
– Given: m items with weights w_i and values v_i, and a knapsack with weight limit b
– Find: the most valuable subset of items that will fit into the knapsack
Representation:
– z = (z_1,...,z_m) ∈ {0,1}^m, where z_i indicates whether we take item i
– feasible solutions: Ω = { z ∈ {0,1}^m : Σ_i w_i z_i ≤ b }
– problem: maximize Σ_i v_i z_i subject to z ∈ Ω

MCMC Example: Knapsack Problem
Uniform sampling using MCMC: given the current state X_t = (z_1,...,z_m), generate X_{t+1} by:
(1) choose J ∈ {1,...,m} uniformly at random
(2) flip z_J, i.e. let y = (z_1,...,1-z_J,...,z_m)
(3) if y is feasible, set X_{t+1} = y; else set X_{t+1} = X_t
Comments:
– P is symmetric ⇒ uniform sampling
– how long should we run it?
– can we use this to find a "good" solution?
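A sketch of this uniform knapsack sampler in Python (the instance data are made up for illustration):

```python
import random

def uniform_knapsack_step(z, w, b):
    """One step: flip a random bit; keep the move only if the result is feasible."""
    j = random.randrange(len(z))
    y = list(z)
    y[j] = 1 - y[j]
    if sum(wi * yi for wi, yi in zip(w, y)) <= b:
        return y         # feasible proposal: move there
    return list(z)       # infeasible: stay at the current state

w, b = [3, 5, 2, 7], 10  # assumed weights and weight limit
z = [0, 0, 0, 0]         # the empty knapsack is always feasible
for _ in range(1000):
    z = uniform_knapsack_step(z, w, b)
print(z)                 # after a long run, approximately uniform over feasible z
```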

MCMC Example: Knapsack Problem
Can we use MCMC to find a good solution?
– Yes: keep generating feasible solutions uniformly at random and remember the best one seen so far.
– This may take a very long time if the number of good solutions is small.
Better: generate "good" solutions with higher probability, i.e. sample from a distribution where "good" solutions have higher probability:
π(z) = C^{-1} exp(Σ_i v_i z_i)

MCMC: Target Distribution Sampler
– Let Ω be a countable (finite) state space.
– Let Q be a symmetric transition probability matrix.
– Let π be any probability distribution on Ω with π(i) > 0: the target distribution.
We can define a new Markov Chain {X_t} such that its stationary distribution is π
– this allows us to sample from Ω according to π

MCMC: Metropolis Algorithm
Given such Ω, π, Q, create a new MC {X_t}:
(1) choose a "proposal" Y randomly using Q: P[Y = j | X_t = i] = q_{ij}
(2) let α = min{1, π(Y)/π(i)} (the acceptance probability)
(3) accept Y with probability α, i.e. X_{t+1} = Y with probability α, X_{t+1} = X_t otherwise
Resulting p_{ij}:
p_{ij} = q_{ij} min{1, π(j)/π(i)} for i ≠ j
p_{ii} = 1 - Σ_{j≠i} p_{ij}
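The step translates directly into code; a minimal generic sketch (function names are my own), assuming a symmetric proposal and a target known only up to a constant:

```python
import random

def metropolis_step(x, propose, pi):
    """One Metropolis step: symmetric proposal, accept with prob. min{1, pi(y)/pi(x)}."""
    y = propose(x)                    # proposal from a symmetric kernel: q(x,y) = q(y,x)
    alpha = min(1.0, pi(y) / pi(x))   # acceptance probability; pi may be unnormalized
    return y if random.random() < alpha else x
```

Note that pi enters only through the ratio pi(y)/pi(x), so any normalizing constant cancels, which is exactly the remark on the next slide.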

MCMC: Metropolis Algorithm
Proposition (Metropolis works):
– The p_{ij}'s from the Metropolis Algorithm satisfy the detailed balance property w.r.t. π, i.e. π(i) p_{ij} = π(j) p_{ji}
– ⇒ the new Markov Chain has stationary distribution π
Remarks:
– we only need to know ratios of values of π
– the MC might converge to π exponentially slowly
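A one-line check of the detailed balance claim: take i ≠ j and assume w.l.o.g. π(j) ≥ π(i). Then p_{ij} = q_{ij} · 1 and p_{ji} = q_{ji} · π(i)/π(j), so, using the symmetry q_{ij} = q_{ji}:
π(i) p_{ij} = π(i) q_{ij} = π(j) q_{ji} π(i)/π(j) = π(j) p_{ji}.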

MCMC: Metropolis Algorithm
Knapsack Problem target distribution: π_b(z) = C_b^{-1} exp(b Σ_i v_i z_i)
Algorithm:
(1) choose J ∈ {1,...,m} uniformly at random
(2) let y = (z_1,...,1-z_J,...,z_m)
(3) if y is not feasible, set X_{t+1} = X_t
(4) if y is feasible, set X_{t+1} = y with probability α, else X_{t+1} = X_t, where α = min{1, exp(b Σ_i v_i (y_i - z_i))}
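A sketch of this biased sampler in Python (instance data assumed as before); since only one coordinate flips, Σ_i v_i (y_i - z_i) reduces to a single term:

```python
import math
import random

def metropolis_knapsack_step(z, w, v, limit, b):
    """Metropolis step targeting pi_b(z) ~ exp(b * total value) over feasible z."""
    j = random.randrange(len(z))
    y = list(z)
    y[j] = 1 - y[j]
    if sum(wi * yi for wi, yi in zip(w, y)) > limit:
        return list(z)                 # infeasible proposal: stay put
    delta = v[j] * (y[j] - z[j])       # = sum_i v_i (y_i - z_i), only bit j changed
    if random.random() < min(1.0, math.exp(b * delta)):
        return y                       # accept with probability alpha
    return list(z)
```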

MCMC: Optimization
The Metropolis Algorithm theoretically works, but:
– it needs a large b to make "good" states more likely
– its convergence time may be exponential in b
⇒ try changing b over time
Simulated Annealing:
– for the Knapsack Problem: α = min{1, exp(b(t) Σ_i v_i (y_i - z_i))}
– b(t) increases slowly with time (e.g. log(t), (1.001)^t)

MCMC: Simulated Annealing
General optimization problem: maximize a function G(z) over all feasible solutions Ω
– let Q again be a symmetric transition probability matrix on Ω
Simulated Annealing is the Metropolis Algorithm with
p_{ij} = q_{ij} min{1, exp(b(t) [G(j) - G(i)])} for i ≠ j
p_{ii} = 1 - Σ_{j≠i} p_{ij}
Effect of b(t): exploration vs. exploitation trade-off
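A minimal annealing loop for the knapsack instance, reusing metropolis_knapsack_step from the sketch above; the schedule b(t) = log(t + 2) and the item values are assumptions for illustration:

```python
import math

w, v, limit = [3, 5, 2, 7], [4, 6, 3, 8], 10   # assumed instance
z, best = [0, 0, 0, 0], [0, 0, 0, 0]
for t in range(10000):
    b = math.log(t + 2)                        # b(t) increases slowly with time
    z = metropolis_knapsack_step(z, w, v, limit, b)
    if sum(vi * zi for vi, zi in zip(v, z)) > sum(vi * bi for vi, bi in zip(v, best)):
        best = list(z)                         # remember the best solution seen so far
print(best)
```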

MCMC: Gibbs Sampling
Consider a factored state space:
– z ∈ Ω is a vector z = (z_1,...,z_m)
– notation: z_{-i} = (z_1,...,z_{i-1}, z_{i+1},...,z_m)
Assume the target π is such that P[Z_i | z_{-i}] is known.
Algorithm:
(1) pick a component i ∈ {1,...,m}
(2) sample a new value of z_i from P[Z_i | z_{-i}] and set X_{t+1} = (z_1,...,z_m) with that component replaced
A special case of generalized Metropolis Sampling (Metropolis-Hastings)
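A generic Gibbs step for binary components in Python (names are my own), computing the exact conditional P[Z_i | z_{-i}] from an unnormalized joint p, which is all the normalization requires:

```python
import random

def gibbs_step(z, p):
    """One Gibbs update: resample one binary component from its exact conditional."""
    i = random.randrange(len(z))
    z0, z1 = list(z), list(z)
    z0[i], z1[i] = 0, 1
    p0, p1 = p(z0), p(z1)     # unnormalized joint at the two values of z_i
    z = list(z)
    z[i] = 1 if random.random() < p1 / (p0 + p1) else 0   # P[Z_i = 1 | z_{-i}]
    return z
```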

MCMC: Relevance to Bayesian Networks
In Bayesian Networks, we know P[Z_i | z_{-i}] = P[Z_i | MarkovBlanket(Z_i)]
BN Inference Problem: compute P[Z_i = z_i | E = e]
Possible solution:
(1) sample worlds according to P[Z = z | E = e]
(2) compute the fraction of those worlds in which Z_i = z_i
The Gibbs Sampler works: let π(z) = P[Z = z | E = e]; then the updates from P[Z_i | z_{-i}] satisfy the detailed balance property w.r.t. π(z) ⇒ π(z) is stationary

MCMC: Inference in BN Example
Query: P[H | S = true, B = true]
(figure: the Bayesian network over Smoking S, H, L, and Breathing difficulties B; from the computation two slides below, S is a parent of H, and B is a child of both H and L)

MCMC: Inference in BN Example
What is the transition probability p((h,l) → (¬h,l))?
The chain moves over the four joint states of the unobserved variables: (h,l), (h,¬l), (¬h,l), (¬h,¬l); Smoking and Breathing difficulties are fixed (they are the evidence).

MCMC: Inference in BN Example
P[z_i | MB(Z_i)] ∝ P[z_i | Par(Z_i)] · Π_{Y ∈ Chld(Z_i)} P[y | Par(Y)]
p((h,l) → (¬h,l)) = P[H gets picked] · P[¬h | MB(H)]
= ½ · P[¬h | l, s, b]
= ½ · α P[¬h | s] · P[b | ¬h, l]
(α is the normalizing constant of the conditional for H)
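A runnable sketch of the Gibbs sampler for this query. All CPT numbers below are invented for illustration (the lecture gives none); the estimate of P[H = true | S = true, B = true] is the fraction of samples in which h is true:

```python
import random

# Assumed CPTs for the network S -> H, S -> L, (H, L) -> B (numbers are invented).
P_h_given_s = {True: 0.3, False: 0.1}                        # P[h | s]
P_l_given_s = {True: 0.4, False: 0.05}                       # P[l | s]
P_b_given_hl = {(True, True): 0.9, (True, False): 0.7,
                (False, True): 0.8, (False, False): 0.1}     # P[b | h, l]

s, b = True, True            # evidence: S = true, B = true
h, l = False, False          # arbitrary initial values for the unobserved variables
count, N = 0, 100_000
for _ in range(N):
    # Resample H from P[H | MB(H)], proportional to P[H | s] * P[b | H, l].
    w1 = P_h_given_s[s] * P_b_given_hl[(True, l)]
    w0 = (1 - P_h_given_s[s]) * P_b_given_hl[(False, l)]
    h = random.random() < w1 / (w0 + w1)
    # Resample L from P[L | MB(L)], proportional to P[L | s] * P[b | h, L].
    w1 = P_l_given_s[s] * P_b_given_hl[(h, True)]
    w0 = (1 - P_l_given_s[s]) * P_b_given_hl[(h, False)]
    l = random.random() < w1 / (w0 + w1)
    count += h
print(count / N)             # estimate of P[H = true | S = true, B = true]
```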
