Markov Decision Process I - Cse.msu.edu


Markov Decision Process I
Hui Liu (liuhui7@msu.edu)
Time & Place: Tu/Th 10:20-11:40 & Zoom
CSE-440 Spring 2022
Ack: Berkeley AI course

Outline
- Non-Deterministic Search
- Markov Decision Process

Example: Grid World
- A maze-like problem
  - The agent lives in a grid
  - Walls block the agent's path
- Noisy movement: actions do not always go as planned
  - 80% of the time, the action North takes the agent North (if there is no wall there)
  - 10% of the time, North takes the agent West; 10% East
  - If there is a wall in the direction the agent would have been taken, the agent stays put
- The agent receives rewards
  - Small "living" reward each step (can be negative)
  - Big rewards come at the end (good or bad)
- Goal: maximize sum of rewards
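As a concrete illustration of the noisy movement described above, here is a minimal sampling sketch. The grid representation (coordinates plus a set of wall cells) and the helper name sample_north are assumptions for illustration, not part of the slides.

```python
import random

def sample_north(pos, walls):
    """Sample where a noisy 'North' action actually takes the agent."""
    x, y = pos
    r = random.random()
    if r < 0.8:
        target = (x, y + 1)   # 80%: goes North as intended
    elif r < 0.9:
        target = (x - 1, y)   # 10%: slips West
    else:
        target = (x + 1, y)   # 10%: slips East
    # If a wall blocks the resulting cell, the agent stays put.
    return pos if target in walls else target
```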

Grid World Actions
[Figure: Deterministic Grid World vs. Stochastic Grid World]

Outline
- Non-Deterministic Search
- Markov Decision Process

Markov Decision Processes
- An MDP is defined by:
  - A set of states s ∈ S
  - A set of actions a ∈ A
  - A transition function T(s, a, s')
    - Probability that a from s leads to s', i.e., P(s' | s, a)
    - Also called the model or the dynamics
  - A reward function R(s, a, s')
    - Sometimes just R(s) or R(s')
  - A start state
  - Maybe a terminal state
- MDPs are non-deterministic search problems
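The definition above maps naturally onto a small container object. This is a minimal sketch under assumed names (MDP, transitions, rewards), not a specific library API.

```python
from dataclasses import dataclass, field

@dataclass
class MDP:
    states: set        # S
    actions: dict      # A(s): actions available in each state
    transitions: dict  # (s, a) -> list of (s', probability)
    rewards: dict      # (s, a, s') -> reward
    start: object      # start state
    terminals: set = field(default_factory=set)  # optional terminal states

    def T(self, s, a):
        """All (next_state, probability) pairs for taking a in s."""
        return self.transitions.get((s, a), [])

    def R(self, s, a, s_next):
        """Reward for the transition (s, a, s')."""
        return self.rewards.get((s, a, s_next), 0.0)
```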

What is Markov about MDPs?
- "Markov" generally means that given the present state, the future and the past are independent
- For Markov decision processes, "Markov" means action outcomes depend only on the current state
- This is just like search, where the successor function could only depend on the current state (not the history)
(Andrey Markov, 1856-1922)
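Spelled out as an equation (a standard formalization, not shown on the slide): P(S_{t+1} = s' | S_t = s_t, A_t = a_t, S_{t-1}, A_{t-1}, ..., S_0) = P(S_{t+1} = s' | S_t = s_t, A_t = a_t), i.e., conditioning on the full history adds nothing beyond the current state and action.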

Policies
- In deterministic single-agent search problems, we wanted an optimal plan, or sequence of actions, from start to a goal
- For MDPs, we want an optimal policy π*: S → A
  - A policy π gives an action for each state
  - An optimal policy is one that maximizes expected utility if followed
  - An explicit policy defines a reflex agent
(Figure: optimal policy when R(s, a, s') = -0.03 for all non-terminals s)
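Concretely, an explicit policy is just a lookup table from states to actions; the grid coordinates and action names below are illustrative assumptions.

```python
# A policy as an explicit state -> action table (a reflex agent).
policy = {
    (0, 0): "North",
    (0, 1): "North",
    (0, 2): "East",
    (1, 2): "East",
    (2, 2): "East",
}

def act(state):
    """Reflex agent: acts on the current state only, via the policy table."""
    return policy[state]
```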

Optimal Policies
[Figure: optimal grid-world policies for different living rewards: R(s) = -0.01, R(s) = -0.03, R(s) = -0.4, R(s) = -2.0]

Example: Racing
- A robot car wants to travel far, quickly
- Three states: Cool, Warm, Overheated
- Two actions: Slow, Fast
- Going faster gets double reward
- Transitions (from the state diagram):
  - Cool, Slow: stay Cool with probability 1.0 (reward +1)
  - Cool, Fast: stay Cool with probability 0.5 or move to Warm with probability 0.5 (reward +2)
  - Warm, Slow: cool down to Cool with probability 0.5 or stay Warm with probability 0.5 (reward +1)
  - Warm, Fast: Overheated with probability 1.0 (reward -10)
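The same diagram, written as a Python transition table (the dictionary layout and names are illustrative):

```python
# (state, action) -> list of (probability, next_state, reward)
racing_transitions = {
    ("Cool", "Slow"): [(1.0, "Cool", 1)],
    ("Cool", "Fast"): [(0.5, "Cool", 2), (0.5, "Warm", 2)],
    ("Warm", "Slow"): [(0.5, "Cool", 1), (0.5, "Warm", 1)],
    ("Warm", "Fast"): [(1.0, "Overheated", -10)],
}
```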

Racing Search Tree
[Figure: expectimax-style search tree for the racing MDP, with 0.5/0.5 chance branches]

MDP Search Trees
- Each MDP state projects an expectimax-like search tree
  - s is a state
  - (s, a) is a q-state
  - (s, a, s') is called a transition
  - T(s, a, s') = P(s' | s, a)
  - R(s, a, s') is the reward on that transition
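The tree structure suggests a one-step backup: a q-state averages over its transitions, and a state maximizes over its actions. This is a hedged sketch built on the MDP container assumed earlier; V is an assumed value estimate for successor states, and gamma is the discount introduced on the following slides.

```python
def q_value(mdp, s, a, V, gamma):
    """Expected value of the q-state (s, a): average over transitions."""
    return sum(p * (mdp.R(s, a, s_next) + gamma * V[s_next])
               for s_next, p in mdp.T(s, a))

def state_value(mdp, s, V, gamma):
    """Value of state s: maximize over the available actions."""
    return max(q_value(mdp, s, a, V, gamma) for a in mdp.actions[s])
```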

Utilities of Sequences
- What preferences should an agent have over reward sequences?
- More or less? [1, 2, 2] or [2, 3, 4]
- Now or later? [0, 0, 1] or [1, 0, 0]

Discounting
- It's reasonable to maximize the sum of rewards
- It's also reasonable to prefer rewards now to rewards later
- One solution: values of rewards decay exponentially
  (Figure: a reward is worth 1 now, γ one step from now, γ² two steps from now)

Discounting
- How to discount?
  - Each time we descend a level, we multiply in the discount once
- Why discount?
  - Reward now is better than later
  - Also helps our algorithms converge
- Example: discount of 0.5
  - U([1, 2, 3]) = 1*1 + 0.5*2 + 0.25*3
  - U([1, 2, 3]) < U([3, 2, 1])
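A quick worked check of the discount-0.5 example (the helper name is illustrative):

```python
def discounted_utility(rewards, gamma):
    """Sum of gamma**t * r_t over a reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

print(discounted_utility([1, 2, 3], 0.5))  # 1*1 + 0.5*2 + 0.25*3 = 2.75
print(discounted_utility([3, 2, 1], 0.5))  # 3*1 + 0.5*2 + 0.25*1 = 4.25
# So U([1, 2, 3]) < U([3, 2, 1]) under a discount of 0.5.
```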

Stationary Preferences
- Theorem: if we assume stationary preferences (a preference between two reward sequences is unchanged when the same reward is prepended to both), then there are only two ways to define utilities:
  - Additive utility: U([r0, r1, r2, ...]) = r0 + r1 + r2 + ...
  - Discounted utility: U([r0, r1, r2, ...]) = r0 + γ*r1 + γ²*r2 + ...

Infinite Utilities?!
- Problem: what if the game lasts forever? Do we get infinite rewards?
- Solutions:
  - Finite horizon (similar to depth-limited search)
    - Terminate episodes after a fixed T steps (e.g. life)
    - Gives nonstationary policies (π depends on the time left)
  - Discounting: use 0 < γ < 1
    - Smaller γ means a smaller "horizon", i.e., a shorter-term focus
  - Absorbing state: guarantee that for every policy, a terminal state will eventually be reached (like "overheated" for racing)
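Why discounting keeps utilities finite (a standard bound, not spelled out on the slide): if every reward satisfies |r_t| ≤ R_max and 0 < γ < 1, then |sum over t of γ^t * r_t| ≤ R_max * (1 + γ + γ² + ...) = R_max / (1 - γ), a finite geometric series.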

Example: Discounting
- Given: a row of states a, b, c, d, e (from the figure; exiting at a appears to pay 10 and exiting at e to pay 1)
- Actions: East, West, and Exit (only available in exit states a, e)
- Transitions: deterministic
- Q1: For γ = 1, what is the optimal policy?
- Q2: For γ = 0.1, what is the optimal policy?
- Q3: For which γ are West and East equally good when in state d? (The slide's answer: 1γ = 10γ³.)
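Under the layout assumed above (from state d, the 10-reward exit is three steps West and the 1-reward exit is one step East), Q3 reduces to 10γ³ = 1γ. A quick numeric check:

```python
gamma = (1 / 10) ** 0.5            # from 10*g**3 = g  =>  g**2 = 1/10
print(gamma)                       # ~0.316
print(10 * gamma ** 3, 1 * gamma)  # both ~0.316: West and East tie at state d
```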

Recap: Defining MDPs
- Markov decision processes:
  - Set of states S
  - Start state s0
  - Set of actions A
  - Transitions P(s' | s, a) (or T(s, a, s'))
  - Rewards R(s, a, s') (and discount γ)
- MDP quantities so far:
  - Policy = choice of action for each state
  - Utility = sum of (discounted) rewards

Important This Week
- Homework 2 is released on Mimir, due next Monday.
- Questions?

