Markov Decision Processes – Solution (Idm-lab)


1) Invent a simple Markov decision process (MDP) with the following properties: a) it has a goal state, b) its immediate action costs are all positive, c) all of its actions can result with some probability in the start state, and d) the optimal policy without discounting differs from the optimal policy with discounting and a discount factor of 0.9. Prove d) using value iteration.

Answer:

[MDP diagram and value-iteration tables omitted: the example has three states with goal state S3, action costs of 9 and 1000, and transition probabilities of 0.999 and 0.001.]

With no discount factor, action a1 is preferred over action a2 in state S1. With a discount factor of 0.9, action a2 is preferred over action a1 in state S1.

2) Consider the following problem (with thanks to V. Conitzer): Consider a rover that operates on a slope and uses solar panels to recharge. It can be in one of three states: high, medium and low on the slope. If it spins its wheels, it climbs the slope in each time step (from low to medium or from medium to high) or stays high. If it does not spin its wheels, it slides down the slope in each time step (from high to medium or from medium to low) or stays low. Spinning its wheels uses one unit of energy per time step. Being high or medium on the slope gains three units of energy per time step via the solar panels, while being low on the slope does not gain any energy per time step. The robot wants to gain as much energy as possible.

a) Draw the MDP graphically. b) Solve the MDP using value iteration with a discount factor of 0.8. c) Describe the optimal policy.

Answer:

[MDP diagram omitted: states L (low), M (medium) and H (high); spinning moves the rover one state up the slope (or keeps it in H) and yields reward −1 in L and 2 in M or H; not spinning moves it one state down (or keeps it in L) and yields reward 0 in L and 3 in M or H.]

Starting with 0 as initial values, value iteration calculates the values of the states (per-iteration value table omitted). At each iteration, the value of a state is the value of the maximizing action in that state (since we are trying to maximize energy). For instance, in iteration 4, the value of L, v4(L), is computed as follows:

v4(L, spin) = 0.8 v3(M) − 1 = 0.8 · 6.32 − 1 ≈ 4.06
v4(L, don't) = 0.8 v3(L) + 0 = 0.8 · 2.52 ≈ 2.02
v4(L) = max(v4(L, spin), v4(L, don't)) ≈ 4.06

The optimal policy is to spin when the rover is low or medium on the slope and not to spin when it is high on the slope.
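As a cross-check on these numbers, here is a small value-iteration program in the style of the C code given for problem 3 below. It is only a sketch of mine, not part of the original solution; the variable names and the cap of 100 iterations are my own choices.

#include <stdio.h>

#define max(x, y) (((x) > (y)) ? (x) : (y))

int main(void)
{
    /* Values of the states L (low), M (medium) and H (high), initialized to 0. */
    float l = 0.0f, m = 0.0f, h = 0.0f;
    float gamma = 0.8f;   /* discount factor */

    for (int iteration = 1; iteration <= 100; iteration++) {
        /* Action values: immediate reward plus discounted value of the successor state. */
        float l_spin = -1.0f + gamma * m;   /* spin in L: climb to M */
        float l_dont =  0.0f + gamma * l;   /* don't spin in L: stay in L */
        float m_spin =  2.0f + gamma * h;   /* spin in M: climb to H */
        float m_dont =  3.0f + gamma * l;   /* don't spin in M: slide to L */
        float h_spin =  2.0f + gamma * h;   /* spin in H: stay in H */
        float h_dont =  3.0f + gamma * m;   /* don't spin in H: slide to M */

        l = max(l_spin, l_dont);
        m = max(m_spin, m_dont);
        h = max(h_spin, h_dont);
        printf("%d: L=%.2f M=%.2f H=%.2f\n", iteration, l, m, h);
    }
    return 0;
}

In iteration 4 it prints L=4.06, which matches the hand computation above.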

Now answer the three questions above for the following variant of the robot problem: If it spins its wheels, it climbs the slope in each time step (from low to medium or from medium to high) or stays high, all with probability 0.3. It stays where it is with probability 0.7. If it does not spin its wheels, it slides down the slope to low with probability 0.4 and stays where it is with probability 0.6. Everything else remains unchanged from the previous problem.

Answer:

[MDP diagram omitted: the rewards are as before (−1 for spinning in L, 0 for not spinning in L, 2 for spinning in M or H, 3 for not spinning in M or H); spinning now succeeds with probability 0.3 and leaves the rover where it is with probability 0.7, while not spinning drops it to L with probability 0.4 and leaves it where it is with probability 0.6.]

Starting with 0 as initial values, value iteration calculates the values of the states (per-iteration value table omitted). In this variant, in iteration 4, the value of L, v4(L), is computed as follows:

v4(L, spin) = 0.8 (0.3 v3(M) + 0.7 v3(L)) − 1 = 0.8 (0.3 · 5.55 + 0.7 · 0.07) − 1 ≈ 0.37
v4(L, don't) = 0.8 v3(L) + 0 = 0.8 · 0.07 = 0.056
v4(L) = max(v4(L, spin), v4(L, don't)) ≈ 0.37

The optimal policy is to spin, wherever the rover is on the slope.
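The same kind of sketch can be adapted to this stochastic variant. Again, this is my own illustration rather than part of the original solution (variable names and the iteration cap are assumptions); the backups now weight the successor values by the transition probabilities 0.3/0.7 and 0.4/0.6.

#include <stdio.h>

#define max(x, y) (((x) > (y)) ? (x) : (y))

int main(void)
{
    /* Values of the states L, M and H, initialized to 0. */
    float l = 0.0f, m = 0.0f, h = 0.0f;
    float gamma = 0.8f;   /* discount factor */

    for (int iteration = 1; iteration <= 100; iteration++) {
        /* Spinning: climb with probability 0.3, stay with probability 0.7 (H stays either way).
           Not spinning: drop to L with probability 0.4, stay with probability 0.6 (L stays either way). */
        float l_spin = -1.0f + gamma * (0.3f * m + 0.7f * l);
        float l_dont =  0.0f + gamma * l;
        float m_spin =  2.0f + gamma * (0.3f * h + 0.7f * m);
        float m_dont =  3.0f + gamma * (0.4f * l + 0.6f * m);
        float h_spin =  2.0f + gamma * h;
        float h_dont =  3.0f + gamma * (0.4f * l + 0.6f * h);

        l = max(l_spin, l_dont);
        m = max(m_spin, m_dont);
        h = max(h_spin, h_dont);
        printf("%d: L=%.2f M=%.2f H=%.2f\n", iteration, l, m, h);
    }
    return 0;
}

In iteration 4 it prints L=0.37, which matches the computation above.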

3) Consider the following problem (with thanks to V. Conitzer): Consider a rover that operates on a slope. It can be in one of four states: top, high, medium and low on the slope. If it spins its wheels slowly, it climbs the slope in each time step (from low to medium or from medium to high or from high to top) with probability 0.3. It slides down the slope to low with probability 0.7. If it spins its wheels rapidly, it climbs the slope in each time step (from low to medium or from medium to high or from high to top) with probability 0.5. It slides down the slope to low with probability 0.5. Spinning its wheels slowly uses one unit of energy per time step. Spinning its wheels rapidly uses two units of energy per time step. The rover is low on the slope and aims to reach the top with minimum expected energy consumption.

a) Draw the MDP graphically. b) Solve the MDP using undiscounted value iteration (that is, value iteration with a discount factor of 1). c) Describe the optimal policy.

Answer:

[MDP diagram omitted: states L (low), M (medium), H (high) and T (top, the goal); spinning slowly costs 1 and climbs one state with probability 0.3 or drops the rover to L with probability 0.7; spinning rapidly costs 2 and climbs one state with probability 0.5 or drops it to L with probability 0.5.]

Starting with 0 as initial values, value iteration calculates the values of the states (per-iteration value table omitted); the values converge to approximately 25.33 for L, 22.00 for M and 14.67 for H.

At each iteration, the value of a state is the value of the minimizing action in that state (since we are trying to minimize cost). For instance, in iteration 4, the value of L, v4(L), is computed as follows:

v4(L, slowly) = 0.3 v3(M) + 0.7 v3(L) + 1 ≈ 3.97
v4(L, rapidly) = 0.5 v3(M) + 0.5 v3(L) + 2 ≈ 4.96
v4(L) = min(v4(L, slowly), v4(L, rapidly)) ≈ 3.97

The optimal policy is to spin slowly in the low state and to spin rapidly in the other states.

Here is a sample C program for this value iteration.

#include <stdio.h>

#define min(x, y) (((x) < (y)) ? (x) : (y))

int main(void)
{
    int iteration = 0;
    /* Values of the states low, medium and high; the goal state top always keeps value 0. */
    float l = 0.0;
    float m = 0.0;
    float h = 0.0;
    float t = 0.0;
    float l_slow, l_rapid, m_slow, m_rapid, h_slow, h_rapid;

    while (1) {   /* runs until interrupted */
        /* Action values: action cost plus expected value of the successor states. */
        l_slow  = 1 + 0.7 * l + 0.3 * m;
        l_rapid = 2 + 0.5 * l + 0.5 * m;
        m_slow  = 1 + 0.7 * l + 0.3 * h;
        m_rapid = 2 + 0.5 * l + 0.5 * h;
        h_slow  = 1 + 0.7 * l + 0.3 * t;
        h_rapid = 2 + 0.5 * l + 0.5 * t;

        printf("%d: ", iteration);
        printf("%.2f %.2f ", l_slow, l_rapid);
        printf("%.2f %.2f ", m_slow, m_rapid);
        printf("%.2f %.2f\n", h_slow, h_rapid);

        l = min(l_slow, l_rapid);
        m = min(m_slow, m_rapid);
        h = min(h_slow, h_rapid);
        iteration++;
    }
}
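Compiled with any standard C compiler, the printed action values settle at roughly 25.33 and 25.67 for L, 23.13 and 22.00 for M, and 18.73 and 14.67 for H (slowly and rapidly, respectively), consistent with the converged state values quoted above.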

4) You won the lottery and they will pay you one million dollars each year for 20 years (starting this year). If the interest rate is 5 percent, how much money do you need to get right away to be indifferent between this amount of money and the annuity?

Answer:

A million dollars we get right away is worth a million dollars to us now. A million dollars we get a year from now is worth γ = 1/(1 + 0.05) million dollars to us now because, with interest, it would be (1/1.05) · 1.05 = 1 million dollars in a year. Similarly, a million dollars we get 19 years from now (at the beginning of the 20th year) is worth only (1/1.05)^19 ≈ 0.4 million dollars to us now. Therefore, getting paid a million dollars each year for 20 years is worth 1 + γ + γ^2 + ··· + γ^19 = (1 − γ^20)/(1 − γ) ≈ 0.623/0.0476 ≈ 13.08 million dollars to us now.

5) Assume that you are trying to pick up a block from the table. You drop it accidentally with probability 0.7 while trying to pick it up. If this happens, you try again to pick it up. How many attempts does it take on average before you pick up the block successfully?

Answer:

We can model this problem as an MDP as follows:

[MDP diagram omitted: from the state "on table" the action "pick up" costs 1 and leads to the state "picked up" with probability 0.3 and back to "on table" with probability 0.7.]

Now, we can calculate the expected cost with X = 1 + 0.7 X + 0.3 Y and Y = 0 (where X = "on table" and Y = "picked up"), which gives us X = 1/0.3 ≈ 3.33.

The Markov assumption is the assumption that, whenever one executes an action in a state, the probability distribution over its outcomes is the same, no matter how one got into the state. Is this assumption realistic here?

No. If the pickup operation does not work the first time (e.g. because the block is too slippery), it will likely not work the next time either.
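To double-check the arithmetic in problem 4, here is a short C sketch of my own (not part of the original solution) that computes the present value both by summing the 20 discounted payments and with the closed form (1 − γ^20)/(1 − γ).

#include <stdio.h>
#include <math.h>

int main(void)
{
    double gamma = 1.0 / 1.05;   /* discount factor for a 5 percent interest rate */
    double sum = 0.0;

    /* Present value of 20 payments of 1 (million dollars), the first one paid right away. */
    for (int year = 0; year < 20; year++)
        sum += pow(gamma, year);

    double closed_form = (1.0 - pow(gamma, 20)) / (1.0 - gamma);

    printf("sum = %.2f, closed form = %.2f\n", sum, closed_form);
    return 0;
}

It prints about 13.09 for both quantities; the hand computation above gets 13.08 because it rounds the intermediate values.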

6) Assume that you use undiscounted value iteration (that is, value iteration with a discount factor of 1) for a Markov decision process with goal states, where the action costs are greater than or equal to zero. Give a simple example that shows that the values that value iteration converges to can depend on the initial values of the states, in other words, the values that value iteration converges to are not necessarily equal to the expected goal distances of the states.

Answer:

[MDP diagram omitted: state S has two actions, a1 with cost 0 that leads back to S and a2 with cost 1 that leads to the goal state G.]

Consider the initial values v0(S) = 0 and v0(G) = 0. Value iteration determines the values after convergence to be v(S) = 0 and v(G) = 0, yet the (expected) goal distance of S is 1, not 0. Now consider the initial values v0(S) = 2 and v0(G) = 0. Value iteration determines the values after convergence to be v(S) = 1 and v(G) = 0.
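This dependence on the initial values can also be seen by running the undiscounted backup v(S) = min(0 + v(S), 1 + v(G)) in code. The following C sketch is my own illustration of the example above, under the reading that a1 is a cost-0 self-loop in S and a2 reaches G with cost 1.

#include <stdio.h>

#define min(x, y) (((x) < (y)) ? (x) : (y))

/* Undiscounted value iteration on the two-state example: action a1 (cost 0) loops
   in S, action a2 (cost 1) leads to the goal G, whose value stays 0. */
static double converge(double v_s_init)
{
    double v_s = v_s_init, v_g = 0.0;
    for (int i = 0; i < 100; i++)
        v_s = min(0.0 + v_s, 1.0 + v_g);
    return v_s;
}

int main(void)
{
    printf("v0(S) = 0: value iteration converges to v(S) = %.1f\n", converge(0.0));
    printf("v0(S) = 2: value iteration converges to v(S) = %.1f\n", converge(2.0));
    return 0;
}

It prints 0.0 for the first initialization and 1.0 for the second, even though the expected goal distance of S is 1 in both cases.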

7) An MDP with a single goal state (S3) is given below. a) Given the expected goal distances c(S1) = 7, c(S2) = 4.2 and c(S3) = 0, calculate the optimal policy. b) Suppose that we want to follow a policy where we pick action a2 in state S1 and action a3 in state S2. Calculate the expected goal distances of S1 and S2 for this policy.

[MDP diagram omitted: the transition probabilities and action costs appear in the computations below.]

Answer:

a) We use c(s, a) to denote the expected cost of reaching a goal state if one starts in state s, executes action a and then acts according to the policy. Since S3 is a goal state and S2 has only one available action, we only need to calculate c(S1, a1) and c(S1, a2) in order to decide whether to execute a1 or a2 at S1.

c(S1, a1) = 0.25 (1 + c(S3)) + 0.75 (2 + c(S1)) = 0.25 (1 + 0) + 0.75 (2 + 7) = 7
c(S1, a2) = 0.5 (1 + c(S2)) + 0.5 (2 + c(S1)) = 0.5 (1 + 4.2) + 0.5 (2 + 7) = 7.1

Since c(S1, a1) < c(S1, a2), in the optimal policy we execute a1 at S1.

b) Since the given policy executes a2 at S1, we simply ignore a1 during our computation. We first generate the following set of equations:

c(S1) = c(S1, a2) = 0.5 (1 + c(S2)) + 0.5 (2 + c(S1))
c(S2) = c(S2, a3) = 0.6 (1 + c(S3)) + 0.4 (2 + c(S1))
c(S3) = 0

Plugging c(S3) = 0 into the second equation, we get:

c(S2) = 0.6 (1 + 0) + 0.4 (2 + c(S1))
c(S2) = 1.4 + 0.4 c(S1)

Plugging this into the first equation, we get:

c(S1) = 0.5 (1 + 1.4 + 0.4 c(S1)) + 0.5 (2 + c(S1))
c(S1) = 2.2 + 0.7 c(S1)
c(S1) = 2.2/0.3 ≈ 7.33

Finally, we get:

c(S2) = 1.4 + 0.4 c(S1) ≈ 1.4 + 0.4 · 7.33 ≈ 4.33
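Instead of solving the two equations by hand, the expected goal distances under the fixed policy of part b) can also be obtained by iterative policy evaluation. The following C sketch is my own illustration; the iteration count of 1000 is an arbitrary but more than sufficient choice.

#include <stdio.h>

int main(void)
{
    /* Iterative policy evaluation for the policy of part b): a2 in S1, a3 in S2.
       The goal state S3 always has expected goal distance 0. */
    double c1 = 0.0, c2 = 0.0, c3 = 0.0;

    for (int i = 0; i < 1000; i++) {
        double new_c1 = 0.5 * (1 + c2) + 0.5 * (2 + c1);
        double new_c2 = 0.6 * (1 + c3) + 0.4 * (2 + c1);
        c1 = new_c1;
        c2 = new_c2;
    }

    printf("c(S1) = %.2f, c(S2) = %.2f\n", c1, c2);
    return 0;
}

It prints c(S1) = 7.33 and c(S2) = 4.33, matching the values derived above.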
