A Series Of Lectures On Approximate Dynamic Programming

2y ago
7 Views
2 Downloads
1.04 MB
19 Pages
Last View : 1m ago
Last Download : 3m ago
Upload by : Josiah Pursley
Transcription

A Series of Lectures onApproximate Dynamic ProgrammingDimitri P. BertsekasLaboratory for Information and Decision SystemsMassachusetts Institute of TechnologyLucca, ItalyJune 2017Bertsekas (M.I.T.)Approximate Dynamic Programming1 / 24

Our AimDiscuss optimization by Dynamic Programming (DP)and the use of approximationsPurpose: Computational tractability in a broad variety of practical contextsBertsekas (M.I.T.)Approximate Dynamic Programming2 / 24

The Scope of these LecturesAfter an intoduction to exact DP, we will focus on approximate DP for optimalcontrol under stochastic uncertaintyThe subject is broad with rich variety of theory/math, algorithms, and applicationsApplications come from a vast array of areas: control/robotics/planning, operationsresearch, economics, artificial intelligence, and beyond .We will concentrate on control of discrete-time systems with a finite number ofstages (a finite horizon), and the expected value criterionWe will focus mostly on algorithms . less on theory and modelingWe will not cover:Infinite horizon problemsImperfect state information and minimax/game problemsSimulation-based methods: reinforcement learning, neuro-dynamic programmingA series of video lectures on the latter can be found at the author’s web siteReference: The lectures will follow Chapters 1 and 6 of the author’s book“Dynamic Programming and Optimal Control," Vol. I, Athena Scientific, 2017Bertsekas (M.I.T.)Approximate Dynamic Programming3 / 24

Lectures PlanExact DPThe basic problem formulationSome examplesThe DP algorithm for finite horizon problems with perfect state informationComputational limitations; motivation for approximate DPApproximate DP - IApproximation in value space; limited lookaheadParametric cost approximation, including neural networksQ-factor approximation, model-free approximate DPProblem approximationApproximate DP - IISimulation-based on-line approximation; rollout and Monte Carlo tree searchApplications in backgammon and AlphaGoApproximation in policy spaceBertsekas (M.I.T.)Approximate Dynamic Programming4 / 24

First LectureEXACT DYNAMINC PROGRAMMINGBertsekas (M.I.T.)Approximate Dynamic Programming5 / 24

Outline1Basic Problem2Some Examples3The DP Algorithm4Approximation IdeasBertsekas (M.I.T.)Approximate Dynamic Programming6 / 24

Basic Problem Structure for DPDiscrete-time systemk 0, 1, . . . , N 1xk 1 fk (xk , uk , wk ),xk : State; summarizes past information that is relevant for future optimization attime kuk : Control; decision to be selected at time k from a given set Uk (xk )wk : Disturbance; random parameter with distribution P(wk xk , uk )For deterministic problems there is no wkCost function that is additive over time(EgN (xN ) N 1X)gk (xk , uk , wk )k 0Perfect state informationThe control uk is applied with (exact) knowledge of the state xkBertsekas (M.I.T.)Approximate Dynamic Programming8 / 24

Optimization over Feedback Policieswkuk µk (xk )Systemxkxk 1 fk (xk , uk , wk )µkFeedback policies: Rules that specify the control to apply at each possible state xkthat can occurMajor distinction: We minimize over sequences of functions of stateπ {µ0 , µ1 , . . . , µN 1 }, with uk µk (xk ) Uk (xk ) - not sequences of controls{u0 , u1 , . . . , uN 1 }Cost of a policy π {µ0 , µ1 , . . . , µN 1 } starting at initial state x0(Jπ (x0 ) EgN (xN ) N 1Xgk xk , µk (xk ), wk) k 0Optimal cost function:J (x0 ) min Jπ (x0 )πBertsekas (M.I.T.)Approximate Dynamic Programming9 / 24

Scope of DPAny optimization (deterministic, stochastic, minimax, etc) involving a sequence ofdecisions fits the frameworkA continuous-state example: Linear-quadratic optimal controlLinear discrete-time system: xk 1 Axk Buk wk , k 0, . . . , N 1xk n : The state at time kuk m : The control at time k (no constraints in the classical version)wk n : The disturbance at time k (w0 , . . . , wN 1 are independent randomvariables with given distribution)Quadratic Cost Function(ExN0 QxN N 1Xxk0 Qxk uk0 Ruk) k 0where Q and R are positive definite symmetric matricesBertsekas (M.I.T.)Approximate Dynamic Programming11 / 24

22p2n2n11nn2(n1)1)(n C1)(n1)(n p1)np22p2(np2(npCp(nSB / CCC1)CCDCC1)(nBC1)(n1) yACBACDCABCABCCCCABCDACCCDCnnADDBABADCDBDDBAB 1 1)CCBDpatp2n CkpCCDpDAp(nCDAwk xk uk Demandp at Periodk Stockk2(nat Period22 Period2(n1) p(n1)n pnnp2(npnn02nd0Stock1 p1.50.52nd11)(n1 1)0.51.50 222 p2n p2(n 1) 0/ TimidGame/ BoldPlay1) 1 pGame(n0 1)(n1)(nPlay1)n12Cost of Period k Stock Ordered at pair10 20CnBDnp11pC12p.1(nPlayk 1DB02 C1/CD1 1 aypPlay2nd/22CDPlay2ndGame/p11)BoldPlayr(uk )Discrete-State cuk xk 1 xk u Deterministick wCGameCCGameCC221)(nk notABRepairDABDDBABDoRepair1C1stp2(np1)12p2(n1st/ ADTimidPlayPlaypGameppC1 1)pC. p(n 1)111nSTimidSn/ BoldCCC /pp(nC.wBC1) w2(ndk 1(ndAB PlayABACCD emandkp1(nPeriod2 Do3 k RepairRepair2atnPeriod11np Stockp12pUp11n121)2(nproblem approximationu1kSystemu2ndukRepairuw4kkGameu/15kx RelaxationU2nd GameTimidPlay/unBoldGame/kConstraint/µnBold1xwk xk uTailStockatd kUk 1stk2ndfPeriod(x,PlayuxPlayw1st(x11ku)Playwk Demand at Period k Stock at Periodkk 1k )(xku kkkµ p(xr(ucuxkkk 1xu wk p22Periodpk2npk2(np, pµnn 1xpkk 1Costof Period k Stock Ordered atInventorySystemµk) 2(n1)1)System(nf 1)ku,pw(nk1)(nk , k )11)nkk k.)0010011.50.510.51.51057839612wk atx Periodu Demandat Periodkp02nat01stPeriodkCAStockatPeriod/ TimidPlayBoldPlaypdpC1n1/ dp12k 1Stockk 1 ypdp1(n/1stPlayPlpGamep0.50 kDo1 kpC2(n0 AB1AC1.51Game0.50p1/nn22C1.5CCGameC.pAB1)1st2(n1) 2nd1)(n1)ABADr(u) k cukk xk 1 xk up wRepairn t/ BoldPlaypDB1pnnpdpnotpRepairp02(nppnnd 1)ppGamepDemandp1(nk1)np(n222n pPlay2(n1)Game1)(n1)(n222nk 1 k2(n1)1)(n2(n1)1)1st Game/ tockatPeriodkStod 1pw010011.50.510.51.502kkk3 5Game2 4CostPeriodk 1BoldInitialA Ck2nd/ 6TimidPlay/ OrderedPlayStatePeriodk2 GameStockat Period32of5Stock2 4 at62ndkp 0 0f1kp0kPlay1kInventory0,System1upSystem1.50.51k (x1,wu0.51.501.5CostPeriod CABk Stockatxk 1PeriodACBof ACDCADOrderedCDAx /k 1 xpABCf)0.5,w) u1)np2µnnkGamekk (xk2) 2n0r(u1uk2(n00kDo11)(n1.51nn1st/BoldTimidPlay/ Bold2(n1)(n1)(nk)kxkkk 1) ukkkp1)Stockat Periodk 1k 2ndInitialStateABACCDpGamep2n/GamepApatk me1.50.5112Play11st1.50 p0112 PlayRepairRepair20.51 e/nµBoldPlaywCostxkk ofuDemandat uPeriodStockPeriodk1 StockPeriodPeriodk StockOrderedatwPeriodkInventorySystemkcur(uk ) xkk 1 x kw001000.510.51.50kkSystemx f(x,w)u (x)µwk12Pxk 1k k k kStockk 1kk kk Stockuk5 DemandPeriodatAt atStatexFiniteProblemsk xk Periodwkkw x1k xuk uDemandPeriodk kStockk atStockPeriodatk PeriodStockatCAB1stGameTimidPlaypdCDA1k pat107/Horizon8 103 atPlay2 9Ch.59ACB76 1st81ACD3Game6/ Bold1 Period2CADd pPeriodkCostStockatPeriodk k) kk Demandr(uxCk 1 CxatuC k wk ofPeriodkBoldOrderedatSystemxCk 1 5fk2(xPlay, u6kStock, 2ndwupkd 1µ/k Bold(xµSAk cuSBCkGameCCBk 1k2 C1BCk4k ) Gamek )pPeriodk1 wkpkwxkInvenABACCA idPlayStockatPeriodk 1InitialStateACABACBACDCABCADCDASystemx f(x,u,w)u e/BoldPlayk 1k 1kkpkukkpk 1)k(xk1)(nkkwkxSystemx f(x,u,w)u µ)µppp1stGame/TimidPlay1stGame/BoldPlayp1pp1k 2nr(u) cux x kwwwdd2(n1)2(n(n1)(n1)ndkkk 1kkatPeriodk 1InitialStateACABACCACDABCwk ystemxk 1 k ofk Period k Stock Orderedk (xk , u3k , System5wk2) 4uk 6 2µk (xk ) µk wk xkCostat Periodk fInventoryDeterministicProblemsCh.SB 0.5CABCBC0 System0AC1 CA0 0 CD1SA1.51 12CAC0.5 C1.52CA 0 CCDStockatPeriodk Inventoryk Cost1r(ukC) cuk CxADxkDA u CkCDwk C0BD33 1of59 1.52Period4 160.52k 1 Stockk 1 CCostOrderedk Inventory hedule1036Play11Bold2Ch.CABCCAC 10C1stC/ABTimid/11stBold1Ap0.5pw nventorySystemABCA0GameBCCDr(uk ) Costcuk xof SxAEmptyukB kStockw Ordereddk d25Gamef9Play(x,CADwk 1Periodk SHorizonGame/wTimidGamePla0Game12/ 181 0.52)Ck 1kProblemskkk10(x11st020.50k3xk 1.50.51kCDA1.52)datInitialStateAC11.5ACBACDCABk kCAD3 5 CDA2 Period4 6k )02 k1cuStockr(uxPerioduProblems 1051.57 861at2Periodk xk 1kk Ch.wPeriodStockatatHorizonPeriodk xk uk Demandwxk ACuk SystemDemandatPeriodkk , 9uStockk(xStockatwkPeriodr(uxk 1 ACDxatkk u kwk ABCCCCCxDBx f(x,w)u µ)µCkBDk ) cuk ACBkABADDACDk 1kkkkkkkStochasticProblemsStockatPeriod 1InitialStateACCACDABCCAB OrderedCAD CDACost of Period k Stockat Period10k 5InventorySystem.7839612k 1. 0.5fCD,pu1.5)0.5 1)µ(xwC 1 CACStock at Periodk 1 CInitialStateABABCkkCk, wkkB1k ) Cµk xk2 CCDS2(nCCABCC12not RepairRepair1 A2 kkCnSystempk 1p pu0x1CA0DB0p(x1k66ABBC2 PlayADu CkDACD10 k ) Docu f51(n(x,u)4 (xw10532System fu6,µuProblemsw)1C0µCAuk0.5 Gameµk CDwCk CBx211kCADACBCABkk9kkµ21.5k k/)0CAkkkkx(xSAk xSk 1C xk CCwC0atC2710 C5 CD7System8CBC3 0x9k 16CBStockPeriodk kk ABACCAACBACDABCAD StateCDAFiniteStockSat Periodk CAB 1 InitialACHorizonABACCACD3Problems5Perfect-State2 4CFinite6Ch.2 CBCDDonotRepairRepair12n1np11ACD kCABCADCDA at3 PeriodCost ACBof PeriodStockOrderedkofInventorySystemk ℓ deredatPeriodkInventorySystem."C#)k 1).uCµCD 0.5p22 atp2np2(n k1) 1p2(np(n 1)(npf2nnx (xu,4w)0C2ADu1CµProblems(xwxk)1 CABDABDDB1) State1)nk 1k3k k6ABCk 2 Systemx k 1fAB 1CDµC(xwkCxSCCCC1.5DoRepair1 System2BCn1(nnpStochasticpProblems ABk (xkk1)k1.5k2kBC µ1kCCBkCDApuB1nACCA ,6r(uk ) cuk Stockxk 1xC u kAC InitialwC11Sx12CSABCRepairCCC1058pk2(n3k0CDA1(n1), uCABCADSPeriodCnotCCkCACr(uCkADDACDBDA BABCDming4Finite(x)0kC gwx(x cuxk 1 w7HorizonCh.19m µmm ), wm J k ℓ (xk )CkECBkkABk ,ACDDeterministicProblemsCh.ACDCABCDA105k k7 Horizon8kProblems32 9Problems6 1 k2 Ch.uCAD,µ,.,µSA ACBSBCABCABCCACCCBCDBCCD CACCDBCCCADCCCBDFinite1 p2(nkCk 1k ℓ lemsCh.1m k 1Problems Ch. 2105 5/7Bold84 392 6 Deterministic1 22ndGame/TimidPlay2ndGamePlay326CAB ACDCpCCDCBDC1)DBABCABInfoCh.System fkC(xu23k1C, wkC) 1 uCµk (xCABCAD105 371)CA9 6C61Perfect-StateDA p2(n107823CxCA9k 12BDCCStateCkCk notRepairRepair1, BCnp11p12k ) pµp(npAD1DACDStock atACBPeriodkAD n2nn1) pS8pBCCCC.A6ACCDABCB ABCD ABCatSAPeriodkCD 1 (npABp2(nStochasticDACDSA DoCCACnCDCCC11Problems121n1(n 1) Info1) . 3BAB CCAC CBDCBCDDeterministicProblems2. Ch.2nd Game/ TimidPlay 2nd Game / BoldStochasticProblemsDeterministicProblemsCh.2 .1) blemsCh.11st/ TimidPlay1stGame/ Boldp3.21 2 pp4w 6 Ch.1057Play83pd 91Play6. ppd1wProblemsFinite1 s 1) p(n 1)(n 1) p(n 1)n pDoSGamenot RepairRepair1C2CAn Play1CCDn 2ndp11pCBpHorizon52nCABCCpGame12 p1nC1(nC1)C2(n1)BACBCp 2ndGame/ DCCDCnDB{Φr rBD} x1 CABDA10571)CD3S6CDA1C212 p2ℜACBCABCADProblemsDonotRepairRepairnx̃ABp11 p12 p1n 1 2Cn 11)Cn ADCCp1(n2n Repairnn1112DB1n2(n2(n2(n1) p(n msCh.1Problems1stGame/TimidFiniteHorizonCh. 13Play 1st 1Game / Bold fect-StateInfoCh.222nnn2(n1)2(n 1) 1)(n1(n1) 1)(n1)(n1)np22 lemsCh.2 / Bold Playnn Deterministic01001021) p12(n 1.51) p(n0.51)n p1.5ProblemsCh.2103 pw9C6 must1 2GameCADCDACCB,CDCDCCDBDA,ABPlay2nd1stPlay1stGame/ BoldPlayp2nd1 Gamep5d Cp7/3wTimidSA psequenceSCBABCACC/pCTimidCCd InfoFind optimaloperationsC,(AmustprecedeB81 CC1)(nPerfect-State2(n1) of1) CA(n1)(n 2nd1) p(n1)npCh.pCpn.pCDA/ABpCA Ch.CB 1)CCD22CAC2n meBoldPlay2(n1)(n p(n 1)nnotRepair22(nn31)pp1np pnn p11 p12FiniteHorizonProblems1 12ProblemsPerfect-StateInfoCh.Do 0010011.50.5 11(n 11) 0.52(nDeterministicCh.211121n1(n1)2(n1)2nd/ TimidPlay Play2nd Game/GameBold PlayPerfect-StateInfoPlayCh. 32ndGameGame/ x,u,w)u µ)µwx. Timidk / Boldk withk fixedkkk Parametrick1stPerfect-StateInfo /Ch.3 Play 1st imationtheendPlayMon/CTimidGamePlayGame/ Boldpd0 k 10Play1 k2nd0 CkSimulation0BD1 nCk1.51 C121 p0.51.50StochasticCAB2ndCGameCRepairCABADDACDDB1 / atDonotGameRepair1 21st1 n0.5pBoldpp1(npp22(nCCC11)CAB11 pGame/pw2(nTimidGameBold1nnPlayAD 1nCCDBD.pPlayDB 2ndDP ProblemFormulation1)StochasticProblems1st/ TimidPlayGame/ABp2nd12(np1)ppPlaypCdDAd Ch.22 p2n1) w(n 1)(n 1) p(n 1)n pDeterministic2SystemPlay 1)1stpGame/ BoldPlaypProblemspwFinitew 1p22 1stp1stp2(n/ Timidp2(np(nd 1 pProblemsd pStochasticHorizonProblemsxk 1, uk ,1wk ) uk µk (xkProblems2nGamenn1)(n1st1)(n1) Stochastic1)n1 fk (xkCh.Game/TimidPlayGame/BoldPlayp1pp1pwwdd1st Game / Timid Play 1st Game / Bold Play pd 1 pDeterministicp w 1 pwInfoPerfect-StateInfoCh.1 30.5 1 1 0.5 1.5 .00. xdecisions1k Ch.0 30Ch.12 1.5System xk 1 Controls:fpk (xk , uk , wk )Stageuk d µPerfect-State(x) p0.5nn1stTimid/ BoldPlay pd 1 .pd1) 0.51)(n1)nDo notp22Repair211.5n1.51 (n0.5p111p112 1.5p1n2ndp1.5. n 2ndPerfect-StateInfo00p2n00 1pRepair12(n0 001)0 11p2(n1 0.50Ch.232(nGame/pGameTimidGame/12BoldPlay1n λ)(c)2ndGame/TimidPlay2ndGame/BoldPlayProblems2 1) p2(n 1) .Perfect-StateInfoCh.3Perfect-StateInfo3T(x) T(x)x P(x)0 0 the0.5 1.51 into0 0 10 0 00 1 1 1.50.51 11.51 0.501 20.5 1.5 0 2DP )Finite HorizonProblemsCh. 1StochasticProblemsSystemx f(x,u,w)u µk (xk ) µkk 1kkkkkx/k 1 fk (x, uk,, 2ndwuk) puµk/(xw)kGame2ndSystemGameTimidPlayBoldk(xk )µ µ(x0Timid0 1st1 1.50.51 Play1 0.5pd 1.5p22 p2npSystempkPlayukk ), wGame µxkk01)w/kCh.x1k 0Play1stGame/ Bold1 p0 p2 1nnk 1kkp22kp2(n 1) xp2(n 1)fkp(nk 1)(n1)(n1)npk2nPerfect-State2(nInfo1) p(n 11)n pnnSystemxk 1 last fkx(xdecisionuk µk (xk/) Bold1st Game/ TimidGameppd2(np3w1)Stochastic1p(nFinitep 1)(n Problems1 1 d wk , uPlayk , wk ) 1stkµ x(xk pd 1Start fromtheandSystemuµkkbackwards wPlayHorizon Problems Ch.k 1 fk (xk , uk , wk )gok k ) µk w k xk1 wDeterministic Problems Ch. 211Perfect-StateInfo3Problems1Play1st Game/FiniteTimidPlay1stGame/ Ch.Bold1 0pd0 pxwk 111.5pfwk (x Ch.)0.5uk 1.5µk (xk , w1kPlay2nd Game/ TimidPlayHorizon2nd Game/ BoldPlay0 p0d 1System12nd0.5k ,/u1Bold0 k ) 2 µk w k xkGame0 0 1 0 0 1 1.5 0.5 1 1 2nd0.5 Game1.5 /0 Timid2 Play Perfect-StateInfo Ch.3DeterministicProblemsCh. 2Bertsekas (M.I.T.)Approximate Dynamic Programming12 / 241

ayBoldPlayxBD Repairf2(n(x,1kRepairukpn,(nwPlay) 1u tk1nk 1k/nkatk1knnk Periodk 1.p1(n1)2(nk2kCABSA ckppwADDBAB2n121n1)1)1)(n1)(npDApRepairp1wpkCp12SB / CCCC1)CCC222nnn2(n2(n1)1)(n1) CAD1)nABACCABCCB2nd ADCDBD C/DBABkpC/CD 1C1)DApatp2npDAp(nCDApnnwk xk uk Demandp at Periodk Stockk2(nStockatCPeriod22 PeriodCCCC2(n1) p(n1)(n n1)(n 1)(n1)(nPlay1)n 2nd Game / Bold Playost of Period k Stock Orderedk0notInventory0 at02(nPeriod1 1)0Do1Repair1.50.511/CD0.51.52121ABp/1nC0DAC1st101 20CnBDp011pCp.1(nPlayk 1ABDB02 C1 1 1.50.51Bold0.5211) . p1) 2npGamep22ppPlayp/np1)(nnnPlay. e/BoldpTimidppp2nd/22CDPlay2ndGameBoldPlay cuk xk 1 xk u k wCGameCCCC2n.2(n1)2(n1)(n1)1)nk notABRepairDABDDBABDoRepair1n1nppppp1st/ TimidPlay1stGame/BoldPlayp1pp1p1112 C1nS/TimidSBPlayCABC//CACpC. CBCCDwwBC1(n1) C2(ndACCD nPeriodp111pd12pStock. pkd1)Stockk 1(n2nd GameTimidPlay/uPlay1stGame/k2Timidwk xk uk Demand at Periodk Stockat Periodk 2ndStockatPeriodd k1pxkpd pw p 1,Playu w1st)(xu,k µ)PlaywDo2k nkxGame1 pk k)(xkpkkµpp1n11121(n1)2(n) fk cuuxkkkk 1xkk 1u,n kk)(xwpRepairp1, pwpµnn1xpRepair pkfnot,0 r(uuInventory,0 µ)µwxCost of Period k StockSystemOrderedxk 1atpSystemkk )(n1)nk 11 /Game21.52nd/2(nTimidPlay2ndBoldPlaywk atx Periodu Demandat Periodkp02nat01stPeriodkCAStockatPeriod/ Gamenn2/2ndp0.5p112tockk 1 InitialStateAStockABCDw1)w 11)d1Game/17(nTimidPlayBoldPlaypdp1(np 10pp2(n/1stPlayBoldPlaypGamep0.50 kDo1 1) 2nd1)(nABADDABDr(u) k cukk xk 1 xk up wRepairnp e/0C2Timid1st/ 1BoldPlaypDB1pnnpd dpw w1 ppnotpRepairppp01st01011.50.510.51.52d 01)ppppp222n pPlaynn2(n1)2(n1)(n1)(n1)(n1)n222n1 k2(n1)2(n1)(n1)(n1)(n1)n1st Game/ atPeriodkStockatPeriww1k 20 0 1 1.5 d0.5 1 d 1 0.5 1.5 0 2k02 k43p2225Gamep2n0CostpStockp(nStatepnnCk ABStockatPeriod 1BoldInitialAAC/p6TimidPlay/ OrderedPeriodk2 Gameat1)nPeriodInventor3 of51)2 p42(n62nd2(n1)(n k 1)(n1) Play3 5at2Period4k 0 62nd1/0kPlay1kInventory0,System1upSystem1.50.51k (x1,wu0.51.5011.5CostPeriod CABk StockACBof ACDCADOrderedCDAx /k 1 xpABCf)0.5,w)pdu1)n1stTimidPlay1stGame/ Boldp0dk amewkSystemx2n .51nn2) 1µkppw1st/BoldTimidPlay/ Bold1 xp1)(n1)(nk 1k)kkkPlaykk x2(n 1)ukkPlayStockatPeriodk 1InitialStateABACCDppppp1)k 12Play11st1.50µnnDoRepairRepair20.5nkBold1k )npk112wpk12 xpk1nd Game/PlaywCostxkk ofuDemandat uPeriodStockk1 StockatPeriodx f(x,u,w)u µ(xPeriodStockPeriodkInventorySystemkcu xkk 1 xk k k 0k Orderedw0atGame01.50.510.51.50k 1kkkkkk0 1 at PeriodSystemxk 1 Periodfk (xk ,kukStock, Ch.wk ) 1atukPeriod µk (xkk ) Stockµk watxkwuk5DemandFiniteProblemsk xk Perioduk uDemandat Periodk Stockk atStockPeriodatk PeriodStockatCABGameTimid1stPlaypPlay1 pdk D3Game9 Game6/ Bold1 at2/CADw 1dCDA2ndGameTimid2ndBoldPeriodkStockatPeriodk cukk Demand)x CxatuC k w10 5C7 2System81st39k CostofPeriodkBoldOrderedPeriod 1.5(xPlay, u6kStock,12ndwupk0.5 1µ/k1.5(xµSB xCk 1CkGamek 1k ) 1Gamek )p0k12wkpkwxkInventory SysteAABACCA 1stCDCBCD/1BCTimidPlay1stGame/kk4PlaypState3x5f0 k 2at2u3CAD5 k204CDA60 2nd1C0 0xC1Stock0.5wdInitialdµGame/TimidBoldPlayPeriodk 1ACABACACBACDCABSystemxf(x,u,w)u (x)µwx010011.50.5110.51.50 pCA2 1C2ndGame/TimidPlay2ndGame/BoldPlayk 1kkkkkkkkSystemf(x,,w)u k 1kkukµkpkx1)pd/pk(nk1)(nk1)kd(n 1k1stGame/TimidPlay1stGameBoldPlay222nr(u) cux x kww pw kpdpnnd2(n1)2(n1)nwkkk 1kkStockatPeriodk PeriodkStockatPeriodSystemx f(x,u,w)u µ(x)w352462kkk 1 k Inventoryk k 3k Systemkkk5 k2 4 k 6 2 k kCost of Period k Stock Ordered at PeriodDeterministicProblemsCh.SB 0.51stCABC1.50 1st0AC1 CA0 /0Timid1SA1.51 Game12CAC0.52p CCA 0 CCDBCp CpCB 11 InitialStateC 3ABABCGamePlay/ Boldf StockPeriodk PeriodStockOrderedat Periodk AInventorywd 1 Systemd cuk CxatxkDA ku Ck59System2Period4 160.52k 1 CDk 1 CCostStockOrderedk wPlayInventoryABADCD wk 0CD8C3Timid6Play2105839612Ch.ACBCABCADCDASystemx f(x,uw)u (x)1AµCABCAC 10C1stCCC/1stGame/BoldPlayppkw pwkpCAGame/TimidPlay1stGame/BoldPlayp/xpt xofPeriodkInventorySystemCA0GameBCCBCDk 1kkkkkkkkpk SxAk SukB kStockwk Ordereddd1uABd1µ352462352462Systemx f(x,uw)C)d µpkwCDw1k ABxk 02 1cuFinite0 Stock0 010Horizon1 1.50.5110.51.5k 1kProblemskk ,02ndk2kk10(x010011.50.5110.51.52atPeriodk xx u k57839612k Horizonk10 k 1kkDemandat Periodk eriodukStockx xu kwkk ACk SystemCkDACABxk 1 SystemfkC(xAB,w µkC(xwk CxDBk 1kk 1CABkADCDk , u.kCk) uk ) µCkBDkStochasticProblemsat PeriodInitialOrderedStateA CDACCACDABCACBACDCADtk ofPeriodk Stockat ABPeriodkInventory0 5kABC18 k3)0 u90k 0.51.5 0 2106 1µ1k1.52k ) µ. 0.5fCD,07u1.5(xw1Cx1k2 0.5 1 CACat Periodk 1 CInitialStateABkk, wkSubproblemS2(nCCDCCABCDonot RepairRepair1 A2 kkCnSystempk 1p p315CA0DB0p(x166B11.5ABBC2 PlayADu CkDACD10 n111n30xBD2C124106 5Ch.1stGame/.(xTimidPlay1stBoldpd C1CD pd f51(n(x,u)4 (xw52System fu6,µCuProblemsw)1C0µCAuk0.5 Gameµk CDwCk CBxABC5Deterministic211kCADACBCABkkk 1k9kkµ21.5k k/)0CAkkkkx(xkSuAk xSk 1C xk CCwC0atC2710 C5 CD7System8CBC3 0x9k 16CBStockPeriodk BACCAACBACDABCABCAD lemsCh.1kSat Periodk 1InitialACABACCACDABC3 C5Perfect-State2 4CFinite6CD2 otRepairRepair1k (x2 kn) µ1Systemn p p p1n pACD kCABCADCDA at3 PeriodPeriodStockOrderedSystemSystemxCk 1 )CfukProblems(x,atu(x, w)Ch.u1kkwC InventoryµxHorizonkPeriodk wk11 xk125 Cost2 1)4 p2(n k1) 1p2(np(n 1)(npf2nn (x,9u,06 k0,CµABDA Ch.CDBDFinite1C22 at1) State1)nk 1k3k k6k, uk ACCACDABC3n5CpAB25ACB6CD10785kHorizon2 4w2ADSystemx3 k 1fAB(xw)19 1CDµkk2(x)1 CµDBwkCABxSCCkkBCC1.5DoRepair1 System2BC1xn4FinitepStochasticppProblemsk1kk1)kCD 0ACCAxC u kAC 1)ACDCABCADS 1SPeriodCnotCCkCACr(uCkADABDACDBDABA BABCD cuDBxk 1 xAupBkCw7Ch.Problemsk )Ck Stochastic105 7 Horizon8 3 9Problems6 1 2 Ch. 1CAB CCADCBCCCBCDBCCD CProblemsBCABACCDCCACBDFiniteDA CCCDFiniteABp22Problemsp2n ProblemspCh.p2(nDeterministicCh. 1)2 p(n 1)(n 1) p(n 1)n pnn2(n1)HorizonProblemsCh.16 3524nd/ TimidPlayGame/CBold3 p5102 54 Play6 82pC3109 CCABGameCpCC2ndC1)DBProblemsCh.2Ch.InfoSystem fkC(xu23k1C, wkC) 1 uCµk (xw1(nCBACDCABCAD61Perfect-StateDA p2(nCDBD7823CxCA9k 12BDCCkCk k pk x1)knotRepairRepair1, BCnp11p12k ) pµpAD1nDACDPeriodkAD rfect-State2(nAB41)n2nn1) p(n AB1)CASDeterministicCCCC.A6BACCDABCB ABCD ABCatSAPeriodkCD 2(nStochasticDACDCCACnCDCCC11Problems121n1(n 1)1) . 3BAB CCAC CBDCBCDPerfect-StateCh.3 InfoDeterministicProblems2. Ch.2ndGame/ TimidPlay 2nd Game / Bold StochasticProblems11121n1(n2(n1) 1Ch.DeterministicProblemsFiniteProblemsCh. 1105pHorizon831Horizon21 2st/ TimidPlay1stGame/ Boldp.1)1057Play83pd1)96. p72wProblemsFiniteCh.DoSGamenot RepairRepair1C2CAn Play1CCDn 2ndp11pCB29pp2(n4w6CCABCCpGameC1Play12 p1nC1(n2(n1)BACBCpd1p52n2ndGame/ Timid/StochasticBoldStochastic22ACDCABCADCDA1)1) pC1) p(n 1)n pnnCACDCCnDBABDA1071)CD33Problems6CDA1 6 BD212 p22(nACBCABCADProblemsDonotRepairRepair1 n(nABp1)(np12p1n p1(n 1) p2(n 1)11 1FiniteProblemsCh.1 9CCDp22pCpCDpp(npADRepair1 2Cn 11)Cn n2n Repairnn1112DB1n2(n2(n2(n1) p(n Ch.11stGame/TimidPlay1stGame/ Bold Play pd eterministicProblemsCh. 2 Ch.2(n1)2(n 1) 1)(n1(n1) 1)(n1)(n1)np22 lemsCh.2nn Deterministic1001021) p12(n 1.51) p(n0.51)n p1.5Problems2105d p7/Ch.81 33 pw9 6 2nd1 2Game / Bold PlayCADCDACCACCDBCAB13wTimid.B CpCDCCDBDCBCPlay1st/CTimidPlay1st/ BoldPlayp2nd1 GamepCh.C pGameCd InfoPerfect-StateCBCDpCnnn ABSAGameSBDoCC.pCDC1BCC1)(nPerfect-State2(n1) AC2(n1) p(n1)(n 2nd1) p(n1)npRepairpCh.pCpppABCAInfoCBp 1)CCD22CAC2n Repairnn lay2(n1)2(n1)(n(n1)nnot12nnppp11121n1) ateInfoFiniteHorizonnotRepairRepair1Play2 nGame1 .2Problems01Problems0 1 3 0Ch.0 121 1.5 0.5 11(n 11) 0.52(n 1.50 2DeterministicCh.11Play121) p2(n1)2ndGame/ oblemsCh.1ystemxk 1Play fk2nd(xkGame, uk , w(xk Finite) µwHorizon. Timidk ) uPlayk µk Problemsk xk1stPerfect-StateInfo /Ch.3 PlayProblemsStochasticProblemsame/CTimid1st/ BolddecisionPlay pd 1 pd p0each0CRepair1state0 C0BD1 /nCBold1.51k pC121 p0.51.50Stochastic2GameCABD AtDACDDBAB1 GamenotRepair1of2stage1 CCCC11 record1nC2ndGame/TimidPlay2ndGame/BoldPlayAD pDACDBDDBAB1(n1)2(n1)1StochasticProblems1stGame/ TimidPlay1stGame/Playp1pp1pppppppwwd 2(n 1)d Ch.222nnn2(n1)(n 1)(n 1)(n 1)nDeterministic2Problems1stPlay 1)1stpGame/ BoldPlayp1)npProblemsp1stp2(n/ Timidp2(nd 1 pDeterministicd pStochasticProblemsSystem, uk ,1wk ) uk µk (xk ) µk wkCh.2nGamenn1)1) Stochastic1 2fk1(xkCh.Game/ TimidPlay (n1st1)(nGame/p(nBoldPlayProblemspdwp 11 ppwdFinitepwProblems1Horizonpw xk 1me / Timid Play 1st Game / Bold Play pdPerfect-State1 pDeterministicw 1wd pPerfect-StateInfoCh.1 30.5 1 1 0.5 1.5 .0 2Problems2 1.5Info30Ch. x31k Ch.Perfect-StateInfoCh.0001Systemx f(x,u,w)u µ(x)µw1k 1kkkkkkkkk.p0.5pnn1stGamePlay pd 1 .pd pw 1 p1) 0.51)(n1)nRepair211.5n1.51p(n0.5p111p112 1.5p1n2ndp1.5. n 2ndPerfect-StateInfo0epair12(n0 001)0 11p2(n1 0.50Ch.232(nGame/pGameTimidGame/ Bold/ BoldPlay0p2n00 1p1n h.p1stp1(nStochasticProblems112ndGamePlayProblems2 1) p2(n 1) .Perfect-StateInfo3 p12 1p1nCh.StochasticProblems0 0 00/1Timid0 Bertsekas0Play1.50.5 1.51/ Bold1 20.51.50Info2Ch. Deterministic1Game1 1.50.51 1(M.I.T.)1 0.50Perfect-StateApproximate Dynamic Programming13 / 24

layBoldPlaySystemx RepairfBold(x,1kRepairukpn,(nwPlay) 1u mid2ndGame/k2pDBPlayxuDemandPeriodatk1nk 1k/nkatk1knnk Periodk ockppwABS CBDAB22p2n2n121n1)1)1)(n1)(npDApRepairp1wpkCp12SB / CCC1)CCC22nn2(n2(n1)1)(n1) dPlay2ndGameBoldPlayADABABADCDBD C/DBABkpC/CD 1C1)DApatp2npDAp(nCDApnn .wk xk uk Demandp at Periodk Stockk2(nStockatCPeriod22 PeriodCCCC2(n1) p(n1)(n ndGame/CBoldPlay 1) .2(n1)(n(nCost of Period k Stock Orderedat02(nPeriodk0notInventoryDo not11.52 C1)(nnDA11)C1nCDp1PlaypnBD11121n1(n1)0 Repair1 1)0Repair1Repair0.51 21)n0.51.5C1stDo1p1.5pp0ppp212p1pk 1ABDB0Game02Bold1p/ Timid0Play0C0.51AB12(n0.51.5211) . p11.1(nPlay1) BoldpTimidppppp2nd/22GamePlay2ndGameBoldPlay cuk xk 1 xk u k1st wCGameCCCC2n.2(n1)1)(n1)(n1)(n1)nk notABRepairDACDDBABDoRepair1n1nppppp.1st/ TimidPlay1stGame/BoldPlayp1pp1p1112 C1nS/TimidSBPlayCABC//CACpC. CBCCDwwBC1(n1) C2(ndACCD nPeriodp111pd12pStock. pkd1)Stockk )1(n2ndTimidPlay/uPlay1stGame/k2Timidwk xk uk Demand at Periodkp2nStockat Periodk 2ndStockatPeriodd k1pxkpd pw p 1f ,Playu w1st)(xup,k µ)PlaywDo2k(xnkxGame1 pk k)(xkkpkkµpp1np22 Gamep2(np 2(n1112nn1(n1)2(n1)1)1)(n)1)(ncuuxpkk 1 u,n kk)(xwpkfnotpRepairp1, pwpµnn1xpRepairSystemxk 1,0pr(uu(n,0 µ)µwxCost of Period k StockOrderedatpSystemkk m(nkf1)n1)(n1)(n1)nk (xkkSystemkkkk 1kkkkkkkkk.1.50621.510581)(n900.511 /Game21.52nd/2(nTimidPlay2ndBoldPlay01 pC02(n1 Period1.51Timid113ABC0.50Play22/2ndwk atxkPerioduk Demandat0Periodkp02nat01stk0.5Stockat/ ir1pGamennp0.5p112tockk 1 InitialStateAStockABCACDw1)w 11)d111Game/17(nPlay1stBoldPlaypdp1(npd10

Simulation-based methods: reinforcement learning, neuro-dynamic programming A series of video lectures on the latter can be found at the author’s web site Reference: The lectures will follow Chapters 1 and 6 of the author’s book “Dynamic Programming and Optimal Control," Vol. I, Athena Scientific, 2017

Related Documents:

SMB_Dual Port, SMB_Cable assembly, Waterproof Cap RF Connector 1.6/5.6 Series,1.0/2.3 Series, 7/16 Series SMA Series, SMB Series, SMC Series, BT43 Series FME Series, MCX Series, MMCX Series, N Series TNC Series, UHF Series, MINI UHF Series SSMB Series, F Series, SMP Series, Reverse Polarity

Summer School on Language and Understanding, University of San Marino, 1995 (5 lectures) Numazu (Japan) Linguistic Seminar, 1996 (8 lectures) Fall School in Syntax and Semantics, NTNU, Trondheim, Norway, 1999 (3 lectures) University of Leipzig/Max Planck Institute of Cognitive Neuroscience, 2000 (4 lectures)

HSC REVISION LECTURES Please consider the Revision Lectures above. The School for Excellence (TSFX) also offer Trial HSC Exam Lectures at Sydney University too but obviously these involve considerable cost and the inconvenience and disruption of travelling to Sydney. These lectures are very high quality. Information is available at

tion to Luther’s Genesis lectures was provided by a course of lectures and readings given by Dr. Ulrich Asendorf as a visiting scholar at Concordia Theological Seminary in 1993. During doctoral studies, my research in the lectures was facilitated by two seminars in Luther interpretation by Dr.

Forster, Lectures on Riemann Surfaces, Springer-Verlag Griffiths and Harris, Principles of Algebraic Geometry, Wiley Also recommended: Cornalba et al, Lectures on Riemann Surfaces, World Scientific Griffiths, Lectures on Algebraic Curves, AMS 1. Holomorphic functions in on

Lectures suivies H3-H4 p.65 Lectures suivies H5-H6 p.66 Lectures suivies H7-H8 p.72 DEGRES SECONDAIRES Fonds existant Lectures suivies SEC I (Harmos9, Harmos10, Harmos11) p.80 Nouveau

Martin Luther’s Lectures on Romans (1515–1516): Their Rediscovery and Legacy 227 which he gave over a two-year period, 1513–1515.1 Volume 2 of the Weimar fea-tured Luther’s first lectures on Paul’s Letter to the Galatians. Luther delivered these lectures in 1516–1517 and prepared them for publication in 1519. 2 Conspicu-

Main Topic Areas 3 Digital circuits (3 lectures) Programmable Processor (2 lectures) 6502 CPU: Stack, Subroutines (4 lectures) Midterm MIPS: Branch Prediction, Cache (10 lectures)