On the Convergence of Stochastic Dual Dynamic Programming


Operations Research Letters 36 (2008) 450–455
www.elsevier.com/locate/orl

On the convergence of stochastic dual dynamic programming and related methods

A.B. Philpott, Z. Guan
Department of Engineering Science, The University of Auckland, Private Bag 92019, Auckland, New Zealand

Received 12 November 2007; accepted 24 January 2008; available online 4 March 2008

Abstract

We discuss the almost-sure convergence of a broad class of sampling algorithms for multistage stochastic linear programs. We provide a convergence proof based on the finiteness of the set of distinct cut coefficients. This differs from existing published proofs in that it does not require a restrictive assumption.
© 2008 Elsevier B.V. All rights reserved.

Keywords: Multistage stochastic programming; Monte-Carlo sampling; Benders decomposition

1. Introduction

Multistage stochastic linear programs with recourse are well known in the stochastic programming community, and are becoming more common in applications. The typical approach to solving these problems is to approximate the random variables using a finite set of outcomes forming a scenario tree (see e.g. [1]). This yields a large-scale mathematical programming problem that can be attacked using a suitable algorithm. One algorithm that has been widely applied in energy and logistics settings is the stochastic dual dynamic programming (SDDP) method of Pereira and Pinto [9]. This algorithm constructs feasible dynamic programming policies using an outer approximation of a (convex) future cost function that is computed using Benders cuts. The policies defined by these cuts can be evaluated using simulation, and their performance measured against a lower bound on their expected cost. This provides a convergence criterion that may be applied to terminate the algorithm when the estimated cost of the candidate policy is close enough to its lower bound.
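The outer-approximation idea described here can be seen in a minimal sketch: a convex future cost function is bounded below by the maximum of a collection of affine cuts. The toy function, tangent construction, and all names below are illustrative assumptions, not taken from the paper.

```python
# Sketch: outer approximation of a convex future cost function by cuts.
# The approximation is the pointwise maximum of affine functions
# alpha_j + beta_j * x, each lying below the true function.

def cut_lower_bound(x, cuts):
    """Evaluate max_j (alpha_j + beta_j * x) over the collected cuts."""
    return max(alpha + beta * x for alpha, beta in cuts)

def tangent_cut(x0):
    """A valid cut for the toy convex cost Q(x) = x**2: the tangent at x0."""
    beta = 2 * x0                 # derivative of x**2 at x0
    alpha = x0 ** 2 - beta * x0   # intercept so the line touches Q at x0
    return alpha, beta

cuts = [tangent_cut(x0) for x0 in (-1.0, 0.0, 2.0)]

# The cut approximation never exceeds the true convex function:
assert all(cut_lower_bound(x / 10, cuts) <= (x / 10) ** 2 + 1e-9
           for x in range(-30, 31))
```

Each new cut tightens the approximation from below without invalidating earlier cuts, which is why cut collections can only grow in the algorithms discussed here.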
The SDDP algorithm has led to a number of related methods (see [2–4,6,10]) that are based on the same essential idea, but seek to improve the method by exploiting the structure of particular applications.

Corresponding author. E-mail addresses: a.philpott@auckland.ac.nz (A.B. Philpott), z.guan@auckland.ac.nz (Z. Guan).
0167-6377 © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.orl.2008.01.013

Since its publication in 1991, a number of authors have studied the convergence behaviour of SDDP and related algorithms. In his Ph.D. thesis [3] (and in [4]) Donohue states that "finite convergence of this algorithm follows from the finite convergence of the Nested Decomposition algorithm, since the scenarios from which the optimality cuts are generated are re-sampled at each iteration." This remark which, strictly speaking, should be a statement of convergence with probability 1, is not accompanied by a formal proof.

The first formal proof of the almost-sure convergence of multistage sampling algorithms was published by Chen and Powell [2], who derived this for their CUPPS algorithm. This proof was extended by Linowsky and Philpott [8] to cover other multistage sampling algorithms (SDDP [9], AND [4], ReSa [6]) that use outer approximation. The convergence proofs in [2,8] make use of an unstated assumption regarding the independence of sampled random variables and convergent subsequences of algorithm iterates. The absence of an argument showing that this assumption holds weakens the analysis in these papers, and leaves open the question of convergence in general.

The purpose of this paper is to give a simpler proof of almost-sure convergence for a broad class of sampling algorithms that includes SDDP, AND, ReSa and CUPPS. Our proof does not require the independence assumption referred to above, but follows the finiteness argument that Donohue alluded to in his thesis, and makes this argument explicit,
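The termination criterion mentioned in the introduction (stop when the simulated cost of the candidate policy is close enough to its lower bound) might be sketched as follows. The confidence-interval form of the test, the function name `should_terminate`, and the data are illustrative assumptions, not taken from the paper.

```python
# Sketch of an SDDP-style stopping test: compare the deterministic lower
# bound with a simulation estimate of the candidate policy's expected cost.
import statistics

def should_terminate(lower_bound, simulated_costs, z=1.96):
    """Stop when the lower bound reaches the lower edge of a confidence
    interval around the simulated mean cost of the candidate policy."""
    mean = statistics.mean(simulated_costs)
    half_width = z * statistics.stdev(simulated_costs) / len(simulated_costs) ** 0.5
    return lower_bound >= mean - half_width

costs = [10.2, 9.8, 10.1, 9.9, 10.0]   # simulated policy costs (toy data)
assert should_terminate(9.95, costs)   # bound close to simulated mean: stop
assert not should_terminate(5.0, costs)  # large gap remains: keep iterating
```

Because the lower bounds are monotone nondecreasing while the simulated cost estimates the (higher) true policy cost, a shrinking gap is evidence of near-optimality.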

by basing it on two well-defined properties of the sampling procedure. Although it could be claimed that this proof is already part of the stochastic programming folklore, our contribution is to illuminate the key properties that underlie almost-sure convergence of these methods.

2. Multistage Benders decomposition

We follow the notation and terminology of [8], and restrict attention to multistage stochastic programs with the following properties.

(A1) Random quantities appear only on the right-hand side of the linear constraints in each stage.
(A2) The set $\Omega_t$ of random outcomes in each stage $t = 2, 3, \ldots, T$ is discrete and finite ($\Omega_t = \{\omega_{ti}, i = 1, \ldots, q_t\}$ with probabilities $p_{ti} > 0$ for all $i$).
(A3) Random quantities in different stages are independent.
(A4) The feasible region of the linear program in each stage is nonempty and bounded.

Assumption (A3) includes the case where the right-hand sides of the linear constraints follow an ARMA process, which can be accommodated by augmenting $x$ with variables that model this, along with stagewise independent residuals (see [7]). Under the above assumptions, the multistage stochastic linear program can be written in the following form. Solve the problem defined by

[LP_1]  $Q_1 = \min_{x_1} \; c_1^\top x_1 + \mathcal{Q}_2(x_1)$
        subject to $A_1 x_1 = b_1$, $x_1 \ge 0$,

where for all $t = 2, \ldots, T$,

$\mathcal{Q}_t(x_{t-1}) = \sum_{i=1}^{q_t} p_{ti} \, Q_t(x_{t-1}, \omega_{ti}),$

and $Q_t(x_{t-1}, \omega_{ti})$ is defined by the problem

[LP_t]  $Q_t(x_{t-1}, \omega_t) = \min_{x_t} \; c_t^\top x_t + \mathcal{Q}_{t+1}(x_t)$
        subject to $A_t x_t = \omega_t - B_{t-1} x_{t-1}$, $x_t \ge 0$,

and we set $\mathcal{Q}_{T+1} = 0$.

The problem [LP_t] depends on the choice of $\omega_t$ and $x_{t-1}$, and so we could write [LP_t(x_{t-1}, \omega_t)], though we choose to suppress this dependence in the notation. By Assumption (A3), [LP_t] is independent of $\omega_{t-1}, \omega_{t-2}, \ldots$

In the class of sampling algorithms that we consider in this paper, the functions $\mathcal{Q}_t(x_{t-1})$ in each stage are approximated by the maximum of a collection of linear functions, each of which is called a cut. In each iteration $k = 1, 2, \ldots$, the type of algorithm we consider computes a set of feasible solutions $\{x_t^k : t = 1, \ldots, T-1\}$, and a set of cuts, one for each stage $t = 1, \ldots, T-1$. This gives rise to a sequence of approximate problems [AP_t^k], $k = 1, 2, \ldots$, for each stage. These are defined as follows. For $t = 1$, we solve the linear program

[AP_1^k]  $C_1^k = \min_{x_1, \theta_2} \; c_1^\top x_1 + \theta_2$
          subject to $A_1 x_1 = b_1$,
                     $\theta_2 \ge \alpha_{2,j} + (\beta_2^j)^\top x_1$, $j = 0, \ldots, k-1$,
                     $x_1 \ge 0$,

and, for $t = 2, \ldots, T-1$, we solve

[AP_t^k]  $C_t^k(x_{t-1}, \omega_t) = \min_{x_t, \theta_{t+1}} \; c_t^\top x_t + \theta_{t+1}$
          subject to $A_t x_t = \omega_t - B_{t-1} x_{t-1}$,
                     $\theta_{t+1} \ge \alpha_{t+1,j} + (\beta_{t+1}^j)^\top x_t$, $j = 0, \ldots, k-1$,
                     $x_t \ge 0$.

Finally, for every $k$, we set [AP_T^k] = [LP_T]. The problems [AP_t^k] are approximations of [LP_t] in the sense that $\mathcal{Q}_{t+1}(x_t)$ is approximated (from below) by the polyhedral function

$\max_{j=0,\ldots,k-1} \{\alpha_{t+1,j} + (\beta_{t+1}^j)^\top x_t\}.$

This means that any solution to [AP_t^k] has a value that is a lower bound on the optimal value of [LP_t]. For all stages, the first cut ($j = 0$) is set as the trivial cut bounding $\theta_{t+1}$ below. We use the notation $C_t^k(x_{t-1})$ to denote $\sum_{i=1}^{q_t} p_{ti} C_t^k(x_{t-1}, \omega_{ti})$. In the last stage $T$, we have [AP_T^k] = [LP_T], and so for every $x_{T-1}$ and $\omega_T$

$C_T^k(x_{T-1}, \omega_T) = Q_T(x_{T-1}, \omega_T), \qquad k = 1, 2, \ldots$

Since cuts are added from one iteration to the next, and no cuts are taken out, the optimal values of [AP_t^k] form a monotonic sequence, i.e. for $k = 1, 2, \ldots$ and $t = 2, 3, \ldots, T$,

$C_t^{k+1}(x_{t-1}, \omega_t) \ge C_t^k(x_{t-1}, \omega_t),$

and

$C_1^{k+1} \ge C_1^k.$

Observe that under Assumption (A4),

$\{x_t : A_t x_t = \omega_t - B_{t-1} x_{t-1}^k, \; x_t \ge 0\}$

is nonempty and bounded, so [AP_t^k] always has a nonempty feasible set (with $\theta_{t+1}$ chosen large enough) and hence an optimal solution. Thus its dual has an optimal solution $(\pi_t, \rho_t)$, where $\pi_t$ corresponds to the equality constraints, and $\rho_t$ corresponds to the cut constraints. Furthermore, by Assumption (A1), the set of extreme points of the dual of [AP_t^k] is independent of the outcomes of the random quantities, which allows us to construct a valid cut at each stage based on an assembled collection $D_t^k$ of extreme-point dual solutions from different samples.

Initially, at iteration $k = 0$, $D_t^0 = \emptyset$. At any subsequent iteration $k$ the coefficients of the cuts at each stage $t = 1, 2, \ldots, T-1$ are calculated as follows.

Cut Calculation Algorithm (CCA)
1. Choose a sample $\Omega_t^k \subseteq \Omega_t$, solve [AP_t^k] for all $\omega_{ti} \in \Omega_t^k$, and add the optimal extreme-point dual solutions to $D_t^k$.

2. Let $(\pi_t^i(x_{t-1}^k), \rho_t^i(x_{t-1}^k))$ be the best dual solution in $D_t^k$ for [AP_t^k] for each $\omega_{ti} \in \Omega_t$; that is, if $t < T$ then

$(\pi_t^i(x_{t-1}^k), \rho_t^i(x_{t-1}^k)) \in \arg\max \{ \pi_t^\top (\omega_{ti} - B_{t-1} x_{t-1}^k) + \rho_t^\top \alpha_{t+1}^{k-1} : (\pi_t, \rho_t) \in D_t^k \},$

and

$\pi_T^i(x_{T-1}^k) \in \arg\max \{ \pi_T^\top (\omega_{Ti} - B_{T-1} x_{T-1}^k) : \pi_T \in D_T^k \}$

otherwise.
3. The cut has the formula

$\theta_t \ge \alpha_{t,k} + (\beta_t^k)^\top x_{t-1},$

where

$\beta_t^k = \sum_{i=1}^{q_t} p_{ti} \, B_{t-1}^\top \pi_t^i(x_{t-1}^k) \quad$ for $2 \le t \le T$,

$\alpha_{t,k} = \sum_{i=1}^{q_t} p_{ti} \left[ \omega_{ti}^\top \pi_t^i(x_{t-1}^k) + (\alpha_{t+1}^{k-1})^\top \rho_t^i(x_{t-1}^k) \right] \quad$ for $2 \le t \le T-1$,

$\alpha_{T,k} = \sum_{i=1}^{q_T} p_{Ti} \, \omega_{Ti}^\top \pi_T^i(x_{T-1}^k).$

CCA is the same algorithm as that used to define a sampled cut in [8]. Observe that $\alpha_{t,k}$ is a scalar, whereas $\alpha_{t+1}^{k-1}$ denotes a $(k-1)$-dimensional vector. This means that the dimensions of $\alpha_{t+1}^{k-1}$ and $\rho_t^i(x_{t-1}^k)$ are increasing as the iteration count $k$ increases, and thus the collection of extreme-point solutions of the dual of [AP_t^k] may be infinite. On the other hand, the collection of distinct values of $(\beta_t^k, \alpha_{t,k})$ is provably finite, as we show in the following lemma.

Lemma 1. For each $t = 2, 3, \ldots, T$, define the set

$G_t^k = \{ (\beta_t^j, \alpha_{t,j}) : j = 1, 2, \ldots, k-1 \}.$

Then for any sequence $G_t^k$, $k = 1, 2, \ldots$, generated by the repeated application of CCA there exists $m_t$ such that for all $k$, $|G_t^k| \le m_t$. Furthermore, there exists $k_t$, so that if $k \ge k_t$ then $G_t^k = G_t^{k_t}$.

Proof. Consider any realization of the sequence $G_t^k$, $k = 1, 2, \ldots$, generated by the repeated application of CCA. We use induction on $t$ to construct $m_t$ such that $|G_t^k| \le m_t$. The second part of the lemma follows immediately. First, at $T$, $\rho_T = 0$ and $\pi_T$ is an extreme point of $\{\pi : A_T^\top \pi \le c_T\}$, of which there are at most $m_{T1}$, say. Then the cut coefficients

$\alpha_{T,k} = \sum_{i=1}^{q_T} p_{Ti} \, \omega_{Ti}^\top \pi_T^i(x_{T-1}^k),$

$\beta_T^k = \sum_{i=1}^{q_T} p_{Ti} \, B_{T-1}^\top \pi_T^i(x_{T-1}^k),$

can each take at most $m_{T1}^{q_T}$ values, and thus if $m_T = m_{T1}^{2 q_T}$, then for all $k$, $|G_T^k| \le m_T$.

Now suppose at $t+1$ that there exists $m_{t+1}$ such that for all $k$, $|G_{t+1}^k| \le m_{t+1}$. It follows that there exists $k_{t+1}$, so that if $k \ge k_{t+1}$ then $G_{t+1}^k = G_{t+1}^{k_{t+1}}$ and the cut at iteration $k \ge k_{t+1}$ is a repeat of some cut in the existing cuts. Consider the feasible region of the dual of [AP_t^k], namely

$H_t^k = \left\{ (\pi_t, \rho_t) : A_t^\top \pi_t + \sum_{j=1}^{k-1} \beta_{t+1}^j \rho_t^j \le c_t, \; \sum_{j=1}^{k-1} \rho_t^j \le 1, \; \rho_t \ge 0 \right\}.$

If $k \ge k_{t+1}$ then any extreme point $(\pi_t^k, \rho_t^k)$ of $H_t^k$ corresponds to an extreme point $(\pi, \rho)$ of $H_t^{k_{t+1}}$ with the same dual objective value, obtained by choosing $\pi = \pi_t^k$ and basic columns $\beta_{t+1}^j$ for $j \le k_{t+1}$ that match the basic columns $\beta_{t+1}^j$, $k_{t+1} \le j \le k$. This is because each latter column $\beta_{t+1}^j$ and its cost coefficient $\alpha_{t+1,j}$ is a duplicate of some $(\beta, \alpha) \in G_{t+1}^{k_{t+1}}$. Since there are a finite number, say $e_t$, of extreme-point solutions to $H_t^{k_{t+1}}$, there are at most $e_t$ distinct values of

$\omega_{ti}^\top \pi_t^i(x_{t-1}^k) + (\alpha_{t+1}^{k-1})^\top \rho_t^i(x_{t-1}^k),$

and so $(e_t)^{q_t}$ distinct values of

$\alpha_{t,k} = \sum_{i=1}^{q_t} p_{ti} \left[ \omega_{ti}^\top \pi_t^i(x_{t-1}^k) + (\alpha_{t+1}^{k-1})^\top \rho_t^i(x_{t-1}^k) \right].$

Similarly,

$\beta_t^k = \sum_{i=1}^{q_t} p_{ti} \, B_{t-1}^\top \pi_t^i(x_{t-1}^k)$

can take at most $(e_t)^{q_t}$ values, and so if $m_t = (e_t)^{2 q_t}$ then $|G_t^k| \le m_t$, which proves the result.

Linowsky and Philpott [8] define a class of sampling-based decomposition algorithms, the Multistage Sampled Benders Decomposition (MSBD), which includes SDDP, AND, ReSa and CUPPS. Here we define a different class of algorithms that we call Dynamic Outer Approximation Sampling Algorithms (DOASA). For DOASA the sampling procedures must satisfy two distinct properties, FPSP and BPSP, that are defined below. We make use of the terminology scenario to denote an element of $\prod_{t=2}^{T-1} \Omega_t$ indexed by $j$, so

$\prod_{t=2}^{T-1} \Omega_t = \left\{ \omega(j) : j = 1, 2, \ldots, \prod_{t=2}^{T-1} q_t \right\}.$

Algorithms in the DOASA class perform the following steps:

Step 0: (Initialization) Set $k = 1$.

Step 1: (Forward pass) Sample a single outcome $\omega_t^k$ of the random variable in each stage $t = 2, 3, \ldots, T-1$, to give a single scenario $\{\omega_t^k\}$ satisfying FPSP. For each stage $t = 1, 2, \ldots, T-1$, compute the primal solution $(x_t^k, \theta_{t+1}^k)$ of the problem [AP_t^k].
Step 2: (Cut Generation) For each stage $t = T, T-1, \ldots, 2$, apply CCA to generate a cut at $x_{t-1}^k$ with sample $\Omega_t^k$ satisfying BPSP.
Step 3: Set $k = k+1$ and go to Step 1.

We now formally state the sampling properties as follows:

Forward Pass Sampling Property (FPSP): For each $j = 1, 2, \ldots, \prod_{t=2}^{T-1} q_t$, with probability 1,

$\left| \left\{ k : \{\omega_t^k, \, t = 2, 3, \ldots, T-1\} = \omega(j) \right\} \right| = \infty.$

Backward Pass Sampling Property (BPSP): For each $t = 2, 3, \ldots, T$ and $i = 1, 2, \ldots, q_t$, with probability 1,

$\left| \left\{ k : \omega_{ti} \in \Omega_t^k \right\} \right| = \infty.$

FPSP states that each scenario $\omega(j)$ is traversed infinitely many times with probability 1 in the forward pass. BPSP states that each scenario outcome $\omega_{ti}$ is visited infinitely many times with probability 1 in the backward pass. There are many sampling methods satisfying these two properties. For example, consider independently sampling a single outcome in each stage, with a positive probability for each $\omega_{ti}$, in the forward pass and backward pass respectively. Then by the Borel–Cantelli lemma (see [5]) this method satisfies both properties. Another sampling method that satisfies FPSP and BPSP is to repeat an exhaustive enumeration of each scenario $\omega(j)$, $j = 1, 2, \ldots, \prod_{t=2}^{T-1} q_t$, in both the forward pass and the backward pass, although such a method would be prohibitively expensive in all but the smallest examples.

3. Convergence of DOASA algorithms

Previously published results in [2,8] give proofs for the almost-sure convergence of CUPPS and MSBD respectively. The proofs in both of these papers require an important but unstated assumption. Here we state this assumption formally and discuss it.

Let the iterations of the algorithm be indexed by $N = \{1, 2, \ldots\}$ and suppose $t \in \{1, \ldots, T-1\}$. Let $\{\omega_t^n, x_t^n\}_{n \in N}$ be the sequence generated by the sampling algorithm at stage $t$.

Assumption 1. For any infinite subsequence $\{x_t^k\}_{k \in K}$ of $\{x_t^n\}_{n \in N}$ there exists a convergent subsequence $\{x_t^j\}_{j \in J}$ that is independent of $\{\omega_{t+1}^j\}_{j \in J}$.

Remark 4.1 in [2] correctly claims that if $N$ is infinite then with probability one $N$ has an infinite subset $N_{ti}$ corresponding to draws of outcome $\omega_{ti}$ for any $i = 1, \ldots, q_t$ and $t = 2, \ldots, T$. This follows by an application of the Borel–Cantelli lemma, because each $\omega_t^n$ in $\{\omega_t^n\}_{n \in N}$ is independently sampled and $\Pr[\omega_t^n = \omega_{ti}] > 0$.

However, the situation becomes more subtle in the proof of Lemma 5.2 in [2]. Here the authors claim that for any infinite subset $K$ of $N$, there exists an infinite subset $J$ with a convergent subsequence $\{x_{T-1}^j\}_{j \in J}$ such that with probability one there exists an infinite subset $J_i$ of $J$ corresponding to draws of each sample $\omega_{Ti}$ for $i = 1, \ldots, q_T$. The convergent subsequence $\{x_{T-1}^j\}_{j \in J}$ in this lemma is constructed using the assumed compactness of the set $X$ in which $x_{T-1}$ lies. Of course, compactness guarantees a convergent subsequence $\{x_{T-1}^j\}_{j \in J}$ of $\{x_{T-1}^k\}_{k \in K}$, but it cannot be deduced from this and Remark 4.1 in [2] that there are an infinite number of $\omega_{Ti}$ in $\{\omega_T^j\}_{j \in J}$ for every $i = 1, \ldots, q_T$. (The problem here is that for every convergent subsequence it might be the case that there are only finitely many $\omega_{Ti}$ for some $i = 1, \ldots, q_T$, and this possibility needs to be ruled out somehow.)

In claiming the independence of the sampling procedure from the convergence of the subsequence, the authors of [2] are making an implicit assumption (Assumption 1), which is needed to make the proof of Lemma 5.2 valid. The proof in [8] is based on Lemma 5.2 in [2], and so it is also flawed in the absence of Assumption 1.

We now proceed to give a direct proof of almost-sure convergence that does not rely on Assumption 1. The new proof formalizes the assertion by Donohue [3] that convergence follows from re-sampling. To understand how the proof works, consider an approach (DOASA-N) in which a finite set of $N$ scenarios is sampled in advance, and the forward pass of the algorithm traverses each of these scenarios.

DOASA-N
Step 0: (Initialization) Set $k = 1$. For $s = 1$ to $N$, select at each stage $t = 2, 3, \ldots, T-1$ a single outcome $\omega_{st}$ of the random variable, to give a set of $N$ scenarios.
Step 1: (Forward pass) For each scenario $s$, and stage $t = 1, 2, \ldots, T-1$, compute the primal solution $(x_{st}^k, \theta_{s,t+1}^k)$ of the problem [AP_t^k].
Step 2: (Cut Generation) For each stage $t = T, T-1, \ldots, 2$, apply CCA to generate $N$ cuts at the states $x_{s,t-1}^k$ with samples $\Omega_{s,t}^k$, $s = 1, 2, \ldots, N$.
Step 3: Set $k = k+1$ and go to Step 1.

Lemma 2. In every realization of iterations, DOASA-N converges in a finite number of iterations to a policy giving value $\lim_k C_1^k$ which is at most equal to the optimal expected cost of [LP_1].

Proof. For each $s = 1, 2, \ldots, N$, since $k = 1, 2, \ldots$ and one cut is constructed in each iteration $k$, then by Lemma 1, for $t \in \{2, \ldots, T\}$, there exists $k_{s,t}$ so that if $k \ge k_{s,t}$ then $G_t^k = G_t^{k_{s,t}}$, and thus there is no further change in the cuts defining $C_{s,t}^k(x_{s,t-1})$; that is, for every $x_{s,t-1}$

$\max_{j=0,\ldots,k-1} \{\alpha_{t,j} + (\beta_t^j)^\top x_{s,t-1}\} = \max_{j=0,\ldots,k_{s,t}-1} \{\alpha_{t,j} + (\beta_t^j)^\top x_{s,t-1}\}.$

For $t \in \{2, \ldots, T\}$, if we choose $k_t = \max_{s=1,\ldots,N} \{k_{s,t}\}$, then for each $k \ge k_t$ there is no change in the cuts defining $C_t^k(x_{t-1})$; that is, for every $x_{t-1}$

$\max_{j=0,\ldots,k-1} \{\alpha_{t,j} + (\beta_t^j)^\top x_{t-1}\} = \max_{j=0,\ldots,k_t-1} \{\alpha_{t,j} + (\beta_t^j)^\top x_{t-1}\}.$

Thus all solutions $(x_1^k, \theta_2^k)$ to [AP_1^k] are the same for $k \ge k_2$, as are all solutions $(x_t^k, \theta_{t+1}^k)$ to [AP_t^k], $t = 2, 3, \ldots, T$, so DOASA-N terminates after iteration $k_2$. It is easy to see that for every $k$ the optimal value of [AP_1^k] is a lower bound on the optimal expected cost of [LP_1].

A special case of DOASA-N uses the universe of $N = \prod_{t=2}^{T-1} q_t$ scenarios.

Lemma 3. Under BPSP, DOASA-N with the universe of scenarios converges with probability 1 to an optimal solution to [LP_1] in a finite number of iterations.

Proof. From Lemma 2, in every realization of iterations DOASA-N will converge in a finite number of steps to a policy that has $\lim_k C_1^k$ giving a lower bound on the true expected cost. Now consider a realization of DOASA-N iterations, and denote the limiting policy by $(\bar{x}_1, \bar{x}_2(\omega_2), \bar{x}_3(\omega_2, \omega_3), \ldots)$, which is obtained at iteration $\bar{k}$, say. For any scenario $\omega_2, \omega_3, \ldots, \omega_T$, we denote $\bar{x}_t(\omega_2, \ldots, \omega_t)$ by $\bar{x}_t(\omega)$. We claim that for every $k \ge \bar{k}$, and any scenario $\omega$,

$C_T^k(\bar{x}_{T-1}(\omega)) = \mathcal{Q}_T(\bar{x}_{T-1}(\omega)) \qquad (1)$

with probability 1, which implies $C_T^k(\bar{x}_{T-1}(\omega), \omega_T) = Q_T(\bar{x}_{T-1}(\omega), \omega_T)$ for all $\omega_T$. Otherwise, for some particular outcome $\hat{\omega}_T$, we have $\hat{\omega}_T \notin \Omega_T^k$ for every $k \ge \bar{k}$ with positive probability, which violates BPSP.

Now we claim that if $k \ge \bar{k}$ then for every scenario $\omega$

$C_{T-1}^k(\bar{x}_{T-2}(\omega)) = \mathcal{Q}_{T-1}(\bar{x}_{T-2}(\omega)). \qquad (2)$

Otherwise, for some particular outcome $\hat{\omega}_{T-1}$,

$C_{T-1}^k(\bar{x}_{T-2}(\omega), \hat{\omega}_{T-1}) < Q_{T-1}(\bar{x}_{T-2}(\omega), \hat{\omega}_{T-1}).$

But

$C_{T-1}^k(\bar{x}_{T-2}(\omega), \hat{\omega}_{T-1}) = \min_{x_{T-1}, \theta_T} \; c_{T-1}^\top x_{T-1} + \theta_T$
subject to $A_{T-1} x_{T-1} = \hat{\omega}_{T-1} - B_{T-2} \bar{x}_{T-2}(\omega)$,
           $\theta_T \ge \alpha_{T,j} + (\beta_T^j)^\top x_{T-1}$, $j = 0, \ldots, k-1$,
           $x_{T-1} \ge 0$,

which has optimal solution

$(x_{T-1}, \theta_T) = \left( \bar{x}_{T-1}(\omega), \; \max_{j=0,\ldots,k-1} \{\alpha_{T,j} + (\beta_T^j)^\top \bar{x}_{T-1}(\omega)\} \right)$

with $\omega_{T-1} = \hat{\omega}_{T-1}$. \qquad (3)

Fig. 1. A new cut, shown in bold, would be created if $\theta_T < C_T^k(\bar{x}_{T-1}(\omega))$.

If $\theta_T < C_T^k(x_{T-1})$, then for any $k \ge \bar{k}$

$\max_{j=0,\ldots,k-1} \{\alpha_{T,j} + (\beta_T^j)^\top \bar{x}_{T-1}(\omega)\} < C_T^k(x_{T-1}) = \mathcal{Q}_T(\bar{x}_{T-1}(\omega)) \qquad (4)$

by (1). But by BPSP we have, with probability 1, that for each $\omega_T$ there is some $k(\omega_T) \ge \bar{k}$ with $\omega_T \in \Omega_T^{k(\omega_T)}$. If we let $\hat{k}$ denote the maximum of the $k(\omega_T)$, then the height of the cut at $\bar{x}_{T-1}(\omega)$ evaluated at iteration $\hat{k}$ is $\mathcal{Q}_T(\bar{x}_{T-1}(\omega))$, contradicting (4) (see Fig. 1). Thus we have

$\theta_T \ge C_T^k(x_{T-1}) = \mathcal{Q}_T(x_{T-1})$

and

$C_{T-1}^k(\bar{x}_{T-2}(\omega), \hat{\omega}_{T-1}) \ge c_{T-1}^\top x_{T-1} + \mathcal{Q}_T(x_{T-1}) \ge Q_{T-1}(\bar{x}_{T-2}(\omega), \hat{\omega}_{T-1}),$

contradicting (3), thereby demonstrating (2). Observe that since $\hat{\omega}_{T-1}$ w
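The cutting-plane mechanics analysed above can be seen end-to-end in a self-contained two-stage sketch. Here the second-stage cost is the toy recourse function $Q(x, \omega) = q \cdot \max(\omega - x, 0)$, whose subgradient has a closed form, so cuts can be built without an LP solver; cuts are only ever added, so the master values are monotone nondecreasing, as in the paper. All data, names, and the grid-based master solve are illustrative assumptions, not the paper's method.

```python
# Two-stage Benders-style outer approximation on a toy recourse problem.
# First stage: min c*x + E[Q(x, w)] over 0 <= x <= X_MAX, where
# Q(x, w) = Q_COST * max(w - x, 0); subgradient in x is -Q_COST if x < w.

C, Q_COST = 1.0, 3.0                 # first- and second-stage unit costs
OUTCOMES = [(0.5, 1.0), (0.5, 3.0)]  # (probability, demand outcome)
X_MAX = 5.0

def expected_recourse(x):
    return sum(p * Q_COST * max(w - x, 0.0) for p, w in OUTCOMES)

def make_cut(xk):
    """Benders cut Q(x) >= alpha + beta*x, exact at x = xk (probability-
    weighted combination of per-outcome subgradients)."""
    beta = sum(p * (-Q_COST if xk < w else 0.0) for p, w in OUTCOMES)
    alpha = expected_recourse(xk) - beta * xk
    return alpha, beta

def solve_master(cuts, n=2001):
    # crude grid minimisation of the convex master objective (sketch only)
    def obj(x):
        return C * x + max(a + b * x for a, b in cuts)
    return min((obj(i * X_MAX / (n - 1)), i * X_MAX / (n - 1)) for i in range(n))

cuts = [(0.0, 0.0)]                  # trivial initial cut: theta >= 0
for _ in range(20):                  # cuts are added, never removed
    lower_bound, xk = solve_master(cuts)
    upper = C * xk + expected_recourse(xk)  # true cost of candidate xk
    if upper - lower_bound < 1e-6:
        break                        # lower bound has met the policy cost
    cuts.append(make_cut(xk))
```

On this instance the loop closes the gap in a handful of cuts: the monotone lower bounds $C_1^k$ rise to the true optimal cost, mirroring the finite-termination argument of Lemma 2.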

