F.L. Lewis and Draguna Vrabie
Moncrief-O'Donnell Endowed Chair, Head, Controls & Sensors Group
Automation & Robotics Research Institute (ARRI), The University of Texas at Arlington
Supported by: NSF - Paul Werbos
Adaptive Dynamic Programming (ADP) for Discrete-Time Systems
Talk available online at http://ARRI.uta.edu/acs

Bill Wolovich
"Linear Multivariable Systems," New York: Springer-Verlag, 1974.
"Robotics: Basic Analysis and Design," 1987.
"Automatic Control Systems: Basic Analysis and Design," 1994.
Interactor Matrix & Structure:
Falb and Wolovich, "Decoupling in the design and synthesis of multivariable control systems," IEEE Trans. Automatic Control, 1967.
Wolovich and Falb, "On the structure of multivariable systems," SIAM J. Control, 1969.
Wolovich, "The use of state feedback for exact model matching," SIAM J. Control, 1972.
Falb and Wolovich, "The role of the interactor in decoupling," JACC, 1977.
Wolovich and Falb, "Invariants and canonical forms under dynamic compensation," SIAM J. Control, vol. 14, 1976.
The solution of the input-output cover problems: Wolovich [1972], Morse [1976], Hammer and Heymann [1981], Wonham [1974].
Pole Placement via Static Output Feedback is NP-Hard:
Morse, A.S., Wolovich, W.A., and Anderson, B.D.O., "Generic pole assignment - preliminary results," IEEE Transactions on Automatic Control, vol. 28, pp. 503-506, 1983.

Discrete-Time Optimal Control
System: $x_{k+1} = f(x_k) + g(x_k) u_k$
Cost: $V_h(x_k) = \sum_{i=k}^{\infty} \gamma^{i-k} r(x_i, u_i)$; example: $r(x_i, u_i) = x_i^T Q x_i + u_i^T R u_i$
Value function recursion: $V_h(x_k) = r(x_k, u_k) + \gamma \sum_{i=k+1}^{\infty} \gamma^{i-(k+1)} r(x_i, u_i) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1})$, with $V_h(0) = 0$
Control policy: $u_k = h(x_k)$, the prescribed control input function; example: $u_k = -K x_k$, linear state-variable feedback

Discrete-Time Optimal Control
Cost: $V_h(x_k) = \sum_{i=k}^{\infty} \gamma^{i-k} r(x_i, u_i)$
Value function recursion: $V_h(x_k) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1})$, with $u_k = h(x_k)$ the prescribed control policy
Hamiltonian: $H(x_k, V(x_k), h) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1}) - V_h(x_k)$
Optimal cost: $V^*(x_k) = \min_h \big( r(x_k, h(x_k)) + \gamma V_h(x_{k+1}) \big)$
Bellman's principle: $V^*(x_k) = \min_{u_k} \big( r(x_k, u_k) + \gamma V^*(x_{k+1}) \big)$ - a backwards-in-time solution
Optimal control: $h^*(x_k) = \arg\min_{u_k} \big( r(x_k, u_k) + \gamma V^*(x_{k+1}) \big)$
The system dynamics does not appear.

The Solution: Hamilton-Jacobi-Bellman Equation
System: $x_{k+1} = f(x_k) + g(x_k) u_k$
Cost: $V(x_k) = \sum_{i=k}^{\infty} \big( x_i^T Q x_i + u_i^T R u_i \big)$
DT HJB equation: $V(x_k) = \min_{u_k} \big( x_k^T Q x_k + u_k^T R u_k + V(x_{k+1}) \big) = \min_{u_k} \big( x_k^T Q x_k + u_k^T R u_k + V(f(x_k) + g(x_k) u_k) \big)$
Minimizing with respect to $u_k$: $2 R u_k + g(x_k)^T \frac{dV(x_{k+1})}{dx_{k+1}} = 0$, so $u(x_k) = -\tfrac{1}{2} R^{-1} g(x_k)^T \frac{dV(x_{k+1})}{dx_{k+1}}$
Difficult to solve; contains the dynamics.

DT Optimal Control - Linear Systems, Quadratic Cost (LQR)
System: $x_{k+1} = A x_k + B u_k$; cost: $V(x_k) = \sum_{i=k}^{\infty} \big( x_i^T Q x_i + u_i^T R u_i \big)$
Fact: the cost is quadratic, $V(x_k) = x_k^T P x_k$ for some symmetric matrix $P$.
HJB = DT Riccati equation: $0 = A^T P A - P + Q - A^T P B (R + B^T P B)^{-1} B^T P A$
Optimal control: $u_k = -L x_k$, $L = (R + B^T P B)^{-1} B^T P A$; optimal cost: $V^*(x_k) = x_k^T P x_k$
Off-line solution; the dynamics must be known.
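
For concreteness, a minimal Python/NumPy sketch of this off-line, model-based step (assuming SciPy is available; the A, B, Q, R below are illustrative placeholders, not matrices from the talk):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative system and weights (placeholders, not matrices from the talk)
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

# Off-line, model-based solution of the DT algebraic Riccati equation
# 0 = A'PA - P + Q - A'PB (R + B'PB)^{-1} B'PA
P = solve_discrete_are(A, B, Q, R)

# Optimal state-feedback gain: u_k = -L x_k with L = (R + B'PB)^{-1} B'PA
L = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("P =\n", P, "\nL =", L)
```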

Discrete-Time Optimal Adaptive Control
Cost: $V_h(x_k) = \sum_{i=k}^{\infty} \gamma^{i-k} r(x_i, u_i)$
Value function recursion: $V_h(x_k) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1})$, with $u_k = h(x_k)$ the prescribed control policy
Hamiltonian: $H(x_k, V(x_k), h) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1}) - V_h(x_k)$
Optimal cost: $V^*(x_k) = \min_h \big( r(x_k, h(x_k)) + \gamma V_h(x_{k+1}) \big)$
Bellman's principle: $V^*(x_k) = \min_{u_k} \big( r(x_k, u_k) + \gamma V^*(x_{k+1}) \big)$
Optimal control: $h^*(x_k) = \arg\min_{u_k} \big( r(x_k, u_k) + \gamma V^*(x_{k+1}) \big)$
Focus on these two equations.

Discrete-Time Optimal Control - Solutions by the Computational Intelligence Community
Value function recursion: $V_h(x_k) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1})$, with $u_k = h(x_k)$ the prescribed control policy - the Lyapunov equation.
Theorem: Let $V_h(x_k)$ solve the Lyapunov equation. Then $V_h(x_k) = \sum_{i=k}^{\infty} \gamma^{i-k} r(x_i, h(x_i))$.
This gives the value for any prescribed control policy: policy evaluation for any given current policy. The policy must be stabilizing, and $V_h(0) = 0$.

Bellman's result: $h^*(x_k) = \arg\min_{u_k} \big( r(x_k, u_k) + \gamma V^*(x_{k+1}) \big)$
What about $h'(x_k) = \arg\min_{u_k} \big( r(x_k, u_k) + \gamma V_h(x_{k+1}) \big)$ for a given policy $h(\cdot)$?
Theorem (Bertsekas). Let $V_h(x_k)$ be the value of any given policy $h(x_k)$. Then $V_{h'}(x_k) \le V_h(x_k)$.
Policy improvement - the one-step improvement property of rollout algorithms.

DT Policy Iteration
e.g. control policy as state-variable feedback (SVFB): $h(x_k) = -L x_k$
The cost for any given control policy $h(x_k)$ satisfies the recursion $V_h(x_k) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1})$ - a Lyapunov equation, a recursive form, a consistency equation.
Recursive solution: pick a stabilizing initial control.
Policy evaluation: $V_{j+1}(x_k) = r(x_k, h_j(x_k)) + \gamma V_{j+1}(x_{k+1})$
Policy improvement: $h_{j+1}(x_k) = \arg\min_{u_k} \big( r(x_k, u_k) + \gamma V_{j+1}(x_{k+1}) \big)$
$f(\cdot)$ and $g(\cdot)$ do not appear. Howard (1960) proved convergence for MDP.

Adaptive Critics - The Adaptive Critic Architecture
Value update (policy evaluation, critic network): $V_{j+1}(x_k) = r(x_k, h_j(x_k)) + \gamma V_{j+1}(x_{k+1})$
Control policy update (action network): $h_{j+1}(x_k) = \arg\min_{u_k} \big( r(x_k, u_k) + \gamma V_{j+1}(x_{k+1}) \big)$
The action network applies the control policy $h_j(x_k)$ to the system; the critic network observes the cost.
Leads to an ONLINE, FORWARD-IN-TIME implementation of optimal control.

Different Methods of Learning
Reinforcement learning (Ivan Pavlov, 1890s). We want OPTIMAL performance - ADP, Approximate Dynamic Programming.
Actor-Critic Learning: the critic compares the desired performance with the reinforcement signal from the environment and tunes the actor; the actor (adaptive learning system) applies the control inputs to the system and observes the system outputs.

Adaptive (Approximate) Dynamic Programming
Four ADP methods proposed by Paul Werbos, classified by what the critic NN approximates:
Heuristic Dynamic Programming (HDP) - the value $V(x_k)$
Dual Heuristic Programming (DHP) - the gradient $\partial V / \partial x$
Action-Dependent Heuristic Dynamic Programming (ADHDP; Watkins' Q-learning) - the Q function $Q(x_k, u_k)$
Action-Dependent Dual Heuristic Programming (ADDHP) - the gradients $\partial Q / \partial x$, $\partial Q / \partial u$
An action NN approximates the control.
Bertsekas - Neurodynamic Programming. Barto & Bradtke - Q-learning proof (imposed a settling time).

DT Policy Iteration - Linear Systems, Quadratic Cost (LQR)
System: $x_{k+1} = A x_k + B u_k$. For any stabilizing policy, the cost is $V(x_k) = \sum_{i=k}^{\infty} \big( x_i^T Q x_i + u^T(x_i) R u(x_i) \big)$; the LQR value is quadratic, $V(x) = x^T P x$.
DT policy iteration (solves the Lyapunov equation without knowing A and B):
$V_{j+1}(x_k) = x_k^T Q x_k + u_j^T(x_k) R u_j(x_k) + V_{j+1}(x_{k+1})$
$u_{j+1}(x_k) = -\tfrac{1}{2} R^{-1} g(x_k)^T \frac{dV_{j+1}(x_{k+1})}{dx_{k+1}}$
Equivalent to an underlying problem - the DT LQR:
$(A - B L_j)^T P_{j+1} (A - B L_j) - P_{j+1} + Q + L_j^T R L_j = 0$ (DT Lyapunov equation)
$L_{j+1} = (R + B^T P_{j+1} B)^{-1} B^T P_{j+1} A$
Hewer proved convergence in 1971. ADP solves the Riccati equation WITHOUT knowing the system dynamics.
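
A minimal sketch of the underlying model-based iteration (Hewer's algorithm), assuming SciPy and an initial stabilizing gain L0; the ADP implementations on the following slides replace the Lyapunov solve with measured data:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def hewer_policy_iteration(A, B, Q, R, L0, iters=20):
    """Model-based DT policy iteration (Hewer's algorithm) for the LQR.

    Policy evaluation:  (A - B L_j)' P_{j+1} (A - B L_j) - P_{j+1} + Q + L_j' R L_j = 0
    Policy improvement: L_{j+1} = (R + B' P_{j+1} B)^{-1} B' P_{j+1} A
    """
    L = L0                                   # must be stabilizing
    for _ in range(iters):
        Acl = A - B @ L
        # solve_discrete_lyapunov solves  a X a' - X + q = 0, so pass Acl transposed
        P = solve_discrete_lyapunov(Acl.T, Q + L.T @ R @ L)
        L = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, L
```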

DT Policy Iteration - How to Implement Online? Linear Systems, Quadratic Cost (LQR)
System: $x_{k+1} = A x_k + B u_k$; $V(x_k) = \sum_{i=k}^{\infty} \big( x_i^T Q x_i + u^T(x_i) R u(x_i) \big)$. The LQR cost is quadratic, $V(x) = x^T P x$ for some matrix $P$.
DT policy iteration (solves the Lyapunov equation without knowing A and B):
$V_{j+1}(x_k) = x_k^T Q x_k + u_j^T(x_k) R u_j(x_k) + V_{j+1}(x_{k+1})$, i.e. $x_k^T P_{j+1} x_k - x_{k+1}^T P_{j+1} x_{k+1} = x_k^T Q x_k + u_j^T R u_j$.
For a two-state example, writing $x_k^T P x_k = \begin{bmatrix} x_k^1 & x_k^2 \end{bmatrix} \begin{bmatrix} p_{11} & p_{12} \\ p_{12} & p_{22} \end{bmatrix} \begin{bmatrix} x_k^1 \\ x_k^2 \end{bmatrix} = [p_{11} \; p_{12} \; p_{22}] \begin{bmatrix} (x_k^1)^2 \\ 2 x_k^1 x_k^2 \\ (x_k^2)^2 \end{bmatrix}$ (a quadratic basis set), the recursion becomes $W_{j+1}^T \big[ \varphi(x_k) - \varphi(x_{k+1}) \big] = x_k^T Q x_k + u_j^T R u_j$.

Implementation - DT Policy Iteration
Value Function Approximation (VFA): $V(x) = W^T \varphi(x)$ - weights and basis functions.
LQR case: $V(x)$ is quadratic, $V(x) = x^T P x = W^T \varphi(x)$ with quadratic basis functions $\varphi(x)$ and $W^T = [p_{11} \; p_{12} \; \cdots]$.
Nonlinear system case: use a neural network.
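
A small illustrative sketch (for an assumed 2-state system) of the correspondence between the weight vector W and the kernel matrix P in the LQR case:

```python
import numpy as np

def phi(x):
    """Quadratic basis for a 2-state system: [x1^2, 2*x1*x2, x2^2]."""
    x1, x2 = x
    return np.array([x1**2, 2.0 * x1 * x2, x2**2])

def weights_to_P(W):
    """Recover the symmetric kernel P from W so that V(x) = x'Px = W'phi(x)."""
    p11, p12, p22 = W
    return np.array([[p11, p12], [p12, p22]])

# Quick check that both forms of V(x) agree
P = np.array([[2.0, 0.5], [0.5, 1.0]])
W = np.array([P[0, 0], P[0, 1], P[1, 1]])
x = np.array([0.3, -1.2])
assert np.isclose(x @ P @ x, W @ phi(x))
```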

Implementation - DT Policy Iteration
Value function update for a given control: $V_{j+1}(x_k) = r(x_k, h_j(x_k)) + \gamma V_{j+1}(x_{k+1})$
Assume measurements of $x_k$ and $x_{k+1}$ are available to compute $u_{k+1}$.
VFA: $V_j(x_k) = W_j^T \varphi(x_k)$. Then $W_{j+1}^T \big[ \varphi(x_k) - \gamma \varphi(x_{k+1}) \big] = r(x_k, h_j(x_k))$, with regression matrix $[\varphi(x_k) - \gamma \varphi(x_{k+1})]$.
Since $x_{k+1}$ is measured, knowledge of $f(x)$ or $g(x)$ is not needed for the value function update. This is indirect adaptive control with identification of the optimal value.
Solve for the weights using RLS, or using many trajectories with different initial conditions over a compact set.
Then update the control using $h_j(x_k) = -L_j x_k = -(R + B^T P_j B)^{-1} B^T P_j A x_k$.
Model-based policy iteration: one needs to know $f(x_k)$ AND $g(x_k)$ for the control update. Robustness?
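
A minimal data-driven sketch of this value function update, using batch least squares over recorded samples instead of RLS (the basis function and data arrays are illustrative assumptions):

```python
import numpy as np

def evaluate_policy_from_data(X, Xnext, U, Q, R, phi, gamma=1.0):
    """Batch-LS policy evaluation: solve W'[phi(x_k) - gamma*phi(x_{k+1})] = r(x_k, u_k).

    X, Xnext : arrays of measured states x_k and x_{k+1} (one sample per row)
    U        : the controls h_j(x_k) actually applied at each sample
    phi      : basis function, e.g. the quadratic basis of the previous slide
    Note: f(.) and g(.) never appear - only measured data and the utility r.
    """
    Phi = np.array([phi(x) - gamma * phi(xn) for x, xn in zip(X, Xnext)])
    r = np.array([x @ Q @ x + u @ R @ u for x, u in zip(X, U)])
    W, *_ = np.linalg.lstsq(Phi, r, rcond=None)
    return W   # weights of V_{j+1}(x) = W' phi(x)
```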

1. Select a control policy.
2. Find the associated cost: $V_{j+1}(x_k) = r(x_k, h_j(x_k)) + \gamma V_{j+1}(x_{k+1})$ - solves the Lyapunov equation without knowing the dynamics.
3. Improve the control: $u_{j+1}(x_k) = -\tfrac{1}{2} R^{-1} g(x_k)^T \frac{dV_{j+1}(x_{k+1})}{dx_{k+1}}$
Online loop: observe $x_k$; apply $u_k$; observe the cost $r_k$; observe $x_{k+1}$; update $V$; set $k \leftarrow k+1$; repeat until convergence to $V_{j+1}$; then update the control to $u_{j+1}$.
Needs 10 lines of MATLAB code. Direct optimal adaptive control.

Adaptive Control
Indirect adaptive control: identify the system model.
Direct adaptive control: identify the controller.
Optimal adaptive control: identify the performance value.
(Plant with control input and measured output.)

Greedy Value Function Update - Approximate Dynamic Programming
ADP Method 1 - Heuristic Dynamic Programming (HDP), Paul Werbos
Policy iteration:
$V_{j+1}(x_k) = r(x_k, h_j(x_k)) + \gamma V_{j+1}(x_{k+1})$ (Lyapunov equation)
$h_{j+1}(x_k) = \arg\min_{u_k} \big( r(x_k, u_k) + \gamma V_{j+1}(x_{k+1}) \big)$
Underlying Riccati recursion for the LQR: $(A - B L_j)^T P_{j+1} (A - B L_j) - P_{j+1} + Q + L_j^T R L_j = 0$, $L_j = (R + B^T P_j B)^{-1} B^T P_j A$ (Hewer 1971). An initial stabilizing control is needed.
ADP greedy cost update (two occurrences of the cost allow a greedy update):
$V_{j+1}(x_k) = r(x_k, h_j(x_k)) + \gamma V_j(x_{k+1})$ (simple recursion)
$h_{j+1}(x_k) = \arg\min_{u_k} \big( r(x_k, u_k) + \gamma V_{j+1}(x_{k+1}) \big)$
Underlying Riccati recursion for the LQR: $P_{j+1} = (A - B L_j)^T P_j (A - B L_j) + Q + L_j^T R L_j$, $L_j = (R + B^T P_j B)^{-1} B^T P_j A$. Lancaster & Rodman proved convergence. An initial stabilizing control is NOT needed.
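
A minimal sketch of the greedy (HDP) update for the LQR case, written as the underlying matrix recursion starting from P_0 = 0; the model-based form is shown here only to make the recursion explicit:

```python
import numpy as np

def hdp_lqr(A, B, Q, R, iters=200):
    """Greedy (HDP) cost update for the LQR, starting from P_0 = 0:
    P_{j+1} = (A - B L_j)' P_j (A - B L_j) + Q + L_j' R L_j,
    with L_j = (R + B' P_j B)^{-1} B' P_j A.  No stabilizing initial gain is needed."""
    P = np.zeros((A.shape[0], A.shape[0]))
    L = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(iters):
        L = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        Acl = A - B @ L
        P = Acl.T @ P @ Acl + Q + L.T @ R @ L
    return P, L
```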

Implementation - DT HDP
Value function update for a given control: $V_{j+1}(x_k) = r(x_k, h_j(x_k)) + \gamma V_j(x_{k+1})$
Since $x_{k+1}$ is measured, knowledge of $f(x)$ or $g(x)$ is not needed for the value function update.
Assume measurements of $x_k$ and $x_{k+1}$ are available to compute $u_{k+1}$.
VFA: $V_j(x_k) = W_j^T \varphi(x_k)$. Then $W_{j+1}^T \varphi(x_k) = r(x_k, h_j(x_k)) + \gamma W_j^T \varphi(x_{k+1})$ - regression matrix on the left, old weights on the right.
Solve for the weights using RLS, or using many trajectories with different initial conditions over a compact set.
Then update the control using $h_j(x_k) = -L_j x_k = -(R + B^T P_j B)^{-1} B^T P_j A x_k$. One needs to know $f(x_k)$ AND $g(x_k)$ for the control update.

DT HDP vs. Receding Horizon Optimal Control
Forward-in-time HDP: $P_{i+1} = A^T P_i A + Q - A^T P_i B (R + B^T P_i B)^{-1} B^T P_i A$, with $P_0 = 0$.
Backward-in-time optimization (RHC): $P_k = A^T P_{k+1} A + Q - A^T P_{k+1} B (R + B^T P_{k+1} B)^{-1} B^T P_{k+1} A$, with terminal condition $P_N$ a control Lyapunov function overbounding $P$.

Hongwei Zhang, Dr. Jie Huang - Adaptive Terminal Cost RHC
Standard RHC: $x_{k+1} = A x_k + B u_k$, $V(x_k) = \sum_{i=k}^{k+N-1} \big( x_i^T Q x_i + u_i^T R u_i \big) + x_{k+N}^T P_0 x_{k+N}$, where $P_0$ is the same for each stage.
$P_{i+1} = A^T P_i A + Q - A^T P_i B (R + B^T P_i B)^{-1} B^T P_i A$, starting from $P_0$; $u^{RH}_{k+1} = -(R + B^T P_{N-1} B)^{-1} B^T P_{N-1} A x_{k+1} = -L_N x_{k+1}$.
Requires $P_0$ to be a CLF that overbounds the optimal infinite-horizon cost, or a large $N$.
Our ATC RHC: $V(x_k) = \sum_{i=k}^{k+N-1} \big( x_i^T Q x_i + u_i^T R u_i \big) + x_{k+N}^T P_k^N x_{k+N}$, with the final cost taken from the previous stage; $P_{i+1} = A^T P_i A + Q - A^T P_i B (R + B^T P_i B)^{-1} B^T P_i A$, starting from $P_k^N$.
HWZ Theorem: Let $N \ge 1$. Under the usual observability and controllability assumptions, ATC RHC guarantees uniform ultimate exponential stability for ANY $P_0 \ge 0$. Moreover, our solution converges to the optimal infinite-horizon cost.

Q Learning - Action Dependent ADP
Value function recursion for a given policy $h(x_k)$: $V_h(x_k) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1})$
Define the Q function: $Q_h(x_k, u_k) = r(x_k, u_k) + \gamma V_h(x_{k+1})$ - note $u_k$ is arbitrary and the policy $h(\cdot)$ is used after time $k$, so $Q_h(x_k, h(x_k)) = V_h(x_k)$.
Recursion for Q: $Q_h(x_k, u_k) = r(x_k, u_k) + \gamma Q_h(x_{k+1}, h(x_{k+1}))$
Simple expression of Bellman's principle: $V^*(x_k) = \min_{u_k} Q^*(x_k, u_k)$, $h^*(x_k) = \arg\min_{u_k} Q^*(x_k, u_k)$
Optimal adaptive control (for unknown DT systems).

Draguna Vrabie - Continuous-Time Optimal Control
System: $\dot{x} = f(x, u)$; cost: $V(x(t)) = \int_t^{\infty} r(x, u) \, d\tau = \int_t^{\infty} \big( Q(x) + u^T R u \big) \, d\tau$
Hamiltonian: $H\big(x, \tfrac{\partial V}{\partial x}, u\big) = \dot{V} + r(x, u) = \Big( \tfrac{\partial V}{\partial x} \Big)^T \dot{x} + r(x, u) = \Big( \tfrac{\partial V}{\partial x} \Big)^T f(x, u) + r(x, u)$
(c.f. the DT Hamiltonian $H(x_k, V(x_k), h) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1}) - V_h(x_k)$)
Optimal cost (Bellman): $0 = \min_{u(t)} \Big[ r(x, u) + \Big( \tfrac{\partial V^*}{\partial x} \Big)^T f(x, u) \Big]$
Optimal control: $h^*(x(t)) = -\tfrac{1}{2} R^{-1} g^T(x) \tfrac{\partial V^*}{\partial x}$
HJB equation: $0 = Q(x) + \Big( \tfrac{dV^*}{dx} \Big)^T f - \tfrac{1}{4} \Big( \tfrac{dV^*}{dx} \Big)^T g R^{-1} g^T \tfrac{dV^*}{dx}$, with $V^*(0) = 0$.
Off-line solution; the dynamics must be known.

Bill Wolovich
Interactor Matrix & Structure Theorem
The solution of the input-output cover problems
Pole Placement via Static Output Feedback
Thank you for your inspiration and motivation in 1970.

Q Function Definition
Specify a control policy $u_j = h(x_j)$, $j = k, k+1, \ldots$
Define the Q function: $Q_h(x_k, u_k) = r(x_k, u_k) + \gamma V_h(x_{k+1})$ - note $u_k$ is arbitrary and the policy $h(\cdot)$ is used after time $k$, so $Q_h(x_k, h(x_k)) = V_h(x_k)$.
Recursion for Q: $Q_h(x_k, u_k) = r(x_k, u_k) + \gamma Q_h(x_{k+1}, h(x_{k+1}))$
Optimal Q function: $Q^*(x_k, u_k) = r(x_k, u_k) + \gamma V^*(x_{k+1})$, i.e. $Q^*(x_k, u_k) = r(x_k, u_k) + \gamma Q^*(x_{k+1}, h^*(x_{k+1}))$
Optimal control solution: $V^*(x_k) = Q^*(x_k, h^*(x_k)) = \min_h Q_h(x_k, h(x_k))$, $h^*(x_k) = \arg\min_h Q_h(x_k, h(x_k))$
Simple expression of Bellman's principle: $V^*(x_k) = \min_{u_k} Q^*(x_k, u_k)$, $h^*(x_k) = \arg\min_{u_k} Q^*(x_k, u_k)$

Q Function ADP - Action Dependent ADP
The Q function for any given control policy $h(x_k)$ satisfies the recursion $Q_h(x_k, u_k) = r(x_k, u_k) + \gamma Q_h(x_{k+1}, h(x_{k+1}))$.
Recursive solution: pick a stabilizing initial control policy.
Find the Q function: $Q_{j+1}(x_k, u_k) = r(x_k, u_k) + \gamma Q_j(x_{k+1}, h_j(x_{k+1}))$
Update the control: $h_{j+1}(x_k) = \arg\min_{u_k} Q_{j+1}(x_k, u_k)$
Now $f(x_k, u_k)$ is not needed. Bradtke & Barto (1994) proved convergence for the LQR.

Q Learning does not need to know $f(x_k)$ or $g(x_k)$
For the LQR: $V(x) = W^T \varphi(x) = x^T P x$ - $V$ is quadratic in $x$.
$Q_h(x_k, u_k) = r(x_k, u_k) + V_h(x_{k+1}) = x_k^T Q x_k + u_k^T R u_k + (A x_k + B u_k)^T P (A x_k + B u_k)$
$= \begin{bmatrix} x_k \\ u_k \end{bmatrix}^T \begin{bmatrix} Q + A^T P A & A^T P B \\ B^T P A & R + B^T P B \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} = \begin{bmatrix} x_k \\ u_k \end{bmatrix}^T \begin{bmatrix} H_{xx} & H_{xu} \\ H_{ux} & H_{uu} \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix}$
so $Q$ is quadratic in $x$ and $u$.
The control update is found from $0 = \frac{\partial Q}{\partial u_k} = 2 \big[ B^T P A x_k + (R + B^T P B) u_k \big] = 2 \big[ H_{ux} x_k + H_{uu} u_k \big]$,
so $u_k = -(R + B^T P B)^{-1} B^T P A x_k = -H_{uu}^{-1} H_{ux} x_k = -L_{j+1} x_k$.
The control is found only from the Q function; A and B are not needed.
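
A small illustrative sketch of this last step: once the Q-function kernel H is available, the feedback gain comes from its blocks alone (the helper that builds H from A, B, P is included only for checking and is not needed online):

```python
import numpy as np

def gain_from_Q_kernel(H, n):
    """Feedback gain u_k = -L x_k with L = H_uu^{-1} H_ux, using only blocks of H.
    n is the state dimension; the remaining rows/columns of H correspond to u."""
    H_ux = H[n:, :n]
    H_uu = H[n:, n:]
    return np.linalg.solve(H_uu, H_ux)

# For checking only: H built from a known model (not needed by the online learner)
def build_H(A, B, P, Q, R):
    return np.block([[Q + A.T @ P @ A, A.T @ P @ B],
                     [B.T @ P @ A,     R + B.T @ P @ B]])
```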

Implementation - DT Q Function Policy Iteration
For the LQR, the Q function update for the control $u_k = -L_j x_k$ is given by $Q_{j+1}(x_k, u_k) = r(x_k, u_k) + \gamma Q_{j+1}(x_{k+1}, -L_j x_{k+1})$.
Assume measurements of $u_k$, $x_k$ and $x_{k+1}$ are available to compute $u_{k+1}$.
QFA - Q function approximation: $Q(x, u) = W^T \varphi(x, u)$. Now $u$ is an input to the NN (Werbos: action-dependent NN).
Regression: $W_{j+1}^T \big[ \varphi(x_k, u_k) - \gamma \varphi(x_{k+1}, -L_j x_{k+1}) \big] = r(x_k, -L_j x_k)$
Solve for the weights using RLS or backpropagation. For the LQR case, $\varphi(x, u)$ is the quadratic basis in $(x, u)$.
Since $x_{k+1}$ is measured, knowledge of $f(x)$ or $g(x)$ is not needed for the value function update.

Model-Free Policy Iteration - Q Policy Iteration (Bradtke, Ydstie, Barto)
$Q_{j+1}(x_k, u_k) = r(x_k, u_k) + \gamma Q_{j+1}(x_{k+1}, -L_j x_{k+1})$
$W_{j+1}^T \big[ \varphi(x_k, u_k) - \gamma \varphi(x_{k+1}, -L_j x_{k+1}) \big] = r(x_k, -L_j x_k)$
Control policy update (a stable initial control is needed): $h_{j+1}(x_k) = \arg\min_{u_k} Q_{j+1}(x_k, u_k)$, i.e. $u_k = -H_{uu}^{-1} H_{ux} x_k = -L_{j+1} x_k$
Greedy Q Function Update - Approximate Dynamic Programming
ADP Method 3: Q Learning, Action-Dependent Heuristic Dynamic Programming (ADHDP), Paul Werbos - model-free ADP.
Greedy Q update: $Q_{j+1}(x_k, u_k) = r(x_k, u_k) + \gamma Q_j(x_{k+1}, h_j(x_{k+1}))$
$W_{j+1}^T \varphi(x_k, u_k) = r(x_k, -L_j x_k) + \gamma W_j^T \varphi(x_{k+1}, -L_j x_{k+1}) \equiv \text{target}_{j+1}$
Update the weights by RLS or backpropagation.
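
A minimal simulation sketch of the greedy Q update (ADHDP-style Q-learning with batch least squares) for the LQR case, under illustrative assumptions: a small plant is used only to generate data, a quadratic Kronecker basis in z = [x; u] parameterizes Q, and probing noise provides persistence of excitation. This is one possible realization of the update above, not the authors' code:

```python
import numpy as np

def zbar(z):
    """Quadratic Kronecker basis: upper-triangular products z_i z_j."""
    q = len(z)
    return np.array([z[i] * z[j] for i in range(q) for j in range(i, q)])

def weights_to_H(W, q):
    """Rebuild the symmetric kernel H from the basis weights."""
    H = np.zeros((q, q))
    idx = 0
    for i in range(q):
        for j in range(i, q):
            H[i, j] = H[j, i] = W[idx] if i == j else W[idx] / 2.0
            idx += 1
    return H

# Illustrative plant (used only to generate data; A, B never enter the learner)
A = np.array([[1.0, 0.1], [0.0, 0.9]]); B = np.array([[0.0], [0.1]])
Q, R, gamma = np.eye(2), np.array([[1.0]]), 1.0
n, m = 2, 1
L = np.zeros((m, n))                         # initial gain (need not stabilize)
W = np.zeros(zbar(np.zeros(n + m)).shape)    # Q-function weights

for j in range(30):                          # greedy Q (ADHDP) iterations
    Phi, targets = [], []
    x = np.random.randn(n)
    for k in range(60):                      # collect data with probing noise (PE)
        u = -L @ x + 0.1 * np.random.randn(m)
        xn = A @ x + (B @ u).ravel()
        un = -L @ xn                         # policy action at the next state
        r = x @ Q @ x + u @ R @ u
        Phi.append(zbar(np.concatenate([x, u])))
        targets.append(r + gamma * W @ zbar(np.concatenate([xn, un])))
        x = xn if np.linalg.norm(xn) < 1e3 else np.random.randn(n)
    W, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)
    H = weights_to_H(W, n + m)
    L = np.linalg.solve(H[n:, n:], H[n:, :n])   # L_{j+1} = H_uu^{-1} H_ux

print("Learned gain L =", L)
```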

Q learning actually solves the Riccati equation WITHOUT knowing the plant dynamics - model-free ADP, direct OPTIMAL ADAPTIVE CONTROL. It works for nonlinear systems.
Proofs? Robustness? Comparison with adaptive control methods?

Discrete-Time Zero-Sum Games
Consider the following discrete-time dynamical system with continuous state and action spaces:
$x_{k+1} = A x_k + B u_k + E w_k$, $y_k = x_k$, with $x_k \in R^n$, $u_k \in R^{m_1}$, $w_k \in R^{m_2}$, $y_k \in R^p$,
with quadratic cost $V(x_k) = \sum_{i=k}^{\infty} \big( x_i^T Q x_i + u_i^T u_i - \gamma^2 w_i^T w_i \big)$.
The zero-sum game problem can be formulated as follows: $V^*(x_k) = \min_u \max_w \sum_{i=k}^{\infty} \big( x_i^T Q x_i + u_i^T u_i - \gamma^2 w_i^T w_i \big)$.
The goal is to find the optimal strategies (state feedback) $u^*(x) = L x$ and $w^*(x) = K x$.

Asma Al-Tamimi - DT Game Heuristic Dynamic Programming: Forward-in-Time Formulation
An Approximate Dynamic Programming (ADP) scheme with the incremental optimization
$V_{i+1}(x_k) = \min_{u_k} \max_{w_k} \big\{ x_k^T Q x_k + u_k^T u_k - \gamma^2 w_k^T w_k + V_i(x_{k+1}) \big\}$,
which is equivalently written as $V_{i+1}(x_k) = x_k^T Q x_k + u_i^T(x_k) u_i(x_k) - \gamma^2 w_i^T(x_k) w_i(x_k) + V_i(x_{k+1})$.

Game Algebraic Riccati Equation
Using Bellman's optimality principle ("dynamic programming"):
$V(x_k) = \min_{u_k} \max_{w_k} \big( x_k^T Q x_k + u_k^T u_k - \gamma^2 w_k^T w_k + V(x_{k+1}) \big)$, i.e. $x_k^T P x_k = \min_{u_k} \max_{w_k} \big( r(x_k, u_k, w_k) + x_{k+1}^T P x_{k+1} \big)$.
The game algebraic Riccati equation (GARE):
$P = A^T P A + Q - \begin{bmatrix} A^T P B & A^T P E \end{bmatrix} \begin{bmatrix} I + B^T P B & B^T P E \\ E^T P B & E^T P E - \gamma^2 I \end{bmatrix}^{-1} \begin{bmatrix} B^T P A \\ E^T P A \end{bmatrix}$
The conditions for a saddle point are $I + B^T P B > 0$ and $I - \gamma^{-2} E^T P E > 0$.
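
A minimal sketch of iterating the GARE forward in time (the HDP-style recursion of the previous slide), for illustrative dimensions; the saddle-point conditions are checked at the returned P:

```python
import numpy as np

def gare_iteration(A, B, E, Q, gamma, iters=500):
    """Iterate P_{i+1} = A'P_iA + Q - [A'P_iB  A'P_iE] M_i^{-1} [B'P_iA; E'P_iA],
    with M_i = [[I + B'P_iB, B'P_iE], [E'P_iB, E'P_iE - gamma^2 I]], from P_0 = 0."""
    n, m1, m2 = A.shape[0], B.shape[1], E.shape[1]
    P = np.zeros((n, n))
    for _ in range(iters):
        M = np.block([[np.eye(m1) + B.T @ P @ B, B.T @ P @ E],
                      [E.T @ P @ B, E.T @ P @ E - gamma**2 * np.eye(m2)]])
        N = np.vstack([B.T @ P @ A, E.T @ P @ A])
        P = A.T @ P @ A + Q - np.hstack([A.T @ P @ B, A.T @ P @ E]) @ np.linalg.solve(M, N)
    # Saddle-point conditions at the returned P
    assert np.all(np.linalg.eigvalsh(np.eye(m1) + B.T @ P @ B) > 0)
    assert np.all(np.linalg.eigvalsh(gamma**2 * np.eye(m2) - E.T @ P @ E) > 0)
    return P
```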

Game Algebraic Riccati Equation
The optimal policies for the control and the disturbance are
$L = \big( I + B^T P B - B^T P E (E^T P E - \gamma^2 I)^{-1} E^T P B \big)^{-1} \big( B^T P E (E^T P E - \gamma^2 I)^{-1} E^T P A - B^T P A \big)$,
$K = \big( E^T P E - \gamma^2 I - E^T P B (I + B^T P B)^{-1} B^T P E \big)^{-1} \big( E^T P B (I + B^T P B)^{-1} B^T P A - E^T P A \big)$.

Asma Al-Tamimi - Q Learning for H-infinity Control
Linear quadratic case: V and Q are quadratic, $V(x_k) = x_k^T P x_k$ and
$Q(x_k, u_k, w_k) = r(x_k, u_k, w_k) + V(x_{k+1}) = [x_k^T \; u_k^T \; w_k^T] \, H \, [x_k^T \; u_k^T \; w_k^T]^T$, with $H = \begin{bmatrix} H_{xx} & H_{xu} & H_{xw} \\ H_{ux} & H_{uu} & H_{uw} \\ H_{wx} & H_{wu} & H_{ww} \end{bmatrix}$.
Q function update:
$Q_{i+1}(x_k, \hat{u}_i(x_k), \hat{w}_i(x_k)) = x_k^T R x_k + \hat{u}_i(x_k)^T \hat{u}_i(x_k) - \gamma^2 \hat{w}_i(x_k)^T \hat{w}_i(x_k) + Q_i(x_{k+1}, \hat{u}_i(x_{k+1}), \hat{w}_i(x_{k+1}))$,
i.e. $[x_k^T \; u_k^T \; w_k^T] H_{i+1} [x_k^T \; u_k^T \; w_k^T]^T = x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + [x_{k+1}^T \; u_{k+1}^T \; w_{k+1}^T] H_i [x_{k+1}^T \; u_{k+1}^T \; w_{k+1}^T]^T$.
Control action and disturbance updates: $u_i(x_k) = L_i x_k$, $w_i(x_k) = K_i x_k$, with
$L_i = \big( H_{uu}^i - H_{uw}^i (H_{ww}^i)^{-1} H_{wu}^i \big)^{-1} \big( H_{uw}^i (H_{ww}^i)^{-1} H_{wx}^i - H_{ux}^i \big)$,
$K_i = \big( H_{ww}^i - H_{wu}^i (H_{uu}^i)^{-1} H_{uw}^i \big)^{-1} \big( H_{wu}^i (H_{uu}^i)^{-1} H_{ux}^i - H_{wx}^i \big)$.
A, B, E are NOT needed.

Compare to the Q function for the H2 optimal control case:
$Q_h(x_k, u_k) = r(x_k, u_k) + V_h(x_{k+1}) = x_k^T Q x_k + u_k^T R u_k + (A x_k + B u_k)^T P (A x_k + B u_k)$
$= \begin{bmatrix} x_k \\ u_k \end{bmatrix}^T \begin{bmatrix} Q + A^T P A & A^T P B \\ B^T P A & R + B^T P B \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} = \begin{bmatrix} x_k \\ u_k \end{bmatrix}^T \begin{bmatrix} H_{xx} & H_{xu} \\ H_{ux} & H_{uu} \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix}$
versus the H-infinity game Q function above.

Asma Al-Tamimi - A quadratic basis set is used to allow on-line solution:
$\hat{Q}(z, h_i) = z^T H_i z = h_i^T \bar{z}$, where $z = [x^T \; u^T \; w^T]^T$ and $\bar{z} = (z_1^2, \ldots, z_1 z_q, z_2^2, z_2 z_3, \ldots, z_{q-1} z_q, z_q^2)$ is the quadratic Kronecker basis.
Q function update: $Q_{i+1}(x_k, \hat{u}_i(x_k), \hat{w}_i(x_k)) = x_k^T R x_k + \hat{u}_i(x_k)^T \hat{u}_i(x_k) - \gamma^2 \hat{w}_i(x_k)^T \hat{w}_i(x_k) + Q_i(x_{k+1}, \hat{u}_i(x_{k+1}), \hat{w}_i(x_{k+1}))$
Solve for the "NN weights" - the elements of the kernel matrix $H$: $h_{i+1}^T \bar{z}(x_k) = x_k^T R x_k + \hat{u}_i(x_k)^T \hat{u}_i(x_k) - \gamma^2 \hat{w}_i(x_k)^T \hat{w}_i(x_k) + h_i^T \bar{z}(x_{k+1})$. Use batch LS or online RLS.
Control and disturbance updates: $\hat{u}_i(x) = L_i x$, $\hat{w}_i(x) = K_i x$.
Probing noise is injected to get persistence of excitation: $\hat{u}_i^e(x_k) = L_i x_k + n_{1k}$, $\hat{w}_i^e(x_k) = K_i x_k + n_{2k}$.
Proof: the algorithm still converges to the exact result.
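
The weight solve is left to batch LS or RLS; a generic recursive-least-squares step that could play that role (a standard textbook RLS update, written here as an illustrative sketch rather than code from the talk) is:

```python
import numpy as np

def rls_update(theta, Pcov, phi, target, lam=1.0):
    """One recursive-least-squares step for target ~ theta' phi.

    theta  : current weight estimate (e.g. the elements h_i of the kernel H)
    Pcov   : current covariance matrix of the estimate
    phi    : regressor, e.g. the quadratic Kronecker basis zbar(z_k)
    target : r(x_k, u_k, w_k) + h_i' zbar(z_{k+1}) from the Q-function update
    lam    : forgetting factor (1.0 = no forgetting)
    """
    Pphi = Pcov @ phi
    gain = Pphi / (lam + phi @ Pphi)
    theta = theta + gain * (target - theta @ phi)
    Pcov = (Pcov - np.outer(gain, Pphi)) / lam
    return theta, Pcov
```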

Asma Al-Tamimi

ADHDP Application for a Power System - System Description
State: $x(t) = [\Delta f(t) \;\; \Delta P_g(t) \;\; \Delta X_g(t) \;\; \Delta F(t)]^T$
$A = \begin{bmatrix} -1/T_p & K_p/T_p & 0 & 0 \\ 0 & -1/T_T & 1/T_T & 0 \\ -1/(R T_G) & 0 & -1/T_G & -1/T_G \\ K_E & 0 & 0 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 0 \\ 0 \\ 1/T_G \\ 0 \end{bmatrix}$, $E = \begin{bmatrix} -K_p/T_p \\ 0 \\ 0 \\ 0 \end{bmatrix}$
Parameter ranges: $1/T_p \in [0.033, 0.1]$, $K_p/T_p \in [4, 12]$, $1/T_T \in [2.564, 4.762]$, $1/T_G \in [9.615, 17.857]$, $1/(R T_G) \in [3.081, 10.639]$.
The discrete-time model is obtained by applying a ZOH to the CT model.
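
A small sketch of that last discretization step using SciPy's cont2discrete, with nominal parameter values picked from the ranges above and an assumed sample time (both are illustrative assumptions; K_E is not given a range on the slide and is also assumed):

```python
import numpy as np
from scipy.signal import cont2discrete

# Nominal values picked from the parameter ranges above; K_E and Ts are assumed
inv_Tp, Kp_Tp, inv_TT, inv_TG, inv_RTG, KE = 0.0665, 8.0, 3.663, 13.736, 6.86, 0.6
Ts = 0.1   # assumed sampling period (s)

A = np.array([[-inv_Tp,  Kp_Tp,    0.0,     0.0],
              [ 0.0,    -inv_TT,   inv_TT,  0.0],
              [-inv_RTG, 0.0,     -inv_TG, -inv_TG],
              [ KE,      0.0,      0.0,     0.0]])
B = np.array([[0.0], [0.0], [inv_TG], [0.0]])
E = np.array([[-Kp_Tp], [0.0], [0.0], [0.0]])    # load-disturbance input

# ZOH discretization of (A, [B E]) with sample time Ts
Ad, BEd, *_ = cont2discrete((A, np.hstack([B, E]), np.eye(4), np.zeros((4, 2))),
                            Ts, method='zoh')
Bd, Ed = BEd[:, :1], BEd[:, 1:]
```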

ADHDP Application for a Power System
The system states: $\Delta f$ - incremental frequency deviation (Hz); $\Delta P_g$ - incremental change in generator output (p.u. MW); $\Delta X_g$ - incremental change in governor position (p.u. MW); $\Delta F$ - incremental change in integral control. $\Delta P_d$ is the load disturbance (p.u. MW).
The system parameters are: $T_G$ - governor time constant; $T_T$ - turbine time constant; $T_P$ - plant model time constant; $K_p$ - plant model gain; $R$ - speed regulation due to governor action; $K_E$ - integral control gain.

ADHDP Application for a Power System - ADHDP Policy Tuning
[Figures: convergence of the kernel-matrix entries $P_{11}, P_{12}, \ldots, P_{44}$ and of the control policy gains $L_{11}, L_{12}, L_{13}, L_{14}$ over roughly 3000 time steps.]

ADHDP Application for a Power System - Comparison
[Figures: state trajectories (frequency deviation, incremental change of the generator output, incremental change of the governor position, incremental change of the integral control) for the ADHDP controller design and for the design from [1].]
The maximum frequency deviation when using the ADHDP controller (about -0.20 at t = 0.5 s, versus about -0.25 for the design from [1]) is improved by 19.3% over the controller designed in [1].
[1] Wang, Y., R. Zhou, and C. Wen, "Robust load-frequency controller design for power systems," IEE Proc.-C, vol. 140, no. 1, 1993.

Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof - Problem Formulation
System: $x_{k+1} = f(x_k) + g(x_k) u_k$; cost: $V(x_k) = \min_u \sum_{i=k}^{\infty} \big( x_i^T Q x_i + u_i^T R u_i \big)$.
This requires solving the DT HJB: $V(x_k) = \min_{u_k} \big( x_k^T Q x_k + u_k^T R u_k + V(x_{k+1}) \big) = \min_{u_k} \big( x_k^T Q x_k + u_k^T R u_k + V(f(x_k) + g(x_k) u_k) \big)$,
with $u(x_k) = -\tfrac{1}{2} R^{-1} g(x_k)^T \frac{dV(x_{k+1})}{dx_{k+1}}$.

Asma Al-Tamimi - Discrete-Time Nonlinear Adaptive Dynamic Programming
System dynamics: $x_{k+1} = f(x_k) + g(x_k) u(x_k)$; $V(x_k) = \sum_{i=k}^{\infty} \big( x_i^T Q x_i + u_i^T R u_i \big)$
Value function recursion: $V(x_k) = x_k^T Q x_k + u_k^T R u_k + \sum_{i=k+1}^{\infty} \big( x_i^T Q x_i + u_i^T R u_i \big) = x_k^T Q x_k + u_k^T R u_k + V(x_{k+1})$
HDP:
$u_i(x_k) = \arg\min_u \big( x_k^T Q x_k + u^T R u + V_i(x_{k+1}) \big)$
$V_{i+1}(x_k) = \min_u \big( x_k^T Q x_k + u^T R u + V_i(x_{k+1}) \big) = x_k^T Q x_k + u_i^T(x_k) R u_i(x_k) + V_i\big( f(x_k) + g(x_k) u_i(x_k) \big)$

Asma Al-Tamimi Proof of convergence of DT nonlinear HDP Flavor of proofs

Standard Neural Network VFA for On-Line Implementation
Critic NN for the value (HDP, can use a 2-layer NN): $\hat{V}_i(x_k, W_{Vi}) = W_{Vi}^T \varphi(x_k)$; action NN for the control: $\hat{u}_i(x_k, W_{ui}) = W_{ui}^T \sigma(x_k)$.
HDP:
$V_{i+1}(x_k) = \min_u \big( x_k^T Q x_k + u^T R u + V_i(x_{k+1}) \big) = x_k^T Q x_k + u_i^T(x_k) R u_i(x_k) + V_i\big( f(x_k) + g(x_k) u_i(x_k) \big)$
$u_i(x_k) = \arg\min_u \big( x_k^T Q x_k + u^T R u + V_i(x_{k+1}) \big)$
Define the target cost function $d\big( \varphi(x_k), W_{Vi}^T \big) = x_k^T Q x_k + \hat{u}_i^T(x_k) R \hat{u}_i(x_k) + \hat{V}_i(x_{k+1}) = x_k^T Q x_k + \hat{u}_i^T(x_k) R \hat{u}_i(x_k) + W_{Vi}^T \varphi(x_{k+1})$.
Explicit equation for the cost - use LS for the critic NN update:
$W_{V,i+1} = \arg\min_W \int_{\Omega} \big| W^T \varphi(x_k) - d(\varphi(x_k), W_{Vi}^T) \big|^2 dx_k$, giving $W_{V,i+1} = \Big( \int_{\Omega} \varphi(x_k) \varphi(x_k)^T dx \Big)^{-1} \int_{\Omega} \varphi(x_k) \, d^T\big( \varphi(x_k), W_{Vi}^T, W_{ui}^T \big) dx$.
Implicit equation for the DT control - use gradient descent for the action update:
$W_{ui} = \arg\min_{\alpha} \int_{\Omega} \big[ x_k^T Q x_k + \hat{u}^T(x_k, \alpha) R \hat{u}(x_k, \alpha) + \hat{V}_i\big( f(x_k) + g(x_k) \hat{u}(x_k, \alpha) \big) \big] dx$
$W_{ui(j+1)} = W_{ui(j)} - \alpha \frac{\partial \big[ x_k^T Q x_k + \hat{u}_{i(j)}^T R \hat{u}_{i(j)} + \hat{V}_i(x_{k+1}) \big]}{\partial W_{ui(j)}}$, i.e. $W_{ui}^{j+1} = W_{ui}^j - \alpha \, \sigma(x_k) \Big( 2 R \hat{u}_{i(j)} + g(x_k)^T \frac{\partial \varphi(x_{k+1})}{\partial x_{k+1}} W_{Vi} \Big)^T$
Backpropagation - P. Werbos.
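
A compact Python sketch of one HDP iteration with linear-in-parameter approximators over a sampled training set: a gradient-descent actor update against V_i followed by a batch-LS critic update giving V_{i+1}. The basis choices, sample set, and step size are illustrative assumptions, and the value gradient is computed analytically for the chosen basis:

```python
import numpy as np

# Illustrative basis choices for a 2-state, 1-input system (assumptions, not from the talk)
def phi(x):        # critic basis: quadratic monomials
    x1, x2 = x
    return np.array([x1**2, x1 * x2, x2**2])

def dphi_dx(x):    # Jacobian of phi: rows = basis functions, columns = states
    x1, x2 = x
    return np.array([[2 * x1, 0.0], [x2, x1], [0.0, 2 * x2]])

def sigma(x):      # actor basis: linear in the state
    return np.asarray(x)

def hdp_iteration(WV, Wu, samples, f, g, Q, R, alpha=0.01, actor_steps=50):
    """One HDP step over a sampled training set: gradient-descent actor update
    against V_i, then a batch-LS critic update giving V_{i+1}."""
    # Actor: gradient steps on x'Qx + u'Ru + V_i(f(x) + g(x)u)
    for _ in range(actor_steps):
        grad = np.zeros_like(Wu)
        for x in samples:
            u = Wu.T @ sigma(x)
            xn = f(x) + g(x) @ u      # online, x_{k+1} would be measured instead
            grad += np.outer(sigma(x), 2 * R @ u + g(x).T @ dphi_dx(xn).T @ WV)
        Wu = Wu - alpha * grad / len(samples)
    # Critic: solve WV_new' phi(x) = x'Qx + u'Ru + WV' phi(x_next) in the LS sense
    Phi, d = [], []
    for x in samples:
        u = Wu.T @ sigma(x)
        xn = f(x) + g(x) @ u
        Phi.append(phi(x))
        d.append(x @ Q @ x + u @ R @ u + WV @ phi(xn))
    WV_new, *_ = np.linalg.lstsq(np.array(Phi), np.array(d), rcond=None)
    return WV_new, Wu
```

Iterating this over i, with samples drawn over a compact region of the state space, mirrors the scheme above; for a linear test case one can pass f = lambda x: A @ x and g = lambda x: B.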

Issues with Nonlinear ADP - Selection of the NN Training Set
LS solution for the critic NN update: $W_{V,i+1} = \Big( \int_{\Omega} \varphi(x_k) \varphi(x_k)^T dx \Big)^{-1} \int_{\Omega} \varphi(x_k) \, d^T\big( \varphi(x_k), W_{Vi}^T, W_{ui}^T \big) dx$
The integral over a region of the state space is approximated using a set of sample points (batch LS), or sample points are taken along a single trajectory (recursive least squares, RLS).
[Figures: sample points over a region of the state space vs. points along a single trajectory.]
Set of points over a region vs. points along a trajectory: for linear systems these are the same. Conjecture: for nonlinear systems they are the same under a persistence of excitation condition - exploration.

Interesting Fact for HDP for Nonlinear Systems
Linear case: $h_j(x_k) = -L_j x_k = -(I + B^T P_j B)^{-1} B^T P_j A x_k$ - one must know the system A and B matrices.
NN for the control action: $\hat{u}_i(x_k, W_{ui}) = W_{ui}^T \sigma(x_k)$
Implicit equation for the DT control - use gradient descent for the action update:
$W_{ui} = \arg\min_{\alpha} \int_{\Omega} \big[ x_k^T Q x_k + \hat{u}^T(x_k, \alpha) R \hat{u}(x_k, \alpha) + \hat{V}_i\big( f(x_k) + g(x_k) \hat{u}(x_k, \alpha) \big) \big] dx$
$W_{ui(j+1)} = W_{ui(j)} - \alpha \frac{\partial \big[ x_k^T Q x_k + \hat{u}_{i(j)}^T R \hat{u}_{i(j)} + \hat{V}_i(x_{k+1}) \big]}{\partial W_{ui(j)}}$, i.e. $W_{ui}^{j+1} = W_{ui}^j - \alpha \, \sigma(x_k) \Big( 2 R \hat{u}_{i(j)} + g(x_k)^T \frac{\partial \varphi(x_{k+1})}{\partial x_{k+1}} W_{Vi} \Big)^T$
Note that the internal dynamics $f(x_k)$ is NOT needed in the nonlinear case, since: 1. an NN approximation for the action is used; 2. $x_{k+1}$ is measured.

Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof - Simulation Example 1
The linear system - aircraft longitudinal dynamics (an unstable, two-input system):
$A = \begin{bmatrix} 1.0722 & 0.0954 & 0 & -0.0541 & -0.0153 \\ 4.1534 & 1.1175 & 0 & -0.8000 & -0.1010 \\ 0.1359 & 0.0071 & 1.0 & 0.0039 & 0.0097 \\ 0 & 0 & 0 & 0.1353 & 0 \\ 0 & 0 & 0 & 0 & 0.1353 \end{bmatrix}$, $B = \begin{bmatrix} -0.0453 & -0.0175 \\ -1.0042 & -0.1131 \\ 0.0075 & 0.0134 \\ 0.8647 & 0 \\ 0 & 0.8647 \end{bmatrix}$
The HJB, i.e. ARE, solution:
$P = \begin{bmatrix} 55.8348 & 7.6670 & 16.0470 & -4.6754 & -0.7265 \\ 7.6670 & 2.3168 & 1.4987 & -0.8309 & -0.1215 \\ 16.0470 & 1.4987 & 25.3586 & -0.6709 & 0.0464 \\ -4.6754 & -0.8309 & -0.6709 & 1.5394 & 0.0782 \\ -0.7265 & -0.1215 & 0.0464 & 0.0782 & 1.0240 \end{bmatrix}$, $L = \begin{bmatrix} -4.1136 & -0.7170 & -0.3847 & 0.5277 & 0.0707 \\ -0.6315 & -0.1003 & 0.1236 & 0.0653 & 0.0798 \end{bmatrix}$

Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof - Simulation
The cost function approximation: $\hat{V}_{i+1}(x_k, W_{V,i+1}) = W_{V,i+1}^T \varphi(x_k)$, with
$\varphi^T(x) = [x_1^2 \;\; x_1 x_2 \;\; x_1 x_3 \;\; x_1 x_4 \;\; x_1 x_5 \;\; x_2^2 \;\; x_2 x_3 \;\; x_2 x_4 \;\; x_2 x_5 \;\; x_3^2 \;\; x_3 x_4 \;\; x_3 x_5 \;\; x_4^2 \;\; x_4 x_5 \;\; x_5^2]$ and $W_V^T = [w_{V1} \; w_{V2} \; \cdots \; w_{V15}]$.
The policy approximation: $\hat{u}_i = W_{ui}^T \sigma(x_k)$, with $\sigma^T(x) = [x_1 \;\; x_2 \;\; x_3 \;\; x_4 \;\; x_5]$ and $W_u^T = \begin{bmatrix} w_{u11} & w_{u12} & w_{u13} & w_{u14} & w_{u15} \\ w_{u21} & w_{u22} & w_{u23} & w_{u24} & w_{u25} \end{bmatrix}$.

Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof - Simulation
The convergence of the cost: the critic weights converge to
$W_V^T = [55.5411 \;\; 15.2789 \;\; 31.3032 \;\; -9.3255 \;\; -1.4536 \;\; 2.3142 \;\; 2.9234 \;\; -1.6594 \;\; -0.2430 \;\; 24.8262 \;\; -1.3076 \;\; 0.0920 \;\; 1.5388 \;\; 0.1564 \;\; 1.0240]$.
With the kernel matrix recovered as
$P = \begin{bmatrix} w_{V1} & 0.5 w_{V2} & 0.5 w_{V3} & 0.5 w_{V4} & 0.5 w_{V5} \\ 0.5 w_{V2} & w_{V6} & 0.5 w_{V7} & 0.5 w_{V8} & 0.5 w_{V9} \\ 0.5 w_{V3} & 0.5 w_{V7} & w_{V10} & 0.5 w_{V11} & 0.5 w_{V12} \\ 0.5 w_{V4} & 0.5 w_{V8} & 0.5 w_{V11} & w_{V13} & 0.5 w_{V14} \\ 0.5 w_{V5} & 0.5 w_{V9} & 0.5 w_{V12} & 0.5 w_{V14} & w_{V15} \end{bmatrix}$,
this closely matches the ARE solution
$P = \begin{bmatrix} 55.8348 & 7.6670 & 16.0470 & -4.6754 & -0.7265 \\ 7.6670 & 2.3168 & 1.4987 & -0.8309 & -0.1215 \\ 16.0470 & 1.4987 & 25.3586 & -0.6709 & 0.0464 \\ -4.6754 & -0.8309 & -0.6709 & 1.5394 & 0.0782 \\ -0.7265 & -0.1215 & 0.0464 & 0.0782 & 1.0240 \end{bmatrix}$.

Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof - Simulation
The convergence of the control policy: the actor weights converge to
$W_u = \begin{bmatrix} 4.1068 & 0.7164 & 0.3756 & -0.5274 & -0.0707 \\ 0.6330 & 0.1005 & -0.1216 & -0.0653 & -0.0798 \end{bmatrix}$,
which matches (with $\hat{u} = W_u^T \sigma(x) = -L x$, i.e. $w_{u11}, \ldots, w_{u25}$ corresponding to $-L_{11}, \ldots, -L_{25}$) the ARE gain
$L = \begin{bmatrix} -4.1136 & -0.7170 & -0.3847 & 0.5277 & 0.0707 \\ -0.6315 & -0.1003 & 0.1236 & 0.0653 & 0.0798 \end{bmatrix}$.
Note: in this example, the internal dynamics matrix A is NOT needed.

Falb and Wolovich, "Decoupling in the design and synthesis of multivariable control systems," IEEE Trans. Automatic Control, 1967.
Wolovich and Falb, "On the structure of multivariable systems," SIAM J. Control, 1969.
Wolovich, "The use of state feedback for exact model matching," SIAM J. Control, 1972.
