Neurocomputing 165 (2015) 90–98

Contents lists available at ScienceDirect

Neurocomputing

journal homepage: www.elsevier.com/locate/neucom

Neural-network-based decentralized control of continuous-time nonlinear interconnected systems with unknown dynamics

Derong Liu*, Chao Li, Hongliang Li, Ding Wang, Hongwen Ma

The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

Article info

Article history:
Received 28 February 2014
Received in revised form 24 June 2014
Accepted 5 July 2014
Available online 17 April 2015

Keywords:
Adaptive dynamic programming
Decentralized control
Optimal control
Policy iteration
Neural networks

Abstract

In this paper, we establish a neural-network-based decentralized control law to stabilize a class of continuous-time nonlinear interconnected large-scale systems using an online model-free integral policy iteration (PI) algorithm. The model-free PI approach can solve the decentralized control problem for an interconnected system with unknown dynamics. The stabilizing decentralized control law is derived from the optimal control policies of the isolated subsystems. The online model-free integral PI algorithm is developed to solve the optimal control problems for the isolated subsystems with unknown system dynamics. We use the actor-critic technique based on neural networks and the least squares implementation method to obtain the optimal control policies. Two simulation examples are given to verify the applicability of the decentralized control law.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Decentralized control, which uses only the local information of each subsystem, is an efficient and effective way to control interconnected systems. It overcomes the limitation of the traditional control method, which requires sufficient information exchange between subsystems.
Unlike a centralized controller, a decentralized controller can be designed independently for each local subsystem and makes full use of the locally available signals for feedback. Decentralized controllers therefore have a simpler architecture and are more practical than traditional centralized controllers. Various decentralized controllers have been established for large-scale interconnected systems in the presence of uncertainties and information structure constraints [1–7]. Generally speaking, a decentralized control law is composed of noninteracting local controllers corresponding to the isolated subsystems, not the overall system. In many situations, the design for the isolated subsystems is very important. In [?], the decentralized controller was derived for the large-scale system using the optimal control policies of the isolated subsystems. Therefore, the optimal control method can be applied to facilitate the design of the decentralized control law.

The optimal control problem of nonlinear systems has been widely studied in the past few decades. The optimal control policy can be obtained by solving the Hamilton–Jacobi–Bellman (HJB) equation, which is a partial differential equation. Because of the curse of dimensionality, this is a difficult task even when the dynamics are completely known. Among the methods for solving the HJB equation, adaptive dynamic programming (ADP) has received increasing attention owing to its learning and optimization capabilities [10–20]. Reinforcement learning (RL) is another computational method, and it can find an optimal policy interactively [21–24]. Al-Tamimi et al. proposed a greedy iterative ADP algorithm to solve the optimal control problem for nonlinear discrete-time systems. Park et al. used multilayer neural networks (NNs) to design a finite-horizon optimal tracking neuro-controller for discrete-time nonlinear systems with a quadratic cost function. Abu-Khalaf and Lewis established an offline optimal control law for nonlinear systems with saturating actuators. Vamvoudakis and Lewis derived a synchronous policy iteration (PI) algorithm to learn the continuous-time optimal control online with known dynamics. Vrabie and Lewis derived an integral RL method to obtain direct adaptive optimal control for nonlinear input-affine continuous-time systems with partially unknown dynamics. Jiang and Jiang presented a novel PI approach for continuous-time linear systems with completely unknown dynamics. Liu et al. extended the PI algorithm to the nonlinear optimal control problem with unknown dynamics and a discounted cost function. Lee et al. [32,33] presented an integral Q-learning algorithm for continuous-time systems without exact knowledge of the system dynamics.

It is difficult to obtain exact knowledge of the system dynamics for large-scale systems, such as transportation systems and power systems. The novelty of this paper is that we relax the assumption of exact knowledge of the system dynamics required in the optimal controller design presented in [?]. In this paper, we use an online model-free integral PI algorithm to solve the decentralized control problem for a class of continuous-time nonlinear interconnected systems. We establish the stabilizing decentralized control law by adding feedback gains to the local optimal policies of the isolated subsystems. The optimal control problems for the isolated subsystems with unknown dynamics are used to develop the decentralized control law. To implement this algorithm, a critic NN and an action NN are used to approximate the value function and the control policy of each isolated subsystem, respectively. The effectiveness of the decentralized control law established in this paper is demonstrated by two simulation examples.

The rest of this paper is organized as follows. In Section 2, we present the decentralized control problem of the continuous-time nonlinear large-scale interconnected system. Section 3 presents the decentralized stabilization control law for the continuous-time interconnected system, obtained by adding appropriate feedback gains to the local optimal policies of the isolated subsystems. In Section 4, we derive a model-free PI algorithm using an NN implementation to obtain the decentralized control law. Two simulation examples are provided in Section 5 to illustrate the effectiveness of the derived decentralized control law. In Section 6, we conclude the paper with a few remarks.

This work was supported in part by the National Natural Science Foundation of China under Grants 61034002, 61233001, 61273140, 61304086, and 61374105.
* Corresponding author.
E-mail addresses: [email protected] (D. Liu), [email protected] (C. Li), [email protected] (H. Li), [email protected] (D. Wang), [email protected] (H. Ma).
0925-2312/© 2015 Elsevier B.V. All rights reserved.

2. Problem formulation

We consider a continuous-time nonlinear large-scale system $\Sigma$ composed of $N$ interconnected subsystems described by

$$\Sigma:\ \dot{x}_i(t) = f_i(x_i(t)) + g_i(x_i(t))\big(u_i(x_i(t)) + Z_i(x(t))\big), \quad i = 1, 2, \ldots, N \tag{1}$$

where $x_i(t) \in \mathbb{R}^{n_i}$ is the state and $u_i(x_i(t)) \in \mathbb{R}^{m_i}$ is the control input vector of the $i$th subsystem. The overall state of the large-scale system $\Sigma$ is denoted by $x = [x_1^T\ x_2^T\ \cdots\ x_N^T]^T \in \mathbb{R}^n$, where $n = \sum_{i=1}^{N} n_i$. The local states are represented by $x_1, x_2, \ldots, x_N$, whereas $u_1(x_1), u_2(x_2), \ldots, u_N(x_N)$ are local controls. For the $i$th subsystem, $f_i$ is a continuous nonlinear internal dynamics function from $\mathbb{R}^{n_i}$ into $\mathbb{R}^{n_i}$ such that $f_i(0) = 0$, $g_i(x_i)$ is the input gain function from $\mathbb{R}^{n_i}$ into $\mathbb{R}^{n_i \times m_i}$, and $Z_i(x(t))$ is the interconnected term for the $i$th subsystem.

The $i$th isolated subsystem $\Sigma_i$ is given by

$$\Sigma_i:\ \dot{x}_i(t) = f_i(x_i(t)) + g_i(x_i(t))\,u_i(x_i(t)). \tag{2}$$

For the $i$th isolated subsystem, we assume that the subsystem is controllable, that $f_i + g_i u_i$ is Lipschitz continuous on a set $\Omega_i$ in $\mathbb{R}^{n_i}$, and that there exists a continuous control policy that asymptotically stabilizes the subsystem. Additionally, we let the following assumptions hold throughout this paper.

Assumption 1. The state vector $x_i = 0$ is the equilibrium of the $i$th subsystem, $i = 1, 2, \ldots, N$.

Assumption 2. The functions $f_i(\cdot)$ and $g_i(\cdot)$ are differentiable in their arguments with $f_i(0) = 0$, where $i = 1, 2, \ldots, N$.

Assumption 3. The feedback control vector $u_i(x_i) = 0$ when $x_i = 0$, where $i = 1, 2, \ldots, N$.

In this paper, we aim at finding $N$ feedback control policies $u_1(x_1), u_2(x_2), \ldots, u_N(x_N)$ as the decentralized control law to stabilize the large-scale system (1). In the control pair $(u_1(x_1), u_2(x_2), \ldots, u_N(x_N))$, the $i$th control policy $u_i(x_i)$ is a function of the corresponding local state $x_i$ only. As shown in [?], the decentralized control law of the interconnected system is related to the optimal controllers of the isolated subsystems. To deal with the optimal control problem, we need to find the optimal control policy $u_i^*(x_i)$ of the $i$th isolated subsystem. The optimal control policy minimizes the following infinite horizon cost function:

$$J_i(x_i(t)) = \int_t^{\infty} r_i(x_i(\tau), u_i(\tau))\,d\tau \tag{3}$$

where $x_i(\tau)$ denotes the solution of the $i$th subsystem (2) for the initial condition $x_i(t) \in \Omega_i$ and the input $\{u_i(\tau);\ \tau > t\}$. Here $r_i(x_i, u_i) = Q_i(x_i) + u_i^T(x_i) R_i u_i(x_i)$, where $Q_i(x_i)$ is a positive definite function, i.e., $\forall x_i \neq 0$, $Q_i(x_i) > 0$ and $x_i = 0 \Rightarrow Q_i(x_i) = 0$, and $R_i \in \mathbb{R}^{m_i \times m_i}$ is a positive definite matrix.

3. Decentralized control law

In this section, we present the decentralized controller design. The optimal control problem of the isolated subsystems is described under the framework of HJB equations. The decentralized control law is derived by adding some local feedback gains to the isolated optimal control policies.

3.1. Optimal control

In this paper, to design the decentralized control law, we need to solve the optimal control problems for the $N$ isolated subsystems. According to optimal control theory, the designed feedback control policy must not only stabilize the subsystem on $\Omega_i$, but also guarantee that the cost function (3) is finite. That is to say, the control policy must be admissible.

Definition 1. Consider the $i$th isolated subsystem. A control policy $\mu_i(x_i)$ is defined as admissible with respect to (3) on $\Omega_i$, denoted by $\mu_i(x_i) \in \Psi_i(\Omega_i)$, if $\mu_i(x_i)$ is continuous on $\Omega_i$, $\mu_i(0) = 0$, $\mu_i(x_i)$ stabilizes the $i$th isolated subsystem (2) on $\Omega_i$, and $J_i(x_i(t))$ is finite $\forall x_{i0} \in \Omega_i$.

We consider the $i$th isolated subsystem $\Sigma_i$ in (2). For any admissible control policy $\mu_i(x_i) \in \Psi_i(\Omega_i)$, we assume that the associated value function

$$V_i(x_i(t)) = \int_t^{\infty} r_i(x_i(\tau), \mu_i(\tau))\,d\tau$$

is continuously differentiable. The infinitesimal version of this value function is the nonlinear Lyapunov equation

$$r_i(x_i, \mu_i) + (\nabla V_i(x_i))^T \big(f_i(x_i) + g_i(x_i)\mu_i(x_i)\big) = 0 \tag{4}$$

with $V_i(0) = 0$. In (4), the term $\nabla V_i(x_i) = \partial V_i(x_i)/\partial x_i$ denotes the partial derivative of the local value function $V_i(x_i)$ with respect to the local state $x_i$.

The optimal value function of the $i$th isolated subsystem can be formulated as

$$V_i^*(x_i(t)) = \min_{\mu_i \in \Psi_i(\Omega_i)} \int_t^{\infty} r_i(x_i(\tau), \mu_i(\tau))\,d\tau, \tag{5}$$

and it satisfies the so-called HJB equation

$$0 = \min_{\mu_i \in \Psi_i(\Omega_i)} H_i(x_i, \mu_i, \nabla V_i^*(x_i))$$

where $\nabla V_i^*(x_i) = \partial V_i^*(x_i)/\partial x_i$.
The Hamiltonian function of the $i$th isolated subsystem is defined by

$$H_i(x_i, \mu_i, \nabla V_i(x_i)) = r_i(x_i, \mu_i) + (\nabla V_i(x_i))^T \big(f_i(x_i) + g_i(x_i)\mu_i(x_i)\big). \tag{6}$$

By minimizing the Hamiltonian function (6), the optimal control policy for the $i$th isolated subsystem can be obtained as

$$u_i^*(x_i) = \arg\min_{\mu_i \in \Psi_i(\Omega_i)} H_i(x_i, \mu_i, \nabla V_i^*(x_i)) = -\tfrac{1}{2} R_i^{-1} g_i^T(x_i) \nabla V_i^*(x_i). \tag{7}$$
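The step from (6) to (7) can be made explicit through the first-order condition in $\mu_i$. The short derivation below is a sketch that assumes $R_i$ is symmetric (as is usual for a positive definite weighting matrix); the scalar example at the end, with plant parameters $a$, $b$ and weights $q$, $r$, is hypothetical and is not taken from the paper.

```latex
% Stationarity of the Hamiltonian (6) in \mu_i, evaluated at V_i = V_i^*
% (assumes R_i = R_i^T, so that d(\mu^T R \mu)/d\mu = 2 R \mu):
\frac{\partial H_i}{\partial \mu_i}
  = 2 R_i \mu_i + g_i^T(x_i)\,\nabla V_i^*(x_i) = 0
\;\Longrightarrow\;
\mu_i = -\tfrac{1}{2} R_i^{-1} g_i^T(x_i)\,\nabla V_i^*(x_i),
\qquad
\frac{\partial^2 H_i}{\partial \mu_i^2} = 2 R_i \succ 0 .

% Hypothetical scalar illustration:
% \dot{x} = a x + b u, \quad Q(x) = q x^2, \quad R = r > 0, \quad V^*(x) = p x^2
u^*(x) = -\tfrac{1}{2}\, r^{-1} b \,(2 p x) = -\frac{b p}{r}\, x .
```

Because the second derivative $2R_i$ is positive definite, the stationary point is the minimizer of the Hamiltonian, which is the expression given in (7).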
Substituting the optimal control policy (7) into the nonlinear Lyapunov equation (4), we obtain the HJB equation in terms of $\nabla V_i^*(x_i)$ as follows:

$$0 = Q_i(x_i) + (\nabla V_i^*(x_i))^T f_i(x_i) - \tfrac{1}{4} (\nabla V_i^*(x_i))^T g_i(x_i) R_i^{-1} g_i^T(x_i) \nabla V_i^*(x_i) \tag{8}$$

with $V_i^*(0) = 0$.

3.2. Stabilizing decentralized control law

According to [?], we modify the local optimal control laws $u_1^*(x_1), u_2^*(x_2), \ldots, u_N^*(x_N)$ by proportionally adding some local feedback gains to obtain a stabilizing decentralized control law for the interconnected large-scale system (1). Now, we give the following theorem to indicate how to add the feedback gains and how to guarantee the asymptotic stability of the subsystems.

Theorem 1. Considering the $i$th isolated subsystem $\Sigma_i$ (2), the feedback control

$$u_i(x_i) = \pi_i u_i^*(x_i) = -\tfrac{1}{2} \pi_i R_i^{-1} g_i^T(x_i) \nabla V_i^*(x_i) \tag{9}$$

can ensure that the $i$th closed-loop isolated subsystem is asymptotically stable $\forall \pi_i \geq 1/2$.

Proof. The theorem can be proved by showing that $V_i^*(x_i)$ is a Lyapunov function. Considering (5), we notice that $V_i^*(x_i) > 0$ for any $x_i \neq 0$ and $V_i^*(x_i) = 0$ when $x_i = 0$, which implies that $V_i^*(x_i)$ is a positive definite function. Then, the derivative of $V_i^*(x_i)$ along the corresponding trajectory of the closed-loop isolated subsystem is given by

$$\dot{V}_i^*(x_i) = (\nabla V_i^*(x_i))^T \dot{x}_i = (\nabla V_i^*(x_i))^T \big(f_i(x_i) + g_i(x_i) u_i(x_i)\big). \tag{10}$$

Adding and subtracting $(1/2)(\nabla V_i^*(x_i))^T g_i(x_i) u_i^*(x_i)$ in (10), and considering (7)–(9), we have

$$\begin{aligned}
\dot{V}_i^*(x_i) &= (\nabla V_i^*(x_i))^T f_i(x_i) - \tfrac{1}{4} (\nabla V_i^*(x_i))^T g_i(x_i) R_i^{-1} g_i^T(x_i) \nabla V_i^*(x_i) \\
&\quad - \tfrac{1}{2}\big(\pi_i - \tfrac{1}{2}\big) (\nabla V_i^*(x_i))^T g_i(x_i) R_i^{-1} g_i^T(x_i) \nabla V_i^*(x_i) \\
&= -Q_i(x_i) - \tfrac{1}{2}\big(\pi_i - \tfrac{1}{2}\big) \big\| R_i^{-1/2} g_i^T(x_i) \nabla V_i^*(x_i) \big\|^2 .
\end{aligned} \tag{11}$$

In light of (11), we obtain $\dot{V}_i^*(x_i) < 0$ for all $\pi_i \geq 1/2$ and $x_i \neq 0$. Therefore, the conditions of Lyapunov local stability theory are satisfied. The proof is completed. □

To demonstrate the theorem related to the stabilizing decentralized control law, we assume that the interconnected term $Z_i(x(t))$ is characterized by a bound on its magnitude as

$$\| \bar{Z}_i(x) \| \leq \sum_{j=1}^{N} \rho_{ij} h_{ij}(x_j), \quad i = 1, 2, \ldots, N \tag{12}$$

where $\bar{Z}_i(x) = R_i^{1/2} Z_i(x)$ and $R_i$ is the positive definite matrix defined in (3). $h_{ij}(x_j)$ is a positive semi-definite function, and $\rho_{ij}$ is a non-negative constant with $i, j = 1, 2, \ldots, N$. If we define $h_i(x_i) = \max\{h_{1i}(x_i), h_{2i}(x_i), \ldots, h_{Ni}(x_i)\}$, the condition (12) can be rewritten as

$$\| \bar{Z}_i(x) \| \leq \sum_{j=1}^{N} \lambda_{ij} h_j(x_j), \quad i = 1, 2, \ldots, N \tag{13}$$

where $\lambda_{ij} \geq \rho_{ij} h_{ij}(x_j)/h_j(x_j)$ is also a non-negative constant. We assume that $h_i(x_i)$ satisfies

$$h_i^2(x_i) \leq Q_i(x_i), \quad i = 1, 2, \ldots, N \tag{14}$$

where $Q_i(x_i)$ is the positive definite function in (3).

Next, we provide the modified theorem which can be used to establish the stabilizing decentralized control law for the large-scale system (1).

Theorem 2. For the interconnected system (1), there exist $N$ positive numbers $\pi_i^* > 0$, $i = 1, 2, \ldots, N$, such that for any $\pi_i > \pi_i^*$, the feedback controls developed by (9) ensure that the closed-loop interconnected system is asymptotically stable. That is to say, the control pair $(u_1(x_1), u_2(x_2), \ldots, u_N(x_N))$ is the decentralized control law of the large-scale interconnected system (1).

Proof. According to Theorem 1, we observe that $V_i^*(x_i)$ is a Lyapunov function. Here, we select a composite Lyapunov function given by

$$L(x) = \sum_{i=1}^{N} \theta_i V_i^*(x_i) \tag{15}$$

where $\theta_i$ is an arbitrary positive constant. Taking the time derivative of $L(x)$ along the trajectories of the closed-loop interconnected system, we have

$$\dot{L}(x) = \sum_{i=1}^{N} \theta_i \dot{V}_i^*(x_i) = \sum_{i=1}^{N} \theta_i \Big\{ (\nabla V_i^*(x_i))^T \big(f_i(x_i) + g_i(x_i) u_i(x_i)\big) + (\nabla V_i^*(x_i))^T g_i(x_i) Z_i(x) \Big\}. \tag{16}$$

Then, considering (11), (13) and (14), and after some basic manipulations, (16) can be turned into the following form:

$$\dot{L}(x) \leq -\sum_{i=1}^{N} \theta_i \Big\{ Q_i(x_i) + \tfrac{1}{2}\big(\pi_i - \tfrac{1}{2}\big) \big\| (\nabla V_i^*(x_i))^T g_i(x_i) R_i^{-1/2} \big\|^2 - \big\| (\nabla V_i^*(x_i))^T g_i(x_i) R_i^{-1/2} \big\| \sum_{j=1}^{N} \lambda_{ij} Q_j^{1/2}(x_j) \Big\}. \tag{17}$$

Like the result presented in [?], we can transform (17) to the following compact form:

$$\dot{L}(x) \leq -\xi^T \begin{bmatrix} \Theta & -\tfrac{1}{2}\Lambda^T \Theta \\ -\tfrac{1}{2}\Theta \Lambda & \Theta \Pi \end{bmatrix} \xi \triangleq -\xi^T A \xi \tag{18}$$

where $\Theta$, $\Lambda$, $\Pi$, and $\xi$ are chosen as those denoted in [?]. In light of (18), sufficiently large $\pi_i$ can be chosen to guarantee that the matrix $A$ is positive definite. That is to say, there exist $\pi_i^*$ such that all $\pi_i \geq \pi_i^*$ are large enough to guarantee the positive definiteness of $A$. Then, we have $\dot{L}(x) < 0$. Therefore, the conditions of Lyapunov stability theory are satisfied, and the closed-loop interconnected system is asymptotically stable under the action of the control pair $(u_1(x_1), u_2(x_2), \ldots, u_N(x_N))$. The proof is completed. □

4. NN-based implementation using online model-free PI algorithm

In this section, we discuss the implementation of the decentralized control law presented in Section 3. We introduce the online PI algorithm in the first subsection. A model-free integral PI algorithm is derived in the second subsection to solve the optimal control problem with completely unknown dynamics. Finally, an NN-based implementation of the established model-free integral PI algorithm is discussed.
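Before turning to the algorithms of this section, the gain condition of Theorem 1 can be sanity-checked in closed form on a scalar linear-quadratic case. The sketch below assumes a hypothetical plant $\dot{x} = a x + b u$ with $Q(x) = q x^2$ and $R = r$, for which $V^*(x) = p x^2$ with $p$ the positive root of $q + 2ap - b^2 p^2 / r = 0$, as obtained by inserting $V^*$ into (8). The function names and the numerical values are illustrative and do not come from the paper.

```python
import math


def riccati_scalar(a, b, q, r):
    """Positive root p of q + 2*a*p - (b*b/r)*p*p = 0, so that V*(x) = p*x^2."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


def closed_loop_pole(a, b, q, r, gain):
    """Pole of x' = a*x + b*u under u = gain * u*(x), as in Eq. (9).

    For this scalar case Eq. (7) gives u*(x) = -(b*p/r)*x.
    """
    p = riccati_scalar(a, b, q, r)
    return a - gain * b * b * p / r


if __name__ == "__main__":
    # Hypothetical open-loop unstable plant (a > 0) with unit weights.
    for gain in (0.25, 0.5, 1.0, 4.0):
        pole = closed_loop_pole(1.0, 1.0, 1.0, 1.0, gain)
        print(f"pi = {gain:4.2f}  closed-loop pole = {pole:+.4f}")
```

At $\pi = 1/2$ the pole equals $-q/(2p)$, which is what (11) predicts when its second term vanishes, and larger gains move the pole further into the left half-plane. Theorem 1 states a sufficient condition only; in this particular example stability happens to be lost for gains below $a r/(b^2 p)$.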
4.1. Online PI algorithm

Eq. (7) gives a closed-form expression of the optimal control policy for the $i$th isolated subsystem, which obviates the need to search for the optimal control policy via an optimization process. The existence of $V_i^*(x_i)$ satisfying (8) is the necessary and sufficient condition for optimality. However, it is generally difficult or even impossible to obtain the solution $V_i^*(x_i)$ of the HJB equation.

We therefore seek an approximate solution of the HJB equation related to the optimal control problem. Instead of directly solving (8), the solution $V_i^*(x_i)$ can be obtained by successively solving the nonlinear Lyapunov equation (4) and updating the policy based on (7). This successive approximation is known as the PI algorithm, and it is described in Algorithm 1 as the foundation for the model-free PI method. In [?], it was shown that for Algorithm 1 on the domain $\Omega_i$, $V_i^{(p)}(x_i)$ uniformly converges to $V_i^*(x_i)$ with monotonicity $0 < V_i^{(p+1)}(x_i) < V_i^{(p)}(x_i)$, and $\mu_i^{(p)}(x_i)$ is admissible and converges to $u_i^*(x_i)$. The online PI algorithm, consisting of policy evaluation and policy improvement, is as follows.

Algorithm 1. Online PI.

1: Give a small positive real number $\epsilon$. Let $p = 0$ and start with an initial admissible control policy $\mu_i^{(0)}(x_i)$.

2: Policy evaluation: Based on the control policy $\mu_i^{(p)}(x_i)$, solve the following nonlinear Lyapunov equation for $V_i^{(p)}(x_i)$:

$$0 = Q_i(x_i) + (\mu_i^{(p)}(x_i))^T R_i \mu_i^{(p)}(x_i) + (\nabla V_i^{(p)}(x_i))^T \big(f_i(x_i) + g_i(x_i)\mu_i^{(p)}(x_i)\big). \tag{19}$$

3: Policy improvement: Update the control policy by

$$\mu_i^{(p+1)}(x_i) = -\tfrac{1}{2} R_i^{-1} g_i^T(x_i) \nabla V_i^{(p)}(x_i). \tag{20}$$

4: If $\| V_i^{(p)}(x_i) - V_i^{(p-1)}(x_i) \| \leq \epsilon$, stop and obtain the approximate optimal control law of the $i$th isolated subsystem; else, set $p = p + 1$ and go to Step 2.

We will develop an online mo[…]

[…] (21)

The derivative of the value function with respect to time along the trajectory of the subsystem (21) is calculated as

$$\dot{V}_i(x_i) = \nabla V_i^T(x_i) \big(f_i(x_i) + g_i(x_i)[\mu_i(x_i) + e_i]\big). \tag{22}$$

We present a lemma which is essential to prove the convergence of the model-free PI algorithm for the isolated subsystems.

Lemma 1. Solving for $V_i(x_i)$ in the following equation:

$$V_i(x_i(t+T)) - V_i(x_i(t)) = \int_t^{t+T} \nabla V_i^T(x_i) g_i(x_i) e_i \,d\tau - \int_t^{t+T} r_i(x_i, \mu_i(x_i)) \,d\tau$$

[…]

[…] which must hold for any $x_i$ on the system trajectories generated by the stabilizing policy $\mu_i(x_i)$. According to (24), we have $V_i(x_i) = \tilde{V}_i(x_i) + c$. As this relation must hold for $x_i(t) = 0$, we know $V_i(0) = \tilde{V}_i(0) + c \Rightarrow c = 0$. Thus, $V_i(x_i) = \tilde{V}_i(x_i)$, i.e., (23) has a unique solution which is equal to the solution of (22). The proof is completed. □

Integrating (22) from $t$ to $t + T$ with any time interval $T > 0$, and considering (19) and (20), we have

$$V_i^{(p)}(x_i(t+T)) - V_i^{(p)}(x_i(t)) = -2 \int_t^{t+T} (\mu_i^{(p+1)}(x_i))^T R_i e_i \,d\tau - \int_t^{t+T} \big\{ Q_i(x_i) + (\mu_i^{(p)}(x_i))^T R_i \mu_i^{(p)}(x_i) \big\} \,d\tau. \tag{25}$$

Eq. (25), which is derived from (20) and (23), plays an important role in relaxing the assumption of knowing the system dynamics, since $f_i(x_i)$ and $g_i(x_i)$ do not appear in the equation. This means that the iteration can be carried out without knowing the system dynamics. Thus, we obtain the online model-free integral PI algorithm.

Algorithm 2. Online model-free integral PI.

1: Give a small positive real number $\epsilon$. Let $p = 0$ and start with […]

[…] policy $\mu_i^{(p)}(x_i)$, solve the following nonlinear Lyapunov […]
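To illustrate how Eq. (25) permits policy evaluation and improvement from measured data alone, the following sketch applies it to a hypothetical scalar plant $\dot{x} = a x + b(u + e)$ with $Q(x) = q x^2$ and $R = r$. It rests on several assumptions that are not in the paper: $e$ is treated as an exploration signal injected at the control input (as the form of (22) suggests), the value function and policy are parameterized exactly as $V(x) = p x^2$ and $\mu(x) = -k x$ in place of the critic and action NNs, and the plant is simulated with a fourth-order Runge-Kutta scheme. All function names and numerical values are illustrative.

```python
import math


def explore(t):
    """Hypothetical exploration signal e(t): a sum of sinusoids."""
    return 0.5 * (math.sin(3.0 * t) + math.sin(7.0 * t) + math.sin(11.0 * t))


def simulate_window(a, b, gain, x, t, T, dt):
    """Run x' = a*x + b*(-gain*x + e(t)) for T seconds with RK4.

    Returns (x_end, integral of x^2, integral of x*e). The plant parameters
    a and b are used only here, to generate data; the learner never reads them.
    """
    def deriv(time, z):
        e = explore(time)
        return (a * z[0] + b * (-gain * z[0] + e), z[0] * z[0], z[0] * e)

    def shift(z, d, h):
        return tuple(zi + h * di for zi, di in zip(z, d))

    z = (x, 0.0, 0.0)
    for n in range(int(round(T / dt))):
        s = t + n * dt
        d1 = deriv(s, z)
        d2 = deriv(s + dt / 2, shift(z, d1, dt / 2))
        d3 = deriv(s + dt / 2, shift(z, d2, dt / 2))
        d4 = deriv(s + dt, shift(z, d3, dt))
        z = tuple(zi + dt / 6 * (u1 + 2 * u2 + 2 * u3 + u4)
                  for zi, u1, u2, u3, u4 in zip(z, d1, d2, d3, d4))
    return z


def integral_pi(a, b, q, r, k0, iterations=6, windows=20, T=0.1, dt=1e-3):
    """Least-squares use of Eq. (25) with V(x) = p*x^2 and mu(x) = -k*x."""
    k, x, t, p = k0, 1.0, 0.0, None
    for _ in range(iterations):
        s11 = s12 = s22 = y1 = y2 = 0.0  # accumulators for the normal equations
        for _ in range(windows):
            x_next, ixx, ixe = simulate_window(a, b, k, x, t, T, dt)
            # Eq. (25) in scalar form, linear in the unknowns p and k_next:
            #   p*(x_next^2 - x^2) - k_next*(2*r*ixe) = -(q + r*k^2)*ixx
            c1 = x_next * x_next - x * x
            c2 = -2.0 * r * ixe
            rhs = -(q + r * k * k) * ixx
            s11 += c1 * c1
            s12 += c1 * c2
            s22 += c2 * c2
            y1 += c1 * rhs
            y2 += c2 * rhs
            x, t = x_next, t + T
        det = s11 * s22 - s12 * s12
        p = (s22 * y1 - s12 * y2) / det  # evaluated value-function weight
        k = (s11 * y2 - s12 * y1) / det  # improved gain, obtained without a or b
    return p, k


if __name__ == "__main__":
    print(integral_pi(1.0, 1.0, 1.0, 1.0, k0=3.0))
```

For $a = b = q = r = 1$ and the admissible initial gain $k_0 = 3$, the iterates should approach $p = k = 1 + \sqrt{2}$, the positive root of the scalar equation $q + 2ap - b^2 p^2 / r = 0$. Solving for the value weight and the improved gain jointly in one regression mirrors the fact that both $V_i^{(p)}$ and $\mu_i^{(p+1)}$ appear as unknowns in (25).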