Optimal Control of Complex Systems through Variational Inference with a Discrete Event Decision Process


Fan Yang, University at Buffalo, Buffalo, New York, fyang24@buffalo.edu
Bo Liu, Auburn University, Auburn, Alabama, boliu@auburn.edu
Wen Dong, University at Buffalo, Buffalo, New York, wendong@buffalo.edu

ABSTRACT
Complex social systems are composed of interconnected individuals whose interactions result in group behaviors. Optimal control of a real-world complex system has many applications, including road traffic management, epidemic prevention, and information dissemination. However, such real-world complex system control is difficult to achieve because of high-dimensional and non-linear system dynamics, and the exploding state and action spaces for the decision maker. Prior methods can be divided into two categories: simulation-based and analytical approaches. Existing simulation approaches have high variance in Monte Carlo integration, and the analytical approaches suffer from modeling inaccuracy. We adopted simulation modeling in specifying the complex dynamics of a complex system, and developed analytical solutions for searching optimal strategies in a complex network with a high-dimensional state-action space. To capture the complex system dynamics, we formulate the complex social network decision-making problem as a discrete event decision process. To address the curse of dimensionality and the search in high-dimensional state-action spaces of complex systems, we reduce control of a complex system to variational inference and parameter learning, introduce Bethe entropy approximation, and develop an expectation propagation algorithm. Our proposed algorithm leads to higher system expected rewards, faster convergence, and lower variance of the value function in a real-world transportation scenario than state-of-the-art analytical and sampling approaches.

ACM Reference Format:
Fan Yang, Bo Liu and Wen Dong. 2019. Optimal Control of Complex Systems through Variational Inference with a Discrete Event Decision Process. In Proc. of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019), May 13–17, 2019, Montréal, Canada. IFAAMAS, 9 pages.

1 INTRODUCTION
A complex social system is a collective system composed of a large number of interconnected entities that as a whole exhibit properties and behaviors resulting from the interactions of its individual parts [27]. Achieving optimal control of a real-world complex social system is important and valuable. For example, in an urban transportation system, a central authority wants to reduce the overall delay and the total travel time by controlling traffic signals using traffic data collected from the Internet of Things [2]. In an epidemic system, an optimal strategy is pursued to minimize the expected discounted losses resulting from the epidemic process over an infinite horizon, with specific aims such as varying the birth and death rates [18] or early detection of epidemic outbreaks [13]. In the sphere of public opinion, components such as information consumers, news media, social websites, and governments aim to maximize their own utility functions with optimal strategies.
A specific objective in the public opinion environment is tackling fake news to create a clearer, more trusted information environment [1]. These scenarios exemplify the enormous potential applications of a versatile optimal control framework for general complex systems.

However, key characteristics of complex systems make establishing such a framework very challenging. The system typically has a large number of interacting units and thus a high-dimensional state space. State transition dynamics in complex systems are already non-linear, time-variant, and high-dimensional, and these frameworks must additionally account for control variables. Prior research in decision making for complex social systems has generally gone in one of two directions: a simulation or an analytical approach [21]. Simulation approaches specify system dynamics through a simulation model and develop sampling-based algorithms to reproduce the dynamic flow [29, 41, 46, 47]. These approaches can capture the microscopic dynamics of a complex system with high fidelity, but have high variance and are time-consuming [20]. Analytical approaches instead formulate the decision-making problem as a constrained optimization problem with an analytical model, specifying the macroscopic state transitions directly and deriving analytical solutions for optimizing strategies [6, 36]. These approaches can provide a robust solution with less variance, but are applicable only to scenarios with small state spaces [16], or to cases with low-resolution intervention [9], due to modeling costs and errors [20]. The above research points to a new research direction that combines the ability of simulation approaches to capture system dynamics precisely with the lower variance and greater robustness of analytical approaches. In this paper, we adopted simulation modeling in specifying the dynamics of a complex system, and developed analytical solutions for searching optimal strategies in a complex network whose high-dimensional state-action space is specified through simulation modeling.

We formulate the problem of decision making in a complex system as a discrete event decision process (DEDP), which identifies the decision-making process as a Markov decision process (MDP) and introduces a discrete event model — a kind of simulation model [17] — to specify the system dynamics. A discrete event model defines a Markov jump process with a probability measure on a sequence of elementary events that specify how the system components interact and change the system states. These elementary events individually effect only minimal changes to the system, but in sequence together are powerful enough to induce non-linear, time-variant, high-dimensional behavior. Compared with an MDP, which specifies the system transition dynamics of a complex system analytically through a Markov model, a DEDP describes the dynamics more accurately through a discrete event model that captures the dynamics with a simulation process over the microscopic component-interaction events. We will demonstrate this merit through benchmarking with an analytical approach based on an MDP in the domain of transportation optimal control.

To solve a DEDP analytically, we derived a duality theorem that recasts optimal control as variational inference and parameter learning, extending the current equivalence results between optimal control and probabilistic inference [24, 38] in Markov decision process research. With this duality, we can incorporate a number of existing probabilistic-inference and parameter-learning techniques, and integrate signal processing and decision making into a holistic framework. When exact inference becomes intractable, which is often the case in complex systems due to the formidable state space, our duality theorem implies the possibility of introducing recent approximate inference techniques to infer complex dynamics. The method in this paper is an expectation propagation algorithm, part of a family of approximate inference algorithms with local marginal projection. We will demonstrate that our approach is more robust and has less variance in comparison with other simulation approaches in the domain of transportation optimal control.

This research makes several important contributions. First, we formulate a DEDP — a general framework for modeling complex system decision-making problems — by combining an MDP with simulation modeling. Second, we reduce the problem of optimal control to variational inference and parameter learning, and develop an approximate solver that finds optimal control in complex systems through Bethe entropy approximation and an expectation propagation algorithm. Finally, we demonstrate that our proposed algorithm achieves higher system expected rewards, faster convergence, and lower variance of the value function in a real-world transportation scenario than state-of-the-art analytical and sampling approaches.

2 BACKGROUND
In this section, we review complex social systems, the discrete event model, the Markov decision process, and the variational inference framework for probabilistic graphical models.

2.1 Complex Social System
A complex social system is a collective system composed of a large number of interconnected entities that as a whole exhibit properties and behaviors resulting from the interactions of its individual parts. A complex system tends to have four attributes: diversity, interactivity, interdependency, and adaptivity [27]. Diversity means the system contains a large number of entities with various attributes and characteristics. Interactivity means the diverse entities interact with each other in an interaction structure, such as a fixed network or an ephemeral contact. Interdependency means the state change of one entity depends on others through the interactions. Adaptivity means the entities can adapt to different environments automatically.

In this paper, we temporarily exclude the attribute of adaptivity, the study of which will be future work. We focus on the optimal control of a complex social system with the attributes of diversity, interactivity, and interdependency, which leads to a system containing a large number of diverse components whose interactions lead to the state changes.
Examples of complex social systems include the transportation system, where traffic congestion forms and dissipates through the interaction and movement of individual vehicles; the epidemic system, where disease spreads through the interactions of different people; and the public opinion system, where people's minds are influenced and shaped by the dissemination of news through social media.

2.2 Discrete Event Model
A discrete event model defines a discrete event process, also called a Markov jump process. It is used to specify complex system dynamics with a sequence of stochastic events, each of which changes the state only minimally, but which combined in a sequence induce complex system evolutions. Specifically, a discrete event model describes the temporal evolution of a system with $M$ species $X = \{X^{(1)}, X^{(2)}, \cdots, X^{(M)}\}$ driven by $V$ mutually independent events parameterized by rate coefficients $c = (c_1, \ldots, c_V)$. At any specific time $t$, the populations of the species are $x_t = (x_t^{(1)}, \ldots, x_t^{(M)})$.

A discrete event process initially in state $x_0$ at time $t = 0$ can be simulated by: (1) sampling the event $v \in \{\emptyset, 1, \cdots, V\}$ according to the categorical distribution $v \sim (1 - h_0, h_1, \ldots, h_V)$, where $h_v(x, c_v) = c_v \prod_{m=1}^{M} g_v^{(m)}(x_t^{(m)})$ is the rate of event $v$, which equals the rate coefficient $c_v$ times a total of $\prod_{m=1}^{M} g_v^{(m)}(x^{(m)})$ different ways for the individuals to react, and $h_0(x, c) = \sum_{v=1}^{V} h_v(x, c_v)$ is the rate of all events — the formulation of $h_v(x, c_v)$ comes from the formulations of the stochastic kinetic model and the stochastic Petri net [43, 44]; and (2) updating the network state deterministically, $x \leftarrow x + \Delta_v$, where $\Delta_v$ represents how an event $v$ changes the system states, until the termination condition is satisfied. In a social system, each event involves only a few state and action variables. This generative process thus assigns a probability measure to a sample path induced by a sequence of events $v_0, \ldots, v_T$ happening between times $0, 1, \cdots, T$:
$$P(x_{0:T}, v_{0:T}) = p(x_0) \prod_{t=0}^{T} p(v_t \mid x_t)\, \delta_{x_{t+1} = x_t + \Delta_{v_t}}, \quad \text{where } p(v_t \mid x_t) = \begin{cases} 1 - h_0(x_t, c) & \text{if } v_t = \emptyset, \\ h_k(x_t, c_k) & \text{if } v_t = k, \end{cases}$$
and $\delta$ is an indicator function.

The discrete event model is widely used by social scientists to specify social system dynamics [3] in which the system state transitions are induced by interactions of individual components. Recent research [5, 26, 45] has also applied the model to infer the hidden states of social systems, but this approach has not been explored in social network intervention and decision making.
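To make this generative process concrete, the following is a minimal Python sketch of the two-step simulation loop above. The rate factors `g` and state-change vectors `delta` are caller-supplied stand-ins for a concrete model, and the rates are assumed scaled so that the total rate $h_0 \leq 1$, making $(1 - h_0, h_1, \ldots, h_V)$ a valid categorical distribution; the SI-epidemic usage at the end is an illustration of ours, not an example from the paper.

```python
import numpy as np

def simulate_discrete_events(x0, c, g, delta, T, rng=None):
    """Simulate a discrete event (Markov jump) process in discrete time.

    x0    : initial populations of the M species
    c     : rate coefficients c_1, ..., c_V
    g     : g[v][m] is the per-species factor g_v^(m); the rate of event v
            is h_v(x, c_v) = c_v * prod_m g[v][m](x[m])
    delta : delta[v] is the state-change vector of event v
    T     : number of simulation steps
    """
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    V = len(c)
    for _ in range(T):
        # Step 1: event rates h_v and the total event rate h_0.
        h = np.array([c[v] * np.prod([g[v][m](x[m]) for m in range(len(x))])
                      for v in range(V)])
        h0 = h.sum()
        assert h0 <= 1.0, "rates must be scaled so that h_0 <= 1"
        # Sample v in {no event, 1, ..., V} with probabilities (1 - h_0, h_1, ..., h_V).
        v = rng.choice(V + 1, p=np.concatenate(([1.0 - h0], h)))
        # Step 2: deterministic state update x <- x + Delta_v (index 0 = no event).
        if v > 0:
            x = x + delta[v - 1]
        path.append(x.copy())
    return np.asarray(path)

# Illustrative use: a susceptible-infected epidemic with one infection event of
# rate c * S * I and state change (S, I) -> (S - 1, I + 1).
si_path = simulate_discrete_events(
    x0=[99, 1], c=[1e-4],
    g=[[lambda s: s, lambda i: i]],
    delta=[np.array([-1.0, 1.0])], T=2000)
```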

2.3 Markov Decision Process
A Markov decision process [33] is a framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. Formally, an MDP is defined as a tuple $\langle S, A, P, R, \gamma \rangle$, where $S$ represents the state space and $s_t \in S$ the state at time $t$; $A$ the action space and $a_t$ the action taken at time $t$; $P$ the transition kernel of states, such as $P(s_{t+1} \mid s_t, a_t)$; $R$ the reward function, such as $R(s_t, a_t)$ [6] or $R(s_t)$ [42], which evaluates the immediate reward at each step; and $\gamma \in [0, 1)$ the discount factor. Let us further define a policy $\pi$ as a mapping from a state $s_t$ to an action $a_t = \mu(s_t)$ or to a distribution over actions parameterized by $\theta$ — that is, $\pi = p(a_t \mid s_t; \theta)$. The probability measure of a length-$T$ MDP trajectory is $p(\xi_T) = p(s_0) \prod_{t=0}^{T-1} p(a_t \mid s_t; \theta)\, p(s_{t+1} \mid s_t, a_t)$, where $\xi_T = (s_{0:T}, a_{0:T})$. Solving an MDP involves finding the optimal policy $\pi$ or its associated parameter $\theta$ that maximizes the expected future reward, $\arg\max_\theta \mathbb{E}_\xi \left( \sum_t \gamma^t R_t; \theta \right)$.

The graphical representation of an MDP is shown in Figure 1, where we assume that the full system state $s_t$ can be represented as a collection of component state variables $s_t = (s_t^{(1)}, \ldots, s_t^{(M)})$, so that the state space $S$ is a Cartesian product of the domains of the component states $s_t^{(m)}$: $S = S^{(1)} \times S^{(2)} \times \cdots \times S^{(M)}$. Similarly, the action variable $a_t$ can be represented as a collection of action variables $a_t = (a_t^{(1)}, \ldots, a_t^{(D)})$, and the action space $A = A^{(1)} \times A^{(2)} \times \cdots \times A^{(D)}$. Here $M$ is not necessarily equal to $D$ because $s_t$ represents the state of each component of the system while $a_t$ represents the decisions taken by the system as a whole. For example, in the problem of optimizing the traffic signals in a transportation system, where $s_t$ represents the locations of the vehicles and $a_t$ represents the status of the traffic lights, the number of vehicles $M$ need not equal the number of traffic lights $D$. Usually in complex social systems, the number of individual components $M$ is much greater than the number of system decision points $D$.

Prior research in solving a Markov decision process for a complex social system can generally be categorized into simulation and analytical approaches. A simulation approach reproduces the dynamic flow through a sampling-based method. It describes the state transition dynamics with a high-fidelity simulation tool such as MATSIM [11], which simulates the microscopic interactions of the components and how these interactions lead to macroscopic state changes. Given the current state $s_t$ and action $a_t$ at time $t$, a simulation approach uses the simulation tool to generate the next state $s_{t+1}$.

An analytical approach develops analytical solutions to a constrained optimization problem. Instead of describing the dynamics with a simulation tool, an analytical approach specifies the transition kernel analytically with probability density functions that describe the macroscopic state changes directly. Given the current state $s_t$ and action $a_t$, it computes the probability distribution of the next state $s_{t+1}$ according to the state transition kernel $p(s_{t+1} \mid s_t, a_t)$. However, approximations are required to make the computation tractable. For an MDP containing $M$ binary state variables and $D$ binary action variables, the state space has size $2^M$, the action space has size $2^D$, the policy kernel is a $2^M \times 2^D$ matrix, and the state transition kernel (for a fixed action) is a $2^M \times 2^M$ matrix. Since $M$ is usually much larger than $D$ in complex social systems, the complexity bottleneck is usually the transition kernel of size $2^M \times 2^M$, which grows exponentially with the number of state variables. Certain factorizations and approximations must be applied to lower the dimensionality of the transition kernel.
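To make the simulation route concrete before turning to the analytical approximations, here is a minimal sketch of Monte Carlo policy evaluation. The `step` and `policy` callables are hypothetical stand-ins for a black-box simulator (such as MATSIM) realizing $p(s_{t+1} \mid s_t, a_t)$ and $R(s_t, a_t)$, and for the stochastic policy $p(a_t \mid s_t; \theta)$; the returned standard error makes the high variance of simulation approaches explicit.

```python
import numpy as np

def mc_policy_value(step, policy, s0, gamma=0.99, horizon=200,
                    n_rollouts=500, seed=0):
    """Monte Carlo estimate of the expected discounted reward E[sum_t gamma^t R_t].

    step(s, a, rng) -> (s_next, r) and policy(s, rng) -> a are assumed
    black-box callables supplied by the caller.
    """
    rng = np.random.default_rng(seed)
    returns = np.empty(n_rollouts)
    for i in range(n_rollouts):
        s, ret, disc = s0, 0.0, 1.0
        for _ in range(horizon):
            a = policy(s, rng)          # a_t ~ p(a_t | s_t; theta)
            s, r = step(s, a, rng)      # simulate s_{t+1} and collect R_t
            ret += disc * r
            disc *= gamma
        returns[i] = ret
    # The estimate is unbiased, but its standard error shrinks only as
    # 1/sqrt(n_rollouts): the high-variance weakness of simulation approaches.
    return returns.mean(), returns.std(ddof=1) / np.sqrt(n_rollouts)
```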
Analytical approaches usually solve complex social system MDPs approximately by enforcing certain independence constraints [31]. For example, Cheng [4] assumed that a state variable depends only on its neighboring variables. Sabbadin, Peyrard and Sabbadin [28, 30] exploited a mean-field approximation to compute and update the local policies. Weiwei approximated the state transition kernel with differential equations [22]. These assumptions introduce additional approximations that result in modeling errors. In the next section, we propose a discrete event decision process that reduces the complexity of the transition probabilities without introducing additional independence assumptions.

Two specific approaches to solving an MDP are optimal control [32] and reinforcement learning [34]. Optimal control problems consist of finding the optimal decision sequence or the time-variant state-action mapping that maximizes the expected future reward, given the dynamics and the reward function. Reinforcement-learning problems target the optimal stationary policy that maximizes the expected future reward while not assuming knowledge of the dynamics or the reward function. In this paper, we address the problem of optimizing a stationary policy to maximize the expected future reward, assuming known dynamics and reward function.

2.4 Variational Inference
A challenge in evaluating and improving a policy in a complex system is that the state space grows exponentially with the number of state variables, which makes probabilistic inference and parameter learning intractable. For example, in a system with $M$ binary components, the size of the state space $S$ is $2^M$, let alone the exploding transition kernel. One way to resolve this issue is to apply variational inference to optimize a tractable lower bound of the log expected future reward through conjugate duality. Variational inference is a classical framework in the probabilistic graphical model community [40]. It exploits the conjugate duality between the log-partition function and the entropy function for exponential-family distributions. Specifically, it solves the variational problem
$$\log \int \exp \langle \theta, \phi(x) \rangle \, dx = \sup_{q(x)} \left\{ \int q(x) \langle \theta, \phi(x) \rangle \, dx + H(q) \right\},$$
where $\theta$ and $\phi(x)$ are respectively the canonical parameters and sufficient statistics of an exponential-family distribution, $q$ is an auxiliary distribution, and $H(q) = -\int dx\, q(x) \log q(x)$ is the entropy of $q$. For a tree-structured graphical model (here we use the simplified notation of a series of $x_t$), the Markov property admits a factorization of $q$ into the product and division of local marginals, $q(x) = \prod_t q_t(x_{t-1}, x_t) / \prod_t q_t(x_t)$. Substituting the factored form of $q$ into the variational target, we get an equivalent constrained optimization problem involving local marginals and consistency constraints among those marginals, and a fixed-point algorithm involving forward statistics $\alpha_t$ and backward statistics $\beta_t$. There are two primary forms of approximation of the original variational problem: the Bethe entropy problem and a structured mean field. The Bethe entropy problem is typically solved by loopy belief propagation or an expectation propagation algorithm.
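To ground the forward statistics $\alpha_t$ and backward statistics $\beta_t$ mentioned above, the sketch below runs the standard normalized forward-backward recursions on a discrete chain and assembles the local marginals $q_t(x_t)$ and $q_t(x_t, x_{t+1})$ that appear in the factored form of $q$. The `obs_like` array is a hypothetical stand-in for whatever per-step evidence factors tilt the chain (in the control-as-inference view, typically exponentiated rewards); this is a generic chain-model sketch, not the paper's expectation propagation solver.

```python
import numpy as np

def forward_backward(p0, trans, obs_like):
    """Normalized alpha/beta recursions on a K-state chain of length T.

    p0       : initial distribution p(x_0), shape (K,)
    trans    : trans[i, j] = p(x_{t+1} = j | x_t = i), shape (K, K)
    obs_like : evidence factor for each step, shape (T, K)

    Returns the unary marginals q_t(x_t), shape (T, K), and the pairwise
    marginals q_t(x_t, x_{t+1}), shape (T-1, K, K).
    """
    T, K = obs_like.shape
    alpha = np.empty((T, K))   # forward statistics
    beta = np.ones((T, K))     # backward statistics
    alpha[0] = p0 * obs_like[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                        # forward pass
        alpha[t] = (alpha[t - 1] @ trans) * obs_like[t]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):               # backward pass
        beta[t] = trans @ (obs_like[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    unary = alpha * beta                         # smoothed q_t(x_t)
    unary /= unary.sum(axis=1, keepdims=True)
    pair = np.empty((T - 1, K, K))               # pairwise q_t(x_t, x_{t+1})
    for t in range(T - 1):
        pair[t] = alpha[t][:, None] * trans * (obs_like[t + 1] * beta[t + 1])[None, :]
        pair[t] /= pair[t].sum()
    return unary, pair
```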

Figure 1: Graph representation of an MDP

3 METHODOLOGY
In this section, we develop a DEDP and present a duality theorem that extends the equivalence of optimal control with probabilistic inference and parameter learning. We also develop an expectation propagation algorithm as an approximate solver to be applied in real-world complex systems.

3.1 Discrete Event Decision Process
The primary challenges in modeling a real-world complex system are the exploding state space and the complex system dynamics. Our solution is to model the complex system decision-making process as a DEDP.

The graphical representation of a DEDP is shown in Figure 2. Formally, a DEDP is defined as a tuple
