Designing Self-organizing Systems With Deep Multi-agent Reinforcement .

1y ago

26 Views

5 Downloads

1.50 MB

12 Pages

Last View : 16d ago

Last Download : 3m ago

Upload by : Alexia Money

Report this link

Download PDF

Transcription

Proceedings of the ASME 2019 International Design Engineering Technical Conferences andComputers and Information in Engineering ConferenceIDETC/CIE 2019Aug 18-21, 2019, Hilton Anaheim, Anaheim, CAIDETC2019-98268DESIGNING SELF-ORGANIZING SYSTEMS WITH DEEP MULTI-AGENTREINFORCEMENT LEARNINGHao JiIMPACT LaboratoryDept. of Aerospace & Mechanical EngineeringUniversity of Southern CaliforniaLos Angeles, California, 90089haoji@usc.eduABSTRACTSelf-organizing systems (SOS) are able to perform complextasks in unforeseen situations with adaptability. Previous workhas introduced field-based approaches and rule-based socialstructuring for individual agents to not only comprehend the tasksituations but also take advantage of the social rule-based agentrelations in order to accomplish their overall tasks without acentralized controller. Although the task fields and social rulescan be predefined for relatively simple task situations, when thetask complexity increases and task environment changes, havinga priori knowledge about these fields and the rules may not befeasible. In this paper, we propose a multi-agent reinforcementlearning based model as a design approach to solving the rulegeneration problem with complex SOS tasks. A deep multi-agentreinforcement learning algorithm was devised as a mechanism totrain SOS agents for acquisition of the task field and social ruleknowledge, and the scalability property of this learning approachwas investigated with respect to the changing team sizes andenvironmental noises. Through a set of simulation studies on abox-pushing problem, the results have shown that the SOSdesign based on deep multi-agent reinforcement learning can begeneralizable with different individual settings when the trainingstarts with larger number of agents, but if a SOS is trained withsmaller team sizes, the learned neural network does not scale upto larger teams. Design of SOS with a deep reinforcementlearning model should keep this in mind and training should becarried out with larger team sizes.Keywords: deep Q-learning, complex system, self-organizingsystem, scalability, robustnessYan Jin*IMPACT LaboratoryDept. of Aerospace & Mechanical EngineeringUniversity of Southern CaliforniaLos Angeles, California, 90089yjin@usc.edu(*corresponding author)1 INTRODUCTIONSelf-organizing systems can consist of simple agents thatwork cooperatively to achieve complex system level behaviorswithout requiring global guidance. Design of SOS takes abottom-up approach and the top-level system complexity can beachieved through local agent interactions [1,2]. Complex systemdesign by applying a self-organizing approach has manyadvantages, such as scalability, adaptability, and reliability [3,4].Moreover, compared to traditional engineering systems withcentralized controllers, self-organizing systems can be morerobust to external changes and more resilient to system damagesor component malfunctions [5-7]. A swarm of robots is anexample of self-organizing systems. In such systems, robotsusually have compact sizes, limited functionality and adoptsimple rules of interaction. Such systems often consist of manyhomogenous robots [8]. The collaborative behavior of the swarmrobots can emerge, and such emergent phenomenon has beenapplied to situations such as search and rescue, distributedsensing, unmanned aerial vehicle patrolling, traffic control, andbox-pushing [5,9,10].Various approaches have been proposed to support the designof SOS. The field-based behavior regulation (FBR) approach[11] models the task environment with a field function and thebehavior of the agents is regulated based on the positions of theseagents in the field by applying a field transformation function.Generally, an agent is striving to maximize its own interests bymoving toward higher (or lower, depending on definition)positions. The advantage of this approach is that the agents’behaviors are simple hence requires little knowledge to performtasks since moving toward a higher or lower position is the solebehavior. This behavioral simplicity has its limits in solvingmore complex domain problems because the field representation1Copyright 2019 by ASME

has its limits in capturing all features of the task domains and theinter-agent relations are ignored in this approach.To overcome the limit of the FBR approach, an evolutionarydesign method [4] and the social structuring approach [5] havebeen proposed to make design of SOS parametric andoptimizable, and to allow a SOS to deal with more complexdomain tasks by considering both task fields and social fieldsmodeled by social rules [5]. It has been demonstrated thatapplying social rules can promote the level of coherence amongagents’ behaviors by avoiding potential conflicts and utilizingmore cooperation opportunities. A fundamental issue with thissocial-rule based approach is that a designer must know a prioriwhat the rules are and how they should be applied, which maynot be the case especially when the tasks are highly complex andchangeable.One of the long-term goals of our research is to developmechanisms for self-organizing robotic agents to autonomouslycarry out physical structure assembly, as in space structureconstruction, and disassembly, as in disaster rescue situations,without centralized control or external guidance. It is anticipatedthat the task situations for these task domains can become highlycomplex and unpredictable, making it a challenge to predefinetask fields and social rules. Therefore, in this research, we take areinforcement learning approach to capture the self-organizingknowledge for agent behavior regulation in SOS design. Morespecifically, a multiagent Q-learning algorithm is explored toaddress two research questions: What are the factors that impacton the stability of learning dynamics in self-organizing systems?Will the knowledge captured from reinforcement learning berobust enough to be applied in a wide range of task situations?In the reinforcement learning literature, multiple agents canbe trained using either a universal neural network or independentneural networks. Individual agents gather the state informationand can be trained either collaboratively as a team or individuallybased on the reward they receive from the interactions with theenvironment [12]. The problem of designing self-organizingsystems comes down to training the system either as a team usinga centralized single agent reinforcement learning approach or asseparate individuals going through multi-agent reinforcementlearning. Although the centralized learning of joint actions ofagents as a team can solve coordination problems and avoidlearning non-stationarity, it does not scale well as the joint actionspace grows exponentially with the number of agents [13].Secondly, learning to differentiate joint actions can be highlydifficult. Further, the neural networks obtained from thecentralized learning are only applicable to the situations with thesame number of agents as the trained cases, because the actionspace is fixed by the trained cases.In contrast to the centralized single agent reinforcementlearning, during the multi-agent reinforcement learning, eachagent can be trained using its own independent neural network.Such approach solves the problem of curse of dimensionality ofaction space when applying single agent reinforcement learningto multi-agent settings. Although theoretical proof ofconvergence of multi-agent independent Q-learning is notmathematically given, there are numerous successful practices inreal-world applications [13]. Thus, applying the state-of-the-artindependent multi-agent reinforcement learning is a promisingapproach in tackling the existing problems faced by SOS design.In the rest of this paper, we first review the relevant work inself-organizing systems and reinforcement learning in Section 2.After that, we present a multi-agent independent Q-learningframework as a complex system design approach in Section 3and illustrate the system design implications. In Section 4, a boxpushing case study is introduced that applies our proposed Qlearning model. Section 5 provides a detailed analysis of thesimulation results. Finally, in Section 6, the conclusions aredrawn from the case study and future work directions are pointedout.2 RELATED WORK2.1 Artificial Self-organizing SystemsAn artificial self-organizing system is a system that isdesigned by human and has emergent behavior and adaptabilitylike nature [1]. Much research has been done regarding thedesign of an artificial self-organizing system. Werfel developeda system of homogenous robots to build a pre-determined shapeusing square bricks [14]. Beckers et al. introduced a roboticgathering task where robots have to patrol around a given area tocollect pucks [15]. As robots prefer to drop pucks in high-densityareas, the collective positive feedback loop contributes to a densegroup of available pucks [2,15]. Khani et al developed a socialrule-based regulation approach in enforcing the agents to selforganize and push a box toward the target area [5-6]. Swarms ofUAVs can self-organize based on a set of cooperation rules andaccomplish tasks such as target detection, collaborativepatrolling and formation [16-19]. Chen and Jin used a fieldbased regulation (FBR) approach and guides self-organizingagents to perform complex tasks such as approaching longdistance targets while avoiding obstacles [11]. Price investigatedinto the use of genetic algorithm (GA) in optimizing Selforganizing multi-UAV swarm behavior. He tested theeffectiveness of GA algorithm in both homogenous andheterogeneous UAV in accomplishing the ‘destroying retaliatingtarget’ task [20]. The robotic implementations mentioned abovehave demonstrated the potentials of building self-organizingsystems, and the design methods of self-organizing systems[5,6,11] have had their drawbacks as indicated in Section 1.2.2 Multi-Agent Reinforcement LearningMulti-agent reinforcement learning applies to multiagentsettings and is based largely on the concept of single agentreinforcement learning such as Q-learning, policy gradient andactor-critic [12, 21]. Compared to single agent reinforcementlearning, multi-agent learning is faced with the non-stationarylearning environment due to the simultaneous learning of themultiple agents.In the past several years, there has seen a move from tabularbased methods to the deep reinforcement learning approach,resulting from the need to deal with the high-dimensionality ofstate and action spaces in multi-agent environments and toapproximate state-action values [22-24]. Multi-agent systemscan be classified into cooperative, competitive, and mixed2Copyright 2019 by ASME

cooperative and competitive categories [22]. Cooperative agentsreceive the same rewards, competitive agents (often in two-agentsettings) have the opposite rewards, and the mix cooperative andcompetitive settings assume agents are not only cooperating butalso have individual preferences. In the SOS design, we focus onthe cooperative agents since they share the same task goals.One natural approach for multi-agent reinforcement learningis to optimize the policy or value functions of each individual.The most commonly used value-function based multi-agentlearning is independent Q-learning [25]. It trains eachindividual’s state-action values using Q-learning [25-26] and isserved as a common benchmark in the literature. Tampuu [22]extended previous Q-learning to deep neural networks andapplied DQN [27] to train two independent agents playing thegame Pong. His simulation shows us how the cooperative andcompetitive phenomenon can emerge based on the individual’sdifferent reward schemes [22].Foerster applied COMA framework to train multiple agents.He used a centralized critic to evaluate decentralized actors andestimated a counterfactual advantage function based on eachagent and allocated credit among agents [23]. He trained multipleagents to learn to cooperatively play StarCraft games. In anotherwork by Foerster, he analyzed his replay stabilization methodsfor independent Q-learning based on StarCraft combat scenarios[28].As multi-agent environment is usually partially observable,Hauskneche & Stone [29] used deep recurrent networks such asLSTM [30] or GRU [31] to speed up learning when agents arelearning over long time periods. Lowe et al developed Multiagent Deep Deterministic Policy Gradient (MADDPG), whichuses centralized training with decentralized execution [32]. Theyproposed an extension of the actor-critic policy gradient methodand augmented critic with additional information about thepolicies of other agents and then tested their algorithm inpredator-prey, cooperative navigation, and other environments.Their training algorithm shows good convergence properties[32]. Drogoul & Zucker developed a framework for multi-agentsystem design called ‘Andromeda’, which combines machinelearning approach with agent oriented, role-based approachnamed ‘Cassiopeia’ [33,34]. The idea is to let learning occurwithin different layers of ‘Cassiopeia’ framework such asindividual roles, relational roles and organizational roles so thatthe design of multi-agent system can be more systematic andmodular [33,34]. However, as real multi-agent environment israther complex, agent’s actions are affected by not only its ownroles but also by other agents and the group. Separating learninginto different layers of abstraction may not be feasible.Most approaches to multi-agent reinforcement learning focuson achieving optimal system reward or desirable convergenceproperties. Many training algorithms are based on fullyobservable states. Training of multi-agent reinforcement modelis usually conducted on prespecified environments and thegeneralizability of the training network to various multi-agentteam sizes is not analyzed or considered, which is an importantfactor of consideration in SOS design. It is crucial to develop amulti-agent learning framework that is scalable to various teamsizes, and also to provide guidelines on how design should beimplemented and analyzed. Such areas are often omitted in theliterature and are the focus of this paper.3 A DEEP MULTI-AGENT REINFORCEMENTLEARNING MODEL3.1 Single Agent Reinforcement LearningIt is important to discuss single agent reinforcement learningbefore moving into multiagent reinforcement learning becausemany concepts and algorithms of multi-agent reinforcementlearning are based on the single agent reinforcement learning.Single agent reinforcement learning is used to optimizesystem performance based on training so that the system canautomatically learn to solve complex tasks from the raw sensoryinput and the reward signal. In single agent reinforcementlearning, learning is based on an important concept calledMarkov Decision Process (MDP). An MDP can be defined by atuple of S, A, P, R, g . S is the state space, which consists of allthe agent’s possible sense of environment information. A is theaction space, including all the actions that could be taken by theagent. P is the transition matrix, which is usually notgiven/unknown in a model-free learning environment. R is thereward function, and g is the discount factor, which means thefuture value of the reward is discounted and worth less than thepresent value. At any given time t, the agent’s goal is to maximize&its expected future discounted return, 𝑅" (") 𝛾 " '" 𝑟" , where Tis the time when the game ends. Also, agents estimate the actionvalue function 𝑄(𝑠, 𝑎) at each time step using Bellmanequation (1) below as an update. E represents the expected value.Eventually, such value iteration algorithm will converge tooptimal value function.𝑄123 (𝑠, 𝑎) 𝐸[𝑟 𝛾 max𝑄1 (𝑠 ) , 𝑎) ) 𝑠, 𝑎]&:(1)Researchers in the past uses Q-learning as a common trainingalgorithm in single agent reinforcement learning [35-36]. Qlearning is based on Q-tables, each state-action value pair isstored in a single Q-table and such training algorithm has beenapplied in simple tasks with small discrete state and action spaces[35-36]. However, in real-life engineering applications, statespace can often be continuous and action space vast, making itdifficult or impossible to build a look-up Q-table to store everystate-action value pair. To overcome such problems in Qlearning, deep neural networks are introduced as functionalapproximator to replace the Q-table for estimating Q values.Such learning methods are called deep Q-learning [27]. A Qnetwork with weights qi can be trained by minimizing the lossfunction at each iteration i, illustrated in equation (2),𝐿1 (𝜃1 ) 𝐸[(𝑦1 𝑄(𝑠, 𝑎; 𝜃1 ))C ](2)𝑦1 𝐸[𝑟 𝛾 max𝑄(𝑠 ) , 𝑎) ; 𝜃1'3 )]&(3)where:is the target value for iteration i. The gradient can be calculatedwith the following equation (4):3Copyright 2019 by ASME

EF 𝐿1 (𝜃1 ) 𝐸G,:,H,G& [{r 𝛾 max𝑄(𝑠 ) , 𝑎) ; 𝜃1'3 )&: Q(s, a, 𝜃1 ) } EF Q(s, a, 𝜃1 )](4)Various approaches have been introduced to stabilize trainingand increase sample efficiency for training deep Q-networks. Inour multi-agent training algorithm, the neural network of everysingle agent is built based on the following two approaches:Experience Replay:During each training episode, the agents’ experiences 𝑒" (𝑠" , 𝑎" , 𝑟" , 𝑠"23 ), which represents state, action, reward and nextstate, are stored and appended to an experience replay memory𝐷 (𝑒3 , 𝑒C , , 𝑒Q ). N represents the capacity of the experiencereplay memory. At every training interval, mini-batches arerandomly sampled from experience replay memory D and fedinto Q-learning updates. At the same time, an agent selects itsaction based on the e-greedy policy, which means the agentselects its action based on exploration of random actions andexploitation of best decision given current information.Experience replay, as Minh described in his paper [37], increasesdata sample efficiency and can break down the correlationsbetween subsequent experiences and is used to stabilize trainingperformance.Dueling DQNThe Dueling DQN architecture can identify the right actionduring policy evaluation faster than other algorithms as itseparates the Q value into the representation of state value V andaction advantages A, which are state-dependent [38]. In every Qvalue update, the dueling architecture’s state value V is updated,which contrasts with the single-stream architecture, where onlyvalue for one of the actions is updated, leaving other actions notupdated. This more frequent updating allows for a betterapproximation of the state values and leads to faster training andbetter training performance.3.2 Multi-Agent Reinforcement LearningAs mentioned above, there are generally two approaches inmulti-agent training. One is to train the agents as a team, treatingthe entire multiagent system as ‘one agent.’ It has goodconvergence property similar to single agent reinforcementlearning, but can hardly scale up. To increase learning efficiencyand maintain scalability, we adopt a multi-agent independentdeep Q-learning approach. In this approach, Ai , i 1, . . . , n (n:number of agents) are the discrete sets of actions available to theagents, yielding the joint action set A A1 · · · An. Allagents share the same state space and the same reward function.During training, each agent has its own dueling neuralnetwork and is trained by applying deep Q-learning withexperience replay. Agents perceive the state space through theirlocal sensors. Each agent learns its own policy and valuefunction individually to choose its actions based on its ownneural network given the reward from the shared reward functionfrom the environment. As each agent’s action space size is thesame, each agent’s trained neural network can be reused andapplied in various team sizes and such multi-agent system canscale well to agent teams of different sizes.In our multiagent reinforcement learning mechanismdescribed above, each agent i (i 1, 2, , n) engages in learningas if it is in the single agent reinforcement learning situation. Theonly difference is that the next state of the environment, St 1, isupdated in response to the joint action at {a1, a2, , an}, insteadof its own action ai, in addition to the current state St. Throughthis research, we aim to explore the stability or convergenceissue of the learning process—i.e., whether the knowledge canbe acquired in the form of neural networks throughreinforcement learning, and the adaptability issue—i.e., whetherthe learned neural networks can be effectively applied to thesituations of similar tasks but different agent team sizes.3.3 Simulation based System DesignThere have been several methods for guiding the design ofself-organizing systems [5-6,39]. Based on the previous work[39], a simulation-based system design method is proposed asshown in Figure1. Like other design or system engineeringmethodologies [5-6,39], it starts with breaking down the tasksinto subtasks, analyzing system constraints and then representsthe state space of the agents. Functional design defines bothindividual and group level functions for agents to achieve. As theagent-level behavior is the focus of SOS design, the major designfactors will include (1) the agent-level actions and (2) rewardschema.Agent’s state, action and reward schemaIn self-organizing systems, agents have only its local view ofthe environment, due to their limited sensor capability and motorconstraints. An MDP with such property is called a partiallyobservable MDP. An Agent’s state representation of the designprocess should reflect such characteristics of self-organizingsystems. For homogeneous self-organizing systems, whereagents share the same functionality and capabilities, the agentsshare the same actions from which they can choose. However, atany given time, different agents may perform different actions,resulting in the combined impact on the overall transition of thestate. For heterogeneous systems, on the other hand, agents mayhave individualized action sets to choose from. This researchexplores the homogeneous situations and the heterogeneouscases will be dealt with as future work.Training of individual agents is based on how much rewardeach agent receives. Designing and allocating reward is veryimportant in self-organizing system design and has been provento be important in the past multi-agent deep Q-learningalgorithms as well [22]. Good reward schema or functions canlead to optimal agent level and system level performance,whereas the bad reward structure leads to nonconvergentlearning or undesirable performance.Simulation/OptimizationSince the dynamics of a complex environment is hard to bemodeled and captured analytically, simulation becomes animportant step for social rule knowledge capturing. Thesimulation should be combined with optimization algorithms forsearching the optimal policy and value functions given theagent’s sensor information. Multi-agent deep Q-learning4Copyright 2019 by ASME

networks can be integrated with simulation to perform suchsimulation optimization tasks. The hope is that the output of thetrained neural networks can be applied to various team sizes forsystem implementation.networks and tested successfully trained network parameterswith various team sizes between 3-6 and analyzed its scalabilitycharacteristics.A graphical illustration of the box-pushing case study isshown in Figure 2. The game screen has a width x of 600 pixelsand a height y of 480 pixels. Numerous agents (the greensquares) with limited pushing and sensing capabilities need toself-organize in order to push and rotate the box (the brownrectangle) towards the goal (the white dot with a “ ” mark). Asthere is an obstacle (the red dot) on the path and walls (the whitesolid lines) along the side, the agents cannot just simply push thebox but have to rotate the box when necessary [5,6]. This addscomplexity to the task. The box has sensors deployed at itsoutside boundary. When the outside perimeter of the box reacheshorizontal x-coordinate of the goal, which is represented as awhite vertical dotted line, the simulation is deemed success.Figure 1. Steps for Simulation based Design of Self-OrganizingSystems4 CASE STUDYTo test the concepts and explore the multiagent reinforcementlearning algorithm discussed above, a box-pushing case studyhas been carried out. In choosing this case example, severalrequirements were considered based on our long-term goal ofdeveloping robotic self-organizing assembly systems. First, thetask environment requires relatively intense agent interactions,instead of sparse interactions, for efficient learning. For example,the ant foraging task may be less desirable as the interactionbetween agents during training is only passive, causing it slowand ineffective. Second, the tasks require cooperative workamong agents. Although each agent might have different shortterm rewards, in the long run they work for the same maximumreward. Lastly, we consider only the homogeneous cases, forsimplicity at this stage of research, and the action space shouldbe the same for all the agents. This will allow us to add moreagents to the system using the same learned neural networks.After considering several options, the box-pushing problemwas finally chosen for the case study.4.1 The Box-Pushing ProblemThe box-pushing problem is often categorized as a trajectoryplanning or piano mover’s problem [40]. Many topological andnumerical solutions have been developed in the past [40]. In ourpaper, we adopt a self-organizing multi-agent deep Q-learningapproach to solve the box-pushing problem. During the selforganizing process, each agent acts based on its trained neuralnetwork, and collectively all agents can push the box towards agoal without system level global control.In this research, the box-pushing case study was implementedin pygame, a multi-agent game package in the Pythonenvironment. In the box-pushing case study, we trained eachindividual with independent deep Q-learning (IQL) neuralFigure 2. Graphical illustration of Box-pushing taskThere are four major tasks of box-pushing, as summarizedbelow. Agents need to move, rotate the box, and keep the boxaway from potential walls and obstacles.T1 Move Box to Goal T2 Rotate Box to Goal T3 Move Box away from Walls T4 Move Box away from Obstacle In pygame, the distance is measured by pixels. Each pixel isa single square area in the simulation environment. As anexample, the box in our simulation is 60 pixels wide and 150pixels long.In box-pushing, agents have limited sensing andcommunication capabilities. They can receive information fromthe sensor of the box, which measures orientation of the box andsenses obstacles at a range of distance. They have limited storageof observation information: trained neural network parametersand their experiences such as state, action, reward and next state.They possess a neural network that can transform the perceivedstate information into action. These assumptions are in line withthe definition of the “minimalist” robot [41] and are reasonablewith the current applications of physical robot hardware [42].5Copyright 2019 by ASME

4.2 State and ActionBased on the task decomposition and constraint analysismentioned above, the state space of the box-pushing task isdefined as shown in Figure 3. To gather relevant environmentinformation, a sensor is deployed in the center of the box, whichcan sense nearby obstacles. The radius of sensor range is 150pixels and the entire circular sensor coverage is split into 8sectors of equal size with each sector corresponding to a bitrepresentation of state information. For example in Figure 3,there are three red obstacles within the sensor detection range,and the corresponding state s3, s5, s7 are having value 1,indicating the presence of the obstacle. If there is no obstacle ina sector, the sector’s state value will be 0. Like the past literature,we assume sensors can also detect the orientation of the box fromthe box’s x-axis with respect to the goal position, illustrated withangle q [35-36]. This is a reasonable assumption based on realworld sensor capability. In Figure 3, the current angle q is around30 degrees. And such degree information can be shaped into therange of [-1,1] by applying equation (5). Angle 𝜃 ) serves as thefinal input state s9. This shaped angle method can facilitate deepQ-network training and is used commonly in practice [35-36].𝜃) (E'3ST)3ST -0.83(5)pushes will move the box 10 pixels in a given direction. Torqueis assumed to be exerted on the centroid of the box and equals tothe sum of moment arm of all vector forces of the pushing agents.2 pushes with a moment arm of 75 pixels each will rotate the box20 degrees. We assume the box carries a large moment of inertiaand when it hits the obstacle, which is considered rather small, itwill continue its movement until its expected end position isreached.Figure 4. The six regions of box neighborhoodAgent action space: The agent action space is defined basedon the box neighborhood and simulated box dynamics. As eachtime step, an agent can choose a place in one of the six regionsof the box neighborhoods to push the box. Therefore, the agentsshare the same actions space of A {a1, a2, a3, a4, a5, a6}, asshown in Figure 4. For instance, if an agent chooses action a1, itwill move to box region “1” and push from there, the box willmove downwards along the box’s y-axis based on the simulatedbox dynamics and the same logic applies to other agent actions.4.3 Reward Schema and Training modelFigure 3. Box State RepresentationGiven the above, the state representation of Figure 3 can beexpressed as a 9-digit tuple 0,0,1,0,1,0,1,0, -0.83 .As during training, each agent is close to the vicinity of thebox center, it can receive the sensor information broadcastedlocally among agents. Sensor can also sense the distance fromthe center of the box to the goal area, analogous to real-worldradar sensor, and is also like the gradient-based approach inlitera

In contrast to the centralized single agent reinforcement learning, during the multi-agent reinforcement learning, each agent can be trained using its own independent neural network. Such approach solves the problem of curse of dimensionality of action space when applying single agent reinforcement learning to multi-agent settings.

Related Documents:

Root-cause analysis through machine learning in the cloud

In this report we explore an approach to root-cause-analysis using a machine learning model called self-organizing maps. Self-organizing maps provides data classification, while also providing visualization of the model which is something many machine learning models fail to do. We show that self-organizing maps are a promising solution

8 Views

1y ago

Self-trust and self-confidence: some distinctions - Rivier University

self-respect, self-acceptance, self-control, self-doubt, self-deception, self-confidence, self-trust, bargaining with oneself, being one's own worst enemy, and self-denial, for example, are thought to be deeply human possibilities, yet there is no clear agreement about who or what forms the terms between which these relations hold.

18 Views

1y ago

A Massively Parallel Architecture for a Self-Organizing ...

2. SELF-SCALING COMPUTATIONAL UNITS, SELF-ADJUSTING ME.r.fORY SEARCH, DIRECT ACCESS, AND A TTENTIONAL VIGILANCE Four properties are basic to the workings of the networks that we characterize herein. A. Self-Scaling Computational Units: Critical Feature Patterns Properly defining signal and noise in a self-organizing system raises a number of

21 Views

3y ago

Case Application (Structure) Starbucks—Organizing

Case Application (Structure) Starbucks—Organizing Organizing is an important task of managers. Once the organization’s goals and plans are in place, the organizing function sets in motion the process of seeing that those goals and plans are pursued. When managers organize, they’re defining what work ne

36 Views

2y ago

A Review of Morphogenetic Engineering

components and robots, self-organizing builder robots, self-morphing particle swarms, self-coding software, self-balancing pervasive services, but also self-constructing buildings, self-con guring man-ufacturing lines, or self-managing energy grids. They are all based on a multitude of components,

39 Views

3y ago

Organizing for the Future - ocw.mit.edu

Organizing for the Future Arnoldo C. Hax Galbraith. Designing Organizations. Jossey-Bass, 1995. Galbraith, and Lawler III, et al. Organizing for the Future: The New Logic for Managing Complex Organizations, Jossey-Bass, 1993.

13 Views

2y ago

Adults that Experience Sexual Shame: Effects on Self-Esteem and Sexual ...

3.6 Sexual Shame and Self-esteem; Self-esteem expert Rosenberg (1965) defined self-esteem as an attitude towards one's self, a self-worth with levels of positive and/or negative feelings about the self. Coopersmith (1967) described self-esteem as being an appreciation of oneself and showing self-respect,

22 Views

1y ago

An Investigation of the Assumptions of Self-handicapping

associated with higher level osf self-handicapping i n young people. Moreover, certainty of self-esteem and the trait of self-handicapping wer noe t associated with self-handicapping. Stud 6 explorey d the relationship between self-esteem and self-handicapping using domain-specific measure of self-esteems an, d task specific self-efficacy.

12 Views

10m ago

Recent Views

12 PUBLIC LAW AND PRIVATE LAW - Home: The National .

INTRODUCTION TO LAW MODULE - 3 Public Law and Private Law Classification of Law 164 Notes z define Criminal Law; z list the differences between Public and Private Law; and z discuss the role of Judges in shaping Law 12.1 MEANING AND NATURE OF PUBLIC LAW Public Law is that part of law, which governs relationship between the State

3y ago

745 Views

Dr. Ram Manohar Lohiya National Law University, Lucknow

2. Health and Medicine Law 3. Int. Commercial Arbitration 4. Law and Agriculture IXth SEMESTER 1. Consumer Protection Law 2. Law, Science and Technology 3. Women and Law 4. Land Law (UP) Xth SEMESTER 1. Real Estate Law 2. Law and Economics 3. Sports Law 4. Law and Education **Seminar Courses Xth SEMESTER (i) Law and Morality (ii) Legislative .

3y ago

496 Views

Dangerous Defendants - Yale Law Journal

Law School, Louisiana State University Paul M. Hebert Law Center, Roger Williams University School of Law, Rutgers Law School, Sandra Day O'Connor College of Law, Southern Methodist University Dedman School of Law, University of Georgia School of Law, and University of Utah S.J. Quinney College of Law. For institutional support, I am grateful .

1y ago

169 Views

Companies Law - Cayman Islands dollar

Law 1 of 1971-15th December, 1970 Law 7 of 2000- 20th July, 2000 Law 7 of 1973-28th June, 1973 Law 5 of 2001-20th April, 2001 Law 24 of 1974-22nd November, 1974 Law 10 of 2001-25th May, 2001 Law 25 of 1975-9th December, 1975 Law 29 of 2001-26th September, 2001 Law 19 of 1977-10th November, 1977 Law 46 of 2001-14th January, 2002

3y ago

454 Views

It’s the Law!

ciples stated in Boyle’s Law, Charles’ Law, Gay-Lussac’s Law, Henry’s Law, and Dalton’s Law. Students will be able to explain the application of Boyle’s Law, Charles’ Law, Gay-Lussac’s Law, Henry’s Law, and Dalton’s Law to observations or events related to SCUBA diving. MateriaLs None audio/visuaL MateriaLs None teachinG tiMe

2y ago

378 Views

WHAT LAW IS ? An Introduction to Law

common law system civil law system!! sources of law in civil law !! a1. primary: statutes (written law) enacted by legislative power are the principal source of law. ! a2. two subsidiary sources of law: ! a2.1 administrative regulations a.2.2 customs!! ! sources of law in common law !!! b1. two primary sources of

2y ago

385 Views

Ohm ’s Law

Ohm ’s Law Ohm's law states that, in an electrical circuit, the current passing through most materials is directly proportional to the potential difference applied across them. 3-1—3-3: Ohm ’s Law Formulas There are three forms of Ohm’s Law: I V/R V IR R V/I where:File Size: 1MBPage Count: 40Explore furtherOhm's Law Quiz MCQs with Answers Ohm Lawohmlaw.comOhm’s Law Worksheet - Basic Electricity - All About omohms law worksheet - eering.orgOhm’s Law Worksheet - Richmond County School Systemwww.rcboe.orgOhm's Law with Examples - Physics Problems with Solutions ended to you b

2y ago

295 Views

Faculty of Juridical, Social and Political Sciences Year .

Law L Law IV 8 Drept procesual civil II / Civil Procedure Law II 5 Law L Law IV 8 Dreptul comerțului internațional / International ommercial Law 4 Law L Law IV 8 riminalistică / Forensics 4 Law L Law IV 8 Practică de cercetare pentru elaborarea lucrării de lincență(3 săptămân

2y ago

384 Views

Intermediate Law Law and You Worksheet 3: Australian law - Home Affairs

4. There are different kinds of law to deal with different kinds of problems. Four important kinds of law are civil law, criminal law, family law and administrative law. Civil law deals with disputes between individuals; for example, if someone sells you goods that are faulty, or that cause you injury or damage, you can take that person to court.

4m ago

110 Views

APPLYING TO LAW SCHOOL - University of Pennsylvania

You will apply to law school through the Law School Admission Council (LSAC). 1 6 4 5 3 2 Individual Law School Application Personal Statement Law School Resume 1-3 Letters of Recommendation Dean’s Letter/Certification LSAC Law School Report with official academic transcript(s) and LSAT score(s)

2y ago

160 Views

OF THE LAW LIBRARY - University at Buffalo Libraries

the Law School. 1910 Bang's Law Library is sold, and a fund is established to develop a Law School Library (with many notable donors); students pay an extra 10 library fee. 1936-37 Law Library adds 6,300 books, allowing the Law School to become accredited by the American Bar Association. Law School moves to the new Ellicott Square Building in

1y ago

88 Views

CRIMINAL LAW: CASES, MATERIALS, AND LAWYERING

UTK Distinguished Professor of Law, University of Tennessee College of Law; John T. Parry, professor of law, Lewis & Clark Law School; Penelope Pether, professor of law, Villanova University School of Law. --Third edition. pages cm Includes index. ISBN 978-0-7698-8270-3 1. Criminal law--Unit

2y ago

189 Views

A Trail Guide to Careers in Environmental Law

law, constitutional law, property law, bankruptcy law, criminal law, food and drug law, land use planning law, and international law. A distinctive aspect of environmental practice is the role of science in advocacy efforts.

3y ago

241 Views

Accounting Technicians Diploma (ATD) Examination Syllabus

Apply law of contract and tort in various scenarios Apply general principles of business law in practice. CONTENT 2.1 Elements of the legal system 2.1.1 Nature, purpose and classification of law - Meaning of law - Nature of law - Purpose of law - Classification of law - Law and morality 2.1.2 Sources of law - The Constitution

3y ago

216 Views

PRINCIPLES OF BUSINESS LAW - DPHU

ABE Diploma in Business Administration Study Manual PRINCIPLES OF BUSINESS LAW Contents Study Unit Title Page Syllabus i 1 Nature and Sources of Law 1 Nature of Law 3 Historical Origins 6 Sources of Law 9 The European Community and UK Law: An Overview 13 2 Common Law, Equity and Statute Law 23 Custom 25 Case Law 26 Nature of Equity 32

3y ago

285 Views

Designing Self-organizing Systems With Deep Multi-agent Reinforcement .

It looks like you're using an ad-blocker