
Combating Stagnation in Reinforcement Learning Through 'Guided Learning' With 'Taught-Response Memory'*

Keith Tunstead [0000-0002-9769-1009] and Joeran Beel [0000-0002-4537-5573]

Trinity College Dublin, School of Computer Science and Statistics, Artificial Intelligence Discipline, ADAPT Centre, Dublin, Ireland
{tunstek,beelj}@tcd.ie

Abstract. We present the concept of Guided Learning, which outlines a framework that allows a Reinforcement Learning agent to effectively 'ask for help' as it encounters stagnation. Either a human or expert agent supervisor can then optionally 'guide' the agent as to how to progress beyond the point of stagnation. This guidance is encoded in a novel way using a separately trained neural network referred to as a 'Taught-Response Memory' that can be recalled when another 'similar' situation arises in the future. This paper shows how Guided Learning is algorithm-independent and can be applied in any Reinforcement Learning context. Our results achieved superior performance over the agent's non-guided counterpart with minimal guidance, achieving, on average, increases of 136% and 112% in the rate of progression of the champion and average genomes respectively. This is because Guided Learning allows the agent to exploit more information, and thus the agent's need for exploration is reduced.

Keywords: Active learning · Agent teaching · Evolutionary algorithms · Interactive adaptive learning · Stagnation

1 Introduction

One of the primary problems with training any kind of modern AI in a Reinforcement Learning environment is stagnation. Stagnation occurs when the agent ceases to make progress in solving the current task before either the goal or the agent's maximum effectiveness has been reached. Reducing stagnation is an important topic for reducing training times and increasing overall performance in cases where training time is limited.

This paper presents a method to reduce stagnation and defines a framework for a kind of interactive teaching/guidance in which either a human or expert agent supervisor can guide a learning agent past stagnation.

* This publication emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number 13/RC/2106.

© 2019 for this paper by its authors. Use permitted under CC BY 4.0.

In terms of related work, we will briefly discuss Teaching and Interactive Adaptive Learning. The concept of Teaching [3] encompasses agent-to-agent [6], agent-to-human [8] and human-to-agent teaching [1]. Guided Learning is a form of Teaching that can take advantage of both human-to-agent and agent-to-agent teaching. Interactive Adaptive Learning is defined as a combination of Active Learning, a type of Machine Learning where the algorithm is allowed to query some information source in order to obtain the desired outputs, and Adaptive Stream Mining, which concerns itself with how the algorithm should adapt when dealing with time-changing data [2].

2 Guided Learning

Guided Learning encodes guidance using what we refer to as Taught-Response Memories (TRMs), which we define as: a memory of a series of actions that an agent has been taught in response to specific stimuli. A TRM is an abstract concept, but its representation must allow for some plasticity in order to adapt the memory over time. This allows a TRM to tend towards a more optimal solution for a single stimulus, or towards more general applicability to other stimuli. In this paper we represent TRMs as separately trained feed-forward neural networks. TRMs may consist of multiple actions, which can cause non-convergence when conflicting actions are presented; we therefore define a special-case TRM, referred to as a Single Action TRM (SATRM). Using SATRMs, multiple actions can be split into their single-action components, removing any conflicting actions. Due to their independence from the underlying algorithm, TRMs (and subsequently Guided Learning) can be used with any Reinforcement Learning algorithm.

The ideal implementation of Guided Learning can be best described using an example. In the game Super Mario Bros, when a reinforcement agent stagnates at the first green pipe (see Fig. 1 in Appendix A), the agent can request guidance from a supervisor. If no guidance is received within a given time period, the algorithm continues as normal. Any guidance received is encoded as a new TRM. The TRM can be 'recalled' in order to attempt to jump over, not only the first green pipe, but the second, the third and so on. A TRM is 'recalled' if the current stimulus falls within a certain 'similarity threshold', $\theta_t$, of the stimulus for which the TRM was trained, i.e. if $\theta = \arccos\left(\frac{a \cdot b}{\lVert a \rVert\, \lVert b \rVert}\right) \le \theta_t$, where $a$ and $b$ are the stimulus vectors. Because each TRM is plastic, it can tend towards getting more optimal at either jumping over that one specific green pipe or jumping over multiple green pipes. This also helps in cases where guidance is sub-optimal. A full implementation of Guided Learning can recall the TRM not only in the first level or in other levels of the game, but in other games entirely with similar mechanics to the original game (i.e. another platform or 'jump and run' based game, where the agent is presented with a barrier in front of it). For more information please refer to the extended version of this manuscript [7].
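As a concrete illustration of this recall mechanism, the following is a minimal Python sketch of how a TRM could be stored alongside the stimulus it was taught on and recalled via the cosine-angle test above. The class and function names, the default threshold value and the treatment of the taught response network as a generic callable are illustrative assumptions, not details of our implementation.

```python
import numpy as np

# Minimal sketch (not our full implementation): a TRM stores the stimulus it
# was taught on, a separately trained feed-forward network (here any callable),
# and the angular threshold theta_t used for recall. All names and the default
# threshold are illustrative assumptions.

class TaughtResponseMemory:
    def __init__(self, taught_stimulus, response_net, theta_t=0.15):
        self.taught_stimulus = np.asarray(taught_stimulus, dtype=float)
        self.response_net = response_net   # separately trained feed-forward net
        self.theta_t = theta_t             # similarity threshold in radians (assumed value)

    def angle_to(self, stimulus):
        # theta = arccos(a . b / (|a| |b|)), the angle between stimulus vectors
        a = np.asarray(stimulus, dtype=float)
        b = self.taught_stimulus
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        return np.arccos(np.clip(cos, -1.0, 1.0))

    def matches(self, stimulus):
        return self.angle_to(stimulus) <= self.theta_t


def select_action(stimulus, trms, fallback_policy):
    """Recall the closest matching TRM, if any; otherwise defer to the
    underlying RL policy (e.g. the current NEAT genome)."""
    matching = [trm for trm in trms if trm.matches(stimulus)]
    if matching:
        closest = min(matching, key=lambda trm: trm.angle_to(stimulus))
        return closest.response_net(stimulus)
    return fallback_policy(stimulus)
```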

3 Methodology

The effectiveness of a limited implementation of Guided Learning¹ will be measured using the first level of the game Super Mario Bros². The underlying Reinforcement Learning algorithm used was NeuroEvolution of Augmenting Topologies (NEAT) [5]. NEAT was chosen firstly due to its applicability as a Reinforcement Learning algorithm and secondly due to NEAT's nature as an Evolutionary Algorithm. The original intent was to reuse TRMs across multiple genomes. While this worked to an extent (see the Avg Fitness metric in Fig. 3 in Appendix B.1), it was not as successful as originally hoped. This is because different genomes tend to progress in distinct ways, and future work remains with regard to TRM reuse. Stagnation was defined as evaluating 4 generations without the champion genome making progress.

To evaluate Guided Learning, a baseline was created that consisted only of the NEAT algorithm. The stimulus was represented as raw pixel data with some dimensionality reduction (see Fig. 2 in Appendix A). The Guided Learning implementation then takes the baseline and makes the following changes: 1) Allows the agent to 'ask for help' from a human supervisor when stagnation is encountered. 2) Encodes received guidance as SATRMs. 3) Activates SATRMs as 'similar' situations are encountered.

Both the baseline and Guided Learning algorithms were evaluated 50 times, each to the 150th generation. 'Best Fitness' and 'Average Fitness' results refer to the fitness of the champion genome and the average fitness of the population at each generation respectively, where 'fitness' is defined as the distance the agent moves across the level.

4 Results & Discussion

For Guided Learning, an average of 10 interventions were given over an average period of about 8 hours. Interventions were not given at each opportunity presented and were instead lazily applied, averaging 1 intervention for every 3 requests. The run-time of Guided Learning was mostly hindered by the overhead of checking for stimulus similarity, which resulted in a run-time of about 2x the baseline. This run-time can be substantially improved with some future work.

Guided Learning achieved 136% and 112% improvements in the regression slopes for the Mean Best Fitness and Mean Average Fitness respectively (see Fig. 3 in Appendix B.1). We also looked at the best and worst performing cases. These results can be seen in Fig. 4 and Table 2 in Appendix B.2.

² Disclaimer: The ROM used during the creation of this work was created as an archival backup from a genuine NES cartridge and was NOT downloaded/distributed over the internet.
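For illustration, the sketch below shows one way the stagnation criterion and the resulting guidance request described above could be wired into an evolutionary training loop. The helper names in the commented usage (evaluate_population, request_guidance, encode_as_satrms) and the timeout value are hypothetical placeholders, not part of our implementation.

```python
# Minimal sketch of the stagnation criterion: the agent is considered stagnant
# after 4 consecutive generations without the champion genome improving its
# fitness (distance travelled). Names are illustrative assumptions.

STAGNATION_GENERATIONS = 4

class StagnationMonitor:
    def __init__(self, patience=STAGNATION_GENERATIONS):
        self.patience = patience
        self.best_fitness = float("-inf")
        self.generations_without_progress = 0

    def update(self, champion_fitness):
        """Call once per generation; returns True once the agent has stagnated."""
        if champion_fitness > self.best_fitness:
            self.best_fitness = champion_fitness
            self.generations_without_progress = 0
        else:
            self.generations_without_progress += 1
        return self.generations_without_progress >= self.patience

# Hypothetical usage inside a NEAT-style training loop:
#
# monitor, trms = StagnationMonitor(), []
# for generation in range(150):
#     champion_fitness = evaluate_population(population)   # hypothetical helper
#     if monitor.update(champion_fitness):
#         guidance = request_guidance(timeout_s=60)         # hypothetical helper; ask supervisor
#         if guidance is not None:
#             trms.extend(encode_as_satrms(guidance))       # hypothetical helper; one SATRM per action
```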

The results obtained show good promise for Guided Learning's potential, as such results were obtained with only a partial implementation and much future work still remains.

Some of the limitations of Guided Learning include the need for some kind of supervisor, its current run-time, and its domain dependence, i.e. a TRM for 'jump and run' games would not work in other games with different mechanics or reinforcement scenarios.

Future work will include: 1) Building Guided Learning using more state-of-the-art Reinforcement Learning algorithms [4]. 2) Using a more generalized encoding of the stimulus to allow TRMs to be re-used more readily while still balancing the false-negative and false-positive activation trade-off (e.g. feeding raw pixel data into a trained classifier). 3) Implementing TRM adaptation. 4) Taking advantage of poorly performing TRMs as a method of showing the agent what not to do [3]. 5) Run-time optimization by offloading the similarity check and guidance request to separate threads; this would mean that the agent would no longer wait for input, and TRM selection predictions could also be made as the current stimulus converges towards a valid TRM stimulus.

References

1. Hussein, A., Elyan, E., Gaber, M.M., Jayne, C.: Deep reward shaping from demonstrations. In: 2017 International Joint Conference on Neural Networks (IJCNN). pp. 510-517. IEEE (2017)
2. Interactive adaptive learning. (2018), [Online; accessed June 18, 2019]
3. Lin, L.J.: Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning 8(3-4), 293-321 (1992)
4. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529 (2015)
5. Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evolutionary Computation 10(2), 99-127 (2002)
6. Taylor, M.E., Carboni, N., Fachantidis, A., Vlahavas, I., Torrey, L.: Reinforcement learning agents providing advice in complex video games. Connection Science 26(1), 45-63 (2014)
7. Tunstead, K., Beel, J.: Combating stagnation in reinforcement learning through 'guided learning' with 'taught-response memory' [extended version]. arXiv (2019)
8. Zhan, Y., Fachantidis, A., Vlahavas, I., Taylor, M.E.: Agents teaching humans in reinforcement learning tasks. In: Proceedings of the Adaptive and Learning Agents Workshop (AAMAS) (2014)

A Figures & Tables

Fig. 1. First pipe encounter in Super Mario Bros.

Fig. 2. Input Reduction Pipeline Examples. (a) Raw RGB Frame (b) Grayscaled Frame (c) Aligned and Tiled Frame (d) Radius Tiles Surrounding Mario, r = 4
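For readers who prefer code to pictures, the following sketch approximates the input-reduction pipeline of Fig. 2: grayscale the raw RGB frame, average it into coarse tiles, and keep only the tiles within radius r = 4 of Mario's tile. The tile size and the assumption that Mario's tile coordinates are available are illustrative choices; only r = 4 is taken from the figure.

```python
import numpy as np

# Illustrative sketch of the Fig. 2 pipeline: grayscale the raw RGB frame,
# average it into coarse tiles, and keep only the tiles within radius r of
# Mario's tile. Tile size and Mario's tile coordinates are assumptions.

def grayscale(frame_rgb):
    return frame_rgb.mean(axis=2)                        # (H, W, 3) -> (H, W)

def to_tiles(frame_gray, tile_size=16):
    h, w = frame_gray.shape
    h, w = h - h % tile_size, w - w % tile_size          # crop to whole tiles
    blocks = frame_gray[:h, :w].reshape(h // tile_size, tile_size,
                                        w // tile_size, tile_size)
    return blocks.mean(axis=(1, 3))                      # one mean intensity per tile

def radius_tiles(tiled, mario_row, mario_col, r=4):
    padded = np.pad(tiled, r)                            # zero-pad so edge windows stay square
    window = padded[mario_row:mario_row + 2 * r + 1,
                    mario_col:mario_col + 2 * r + 1]
    return window.flatten()                              # stimulus vector fed to NEAT and TRMs

# stimulus = radius_tiles(to_tiles(grayscale(raw_frame)), mario_row, mario_col, r=4)
```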

Table 1. NEAT Configuration Used During Evaluation
(Parameters: Initial Population Size; Activation Function; Activation Mutation Rate; Initial Weight/Bias Distribution Mean; Initial Weight/Bias Distribution Std. Deviation; Weight & Bias Max Value; Weight & Bias Min Value; Weight Mutation Rate; Bias Mutation Rate; Node Add Probability; Node Delete Probability; Connection Add Probability; Connection Delete Probability; Initial Number of Hidden Nodes; Max Number of Hidden Nodes)

B Results Figures & Tables

B.1 Average Results Over 50 Trials

Fig. 3. Baseline vs. Guided Learning Average Results Per Generation (Higher is better).

B.2 Best & Worst Case Results

Fig. 4. Baseline vs. Guided Learning Best and Worst Case Results (Higher is better). (a) Best Fitness. (b) Avg Fitness.

Table 2. Baseline vs. Guided Learning Best and Worst Case Slope Results
(Rows: Best Fitness (Highest Slope); Best Fitness (Lowest Slope); Avg Fitness (Highest Slope); Avg Fitness (Lowest Slope). Columns: Baseline, Guided Learning)

