Lecture 1: Introduction to Reinforcement Learning


David Silver

Outline
1 Admin
2 About Reinforcement Learning
3 The Reinforcement Learning Problem
4 Inside An RL Agent
5 Problems within Reinforcement Learning

Admin: Class Information
Thursdays, 9:30 to …
Discussion group: http://groups.google.com/group/csml-advanced-topics
Contact me: d.silver@cs.ucl.ac.uk

Admin: Assessment
Assessment will be 50% coursework, 50% exam
Coursework:
Assignment A: RL problem
Assignment B: Kernels problem
Assessment = max(assignment 1, assignment 2)
Examination:
A: 3 RL questions
B: 3 kernels questions
Answer any 3 questions

Admin: Textbooks
An Introduction to Reinforcement Learning, Sutton and Barto, MIT Press, 1998 (40 pounds)
Available free online! http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html
Algorithms for Reinforcement Learning, Szepesvari, Morgan and Claypool, 2010 (20 pounds)
Available free online! http://www.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf

About RL: Many Faces of Reinforcement Learning
[Venn diagram: reinforcement learning at the intersection of computer science (machine learning), neuroscience (reward system), psychology (classical/operant conditioning), economics (bounded rationality), mathematics (operations research), and engineering (optimal control)]

About RL: Branches of Machine Learning
[Diagram: the branches of machine learning, with reinforcement learning as one branch alongside supervised and unsupervised learning]

About RL: Characteristics of Reinforcement Learning
What makes reinforcement learning different from other machine learning paradigms?
There is no supervisor, only a reward signal
Feedback is delayed, not instantaneous
Time really matters (sequential, non-i.i.d. data)
Agent's actions affect the subsequent data it receives

About RL: Examples of Reinforcement Learning
Fly stunt manoeuvres in a helicopter
Defeat the world champion at Backgammon
Manage an investment portfolio
Control a power station
Make a humanoid robot walk
Play many different Atari games better than humans

About RL: Helicopter Manoeuvres

About RL: Bipedal Robots

About RL: Atari

The RL Problem: Rewards
A reward R_t is a scalar feedback signal
Indicates how well the agent is doing at step t
The agent's job is to maximise cumulative reward
Reinforcement learning is based on the reward hypothesis
Definition (Reward Hypothesis): All goals can be described by the maximisation of expected cumulative reward
Do you agree with this statement?

The RL Problem: Examples of Rewards
Fly stunt manoeuvres in a helicopter: +ve reward for following desired trajectory; -ve reward for crashing
Defeat the world champion at Backgammon: +/-ve reward for winning/losing a game
Manage an investment portfolio: +ve reward for each $ in bank
Control a power station: +ve reward for producing power; -ve reward for exceeding safety thresholds
Make a humanoid robot walk: +ve reward for forward motion; -ve reward for falling over
Play many different Atari games better than humans: +/-ve reward for increasing/decreasing score

The RL Problem: Sequential Decision Making
Goal: select actions to maximise total future reward
Actions may have long term consequences
Reward may be delayed
It may be better to sacrifice immediate reward to gain more long-term reward
Examples:
A financial investment (may take months to mature)

Inside An RL Agent: Value Function
A value function is a prediction of future reward
Used to evaluate the goodness/badness of states
And therefore to select between actions, e.g.
$v_\pi(s) = \mathbb{E}_\pi[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s]$
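
The expectation above can be approximated by sampling. Below is a minimal Python sketch, not from the lecture: discounted_return folds a sampled reward sequence into the return G_t, and sample_rewards_from is a hypothetical callable (an assumption of this sketch) that plays one episode under π from a given state and returns the rewards R_{t+1}, R_{t+2}, ...

```python
def discounted_return(rewards, gamma=0.9):
    """Fold rewards R_{t+1}, R_{t+2}, ... into G_t = sum_k gamma^k R_{t+k+1}."""
    g = 0.0
    for r in reversed(rewards):   # accumulate backwards: g <- r + gamma * g
        g = r + gamma * g
    return g

def monte_carlo_value(sample_rewards_from, state, gamma=0.9, n_episodes=1000):
    """Estimate v_pi(s) = E_pi[G_t | S_t = s] by averaging sampled returns.
    `sample_rewards_from` is a hypothetical episode generator, not a real API."""
    returns = [discounted_return(sample_rewards_from(state), gamma)
               for _ in range(n_episodes)]
    return sum(returns) / len(returns)
```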

Inside An RL Agent: Example: Value Function in Atari

Inside An RL Agent: Model
A model predicts what the environment will do next
P predicts the next state
R predicts the next (immediate) reward, e.g.
$\mathcal{P}^a_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a]$
$\mathcal{R}^a_s = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]$
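
For small discrete problems, both quantities can be estimated by counting observed transitions. A minimal sketch assuming hashable states and actions; the class name and structure are illustrative, not from the lecture:

```python
from collections import defaultdict

class TabularModel:
    """Empirical model of P^a_{ss'} and R^a_s learned by counting transitions."""
    def __init__(self):
        self.n_sa = defaultdict(int)      # visits to (s, a)
        self.n_sas = defaultdict(int)     # visits to (s, a, s')
        self.r_sum = defaultdict(float)   # total reward observed after (s, a)

    def update(self, s, a, r, s_next):
        self.n_sa[(s, a)] += 1
        self.n_sas[(s, a, s_next)] += 1
        self.r_sum[(s, a)] += r

    def P(self, s, a, s_next):
        """Estimate of P[S_{t+1} = s' | S_t = s, A_t = a]; requires (s, a) seen."""
        return self.n_sas[(s, a, s_next)] / self.n_sa[(s, a)]

    def R(self, s, a):
        """Estimate of E[R_{t+1} | S_t = s, A_t = a]; requires (s, a) seen."""
        return self.r_sum[(s, a)] / self.n_sa[(s, a)]
```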

Inside An RL Agent: Maze Example
[Figure: a maze with a Start cell and a Goal cell]
Rewards: -1 per time-step
Actions: N, E, S, W
States: Agent's location

Inside An RL Agent: Maze Example: Policy
[Figure: the maze with an arrow in each cell]
Arrows represent policy π(s) for each state s

Inside An RL Agent: Maze Example: Value Function
[Figure: the maze with a number in each cell, ranging from about -22 far from the goal down to -1 adjacent to it]
Numbers represent value v_π(s) of each state s

Inside An RL Agent: Maze Example: Model
Agent may have an internal model of the environment
Dynamics: how actions change the state
Rewards: how much reward from each state
The model may be imperfect
[Figure: the agent's internal model of the maze, with -1 in each modelled cell]
Grid layout represents transition model $\mathcal{P}^a_{ss'}$
Numbers represent immediate reward $\mathcal{R}^a_s$ from each state s (same for all a)
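
Because the maze pays -1 per time-step with no discounting, the value of a deterministic policy at s is just minus the number of steps that policy takes to reach the goal. A minimal sketch under assumed encodings (a `step` dict mapping (state, action) to next state, a `pi` dict mapping state to action); none of these names come from the lecture:

```python
def evaluate_policy(pi, step, states, goal):
    """v_pi(s) with reward -1 per time-step and no discounting (gamma = 1)."""
    v = {}
    for s0 in states:
        s, n = s0, 0
        while s != goal and n <= len(states):   # cap n in case the policy loops
            s = step[(s, pi[s])]                # follow the policy one step
            n += 1
        v[s0] = -n if s == goal else float('-inf')   # -inf: never reaches goal
    return v
```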

Inside An RL Agent: Categorizing RL agents (1)
Value Based: No Policy (Implicit); Value Function
Policy Based: Policy; No Value Function
Actor Critic: Policy; Value Function

Inside An RL Agent: Categorizing RL agents (2)
Model Free: Policy and/or Value Function; No Model
Model Based: Policy and/or Value Function; Model

Inside An RL Agent: RL Agent Taxonomy
[Venn diagram: agents categorized by which of Value Function, Policy, and Model they use, spanning Model-Free and Model-Based families]

Problems within RL: Learning and Planning
Two fundamental problems in sequential decision making
Reinforcement Learning:
The environment is initially unknown
The agent interacts with the environment
The agent improves its policy
Planning:
A model of the environment is known
The agent performs computations with its model (without any external interaction)
The agent improves its policy
a.k.a. deliberation, reasoning, introspection, pondering, thought, search

Problems within RL: Atari Example: Reinforcement Learning
[Figure: agent-environment loop with observation O_t, action A_t, and reward R_t]
Rules of the game are unknown
Learn directly from interactive game-play
Pick actions on joystick, see pixels and scores

Problems within RL: Atari Example: Planning
Rules of the game are known
Can query emulator: perfect model inside agent's brain
If I take action a from state s:
what would the next state be?
what would the score be?
Plan ahead to find optimal policy, e.g. tree search
[Figure: a search tree branching on left/right joystick actions]
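
With a perfect model, planning can be pure computation. A minimal sketch of depth-limited exhaustive tree search, assuming a hypothetical deterministic emulator model(s, a) -> (next_state, reward); this is an illustration, not the lecture's algorithm:

```python
def plan(s, model, actions, depth, gamma=0.99):
    """Return (value, action) maximising the lookahead return from state s."""
    if depth == 0:
        return 0.0, None
    best_value, best_action = float('-inf'), None
    for a in actions:                        # e.g. actions = ["left", "right"]
        s_next, r = model(s, a)              # query the emulator (perfect model)
        value = r + gamma * plan(s_next, model, actions, depth - 1, gamma)[0]
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action
```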

Problems within RL: Exploration and Exploitation (1)
Reinforcement learning is like trial-and-error learning
The agent should discover a good policy
From its experiences of the environment
Without losing too much reward along the way

Problems within RL: Exploration and Exploitation (2)
Exploration finds more information about the environment
Exploitation exploits known information to maximise reward
It is usually important to explore as well as exploit

Problems within RL: Examples
Restaurant Selection
Exploitation: Go to your favourite restaurant
Exploration: Try a new restaurant
Online Banner Advertisements
Exploitation: Show the most successful advert
Exploration: Show a different advert
Oil Drilling
Exploitation: Drill at the best known location
Exploration: Drill at a new location
Game Playing
Exploitation: Play the move you believe is best
Exploration: Play an experimental move
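
One standard way to trade off the two (an illustration, not from the slides) is epsilon-greedy action selection: exploit the highest-valued action most of the time, but explore a uniformly random action with small probability epsilon. For instance, with q_values = {"favourite": 4.2, "new place": 0.0} the agent mostly goes to its favourite restaurant but occasionally tries the new one.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """q_values: dict mapping action -> current value estimate."""
    if random.random() < epsilon:
        return random.choice(list(q_values))   # explore: uniformly random action
    return max(q_values, key=q_values.get)     # exploit: greedy action
```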

Problems within RL: Prediction and Control
Prediction: evaluate the future
Given a policy
Control: optimise the future
Find the best policy

Problems within RL: Gridworld Example: Prediction
[Figure: (a) a 5x5 gridworld in which state A transitions to A' with reward +10 and state B transitions to B' with reward +5; (b) the value of each state under the uniform random policy:]
 3.3  8.8  4.4  5.3  1.5
 1.5  3.0  2.3  1.9  0.5
 0.1  0.7  0.7  0.4 -0.4
-1.0 -0.4 -0.4 -0.6 -1.2
-1.9 -1.3 -1.2 -1.4 -2.0
What is the value function for the uniform random policy?
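
These values can be reproduced by iterative policy evaluation. A sketch assuming the standard dynamics of this Sutton and Barto gridworld: discount gamma = 0.9; every action in A yields +10 and teleports to A', every action in B yields +5 and teleports to B'; moves off the grid leave the state unchanged with reward -1; all other moves give 0. The cell coordinates below are assumptions matching the figure.

```python
import numpy as np

GAMMA = 0.9
A, A_PRIME = (0, 1), (4, 1)                  # assumed cells for A -> A'
B, B_PRIME = (0, 3), (2, 3)                  # assumed cells for B -> B'
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # N, S, W, E

def step(s, move):
    """One-step dynamics: returns (next state, reward)."""
    if s == A:
        return A_PRIME, 10.0
    if s == B:
        return B_PRIME, 5.0
    r, c = s[0] + move[0], s[1] + move[1]
    if 0 <= r < 5 and 0 <= c < 5:
        return (r, c), 0.0
    return s, -1.0                           # off the grid: stay put, reward -1

def evaluate_uniform_random_policy(tol=1e-6):
    """Sweep the Bellman expectation backup until the values stop changing."""
    v = np.zeros((5, 5))
    while True:
        v_new = np.zeros_like(v)
        for r in range(5):
            for c in range(5):
                for move in MOVES:           # each action has probability 1/4
                    (r2, c2), reward = step((r, c), move)
                    v_new[r, c] += 0.25 * (reward + GAMMA * v[r2, c2])
        if np.abs(v_new - v).max() < tol:
            return v_new                     # rounds to the grid shown above
        v = v_new
```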

Problems within RL: Gridworld Example: Control
[Figure: (a) the same gridworld; (b) the optimal value function v*; (c) the optimal policy π*. The optimal values:]
22.0 24.4 22.0 19.4 17.5
19.8 22.0 19.8 17.8 16.0
17.8 19.8 17.8 16.0 14.4
16.0 17.8 16.0 14.4 13.0
14.4 16.0 14.4 13.0 11.7
What is the optimal value function over all possible policies?
What is the optimal policy?
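
Swapping the expectation over actions for a maximisation turns prediction into control. A sketch of value iteration, reusing step, MOVES, GAMMA, and numpy from the prediction sketch above; the optimal policy can then be read off by taking the argmax action in each state.

```python
def value_iteration(tol=1e-6):
    """Bellman optimality backup: v(s) <- max_a [ r + GAMMA * v(s') ]."""
    v = np.zeros((5, 5))
    while True:
        v_new = np.zeros_like(v)
        for r in range(5):
            for c in range(5):
                v_new[r, c] = max(reward + GAMMA * v[r2, c2]
                                  for (r2, c2), reward in
                                  (step((r, c), m) for m in MOVES))
        if np.abs(v_new - v).max() < tol:
            return v_new                     # converges to the v* grid above
        v = v_new
```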

Course Outline
Part I: Elementary Reinforcement Learning
1 Introduction to RL
2 Markov Decision Processes
3 Planning by Dynamic Programming
4 Model-Free Prediction
5 Model-Free Control
Part II: Reinforcement Learning in Practice
1 Value Function Approximation
2 Policy Gradient Methods
3 Integrating Learning and Planning
4 Exploration and Exploitation
5 Case study - RL in games
