Deep Reinforcement Learning: Q-Learning


Deep Reinforcement Learning: Q-Learning
Garima Lalwani, Karan Ganju, Unnat Jain

Today’s takeaways: Bonus RL recap, Function Approximation, Deep Q Network, Double Deep Q Network, Dueling Networks, Recurrent DQN, Solving “Doom”, Hierarchical DQN

Q-Learning recap: David Silver’s Introduction to RL lectures; Pieter Abbeel’s Artificial Intelligence course, Berkeley (Spring 2015)

Today’s takeaways: Bonus RL recap, Function Approximation, Deep Q Network, Double Deep Q Network, Dueling Networks, Recurrent DQN, Solving “Doom”, Hierarchical DQN

Function Approximation - Why?
Value functions: every state s has an entry V(s), and every state-action pair (s, a) has an entry Q(s, a). How do we get Q(s, a)? Table lookup. What about large MDPs? Estimate the value function with function approximation, and generalise from seen states to unseen states.

Function Approximation - How? Why Q? How to approximate?
- Features for state-action pair (s, a): x(s, a)
- Linear model: Q(s, a) ≈ w^T x(s, a)
- Deep Neural Nets (CS598): Q(s, a) ≈ NN(s, a)
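As a minimal illustration of the linear case above, a semi-gradient Q-learning update for Q(s, a) ≈ w^T x(s, a) can be sketched as follows; the 4-dimensional features are made up for the example:

import numpy as np

def q_value(w, x):
    """Linear value-function approximation: Q(s, a) ~= w^T x(s, a)."""
    return np.dot(w, x)

def q_learning_update(w, x, r, x_next_best, alpha=0.1, gamma=0.99):
    """One SGD-style update toward the TD target r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * q_value(w, x_next_best)   # x_next_best: features of the greedy action in s'
    td_error = td_target - q_value(w, x)
    return w + alpha * td_error * x                   # gradient of Q w.r.t. w is just x for a linear model

# Hypothetical 4-dimensional features for a (state, action) pair:
w = np.zeros(4)
x_sa, x_next = np.array([1.0, 0.5, 0.0, 1.0]), np.array([0.0, 1.0, 1.0, 0.0])
w = q_learning_update(w, x_sa, r=1.0, x_next_best=x_next)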

Function Approximation - Demo

Today’s takeaways: Bonus RL recap, Function Approximation, Deep Q Network, Double Deep Q Network, Dueling Networks, Recurrent DQN, Solving “Doom”, Hierarchical DQN

Deep Q Network
1) Input: 4 images (the current frame and the 3 previous frames), stacked to form the state s.
2) Output: Q(s, a_i) for every action, i.e. Q(s, a1), Q(s, a2), Q(s, a3), ..., Q(s, a18).
Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
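For concreteness, a minimal PyTorch sketch of such a network; the transcription only specifies the 4-frame input and the 18 Q-value outputs, so the layer sizes below follow the standard Nature-DQN architecture and 84x84 Atari preprocessing:

import torch
import torch.nn as nn

class DQN(nn.Module):
    """Nature-DQN-style network: 4 stacked 84x84 frames in, one Q-value per action out."""
    def __init__(self, num_actions=18):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),   # Q(s, a1) ... Q(s, a18)
        )

    def forward(self, s):
        return self.head(self.features(s))

q_net = DQN(num_actions=18)
frames = torch.zeros(1, 4, 84, 84)   # a batch of one stacked state
print(q_net(frames).shape)           # torch.Size([1, 18])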

Supervised SGD (lec 2) vs Q-Learning SGD
SGD update assuming supervision, versus the SGD update for Q-Learning.
David Silver’s Deep Learning Tutorial, ICML 2016
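The two updates contrasted on this slide are not reproduced in the transcription; in their standard form they are, for a known supervised target q*(s, a) versus the Q-learning TD target:

\Delta w = \alpha \, \big( q^{*}(s,a) - Q(s,a;w) \big) \, \nabla_w Q(s,a;w)

\Delta w = \alpha \, \big( r + \gamma \max_{a'} Q(s',a';w) - Q(s,a;w) \big) \, \nabla_w Q(s,a;w)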

Training tricks
Issues:
a. Data is sequential: successive samples are correlated (non-iid), and an experience is visited only once in online learning.
b. The policy changes rapidly with slight changes to Q-values, so the policy may oscillate.

Solution to (a): ‘Experience Replay’: work on a dataset, sampling randomly and repeatedly.
- Build the dataset: take action a_t according to the ε-greedy policy and store the transition/experience (s_t, a_t, r_{t+1}, s_{t+1}) in dataset D (the ‘replay memory’).
- Sample a random mini-batch (32 experiences) of (s, a, r, s') from D.
Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
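A minimal sketch of the replay memory described above; the capacity here is illustrative (the paper stores on the order of a million transitions and samples mini-batches of 32):

import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (s, a, r, s_next, done) transitions, sampled uniformly at random."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        # Random, repeated sampling breaks the correlation between successive frames.
        return random.sample(self.buffer, batch_size)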

Solution to (b): ‘Target Network’: stale updates, with a C-step delay between updates of Q and its use as targets.
- Network 1 (older weights w_{i-1}) provides the Q(s, a) targets and is held fixed.
- Network 2 (weights w_i) has its Q-values updated at every SGD step.
- After C steps (10,000 SGD updates), the target network is refreshed with the current weights.
Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
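Putting the two tricks together, one SGD step might look like the sketch below. It assumes the DQN and ReplayMemory sketches above and a mini-batch already converted to tensors; the optimizer and loss choices are illustrative, not the paper’s exact settings:

import copy
import torch
import torch.nn.functional as F

# Sketch of one DQN training step, assuming `q_net` is the online network from the earlier sketch.
target_net = copy.deepcopy(q_net)                       # Network 1: frozen copy for targets
optimizer = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)

def dqn_step(batch, gamma=0.99):
    s, a, r, s_next, done = batch                       # a: LongTensor of action indices
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)            # Q(s, a; w_i)
    with torch.no_grad():                                           # stale targets from w_{i-1}
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    loss = F.smooth_l1_loss(q_sa, target)               # Huber-style loss as an illustrative choice
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Every C SGD steps (10,000 in the paper), refresh the stale copy:
# target_net.load_state_dict(q_net.state_dict())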

DQN: Results
Why not just use VGGNet features?
Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.

Today’s takeaways: Bonus RL recap, Function Approximation, Deep Q Network, Double Deep Q Network, Dueling Networks, Recurrent DQN, Solving “Doom”, Hierarchical DQN

Q-Learning for Roulette
Hasselt, Hado V. "Double Q-learning." In Advances in Neural Information Processing Systems, pp. 2613-2621. 2010.

Q-Learning Overestimation: Function Approximation
[Figure: the learned Q estimate vs. the actual Q value]
Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." In AAAI, pp. 2094-2100. 2016.

One (Estimator) Isn’t Good Enough? Use two estimators.

Double Q-Learning
Two estimators:
- Estimator Q1: obtain the best action.
- Estimator Q2: evaluate Q for that action.
The chance of both estimators overestimating at the same action is smaller.
Q target vs. Double Q target: Q1 selects, Q2 evaluates.
Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." In AAAI, pp. 2094-2100. 2016.
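A sketch of the two targets named on the slide, reusing q_net (online) and target_net from the earlier sketches; the Double DQN target selects the action with the online network and evaluates it with the target network:

import torch

def dqn_target(r, s_next, done, target_net, gamma=0.99):
    # Q target: r + gamma * max_a' Q_target(s', a')  (the same network selects and evaluates)
    with torch.no_grad():
        return r + gamma * (1 - done) * target_net(s_next).max(dim=1).values

def double_dqn_target(r, s_next, done, q_net, target_net, gamma=0.99):
    # Double Q target: q_net (Q1) picks the best action, target_net (Q2) evaluates it
    with torch.no_grad():
        best_a = q_net(s_next).argmax(dim=1, keepdim=True)
        return r + gamma * (1 - done) * target_net(s_next).gather(1, best_a).squeeze(1)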

Results - All Atari Games
Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." In AAAI, pp. 2094-2100. 2016.

Results - Solves Overestimations
Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." In AAAI, pp. 2094-2100. 2016.

Today’s takeaways: Bonus RL recap, Function Approximation, Deep Q Network, Double Deep Q Network, Dueling Networks, Recurrent DQN, Solving “Doom”, Hierarchical DQN

Pong - Up or Down
Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.

Enduro - Left or Right

Advantage Function
A(s, a) = Q(s, a) - V(s). Learning action values inherently learns both state values and the relative value of each action in that state! We can use this to help generalize learning for the state values.
Wang, Ziyu, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. "Dueling network architectures for deep reinforcement learning." arXiv preprint arXiv:1511.06581 (2015).

Dueling DQN
[Figure: dueling architecture with separate value and advantage streams combined by an aggregating module]
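A sketch of the aggregating module: the two streams are combined as Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)), which could replace the final fully connected layers of the DQN sketch above (layer sizes are illustrative):

import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    def __init__(self, in_features=512, num_actions=18):
        super().__init__()
        self.value = nn.Linear(in_features, 1)                 # V(s) stream
        self.advantage = nn.Linear(in_features, num_actions)   # A(s,a) stream

    def forward(self, h):
        v = self.value(h)                            # shape (batch, 1)
        a = self.advantage(h)                        # shape (batch, num_actions)
        return v + a - a.mean(dim=1, keepdim=True)   # aggregating module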

Results
Where does V(s) attend to? Where does A(s,a) attend to?
Wang, Ziyu, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. "Dueling network architectures for deep reinforcement learning." arXiv preprint arXiv:1511.06581 (2015).

Results
Improvements of the dueling architecture over the Prioritized DDQN baseline, measured by the metric above, over 57 Atari games.
Wang, Ziyu, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. "Dueling network architectures for deep reinforcement learning." arXiv preprint arXiv:1511.06581 (2015).

Today’s takeaways: Bonus RL recap, Function Approximation, Deep Q Network, Double Deep Q Network, Dueling Networks, Recurrent DQN, Solving “Doom”, Hierarchical DQN

Moving to more General and Complex Games
All games may not be representable using MDPs; some may be POMDPs: FPS shooter games, Scrabble, even Atari games. Is the entire history a solution? LSTMs!

Deep Recurrent Q-Learning
[Figure: the DQN convolutional stack followed by an LSTM (hidden states h1, h2, h3, ...) in place of the first fully connected (FC) layer]
Hausknecht, Matthew, and Peter Stone. "Deep recurrent q-learning for partially observable mdps." arXiv preprint arXiv:1507.06527 (2015).
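A minimal sketch of the idea: feed one frame per time step through the DQN convolutional stack, replace the first fully connected layer with an LSTM, and read Q(h_t, a) from the hidden state (layer sizes follow the earlier DQN sketch and are illustrative):

import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Sketch of a DRQN: the conv stack feeds an LSTM, and Q-values are read from h_t."""
    def __init__(self, num_actions=18, hidden=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.lstm = nn.LSTM(64 * 7 * 7, hidden, batch_first=True)
        self.q = nn.Linear(hidden, num_actions)

    def forward(self, frames, state=None):
        # frames: (batch, time, 1, 84, 84) -- one observation per step instead of a 4-frame stack
        b, t = frames.shape[:2]
        z = self.conv(frames.reshape(b * t, *frames.shape[2:])).reshape(b, t, -1)
        h, state = self.lstm(z, state)
        return self.q(h), state      # Q(h_t, a) at every time step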

DRQN Results: Misses, Paddle Deflections, Wall Deflections
Hausknecht, Matthew, and Peter Stone. "Deep recurrent q-learning for partially observable mdps." arXiv preprint arXiv:1507.06527 (2015).

Results - Robustness to partial observability
[Figure: performance under POMDP vs. MDP conditions]
Hausknecht, Matthew, and Peter Stone. "Deep recurrent q-learning for partially observable mdps." arXiv preprint arXiv:1507.06527 (2015).

Today’s takeaways: Bonus RL recap, Function Approximation, Deep Q Network, Double Deep Q Network, Dueling Networks, Recurrent DQN, Solving “Doom”, Hierarchical DQN

Application of DRQN: Playing ‘Doom’
Lample, Guillaume, and Devendra Singh Chaplot. "Playing FPS games with deep reinforcement learning."

Doom Demo

How does DRQN help?
- Observe o_t instead of s_t: limited field of view.
- Instead of estimating Q(s_t, a_t), estimate Q(h_t, a_t) where h_t = LSTM(h_{t-1}, o_t).

Architecture: Comparison with Baseline DRQN

Training Tricks
Jointly training the DRQN model and game feature detection. What do you think is the advantage of this? The CNN layers capture relevant information about features of the game that maximise action value scores.

Modular Architecture
- Enemy spotted: the Action Network (a DRQN) takes over.
- All clear: the Navigation Network (a DQN) takes over.

Modular Network: Advantages
- Can be trained and tested independently.
- Both can be trained in parallel.
- Reduces the state-action space: faster training.
- Mitigates camper behavior: the “tendency to stay in one area of the map and wait for enemies”.

Rewards Formulation for Doom
What do you think? Positive rewards for kills and negative rewards for suicides. Small intermediate rewards:
- Positive reward for object pickup
- Negative reward for losing health
- Negative reward for shooting or losing ammo
- Small positive reward proportional to the distance travelled since the last step (so the agent avoids running in circles)
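As an illustration only, such a shaped reward could be assembled from game variables as below; every coefficient value is an invented placeholder, not the paper’s setting:

# Illustrative shaped reward following the list above; coefficients are placeholders.
def doom_reward(prev, cur, kills=0, suicides=0, pickups=0,
                kill_bonus=1.0, suicide_penalty=1.0, pickup_bonus=0.05,
                health_coef=0.05, ammo_coef=0.05, dist_coef=1e-4):
    r = kill_bonus * kills - suicide_penalty * suicides + pickup_bonus * pickups
    r -= health_coef * max(0.0, prev["health"] - cur["health"])   # losing health
    r -= ammo_coef * max(0.0, prev["ammo"] - cur["ammo"])         # shooting / losing ammo
    r += dist_coef * cur["distance_since_last_step"]              # discourages running in circles
    return r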

Performance with Separate Navigation Network

Results

Today’s takeaways: Bonus RL recap, Function Approximation, Deep Q Network, Double Deep Q Network, Dueling Networks, Recurrent DQN, Solving “Doom”, Hierarchical DQN

h-DQN

Double DQN
Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." In AAAI, pp. 2094-2100. 2016.

Dueling Networks
Wang, Ziyu, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. "Dueling network architectures for deep reinforcement learning." arXiv preprint arXiv:1511.06581 (2015).

How is this game different?
- Complex game environment
- Sparse and longer-range delayed rewards
- Insufficient exploration: we need temporally extended exploration
Solution: dividing the extrinsic goal into hierarchical intrinsic subgoals.
Kulkarni, Tejas D., Karthik Narasimhan, Ardavan Saeedi and Joshua B. Tenenbaum. "Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation." NIPS 2016.

Intrinsic Goals in Montezuma’s Revenge
Kulkarni, Tejas D., Karthik Narasimhan, Ardavan Saeedi and Joshua B. Tenenbaum. "Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation." NIPS 2016.

Hierarchy of DQNs
[Figure: the agent-environment loop, with a meta-controller DQN choosing goals and a controller DQN choosing actions]

Architecture Block for h-DQN
Kulkarni, Tejas D., Karthik Narasimhan, Ardavan Saeedi and Joshua B. Tenenbaum. "Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation." NIPS 2016.

h-DQN Learning Framework (1)
- V(s, g): value function of a state for achieving a given goal g ∈ G.
- Option: a multi-step action policy to achieve an intrinsic goal g ∈ G (options can also be primitive actions); a policy over options is used to achieve goal g.
- The agent learns which intrinsic goals are important and the correct sequence of such policies.
Kulkarni, Tejas D., Karthik Narasimhan, Ardavan Saeedi and Joshua B. Tenenbaum. "Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation." NIPS 2016.

h-DQN Learning Framework (2)
- Objective function for the meta-controller: maximise the cumulative extrinsic reward F_t.
- Objective function for the controller: maximise the cumulative intrinsic reward R_t.
Kulkarni, Tejas D., Karthik Narasimhan, Ardavan Saeedi and Joshua B. Tenenbaum. "Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation." NIPS 2016.

Training
- Two disjoint memories D1 and D2 for experience replay: experiences (s_t, g_t, f_t, s_{t+N}) for Q2 are stored in D2, and experiences (s_t, a_t, g_t, r_t, s_{t+1}) for Q1 are stored in D1.
- Different time scales: transitions for the controller (Q1) are collected at every time step, while transitions for the meta-controller (Q2) are collected only when the controller terminates, on reaching the intrinsic goal or at the end of the episode.
Kulkarni, Tejas D., Karthik Narasimhan, Ardavan Saeedi and Joshua B. Tenenbaum. "Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation." NIPS 2016.
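A rough sketch of the resulting two-timescale data collection; meta_q, ctrl_q, env, D1, D2 and their methods are placeholders standing in for the meta-controller, controller, game interface and replay memories, not an actual API:

def run_episode(env, meta_q, ctrl_q, goals, D1, D2, epsilon=0.1):
    s = env.reset()
    done = False
    while not done:
        g = meta_q.select_goal(s, goals, epsilon)        # meta-controller (Q2) picks an intrinsic goal g
        s0, F_extrinsic = s, 0.0
        reached = False
        while not (done or reached):
            a = ctrl_q.select_action(s, g, epsilon)      # controller (Q1) acts toward g
            s_next, f, done = env.step(a)                # f: extrinsic reward from the game
            reached = env.goal_reached(g)
            r = 1.0 if reached else 0.0                  # intrinsic reward from the internal critic
            D1.store((s, a, g, r, s_next))               # controller transition: every time step
            F_extrinsic += f
            s = s_next
        D2.store((s0, g, F_extrinsic, s))                # meta-controller transition: per completed goal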

Results:
Kulkarni, Tejas D., Karthik Narasimhan, Ardavan Saeedi and Joshua B. Tenenbaum. "Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation." NIPS 2016.

Today’s takeaways: Bonus RL recap, Function Approximation, Deep Q Network, Double Deep Q Network, Dueling Networks, Recurrent DQN, Solving “Doom”, Hierarchical DQN

References
Basic RL:
- David Silver's Introduction to RL lectures
- Pieter Abbeel's Artificial Intelligence - Berkeley (Spring 2015)
DQN:
- Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
- Mnih, Volodymyr, et al. "Playing Atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
DDQN:
- Hasselt, Hado V. "Double Q-learning." In Advances in Neural Information Processing Systems, pp. 2613-2621. 2010.
- Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." In AAAI, pp. 2094-2100. 2016.
Dueling DQN:
- Wang, Ziyu, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. "Dueling network architectures for deep reinforcement learning." arXiv preprint arXiv:1511.06581 (2015).

References
DRQN:
- Hausknecht, Matthew, and Peter Stone. "Deep recurrent q-learning for partially observable mdps." arXiv preprint arXiv:1507.06527 (2015).
Doom:
- Lample, Guillaume, and Devendra Singh Chaplot. "Playing FPS games with deep reinforcement learning."
h-DQN:
- Kulkarni, Tejas D., Karthik Narasimhan, Ardavan Saeedi and Joshua B. Tenenbaum. "Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation." NIPS 2016.
Additional NLP/Vision applications:
- Narasimhan, Karthik, Tejas Kulkarni, and Regina Barzilay. "Language understanding for text-based games using deep reinforcement learning." EMNLP 2015.
- Caicedo, Juan C., and Svetlana Lazebnik. "Active object localization with deep reinforcement learning." Proceedings of the IEEE International Conference on Computer Vision. 2015.
- Zhu, Yuke, et al. "Target-driven visual navigation in indoor scenes using deep reinforcement learning." arXiv preprint arXiv:1609.05143 (2016).

Deep Q-Learning for text-based games
Narasimhan, Karthik, Tejas Kulkarni, and Regina Barzilay. "Language understanding for text-based games using deep reinforcement learning." EMNLP 2015.

Text-Based Games: Back in the 1970s
Predecessors to modern graphical games; MUDs (Multi-User Dungeon games) are still prevalent.
Narasimhan, Karthik, Tejas Kulkarni, and Regina Barzilay. "Language understanding for text-based games using deep reinforcement learning." EMNLP 2015.

State Spaces and Action Spaces
- Hidden state space h ∈ H, but a textual description is given: ψ : H → S.
- Actions are commands (action-object pairs), A = {(a, o)}; T_{h,h'}(a, o) are the transition probabilities.
- Jointly learn state representations and control policies; the learned strategy/policy directly builds on the text interpretation.
Narasimhan, Karthik, Tejas Kulkarni, and Regina Barzilay. "Language understanding for text-based games using deep reinforcement learning." EMNLP 2015.

Learning Representations and Control Policies
Narasimhan, Karthik, Tejas Kulkarni, and Regina Barzilay. "Language understanding for text-based games using deep reinforcement learning." EMNLP 2015.
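A rough sketch in the spirit of the paper’s LSTM-DQN: an LSTM encodes the textual description into a state representation, from which separate heads score actions and objects (all dimensions here are illustrative):

import torch
import torch.nn as nn

class LSTMDQN(nn.Module):
    """Sketch of an LSTM-DQN for text games: encode the description, then score actions and objects."""
    def __init__(self, vocab_size, num_actions, num_objects, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.q_action = nn.Linear(hidden, num_actions)   # Q(s, a)
        self.q_object = nn.Linear(hidden, num_objects)   # Q(s, o)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))   # tokens: (batch, words) word indices
        s_rep = h.mean(dim=1)                  # mean-pool word states into a state representation
        return self.q_action(s_rep), self.q_object(s_rep)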

Results (1): Learnt useful representations for the game.
Narasimhan, Karthik, Tejas Kulkarni, and Regina Barzilay. "Language understanding for text-based games using deep reinforcement learning." EMNLP 2015.

Results (2):
Narasimhan, Karthik, Tejas Kulkarni, and Regina Barzilay. "Language understanding for text-based games using deep reinforcement learning." EMNLP 2015.

Today’s takeaways: Bonus RL recap, Function Approximation, Deep Q Network, Double Deep Q Network, Dueling Networks, Recurrent DQN, Solving “Doom”, Hierarchical DQN. More applications: Text-based games, Object Detection, Indoor Navigation.

Object detection as an RL problem?
- States: the fc6 feature of a pretrained VGG19 for the current bounding box.
- Actions: relative translations c*(x2-x1), c*(y2-y1); scale; aspect ratio; a trigger action when IoU is high.
- Reward: based on whether each action improves the IoU with the ground-truth box, with a larger terminal reward for the trigger (the figure itself is not transcribed).
J. Caicedo and S. Lazebnik, ICCV 2015

Object detection as an RL problem?
State (s): the current bounding box, plus a history of past actions. Outputs: Q(s, a1 = scale up), Q(s, a2 = scale down), Q(s, a3 = shift left), ..., Q(s, a9 = trigger).
J. Caicedo and S. Lazebnik, ICCV 2015
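A sketch of the slide’s framing: a Q-network scores the 9 box-transformation actions from the region’s fc6 feature concatenated with a short history of past actions (layer sizes and history length are assumptions for illustration):

import torch
import torch.nn as nn

NUM_ACTIONS = 9          # translations, scale, aspect-ratio changes, and a trigger
HISTORY_STEPS = 10       # illustrative length of the past-action history

class BoxRefinementQ(nn.Module):
    """Sketch: Q-values over box-transformation actions from a region feature plus action history."""
    def __init__(self, feat_dim=4096):   # fc6 of VGG19 is 4096-dimensional
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + HISTORY_STEPS * NUM_ACTIONS, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, NUM_ACTIONS),
        )

    def forward(self, region_feat, action_history):
        # region_feat: (batch, 4096) fc6 feature of the current box
        # action_history: (batch, HISTORY_STEPS, NUM_ACTIONS) one-hot past actions
        return self.mlp(torch.cat([region_feat, action_history.flatten(1)], dim=1))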

Object detection as an RL problem? Fine details:
- Class-specific, attention-action model.
- Does not follow a fixed sliding-window trajectory; the trajectory is image dependent.
- Uses a 16-pixel neighbourhood to incorporate context.
J. Caicedo and S. Lazebnik, ICCV 2015

Object detection as an RL problem?
J. Caicedo and S. Lazebnik, ICCV 2015

Today’s takeaways: Bonus RL recap, Function Approximation, Deep Q Network, Double Deep Q Network, Dueling Networks, Recurrent DQN, Solving “Doom”, Hierarchical DQN. More applications: Text-based games, Object Detection, Indoor Navigation.

Navigation as an RL problem?
- States: ResNet-50 features (of the current frame and the target frame).
- Actions: forward/backward 0.5 m, turn left/right 90 deg, trigger.
“Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning”, Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi

Navigation as an RL problem?
State (s): the current frame and the target frame. Outputs: Q(s, a1 = forward), Q(s, a2 = backward), Q(s, a3 = turn left), Q(s, a4 = turn right), ..., Q(s, a6 = trigger).
“Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning”, Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi
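A minimal sketch of the Q(s, a) framing used on these slides, scoring actions from ResNet-50 features of the current and target frames; the paper’s full model is richer, and the dimensions and exact action list here are assumptions:

import torch
import torch.nn as nn

ACTIONS = ["forward", "backward", "turn left", "turn right", "trigger"]  # per the slide; exact set assumed

class TargetDrivenQ(nn.Module):
    """Sketch: Q-values over navigation actions from current-frame and target-frame features."""
    def __init__(self, feat_dim=2048, num_actions=len(ACTIONS)):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 512), nn.ReLU(),
            nn.Linear(512, num_actions),
        )

    def forward(self, current_feat, target_feat):
        # current_feat, target_feat: (batch, 2048) ResNet-50 features
        return self.mlp(torch.cat([current_feat, target_feat], dim=1))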

Navigation as an RL problem?
Simulated environment vs. real environment.

Today’s takeaways: Bonus RL recap, Function Approximation, Deep Q Network, Double Deep Q Network, Dueling Networks, Recurrent DQN, Solving “Doom”, Hierarchical DQN. More applications: Text-based games, Object Detection, Indoor Navigation.

Q-Learning Overestimation: Function Approximation
Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." In AAAI, pp. 2094-2100. 2016.

Q-Learning Overestimation: Intuition [Jensen’s Inequality]
What we estimate: E[max_a Q(s,a)]. What we want: max_a E[Q(s,a)]. Since max is convex, Jensen’s inequality gives E[max_a Q(s,a)] ≥ max_a E[Q(s,a)], so taking the max over noisy Q estimates is biased upward.

Double Q-Learning: Function Approximation
Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." In AAAI, pp. 2094-2100. 2016.

Results
Mean and median scores across all 57 Atari games, measured in percentages of human performance.
Wang, Ziyu, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. "Dueling network architectures for deep reinforcement learning." arXiv preprint arXiv:1511.06581 (2015).

Results: Comparison to 10-frame DQN
DRQN captures in one frame (and its history state) what DQN captures in a stack of 10 for Flickering Pong:
- 10-frame DQN conv-1 captures paddle information.
- 10-frame DQN conv-2 captures paddle and ball-direction information.
- 10-frame DQN conv-3 captures paddle, ball direction, velocity and deflection information.
Hausknecht, Matthew, and Peter Stone. "Deep recurrent q-learning for partially observable mdps." arXiv preprint arXiv:1507.06527 (2015).

Results: Comparison to 10-frame DQN
Scores are comparable to 10-frame DQN, outperforming in some games and losing in some.
Hausknecht, Matthew, and Peter Stone. "Deep recurrent q-learning for partially observable mdps." arXiv preprint arXiv:1507.06527 (2015).
