Applying Reinforcement Learning On Automated Cryptocurrency Trading


Applying Reinforcement Learning on Automated Cryptocurrency Trading

COMP4971C - Independent Work (Fall 2020)
December 2020

CHONG, Cheuk Hei
Supervised by Dr. David Rossiter
Department of Computer Science and Engineering, HKUST

Abstract

In this research project, we examine the feasibility of automated cryptocurrency trading using reinforcement learning, in which the machine learns an optimal trading policy by itself. Technical indicators are also added to the model to increase the chance of taking a better action in every state. The model's performance is then evaluated to verify the feasibility of implementing a trading bot with reinforcement learning in a practical scenario.

Contents

1 Introduction
2 Disclaimer
3 Related Work
  3.1 Technical Analysis
  3.2 RNN
  3.3 LSTM
4 Data
  4.1 Finding Price History Datasets
  4.2 Time Interval of Dataset
  4.3 Adding Technical Indicators
  4.4 Data Normalization
5 Methodology
  5.1 Advantages of Reinforcement Learning
  5.2 Architecture of Reinforcement Learning
  5.3 Model Settings
    5.3.1 Environment
    5.3.2 State
    5.3.3 Action
    5.3.4 Rewards
    5.3.5 Agent
    5.3.6 Neural Network
6 Experiment
  6.1 Experiment Environment
  6.2 Training and Testing Data
  6.3 Hyper-parameter Setting
  6.4 Evaluation
    6.4.1 Training Result
    6.4.2 Testing Result
    6.4.3 Outperforming in the Coronavirus Period
7 Future Extension
  7.1 Strategy Selections
  7.2 Exchange API Integration
  7.3 Sentiment Analysis
8 Conclusion

1 Introduction

With the rise of blockchain technology in recent years, more people are discussing cryptocurrencies and starting to enter the cryptocurrency trading market. Given its high price volatility and 24/7 trading hours, trading cryptocurrencies can be a high-risk activity when people lack the self-discipline to follow strict trading policies and do not monitor price movements regularly. FOMO traders may suffer huge losses when they watch prices fall rapidly.

Therefore, to earn profits on cryptocurrency trading with optimal effort, it is necessary to implement an automated trading algorithm that can execute buy, sell and hold actions at the right time by adopting a "buy low, sell high" strategy. In this project, we adopt the concepts of reinforcement learning for trading. The trained model replaces humans in performing actions, without being affected by emotions. We also examine the prediction power of reinforcement learning and evaluate its effectiveness quantitatively.

2 Disclaimer

The information presented in this research is not intended as, and shall not be understood as, financial advice to enter into any security transactions or to engage in any investment strategies.

3 Related Work

There are several approaches that apply different methods to cryptocurrency price prediction. Based on these approaches, people obtain the trend of the price movement or a predicted price of a certain cryptocurrency from a trained algorithm. The following are some approaches to price prediction:

3.1 Technical Analysis

Technical analysis is a way of forecasting the general movement of the price in the future based on previous price movements, and it can be applied to any trading instrument. Chart analysis is frequently used nowadays to identify the trend of price movements, and people have invented different technical indicators to understand price behaviour in a quantitative way. The following are examples of some basic technical indicators:

Technical Indicator | Type | Description

Moving Average Convergence Divergence (MACD) | Trend | Shows the relationship between two moving averages of a price and whether the momentum is increasing or decreasing.

Relative Strength Index (RSI) | Momentum | Measures the speed of the price movement and the trading strength, and helps evaluate whether the asset is overbought or oversold:

RSI = 100 - \frac{100}{1 + \frac{\frac{1}{n}\sum \text{up}}{\frac{1}{n}\sum \text{down}}}    (1)

Bollinger Bands (B.B.) | Volatility | Define the upper and lower boundaries of the price in the extreme short term; traders can take advantage of them during oversold conditions:

UpperBB = MA + D \sqrt{\frac{\sum_{i=1}^{n} (y_i - MA)^2}{n}}    (2)

LowerBB = MA - D \sqrt{\frac{\sum_{i=1}^{n} (y_i - MA)^2}{n}}    (3)

On-Balance Volume (OBV) | Volume | Utilizes the flow of trading volume to predict price changes; bullish and bearish divergence can predict whether the price will break resistance:

OBV = OBV_{prev} + \begin{cases} vol, & \text{if } close > close_{prev} \\ 0, & \text{if } close = close_{prev} \\ -vol, & \text{if } close < close_{prev} \end{cases}    (4)

Average True Range (ATR) | Volatility | Discovers the degree of price volatility:

ATR = \frac{1}{n} \sum_{i=1}^{n} TR_i    (5)

However, using only technical analysis is not adequate for earning profits, as it might be too late to reflect the trend. For example, MACD and B.B. are lagging indicators: a substantial portion of the move has already happened by the time these indicators start to reflect the trend. Also, human bias might be involved in the analysis; it is possible to use different indicators to draw different conclusions even from the same chart. Therefore, technical analysis is only a reference for traders.

3.2 RNN

RNN stands for Recurrent Neural Network, in which the network takes the output of the previous step as part of the input of the current step to make decisions. It is commonly used for predicting sequential data and is applied in Natural Language Processing (NLP), stock prediction, and so on.

In RNNs, there is an "internal" state which is updated as the sequence is processed. First, the sequence of input vectors x is fed into the RNN cells. A recurrence formula is then applied at every time step, which uses the input x_t and the old state h_{t-1} as the parameters to evaluate the new state for the next RNN cell:

h_t = f_W(h_{t-1}, x_t)    (6)

Assuming tanh is used as the activation function, the value of the internal state h_t and the output y_t can be expressed as:

h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)    (7)

y_t = W_{hy} h_t    (8)

This procedure continues until all time steps are completed. Finally, backpropagation is performed from state h_t to state h_{t-1}, multiplying by W_{hh} at each step. As an RNN can store information from every time step, it is commonly used for predicting time-series data. However, computing the gradient involves many multiplications by W and repeated tanh derivatives at each time step. The gradient therefore vanishes rapidly (\partial h_t / \partial h_{t-1} < 1) and parameter updates become insignificant. Hence, RNNs are difficult to train on long sequences. Trading price prediction involves long sequential data, so a plain RNN is not desirable.

Figure 1: Workflow of Recurrent Neural Network (RNN)
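As a minimal illustration of the recurrence in equations (6)-(8), the following NumPy sketch rolls a toy RNN forward over a random sequence; all dimensions, weights and data are illustrative assumptions, not part of the original project.

```python
import numpy as np

# Toy forward pass of a vanilla RNN, equations (7)-(8).
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, seq_len = 4, 8, 1, 10

W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))

x_seq = rng.normal(size=(seq_len, input_dim))   # toy input sequence
h = np.zeros(hidden_dim)                        # initial internal state
outputs = []

for x_t in x_seq:
    h = np.tanh(W_hh @ h + W_xh @ x_t)          # eq. (7): update internal state
    outputs.append(W_hy @ h)                    # eq. (8): output at this step
```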

3.3 LSTM

LSTM stands for Long Short-Term Memory network. It is a type of Recurrent Neural Network (RNN) which solves the vanishing gradient problem and is capable of learning long-term sequential data. An LSTM has a structure similar to an RNN, but it adds a cell state and gates to control the data flow.

Cell State c. It allows information from the past several time steps to pass through the network unchanged.

Forget Gate f. It determines whether information will be erased from the cell state. Data flow is controlled by the sigmoid function.

f_t = \sigma(W_f [h_{t-1}, x_t])    (9)

Input Gate i. It uses a sigmoid layer which decides whether to write to the cell. A tanh layer is then applied to calculate \hat{c}_t, which is used to update the cell state. Finally, c_t is calculated to replace the old cell state.

i_t = \sigma(W_i [h_{t-1}, x_t])    (10)

\hat{c}_t = \tanh(W_c [h_{t-1}, x_t])    (11)

c_t = f_t \odot c_{t-1} + i_t \odot \hat{c}_t    (12)

Output Gate o. It decides how much of the cell to reveal. After passing through the sigmoid layer, it is multiplied by the value of the current cell state passed through a tanh layer to calculate the hidden state h_t.

o_t = \sigma(W_o [h_{t-1}, x_t])    (13)

h_t = o_t \odot \tanh(c_t)    (14)

For backpropagation, the gradient passes backward from c_t to c_{t-1} without matrix multiplication by W, so the gradient flow is uninterrupted and the vanishing gradient problem is prevented.

With this advantage, most trading prediction applications use LSTMs. However, an LSTM only predicts the price, without the involvement of experience replay. Therefore, we use a reinforcement learning approach to evaluate the effectiveness of each action taken in trading.

Figure 2: Workflow of Long Short-Term Memory (LSTM)
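The gate equations (9)-(14) can be written directly as a single NumPy step, as in the sketch below; biases are omitted to match the report's equations, and the weight shapes are left to the caller.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o):
    """One LSTM step following equations (9)-(14); each W acts on [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])       # concatenated [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z)                  # forget gate, eq. (9)
    i_t = sigmoid(W_i @ z)                  # input gate, eq. (10)
    c_hat = np.tanh(W_c @ z)                # candidate cell state, eq. (11)
    c_t = f_t * c_prev + i_t * c_hat        # new cell state, eq. (12)
    o_t = sigmoid(W_o @ z)                  # output gate, eq. (13)
    h_t = o_t * np.tanh(c_t)                # new hidden state, eq. (14)
    return h_t, c_t
```

Stacking this step over a sequence gives the forward pass; the key point is that c_t is updated by elementwise operations only, which is why the gradient can flow back through it uninterrupted.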

4 Data

Before introducing the reinforcement learning methodology, the data preparation process, which is of equal importance, is explained in the following parts.

4.1 Finding Price History Datasets

In this project, we consider the most popular cryptocurrency for the portfolio, Bitcoin (BTC), which has a large trading volume so that the price movement cannot easily be controlled by a single party. The daily price movements of BTC were collected from Yahoo Finance in CSV format, covering the period from August 2015 to December 2020. The data includes the daily open price, close price, low price, high price, adjusted close and volume. For simplicity, we only consider the close price and the volume in this project.

Figure 3: Data sample of BTC price from Yahoo Finance, from 07/08/2015 to 17/12/2020
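A short sketch of loading such a Yahoo Finance export with pandas follows; the file name and column labels follow Yahoo's standard CSV format and are assumptions rather than the project's exact code.

```python
import pandas as pd

# Load the daily BTC-USD history exported from Yahoo Finance.
df = pd.read_csv("BTC-USD.csv", parse_dates=["Date"], index_col="Date")

# Keep only the fields used in this project: close price and traded volume.
prices = df[["Close", "Volume"]].dropna()
print(prices.head())
```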

Figure 4: Price of Bitcoin, from 07/08/2015 to 17/12/2020

4.2 Time Interval of Dataset

In the initial plan, as cryptocurrency prices are more volatile than the stock market, the aim was to obtain price movements with a shorter time frame. However, considering the limited access to historical data and the extremely high training time, the price dataset uses a daily interval, which simplifies the training process.

4.3 Adding Technical Indicators

Pricing data alone is not adequate for the agent's neural network to learn the patterns of the movement, so we calculate some relevant technical indicators based on the Yahoo Finance data and add their values as inputs to the neural network. The included indicators are as follows:

Technical Indicator | Type
Moving Average Convergence Divergence (MACD) | Trend
Relative Strength Index (RSI) | Momentum
Bollinger Bands (B.B.) Low | Volatility
Bollinger Bands (B.B.) High | Volatility
On-Balance Volume (OBV) | Volume
Average True Range (ATR) | Volatility

These indicators are of different types: some analyze the trend of the movement, others the volatility of the price or the volume. This data gives the neural network a clearer picture for learning the relationship between the movement trend and the price.
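As an illustration of how these indicators can be derived from the close price and volume, the sketch below uses plain pandas; the window lengths (12/26 for MACD, 14 for RSI, 20 with D = 2 for Bollinger Bands) are common defaults assumed here, since the report does not state them, and ATR is omitted because it also needs the high/low columns.

```python
import numpy as np
import pandas as pd

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Append the indicators listed above using plain pandas (a sketch only)."""
    close, vol = df["Close"], df["Volume"]

    # MACD: difference between 12- and 26-period exponential moving averages.
    df["MACD"] = (close.ewm(span=12, adjust=False).mean()
                  - close.ewm(span=26, adjust=False).mean())

    # RSI over a 14-day window, following equation (1).
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    df["RSI"] = 100 - 100 / (1 + gain / loss)

    # Bollinger Bands with a 20-day moving average and D = 2, equations (2)-(3).
    ma, sd = close.rolling(20).mean(), close.rolling(20).std()
    df["BB_High"], df["BB_Low"] = ma + 2 * sd, ma - 2 * sd

    # On-Balance Volume, equation (4): add volume on up days, subtract on down days.
    df["OBV"] = (vol * np.sign(delta)).cumsum()

    # ATR is skipped here; it additionally requires the High and Low columns.
    return df.dropna()
```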

4.4 Data Normalization

It is necessary to bear in mind that the components of the agent's state, including the number of shares owned, coin prices and cash, are on different scales. With different ranges of values, the gradients may oscillate back and forth in the neural network, which affects the agent's performance. This might cause the agent not to perform the optimal actions and lower the rewards. With data normalization, we standardize the data by removing the mean and scaling to unit variance. In this project, sklearn.preprocessing.StandardScaler has been used to perform the normalization. It improves the gradient flow, which makes the network much easier to train, and increases the rewards by performing optimal actions in each state.

z = \frac{x - \mu}{\sigma}    (15)
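A small sketch of the normalization step with sklearn.preprocessing.StandardScaler follows; the placeholder array of raw state vectors and its dimensions are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder for raw state vectors gathered from the training environment,
# e.g. by letting the agent play randomly; shape is (n_samples, n_features).
sample_states = np.random.rand(1000, 9)

scaler = StandardScaler()
scaler.fit(sample_states)                 # learns per-feature mean mu and std sigma

state = sample_states[0]
normalized = scaler.transform([state])    # applies z = (x - mu) / sigma, equation (15)
```

Fitting the scaler on training-period states only and reusing it at test time avoids leaking information from the test period into training.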

5 Methodology

In the "Related Work" section, we saw some interesting approaches to cryptocurrency price prediction. Although their results could tell humans whether the price will go up or down in the future, if we would like to implement a fully automated robot that is authorized to execute "buy", "sell" or "hold" actions, such supervised approaches might not be the best solution, as they still require human intervention. Therefore, this project implements the automation with reinforcement learning.

5.1 Advantages of Reinforcement Learning

Reinforcement learning is a kind of simulation of human behaviour. When facing an unfamiliar environment, it first tries the possible actions randomly. After getting more experience, it adjusts the policy to correct the errors made before and executes a more optimal action in the next state. The algorithm then keeps improving over more training loops. After training, we also know how the agent behaves in each state. Compared with other prediction methods, RL can solve complex problems which traditional approaches would not be able to solve.

Cryptocurrency trading is a good example for applying RL, as it could teach humans when is the best time to "buy" or "sell" the coin. Also, as the pricing pattern could change in the future, RL can explore possibilities which were not encountered in previous data by learning from its mistakes.

5.2 Architecture of Reinforcement Learning

Reinforcement Learning (RL) is the training of a model by which a machine can make different decisions in certain states. The role of the model can be explained as an agent overcoming a game-like problem. Unlike the RNN/LSTM approach, which predicts future results, the ultimate goal of RL is to take actions at the right time to maximize the rewards under a specified environment.

The reinforcement learning setting involves the following terms:

- Agent: responsible for performing actions
- Environment: the world setting in which the agent performs actions
- State: the current situation faced by the agent
- Action: the set of actions which can be performed in the environment
- Reward: a scalar feedback which reflects how well an action performs in a state s
- Neural Network: the decision-making function learned and used by the agent

For the workflow of reinforcement learning, an environment is first initialized for the agent to observe. After that, the states relevant to the environment act as the input of the agent. After "thinking" with its neural network, the agent takes an action according to its policy. The environment is changed by the executed action, and a reward value is sent back to the agent so that it knows whether the decision was good and can update its current policy.

Figure 5: Illustration of Reinforcement Learning Workflow

5.3 Model Settings

5.3.1 Environment

In this project, in order to simplify the formation of the investment portfolio, the environment assumes that a user trades only one type of cryptocurrency at a time, which is Bitcoin (BTC). A commission fee is also considered in the simulated environment.
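To make the interaction concrete, the sketch below shows how one simulated trading episode could run under this environment; TradingEnv and its methods (reset, step, portfolio_value), as well as the agent's act/remember interface, are hypothetical names for illustration, not the project's actual code.

```python
def run_episode(env, agent, scaler):
    """One pass over the price history: observe, act, collect reward, store experience."""
    state = scaler.transform([env.reset()])[0]        # initial normalized state
    done = False
    while not done:
        action = agent.act(state)                     # 0 = sell, 1 = hold, 2 = buy
        next_state, reward, done = env.step(action)   # environment applies the trade
        next_state = scaler.transform([next_state])[0]
        agent.remember(state, action, reward, next_state, done)  # store for replay
        state = next_state
    return env.portfolio_value()                      # final portfolio value of the episode
```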

5.3.2 State

The state consists of the following components:

- Number of shares of BTC owned
- Open price of BTC
- Cash remaining for purchasing more cryptocurrencies
- Technical Indicator MACD
- Technical Indicator RSI
- Technical Indicator BB Low
- Technical Indicator BB High
- Technical Indicator OBV
- Technical Indicator ATR

For example, when we have 3 BTC, the current price of the cryptocurrency is 228.121, the values of the indicators are 39.667, -10.1075, 268.0265, 202.2588, -92971000 and 9.9250 respectively, and the cash remaining is 0, we can combine those values into a state vector: [3, 228.121, 39.667, -10.1075, 268.0265, 202.2588, -92971000, 9.9250, 0]. This vector is fed into the neural network for training, which is explained further in the "Neural Network" part.

5.3.3 Action

As in the real trading environment, we have defined 3 types of actions:

ID | Action | Description
0  | Sell   | m ← m + p_i n_i (1 - r)
1  | Hold   | m ← m
2  | Buy    | m ← m - p_i n_i (1 + r)

- m: remaining cash
- r: rate of commission fee
- p_i: the price of the cryptocurrency
- n_i: the number of shares traded (minimum amount to execute the trade: 1)

It is understandable that users can partially buy or sell coins in the real world. However, in this simulation, for selling, we have simplified the situation so that the agent sells all the existing shares of the coin at once. For buying, the agent buys as many coins as possible unless the cash is insufficient. Therefore, we can describe this aggressive strategy as "all-or-nothing".

The commission rate r charged by the exchange platform is also considered in this simulated environment. In summary, in each state, the set of possible actions can be expressed as [0], [1], [2], giving 3 possible actions to perform on the coin.

It is also noticeable that the agent may not have sufficient money or a sufficient number of shares to perform the "buy" or "sell" actions in some scenarios. Therefore, if the remaining cash is not greater than the cost of buying the coin, or the number of shares is 0, no trade is performed.
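A minimal sketch of this all-or-nothing rule follows, under the assumption that the commission is charged on both sides of the trade; the default commission rate is illustrative, not the value used in the report.

```python
def execute_action(action: int, cash: float, shares: int, price: float,
                   commission: float = 0.001) -> tuple:
    """Apply the all-or-nothing trading rule described above."""
    if action == 0 and shares > 0:                     # Sell: liquidate every share held
        cash += shares * price * (1 - commission)
        shares = 0
    elif action == 2:                                  # Buy: spend as much cash as possible
        n = int(cash // (price * (1 + commission)))
        if n >= 1:                                     # minimum trade size is 1 share
            cash -= n * price * (1 + commission)
            shares += n
    # action == 1 (Hold), or an infeasible buy/sell: do nothing
    return cash, shares
```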

Figure 6: Illustration of Trading Decision

5.3.4 Rewards

The reward is the change in portfolio value between the current state and the previous state:

Reward(s, a, s') = \left( \sum_i n'_i p'_i + m' \right) - \left( \sum_i n_i p_i + m \right)    (16)

5.3.5 Agent

The agent follows the Epsilon-Greedy algorithm. In this algorithm, we decide which action to take in terms of exploration and exploitation.

\pi(s) = \begin{cases} \arg\max_{a \in Actions} Q(s, a), & \text{with probability } 1 - \epsilon \\ \text{random } a \in Actions, & \text{with probability } \epsilon \end{cases}    (17)

Exploration: the agent takes an action it has not tried before, with probability ε. This improves its current knowledge and increases the diversity of actions made by the agent in the long term.

Exploitation: the agent evaluates the current action-value estimates and chooses the greediest action, i.e. the one with the highest estimated reward. However, if it keeps using only this approach to choose actions, the chosen action might not be the most optimal one. Therefore, we set the probability ε high initially to allow the agent to try something new in the beginning. As it gets more "experience", ε decreases by multiplying by a decay rate, so that the agent chooses actions by relying more on the action-value estimates.
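The action-selection rule of equation (17), together with the decay schedule described above (ε starts at 1.0, decays by a factor of 0.9027 per epoch and floors at 0.01, as listed in Section 6.3), can be sketched as follows.

```python
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float) -> int:
    """Equation (17): explore with probability epsilon, otherwise exploit."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))   # exploration: random action
    return int(np.argmax(q_values))               # exploitation: greedy action

# Decay schedule matching the hyper-parameters in Section 6.3.
epsilon, epsilon_min, decay = 1.0, 0.01, 0.9027
for epoch in range(100):
    # ... run one training epoch, selecting actions with epsilon_greedy(...) ...
    epsilon = max(epsilon_min, epsilon * decay)   # rely more on Q-values over time
```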

For the calculation of the current action-value estimates, we use the Bellman Optimality Equation:

Q(s, a) = r(s, a) + \gamma \max_{a'} Q(s', a')    (18)

where Q is the utility, γ is the discount factor and r is the reward.

5.3.6 Neural Network

The neural network is a multi-layer perceptron which consists of 8 inputs, corresponding to the 8 states of the single-cryptocurrency trading environment. A hidden layer with ReLU activation is created between the input layer and the output layer; the number of hidden neurons is configured to 64. Finally, the output layer corresponds to the three different actions defined previously. MSE is used as the loss function and Adam as the optimizer.

Figure 7: Illustration of the neural network architecture (8 inputs, 64 hidden neurons, 3 outputs)
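A minimal TensorFlow 2 / Keras sketch of this multi-layer perceptron is shown below; the function name and defaults are illustrative, and only the layer sizes, loss and optimizer come from the report.

```python
import tensorflow as tf

def build_q_network(state_size: int = 8, n_actions: int = 3) -> tf.keras.Model:
    """MLP described above: one 64-unit ReLU hidden layer, MSE loss, Adam optimizer."""
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(state_size,)),  # hidden layer
        tf.keras.layers.Dense(n_actions, activation="linear"),  # one Q-value per action
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
    return model
```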

6 Experiment

In this section, we first train on the dataset with our reinforcement learning algorithm, then analyze and evaluate the performance using the training and testing data. Results are summarized quantitatively to determine the effectiveness of using reinforcement learning for trading.

6.1 Experiment Environment

All processes, including data preprocessing and reinforcement learning, run in the Google Colab Pro environment, configured with an Nvidia T4/P100 GPU and 16 GB GDDR6 / 12 GB HBM2 memory. TensorFlow v2 is adopted to construct the neural network for the agent during training.

6.2 Training and Testing Data

The historical price data has been split into training data and testing data, which occupy 70% and 30% of the total data respectively.

Data Type | Percentage (%) | Time Range
Training  | 70             | 2015-09-01 to 2019-04-20
Testing   | 30             | 2019-04-21 to 2020-12-17

6.3 Hyper-parameter Setting

The following table lists the hyper-parameter values used for training:

Hyper-parameter      | Value    | Description
Money                | $100,000 | Initial capital
Epoch                | 100      | Number of passes through the entire training dataset
Batch size           | 32       | Number of samples propagated through the network
Discount rate γ      | 0.95     | Importance of rewards in the distant future
Exploration rate ε   | 1.0      | Probability of choosing actions at random (ε_min = 0.01)
ε decay rate         | 0.9027   | Decreases the probability of choosing actions at random; ε reaches its minimum at around epoch 45
Hidden layer size    | 64       | Number of hidden neurons
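Following equation (18) and the hyper-parameters above (batch size 32, γ = 0.95), one mini-batch update of the Q-network could look like the sketch below; the replay buffer and the exact update loop are assumptions, since the report does not show its training code.

```python
import numpy as np

def replay_update(model, batch, gamma=0.95):
    """One Q-learning update on a sampled mini-batch, following equation (18).
    `batch` is assumed to be a list of (state, action, reward, next_state, done) tuples."""
    states = np.array([b[0] for b in batch])
    next_states = np.array([b[3] for b in batch])

    q_current = model.predict(states)        # Q(s, .) for every sample in the batch
    q_next = model.predict(next_states)      # Q(s', .) used for bootstrapping

    for i, (_, action, reward, _, done) in enumerate(batch):
        target = reward if done else reward + gamma * np.max(q_next[i])
        q_current[i][action] = target        # move Q(s, a) toward the Bellman target

    model.fit(states, q_current, epochs=1, verbose=0)   # minimize MSE to the targets
```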

6.4 Evaluation

6.4.1 Training Result

After training the agent for 100 epochs, the value of the rewards increased steadily with some oscillations, and the overall trend of the rewards is increasing. With a $100,000 investment at the beginning, the final portfolio value has an average of $4,577,886.16 and a maximum of $6,363,199.08; the average reward is 45.78 times the original value, which is an excellent result.

Average Rewards | Min Rewards | Max Rewards | Average APY
4577886.16      | 3012720.10  | 6363199.08  | 1208.9%

Figure 8: Trend of Training Data Rewards within 100 epochs

Looking at the actions taken in different epochs, it is found that the agent has no idea when to sell BTC in the 10th epoch. However, with more epochs, it learns to sell BTC at a higher price to earn profits. It should be noted, though, that the agent might be too quick to sell BTC in the later epochs, when the price increased to about $19,000 around late 2017.

All in all, the results on the training data are satisfactory, so the model can be reused for testing on the testing data.

Figure 9: Actions on Bitcoin (Training Data) in the 10th epoch

Figure 10: Actions on Bitcoin (Training Data) in the 55th epoch

Figure 11: Actions on Bitcoin (Training Data) in the 80th epoch

Figure 12: Actions on Bitcoin (Training Data) in the 100th epoch

6.4.2 Testing Result

In the testing part, we ran the previously trained model on the testing data for 100 epochs to check whether it could work in the new pricing environment. After testing, the rewards are still excellent. With a $100,000 investment at the beginning, the final portfolio value has an average of $405,551.16 and a maximum of $475,200.33; the average reward is 4.06 times the original value. Looking at the distribution of testing data rewards, most of the epochs reach rewards above $400,000. This proves that the trained model is workable on new data with relatively stable performance.

Average Rewards | Min Rewards | Max Rewards | Average APY
405551.16       | 236409.73   | 475200.33   | 192.29%

It should be emphasized that although the average APY on the testing data is not as high as on the training data, the BTC price movement in the testing data is not as extreme as in the training data, which limits the reward amount in the testing data.

Figure 13: Distribution of Testing Data Rewards within 100 epochs

Looking at the actions taken on the testing data, the actions are more optimal than on the training data. In the following figure, the agent is able to buy at almost the lowest price and sell at almost the highest price in every short interval in order to maximize profits. This proves that the trained model is workable in a completely new pricing environment and encourages the further development of a trading bot in real life.

Figure 14: Actions on Bitcoin (Testing Data)

6.4.3 Outperforming in the Coronavirus Period

If we focus on the actions taken from December 2019 to June 2020, there is a drastic decrease during March 2020, also known as the Coronavirus Crash. In that period, the BTC price followed the stock market movement, which resulted in a 39% decrease on Black Thursday (12 March 2020). It reached the lowest price level within 7 years. This made lots of traders fearful, and they heavily sold BTC. However, the price kept going up until the present (December 2020), which made them regret selling BTC too quickly.

At the same time, the reinforcement learning model outperformed during that period. At the beginning of the Coronavirus outbreak (mid-February 2020), the model had already sold all its BTC, at the highest price reached before the crash. After that, it kept indicating "sell" actions until late March 2020. When the model bought BTC again in late March 2020, the price trend started to increase and reached the record of about $23,000 USD on 17 December 2020.

Figure 15: Actions on Bitcoin during Dec 2019 to Jun 2020

Therefore, it can be seen that the model performed well even when facing a sudden fall in price. The results are encouraging and beyond expectations. This also shows the importance of including technical indicators in the training to understand the future movement of the price.

7 Future Extension

In the current progress, we achieved excellent net profits on cryptocurrency trading, under the simplifying condition of trading only one coin at a time. However, further efforts are still needed to adopt this approach in real life.

7.1 Strategy Selections

It is observable that maximizing the reward is the agent's only aim. However, sometimes there are other considerations and human preferences that could change the policy decision. For example, when the market enters a bull or bear phase, agents could decide when is the right time to go all in or to sell most holdings in order to adjust the risk level. Aggressive or conservative modes could be added in the future to tailor the bot to different people's investment preferences.

7.2 Exchange API Integration

The current experiment results are produced from historical data in a simulation environment. To be more applicable in real life, an exchange API could be integrated with the actions defined by the reinforcement learning policy. On the other hand, the commission fee is also a potential issue, as it might limit the trading frequency in a short-term/day-trading strategy and affect the rewards of the agent.

7.3 Sentiment Analysis

In this project, technical indicators are the only input for determining the action, but sentiment on social media can also affect the price trend. Therefore, it is possible to implement sentiment analysis by searching posts or content on Twitter, forums, and so on. Natural language processing can then be used to identify what trading actions the sentiment implies.

8 Conclusion

The implementation of reinforcement learning in this project is an exciting start, which successfully earns huge profits and obeys the "buy low, sell high" principle on both training data and testing data. It even prevents tremendous losses when Bitcoin fell drastically during March 2020. It can be concluded that the project is a huge success at this initial stage.

This project lays out the foundation of applying reinforcement learning to automated cryptocurrency trading. Once price data with a shorter time interval and integration with an exchange API are available, it will be possible to apply this approach to the real market, reduce the time spent on investment, and generate passive income with ease.
