Adaptive Traffic Control System Using Reinforcement Learning


Published by: http://www.ijert.org
International Journal of Engineering Research & Technology (IJERT)
ISSN: 2278-0181, Vol. 9 Issue 02, February 2020

Adaptive Traffic Control System using Reinforcement Learning

Kranti Shingate, Komal Jagdale, Yohann Dias
Department of Computer Engineering, FCRIT, Vashi, Navi Mumbai, India

Abstract— The advent of the automobile revolution has led to various traffic congestion problems. Commuters often cannot reach their destinations on time because of heavy traffic, and the systems currently used to coordinate traffic do not respond to the actual situation at an intersection. Traffic light control systems with pre-set timers are widely used to monitor and control the traffic generated at the intersections of numerous streets. However, synchronizing multiple traffic light systems at adjacent intersections is a complicated problem given the many parameters involved. Handling such traffic requires either expanding the road network or deploying an adaptive traffic control system that manages it intelligently. This paper presents a system that uses Artificial Intelligence techniques to adapt the signal to the density of traffic, automatically increasing or decreasing signal time with the help of an Experience Replay mechanism. In this system, a Reinforcement Learning algorithm determines the optimal traffic light configuration, and deep Neural Networks are used to extract the features required to make a decision.

Keywords— Reinforcement Learning (RL), Traffic Light Control System (TLCS), Experience Replay mechanism, Artificial Intelligence, Deep Neural Networks

I. INTRODUCTION

Traffic congestion is growing ceaselessly throughout the world and has become a hindrance for commuters.
As an outcome of increasing population and urbanization, transportation demand is consistently rising in cities around the world. Heavy routine traffic volumes put pressure on the existing urban traffic infrastructure, resulting in regular congestion. A further contributor to congestion is red-light delay: because the lights in conventional traffic control are pre-programmed, their timing does not depend on the actual traffic.

The existing systems in wide use are traffic signals with pre-set timers, which operate on a fixed-time schedule and display a green light to each approach for the same duration every cycle, regardless of traffic conditions. This may suit heavily congested areas, but at low traffic density the fixed sequence is not beneficial, since green time is given to approaches where no vehicles are waiting. With advancements in technology, an Adaptive Traffic Signal Control System has been developed in Bhubaneswar city. That system takes input from sensors embedded in the road and synchronizes groups of traffic signals accordingly; the signalling system runs on solar power. The system is, however, infeasible and costly, since it requires hardware embedded in the roads. The LQF (Longest Queue First) scheduling algorithm minimizes the queue sizes at each approach to the intersection, with the goal of lowering vehicle delay compared to current signal control methods; it also gives preference to certain vehicles (such as emergency vehicles or large trucks). Because this approach concentrates on reducing queue length, the stability of the system is a major concern.
A way out of this problem is therefore a traffic control system based on reinforcement learning (RL), an AI framework that attempts to approximate an optimal decision-making policy. Such a framework offers a solution for reducing traffic in metropolitan areas by observing real-time traffic situations and using a reinforcement learning algorithm to improve over time. Traditional traffic control systems use a simple protocol that alternates green and red lights at fixed intervals; such systems work acceptably only when the amount of traffic on the network is limited. The aim is to build a traffic control framework that handles and directs traffic intelligently using a reinforcement learning algorithm, achieving smooth transportation of vehicles and curbing associated problems such as raised air pollution, wasted fuel, and increased risk of accidents.

In contrast to fixed-timer systems, the adaptive traffic control system reduces traffic in metropolitan networks by considering continuous traffic conditions and applying reinforcement learning to improve over time. The system examines a four-way intersection and uses the incoming traffic density to take an optimized step towards reducing congestion. The algorithm learns over a period of time, so in its initial phases the system will probably not give ideal results for the observed traffic.

As a result, what is needed is a reinforcement learning algorithm that automatically extracts all the features (machine-crafted features) useful for adaptive traffic signal control from raw real-time traffic data, and learns the optimal traffic signal control policy.

IJERTV9IS020159 | www.ijert.org
(This work is licensed under a Creative Commons Attribution 4.0 International License.)

II. RELATED WORK

A reinforcement learning-based framework uses the present traffic situation to generate an improved traffic light configuration. One approach recognizes vehicles in a partially observable environment using DSRC (Dedicated Short-Range Communication) [1]. If an intersection contains a large disparity between detected and undetected vehicles, the agent makes a one-sided move in favour of the detected vehicles; hence, this system gives better outcomes for vehicles equipped with wireless communication than for ones that remain undetected.

Another significant component that improves algorithm stability is the experience replay and target network [2] used during the training phase of the agent (the traffic signal). The experience replay mechanism holds the information needed for learning in the form of randomized groups of samples called 'batches'. Instead of immediately submitting to the agent the information gathered during the simulation, the samples are stored in a data structure called a memory; every sample collected during training is stored there, and batches drawn from it are submitted to the agent. This framework can also give precise outcomes because it uses machine-crafted features instead of human-crafted features (e.g. vehicle queue length, or the position and speed of vehicles) for analysing real-time traffic and developing an optimal policy for adaptive traffic signal control.

To make the system more responsive to actual traffic, it is worth emphasizing the feasibility and value of applying a model-less temporal difference reinforcement learning algorithm [3] for traffic light control.
The main drawback of such a system is that its environment involves four-way intersections but allows traffic to flow either horizontally or vertically, not both [3].

The multi-agent system for network traffic signal control introduces a multi-agent architecture combined with a reinforcement learning algorithm to obtain efficient traffic signal control [4]. Two types of agents are used: a central agent and outbound agents. The outbound agents schedule traffic signals using the Longest Queue First (LQF) algorithm, while the central agent learns a value function (Q-learning) driven by its local and neighbouring traffic conditions. At low arrival rates, the LQF scheduling algorithm performs slightly better than the multi-agent Q-learning system.

Adaptive traffic signal control, which adjusts traffic signal timing according to real-time traffic, is an effective method to reduce traffic congestion. Another set of multi-agent, model-based reinforcement learning systems was formulated under the Markov Decision Process model for traffic light control [5]. These systems do not rely on heuristic equations but learn the optimal control by improving their experience through interaction with the environment.
Such systems could be improved by giving priority to public transport, since those vehicles carry more passengers [5].

TABLE I. COMPARATIVE STUDY OF TRAFFIC CONTROL SYSTEMS USING REINFORCEMENT LEARNING

1. Intelligent Traffic Signal Control: Using Reinforcement Learning with Partial Detection
   Approach: RL algorithm for a partially observable ITS based on DSRC (Dedicated Short-Range Communications).
   Limitation: Better performance for detected vehicles than for undetected vehicles.

2. Adaptive Traffic Signal Control: Deep reinforcement learning algorithm with experience replay and target network
   Approach: Reinforcement learning algorithm using experience replay and a target network.
   Limitation: Gives accurate results for machine-crafted features only.

3. Reinforcement Learning for Intelligent Traffic Light Control
   Approach: Determines the feasibility and value of applying a model-less temporal difference reinforcement learning algorithm to traffic light control.
   Limitation: The environment involves four-way intersections but allows traffic to flow either horizontally or vertically, not both.

4. Multi-agent system for network traffic signal control
   Approach: The outbound agents schedule traffic signals (LQF algorithm) and the central agent learns a value function (Q-learning) driven by its local and neighbouring traffic conditions.
   Limitation: At low arrival rates, the LQF scheduling algorithm performs slightly better than the multi-agent Q-learning system; requires a lane system.

5. Multi-Agent Reinforcement Learning for Intelligent Traffic Light Control
   Approach: A set of multi-agent, model-based reinforcement learning systems for traffic light control.
   Limitation: Should give priority to public transport, since those vehicles carry more passengers.
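The experience replay mechanism surveyed above, where samples gathered during simulation are stored in a memory and submitted to the agent as randomized batches, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the class name, capacity, and batch size are arbitrary choices:

```python
import random
from collections import deque

class ReplayMemory:
    """Memory holding (state, action, reward, next_state) samples
    collected during simulation; training draws randomized batches."""

    def __init__(self, capacity=50_000):
        # A bounded deque: once full, the oldest samples are discarded.
        self.samples = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        """Store one sample gathered from the simulation."""
        self.samples.append((state, action, reward, next_state))

    def get_batch(self, size=32):
        """Return a randomized group of samples (a 'batch'); drawing
        at random breaks the correlation between consecutive steps."""
        return random.sample(self.samples, min(size, len(self.samples)))
```

During training, each simulation step would call `add(...)`, and each learning step would train the network on `get_batch()` rather than on the most recent sample alone.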

III. PROBLEM FORMULATION AND DESIGN

Modern cities carry heavy road traffic, and most of the time traffic management systems cannot handle the resulting congestion. Traffic problems may arise from emergencies, roadside construction, tourist seasons, and so on. A traditional traffic light control system, i.e. a traffic signal with pre-set timers, has fixed cycles of phase changes and signal alternation, which is unsuitable for real-world traffic congestion and results in inefficient traffic flow.

Current traffic controllers are either pre-timed control systems or actuated control systems. Pre-timed control systems have pre-set green-light durations: a longer green during peak hours and a shorter one in the afternoons and late at night. Actuated control systems respond to dynamic traffic, but they do not serve well in long-term traffic scenarios. To overcome these issues, adaptive or intelligent traffic light control systems have been designed to cope with real-time traffic congestion.

In designing this system, which is based on a reinforcement learning algorithm, it is important to characterize the environment, states, actions, rewards, and learning mechanisms involved. In the simulation, the environment is a four-way intersection with four incoming lanes and four outgoing lanes on each arm (Fig. 2). Each incoming lane defines the directions a vehicle can follow: the left-most lane is for left turns only, the right-most lane is for right turns and for going straight, and the two middle lanes are dedicated to going straight.

In this environment there are 8 traffic lights, each indicated by a colour on the stop line of an incoming lane that represents the status of the traffic light for that particular lane.
For example, when cars coming from the south want to go straight or turn right, Fig. 2 shows green for that particular lane and red for the remaining lanes. The traffic flow through the intersection can thus be modified by the traffic light controller. The analysis is conducted in simulation, where an agent chooses which traffic light should be activated in order to reduce congestion and optimize traffic efficiency. To choose the best action in every situation, the learning agent uses mechanisms drawn from reinforcement learning and deep learning.

Every reinforcement learning framework has two primary parts: the agent and the environment (Fig. 1). Here, the agent is the traffic light system responsible for taking actions, and the environment is the current distribution of vehicles at the intersection.

Fig. 2. Four-way intersection without vehicles

Rules for the traffic lights:
- The colour phase transition for every traffic light is always red-green-yellow-red.
- The durations are fixed: 10 seconds for the green light and 4 seconds for the yellow light, while the duration of the red light is defined as the amount of time since the last phase change.
- At least one traffic light is in the yellow or green phase.
- The traffic lights are never all in the red phase simultaneously.

Fig. 1. Workflow of the Traffic Control System

In the proposed framework, the SUMO (Simulation of Urban Mobility) simulator generates a random number of vehicles with randomly chosen sources and destinations, which in turn provides the state input to the agent. This state is used by the Q-learning algorithm, and the action with the highest Q-value is picked.
Deep neural networks are used to approximate the Q-values and improve the results, and the traffic light signal performs actions that influence the environment (the vehicle distribution).

At the intersection, the incoming lanes are discretized (Fig. 3) into cells that register the presence or absence of a vehicle within them. Each arm has 20 cells: 10 are placed along the left-most lane, while the other 10 span the remaining three lanes. In the entire system there are therefore 80 cells in total.
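The 80-cell discretization just described amounts to a binary occupancy vector over the incoming lanes. The sketch below illustrates one way to build it; the arm names, the 7 m cell length, and the grouping of vehicle positions are hypothetical, since the paper does not specify them:

```python
def encode_state(vehicle_positions, cell_length=7.0):
    """Build the 80-cell binary state: 4 arms x 20 cells, where in
    each arm 10 cells track the left-most lane and 10 track the other
    three lanes together. `vehicle_positions` maps an arm name to a
    dict of lane-group -> list of distances (metres) from the stop
    line. The cell length of 7 m is an assumed value."""
    arms = ['north', 'south', 'east', 'west']
    groups = ['left', 'straight_right']
    state = [0] * (len(arms) * 20)          # 4 * 20 = 80 cells
    for a, arm in enumerate(arms):
        for g, group in enumerate(groups):
            for distance in vehicle_positions.get(arm, {}).get(group, []):
                cell = int(distance // cell_length)
                if cell < 10:               # 10 cells per lane group
                    # Mark the cell occupied (presence/absence only).
                    state[a * 20 + g * 10 + cell] = 1
    return state
```

For instance, a vehicle 3 m from the stop line in the south arm's left-most lane sets exactly one cell, and the vector length is always 80 regardless of the number of vehicles.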

The agent can also adapt to different situations, such as accidents or weather conditions, and it learns from the system's performance (i.e. rewards), so there is no need to describe every variable of the environment.

Fig. 3. Design of state representation

IV. LEARNING METHODOLOGY

A. Reinforcement learning algorithm

In real life we perform various tasks in pursuit of our goals. After performing a task we receive a reward, which is either positive or negative, and alongside these rewards we keep exploring different paths and try to figure out which action might lead to better rewards. The whole idea of reinforcement learning is thus fundamentally empirical in nature. Reinforcement learning is a branch of artificial intelligence that lets a machine learn on its own, in a way different from traditional machine learning: it amounts to taking suitable actions to maximize reward in a particular situation.

In reinforcement learning, the input is an initial state from which the model starts, and there are many possible outputs, since there is a variety of solutions to any particular problem. Training is based on the input: the model returns a state, and the user decides to reward or punish the model based on its output.

A reward in reinforcement learning is either positive or negative, depending on the decision taken by the agent. A positive reward is given when an event occurring due to a specific behaviour builds the quality and frequency of that behaviour; in other words, it has a positive effect on the behaviour.
After receiving a positive reward, the agent maximizes its performance and sustains the change for a long period of time. Negative reinforcement is defined as the strengthening of a behaviour because a negative condition is stopped or avoided: when the agent receives negative rewards, it adjusts its behaviour so as to stay above a minimum standard of performance.

An adaptive traffic signal control system can be implemented using RL techniques. Reinforcement learning offers numerous feasible approaches to the traffic flow problem, and different algorithms and neural networks have emerged to address it. Here, one or more independent agents have the objective of increasing the efficiency of the traffic that flows through one or more intersections controlled by traffic light controllers. The standard RL components (state, action, and reward) are used to describe the context of the traffic light signal controller. A key reason to use RL for a traffic light control system is that the agent can make decisions without supervision or prior knowledge of the environment.

B. Q-learning algorithm

The agent's learning mechanism is deep Q-learning, a blend of deep neural networks and Q-learning. The Q-learning function learns from actions that are outside the present policy, such as random actions, so an explicit policy is not required; Q-learning seeks to learn a policy that maximizes the total reward. The 'Q' in Q-learning stands for quality, which here represents how helpful a given action is in obtaining future reward. Q-learning is a simple yet very powerful algorithm for our agent, since it enables the agent to work out precisely which action to perform. In deep Q-learning, a neural network approximates the Q-value function: the state is given as the input, and the Q-value of every possible action is produced as the output.
It is a model-free reinforcement learning method that assigns a Q-value to each action performed by the agent.

Steps involved in reinforcement learning using deep Q-learning networks:
a. All past experience is stored by the user in memory.
b. The next action is determined by the maximum output of the Q-network.
c. The loss function is the mean squared error between the predicted Q-value and the target Q-value Q*. This is basically a regression problem; however, since we are dealing with a reinforcement learning problem, the target value is not known in advance, so it is obtained from the Q-value update equation derived from the Bellman equation.

The Q-value update is defined as

    Q(st, at) ← Q(st, at) + α · (rt+1 + γ · maxa Q(st+1, a) − Q(st, at))    (1)

where:
- Q(st, at) is the value of action at performed in state st;
- maxa Q(st+1, a) is the maximum Q-value over the actions available in the immediate next state;
- rt+1 is the reward the agent gets after performing action at;
- γ is the discount factor, which determines the significance of future rewards and ranges between 0 and 1. A discount factor of 0 makes the agent opportunistic, considering only current rewards, whereas a factor of 1 makes it strive for long-term rewards;
- α is the learning rate, where a factor of 0 makes the agent learn nothing, whereas a factor of 1 makes the agent consider only the most recent information.

The Q-value for a state-action pair is updated by an error term, scaled by the learning rate α. The learning rate decides to what degree newly obtained information overrides old information. The Q-value represents the possible reward received at the next time step for performing an action in state st, plus the
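Equation (1) can be exercised directly with a small tabular implementation; the deep variant replaces this table with a neural network that approximates Q. The table structure, the assumption of 4 actions, and the default α and γ below are illustrative choices, not the paper's settings:

```python
from collections import defaultdict

def make_q_table(n_actions=4):
    """Q-table: each state maps to one Q-value per action (4 light
    phases are assumed here); unseen states start at zero."""
    return defaultdict(lambda: [0.0] * n_actions)

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one step of Eq. (1):
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * max(q[s_next])    # rt+1 + γ·maxa Q(st+1, a)
    q[s][a] += alpha * (td_target - q[s][a])  # move Q(st, at) toward target
    return q[s][a]
```

Starting from an all-zero table, a reward of 1.0 moves Q(s, a) from 0 to α · 1.0 = 0.1, and repeated updates converge toward the discounted return, matching the roles of α and γ described above.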

discounted future rewards obtained from the next state-action observation.

C. Reinforcement learning models

We consider an adaptive traffic light control system, which takes reward and state perception from

