Now This Is Podracing - Driving With Neural Networks

Alexander Dewing, Stanford University
Xiaonan Tong, Stanford University

Abstract

Games are an easy way to develop and test autonomous driving systems on virtual roads with changing conditions. In this paper, we present a convolutional neural network, together with a reinforcement learning heuristic, that takes the raw video feed of the game as input and outputs turning decisions, valuing speed first and safety second. Tasked with operating only one dimension of control (turning), the trained neural net can complete a test course in as little as 33 seconds, slightly slower than the human performance of 28-30 seconds.

1. Introduction

The growing success of Artificial Neural Networks in the research field of Autonomous Driving, exemplified by ALVINN (Autonomous Land Vehicle in a Neural Network) and by recent commercial solutions from Mobileye, Google, and others, shows that the flexibility of neural networks can outperform their traditional computer vision counterparts, especially in high-noise and uncertain situations [7]. In addition, the ability of neural nets to learn on top of existing classification nets offers exciting opportunities to stack and aggregate various network schemes to improve a network's decision-making accuracy [12].

Beyond being a complex decision-making paradigm, Autonomous Driving is complicated by the practicalities of gathering real-world data. It takes many hours of driving to train any initial model, and substantially longer to test and reinforcement-learn the model to acceptable accuracy. Moreover, testing autonomous vehicles forces researchers to take risks and is often a source of logistical red tape. We therefore turn to a common source of high-fidelity simulation, gaming, and train our neural network to race against other vehicles in the Star Wars Episode I Podracer racing game.

Using the game as our simulation environment, we develop a convolutional neural network that controls the player's vehicle in real time by generating a single turning tendency per frame, discussed in Section 4. We train the weights of this CNN through three studies: offline, online, and reinforcement learning, also discussed in Section 4. Sections 5 and 6 offer our conclusions on the topic, as well as insights into future research directions.

2. Related Work

Simple neural networks have been used to great effect in the field of Autonomous Driving, and the flexibility of the technique has seen it applied in multiple ways. A surprisingly simple three-layer feed-forward network, for instance, can reliably detect pedestrians from a depth map obtained via stereo camera matching; Zhao's three-layer net with 5 hidden neurons achieved around 85% accuracy and a 3% misidentification rate [3]. From CMU, ALVINN [6] (Autonomous Land Vehicle in a Neural Network) uses a three-layer feed-forward neural network structured as (1217 input, 29 hidden, 45 output) neurons to achieve very stable navigation at 1 m/s. ALVINN's treatment of driving as a classification problem is very similar to our own, as its 45 output units directly influenced the steering of the vehicle, but the speed was fixed at 1 m/s due to processing-speed and safety concerns. A purely simulated environment that allows for failure and accidents has no such drawback.

Figure 1. The 14-layer neural network derived from VGG 16 [1]. Note the lack of three consecutive 512-deep convolution layers. The output of the neural net denotes the net's decision to turn left (0.0), turn right (1.0), or stay straight ahead (0.5). The decision-making mechanism perturbs the output by a random number with standard deviation 0.15, and issues left if x < 0.36, right if x > 0.63, and straight otherwise.

One perceived drawback of learning in synthetic environments is the inability to apply the model to the real world. However, a significant amount of research has paved the way for transitioning from simulated training to real-world results. Deep visuomotor representations can be translated from synthetic settings to reality through methods such as Generalized Domain Alignment, which minimizes the difference between the two domains [13]. The fluidity of neural networks also permits a portion of a network to be transplanted through transfer learning [12] and used as a feature extractor to pre-train and build more complex networks. In application, this helps a realistic self-driving car start with better awareness of the world, improves its learning speed, and yields a better outcome at the end of its training. Meanwhile, the graphics capability of modern games is outstripping that of typical simulators, and the two are frequently joined together seamlessly. For instance, the popular trucking game Euro Truck Simulator 2 was recently used as a testbed to examine the effects of graded auditory warnings on human truck drivers while playing the game [2]. All of this speaks to the maturity of modern graphics and rendering.

3. Infrastructure

3.1. Game and Engine Modification

Our chosen target game is available for multiple platforms, and for reasons of compatibility we chose to run the Nintendo 64 edition inside an emulator. Requiring source code availability and high compatibility, we opted for the Mupen64Plus emulator [5]. In order to permit flexible loading and unloading of user or neural-net control modules, we modified the emulator to bind to a TCP socket and pass video frames to any connected client. In return, the clients can issue keystrokes back into the emulator corresponding to that frame. We also implemented functionality to dump compressed video and keystroke logs to disk when recording functionality is enabled. As the emulator has over a decade of code history, capturing the framebuffer was nontrivial: the graphics emulation originally targeted 3dfx Voodoo graphics cards from the late 1990s, with wrappers on top of wrappers ultimately supporting OpenGL in a scheme that was unfamiliar to us, despite recent OpenGL experience.
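
The wire format of this socket interface is not detailed here, so the following is only a rough sketch of what a control client could look like, assuming a hypothetical framing in which the emulator streams one raw RGB frame per message and accepts a single keystroke byte in return. The frame size and keystroke encoding are illustrative assumptions; `predict` stands in for the neural network, and the added noise and the 0.36/0.63 thresholds follow the decision rule in Figure 1.

```python
import socket

import numpy as np

FRAME_W, FRAME_H = 224, 224               # assumed resolution of the streamed frames
LEFT, STRAIGHT, RIGHT = b'L', b'S', b'R'  # hypothetical keystroke encoding

def decide_turn(x, rng=np.random):
    """Map the net's scalar output in [0, 1] to a keystroke (decision rule of Figure 1)."""
    x = x + rng.normal(0.0, 0.15)         # perturb by noise with standard deviation 0.15
    if x < 0.36:
        return LEFT
    if x > 0.63:
        return RIGHT
    return STRAIGHT

def control_loop(predict, host='127.0.0.1', port=5555):
    """predict: callable mapping an HxWx3 uint8 frame to a scalar turning tendency."""
    frame_bytes = FRAME_W * FRAME_H * 3
    with socket.create_connection((host, port)) as sock:
        while True:
            buf = b''
            while len(buf) < frame_bytes:             # read one full frame
                chunk = sock.recv(frame_bytes - len(buf))
                if not chunk:
                    return                            # emulator closed the connection
                buf += chunk
            frame = np.frombuffer(buf, dtype=np.uint8).reshape(FRAME_H, FRAME_W, 3)
            sock.sendall(decide_turn(float(predict(frame))))
```
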
3.2. Neural Network Infrastructure

In an effort to implement a turnkey neural network and begin training quickly, we chose to base our architecture on a VGG network, using a pretrained model acquired from the model zoo. The network was the VGG 16 model from BMVC-2014 [11], with the final output layer shrunk down to a single neuron. Unfortunately, our training machine, containing an Nvidia GTX 780 with only 3 GB of VRAM, was unable to reach the desired performance with memory-constrained batch sizes. As a result, we ultimately shrank the network to limit the parameter count, allowing larger batch sizes and faster training rates. The resulting neural net is presented in Figure 1.

In contrast to the VGG 16 network of BMVC-2014 [11], a standard Batch Normalization layer is inserted before the first convolution to account for the mean-image normalization that VGG 16 provides and our pipeline lacks. This change fundamentally alters how the 3-channel 224x224 input image is perceived by the convolution layers, and removes the possibility of transfer learning from pre-trained VGG weights. We also dropped the fourth and fifth convolution layers (originally duplicates of the third layer) for simplicity's sake. The fully connected layers are likewise shrunk from 4096 neurons to 1024 and 256 neurons, respectively. Because these modifications precluded the use of the pre-trained model, training began from a He-initialized network [4].
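
Expressed in Lasagne (our training framework, introduced below), the modified architecture looks roughly like the sketch that follows. This is a summary of the description above and of Figure 1, not a verbatim copy of our training code; in particular, the filter counts and the remaining block structure are assumed to match VGG 16's first three blocks.

```python
from lasagne.layers import (InputLayer, Conv2DLayer, MaxPool2DLayer,
                            DenseLayer, BatchNormLayer)
from lasagne.nonlinearities import rectify, sigmoid
from lasagne.init import HeNormal

def build_network(input_var=None):
    # 3-channel 224x224 input; BatchNorm stands in for VGG 16's mean-image subtraction
    net = InputLayer((None, 3, 224, 224), input_var=input_var)
    net = BatchNormLayer(net)
    # VGG-style convolution blocks; the 512-deep blocks are dropped (assumed structure)
    for num_filters, reps in [(64, 2), (128, 2), (256, 3)]:
        for _ in range(reps):
            net = Conv2DLayer(net, num_filters, (3, 3), pad=1,
                              W=HeNormal(gain='relu'), nonlinearity=rectify)
        net = MaxPool2DLayer(net, (2, 2))
    # Fully connected layers shrunk from 4096 to 1024 and 256 units
    net = DenseLayer(net, 1024, W=HeNormal(gain='relu'), nonlinearity=rectify)
    net = DenseLayer(net, 256, W=HeNormal(gain='relu'), nonlinearity=rectify)
    # Single output neuron: turning tendency in [0, 1]
    return DenseLayer(net, 1, nonlinearity=sigmoid)
```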

Much of the processing logic to interface with race footage and user testing was most easily implemented in Python, so we chose Theano as the framework for implementing our neural network, and built on top of Lasagne [9] for ease of use. During our third study, we built a monitoring UI with visualizations showing live saliency maps, the reinforcement learning stack, and graphs of our objective metric; screengrabs from this UI are presented in Section 4.3.

4. Methodology

4.1. Study 1: Offline Learning

4.1.1 Methods

Our first study attempted to perform offline learning against a large dataset of user-played recordings. We played the game manually for several hours, gathering approximately 300,000 video frames of training data, with the associated keyboard inputs as labels. In an effort to prevent map-specific overfitting, our dataset contained footage from 10-15 maps within the game. In Figure 2, we include a chart of how the learning progressed through 55 epochs. Every 8th batch of raw data was set aside as the validation dataset. Races consist of multiple loops, so it is not guaranteed that the network has never seen the scenery from the validation segments of the map, but it suffices to estimate the network's response to frames it has never seen before.

Figure 2. Offline training progression; training lasted over 20 hours. Brief analysis shows that we are not significantly overfitting the training set, and that the neural net responds well to unseen frames. It is also clear that we are not training at a high enough learning rate, which is rectified in a later session.

Training our simplified network for 40 epochs on our entire dataset, with a slightly higher learning rate than shown in Figure 2, resulted in initially encouraging performance. We were able to achieve approximately 80% training accuracy, with 70% accuracy on the validation set. Our hypothesis was that this might be a hard accuracy wall, due to the stochastic nature of on/off turning signals that even a human would be hard-pressed to reproduce perfectly. However, when this network was used to control the vehicle in-game, its actions were unintelligible: the vehicle would proceed in a straight line until it hit a wall, then turn in circles forever in the opposite direction.

Analysis of saliency maps [10] revealed the issue at play. Rather than generalizing to map and environment features, the network learned the orientation of the vehicle. This is visible in Figure 3, where the region of strongest saliency is the vehicle, not the track. The vehicle tilts left and right in response to hard turns, and the network effectively learned to read these features. The offline learning nature of the network had inverted causality: the vehicle should tilt in response to control inputs, rather than vice versa.

Figure 3. Saliency map [10] for one training frame. Note the high intensity around the pod racer.
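
The saliency maps in Figure 3 follow the gradient-based approach of [10]: the gradient of the network's scalar output with respect to the input pixels highlights which pixels most influence the decision. A minimal sketch under our Theano/Lasagne setup, assuming the `network` and `input_var` from the architecture sketch in Section 3.2:

```python
import theano
import theano.tensor as T
import lasagne

def make_saliency_fn(network, input_var):
    """Return a function mapping a batch of frames to a per-pixel saliency map."""
    output = lasagne.layers.get_output(network, deterministic=True)
    # Gradient of the summed scalar output with respect to the input pixels [10]
    grad = T.grad(output.sum(), input_var)
    # Collapse the color channels, keeping the strongest absolute influence per pixel
    saliency = abs(grad).max(axis=1)
    return theano.function([input_var], saliency)
```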

In response, we blacked out the pixels over the vehicle (and the top UI, including the timer) and continued training. As the network was initialized with the result of the previous experiment, it converged fairly quickly, and within 5 epochs reached 70% training accuracy and 67% validation accuracy. Again, the network chose to learn ineffective features, and the in-game test-drive performance was as bad as with the previous method. This time, the network became a horizon detector: while the vehicle tilts substantially during turns, the camera also tilts to a smaller degree. In Figure 4, the visible horizon edge is strongly detected on the left.

Figure 4. Blocking out the pod racer creates tilt-detection tendencies.

We increasingly realized that the offline learning scheme was flawed, but attempted one last experiment by randomly rotating our blacked-out training data by up to 15 degrees before passing it into the training phase. Surprisingly, this experiment resulted in 90% training accuracy, which suggested very strong overfitting, even on our very large dataset. Equally surprising was the 55% validation accuracy, where 50% approximates random guessing. Increasing regularization, up to 100 times our previous value of 0.0001, could not raise validation accuracy above 57%. Naturally, there was no semblance of intelligent control during testing.

Over the course of our first study, we concluded that the open-loop control model of offline learning was not going to evolve effective control algorithms for our vehicle. As the human test drivers achieved performance far better than the network initially could, the footage covers a very small subset of the states that could be experienced by the neural net. The stochastic nature of the control inputs, paired with the large state space, convinced us of the need for feedback in our training procedures.

4.2. Study 2: Online Learning

Our online learning scheme was our first attempt to close the control loop of the neural net during training. In the new setup, the neural network (running a model trained in Study 1) would directly control the vehicle, but a human would define the ground-truth labels used for training. During each frame of the game, the frame would be randomly rotated and trained on using the human-supplied label in real time. We would only train when the user defined a turn (or explicitly defined straight with a third key), and ignored frames where the user found it unnecessary to override the network. Initially, training rates were kept exceptionally high to develop a semblance of behavior quickly, but as the network improved we reduced the rate from 0.01 to 0.0004, which combined the ability to still meaningfully affect the network with resistance to overwhelming its existing behavior. We also employed a low-pass filter on the user training labels, implemented as a sinc convolution of length 7. The presence of the sinc noticeably improved learning performance by reducing the variance in control outputs, but the frequency parameters were not very sensitive. We settled on about 10% of the control decision being determined by frames up to 3 frames away from the initial frame.
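
The label smoothing can be sketched as follows, assuming a windowed-sinc kernel normalized to unit sum. The cutoff frequency below is an assumption; the text fixes only the length-7 sinc kernel and notes that the frequency parameters were not very sensitive.

```python
import numpy as np

def sinc_kernel(length=7, cutoff=0.2):
    """Length-7 sinc kernel, normalized to sum to 1 (the cutoff is an assumed value)."""
    n = np.arange(length) - (length - 1) / 2.0
    kernel = np.sinc(2 * cutoff * n)
    return kernel / kernel.sum()

def smooth_labels(labels, kernel=None):
    """Low-pass filter raw turn labels (0 = left, 0.5 = straight, 1 = right)."""
    if kernel is None:
        kernel = sinc_kernel()
    # 'same' keeps one smoothed label per frame; the ends are implicitly zero-padded
    return np.convolve(labels, kernel, mode='same')
```
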
Very quickly, we were able to see the live evolution of behavior, such as basic wall avoidance. Within about 10 minutes of live training (about 14,000 frames of play, of which training occurred on about 10%), the neural network was able to finish its first unassisted lap on the Mon Gazza map, which serves as our timing benchmark henceforth. The network still had a tendency to make 180-degree turns and start racing backwards, which would only be reset by sufficiently violent crashes, where the game resets the vehicle. It is difficult for the network to realize that it is driving the wrong way because it was never trained against that eventuality. As such, the first lap completion took 2 minutes and 12 seconds, a far cry from a typical human race time of 28-30 seconds.

Another 10 minutes of training resulted in the network becoming sufficiently competent to complete a full 5-lap race unassisted. The first successful race time was 3:36 (an average of 43 seconds per lap), with a best lap time of 37 seconds; the peak performance of all our online-trained networks was 3:25 (41 seconds per lap). For context, our best network from Study 3 was able to complete the race in 3:10 (38 seconds per lap).

In an attempt to evaluate overfitting, we took models trained on Mon Gazza Speedway (a relatively narrow and simple map) and tested them on a map with a different color scheme, track style, and a far larger variety of terrain (Ando Prime). While the directly transplanted models were either unsuccessful or very slow, only a small amount of training was needed to re-fit the model to the new map; usually, 3-5 minutes of training was sufficient to learn map-specific behavior. As these retrained models were then useless on the original map, we chose not to experiment further with additional maps, given our obviously inadequate regularization scheme, and instead focused our further research on learning.

Closing the feedback loop by using online learning to take 'encouragements' from the user was a breakthrough that began to distill intelligent features into our neural net. But while the network was able to complete tracks and win first place against 'medium' in-game computer opponents, this learning scheme was still not able to replicate human capabilities.

Figure 5. Digit-parsing CNN architecture. Each digit is processed through an affine transformation and sliced apart, then classified via the CNN and recombined.

4.3. Study 3: Reinforcement Learning

Reinforcement learning requires an objective function, and we opted to use vehicle velocity as the input feature to a local objective heuristic. To that end, we preprocess the game UI and perform OCR on the velocity indicator. The preprocessing steps are color segmentation (selecting the blue digits), affine transformation (straightening the digits), digit separation, and resampling. The open-source Tesseract OCR engine [8] gave unacceptable performance on the output of these steps, so instead the final OCR is performed by a small secondary neural net, presented in Figure 5. Initially, this model was initialized from an MNIST-trained network, though surprisingly the unrefined results were worse than random (6% accuracy on 100 manually labeled characters); our digit dataset evidently differs substantially from handwritten digits, so we needed to train the model on game-specific digits. We built a framework for extracting digits from the game recordings and labeling them quickly. Using a dataset of 1,000 hand-labeled digits, we were able to achieve 100% accuracy with the above network. This is not surprising, as there is only mild noise in the UI coloration and therefore little noise in the input to the network.
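
The preprocessing in front of the digit classifier can be sketched as follows, assuming OpenCV. Only the four steps themselves (color segmentation, affine straightening, digit separation, resampling) are fixed by the description above; the crop region, HSV thresholds, shear matrix, and digit geometry below are illustrative assumptions.

```python
import cv2
import numpy as np

# Illustrative constants; the real crop region, slant, and digit size are assumptions
SPEED_ROI = (10, 10, 120, 40)                        # x, y, w, h of the velocity indicator
BLUE_LO, BLUE_HI = (100, 80, 80), (130, 255, 255)    # HSV range for the blue digits
DIGIT_W, DIGIT_H, N_DIGITS = 16, 24, 3

def extract_digits(frame):
    """Return N_DIGITS resampled digit images from one RGB game frame."""
    x, y, w, h = SPEED_ROI
    roi = frame[y:y + h, x:x + w]
    # 1. Color segmentation: keep only the blue digit pixels
    mask = cv2.inRange(cv2.cvtColor(roi, cv2.COLOR_RGB2HSV),
                       np.array(BLUE_LO), np.array(BLUE_HI))
    # 2. Affine transformation: undo the slant of the in-game font
    shear = np.float32([[1.0, -0.2, 0.0], [0.0, 1.0, 0.0]])
    straight = cv2.warpAffine(mask, shear, (w, h))
    # 3. Digit separation: slice the straightened strip into equal columns
    cols = np.array_split(straight, N_DIGITS, axis=1)
    # 4. Resampling: resize each slice to the classifier's input size
    return [cv2.resize(c, (DIGIT_W, DIGIT_H)) for c in cols]
```
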
The reinforcement learning method is built on top of a history queue. As each frame is processed, it is added to a variable-length queue whose length is determined by the current vehicle speed (between 3 and 24 frames over the 0-500 speed range). The variable history length gives the vehicle a semblance of reaction time: a faster vehicle should react earlier than a slow one, so a history frame should stay longer in the queue. After training each frame, we pop off every frame that exceeds this length and train on it based on the following performance heuristic.

Our performance heuristic was tuned extensively as we watched the behaviors that each iteration encouraged. Initially, we convolved acceleration with a sawtooth to form our heuristic. This resulted in heavy penalties for immediate deceleration and more modest penalties for decisions that led to deceleration several frames into the future; conversely, acceleration-inducing decisions were reinforced positively.

We implemented reinforcement learning by deriving a frame-by-frame learning rate from our goodness heuristic. Good behavior (goodness > 0) would be trained positively, and bad behavior trained negatively to diffuse the decision. Our peak learning rates were substantially lower than in Study 2; we settled on a learning rate of 0.00001 as a balance between meaningful learning and the quality limitations of the heuristic.

At this point, the race times of the reinforcement-learned network were meaningfully worse than those of the Study 2 model from which it inherited its initialization; race times could not break 3:45 (a 9-second degradation).

We quickly realized that since most decisions are to continue straight ahead (and most of these reinforce positively due to the lack of obstacles), the network was encouraged to strengthen straight-ahead behavior to the point of never turning. To counter this, we reinforce straight decisions with a weight 100x lower than turns. We also capped the negative learning rate for turns during rapid deceleration (crashes), to prevent the unlearning of all turning behavior. These tunings reduced the performance gap to about parity (3:38), but further improvements still eluded us.

A few subtler tunings substantially improved our performance. Acceleration after a collision registers as very positive due to the rapid speed recovery, overwhelming actual good driving. Accordingly, we weight acceleration by a quadratic function of speed, encouraging acceleration at high velocity without encouraging acceleration at low velocity (recovery, which is not necessarily good driving). Finally, we discard all training frames below a low velocity threshold, as the control regime in the game is markedly different there (turning is vastly less sensitive), to avoid training toward a different turning model in an atypical regime. These modifications, combined with hours of reinforcement training, were able to reduce race times to our best result of 3:10 (38 seconds per lap).
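
The history queue and reward-weighted updates can be sketched as follows. The queue lengths (3-24 frames over the 0-500 speed range), the 0.00001 peak learning rate, the 100x down-weighted straight decisions, the quadratic speed weighting, and the low-speed cutoff come from the description above; the exact goodness formula and the linear interpolation of queue length are assumptions, and `train_on` is a hypothetical callable performing one gradient step at the given learning rate.

```python
from collections import deque

import numpy as np

PEAK_LR = 1e-5          # peak reinforcement learning rate
STRAIGHT_WEIGHT = 0.01  # straight decisions reinforced 100x more weakly than turns
MIN_SPEED = 50.0        # assumed low-velocity cutoff below which frames are discarded
MAX_SPEED = 500.0

def queue_length(speed):
    """Speed-dependent history length: 3 frames at speed 0, 24 at speed 500 (assumed linear)."""
    frac = np.clip(speed / MAX_SPEED, 0.0, 1.0)
    return int(round(3 + frac * (24 - 3)))

def goodness(speed_then, speed_now, decision_is_straight):
    """Signed reward for a decision, judged by the speed change it led to (assumed form)."""
    g = (speed_now - speed_then) * (speed_now / MAX_SPEED) ** 2  # quadratic speed weighting
    if decision_is_straight:
        g *= STRAIGHT_WEIGHT            # counter the 'always straight' collapse
    return max(g, -1.0)                 # cap negative reinforcement (e.g. during crashes)

history = deque()

def reinforce_frame(frame, decision, speed, train_on):
    """Push the newest frame, then train every frame that has aged out of the queue."""
    if speed < MIN_SPEED:
        return                          # atypical low-speed control regime: discard
    history.append((frame, decision, speed))
    while len(history) > queue_length(speed):
        old_frame, old_decision, old_speed = history.popleft()
        g = goodness(old_speed, speed, old_decision == 'straight')
        train_on(old_frame, old_decision, learning_rate=g * PEAK_LR)
```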

Figure 7. Screenshot of the final monitoring UI (panels A-D).
