DeepMPC: Learning Deep Latent Features For Model Predictive Control


Ian Lenz, Ross Knepper, and Ashutosh Saxena
Department of Computer Science, Cornell University
Email: {ianlenz, rak, asaxena}@cs.cornell.edu

Abstract—Designing controllers for tasks with complex non-linear dynamics is extremely challenging, time-consuming, and in many cases, infeasible. This difficulty is exacerbated in tasks such as robotic food-cutting, in which dynamics might vary both with environmental properties, such as material and tool class, and with time while acting. In this work, we present DeepMPC, an online real-time model-predictive control approach designed to handle such difficult tasks. Rather than hand-design a dynamics model for the task, our approach uses a novel deep architecture and learning algorithm, learning controllers for complex tasks directly from data. We validate our method in experiments on a large-scale dataset of 1488 material cuts for 20 diverse classes, and in 450 real-world robotic experiments, demonstrating significant improvement over several other approaches.

I. INTRODUCTION

Most real-world tasks involve interactions with complex, non-linear dynamics. Although practiced humans are able to control these interactions intuitively, developing robotic controllers for them is very difficult. Several common household activities fall into this category, including scrubbing surfaces, folding clothes, interacting with appliances, and cutting food. Other applications include surgery, assembly, and locomotion. These interactions are characterized by hard-to-model effects, involving friction, deformation, and hysteresis. The compound interaction of materials, tools, environments, and manipulators further alters these effects.
Consequently, the design of controllers for such tasks is highly challenging.

In recent years, "feed-forward" model-predictive control (MPC) has proven effective for many complex tasks, including quad-rotor control [36], mobile robot maneuvering [20], full-body control of humanoid robots [14], and many others [26, 18, 11]. The key insight of MPC is that an accurate predictive model allows us to optimize control inputs for some cost over both inputs and predicted future outputs. Such a cost function is often easier and more intuitive to design than completely hand-designing a controller. The chief difficulty in MPC lies instead in designing an accurate dynamics model.

Let us consider the dynamics involved in cutting food items, as shown in Fig. 1, for the wide range of materials shown in Fig. 2. An effective cutting strategy depends heavily on properties of the food, including its coefficient of friction with the knife, elastic modulus, fracture effects, and hysteretic effects such as plastic deformation [29]. These parameters lead humans to such diverse cutting strategies as slicing, sawing, and chopping. In fact, they can even vary within a single material; compare cutting through the skin of a lemon to cutting its flesh.

Fig. 1: Cutting food: Our PR2 robot uses our algorithms to perform complex, precise food-cutting operations. Given the large variety of material properties, it is challenging to design appropriate controllers.

Thus, a major challenge of this work is to design a model which can estimate and make use of global environmental properties, such as the material and tool in question, and temporally-changing properties, such as the current rate of motion, depth of cutting, enclosure of the knife by the material, and layer of the material the knife is in contact with.
While some works [15] attempt to define these properties, it is very difficult to design a set that truly captures all these complex inter- and intra-material variations. Hand-designing features and models for such tasks is infeasible and time-consuming, so here, we take a learning approach. In the recent past, feature learning methods have proven effective for learning latent task-specific features across many domains [3, 19, 24, 9, 28]. In this paper, we give a novel deep architecture for physical prediction for complex tasks such as food cutting. When this model is used for predictive control, it yields a DeepMPC controller which is able to learn task-specific controls. Deep learning is an excellent choice as a model for real-time MPC because its learned models are easily and efficiently differentiable with respect to their inputs using the same back-propagation algorithms used in learning, and because network sizes can simply be adjusted to trade off between prediction accuracy and computational speed.

Our model, optimized for receding-horizon prediction, learns latent material properties directly from data. Our architecture uses multiplicative conditional interactions and temporal recurrence to model both inter-material and time-dependent intra-material variations. We also present a novel learning algorithm for this recurrent network which avoids overfitting and the "exploding gradient" problem commonly seen when training recurrent networks. Once learned, inference for our model is extremely fast: when predicting to a time-horizon of 1s (100 samples) in the future, our model and its gradients can be evaluated at a rate of 1.2kHz.

Fig. 2: Food materials: Some of the 20 diverse food materials which make up our material interaction dataset.

In extensive experiments on our large-scale dataset comprising 1488 examples of robotic cutting across 20 different material types, we demonstrate that our feature-learning approach outperforms other state-of-the-art methods for physical prediction. We also implement an online real-time model-predictive controller using our model. In a series of over 450 real-world robotic trials, we show that our controller gives extremely strong performance for robotic food-cutting, even compared to methods tuned for specific material classes.

In summary, the contributions of this paper are:
- We combine deep learning and model-predictive control in a DeepMPC controller that uses learned task dynamics.
- We propose a novel deep architecture which is able to model dynamics conditioned on learned latent properties, and a multi-stage pre-training algorithm that avoids common problems in training recurrent neural networks.
- We implement a real-time model-predictive control system using our learned dynamics model.
- We demonstrate that our model and controller give strong performance for the difficult task of robotic food-cutting.

II. RELATED WORK

Reactive feedback controllers, where a control signal is generated based on error from current state to some set point, have been applied since the 19th century [4]. Stiffness control, where error in robot end-effector pose is used to determine end-effector forces, remains a common technique for compliant, force-based activities [5, 2, 15].
Such approaches are limited because they require a trajectory to be given beforehand, making it difficult to adapt to different conditions.

Feed-forward model-predictive control allows controls to adapt online by optimizing some cost function over predicted future states. These approaches have gained increased attention as modern computing power makes it feasible to perform optimization in real time. Shim et al. [36] used MPC to control multiple quad-rotors in simulation, while Howard et al. [20] performed intricate maneuvers with real-world mobile robots. Erez et al. [14] used MPC for full-body control of a humanoid robot. These approaches have been extended to many other tasks, including underwater vehicle control [26], visual servoing [18], and even heart surgery [11]. However, all these works assume the task dynamics model is fully specified.

Model learning for robot control has also been a very active area, and we refer the reader to a review of work in the area by Nguyen-Tuong and Peters [31]. While early works in model learning [1, 30] fit parameters of some hand-designed task-specific model to data, such models can be difficult to design and may not generalize well to new tasks. Thus, several recent works attempt to learn more general dynamics models such as Gaussian mixture models [7, 21] and Gaussian processes [22]. Neural networks [8, 6] are another common choice for learning general non-linear dynamics models. The highly parameterized nature of these models allows them to fit a wide variety of data well, but also makes them very susceptible to overfitting.

Modern deep learning methods retain the advantages of neural networks, while using new algorithms and network architectures to overcome their drawbacks. Due to their effectiveness as general non-linear learners [3], deep learning has been applied to a broad spectrum of problems, including visual recognition [19, 24], natural language processing [9], acoustic modeling [28], and many others.
Recurrent deep networks have proven particularly effective for time-dependent tasks such as text generation [37] and speech recognition [17]. Factored conditional models using multiplicative interactions have also been shown to work well for modeling short-term temporal transformations in images [27]. More recently, Taylor and Hinton [38] applied these models to human motion, but did not model any control inputs, and treated the conditioning features as a set of fully-observed "motion styles".

Several recent approaches to control learning first learn a dynamics model, then use this model to learn a policy which maps from system state to control inputs. These works often iteratively use this policy to collect more data and re-learn a new policy in an online learning approach. Levine and Abbeel [25] use a Gaussian mixture model (GMM) where linear models are fit to each cluster, while Deisenroth and Rasmussen [10] use a Gaussian process (GP). Experimentally, both these models gave less accurate predictions than ours for robotic food-cutting. The GP also had very long inference times (roughly 10^6 times longer than ours) due to the large amount of training data needed. For details, see Section VII-B. This weak performance is because they use only temporally-local information, while our model uses learned recurrent features to integrate long-term information and model unobserved system properties such as materials.

These works focus on online policy search, while here we focus on modeling and application to real-time MPC. Our model could be used along with them in a policy-learning approach, allowing them to model dynamics with environmental and temporal variations. However, our model is efficient enough to optimize for predictive control at run-time. This avoids the possibility of learned policies overfitting the training data and allows the cost function and its parameters to be changed online. It also allows our model to be used with other algorithms which use its predictions directly.

Fig. 3: Variation in cutting dynamics: plots showing desired (green) and actual (blue) trajectories, along with error (red), obtained using a stiffness controller while cutting butter (left) and a lemon at low (middle) and high (right) rates of vertical motion.

Gemici and Saxena [15] presented a learning system for manipulating deformable objects which infers a set of material properties, then uses these properties to map objects to a latent set of haptic categories which are used to determine how to manipulate the object. However, their approach requires a predefined set of properties (plasticity, brittleness, etc.), and chooses between a small set of discrete actions. By contrast, our approach performs continuous-space real-time control, and uses learned latent features to model material properties and other variations, avoiding the need for hand-design.

III. PROBLEM DEFINITION AND SYSTEM

In this work, we focus on the task of cutting a wide range of food items. This problem is a good testbed for our algorithms because of the variety of dynamics involved in cutting different materials. Designing individual controllers for each material would be very time-consuming, and hand-designing accurate dynamics models for each would be nearly infeasible.

For the task of cutting, we define gripper axes as shown in Fig. 4, such that the X axis points out of the point of the knife, the Y axis normal to the blade, and the Z axis vertically. Here, we consider linear cutting, where the goal is to make a cut of some given length along the Z axis.
The control inputs to the system are denoted as u^(t) = (F_x^(t), F_y^(t), F_z^(t)), where F_x^(t) represents the force, in Newtons, applied along the end-effector X axis at time t. The physical state of the system is x^(t) = (P_x^(t), P_y^(t), P_z^(t)), where P_x^(t) is the X coordinate of the end-effector's position at time t.

A simple approach to control for this problem might use a fixed-trajectory stiffness controller, where control inputs are proportional to the difference between the current state x^(t) and some desired state x*^(t) taken from a given trajectory. Fig. 3 shows some examples which demonstrate the difficulties inherent in this approach. While some materials, such as the butter shown on the left, offer very little resistance, allowing a stiffness controller to accurately follow a given trajectory, others, such as the lemon shown in the remaining two plots, offer more resistance, causing significant deviation from the desired trajectory. When cutting a lemon, we can also see that the dynamics change with time, resisting the knife more as it cuts through the skin, then less once it enters the flesh of the lemon. The dynamics of the sawing and vertical axes are also coupled: increasing the rate of vertical motion increases error along the sawing axis, even though the same controls are used for that axis.

Fig. 4: Gripper axes: PR2's gripper with knife grasped, showing the axes used in this paper. The X ("sawing") axis points along the blade of the knife, Y points normal to the blade, and Z points vertically.

In our approach, we fix the orientation of the end-effector, as well as the position of the knife along its Y axis, using stiffness control to stabilize these. However, even though our primary goal is to move the knife along its Z axis, as shown in Fig. 3, the X and Z axes are strongly coupled for this problem. Thus, our algorithm performs control along both the X and Z axes.
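The fixed-trajectory stiffness baseline described above reduces to a simple proportional law per axis. A minimal sketch follows; the gain and position values are illustrative assumptions, not settings from the paper:

```python
import numpy as np

def stiffness_control(x, x_des, kp):
    """Per-axis restoring force F = Kp * (x_des - x)."""
    return kp * (x_des - x)

# Example: knife lagging 5 mm below the desired vertical position.
kp = np.array([300.0, 300.0, 300.0])      # N/m per axis (assumed gains)
x = np.array([0.010, 0.000, 0.020])       # measured (Px, Py, Pz), in m
x_des = np.array([0.010, 0.000, 0.015])   # desired position, in m
force = stiffness_control(x, x_des, kp)   # -> [0., 0., -1.5] N
```

As Fig. 3 illustrates, such a controller tracks well in soft materials like butter but accumulates large errors in resistive, time-varying materials like lemon, which motivates the predictive approach below.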
This allows "sawing" and "slicing" motions, in which movement along the X axis is used to break static friction along the Z axis and enable forward progress. We use a non-linear function f to predict future states:

x̂^(t+1) = f(x^(t), u^(t+1))    (1)

We can apply this formula recurrently to predict further into the future, e.g. x̂^(t+2) = f(x̂^(t+1), u^(t+2)).

A. Model-Predictive Control: Background

In this work, we use a model-predictive controller to control the cutting hand. Such controllers have been shown to work extremely well for a wide variety of tasks for which hand-defined controllers are either difficult to define or simply cannot suffice [20, 14, 26, 11]. Defining X_{t:k} as the system state from time t through time k, and U_{t:k} similarly for system inputs, a model-predictive controller works by finding a set of optimal inputs U*_{t+1:t+T} which minimize some cost function C(X̂_{t+1:t+T}, U_{t+1:t+T}) over predicted states X̂ and control inputs U for some finite time horizon T:

U*_{t+1:t+T} = arg min_{U_{t+1:t+T}} C(X̂_{t+1:t+T}, U_{t+1:t+T})    (2)

This approach is powerful, as it allows us to leverage our knowledge of task dynamics f(x, u) directly, predicting future interactions and proactively avoiding mistakes rather than simply reacting to past observations. It is also versatile, as we have the freedom to design C to define optimality for some task. The chief difficulty lies in modeling the task dynamics f(x, u) in a way that is both differentiable and quick to evaluate, to allow online optimization.

IV. MODELING TIME-VARYING NON-LINEAR DYNAMICS WITH DEEP NETWORKS

Hand-designing models for the entire range of potential interactions encountered in complex tasks such as cutting food would be nearly impossible. Our main challenge in this work is then to design a model capable of learning non-linear, time-varying dynamics. This model must be able to respond to short-term changes, such as breaking static friction, and must be able to identify and model variations caused by varying materials and other properties. It must be differentiable with respect to system inputs, and the system outputs and gradients must be fast to compute to be useful for real-time control.

We choose to base our model on deep learning algorithms, a strong choice for our problem for several reasons. They have been shown to be general non-linear learners [3], but remain differentiable using efficient back-propagation algorithms. When time is an issue, as in our case, network sizes can be scaled down to trade accuracy for computational performance.

Although deep networks can learn any non-linear function, care must still be taken to design a task-appropriate model. As shown in Fig. 7, a simple deep network gives relatively weak performance for this problem. Thus, one major contribution of this work is to design a novel deep architecture for modeling dynamics which might vary both with the environment, material, etc., and with time while acting. In this section, we describe our architecture, shown in Fig. 5, and motivate our design decisions in the context of modeling such dynamics.

Dynamic Response Features: When modeling physical dynamics, it is important to capture short-term input-output responses.
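The receding-horizon optimization of Eq. (2) can be sketched with a toy differentiable model standing in for the learned network. Everything here is an illustrative assumption (scalar linear dynamics, quadratic tracking cost, fixed step size), and finite differences stand in for the analytic backpropagation-through-time gradients used in the paper:

```python
import numpy as np

A, B = 0.9, 0.1                  # toy scalar dynamics: x' = A*x + B*u
x0, target, T = 0.0, 1.0, 10     # start state, goal, horizon

def rollout(u):
    """Predict states over the horizon by applying the model recurrently."""
    x, xs = x0, []
    for t in range(T):
        x = A * x + B * u[t]
        xs.append(x)
    return np.array(xs)

def cost(u):
    """Tracking cost over predicted states plus a small control penalty."""
    xs = rollout(u)
    return np.sum((xs - target) ** 2) + 1e-3 * np.sum(u ** 2)

def grad(u, eps=1e-6):
    """Central finite differences (stand-in for analytic gradients)."""
    g = np.zeros_like(u)
    for i in range(T):
        du = np.zeros_like(u); du[i] = eps
        g[i] = (cost(u + du) - cost(u - du)) / (2 * eps)
    return g

u = np.zeros(T)
for _ in range(200):             # plain gradient descent on the controls
    u -= 0.5 * grad(u)

assert cost(u) < cost(np.zeros(T))   # optimized controls beat doing nothing
```

In the paper's setting, f is the learned deep model and the inner loop runs continuously, re-optimizing the control sequence as new state observations arrive.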
Thus, rather than learning features separately for system inputs u and outputs x, the basic input features used in our model are a concatenation of both. It is also important to capture high-order and delayed-response modes of interaction. Thus, rather than considering only a single timestep, we consider blocks thereof when computing these features, so that for block b, with block size B, we have visible input features v^(b) = (X_{bB:(b+1)B−1}, U_{bB:(b+1)B−1}).

Conditional Dynamic Responses: For tasks such as material cutting, our local dynamics might be conditioned on both time-invariant and time-varying properties. Thus, we must design a model which operates conditional on past information. We do so by introducing factored conditional units [27], where features from some number of inputs modulate multiplicatively and are then weighted to form network outputs. Intuitively, this allows us to blend a set of models based on features extracted from past information. Since our model needs to incorporate both short- and long-term information, we allow three sets of features to interact: the current control inputs, the past block's dynamic response, and latent features modeling long-term observations, described below.

Fig. 5: Deep predictive model: Architecture of our recurrent conditional deep predictive dynamics model.

Although the past block's response is also included when forming latent features, including it directly in this conditional model frees our latent features from having to model such short-term dependencies. We use c to denote the current time-block, f to denote the immediate future one, l for latent features, and o for outputs. Take Nv as the number of features v, Nx as the number of states x, and Nu as the number of inputs u in a block, and Nl as the number of latent features l.
With h^[c](b) ∈ R^{Noh} as the hidden features from the current timestep, formed using weights W^[c] ∈ R^{Nv×Noh} (similar for f and l), and W^[o] ∈ R^{Noh×Nx} as the output weights, our predictive model is then:

h_j^{[c](b)} = σ( Σ_{i=1}^{Nv} W_{i,j}^{[c]} v_i^{(b)} )    (3)

h_j^{[f](b)} = σ( Σ_{i=1}^{Nu} W_{i,j}^{[f]} u_i^{(b+1)} )    (4)

h_j^{[l](b)} = σ( Σ_{i=1}^{Nl} W_{i,j}^{[l]} l_i^{(b)} )    (5)

x̂_j^{(b+1)} = Σ_{i=1}^{Noh} W_{i,j}^{[o]} h_i^{[c](b)} h_i^{[f](b)} h_i^{[l](b)}    (6)

Long-Term Recurrent Latent Features: Another major challenge in modeling time-dependent dynamics is integrating long-term information while still allowing for transitions in dynamics, such as moving from the skin to the flesh of a lemon. To this end, we introduce transforming recurrent units (TRUs). To retain state information from previous observations, our TRUs use temporal recurrence, where each latent unit has weights to the previous timestep's latent features. To allow this state to transition based on locally observed transformations in dynamics, they use the paired-filter behavior of multiplicative interactions to detect transitions in the dynamic response of the system and update the latent state accordingly. In previous work, multiplicative factored conditional units have been shown to work well in modeling transformations in images [27] and physical states [38], making them a good choice here. Each TRU thus takes input from the previous TRU's output and the short-term response features for the current and previous time-blocks. With ll denoting recurrent weights, lc denoting current-step weights for the latent features, lp previous-step, and lo output, and Nlh as the number of TRU hidden units, our latent feature model is then:

h_j^{[lc](b)} = σ( Σ_{i=1}^{Nv} W_{i,j}^{[lc]} v_i^{(b)} )    (7)

h_j^{[lp](b)} = σ( Σ_{i=1}^{Nv} W_{i,j}^{[lp]} v_i^{(b−1)} )    (8)

l_j^{(b)} = σ( Σ_{i=1}^{Nlh} W_{i,j}^{[lo]} h_i^{[lc](b)} h_i^{[lp](b)} + Σ_{k=1}^{Nl} W_{k,j}^{[ll]} l_k^{(b−1)} )    (9)

Finally, Fig. 5 shows the full architecture of our deep predictive model, as described above.

V. LEARNING AND INFERENCE

In this section, we define the learning and inference procedures for the model defined above. The online inference approach is a direct result of our model. However, there are many possible approaches to learning its parameters. Neural networks require a huge number of parameters (weights) to be learned, making them particularly susceptible to overfitting, and recurrent networks often suffer from instability in future predictions, causing large gradients which make optimization difficult (the "exploding gradient" problem).

To avoid these issues, we define a new three-stage learning approach which pre-trains the network weights before using them for recurrent modeling. Deep learning methods are non-convex, and converge to only a local optimum, making our approach important in ensuring that a good optimum which does not overfit the training data is reached.

Inference: During inference for MPC, we are currently at some time-block b with latent state l^(b), known system state x^(b), and control inputs u^(b). Future control inputs U_{t+1:t+T} are also given, and our goal is then to predict the future system states X̂_{t+1:t+T} up to time-horizon T, along with the gradients ∂X̂/∂U for all pairs of x and u. We do so by applying our model recurrently to predict future states up to time-horizon T, using predicted states x̂ and latent features l as inputs to our predictive model for subsequent timesteps; e.g. when predicting x^(b+2), we use the known x^(b) along with the predicted x̂^(b+1) and l^(b+1) as inputs.

Our model's outputs (x̂) are differentiable with respect to all its inputs, allowing us to take gradients ∂X̂/∂U using an approach similar to the backpropagation-through-time algorithm used to optimize model parameters during learning. We can in turn use these gradients with any gradient-based optimization algorithm to optimize U_{t+1:t+T} with respect to some differentiable cost function C(X, U). No online optimization is necessary to perform inference for our model.

Learning: During learning, our objective is to use our training data to learn a set of model parameters Θ = (W^[f], W^[c], W^[l], W^[o], W^[lp], W^[lc], W^[ll], W^[lo]) which minimize prediction error while avoiding overfitting. A naive approach to learning might randomly initialize Θ, then optimize the entire recurrent model for prediction error. However, random weights would likely cause the model to make inaccurate predictions, which will in turn be fed forwards to future timesteps. This could cause huge errors at time-horizon T, which will in turn cause large gradients to be back-propagated, resulting in instability in the learning and overfitting to the training data. To remedy this, we propose a multi-stage pre-training approach which first optimizes some subsets of the weights, leading to much more accurate predictions and less instability when optimizing the final recurrent network. We show in Fig. 7 that our learning algorithm significantly outperforms random initialization.

Phase 1: Unsupervised Pre-Training: In order to obtain a good initial set of features for l, we apply an unsupervised learning algorithm similar to the sparse auto-encoder algorithm [16] to train the non-recurrent parameters of the TRUs. This algorithm first projects from the TRU inputs up to l, then uses the projected l to reconstruct these inputs.
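As a concrete sketch, the forward pass of Eqs. (3)-(9) and the recurrent rollout can be written in a few lines of NumPy. Layer sizes and the random weights here are placeholders for a trained model, and block size is taken as B = 1 so that v^(b) is simply the concatenated state and control:

```python
import numpy as np

rng = np.random.default_rng(0)
Nx, Nu = 3, 3                 # state (Px,Py,Pz) and controls (Fx,Fy,Fz)
Nv = Nx + Nu                  # block size B=1: v(b) = concat(x(b), u(b))
Nl, Noh, Nlh = 5, 8, 6        # illustrative sizes, not the paper's
sigma = lambda z: 1.0 / (1.0 + np.exp(-z))   # logistic nonlinearity

shapes = {"c": (Nv, Noh), "f": (Nu, Noh), "l": (Nl, Noh), "o": (Noh, Nx),
          "lc": (Nv, Nlh), "lp": (Nv, Nlh), "lo": (Nlh, Nl), "ll": (Nl, Nl)}
W = {k: 0.1 * rng.standard_normal(s) for k, s in shapes.items()}

def tru(v_prev, v_cur, l_prev):
    """Transforming recurrent unit, Eqs. (7)-(9): paired filters on the
    current and previous response blocks plus a recurrent term."""
    h_lc = sigma(v_cur @ W["lc"])
    h_lp = sigma(v_prev @ W["lp"])
    return sigma((h_lc * h_lp) @ W["lo"] + l_prev @ W["ll"])

def predict(v_cur, u_next, l_cur):
    """Conditional prediction, Eqs. (3)-(6): three feature sets interact
    multiplicatively before the output projection."""
    h_c = sigma(v_cur @ W["c"])
    h_f = sigma(u_next @ W["f"])
    h_l = sigma(l_cur @ W["l"])
    return (h_c * h_f * h_l) @ W["o"]            # x_hat(b+1)

# Recurrent rollout (Eq. (1) applied repeatedly): predicted states are
# fed back in as part of the next block's response features.
x, l = np.zeros(Nx), np.zeros(Nl)
v_prev = v_cur = np.concatenate([x, np.zeros(Nu)])
controls = 0.5 * rng.standard_normal((4, Nu))    # future inputs U
preds = []
for u_next in controls:
    x = predict(v_cur, u_next, l)
    l = tru(v_prev, v_cur, l)
    v_prev, v_cur = v_cur, np.concatenate([x, u_next])
    preds.append(x)
```

Because every operation above is differentiable, the same rollout supports the backpropagation-through-time gradients ∂X̂/∂U described in the text.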
The TRU weights are optimized for a combination of reconstruction error and sparsity in the outputs of l.

Phase 2: Short-Term Prediction Training: While we could now use these parameters as a starting point to optimize a fully recurrent multi-step prediction system, we found that in practice, this led to instability in the predicted values, since inaccuracies in initial predictions might "blow up" and cause huge deviations in future timesteps.

Instead, we include a second pre-training phase, where we train the model to predict a single timestep into the future. This allows the model to adjust from the task of reconstruction to that of physical prediction, without risking the aforementioned instability. For this stage, we remove the recurrent weights from the TRUs, effectively setting all W^[ll] to zero and ignoring them for this phase of optimization.

Taking x^(m,b) as the state for the b-th time-block of training case m, M as the number of training cases, and B_m as the number of time-blocks for case m, this stage optimizes:

Θ* = arg min_Θ Σ_{m=1}^{M} Σ_{b=2}^{B_m} ‖ x̂^(m,b) − x^(m,b) ‖₂²    (10)

Phase 3: Warm-Latent Recurrent Training: Once Θ has been pre-trained by these two phases, we use it to initialize a recurrent prediction system which performs inference as described above. We then optimize this system to minimize the sum-squared prediction error up to T timesteps in the future, using a variant of the backpropagation-through-time algorithm commonly used for recurrent neural networks [34].

When run online, our model will typically have some amount of past information, as we allow a short period where we optimize forces while a stiffness controller makes an initial inwards motion. Thus, simply initializing the latent state "cold" from some initial state and immediately penalizing prediction error does not match well with the actual use of the network, and might in fact introduce overfitting by forcing the model to rely more heavily on short-term information. Instead, we train our model for a "warm" start.
For some number of initial time-blocks B_w, we propagate latent state l, but do not predict or penalize system state x̂, only doing so after this warm-up phase. We still back-propagate errors from future timesteps through the warm-up latent states as normal.
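The phase-2 objective of Eq. (10) and the warm-start variant used in phase 3 can be sketched as two small loss functions; the warm-start version simply masks the first B_w blocks out of the penalty. The arrays below are illustrative:

```python
import numpy as np

def single_step_loss(x_hat, x):
    """Eq. (10): sum of squared single-step prediction errors."""
    return np.sum((x_hat - x) ** 2)

def warm_start_loss(x_hat_seq, x_seq, Bw):
    """Multi-step loss that ignores the first Bw warm-up blocks, while
    the latent state is still propagated (and gradients still flow)
    through them during training."""
    return np.sum((x_hat_seq[Bw:] - x_seq[Bw:]) ** 2)

# Large errors during warm-up contribute nothing to the penalty.
x_seq = np.ones((10, 3))
x_hat = np.ones((10, 3)); x_hat[:3] += 5.0
assert warm_start_loss(x_hat, x_seq, Bw=3) == 0.0
```

In the actual system these losses are minimized with L-BFGS over the full parameter set Θ, with gradients computed by backpropagation through time.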

Fig. 6: Online system: Block diagram of our DeepMPC system.

VI. SYSTEM DETAILS

Learning System: We used the L-BFGS algorithm, shown to give strong results for deep learning methods [23], to optimize our model during learning. While larger network sizes gave slightly (~10%) less error, we found that setting Nlh = 50, Nl = 50, and Noh = 100 was a good tradeoff between accuracy and computational performance. We found that block size B = 10, giving blocks of 0.1 s, gave the best performance. When implemented on the GPU in MATLAB, all phases of our learning algorithm took roughly 30 minutes to optimize.

Robotic Platform: For both data collection and online evaluation of our algorithms, we used a PR2 robot. The PR2 has two 7-DoF manipulators with parallel-plate grippers, and a reach of roughly 1 m. For safety reasons, we limit the forces applied by PR2's arms to 30 N along each axis, which was sufficient to cut every material tested. PR2 natively runs the Robot Operating System (ROS) [33]. Its arm controllers receive robot state information in the form of joint angles and must publish desired motor torques at a hard real-time rate of 1 kHz.

Online Model-Predictive Control System: The main challenge in designing a real-time model-predictive controller for this architecture lies in allowing prediction and optimization to run continuously to ensure optimality of controls, while providing the model with the most recent state information and performing control at the required real-time rate. As shown in Fig. 6, we solve this by separating our online system into two processes (ROS nodes), one performing continuous optimization, and the other real-time control. These processes use a shared memory space for high-rate inter-process communication. This approach is modular and flexible: the optimization process is generic to the robot involved (given an appropriate model), while the control process is robot-specific, but generic to the task at hand.
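One plausible sketch of the control process's per-tick computation is given below, assuming a Jacobian-transpose mapping from end-effector wrench to joint torques; the paper does not spell out this step, and the Jacobian, gains, and axis assignments here are placeholder assumptions:

```python
import numpy as np

def control_tick(J, f_mpc, x, x_des, kp, mpc_axes=(0, 2)):
    """One 1 kHz control step (sketch): stiffness restoring forces on the
    axes not controlled by MPC, MPC forces on the X and Z (sawing and
    vertical) axes, then joint torques via the Jacobian transpose."""
    f = kp * (x_des - x)        # stiffness forces on all axes...
    for a in mpc_axes:          # ...overridden by MPC forces on X and Z
        f[a] = f_mpc[a]
    return J.T @ f              # torques; cost scales with arm DoF count

J = np.eye(3)                   # placeholder 3x3 end-effector Jacobian
tau = control_tick(J,
                   f_mpc=np.array([2.0, 0.0, -5.0]),
                   x=np.zeros(3),
                   x_des=np.array([0.0, 0.01, 0.0]),
                   kp=np.full(3, 300.0))
# tau -> [2.0, 3.0, -5.0]
```

This keeps the per-tick work to a few small matrix operations, consistent with the text's requirement that each call complete in roughly 0.1 ms.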
In fact, models for the optimization process do not even need to be learned locally, but could be shared using an online platform [35].

The control process is designed to perform minimal computation so that it can be called at a rate of 1 kHz. It receives robot state information in the form of joint angles, and control information from the optimization process as end-effector forces. It performs forward kinematics to determine end-effector pose, transmits it to the optimization process, and uses it to determine restoring forces for axes not controlled by MPC. It translates the combination of these forces and those received from MPC to a set of joint torques sent to the arm. All operations performed by the control process are at most quadratic in the number of degrees of freedom of the arm, allowing each call to run in roughly 0.1 ms on PR2.

The optimization process runs as a continuous loop. When started, it loads model parameters (network weights) from disk. Cost function parameters are loaded from a ROS parameter server, allowing them to be changed online. The optimization loop first uses past robot states (received from the control process) and control inputs along with past latent state
