Tendon-Driven Control of Biomechanical and Robotic Systems: A Path Integral Reinforcement Learning Approach

Eric Rombokas, Evangelos Theodorou, Mark Malhotra, Emo Todorov and Yoky Matsuoka

E.A. Theodorou is a Postdoctoral Research Associate with the Department of Computer Science and Engineering, University of Washington, Seattle, USA. etheodor@cs.washington.edu
E. Todorov is with the Movement Control Laboratory, Computer Science & Engineering, University of Washington, Seattle, USA. todorov@cs.washington.edu

Abstract— We apply path integral reinforcement learning to a biomechanically accurate dynamics model of the index finger and then to the Anatomically Correct Testbed (ACT) robotic hand. We illustrate the applicability of Policy Improvement with Path Integrals (PI^2) to parameterized and non-parameterized control policies. This method is based on sampling variations in control, executing them in the real world, and minimizing a cost function on the resulting performance. Iteratively improving the control policy based on real-world performance requires no direct modeling of tendon network nonlinearities and contact transitions, allowing improved task performance.

I. INTRODUCTION

We demonstrate control and learning of tendon-driven biomechanical and robotic systems under difficult-to-model real-world conditions. This work is part of a larger project whose ultimate goal is twofold. First, we aim to use biologically inspired control and design principles to improve the state of the art in robot control and design. Second, we seek to better understand the underlying computational principles of neural and biomechanical systems.

Although there have been a number of studies of neural motor control and robotics, much progress remains to be made in bridging the gap between these two areas. Most studies are limited to applications of control algorithms in simulation, due to the difficulty of interfacing with real-world robotic hardware and biological motor systems. One of the main reasons for this discrepancy between simulations and real systems is that tendon-driven systems are very complex. In tendon-driven systems, torque around the joints is created through a network of tendons attached to the links. These tendons produce only positive force, since they can pull but not push. Nonlinearities due to friction and control constraints add to the complexity of the underlying robotic dynamics. Control and reinforcement learning algorithms that perform well in simulation may not perform well on real robotic systems, due to factors like dependence on accurate models and the "Curse of Dimensionality."

A promising strategy for overcoming the complexity of tendon-driven robotic control is to avoid modeling it directly, and instead to learn a set of controls that achieves the task by actually trying them. Beginning with a single example or demonstration, reinforcement learning can be applied by iteratively minimizing a cost function on the outcome of sample trials. This strategy, then, is to explore variations of control, observe the outcome of using that control, and revise the controller accordingly.

Recent work on path integral reinforcement learning [1] has demonstrated the robustness and scalability of the method for robotic control in high-dimensional state spaces. The iterative version of path integral control, the so-called Policy Improvement with Path Integrals (PI^2), has been applied to learning and control with torque-driven robotic systems.
PI^2 may be classified as model-based, semi-model-based, or model-free depending on how the learning problem is formulated. This is useful for learning control applications with complex robotic systems, for which modeling the underlying dynamics and contact phenomena is very difficult. One of the main ingredients of the application of PI^2 in previous work has been the use of nonlinear point attractors, called Dynamic Movement Primitives (DMPs). DMPs were used to parameterize trajectories in the case of planning, or gains in the case of gain scheduling and applications of variable stiffness control. In this work we go one step further by applying PI^2 to tendon-driven hand systems. In particular, we demonstrate the use of PI^2 on an accurate biomechanical model of the index finger, and go on to apply PI^2 to the Anatomically Correct Testbed (ACT) robotic hand for the task of sliding a switch. As we show, PI^2 is flexible because 1) it may be extended to tendon-driven systems and 2) its use does not rely on policy parameterizations, though it can accommodate them if desired. With very small algorithmic changes, PI^2 can be used either to directly compute control commands u(x, t) or to learn parameters \theta which, when projected onto basis functions, represent desired trajectories or control gains: u(x, t) = \Phi(x, t)^T \theta.

In Sections II-III we review the control framework and parameterization. Section IV describes the tendon-driven systems, first in simulation and then on the ACT robotic hand. The experimental conditions and results are described in Section V.

II. PATH INTEGRAL CONTROL

In this section we review the path integral control framework [1], [2]. We consider the following stochastic optimal control problem, with the cost function under minimization given by:

V(x) = \min_u J(x, u) = \min_u E\left[ \phi(x_{t_N}) + \int_{t_0}^{t_N} L(x, u, t)\, dt \right]    (1)

subject to the nonlinear stochastic dynamics:

dx = F(x, u)\, dt + B(x)\, dw    (2)

with x \in R^{n \times 1} denoting the state of the system, u \in R^{p \times 1} the control vector and dw \in R^{p \times 1} Brownian noise. The function F(x, u) is a nonlinear function of the state x and affine in the controls u, and is therefore defined as F(x, u) = f(x) + G(x) u. The matrix G(x) \in R^{n \times p} is the control matrix, B(x) \in R^{n \times p} is the diffusion matrix and f(x) \in R^{n \times 1} are the passive dynamics. The cost function J(x, u) is a function of states and controls. Under the optimal controls, the cost function equals the value function V(x). The term L(x, u, t) is the immediate cost, expressed as:

L(x, u, t) = q_0(x, t) + q_1(x, t)^T u + \frac{1}{2} u^T R u    (3)

The immediate cost has three terms^1: the first, q_0(x, t), is an arbitrary state-dependent cost; the second depends on both states and controls; and the third is the control cost, with R > 0 the corresponding weight. The stochastic HJB equation [3], [4] associated with this stochastic optimal control problem is expressed as follows:

-\partial_t V = \min_u \left[ L + (\nabla_x V)^T F + \frac{1}{2} tr\left( (\nabla_{xx} V) B B^T \right) \right]    (4)

To find the minimum, the cost function (3) is inserted into (4), and the gradient of the expression inside the brackets is taken with respect to the controls u and set to zero. The corresponding optimal control is given by:

u(x_t) = -R^{-1} \left( q_1(x, t) + G(x)^T \nabla_x V(x, t) \right)    (5)

Substitution of the optimal controls into the stochastic HJB equation (4) results in the following nonlinear, second-order PDE:

-\partial_t V = \tilde{q} + (\nabla_x V)^T \tilde{f} - \frac{1}{2} (\nabla_x V)^T G R^{-1} G^T (\nabla_x V) + \frac{1}{2} tr\left( (\nabla_{xx} V) B B^T \right)

with \tilde{q}(x, t) and \tilde{f}(x, t) defined as \tilde{q}(x, t) = q_0(x, t) - \frac{1}{2} q_1(x, t)^T R^{-1} q_1(x, t) and \tilde{f}(x, t) = f(x, t) - G(x, t) R^{-1} q_1(x, t), and the boundary condition V(x_{t_N}) = \phi(x_{t_N}). Solving the PDE above, especially for high-dimensional dynamical systems, remains one of the main challenges in nonlinear optimal control theory. To transform the PDE above into a linear one, we use an exponential transformation of the value function, V = -\lambda \log \Psi. By inserting the logarithmic transformation and the derivatives of the value function, and under the assumption \lambda G(x) R^{-1} G(x)^T = B(x) B(x)^T = \Sigma, the resulting PDE is formulated as follows:

-\partial_t \Psi = -\frac{1}{\lambda} \tilde{q} \Psi + \tilde{f}^T (\nabla_x \Psi) + \frac{1}{2} tr\left( (\nabla_{xx} \Psi) \Sigma \right)    (6)

with boundary condition \Psi_{t_N} = \exp\left( -\frac{1}{\lambda} \phi_{t_N} \right). Application of the Feynman-Kac lemma to the Chapman-Kolmogorov PDE (6) yields its solution in the form of an expectation over system trajectories:

\Psi(x_{t_i}) = E_{\tau_i}\left[ e^{-\frac{1}{\lambda} \int_{t_i}^{t_N} \tilde{q}(x)\, dt}\, \Psi(x_{t_N}) \right]    (7)

on sample paths \tau_i = (x_{t_i}, ..., x_{t_N}) generated by forward sampling of the diffusion equation dx = \tilde{f}(x_t)\, dt + B(x)\, dw. Thus, the Feynman-Kac lemma is crucial to transforming the stochastic optimal control problem into a problem of approximating a path integral. In discrete time, the solution (7) is approximated by:

\Psi(x_{t_i}) = \lim_{dt \to 0} \int P(x_N, t_N \mid x_i, t_i) \exp\left( -\frac{1}{\lambda} \left( \phi_{t_N} + \sum_{j=i}^{N-1} \tilde{q}_{t_j}\, dt \right) \right) dx_N    (8)

where the probability P(x_N, t_N \mid x_i, t_i) has the form of a path integral. After approximating the exponentiated value function \Psi(x, t), the optimal controls can be recovered:

u_{PI}(x) = R^{-1} \left( -q_1(x, t) + \lambda G(x)^T \frac{\nabla_x \Psi(x, t)}{\Psi(x, t)} \right)    (9)

where the subscript PI stands for Path Integral.

^1 The aforementioned immediate cost has the additional second term, and in that sense is more general than costs where only the first and third terms are considered.
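Equation (7) suggests a direct numerical recipe: estimate \Psi by averaging exponentiated path costs over forward-sampled diffusions. The following is a minimal Monte Carlo sketch of that idea; the callables f_tilde, B, q and phi are hypothetical stand-ins for a concrete model, not an interface from this paper.

```python
import numpy as np

def estimate_psi(x0, f_tilde, B, q, phi, lam, dt, n_steps, n_paths, seed=0):
    """Monte Carlo estimate of Psi(x_ti), eq. (7): forward-sample the
    diffusion dx = f_tilde(x) dt + B(x) dw and average the exponentiated
    path cost exp(-(running cost + terminal cost) / lambda)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_paths):
        x = np.array(x0, dtype=float)
        S = 0.0
        for _ in range(n_steps):
            S += q(x) * dt                        # accumulate running cost
            Bx = B(x)
            dw = rng.normal(size=Bx.shape[1]) * np.sqrt(dt)
            x = x + f_tilde(x) * dt + Bx @ dw     # Euler-Maruyama step
        total += np.exp(-(S + phi(x)) / lam)      # terminal cost enters here
    return total / n_paths
```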
When constraints in control are considered, u_{min} \le u_{PI}(x) \le u_{max}, the optimal control is expressed as:

u_{CPI}(x) = \max\left( u_{min}, \min\left( u_{PI}(x), u_{max} \right) \right)

The subscript CPI stands for Constrained Path Integral. The min and max operators are applied element-wise. In [1], [2] it has been shown that the path integral optimal control takes the form:

u_{PI}(x_{t_i}) = \lim_{dt \to 0} \int P(\tau_i)\, dw_{t_i}    (10)

where \tau_i is a trajectory in state space starting from x_{t_i} and ending in x_{t_N}, so that \tau_i = (x_{t_i}, ..., x_{t_N}). The probability P(\tau_i) is defined as:

P(\tau_i) = \frac{ e^{-\frac{1}{\lambda} \tilde{S}(\tau_i)} }{ \int e^{-\frac{1}{\lambda} \tilde{S}(\tau_i)}\, d\tau_i }    (11)

In the iterative version of the path integral control framework, dw can be thought of as variations \delta u in the controls u. An alternative formulation exists when control policies are parameterized as u(x, t) = \Phi(x, t)^T \theta. In that case, the parameter \theta plays the role of the controls, while dw can be thought of as variations \delta\theta in the parameters \theta of the parameterized policy u(x, t).
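For a finite batch of K sampled rollouts, the path probabilities (11) reduce to a softmax over accumulated path costs, and the control constraints amount to element-wise clipping. A minimal sketch, where the cost array S is a hypothetical input holding \tilde{S}(\tau_k) for each rollout:

```python
import numpy as np

def path_probabilities(S, lam):
    """P(tau_k) = exp(-S_k / lam) / sum_j exp(-S_j / lam), eq. (11).
    The minimum cost is subtracted for numerical stability; the
    normalized weights are unchanged by this shift."""
    w = np.exp(-(S - S.min()) / lam)
    return w / w.sum()

def constrained_control(u, u_min, u_max):
    """Element-wise clipping u_CPI = max(u_min, min(u_PI, u_max))."""
    return np.clip(u, u_min, u_max)
```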

Table I illustrates PI^2 when it is applied to constrained optimal control problems, and Table II illustrates PI^2 for the case where parameterized policies are used.

TABLE I: Policy Improvement with Path Integrals, PI^2-I.

Given:
– An immediate state-dependent cost function q(x_t)
– The control weight \tilde{R} = \Sigma^{-1}
Repeat until convergence of the trajectory cost R:
– Create K roll-outs of the system from the same start state x_0, using stochastic controls u + \delta u_s at every time step
– For k = 1 ... K, compute costs and weights:
    S(\tau_i) = \phi_{t_N} + \sum_{j=i}^{N-1} \left( q_{t_j} + \frac{1}{2} \delta u_s^T \tilde{R}\, \delta u_s \right) dt
    P(\tau_{i,k}) = e^{-\frac{1}{\lambda} S(\tau_{i,k})} / \sum_{k=1}^{K} e^{-\frac{1}{\lambda} S(\tau_{i,k})}
– For i = 1 ... (N-1), compute:
    \delta u(x_{t_i}) = \sum_{k=1}^{K} P(\tau_{i,k})\, \delta u_s(t_i, k)
– Update u = \max\left( u_{min}, \min\left( u + \delta u, u_{max} \right) \right)

TABLE II: Policy Improvement with Path Integrals, PI^2-II.

Given:
– An immediate state-dependent cost function q(x_t)
– The control weight \tilde{R} = \Sigma^{-1}
Repeat until convergence of the trajectory cost R:
– Create K roll-outs of the system from the same start state x_0, using stochastic parameters \theta + \delta\theta_s at every time step
– For k = 1 ... K, compute costs and weights:
    S(\tau_i) = \phi_{t_N} + \sum_{j=i}^{N-1} \left( q_{t_j} + \frac{1}{2} \delta\theta_s^T \tilde{R}\, \delta\theta_s \right) dt
    P(\tau_{i,k}) = e^{-\frac{1}{\lambda} S(\tau_{i,k})} / \sum_{k=1}^{K} e^{-\frac{1}{\lambda} S(\tau_{i,k})}
– For i = 1 ... (N-1), compute:
    \delta\theta(x_{t_i}) = \sum_{k=1}^{K} P(\tau_{i,k})\, \delta\theta_s(t_i, k)
– Time averaging: \delta\theta = \frac{1}{N} \sum_i w_i\, \delta\theta(x_{t_i})
– Update \theta = \theta + \delta\theta

The main difference is that in PI^2-I there is no time averaging of the control strategy changes, as there is in PI^2-II. In addition, in the last step of PI^2-I the controls are updated such that the constraints are not violated.

The assumption of the path integral control framework, \lambda G(x) R^{-1} G(x)^T = B(x) B(x)^T = \Sigma, establishes a relationship between the control cost and the variance of the noise. Essentially, high variance results in low control cost and therefore increased control authority. Depending on the levels of noise, this connection between noise and control authority may result in noisy control commands. This characteristic may be desirable when applying path integral control to biomechanical and neuromuscular models, in order to match observed noisy controls. For reinforcement learning applications to robotic systems, however, it may be preferable to use smooth control commands. In that case, nonlinear point attractors [5] offer a low-dimensional parameterization of trajectories and control gains. This parameterization reduces the search space and also has a smoothing effect on the control commands. In the next section we review nonlinear point attractors and provide their mathematical formulations.
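As a concrete illustration of Table I, the sketch below performs one PI^2-I update on an open-loop control tape. The rollout_cost interface and the array shapes are expository assumptions rather than the implementation used in this work; rollout_cost is taken to execute one perturbed rollout and return the cost-to-go S(\tau_i) for every start index i.

```python
import numpy as np

def pi2_update(u, rollout_cost, K, sigma, lam, u_min, u_max, seed=0):
    """One PI^2-I iteration (Table I) for an (N, p) control tape `u`."""
    rng = np.random.default_rng(seed)
    du = sigma * rng.normal(size=(K,) + u.shape)       # exploration du_s
    S = np.stack([rollout_cost(np.clip(u + du[k], u_min, u_max))
                  for k in range(K)])                  # (K, N) costs-to-go
    P = np.exp(-(S - S.min(axis=0)) / lam)             # per-step weights
    P /= P.sum(axis=0)
    delta_u = np.einsum('kn,knp->np', P, du)           # weighted variations
    return np.clip(u + delta_u, u_min, u_max)          # constrained update
```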
III. DYNAMIC MOVEMENT PRIMITIVES: NONLINEAR POINT ATTRACTORS WITH ADJUSTABLE ATTRACTOR LANDSCAPE

The nonlinear point attractor consists of two sets of differential equations, the canonical and transformation systems, which are coupled through a nonlinearity [5]. The canonical system is formulated as \frac{1}{\tau} \dot{x}_t = -\alpha x_t. This is a first-order linear dynamical system for which, starting from some arbitrarily chosen initial state x_0, e.g., x_0 = 1, the state x converges monotonically to zero. x can be conceived of as a phase variable, where x = 1 indicates the start of the time evolution, and x close to zero means that the goal g (see below) has essentially been achieved. The transformation system consists of the following two differential equations:

\frac{1}{\tau} \dot{z} = \alpha_z \left( \beta_z (g - y) - z \right) + f
\frac{1}{\tau} \dot{y} = z    (12)

These three differential equations code a learnable point attractor for a movement from y_{t_0} to the goal g, where \theta determines the shape of the attractor. y_t, \dot{y}_t denote the position and velocity of the trajectory; \alpha_z, \beta_z, \tau are time constants. The nonlinear coupling or forcing term f is defined as:

f(x) = \frac{ \sum_{i=1}^{N} K(x_t, c_i)\, \theta_i\, x_t }{ \sum_{i=1}^{N} K(x_t, c_i) } (g - y_0) = \Phi_P(x)^T \theta    (13)

The basis functions K(x_t, c_i) are defined as K(x_t, c_i) = \exp\left( -0.5\, h_i (x_t - c_i)^2 \right), with bandwidth h_i and center c_i of the Gaussian kernels; for more details see [5]. The full dynamics of the point attractor have the form dx = \alpha(x)\, dt + C(x)\, u\, dt, where the state x is specified as x = (y, z) while the controls are specified as u = \theta = (\theta_1, ..., \theta_p)^T. Thus \alpha(x) and C(x) are specified as follows:

\alpha(x) = \begin{pmatrix} z \\ \alpha_z \beta_z (g - y) - \alpha_z z \end{pmatrix}    (14)

C(x) = \begin{pmatrix} 0 \\ \Phi_P(x)^T \end{pmatrix}    (15)

This representation is advantageous, as it guarantees that the attractor progresses towards the goal while remaining linear in the parameters \theta. By varying \theta, the shape of the trajectory changes while the goal state g and initial state y_{t_0} remain fixed. These properties facilitate learning [6].
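To make the attractor concrete, here is a minimal Euler integration of eqs. (12)-(13). The kernel centers, bandwidths, and gain constants are illustrative choices for this sketch, not values used in the experiments.

```python
import numpy as np

def dmp_rollout(theta, g, y0, T, dt, alpha=8.0, alpha_z=25.0, beta_z=6.25,
                tau=1.0):
    """Integrate the canonical and transformation systems of Section III."""
    n_basis = len(theta)
    c = np.exp(-alpha * np.linspace(0.0, 1.0, n_basis))   # kernel centers
    h = 1.0 / np.diff(c, append=c[-1] * 0.9) ** 2         # ad hoc bandwidths
    x, y, z = 1.0, float(y0), 0.0
    ys = []
    for _ in range(int(T / dt)):
        K = np.exp(-0.5 * h * (x - c) ** 2)               # kernel activations
        f = (K @ theta) * x / K.sum() * (g - y0)          # forcing term (13)
        z += dt * tau * (alpha_z * (beta_z * (g - y) - z) + f)   # eq. (12)
        y += dt * tau * z
        x += dt * tau * (-alpha * x)                      # canonical system
        ys.append(y)
    return np.array(ys)
```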

IV. TENDON-DRIVEN SYSTEMS

In this section we describe the two tendon-driven systems used in our work. The first is a dynamical model of the human index finger, and the second is the ACT robotic hand.

A. Index Finger Biomechanics

The skeleton of the human index finger consists of 3 joints connecting 3 rigid links. Two of the joints, the proximal interphalangeal (PIP) and the distal interphalangeal (DIP), are hinge joints that generate flexion-extension. The metacarpophalangeal joint (MCP) is a saddle joint and can generate flexion-extension as well as abduction-adduction. Fingers have at least 6 muscles, and the index finger is controlled by 7. Starting with the flexors, the index finger has the Flexor Digitorum Profundus (FDP) and the Flexor Digitorum Superficialis (FDS). The Radial Interosseous (RI) acts on the MCP joint. Lastly, the extensor mechanism acts on all three joints; it is an interconnected network of tendons driven by two extensors, the Extensor Communis (EC) and the Extensor Indicis (EI), together with the Ulnar Interosseous (UI) and the Lumbrical (LU).

The full model of the index finger is given by:

\ddot{q} = I(q)^{-1} \left( T - C(q, \dot{q}) - B \dot{q} \right)    (16)
T = M(q) \cdot F    (17)
\dot{F} = \frac{1}{\tau_u} \left( -F + G u \right)    (18)
u_{min} \le u \le u_{max}    (19)

where I \in R^{6 \times 6} is the inertia matrix, C(q, \dot{q}) \in R^{6 \times 1} is the vector of Coriolis and centripetal forces, and B \in R^{3 \times 3} is the joint friction matrix. The matrix M \in R^{3 \times 7} is the moment-arm matrix specified in [7], T \in R^{3 \times 1} is the torque vector, F \in R^{7 \times 1} is the force in Nt applied to the tendons, and u is the control vector in units of muscle stress, Nt/cm^2. Equation (18) is used to model delays in the generation of tension on the tendons. The matrix G is determined by the PCSA parameter [7] of each individual muscle-tendon unit: G = Diag(4.10, 3.65, 1.12, 1.39, 0.36, 4.16, 1.60) cm^2. The control constraints are specified as u_{min} = 0, with the maximum muscle stress u_{max} = 35 Nt/cm^2.

For our simulations we have excluded the abduction-adduction movement at the MCP joint, so we examine the tendon length and velocity profiles necessary for producing planar movements. The quantities q and \dot{q} are vectors of dimensionality q \in R^{3 \times 1}, \dot{q} \in R^{3 \times 1}, defined as q = (q_1, q_2, q_3) and \dot{q} = (\dot{q}_1, \dot{q}_2, \dot{q}_3). The terms of the inertia matrix I(q) of the forward dynamics are given in the appendix.

B. The ACT Robotic Hand

The Anatomically Correct Testbed (ACT) robotic hand mimics the interactions among muscle excursions and joint movements produced by the bone and tendon geometries of the human hand. This mimicry results in a robotic system sharing the redundancies and nonlinearities of the biological hand [8], [9].

The ACT hand uses 24 motor-driven tendons to control a thumb, index finger, middle finger, and wrist. Each segment of these fingers is machined using human bone data, and is accurate in surface shape, mass, and center of gravity to the human equivalent. The extensor mechanisms are webs of tendons on the dorsal side of the fingers, and are crucial for emulating dynamic human behavior [10]. As each tendon is pulled by a motor, it is routed through attachment points mimicking human tendon sheaths and following the contours of the bones. Since these bone shapes are complicated surfaces, the effective moment arm the tendon exerts on the joint varies with joint angle [11]. The hand may optionally include a silicone rubber skin on its palmar surfaces. The ACT motors are controlled at 200 Hz using real-time RTAI Linux, and have encoders with a resolution of 230 nm, allowing precise control and sensing of tendon length.

Fig. 1: Anatomically Correct Testbed (ACT) hand.
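Before turning to the results, the following sketch shows how the finger model of Section IV-A, eqs. (16)-(19), can be stepped forward in time. The callables inertia(q), coriolis(q, qd) and moment_arm(q) are hypothetical names standing in for the model terms specified in [7] and the appendix.

```python
import numpy as np

def step_finger(q, qd, F, u, dt, inertia, coriolis, moment_arm, Bj, G, tau_u,
                u_min=0.0, u_max=35.0):
    """One Euler step of the tendon-driven finger model, eqs. (16)-(19)."""
    u = np.clip(u, u_min, u_max)                  # stress constraints, eq. (19)
    T = moment_arm(q) @ F                         # joint torques, eq. (17)
    qdd = np.linalg.solve(inertia(q),
                          T - coriolis(q, qd) - Bj @ qd)   # eq. (16)
    Fd = (-F + G @ u) / tau_u                     # tension dynamics, eq. (18)
    return q + dt * qd, qd + dt * qdd, F + dt * Fd
```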
V. RESULTS

A. Biomechanical model: learning to tap

We apply PI^2-I to the biomechanical model of the index finger presented in Section IV-A. The task is to move the finger from an initial posture to a final posture. In this work there is no pre-specified trajectory incorporated in the cost function, but there is a constraint on the terminal finger position and velocity. Consequently, there is a terminal cost that is a function of the desired position and velocity states, and it is only the control cost that is accumulated over the time horizon of the movement. In mathematical terms, the objective function is expressed as follows:

J = (q - q^*)^T Q_p (q - q^*) + \dot{q}^T Q_v \dot{q} + \int_0^T u^T R u\, dt    (20)

with Q_p = 1000 \cdot I_{3 \times 3}, Q_v = 10 \cdot I_{3 \times 3} and R = 250 \cdot I_{3 \times 3}. The desired target posture and desired velocity are defined as q^* = (7\pi/6, \pi/4, \pi/12) and \dot{q}^* = (0, 0, 0), while the time horizon is T = 420 ms.
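For illustration, the objective (20) can be evaluated on a simulated rollout as below; u_traj is an assumed (steps x 7) array of muscle stresses collected along the rollout, and the weights follow the text.

```python
import numpy as np

def tapping_cost(q_T, qd_T, u_traj, dt):
    """Objective (20): terminal position and velocity costs plus the
    accumulated control cost, with Qp = 1000 I, Qv = 10 I, R = 250 I."""
    q_star = np.array([7 * np.pi / 6, np.pi / 4, np.pi / 12])
    Qp, Qv, R = 1000.0, 10.0, 250.0
    e = q_T - q_star
    J = Qp * (e @ e) + Qv * (qd_T @ qd_T)                # terminal costs
    J += R * dt * np.einsum('tj,tj->', u_traj, u_traj)   # integral of u'Ru
    return J
```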

Fig. 2: Sequence of postures (a) and control profiles (b) for FDS (blue), FDP (red), EI (black), EC (yellow), LUM (cyan), RI (green) and UI (magenta).

Fig. 3: Tension profiles (a) and lengths of active tendons (b) for FDS (blue), FDP (red), EI (black), EC (yellow), LUM (cyan), RI (green) and UI (magenta).

Fig. 4: Velocity profiles of active tendons for FDS (blue), FDP (red), EI (black), EC (yellow), LUM (cyan), RI (green) and UI (magenta), and joint torques.

The results are shown in Figures 2-4. Figure 2a illustrates the sequence of postures. Figure 2b presents the control profiles required for the finger to perform the tapping movement. The controls are in units of stress, Nt/cm^2. Characteristically, the tendons FDS, UI, RI and LUM are activated during the acceleration phase of the movement, while the extensor tendons EC and EI are involved in the second, deceleration, phase of the movement. The same synchronization among tendons is shown in Figure 3a, which illustrates the tension profiles in units of Nt. The only difference with respect to the stress profiles is that the tension applied to the LUM is small relative to FDS, UI, and RI. This observation agrees with studies of the index finger [7] showing that LUM is the weakest tendon.

Tendon excursions are illustrated in Figure 3b. All tendons besides EC and EI act as flexors, since they are moving inwards (towards the muscle), and therefore their lengths increase. EC and EI act as extensors, since they are moving outward and towards the fingertip; the result of this motion is that their lengths decrease. Figure 4 illustrates the tendon velocities and the torques generated at the MCP, PIP and DIP joints.

The application of PI^2-I to the constrained biomechanical model of the finger reveals the efficiency of the method when applied to constrained nonlinear stochastic dynamics. PI^2 is a sampling-based method. In contrast to other trajectory optimizers, the efficiency of PI^2 is not affected by the existence of control constraints. In fact, constraints in control reduce the sampling space and improve performance.

B. ACT Hand: Sliding a Switch

1) Experimental setup: The second experiment is a switch-sliding task using real-world hardware. Before each attempt, the index finger began in an extended position, hovering over the switch in the air (Figure 5). The task consisted of first making contact with the switch, and then sliding it down, using mostly flexion of the MCP joint, though this requirement is implicit in the switch movement performance.

Fig. 5: Experimental setup. The finger begins extended, not touching the switch, and must perform a contact transition.

For the simulation experiment, tendon tension profiles were learned directly, but PI^2 may also be applied to learn other control formulations. For switch-sliding using the ACT hand, we learned controls which drove a nonlinear point attractor system (Section III). The point attractors for all tendons share the same canonical system. The point attractor outputs smooth target trajectories of tendon lengths for a lower-level PID controller. The dynamics of the environment, together with the control-induced dynamics of the hand and the dynamics of the point attractor, may be combined into an augmented plant [12]. In this way, the learning framework encounters the lumped dynamics of the robotic manipulator in the context of the task. Figure 6 provides an overview.
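The augmented plant can be sketched as a single rollout loop: the attractor proposes target tendon lengths, a PID loop tracks them, and the hand/switch returns measurements. All interfaces below (attractor, plant_step, and the gains) are hypothetical placeholders, not the ACT control software.

```python
import numpy as np

def augmented_rollout(theta, attractor, plant_step, L0, n_steps, dt,
                      kp=1.0, ki=0.0, kd=0.1):
    """One rollout through the augmented plant of Figure 6, recording the
    switch position x as the sensory feedback used by the cost function."""
    L = np.array(L0, dtype=float)
    e_int = np.zeros_like(L)
    e_prev = np.zeros_like(L)
    switch_states = []
    for k in range(n_steps):
        L_star = attractor(theta, k * dt)    # smooth reference lengths L*
        e = L_star - L
        e_int += e * dt
        tau = kp * e + ki * e_int + kd * (e - e_prev) / dt   # PID torques
        e_prev = e
        L, x_switch = plant_step(tau)        # actuate hand, read sensors
        switch_states.append(x_switch)
    return np.array(switch_states)
```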
A single example of task completion was demonstrated by a human moving the ACT finger through the motion of pushing the switch. The tendon excursions produced by this externally-powered example grossly resemble those required for the robot to complete the task, but simply replaying them using the PID controller does not necessarily result in successful task completion. Firstly, during demonstration the tendons are not loaded, which changes the configuration of the tendon network in comparison to when it is actively moving. Secondly, and more importantly, the tendon trajectories encountered during a demonstration do not impart any information about the torques required to accommodate the dynamics of the task. For instance, at the beginning of the task, the finger must transition from moving freely through the air to contacting and pushing the switch. A PID controller following a reference trajectory has no way of anticipating this contact transition, and therefore will fail to initially strike the switch with enough force to produce the desired motion. The nonlinear point attractor provides a means for generating smooth reference trajectories based on the demonstration but modulated by the learned controls u.
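One common way to turn a single demonstration into an initial attractor parameterization is a least-squares fit of \theta to the demonstrated trajectory. The paper does not spell out its initialization procedure, so the following reflects standard DMP practice rather than the authors' method; kernel placement matches the earlier sketch for Section III.

```python
import numpy as np

def fit_theta_from_demo(y_demo, dt, g, alpha=8.0, alpha_z=25.0, beta_z=6.25,
                        tau=1.0, n_basis=10):
    """Fit theta so the forcing term reproduces a demonstrated trajectory."""
    y0 = y_demo[0]
    yd = np.gradient(y_demo, dt)
    ydd = np.gradient(yd, dt)
    # forcing term implied by the demonstration, solved from eq. (12)
    f_target = ydd / tau**2 - alpha_z * (beta_z * (g - y_demo) - yd / tau)
    t = np.arange(len(y_demo)) * dt
    x = np.exp(-alpha * tau * t)                       # phase over time
    c = np.exp(-alpha * np.linspace(0.0, 1.0, n_basis))
    h = 1.0 / np.diff(c, append=c[-1] * 0.9) ** 2
    K = np.exp(-0.5 * h * (x[:, None] - c[None, :]) ** 2)
    Phi = K / K.sum(axis=1, keepdims=True) * x[:, None] * (g - y0)  # eq. (13)
    theta, *_ = np.linalg.lstsq(Phi, f_target, rcond=None)
    return theta
```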

Fig. 6: System overview. A proportional-integral-derivative (PID) controller outputs torques \tau to attain the target tendon lengths L^* generated by the nonlinear point attractor. The actual tendon lengths L, and the state of the switch x, constitute the sensory feedback observed by the system. The PI^2 framework finds controls u which minimize the cost for the augmented plant (all components within the shaded box).

Controls take the form u = \delta\theta, the change in the parameter \theta determining the shape of the attractor trajectory (see Section III). We refer to each revision of the control parameter \theta as a trial. A sample trajectory is queried from the system by sampling \delta\theta and actually performing a switch-slide using the resulting \theta. We refer to one of these exploratory executions of the task as a rollout. To revise \theta at the end of a trial, each sampled control strategy is weighted according to the cost encountered by the corresponding rollout (Table II). The results reported here use \sigma = 30 for sampling. The smaller this exploration variance is, the more similar the rollouts are, so the magnitude of \sigma should depend on the natural stochasticity of the plant, though here it is set by hand. Convergence is qualitatively insensitive to the exact value of \sigma, and has been confirmed for \sigma as low as 10 and as high as 50. Each trial consists of fifteen rollouts, and after every third trial, performance is evaluated by executing three exploration-free rollouts (\sigma = 0). The cost-to-go function for a rollout of duration T had the following form:

C_t = q_{terminal}(x_T) + \sum_{t}^{T} \left( q(x_t) + u_t^T R u_t \right)    (21)

In this cost function, x_t is the location of the switch at time t, q(x_t) is the cost weighting on the switch state, and q_{terminal}(x_T) is the terminal cost at the end of the rollout. R is the cost weighting for controls. Results reported here are for q_{terminal} = 300, T = 300, q = 1, R = 0.3333 \cdot I.

Fig. 7: Sum cost for revisions of the control parameter \theta. Each trial consists of a revision based on fifteen rollouts. On every third revision, three exploration-free rollouts were evaluated, each using identical controls, to evaluate learning progress. The bars indicate the standard deviation over those three rollouts.

2) ACT Hand Experimental Results: Performance improved, with decreasing costs as trials progressed, resulting in the switch being moved further in less time. The sum cost-to-go results for every third trial, beginning with trial 0, the "before learning" performance, are reported in Figure 7. Before learning, the system was able to move the switch only a small amount, 0.7 cm, but after 10 trials the switch was pushed to the end of its range (2.75 cm).

Learning effects on the trajectories of the flexors are most pronounced. Consider the change in reference trajectory for FDS (Figure 8, the red lines beginning near 1 cm). Before learning, the reference trajectory undergoes extension before flexing, but after learning it simply flexes, and more aggressively. The dynamics of the underlying PID controller dictate that reference and actual trajectories must differ in order to exert forces on the switch. Tendons may not push, so only differences in the negative direction contribute significantly to forces in the system.
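Stepping back to the trial structure described above (fifteen noisy rollouts per revision, with three exploration-free probes every third trial), the loop can be sketched as follows; rollout_cost and pi2_update are placeholder hooks for the hardware rollout and the Table II revision, respectively.

```python
import numpy as np

def run_trials(theta, rollout_cost, pi2_update, n_trials=15, k_rollouts=15,
               sigma=30.0, seed=0):
    """Trial loop for the switch task, with periodic noise-free evaluation."""
    rng = np.random.default_rng(seed)
    eval_costs = []
    for trial in range(n_trials):
        dthetas = sigma * rng.normal(size=(k_rollouts,) + theta.shape)
        costs = np.array([rollout_cost(theta + d) for d in dthetas])
        theta = pi2_update(theta, dthetas, costs)     # PI^2-II step, Table II
        if trial % 3 == 0:                            # exploration-free probes
            eval_costs.append([rollout_cost(theta) for _ in range(3)])
    return theta, eval_costs
```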
Contact with the switch occurs near 150 ms both before and after learning, but after learning the contact is more vigorous, resulting in greater switch displacement until the end of the range is met near 250 ms in the post-learning example (Figure 8b).

VI. DISCUSSION

Animals are capable of impressive feats of motor control in novel and uncertain environments, and even under major changes to their own bodies due to growth, fatigue, and injury. Through the embodied experience of using their bodies to interact with the world, they learn strategies for dealing with the complexities of the sensorimotor landscape. We hope to bring robots closer to this ability by using the world as its own model [13], and we emphasize the importance of moving beyond simulation into the complex and uncertain real world. In this work we perform reinforcement learning in tendon-driven systems in simulation as well as on a real robotic system.

Fig. 8: Target tendon lengths and actual recorded lengths before (a) and after (b) learning the switch-pushing task. The bold lines are the actual tendon lengths recorded, and the thin lines are the reference trajectories selected by the learning algorithm. Tendon trajectories displayed are Palmar Interosseus (blue), FDP (green), FDS (red), LUM (aqua), EI (purple), and RI (yellow). Panel (b) corresponds to 15 revisions of the control parameter \theta.

PI^2 is a sampling-based method in which variations in control are generated, actually run, and then updated using scores according to the cost of the outcome. We show that this can improve the performance of a real-world task despite the complexity of the underlying dynamics, using no models and only sensors of tendon length and switch position.

The successes and limitations of these experiments suggest a number of next steps. For instance, control variations (e.g., \delta\theta for the ACT experiment) were sampled from a Gaussian distribution having spherical covariance, but this sampling strategy may be shaped according to observed costs or plant characteristics. Alternatively, incorporation of sensory feedback in the cost function or in feedback control would allow for a variety of improvements, such as gain scheduling and variable stiffness control.

VII. APPENDIX

In this section we provide the parameters of the inertia and of the Coriolis and centripetal force matrices. More precisely, the elements of the symmetric inertia matrix are expressed as follows:

I_{11} = \mu_1 + 2\mu_4 \cos\theta_2 + 2\mu_5 \cos\theta_3 + 2\mu_6 \cos(\theta_2 + \theta_3)
I_{21} = I_{12} = \mu_2 + \mu_4 \cos\theta_2 + 2\mu_5 \cos\theta_3 + \mu_6 \cos(\theta_2 + \theta_3)
I_{31} = I_{13} = \mu_3 + \mu_5 \cos\theta_3 + \mu_6 \cos(\theta_2 + \theta_3)
I_{22} = \mu_2 + 2\mu_5 \cos\theta_3
I_{32} = I_{23} = \mu_3 + \mu_5 \cos\theta_3
I_{33} = \mu_3

The Coriolis and centripetal forces C(\theta, \dot{\theta}) are:

C_1 = -\mu_4 \sin\theta_2\, \dot{\theta}_2 (2\dot{\theta}_1 + \dot{\theta}_2) - \mu_5 \sin\theta_3\, \dot{\theta}_3 (2\dot{\theta}_1 + 2\dot{\theta}_2 + \dot{\theta}_3) - \mu_6 \sin(\theta_2 + \theta_3) (\dot{\theta}_2 + \dot{\theta}_3) (2\dot{\theta}_1 + \dot{\theta}_2 + \dot{\theta}_3)
C_2 = \mu_4 \sin\theta_2\, \dot{\theta}_1^2 + \mu_6 \sin(\theta_2 + \theta_3)\, \dot{\theta}_1^2 - \mu_5 \sin\theta_3\, \dot{\theta}_3 (2\dot{\theta}_1 + 2\dot{\theta}_2 + \dot{\theta}_3)
C_3 = \mu_5 \sin\theta_3 (\dot{\theta}_1 + \dot{\theta}_2)^2 + \mu_6 \sin(\theta_2 + \theta_3)\, \dot{\theta}_1^2

The terms \mu_1, ..., \mu_6 are constant parameters determined by the masses, lengths, and moments of inertia of the finger links.
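The appendix terms can be evaluated numerically as below; the entries mirror the expressions listed in this appendix, and mu holds the six constants \mu_1 ... \mu_6, whose model-specific values are not reproduced here.

```python
import numpy as np

def finger_matrices(theta, theta_dot, mu):
    """Inertia matrix I(q) and Coriolis/centripetal vector C(q, qdot)."""
    m1, m2, m3, m4, m5, m6 = mu
    _, t2, t3 = theta
    d1, d2, d3 = theta_dot
    c2, c3, c23 = np.cos(t2), np.cos(t3), np.cos(t2 + t3)
    s2, s3, s23 = np.sin(t2), np.sin(t3), np.sin(t2 + t3)
    I = np.array([
        [m1 + 2*m4*c2 + 2*m5*c3 + 2*m6*c23,
         m2 + m4*c2 + 2*m5*c3 + m6*c23,
         m3 + m5*c3 + m6*c23],
        [m2 + m4*c2 + 2*m5*c3 + m6*c23,
         m2 + 2*m5*c3,
         m3 + m5*c3],
        [m3 + m5*c3 + m6*c23,
         m3 + m5*c3,
         m3],
    ])
    C = np.array([
        -m4*s2*d2*(2*d1 + d2) - m5*s3*d3*(2*d1 + 2*d2 + d3)
            - m6*s23*(d2 + d3)*(2*d1 + d2 + d3),
        m4*s2*d1**2 + m6*s23*d1**2 - m5*s3*d3*(2*d1 + 2*d2 + d3),
        m5*s3*(d1 + d2)**2 + m6*s23*d1**2,
    ])
    return I, C
```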
