AUTOMATIC DIFFERENTIATION
Philipp Müller, University of Zurich
MOTIVATION
Derivatives are omnipresent in numerical algorithms.
1st order derivatives
- Solving non-linear equations, e.g., by Newton's method
- (Un-)constrained optimization
  - Gradient-based optimization algorithms
  - Especially difficult for high-dimensional variables, i.e., objective functions $f: \mathbb{R}^n \to \mathbb{R}$
  - Structural sparsity can be key
2nd order derivatives
- (Un-)constrained optimization
Higher order derivatives
- Higher-order differential equations
MOTIVATION
Suppose we want to solve the unconstrained optimization problem
$\min_{x} f(x)$
with $f: \mathbb{R}^n \to \mathbb{R}$ and $x \in \mathbb{R}^n$.
Gradient-based optimization requires the gradient $\nabla f(x)$.
FINITE DIFFERENCES
Recall: The Taylor series expansion of a sufficiently smooth real-valued function $f$, around $x$ and evaluated at $a$, reads
$f(a) = \sum_{\alpha=0}^{\infty} \frac{f^{(\alpha)}(x)}{\alpha!}\,(a-x)^\alpha = f(x) + f'(x)(a-x) + \frac{f''(x)}{2!}(a-x)^2 + O(|a-x|^3)$
TAYLOR APPROXIMATION OF $f(x) = e^x$ AROUND 0
[Figure: plot of $f(x) = e^x$]
TAYLOR APPROXIMATION OF $f(x) = e^x$ AROUND 0
$T_4 f(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24}$
[Figure: $f(x) = e^x$ and its 4th-order Taylor polynomial around 0]
FINITE DIFFERENCES
Recall (Taylor series):
$f(a) = f(x) + f'(x)(a-x) + \frac{f''(x)}{2!}(a-x)^2 + \dots$
Truncating the Taylor series and setting $a = x + h$ with a small $h$ yields:
$f(x+h) = f(x) + f'(x)\,h + O(h^2)$
$\Rightarrow \quad f'(x) = \frac{f(x+h) - f(x)}{h} + O(h)$
This results in the well-known forward difference quotient:
$f'(x) \approx \frac{f(x+h) - f(x)}{h}$
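A minimal Matlab sketch of this quotient (the test function and step size are illustrative choices, not from the slides):

f  = @(x) x.^3;                 % example function
df = @(x) 3*x.^2;               % analytic derivative for reference
h  = 1e-6;                      % step size
x  = 2;
fd = (f(x + h) - f(x)) / h;     % forward difference quotient
abs(fd - df(x))                 % error of order O(h) plus rounding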
WHY WOULD WE NEED ANYTHING ELSE?
We derived
$f'(x) \approx \frac{f(x+h) - f(x)}{h}$
by truncating the Taylor series, which results in the truncation error $O(h)$. This error decreases with the step size.
Can we therefore get an accurate and efficient approximation of $f'(x)$ by choosing a very small $h$, i.e., by letting $h \to 0$?
PROBLEM SOLVED?
Apply forward differences to
$f(x) = x^3$
and increase the step size from $10^{-16}$ to $0.1$.
[Figure: absolute error $|3x^2 - \mathrm{FD}_h(x)|$ over the step size $h$]
PROBLEM SOLVED?
Apply forward differences to
$f(x) = x^3$
and increase the step size from $10^{-16}$ to $0.1$.
Central differences (CD): $f'(x) \approx \frac{f(x + h/2) - f(x - h/2)}{h}$
[Figure: absolute errors of forward and central differences over the step size $h$]
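A sketch reproducing this experiment in Matlab (the grid of step sizes and the evaluation point are assumptions):

f  = @(x) x.^3;
df = @(x) 3*x.^2;
x  = 1;
h  = logspace(-16, -1, 200);            % step sizes from 1e-16 to 0.1
fd = (f(x + h) - f(x)) ./ h;            % forward differences
cd = (f(x + h/2) - f(x - h/2)) ./ h;    % central differences
loglog(h, abs(fd - df(x)), h, abs(cd - df(x)));
legend('forward difference', 'central difference');
xlabel('step size h'); ylabel('absolute error');

For large h the truncation error dominates; for very small h the cancellation error takes over, so the total error is minimized at an intermediate step size.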
NUMERICAL ERRORS IN FINITE PRECISION ARITHMETIC
Rounding error
- Intermediate results are rounded.
- Any rounding error propagates and amplifies.
Truncation error
- Even if an algorithm is converging to the true solution, we stop it after some finite time.
- Mitigated by appropriate convergence criteria, as introduced by Ken.
Cancellation error
CANCELLATION ERROR

h = 1e-12;
a = 1;
b = a + h;   % add h to a
c = b - a;   % c should be equal to h
d = c / h;   % c ~ h, thus d should be 1

d = 0.999200722162641

[Figure: computed value of d over h; true value 1, noise growing like 1/h]
FINITE DIFFERENCES FOR $f: \mathbb{R}^n \to \mathbb{R}$
Let's consider an n-dimensional unconstrained optimization problem
$\min_{x} f(x)$
with $f: \mathbb{R}^n \to \mathbb{R}$ and $x \in \mathbb{R}^n$.
The finite difference quotients resemble directional derivatives; for $f: \mathbb{R}^n \to \mathbb{R}$ and the unit vectors $e_i$:
$\frac{\partial f}{\partial x_i}(x) \approx \frac{f(x + h\,e_i) - f(x)}{h}$
The cost of FD scales with n: $O(n) \cdot \mathrm{cost}(f)$.
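A minimal sketch of the full finite-difference gradient in Matlab (the helper name fd_gradient and the fixed step size are illustrative):

function g = fd_gradient(f, x, h)
% Forward-difference approximation of the gradient of f at x.
% Needs n + 1 evaluations of f, hence cost O(n) * cost(f).
n  = numel(x);
g  = zeros(n, 1);
fx = f(x);
for i = 1:n
    e    = zeros(n, 1);
    e(i) = h;                       % step along the i-th unit direction
    g(i) = (f(x + e) - fx) / h;     % forward difference quotient
end
end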
SCALING
Let's consider the Rosenbrock function $f: \mathbb{R}^n \to \mathbb{R}$ as benchmark:
$f(x) = \sum_{i=1}^{n-1} 10\,(x_{i+1} - x_i^2)^2 + (1 - x_i)^2$
The runtimes are averaged across 1000 runs. The factors in parentheses give the runtime of the FD Jacobian relative to one evaluation of $f$.

n      | $J_{FD}$ [s]         | $f$ [s]
10     | 6.7931e-05 (54x)     | 1.2472e-06
100    | 6.0959e-04 (332x)    | 1.8355e-06
1000   | 1.4839e-02 (1500x)   | 9.9629e-06
10000  | 1.3282 (20000x)      | 6.6663e-05
100000 | ?                    | 9.0872e-04
AUTOMATIC DIFFERENTIATION
Basic Idea: Every computer program is a composition of differentiable elementary operations:
- basic arithmetic operations, e.g., +, -, and *, and
- basic functions, e.g., sin, cos and tan.
Automatic differentiation can transform the source code of your function into the source code of the gradient.
TOY EXAMPLE
Consider the function $f: \mathbb{R}^2 \to \mathbb{R}$
$f(x_1, x_2) = x_1 x_2 + \sin(x_1)$
This function can be decomposed into differentiable elementary operations:
$w_1 = x_1$
$w_2 = x_2$
$w_3 = w_1 w_2$
$w_4 = \sin(w_1)$
$w_5 = w_3 + w_4$
$f = w_5$
[Figure: computational graph; $x_1 \to w_1$, $x_2 \to w_2$; $w_1, w_2 \to w_3$; $w_1 \to w_4$ via sin; $w_3, w_4 \to w_5 \to f$]
FORWARD MODE
Consider the function $f: \mathbb{R}^2 \to \mathbb{R}$
$f(x_1, x_2) = x_1 x_2 + \sin(x_1)$
To calculate the gradient, calculate $\frac{\partial f}{\partial x_1}(x_1, x_2)$.
Choose input variable $x_1$ and calculate the sensitivity of each intermediate value $w_i$ as $\dot{w}_i = \frac{\partial w_i}{\partial x_1}$:
$\dot{w}_1 = \dot{x}_1$
$\dot{w}_2 = \dot{x}_2$
$\dot{w}_3 = \dot{w}_1 w_2 + w_1 \dot{w}_2$
$\dot{w}_4 = \cos(w_1)\,\dot{w}_1$
$\dot{w}_5 = \dot{w}_3 + \dot{w}_4$
FORWARD MODE: EVALUATION
Suppose: $\dot{x}_1 = 1$, $x_1 = 1$, $x_2 = 2$
Consider the function $f: \mathbb{R}^2 \to \mathbb{R}$
$f(x_1, x_2) = x_1 x_2 + \sin(x_1)$
To calculate the gradient, calculate $\frac{\partial f}{\partial x_1}(x_1, x_2)$.
Choose input variable $x_1$ and calculate the sensitivity of each intermediate value $w_i$ as $\dot{w}_i = \frac{\partial w_i}{\partial x_1}$:
$\dot{w}_1 = \dot{x}_1 = 1$
$\dot{w}_2 = \dot{x}_2 = 0$
$\dot{w}_3 = \dot{w}_1 w_2 + w_1 \dot{w}_2 = 2$
$\dot{w}_4 = \cos(w_1)\,\dot{w}_1 = \cos(1)$
$\dot{w}_5 = \dot{w}_3 + \dot{w}_4 = 2 + \cos(1)$
$\Rightarrow \frac{\partial f}{\partial x_1} = 2 + \cos(1)$
Accurate up to working precision, but still scales linearly in n: $\mathrm{Cost}(J_f) \in O(n) \cdot \mathrm{Cost}(f)$
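The forward-mode trace above can be written out directly; a Matlab sketch of the tangent propagation for $\partial f / \partial x_1$ (variable names follow the slides):

x1 = 1;  x2 = 2;
x1dot = 1;  x2dot = 0;            % seed: differentiate w.r.t. x1

w1 = x1;       w1dot = x1dot;
w2 = x2;       w2dot = x2dot;
w3 = w1 * w2;  w3dot = w1dot*w2 + w1*w2dot;  % product rule
w4 = sin(w1);  w4dot = cos(w1) * w1dot;      % chain rule
w5 = w3 + w4;  w5dot = w3dot + w4dot;        % sum rule

dfdx1 = w5dot                                % = 2 + cos(1)

Obtaining $\partial f / \partial x_2$ requires a second pass with the seed x1dot = 0, x2dot = 1, which is why forward mode scales with n.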
REVERSE MODE (ADJOINT MODE) - PRIMAL TRACE
Suppose: $x_1 = 1$, $x_2 = 2$
Consider the function $f: \mathbb{R}^2 \to \mathbb{R}$
$f(x_1, x_2) = x_1 x_2 + \sin(x_1)$
Calculate the sensitivity of the output w.r.t. each intermediate value: $\bar{w}_i = \frac{\partial f}{\partial w_i}$
Primal trace (forward sweep):
$w_1 = 1$
$w_2 = 2$
$w_3 = w_1 w_2 = 2$
$w_4 = \sin(w_1) = \sin(1) \approx 0.841$
$w_5 = w_3 + w_4 \approx 2.841$
All intermediate values are stored. This leads to a high memory consumption, mitigated by good AD software.
REVERSE MODE (ADJOINT MODE)
Consider the function $f: \mathbb{R}^2 \to \mathbb{R}$
$f(x_1, x_2) = x_1 x_2 + \sin(x_1)$
Calculate the sensitivity of the output w.r.t. each intermediate value: $\bar{w}_i = \frac{\partial f}{\partial w_i}$
$\bar{w}_5 = \frac{\partial f}{\partial w_5} = 1$
$\bar{w}_4 = \bar{w}_5 \frac{\partial w_5}{\partial w_4} = \bar{w}_5$
$\bar{w}_3 = \bar{w}_5 \frac{\partial w_5}{\partial w_3} = \bar{w}_5$
$\bar{w}_2 = \bar{w}_3 \frac{\partial w_3}{\partial w_2} = \bar{w}_3 w_1$
$\bar{w}_1 = \bar{w}_4 \frac{\partial w_4}{\partial w_1} + \bar{w}_3 \frac{\partial w_3}{\partial w_1} = \bar{w}_4 \cos(w_1) + \bar{w}_3 w_2$
REVERSE MODE (ADJOINT MODE) - DUAL TRACE
Suppose: $x_1 = 1$, $x_2 = 2$
Consider the function $f: \mathbb{R}^2 \to \mathbb{R}$
$f(x_1, x_2) = x_1 x_2 + \sin(x_1)$
Calculate the sensitivity of the output w.r.t. each intermediate value: $\bar{w}_i = \frac{\partial f}{\partial w_i}$
Dual trace (reverse sweep), reusing $w_1 = 1$, $w_2 = 2$, $w_3 = 2$, $w_4 = \sin(1) \approx 0.841$, $w_5 \approx 2.841$:
$\bar{w}_5 = 1$
$\bar{w}_4 = \bar{w}_5 = 1$
$\bar{w}_3 = \bar{w}_5 = 1$
$\bar{w}_2 = \bar{w}_3 w_1 = 1$
$\bar{w}_1 = \bar{w}_4 \cos(w_1) + \bar{w}_3 w_2 = \cos(1) + 2$
Accurate up to working precision, scales linearly in m: $\mathrm{Cost}(J_f) \in O(m) \cdot \mathrm{Cost}(f)$
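The same example in reverse mode as a Matlab sketch: one forward (primal) sweep that stores all intermediates, then one reverse (dual) sweep that accumulates the adjoints:

x1 = 1;  x2 = 2;

% primal trace: store all intermediate values
w1 = x1;
w2 = x2;
w3 = w1 * w2;
w4 = sin(w1);
w5 = w3 + w4;

% dual trace: adjoints in reverse order
w5bar = 1;                              % df/dw5
w4bar = w5bar;                          % dw5/dw4 = 1
w3bar = w5bar;                          % dw5/dw3 = 1
w2bar = w3bar * w1;                     % dw3/dw2 = w1
w1bar = w4bar * cos(w1) + w3bar * w2;   % both paths into w1

grad = [w1bar; w2bar]                   % full gradient from one sweep

Note that both partial derivatives fall out of a single reverse sweep, which is why reverse mode scales with the number of outputs m rather than with n.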
SUMMARY
Finite differences
- The approximation error decreases as $O(h)$ for forward finite differences.
- BUT, the error due to finite precision arithmetic cannot be neglected.
- The time required to compute the Jacobian of $f: \mathbb{R}^n \to \mathbb{R}^m$ scales with $O(n) \cdot \mathrm{cost}(f)$.
AD - Forward mode
- The gradients are accurate up to machine precision.
- The time required to compute the Jacobian of $f: \mathbb{R}^n \to \mathbb{R}^m$ scales with $O(n) \cdot \mathrm{cost}(f)$.
AD - Reverse mode
- The gradients are accurate up to machine precision.
- The memory requirement may be huge depending on the underlying implementation.
- The time required to compute the Jacobian of $f: \mathbb{R}^n \to \mathbb{R}^m$ scales with $O(m) \cdot \mathrm{cost}(f)$.
AD TOOLS
CasADi
- Available for Python, Matlab, Octave and C++
- Includes interfaces to many free as well as commercial optimizers (e.g., IPOPT (IP), KNITRO (IP & SQP), WORHP (SQP), SNOPT (SQP))
- Structural sparsity detection
ADiMat
- Available for Matlab
PyTorch / TensorFlow
TUTORIAL SESSION
1. Implementation of the Rosenbrock function (see the sketch below)
   $f(x) = \sum_{i=1}^{n-1} 10\,(x_{i+1} - x_i^2)^2 + (1 - x_i)^2$
2. Implementation of the finite difference approximation and reverse mode AD of $f(x)$. Comparison of their runtimes for $n = 10^i$, $i = 1, 2, 3, 4, \dots$
3. Optimization of the Rosenbrock function using fminunc with
   1. the finite difference approximation of the gradient of $f(x)$, and
   2. the reverse mode AD gradient of $f(x)$.
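A plain Matlab implementation of this Rosenbrock variant as a starting point (a sketch; the vectorization is one possible choice):

function f = rosenbrock(x)
% f(x) = sum_{i=1}^{n-1} 10*(x(i+1) - x(i)^2)^2 + (1 - x(i))^2
xi  = x(1:end-1);   % x_i
xip = x(2:end);     % x_{i+1}
f   = sum(10 * (xip - xi.^2).^2 + (1 - xi).^2);
end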
CASADI
- Include the casadi directory in the Matlab path

import casadi.*
x_MX = MX.sym('some_name', size_rows, size_columns);          % create symbolic variable
d_rosenbrock = jacobian(rosenbrock(x_MX), x_MX);              % differentiate rosenbrock
d_rosenbrock = Function('some_name', {x_MX}, {d_rosenbrock}); % create callable function
d_rosenbrock(x)                                               % evaluate at x
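Putting the pieces together, a hedged end-to-end sketch (n, x_MX, g and the evaluation point x0 are illustrative; depending on the CasADi version, gradient(f_MX, x_MX) can be used instead of jacobian):

import casadi.*

n    = 100;
x_MX = MX.sym('x', n, 1);                        % symbolic input vector
% Rosenbrock expression built directly on the MX symbols;
% sum1 sums the entries of a column vector in CasADi
f_MX = sum1(10 * (x_MX(2:end) - x_MX(1:end-1).^2).^2 ...
            + (1 - x_MX(1:end-1)).^2);
g_MX = jacobian(f_MX, x_MX);                     % exact gradient via AD
g    = Function('g', {x_MX}, {g_MX});            % callable gradient function

x0   = zeros(n, 1);
grad = full(g(x0));                              % evaluate, convert to dense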