
AUTOMATIC DIFFERENTIATION
Philipp Müller, University of Zurich

MOTIVATION
Derivatives are omnipresent in numerical algorithms.
1st order derivatives
- Solving non-linear equations, e.g., by Newton's method.
- (Un-)constrained optimization: gradient-based optimization algorithms. Especially difficult for high-dimensional variables, i.e., an objective function f: R^n -> R. Structural sparsity can be key.
2nd order derivatives
- (Un-)constrained optimization.
Higher order derivatives
- Higher-order differential equations.

MOTIVATION
Suppose we want to solve the unconstrained optimization problem
    \min_x f(x)
with f: R -> R and x in R.
Gradient-based optimization requires the gradient ∂f/∂x.

FINITE DIFFERENCES
Recall: the Taylor series expansion of a real-valued function f in C^n, n >= 2, around x and evaluated at a reads
    f(a) = \sum_{\alpha = 0}^{n-1} \frac{1}{\alpha!}\,\frac{\partial^\alpha f}{\partial x^\alpha}(x)\,(a - x)^\alpha + R_n
         = f(x) + \frac{\partial f}{\partial x}(x)\,(a - x) + \frac{1}{2!}\,\frac{\partial^2 f}{\partial x^2}(x)\,(a - x)^2 + O(|a - x|^3).

TAYLOR APPROXIMATION OF e^x AROUND 0
[Plot: f(x) = e^x]

TAYLOR APPROXIMATION OF e^x AROUND 0
[Plot: f(x) = e^x and its 4th-order Taylor polynomial
    \mathcal{T}f(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} ]
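A short MATLAB sketch of the plot on this slide, comparing e^x with its 4th-order Taylor polynomial around 0; the plotting interval is an illustrative choice, not taken from the slides.

    x  = linspace(-3, 3, 200);
    Tf = 1 + x + x.^2/2 + x.^3/6 + x.^4/24;    % 4th-order Taylor polynomial around 0
    plot(x, exp(x), x, Tf);
    legend('e^x', '4th-order Taylor polynomial');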

FINITE DIFFERENCES
Recall (Taylor series):
    f(a) = f(x) + \frac{\partial f}{\partial x}(x)\,(a - x) + \frac{1}{2!}\,\frac{\partial^2 f}{\partial x^2}(x)\,(a - x)^2 + \dots
Truncating the Taylor series and setting a = x + h with a small h yields
    f(x + h) = f(x) + \frac{\partial f}{\partial x}(x)\,h + O(h^2)
    \Leftrightarrow\ \frac{\partial f}{\partial x}(x) = \frac{f(x + h) - f(x)}{h} + O(h).
This results in the well-known forward difference quotient:
    \frac{\partial f}{\partial x}(x) \approx \frac{f(x + h) - f(x)}{h}.
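A minimal MATLAB sketch of this forward-difference quotient; the example function, evaluation point and step size are illustrative choices, not from the slides.

    f = @(x) x.^3;                        % illustrative example function
    x = 2;                                % evaluation point
    h = 1e-6;                             % step size; the trade-off is discussed below
    dfdx_fd    = (f(x + h) - f(x)) / h;   % forward difference quotient
    dfdx_exact = 3 * x^2;                 % analytic derivative for comparison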

WHY WOULD WE NEED ANYTHING ELSE?
We derived ∂f/∂x by truncating the Taylor series, resulting in the error O(h). This truncation error decreases with the step size.
Can we obtain an accurate and efficient approximation of ∂f/∂x simply by choosing a very small h, i.e., by taking
    \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}\,?

PROBLEM SOLVED?
Apply forward differences to
    f(x) = x^3
and increase the step size h from 10^{-16} to 0.1.
[Plot: error of the forward difference (FD) approximation against the exact derivative ∂f/∂x = 3x^2, as a function of h]

PROBLEM SOLVED?
Apply forward differences to
    f(x) = x^3
and increase the step size h from 10^{-16} to 0.1. For comparison, the central difference (CD) quotient is
    \frac{\partial f}{\partial x}(x) \approx \frac{f(x + \tfrac{1}{2}h) - f(x - \tfrac{1}{2}h)}{h}.
[Plot: error of the FD and CD approximations against the exact derivative ∂f/∂x = 3x^2, as a function of h]
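A MATLAB sketch of the step-size experiment behind this plot, assuming the error is measured against the analytic derivative 3x^2 at x = 1; the grid of step sizes is an illustrative choice.

    f  = @(x) x.^3;
    x  = 1;
    df = 3 * x^2;                                     % exact derivative
    h  = logspace(-16, -1, 200);                      % step sizes 1e-16 ... 0.1
    err_fd = abs((f(x + h)   - f(x))       ./ h - df);   % forward differences
    err_cd = abs((f(x + h/2) - f(x - h/2)) ./ h - df);   % central differences
    loglog(h, err_fd, h, err_cd);
    xlabel('step size h'); ylabel('absolute error');
    legend('forward difference', 'central difference');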

NUMERICAL ERRORS IN FINITE PRECISION ARITHMETIC
Rounding error
- Intermediate results are rounded.
- Any rounding error propagates and amplifies.
Truncation error
- Even if an algorithm is converging to the true solution, we stop it after some finite time.
- Mitigated by appropriate convergence criteria, as introduced by Ken.
Cancellation error

CANCELLATION ERROR
Subtracting two nearly equal numbers cancels their leading digits: a + h is stored with rounding noise, so (a + h) - a = h + noise, and dividing by a small h amplifies this to 1 + noise/h.

    h = 1e-12;
    a = 1;
    b = a + h;   % add h to a
    c = b - a;   % c should be equal to h
    d = c / h;   % c ~ h, thus d should be ~ 1
    % result: d = 0.999200722162641

FINITE DIFFERENCES FOR f: R^n -> R
Let's consider an n-dimensional unconstrained optimization problem
    \min_x f(x)
with f: R^n -> R and x in R^n.
The finite difference quotients resemble directional derivatives; for f: R^n -> R^m and n >= 1:
    \frac{\partial f}{\partial x_i}(x) \approx \frac{f(x + h e_i) - f(x)}{h}.
The cost of FD scales with n: O(n) * cost(f).
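A MATLAB sketch of such a finite-difference gradient along the unit vectors e_i; the function name fd_gradient and the fixed step size are illustrative (the function would live in its own file or at the end of a script).

    % One extra evaluation of f per coordinate, hence the O(n) * cost(f) scaling.
    function g = fd_gradient(f, x, h)
        n  = numel(x);
        fx = f(x);                          % base point, evaluated once
        g  = zeros(n, 1);
        for i = 1:n
            e    = zeros(n, 1);
            e(i) = 1;                       % unit vector e_i
            g(i) = (f(x + h*e) - fx) / h;   % forward difference along e_i
        end
    end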

SCALING
Let's consider the Rosenbrock function f: R^n -> R as benchmark,
    f(x) = \sum_{i=1}^{n-1} \left[\, 10\,(x_{i+1} - x_i^2)^2 + (1 - x_i)^2 \,\right].
The runtimes are averaged across 1000 runs.

    n       | J_FD [s] (vs. f)     | f [s]
    10      | 6.7931e-05 (54x)     | 1.2472e-06
    100     | 6.0959e-04 (332x)    | 1.8355e-06
    1000    | 1.4839e-02 (1500x)   | 9.9629e-06
    10000   | 1.3282 (20000x)      | 6.6663e-05
    100000  | ?                    | 9.0872e-04
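A possible MATLAB implementation of the benchmark function as written on the slide (coefficient 10; the classic textbook Rosenbrock uses 100):

    function y = rosenbrock(x)
        % f(x) = sum_{i=1}^{n-1} [ 10*(x(i+1) - x(i)^2)^2 + (1 - x(i))^2 ]
        y = sum(10 * (x(2:end) - x(1:end-1).^2).^2 + (1 - x(1:end-1)).^2);
    end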

AUTOMATIC DIFFERENTIATION
Basic idea: every computer program is a composition of differentiable elementary operations, such as
- basic arithmetic operations, e.g., +, -, and *, and
- basic functions, e.g., sin, cos and tan.
Automatic differentiation can transform the source code of your function into the source code of the gradient.

TOY EXAMPLE
Consider the function f: R^2 -> R,
    f(x_1, x_2) = x_1 x_2 + \sin(x_1).
This function can be decomposed into differentiable elementary operations:
    w_1 = x_1
    w_2 = x_2
    w_3 = w_1 \cdot w_2
    w_4 = \sin(w_1)
    w_5 = w_3 + w_4
    f   = w_5
[Computational graph: x_1 -> w_1 and x_2 -> w_2 feed w_3 = w_1 w_2; w_1 also feeds w_4 = sin(w_1); w_3 and w_4 are added to give w_5 = f]
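The same decomposition written out as a plain MATLAB evaluation trace; the values x1 = 1, x2 = 2 are the ones used on the following slides.

    x1 = 1;  x2 = 2;
    w1 = x1;
    w2 = x2;
    w3 = w1 * w2;        % multiplication node
    w4 = sin(w1);        % sine node
    w5 = w3 + w4;        % addition node
    f  = w5;             % f(1, 2) = 2 + sin(1)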

FORWARD MODE
Consider the function f: R^2 -> R,
    f(x_1, x_2) = x_1 x_2 + \sin(x_1).
To calculate the gradient, calculate ∂f/∂x_1(x_1, x_2): choose the input variable x_j and calculate the sensitivity of each intermediate value,
    \dot{w}_i = \frac{\partial w_i}{\partial x_j}:

    \dot{w}_1 = \frac{\partial w_1}{\partial x_j}
    \dot{w}_2 = \frac{\partial w_2}{\partial x_j}
    \dot{w}_3 = \dot{w}_1\,w_2 + w_1\,\dot{w}_2
    \dot{w}_4 = \frac{\partial w_4}{\partial w_1}\,\dot{w}_1 = \cos(w_1)\,\dot{w}_1
    \dot{w}_5 = \dot{w}_3 + \dot{w}_4
    \frac{\partial f}{\partial x_j} = \dot{w}_5 = \dot{w}_3 + \dot{w}_4

FORWARD MODE: EVALUATION
Suppose: j = 1, x_1 = 1, x_2 = 2.
Consider the function f: R^2 -> R, f(x_1, x_2) = x_1 x_2 + sin(x_1). To calculate the gradient, calculate ∂f/∂x_1(x_1, x_2): choose the input variable x_1 and calculate the sensitivity of each intermediate value, \dot{w}_i = \partial w_i / \partial x_1:

    \dot{w}_1 = \frac{\partial w_1}{\partial x_1} = 1
    \dot{w}_2 = \frac{\partial w_2}{\partial x_1} = 0
    \dot{w}_3 = \dot{w}_1\,w_2 + w_1\,\dot{w}_2 = 2
    \dot{w}_4 = \cos(w_1)\,\dot{w}_1 = \cos(1)
    \dot{w}_5 = \dot{w}_3 + \dot{w}_4 = 2 + \cos(1)
    \frac{\partial f}{\partial x_1} = 2 + \cos(1)

Accurate up to working precision, but still scales linearly in n:
    Cost(J_f) ≈ n · c · Cost(f).
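A MATLAB sketch of this forward-mode sweep: each primal value carries a tangent dwi = ∂wi/∂x1, seeded with dw1 = 1 and dw2 = 0; the variable names are illustrative.

    x1 = 1;  x2 = 2;
    w1 = x1;        dw1 = 1;                   % seed: differentiate w.r.t. x1
    w2 = x2;        dw2 = 0;
    w3 = w1 * w2;   dw3 = dw1*w2 + w1*dw2;     % product rule
    w4 = sin(w1);   dw4 = cos(w1) * dw1;       % chain rule for sin
    w5 = w3 + w4;   dw5 = dw3 + dw4;
    dfdx1 = dw5;                               % = x2 + cos(x1) = 2 + cos(1)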

REVERSE MODE (ADJOINT MODE) – PRIMAL TRACE
Suppose: x_1 = 1, x_2 = 2.
Consider the function f: R^2 -> R, f(x_1, x_2) = x_1 x_2 + sin(x_1).
Calculate the sensitivity of the output w.r.t. each intermediate value, \bar{w}_i = \partial f / \partial w_i. First, the primal (forward) trace evaluates and stores the intermediate values:

    w_1 = 1
    w_2 = 2
    w_3 = w_1 w_2 = 2
    w_4 = \sin(w_1) = \sin(1) \approx 0.841
    w_5 = w_3 + w_4 \approx 2.841

All intermediate values are stored. This leads to a high memory consumption, mitigated by good AD software.

REVERSE MODE (ADJOINT MODE)
Consider the function f: R^2 -> R, f(x_1, x_2) = x_1 x_2 + sin(x_1).
Calculate the sensitivity of the output w.r.t. each intermediate value, \bar{w}_i = \partial f / \partial w_i, traversing the trace in reverse order:

    \bar{w}_5 = \frac{\partial f}{\partial w_5} = 1
    \bar{w}_4 = \frac{\partial f}{\partial w_5}\,\frac{\partial w_5}{\partial w_4} = \bar{w}_5 \cdot 1
    \bar{w}_3 = \frac{\partial f}{\partial w_5}\,\frac{\partial w_5}{\partial w_3} = \bar{w}_5 \cdot 1
    \bar{w}_2 = \frac{\partial f}{\partial w_3}\,\frac{\partial w_3}{\partial w_2} = \bar{w}_3\,w_1
    \bar{w}_1 = \frac{\partial f}{\partial w_3}\,\frac{\partial w_3}{\partial w_1} + \frac{\partial f}{\partial w_4}\,\frac{\partial w_4}{\partial w_1} = \bar{w}_3\,w_2 + \bar{w}_4\,\cos(w_1)

REVERSE MODE (ADJOINT MODE) – DUAL TRACE
Suppose: x_1 = 1, x_2 = 2.
Consider the function f: R^2 -> R, f(x_1, x_2) = x_1 x_2 + sin(x_1).
Calculate the sensitivity of the output w.r.t. each intermediate value, \bar{w}_i = \partial f / \partial w_i:

    w_5 \approx 2.841,            \bar{w}_5 = \frac{\partial f}{\partial w_5} = 1
    w_4 = \sin(1) \approx 0.841,  \bar{w}_4 = \bar{w}_5 = 1
    w_3 = 2,                      \bar{w}_3 = \bar{w}_5 = 1
    w_2 = 2,                      \bar{w}_2 = \bar{w}_3\,w_1 = 1
    w_1 = 1,                      \bar{w}_1 = \bar{w}_4\,\cos(w_1) + \bar{w}_3\,w_2 = \cos(1) + 2

Accurate up to working precision, and scales linearly in m:
    Cost(J_f) ≈ m · c_2 · Cost(f).
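A MATLAB sketch of the two sweeps: the primal sweep stores the intermediates, the reverse sweep accumulates the adjoints \bar{w}_i = ∂f/∂w_i and delivers the whole gradient in one pass; the variable names (w5b for \bar{w}_5, etc.) are illustrative.

    x1 = 1;  x2 = 2;
    w1 = x1;  w2 = x2;                   % primal (forward) sweep
    w3 = w1 * w2;
    w4 = sin(w1);
    w5 = w3 + w4;

    w5b = 1;                             % seed: df/dw5 = 1
    w4b = w5b;                           % from w5 = w3 + w4
    w3b = w5b;                           % from w5 = w3 + w4
    w2b = w3b * w1;                      % from w3 = w1 * w2
    w1b = w3b * w2 + w4b * cos(w1);      % contributions from w3 and w4 = sin(w1)
    grad = [w1b; w2b];                   % = [x2 + cos(x1); x1] = [2 + cos(1); 1]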

SUMMARY
Finite differences
- The approximation error decreases as O(h) for forward finite differences. BUT the error due to the finite precision arithmetic cannot be neglected.
- The time required to compute the Jacobian of f: R^n -> R^m scales with O(n · cost(f)).
AD – Forward mode
- The gradients are accurate up to machine precision.
- The time required to compute the Jacobian of f: R^n -> R^m scales with O(n · cost(f)).
AD – Reverse mode
- The gradients are accurate up to machine precision.
- The memory requirement may be huge, depending on the underlying implementation.
- The time required to compute the Jacobian of f: R^n -> R^m scales with O(m · cost(f)).

AD TOOLS
CasADi
- Available for Python, MATLAB, Octave and C++.
- Includes interfaces to many free as well as commercial optimizers (e.g., IPOPT (IP), KNITRO (IP & SQP), WORHP (SQP), SNOPT (SQP)).
- Structural sparsity detection.
ADiMat
- Available for MATLAB.
PyTorch / TensorFlow

TUTORIAL SESSION
1. Implementation of the Rosenbrock function
       f(x) = \sum_{i=1}^{n-1} \left[\, 10\,(x_{i+1} - x_i^2)^2 + (1 - x_i)^2 \,\right].
2. Implementation of the finite difference approximation and the reverse mode AD of the gradient of f(x). Comparison of their runtimes for n = 10^i, i = 1, 2, 3, 4, ...
3. Optimization of the Rosenbrock function using fminunc (as sketched below), using
   1. the finite difference approximation of the gradient of f(x), and
   2. the reverse mode AD gradient of f(x).
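An illustrative MATLAB sketch of step 3, assuming a hypothetical helper rosenbrock_with_grad that returns both the function value and its gradient (e.g., obtained via reverse-mode AD); the fminunc/optimoptions calls are standard Optimization Toolbox usage, but the exact setup in the tutorial may differ.

    n  = 100;
    x0 = zeros(n, 1);

    % 3.1: fminunc with its built-in finite-difference gradient
    opts_fd = optimoptions('fminunc', 'Algorithm', 'quasi-newton');
    x_fd = fminunc(@rosenbrock, x0, opts_fd);

    % 3.2: fminunc with a user-supplied gradient, e.g., from reverse-mode AD
    opts_ad = optimoptions('fminunc', 'Algorithm', 'quasi-newton', ...
                           'SpecifyObjectiveGradient', true);
    x_ad = fminunc(@rosenbrock_with_grad, x0, opts_ad);   % hypothetical helper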

CASADI
Include the casadi directory in the MATLAB path.

    import casadi.*
    % create a symbolic variable
    x_MX = MX.sym('some_name', size_rows, size_columns);
    % differentiate rosenbrock
    d_rosenbrock = jacobian(rosenbrock(x_MX), x_MX);
    % create a callable function
    d_rosenbrock = Function('some_name', {x_MX}, {d_rosenbrock});
    % evaluate it at a numerical point x
    d_rosenbrock(x)
