Kriging Local Volatility

Beyond Surrogate Modeling: Learning the Local Volatility Via Shape Constraints

Marc Chataigner†, Areski Cousin‡, Stéphane Crépey§, Matthew Dixon¶, and Djibril Gueye‡

Abstract. We explore the abilities of two machine learning approaches for no-arbitrage interpolation of European vanilla option prices, which jointly yield the corresponding local volatility surface: a finite dimensional Gaussian process (GP) regression approach under no-arbitrage constraints based on prices, and a neural net (NN) approach with penalization of arbitrages based on implied volatilities. We demonstrate the performance of these approaches relative to the SSVI industry standard. The GP approach is proven arbitrage-free, whereas arbitrages are only penalized under the SSVI and NN approaches (an arbitrage-free NN approach was found insufficiently expressive in [2]). The GP approach obtains the best out-of-sample calibration error and provides uncertainty quantification, which is useful for the assessment of model risk. The NN approach yields a smoother local volatility and a better backtesting performance, as its training criterion incorporates a local volatility regularization term.

Key words. Gaussian Processes; Local Volatility; Option Pricing; Neural Networks; No-arbitrage.

1. Introduction. There have been recent surges of literature about the learning of derivative pricing functions by machine learning surrogate models, i.e. neural nets and Gaussian processes, respectively surveyed in [11] and [4, Section 1]. There has, however, been relatively little coverage of no-arbitrage constraints when interpolating prices, and of the ensuing question of extracting the corresponding local volatility surface.

Tegnér & Roberts [12, see their Eq. (10)] first attempt the use of GPs for local volatility modeling, by placing a Gaussian prior directly on the local volatility surface. Such an approach leads to a nonlinear least squares training loss function which is not obviously amenable to gradient descent (stochastic or not), so the authors resort to an MCMC optimization. Zheng et al. [13] introduce shape constraint penalization via a multi-model gated neural network, which replaces neurons with functions enforcing the stylized properties of the implied volatility surface, including no-arbitrage penalizations. Theirs is a multi-model architecture, which uses another network to fit the parameters. The advantage of the approach is that the gated network is interpretable and lightweight, although the training is still expensive and there is no guarantee of no-arbitrage. One important difference with the present work is that they do not consider the local volatility and the associated regularization terms, nor do they assess the extent to which no-arbitrage is violated on a test set.

The introduction of shape constraints into GP regression is addressed in the literature under the topic of "constrained kriging" or "constrained GP regression". Motivated by the fact that the posterior process is then no longer Gaussian and that the inequality constraints are usually infinite dimensional, Maatouk & Bay [9] propose using a finite dimensional approximation of Gaussian processes, for which the inequality constraints are straightforward to impose and verify. Cousin et al. [3] apply this technique to ensure arbitrage-free and error-controlled yield-curve and CDS curve interpolation.

In this paper, we propose an arbitrage-free GP option price interpolation. With such no-arbitrage constraints in place, we are able to jointly model the local volatility surface and to provide a corresponding uncertainty quantification. As an alternative to the above, another contribution of the paper is to introduce a neural network approximation of the implied volatility surface, penalizing arbitrages on the basis of the local volatility implied variance formula, i.e. the Dupire formula restated in terms of implied variance, which is also used for extracting the corresponding local volatility surface.

Acknowledgments: The authors are thankful to Antoine Jacquier and Tahar Ferhati for useful hints regarding the SSVI method, and to an anonymous referee for stimulating comments. Single-file demos Master.html (with the results of the paper) and Master.ipynb (for dynamic execution of all scripts) are available on the paper's github. Note that, due to github size limitations, the Master.html file must be downloaded locally (and then opened with a browser) to be displayed.

†LaMME, Université d'Evry, CNRS, Université Paris-Saclay; marc.chataigner@univ-evry.fr. The PhD thesis of Marc Chataigner is co-funded by the Research Initiative "Modélisation des marchés actions, obligations et dérivés", financed by HSBC France under the aegis of the Europlace Institute of Finance, and by the public grant ANR-11-LABX-0056-LLH LabEx LMH.
‡Institut de Recherche en Mathématique Avancée, Université de Strasbourg, 7 rue René Descartes, 67084 Strasbourg cedex; a.cousin@unistra.fr
§LPSM, Université de Paris; Stephane.Crepey@lpsm.paris. The research of S. Crépey benefited from the support of the Chair Stress Test, RISK Management and Financial Steering, led by the French Ecole polytechnique and its Foundation and sponsored by BNP Paribas.
¶Department of Applied Mathematics, Illinois Institute of Technology, Chicago; matthew.dixon@iit.edu.

This manuscript is for review purposes only.
This is all evidenced on an SPX option dataset.

Throughout the paper we consider European vanilla option prices on a stock or index S, under the assumption that a deterministic short interest rate term structure r(t) has been bootstrapped from the zero coupon curve, and that a term structure of deterministic continuous dividend yields q(t) on S has then been extracted from the prices of the forward contracts on S. For simplicity we take r and q constant in the notation below. Without restriction, given the call-put parity relationship, we only consider put option prices hereafter. We denote by P*(T, K) the market price of the put option with maturity T and strike K on S, observed for a finite number of pairs (T, K) at a given day, conventionally taken as t = 0. Given any rectangular domain of interest in time and space, we tacitly rescale the inputs so that the domain becomes Ω = [0, 1]². This rescaling avoids any one independent variable dominating over another during the fitting of the market prices.

2. Gaussian process regression for learning arbitrage-free price surfaces. Our first goal is to construct, by Gaussian process regression, an arbitrage-free and continuous put price surface P : R₊ × R₊ → R₊ interpolating P* up to some error term, and to retrieve the corresponding local volatility surface σ(·, ·) by the Dupire formula [5]. In terms of the reduced prices p(T, k) = e^{qT} P(T, K), where k = K e^{−(r−q)T}, this formula reads (assuming p of class C^{1,2} on {T > 0}):

(2.1)    σ²(T, K) / 2 = ∂_T p(T, k) / (k² ∂²_{k²} p(T, k)) =: dup(T, k).
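As a quick numerical illustration of (2.1) (our own sketch, not part of the paper's code; all names are ours), dup(T, k) can be approximated by central finite differences on any smooth reduced price surface. For Black-Scholes reduced put prices with a constant 20% volatility, the recovered local variance 2·dup(T, k) should be approximately 0.04:

```python
# Finite-difference check of the reduced-price Dupire ratio dup(T, k) in (2.1).
# The Black-Scholes test surface and all names are ours, not the paper's code.
from math import log, sqrt, erf

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def reduced_put(T, k, S0=1.0, sigma=0.2):
    """Reduced put price p(T, k) = e^{qT} P(T, K) under a constant Black-Scholes vol."""
    d1 = (log(S0 / k) + 0.5 * sigma**2 * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return k * norm_cdf(-d2) - S0 * norm_cdf(-d1)

def dup(T, k, h=1e-3):
    """dup(T, k) = dp/dT / (k^2 d2p/dk2), by central finite differences."""
    p_T = (reduced_put(T + h, k) - reduced_put(T - h, k)) / (2 * h)
    p_kk = (reduced_put(T, k + h) - 2 * reduced_put(T, k) + reduced_put(T, k - h)) / h**2
    return p_T / (k**2 * p_kk)

# For a flat 20% volatility, sigma^2(T, K) = 2 dup(T, k) should be close to 0.04.
print(2 * dup(0.5, 1.0))
```

Note how the denominator involves a second strike derivative: this is the term that makes naive numerical differentiation of noisy prices ill-posed.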

Obviously, for the Dupire formula to be meaningful, its output must be nonnegative. This holds, in particular, whenever the interpolating map p exhibits a nonnegative first derivative w.r.t. T and second derivative w.r.t. k, i.e.

(2.2)    ∂_T p(T, k) ≥ 0,    ∂²_{k²} p(T, k) ≥ 0.

In this section, we construct reduced put price surfaces (T, k) ↦ p(T, k) satisfying the conditions (2.2), from n noisy observations y = [y₁, ..., yₙ]ᵀ of the function p at input points x = [x₁, ..., xₙ]ᵀ. The input points xᵢ = (Tᵢ, kᵢ) correspond to observed maturities and strikes. The market fit condition is written as

(2.3)    y = p(x) + ε,

where p(x) = [p(x₁), ..., p(xₙ)]ᵀ is the vector composed of the bid and ask put prices at the observation points. The additive noise term ε = [ε₁, ..., εₙ]ᵀ is assumed to be a zero-mean Gaussian vector, independent of p(x), with a homoscedastic covariance matrix given as ς² Iₙ, where Iₙ is the identity matrix of dimension n. Note that bid and ask prices are considered here as (noisy) replications at the same input location.

2.1. Classical Gaussian process regression. We consider a zero-mean Gaussian process prior on the mapping p = (p(x))_{x ∈ Ω}, with covariance function (a.k.a. kernel function) c. Then the output vector p(x) has a normal distribution with zero mean and covariance matrix C with components cov(p(xᵢ), p(xⱼ)) = c(xᵢ, xⱼ). We consider a 2-dimensional isotropic covariance kernel given, for any x = (T, k), x′ = (T′, k′) ∈ Ω, as

(2.4)    c(x, x′) = σ² γ_T(T − T′; θ_T) γ_k(k − k′; θ_k).

Here (θ_T, θ_k) = θ and σ² correspond to the length scale and the variance hyper-parameters of the kernel function c, and the functions γ_T and γ_k are kernel correlation functions. By Gaussian conditioning, the conditional process p | p(x) + ε = y is Gaussian with mean function η_y and covariance function c_y such that

(2.5)    η_y(x) = c(x)ᵀ (C + ς² Iₙ)⁻¹ y,    x ∈ Ω,
(2.6)    c_y(x, x′) = c(x, x′) − c(x)ᵀ (C + ς² Iₙ)⁻¹ c(x′),    x, x′ ∈ Ω,

where c(x) = [c(x, x₁), ..., c(x, xₙ)]ᵀ. Without consideration of the conditions (2.2), (unconstrained) kriging prediction and uncertainty quantification are made using the conditional distribution p | p(x) + ε = y. The best linear unbiased estimator of p is given as the conditional mean function (2.5). The conditional covariance function (2.6) can then be used to obtain confidence bands around the predicted price surface. The hyper-parameters of the kernel function c, as well as the variance ς² of the noise, can be estimated by a maximum likelihood estimator (MLE).

2.2. Imposing the no-arbitrage conditions. Regarding the constraints (2.2), we adopt the solution of Cousin et al. [3], which consists in constructing a finite dimensional approximation p^h of the Gaussian prior p for which the constraints can be imposed in the entire domain Ω with a finite number of checks. One then recovers the (non Gaussian) constrained posterior distribution by sampling a truncated Gaussian process. In the original infinite dimensional situation, testing the inequality constraints on the entire input domain would require an infinite number of checks. With the computation of the local volatility surface in mind, switching to a finite dimensional approximation can also be viewed as a form of regularization, which is required anyway to deal with the ill-posedness of the (numerical differentiation) Dupire formula.

We first consider a discretized version of the rescaled input space Ω = [0, 1]², as a regular grid (ıh)_ı, where ı = (i, j), for a suitable mesh size h and indices i, j ranging from 0 to 1/h (taken in N*). For each knot ı = (i, j), we introduce the hat basis function φ_ı with support [(i−1)h, (i+1)h] × [(j−1)h, (j+1)h], given, for x = (T, k), by

φ_ı(x) = max(1 − |T − ih| / h, 0) · max(1 − |k − jh| / h, 0).

We take V = H¹(Ω) = {u ∈ L²(Ω) : Dᵅu ∈ L²(Ω), |α| ≤ 1}, where Dᵅu is a weak derivative of order |α|, as the space of (the realizations of) p. Let V^h ⊂ V denote the finite dimensional linear subspace spanned by the ν linearly independent basis functions φ_ı. The (random) surface p in V is projected onto V^h as

(2.7)    p^h(x) = Σ_ı p(ıh) φ_ı(x),    x ∈ Ω.

p^h ∈ V^h is a bilinear quadrilateral finite element approximation of the values of p at the knots (ıh)_ı. If we denote ρ_ı = p(ıh), then ρ = (ρ_ı)_ı is a zero-mean Gaussian column vector (indexed by ı) with ν × ν covariance matrix Γ^h such that Γ^h_{ı,ȷ} = c(ıh, ȷh), for any two grid nodes ı and ȷ. Let φ(x) denote the vector of size ν given by φ(x) = (φ_ı(x))_ı. The equality (2.7) can be rewritten as p^h(x) = φ(x) · ρ. Denoting by p^h(x) = [p^h(x₁), ..., p^h(xₙ)]ᵀ and by Φ(x) the n × ν matrix of basis functions where each row ℓ corresponds to the vector φ(x_ℓ)ᵀ, one has p^h(x) = Φ(x) · ρ.

Proposition 1. The following statements hold:
(i) the finite dimensional process p^h converges uniformly to p on Ω as h → 0, almost surely;
(ii) p^h(T, k) is a nondecreasing function of T if and only if ρ_{i+1,j} ≥ ρ_{i,j}, for all (i, j);
(iii) p^h(T, k) is a convex function of k if and only if ρ_{i,j+2} − ρ_{i,j+1} ≥ ρ_{i,j+1} − ρ_{i,j}, for all (i, j).

Proof. This follows by an application of the general methodology of [9] for GP regression subject to inequality constraints.

In view of the first statement in Proposition 1, denoting by I the set of 2d continuous positive functions which are nondecreasing in T and convex in k, we choose as constrained GP metamodel for the put price surface the conditional distribution of p^h given {p^h(x) + ε = y, p^h ∈ I}.
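The hat basis functions and the knot-level characterizations (ii)-(iii) of Proposition 1 are easy to sketch in code (a minimal illustration with our own names; the paper's implementation is in Matlab):

```python
# Sketch (our notation) of the hat basis functions phi_(i,j) and the
# knot-level no-arbitrage checks of Proposition 1 on a coefficient grid rho[i, j].
import numpy as np

def hat(u, center, h):
    """1D hat function centered at `center` with half-width of support h."""
    return np.maximum(1.0 - np.abs(u - center) / h, 0.0)

def phi(T, k, i, j, h):
    """Bilinear basis phi_(i,j)(T, k) = hat(T; ih) * hat(k; jh)."""
    return hat(T, i * h, h) * hat(k, j * h, h)

def satisfies_constraints(rho):
    """Monotonicity in T (axis 0) and convexity in k (axis 1) at the knots."""
    nondecreasing_T = np.all(np.diff(rho, axis=0) >= 0)
    convex_k = np.all(np.diff(rho, n=2, axis=1) >= 0)
    return nondecreasing_T and convex_k

h = 0.25
print(phi(0.5, 0.5, 2, 2, h))  # equals 1 at its own knot
```

By construction the φ_ı form a partition of unity inside Ω, so verifying monotonicity in T and convexity in k reduces to a finite number of comparisons at the knots, as in Proposition 1.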

In view of the last two statements of Proposition 1, p^h satisfies the inequality constraints in the entire domain Ω if and only if it satisfies these constraints at the knots, i.e. p^h ∈ I if and only if ρ ∈ I^h, where I^h corresponds to the set of (ı-indexed) vectors ϱ = (ϱ_ı)_ı such that ϱ_{i+1,j} ≥ ϱ_{i,j} and ϱ_{i,j+2} − ϱ_{i,j+1} ≥ ϱ_{i,j+1} − ϱ_{i,j} hold for all (i, j). Hence our GP metamodel for the put price surface can be reformulated as the conditional distribution of ρ given

(2.8)    {Φ(x) · ρ + ε = y, ρ ∈ I^h}.

2.3. Hyper-parameter learning. The hyper-parameters consist in the length scales θ and the variance parameter σ² of the kernel function c in (2.4), as well as the noise variance ς². Denoting λ = [θ, σ, ς]ᵀ, we propose to maximize the marginal log likelihood L(λ) of the process p^h w.r.t. λ for parameter learning (MLE estimation). Up to a constant, the so-called marginal log likelihood of ρ can be expressed as (see e.g. [10, Section 15.2.4, p. 523]):

L(λ) = −½ yᵀ (Φ(x) Γ^h Φ(x)ᵀ + ς² Iₙ)⁻¹ y − ½ log det(Φ(x) Γ^h Φ(x)ᵀ + ς² Iₙ).

Remark 2. This expression does not take into account the inequality constraints in the estimation. However, Bachoc et al. [1, see e.g. their Eq. (2)] argue (and we also observed empirically) that, unless the sample size is very small, conditioning on the constraints significantly increases the computational burden with negligible impact on the MLE.

2.4. The most probable response surface and measurement noises. The maximum a posteriori probability estimate (MAP) p̃^h of p^h given the constraints satisfies the constraints on the entire domain of interest and corresponds to the most likely surface. Its expression is given in [3]:

(2.9)    p̃^h(x) = Σ_ı ρ̃_ı φ_ı(x),

where ρ̃ is the MAP of ρ.
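A minimal sketch of this marginal log-likelihood (our notation; the constant term is dropped, and λ enters through Γ^h and ς², which a full implementation would rebuild from λ at each optimization step):

```python
# Sketch (our notation) of the marginal log-likelihood of Section 2.3, up to
# its constant term, for given basis matrix Phi, prior covariance Gamma_h of
# the knot values rho, and noise variance s2.
import numpy as np

def marginal_log_likelihood(y, Phi, Gamma_h, s2):
    """L(lambda) = -0.5 y^T K^{-1} y - 0.5 log det K, K = Phi Gamma_h Phi^T + s2 I_n."""
    n = len(y)
    K = Phi @ Gamma_h @ Phi.T + s2 * np.eye(n)
    L = np.linalg.cholesky(K)  # Cholesky factor, for a stable solve and log-determinant
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    logdet = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * y @ alpha - 0.5 * logdet

y = np.array([1.0, 0.0])
print(marginal_log_likelihood(y, np.eye(2), np.eye(2), 1.0))
```

With Φ = Γ^h = I₂ and ς² = 1, the matrix K equals 2 I₂, so the value reduces to −¼ − log 2, a convenient hand check.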
In order to also identify the locations x of the most likely arbitrages in the data, and to quantify the latter, as the locations of the largest noises and their values, we compute the joint MAP (ρ̃, ε̃) of the truncated Gaussian vector ρ and of the Gaussian noise vector ε. This can be defined as

arg max_{ϱ, e} Prob( ρ ∈ [ϱ, ϱ + dϱ], ε ∈ [e, e + de] | Φ(x) · ρ + ε = y, ρ ∈ I^h )

(for the probability measure Prob underlying the GP model). As (ρ, ε) is Gaussian centered with block-diagonal covariance matrix with blocks Γ^h and ς² Iₙ, this implies that the mode (ρ̃, ε̃) is a solution to the following quadratic problem:

(2.10)    arg min_{Φ(x)·ϱ + e = y, ϱ ∈ I^h} [ ϱᵀ (Γ^h)⁻¹ ϱ + eᵀ (ς² Iₙ)⁻¹ e ].

As a consequence, we define the most probable measurement noise to be ε̃ and the most probable response surface to be p̃^h(x) = Φ(x) · ρ̃. Distance to the data can be an effect of arbitrage opportunities within the data and/or of misspecification / lack of expressiveness of the kernel.
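For intuition, here is a dependency-free toy sketch of (2.10) (our own, not the paper's Matlab quadprog route): substituting e = y − Φ(x)·ϱ turns (2.10) into a QP in ϱ alone, which we solve on a 1D toy with monotonicity constraints only, by projected gradient descent with a pool-adjacent-violators (isotonic) projection:

```python
# Toy sketch (ours) of the joint-MAP problem (2.10): with e = y - Phi rho it is a
# QP in rho under linear inequality constraints. Here: 1D monotonicity only,
# solved by projected gradient descent; the projection onto nondecreasing
# vectors is the pool-adjacent-violators algorithm.
import numpy as np

def isotonic_projection(v):
    """Euclidean projection of v onto the cone of nondecreasing vectors (PAV)."""
    out = []
    for x in v:
        out.append([float(x), 1])  # [block mean, block size]
        # merge while the last two blocks violate monotonicity
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, s2_ = out.pop()
            m1, s1 = out.pop()
            out.append([(m1 * s1 + m2 * s2_) / (s1 + s2_), s1 + s2_])
    return np.concatenate([[m] * s for m, s in out])

def map_estimate(y, Phi, Gamma_inv, s2, steps=2000, lr=1e-3):
    """Projected gradient descent on rho -> rho^T Gamma^{-1} rho + |y - Phi rho|^2 / s2."""
    rho = np.zeros(Phi.shape[1])
    for _ in range(steps):
        grad = 2 * Gamma_inv @ rho - 2 * Phi.T @ (y - Phi @ rho) / s2
        rho = isotonic_projection(rho - lr * grad)
    return rho

rng = np.random.default_rng(0)
n, nu, s2 = 8, 5, 0.01
Phi = rng.random((n, nu)); Phi /= Phi.sum(axis=1, keepdims=True)
y = np.sort(rng.random(n))  # noisy "prices", roughly increasing
rho_map = map_estimate(y, Phi, np.eye(nu), s2)
print(rho_map)
```

In the paper, the full 2D constraint set I^h and an interior point QP solver are used instead; the projection step here merely guarantees that the returned ϱ satisfies the toy constraints exactly.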

2.5. Sampling finite dimensional Gaussian processes under shape constraints. In view of (2.8), our construction of the put price surface consists in sampling ρ given Φ(x)·ρ + ε = y, truncated on I^h. The conditional distribution of ρ | Φ(x)·ρ + ε = y is multivariate normal with mean η_y(x) and covariance matrix C_y(x) such that

(2.11)    η_y(x) = Γ^h Φ(x)ᵀ (Φ(x) Γ^h Φ(x)ᵀ + ς² Iₙ)⁻¹ y,
(2.12)    C_y(x) = Γ^h − Γ^h Φ(x)ᵀ (Φ(x) Γ^h Φ(x)ᵀ + ς² Iₙ)⁻¹ Φ(x) Γ^h.

Hence we face the problem of sampling from a truncated multivariate Gaussian distribution, which we do by Hamiltonian Monte Carlo, using the MAP ρ̃ of ρ (which must verify the constraints) as the initial vector of the sampler.

2.6. Local volatility. For the purposes of local volatility surface construction, the finite dimensional approximation p^h should satisfy the conditions (2.2). However, due to the constraints and the ensuing finite-dimensional approximation with basis functions of class C⁰ (for the sake of Proposition 1), p^h is not differentiable. Hence, exploiting GP derivatives analytics, as done for the mean in [4, cf. Eq. (10)] and for the covariance in [8], is not possible here. To address this issue, we formulate a weak form of the Dupire equation and construct the local volatility surface using a finite element method: see the paper's github for the details. See Algorithm 2.1 for the main steps of the approach.

Algorithm 2.1 The GP-FE algorithm for local volatility surface approximation.
Data: Put price training set p*
Result: M realizations of the local volatility surface {dup^h_i}, i = 1, ..., M
λ̂ ← Maximize the marginal log-likelihood of the put price surface p^h w.r.t. λ    // Hyperparameter fitting
(ρ̃, ε̃) ← Minimize the quadratic problem (2.10) based on λ̂    // Joint MAP estimate
ρ̃ → Initialize a Hamiltonian MC sampler
p^h_1, ..., p^h_M ← Hamiltonian MC sampler    // Sampling price surfaces
dup^h_i ← Finite element approximation using each p^h_i, i = 1, ..., M

3. Neural networks implied volatility metamodeling. Our second goal is to use neural nets (NN) to construct a continuous implied volatility put surface Σ : R₊ × R → R₊, interpolating implied volatility market quotes Σ* up to some error term, both being stated in terms of a put option maturity T and log-(forward) moneyness κ = log(k/S₀) = log(K/S₀) − (r − q)T. The advantage of using implied volatilities rather than prices (as previously done in [2]), both being in bijection via the Black-Scholes put pricing formula as is well known, is their lower variability, hence better performance, as we will see.

The corresponding local volatility surface σ is given by the following local volatility implied variance formula, i.e. the Dupire formula stated in terms of the implied total variance Θ(T, κ) = Σ²(T, κ) T (assuming Θ of class C^{1,2} on {T > 0}):¹

(3.1)    σ²(T, K) = ∂_T Θ / [ 1 − (κ/Θ) ∂_κ Θ + ¼ (−¼ − 1/Θ + κ²/Θ²) (∂_κ Θ)² + ½ ∂²_{κ²} Θ ] (T, κ) =: (calT(Θ) / buttk(Θ))(T, κ).

We use a feedforward NN with weights W, biases b and smooth activation functions for parameterizing the implied volatility (hence the total variance), which we denote by

Σ = Σ_{W,b},    Θ = Θ_{W,b}.

The terms calT(Θ_{W,b}) and buttk(Θ_{W,b}) are available analytically, by automatic differentiation, which we exploit below to penalize calendar spread arbitrages, i.e. negativity of calT(Θ), and butterfly arbitrages, i.e. negativity of buttk(Θ).

The training of NNs is a non-convex optimization problem and hence does not guarantee convergence to a global optimum. We must therefore guide the NN optimizer towards a local optimum that has desirable properties in terms of interpolation error and arbitrage constraints. This motivates the introduction of an arbitrage penalty function into the loss function, so as to select the most appropriate local minima. An additional challenge is that the maturity-log moneyness pairs with quoted option prices are unevenly distributed, and the NN may favor fitting a cluster of quotes to the detriment of isolated points. Consequently, large pointwise errors may arise where the NN has favored a local minimum with low interpolation accuracy and no arbitrage violation.
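The formula (3.1) lends itself to a simple finite-difference sanity check (our own sketch; the paper differentiates Θ_{W,b} analytically by automatic differentiation instead): for a flat implied volatility surface Σ ≡ 20%, i.e. Θ(T, κ) = 0.04 T, the local variance returned by (3.1) should equal 0.04:

```python
# Numerical sketch of the implied-variance Dupire formula (3.1); for a flat
# implied volatility of 20%, the local variance should come back as 0.04.
def local_variance(theta, T, kappa, h=1e-4):
    """sigma^2(T, kappa) from total variance theta(T, kappa) via (3.1), by finite differences."""
    th = theta(T, kappa)
    dT = (theta(T + h, kappa) - theta(T - h, kappa)) / (2 * h)
    dk = (theta(T, kappa + h) - theta(T, kappa - h)) / (2 * h)
    dkk = (theta(T, kappa + h) - 2 * th + theta(T, kappa - h)) / h**2
    denom = (1 - kappa / th * dk
             + 0.25 * (-0.25 - 1 / th + kappa**2 / th**2) * dk**2
             + 0.5 * dkk)
    return dT / denom

flat = lambda T, kappa: 0.04 * T  # Theta = Sigma^2 T with Sigma = 0.2
print(local_variance(flat, 1.0, 0.1))
```

For the flat surface, the κ-derivatives vanish, the denominator buttk(Θ) reduces to 1 and the numerator calT(Θ) to 0.04, in line with the formula.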
To remedy this non-uniform data fitting problem, we propose a novel solution which involves re-weighting the observations by the Euclidean distance between neighboring points. More precisely, given n observations χᵢ = (Tᵢ, κᵢ) of maturity-log moneyness pairs and the corresponding market implied volatilities Σ*(χᵢ), we construct the n × n distance matrix where each coefficient d(χᵢ, χⱼ) is the Euclidean distance between points:

d(χᵢ, χⱼ) = sqrt( (Tⱼ − Tᵢ)² + (κⱼ − κᵢ)² ).

We then define the loss weighting wᵢ of each point χᵢ as the distance to the closest other point:

wᵢ = min_{j, j ≠ i} d(χᵢ, χⱼ).

This weighting aims at reducing the error at isolated points. In order to adjust the weight of the penalization, we multiply our penalties by the weighting mean µ_w = (1/n) Σᵢ wᵢ. Learning the weights W and biases b from the data, subject to no-arbitrage soft constraints (i.e. with penalization of arbitrages), then takes the form of the following (nonconvex) loss minimization problem:

(3.2)    arg min_{W,b}  sqrt{ (1/n) Σᵢ wᵢ [ (Σ_{W,b}(χᵢ) − Σ*(χᵢ)) / Σ*(χᵢ) ]² }  +  (µ_w / ν) Σ_{ξ ∈ Ω_h} λᵀ R(Θ_{W,b})(ξ),

¹This follows from the Dupire formula by simple transforms detailed in [6, p. 13].

where λ = [λ₁, λ₂, λ₃]ᵀ ∈ R³₊ and

R(Θ) = [ calT(Θ)⁻ , buttk(Θ)⁻ , ( (calT/buttk)(Θ) − ā )⁺ + ( a − (calT/buttk)(Θ) )⁺ ]

(with (·)⁻ and (·)⁺ denoting negative and positive parts) is a regularization penalty vector evaluated over a penalty grid Ω_h with ν = 50 × 100 nodes, which extends well beyond the unit square domain of the IV interpolation. In the unscaled moneyness and maturity coordinates, the domain of the penalty grid is [0.5, 2] × [0.005, 10Y]. This is intended so that the penalty term penalizes arbitrages outside of the domain used for the IV interpolation. Even on such an extended penalty grid, we found no arbitrage violation in our experiments (after training, calT(Θ_{W,b}) and buttk(Θ_{W,b}) are even ≥ 0 at all nodes of Ω_h).

Note that the error criterion is calculated as a root mean square error on relative differences, chosen here so that it does not discriminate between high and low implied volatilities.

The first two elements in the penalty vector favor the no-arbitrage conditions (2.2), and the third element favors desired lower and upper bounds 0 < a < ā (constants or functions of T) on the estimated local variance σ²(T, K). Suitable values of the "Lagrange multipliers" λ, ensuring the right balance between the fit to the market implied volatilities and the constraints, are obtained by grid search. Of course, a soft constraint (penalization) approach does not fully prevent arbitrages. However, for large λ, arbitrages are extremely unlikely to occur (except perhaps very far from Ω). See Algorithm 3.1 for the pseudo-code of the NN approach.

Algorithm 3.1 The NN-IV algorithm for local volatility surface approximation.
Data: Market implied volatility surface Σ*
Result: The local volatility surface sqrt{ (calT/buttk)(Θ_{Ŵ,b̂}) }
(Ŵ, b̂) ← Minimize the penalized training loss (3.2) w.r.t. (W, b)
sqrt{ (calT/buttk)(Θ_{Ŵ,b̂}) } ← AAD differentiation of the trained NN implied vol. surface

4. Numerical results.

4.1. Experimental design. Our training set is prepared using SPX European puts with different available strikes and maturities ranging from 0.005 to 2.5 years, listed on 18th May 2019, with S₀ = 2859.53.
Each contract is listed with a bid/ask price and an implied volatility corresponding to the mid-price. The associated interest rate is constructed from the US treasury yield curve, and dividend yield curve rates are then obtained from call/put parity applied to the option market prices and forward prices. We preprocess the data by removing the shortest maturity options, with T < 0.055, as well as the numerically inconsistent observations for which the gap between the listed implied volatility and the implied volatility calibrated from the mid-price with our interest/dividend curves exceeds 5% of the listed implied volatility. But we do not remove arbitrable observations. The preprocessed training set is composed of 1720 market put prices. The testing set consists of a disjoint set of 1725 put prices.

All results for the GP method are based on Matérn ν = 5/2 kernels over a [0, 1]² domain, with fitted kernel standard-deviation hyper-parameter σ̂ = 185.7611, length-scale hyper-parameters θ̂_k = 0.3282 and θ̂_T = 0.2211, and homoscedastic noise standard deviation ς̂ = 0.6876.² The grid of basis functions for constructing the finite-dimensional process p^h has 100 nodes in the modified strike direction and 25 nodes in the maturity direction. The Matlab interior point convex algorithm quadprog is used to solve the MAP quadratic program (2.10).

Regarding the NN approach, we use a three layer architecture similar to the one in [2], based on prices (instead of implied volatilities as in Section 3), to which we refer the reader for implementation details.

4.2. Arbitrage-free SVI. We benchmark the machine learning results against the industry standard provided by the arbitrage-free stochastic volatility inspired (SVI) model of [7]. Under the "natural parameterization" SVI = (∆, µ, ρ, ω, ζ), the implied total variance is given, for any fixed T, by

(4.1)    Θ_SVI(κ) = ∆ + (ω/2) [ 1 + ζρ(κ − µ) + sqrt{ (ζ(κ − µ) + ρ)² + (1 − ρ²) } ].

SSVI is the parameterization of a full surface given as SVI_T = (0, 0, ρ, Θ_T, φ(Θ_T)) for each T, where Θ_T is the at-the-money total implied variance and we use for φ the power law function φ(ϑ) = η / (ϑ^γ (1 + ϑ)^{1−γ}). [7, Remark 4.4] provides sufficient conditions on the SSVI parameters (η(1 + |ρ|) ≤ 2, with γ = 0.5) that rule out butterfly arbitrage, whereas SSVI is free of calendar arbitrage when Θ_T is nondecreasing in T.

We calibrate the model as in [7]:³ first, a guess on SVI is obtained by fitting the SSVI model; second, for each maturity in the training grid, the five SVI parameters are calibrated (starting in each case from the SSVI calibrated values). The implied volatility is obtained for new maturities by a weighted average of the parameters associated with the two closest maturities in the training grid, T and U say, with weights determined by Θ_T and Θ_U.
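The slice formula (4.1) and the quoted butterfly condition can be sketched as follows (our names; at κ = µ the slice value reduces to ∆ + ω, a convenient hand check):

```python
# Sketch of the natural-parameterization SVI slice (4.1), plus the SSVI
# butterfly-arbitrage sufficient condition eta (1 + |rho|) <= 2 of [7, Remark 4.4].
from math import sqrt

def svi_natural(kappa, delta, mu, rho, omega, zeta):
    """Total implied variance of an SVI slice in natural parameters."""
    z = zeta * (kappa - mu)
    return delta + 0.5 * omega * (1.0 + rho * z + sqrt((z + rho) ** 2 + 1.0 - rho ** 2))

def ssvi_butterfly_ok(eta, rho):
    """Sufficient no-butterfly condition for SSVI with a power-law phi, gamma = 0.5."""
    return eta * (1.0 + abs(rho)) <= 2.0

print(svi_natural(0.3, 0.1, 0.3, -0.5, 0.2, 1.5))  # at kappa = mu: delta + omega
```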
The corresponding local volatility is extracted by finite difference approximation of (3.1).

Note that, as in practice the no-arbitrage constraints are implemented for SSVI by penalization (see [7, Section 5.2]), the SSVI approach is in fact only practically arbitrage-free, much like our NN approach; it is only the GP approach that is proven arbitrage-free.

4.3. Calibration results. Training times for SSVI, GP, and NNs are reported in the last line of Table 1 which, for completeness, also includes numerical results obtained by NN interpolation of the prices as per [2]. Because the price based NN results are outperformed by the IV based NN results, we only focus on the IV based NN in the figures that follow, referring to [2] for every detail on the price based NN approach. Again, in contrast to SSVI and the NNs, which fit to mid-quotes, the GPs fit to the bid-ask prices.

The GP implementation is in Matlab whereas the SSVI and NN approaches are implemented in Python. On our large dataset, the constrained GP has the longest training time. Training is longer for constrained SSVI than for unconstrained SSVI because of the ensuing amendments to the optimization routine.

Table 1: The IV and price RMSEs of the SSVI, GP, IV based NN and price based NN approaches, in their constrained and unconstrained variants: calibration fit on the training set, calibration fit on the testing set, MC backtest, FD backtest; last line: computation times in seconds. (The individual table values are not recoverable from this transcription.)

There are no static arbitrage violations observed for any of the constrained methods, in either the training or the testing grid. The unconstrained methods yield 18 violations with the NN and 177 with SSVI on the testing set, out of a total of 1725 testing points, i.e. violations at 1.04% and 10.26% of the test nodes. The unconstrained GP approach yields constraint violations on 12.5% of the basis function nodes. The NN penalizations calT(Θ)⁻ and buttk(Θ)⁻ vanish identically on Ω_h in the constrained case, whereas in the unconstrained case their averages across grid nodes in Ω_h are 3.91 × 10⁻⁶ and 1.60 × 10⁻², respectively, with the IV based NN.

Fig. 1(a-b) respectively compare the fitted IV surfaces and their errors with respect to the market mid-implied volatilities, among the constrained methods. The surface is sliced at various maturities, and the IVs corresponding to the bid-ask price quotes are also shown: the blue and red points respectively denote training and test observations. We generally observe a good correspondence between the models, and each curve typically falls within the bid-ask spread, except for the shortest maturity contracts, where there is some departure from the bid-ask spreads for the observations with the lowest log-moneyness values. We see on Fig. 1(b) that the GP IV errors are small and mostly less than 5 volatility points, whereas the NN and SSVI exhibit IV errors that may exceed 15 volatility points. The green line and the red shaded envelopes respectively denote the GP MAP estimates and the posterior uncertainty bands under 100 samples per observation. The support of the posterior GP process ass

²When re-scaled back to the original input domain, the fitted length scale parameters of the 2D Matérn ν = 5/2 kernel are θ̂_k = 973.1901 and θ̂_T = 0.5594.
³Building on thors/4439546

