
GPU Acceleration of Monte Carlo Simulation for Capital Markets & Insurance
Serguei Issakov, Ph.D.
Global Head of Quantitative Research & Development, Senior Vice President
GPU Technology Conference, San Jose, May 9, 2017

2. Agenda
- Numerix introduction
- How Monte Carlo simulation is used for pricing in finance: pricing models, financial instruments
- Functionalities in production
- Use cases
- Pricing code reorganization to run on GPU
- GPU acceleration factors for financial instruments of different complexity
- Multi-GPU scaling on DGX-1
- Nested Monte Carlo for future capital and margin projections
- Roadmap / future work

Numerix is the award-winning industry leader in risk management and quantitative analytics for capital markets participants.
Our 200 financial engineers, developers, and implementation experts based in sixteen countries help over seven hundred customers manage their most demanding trading, risk management, and regulatory compliance challenges.

The World’s Financial Institutions Rely on Numerix
- Over 700 global financial institutions
- USD 10 trillion in assets priced & managed
- Over 90 partners
- Client-focused:
  - Exceptional, award-winning client support (Celent & Chartis Customer Service Awards)
  - Local time zone coverage/support
  - Fast response time to client requests
- Agile development & flexible technology
- Flexible integration with existing systems

Customer Use Cases & Global Presence
Financial institutions all over the world rely on Numerix, with 24 offices in 16 countries, to price their derivatives contracts, manage their risk exposures, and better serve their customers.
- BlackRock Solutions. Challenge: expand FX exotic analytics into existing platform. Solution: FX models, flexible & scalable.
- Thiel Capital. Challenge: trade macroeconomic risk anywhere in the world quickly and efficiently. Solution: real-time trade and risk management solution with broadest coverage in the industry.
- Lehman Brothers Holdings Inc. Challenge: independent analytics for valuing the entire LBHI derivatives portfolio. Solution: accurate, real-time valuations for bankruptcy & unwinding of books.
- Linear Investments. Challenge: manage and monitor client exposure intra-day. Solution: real-time client margin solution.
- Glitnir Bank. Challenge: independent valuations for unwinding of a complex OTC book. Solution: independent models and valuations for creditors.
- Belfius Bank. Challenge: independent model validation. Solution: model validation & risk control.
- pbb Deutsche Pfandbriefbank. Challenge: firm-wide CVA calculations. Solution: accurate, real-time CVA & PFE.
- OTP Bank. Challenge: flexible analytics for structured products. Solution: pricing & risk for complex structures.
- Swedbank. Challenge: flexible & transparent framework for pricing & structuring exotics. Solution: unified valuation framework for pricing & risk.
- China CITIC Bank. Challenge: analytics for pricing & risk management of complex derivatives. Solution: breadth of models for pricing & risk.
- Yuanta Financial Holdings. Challenge: consolidated risk platform for all asset classes. Solution: unified platform for pricing & risk for the entire portfolio.
- HDFC Bank. Challenge: consolidate risk & reporting, enhance market & CCR. Solution: one platform for risk & added CCR.
- DBS. Challenge: automated workflow for 2000 Requests for Quotes. Solution: automated process to respond to RFQs.

Numerix bution
- CrossAsset for Bloomberg
- CrossAsset for Eikon
- CrossAsset SDK
- CrossAsset Excel

7. GPU support of Monte Carlo simulation at Numerix

Timeline
- Aug 2016: First production release of Monte Carlo simulation on GPU for simpler trades (Capital Markets)
- Nov 2016: Support of the CUDA 8.0 environment, required to run on the latest generation of GPUs, Pascal
- Dec 2016: Added support for the most complex trades (Insurance)

GPU advantages at a glance
- Increased computation speed: acceleration of 20X on one GPU versus a single-threaded computation on CPU
- Support for running Monte Carlo simulation on multiple GPUs, with practically perfect parallelization (tested on NVIDIA DGX-1)
- Makes it possible to substantially increase the number of Monte Carlo paths for more accurate pricing and risk management

8. Monte Carlo simulation in finance
- Evolution of markets follows stochastic processes
- Pricing models: stochastic differential equations
- Financial instruments / trades: payoff “script” to define the instrument
- Monte Carlo pricing:
  - Generate random numbers
  - Generate Monte Carlo “paths” according to the model (discretization of stochastic evolution)
  - Execute the payoff script to compute distributions of future prices (most operations are path-wise)
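The three pricing steps can be illustrated with a minimal C++ sketch. It assumes a single asset under Black-Scholes and a European call payoff instead of a payoff script; the function name `mcEuropeanCall` and all parameters are hypothetical and are not part of the Numerix code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Hypothetical helper: Monte Carlo price of a European call under Black-Scholes.
double mcEuropeanCall(double s0, double strike, double r, double sigma,
                      double maturity, int nSteps, int nPaths, unsigned seed) {
    assert(nSteps > 0 && nPaths > 0);
    std::mt19937_64 gen(seed);
    std::normal_distribution<double> normal(0.0, 1.0);
    const double dt = maturity / nSteps;
    const double drift = (r - 0.5 * sigma * sigma) * dt;
    const double diffusion = sigma * std::sqrt(dt);

    // Steps 1 and 2: draw random numbers and evolve all paths date by date
    // (exact log-normal step of the Black-Scholes SDE).
    std::vector<double> spot(nPaths, s0);
    for (int step = 0; step < nSteps; ++step)
        for (double& s : spot)
            s *= std::exp(drift + diffusion * normal(gen));

    // Step 3: evaluate the payoff path-wise, then average and discount.
    double sum = 0.0;
    for (double s : spot) sum += std::max(s - strike, 0.0);
    return std::exp(-r * maturity) * sum / nPaths;
}
```

The outer loop runs over dates and the inner loop over paths, so every operation is applied to a whole vector of paths at once. This is the layout that maps naturally onto one GPU thread per path.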

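9. Equity / Foreign Exchange Models

Black-Scholes model
  $dS_t = (r_t - q_t)\,S_t\,dt + \sigma_t\,S_t\,dW_t$
  - $r_t$: short term rate,
  - $q_t$: continuous dividend,
  - $\sigma_t$: volatility (deterministic function of time),
  - $W_t$: Brownian motion (in the risk-neutral measure).

Local Volatility (Dupire) model
  $dS_t = (r_t - q_t)\,S_t\,dt + \sigma(S_t, t)\,S_t\,dW_t$
  - $q_t$: dividend curve,
  - $\sigma(S_t, t)$: local vol (deterministic function of the asset level and time).

Stochastic Volatility (Heston) model
  $dS_t = (r_t - q_t)\,S_t\,dt + \sqrt{v_t}\,S_t\,dW^S_t$
  $dv_t = \kappa\,(\theta - v_t)\,dt + \xi\,\sqrt{v_t}\,dW^v_t, \qquad dW^S_t\,dW^v_t = \rho\,dt$
  - $v_t$: variance,
  - $\kappa$: mean reversion rate,
  - $\theta$: long-term variance,
  - $\xi$: volatility of volatility.
  $\rho$ is the correlation between the stochastic processes for the asset level and its variance.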
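As an illustration of how a stochastic volatility model is discretized into path steps, here is a minimal sketch of one time step of the Heston model. It assumes the standard Heston dynamics and a full-truncation Euler scheme. The scheme, the struct names, and the parameter names (`kappa`, `theta`, `xi`, `rho`) are assumptions of this sketch; the slides do not say which discretization is used in production.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical parameter and state containers for this sketch.
struct HestonParams { double r, q, kappa, theta, xi, rho; };
struct HestonState  { double s, v; };

// One full-truncation Euler step. z1 and z2 are independent standard normals.
HestonState hestonStep(HestonState x, const HestonParams& p, double dt,
                       double z1, double z2) {
    assert(dt > 0.0 && p.rho >= -1.0 && p.rho <= 1.0);
    const double vPos = std::max(x.v, 0.0);   // truncate negative variance
    // Correlate the variance shock with the asset shock.
    const double zV = p.rho * z1 + std::sqrt(1.0 - p.rho * p.rho) * z2;
    const double sqrtVdt = std::sqrt(vPos * dt);

    HestonState next;
    next.s = x.s * std::exp((p.r - p.q - 0.5 * vPos) * dt + sqrtVdt * z1);
    next.v = x.v + p.kappa * (p.theta - vPos) * dt + p.xi * sqrtVdt * zV;
    return next;
}
```

With `xi = 0` and the initial variance equal to `theta`, the variance stays constant and the step reduces to a Black-Scholes step, which is a convenient sanity check.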

10. Payoff script example (simpler exotic trade)
Trade type: worst-of down & in put for an equity basket with 3 underlyings in the same currency, with continuous barrier monitoring

PAYOFFSCRIPT

    PRODUCTS
        DISCOUNTING WO, WOKO123, WOKO123discrete, wodip, wodipSquared
        NONDISCOUNTING eq0[3], worstperf, isKo, OneAsset
        INTEGER i
    END PRODUCTS

    PAYOFFSCRIPT
        IF ISACTIVE(today) THEN
            isKO = 1
            Oneasset = 1
            eq0[1] = 67.2
            eq0[2] = 72.5
            eq0[3] = 11.55
            AttachBarrier(Oneasset, EQ1, TODAY, Barrier * eq0[1], BarrierDown, 0, PayRebateAtMaturity, EXPIRY)
            AttachBarrier(Oneasset, EQ2, TODAY, Barrier * eq0[2], BarrierDown, 0, PayRebateAtMaturity, EXPIRY)
            AttachBarrier(Oneasset, EQ3, TODAY, Barrier * eq0[3], BarrierDown, 0, PayRebateAtMaturity, EXPIRY)
        END IF
        IF ISACTIVE(obsdates) THEN
            worstperf = 10000
            FOR i = 1 TO 3
                worstperf = min(eq[i] / eq0[i], worstperf)
            NEXT
            isKO *= STEP(worstperf-Barrier)
        END IF
        IF ISACTIVE(expiry) THEN
            WO = CASH(MAX(strike - worstperf, 0), expiry, THISPAY)
            WOKO123discrete = WO * isKO
            woko123 = WO * oneasset
            wodip = wo - woko123
            wodipSquared = wodip * wodip
        END IF
    END PAYOFFSCRIPT
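To make the logic of the script concrete, the following C++ sketch evaluates the same payoff on a single path. It is a simplification under stated assumptions: the barrier is monitored only on the observation dates (the discrete `isKO` branch of the script, not the continuous `AttachBarrier` branch), discounting is omitted, and `STEP(x)` is assumed to return 1 for `x >= 0`. The function name and types are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

constexpr int kAssets = 3;
using Spots = std::array<double, kAssets>;

// Undiscounted payoff on one path. path[k] holds the three spots on
// observation date k; the last entry is the expiry date.
double worstOfDownInPut(const std::vector<Spots>& path, const Spots& eq0,
                        double strike, double barrier) {
    assert(!path.empty());
    double isKO = 1.0;       // stays 1 while the barrier has not been breached
    double worstperf = 0.0;  // overwritten on every observation date
    for (const Spots& eq : path) {
        worstperf = 10000.0;
        for (int i = 0; i < kAssets; ++i)
            worstperf = std::min(eq[i] / eq0[i], worstperf);
        // STEP(worstperf - Barrier); the boundary convention is an assumption.
        isKO *= (worstperf >= barrier) ? 1.0 : 0.0;
    }
    const double wo = std::max(strike - worstperf, 0.0);  // worst-of put
    const double woko = wo * isKO;                         // knock-out version
    return wo - woko;  // down & in = vanilla minus knock-out
}
```

The payoff is nonzero only on paths where the worst performance fell below the barrier on some observation date and the worst-of put finishes in the money.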

11. Monte Carlo pricing on GPU in production

Forward Monte Carlo simulation on GPU

Supported pricing models & model configurations
- Equity/FX models:
  - H2 2016: Black-Scholes, Local Vol (Dupire)
  - Q1 2017: Stochastic Vol (Heston), ‘Hot start’ Heston [*]
  - Q2 2017: Local Stochastic Vol (LSV), Stochastic Vol with Jumps (Bates)
- Equity/FX basket models with the above models for individual equities
- Single-currency hybrid model with the above models for individual equities & a deterministic IR model

Random numbers & floating point precision
- Quasi-random numbers (e.g. Sobol sequences) & pseudorandom numbers (lower memory footprint)
- Double precision/FP64 & single precision/FP32 in Monte Carlo simulation

[*] S. Mechkov, ‘Hot-start’ initialisation of the Heston model (2016), RISK, 0/hot-start-initialisation-heston-model
Serguei Mechkov initialises Heston model’s parameters using probability distributions
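To show what a quasi-random sequence looks like in code, here is a small sketch using the base-2 van der Corput sequence. This is a deliberately simpler stand-in for Sobol sequences, which need tables of direction numbers; it is not the generator used by Numerix. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Radical inverse in base 2 (van der Corput): mirrors the bits of n
// around the binary point, e.g. 6 = 110b maps to 0.011b = 0.375.
double vanDerCorput(std::uint32_t n) {
    double x = 0.0, weight = 0.5;
    for (; n != 0; n >>= 1, weight *= 0.5)
        if (n & 1u) x += weight;
    return x;
}

// Quasi-Monte Carlo estimate of the integral of f over [0, 1],
// using sequence points 1..nPoints.
template <class F>
double qmcIntegrate(F f, std::uint32_t nPoints) {
    assert(nPoints > 0);
    double sum = 0.0;
    for (std::uint32_t n = 1; n <= nPoints; ++n) sum += f(vanDerCorput(n));
    return sum / nPoints;
}
```

Successive points fill the unit interval evenly (0.5, 0.25, 0.75, 0.125, ...), which is what gives low-discrepancy sequences their faster convergence compared with pseudorandom sampling.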

12. Trade types & GPU controls

Trade types: all trades that can be priced by Forward Monte Carlo simulation are supported on GPU

Trade complexity | # lines in payoff script | Example
Simpler exotics | 30 | Options on small equity baskets with barrier conditions
Structured deals of average complexity | 300 | FX TARF (Target Redemption Forward): allows buying or selling foreign currency at an agreed “enhanced rate” for a number of expiry dates
Most complex structured deals | 3,000 | Variable Annuities

GPU controls
- User access to GPU card parameters (the numbers of blocks and threads) to choose the optimal GPU hardware configuration
- Ability to direct the simulation to a particular GPU in a multi-GPU setup

13. Use Cases: Monte Carlo pricing on GPU

EMEA: Swiss Private Bank
- Requires high accuracy (a very large number of Monte Carlo paths) and high speed.
- Already running Monte Carlo simulation on GPU in production, with a simple model (Black-Scholes). Needs a more advanced Local Vol model.
- Timing requirements: 1 second on a modern GPU, for pricing and greeks for an equity basket option trade, with 300,000 Monte Carlo paths and 100 timesteps.

APAC: Major Commercial Bank in East Asia
- Simulation time on one CPU core: 120 min. Required time to simulate a portfolio (price and greeks): 20 seconds
- Objective: optimal solution with a CPU/GPU configuration
- Trade type: structured product FX TARF (Target Redemption Forward)
- Models: FX Local Volatility model, FX Local Stochastic Volatility model

Americas: US Insurer
- Representative portfolio/block of 60,000 policies (Variable Annuities): runtime 1.5 hours
- Hybrid model with Black-Scholes equity basket models and deterministic rates

14. Pricing code re-organization for running on GPU

Pricing code (on CPU) is re-written / re-organized as a long unrolled sequence (“batch”) of tens of thousands to hundreds of thousands of short function calls.
The length is proportional to the simulation length (number of dates) and also depends on the instrument complexity.

List of functions

    __device__ const nsSchedule::Vvvcc functions[] = {
        Vneg, Vabs, Vexp, Vlog, Vstep,                      // self transformation
        VassignC, VplusC, VmultC, VmaxC, VpowC,             // number r.h.s
        VassignV, VplusV, VminusV, VmultV, VdivV, VmaxV,    // vector r.h.s
        VshiftVC, VbarrierDnVCC, VbarrierUpVCC,             // combinations
        VassignR,                                           // pseudo-random
        VmultB, VsumB,                                      // MC normalization
        ...
    };

Registration on CPU

    void registerEvent(Vvvcc fun, nsFloat* _, const nsFloat* v0, const nsFloat* v1,
                       nsFloat c0, nsFloat c1, void* data);

results in a batch of “events”

    struct Event { Vvvcc f; nsFloat *_; const nsFloat *v0, *v1; nsFloat c0, c1; void *d; };

prepared and executed on GPU.
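The register-then-execute pattern can be sketched on the host in plain C++. This is only an illustration of the idea under assumptions: the function names `VassignC`, `VplusC` and `VmultV` come from the slide, but their bodies, the simplified signature (no `void* data` argument), the `Batch` class, and the choice of `double` for `nsFloat` are guesses made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using nsFloat = double;  // assumption: FP64 build
// Simplified signature modeled on the slide: output vector, two input
// vectors, two constants, number of paths.
using Vvvcc = void (*)(nsFloat* out, const nsFloat* v0, const nsFloat* v1,
                       nsFloat c0, nsFloat c1, std::size_t n);

struct Event { Vvvcc f; nsFloat* out; const nsFloat* v0; const nsFloat* v1; nsFloat c0, c1; };

// Assumed semantics: out = c0, out = v0 + c0, out = v0 * v1 (element-wise).
void VassignC(nsFloat* out, const nsFloat*, const nsFloat*, nsFloat c0, nsFloat, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = c0;
}
void VplusC(nsFloat* out, const nsFloat* v0, const nsFloat*, nsFloat c0, nsFloat, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = v0[i] + c0;
}
void VmultV(nsFloat* out, const nsFloat* v0, const nsFloat* v1, nsFloat, nsFloat, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = v0[i] * v1[i];
}

class Batch {
public:
    // Recording phase: the pricing code only appends events, nothing is computed yet.
    void registerEvent(Vvvcc f, nsFloat* out, const nsFloat* v0, const nsFloat* v1,
                       nsFloat c0 = 0.0, nsFloat c1 = 0.0) {
        assert(f != nullptr && out != nullptr);
        events_.push_back({f, out, v0, v1, c0, c1});
    }
    // Execution phase: replay the unrolled sequence over all paths.
    void execute(std::size_t nPaths) const {
        for (const Event& e : events_) e.f(e.out, e.v0, e.v1, e.c0, e.c1, nPaths);
    }
    std::size_t size() const { return events_.size(); }
private:
    std::vector<Event> events_;
};
```

Separating recording from execution means the payoff script is interpreted once, and the resulting flat list of vector operations can then be replayed for every bumped scenario or shipped to the GPU as a whole.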

15. Examples of functions on GPU

    // Assign Vector
    __device__ void VassignV(nsFloat* _, const nsFloat* x, const nsFloat* y,
                             nsFloat a, nsFloat b, unsigned n, void* data)
    {
        // grid-stride loop: each thread handles paths tid, tid + step, ...
        int step = gridDim.x*blockDim.x;
        for (int tid = threadIdx.x + blockIdx.x*blockDim.x; tid < n; tid += step)
            _[tid] = x[tid];
    }

    // Initialization of pseudo-random numbers
    __device__ void VassignR(nsFloat* _, const nsFloat* x, const nsFloat* y,
                             nsFloat a, nsFloat b, unsigned n, void* data)
    {
        nsRandTaus* r = (nsRandTaus*)data;   // one generator state per path
        int step = gridDim.x*blockDim.x;
        for (int tid = threadIdx.x + blockIdx.x*blockDim.x; tid < n; tid += step)
            _[tid] = normal(r[tid]);
    }

There are functions that average over paths. This is done in two steps: first averaging over the threads in a block, in shared memory, and then averaging over blocks.
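The two-step averaging can be emulated on the host to make the data flow explicit. The sketch below is plain C++, not device code: the `shared` vector stands in for per-block shared memory, and the loops over `block` and `thread` stand in for the CUDA launch grid. It accumulates sums and divides once at the end, which is equivalent to averaging in two stages. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host emulation of a two-step path average with a grid-stride access pattern.
double twoStepAverage(const std::vector<double>& values,
                      std::size_t nBlocks, std::size_t nThreads) {
    assert(!values.empty() && nBlocks > 0 && nThreads > 0);
    const std::size_t n = values.size();
    const std::size_t step = nBlocks * nThreads;  // gridDim.x * blockDim.x
    std::vector<double> blockSums(nBlocks, 0.0);

    for (std::size_t block = 0; block < nBlocks; ++block) {
        std::vector<double> shared(nThreads, 0.0);  // stands in for shared memory
        for (std::size_t thread = 0; thread < nThreads; ++thread)
            for (std::size_t tid = thread + block * nThreads; tid < n; tid += step)
                shared[thread] += values[tid];      // per-thread partial sum
        for (double s : shared) blockSums[block] += s;  // step 1: within a block
    }

    double total = 0.0;
    for (double s : blockSums) total += s;          // step 2: across blocks
    return total / static_cast<double>(n);
}
```

On a real GPU, step 1 would typically be a tree reduction with thread synchronization inside each block, and step 2 a second small kernel or atomic accumulation, since blocks cannot share memory directly.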

16. Custom functions on GPU

    __device__ void volsFromStates(nsFloat* vols, const nsFloat* states, const nsFloat*,
                                   nsFloat from, nsFloat to, unsigned np, void* data)
    {
        const nsSimLV::LVstep& lv_step = *(nsSimLV::LVstep*)(data);
        unsigned n = lv_step.n;
        const nsFloat* ddates = lv_step.dates;
        int step = gridDim.x*blockDim.x;
        for (int tid = threadIdx.x + blockIdx.x*blockDim.x; tid < np; tid += step)
        {
            nsFloat x = states[tid] + lv_step.dstate;
            if (n < 2)
                // single local-vol map: read the volatility directly
                vols[tid] = volatilityFromState(x, lv_step.maps[0].v, lv_step.maps[0].n);
            else
            {
                // several maps: time-weighted average of the variance over the step
                nsFloat vv = 0.;
                for (size_t t = 0; t < n; ++t)
                {
                    nsFloat v = volatilityFromState(x, lv_step.maps[t].v, lv_step.maps[t].n);
                    vv += v*v*(ddates[t+1]-ddates[t]);
                }
                vols[tid] = sqrt(vv/(ddates[n]-ddates[0]));
            }
        }
    }

17. GPU Benchmarks: Equity basket options

Equity basket options with barriers. Equity basket model with Black-Scholes for individual equities.

Hardware
- Workstation: CPU 10 cores, RAM 64 GB; GPU: GeForce GTX Titan (Kepler), 2688 CUDA cores
- Laptop: CPU 6th gen i7 6820-HQ 2.7 GHz, 4 cores, RAM 16 GB; GPU: Quadro M1000M, 512 CUDA cores

Workstation with GeForce GTX Titan, pseudorandom numbers (time in seconds)
# Paths | CPU BATCH FP64 | CPU BATCH FP32 | GPU FP64 | GPU FP32
100K | 2.57 | 2.67 | 0.23 | 0.11
200K | 5.26 | 3.64 | 0.37 | 0.22
300K | 8.90 | 4.53 | 0.53 | 0.33
500K | 14.22 | 10.50 | 0.78 | 0.54

Workstation with GeForce GTX Titan, quasi-random numbers
# Paths | CPU BATCH FP64
100K | 2.38
200K | 4.87
300K | 6.75
500K |
Remaining columns as extracted: .080.150.270.42, 71.280.10.170.340.55

Laptop with Quadro M1000M
# Paths | CPU BATCH FP64
50K | 1.38
100K | 2.92
200K | 6.25
300K | 10.0
Remaining columns as extracted: 19.511.310.412.3, 11121213

18. GPU Benchmarks, with 2017 optimizations

- Trade type: worst-of down & in put for an equity basket with 3 underlyings in the same currency, with continuous barrier observation
- Model: equity basket model with 3 Black-Scholes models for the underlying equities
- Simulation parameters: 300,000 Monte Carlo paths, 100 timesteps
- Quantities computed: price plus delta, gamma, and vega for each of the 3 underlying equities. Greeks are computed as central finite differences, thus requiring 13 PV computations in total for price and all Greeks (1 base price, plus an up and a down spot bump and an up and a down volatility bump for each underlying).
- Accelerated computation of Greeks on GPU: by reusing the same random numbers for price and Greeks

Time in seconds, on one GPU
GPU Architecture | Pascal | Kepler
GPU Grade | Consumer | Professional
GPU Model | GTX GeForce 1080 | Tesla
Precision FP32 | 0.90 | 1.24
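The count of 13 PV computations follows from the bump scheme: one base valuation, plus two spot bumps and two volatility bumps per underlying (1 + 3 × 4 = 13). A minimal sketch of that scheme is shown below. The `Pricer` callback, the struct, and the bump sizes are hypothetical; the callback is assumed to reuse the same random numbers on every call, so that Monte Carlo noise largely cancels in the differences.

```cpp
#include <array>
#include <cassert>
#include <functional>

constexpr int kUnderlyings = 3;
using Vec3 = std::array<double, kUnderlyings>;
// Hypothetical pricer: present value as a function of spots and volatilities.
using Pricer = std::function<double(const Vec3& spot, const Vec3& vol)>;

struct Greeks { double price; Vec3 delta, gamma, vega; int pvCount; };

Greeks centralDifferenceGreeks(const Pricer& pv, const Vec3& spot, const Vec3& vol,
                               double relSpotBump, double volBump) {
    assert(relSpotBump > 0.0 && volBump > 0.0);
    Greeks g{};
    g.price = pv(spot, vol);  // base valuation
    g.pvCount = 1;
    for (int i = 0; i < kUnderlyings; ++i) {
        // Delta and gamma share the same pair of spot bumps.
        const double h = relSpotBump * spot[i];
        Vec3 up = spot, dn = spot;
        up[i] += h;
        dn[i] -= h;
        const double pUp = pv(up, vol), pDn = pv(dn, vol);
        g.delta[i] = (pUp - pDn) / (2.0 * h);
        g.gamma[i] = (pUp - 2.0 * g.price + pDn) / (h * h);

        // Vega needs its own pair of volatility bumps.
        Vec3 volUp = vol, volDn = vol;
        volUp[i] += volBump;
        volDn[i] -= volBump;
        g.vega[i] = (pv(spot, volUp) - pv(spot, volDn)) / (2.0 * volBump);
        g.pvCount += 4;
    }
    return g;
}
```

Because all 13 valuations run the same batch of operations on different inputs, they are a natural fit for reusing both the random numbers and the compiled event sequence.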

19. Benchmarking on NVIDIA DGX-1

Multi-GPU time scaling
The execution time of 1 task on 1 GPU device is measured (in seconds) as
    total time * #GPU devices / #CPU threads
For perfect scaling this number should be invariant.

Hardware: 2X 20-core Intel Xeon E5-2698 v4, 8X NVIDIA Tesla P100

GPU Devices | Time, single precision FP32 | Time, double precision FP64
1 | 0.0573 | 0.0820
1 | 0.0572 | 0.0818
4 | 0.0592 | 0.0852
4 | 0.0621 | 0.0914
8 | 0.0827 | 0.1375
CPU Threads column as extracted: 14014080

We are working with NVIDIA to make available the option for containerized Numerix applications on NVIDIA’s DGX-1.
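The scaling metric defined on this slide is simple enough to write down directly. The sketch below uses hypothetical function names, and the `scalingEfficiency` helper is an addition of this sketch that is not on the slide.

```cpp
#include <cassert>

// Time of one task on one GPU device, as defined on the slide:
// total time * #GPU devices / #CPU threads.
double perTaskTime(double totalSeconds, int gpuDevices, int cpuThreads) {
    assert(gpuDevices > 0 && cpuThreads > 0);
    return totalSeconds * gpuDevices / cpuThreads;
}

// Hypothetical helper: ratio of a baseline per-task time to a measured one.
// A value of 1.0 means perfect scaling; lower values mean overhead.
double scalingEfficiency(double baselinePerTask, double perTask) {
    assert(baselinePerTask > 0.0 && perTask > 0.0);
    return baselinePerTask / perTask;
}
```

For example, with made-up numbers, a run that takes 4.0 seconds in total with 8 GPU devices and 80 CPU threads gives a per-task time of 4.0 * 8 / 80 = 0.4 seconds.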

20. GPU Benchmarks: FX TARF

Structured deal of average complexity: 300 lines of price script
FX TARF, 20K Monte Carlo paths
Laptop with Quadro M1000M, 512 CUDA cores; time in seconds

Pseudorandom numbers
Model | CPU FP64 | CPU FP32 | GPU FP64 | GPU FP32 | GPU FP64 speedup | GPU FP32 speedup
Black-Scholes | 1.57 | 1.37 | 0.16 | 0.06 | 9.8 | 23
Local Vol |

Quasi-random numbers
Models: Black-Scholes, Local Vol
Values as extracted: 1.451.541.001.150.260.27

21. GPU benchmarks for Insurance: Variable Annuities

Most complex structured deal: 3,000 lines of pricing (payoff) “script”
Hybrid model with Equity Black-Scholes and deterministic rates, FP64
Laptop: 4 CPUs, M1000M GPU with 512 cores

Table columns: # Monte Carlo paths | CPU only: total time | CPU + GPU: total time | CPU + GPU: CPU time | CPU + GPU: GPU time
Values as extracted: 311.421.381.551.691.911.832.112.162.412.42

- Double precision FP64 GPU acceleration factor: 6 for 10K paths.
- NVLink between CPU and GPU should help accelerate more.
- Strategy for a smaller number of Monte Carlo paths: dynamic compilation (using CUDA PTX) of a payoff script in order to reuse it (a) for Greeks, and (b) for computing a block of insurance policies that share the same definition and differ by parameters only.

22. GPU to enhance business processes in Insurance

Risk-neutral (RN) pricing and greek computations for a portfolio of hedge assets and liabilities
o Stochastic RN scenarios (e.g. a 50-year monthly timestep projection with 10,000 paths: ((50 * 12) + 1) * 10,000 = 6,010,000 values to compute for each index in the simulation)
o Hedging for a block of Variable Annuity or Fixed Index Annuity business
o Intra-day pricing, where the speed of these computations on a portfolio level is critical

Insurance Reserves & Capital (nested stochastic)
o Stochastic Real World scenarios for an outer loop (e.g. a 50-year monthly timestep projection with 10,000 paths)
o Along each Real World path and timestep, value hedge assets using the risk-neutral pricing framework
  - Total paths required = # RW Paths * # RN Paths = 10,000 * 10,000 = 100,000,000 paths
  - Total values to compute = Total paths required * ((50 * 12) + 1) = 60,100,000,000 values for each index in the simulation

Financial Planning
o Examine company financials under various planning scenarios (requires looking at reserves and capital in these various macro scenarios)
o Essentially a ‘third’ loop to run the above frameworks (triple stochastic)
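The path and value counts quoted above can be reproduced with a few lines of code. The sketch below uses hypothetical names; the `+ 1` counts the initial date in addition to the 600 monthly steps.

```cpp
#include <cassert>
#include <cstdint>

struct ProjectionSize { std::int64_t totalPaths; std::int64_t valuesPerIndex; };

// Values per index = paths * (years * stepsPerYear + 1).
std::int64_t valuesPerIndex(std::int64_t paths, int years, int stepsPerYear) {
    assert(paths > 0 && years > 0 && stepsPerYear > 0);
    return paths * (static_cast<std::int64_t>(years) * stepsPerYear + 1);
}

// Nested stochastic: every real-world path spawns a full set of risk-neutral paths.
ProjectionSize nestedSize(std::int64_t rwPaths, std::int64_t rnPaths,
                          int years, int stepsPerYear) {
    const std::int64_t total = rwPaths * rnPaths;
    return {total, valuesPerIndex(total, years, stepsPerYear)};
}
```

With 10,000 paths, 50 years and 12 steps per year, `valuesPerIndex` returns 6,010,000, and `nestedSize` with 10,000 real-world and 10,000 risk-neutral paths returns 100,000,000 paths and 60,100,000,000 values. The latter exceeds the range of a 32-bit integer, which is why the sketch uses 64-bit counters.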

23. Nested Monte Carlo simulation for XVA

XVA components (Valuation Adjustments)
- MVA: cost of future initial margin. Work in progress in the industry, after new initial margin rules (in effect Sep 2016 for larger banks in US and Japan, 2017 in Europe). Expected to become a major contribution to XVA.
- KVA: cost of future capital. New FRTB (Fundamental Review of the Trading Book) regulatory capital requirements (2016).

Nested Monte Carlo
- Outer loop: generate Monte Carlo scenarios
- Inner loop: simulate margin / capital
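The outer/inner loop structure can be sketched as follows. This is a toy illustration under assumptions, not an XVA engine: the inner loop revalues a single European call under Black-Scholes instead of simulating margin or capital, and the function name and parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

struct NestedResult { double meanFutureValue; long long innerPaths; };

// Outer loop: scenarios for the spot at the horizon, with drift mu.
// Inner loop: risk-neutral revaluation of a European call from each outer state.
NestedResult nestedCallValue(double s0, double strike, double mu, double r,
                             double sigma, double horizon, double maturity,
                             int nOuter, int nInner, unsigned seed) {
    assert(nOuter > 0 && nInner > 0 && horizon > 0.0 && horizon < maturity);
    std::mt19937_64 gen(seed);
    std::normal_distribution<double> normal(0.0, 1.0);
    const double tau = maturity - horizon;  // time left after the horizon
    NestedResult res{0.0, 0};

    for (int i = 0; i < nOuter; ++i) {
        const double sH = s0 * std::exp((mu - 0.5 * sigma * sigma) * horizon +
                                        sigma * std::sqrt(horizon) * normal(gen));
        double inner = 0.0;
        for (int j = 0; j < nInner; ++j) {
            const double sM = sH * std::exp((r - 0.5 * sigma * sigma) * tau +
                                            sigma * std::sqrt(tau) * normal(gen));
            inner += std::max(sM - strike, 0.0);
        }
        res.meanFutureValue += std::exp(-r * tau) * inner / nInner;  // value at horizon
        res.innerPaths += nInner;
    }
    res.meanFutureValue /= nOuter;
    return res;
}
```

The total work is the product of the two path counts, which is why nested simulations are a natural target for GPU acceleration. When `mu` equals `r`, discounting the mean future value back to today should recover the ordinary Monte Carlo price, which gives a simple consistency check.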

24. Numerix GPU Roadmap

Pricing Models
- H2 2017: Extending support of Forward Monte Carlo simulation on GPU to
  - Stochastic Interest Rate models
  - Hybrid models with stochastic interest rates
- 2018: Support of American Monte Carlo / Least Squares Monte Carlo on GPU (to price callable structured/exotic trades)

Acceleration of Risk Management & XVA for Front & Middle Office
- H2 2017:
  - Middle office Counterparty Risk (Expected Exposures, PFE, etc.) for risk neutral & real world scenarios for simple trades
  - XVA for simple trades (vanilla swaps, FX forwards), typically a majority of a portfolio
- 2018: XVA for structured/callable trades

THANK YOU
isakov@numerix.com
