PYTHON IN HIGH-ENERGY PHYSICS - Scikit-HEP


PYTHON IN HIGH-ENERGY PHYSICS
Hans Dembinski, MPIK Heidelberg
21 Mar 2019

ABOUT ME
Cosmic ray/HEP physicist now in LHCb
Trying to solve the Muon Puzzle in air showers
Active in the Boost C++ and Scikit-HEP Python communities
My OSS projects
  Boost::Histogram
  pyhepmc
  iminuit (maintainer)

TAKE-HOME MESSAGE
HEP software is still dominantly C++ (ROOT), but half the analyses in LHCb are already in Python (survey 2018)
Next major release ROOT 7 will resolve fundamental design issues
OSS initiatives in Python and C++ offer alternatives to ROOT
  Scikit-HEP Project: uproot, iminuit, Boost::Histogram with Python frontend
Bright future for Python in HEP
  Python can easily bind to C++ libraries with pybind11
  Python itself can be made fast with Numba
  Growth of Python ecosphere outperforms growth of C++ ecosphere

HIGH-ENERGY PHYSICS
Big Data: billions of events, Petabytes of data
Need fast code to execute on computing clusters
Hierarchical data structures: Trees (event variables, track variables)
Computing uses consumer hardware (no Crays)
Run same code on laptop and cluster (almost)
Physicists traditionally prefer to use one language for everything
  Past: libraries and analysis code written in C++ (Fortran before)
  Current: write libraries in C++ and analysis code in C++ or Python
  Trend: more Python, less C++

ROOT FRAMEWORK
Latest release 6.16/00
Large meta-library
  IO, data structures, histograms, fitting, graphics, databases, OS interaction, ...
High-level statistics tools
  RooFit, RooStats, TMVA

WHAT ROOT DOES WELL
ROOT IO: TFile & TTree have no equal
  Portable binary hierarchical data format
  Transparent compression
  Allows partial reads & partial recovery from failed writes
  Fast interactive data exploration with TTree::Draw
Cling: ROOT’s C++ runtime interpreter
  Fully standard compliant (based on LLVM)
  Run C++ code like a script or compile for fast execution
  Replaced CINT from ROOT 5
PyROOT: Auto-generated Python bindings
  Wraps arbitrary C++ code to Python without extra effort (when it works)
Backward compatibility

ROOTBOOKS IN SWAN
Jupyter on top of CERNBox with Python and ROOT C++ kernels

WHAT ROOT DOES NOT SO WELL
Brittle automatic memory management
  No. 1 user complaint, see my LHCb talk at ROOT Users’ Workshop, slide 11
ROOT tried to replace the C++ standard (any) library
  Not-invented-here syndrome and vendor lock-in
  Standard interfaces duplicated in ROOT with added maintenance burden
  Users forced to learn ROOT style instead of idiomatic C++
Maintenance nightmare
  Bugs bugs bugs, and many of them open for years
  Too small developer team for too large code base
  Little support from industry and OSS community
Design issues: leaking abstractions, lack of RAII, inconsistencies

AVERAGE BUG LIFETIME IN ROOT
[Figure]

DESIGN ISSUES
Actual ROOT code

    TFile* outfile = new TFile(...);   // stack allocation usually does not work
    TH1D* histogram = new TH1D(...);   // ROOT wants everything on the heap
    // ... fill histogram ...
    histogram->Write();                // how does histogram know where to write to?
    outfile->Close();                  // histogram also silently deleted here?
    delete outfile;                    // histogram also silently deleted here?

Desired ROOT code

    TFile outfile("output.root", "recreate");  // stack allocation works
    TH1D histogram(...);
    // ... fill histogram ...
    outfile << histogram;              // ostreaming, just like in std iostreams
    outfile.close();                   // no coupling of life-time of TFile and TH1D

THE FUTURE: ROOT 7
First release in 20 years to break backward-compatibility
  Required to fix historic mistakes in interfaces and memory management
  “We will use standard C++ types, standard interface behavior”
Nice new things
  RHist replaces previous histograms
  RDataFrame replaces TTree
  Better (automatic) parallelization
  Better graphics
Many talks about ROOT 7 at ROOT Users’ Workshop 2018

WHY ROOT 7 WILL NOT WIN THE DAY
ROOT 7 is a big improvement, but Big Data community is moving away from C++ towards Python
  Industry-powered machine learning tools are in Python
  ML tools draw people to Python ecosphere
  Python gives you access to better and faster evolving libraries
  Why would you ever go back?
Manpower problem remains
  Still large amounts of tech debt which binds manpower
  Can either fix bugs or develop new features
  Losing race against other libraries which attract more manpower
  ROOT core team are good people, but cannot compete with OSS community
  Support unlikely to come from OSS community/industry

PYTHON
Now the dominant language in scientific computing
Comfortable syntax for analysis scripts
Easy to learn and master
Rich and vibrant ecosphere
  NumPy, matplotlib, scipy, scikit-learn, pandas, Jupyter
  Anaconda, PyTorch, TensorFlow, Keras, ...
Easy to write and distribute new libraries
Adopted by industry leaders: Google, Instagram, Facebook, ...
Adopted by leading (astro)particle physics experiments
  IceCube Neutrino Observatory, CTA, CERN, ...

Really, everything. Even CMake or pybind11.

GOOGLE TRENDS
[Figure]

BUT PYTHON IS SLOW!
[Figure: benchmarks game, 01 Mar 2019 (u64q). Program CPU time / fastest CPU time ("How many times slower?") for a range of languages, from C gcc, Rust and Fortran Intel to Python 3, Ruby, Perl and Lua.]
Source: The Benchmark Game

OR IS IT?
Use a fast Python library (written in C/C++, Fortran, ...)
  NumPy, CuPy, SciPy, ...
Use a JIT in your Python session: Numba
Use a faster Python interpreter: PyPy
Use Python as a glue language
  Python configures and steers fast C/C++/Fortran code
  Passes memory buffers from one library to the next
  Examples: ROOT, LHCb Core Software, IceCube Framework, ...
  Generate bindings with pybind11, cffi, f2py, ctypes, Cython, Boost.Python, SWIG, PyROOT, ...
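A minimal sketch of the "passes memory buffers" point, using only the standard library. Here `array.array` stands in for the library that allocates the data, while `memoryview` and `ctypes` stand in for consumers that would hand the same memory to compiled code. The variable names are illustrative and not taken from any of the frameworks named above.

```python
import array
import ctypes

# One "library" allocates a contiguous buffer of C doubles.
data = array.array("d", [1.0, 2.0, 3.0])

# A second consumer gets a zero-copy view through the buffer protocol.
view = memoryview(data)

# A third consumer (ctypes, as it would be handed to C code) shares the same memory.
c_doubles = (ctypes.c_double * len(data)).from_buffer(data)

# Writing through the ctypes array is visible through all other handles.
c_doubles[0] = 10.0
print(data[0], view[0])  # 10.0 10.0

# The ctypes array starts at the address of the original buffer: nothing was copied.
address, _ = data.buffer_info()
print(ctypes.addressof(c_doubles) == address)  # True
```

NumPy arrays take part in the same buffer protocol, which is what lets Python act as glue without copying large data sets between libraries.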

NUMPY
SIMD programming: Single Instruction on Multiple Data
  Compute one array at a time instead of one value at a time
  Python loops and functions are slow, NumPy runs them in C
Pro
  Easy to use
  Quite fast
  Often compact readable code
Contra
  Creates temporary arrays which could be avoided
  Not so readable/fast when instruction has branches
  Learning curve: thinking in arrays, NumPy API
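A short sketch of the "one array at a time" idea and of the temporary-array drawback. The expression `2 * x + 1` is chosen for illustration; the use of the `out=` argument is one possible way to avoid the temporary, not something the slides prescribe.

```python
import numpy as np

x = np.random.rand(1000)

# Value-at-a-time: the loop runs in the Python interpreter.
loop_result = np.empty_like(x)
for i in range(len(x)):
    loop_result[i] = 2 * x[i] + 1

# Array-at-a-time: the same loop runs inside NumPy's compiled code.
# Note that 2 * x allocates a temporary array before 1 is added.
array_result = 2 * x + 1

# The temporary can be avoided by writing into a preallocated output.
inplace_result = np.empty_like(x)
np.multiply(x, 2, out=inplace_result)
inplace_result += 1

print(np.allclose(loop_result, array_result))     # True
print(np.allclose(array_result, inplace_result))  # True
```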

    import numpy as np

    x = np.random.rand(1000)

    # good
    a = 2 * x + 1
    b = np.log(x ** 4)
    c = x < 0.5  # creates a boolean array, can be used to filter x

    # not so good: compute 2 x if x < 2 and else x + 3
    d = np.where(x < 2, 2 * x, x + 3)

Doesn’t work when instructions differ for each element
  MC simulation of multiple particle trajectories
  Mandelbrot fractal (no. of iterations vary in each pixel)
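To illustrate the Mandelbrot case, here is a plain-Python sketch of the escape-time loop. The function name `escape_count` and the cut-off `max_iter=50` are choices made for this example. Each point needs a different number of iterations, which is why the computation does not map onto a single array instruction.

```python
def escape_count(c, max_iter=50):
    """Number of iterations until |z| exceeds 2 for z -> z*z + c."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter

# Four sample points: two inside the set, two that escape quickly.
points = [0j, 1 + 0j, 2 + 2j, -1 + 0j]
counts = [escape_count(c) for c in points]
print(counts)  # [50, 3, 1, 50]
```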

NUMBA: JIT COMPILER FOR PYTHON
1. Translates Python code into AST (types are inferred)
2. Applies optimizations (vectorization, parallelization)
3. Compiles AST with LLVM into machine code
Pro
  Easy to use
  Really fast pythonic code
  Supports auto-parallelization
  Supports GPU computation
  Use NumPy as input and output
Contra
  Not all Python types supported
  Only works on functions and methods (not classes)
  Learning curve: understanding Numba errors
Numba is pretty smart: inlines nested JITed functions, ...

Just import njit and decorate your function

    from numba import njit
    import numpy as np

    x = np.random.rand(1000)

    def func_with_branch_numpy(x):  # 11 µs
        return np.where(x < 0.5, 2 * x, x + 3)

    @njit
    def func_with_branch_numba(x):  # 0.9 µs
        result = np.empty_like(x)
        for i, xi in enumerate(x):
            if xi < 0.5:
                result[i] = 2 * xi
            else:
                result[i] = xi + 3
        return result

Numba is 12x faster than NumPy on my laptop

PYPY: JIT-ENABLED INTERPRETER
Alternative JIT-enabled Python interpreter written in RPython
Pro
  Ideally: Use PyPy and code gets fast
  Expressions are JIT-compiled as needed
  Can optimize classes
  Can do global code optimizations
  Numpy, matplotlib work
Contra
  Not all Python libraries work: e.g. SciPy
  A bit cumbersome to install
  Lagging behind CPython syntax (stable: 3.5)
  NumPy code may run slower
  NumPyPy incomplete

Official Download and Install Page
Portable binaries for Linux

    mkdir -p $HOME/pypy
    URL=...ads/pypy3.5-7.0.0-linux_x86_64-portable.tar.bz2
    wget -O - $URL | tar xjf - --strip-components=1 -C $HOME/pypy
    $HOME/pypy/bin/virtualenv-pypy $HOME/pypy/venv
    source $HOME/pypy/venv/bin/activate

Mac OS X binary

    mkdir -p $HOME/pypy
    URL=...v7.0.0-osx64.tar.bz2
    wget -O - $URL | tar xjf - --strip-components=1 -C $HOME/pypy
    pip install --user virtualenv
    virtualenv $HOME/pypy/venv -p $HOME/pypy/bin/pypy3
    source $HOME/pypy/venv/bin/activate

PyPy3.5-7.0: 1.7x faster than NumPy in CPython
Numba in CPython 7x faster than PyPy3.5-7.0
Could not compile NumPy on OSX (works on Linux)
  setuptools doesn’t add -stdlib=libc++ on Darwin platform

    import random

    x = [random.uniform(0, 1) for i in range(1000)]

    def func_with_branch(x):  # 6.3 µs
        result = [0.0] * 1000  # using [0] * 1000 here gives a slowdown of 2!
        for i, xi in enumerate(x):
            if xi < 0.5:
                result[i] = 2 * xi
            else:
                result[i] = xi + 3
        return result

... but you can write plain pythonic code and it is fast
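The slides do not say how the timings were taken. One possible way to measure a per-call time, sketched with the standard `timeit` module, is shown below. The branch condition `xi < 0.5`, the number of calls and the number of repeats are assumptions of this sketch, and the absolute result depends on the machine and the interpreter.

```python
import random
import timeit

x = [random.uniform(0, 1) for _ in range(1000)]

def func_with_branch(x):
    result = [0.0] * len(x)
    for i, xi in enumerate(x):
        if xi < 0.5:
            result[i] = 2 * xi
        else:
            result[i] = xi + 3
    return result

# Repeat the measurement and keep the best run to reduce noise.
n_calls = 200
best = min(timeit.repeat(lambda: func_with_branch(x), number=n_calls, repeat=5))
print(f"{best / n_calls * 1e6:.1f} µs per call")
```

Running the same script under CPython and under PyPy gives a like-for-like comparison of the two interpreters.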

SCIKIT-HEP PROJECT
Online community which develops Python stack for HEP
Supported by IRIS-HEP, NSF funded software institute
Leading members from Princeton, Cincinnati U, Washington U, ...
Join us on Gitter: https://gitter.im/HSF/PyHEP
Scikit-HEP forum: scikit-hep-forum@googlegroups.com
On Github: https://github.com/scikit-hep
Home of uproot, iminuit, boost-histogram, particle, pyhepmc, ...

UPROOT
Implementation of ROOT I/O in pure Python and Numpy
  Read/write ROOT trees, histograms, TGraphs, T(Lorentz)Vectors
  Can read data fields of any other ROOT type
  Up to 3x faster than C++ ROOT
  Does not depend on C++ ROOT (just one pip install away)
  Extensible, see uproot-methods repository
Powered by awkward-array
  Hierarchical array implemented on top of standard Numpy arrays
  See Jim Pivarski’s talk for interesting details


    import numpy as np
    import uproot

    f = uproot.open("~/Data/sct/mc/00058786_00000001_5.sct.root")
    print(f.keys())
    # [b'sct;6', b'sct;5']

    f['sct'].show()
    # evt_run     (no streamer)   asdtype('>i4')
    # ...
    # vtx_x       (no streamer)   asjagged(asdtype('>f4'))

    f['sct/evt_evnum'].array()
    # array([5881230, 5881230, ..., 5878628, 5878628], dtype=int32)

    pz = f['sct/trk_pz'].array()
    # <JaggedArray [[4186.4 5212.5 3073.3] [] [6479.1 3533.5] ...]>

    from matplotlib import pyplot as plt
    plt.hist(np.log10(pz.flatten()))  # plot log10(pz) distribution

    for pxi in f['sct/trk_px'].array():
        print(np.mean(pxi))
    # 150.75218 nan -79.71784 -120.3935 nan -146.99773 12.007137 ...
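A rough sketch of how a jagged array can sit on top of flat NumPy arrays: one array holds all values back to back, a second array of offsets marks where each event starts and ends. This is one common layout chosen for illustration and not necessarily the internal layout of awkward-array. The first three events reuse the pz values printed above.

```python
import numpy as np

# Flat per-track values for three events with 3, 0 and 2 tracks.
content = np.array([4186.4, 5212.5, 3073.3, 6479.1, 3533.5])
offsets = np.array([0, 3, 3, 5])  # event i owns content[offsets[i]:offsets[i + 1]]

# Number of tracks per event, without a Python loop.
counts = np.diff(offsets)
print(counts)  # [3 0 2]

# Per-event sums via differences of the cumulative sum, also loop-free.
cumulative = np.concatenate(([0.0], np.cumsum(content)))
sums = cumulative[offsets[1:]] - cumulative[offsets[:-1]]

# Per-event means; events without tracks give nan.
means = np.full(len(counts), np.nan)
nonempty = counts > 0
means[nonempty] = sums[nonempty] / counts[nonempty]
print(means)
```

Flattening such an array is free, since the flat content already exists; that is why a histogram over all tracks is a cheap operation.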

IMINUIT
The Python wrapper of C++ MINUIT2 library
  Other wrappers (pyminuit, pyminuit2) discontinued
  Bindings generated with Cython (will switch to pybind11)
  Python 2.7 to 3.7 on Linux, Mac, Windows
  New: PyPy support (PyPy3.5-7.0)
  Does not depend on C++ ROOT
  Simply install with pip or conda
Many good OSS minimizers: scipy, libnlopt, ...
MINUIT’s unique feature is error computation with Hesse & MINOS

    from iminuit import Minuit

    def f(x, y, z):
        return (x - 2) ** 2 + (y - 3) ** 2 + (z - 4) ** 2

    m = Minuit(f)  # Minuit automagically detects parameter names!

    m.migrad()       # run optimiser
    print(m.values)  # {'x': 2,'y': 3,'z': 4}

    m.hesse()        # run Hesse error estimator
    print(m.errors)  # {'x': 1,'y': 1,'z': 1}

Minuit can do much more
  Parameters with limits
  Fixed parameters
  Pretty Jupyter output
  Builtin plotting of error contours and function minimum
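Why the errors come out as 1 can be checked by hand. The sketch below does not use iminuit; it estimates the curvature of f at the minimum by finite differences and applies the least-squares convention that one standard deviation corresponds to f rising by `errordef = 1` above its minimum. The helper `second_derivative` and the step size are choices made for this example, and treating each parameter separately only works here because the parameters are uncorrelated; MINUIT itself handles the full second-derivative matrix.

```python
import math

def f(x, y, z):
    return (x - 2) ** 2 + (y - 3) ** 2 + (z - 4) ** 2

def second_derivative(func, point, index, step=1e-3):
    """Central finite-difference estimate of the second derivative in one parameter."""
    up = list(point)
    down = list(point)
    up[index] += step
    down[index] -= step
    return (func(*up) - 2 * func(*point) + func(*down)) / step ** 2

minimum = (2.0, 3.0, 4.0)
errordef = 1.0  # least-squares convention: 1 sigma <-> f rises by 1

errors = []
for i in range(3):
    curvature = second_derivative(f, minimum, i)
    # Parabola around the minimum: 0.5 * curvature * sigma**2 = errordef
    errors.append(math.sqrt(2 * errordef / curvature))

print([round(e, 6) for e in errors])  # [1.0, 1.0, 1.0]
```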

BOOST-HISTOGRAM
Python wrapper (alpha stage) for Boost::Histogram in C++
  Boost::Histogram will be first released with Boost-1.70 in April
Generalized multi-dimensional histograms and profiles in idiomatic C++14
  Use builtin axis types or add your own
    regular, variable, circular, category; all growing or non-growing
  Support for complex binning schemes, like hexagonal binning
Easy and safe to use in default configuration
Very customizable for power users
  Get the highest speed for given task
  Write new specialized axis and storage types that we didn’t think of
TMP under the hood makes execution fast and interface easy to use

    from boost.histogram import histogram
    from boost.histogram.axis import regular, category

    hist = histogram(category(("red", "blue")),
                     regular(4, 0.0, 1.0))

    # input doesn't have to be numerical
    hist(["red", "red", "blue"],
         [0.1, 0.4, 0.9])

    counts = hist.view
    # returns numpy array view into histogram counts:
    # [[1, 1, 0, 0],
    #  [0, 0, 0, 1]]
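What the two axes do can be spelled out in plain Python. The sketch below is not the boost-histogram implementation; the helper `regular_index` is hypothetical, and underflow and overflow handling is left out. It only shows how a category axis and a regular axis turn each input pair into a cell index, and it reproduces the counts above.

```python
def regular_index(value, n_bins, low, high):
    """Index of the bin containing value on a regular axis over [low, high)."""
    return int((value - low) / (high - low) * n_bins)

categories = ("red", "blue")
n_bins, low, high = 4, 0.0, 1.0

counts = [[0] * n_bins for _ in categories]
for color, value in zip(["red", "red", "blue"], [0.1, 0.4, 0.9]):
    row = categories.index(color)                    # category axis: position in the tuple
    col = regular_index(value, n_bins, low, high)    # regular axis: equal-width bins
    counts[row][col] += 1

print(counts)  # [[1, 1, 0, 0], [0, 0, 0, 1]]
```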

SUMMARY AND OUTLOOK
HEP software is still dominantly C++, but bright future for Python
  Python can be very fast with Numba
  Python can integrate with C/C++ libraries using pybind11
  If you can write fast code in Python, why would you use C++?
OSS initiatives in Python and C++ offer alternatives to ROOT
  Scikit-HEP Project: uproot, iminuit, Boost::Histogram with Python frontend
  Specialized HEP-style plots in development, to be included in matplotlib

BACKUP: PYBIND11 VS. CYTHON
Cython: transpiler for custom Python/C mixed dialect
  Learning curve: need to learn this dialect
  Designed for C; C++ only partially supported
  Clumsy syntax, workarounds needed for missing features and bugs
  Cython adds problems instead of solving them
pybind11
  Based on the brilliant Boost::Python library
  No transpiler, just a header-only C++11 library
  Uses TMP to automate boilerplate code
  Automated handling of refcounts
  Full power of C++, no workarounds, explicit ownership of memory
  Excellent docs

    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>

    namespace py = pybind11;

    py::array_t<double> func_with_branch(py::array_t<double> x) {
        auto result = py::array_t<double>(x.shape(0));
        auto rd = result.mutable_data();
        auto xd = x.data();
        for (ssize_t i = 0, n = x.shape(0); i < n; ++i) {
            if (xd[i] < 0.5) {
                rd[i] = 2 * xd[i];
            } else {
                rd[i] = xd[i] + 3;
            }
        }
        return result;
    }

    PYBIND11_MODULE(example, m) {
        m.def("func_with_branch", &func_with_branch);  // 1.7 µs (compiled with -O3)
    }

6.5x faster than NumPy version, but 1.9x slower than Numba
