AI-Systems: Machine Learning Frameworks


AI-Systems: Machine Learning Frameworks
Joseph E. Gonzalez
Co-director of the RISE Lab
jegonzal@cs.berkeley.edu

Class Projects
- Project Signup (QR Code) link on website
- Please add your project or join a project by Wednesday
- Project Teams (3 people per team)
- One-page project descriptions are due 9/30 at 11:59
  - Title and team members
  - Project description and what is the key problem being studied
  - Discussion of related work
  - Proposed plans for the semester
- 3 weeks until first presentation (initial results)
- 8 weeks until end-of-semester project due
- Google Doc (enable commenting so I can comment on it)
- Submit your project description to this google form.

Objectives For Today
- Historical Evolution of Machine Learning Frameworks
- Declarative (Lazy) vs Imperative (Eager) DSLs
- Automatic Differentiation
- This week's reading

Historical Context

Early ML / Stats Languages
- S – data programming language
  - Developed in 1976 at Bell Labs by John Chambers
  - Replaced Fortran by providing higher-level APIs and graphics
  - Developed formula syntax for describing models
  - Eventually replaced by R
- R – open-source implementation of S (S-Plus)
  - Developed in the 1990s at the University of Auckland by Ross Ihaka and Robert Gentleman
  - Like S/S-Plus → linear algebra abstractions
  - Rich set of libraries for statistical analysis
  - Still widely used

- Matlab (Matrix Laboratory) – numerical computing system
  - Developed in the 1970s at the University of New Mexico by Cleve Moler
  - Designed to simplify access to LINPACK and EISPACK
  - Reasonable integration with C/Fortran
  - Rich graphical interface with support for graphical programming (Simulink)
  - Expensive → Octave is a limited open-source version
- Popular in the applied math, engineering, and controls communities
- Extremely popular in the machine learning community
  - We would joke that ML people only knew how to program Matlab
  - ...and then it all changed

Rise of the Python Eco-System
- Development of %pylab
  - iPython (2001), SciPy (2001), Matplotlib (2003), NumPy (2006)
  - Functions/APIs were like Matlab, so the transition was easy
  - Freeeeeee!
- Scikit-learn – basic ML algorithms and models (2007)
  - Started as a Google Summer of Code project → developed by INRIA
  - Wide range of standard machine learning techniques
- 2012: a large fraction of the ML community moved from Matlab → Python
  - Why?
- Development remained focused on algorithm libraries

Machine Learning Libraries
- LIBLINEAR/LIBSVM (2008) – fast algorithms for fitting linear models and kernelized SVMs
  - Developed at National Taiwan University (still used in Sklearn)
- Vowpal Wabbit (2010?) – out-of-core learning for generalized linear models and others
  - Developed by John Langford while at Yahoo!
  - Popular for high-dimensional features
- Weka (Java version 1997) – collection of ML algorithms for Java
  - Developed at the University of Waikato in New Zealand
  - Provided tools for visualizing and analyzing data
- XGBoost (2014) – distributed boosted decision trees
  - Developed by Tianqi Chen at the University of Washington
- Many more

Distributed Machine Learning Frameworks
- Mahout (2009) – ML algorithms on Hadoop
  - Early distributed ML library with "recommender algorithms"
  - Unable to leverage memory caching
- GraphLab (2010) – framework for graph-structured algorithms
  - Contained a library of algorithms (e.g., Gibbs sampling, loopy BP, ...)
  - Developed new abstractions for distributed graph algorithms
- Spark MLlib / SparkML (2014) – ML algorithms for Spark
  - Leverages memory caching
  - Benefits from work on GraphLab/Sklearn/SystemML

Languages vs Algorithm Libraries
[Spectrum diagram: Languages — (Embedded) DSLs — Libraries of Algorithms, spanning Generality to Simplicity]
- Languages provided support for mathematical operations
  - Users still implemented new models and algorithms using fundamental linear algebra primitives
- Libraries of algorithms provided individual learning techniques
  - Often specialized to a model/technique (fast and easy to use)
- Need something in the middle!

Embedded Domain Specific Languages
- Domain-specific languages (DSLs) provide specialized functionality for a given task
  - Limited functionality → simplicity and optimization
  - Example: SQL → specialized for data manipulation
- Embedded DSLs are libraries or language extensions within a general-purpose language, tailored to a specific task
  - Combine the benefits of DSLs and general languages
  - Example: linear algebra libraries (see the NumPy sketch below)
- Embedded DSLs have played a significant role in ML
  - Linear Algebra → Pipelines → Differentiable Programs
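As a small illustration of a linear algebra library acting as an embedded DSL, consider plain NumPy: operator overloading lets a model be written as mathematics inside a general-purpose host language. This is a hypothetical sketch, not from the slides; the data and step size are made up.

    import numpy as np

    # Synthetic regression data (illustrative values)
    X = np.random.randn(100, 3)
    w_true = np.array([0.5, -1.0, 2.0])
    y = X @ w_true + 0.1 * np.random.randn(100)

    # The "embedded DSL": gradient descent on mean squared error,
    # written as ordinary Python linear algebra expressions
    w = np.zeros(3)
    for _ in range(100):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the MSE
        w -= 0.1 * grad                          # gradient descent step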

Machine Learning Pipelines
- Scikit-Learn Pipelines (2011)
  - Describe compositions of feature transformations and models
  - Enable end-to-end training and standardized prediction (a minimal sketch follows below)
- Spark ML Pipelines (similar to SkLearn)
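Below is a minimal sketch of a scikit-learn Pipeline composing a feature transformation with a model; the dataset and hyperparameters are illustrative.

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)

    # Composition of a feature transformation and a model
    pipe = Pipeline([
        ("scale", StandardScaler()),      # feature transformation
        ("model", LogisticRegression()),  # final estimator
    ])

    pipe.fit(X, y)               # end-to-end training
    print(pipe.predict(X[:5]))   # standardized prediction interface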

SystemML (VLDB'16)
- Developed at IBM
- Domain-specific language for describing ML algorithms
  - Python/R-like but not embedded
- Optimizer and runtime to execute on Apache Spark
- Explored a range of optimizations
  - Data repartitioning
  - Caching
  - Distributed matrix representations

KeystoneML (ICDE'17)
- Developed in the AMPLab @ Berkeley
- Pipelines of ML algorithms and optimization on top of Spark
  - Embedded Scala DSL
  - Outperformed SystemML
- Cost-based optimizer to select the best version of a learning algorithm based on its inputs
  - Example: QR vs L-BFGS

Languages vs Algorithm Libraries
[Spectrum diagram: Languages (R/Matlab) — Embedded DSLs — Libraries of Algorithms, spanning Generality to Simplicity]
- Increased focus on deep learning → empirical risk minimization for complex differentiable models
- Research shifts from algorithm design to model design
- Deep Learning Frameworks: Theano (2008), Caffe (2014), MXNet (2015), TensorFlow (2015), PyTorch (2016)
  - Combine automatic differentiation with hardware acceleration

Review of Automatic Differentiation

Automatic Differentiation
- Method of computing numeric derivatives of a program by tracking the forward execution of that program
- Other methods for computing derivatives:
  - Manual implementation: the standard method in deep learning prior to these frameworks
    - Laborious and error-prone!
  - Numerical differentiation: using finite differences
    - Easy, but costly and sensitive to numerical precision (a small sketch follows below)
  - Symbolic differentiation: using computer algebra systems
    - Expressions can grow exponentially
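The precision sensitivity of finite differences is easy to see directly; a small sketch, with the function and step sizes chosen only for illustration:

    import numpy as np

    def fd(f, x, h):
        """Forward finite-difference approximation of f'(x)."""
        return (f(x + h) - f(x)) / h

    x, exact = 1.0, np.cos(1.0)   # d/dx sin(x) = cos(x)
    # Large h suffers truncation error; tiny h suffers round-off error
    for h in (1e-1, 1e-8, 1e-15):
        print(h, abs(fd(np.sin, x, h) - exact))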

[Illustration from "Automatic Differentiation in Machine Learning: a Survey"]

[Illustration from "Automatic Differentiation in Machine Learning: a Survey"]

How I used to do this as a graduate student (2010).
How I would cheat using Mathematica.

Automatic differentiation operates on a program to generate a program that computes the derivative efficiently and accurately.

Key Ideas in Automatic Differentiation
- Leverage the chain rule to reason about function composition:
  ∂/∂x f(g(x)) = f′(g(x)) · g′(x)
- Two modes of automatic differentiation:
  - Forward differentiation: computes the derivative during execution
    - Efficient for a single derivative with multiple outputs
  - Backward differentiation (back-propagation): computes the derivative (gradient) by reverse evaluation of the computation graph
    - Efficient for multiple derivative (gradient) calculations; requires caching

Forward Differentiation (Example)

f(x1, x2) = ln(x1) + x1·x2 − sin(x2)

Goal: compute ∂v5/∂x1 (the derivative of the output v5 = f with respect to x1) at x1 = 2 and x2 = 5.

Forward Differentiation (Example)

f(x1, x2) = ln(x1) + x1·x2 − sin(x2), evaluated at x1 = 2 and x2 = 5

Primal (forward) trace:
  v1 = ln(x1) = ln(2)
  v2 = x1 · x2 = 10
  v3 = sin(x2) = sin(5)
  v4 = v1 + v2 = ln(2) + 10
  v5 = v4 − v3 = ln(2) + 10 − sin(5)

Tangent trace with respect to x1 (seed ∂x1/∂x1 = 1, ∂x2/∂x1 = 0):
  ∂v1/∂x1 = (1/x1) · ∂x1/∂x1 = 1/2
  ∂v2/∂x1 = x2 · ∂x1/∂x1 + x1 · ∂x2/∂x1 = 5
  ∂v3/∂x1 = cos(x2) · ∂x2/∂x1 = cos(5) · 0 = 0
  ∂v4/∂x1 = ∂v1/∂x1 + ∂v2/∂x1 = 1/2 + 5
  ∂v5/∂x1 = ∂v4/∂x1 − ∂v3/∂x1 = 1/2 + 5 = 5.5

- Notice that only the last results need to be stored
- Would need to repeat the entire pass for x2 (see the dual-number sketch below)
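Forward mode can be implemented with dual numbers that carry a value and a tangent through every primitive operation. The sketch below is not from the slides; the Dual class and helper functions are hypothetical, applied to the example function above.

    import math

    class Dual:
        """Carries (value, tangent) through each primitive operation."""
        def __init__(self, val, dot=0.0):
            self.val, self.dot = val, dot
        def __add__(self, o):
            return Dual(self.val + o.val, self.dot + o.dot)
        def __sub__(self, o):
            return Dual(self.val - o.val, self.dot - o.dot)
        def __mul__(self, o):
            return Dual(self.val * o.val,
                        self.dot * o.val + self.val * o.dot)

    def ln(x):  return Dual(math.log(x.val), x.dot / x.val)
    def sin(x): return Dual(math.sin(x.val), x.dot * math.cos(x.val))

    def f(x1, x2):
        return ln(x1) + x1 * x2 - sin(x2)

    # Seed x1's tangent with 1 to get df/dx1; a second full pass
    # (seeding x2 instead) would be needed for df/dx2
    print(f(Dual(2.0, 1.0), Dual(5.0, 0.0)).dot)   # 5.5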

Backward (Reverse) Differentiation

f(x1, x2) = ln(x1) + x1·x2 − sin(x2), evaluated at x1 = 2 and x2 = 5

Goal: compute both ∂v5/∂x1 and ∂v5/∂x2 in a single reverse pass.

Forward pass (values cached):
  v1 = ln(x1) = ln(2)
  v2 = x1 · x2 = 10
  v3 = sin(x2) = sin(5)
  v4 = v1 + v2 = ln(2) + 10
  v5 = v4 − v3 = ln(2) + 10 − sin(5)

Reverse pass (adjoints v̄i = ∂v5/∂vi):
  v̄5 = 1
  v̄4 = v̄5 · ∂v5/∂v4 = 1
  v̄3 = v̄5 · ∂v5/∂v3 = −1
  v̄1 = v̄4 · ∂v4/∂v1 = 1
  v̄2 = v̄4 · ∂v4/∂v2 = 1
  x̄1 = v̄1 · (1/x1) + v̄2 · x2 = 1/2 + 5 = 5.5
  x̄2 = v̄2 · x1 + v̄3 · cos(x2) = 2 − cos(5) ≈ 1.716

Both components of the gradient fall out of one reverse traversal.

Backward (Reverse) Differentiation
- Performs well when computing large gradients relative to the number of function outputs
- When might forward differentiation perform well? Why?
- Requires caching or recomputing intermediate activations from the forward pass (see the tape-based sketch below)
  - Active research on what to recompute vs what to cache
[Diagram: gradients g flowing backward from the Loss to the input x]
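Reverse mode can be sketched as a dynamically recorded computation graph whose nodes cache their forward values (the memory cost noted above); a backward traversal then accumulates adjoints. This is a hypothetical minimal sketch; real implementations topologically sort the graph rather than recurse.

    import math

    class Var:
        """Graph node caching its value and local partials to its parents."""
        def __init__(self, val, parents=()):
            self.val, self.parents, self.grad = val, parents, 0.0
        def __add__(self, o):
            return Var(self.val + o.val, [(self, 1.0), (o, 1.0)])
        def __sub__(self, o):
            return Var(self.val - o.val, [(self, 1.0), (o, -1.0)])
        def __mul__(self, o):
            return Var(self.val * o.val, [(self, o.val), (o, self.val)])
        def backward(self, adjoint=1.0):
            # Accumulate the adjoint, then push it to parents via local partials
            self.grad += adjoint
            for parent, local in self.parents:
                parent.backward(adjoint * local)

    def ln(x):  return Var(math.log(x.val), [(x, 1.0 / x.val)])
    def sin(x): return Var(math.sin(x.val), [(x, math.cos(x.val))])

    x1, x2 = Var(2.0), Var(5.0)
    f = ln(x1) + x1 * x2 - sin(x2)
    f.backward()
    print(x1.grad, x2.grad)   # 5.5 and 2 - cos(5) ≈ 1.716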

Deep Learning Frameworks

Declarative vs Imperative Abstractions
- Declarative (define-and-run): embedded DSL used to construct a static computation graph
  - Examples: Theano (2010), Caffe (2014), TensorFlow (2015)
  - Easier to optimize, distribute, and export models
- Imperative (define-by-run): embedded DSL used to directly compute the output, resulting in a dynamic computation graph defined by the program
  - Examples: Chainer (2015), autograd (2016), PyTorch (2017)
  - Interpreted execution of inference and gradient
  - Easier to program and debug
- Hybrid approaches: current research
  - TensorFlow Eager, MXNet
(a side-by-side sketch follows below)
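A side-by-side sketch of the two styles, assuming the TensorFlow 1.x graph API for the declarative half (values are illustrative):

    # Declarative (define-and-run): build a static graph, then execute it
    import tensorflow as tf   # assumes TF 1.x
    x = tf.placeholder(tf.float32)
    y = x * x                 # no computation happens here
    with tf.Session() as sess:
        print(sess.run(y, feed_dict={x: 3.0}))   # 9.0

    # Imperative (define-by-run): the graph is traced as the program runs
    import torch
    a = torch.tensor(3.0, requires_grad=True)
    b = a * a                 # computed immediately; graph recorded on the fly
    b.backward()
    print(b.item(), a.grad)   # 9.0 tensor(6.)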

Theano – Original Deep Learning Framework
- First developed at the University of Montreal (2008), from Yoshua Bengio's group
- Abstraction: Python embedded DSL (as a library) to construct symbolic expression graphs for complex mathematical expressions
- System: a compiler for mathematical expressions in Python
  - Optimizes mathematical expressions (e.g., (A + b) + (A + b) → (A + b) * 2)
  - CPU/GPU acceleration
  - Also automatic differentiation

[Expression graph for the Theano logistic regression example: declared variables x, y, w, b feed dot, exp, log, and arithmetic ops producing p_1, the cross-entropy xent, and the regularized Mean/Sum cost; the gradient operation can traverse this graph]
- What is the value (type) of prediction? Note that this looks like a NumPy expression.
- This is more difficult to debug and reason about.

[Code slide: the same example annotated — declaring variables, instantiating values, building the expression graph, and updating shared variables after computation]
- What is the value (type) of prediction? Note that this looks like a NumPy expression.
- This is more difficult to debug and reason about.
- The function call compiles graphs into optimized native execution.
(a reconstruction of the code follows below)
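The slide appears to show the classic logistic regression example from the Theano tutorial; the reconstruction below follows that tutorial (Theano 0.x/1.x API). Note that `prediction` is a symbolic graph node, not a numeric array, until `theano.function` compiles the graph.

    import numpy as np
    import theano
    import theano.tensor as T

    rng = np.random
    N, feats = 400, 784
    D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))

    # Declaring variables (symbolic inputs) and instantiating values (shared state)
    x = T.dmatrix("x")
    y = T.dvector("y")
    w = theano.shared(rng.randn(feats), name="w")
    b = theano.shared(0.0, name="b")

    # Building the expression graph -- looks like NumPy, but builds symbols
    p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))          # P(y = 1)
    prediction = p_1 > 0.5                           # symbolic, not a value!
    xent = -y * T.log(p_1) - (1 - y) * T.log(1 - p_1)
    cost = xent.mean() + 0.01 * (w ** 2).sum()       # regularized cost
    gw, gb = T.grad(cost, [w, b])                    # gradient traverses the graph

    # Compiling the graph into optimized native execution; `updates` writes
    # the shared variables after each call
    train = theano.function(inputs=[x, y], outputs=[prediction, xent],
                            updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))
    predict = theano.function(inputs=[x], outputs=prediction)

    for _ in range(100):
        pred, err = train(D[0], D[1])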

Theano Compilation – Canonicalization
[Compilation pipeline: Canonicalization → Stabilization → Specialization → GPU Transfer → Code Generation]
- Rewriting (simplifying) mathematical expressions
  - exp(log(x)) → x
- Duplicate code elimination
  - Important because gradient rewrites introduce redundancy
  - Recall: gradient calculations extend the graph via the chain rule

Theano Compilation – Stabilization
[Compilation pipeline: Canonicalization → Stabilization → Specialization → GPU Transfer → Code Generation]
- Addresses the numerical stability of operations
- Example: for x = 709 vs x = 710, what is the value of log(1 + exp(x))?
  - For x = 709 → 709
  - For x = 710 → inf
  - Rewritten as x for x ≥ 709
(a NumPy check follows below)
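The x = 709 vs x = 710 boundary is where exp overflows float64; a quick NumPy check, with `logaddexp` standing in for the stabilized rewrite:

    import numpy as np

    with np.errstate(over="ignore"):
        print(np.log(1 + np.exp(709.0)))   # 709.0 (exp just fits in float64)
        print(np.log(1 + np.exp(710.0)))   # inf   (exp overflows)

    # Stable formulation: log(1 + exp(x)) == logaddexp(0, x)
    print(np.logaddexp(0.0, 710.0))        # 710.0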

Theano Compilation – Specialization
[Compilation pipeline: Canonicalization → Stabilization → Specialization → GPU Transfer → Code Generation]
- Rewrite subgraphs to more efficient forms
  - pow(x, 2) → square(x)
  - Tensor slicing → memory aliasing
  - Mapping to the best version of the GEMM routines

Theano Compilation – GPU Transfer
[Compilation pipeline: Canonicalization → Stabilization → Specialization → GPU Transfer → Code Generation]
- GPU versions of ops are introduced (where possible)
- Copy routines are added to move data

Theano Compilation – Code Generation
[Compilation pipeline: Canonicalization → Stabilization → Specialization → GPU Transfer → Code Generation]
- Generate and link C and CUDA implementations of operators
  - Picking from existing implementations
  - Specialization for different dtypes

What happened to Theano?
- Fairly advanced compared to TensorFlow (TF) in 2016
  - Symbolic gradient optimization and a wide range of operators
  - Initially faster than TensorFlow
- What happened?
  - Didn't have the backing of a large industrial group
    - TensorFlow was being pushed heavily by Google
  - Did not support multi-GPU/distributed computation, and had limited support for user-defined parallelization
  - TensorFlow had more built-in deep learning operators
  - Theano lacked visualization tools (e.g., TensorBoard)
  - Complaints about error messages?

PyTorch
- Imperative DL library that works like NumPy (on GPUs):
    tensor([2.0814], device='cuda:0')
    tensor([2.0814], dtype=torch.float64)
- ...and supports automatic differentiation:
    # tensor([[3., 3.],
    #         [3., 3.]], grad_fn=<AddBackward0>)
    # tensor(27., grad_fn=<MeanBackward0>)
    # tensor([[4.5000, 4.5000],
    #         [4.5000, 4.5000]])
(the code producing these outputs is reconstructed below)
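The outputs on the slide match the official PyTorch getting-started tutorial; a reconstruction of the code that produces them (the random value 2.0814 will differ per run, and the GPU branch assumes CUDA is available):

    import torch

    # NumPy-like tensors that can live on the GPU
    x = torch.rand(1)
    if torch.cuda.is_available():
        device = torch.device("cuda")
        y = torch.ones_like(x, device=device)   # create directly on the GPU
        x = x.to(device)
        z = x + y
        print(z)                          # tensor([2.0814], device='cuda:0')
        print(z.to("cpu", torch.double))  # tensor([2.0814], dtype=torch.float64)

    # ...and automatic differentiation
    x = torch.ones(2, 2, requires_grad=True)
    y = x + 2
    print(y)        # tensor([[3., 3.], [3., 3.]], grad_fn=<AddBackward0>)
    out = (y * y * 3).mean()
    print(out)      # tensor(27., grad_fn=<MeanBackward0>)
    out.backward()
    print(x.grad)   # tensor([[4.5000, 4.5000], [4.5000, 4.5000]])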

This week's readings

Reading for the Week
- Automatic differentiation in ML: Where we are and where we should be going (NeurIPS'18)
  - Provides an overview of the state of automatic differentiation
- TensorFlow: A System for Large-Scale Machine Learning (OSDI'16)
  - The primary TensorFlow paper; discusses the system and its design goals
- JANUS: Fast and Flexible Deep Learning via Symbolic Graph Execution of Imperative Programs (NSDI'19)
  - Recent work exploring a method to bridge declarative and imperative approaches in TensorFlow

Extra Suggested Reading
- Automatic Differentiation in Machine Learning: a Survey (JMLR'18)
  - Longer discussion of automatic differentiation in ML
- Theano: A CPU and GPU Math Compiler in Python (SciPy'10)
  - Great overview of AD and the Theano system
- TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning (arXiv'19)
  - Good follow-up to the TF paper, addressing its limitations

Automatic differentiation in ML: Where we are and where we should be going?
Bart van Merriënboer, Olivier Breuleux, Arnaud Bergeron, Pascal Lamblin
From Mila (home of Theano) and Google Brain (home of TF)

Automatic differentiation in ML: Where we are and where we should be going?
- Context: a vision paper that outlines the current state of automatic differentiation techniques and proposes a new functional, typed intermediate representation (IR)
- Key Idea: observes the convergence of imperative and declarative approaches and draws connections to compilers → argues for the need for a common IR like those found in modern compilers
- Contribution: frames the problem space and range of techniques
- Rationale for Reading: condensed context and some insights into future research directions

TensorFlow: A System for Large-Scale Machine Learning
Large fraction of the Google Brain team, under Jeff Dean

Context
- Need for distributed training for deep learning
- DistBelief framework: parameter server abstractions were too general → difficult to use
- Theano not designed for the distributed setting

Big Ideas
- Adopts a dataflow programming abstraction
  - Inspired by distributed data processing systems (@ Google)
  - The resulting abstraction is very similar to Theano
- Fine-grained placement of operations on devices (see the sketch below)
- Supports multiple distributed concurrency protocols
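A minimal sketch of fine-grained device placement, assuming the TF 1.x graph API (the device strings depend on the available hardware):

    import tensorflow as tf   # assumes TF 1.x

    # Pin individual operations in the dataflow graph to specific devices
    with tf.device("/cpu:0"):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    with tf.device("/gpu:0"):
        b = tf.matmul(a, a)   # this op runs on the first GPU

    with tf.Session() as sess:
        print(sess.run(b))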

Recent advances in TensorFlow
- Keras: high-level layer composition API (a sketch follows below)
- Discussion of TensorFlow Eager in the next section.
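A minimal sketch of the Keras layer-composition API; the layer sizes are illustrative:

    import tensorflow as tf

    # High-level composition of layers into a model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(x_train, y_train, epochs=5)   # given training data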

What to think about when reading
- Relationship and comparisons to Theano?
- Support for distributed computing and the exposed abstraction?
- What are the implications of the design decisions on an eager execution model?

Additional Reading
- TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

JANUS: Fast and Flexible Deep Learning via Symbolic Graph Execution of Imperative Programs
Eunji Jeong* et al. at Seoul National University
*Currently visiting the RISE Lab

Context
- In response to PyTorch, Google recently released TensorFlow Eager
  - Pro: simplifies programming, especially for dynamic graphs
  - Con: limited optimization and interpreted execution → degraded training performance

Big Ideas
- Convert imperative executions into dataflow graphs
- Combines:
  - Python program analysis to generate a symbolic graph
  - Execution profiling to observe the outcomes of dynamic computation
- Leverages profiling to speculatively generate optimized graphs, falling back to imperative execution when its assumptions are violated

