DAWN: Infrastructure For Usable Machine Learning


DAWN: Infrastructure for Usable Machine Learning
Peter Bailis, Kunle Olukotun, Chris Ré, Matei Zaharia

It’s the Golden Age of Data*
Incredible advances in image recognition, natural language processing, planning, info retrieval
Society-scale impact: autonomous vehicles, personalized medicine, human trafficking
No end in sight for advances in ML
*for the best-funded, best-trained engineering teams

Building ML Products is Too Hard
Major successes (e.g., AlphaGo, ImageNet) require hundreds to thousands of engineers
Huge effort in data preparation, model tuning, experimentation, and productionizing
Domain experts cannot easily or cheaply build ML products

“Only a fraction of real-world ML systems is composed of ML code”

The DAWN Question
What if anyone with domain expertise could build their own production-quality ML products?
  Without a PhD in machine learning
  Without being an expert in systems
  Without understanding the latest hardware
It’s happened before

It’s happened before: Search
Before: decades of research on information retrieval, indexes, ranking, etc.
After: any developer can add search to an application by linking a library (e.g., Solr, Lucene); everyone (i.e., non-expert users) uses search

It’s happened before: SQL
Before: raw access to disk, manual layout of records, network databases (CODASYL)
After: SQL forms the basis for transactional engines, data warehousing, business intelligence tools
Key idea: end-to-end systems that tackle the barriers to access & production use

The DAWN Stack
[Diagram. Layers: Hardware, Systems, Algorithms, Interfaces. Pipeline stages: Data Acquisition, Feature Engineering, Model Training, Productionizing.]
Projects: Snorkel, DeepDive, ModelSnap, ModelQA, MacroBase (Streaming Data), Data Fusion, NoScope (Video), AutoRec, SimDex (Recommendation), Mulligan (SQL graph ML)
End-to-End Compilers: Weld, Delite
New Hardware: FuzzyBit, Plasticine CGRA
Platforms: CPU, GPU, FPGA, Cluster, Mobile

Example: MacroBase for Continuous Analytics
End-to-end system to prioritize user attention
[Diagram: multi-dimensional data streams -> MacroBase -> anomalies & explanations]

Too much data for manual inspection
Even harder when data is streaming

MacroBase Summary
End-to-end system to prioritize user attention
  No ML expertise needed: MacroBase uses general models and tunes them automatically
  No separate step for production use
  Co-design from algorithms to HW
Early users: automotive, cloud, mobile apps, manufacturing
Open source: github.com/stanford-futuredata/macrobase
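The "anomalies & explanations" flow can be illustrated with a small sketch. The slides do not say which models MacroBase uses, so everything here is an assumption made for illustration: a median/MAD outlier score, a fixed threshold of 3, and a risk ratio to rank which attribute value best explains the outliers. The record fields (`device`, `latency_ms`) and function names are hypothetical.

```python
from statistics import median

def mad_scores(values):
    """Robust outlier score: distance from the median in units of MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values]) or 1e-9  # avoid divide-by-zero
    return [abs(v - med) / mad for v in values]

def risk_ratio(records, flags, attribute, value):
    """How much more likely a record is to be an outlier when attribute == value."""
    exposed = [f for r, f in zip(records, flags) if r[attribute] == value]
    unexposed = [f for r, f in zip(records, flags) if r[attribute] != value]
    if not exposed or not unexposed:
        return float("nan")
    rate_exposed = sum(exposed) / len(exposed)
    rate_unexposed = sum(unexposed) / len(unexposed)
    return float("inf") if rate_unexposed == 0 else rate_exposed / rate_unexposed

# Hypothetical stream of readings (assumed schema, not from the slides).
latencies_a = [10, 11, 9, 10, 11, 10, 90]
latencies_b = [12, 95, 102]
records = [{"device": "A", "latency_ms": v} for v in latencies_a]
records += [{"device": "B", "latency_ms": v} for v in latencies_b]

scores = mad_scores([r["latency_ms"] for r in records])
flags = [s > 3.0 for s in scores]          # assumed threshold
rr_b = risk_ratio(records, flags, "device", "B")
print(sum(flags), "outliers; risk ratio for device=B:", round(rr_b, 2))
```

A high risk ratio for `device=B` is the kind of short explanation that lets a user look at one attribute combination instead of every raw reading.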


Weld: Rethinking the Interface to Data Analytics Libraries
Standard approach: users combine libraries using function calls that pass data via memory
Problem: for data-intensive apps, data movement cost dominates on modern hardware!
5-30x slowdowns in NumPy, Spark, TensorFlow
[Diagram: func1 and func2 passing data through memory]
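The data-movement problem can be seen in miniature with plain Python. This is not Weld's API; it is a hand-written sketch in which three hypothetical library functions each materialize a full intermediate list, compared with a single fused pass that computes the same result without intermediates. A system with a common IR can apply this kind of fusion across library boundaries automatically.

```python
def scale(xs, factor):
    # "Library call" 1: writes a full intermediate list to memory.
    return [x * factor for x in xs]

def add_offset(xs, offset):
    # "Library call" 2: reads that list back and writes another one.
    return [x + offset for x in xs]

def total(xs):
    # "Library call" 3: reads the second intermediate list.
    return sum(xs)

def pipeline_unfused(xs):
    """Three passes over memory, two intermediate lists."""
    return total(add_offset(scale(xs, 2.0), 1.0))

def pipeline_fused(xs):
    """One pass over memory, no intermediate lists."""
    acc = 0.0
    for x in xs:
        acc += x * 2.0 + 1.0
    return acc

data = [float(i) for i in range(1000)]
print(pipeline_unfused(data), pipeline_fused(data))
```

Both versions return the same value; the fused one touches each input element once, which is where the savings come from when the data no longer fits in cache.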

Weld
[Diagram: the Weld IR as a common layer targeting CPUs, GPUs, FPGAs]
Open source: weld.stanford.edu

Results: Existing Frameworks
[Plots of runtime in seconds. Panels: TPC-H queries, including Q1 (Spark SQL vs. Weld); Vector Sum (NP, NExpr, Weld); Logistic Regression on 1 core (1T) and 12 cores (12T) (TF, Hand-opt, Weld).]
Integration effort: 500 lines glue, 30 lines/operator

Results: Cross-Library Optimization
[Plots of runtime in seconds. Panels: Pandas + NumPy (Current; Weld, no CLO; Weld, CLO; Weld, 12 core); Spark SQL UDF (Scala UDF vs. Weld). Speedup annotations on the plots: 31x, 290x, 14x.]
CLO = cross-library optimization
Open source: weld.stanford.edu


NoScope: Fast CNN-Based Video Queries
Opportunity: CNNs allow more accurate queries on visual data than ever
Challenge: processing 1 video in real time requires a $1000 GPU
Result: same accuracy but 100-3000x faster through:
  Scene-specific distillation
  Temporal, spatial locality
bit.ly/NoScopeArxiv
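The two ideas on this slide can be sketched as a cascade. This is an illustrative sketch, not NoScope's implementation: frames are flat lists of pixel intensities, the "difference detector" is a mean absolute difference (exploiting temporal locality), and the cheap scene-specific model and the expensive reference model are hypothetical stand-in functions. All names and thresholds are assumptions.

```python
def mean_abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def run_cascade(frames, cheap_model, reference_model,
                diff_threshold=2.0, low=0.2, high=0.8):
    """Label each frame, calling the expensive model only when needed."""
    labels, reference_calls = [], 0
    last_frame, last_label = None, None
    for frame in frames:
        # Temporal locality: a near-identical frame reuses the previous label.
        if last_frame is not None and mean_abs_diff(frame, last_frame) < diff_threshold:
            labels.append(last_label)
            continue
        score = cheap_model(frame)          # small, scene-specific model
        if score >= high:
            label = True
        elif score <= low:
            label = False
        else:                               # uncertain: defer to the full model
            label = reference_model(frame)
            reference_calls += 1
        labels.append(label)
        last_frame, last_label = frame, label
    return labels, reference_calls

# Stand-in models (assumptions, not real CNNs).
cheap = lambda f: sum(f) / (255 * len(f))
reference = lambda f: max(f) > 200

frames = [
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [250, 250, 250, 250],
    [250, 249, 250, 250],
    [120, 130, 110, 210],
]
labels, reference_calls = run_cascade(frames, cheap, reference)
print(labels, reference_calls)
```

In this toy run only one of five frames reaches the reference model; the speedup in a real system depends on how often the cheaper stages are confident.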


Training data is key enabler, barrier to entry
How can we leverage data that’s expensive to label at scale?

Snorkel’s Approach: Weak Supervision
1) User writes labeling functions (LFs): short programs that may not always give the right label
   E.g., a regex to search in text
2) Snorkel simultaneously learns noise in LFs and a noise-aware target model (e.g., LSTM)
4 hours LF coding with bio experts: match months of hand-labeling
[Callout partially lost in extraction: "high-quality ... from ... labeling"]

System                      NCBI Disease (F1)   CDR Disease (F1)   CDR Chem. (F1)
TaggerOne (Dogan, 2012)*    81.5                79.6               88.4
[...] Regression            79.1                79.6               88.4
Snorkel: LSTM               [values garbled in extraction]
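Step 1 can be sketched with a few regex labeling functions. This does not use the Snorkel library: the function names and patterns are hypothetical, and the votes are combined by a plain majority vote, which is a deliberately simpler stand-in for step 2, where Snorkel learns how noisy each labeling function is.

```python
import re

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_disease_suffix(text):
    # Words ending in common disease suffixes, e.g. "hepatitis".
    return POSITIVE if re.search(r"\b\w+(itis|oma|osis)\b", text, re.I) else ABSTAIN

def lf_disease_keyword(text):
    return POSITIVE if re.search(r"\b(syndrome|disease|disorder)\b", text, re.I) else ABSTAIN

def lf_negation(text):
    return NEGATIVE if re.search(r"\b(no evidence of|healthy)\b", text, re.I) else ABSTAIN

LABELING_FUNCTIONS = [lf_disease_suffix, lf_disease_keyword, lf_negation]

def majority_vote(votes):
    """Combine noisy votes; ties and all-abstain yield ABSTAIN."""
    pos, neg = votes.count(POSITIVE), votes.count(NEGATIVE)
    if pos == neg:
        return ABSTAIN
    return POSITIVE if pos > neg else NEGATIVE

def weak_label(text):
    return majority_vote([lf(text) for lf in LABELING_FUNCTIONS])

for sentence in [
    "Patient was diagnosed with hepatitis.",
    "Healthy volunteers were enrolled.",
    "No evidence of Parkinson's disease.",
]:
    print(weak_label(sentence), sentence)
```

The third sentence shows why majority vote is only a baseline: two functions disagree and the tie is unresolved, whereas a model of per-function accuracy could weight one vote over the other.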

DAWN: machine learning for everyone via novel techniques and interfaces that span hardware, systems, and algorithms
Find out more at dawn.cs.stanford.edu
Peter Bailis, Chris Ré, Kunle Olukotun, Matei Zaharia
