TensorFlow: A System For Large-Scale Machine Learning


TensorFlow: A System for Large-Scale Machine Learning

Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng, Google

https://www.usenix.org/conference/osdi16/technical-sessions/presentation/abadi

This paper is included in the Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16), November 2–4, 2016, Savannah, GA, USA. ISBN 978-1-931971-33-1. Open access to the Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation is sponsored by USENIX.

TensorFlow: A system for large-scale machine learning

Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng

Google Brain

Abstract

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.

1 Introduction

In recent years, machine learning has driven advances in many different fields [3, 5, 24, 25, 29, 31, 42, 47, 50, 52, 57, 67, 68, 72, 76]. We attribute this success to the invention of more sophisticated machine learning models [44, 54], the availability of large datasets for tackling problems in these fields [9, 64], and the development of software platforms that enable the easy use of large amounts of computational resources for training such models on these large datasets [14, 20].

We have developed the TensorFlow system for experimenting with new models, training them on large datasets, and moving them into production. We have based TensorFlow on many years of experience with our first-generation system, DistBelief [20], both simplifying and generalizing it to enable researchers to explore a wider variety of ideas with relative ease. TensorFlow supports both large-scale training and inference: it efficiently uses hundreds of powerful (GPU-enabled) servers for fast training, and it runs trained models for inference in production on various platforms, ranging from large distributed clusters in a datacenter, down to running locally on mobile devices. At the same time, it is flexible enough to support experimentation and research into new machine learning models and system-level optimizations.

TensorFlow uses a unified dataflow graph to represent both the computation in an algorithm and the state on which the algorithm operates. We draw inspiration from the high-level programming models of dataflow systems [2, 21, 34] and the low-level efficiency of parameter servers [14, 20, 49].
Unlike traditional dataflow systems, in which graph vertices represent functional computation on immutable data, TensorFlow allows vertices to represent computations that own or update mutable state. Edges carry tensors (multi-dimensional arrays) between nodes, and TensorFlow transparently inserts the appropriate communication between distributed subcomputations. By unifying the computation and state management in a single programming model, TensorFlow allows programmers to experiment with different parallelization schemes that, for example, offload computation onto the servers that hold the shared state to reduce the amount of network traffic. We have also built various coordination protocols, and achieved encouraging results with synchronous replication, echoing recent results [10, 18] that contradict the commonly held belief that asynchronous replication is required for scalable learning [14, 20, 49].
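To make this unification concrete, the following is a minimal sketch, using the TF 1.x-era Python API and hypothetical names, of how a mutable variable and the operation that mutates it are both ordinary vertices in the same dataflow graph as the functional operators; it illustrates the idea rather than any specific production setup.

    import numpy as np
    import tensorflow as tf

    w = tf.Variable(tf.zeros([10]))            # Stateful vertex owning a tensor.
    delta = tf.placeholder(tf.float32, [10])   # Functional input edge.
    update = tf.assign_add(w, delta)           # The mutation is itself a graph node.

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        # Running the update node applies w += delta in place.
        sess.run(update, {delta: np.ones(10, dtype=np.float32)})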

Over the past year, more than 150 teams at Google have used TensorFlow, and we have released the system as an open-source project.¹ Thanks to our large community of users we have gained experience with many different machine learning applications. In this paper, we focus on neural network training as a challenging systems problem, and select two representative applications from this space: image classification and language modeling. These applications stress computational throughput and aggregate model size respectively, and we use them both to demonstrate the extensibility of TensorFlow, and to evaluate the efficiency and scalability of our present implementation.

¹ Software available from https://tensorflow.org.

2 Background & motivation

We begin by describing the limitations of our previous system (§2.1) and outlining the design principles that we used in the development of TensorFlow (§2.2).

2.1 Previous system: DistBelief

TensorFlow is the successor to DistBelief, which is the distributed system for training neural networks that Google has used since 2011 [20]. DistBelief uses the parameter server architecture, and here we criticize its limitations, but other systems based on this architecture have addressed these limitations in other ways [11, 14, 49]; we discuss those systems in Subsection 2.3.

In the parameter server architecture, a job comprises two disjoint sets of processes: stateless worker processes that perform the bulk of the computation when training a model, and stateful parameter server processes that maintain the current version of the model parameters. DistBelief's programming model is similar to Caffe's [38]: the user defines a neural network as a directed acyclic graph of layers that terminates with a loss function. A layer is a composition of mathematical operators: for example, a fully connected layer multiplies its input by a weight matrix, adds a bias vector, and applies a non-linear function (such as a sigmoid) to the result. A loss function is a scalar function that quantifies the difference between the predicted value (for a given input data point) and the ground truth. In a fully connected layer, the weight matrix and bias vector are parameters, which a learning algorithm will update in order to minimize the value of the loss function. DistBelief uses the DAG structure and knowledge of the layers' semantics to compute gradients for each of the model parameters, via backpropagation [63]. Because the parameter updates in many algorithms are commutative and have weak consistency requirements [61], the worker processes can compute updates independently and write back "delta" updates to each parameter server, which combines the updates with its current state.

Although DistBelief has enabled many Google products to use deep neural networks and formed the basis of many machine learning research projects, we soon began to feel its limitations. Its Python-based scripting interface for composing pre-defined layers was adequate for users with simple requirements, but our more advanced users sought three further kinds of flexibility:

Defining new layers: For efficiency, we implemented DistBelief layers as C++ classes. Using a separate, less familiar programming language for implementing layers is a barrier for machine learning researchers who seek to experiment with new layer architectures, such as sampled softmax classifiers [37] and attention modules [53].

Refining the training algorithms: Many neural networks are trained using stochastic gradient descent (SGD), which iteratively refines the parameters of the network by moving them in the direction that maximally decreases the value of the loss function. Several refinements to SGD accelerate convergence by changing the update rule [23, 66]. Researchers often want to experiment with new optimization methods, but doing that in DistBelief involves modifying the parameter server implementation. Moreover, the get() and put() interface for the parameter server is not ideal for all optimization methods: sometimes a set of related parameters must be updated atomically, and in many cases it would be more efficient to offload computation onto the parameter server, and thereby reduce the amount of network traffic.
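For concreteness, the basic SGD update referenced above, together with one common refinement (momentum), can be written as follows; this is the standard textbook formulation, not notation taken from the paper:

    \theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)

and, introducing a velocity term v_t with momentum coefficient \mu:

    v_{t+1} = \mu v_t - \eta \, \nabla_\theta L(\theta_t), \qquad \theta_{t+1} = \theta_t + v_{t+1}

where \eta is the learning rate. In DistBelief, swapping one such update rule for another meant modifying the parameter server itself.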
Defining new training algorithms: DistBelief workers follow a fixed execution pattern: read a batch of input data and the current parameter values, compute the loss function (a forward pass through the network), compute gradients for each of the parameters (a backward pass), and write the gradients back to the parameter server. This pattern works for training simple feed-forward neural networks, but fails for more advanced models, such as recurrent neural networks, which contain loops [39]; adversarial networks, in which two related networks are trained alternately [26]; and reinforcement learning models, where the loss function is computed by some agent in a separate system, such as a video game emulator [54]. Moreover, there are many other machine learning algorithms—such as expectation maximization, decision forest training, and latent Dirichlet allocation—that do not fit the same mold as neural network training, but could also benefit from a common, well-optimized distributed runtime.
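The fixed pattern described above can be sketched as a worker loop like the one below. All names (ps_get, ps_put, forward, backward) are hypothetical placeholders for exposition, not DistBelief's actual interface; the point is that the control flow is baked in, so models that need loops or external agents cannot be expressed.

    # A hedged sketch of the fixed worker execution pattern, with the
    # parameter-server interface and per-layer passes injected as arguments.
    def worker_loop(batches, ps_get, ps_put, forward, backward):
        for batch in batches:
            params = ps_get()                       # Read current parameter values.
            loss, acts = forward(params, batch)     # Forward pass: compute the loss.
            grads = backward(params, acts, loss)    # Backward pass: compute gradients.
            ps_put(grads)                           # Write gradient ("delta") updates back.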

In addition, we designed DistBelief with a single platform in mind: a large distributed cluster of multicore servers [20]. We were able to add support for GPU acceleration, when it became clear that this acceleration would be crucial for executing convolutional kernels efficiently [44], but DistBelief remains a heavyweight system that is geared for training deep neural networks on huge datasets, and is difficult to scale down to other environments. In particular, many users want to hone their model locally on a GPU-powered workstation, before scaling the same code to train on a much larger dataset. After training a model on a cluster, the next step is to push the model into production, which might involve integrating the model into an online service, or deploying it onto a mobile device for offline execution. Each of these tasks has some common computational structure, but our colleagues found it necessary to use or create separate systems that satisfy the different performance and resource requirements of each platform. TensorFlow provides a single programming model and runtime system for all of these environments.

2.2 Design principles

We designed TensorFlow to be much more flexible than DistBelief, while retaining its ability to satisfy the demands of Google's production machine learning workloads. TensorFlow provides a simple dataflow-based programming abstraction that allows users to deploy applications on distributed clusters, local workstations, mobile devices, and custom-designed accelerators. A high-level scripting interface (Figure 1) wraps the construction of dataflow graphs and enables users to experiment with different model architectures and optimization algorithms without modifying the core system. In this subsection, we briefly highlight TensorFlow's core design principles:

    # 1. Construct a graph representing the model.
    x = tf.placeholder(tf.float32, [BATCH_SIZE, 784])   # Placeholder for input.
    y = tf.placeholder(tf.float32, [BATCH_SIZE, 10])    # Placeholder for labels.

    W_1 = tf.Variable(tf.random_uniform([784, 100]))    # 784x100 weight matrix.
    b_1 = tf.Variable(tf.zeros([100]))                  # 100-element bias vector.
    layer_1 = tf.nn.relu(tf.matmul(x, W_1) + b_1)       # Output of hidden layer.

    W_2 = tf.Variable(tf.random_uniform([100, 10]))     # 100x10 weight matrix.
    b_2 = tf.Variable(tf.zeros([10]))                   # 10-element bias vector.
    layer_2 = tf.matmul(layer_1, W_2) + b_2             # Output of linear layer.

    # 2. Add nodes that represent the optimization algorithm.
    loss = tf.nn.softmax_cross_entropy_with_logits(layer_2, y)
    train_op = tf.train.AdagradOptimizer(0.01).minimize(loss)

    # 3. Execute the graph on batches of input data.
    with tf.Session() as sess:                          # Connect to the TF runtime.
        sess.run(tf.initialize_all_variables())         # Randomly initialize weights.
        for step in range(NUM_STEPS):                   # Train iteratively for NUM_STEPS.
            x_data, y_data = ...                        # Load one batch of input data.
            sess.run(train_op, {x: x_data, y: y_data})  # Perform one training step.

Figure 1: An image classifier written using TensorFlow's Python API. This program is a simple solution to the MNIST digit classification problem [48], with 784-pixel images and 10 output classes.

Dataflow graphs of primitive operators: Both TensorFlow and DistBelief use a dataflow representation for their models, but the most striking difference is that a DistBelief model comprises relatively few complex "layers", whereas the corresponding TensorFlow model represents individual mathematical operators (such as matrix multiplication, convolution, etc.) as nodes in the dataflow graph. This approach makes it easier for users to compose novel layers using a high-level scripting interface. Many optimization algorithms require each layer to have defined gradients, and building layers out of simple operators makes it easy to differentiate these models automatically (§4.1). In addition to the functional operators, we represent mutable state, and the operations that update it, as nodes in the dataflow graph, thus enabling experimentation with different update rules.
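As a hedged illustration of that automatic differentiation, reusing the names from Figure 1: the explicit plain-SGD update below is an assumption for exposition, not how AdagradOptimizer is implemented internally.

    # Symbolic backpropagation through the primitive operators of Figure 1.
    grads = tf.gradients(loss, [W_1, b_1, W_2, b_2])
    # A hand-rolled SGD step: each update is itself a node in the graph,
    # so custom update rules need no changes to the core runtime.
    sgd_step = [tf.assign_sub(v, 0.01 * g)
                for v, g in zip([W_1, b_1, W_2, b_2], grads)]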
