UC Berkeley - Mesos: A Platform for Fine-Grained Resource Sharing in Data Centers


UC Berkeley
Mesos: A Platform for Fine-Grained Resource Sharing in Data Centers (II)
Anthony D. Joseph
LASER Summer School, September 2013

My Talks at LASER 2013
1. AMP Lab introduction
2. The Datacenter Needs an Operating System
3. Mesos, part one
4. Dominant Resource Fairness
5. Mesos, part two
6. Spark

Collaborators
Matei Zaharia, Benjamin Hindman, Andy Konwinski, Ali Ghodsi, Randy Katz, Scott Shenker, Ion Stoica

Apache Mesos
A common resource sharing layer for diverse frameworks (Spark, Cassandra, MPI, Hive, Hadoop, Hypertable, PIQL, SCADS)
[Diagram: a datacenter "OS" (e.g., Apache Mesos) running across node OSes (e.g., Linux, Windows)]
Run multiple instances of the same framework
» Isolate production and experimental jobs
» Run multiple versions of the framework concurrently
Support specialized frameworks for problem domains
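To make the "common resource sharing layer" concrete, here is a minimal, self-contained sketch of the resource-offer idea Mesos is built around: the master offers each node's free resources to framework schedulers, and each scheduler accepts only what it needs, leaving the rest for others. This is an illustration, not Mesos code; the class names (ToyMaster, ToyScheduler) and the CPU-only resource model are invented for the example.

```python
# Toy simulation of Mesos-style resource offers (illustrative only; the real
# master/scheduler interaction goes through Mesos's C++/protobuf APIs).

class ToyScheduler:
    """A framework scheduler that accepts offers while it still needs CPUs."""
    def __init__(self, name, cpus_needed):
        self.name = name
        self.cpus_needed = cpus_needed

    def resource_offer(self, node, free_cpus):
        take = min(free_cpus, self.cpus_needed)   # accept only what we need
        self.cpus_needed -= take
        return take                               # the rest stays free for others

class ToyMaster:
    """Offers each node's free CPUs to the registered frameworks in turn."""
    def __init__(self, nodes_free_cpus):
        self.free = dict(nodes_free_cpus)         # node -> free CPUs

    def offer_round(self, frameworks):
        for node, cpus in self.free.items():
            for fw in frameworks:
                cpus -= fw.resource_offer(node, cpus)
            self.free[node] = cpus

if __name__ == "__main__":
    master = ToyMaster({"slave1": 4, "slave2": 4, "slave3": 4})
    hadoop, mpi = ToyScheduler("hadoop", 5), ToyScheduler("mpi", 3)
    master.offer_round([hadoop, mpi])
    print(master.free)   # CPUs left unallocated on each slave
```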

Implementation
20,000 lines of C++
APIs in C, C++, Java, and Python
Master failover using ZooKeeper
Frameworks ported: Hadoop, MPI, Torque
New specialized frameworks: Spark, Apache/HaProxy
Open source Apache project: http://mesos.apache.org/

Frameworks
Ported frameworks:
» Hadoop (900 line patch)
» MPI (160 line wrapper scripts)
New frameworks:
» Spark, Scala framework for iterative jobs (1300 lines)
» Apache/haproxy, elastic web server farm (200 lines)

Isolation
Mesos has pluggable isolation modules to isolate tasks sharing a node
Currently supports Linux Containers and Solaris Projects
» Can isolate memory, CPU, IO, network bandwidth
Could be a great place to use VMs
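As an illustration of what a pluggable isolation module might look like, the sketch below caps a task's memory and CPU weight through the Linux cgroup v1 interface. It is not Mesos's actual isolator API; it assumes cgroup v1 controllers mounted under /sys/fs/cgroup and root privileges, and the class and method names are invented.

```python
# Sketch of a pluggable isolator interface with a Linux cgroup v1 backend.
# Assumes cgroup v1 is mounted at /sys/fs/cgroup and the process runs as root;
# class and method names are illustrative, not Mesos's actual module API.
import os

class Isolator:
    def isolate(self, task_id, pid, mem_bytes, cpu_shares):
        raise NotImplementedError

class CgroupV1Isolator(Isolator):
    ROOT = "/sys/fs/cgroup"

    def isolate(self, task_id, pid, mem_bytes, cpu_shares):
        mem_dir = os.path.join(self.ROOT, "memory", task_id)
        cpu_dir = os.path.join(self.ROOT, "cpu", task_id)
        os.makedirs(mem_dir, exist_ok=True)
        os.makedirs(cpu_dir, exist_ok=True)
        # Hard memory limit for the task's process tree.
        with open(os.path.join(mem_dir, "memory.limit_in_bytes"), "w") as f:
            f.write(str(mem_bytes))
        # Relative CPU weight (the cgroup default is 1024).
        with open(os.path.join(cpu_dir, "cpu.shares"), "w") as f:
            f.write(str(cpu_shares))
        # Move the executor's process into both cgroups.
        for d in (mem_dir, cpu_dir):
            with open(os.path.join(d, "cgroup.procs"), "w") as f:
                f.write(str(pid))
```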

Apache ZooKeeper
Servers require coordination
» Leader Election, Group Membership, Work Queues, Data Sharding, Event Notifications, Configuration, and Cluster Management
Highly available, scalable, distributed coordination kernel
» Ordered updates and strong persistence guarantees
» Conditional updates (version), Watches for data changes
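The two primitives in the last bullet, versioned conditional updates and watches, can be sketched with the Python kazoo client (an assumption for illustration; Mesos itself talks to ZooKeeper through its own C++ bindings, and the host and znode path below are made up):

```python
# Sketch of ZooKeeper conditional (versioned) updates and watches using kazoo.
from kazoo.client import KazooClient
from kazoo.exceptions import BadVersionError

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
zk.ensure_path("/demo/config")

# Watch: the callback fires whenever the znode's data changes.
@zk.DataWatch("/demo/config")
def on_change(data, stat):
    print("config is now:", data)

# Conditional update: succeeds only if the version we read is still current.
data, stat = zk.get("/demo/config")
try:
    zk.set("/demo/config", b"new-value", version=stat.version)
except BadVersionError:
    print("someone else updated the znode first; re-read and retry")
```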

Master Failure
[Diagram: a ZooKeeper ensemble (Zoo1 to Zoo5) elects one active Mesos master out of Masters 1, 2, and 3; framework schedulers (Hadoop JobTracker, MPI scheduler) and slaves (running Hadoop, MPI, and JVM executors) connect to the currently active master]
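A hedged sketch of the leader-election step in the diagram, using kazoo's Election recipe (again only an illustration; the real Mesos master contender/detector is C++ code, and the ZooKeeper hosts and paths below are examples):

```python
# Electing one active master via ZooKeeper leader election (kazoo recipe).
import time
from kazoo.client import KazooClient

def serve_as_active_master():
    # This function runs only while this process holds leadership.
    print("acting as the active Mesos master")
    while True:
        time.sleep(1)   # placeholder for serving scheduler/slave requests

zk = KazooClient(hosts="zoo1:2181,zoo2:2181,zoo3:2181")
zk.start()
election = zk.Election("/mesos/master-election", identifier="master-1")
election.run(serve_as_active_master)   # blocks until this process is elected
```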

Resource Revocation
Killing tasks to make room for other users
Killing typically not needed for short tasks
» If avg task length is 2 min, a new framework gets 10% of all machines within 12 seconds on avg
[Plot: Hadoop job and task durations at Facebook]
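The 12-second figure follows from a back-of-envelope model: if task completions are spread roughly uniformly over the mean task length, a framework entitled to a fraction f of the slots waits about f times the mean task length before enough of them free up. The snippet below only evaluates that simple model (the uniform-completion assumption is ours, not a quote from the analysis):

```python
# Back-of-envelope ramp-up estimate: with task completions spread uniformly
# over the mean task length, waiting f * mean_task_length frees up a fraction
# f of the cluster's slots.
def expected_ramp_up_seconds(mean_task_length_s, target_fraction):
    return target_fraction * mean_task_length_s

print(expected_ramp_up_seconds(mean_task_length_s=120, target_fraction=0.10))  # 12.0
```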

Resource Revocation (2)
Not the normal case, because fine-grained tasks enable quick reallocation of resources
Sometimes necessary:
» Long-running tasks that never relinquish resources
» Buggy job running forever
» Greedy user who decides to make his tasks long
Safe allocation lets frameworks have long-running tasks, defined by the allocation policy
» Users will get at least their safe share within a specified time
» If a framework stays below its safe allocation, its tasks won't be killed

Resource Revocation (3)
Dealing with long tasks monopolizing nodes
» Let slaves have long slots and short slots
» Tasks in short slots are killed if they run too long
Revoke only if a user is below its safe share and is interested in offers
» Revoke tasks from users farthest above their safe share
» A framework is given a grace period before its tasks are killed
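One possible reading of the revocation rule above, as a sketch: revoke only when some framework is below its safe share and still wants offers, then reclaim from the frameworks farthest above their safe shares, giving each a grace period. The data structures and the 30-second grace period are invented for the example, not taken from Mesos.

```python
# Sketch of the revocation policy described above (illustrative data structures).
from dataclasses import dataclass

@dataclass
class Framework:
    name: str
    allocation: float      # current fraction of the cluster
    safe_share: float      # guaranteed fraction
    wants_offers: bool     # is it interested in more resources?

GRACE_PERIOD_S = 30        # assumed grace period before tasks are killed

def plan_revocations(frameworks):
    needy = [f for f in frameworks
             if f.wants_offers and f.allocation < f.safe_share]
    if not needy:
        return []          # nobody is below their safe share: never revoke
    # Revoke from frameworks farthest above their safe share first.
    over = sorted((f for f in frameworks if f.allocation > f.safe_share),
                  key=lambda f: f.allocation - f.safe_share, reverse=True)
    return [(f.name, GRACE_PERIOD_S) for f in over]

fws = [Framework("torque", 0.55, 0.40, False),
       Framework("hadoop", 0.35, 0.30, False),
       Framework("spark",  0.10, 0.30, True)]
print(plan_revocations(fws))   # [('torque', 30), ('hadoop', 30)]
```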

Example: Running MPI on Mesos
Users are always told their safe share
» Avoid revocation by staying below it
Giving each user a small safe share may not be enough if jobs need many machines
Can run a traditional HPC scheduler as a user with a large safe share of the cluster, and have MPI jobs queue up on it
» E.g., Torque gets 40% of the cluster

Example: Torque on Mesos
[Diagram: hierarchical shares at Facebook.com; Torque holds a 40% safe share of the cluster and queues MPI jobs on it, while other groups (Ads, Spam) and their users run Hadoop jobs in the remaining share]

Some Mesos Deployments
1,000's of nodes running over a dozen production services
Genomics researchers using Hadoop and Spark on Mesos
Spark in use by Yahoo! Research
Spark for analytics
Hadoop and Spark used by machine learning researchers

Results

Dynamic Resource Sharing
[Plot: share of cluster over time (s)]

Elastic Web Server Farm
[Diagram: a load-gen framework and a web framework (Apache) share a Mesos cluster; the web framework performs a load calculation on incoming HTTP requests, receives resource offers from the Mesos master, launches web executor tasks on the slaves alongside load-gen executors, and tracks them via status updates]
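The web framework's control loop can be sketched as: estimate how many Apache worker tasks the observed request rate needs, then accept just enough of each resource offer to reach that target. The snippet below is a simplified stand-in for that logic; the requests-per-worker constant, the one-CPU-per-task assumption, and all names are hypothetical.

```python
# Simplified sketch of the elastic web farm's scaling logic: size the Apache
# worker pool from the observed request rate and accept offers accordingly.
import math

REQS_PER_WORKER = 500          # assumed requests/sec one web task can absorb

class ElasticWebScheduler:
    def __init__(self):
        self.running_tasks = 0

    def target_tasks(self, reqs_per_sec):
        # The "load calculation" from the diagram: workers needed for current load.
        return max(1, math.ceil(reqs_per_sec / REQS_PER_WORKER))

    def resource_offer(self, offer_cpus, reqs_per_sec):
        deficit = self.target_tasks(reqs_per_sec) - self.running_tasks
        launch = max(0, min(deficit, offer_cpus))   # one CPU per web task
        self.running_tasks += launch
        return launch                               # tasks launched from this offer

sched = ElasticWebScheduler()
print(sched.resource_offer(offer_cpus=4, reqs_per_sec=1800))  # launches 4 tasks
```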

Web Framework Results
[results plot]

Scalability
[Plot: task startup overhead with 200 frameworks]

Fault Tolerance
[Plot: mean time to recovery, with 95% confidence intervals]

Deep Dive Experiments
Macrobenchmark experiment
» Test the benefits of using Mesos to multiplex a cluster between multiple diverse frameworks
High-level goals of the experiment
» Demonstrate increased CPU/memory utilization due to multiplexing available resources
» Demonstrate job runtime speedups

Macrobenchmark Setup
100 Extra Large EC2 instances (4 cores / 15 GB RAM per machine)
Experiment length: 25 minutes
Realistic workload:
1. A Hadoop instance running a mix of small and large jobs based on the workload at Facebook
2. A Hadoop instance running a set of large batch jobs
3. Spark running a series of machine learning jobs
4. Torque running a series of MPI jobs

Goal of Experiment
Run the four frameworks and corresponding workloads:
» 1st on a cluster that is shared via Mesos
» 2nd on 4 partitioned clusters, each ¼ the size of the shared cluster
Compare resource utilization and workload performance (i.e., job run times) on static partitioning vs. sharing with Mesos

Macrobenchmark Details: Breakdown of the Facebook Hive (Hadoop) Workload Mix

Bin   Job Type      Map Tasks   Reduce Tasks   Jobs Run
1     Selection     1           NA             38
2     Text Search   ...         ...            ...
3     Aggregation   ...         ...            ...
4     Selection     ...         ...            ...
5     Aggregation   100         10             6
6     Selection     200         NA             6
7     Text Search   400         NA             4
8     Join          400         30             2

Results: CPU Allocation
[Plot: CPU allocation on the 100-node cluster, broken down by framework, including Hadoop (Facebook Mix) and Hadoop (Batch Jobs)]

Sharing With Mesos vs. No-Sharing (Dedicated Cluster)
[Plots: share of cluster vs. time (s) for (a) Facebook Hadoop Mix, (b) Large Hadoop Mix, (c) Spark, and (d) Torque / MPI, each comparing a dedicated (statically partitioned) cluster with Mesos]

Cluster Utilization: Mesos vs. Dedicated Cluster
[Plots: CPU utilization (%) and memory utilization (%) over time (s)]

Job Run Times (and Speedup) Grouped by Framework

Framework             Sum of Exec Times on     Sum of Exec Times   Speedup
                      Dedicated Cluster (s)    on Mesos (s)
Facebook Hadoop Mix   7235                     6319                1.14
Large Hadoop Mix      3143                     1494                2.10
Spark                 1684                     1338                1.26
Torque / MPI          3210                     3352                0.96

2x speedup for the Large Hadoop Mix

Job Run Times (and Speedup) Grouped by Job Type

Framework             Job Type          Time on Dedicated   Avg. Speedup
                                        Cluster (s)         on Mesos
Facebook Hadoop Mix   selection (1)     24                  ...
                      text search (2)   31                  ...
                      aggregation (3)   82                  ...
                      selection (4)     65                  ...
                      aggregation (5)   192                 ...
                      selection (6)     136                 ...
                      text search (7)   137                 ...
                      join (8)          662                 ...
Large Hadoop Mix      text search       314                 ...
Spark                 ALS               337                 ...
Torque / MPI          small tachyon     261                 ...
                      large tachyon     822                 0.88

Discussion: Facebook Hadoop Mix Results
Smaller jobs perform worse on Mesos:
» Side effect of the interaction between the fair sharing performed by the Hadoop framework (among its jobs) and by Mesos (among frameworks)
» When Hadoop has more than 1/4 of the cluster, Mesos allocates freed-up resources to the framework farthest below its share
» Significant effect on any small Hadoop job submitted during this time (long delay relative to its length)
» In contrast, Hadoop running alone can assign resources to the new job as soon as any of its tasks finishes
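The second bullet, freed resources go to the framework farthest below its share, is easy to state directly; the toy version below (shares and names invented) also makes the small-job delay visible: a Hadoop instance already above its 1/4 share simply does not receive the freed slot, so a freshly submitted small job keeps waiting.

```python
# Toy version of the inter-framework policy described above: a freed slot goes
# to the framework farthest below its fair share. Shares and names are invented.
def pick_recipient(frameworks):
    """frameworks: dict name -> (current_fraction, fair_share)."""
    return min(frameworks,
               key=lambda n: frameworks[n][0] - frameworks[n][1])

shares = {
    "hadoop-facebook": (0.40, 0.25),   # already above its 1/4 share
    "hadoop-batch":    (0.20, 0.25),
    "spark":           (0.15, 0.25),
    "torque":          (0.25, 0.25),
}
print(pick_recipient(shares))   # 'spark' gets the freed slot, not hadoop-facebook
```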

Discussion: Facebook Hadoop Mix Results
A similar problem with hierarchical fair sharing appears in networks
» Mitigation #1: run small jobs on a separate framework, or
» Mitigation #2: use lottery scheduling as the Mesos allocation policy
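Mitigation #2 replaces "farthest below its share" with a randomized choice: each freed slot is raffled off with probability proportional to each framework's share, so no single framework wins every freed slot in a row. A minimal sketch of that lottery draw (not an actual Mesos allocator module; the shares are illustrative):

```python
# Minimal sketch of lottery scheduling as an allocation policy: each freed slot
# is awarded at random, with probability proportional to a framework's share.
import random

def lottery_pick(shares):
    """shares: dict framework -> ticket weight (e.g., its fair share)."""
    names = list(shares)
    return random.choices(names, weights=[shares[n] for n in names], k=1)[0]

shares = {"hadoop-facebook": 0.25, "hadoop-batch": 0.25,
          "spark": 0.25, "torque": 0.25}
print(lottery_pick(shares))   # each framework wins roughly 1/4 of the draws
```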

Discussion: Torque Results
Torque is the only framework that performed worse, on average, on Mesos
» Large tachyon jobs took on average 2 minutes longer
» Small ones took 20s longer

Discussion: Torque Results
Causes of delay:
» Partially due to Torque having to wait to launch 24 tasks on Mesos before starting each job; average delay is 12s
» The rest of the delay may be due to stragglers (slow nodes)
» In the standalone Torque run, two jobs each took 60s longer to run than the others
» Both jobs used a node that performed slower on single-node benchmarks than the others (Linux reported a 40% lower bogomips value on the node)
» Since tachyon hands out equal amounts of work to each node, it runs as slowly as the slowest node

Macrobenchmark Summary
Evaluated performance of a diverse set of frameworks representing realistic workloads running on Mesos versus a statically partitioned cluster
Showed 10% increase in CPU utilization and 18% increase in memory utilization
Some frameworks show significant speedups in job run time
Some frameworks show minor slowdowns in job run time due to experimental/environmental artifacts

Summary
Mesos is a platform for sharing data centers among diverse cluster computing frameworks
» Enables efficient fine-grained sharing
» Gives frameworks control over scheduling
Mesos is:
» Scalable (50,000 slaves)
» Fault-tolerant (MTTR 6 sec)
» Flexible enough to support a variety of frameworks (MPI, Hadoop, Spark, Apache, ...)

My Talks at LASER 2013
1. AMP Lab introduction
2. The Datacenter Needs an Operating System
3. Mesos, part one
4. Dominant Resource Fairness
5. Mesos, part two
6. Spark

