Hadoop 3 And Docker On Amazon EMR 6.0 - Amazon Web Services, Inc.


Hadoop 3 and Docker on Amazon EMR 6.0.0
Imtiaz (Taz) Sayed
Data Analytics Tech Leader, AWS
April 2020
2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Agenda
- What is Amazon EMR?
- New in EMR release 6.0.0
  - Major application version updates
  - Amazon AMI for EMR update
  - Docker on YARN support
  - Apache Hive & LLAP
- Demo
- Q&A

Amazon EMR
[Diagram: Amazon EMR and the data lake]

EMR 6.0.0 – Application Versions
- Hadoop 3.2.1
- Spark 2.4.4
- HUE 4.4.0
- Livy 0.6.0
- MXNet 1.5.1
- Oozie 5.1.0
- Phoenix 5.0.0
- ZooKeeper 3.4.14
- Presto 0.227
- Hive 3.1.2
- SageMaker 1.2.6
- TensorFlow 1.14.0
- HBase 2.2.3
- Tez 0.9.2
- Zeppelin 0.9.0

EMR 6.0.0 – Amazon Machine Image (AMI)
- Amazon Linux 2
  - Enhanced performance with systemd support
  - Also available as a Docker image
- Amazon Corretto JDK 8
  - Production-ready distribution of OpenJDK
  - Multiplatform support
- Python 3
  - Default version in EMR 6.0.0

YARN: Docker Support

EMR 6.0.0 – Docker on YARN
Dependency management: today
- Scope: cluster-wide
- Mode: custom AMI, bootstrap actions
Dependency management: Docker on YARN
- Scope: job
- Mode: Docker container
[Diagram: jobs, EMR cluster hosts, and dependencies (D) under each model]

EMR 6.0.0 – Docker on YARN – Workflow
[Diagram labels: YARN app or Spark job, Docker image, Docker image repository, jobs, EMR cluster hosts]

EMR 6.0.0 – Docker on YARN – Docker Image
- Public subnet deployment
  - EMR cluster hosts: docker pull hadoop-docker
- Private subnet deployment: ECR
  - EMR cluster hosts: docker pull hadoop-docker
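For the private subnet case, the image has to be addressed by its Amazon ECR name rather than a bare `hadoop-docker`. The following is a minimal sketch of how that name is assembled. The account ID, region, repository, and tag are hypothetical placeholders, not values from the slides, and the registry login step is omitted. The script only prints the `docker tag` and `docker push` commands instead of running them.

```shell
# Hypothetical values for illustration; substitute your own account, region, and repository.
ACCOUNT_ID="123456789012"
REGION="us-east-1"
REPO="hadoop-docker"
TAG="latest"

# Amazon ECR image URIs follow <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>.
ecr_image_uri() {
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}

IMAGE_URI="$(ecr_image_uri "$ACCOUNT_ID" "$REGION" "$REPO" "$TAG")"

# Print the tag and push steps rather than running them, so the sketch is safe to try anywhere.
echo "docker tag $REPO:$TAG $IMAGE_URI"
echo "docker push $IMAGE_URI"
```

The resulting URI is what a job would then pass as the Docker image name in place of `hadoop-docker`.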

EMR 6.0.0 – Docker on YARN – Dockerfiles

CentOS base image:

    FROM centos
    RUN yum install -y epel-release
    RUN yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel
    RUN yum -y install python36u python36u-dev python36u-pip
    RUN pip3 install numpy

Amazon Corretto base image:

    FROM amazoncorretto:8
    RUN yum -y update
    RUN yum -y install yum-utils
    RUN yum -y groupinstall development
    RUN yum -y install python3 python3-dev python3-pip python3-virtualenv
    ENV PYSPARK_DRIVER_PYTHON python3
    ENV PYSPARK_PYTHON python3
    RUN pip3 install --upgrade pip
    RUN pip3 install numpy
    RUN python3 -c "import numpy as np"

EMR 6.0.0 – Docker on YARN – Job Submission

Environment variables that select the Docker runtime:
- YARN_CONTAINER_RUNTIME_TYPE
- YARN_CONTAINER_RUNTIME_DOCKER_IMAGE

MapReduce:

    vars="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker"
    hadoop jar hadoop-examples.jar pi \
      -Dyarn.app.mapreduce.am.env=$vars \
      -Dmapreduce.map.env=$vars \
      -Dmapreduce.reduce.env=$vars

Spark:

    spark-submit --master yarn \
      --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
      --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker \
      --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
      --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker
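Both submission styles repeat the same two settings, so the image name can be defined once and expanded into either form. The sketch below illustrates that idea and is not part of the original slides. The helper names `yarn_docker_vars` and `spark_docker_conf` are made up for this example, and the script only prints the resulting commands, so it can be tried off-cluster.

```shell
# Image name from the slide's example; override by exporting DOCKER_IMAGE first.
DOCKER_IMAGE="${DOCKER_IMAGE:-hadoop-docker}"

# Hypothetical helper: comma-separated variable list for the MapReduce *.env properties.
yarn_docker_vars() {
  printf 'YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=%s\n' "$1"
}

# Hypothetical helper: --conf flags for the Spark executors and the application master.
spark_docker_conf() {
  image="$1"
  out=""
  for scope in spark.executorEnv spark.yarn.appMasterEnv; do
    out="$out --conf ${scope}.YARN_CONTAINER_RUNTIME_TYPE=docker"
    out="$out --conf ${scope}.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=${image}"
  done
  # Drop the leading space left by the first append.
  printf '%s\n' "${out# }"
}

vars="$(yarn_docker_vars "$DOCKER_IMAGE")"

# Print the commands instead of executing them.
echo "hadoop jar hadoop-examples.jar pi -Dyarn.app.mapreduce.am.env=$vars -Dmapreduce.map.env=$vars -Dmapreduce.reduce.env=$vars"
echo "spark-submit --master yarn $(spark_docker_conf "$DOCKER_IMAGE")"
```

Keeping the image name in one variable means switching images is a one-line change for both MapReduce and Spark jobs.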

EMR 6.0.0 – Docker on YARN – EMR Notebooks
- Amazon EMR Notebooks
  - Serverless Jupyter notebook
  - Durable storage
  - Independent of compute infrastructure
- EMR Notebooks with Docker on YARN
- Demo time!

Apache Hive 3.1.2 / LLAP

EMR 6.0.0 – Apache Hive 3.1.2 and LLAP
- Performance gains in 3.1.2
- LLAP support for Apache Hive
  - Persistent daemons
  - Dynamic in-memory caching

EMR 6.0.0 – Apache Hive 3.1.2 Performance
- 2X faster than EMR 5.29.0

EMR 6.0.0 – Apache Hive LLAP Performance
- 27% faster in LLAP mode

Resources for modernizing big data to Amazon EMR
- Detailed migration guide
- Free onsite 2-day workshop
  - Deconstruction of workloads
  - Future state architecture
  - Migration plan and recommended next steps
- AWS Professional Services and partners available for implementation
- Visit aws.amazon.com/emr/emr-migration/
- Email emr-migration-help@amazon.com

EMR 6.0.0 – Documentation References
- EMR: What's New? e/emr-whatsnew.html
- EMR: Release Notes 6.0.0 e/emr-release6x.html#emr-600-relnotes
- Hive Performance on EMR 6.0.0 is-2x-faster-with-hive-llapon-emr-6-0-0/
- Spark on Docker with EMR 6.0.0 plications-with-dockerusing-amazon-emr-6-0-0-beta/

Thank you!

Q&A

