Virtual Machine (VM) for Hadoop Training


© 2012 coreservlets.com and Dima May

Virtual Machine (VM) for Hadoop Training

Originals of slides and source code for examples: http://www.coreservlets.com/hadoop-tutorial/
Also see the customized Hadoop training courses (onsite or at public venues): http://courses.coreservlets.com/

For live customized Hadoop training (including prep for the Cloudera certification exam), please email info@coreservlets.com. Taught by a recognized Hadoop expert who spoke on Hadoop several times at JavaOne, and who uses Hadoop daily in real-world apps. Available at public venues, or customized versions can be held on-site at your organization.

Courses developed and taught by Marty Hall
– JSF 2.2, PrimeFaces, servlets/JSP, Ajax, jQuery, Android development, Java 7 or 8 programming, custom mix of topics
– Courses available in any state or country; Maryland/DC-area companies can also choose afternoon/evening courses

Courses developed and taught by coreservlets.com experts (edited by Marty)
– Spring, Hibernate/JPA, GWT, Hadoop, HTML5, RESTful Web Services

Contact info@coreservlets.com for details.

Agenda

- Overview of the Virtual Machine for Hadoop Training
- Eclipse installation
- Environment variables
- Firefox bookmarks
- Scripts
- Developing exercises
- Well-known issues

Virtual Machine

- In this class we will be using VirtualBox, a desktop virtualization product, to run Ubuntu
  – https://www.virtualbox.org
- The Ubuntu image is provided with Hadoop products pre-installed and configured for development
  – Cloudera Distribution for Hadoop (CDH) 4 is used; the installed products are:
    - Hadoop (HDFS and YARN/MapReduce)
    - HBase
    - Oozie
    - Pig & Hive
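As a quick sanity check after the VM boots, you can confirm the pre-installed products from a terminal (a suggested check, not from the original slides; the exact version strings will vary with the CDH4 build):

    hadoop version    # prints the Hadoop/CDH version
    hbase version     # prints the HBase version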

Installing VirtualBox

- Download the latest release for your specific OS
  – https://www.virtualbox.org/wiki/Downloads
- After the download is complete, run the VirtualBox installer
- Start VirtualBox and import the provided Ubuntu image/appliance
  – File → Import Appliance
- Now that the new image is imported, select it and click 'Start'

VM Resources

- The VM is set up with 3 GB of RAM, 2 CPUs, and 13 GB of storage
- If you can spare more RAM and CPU, adjust the VM settings
  – VirtualBox Manager → right-click the VM → Settings → System → adjust under the Motherboard and Processor tabs
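The same import and resource changes can also be made from the host's command line with VBoxManage, which ships with VirtualBox (a sketch; the appliance file name and VM name below are assumptions, so substitute the ones you actually have):

    # Import the appliance (equivalent to File → Import Appliance)
    VBoxManage import hadoop-training.ova

    # Raise RAM to 4 GB and keep 2 CPUs (the VM must be powered off)
    VBoxManage modifyvm "Hadoop Training VM" --memory 4096 --cpus 2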

Logging In

- Username: hadoop
- Password: hadoop

Desktop Screen

- Command-line terminal
- Eclipse is installed to assist in developing Java code and scripts

Directory Locations

The slide maps directories on the VM to the following roles:
- All the training artifacts; located in the user's home directory
- Installation directory for Hadoop products
- Eclipse installation
- Code, resources, and scripts managed via Eclipse
- Data for exercises
- Hadoop is configured to store its data here
- Java Development Kit (JDK) installation
- Logs are configured to be saved in this directory
- Eclipse plugin to enable highlighting of Pig scripts
- Execute Java code, MapReduce jobs, and scripts from here
- Well-known shell scripts

Eclipse

The Eclipse workspace will contain three projects:
- Exercises – you will implement the hands-on exercises in this project
- Solutions – the solutions to the exercises can be found here
- HadoopSamples – code samples used throughout the slides

Eclipse Project

Projects follow the Maven directory structure:
- /src/main/java – Java packages and classes reside here
- /src/main/resources – non-Java artifacts
- /src/test/java – Java unit tests go here

To learn more about Maven, please visit http://maven.apache.org

Environment Variables

- The VM is set up with various environment variables to assist you with referencing well-known directories
- Environment variables are sourced from
  – /home/hadoop/Training/scripts/hadoop-env.sh
- For example:
  – echo $PLAY_AREA
  – yarn jar $PLAY_AREA/Solutions.jar ...

Environment Variables

- PLAY_AREA = /home/hadoop/Training/play_area
  – Run examples, exercises, and solutions from this directory
  – Jar files are copied here (by Maven)
- TRAINING_HOME = /home/hadoop/Training
  – Root directory for all of the artifacts for this class
- HADOOP_LOGS = $TRAINING_HOME/logs
  – Directory for logs; logs for each product are stored under it
  – ls $HADOOP_LOGS/ → hbase hdfs oozie pig yarn
- HADOOP_CONF_DIR = $HADOOP_HOME/conf
  – Hadoop configuration files are stored here

There is a variable per product referencing its home directory:
- CDH_HOME = $TRAINING_HOME/CDH4
- HADOOP_HOME = $CDH_HOME/hadoop-2.0.0-cdh4.0.0
- HBASE_HOME = $CDH_HOME/hbase-0.92.1-cdh4.0.0
- OOZIE_HOME = $CDH_HOME/oozie-3.1.3-cdh4.0.0
- PIG_HOME = $CDH_HOME/pig-0.9.2-cdh4.0.0
- HIVE_HOME = $CDH_HOME/hive-0.8.1-cdh4.0.0
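Taken together, these definitions imply that /home/hadoop/Training/scripts/hadoop-env.sh looks roughly like the sketch below (a reconstruction from the values on this slide, not the actual file; the real script may export more):

    #!/bin/bash
    # Root of all training artifacts
    export TRAINING_HOME=/home/hadoop/Training
    export PLAY_AREA=$TRAINING_HOME/play_area
    export HADOOP_LOGS=$TRAINING_HOME/logs

    # Per-product home directories under the CDH4 install
    export CDH_HOME=$TRAINING_HOME/CDH4
    export HADOOP_HOME=$CDH_HOME/hadoop-2.0.0-cdh4.0.0
    export HADOOP_CONF_DIR=$HADOOP_HOME/conf
    export HBASE_HOME=$CDH_HOME/hbase-0.92.1-cdh4.0.0
    export OOZIE_HOME=$CDH_HOME/oozie-3.1.3-cdh4.0.0
    export PIG_HOME=$CDH_HOME/pig-0.9.2-cdh4.0.0
    export HIVE_HOME=$CDH_HOME/hive-0.8.1-cdh4.0.0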

Firefox Bookmarks

- Folder with bookmarks to the Javadocs for each product used in this class
- Folder with bookmarks to the documentation packaged with each product used in this class
- Folders with bookmarks to the management web applications for each product; of course, the Hadoop product has to be running for those links to work

Scripts

- Scripts to start/stop ALL installed Hadoop products
  – startCDH.sh – start ALL of the products
  – stopCDH.sh – stop ALL of the products
  – These scripts are located in /home/hadoop/Training/scripts/
  – The scripts are on the PATH, so you can execute them from anywhere
- Start and then stop all of the products; check whether any processes failed to shut down, and if so kill them by PID:
  – startCDH.sh
  – stopCDH.sh
  – ps -ef | grep java
  – kill XXXX
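A minimal cleanup session, assuming a stray daemon survived stopCDH.sh (the PID 4242 below is a stand-in for whatever ps actually reports):

    stopCDH.sh
    ps -ef | grep [j]ava    # the bracket trick keeps grep itself out of the listing
    kill 4242               # replace with the PID of the leftover process
    kill -9 4242            # only if it still will not exit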

Developing Exercises

Proposed steps to develop code for the training exercises:
1. Add code, configuration, and/or scripts to the Exercises project
   – Utilize Eclipse
2. Run mvn package
   – Generates a JAR file with all of the Java classes and resources
   – For your convenience, copies the JAR file to a set of well-known locations
   – Copies scripts to a well-known location
3. Execute your code (a MapReduce job, an Oozie job, or a script)

1: Add Code to the Exercises Project

Write and edit code in Eclipse.
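If you prefer the terminal to Eclipse's pre-configured launcher, step 2 can be run directly (a sketch; the workspace path is an assumption based on the standard Eclipse layout):

    cd ~/workspace/Exercises
    mvn package    # compiles, runs the unit tests, and builds Exercises.jar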

2: Run mvn package

Select a project, then use Eclipse's pre-configured "mvn package" command; messages will appear on the Console view. Notice that it copies the jar file into the play area directory; we will be executing the majority of the code from the play area directory.

3: Execute Your Code

- Utilize the jar produced by step 2; Exercises.jar will reside in the PLAY_AREA directory
- Run your code from the PLAY_AREA directory:
  – cd $PLAY_AREA
  – yarn jar $PLAY_AREA/Exercises.jar \
      mapRed.workflows.CountDistinctTokens \
      /training/data/hamlet.txt \
      /training/playArea/firstJob
    (This is a MapReduce job implemented in the Exercises project and then packaged into a JAR file)
- Clean up after yourself; delete the output directory:
  – hdfs dfs -rm -r /training/playArea/firstJob
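After the job finishes, you can inspect the result before deleting it (a suggested check; part-r-00000 assumes the default single-reducer output naming):

    hdfs dfs -ls /training/playArea/firstJob
    hdfs dfs -cat /training/playArea/firstJob/part-r-00000 | head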

Save VM Option

- Instead of shutting down the OS, you can save the current OS state
  – When you load it again, the saved state will be restored

Well-Known Issues

- If you "save the machine state" instead of restarting the VM, HBase will not properly reconnect to HDFS
  – Solution: shut down all of the Hadoop products prior to closing the VM (run the stopCDH.sh script)
- The current VM allocates 3 GB of RAM; that is really not much given all of the Hadoop and MapReduce daemons
  – Solution: if your machine has more RAM to spare, increase it. When the VM is down, go to Settings → System → Base Memory
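A safe save-state routine that avoids the HBase issue, combining the guest-side script with the host-side VirtualBox CLI (the VM name is an assumption; use the name shown in VirtualBox Manager):

    # Inside the guest, stop everything first
    stopCDH.sh

    # On the host, save the machine state (CLI equivalent of the GUI option)
    VBoxManage controlvm "Hadoop Training VM" savestate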

© 2012 coreservlets.com and Dima May

Wrap-Up

Summary

- We now know more about the Ubuntu VM
- There are useful environment variables
- There are helpful Firefox bookmarks
- Use the management scripts to start/stop the Hadoop products
- Develop exercises utilizing Eclipse and Maven
- Look out for the well-known issues with running Hadoop on top of a VirtualBox VM

© 2012 coreservlets.com and Dima May

Questions?

More info:
- http://www.coreservlets.com/hadoop-tutorial/ – Hadoop programming tutorial
- Customized Hadoop training courses, at public venues or onsite at your organization
- General Java programming tutorial
- Java 8 tutorial
- JSF 2.2 tutorial
- PrimeFaces tutorial
- http://coreservlets.com/ – JSF 2, PrimeFaces, Java 7 or 8, Ajax, jQuery, Hadoop, RESTful Web Services, Android, HTML5, Spring, Hibernate, Servlets, JSP, GWT, and other Java EE training

