Apache Hadoop 2.6.0 Installation and Single-Node Cluster Configuration on Ubuntu


SDJ INFOSOFT PVT. LTD
Apache Hadoop 2.6.0 Installation and Single-Node Cluster Configuration on Ubuntu
A guide to install and set up a Single-Node Apache Hadoop 2.x cluster

Table of Contents

Software Requirements
Hardware Requirements
Introduction
1. Setting up the Ubuntu Desktop
1.1 Creating an Ubuntu VM Player instance
1.1.1 Download the VMware image
1.1.2 Open the image file
1.1.3 Play the Virtual Machine
1.1.4 Update the OS packages and their dependencies
1.1.5 Install Java and the OpenSSH server for Hadoop 2.6.0
1.2 Download the Apache Hadoop 2.6.0 binaries
1.2.1 Download the Hadoop package
2. Configure the Apache Hadoop 2.6.0 Single Node Server
2.1 Update the configuration files
2.1.1 Update the ".bashrc" file for user 'user'

2.2 Setup the Hadoop Cluster
2.2.1 Configure JAVA_HOME
2.2.2 Create NameNode and DataNode directories
2.2.3 Configure the default file system
2.2.4 Configure HDFS
2.2.5 Configure the YARN framework
2.2.6 Configure the MapReduce framework
2.2.7 Edit the /etc/hosts file
2.2.9 Creating SSH keys
2.2.10 Moving the key to the authorized keys
2.2.11 Start the DFS services
2.2.12 Perform the Health Check

Introduction

This setup and configuration document is a guide to setting up a Single-Node Apache Hadoop 2.0 cluster on an Ubuntu virtual machine on your PC. If you are new to both Ubuntu and Hadoop, this guide comes in handy to quickly set up a Single-Node Apache Hadoop 2.0 cluster on Ubuntu and start your Big Data and Hadoop learning journey.

The guide describes the whole process in two parts:

Section 1: Setting up the Ubuntu OS for Hadoop 2.0
This section is a step-by-step guide to downloading and configuring an Ubuntu virtual machine image in VMware Player, and provides the steps to install the prerequisites for a Hadoop installation on Ubuntu.

Section 2: Installing Apache Hadoop 2.0 and Setting up the Single Node Cluster
This section explains the primary Hadoop 2.0 configuration files, the Single-Node cluster configuration, and the Hadoop daemon start and stop process in detail.

1. Setting up the Ubuntu Desktop

This section describes the steps to download and create an Ubuntu image on VMware Player.

1.1 Creating an Ubuntu VM Player instance

The first step is to download an Ubuntu image and create an Ubuntu VMware Player instance.

1.1.1 Download the VMware image

Access the following link and download the Ubuntu 12.04 VMware image: t.html

1.1.2 Open the image file

Extract the Ubuntu VM image and open it in VMware Workstation. Click "Open a Virtual Machine" and select the path where you have extracted the image.

1.1.3 Play the Virtual Machine

You will see the screen below in VMware Workstation after the VM image creation completes.

You will get the home screen shown in the following image.

The user details for the virtual instance are:

Username: user
Password: password

Open the terminal to access the file system (Ctrl+Alt+T is the shortcut key to open a terminal).

1.1.4 Update the OS packages and their dependencies

The first task is to run 'apt-get update' to download the package lists from the repositories and "update" them to get information on the newest versions of packages and their dependencies. Type this command in the terminal:

sudo apt-get update

Problem note: if 'sudo apt-get update' fails with an error related to a lock file, remove the lock files and retry:

sudo rm /var/lib/apt/lists/lock
sudo rm /var/cache/apt/archives/lock

1.1.5 Install Java and the OpenSSH server for Hadoop 2.6.0

Check whether Java is already installed:

java -version

Use apt-get to install the JDK 7:

sudo apt-get install openjdk-7-jdk
java -version

Check the location of the folder where Java is installed:

cd /usr/lib/jvm
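The JAVA_HOME value used later in this guide assumes a 32-bit VM (java-7-openjdk-i386); on a 64-bit image the directory is java-7-openjdk-amd64 instead. One way to resolve the actual path of the JDK just installed:

# Follows the /usr/bin/java symlink chain to the real JDK location,
# e.g. /usr/lib/jvm/java-7-openjdk-i386/jre/bin/java on a 32-bit VM.
readlink -f $(which java)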

Install the OpenSSH server:

sudo apt-get install openssh-server

1.2 Download the Apache Hadoop 2.6.0 binaries

1.2.1 Download the Hadoop package

Download the binaries to your home directory. Use the default user 'user' for the installation.

In live production instances, a dedicated Hadoop user account is used for running Hadoop. It is not mandatory to use a dedicated Hadoop user account, but it is recommended, because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine (separating for security, permissions, backups, etc.).
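For reference only, since this guide keeps the default 'user' account: a dedicated account is typically created along these lines (the 'hadoop' group and 'hduser' user names are illustrative, not part of this setup):

# Illustrative only: create a dedicated group and user for Hadoop.
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser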

Click on the following link to download hadoop-2.6.0.tar.gz: mmon

Unpack the archive and review the package contents and configuration files:

tar -xvf hadoop-2.6.0.tar.gz
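As a quick orientation, a listing of the unpacked directory should look roughly like this (a sketch; exact files vary by release). bin/ and sbin/ hold the commands and daemon scripts, and etc/hadoop/ holds the configuration files edited below:

ls hadoop-2.6.0
# bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share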


Review the Hadoop configuration files.

After creating and configuring your virtual server, the Ubuntu instance is now ready for the installation and configuration of an Apache Hadoop 2.6.0 Single Node Cluster. This section describes in detail the steps to install Apache Hadoop 2.6.0 and configure a Single Node Apache Hadoop cluster.

2. Configure the Apache Hadoop 2.6.0 Single Node Server

This section explains the steps to configure the Single Node Apache Hadoop 2.6.0 server on Ubuntu.

2.1 Update the configuration files

2.1.1 Update the ".bashrc" file for user 'user'

Move to the HOME directory of user 'user' and edit the '.bashrc' file.

Update the '.bashrc' file to add the important Apache Hadoop environment variables for the user.

a) Change directory to home:

cd

b) Edit the file:

sudo gedit .bashrc

Add the lines below to the .bashrc file.

# Set Hadoop-related environment variables
export HADOOP_HOME=$HOME/hadoop-2.6.0
export HADOOP_CONF_DIR=$HOME/hadoop-2.6.0/etc/hadoop
export HADOOP_MAPRED_HOME=$HOME/hadoop-2.6.0
export HADOOP_COMMON_HOME=$HOME/hadoop-2.6.0
export HADOOP_HDFS_HOME=$HOME/hadoop-2.6.0
export YARN_HOME=$HOME/hadoop-2.6.0

# Set JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386

# Add the Hadoop bin/ directory to PATH
export PATH=$PATH:$HOME/hadoop-2.6.0/bin

c) Source the .bashrc file to set the Hadoop environment variables without having to invoke a new shell:

source ~/.bashrc

2.2 Setup the Hadoop Cluster

This section describes the detailed steps needed for setting up the Hadoop cluster and configuring the core Hadoop configuration files.

2.2.1 Configure JAVA_HOME

Configure JAVA_HOME in 'hadoop-env.sh'. This file specifies environment variables that affect the JDK used by the Apache Hadoop 2.6.0 daemons started by the Hadoop start-up scripts:

cd hadoop-2.6.0/etc/hadoop
sudo gedit hadoop-env.sh

Copy this line into hadoop-env.sh:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386

2.2.2 Create NameNode and DataNode directories

Check the HADOOP_HOME value, then create the DataNode and NameNode directories to store the HDFS data:

sudo mkdir -p $HADOOP_HOME/hadoop2_data/hdfs/namenode
sudo mkdir -p $HADOOP_HOME/hadoop2_data/hdfs/datanode
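Because the directories above are created with sudo, they end up owned by root, while the Hadoop daemons run as user 'user' and need write access to them. A minimal fix, assuming the default 'user' account from this guide:

# Make the HDFS data directories writable by the 'user' account
# that will run the NameNode and DataNode daemons.
sudo chown -R user:user $HADOOP_HOME/hadoop2_data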

Check that the hadoop2_data folder was created inside /home/user/hadoop-2.6.0.

2.2.3 Configure the default file system

The 'core-site.xml' file contains the configuration settings for Apache Hadoop Core, such as I/O settings that are common to HDFS, YARN and MapReduce. Configure the default file system (parameter: fs.default.name) used by clients in core-site.xml.

Note: copy the <configuration>...</configuration> block below and replace the same block in the core-site.xml file.

<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Here the hostname and port are the machine and port on which the NameNode daemon runs and listens. This also informs the NameNode as to which IP and port it should bind. The commonly used port is 9000, and you can also specify an IP address rather than a hostname.
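Once the Hadoop binaries are on the PATH (from the .bashrc changes above), the effective value can be read back from the command line; a sketch assuming the configuration above:

# Prints the client's default file system; expected output is
# hdfs://localhost:9000 (possibly with a deprecation warning, since
# fs.default.name was superseded by fs.defaultFS).
hdfs getconf -confKey fs.default.name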

2.2.4 Configure HDFS

This file contains the configuration settings for the HDFS daemons: the NameNode and the DataNodes. Configure hdfs-site.xml and specify the default block replication, and the NameNode and DataNode directories for HDFS. The actual number of replications can also be specified when a file is created; the default is used if replication is not specified at creation time.

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/user/hadoop-2.6.0/hadoop2_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/user/hadoop-2.6.0/hadoop2_data/hdfs/datanode</value>
  </property>
</configuration>
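As an illustration of per-file replication (the file names here are hypothetical, and on this single-node cluster a factor above 1 would simply leave files under-replicated): the replication factor can be supplied at creation time via a generic -D option, or changed later with -setrep.

# Write a file with an explicit replication factor.
hadoop fs -D dfs.replication=1 -put localfile.txt /demo.txt
# Change the replication factor of an existing file.
hadoop fs -setrep 1 /demo.txt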

2.2.5 Configure the YARN framework

This file contains the configuration settings for the YARN daemons, in this case the NodeManager.

<!-- yarn-site.xml -->
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

2.2.6 Configure the MapReduce framework

This file contains the configuration settings for MapReduce. Configure mapred-site.xml and specify the framework details:

cd hadoop-2.6.0/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
sudo gedit mapred-site.xml

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

2.2.7 Edit the /etc/hosts file

Run ifconfig in the terminal and note down the IP address. Then put this IP address in the /etc/hosts file, save the file and then close it.

cd
ifconfig

The IP address, localhost and ubuntu entries in this file are separated by tabs, as in the illustrative layout below.
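An illustrative layout (the 192.168.x.x address is a placeholder; use whatever address ifconfig reported on your VM):

# /etc/hosts -- example only; replace 192.168.159.130 with your VM's IP
127.0.0.1	localhost
192.168.159.130	ubuntu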

sudo gedit /etc/hosts

Note: if you do not change anything in the /etc/hosts file and only the first two lines are present, that is also fine. The /etc/hosts file is generally needed in a multi-node cluster.

2.2.9 Creating SSH keys

Generate an SSH key pair with an empty passphrase:

ssh-keygen -t rsa -P ""

2.2.10 Moving the key to the authorized keys:

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
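Hadoop's cluster start-up scripts (such as start-dfs.sh) use SSH to reach each node, which is why this key was created; the passwordless login can be verified now:

# Should log in without prompting for a password; type 'exit' to return.
ssh localhost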

2.2.11 Start the DFS services

The first step in starting up your Hadoop installation is formatting the Hadoop file system, which is implemented on top of the local file systems of your cluster. This is required only the first time you install Hadoop. Do not format a running Hadoop file system; this will erase all your data. To format the file system, run:

cd
hadoop namenode -format

----------------------- Reboot the system ------------------------

You are now all set to start the HDFS and YARN services, i.e. the NameNode, DataNode, ResourceManager and NodeManager, on your Apache Hadoop cluster. Start the HDFS daemons first:

cd hadoop-2.6.0/sbin/
./hadoop-daemon.sh start namenode
./hadoop-daemon.sh start datanode

Start the YARN daemons, i.e. the ResourceManager and NodeManager. Cross-check the service start-up using jps (the JVM process status tool).

./yarn-daemon.sh start resourcemanager
./yarn-daemon.sh start nodemanager
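A sketch of what jps should report once all four daemons are up (the PIDs shown are placeholders; Jps itself also appears in the list):

jps
# 4823 NameNode
# 5012 DataNode
# 5267 ResourceManager
# 5489 NodeManager
# 5610 Jps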

Start the History server:

./mr-jobhistory-daemon.sh start historyserver

Note: always suspend your VMware Workstation VM; do not shut it down, so that when you open your VM again your cluster will still be up. If you do shut it down, all your daemons will be down (not running) when you start the VM, so start all your daemons again, beginning with the NameNode; do not format the NameNode again.
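The note above favours suspending the VM, but when the daemons do need to be brought down cleanly, the same scripts accept a stop argument; a sketch, run from hadoop-2.6.0/sbin/:

# Stop the daemons in roughly the reverse order they were started.
./mr-jobhistory-daemon.sh stop historyserver
./yarn-daemon.sh stop nodemanager
./yarn-daemon.sh stop resourcemanager
./hadoop-daemon.sh stop datanode
./hadoop-daemon.sh stop namenode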

2.2.12 Perform the Health Check

a) Check the NameNode status:

http://localhost:50070/dfshealth.jsp

b) Check the JobHistory status:

http://localhost:19888/jobhistory

c) Browse the HDFS input and output files and log information:

http://localhost:50070
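The same health check can also be performed from the terminal; a sketch using the stock HDFS admin command:

# Summarises cluster capacity and lists live DataNodes; with this
# setup it should report exactly one live datanode.
hdfs dfsadmin -report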

