Hadoop Tiered Storage With Dell EMC Isilon And Dell EMC ECS Clusters


HADOOP TIERED STORAGE WITH DELL EMC ISILON AND DELL EMC ECS CLUSTERS

March 2021

Abstract

This solution guide describes how to easily expand storage to existing DAS Hadoop clusters with Dell EMC Isilon and Dell EMC ECS systems to provide immediate capacity, better storage efficiency, and reduced total cost of ownership.

H16659.3

SOLUTION GUIDE

Copyright

This document may contain certain words that are not consistent with Dell's current language guidelines. Dell plans to update the document over subsequent future releases to revise these words accordingly.

This document may contain language from third party content that is not under Dell's control and is not consistent with Dell's current guidelines for Dell's own content. When such third party content is updated by the relevant third parties, this document will be revised accordingly.

The information in this publication is provided as is. Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any software described in this publication requires an applicable software license.

Copyright 2017-2021 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, Dell Technologies, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries. Intel, the Intel logo, the Intel Inside logo and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. Other trademarks may be the property of their respective owners. Published in the USA 03/21 Solution Guide H16659.3.

Dell Inc. believes the information in this document is accurate as of its publication date. The information is subject to change without notice.

Contents

Chapter 1 Executive Summary
- Business case
- Solution overview
- Key results
- Document purpose
- Audience
- We value your feedback

Chapter 2 Technology Overview
- Reference architecture
- Key components
- Software resources

Chapter 3 Solution Design
- Deployment best practices
- Hadoop tiered storage with an Isilon or ECS cluster

Chapter 4 Hadoop Cluster Deployment and Integration with Isilon Cluster
- Overview
- Setting up the HDP cluster
- Setting up the Isilon cluster
- Creating Isilon access zones
- Enabling Kerberos on the HDP cluster
- Enabling Kerberos on the Isilon cluster
- Enabling Ranger and setting policies
- Validating HDP deployment and Isilon integration

Chapter 5 Hadoop Cluster Deployment and Integration with ECS Cluster
- Overview
- Setting up the HDP and ECS clusters
- Creating ECS buckets
- Installing ECS HDFS Client software
- Enabling Kerberos on the HDP cluster
- Enabling Kerberos on the ECS cluster
- Validating HDP deployment and ECS integration

Chapter 6 Sample Use Cases: MapReduce, Spark, and Hive
- Isilon use cases
- ECS use cases

Chapter 7 Conclusion
- Summary

Chapter 8 References
- Dell EMC documentation
- Hortonworks documentation
- VMware documentation

Appendix A Ambari Smoke Test Screenshots
- Ambari Server screenshots: Hadoop/Isilon
- Ambari Server screenshots: Hadoop/ECS

Appendix B Hadoop/Isilon Tests
- Ambari GUI smoke testing
- MapReduce testing without Kerberos
- Spark testing without Kerberos
- Hive-MapReduce/Tez testing without Kerberos
- TPC-DS testing
- Kerberos security testing
- Ranger policy testing
- Ranger policy with Kerberos security testing
- Ranger policy with Kerberos security testing on Hive warehouse
- DistCp in Kerberized and non-Kerberized cluster

Appendix C Hadoop/ECS Tests
- Ambari GUI smoke testing
- MapReduce testing without Kerberos
- Spark testing without Kerberos
- Hive-MapReduce/Tez testing without Kerberos
- TPC-DS testing
- Kerberos security testing
- MapReduce word count and Spark word count, line count on Kerberized cluster
- Kerberos security testing on Hive warehouse
- DistCp in Kerberized and non-Kerberized cluster

Chapter 1 Executive Summary

This chapter presents the following topics:
- Business case
- Solution overview
- Key results
- Document purpose
- Audience
- We value your feedback

Business case

Enterprises implementing digital transformation initiatives and data-driven decision making often must deal with exponential data growth that is not provided for in their IT budgets. For most enterprises, most of this data growth is “cold data,” which is historical in nature and does not require frequent or low-latency access. The remainder of the data growth is in “hot data,” which is recently generated data that requires frequent and low-latency access.

A Hadoop solution that consists of a hot tier and a cold tier enables the enterprise to store hot data in a high-throughput, low-latency cluster with low cost per MB/s and cold data in a capacity-dense cluster with low cost per TB.

Solution overview

This Hadoop tiered storage solution provides an architecture that can support cross-namespace analytics. With this solution, you can use both direct-attached storage (DAS) and alternate storage such as Dell EMC Isilon and Dell EMC Elastic Cloud Storage (ECS), and run analytics jobs and toolsets across data that spans these storage tiers.

The Hadoop tiered storage solution from Dell EMC enables:
- Cold data storage in a shared storage cluster that is based on the Isilon or ECS system, providing outstanding capacity density and low cost per TB.
- Hot data storage in a DAS cluster that is based on the Dell EMC PowerEdge server, which delivers high performance and low cost per MB/s.
- Processing of data by YARN- or Mesos-based Hadoop applications across both clusters, which are subject to data governance, risk management, and compliance management. The DAS and Isilon clusters represent separate namespaces, so Hadoop applications and governance run on the federated namespace.

Deployment options are as follows:
- Customers who have existing Hadoop clusters running DAS and who need to expand their Hadoop clusters to hundreds of TBs or PBs can add an Isilon or ECS cluster to their existing Hadoop cluster to handle the high volume of data growth.
- Customers who plan to deploy a large Hadoop data lake can build the Hadoop tiered storage solution with DAS and Isilon or ECS clusters.

Key results

Dell EMC and Hortonworks have validated multiple configurations for Hadoop tiered storage with a logical Hadoop cluster (DAS storage) and an infrastructure cluster (Isilon or ECS system) that meet or exceed the functional objectives of this solution. You can match most needs with an approved configuration. By combining the Hortonworks Data Platform (HDP) cluster (logical Hadoop cluster) with the flexibility of an Isilon or ECS infrastructure cluster, you can scale the solution to handle future requirements without extensive upgrades or expensive replatforming.

Document purpose

This solution guide provides detailed information for evaluating the applicability of Hadoop tiered storage for your environment. The guide provides solution validation, including results of rigorous testing of the major components of the Hadoop cluster and their functionality in the tiered storage environment.

Audience

This guide is for IT administrators, storage administrators, virtualization administrators, system administrators, IT managers, and those who evaluate, acquire, manage, maintain, or operate Hadoop cluster environments.

We value your feedback

Dell EMC and the author of this document welcome your feedback on the Ready Stack and the Ready Stack documentation. Contact the Dell EMC Solutions team by email or provide your comments by completing our documentation survey.

Authors: Boni Bruno, Kirankumar Bhusanurmath, Tao Guo, Eric Wang, Karen Johnson

Chapter 2 Technology Overview

This chapter presents the following topics:
- Reference architecture
- Key components
- Software resources

Reference architecture

Figure 1 shows the reference architecture of Hadoop tiered storage with an Isilon or ECS system. This reference architecture provides for hot-tier data in high-throughput, low-latency local storage and cold-tier data in capacity-dense remote storage. You can deploy the Hadoop cluster on physical hardware servers or a virtualization platform.

Figure 1. Reference architecture of Hadoop tiered storage with an Isilon or ECS system

Figure 2 shows the high-level reference architecture of Hadoop tiered storage with an Isilon cluster.

Figure 2. Reference architecture of Hadoop tiered storage with an Isilon cluster

Key components

Dell EMC Isilon

The Dell EMC Isilon scale-out network-attached storage (NAS) platform provides Hadoop clients with direct access to Big Data through a Hadoop File System (HDFS) interface. Powered by the distributed Dell EMC Isilon OneFS operating system, an Isilon cluster delivers a scalable pool of storage with a global namespace. The distributed OneFS operating system combines the memory, I/O, CPUs, and disks of the nodes into a cohesive storage unit to present a global namespace as a single file system.

Hadoop compute clients access the data that is stored in an Isilon cluster by using the HDFS protocol. Every node in the cluster can act as a NameNode and a DataNode. Each node boosts performance and expands the cluster's capacity. For Hadoop analytics, the Isilon scale-out distributed architecture minimizes bottlenecks, rapidly serves big data, and optimizes performance for analytics jobs. The NameNode daemon is a distributed process that runs on all the nodes in the cluster. A compute client can connect to any node in the cluster to access NameNode services. The nodes work together as peers in a shared-nothing hardware architecture with no single point of failure.

An Isilon cluster is platform agnostic for compute. You can run most of the common Hadoop distributions with an Isilon cluster. Clients running different Hadoop distributions or versions can simultaneously connect to the cluster.

Dell EMC ECS

The Dell EMC ECS platform is a complete software-defined cloud storage system that supports the storage, manipulation, and analysis of unstructured data on a massive scale on commodity hardware. You can deploy the ECS platform as a turnkey storage appliance or as a software product on a set of qualified commodity servers and disks. The ECS platform offers the cost advantages of a commodity infrastructure and the enterprise reliability, availability, and serviceability of traditional arrays.

The ECS scalable architecture includes multiple nodes and attached storage devices. The nodes and storage devices are commodity components, similar to devices that are generally available, and are housed in one or more racks.

An ECS appliance consists of a rack, rack components, and preinstalled software that are supplied by Dell EMC. An ECS software-only solution uses a rack and commodity nodes that are not supplied by Dell EMC. A cluster consists of multiple racks.

ECS HDFS is a Hadoop Compatible File System (HCFS) that enables you to run Hadoop 2.x applications on top of your ECS infrastructure. You can configure your Hadoop distribution to run against the built-in Hadoop file system, ECS HDFS, or any combination of HDFS, ECS HDFS, or other HCFSs available in your environment.

Hortonworks Data Platform

HDP is an enterprise-level, hardened Hadoop distribution that combines the most useful and stable versions of Apache Hadoop and its related projects into a single tested and certified package. HDP enables Enterprise Hadoop by providing a complete set of essential Hadoop capabilities. It delivers the core elements of Hadoop (scalable storage and distributed computing) as well as all of the necessary enterprise capabilities such as security, high availability, and integration with a broad range of hardware and software solutions.

Ambari

Apache Ambari is a utility that provides installation, monitoring, and management capabilities for an HDP cluster. The Ambari web client and REST APIs are used to deploy, operate, manage, and monitor the HDP cluster.

Kerberos

Kerberos is a network authentication protocol designed to provide strong authentication for client/server applications by using secret-key cryptography in most distributed systems, including HDP. Kerberos provides secure and reliable authentication to multiple applications. Isilon and ECS systems support the Kerberos authentication feature using Kerberos Key Distribution Center (KDC) services.

Ranger

Apache Ranger is a centralized management console that enables you to monitor and manage data security across the Hortonworks Hadoop distribution system. A Ranger administrator can define and apply authorization policies across Hadoop components including HDFS. Isilon OneFS 8.0.1.0 and later releases support Ranger HDFS policies. In an Isilon OneFS cluster with Hadoop deployment, Ranger authorization policies serve as a filter before the application of native file access control.

Software resources

Table 1 lists the solution software resources.

Table 1. Software resources

Software                           Version
Red Hat Enterprise Linux 64-bit    7.2
Apache Ambari                      2.6.0.0
Hortonworks Data Platform          2.6.3.0
MIT Kerberos                       5
Dell EMC OneFS                     8.0.1.1
Dell EMC ECS HDFS Client           3.0.0.0
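As a rough illustration of the Ambari REST APIs mentioned in this chapter, the following shell sketch builds the URL that lists the services of one cluster and queries it with curl. It assumes Ambari's default HTTP port 8080 and HTTP basic authentication; the cluster name and the user name are hypothetical placeholders and do not come from this guide.

```shell
#!/usr/bin/env bash
# Sketch only: the cluster name and user name passed in are hypothetical.

# Build the Ambari REST URL that lists the services of one cluster.
# Arguments: Ambari host, cluster name, optional port (assumed default 8080).
ambari_services_url() {
  local host="$1" cluster="$2" port="${3:-8080}"
  printf 'http://%s:%s/api/v1/clusters/%s/services\n' "$host" "$port" "$cluster"
}

# Query the endpoint with HTTP basic authentication.
# curl prompts for the password because only the user name is supplied.
ambari_list_services() {
  local host="$1" cluster="$2" user="$3"
  curl --silent --show-error --user "$user" \
    "$(ambari_services_url "$host" "$cluster")"
}
```

For example, `ambari_list_services hdp-ambari.bigdata.emc.local mycluster admin` would request the service list as JSON, where `mycluster` and `admin` stand in for site-specific values.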

Chapter 3 Solution Design

This chapter presents the following topics:
- Deployment best practices
- Hadoop tiered storage with an Isilon or ECS cluster

Deployment best practices

Spreading data to an Isilon or ECS cluster

Spread data to an Isilon or ECS cluster when cold data grows beyond 75 TB but is below 64 PB.

For 1 PB of usable data:
- The acquisition cost of a Hadoop cluster with an Isilon or ECS cluster equals 60 percent of DAS.
- The rack space of a Hadoop cluster with an Isilon or ECS cluster equals 40 percent of DAS.

For more than 1 PB of usable data:
- The acquisition cost of a Hadoop cluster with an Isilon or ECS cluster could be more than 60 percent of DAS.

Partitioning large tables

Partition tables if you collect time series data or logs that accumulate over time and you only need to query parts of the data. You can store the data in a subdirectory tree such as year/month/day, continent/country/region/city, and so on, enabling your query to skip the irrelevant data.

ORCFile format for Hive tables

Hive supports ORCFile, a new table storage format that provides significantly increased speed through techniques such as predicate push-down, compression, and more. Using ORCFile for every Hive table provides fast response times for your Hive queries.

Directory structure considerations

You can use directory structures to organize data by department, business unit, lifecycle stage (new versus old, hot versus cold, raw versus derived), or other business concerns. Access control is an important consideration as well, especially in multitenant environments. Unlike more advanced traditional DBMS access-control models where you can carve up access based on metadata, HDFS is a distributed filesystem; directories can represent your metadata.

Tools like Hive understand partition pruning during query execution. Each partition is simply a directory with a special naming convention that indicates the range of the table to which the contained data belongs (at least in range-based partitioning). Tools other than Hive can have similar partition pruning simply by including only the directories that are known to contain data of interest.

Key directories to be aware of include:
- /user/<username>: Home directories and scratch pads for users
- /tmp: Sticky-bit set scratch space for tools and users (no guarantee on longevity)
- /data: Canonical, raw data sets ingested from other systems and applications
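The directory-based partition pruning described above can be sketched in shell. This is a minimal illustration under stated assumptions, not the guide's method: it assumes partition directories are named `date=YYYYMMDD` (a Hive-style convention chosen for the example), and it walks a local directory tree so that it runs anywhere. On a real cluster the listing would come from the Hadoop file system instead. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes partition directories named date=YYYYMMDD under a
# local base directory.

# Print the partition directories under base whose date value lies in the
# inclusive range [start, end]. Directories outside the range are skipped,
# which is the pruning step.
partitions_in_range() {
  local base="$1" start="$2" end="$3" dir day
  for dir in "$base"/date=*/; do
    [ -d "$dir" ] || continue            # glob matched nothing
    day="${dir%/}"                       # strip the trailing slash
    day="${day##*date=}"                 # keep only the YYYYMMDD part
    case "$day" in
      ''|*[!0-9]*) continue ;;           # ignore malformed partition names
    esac
    if [ "$day" -ge "$start" ] && [ "$day" -le "$end" ]; then
      printf '%s\n' "${dir%/}"
    fi
  done
}
```

A job that only needs January 2017 would then read the directories printed by `partitions_in_range /data/seclogs 20170101 20170131` and never open the rest of the data set.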

For example:

/data/<dataset name>/<optional partitions>

where <dataset name> is the equivalent of a table name in an RDBMS. Optionally, data sets can be partitioned by n columns, depending on the use case.

Partitioned security log data by day example:

/data/seclogs/date=20170101/{x.avro,y.avro,z.avro}
/data/seclogs/date=20170102/{x.avro,y.avro,z.avro}

ETL directory example:

/etl/<group>/<application>/<process>/{incoming,working,complete,failed}

where <group> is the line of business or group (research, search quality, fraud analysis), <application> is the name of the application the process supports, and <process> is for applications that have multiple processing stages. Each process "queue" could have four state directories. For example:
- incoming: Newly arriving files drop off here. A process automatically renames them into a temp directory under working to indicate that they are in progress.
- working: This directory contains a timestamped directory for each attempt at processing the files. Files in these directories that are older than x require human intervention.
- complete: After an ETL process finishes processing a file in working, this is where it could land.
- failed: If an ETL process permanently rejects a file, it moves the file here. If the directory contains more than 0 files, it requires human intervention.

This example of an ETL directory structure shows four scenarios only. You could extend the structure for your particular use cases.

The general idea is to develop a directory structure to support a data lifecycle that can be controlled by directories for partitions, ETL processes, user data, and the like.

You can apply access control to individual processes, groups, applications, or data sets. Even partitions can be separately controlled in terms of access (on user type or line of business for data sets, for example).

Directory structure design is a complex topic. Dell EMC offers professional services to assist in directory structure design as well as other Hadoop-related services.
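The four-state ETL queue described above can be sketched as a few shell functions. This is an illustration under stated assumptions rather than a reference implementation: it operates on a local directory tree so that it is runnable as written, the function names are hypothetical, and the timestamp format of the attempt directory is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch only: local-filesystem stand-in for an ETL process queue.

# Create the four state directories for one process queue.
etl_init() {
  local queue="$1" state
  for state in incoming working complete failed; do
    mkdir -p "$queue/$state"
  done
}

# Claim every file in incoming by moving it into a new timestamped attempt
# directory under working. Prints the attempt directory.
etl_claim() {
  local queue="$1" attempt file
  attempt="$queue/working/$(date +%Y%m%dT%H%M%S)"
  mkdir -p "$attempt"
  for file in "$queue"/incoming/*; do
    [ -e "$file" ] || continue           # incoming was empty
    mv "$file" "$attempt/"
  done
  printf '%s\n' "$attempt"
}

# Move one claimed file to its final state, either complete or failed.
etl_finish() {
  local queue="$1" file="$2" state="$3"
  case "$state" in
    complete|failed) mv "$file" "$queue/$state/" ;;
    *) echo "unknown state: $state" >&2; return 1 ;;
  esac
}
```

A processing script would call `etl_claim` once per run, work through the files in the printed attempt directory, and call `etl_finish` for each file with the outcome. Moving files between directories, rather than copying them, keeps each file in exactly one state at a time.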

Hadoop tiered storage with an Isilon or ECS cluster

Overview

The solution architectures of Hadoop with Isilon and Hadoop with ECS enable you to run analytics jobs and toolsets on data that is spread across both DAS and Isilon or ECS storage tiers.

Figure 3 shows the Hadoop with Isilon solution architecture. Figure 4 shows the Hadoop with ECS solution architecture.

Figure 3. Hadoop tiered storage with Isilon solution architecture

Figure 4. Hadoop tiered storage with ECS solution architecture

Hadoop cluster design

Table 2 describes the Hadoop cluster nodes, their roles, and the services running on them.

Table 2. Hadoop cluster services and instance roles

Host: hdp-ambari.bigdata.emc.local
Instance role: Ambari Server
Services on the node: Hortonworks SmartSense Tool (HST) Agent, Kerberos Client, Metrics Monitor, SNameNode

Host: hdp-master.bigdata.emc.local
Instance role: HDP Master/HDP Client
Services on the node: Activity Analyzer, Activity Explorer, App Timeline Server, HCat Client, HDFS Client, History Server, Hive Client, Hive Metastore, HiveServer2, HST Agent, HST Server, Infra Solr Client, Infra Solr Instance, Kerberos Client, MapReduce2 Client, Metrics Collector, Grafana, Metrics Monitor, NameNode, Pig Client, Ranger Admin, Ranger Tagsync, Ranger Usersync, ResourceManager, Slider Client, Spark2 Client, Spark2 History Server, Spark2 Thrift Server, Tez Client, WebHCat Server, YARN Client, ZooKeeper Client

Host: hdp-worker01.bigdata.emc.local
Instance role: Worker node 01
Services on the node: DataNode, HCat Client, HDFS Client, Hive Client, HST Agent, Infra Solr Client, Kerberos Client, MapReduce2 Client, Metrics Monitor, NodeManager, Pig Client, Slider Client, Spark2 Client, Tez Client, YARN Client, ZooKeeper Client, ZooKeeper Server

Host: hdp-worker02.bigdata.emc.local
Instance role: Worker node 02
Services on the node: DataNode, HCat Client, HDFS Client, Hive Client, HST Agent, Infra Solr Client, Kerberos Client, MapReduce2 Client, Metrics Monitor, NodeManager, Pig Client, Slider Client, Spark2 Client, Tez Client, YARN Client, ZooKeeper Client, ZooKeeper Server

Host: hdp-worker03.bigdata.emc.local
Instance role: Worker node 03
Services on the node: DataNode, HCat Client, HDFS Client, Hive Client, HST Agent, Infra Solr Client, Kerberos Client, MapReduce2 Client, Metrics Monitor, NodeManager, Pig Client, Slider Client, Spark2 Client, Tez Client, YARN Client, ZooKeeper Client, ZooKeeper Server

Chapter 4 Hadoop Cluster Deployment and Integration with Isilon Cluster

This chapter presents the following topics:
- Overview
- Setting up the HDP cluster
- Setting up the Isilon cluster
- Creating Isilon access zones
- Enabling Kerberos on the HDP cluster
- Enabling Kerberos on the Isilon cluster
- Enabling Ranger and setting policies
- Validating HDP deployment and Isilon integration

Overview

Table 3 lists the process flow for the Hadoop cluster deployment with an Isilon cluster.

Table 3. Hadoop cluster deployment and integration with Isilon cluster

Step    Action
1       Set up the HDP cluster
2       Set up the Isilon cluster
3       Create an Isilon access zone
4       Enable Kerberos on the HDP cluster
5       Enable Kerberos on the Isilon cluster
6       Enable Ranger and set policies
7       Validate HDP deployment and Isilon integration

Setting up the HDP cluster

Installing Ambari Server

Ambari Server automates the installation and configuration of HDP regardless of scale or deployment environment. It also helps to manage and monitor the Apache Hadoop cluster and provides an intuitive Hadoop management web UI.

Before you begin HDP cluster deployment, set up Ambari Server. For this solution, we set up Ambari Server using a virtual machine on one shared ESXi host.

The following steps provide instructions for setting up Ambari Server. For more details about the installation process, see Apache Ambari Installation on the Hortonworks website.

1. Find an available physical server or virtual machine to host Ambari Server.
2. Install RHEL 7.2 or later using the default installation option, Minimal Install.
3. Set up the IP address, netmask, and hostname.
4. Log in to the server using the root account.
5. Create an Ambari Server local repository configuration file (/etc/yum.repos.d/ambari.repo).

Note: We created .x/updates/2.6.3.0/hdp.repo and /2.x/updates/2.6.0.0/ambari.repo as the local repository for the Ambari Server 2.6.0.0 and HDP 2.6.3.0 packages obtained from the Hortonworks public repository.

6. Generate SSH key pairs without passwords:

   clear &&
   ssh-keygen &&
   cd /root/.ssh &&
   cat id_rsa.pub >> authorized_keys &&
   chmod 600 /root/.ssh/authorized_keys &&
   ll /root/.ssh/authorized_keys &&
   echo "done"

7. Copy the passwordless SSH key pairs to all Hadoop nodes:

   clear &&
   ssh root@hdp-master01 "mkdir -p /root/.ssh && chmod 700 /root/.ssh" &&
   scp /root/.ssh/authorize

