Big Data Analytics - Learnerspoint


Learners Point Training | Big Data Analytics | Upskill now | learnerspoint.org

Training summary
Big Data Analytics is the process of gathering, managing, and analyzing large sets of data (Big Data) to uncover patterns and other useful information. These patterns are a goldmine of information, and analyzing them provides insights that organizations can use to make business decisions. This analysis is essential for large organizations such as Facebook, which serves over a billion users every day and uses the data it collects to provide a better user experience. An IBM listing states that demand for data science and analytics roles was expected to grow from 364,000 to nearly 2,720,000 by 2020. According to a recent Forrester study, companies analyze only about 12% of the data at their disposal; the remaining 88% goes unused, mainly due to a lack of analytics capability and restrictive data silos. Imagine the market impact if companies started analyzing 100% of the data available to them. For major companies, there is no better time than now to invest in Big Data Analytics, and it is paramount that developers upskill themselves with analytical skills and claim their share of the Big Data career pie.

Located at the heart of Dubai, Learners Point is a well-recognized institute for corporate and individual training in the MENA region and has contributed to the career success of more than 110,000 professionals since its founding in 2001. We are certified to the ISO 9001:2015 quality management system. Our training institute is licensed by the Government of Dubai, UAE, and our certifications are widely recognized by employers around the globe. We are also associated with CPD UK, the premier accreditation service provider in the United Kingdom.

Who should attend the program?
This is an introductory course for freshers who would like to build a career in the distributed-computing world, and for lateral hires who want to learn frameworks like Spark; prior Hadoop knowledge is a plus. It suits:
Software developers and architects
Analytics professionals
Senior IT professionals
Testing and mainframe professionals
Data management professionals
Business intelligence professionals
Project managers
Aspiring data scientists
Graduates looking to build a career in Big Data Analytics

Objectives
Understanding the core concepts of Hadoop, including the Hadoop Distributed File System (HDFS) and MapReduce (MR)
Understanding NoSQL databases like HBase and Cassandra
Understanding the Hadoop ecosystem: Hive, Pig, Sqoop and Flume
Acquiring knowledge of related aspects such as scheduling Hadoop jobs using Python, R, Ruby, etc.
Developing batch analytics applications for a UK web-based news channel to surface news and engage customers with customized recommendations
Integrating clickstream and sentiment analytics into the UK web-based news channel
Deep knowledge of Hadoop across the ingestion phase (Flume and Sqoop), storage phase (HDFS and HBase), processing phase (MR, Hive, Pig and Spark), cluster management (standalone and YARN) and integrations (HCatalog, ZooKeeper and Oozie)
Accelerated career growth and an increased pay package due to Hadoop skills

Course outline
Introducing big data & hadoop
HDFS (hadoop distributed file system)
Hadoop daemon processes
Hadoop installation modes and HDFS
Hadoop developer tasks
Hadoop ecosystems
Data analytics using Pentaho as an ETL tool
Integrations

Introducing big data & hadoop
Learning objective: You will be introduced to real-world Big Data problems and learn how to solve them with state-of-the-art tools. Understand how Hadoop improves on traditional processing with its outstanding features. You will get to know Hadoop's background and the different Hadoop distributions available in the market, and prepare the Unix box for the training.
Topics:
1.1 Big data introduction:
What is big data
Data analytics
Big data challenges
Technologies supported by big data

1.2 Hadoop introduction:
What is hadoop?
History of hadoop
Basic concepts
Future of hadoop
The hadoop distributed file system
Anatomy of a hadoop cluster
Breakthroughs of hadoop
Hadoop distributions:
Apache hadoop
Cloudera hadoop
Hortonworks hadoop
MapR hadoop
Hands On: Install a virtual machine using VMware Player on the host machine and work with the basic Unix commands needed for Hadoop.

Hadoop daemon processes
Learning objective: You will learn the different daemons and their functionality at a high level.
Topics:
Name node
Data node
Secondary name node
Job tracker
Task tracker
Hands On: Create a Unix shell script to start all the daemons at once. Start HDFS and MR separately.

HDFS (Hadoop distributed file system)
Learning objective: You will get to know how to write and read files in HDFS, understand how the name node, data node and secondary name node take part in the HDFS architecture, and learn the different ways of accessing HDFS data.
Topics:
Blocks and input splits
Data replication
Hadoop rack awareness
Cluster architecture and block placement
Accessing HDFS: Java approach, CLI approach
Hands On: Write a shell script that writes and reads files in HDFS. Change the replication factor at three levels. Use Java to work with HDFS. Run various HDFS commands, including admin commands.
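Two of the ideas above, block splitting and replication, come down to simple arithmetic. The sketch below is plain Python, assuming the common Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3; the function names are ours for illustration, not part of any Hadoop API:

```python
import math

def hdfs_block_count(file_size_bytes, block_size_bytes=128 * 1024 * 1024):
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return max(1, math.ceil(file_size_bytes / block_size_bytes))

def raw_storage_used(file_size_bytes, replication=3):
    """Total bytes stored across the cluster, counting every replica."""
    return file_size_bytes * replication

mb = 1024 * 1024
# A 300 MB file with 128 MB blocks spans 3 blocks (128 + 128 + 44 MB),
# and with replication factor 3 it consumes 900 MB of raw cluster storage.
blocks = hdfs_block_count(300 * mb)
raw = raw_storage_used(300 * mb)
```

This is also why changing the replication factor (one of the hands-on tasks) directly changes cluster capacity planning: raw usage scales linearly with it.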

Hadoop Installation Modes and HDFS
Learning Objective: You will learn the different modes of Hadoop, understand pseudo-distributed mode from scratch and work with its configuration. You will learn the functionality of the different HDFS operations and see a visual representation of HDFS read and write actions involving the namenode and datanode daemons.
Topics:
Local mode
Pseudo-distributed mode
Fully distributed mode
Pseudo mode installation and configuration
HDFS basic file operations
Hands On: Install VirtualBox Manager and install Hadoop in pseudo-distributed mode. Change the configuration files required for pseudo-distributed mode. Perform different file operations on HDFS.

Hadoop Developer Tasks
Learning Objective: Understand the different phases in MapReduce, including the Map, Shuffle, Sort and Reduce phases. Get a deep understanding of the life cycle of an MR job under YARN submission. Learn about the distributed cache concept in detail with examples. Write a word-count MR program and monitor the job using the Job Tracker and YARN consoles. Also learn about further use cases.
Topics:
Basic API concepts
The driver class
The mapper class
The reducer class
The combiner class
The partitioner class
Examining a sample MapReduce program with several examples
Hadoop's Streaming API
Hands On: Write an MR job from scratch, implement different logic in the mapper and reducer, and submit the MR job in standalone and distributed mode. Also write a word-count MR job, calculate the average salary of employees who meet certain conditions, and perform sales calculations using MR.
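The Map, shuffle-and-sort and Reduce phases above can be simulated locally in a few lines of Python. This is only a sketch of the MapReduce contract, not cluster code: on Hadoop the mapper and reducer would either implement the Java classes listed in the topics or read stdin via the Streaming API:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every token in the line.
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    # Reduce phase: sum the counts for one key.
    return word, sum(counts)

def run_job(lines):
    # Simulate shuffle-and-sort: collect all map output, sort by key,
    # group by key, then reduce each group -- the same contract Hadoop
    # enforces between its Map and Reduce phases.
    mapped = sorted(kv for line in lines for kv in mapper(line))
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(mapped, key=itemgetter(0))
    )

result = run_job(["big data big analytics", "big data"])
# result == {"analytics": 1, "big": 3, "data": 2}
```

A combiner, when used, is simply the reducer logic applied to each mapper's local output before the shuffle, cutting network traffic.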

Hadoop ecosystems
6.1 PIG
Learning objective: Understand the importance of Pig in the Big Data world, the Pig architecture, and Pig Latin commands for performing complex operations on relations, as well as Pig UDFs and aggregation functions with the Piggybank library. Learn how to pass dynamic arguments to Pig scripts.
Topics:
PIG concepts
Install and configure PIG on a cluster
PIG vs MapReduce and SQL
Write sample PIG Latin scripts
Modes of running PIG
PIG UDFs
Hands On: Log in to the Pig grunt shell to issue Pig Latin commands in different execution modes. Explore the different ways of loading and lazily transforming Pig relations. Register a UDF in the grunt shell and perform replicated join operations.
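As a rough intuition for what a Pig GROUP BY followed by FOREACH ... GENERATE SUM does to a relation, here is a conceptual Python equivalent. It is illustrative only; real Pig Latin compiles to MapReduce jobs running over HDFS data, and the relation name `sales` is a made-up example:

```python
from collections import defaultdict

# Conceptual equivalent of the Pig Latin statements:
#   grouped = GROUP sales BY region;
#   totals  = FOREACH grouped GENERATE group, SUM(sales.amount);
def group_and_sum(sales):
    totals = defaultdict(int)
    for region, amount in sales:
        totals[region] += amount
    return dict(totals)

sales = [("uk", 100), ("us", 250), ("uk", 50)]
totals = group_and_sum(sales)
# totals == {"uk": 150, "us": 250}
```

The "lazily" in the hands-on matters here: Pig builds a logical plan from such statements and only executes it when output is requested (e.g. DUMP or STORE).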

6.2 HIVE
Learning objective: Understand the importance of Hive in the Big Data world and the different ways of configuring the Hive metastore. Learn the different types of tables in Hive, and how to optimize Hive jobs using partitioning and bucketing and by passing dynamic arguments to Hive scripts. You will also get an understanding of joins, UDFs, views, etc.
Topics:
Hive concepts
Hive architecture
Installing and configuring HIVE
Managed tables and external tables
Joins in HIVE
Multiple ways of inserting data into HIVE tables
CTAS, views, alter tables
User defined functions in HIVE
Hive UDF
Hands On: Execute Hive queries in different modes. Create internal and external tables. Perform query optimization by creating tables with partitioning and bucketing. Run system-defined and user-defined functions, including explode and window functions.
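The partitioning and bucketing optimizations above can be sketched with two small helpers. These are illustrative approximations, not Hive internals: the warehouse path layout follows Hive's usual `key=value` partition-directory convention, but the table name is invented and Python's `hash()` merely stands in for Hive's own hash function:

```python
def partition_path(table, dt):
    # A partitioned Hive table stores each partition in its own HDFS
    # directory, so a query filtered on dt reads only that directory
    # instead of scanning the whole table (partition pruning).
    return f"/user/hive/warehouse/{table}/dt={dt}"

def bucket_for(key, num_buckets):
    # Bucketing assigns each row to bucket hash(key) mod N, producing
    # N files per partition; bucketed joins can then match bucket to
    # bucket instead of shuffling all rows.
    return hash(key) % num_buckets
```

For example, a table bucketed 8 ways on an integer user id would place user 42 in bucket `hash(42) % 8`.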

6.3 SQOOP
Learning Objectives: Learn how to import data, both fully and incrementally, from an RDBMS into HDFS and Hive tables, and how to export data from HDFS and Hive tables back to an RDBMS. Learn the architecture of Sqoop import and export.
Topics:
SQOOP concepts
SQOOP architecture
Connecting to RDBMS
Internal mechanism of import/export
Import data from Oracle/MySQL to HIVE
Export data to Oracle/MySQL
Other SQOOP commands
Hands On: Trigger a shell script that calls Sqoop import and export commands. Learn to automate Sqoop incremental imports by recording the last value of the appended column. Run a Sqoop export from a Hive table directly to an RDBMS.
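The incremental-import idea behind `sqoop import --incremental append --check-column ... --last-value ...` can be sketched in Python: only rows whose check column exceeds the recorded last value are pulled, and the watermark then advances. The function name and the `id` field are illustrative, not part of Sqoop:

```python
def incremental_import(rows, last_value):
    # Pull only rows whose check column ("id" here) is greater than the
    # recorded watermark, then advance the watermark to the new maximum --
    # the same bookkeeping Sqoop does for --incremental append.
    new_rows = [row for row in rows if row["id"] > last_value]
    new_last = max((row["id"] for row in new_rows), default=last_value)
    return new_rows, new_last

source_table = [{"id": 1}, {"id": 2}, {"id": 3}]
imported, watermark = incremental_import(source_table, 1)
# imported == [{"id": 2}, {"id": 3}], watermark == 3
```

Automating this is exactly the hands-on task: store the returned last value after each run and feed it into the next import.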

6.4 HBASE
Learning Objectives: Understand the different types of NoSQL databases and the CAP theorem. Learn the different DDL and CRUD operations of HBase. Understand the HBase architecture and the importance of ZooKeeper in managing HBase. Learn HBase column-family optimization and client-side buffering.
Topics:
HBASE concepts
ZOOKEEPER concepts
HBASE and region server architecture
File storage architecture
NoSQL vs SQL
Defining schema and basic operations
DDLs
DMLs
HBASE use cases
Hands On: Create HBase tables using the shell and perform CRUD operations with the Java API. Change column-family properties and perform the sharding process. Create tables with multiple splits to improve the performance of HBase queries.
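HBase's storage layout, row key to column family to qualifier to value, can be modelled with nested dictionaries. The toy class below only illustrates that data model and the CRUD operations; a real client (the Java API or the HBase shell used in the hands-on) talks to region servers over the network, and the class name is ours:

```python
class ToyHBaseTable:
    # In-memory model of HBase's layout: row key -> column family ->
    # qualifier -> value. Column families are fixed at table-creation
    # time (DDL), while qualifiers within a family are free-form.
    def __init__(self, column_families):
        self.column_families = set(column_families)
        self.rows = {}

    def put(self, row_key, family, qualifier, value):
        # Create/update: writes go to a declared column family.
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

    def get(self, row_key, family, qualifier):
        # Read: missing cells return None rather than raising.
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier)

    def delete(self, row_key):
        # Delete: remove the whole row if present.
        self.rows.pop(row_key, None)
```

Keeping the number of column families small, as the optimization topic suggests, matters in real HBase because each family is flushed and stored separately.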

6.5 OOZIE
Learning objectives: Understand the Oozie architecture and monitor Oozie workflows through the Oozie console. Understand how coordinators and bundles work along with workflows in Oozie. Also learn the Oozie commands to submit, monitor and kill a workflow.
Topics:
OOZIE concepts
OOZIE architecture
Workflow engine
Job coordinator
Installing and configuring OOZIE
HPDL and XML for creating workflows
Nodes in OOZIE: action nodes and control nodes
Accessing OOZIE jobs through the CLI and web console
Develop and run sample workflows in OOZIE
Run MapReduce programs
Run HIVE scripts/jobs
Hands on: Create a workflow for incremental Sqoop imports. Create workflows for Pig, Hive and Sqoop exports. Execute a coordinator to schedule the workflows.

6.6 FLUME
Learning objectives: Understand the Flume architecture and its components: sources, channels and sinks. Configure Flume with socket and file sources and with HDFS and HBase sinks. Understand fan-in and fan-out architectures.
Topics:
FLUME concepts
FLUME architecture
Installation and configurations
Executing FLUME jobs
Hands on: Create Flume configuration files and configure different sources and sinks. Stream Twitter data and create a Hive table.
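A minimal Flume agent configuration tying one source, one channel and one sink together might look like the sketch below: a netcat source feeding an HDFS sink through a memory channel. The agent name `a1`, the port and the HDFS path are placeholders chosen for this example:

```properties
# Name the components of agent a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listen for events on a TCP socket (netcat source)
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: write events to HDFS as plain text
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.fileType = DataStream

# Wire the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Fan-out is expressed in the same format by listing several channels on one source; fan-in, by pointing several agents' sinks at one downstream source.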

Data analytics using pentaho as an ETL tool
Learning objective: You will learn Pentaho Big Data best practices, guidelines and techniques.
Topics:
Data analytics using Pentaho as an ETL tool
Big Data integration with zero coding required
Hands on: You will use Pentaho as an ETL tool for data analytics.

Integrations
Learning Objective: You will see the different integrations among Hadoop ecosystem components in a data engineering flow, and understand how important it is to create a flow for the ETL process.
Topics:
MapReduce and HIVE integration
MapReduce and HBASE integration
Java and HIVE integration
HIVE - HBASE integration
Hands On: Use storage handlers to integrate Hive and HBase. Integrate Hive and Pig as well.

About our trainer
27 years in the IT industry, 20 years in multinationals (IBM/Citibank/SAP)
10 years of industry experience in Data Science and Big Data Analytics
5 years of expertise in coin and enterprise blockchain technology
Published the book "Blockchain in Legal System"
15 years of expertise in banking & IT technology
Specialized in Blockchain, Big Data and Artificial Intelligence integration
Specialized in Tier-4 data center design; executed Asia-Pacific's largest data center in India
Two-time Bravo Award winner at IBM; special awards and cash awards from Citibank
Y2K Command Center Head and Asia-Pacific Head for Unix Systems at IBM
Founder of Necxury Blockchain Solutions; Blockchain Director at Decimus, Singapore
Crypto coins and tokens developed and traded worldwide; IT infrastructure solutions architect
Hands-on with 20 operating systems (Unix, Windows, Linux, etc.), 15 computer languages (C, Java, Golang, etc.), 10 databases (Sybase, Oracle, SQL Server, etc., plus key-value and document stores such as CouchDB and MongoDB), and many middleware products
Trainings conducted for major corporates and resources from IBM, SAP, Samba, Bank of America, JPA, Wipro, Accenture, etc.
Cloud, server and storage architect
Managed up to 3,000 engineers (server engineers, network, DBA and system SMEs) from a single point
System Security Head at Citicorp for EMEA

Students review
Aftab Ahammed, IT professional: "I am delighted to have joined Learners Point for the Big Data Analytics course. Their extensive learning path helped me excel across all the Big Data modules. I was also very impressed with the trainer, who took ample time to explain the course content. Hats off to him!"
Jeeger Parekh, finance advisor: "Learners Point Training Institute is one of the best platforms to learn the key concepts of Big Data Analytics. A big shout-out to the trainer, who explained the Hadoop, administration, testing and analysis modules in a much easier manner. His training sessions were structured and easy to follow, and he finished the course on time."

+971 (04) 403 8000 | (04) 3266 880 | info@learnerspoint.org
#610 - Business Center, Burjman Metro Station Exit 4, Mashreq Bank Building - Dubai
learnerspoint.org
