Introduction To Big Data Tools

The tools used for Big Data handling, analysis, and further reporting are called Big Data tools. Examples include Hadoop, Spark, Scala, and Impala.

What is Hadoop?

Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model. Hadoop is used by Yahoo, IBM, Google, Amazon, and many more. India's Aadhaar scheme uses Hadoop. MapReduce is the simple programming model used in Hadoop.

Main components

1. HDFS (Hadoop Distributed File System): storage
2. MapReduce: processing
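To make the MapReduce model concrete, here is a toy sketch of its map, shuffle, and reduce phases using plain Scala collections. It only mimics the flow of a Hadoop job (the real API uses Mapper/Reducer classes and a Job configuration), and the sample lines are invented for illustration.

```scala
// A toy illustration of the MapReduce idea using plain Scala collections.
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val lines = Seq("big data tools", "big data hadoop", "hadoop spark")

    // Map phase: emit a (word, 1) pair for every word in every input line.
    val mapped: Seq[(String, Int)] =
      lines.flatMap(_.split("\\s+")).map(word => (word, 1))

    // Shuffle phase: group all pairs by key (the word).
    val grouped: Map[String, Seq[Int]] =
      mapped.groupBy(_._1).map { case (w, pairs) => (w, pairs.map(_._2)) }

    // Reduce phase: sum the counts for each word.
    val counts: Map[String, Int] =
      grouped.map { case (w, ones) => (w, ones.sum) }

    counts.foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```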

HDFS

HDFS is a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity (simple, off-the-shelf) hardware.

HDFS - Hadoop Distributed File System

Features of HDFS:
- Highly fault tolerant: data is replicated on a minimum of 3 nodes
- High throughput: huge volumes of data can be read and processed in a short time
- Suitable for applications with large data sets
- Streaming access to file system data: write once and read many times, e.g. when analyzing logs (a programmatic sketch follows this list)
- Can be built out of commodity hardware
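As a hedged illustration of the write-once/read-many pattern, this Scala sketch talks to HDFS through Hadoop's FileSystem API. It assumes the hadoop-client dependency is on the classpath; the NameNode URI and the file path are placeholders, not details from the original text.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Placeholder NameNode address; substitute your cluster's.
    val fs = FileSystem.get(new URI("hdfs://namenode:9000"), conf)
    val path = new Path("/tmp/example.txt")

    // Write once...
    val out = fs.create(path)
    out.writeBytes("hello hdfs\n")
    out.close()

    // ...then read many times (the streaming access pattern).
    val in = fs.open(path)
    val buf = new Array[Byte](1024)
    val n = in.read(buf)
    println(new String(buf, 0, n, "UTF-8"))
    in.close()
    fs.close()
  }
}
```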

Apache Spark

Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at the University of California, Berkeley, and later donated to the Apache Software Foundation, where it remains today. In contrast to Hadoop's two-stage, disk-based MapReduce paradigm, Spark's multi-stage in-memory primitives provide performance up to 100 times faster for certain applications.

By allowing user programs to load data into a cluster's memory and query it repeatedly, Spark is well suited to machine learning algorithms. Spark requires a cluster manager and a distributed storage system.
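A minimal sketch of that idea in Spark's Scala API: the data is loaded once, cached in memory, and then scanned by several actions, as an iterative algorithm would. The file path and the local master URL are placeholder assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkCacheSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cache-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Load and cache; subsequent actions reuse the in-memory partitions
    // instead of re-reading from disk between stages, as MapReduce would.
    val nums = sc.textFile("numbers.txt") // placeholder path (local or hdfs:// URI)
      .map(_.toDouble)
      .cache()

    // Several passes over the same cached data:
    val count = nums.count()
    val mean  = nums.sum() / count
    val maxV  = nums.max()

    println(s"count=$count mean=$mean max=$maxV")
    sc.stop()
  }
}
```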

For cluster management, Spark supports standalone (native Spark cluster), Hadoop YARN, or Apache Mesos. For distributed storage, Spark can interface with a wide variety of systems, including the Hadoop Distributed File System (HDFS), Cassandra, OpenStack Swift, and Amazon S3, or a custom solution can be implemented. Spark also supports a pseudo-distributed local mode, usually used only for development or testing purposes, where distributed storage is not required and the local file system can be used instead; in such a scenario, Spark runs on a single machine with one executor per CPU core.

SCALA

The name Scala is a portmanteau of "scalable" and "language", signifying that it is designed to grow with the demands of its users. Scala is a programming language for general software applications. Scala has full support for functional programming and a very strong static type system. This allows programs written in Scala to be very concise and thus smaller in size than equivalent programs in other general-purpose programming languages. Many of Scala's design decisions were inspired by criticism of the shortcomings of Java.

Scala source code is intended to be compiled to Java bytecode, so that the resulting executable code runs on a Java virtual machine. Java libraries may be used directly in Scala code and vice versa. Like Java, Scala is object-oriented, and uses curly-brace syntax reminiscent of the C programming language.
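A short sketch of that interoperability: Scala code instantiating a plain Java collection and calling a static Java method directly, with no bridging layer.

```scala
import java.util.ArrayList
import java.time.LocalDate

object JavaInteropSketch {
  def main(args: Array[String]): Unit = {
    // Instantiate and use a Java collection from Scala.
    val list = new ArrayList[String]()
    list.add("hadoop")
    list.add("spark")
    println(list.size()) // 2

    // Call a static method on a Java class.
    println(LocalDate.now().plusDays(7))
  }
}
```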

Unlike Java, Scala has many features of functional programming languages like Scheme, Standard ML, and Haskell, including currying, type inference, immutability, lazy evaluation, and pattern matching. It also has an advanced type system supporting algebraic data types, covariance and contravariance, higher-order types (but not higher-rank types), and anonymous types. Other features of Scala not present in Java include operator overloading, optional parameters, named parameters, raw strings, and no checked exceptions.
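A small sketch exercising several of the features just listed: type inference, immutability, currying, lazy evaluation, pattern matching, and an algebraic data type modelled as a sealed trait with case classes.

```scala
object ScalaFeaturesSketch {
  // Algebraic data type: a closed set of shape variants.
  sealed trait Shape
  case class Circle(r: Double) extends Shape
  case class Rect(w: Double, h: Double) extends Shape

  // Pattern matching over the ADT.
  def area(s: Shape): Double = s match {
    case Circle(r)  => math.Pi * r * r
    case Rect(w, h) => w * h
  }

  // Currying: the function takes its arguments one list at a time.
  def scale(factor: Double)(x: Double): Double = factor * x
  val double: Double => Double = scale(2.0) // partially applied

  // Lazy evaluation: the body runs only on first access.
  lazy val expensive: Int = { println("computed once"); 42 }

  def main(args: Array[String]): Unit = {
    val shapes = List(Circle(1.0), Rect(2.0, 3.0)) // types inferred, list immutable
    shapes.map(area).foreach(println)
    println(double(21.0)) // 42.0
    println(expensive)    // triggers the lazy initialization
    println(expensive)    // reuses the cached value
  }
}
```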

CLOUDERA IMPALA

Cloudera Impala is an open-source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Impala brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries to data stored in HDFS and Apache HBase without requiring data movement or transformation.

Impala is integrated with Hadoop to use the same file and data formats, metadata, security, and resource-management frameworks used by MapReduce, Apache Hive, Apache Pig, and other Hadoop software. Impala is promoted for analysts and data scientists to perform analytics on data stored in Hadoop via SQL or business intelligence tools.
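As a hedged sketch of querying Impala from code: Impala speaks the HiveServer2 wire protocol, so the Hive JDBC driver is one common client. The host, the port (21050 is Impala's customary JDBC port), the auth setting, and the events table are all assumptions for illustration, not details from the original text.

```scala
import java.sql.DriverManager

object ImpalaQuerySketch {
  def main(args: Array[String]): Unit = {
    // Hive JDBC driver (HiveServer2 protocol), which Impala also accepts.
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Placeholder host/port/auth; adjust for your cluster's configuration.
    val conn = DriverManager.getConnection(
      "jdbc:hive2://impala-host:21050/default;auth=noSasl")
    try {
      // Hypothetical table used only for illustration.
      val rs = conn.createStatement().executeQuery(
        "SELECT category, COUNT(*) AS n FROM events GROUP BY category")
      while (rs.next())
        println(s"${rs.getString("category")}\t${rs.getLong("n")}")
    } finally conn.close()
  }
}
```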

The result is that large-scale data processing (via MapReduce) and interactive queries can be done on the same system using the same data and metadata, removing the need to migrate data sets into specialized systems and/or proprietary formats simply to perform analysis.

Features

- Supports HDFS and Apache HBase for storage
- Reads Hadoop file formats, including text, LZO, SequenceFile, Avro, RCFile, and Parquet
- Supports Hadoop security (Kerberos authentication)
- Fine-grained, role-based authorization with Apache Sentry
- Uses the metadata, ODBC driver, and SQL syntax of Apache Hive

In early 2013, a column-oriented file format called Parquet was announced for architectures including Impala. In December 2013, Amazon Web Services announced support for Impala. In early 2014, MapR added support for Impala.

Identify gaps in the data and follow up for decision making

There are two broad ways to deal with gaps in data:
1. Missing-data imputation
2. Model-based techniques

In the simplest case, missing values are replaced with an average value or removed (both are sketched below). For the analysis to be sound, we select the variables for modeling based on correlation test results.
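A minimal Scala sketch of the two simple strategies just mentioned, with invented toy values; None marks a missing entry.

```scala
object SimpleImputationSketch {
  def main(args: Array[String]): Unit = {
    val xs: List[Option[Double]] =
      List(Some(2.0), None, Some(4.0), Some(6.0), None)

    val observed = xs.flatten
    val mean = observed.sum / observed.size // 4.0

    // Strategy 1: mean imputation -- fill each gap with the average.
    val imputed = xs.map(_.getOrElse(mean))
    println(imputed) // List(2.0, 4.0, 4.0, 6.0, 4.0)

    // Strategy 2: removal -- analyze only the observed values.
    println(observed) // List(2.0, 4.0, 6.0)
  }
}
```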

Techniques of dealing with missing data

Missing data reduce the representativeness of the sample and can therefore distort inferences about the population. If possible, think about how to prevent data from going missing before the actual data gathering takes place.

Imputation

When analyzing data, it is good to consider imputing the missing values. Imputation can be done in several ways; using multiple imputations (even 5 or fewer) improves the quality of estimation.
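A toy sketch of the multiple-imputation idea: build m completed datasets, estimate on each, and pool the results. Here each missing value is drawn at random from the observed values (a crude hot-deck draw); real multiple imputation samples from a proper predictive model, and the data below are invented.

```scala
import scala.util.Random

object MultipleImputationSketch {
  def main(args: Array[String]): Unit = {
    val xs: Vector[Option[Double]] =
      Vector(Some(1.0), None, Some(3.0), None, Some(5.0))
    val observed = xs.flatten
    val rng = new Random(42)
    val m = 5 // number of imputations

    // One mean estimate per completed dataset.
    val estimates = (1 to m).map { _ =>
      val completed = xs.map(_.getOrElse(observed(rng.nextInt(observed.size))))
      completed.sum / completed.size
    }

    // Pooled point estimate: the average of the per-dataset estimates.
    println(estimates.sum / m)
  }
}
```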

Examples of imputation approaches are listed below.

Partial imputation

The expectation-maximization algorithm is an approach in which the values of the statistics that would be computed if a complete dataset were available are estimated (imputed), taking into account the pattern of missing data. In this approach, values for individual missing data items are not usually imputed.

Partial deletion

Methods which involve reducing the data available to a dataset having no missing values include (both are sketched below):
- Listwise deletion / casewise deletion
- Pairwise deletion
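A small sketch contrasting the two deletion methods on an invented two-column dataset.

```scala
object DeletionSketch {
  def main(args: Array[String]): Unit = {
    // (x, y) rows; None marks a missing cell.
    val rows: List[(Option[Double], Option[Double])] =
      List((Some(1.0), Some(2.0)), (None, Some(4.0)),
           (Some(3.0), None),      (Some(5.0), Some(6.0)))

    // Listwise (casewise) deletion: drop every row with any missing cell.
    val complete = rows.collect { case (Some(x), Some(y)) => (x, y) }
    println(s"listwise n = ${complete.size}") // 2

    // Pairwise deletion: the mean of x uses every row where x is present,
    // even if y is missing in that row.
    val xsObs = rows.flatMap(_._1)
    println(s"pairwise mean(x) over n = ${xsObs.size}: ${xsObs.sum / xsObs.size}")
  }
}
```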

Full analysis

Methods which take full account of all information available, without the distortion resulting from using imputed values as if they were actually observed:
- The expectation-maximization algorithm
- Full information maximum likelihood estimation

Interpolation

In the mathematical field of numerical analysis, interpolation is a method of constructing new data points within the range of a discrete set of known data points.
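A minimal sketch of the simplest case, linear interpolation between two known points.

```scala
object InterpolationSketch {
  // Estimate y at x, given known points (x0, y0) and (x1, y1).
  def lerp(x0: Double, y0: Double, x1: Double, y1: Double)(x: Double): Double =
    y0 + (y1 - y0) * (x - x0) / (x1 - x0)

  def main(args: Array[String]): Unit = {
    // Known points (1, 10) and (3, 30); construct a new point at x = 2.
    println(lerp(1, 10, 3, 30)(2.0)) // 20.0
  }
}
```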

Model-based techniques

Model-based techniques use tools for testing the missing-data type (MCAR: missing completely at random; MAR: missing at random; MNAR: missing not at random) and for estimating parameters under missing-data conditions.

For example, a test for refuting MAR/MCAR reads as follows: for any three variables X, Y, and Z, where Z is fully observed and X and Y partially observed, the data should satisfy

X ⫫ Ry | (Rx, Z)

In words, the observed portion of X should be independent of the missingness status of Y, conditional on every value of Z.

When data falls into the MNAR category, techniques are available for consistently estimating parameters, provided certain conditions hold in the model. For example, if Y explains the reason for missingness in X and Y itself has missing values, the joint probability distribution of X and Y can still be estimated if the missingness of Y is random.

The estimand in this case will be:

P(X, Y) = P(X | Y) P(Y) = P(X | Y, Rx = 0, Ry = 0) P(Y | Ry = 0)

where Rx = 0 and Ry = 0 denote the observed portions of the respective variables.
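A toy sketch of computing that estimand from partially observed binary data: P(Y | Ry = 0) is taken from all rows where Y is observed, and P(X | Y, Rx = 0, Ry = 0) from the rows where both variables are observed. The rows below are invented for illustration.

```scala
object MnarEstimandSketch {
  def main(args: Array[String]): Unit = {
    // (x, y) rows; None marks a missing cell.
    val rows: List[(Option[Int], Option[Int])] = List(
      (Some(1), Some(1)), (Some(0), Some(1)), (None, Some(0)),
      (Some(0), Some(0)), (None, Some(1)),    (Some(1), None))

    def pXY(x: Int, y: Int): Double = {
      // P(Y = y | Ry = 0): frequency of y among rows with Y observed.
      val yObs = rows.flatMap(_._2)
      val pY = yObs.count(_ == y).toDouble / yObs.size

      // P(X = x | Y = y, Rx = 0, Ry = 0): among fully observed rows with Y = y.
      val both = rows.collect { case (Some(a), Some(b)) if b == y => a }
      val pXgivenY = both.count(_ == x).toDouble / both.size

      pXgivenY * pY
    }

    for (x <- 0 to 1; y <- 0 to 1)
      println(f"P(X=$x, Y=$y) = ${pXY(x, y)}%.3f")
  }
}
```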

