Using Big Data for the Analysis of Historic Context Information - FIWARE

Using Big Data for the analysis of historic context information
Francisco Romero Bueno
Technological Specialist. FIWARE data engineer
francisco.romerobueno@telefonica.com

Big Data: what is it and how much data is there

What is big data?
[image: a small data set - "small data"]

What is big data?
[image: interior view of the Stockholm Public Library - "big data"]

Not a matter of thresholds
If both the data used by your app and the processing capabilities your app logic needs fit the available infrastructure, then you are not dealing with a big data problem.
If either the data used by your app or the processing capabilities your app logic needs don't fit the available infrastructure, then you are facing a big data problem, and you need specialized services.

How much data is there?

Data is growing
Source: …ni/vni-hyperconnectivity-wp.html

Two (three) approaches for dealing with Big Data: batch and stream processing (and Lambda architectures)

Batch processing
- It is about joining a lot of data (batching)
  - A lot may mean Terabytes or more
  - Most probably, data cannot be stored in a single server
- Once joined, it is analyzed
  - Most probably, data cannot be analyzed using a single process
- Time is not a problem
  - Batching can last for days or even months
  - Processing can last for hours or even days
- Analysis can be reproduced

Stream processing
- It is about not storing the data and analyzing it on the fly
  - Most probably, data cannot be analyzed by a single process
- Time is important
  - Since the data is not stored, it must be analyzed as it is received
  - The results are expected to be available in near real-time
- Analysis cannot be reproduced

Lambda architectures
A Big Data architecture is Lambda compliant if it produces near-real-time data insights based on the last data only, while large batches are accumulated and processed for robust insights.
- Data must feed both batch-based and stream-based sub-systems
- Real-time insights are cached
- Batch insights are cached
- Queries to the whole system combine both kinds of insights
http://lambda-architecture.net/
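As a minimal illustration of that query step, a plain-Java sketch (the batchView and speedView maps are hypothetical caches standing in for the two cached insight stores; they are not part of any Lambda framework):

```java
// Sketch of the Lambda query step: a query combines the robust (but old)
// batch insight with the near-real-time insight for data that arrived
// after the last batch run.
import java.util.HashMap;
import java.util.Map;

public class LambdaQuery {
    // Combine the cached batch view and the cached speed (real-time) view.
    static long query(Map<String, Long> batchView, Map<String, Long> speedView, String key) {
        return batchView.getOrDefault(key, 0L) + speedView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        Map<String, Long> batchView = new HashMap<>();  // robust insight, hours old
        batchView.put("pageviews", 1_000_000L);
        Map<String, Long> speedView = new HashMap<>();  // last few minutes only
        speedView.put("pageviews", 1_234L);
        System.out.println(query(batchView, speedView, "pageviews")); // prints 1001234
    }
}
```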

Distributed storage: the Hadoop reference (HDFS)

What happens if one shelf is not enough? You buy more shelves…

… then you create an index:
"The Avengers", 1-100, shelf 1
"The Avengers", 101-125, shelf 2
"Superman", 1-50, shelf 2
"X-Men", 1-100, shelf 3
"X-Men", 101-200, shelf 4
"X-Men", 201-225, shelf n

Hadoop Distributed File System (HDFS)
- Based on the Google File System
- Large files are stored across multiple machines (Datanodes) by splitting them into blocks that are distributed
- Metadata is managed by the Namenode
- Scalable by simply adding more Datanodes
- Fault-tolerant, since HDFS replicates each block (default to 3)
- Security based on authentication (Kerberos) and authorization (permissions, HACLs)
- It is managed like a Unix-like file system

Splitting, replication and distribution
[diagram: "large file.txt" (4 blocks), each block replicated and distributed across the datanodes of rack 1 (datanodes 1 to 4) and rack 2 (datanodes 5 to 8)]
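The splitting and placement idea can be sketched in a few lines of Java (an illustration only, not the HDFS client API: the round-robin policy is a simplification, since the real Namenode is rack-aware when choosing replicas):

```java
// Sketch: split a file into fixed-size blocks and assign each block
// a set of replica datanodes, mimicking what the Namenode records.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlockPlacement {
    // Number of blocks needed for a file of the given size.
    static int numBlocks(long fileSize, long blockSize) {
        return (int) ((fileSize + blockSize - 1) / blockSize);
    }

    // Simplified round-robin replica assignment over the datanode list.
    static List<List<String>> placeReplicas(int blocks, List<String> datanodes, int replication) {
        List<List<String>> placement = new ArrayList<>();
        int next = 0;
        for (int b = 0; b < blocks; b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                replicas.add(datanodes.get(next % datanodes.size()));
                next++;
            }
            placement.add(replicas);
        }
        return placement;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 500 MB "large file.txt" with 128 MB blocks -> 4 blocks, as in the diagram
        int blocks = numBlocks(500 * mb, 128 * mb);
        List<String> nodes = Arrays.asList("dn1","dn2","dn3","dn4","dn5","dn6","dn7","dn8");
        System.out.println(blocks + " blocks: " + placeReplicas(blocks, nodes, 3));
    }
}
```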

Namenode metadata
Path                             Replicas  Block IDs
/user/user1/data/large file.txt  3         1 {dn1,dn2,dn5}
                                           2 {dn3,dn5,dn8}
                                           3 {dn3,dn6,dn8}
                                           4 {dn1,dn4,dn7}
/user/user1/data/other file.txt  2         5 { }
                                           6 { }
                                           7 { }

Datanode failure recovery
[diagram: when a Datanode fails, the blocks of "large file.txt" it held are re-replicated to the surviving Datanodes of rack 1 (datanodes 1 to 4) and rack 2 (datanodes 5 to 8)]

Namenode failure recovery
Path                             Replicas  Block IDs
/user/user1/data/large file.txt  3         1 {dn1,dn2,dn5}
                                           2 {dn2,dn5,dn8}
                                           3 {dn4,dn6,dn8}
                                           4 {dn1,dn4,dn7}
/user/user1/data/other file.txt  2         5 { }
                                           6 { }
                                           7 { }

Managing HDFS
[diagram: a client machine (ssh client + Hadoop commands, browser, custom apps with an HTTP client) talks to the services node, which exposes an ssh daemon, the WebHDFS and HttpFS REST APIs and the HUE web UI, all on top of HDFS]

Managing HDFS: HTTP REST API
- The HTTP REST API supports the complete FileSystem interface for HDFS
  - Other Hadoop commands are not available through a REST API
- It relies on the webhdfs schema for URIs: webhdfs://HOST:HTTP_PORT/PATH
- HTTP URLs are built as: http://HOST:HTTP_PORT/webhdfs/v1/PATH?op=…
- Full API specification: …t-dist/hadoop-hdfs/WebHDFS.html
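A small helper shows how such URLs are assembled from the pattern above (a sketch; the method name is illustrative, and the host and port used in the example are the Cosmos storage endpoint mentioned later in this deck):

```java
// Sketch: build a WebHDFS URL following http://HOST:PORT/webhdfs/v1/PATH?op=...
public class WebHdfsUrl {
    static String url(String host, int port, String path, String op, String user) {
        return "http://" + host + ":" + port + "/webhdfs/v1" + path
             + "?op=" + op + "&user.name=" + user;
    }

    public static void main(String[] args) {
        // List a user's directory on the FIWARE Lab storage cluster
        System.out.println(url("storage.cosmos.lab.fiware.org", 14000,
                               "/user/frb/webinar", "liststatus", "frb"));
    }
}
```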

Managing HDFS: HTTP REST API examples

$ curl -X GET "http://<host>:14000/webhdfs/v1/user/frb/webinar/abriefhistoryoftime_page1?op=open&user.name=frb"
CHAPTER 1
OUR PICTURE OF THE UNIVERSE
A well-known scientist (some say it was Bertrand Russell) once gave a public lecture on astronomy. He described how the earth orbits around the sun and how the sun, in turn, orbits around the center of a vast …

$ curl -X PUT "http://<host>:14000/webhdfs/v1/user/frb/webinar/afolder?op=mkdirs&user.name=frb"
{"boolean":true}

$ curl -X GET "http://<host>:14000/webhdfs/v1/user/frb/webinar?op=liststatus&user.name=frb"
{"FileStatuses":{"FileStatus":[ …abriefhistoryoftime… "replication":0}]}}

$ curl -X DELETE "http://<host>:14000/webhdfs/v1/user/frb/webinar/afolder?op=delete&user.name=frb"
{"boolean":true}

Distributed batch computing: the Hadoop reference (MapReduce)

What happens if you cannot read all your books?

Hadoop was created by Doug Cutting at Yahoo!, based on the MapReduce patent by Google

Well, MapReduce was really invented by Julius Caesar: Divide et impera* (* Divide and conquer)

An example
How many pages are written in Latin among the books in the Ancient Library of Alexandria?
[diagram sequence: several Mappers read the books in parallel; each time a Mapper finds a Latin book it emits its page count - 45 (ref 1), 73 (ref 4), 34 (ref 5) - and ignores the Greek and Egyptian ones; when all Mappers are idle, the Reducer sums the emitted values: 45 + 73 + 34 = 152 TOTAL]

Another example
How many pages are written in all the languages among the books in the Ancient Library of Alexandria?
[diagram sequence: each Mapper now emits a (language, pages) pair for every book - e.g. (egy,12), (gre,128), (lat,45) - and the pairs are routed to one Reducer per language; when the Mappers are idle, each Reducer sums its own values: lat: 45 + 73 + 34 = 152; egy: 12 + 10 = 22; gre: 128 + 230 + 20 = 378]
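The whole worked example fits in a few lines of plain Java that mimic the map/shuffle/reduce flow in memory (illustrative only, no Hadoop involved; the class and method names are made up for this sketch):

```java
// Sketch of the library example: "map" emits a (language, pages) pair per
// book, and the grouping+sum plays the role of shuffle and reduce.
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PageCountSim {
    // Each book record is {language, pages}; group by language and sum pages.
    static Map<String, Integer> totalPagesByLanguage(List<String[]> books) {
        Map<String, Integer> totals = new HashMap<>();
        for (String[] book : books) {
            totals.merge(book[0], Integer.parseInt(book[1]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String[]> books = Arrays.asList(
            new String[]{"latin", "45"}, new String[]{"latin", "73"},
            new String[]{"latin", "34"}, new String[]{"greek", "128"},
            new String[]{"greek", "230"}, new String[]{"greek", "20"},
            new String[]{"egyptian", "12"}, new String[]{"egyptian", "10"});
        // latin=152, greek=378, egyptian=22, matching the slides
        System.out.println(totalPagesByLanguage(books));
    }
}
```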

Writing MapReduce applications
- MapReduce applications are commonly written in the Java language
  - They can be written in other languages through Hadoop Streaming
- A MapReduce job consists of:
  - A driver, a piece of software where inputs, outputs, formats, etc. are defined, and which is the entry point for launching the job
  - A set of Mappers, given by a piece of software defining their behaviour
  - A set of Reducers, given by a piece of software defining their behaviour

Implementing the example
- The input will be a single big file with one book per line, in the form <title>,<language>,<pages>, e.g.:
  symbolae ia est vincit,latin,134
- The mappers will receive pieces of the above file, which will be read line by line
  - Each line will be represented by a (key,value) pair, i.e. the offset in the file and the real data within the line, respectively
  - For each input pair a (key,value) pair will be output, i.e. a common "num pages" key and the third field in the line
- The reducers will receive arrays of pairs produced by the mappers, all having the same key ("num pages")
  - For each array of pairs, the sum of the values will be output as a (key,value) pair, in this case a "total pages" key and the sum as value

Implementing the example: JCMapper.class

public static class JCMapper extends
        Mapper<Object, Text, Text, IntWritable> {
    private final Text globalKey = new Text("num pages");
    private final IntWritable bookPages = new IntWritable();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        System.out.println("Processing " + fields[0]);

        if (fields[1].equals("latin")) {
            bookPages.set(Integer.parseInt(fields[2]));
            context.write(globalKey, bookPages);
        } // if
    } // map
} // JCMapper

Implementing the example: JCReducer.class

public static class JCReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable totalPages = new IntWritable();

    @Override
    public void reduce(Text globalKey, Iterable<IntWritable> bookPages,
            Context context) throws IOException, InterruptedException {
        int sum = 0;

        for (IntWritable val : bookPages) {
            sum += val.get();
        } // for

        totalPages.set(sum);
        context.write(globalKey, totalPages);
    } // reduce
} // JCReducer

Implementing the example: JC.class

public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new JC(), args);
    System.exit(res);
} // main

@Override
public int run(String[] args) throws Exception {
    Configuration conf = this.getConf();
    Job job = Job.getInstance(conf, "julius");
    job.setJarByClass(JC.class);
    job.setMapperClass(JCMapper.class);
    job.setReducerClass(JCReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
} // run

Simplifying the batch analysis: querying tools

Querying tools
- The MapReduce paradigm may be hard to understand and, worse, to use
- Indeed, many data analysts just need to query the data
  - If possible, by using already well-known languages
- For that reason, some querying tools appeared in the Hadoop ecosystem
  - Hive, and its HiveQL language, quite similar to SQL
  - Pig, and its Pig Latin language, a new language

Hive and HiveQL
- HiveQL reference: …LanguageManual
- All the data is loaded into Hive tables
  - Not real tables (they don't contain the real data) but metadata pointing to the real data at HDFS
- The best thing is that Hive uses pre-defined MapReduce jobs behind the scenes for:
  - Column selection
  - Field grouping
  - Table joining
  - Value filtering
- Important remark: since MapReduce is used by Hive, the queries may take some time to produce a result

Hive CLI

$ hive
Hive history file=/tmp/myuser/hive_job_log_opendata_XXX_XXX.txt
hive> select column1,column2,otherColumns from mytable where column1='whatever' and columns2 like '%whatever%';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Starting Job = job_201308280930_0953, Tracking URL = http://cosmosmaster-gi:50030/jobdetails.jsp?jobid=job_201308280930_0953
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=cosmosmaster-gi:8021 -kill job_201308280930_0953
2013-10-03 09:15:34,519 Stage-1 map = 0%, reduce = 0%
2013-10-03 09:15:36,545 Stage-1 map = 67%, reduce = 0%
2013-10-03 09:15:37,554 Stage-1 map = 100%, reduce = 0%
2013-10-03 09:15:44,609 Stage-1 map = 100%, reduce = 33%

Hive Java API
- Hive CLI and Hue are OK for human-driven testing purposes, but they are not usable by remote applications
  - Hive has no REST API
- Hive has several drivers and libraries:
  - JDBC for Java
  - Python
  - PHP
  - ODBC for C/C++
  - Thrift for Java and C++
- A remote Hive client usually performs:
  - A connection to the Hive server (TCP/10000)
  - The query execution

Hive Java API: get a connection

private static Connection getConnection(String ip, String port,
        String user, String password) {
    try {
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    } catch (ClassNotFoundException e) {
        System.out.println(e.getMessage());
        return null;
    } // try catch

    try {
        return DriverManager.getConnection("jdbc:hive://" + ip + ":" + port
                + "/default?user=" + user + "&password=" + password);
    } catch (SQLException e) {
        System.out.println(e.getMessage());
        return null;
    } // try catch
} // getConnection

Hive Java API: do the query

private static void doQuery() {
    try {
        Statement stmt = con.createStatement();
        ResultSet res = stmt.executeQuery("select column1,column2,"
                + "otherColumns from mytable where "
                + "column1='whatever' and "
                + "columns2 like '%whatever%'");

        while (res.next()) {
            String column1 = res.getString(1);
            int column2 = res.getInt(2);
        } // while

        res.close(); stmt.close(); con.close();
    } catch (SQLException e) {
        System.exit(0);
    } // try catch
} // doQuery

Hive tables creation
Both locally using the CLI, or remotely using the Java API, use this command: create [external] table…

- CSV-like HDFS files:
  create external table <table_name> (<field1_name> <field1_type>, ..., <fieldN_name> <fieldN_type>)
  row format delimited fields terminated by '<separator>'
  location '/user/<username>/<path>/<to>/<the>/<data>';

- Json-like HDFS files:
  create external table <table_name> (<field1_name> <field1_type>, ..., <fieldN_name> <fieldN_type>)
  row format serde 'org.openx.data.jsonserde.JsonSerDe'
  location '/user/<username>/<path>/<to>/<the>/<data>';

Distributed streaming computing: the Storm reference

Storm project
- Created by Nathan Marz at BackType/Twitter
- A distributed realtime computation system

Storm basics
- Based on processing building blocks that can be composed in a topology
  - Spouts: blocks in charge of polling for data streams, producing data tuples
  - Bolts: blocks in charge of processing data tuples, performing basic operations
    - 1:1 operations: arithmetics, transformations
    - N:1 operations: filtering, joining
    - 1:N operations: splitting, replication
- It is scalable and fault-tolerant
  - A basic operation can be replicated many times in a layer of bolts
  - If a bolt fails, there are several other bolts performing the same basic operation in the layer
- Guarantees the data will be processed
  - Storm performs an ACK mechanism for data tuples
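The spout/bolt pipeline can be mimicked with plain Java streams (this is NOT the Storm API, just a sketch of the 1:N splitting, N:1 filtering and grouping operations listed above; the class and method names are made up):

```java
// Sketch: a "spout" is a list standing in for an incoming stream, and each
// stage below plays the role of one bolt applied to every tuple in flight.
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MiniTopology {
    static Map<String, Long> wordCounts(List<String> spout) {
        return spout.stream()
            .flatMap(s -> Arrays.stream(s.split(" ")))   // splitting bolt (1:N)
            .filter(w -> !w.isEmpty())                   // filtering bolt (N:1)
            .collect(Collectors.groupingBy(w -> w, Collectors.counting())); // grouping bolt
    }

    public static void main(String[] args) {
        System.out.println(wordCounts(Arrays.asList("big data", "big problems")));
    }
}
```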

Big Data in FIWARE Lab: Cosmos and Sinfonier

Cosmos
- Cosmos is the name of the Hadoop-based global instance in FIWARE Lab
- Nothing has to be installed!
- There are two clusters exposing some services:
  - Storage (storage.cosmos.lab.fiware.org)
    - WebHDFS REST API (TCP/14000)
  - Computing (computing.cosmos.lab.fiware.org)
    - Tidoop REST API (TCP/12000)
    - Auth proxy (TCP/13000)
    - HiveServer2 (TCP/10000)

Feeding Cosmos with context data
- Cygnus tool
  - Apache Flume-based
  - Standard NGSI connector for FIWARE
- Provides connectors for a wide variety of persistence backends:
  - HDFS, MySQL, CKAN, MongoDB, STH Comet, PostgreSQL, Kafka, DynamoDB, Carto

Sinfonier
- Sinfonier will be the name of the Storm-based global instance in FIWARE Lab
- Nothing will have to be installed!
- There will be one cluster exposing streaming analysis services through an IDE
- It will be fed using Cygnus and Kafka queues
- Coming soon!

Thank you!
http://fiware.org
Follow @FIWARE on Twitter

