Apache Ignite - Using a Memory Grid for Heterogeneous Computation Frameworks


Apache Ignite - Using a Memory Grid for Heterogeneous Computation Frameworks
A Use Case Guided Explanation
Chris Herrera, Hashmap

Topics
- Who - Key Hashmap Team Members
- The Use Case - Our Need for a Memory Grid
- Requirements
- Approach V1
- Approach V1.5
- Approach V2
- Lessons Learned
- What's Next
- Questions

Who - Hashmap
- WHO: Big Data, IIoT/IoT, AI/ML services since 2012; HQ in the Atlanta area with offices in Houston, Toronto, and Pune; consulting services and managed services
- REACH: 125 customers across 25 industries
- PARTNERS: Cloud and technology platform providers

Who - Hashmap Team Members
- Jay Kapadnis, Lead Architect, Hashmap, Pune, India
- Akshay Mhetre, Team Lead, Hashmap, Pune, India
- Chris Herrera, Chief Architect/Innovation Officer, Hashmap, Houston, TX

The Use Case Oilfield Drilling Data Processing

Why - Oilfield Drilling Data Processing: The Process
(Diagram: Plan, Execute, Optimize cycle, with data flowing between a WITSML server and a plan store)

Why - Oilfield Drilling Data Processing: The Plan
- How to match the data
- Deduplication
- Missing information
- Various formats
- Various ingest paths
(Diagram: a data analyst working across TDM, EDM, WellView, vendor, financial, and homegrown sources)

Why - Oilfield Drilling Data Processing: Rig Site Data Flow
- Missing classification
- Unknown quality
- Various formats
- Various ingest paths
- Unknown completeness
(Diagram: MWD operational data, mud logger, cement, "Magic", and wireline sources reaching the data analyst as WITSML, CSV, and DLIS feeds via WITSML servers)

Why - Oilfield Drilling Data Processing: The Office
- Impossible to generate insights without huge data cleansing operations
- Extracting value is a very expensive operation that has to be done by a combination of experts
- Generating reports requires a huge number of man-hours
(Diagram: a data analyst working across TDM, EDM, WellView, vendor, financial, and homegrown sources)

Why - Oilfield Drilling Data Processing: BUT WAIT

Why - Oilfield Drilling Data Processing
We still have all the compute to deal with, some of which is very legacy code:
- Parse: parse the data from CSV, WITSML, DLIS, etc.
- Identify & Enrich: understand where the data came from and what its global key should be
- Load: load the data into a staging area to start understanding what to do with it
- Clean: deduplicate, interpolate, pivot, split, aggregate
- Feature Engineering: generate additional features that are required to get useful insights into the data
- Persist & Report: land the data into a store that allows for BI reports and interactive queries

Requirements What do we have to do?

Functional Requirements
Cleaning and feature engineering (the legacy code I referred to):
- Parse WITSML / DLIS
- Attribute Mapping
- Unit Conversions
- Null Value Handling
- Rig Operation Enrichment
- Rig State Detection
- Invisible Lost Time Analysis
- Anomaly Detection

Non-Functional Requirements
1. Heterogeneous Data Ingest - Very flexible ingest; flexible simple transformations
2. Robust Data Pipeline - Easy to debug; trusted
3. Extensible Feature Engineering - Able to support existing computational frameworks / runtimes
4. Scalable - Scales up; scales down
5. Reliable - If a data processing workflow fails at a step, it does not continue with erroneous data

Approach V1 How Then?

Solution V1
- Heterogeneous ingest implemented through a combination of NiFi processors/flows and Spark jobs
- Avro files loaded as external tables
- BI connected via ODBC (Tableau)
- The Zeppelin Hive interpreter was used to access the data in Hive
(Diagram: TDM, EDM, WellView, homegrown sources, CSV files, and a WITSML server feeding HDFS staging and mart areas, with Hive, Spark, and Zeppelin on top for BI reporting)

Issues with the Solution
- Very slow BI
- Tough to debug cleansing
- Tough to debug feature extractions
- A lot of overhead for limited benefit
- Painful data loading process
- Incremental refresh was challenging
- Chaining the jobs together in a workflow was very hard; mostly achieved via Jupyter notebooks
- In order to achieve the functional requirements, all of the computations were implemented in Spark, even if there was little benefit

V1 Achieved Requirements
1. Heterogeneous Data Ingest - Very flexible ingest; flexible simple transformations
2. Robust Data Pipeline - Hard to debug; hard to modify
3. Extensible Feature Engineering - Hard to support other frameworks; hard to modify current computations
4. Scalable - Scales up but not down
5. Robust - Hard to debug

Approach V1.5 An Architectural Midstep

A Quick Architectural Midstep (V1.5)
- Complicated an already complex system
- Did not solve all of the problems
- Needed a simpler way to solve all of the issues
- Ignite persistence was released while we were investigating this
(Diagram: the V1 stack with Ignite added - HDFS/IGFS staging and marts, in-memory MapReduce, and Hive, Spark, and Jupyter on top, fed by TDM, EDM, WellView, homegrown sources, CSV files, and a WITSML server)

Approach V2 How Now?

Approach V2
- Allows for very interactive workflows
- Workflows can be scheduled
- Each workflow is made up of functions (microservices)
- Each instance of a workflow contains its own cache (a minimal sketch follows below)
- Zeppelin is accessed via the Ignite interpreter
- Workflows loaded data and also processed data
(Diagram: Workflow, Scheduler, and Functions APIs on Kubernetes/Docker alongside Ignite, HDFS, Spark, Flink, and Zeppelin; the Ignite Service Grid hosts the functions while the Memory Grid holds one cache per workflow, backed by configurable persistent storage)
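
To make the per-workflow cache idea concrete, here is a minimal sketch in Scala, assuming a small helper that names an Ignite cache after the workflow instance; the object name and naming scheme are illustrative, not the actual Hashmap API.

  import org.apache.ignite.{Ignite, IgniteCache}
  import org.apache.ignite.configuration.CacheConfiguration

  // Hypothetical helper: each workflow instance gets its own cache so that
  // intermediate results from different runs stay isolated.
  object WorkflowCaches {
    def cacheFor(ignite: Ignite, workflowId: String): IgniteCache[String, Array[Byte]] = {
      val cfg = new CacheConfiguration[String, Array[Byte]](s"workflow-$workflowId")
      ignite.getOrCreateCache(cfg)
    }
  }

A run with id "well-42-run-7" would then read and write only the cache workflow-well-42-run-7, matching the one-cache-per-workflow-instance point above.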

Approach V2 - The Workflow
- The source is the location the data is coming from
- The workflow is the data that goes from function to function
- Data stored as DataFrames can be queried by an API or another function
(Diagram: Source to Function 1 to Function 2 to Function 3, each function backed by an Apache Ignite SQL/DataFrame service with key-value caches)

Approach - The Workflow
- Each function runs as a service using the Service Grid
- The function receives input from any source: Kafka*, JDBC, Ignite cache
- Once the function is applied, store the result into the Ignite cache store (see the sketch below)
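
As a rough illustration of such a function, here is a minimal Scala sketch of an Ignite Service Grid service that reads one cache and writes its result to another; the class name, cache names, and the trivial transformation are assumptions for the example, not the actual production functions.

  import org.apache.ignite.Ignite
  import org.apache.ignite.resources.IgniteInstanceResource
  import org.apache.ignite.services.{Service, ServiceContext}

  class CleansingFunction extends Service {
    // Injected by Ignite when the service is deployed on a node.
    @IgniteInstanceResource
    private var ignite: Ignite = _

    override def init(ctx: ServiceContext): Unit = {}
    override def cancel(ctx: ServiceContext): Unit = {}

    override def execute(ctx: ServiceContext): Unit = {
      val in  = ignite.getOrCreateCache[String, String]("raw-records")   // input cache (illustrative name)
      val out = ignite.getOrCreateCache[String, String]("clean-records") // output cache (illustrative name)
      // Apply the function and store the result back into an Ignite cache.
      in.forEach(e => out.put(e.getKey, e.getValue.trim))
    }
  }

  // Deployment, for example as a single instance across the cluster:
  // ignite.services().deployClusterSingleton("cleansing-function", new CleansingFunction())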

Workflow Capabilities
- Start / Stop / Restart
- Execute single functions within a workflow
- Pause execution to validate intermediate steps

Approach - Spark Based Functions - Persistence
- After each function has completed its computation, the Spark DataFrame is stored via distributed storage
- The table name is registered in the SQL PUBLIC schema as tableName

  df.write
    .format(FORMAT_IGNITE)
    .option(OPTION_TABLE, tableName) // table name to store data
    .option(OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS, "id")
    .save()

(Diagram: a Spark function writing a DataFrame into an Apache Ignite key-value service)
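
For completeness, a hedged Scala sketch of the reverse direction, reading a persisted table back into Spark through the Ignite DataFrame integration; the config file path and table name are assumptions for the example.

  import org.apache.ignite.spark.IgniteDataFrameSettings._
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("ignite-read-sketch").getOrCreate()

  // Load a table that a previous function persisted; OPTION_CONFIG_FILE points
  // at the Ignite client configuration (the path here is illustrative).
  val persisted = spark.read
    .format(FORMAT_IGNITE)
    .option(OPTION_CONFIG_FILE, "/opt/ignite/config/client.xml")
    .option(OPTION_TABLE, "rig_states") // illustrative table name
    .load()

  persisted.show(20)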

Approach - Intermediate Querying
- Once the data is in the cache, it can optionally be persisted using the Ignite persistence module
- The data can be queried using the Ignite SQL grid module as well
- Allows for intermediate validation of the data as it proceeds through the workflow

  val cache = ignite.getOrCreateCache(cacheConfig)
  val cursor = cache.query(new SqlFieldsQuery(s"SELECT * FROM $tableName LIMIT 20"))
  val data = cursor.getAll

(Diagram: an API and a Spark function both querying the DataFrame held in the Apache Ignite key-value service)
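
Enabling the persistence module mentioned above is a node configuration change on the Ignite side; a minimal Scala sketch, assuming the default data region and an otherwise default node configuration.

  import org.apache.ignite.Ignition
  import org.apache.ignite.configuration.{DataRegionConfiguration, DataStorageConfiguration, IgniteConfiguration}

  // Turn on Ignite native persistence for the default data region so that
  // cached workflow results survive node restarts.
  val storage = new DataStorageConfiguration()
    .setDefaultDataRegionConfiguration(
      new DataRegionConfiguration().setPersistenceEnabled(true))

  val ignite = Ignition.start(new IgniteConfiguration().setDataStorageConfiguration(storage))

  // Persistence-enabled clusters start inactive until explicitly activated.
  ignite.cluster().active(true)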

Approach - Applied to the Use Case
(Diagram: the Workflow and Scheduler APIs drive a pipeline from the WITSML server through a Java WITSML client (Docker), Channel Mapping / Unit Conversion (Docker), and Rig State Detection / Enrichment / Pivot (Spark), with each step backed by an Apache Ignite SQL service and key-value cache)

V2 Achieved Requirements
1. Heterogeneous Data Ingest - Very flexible ingest; flexible transformations
2. Robust Data Pipeline - Easy to debug; easy to modify
3. Extensible Feature Engineering - Easy to add; easy to experiment
4. Scalable - Scales up; scales down
5. Robust - Easy to debug; reliable

Solution Benchmark Setup
- Dimension tables already loaded
- 8 functions (6 wells of data - 5.7 billion points): Ingest / Parse WITSML, Null Value Handling, Interpolation, Depth Adjustments, Drill State Detection, Rig State Detection, Anomaly Detection, Pivot Dataset
- For V1 everything was implemented as a Spark application
- For V2 the computations remained close to their original format

Solution Comparison
- V1 execute time: 9 hours (7 hours without the WITSML download)
- V2 execute time: 2 hours (22 minutes without the WITSML download)
- 19x improvement from V1 to V2

Lessons Learned How Now?

Lessons Learned
- Apache Ignite is a great tool to speed up data processing without a wholesale replacement of technology
- Apache Ignite does have a learning curve; it is definitely worth doing an analysis beforehand to understand what it means to operationalize it
- Accelerating Hive via Ignite was not straightforward and, at times, made it very difficult to debug the actual issues that we were facing
- Spatial querying, while great, is LGPL, so be aware of that before your specific implementation
- Understanding data locality in Ignite is crucial for larger data sets
- Ignite works very well inside of Kubernetes due to its peer-to-peer clustering mechanism
- The thin client JDBC driver does not have affinity awareness, so in multi-node configurations the thick client is preferred (see the sketch below)
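
As a concrete note on the JDBC point, a hedged Scala sketch of the two connection styles; the host name and config path are illustrative.

  import java.sql.DriverManager

  // Thin driver: simplest to use, but every statement is routed through the
  // node named in the URL, with no affinity awareness.
  val thinConn = DriverManager.getConnection("jdbc:ignite:thin://ignite-node-1:10800")

  // Client ("thick") driver: starts a client node from an Ignite config file
  // and is topology-aware, which is why it was preferred for multi-node setups.
  Class.forName("org.apache.ignite.IgniteJdbcDriver")
  val thickConn = DriverManager.getConnection("jdbc:ignite:cfg://file:///opt/ignite/config/client.xml")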

What’s Next How Now?

What's Next
- Implementation of a UI on top of the computational framework
- Implementation of a standard set of "functions" that can be leveraged on top of the memory grid
- Implementation of streaming sources via the Kafka Ignite Sink

Questions
Apache Ignite - Using a Memory Grid for Heterogeneous Computation Frameworks
A Use Case Guided Explanation
Chris Herrera, Hashmap
