Apache Ignite How To Migrate Your Data Schema To - GridGain Systems

1y ago
7 Views
1 Downloads
804.72 KB
50 Pages
Last View : 9d ago
Last Download : 3m ago
Upload by : Madison Stoltz
Transcription

How to Migrate Your Data Schema toApache IgniteIvan RakovDecember 4, 20192019 GridGain Systems

How to Migrate Your Data Schema to Apache IgniteDecember 4, 2019Ivan Rakov Work at GridGain Systems Leading data consistency dev team Apache Ignite Committer2019 GridGain Systems

Agenda What is and what is not Ignite SQL: pros and cons32019 GridGain Systems

Agenda What is and what is not Ignite SQL: pros and cons Ignite SQL typical successful use cases42019 GridGain Systems

Agenda What is and what is not Ignite SQL: pros and cons Ignite SQL typical successful use cases How to cook Ignite SQL: four-step guide52019 GridGain Systems

Agenda 6What is and what is not Ignite SQL: pros and consIgnite SQL typical successful use casesHow to cook Ignite SQL: four-step guideIgnite SQL: performance tuning2019 GridGain Systems

Agenda 7What is and what is not Ignite SQL: pros and consIgnite SQL typical successful use casesHow to cook Ignite SQL: four-step guideIgnite SQL: performance tuningLiving with Ignite SQL: schema evolution2019 GridGain Systems

Agenda 8What is and what is not Ignite SQL: pros and consIgnite SQL typical successful use casesHow to cook Ignite SQL: four-step guideIgnite SQL: performance fine-tuningLiving with Ignite SQL: schema evolution2019 GridGain Systems

Ignite SQL Ignite can be used as distributed SQL database–––9ANSI-99 compliantHorizontally scalableFault-tolerant2019 GridGain Systems

Ignite SQL Ignite can be used as distributed SQL database––– ANSI-99 compliantHorizontally scalableFault-tolerantIgnite SQL architecture–Tightly coupled with H2 database ––Distributed logic based on map-reduceData is stored in Ignite Durable Memory 10parsing, optimizing, local query executionRAM offheap speed optional disk durability2019 GridGain Systems

Ignite SQL is good for Providing SQL access to large datasets–11not fitting well in one server2019 GridGain Systems

Ignite SQL is good for Providing SQL access to large datasets– Fault-tolerance–12not fitting well in one serverone node crashes, but cluster live and SQL works2019 GridGain Systems

Ignite SQL is good for Providing SQL access to large datasets– Fault-tolerance– one node crashes, but cluster live and SQL worksEasy data distribution–13not fitting well in one serverdata is partition according to affinity function, no need for manual sharding2019 GridGain Systems

Ignite SQL is good for Providing SQL access to large datasets– Fault-tolerance– one node crashes, but cluster live and SQL worksEasy data distribution– not fitting well in one serverdata is partition according to affinity function, no need for manual shardingProviding better query performance than single database server––all cluster nodes work on your query simultaneouslytrue when query fits well in a single map-reduce cycle 14query fits in map-reduce cycle when data rows from different nodes don’t interact2019 GridGain Systems

Ignite SQL is not good for Consistent processing–15Ignite SQL is not transactional2019 GridGain Systems

Ignite SQL is not good for Consistent processing– AD-HOC SQL–16Ignite SQL is not transactionalCustom complex SQL query is likely to run long or even cause OOM2019 GridGain Systems

Ignite SQL is not good for Consistent processing– AD-HOC SQL– Custom complex SQL query is likely to run long or even cause OOMQuery optimization–17Ignite SQL is not transactionalIgnite relies on H2, which is unaware of distributed specifics2019 GridGain Systems

Ignite SQL is not good for Consistent processing– AD-HOC SQL– Ignite relies on H2, which is unaware of distributed specificsThoughtless usage of blocking operators–––18Custom complex SQL query is likely to run long or even cause OOMQuery optimization– Ignite SQL is not transactionalIgnite SQL accumulates intermediate query result in RAMgroup by / order by over large table without condition can cause OOMselect over large table without condition can cause OOM2019 GridGain Systems

Agenda 19What is and what is not Ignite SQL: pros and consIgnite SQL typical successful use casesHow to cook Ignite SQL: four-step guideIgnite SQL: performance tuningLiving with Ignite SQL: schema evolution2019 GridGain Systems

Separating side activities from main SQL processing Oracle / PostgreSQL / CRM is used for critical business processing–20user transactions, money transfer2019 GridGain Systems

Separating side activities from main SQL processing Oracle / PostgreSQL / CRM is used for critical business processing– Side business activities––21user transactions, money transferend of day / month clearingapplications maintenance2019 GridGain Systems

Separating side activities from main SQL processing Oracle / PostgreSQL / CRM is used for critical business processing– Side business activities–– end of day / month clearingapplications maintenanceWhich are too resource intensive to run at main DB–22user transactions, money transfermay affect performance of critical operation2019 GridGain Systems

Separating side activities from main SQL processing Oracle / PostgreSQL / CRM is used for critical business processing– Side business activities–– may affect performance of critical operationOr too expensive to run at main DB––23end of day / month clearingapplications maintenanceWhich are too resource intensive to run at main DB– user transactions, money transfermay require buying additional instancesbilling in some SaaS CRMs is bound to number of API calls2019 GridGain Systems

Separating side activities from main SQL processingSolution: Keep your main business activities where they areSetup CDC between main database and Ignite SQL databaseTune Ignite SQL to work well on your specific queries, enjoy the performanceCDC replicationmain TXprocessing242019 GridGain Systemscomplexqueries

Combining Ignite JCache and SQL Ignite JCache provides ACID distributed transactions–25can be used for business critical processing2019 GridGain Systems

Combining Ignite JCache and SQL Ignite JCache provides ACID distributed transactions––26can be used for business critical processingnot convenient for complex queries: joins, lookups, etc2019 GridGain Systems

Combining Ignite JCache and SQLSolution: Access the same data both with JCache and SQL––JCache for consistent modificationSQL for complex queries where consistency under load is not criticalJCache:consistentupdates272019 GridGain SystemsSQL:complexqueries

Agenda 28What is and what is not Ignite SQL: pros and consIgnite SQL typical successful use casesHow to cook Ignite SQL: four-step guideIgnite SQL: performance tuningLiving with Ignite SQL: schema evolution2019 GridGain Systems

Step one: bootstrap your Ignite SQL schemaJCache-first way Mark your data classes with annotationspublic class Person implements Serializable {/** Indexed field. Will be visible for SQL engine. */@QuerySqlField (index true)private long id;/** Queryable field. Will be visible for SQL engine. */@QuerySqlFieldprivate String name;/** Will NOT be visible for SQL engine. */private int age;292019 GridGain Systems

Step one: bootstrap your Ignite SQL schemaJCache-first way Multi-fields and descending indexes are also supportedpublic class Person implements Serializable {/** Indexed in a group index with "salary". */@QuerySqlField(orderedGroups {@QuerySqlField.Group(name "age salary idx", order 0, descending true)})private int age;/** Indexed separately and in a group index with "age". */@QuerySqlField(index true, orderedGroups {@QuerySqlField.Group(name "age salary idx", order 3)})private double salary;302019 GridGain Systems

Step one: bootstrap your Ignite SQL schemaSQL-first way 31Ignite SQL supports creating table with DDLTemplate allows to specify JCache representation parametersCREATE TABLE IF NOT EXISTS Person (age int, id int, city id int, name varchar, company varchar,PRIMARY KEY (name, id))WITH "key type org.company.PersonId, value type org.company.PersonInfo”2019 GridGain Systems

Step two: prepare queries that should perform well in advanceCase: collect big quantity traders that use basic tariffs at the end ofbusiness day in order to prepare premium account offeringsORDERS int account idforeign key int order id int qty int instrument id date place dateACCOUNTS int account id int tariff id varchar account name int balanceforeign keyTARIFFS int tariff id double commission percentselect account name from ACCOUNTS A inner join ORDERS O on A.account id O.accound id innerjoin TARIFFS T on T.tariff id A.tariff id where O.qty 100 and T.comission percent 0.02322019 GridGain Systems

Step three: collocate your data By default, distributed joins work only when joined data is collocated332019 GridGain Systems

Step three: collocate your data By default, distributed joins work only when joined data is collocated You can specify distributedJoins true parameter at your own risk– Join will work, but execution time and memory consumption may grow significantly342019 GridGain Systems

Step three: collocate your dataORDERS int account idforeign key int order id int qty int instrument id date place date 35ACCOUNTS int account id int tariff id varchar account name int balanceforeign keyTARIFFS int tariff id double commission percentORDERS and ACCOUNTS table contain lots of data and joined on account idMark account id with @AffinityKeyMappedTARIFFS has less data and can’t be colocated along with ORDERS and ACCOUNTSMake cache for TARIFFS table REPLICATED2019 GridGain Systems

Step four: tune your performance Create indexes––36Ignite supports only ordered B tree indexesIndexing account id and tariff id would speed up join2019 GridGain Systems

Step four: tune your performance Create indexes–––37Ignite supports only ordered B tree indexesIndexing account id and tariff id would speed up joinUse inlineSize to specify maximum number of index field bytes thatwill be inlined in B tree If index is inlined, lookup won’t require data row dereferencing and will be faster CREATE INDEX fast city idx ON sales (country, city) INLINE SIZE 60;2019 GridGain Systems

Step four: tune your performance Create indexes––– If index is inlined, lookup won’t require data row dereferencing and will be faster CREATE INDEX fast city idx ON sales (country, city) INLINE SIZE 60;Proceed to fine-tuning–38Ignite supports only ordered B tree indexesIndexing account id and tariff id would speed up joinUse inlineSize to specify maximum number of index field bytes thatwill be inlined in B treeIgnite SQL provides various parameters to match your specific query2019 GridGain Systems

Agenda 39What is and what is not Ignite SQL: pros and consIgnite SQL typical successful use casesHow to cook Ignite SQL: four-step guideIgnite SQL: performance fine-tuningLiving with Ignite SQL: schema evolution2019 GridGain Systems

Query parallelismBy default, every query is executed with one thread per node Can be changed with cacheCfg.setQueryParallelismCan’t be changed in runtimeCauses storage place overhead–40every B tree is present in queryParallelism instances2019 GridGain Systems

Lazy flagBy default, the whole query result is composed in-memory on reducer Can be changed with setLazyUse lazy mode for cases when only part of the large result is needed– Won’t help for blocking operators–41Query will be uploaded to reducer in batch on-demand modelike order by / group by2019 GridGain Systems

Set collocatedBy default, aggregation functions are considered as non-collocated Aggregation is performed on reducerdataRowsreducer nodedataRowsGROUP BYaggregation422019 GridGain Systems

Set collocatedBy default, aggregation functions are considered as non-collocated Aggregation is performed on reducerIf you are sure that you perform aggregation on collocated field– Use setCollocatedCollocation will be performed on mapped nodes, which is more scalablereducer nodeaggregatedresultsaggregatedresultsGROUP BYconcatenation432019 GridGain Systemslocalaggregationlocalaggregation

Set IGNITE SQL MERGE TABLE MAX SIZE ifbig aggregation result is expectedBy default, aggregation functions are supported for up to 10000various keys If you expect that aggregation on larger number of keys is possible– 44Override aforementioned system propertyCan’t be changed in runtime2019 GridGain Systems

Force tables join orderDon’t rely on Ignite query optimizer:it doesn’t understand distributed specifics well Only nested loop joins are supported in IgniteHash joins are present thought, but in experimental modeUse EXPLAIN PLAN on your queriesIf you are not satisfied with explained order–45setEnforceJoinOrder true will force joining in the order of mention2019 GridGain Systems

Agenda 46What is and what is not Ignite SQL: pros and consIgnite SQL typical successful use casesHow to cook Ignite SQL: four-step guideIgnite SQL: performance fine-tuningLiving with Ignite SQL: schema evolution2019 GridGain Systems

Schema evolution You can add or remove columns from tableALTER TABLE City ADD COLUMN IF NOT EXISTS population int;ALTER TABLE Person DROP COLUMN (code, gdp); Storage data won’t be changed– Changing of column type is not supported–47Only select (*) is affectedRemove column and add another with different name instead2019 GridGain Systems

Summary 48Ignite SQL will not serve you like“as good and universal as Oracle/Postgre, but distributed”2019 GridGain Systems

Summary 49Ignite SQL will not serve you like“as good and universal as Oracle/Postgre, but distributed”Planning your queries and configuring collocation in advance will bringdecent performance, empowered by other Ignite features2019 GridGain Systems

Thanks for your attention!Questions?e-mail: irakov@gridgain.compublic list for discussions: user@ignite.apache.orgSQL documentation from reference/sql-reference-overview502019 GridGain Systems

Ignite SQL 10 Ignite can be used as distributed SQL database - ANSI-99 compliant - Horizontally scalable - Fault-tolerant Ignite SQL architecture - Tightly coupled with H2 database parsing, optimizing, local query execution - Distributed logic based on map-reduce - Data is stored in Ignite Durable Memory

Related Documents:

Getting Started with the Cloud . Apache Bigtop Apache Kudu Apache Spark Apache Crunch Apache Lucene Apache Sqoop Apache Druid Apache Mahout Apache Storm Apache Flink Apache NiFi Apache Tez Apache Flume Apache Oozie Apache Tika Apache Hadoop Apache ORC Apache Zeppelin

CDH: Cloudera’s Distribution Including Apache Hadoop Coordination Data Integration Fast Read/Write Access Languages / Compilers Workflow Scheduling Metadata APACHE ZOOKEEPER APACHE FLUME, APACHE SQOOP APACHE HBASE APACHE PIG, APACHE HIVE APACHE OOZIE APACHE OOZIE APACHE HIVE File System Mount UI

Ignite has flexible deployment options: it can be deployed on-premise or on-cloud, on physical servers or virtual environments. Ignite can be deployed from Docker, Kubernetes or Mesos containers. Additionally, images are available in both AWS (ignite-ami) and Google Compute (ignite-image) for quickly deploying Ignite clusters on the cloud.

Apache , Apache Ignite, Ignite , and the Apache Ignite logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States .

Apache ,)Apache)Ignite,)Ignite oundation)in)the)United)States .

Apache Ignite is a great tool to speed up data processing without a wholesale replacement of technology Apache Ignite does have a learning curve, it is definitely worth doing an analysis beforehand to understand what it means to operationalize it Accelerating Hive via Ignite was not straightforward and, at times made it

Ignite/GridGain has a 3rdParty Persistence feature (Cache Store) that allows: Propagating cache changes to external storage like RDBMS Automatically copying data from external storage to Ignite upon accessing data missed in Ignite What if you want to propagate external storage change to Ignite at the

Nutrition is an integral aspect of animal husbandry and the pet food trade now makes up a substantial proportion of the animal care industry. Providing animals with the appropriate feeds in the correct quantities, taking into account factors such as species, breed, activity level and age, requires an understanding of the fundamentals of animal nutrition. A balanced diet is vital to the .