@JamesPlusPlus Phoenix-hbase.blogspot /

2y ago
10 Views
3 Downloads
2.87 MB
107 Pages
Last View : 17d ago
Last Download : 3m ago
Upload by : Brenna Zink
Transcription

PhoenixWe put the SQL back in NoSQLhttps://github.com/forcedotcom/phoenixJames com/

Agendal What/why HBase?Completed

AgendaWhat/why HBase?l What/why Phoenix?l Completed

AgendaWhat/why HBase?l What/why Phoenix?l How does PhoenixCompleted work?l

AgendaWhat/why HBase?l What/why Phoenix?l How does PhoenixCompleted work?l Demol

AgendaWhat/why HBase?l What/why Phoenix?l How does PhoenixCompleted work?l Demol Roadmapl

AgendaWhat/why HBase?l What/why Phoenix?l How does PhoenixCompleted work?l Demol Roadmapl Q&Al

What is HBase?l Developed as part of Apache HadoopCompleted

What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Completed

What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Key/value storeCompletedl

What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Key/value storeCompletedl Map

What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Key/value storeCompletedl MapDistributed

What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Key/value storeCompletedl MapDistributedSparse

What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Key/value storeCompletedl MapSortedDistributedSparse

What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Key/value storeCompletedl MapSortedDistributed ConsistentSparse

What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Key/value storeCompletedl MapSortedDistributed ConsistentSparseMultidimensional

Cluster Architecture

Sharding

Why Use HBase?l If you have lots of dataCompleted

Why Use HBase?l If you have lots of datal Scales linearlyCompleted

Why Use HBase?l If you have lots of datal Scales linearlyl Shards automaticallyCompleted

Why Use HBase?If you have lots of datal Scales linearlyl Shards automaticallyCompletedl If you can live without transactionsl

Why Use HBase?If you have lots of datal Scales linearlyl Shards automaticallyCompletedl If you can live without transactionsl If your data changesl

Why Use HBase?If you have lots of datal Scales linearlyl Shards automaticallyCompletedl If you can live without transactionsl If your data changesl If you need strict consistencyl

What is Phoenix?Completed

What is Phoenix?l SQL skin for HBaseCompleted

What is Phoenix?SQL skin for HBasel Alternate client APIl Completed

What is Phoenix?SQL skin for HBasel Alternate client APIl Embedded JDBCCompleteddriverl

What is Phoenix?SQL skin for HBasel Alternate client APIl Embedded JDBCCompleteddriverl Runs at HBase native speedl

What is Phoenix?SQL skin for HBasel Alternate client APIl Embedded JDBCCompleteddriverl Runs at HBase native speedl Compiles SQL into native HBase callsl

What is Phoenix?SQL skin for HBasel Alternate client APIl Embedded JDBCCompleteddriverl Runs at HBase native speedl Compiles SQL into native HBase callsl So you don’t have to!l

Cluster Architecture

Cluster ArchitecturePhoenix

Cluster ArchitecturePhoenixPhoenix

Phoenix Performance

Why Use Phoenix?

Why Use Phoenix?l Give folks an API they already knowCompleted

Why Use Phoenix?Give folks an API they already knowl Reduce the amount of code neededl Completed

Why Use Phoenix?Give folks an API they already knowl Reduce the amount of code neededl CompletedSELECT TRUNC(date,'DAY’),AVG(cpu)FROM web statWHERE domain LIKE 'Salesforce%’GROUP BY TRUNC(date,'DAY’)

Why Use Phoenix?Give folks an API they already knowl Reduce the amount of code neededl Perform optimizations transparentlyl Completed

Why Use Phoenix?Give folks an API they already knowl Reduce the amount of code neededl Perform optimizations transparentlyCompletedl Aggregationl Skip Scanl Secondary indexing (soon!)l

Why Use Phoenix?Give folks an API they already knowl Reduce the amount of code neededl Perform optimizations transparentlyCompletedl Leverage existing toolingl

Why Use Phoenix?Give folks an API they already knowl Reduce the amount of code neededl Perform optimizations transparentlyCompletedl Leverage existing toolingl SQL client/terminall OLAP enginel

How Does Phoenix Work?Overlays on top of HBase Data Modell Keeps Versioned Schema Respositoryl Query ProcessorCompletedl

Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase Table

Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableColumn Family AColumn Family B

Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableColumn Family AQualifier 1Qualifier 2Column Family BQualifier 3

Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableColumn Family AQualifier 1Row Key 1ValueQualifier 2Column Family BQualifier 3

Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableColumn Family AQualifier 1Row Key 1Row Key 2Column Family BQualifier 2Qualifier 3ValueValueValue

Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableColumn Family AQualifier 1Row Key 1Qualifier 2Qualifier 3ValueValueValueRow Key 2Row Key 3Column Family BValue

Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableHBase TableColumn Family AColumn Family AQualifier 1Qualifier 2Qualifier 1Qualifier 2Row Key 1Row Key 1Row Key 2Row Key 2Row Key 3Row Key 3Column Family BColumn Family BQualifier 3Qualifier 3ValueValueValueValueValueValueValueValue

Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableHBase TableHBaseColumn FamilyA TableRow Key 1Row Key 1Row Key 1Row Key 2Row Key 2Row Key 2Row Key 3Row Key 3Row Key 3Column Family AColumn Family AQualifier 1Qualifier 2Qualifier 1Qualifier 2Qualifier1Qualifier mn Family BColumn Family BColumn Family BQualifier 3Qualifier 3Qualifier 3ValueValueValue

Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableHBase TableHBaseColumn FamilyA TableRow Key 1Row Key 1Row Key 1Row Key 2Row Key 2Row Key 2Row Key 3Row Key 3Row Key 3Column Family AColumn Family AQualifier 1Qualifier 2Qualifier 1Qualifier 2Qualifier1Qualifier 2ValueValueValueValueValueValueMultiple VersionsValueValueValueColumn Family BColumn Family BColumn Family BQualifier 3Qualifier 3Qualifier 3ValueValueValue

Phoenix Data ModelPhoenix maps HBase data model to the relational worldPhoenix TableHBase TableColumn Family AQualifier 1Row Key 1Qualifier 2Qualifier 3ValueValueValueRow Key 2Row Key 3Column Family BValue

Phoenix Data ModelPhoenix maps HBase data model to the relational worldPhoenix TableHBase TableColumn Family AQualifier 1Row Key 1Qualifier 2Qualifier 3ValueValueValueRow Key 2Row Key 3Column Family BValueKey Value Columns

Phoenix Data ModelPhoenix maps HBase data model to the relational worldPhoenix TableHBase TableColumn Family AQualifier 1Row Key 1Qualifier 2Qualifier 3ValueValueValueRow Key 2Row Key 3Column Family BValueRow Key ColumnsKey Value Columns

Phoenix Metadatal Stored in a Phoenix HBase tableCompleted

Phoenix Metadatal Stored in a Phoenix HBase tablel SYSTEM.TABLECompleted

Phoenix Metadatal l Stored in a Phoenix HBase tableUpdated through DDL commandsCompleted

Phoenix Metadatal l Stored in a Phoenix HBase tableUpdated through DDL commandsl CREATE TABLEl ALTER TABLECompletedl DROP TABLEl CREATE INDEXl DROP INDEX

Phoenix Metadatal l l Stored in a Phoenix HBase tableUpdated through DDL commandsKeeps older versions as schema evolvesCompleted

Phoenix Metadatal l l l Stored in a Phoenix HBase tableUpdated through DDL commandsKeeps older versions as schema evolvesCorrelates timestamps between schema and dataCompleted

Phoenix Metadatal l l l Stored in a Phoenix HBase tableUpdated through DDL commandsKeeps older versions as schema evolvesCorrelates timestamps between schema and dataCompletedl Flashback queries useschema that was in-place then

Phoenix Metadatal l l l l Stored in a Phoenix HBase tableUpdated through DDL commandsKeeps older versions as schema evolvesCorrelates timestamps between schema and dataCompletedAccessible via JDBC metadataAPIs

Phoenix Metadatal l l l l Stored in a Phoenix HBase tableUpdated through DDL commandsKeeps older versions as schema evolvesCorrelates timestamps between schema and dataCompletedAccessible via JDBC metadataAPIsl java.sql.DatabaseMetaDatal Through Phoenix queries!

ExampleOver metrics data for clusters of servers with a schema like this:SERVER METRICSHOSTDATERESPONSE TIMEVARCHARDATEINTEGERGC TIMECPU TIMEIO TIME INTEGERINTEGERINTEGERRow Key

ExampleOver metrics data for clusters of servers with a schema like this:SERVER METRICSHOSTDATERESPONSE TIMEVARCHARDATEINTEGERGC TIMECPU TIMEIO TIME INTEGERINTEGERINTEGERKey Values

ExampleWith 90 days of data that looks like this:SERVER METRICSHOSTDATERESPONSE TIMEsf1.s1sf1.s1 sf3.s1sf3.s1sf7.s9 Jun 5 10:10:10.234Jun 5 11:18:28.4561234Jun 5 10:10:10.234Jun 6 12:46:19.123Jun 4 08:23:23.4562345GC TIME8012500223401234

ExampleWalk through query processing for three scenarios

ExampleWalk through query processing for three scenarios1. Chart Response Time Per Cluster

ExampleWalk through query processing for three scenarios1. Chart Response Time Per Cluster2. Identify 5 Longest GC Times

ExampleWalk through query processing for three scenarios1. Chart Response Time Per Cluster2. Identify 5 Longest GC Times3. Identify 5 Longest GC Times again and again

Scenario 1Chart Response Time Per ClusterSELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Scenario 1Chart Response Time Per ClusterSELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Scenario 1Chart Response Time Per ClusterSELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Scenario 1Chart Response Time Per ClusterSELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Scenario 1Chart Response Time Per ClusterSELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Step 1: ClientIdentify Row Key Ranges from QuerySELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3’, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)Row Key RangesHOSTDATE

Step 1: ClientIdentify Row Key Ranges from QuerySELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3’, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)Row Key RangesHOSTDATE

Step 1: ClientIdentify Row Key Ranges from QuerySELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3’, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)Row Key RangesHOSTDATE

Step 1: ClientIdentify Row Key Ranges from QuerySELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3’, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)Row Key RangesHOSTDATEsf1

Step 1: ClientIdentify Row Key Ranges from QuerySELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3’, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)Row Key RangesHOSTDATEsf1sf3

Step 1: ClientIdentify Row Key Ranges from QuerySELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3’, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)Row Key RangesHOSTDATEsf1sf3sf7

Step 1: ClientIdentify Row Key Ranges from QuerySELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3’, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)Row Key RangesHOSTDATEsf1t1 - *sf3sf7

Step 2: ClientOverlay Row Key Ranges with RegionsR1sf1R2Completedsf4sf1sf3R3sf6R4sf7

Step 3: ClientExecute Parallel R4sf7

Step 4: ServerFilter using Skip Scansf1.s1SKIPCompletedt0

Step 4: ServerFilter using Skip ScanINCLUDECompletedsf1.s1t1

Step 4: ServerFilter using Skip Scansf1.s2SKIPCompletedt0

Step 4: ServerFilter using Skip ScanINCLUDECompletedsf1.s2t1

Step 4: ServerFilter using Skip ScanSKIPsf1.s3t0

Step 4: ServerFilter using Skip ScanINCLUDEsf1.s3t1

Step 5: ServerIntercept Scan in CoprocessorSERVER METRICSSERVER METRICSHOSTDATEHOSTDATEAGGsf1.s1Jun 2 10:10:10.234sf1Jun 1 sf1.s2Jun 3 23:05:44.975sf1Jun 2 sf1.s2Jun 9 08:10:32.147sf1Jun 3 sf1.s3Jun 1 11:18:28.456sf1Jun 8 sf1.s3Jun 3 22:03:22.142sf1Jun 9 sf1.s4Jun 1 10:29:58.950sf1.s4Jun 2 14:55:34.104sf1.s4Jun 3 12:46:19.123sf1.s5Jun 8 08:23:23.456sf1.s6Jun 1 10:31:10.234

Step 6: ClientPerform Final Merge SortR1R2SERVER METRICSscan1Completedscan2R3R4scan3HOSTDATE AGGsf1Jun 5 sf1Jun 9 sf3Jun 1 sf3Jun 2 sf7Jun 1 sf7Jun 8

Scenario 2Find 5 Longest GC TimesSELECT host, date, gc timeFROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3, ‘sf7’)CompletedORDER BY gc time DESCLIMIT 5

Scenario 2Find 5 Longest GC Times Same client parallelization and server skip scan filtering

Scenario 2Find 5 Longest GC Times Same client parallelization and server skip scan filteringServer holds 5 longest GC TIME value for each scanCompletedR1SERVER METRICSHOSTDATEGC TIMEsf1.s1Jun 2 10:10:10.23422123sf1.s1Jun 3 23:05:44.97519876sf1.s1Jun 9 08:10:32.14711345sf1.s2Jun 1 11:18:28.45610234sf1.s2Jun 3 22:03:22.14210111

Scenario 2Find 5 Longest GC Times Same client parallelization and server skip scan filteringServer holds 5 longest GC TIME value for each scanClient performs final merge sort among parallel scansSERVER METRICSScan1Scan2Scan3HOSTDATEGC TIMEsf1.s1Jun 2 10:10:10.23422123sf1.s1Jun 3 23:05:44.97519876sf1.s1Jun 9 08:10:32.14711345sf1.s2Jun 1 11:18:28.45610234sf1.s2Jun 3 22:03:22.14210111

Scenario 3Find 5 Longest GC TimesCREATE INDEX gc time indexON server metrics (gc time DESC, date DESC)INCLUDE (host, response time)Completed

Scenario 3Find 5 Longest GC TimesCREATE INDEX gc time indexON server metrics (gc time DESC, date DESC)INCLUDE (host, response time)Completed

Scenario 3Find 5 Longest GC TimesCREATE INDEX gc time indexON server metrics (gc time DESC, date DESC)INCLUDE (host, response time)Completed

Scenario 3Find 5 Longest GC TimesCREATE INDEX gc time indexON server metrics (gc time DESC, date DESC)INCLUDE (host, response time)CompletedGC TIME INDEXGC TIMEINTEGERDATEDATEHOSTVARCHARRESPONSE TIME INTEGERRow Key

Scenario 3Find 5 Longest GC TimesCREATE INDEX gc time indexON server metrics (gc time DESC, date DESC)INCLUDE (host, response time)CompletedGC TIME INDEXGC TIMEINTEGERDATEDATEHOSTVARCHARRESPONSE TIME INTEGERKey Value

Scenario 3Find 5 Longest GC TimesSELECT host, date, gc timeFROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3, ‘sf7’)CompletedORDER BY gc time DESCLIMIT 5

DemoPhoenix Stock Analyzerl Fortune 500 companiesl 10 years of historical stock pricesCompletedl Demonstrates Skip Scan in actionl Running locally on my single nodelaptop clusterl

Phoenix RoadmapSecondary Indexingl Count distinct and percentilel Derived tablesCompletedl Hash Joinsl Apache Drill integrationl Cost-based query optimizerl OLAP extensionsl Transactionsl

Thank you!Questions/comments?

Phoenix maps HBase data model to the relational world . HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value HBase Table Column Family A Column Family

Related Documents:

HBase Standalone is a mode which allow you to get rid of HDFS and to test HBase before deploying in a cluster, It is not production oriented. Installing HBase in standalone is extremely simple. First you have to download the HBase archive named hbase

Limitations of the Backup-Restore Only one active backup session is supported. HBASE-16391 will introduce multiple-backup sessions support Both backup and restore can't be canceled while in progress. (HBASE-15997,15998) Single backup destination only supported. HBASE-15476 There is no merge for incremental images (HBASE-14135)

01/02/2020 TP4 - HBase - TP Big Data 127.1:8000/tp4/ 5/ 18 Autres caractéristiques de HBase: HBase n'a pas de schéma pr édéni, sauf qu'il faut dénir les familles de colonnes à la cr éation des tables, car elles r eprésentent l'organisation physique des données HBase est décrite comme étant un magasin de données clef/v aleur, où la

1.2 VICTRON PHOENIX INVERTER The coding of the Phoenix inverter model is composed as follows: For example Phoenix 12/500: '12' 12 VDC battery voltage '500' 500 W continuous electrical load The Victron Phoenix inverter is designed for 12 or 24 VDC battery input voltages and produces a sinusoïdal output voltage of 230 VAC, 50 Hz

Applications such as HBase, Cassandra, couchDB, Dynamo, and MongoDB are some of the databases that store huge amounts of data and access the data in a random manner. What is HBase? HBase is a distributed column-oriented database built on top of the Hadoop file system. I

explaining the data model and build on that by going into the various concepts at play in designing HBase tables through an example . Crash Course on hbase Data Model HBase's data model is very different from what you have likely worked with or know of in relational databases . As described in the original Bigtable paper, it's a

Apache HBase, a Hadoop NoSQL database, offers the following benefits: Efficient storage of sparse data—Apache HBase provides fault-tolerant storage for large quantities of sparse data using column-based compression. Apache HBase is capable of storing and processing billions of rows and millions of columns per row.

ACCOUNTING 0452/22 Paper 2 May/June 2019 1 hour 45 minutes Candidates answer on the Question Paper. No Additional Materials are required. READ THESE INSTRUCTIONS FIRST Write your Centre number, candidate number and name on all the work you hand in. Write in dark blue or black pen. You may use an HB pencil for any diagrams or graphs. Do not use staples, paper clips, glue or correction fluid. DO .