DATA WAREHOUSE - QCon

DATA WAREHOUSE BUILT FOR THE CLOUD
QCon San Francisco, November 2019
Thierry Cruanes, Co-Founder & CTO

THE DREAM DATA WAREHOUSE (CIRCA 2012)
- Unlimited and instant scaling
- No data silos
- 10x faster for the same price, no over-provisioning
- Store all your data: structured and semi-structured
- Petabyte scale at very low cost
- Extreme simplicity
- No compromises: a full-fledged data warehouse
- No management tasks, offered as a service
- Full support for ACID transactions with read consistency
- Fast out of the box with no tuning knobs
- ANSI SQL, RBAC

WHY THEN? OUR VIEW OF THE CLOUD
- Storage became dirt cheap
- A flat network offered uniform bandwidth
- Single-core performance stalled
- Data warehouse and analytic workloads are mostly CPU-bound
Design for abundance of resources, not scarcity.

THREE PILLARS
Multi-cluster shared-data architecture
- Instant scale
- Performance isolation
- Real-time data sharing
- Leverage cloud elasticity and pay only for what you use
Immutable scalable storage
- Extremely fast response time at scale
- Fine-grain vertical and horizontal pruning on any column
- Automatically applied to any data (structured and semi-structured)
Multi-tenant service
- Self-tuning, self-healing
- Transparent upgrade
- Service architecture designed for availability, durability and security

ARCHITECTURE

AN ARCHITECTURE BUILT FOR THE CLOUD
Traditional architectures:
- Shared-disk: shared storage, single cluster
- Shared-nothing: decentralized, local storage, single cluster
Multi-cluster, shared data:
- Centralized, scale-out storage
- Multiple, independent compute clusters

MULTI-CLUSTER, SHARED DATA ARCHITECTURE
(Diagram: separate virtual warehouses for ETL & data loading, dashboards and other workloads, all working on the same databases.)
- No data silos: storage decoupled from compute
- Any data: native for structured and semi-structured
- Unlimited scalability along many dimensions
- Low cost: compute on demand
- Instant cloning: isolate prod from dev, test and QA
- Highly available: 11 9's durability, 4 9's availability

VIRTUAL WAREHOUSE
How can concurrent workloads run without impacting each other?
- One or more MPP compute clusters
- Unit of fault and performance isolation
- Use multiple warehouses to segregate workloads (diagram: virtual warehouses A to D running ETL, transformation, SQL and BI, each with its own SSD/RAM cache)
- Resizable on the fly
- Able to access data in any database
- Transparently caches the data it accesses
- A transaction manager synchronizes data access
- Automatically suspends when idle and resumes when needed

MULTI-CLUSTER WAREHOUSE: LEVERAGE THE ABUNDANCE OF COMPUTE RESOURCES
- A single virtual warehouse made of multiple compute clusters (diagram: clusters 1, 2 and 3 in one virtual warehouse group)
- Automatically scales compute resources based on concurrent usage
- A query scheduler load-balances queries across the clusters in a virtual warehouse
- Clusters are split across availability zones for high availability
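To make the scheduler idea concrete, here is a minimal sketch of concurrency-based scale-out, assuming a hypothetical `MultiClusterWarehouse` with illustrative knobs `max_clusters` and `slots_per_cluster` (these are not a real configuration API). A query goes to the least-loaded cluster that has a free slot. If every cluster is full, a new cluster is added up to the limit, and after that queries wait in a queue. Scale-in and availability-zone placement are left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cluster:
    cluster_id: int
    running: int = 0  # queries currently executing on this cluster


@dataclass
class MultiClusterWarehouse:
    # Hypothetical knobs chosen for illustration only.
    max_clusters: int
    slots_per_cluster: int
    clusters: list = field(default_factory=list)
    queue: deque = field(default_factory=deque)

    def __post_init__(self):
        # The warehouse starts with a single running cluster.
        self.clusters.append(Cluster(0))

    def _least_loaded_with_capacity(self):
        candidates = [c for c in self.clusters if c.running < self.slots_per_cluster]
        return min(candidates, key=lambda c: c.running) if candidates else None

    def submit(self, query):
        """Route a query to a cluster, scaling out or queueing when all clusters are busy."""
        cluster = self._least_loaded_with_capacity()
        if cluster is None and len(self.clusters) < self.max_clusters:
            cluster = Cluster(len(self.clusters))  # scale out on concurrency
            self.clusters.append(cluster)
        if cluster is None:
            self.queue.append(query)  # every cluster is saturated
            return None
        cluster.running += 1
        return cluster.cluster_id

    def finish(self, cluster_id):
        """Mark one query as done and start a queued query, if any, on a free slot."""
        self.clusters[cluster_id].running -= 1
        if self.queue:
            self.submit(self.queue.popleft())
```

With `max_clusters=3` and `slots_per_cluster=2`, seven concurrent queries are placed on clusters `[0, 0, 1, 1, 2, 2]` and the seventh is queued until one finishes. The least-loaded rule is one simple way to "load balance". A real scheduler would also weigh query cost and cache locality.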

IN THE REAL WORLD
Workloads:
- Interactive dashboard: 50% of queries within 1 s, 85% within 2 s, 95% within 5 s
- Continuous loading (4 TB/day) from S3 with a 5-minute SLA
- Reporting (segmented)
- ETL & maintenance
Virtual warehouses: auto-scale X-Large x 5, Medium, 2X-Large, Large
Prod DB: 4 trillion rows, 3 petabytes raw, 8x compression ratio, 25M micro-partitions
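The production figures can be cross-checked with some quick arithmetic. The snippet below assumes decimal units (1 PB = 10^15 bytes) and treats every micro-partition as the same average size, which is a simplification. It shows that 3 PB compressed 8x and spread over 25M micro-partitions works out to about 15 MB per partition. That is in line with the "only a few tens of MB" described on the storage slide.

```python
# Figures from the slide; decimal units are an assumption.
RAW_BYTES = 3 * 10**15          # 3 petabytes raw
COMPRESSION_RATIO = 8           # 8x compression
ROWS = 4 * 10**12               # 4 trillion rows
MICRO_PARTITIONS = 25 * 10**6   # 25M micro-partitions
DAILY_LOAD_BYTES = 4 * 10**12   # 4 TB/day continuous loading
SLA_SECONDS = 5 * 60            # 5-minute loading SLA
SECONDS_PER_DAY = 86_400

compressed_bytes = RAW_BYTES // COMPRESSION_RATIO             # stored size
bytes_per_partition = compressed_bytes // MICRO_PARTITIONS     # average partition size
rows_per_partition = ROWS // MICRO_PARTITIONS                  # average rows per partition
load_per_window = DAILY_LOAD_BYTES * SLA_SECONDS // SECONDS_PER_DAY  # data per SLA window

print(f"{compressed_bytes / 1e12:.0f} TB compressed")              # 375 TB compressed
print(f"{bytes_per_partition / 1e6:.0f} MB per micro-partition")   # 15 MB per micro-partition
print(f"{rows_per_partition:,} rows per micro-partition")          # 160,000 rows per micro-partition
print(f"{load_per_window / 1e9:.1f} GB per 5-minute window")       # 13.9 GB per 5-minute window
```

At a steady rate, the loading warehouse has to ingest roughly 14 GB every 5 minutes, which is about 46 MB/s, to meet the SLA.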

SCALABLE IMMUTABLE STORAGE

STORAGE IMMUTABILITY
Accumulates immutable data over time:
- Well supported by every cloud vendor's object store
- Allows separation of storage and compute resources
- Enables workload scalability
- Heavily optimized for read-mostly workloads: a natural fit for analytic systems
Transaction management becomes a metadata problem:
- Multi-version concurrency control and snapshot isolation semantics
- Transaction coordination separated from storage and compute
- Allows consistent access across compute resources
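Here is one way to see why "transaction management becomes a metadata problem". Because files are never modified, a table version can be nothing more than a set of file names. A commit publishes a new set, and a reader that pinned an older version keeps seeing exactly the files it started with. This is a hypothetical sketch (`TableMetadata`, `commit` and the file names are illustrative). Its conflict rule is deliberately strict: any commit that lands after a writer's base version aborts that writer. Real snapshot isolation only aborts writers that actually overlap.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    # Version 0 is the empty table. Each entry is the immutable set of
    # partition files that make up the table at that version.
    versions: list = field(default_factory=lambda: [frozenset()])

    def snapshot(self):
        """Latest committed version; a reader pins it for the whole query."""
        return len(self.versions) - 1

    def files_at(self, version):
        """Files visible to a reader that pinned `version`."""
        return self.versions[version]

    def commit(self, base_version, added=(), removed=()):
        """Publish a new version; existing files are never modified in place."""
        # Strict first-committer-wins check (an assumption for this sketch).
        if base_version != self.snapshot():
            raise RuntimeError("write conflict: table changed since base_version")
        current = self.versions[base_version]
        self.versions.append((current - frozenset(removed)) | frozenset(added))
        return self.snapshot()


meta = TableMetadata()
v1 = meta.commit(0, added={"p1", "p2"})             # initial load
reader = meta.snapshot()                            # a query pins version 1
v2 = meta.commit(v1, added={"p3"}, removed={"p1"})  # an UPDATE rewrites p1 as p3
print(sorted(meta.files_at(reader)))  # ['p1', 'p2']: the reader still sees its snapshot
print(sorted(meta.files_at(v2)))      # ['p2', 'p3']
```

The old file `p1` is not deleted by the commit. It simply stops being referenced by new versions, which is also what makes time travel and fail-safe retention cheap to support.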

SCALABLE STORAGE: AUTOMATIC MICRO-PARTITIONING
- Data is automatically partitioned at load time; storage is decoupled from compute
- Columnar organization within each micro-partition enables both horizontal and vertical pruning
- Micro-partitions are small (only a few tens of MB): fine-grain pruning, no skew
- A metadata structure tracks data distribution: very fast pruning at optimization time
- Applied to both structured and semi-structured data: very fast response time for both
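Horizontal pruning at optimization time can be sketched with per-column min/max ranges kept in metadata. A micro-partition is skipped when its range for the filtered column cannot overlap the predicate. `PartitionStats`, `prune`, and the sample column values are hypothetical. Real metadata tracks more than min/max, and vertical pruning (reading only the referenced columns) is not shown.

```python
from dataclasses import dataclass


@dataclass
class PartitionStats:
    """Per-column (min, max) kept in metadata for one micro-partition (illustrative)."""
    name: str
    min_max: dict  # column name -> (min, max)


def prune(partitions, column, low, high):
    """Keep only partitions whose [min, max] for `column` overlaps [low, high]."""
    kept = []
    for p in partitions:
        col_min, col_max = p.min_max[column]
        if col_max >= low and col_min <= high:  # ranges overlap
            kept.append(p.name)
    return kept


parts = [
    PartitionStats("mp1", {"order_date": ("2019-01-01", "2019-01-31"), "amount": (5, 900)}),
    PartitionStats("mp2", {"order_date": ("2019-02-01", "2019-02-28"), "amount": (1, 50)}),
    PartitionStats("mp3", {"order_date": ("2019-03-01", "2019-03-31"), "amount": (10, 4000)}),
]

# WHERE order_date BETWEEN '2019-02-10' AND '2019-03-05'
print(prune(parts, "order_date", "2019-02-10", "2019-03-05"))  # ['mp2', 'mp3']
# WHERE amount BETWEEN 1000 AND 5000
print(prune(parts, "amount", 1000, 5000))                      # ['mp3']
```

Only metadata is consulted, so the decision costs nothing in data I/O. Small partitions make the ranges tight, which is why fine-grain partitioning pays off.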

AUTOMATICALLY APPLIED TO SEMI-STRUCTURED DATA
- Structured data (e.g., CSV, TSV) and semi-structured data (JSON, Avro, XML, Parquet, ORC) are queried with the same optimized SQL (SELECT ... FROM ...), with the full benefit of database optimizations (pruning, filtering)
- Native support: loaded in raw form (e.g., JSON, Avro, XML)
- Optimized storage: optimized data types, no fixed schema or transformation required
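One way to picture "loaded in raw form, stored optimized" is to flatten each raw JSON document into path columns and record the types observed for each path. Those columns could then carry the same per-column metadata used for pruning. This is a hypothetical illustration (`flatten` and `columnarize` are invented names). It is not the actual storage format. Arrays are kept as opaque leaf values here.

```python
import json


def flatten(value, prefix=""):
    """Yield (path, leaf) pairs for a parsed JSON value, e.g. ('user.name', 'ana')."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    else:
        yield prefix, value  # scalars and arrays are leaves in this sketch


def columnarize(raw_lines):
    """Turn raw JSON lines into per-path columns; documents missing a path get None."""
    docs = [dict(flatten(json.loads(line))) for line in raw_lines]
    paths = sorted({path for doc in docs for path in doc})
    columns = {path: [doc.get(path) for doc in docs] for path in paths}
    # Record the value types seen per path; no schema was declared up front.
    types = {path: sorted({type(v).__name__ for v in col if v is not None})
             for path, col in columns.items()}
    return columns, types


raw = [
    '{"id": 1, "user": {"name": "ana", "age": 31}}',
    '{"id": 2, "user": {"name": "bo"}, "tags": ["x"]}',
]
columns, types = columnarize(raw)
print(list(columns))        # ['id', 'tags', 'user.age', 'user.name']
print(columns["user.age"])  # [31, None]
print(types["id"])          # ['int']
```

The second document adds a new `tags` path without any schema change. Earlier rows simply have no value for it.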

EXAMPLE
(Architecture diagram.) Client applications (ODBC driver, JDBC driver, web UI) connect over HTTPS (JDBC/ODBC/Python) to the cloud services layer (management, security, DDL, metadata). The compute layer runs separate virtual warehouses, each a cluster of nodes: custom reports, campaign analysis, and a loading warehouse. All warehouses read the same storage on S3, which holds databases such as BI data, sales and marketing.

ENABLE DATA SHARING
Data providers share with data consumers:
- Secure and integrated with Snowflake's access control model
- Consumers get access to the data without any need to move or transform it
- Only normal storage costs apply to shared data
- Query and combine shared data with existing data, or join data from multiple publishers
- No limit to the number of consumer accounts with which a dataset may be shared
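Since partitions are immutable, a share can be pure metadata. It points at the provider's partitions and records which accounts may read them. Granting access copies nothing, so storage is paid once no matter how many consumers there are. The sketch below models this with hypothetical `Storage` and `Share` classes and made-up account names. It is not Snowflake's API.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """Immutable partition files owned by the provider account (hypothetical model)."""
    files: dict  # partition id -> file contents

    def bytes_stored(self):
        return sum(len(data) for data in self.files.values())


@dataclass
class Share:
    """A share is metadata only: a pointer to the provider's partitions plus grants."""
    provider: str
    partitions: frozenset
    consumers: set = field(default_factory=set)

    def grant(self, account):
        # Granting access copies no data; it only records a new consumer.
        self.consumers.add(account)

    def read(self, account, storage):
        if account != self.provider and account not in self.consumers:
            raise PermissionError(f"{account} has no access to this share")
        return [storage.files[p] for p in sorted(self.partitions)]


storage = Storage({"p1": b"2019-11 sales", "p2": b"2019-12 sales"})
share = Share("provider_acct", frozenset(storage.files))
before = storage.bytes_stored()
for account in ("consumer_a", "consumer_b", "consumer_c"):
    share.grant(account)
print(share.read("consumer_b", storage))  # [b'2019-11 sales', b'2019-12 sales']
print(storage.bytes_stored() == before)   # True: three consumers, one copy of the data
```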

ENABLE GLOBAL REPLICATION
(Map of regions.) AWS: US West, US East, Ireland, Frankfurt, Sydney. Azure: US East, Frankfurt.

MULTI-TENANT SERVICE

DATA WAREHOUSE AS A SERVICE
Multi-tenant service:
- No administration; self-tuning and self-healing
- Transparent upgrade
- Service architecture designed for high availability and durability
- Security is at the core
Availability:
- All tiers distributed over multiple data centers with active-active data replication
- No maintenance downtime; fully transparent software and hardware upgrades
- Automatic repair of any failed servers, with transparent re-execution of any failed queries
- Persistent sessions for load balancing and transparent failover
Durability:
- Synchronous replication of data over multiple data centers
- Automatic data retention and fail-safe technology to guard against any data removal

SNOWFLAKE SERVICE
Three independent layers:
- Cloud services: authentication & access control, optimizer, security, compilation and management, metadata
- Data processing: virtual warehouses, each with its own cache
- Storage: databases

MANAGED SERVICE: BUILT-IN DISASTER RECOVERY AND HIGH AVAILABILITY
- Scale-out of all services: metadata, compute, storage
- Resiliency across multiple availability zones: geographic separation, separate power grids, built for synchronous replication
- Fully online updates and patches: zero downtime
- Back pressure and throttling all the way back to the client

ADAPTIVE ALL THE WAY TO THE CORE: SELF-TUNING & SELF-HEALING
- Automatic distribution method
- Automatic degree of parallelism
- … management
- No vacuuming
- No statistics
Principles: adaptive, self-tuning, automatic, on by default, and "do no harm!"

EXAMPLE: AUTOMATIC SKEW AVOIDANCE
1. Detect popular values on the build side of the join
2. Use broadcast for those and a directed join for the others
- Adaptive: popular values are detected at runtime
- Self-tuning: number of values
- Do no harm!: no performance degradation
- Automatic: kicks in when needed
- Default: enabled by default for all joins
(Execution plan diagram: a join over a filtered scan and a scan, with steps 1 and 2 marked on the plan.)
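The two steps above can be simulated in a few lines. Build rows whose key is popular are broadcast to every worker. All other build rows are directed by `hash(key) % workers`. Probe rows with a popular key then do not need to be shuffled to a single worker, because every worker already holds the matching build rows. Here they are spread round-robin to stand in for "stay where they are". The detection rule (count above `threshold`) and all names are assumptions for illustration.

```python
from collections import Counter, defaultdict


def detect_popular(build_rows, threshold):
    """Build-side keys seen more than `threshold` times (hypothetical detection rule)."""
    counts = Counter(key for key, _ in build_rows)
    return {key for key, n in counts.items() if n > threshold}


def skew_aware_join(build_rows, probe_rows, workers, threshold):
    """Simulate a distributed hash join: broadcast popular keys, direct the rest.

    Rows are (key, payload) tuples. Returns the popular keys, the joined rows
    and the number of probe rows each worker had to process.
    """
    popular = detect_popular(build_rows, threshold)
    build_at = [defaultdict(list) for _ in range(workers)]
    for key, payload in build_rows:
        targets = range(workers) if key in popular else [hash(key) % workers]
        for w in targets:
            build_at[w][key].append(payload)
    probe_at = [[] for _ in range(workers)]
    for i, (key, payload) in enumerate(probe_rows):
        w = i % workers if key in popular else hash(key) % workers
        probe_at[w].append((key, payload))
    # Each worker joins only its local rows; no further data movement is needed.
    results = []
    for w in range(workers):
        for key, payload in probe_at[w]:
            for build_payload in build_at[w].get(key, []):
                results.append((key, build_payload, payload))
    load = [len(rows) for rows in probe_at]
    return popular, results, load
```

With a key that appears 6 times on the build side and 8 times on the probe side, a plain directed join would send all 8 probe rows to one worker. The sketch spreads them evenly over 4 workers and still produces exactly the same result set as a naive nested-loop join.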

WHAT'S NEXT? SERVERLESS DATA SERVICES
- Target predictable, well-identified database workloads
- Horizontal scaling is automatic
- Fine-grain units of work allow the degree of parallelism to be arbitrarily small or large
- Secure, since handled by the service
- Transparent retry on failures
- Service state entirely managed by the service
- Monitoring and observability of the service
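Fine-grain units of work and transparent retry fit together naturally. If a job is split into many small, independent units, the service can choose any degree of parallelism and simply re-run a unit that fails without restarting the job. The sketch below uses the standard `concurrent.futures.ThreadPoolExecutor`. The names `run_with_retry`, `run_service_job`, `parallelism` and `max_attempts` are hypothetical, and units are assumed to be idempotent so that re-running one is safe.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_retry(task, unit, max_attempts):
    """Re-execute one unit of work until it succeeds or attempts run out (max_attempts >= 1)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(unit)
        except Exception:
            if attempt == max_attempts:
                raise  # give up: surface the last failure to the caller


def run_service_job(task, units, parallelism, max_attempts=3):
    """Fan fine-grained units out to `parallelism` workers; results keep the input order."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(lambda u: run_with_retry(task, u, max_attempts), units))
```

For example, a task that fails the first time it sees each unit still returns the full, ordered result, because every unit is retried once behind the caller's back. The same job runs unchanged with `parallelism=1` or `parallelism=64`.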

CLOUD NATIVE ARCHITECTURE: A GIFT THAT KEEPS ON GIVING
