Cloud Datastore: A NoSQL Database at Google Scale


Cloud Datastore: A NoSQL Database at Google Scale
Randy Shoup
Engineering Director, Google Cloud Platform
NoSQL-Search RoadShow SF
June 6, 2013

Agenda
Google Cloud Platform
Google Scale
Google Cloud Datastore
Google Storage Infrastructure
Parting Thoughts

Google Cloud Platform

Cloud Computing at Google
Google builds and operates one of the largest computing infrastructures in the world
Dozens of data centers located around the world
Designed from the ground up to run massive Internet-scale services
Integrated design of facility and computing machinery
Homogeneous hardware and system software
Cluster-level networking fabric

Cloud Computing at Google
All Google Computing is Cloud Computing
Custom-built machines and network
Cluster is typically thousands of machines
Common pool of resources with central cluster management
  o Fungible units of compute, memory, storage, network
  o Sophisticated bin-packing to maximize utilization
Hundreds to thousands of active jobs, from one task to thousands of tasks
Mix of low-latency, user-facing jobs and batch workloads
Massively multitenant

Google Cloud Platform
Google Compute Engine: Full Linux virtual machines running on Google's infrastructure.
Google Cloud Storage: Store, access, and manage application data.
Google BigQuery: Analyze terabytes of data in seconds.
Google App Engine: Platform as a Service. Powerful, scalable application development and execution environment.

Google Scale

Elements of Google Scale
Layering and Composition
Compose complex systems from simple primitives
As much as possible, make it possible to reason independently and intuitively about behavior of primitives
All Google services rely on (often many!) lower layers of infrastructure

Elements of Google Scale
At Scale, Everything Breaks
Service-level outages
  o Networking
  o Power
  o Oops
Node-level outages (industry average)
  o 1% uncorrectable DRAM errors per server per year
  o 2-10% disk drive failure rate per year
  o 2 crashes per server per year
  o 1 utility event per year
2000 node service sees 10 server crashes per day (!)
Source: http://dl.acm.org/citation.cfm?id=1837133
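
The crashes-per-day figure follows from the per-server rate quoted on the slide. A minimal sketch of that arithmetic, assuming crashes are spread evenly over a 365-day year (the function name is illustrative, not from the talk):

```python
def expected_crashes_per_day(nodes, crashes_per_server_per_year, days_per_year=365):
    """Average number of server crashes per day across a fleet."""
    return nodes * crashes_per_server_per_year / days_per_year


# Figures from the slide: 2000 nodes, 2 crashes per server per year.
rate = expected_crashes_per_day(2000, 2)
print(f"{rate:.1f} crashes per day")
```

The result lands just under 11 per day, in line with the slide's rounded "10 server crashes per day".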

Elements of Google Scale
Predictable Performance
Systems at scale highly exposed to performance variability
  o Imagine an operation ... 1ms latency median, but 1 second latency at 99.99%ile (1 in 10,000)
  o Service using 1 machine -> 0.01% slow
  o Service using 5K machines -> 50% slow
Consistent performance trumps low average latency
  o Low latency + inconsistent performance != low latency (!)
  o Far easier to program for consistent performance
  o Tail latencies are *much* more important than average latencies
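
The fan-out effect can be worked through numerically. This is a sketch under stated assumptions: a request touches every machine, it is slow if any one machine is slow, and machines are slow independently with probability 1 in 10,000. The function names are illustrative.

```python
def p_any_slow(machines, p_slow=1e-4):
    """Probability that at least one of `machines` independent calls is slow."""
    return 1.0 - (1.0 - p_slow) ** machines


def linear_estimate(machines, p_slow=1e-4):
    """Back-of-the-envelope estimate: machines * p_slow, capped at 1."""
    return min(1.0, machines * p_slow)


for n in (1, 100, 5000):
    print(n, f"exact={p_any_slow(n):.4%}", f"linear={linear_estimate(n):.4%}")
```

Under these assumptions the slide's 50% for 5K machines matches the linear estimate; the exact independent-events figure is about 39%. Either way, a one-in-10,000 tail on a single machine becomes a routine event for the service as a whole.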

Elements of Google Scale
Opinionated Platform
Encourage scalable development practices
  o Small discrete units of processing
  o No single points of failure
  o Automated testing
  o Staged deployments
Make it easy to do the right thing, and hard to do the wrong thing
"It Just Works" (TM)

Google Cloud Datastore

(Re-)Introducing Google Cloud Datastore
Based on High Replication Datastore in Google App Engine
Multiple generations of evolution
  o Originally introduced with Google App Engine in 2008
3M applications, 300K unique developers
Petabytes of storage
4.5T operations / month
Layered on top of
  o MegaStore
  o BigTable
  o Colossus

Google Cloud Datastore
Accessible
RESTful interface
HTTP with JSON or Protocol Buffer API
Accessible from
  o Google Compute Engine
  o Google App Engine
  o Anywhere else
Web-based interface for configuration and management
Development server for local development

Google Cloud Datastore
Fully Managed
No planned downtime
  o Completely automated failover
Replicated across multiple data-centers
  o All data replicated across multiple disks and multiple data centers
Managed and operated as a service by Google
99.95% SLA
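
To put the SLA percentage in concrete terms, the sketch below converts an availability target into a downtime budget. The 30-day month is an assumption for illustration; the slide does not state the measurement window.

```python
def downtime_budget_minutes(availability, days=30):
    """Minutes of allowed unavailability over `days` at a given availability."""
    total_minutes = days * 24 * 60
    return (1.0 - availability) * total_minutes


budget = downtime_budget_minutes(0.9995)
print(f"{budget:.1f} minutes per 30 days")
```

At 99.95%, that works out to a little under 22 minutes over a 30-day window.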

Google Cloud Datastore
Scalable
Arbitrary horizontal scaling
Autoscales as traffic increases
Autoshards as data increases
More distributed as more data is stored

Google Cloud Datastore
Resilient
Cross-data center active-active replication
  o All data replicated across multiple disks and multiple data centers
Synchronous replication via Paxos
Application can seamlessly migrate between data centers with no data loss
Applications can read locally in separate data centers with no inconsistency or replication lag
Resilient to catastrophic failure ("meteorite durability")

Google Cloud Datastore
Schemaless
No configuration needed; just start writing data
Arbitrary attributes on any entity
  o Different entities can have different attributes
  o Attributes can be multi-valued
Arbitrary-depth parent-child relationships
"Entity groups" can associate many related entities
  o E.g., all emails for a user
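
One way to picture the data model described above is as entities keyed by an ancestor path, where the root of the path identifies the entity group. The sketch below is a hypothetical in-memory model for illustration only; the class, the key layout, and the kind names are assumptions, not the Datastore API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

# Hypothetical key: an ancestor path of (kind, id) pairs, root first.
Key = Tuple[Tuple[str, str], ...]


@dataclass
class Entity:
    key: Key
    # Schemaless: any attribute names, any values, including lists.
    properties: Dict[str, Any] = field(default_factory=dict)

    @property
    def entity_group(self) -> Key:
        # In this sketch the entity group is the root element of the path.
        return self.key[:1]


user = Entity(key=(("User", "alice"),), properties={"name": "Alice"})
email = Entity(
    key=(("User", "alice"), ("Email", "msg-1")),
    properties={"subject": "Hello", "labels": ["inbox", "starred"]},
)
attachment = Entity(
    key=(("User", "alice"), ("Email", "msg-1"), ("Attachment", "a1")),
    properties={"size_bytes": 1024},
)
```

Here the user, the email, and the attachment carry different attributes, one attribute is multi-valued, and all three share the same root, so they fall in one entity group ("all emails for a user").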

Google Cloud Datastore
Consistency
Strongly consistent, with atomic transactions
Strong serial consistency within entity group
  o Will always Get an entity once Put
  o Never see partial transactions
Strong consistency on reads and ancestor queries
Multi-entity group transactions
  o Transactions can read / write entities within (small number of) entity groups
Eventual consistency only when querying across many entity groups

Google Cloud Datastore
Rich Query Features
GQL is an ever-growing subset of SQL
Filters
  o Equality (=, IN)
  o Inequality (!=, <, <=, >, >=)
  o AND, OR, NOT, sub-expressions
Sort
DISTINCT
Projections, index-only queries
Geo radius, Date range
Cursors for paged iteration

Google Cloud Datastore
Predictable Performance
Fixed cost queries
  o Query latency scales in the size of the result set, not in the size of the overall data
  o Constant latency for queries over 1M or 1B or 1T entities
All queries are index queries
"It's not a limitation, it's a discipline"
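
The "all queries are index queries" point can be illustrated with a sorted in-memory index. This is a simplified sketch, not how Datastore is implemented: the index layout and function names are assumptions, and a binary search stands in for the index seek.

```python
import bisect


def build_index(n):
    """Sorted list of (value, entity_id) entries; here value == id for simplicity."""
    return [(i, i) for i in range(n)]


def index_range_query(index, low, high):
    """Return entries with low <= value <= high by seeking, then scanning a slice."""
    start = bisect.bisect_left(index, (low,))
    end = bisect.bisect_right(index, (high, float("inf")))
    return index[start:end]


small = build_index(1_000)
large = build_index(100_000)

# The same query reads the same number of entries from either index.
print(len(index_range_query(small, 10, 19)), len(index_range_query(large, 10, 19)))
```

After the seek, the work done is proportional to the number of matching entries, so the query over the larger index reads no more rows than the query over the smaller one. A filter that had to examine every entity would instead grow with the size of the data.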

Google Storage Infrastructure

Colossus (GFSv2)
Next-generation clustered file system, successor to GFS
Exabyte scale global storage system
Automatically sharded metadata layer
Data blocks for a given stripe replicated to multiple different fault domains
  o Different disks, servers, racks
Blocks distributed across entire cluster
  o Easy to load-balance reads
  o Efficient to recover
"You know you have a large storage system when you get paged at 1 AM because you only have a few petabytes of storage left." -- Google Engineer
Source: http://static.googleusercontent.com/external content/untrusted s/facultysummit2010/storage architecture and challenges.pdf

BigTable
Cluster-level structured storage
Distributed multi-dimensional sparse map
  o (row, column, timestamp) -> cell contents
Layered on Colossus for file storage
Automatically splits and rebalances tablets based on size and load
Fault-tolerant within data center
Asynchronous, eventually-consistent replication
"If you look at every NoSQL solution out there, everyone goes back to the Amazon Dynamo paper or the Google BigTable paper" -- Jason Hoffman, Joyent
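
The (row, column, timestamp) -> cell contents mapping can be sketched as a small in-memory structure. This is a toy, single-process illustration of the data model only; the class and method names are assumptions and say nothing about BigTable's actual storage or API.

```python
from collections import defaultdict


class SparseTable:
    """Toy sparse map: (row, column, timestamp) -> cell contents."""

    def __init__(self):
        # (row, column) -> {timestamp: value}; cells never written take no space.
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        eligible = [t for t in versions if t <= timestamp]
        return versions[max(eligible)] if eligible else None


table = SparseTable()
table.put("com.example/index", "contents:html", 1, "<html>v1</html>")
table.put("com.example/index", "contents:html", 5, "<html>v2</html>")
table.put("com.example/index", "anchor:home", 3, "Home")
```

Each cell keeps multiple timestamped versions, and a row only stores the columns that were actually written, which is what makes the map sparse.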

MegaStore
Geo-scale structured database
Layered on BigTable for structured storage
Multi-row transactions across machines
  o Strong ACID consistency within fine-grained partitions ("entity groups")
Eventual consistency across partitions
Synchronous cross-datacenter replication via Paxos
Transparent failover

Parting Thoughts

Thoughts on SQL, NoSQL, NearSQL
"One Size Does Not Fit All"
Everything is a tradeoff
  o Data structures are fundamental to performance and features of any storage system
  o No data structure can optimize for every possible use-case
Polyglot persistence is expected
  o Column stores for analytics
  o Inverted indexes for search
  o Simple key-value stores
  o Scalable, powerful NearSQL systems
We use everything at Google (!)

Thoughts on Scale
Scale Depends On ...
Discipline, not permissiveness
Sharing, not coupling
Architecture, not language or programming environment
Simplicity and elegance, not complexity

Questions?
and ... We are hiring!
https://cloud.google.com/
rshoup@google.com
