Distributed SQL: The Architecture Behind MariaDB Xpand


April 2021 | Whitepaper

TABLE OF CONTENTS

Introduction
Properties
Xpand Architecture
    Slices
    Indexes
    Replicas
    Caching
    Query Path
    Transactions
High Availability
Scalability
Conclusion

INTRODUCTION

Businesses, startups and Fortune 500 companies alike, are facing scalability challenges as most, if not all, customer engagement now takes place online via web, mobile and Internet of Things (IoT) applications. The scalability needed by businesses such as Google and Facebook was once considered unimaginable. Now, it's fast becoming a modern requirement for any business that makes it possible for customers to do things online – whether it's placing a grocery order, watching a movie, playing games, meeting with coworkers or buying stock.

Early on, these scalability challenges were addressed by manually sharding MariaDB and MySQL databases, resulting in custom solutions created for specific applications, services and queries. While sharding adds scalability, it is complex, brittle, difficult to maintain and places limitations on how data can be queried. That may not be a problem for businesses with virtually unlimited engineering resources, but for everyone else it's neither preferred nor practical.

Then there were multimillion-dollar hardware appliances like Oracle Exadata – a simpler alternative if cost was not a factor. And finally, there was the emergence of NoSQL databases like Apache Cassandra. They provided scalability, but at the cost of data integrity, consistency and a standard query language.

All of these approaches were workarounds that required businesses to give up something in order to get scalability, whether it was flexibility, data integrity or millions of dollars. However, there is now a proper solution – distributed SQL. In the past, relational databases were limited to replication for read scaling and scaling up (i.e., moving to bigger and bigger servers) for write and storage scaling – hence the need for sharding, hardware appliances or NoSQL.

Distributed SQL databases are relational databases engineered with a distributed architecture to scale out on commodity hardware, on premises or in the cloud. However, there are no trade-offs. They perform ACID transactions and support standard SQL like standard relational databases, but they can do it at a much greater scale.

MariaDB Xpand is a distributed SQL database built for scalability and performance. It can handle everything from thousands of transactions per second to millions – all while maintaining sub-millisecond latency. In addition, MariaDB Xpand implements the MySQL protocol and supports standard MariaDB and MySQL connectors, making it easy to add scalability to applications running on MariaDB and MySQL without rewriting them.

This white paper describes the architecture of MariaDB Xpand and explains how it scales out while maintaining data integrity and strong consistency with ACID transactions.

PROPERTIES

Logical

In a replicated database, primary/replica or multi-primary, every database instance is itself a complete database capable of running on its own (i.e., a physical database). When an application connects to a database instance within a replicated database deployment, that instance handles all application requests by itself, independent of any other database instance.

In a distributed SQL database, no database instance is a complete database itself. When an application connects to a database instance within a distributed SQL database, that database instance represents all of the database instances combined (i.e., a logical database). As a result, applications can connect to any database instance to read and write data.

Distributed

All of the database instances within a replicated database maintain a full copy of the data and execute queries on their own, independent of any other database instance, whereas each database instance within a distributed SQL database maintains a subset of the data and has other database instances participate in query execution when needed.

The tables within a distributed SQL database are divided into groups of rows, with different groups stored on different database instances. As a result, the rows within a table are spread evenly across all database instances. However, any database instance can read or write rows stored on any other. If the results of a query contain rows stored on multiple database instances, query execution will be expanded to include them.

Transactional

Like replicated databases, distributed SQL databases use ACID transactions, and like queries, transactions can span multiple database instances. If a write modifies rows stored on multiple database instances, those database instances will participate in a single distributed transaction – either all of the rows are modified, or none of them are.

Available

Replicated databases provide high availability with automatic failover, whereby if the primary database instance fails, one of the replicas is promoted to be the new primary. Applications may be unable to write data until the promotion is complete.

Distributed SQL databases provide continuous availability because there is no primary – every database instance is capable of writing data (and does). In order to ensure all data remains available after a database instance has failed, distributed SQL databases store copies of rows on multiple database instances. If a database instance fails, its data remains available through copies of its rows stored on other database instances.

Consistent

While standalone databases provide strong consistency, replicated databases can suffer from replication lag. If asynchronous or semi-synchronous replication is used, at least one of the replicas will have the latest transaction, but the others may not. If synchronous replication is used, writes will be replicated within a transaction, but they may not be applied within the transaction. Regardless of how replication is configured, stale reads are possible, if not likely.

This is not a concern with distributed SQL databases, as writes to rows stored on multiple nodes are performed synchronously within transactions, ensuring strong consistency and preventing stale reads.

Elastic

Adding replicas to replicated databases scales reads, but it's not necessarily easy or fast – and it often can't be done on demand. In many cases, new replicas must be created from backups and caught up via replication before they can be used. In terms of writes and storage, it is necessary to scale up or down by creating a new database instance on a bigger or smaller server – often using the same approach as adding a new replica (i.e., creating a new database instance from a backup).

Distributed SQL databases are intended to be scaled on demand, allowing nodes to be added or removed in production without disrupting applications. When nodes are added or removed, distributed SQL databases automatically rebalance the data in order to maintain an even distribution of data and, in turn, an even distribution of the workload itself. This makes distributed SQL databases particularly effective for applications with volatile workloads or significant but temporary peaks (e.g., e-commerce and Black Friday).

In addition, distributed SQL databases are designed to be scaled out and back. Rather than having to upgrade or downgrade servers, a time-consuming and disruptive process, nodes can be added or removed on demand in order to increase or decrease capacity on the fly – and without incurring any downtime.

XPAND ARCHITECTURE

Slices

Xpand automatically divides tables into groups of rows called slices. The number of slices is equal to the number of nodes by default, with each slice stored on a separate node. When the size of a slice exceeds the defined threshold, 8GB by default, it is split into two smaller slices.

In the following example, the books table (tbl_books) in a three-node cluster has three slices by default, with each slice stored on a separate node.

Figure 1: The tbl_books table divided into three slices across three nodes

Rows are mapped to slices using a distribution key. By default, it is the first column of the primary key. Optionally, it can be the first n columns of the primary key if it is a composite key, with n ranging from one to the total number of columns in the primary key.

The distribution key is hashed into a 64-bit number using a consistent hashing algorithm, with each slice assigned a range (i.e., min/max). In order to determine which slice a row is stored in, the hash of its distribution key is compared against the range of each slice.

The example below shows min/max values for three slices and, for simplicity, assumes the maximum value of an unsigned short:

- If the distribution key of a row is hashed to 100, it would be stored in Slice 1
- If the distribution key of a row is hashed to 22000, it would be stored in Slice 2
- If the distribution key of a row is hashed to 50000, it would be stored in Slice 3

Slice | Min hash | Max hash
------|----------|---------
1     | 0        | 21845
2     | 21846    | 43690
3     | 43691    | 65535

Continuing the example above, different rows within the books table are stored in separate slices based on the hash of their primary key (id).

tbl_books, slice 1

id | name                      | author            | type
1  | Against a Dark Background | Iain M. Banks     | Hardcover
5  | Salvation                 | Peter F. Hamilton | Hardcover
9  | Barbary Station           | R. E. Stearns     | Hardcover

tbl_books, slice 2

id | name                    | author            | type
2  | The Peripheral          | William Gibson    | Hardcover
3  | The Abyss Beyond Dreams | Peter F. Hamilton | Paperback
7  | On the Steel Breeze     | Alastair Reynolds | Paperback

tbl_books, slice 3

id | name                    | author            | type
4  | The Reality Dysfunction | Peter F. Hamilton | Hardcover
6  | The Three-Body Problem  | Cixin Liu         | Hardcover
8  | Shadow Captain          | Alastair Reynolds | Hardcover
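To make the mapping concrete, here is a minimal Python sketch of the slice lookup described above. It assumes the 16-bit hash space of the example; Xpand actually uses a 64-bit space and its own internal hash function, so `hash16`, `SLICES` and `slice_for_hash` are illustrative stand-ins, not Xpand APIs.

```python
import hashlib

# Slice ranges from the example table above: (slice, min hash, max hash).
SLICES = [(1, 0, 21845), (2, 21846, 43690), (3, 43691, 65535)]

def hash16(key) -> int:
    """Stand-in for Xpand's internal consistent hash. Real Xpand hashes the
    distribution key into a 64-bit space; this maps into the 16-bit space
    (unsigned short) the example assumes."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:2], "big")          # 0..65535

def slice_for_hash(h: int) -> int:
    """Return the slice whose [min, max] range contains the hash."""
    for slice_id, lo, hi in SLICES:
        if lo <= h <= hi:
            return slice_id
    raise ValueError("hash outside slice ranges")

# The three cases from the text:
for h in (100, 22000, 50000):
    print(f"hash {h} -> slice {slice_for_hash(h)}")   # slices 1, 2, 3

# A row's slice is then slice_for_hash(hash16(distribution_key)).
```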

Indexes

Unlike other distributed SQL databases, Xpand automatically divides secondary indexes into slices as well – and like tables, the default number of slices is equal to the number of nodes, with each slice stored on a separate node.

In the following example, a secondary index on the author column is created for the books table (idx_author). It is divided into three slices by default, with each slice stored on a separate node.

Figure 2: The idx_author secondary index divided into three slices across three nodes

Like table rows, index entries are mapped to slices using a distribution key and consistent hashing. By default, the distribution key is the first column of the index. Optionally, it can be the first n columns of a composite index, with n ranging from one to the total number of columns in the index.

Continuing the example above, different entries within the index are stored in separate slices based on the hash of their column(s).

idx_author, slice 1

Key (column: author) | Value (PK, column: id)
Iain M. Banks        | 1
Alastair Reynolds    | 7, 8

idx_author, slice 2

Key (column: author) | Value (PK, column: id)
William Gibson       | 2
R. E. Stearns        | 9

idx_author, slice 3

Key (column: author) | Value (PK, column: id)
Peter F. Hamilton    | 3, 4, 5
Cixin Liu            | 6
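The two-level structure this implies – an index slice mapping indexed values to primary keys, which are then used to locate table slices – can be sketched as below. The dictionaries simply mirror the idx_author example above; the names are illustrative, not Xpand internals.

```python
# Index slices mirror the idx_author example: indexed value -> primary keys.
idx_author = {
    1: {"Iain M. Banks": [1], "Alastair Reynolds": [7, 8]},
    2: {"William Gibson": [2], "R. E. Stearns": [9]},
    3: {"Peter F. Hamilton": [3, 4, 5], "Cixin Liu": [6]},
}

def lookup_pks(slice_id: int, author: str) -> list[int]:
    """Resolve an indexed value to primary keys within its index slice."""
    return idx_author[slice_id].get(author, [])

# An index lookup yields primary keys; the rows themselves still have to be
# fetched from whichever table slices those keys hash to.
print(lookup_pks(1, "Alastair Reynolds"))   # [7, 8]
```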

Replicas

Xpand can store multiple copies of table and index slices. They are referred to as replicas. By default, slices have two replicas. However, a slice can have as few as one replica or as many replicas as there are nodes (i.e., one replica per node). Further, different tables can have different numbers of replicas, as can tables and their secondary indexes.

In the following example, there are two replicas of the books table and author index slices.

Figure 3: Two replicas of each tbl_books and idx_author slice distributed across three nodes

Caching

Xpand, like any other database, caches as much data as possible in memory. However, unlike many others, it uses a distributed cache – effectively combining the memory of multiple nodes to create a single cache with no duplicate entries. When slices have multiple replicas, one of them is designated the ranking replica. This replica is used for reads, while the others are there for high availability. And because nodes only cache their ranking replicas (of which there is only one per slice), there are no duplicates – maximizing the amount of data the entire cluster can cache.

In the following example, ranking replicas are cached in memory while all replicas, including ranking replicas, are stored on disk. If needed, simply add nodes to ensure all ranking replicas can be stored in memory for low-latency reads.

Figure 4: Ranking replicas cached in memory; all replicas stored on disk
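A small sketch of the no-duplicates property, assuming the illustrative three-slice placement used throughout this paper: because each node caches only the slices for which it holds the ranking replica, every slice appears in exactly one node's cache.

```python
# slice -> list of (node, is_ranking) placements; illustrative only.
replicas = {
    1: [(1, True), (2, False)],
    2: [(2, True), (3, False)],
    3: [(3, True), (1, False)],
}

def cached_slices(node: int) -> list[int]:
    """Slices this node keeps in memory: only its ranking replicas."""
    return [s for s, copies in replicas.items() if (node, True) in copies]

for n in (1, 2, 3):
    print(f"node {n} caches slices {cached_slices(n)}")
# Every slice appears in exactly one node's cache - no duplicate entries.
```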

Query path

Queries can be routed to any node within a cluster because every node is aware of how the data is distributed, can determine which nodes need to participate in a query and can forward the query (or part of it) to them.

Reads

Primary key

If a table is queried by its primary key, the Xpand node that receives the query hashes the primary key and determines which node has the table slice the row is stored in. The receiving node then forwards the query to that node and returns the results.

Example: Query a table using a primary key

SELECT * FROM tbl_books WHERE id = 1;

MariaDB MaxScale is an advanced database proxy. It abstracts away the database infrastructure, making it look like a standalone database regardless of whether there are 3 nodes or 100. In the example below, MaxScale routes the query to Node 2. Node 2 then determines the row with id = 1 is stored in Slice 3 based on its hash value (60000), with replicas of Slice 3 stored on Node 1 and Node 3.

hash(1) = 60000 → Slice 3 → Node 1, Node 3

tbl_books slices (rr = ranking replica)

Min hash | Max hash | Slice | Nodes
0        | 21845    | 1     | 1 (rr), 2
21846    | 43690    | 2     | 2 (rr), 3
43691    | 65535    | 3     | 3 (rr), 1

Node 2 will then forward the query to Node 3, because it has the ranking replica, and return the results to MaxScale (which returns the results to the client).

Figure 5: A primary key read routed by MaxScale to Node 2 and forwarded to Node 3

Secondary indexes

If a table is queried using a secondary index, the Xpand node that receives the query hashes the index key and determines which node has the index slice the key is stored in.

Example: Query a table using a secondary index

SELECT * FROM tbl_books WHERE author = 'Alastair Reynolds';

MaxScale forwards the query to Node 2. Node 2 then determines the index key 'Alastair Reynolds' is stored in Slice 1 based on its hash value (5000), with replicas of Slice 1 stored on Node 1 and Node 2.

hash('Alastair Reynolds') = 5000 → Slice 1 → Node 1, Node 2

idx_author slices

Min hash | Max hash | Slice | Nodes
0        | 21845    | 1     | 1 (rr), 2
21846    | 43690    | 2     | 2 (rr), 3
43691    | 65535    | 3     | 3 (rr), 1

Node 2 then looks up the primary keys of matching rows by checking Slice 1 on Node 1, because it is the ranking replica.

idx_author

Key (column: author) | Value (PK, column: id)
Alastair Reynolds    | 7, 8

Finally, Node 2 determines rows with id = 7 and id = 8 are stored in Slice 2 and Slice 3 based on their hashes, and then forwards the query to Node 2 (itself) and Node 3 because they have the ranking replicas of Slice 2 and Slice 3.

Figure 6: A secondary index read resolved via the index slice on Node 1, then forwarded to Nodes 2 and 3
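Putting the read path together, the following Python sketch shows the routing decision any receiving node can make: hash the key, find the owning slice from the range table, and forward to the node holding that slice's ranking replica. The ranges, placements and hash values are taken from the examples above; the function names are illustrative.

```python
# Slice hash ranges and ranking-replica placement from the tables above.
SLICE_RANGES = [(1, 0, 21845), (2, 21846, 43690), (3, 43691, 65535)]
RANKING_NODE = {1: 1, 2: 2, 3: 3}   # slice -> node holding the ranking replica

def route_read(key_hash: int) -> int:
    """Return the node a read for this key hash is forwarded to."""
    for slice_id, lo, hi in SLICE_RANGES:
        if lo <= key_hash <= hi:
            return RANKING_NODE[slice_id]
    raise ValueError("hash outside slice ranges")

print(route_read(60000))   # hash(1) -> slice 3 -> node 3 (primary key read)
print(route_read(5000))    # hash('Alastair Reynolds') -> slice 1 -> node 1
```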

Writes

Xpand performs writes spanning multiple nodes within distributed transactions, both implicit and explicit. In addition, within these transactions, all replicas are written in parallel in order to maximize performance and provide strong consistency.

In the same way reads use consistent hashing to determine which node(s) queries using primary keys or secondary indexes should be forwarded to, so do writes.

Example: Update a table using a primary key

UPDATE tbl_books SET num_pages = 435 WHERE id = 9;

MaxScale forwards the query to Node 2. Node 2 then determines the row with id = 9 is stored in Slice 3 based on its hash value (64000), with replicas of Slice 3 stored on Node 1 and Node 3.

hash(9) = 64000 → Slice 3 → Node 1, Node 3

tbl_books slices

Min hash | Max hash | Slice | Nodes
0        | 21845    | 1     | 1 (rr), 2
21846    | 43690    | 2     | 2 (rr), 3
43691    | 65535    | 3     | 3 (rr), 1

Node 2 then forwards the query to all nodes with replicas of Slice 3, updating the row on Node 1 and Node 3 at the same time.

Figure 7: A write forwarded to both replicas of Slice 3 in parallel
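The parallel-replica behavior can be sketched as below: the write is sent to every node holding a replica of the target slice at once, and succeeds only if all of them acknowledge. `write_to_node` is a hypothetical placeholder for the per-node work (locking and MVCC versioning, covered next); this is a conceptual sketch, not Xpand's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

REPLICA_NODES = {3: [1, 3]}          # slice 3 is stored on nodes 1 and 3

def write_to_node(node: int, row_id: int, values: dict) -> bool:
    # Hypothetical per-node write: a real node would take a write lock
    # and create a new MVCC version (see the Transactions section).
    print(f"node {node}: update row {row_id} with {values}")
    return True

def write_row(slice_id: int, row_id: int, values: dict) -> bool:
    """Send the write to every replica of the slice in parallel."""
    nodes = REPLICA_NODES[slice_id]
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        acks = list(pool.map(lambda n: write_to_node(n, row_id, values), nodes))
    return all(acks)                 # succeed only if every replica acked

write_row(slice_id=3, row_id=9, values={"num_pages": 435})
```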

Transactions

Xpand implements distributed transactions with snapshot isolation using a combination of three-phase commit (3PC), consensus via Paxos, two-phase locking (2PL) and multi-version concurrency control (MVCC).

When a node receives a transaction, it becomes the transaction coordinator for it. The transaction is assigned a transaction id (xid), and each subsequent statement within the transaction is assigned an invocation id (iid). When rows are modified, write locks are obtained for them and new MVCC entries are created. The write locks do not block reads, only other writes. If a write fails because it cannot obtain a write lock, the transaction will be rolled back.

To commit the transaction, the transaction coordinator initiates the PREPARE phase. Next, during the ACCEPT phase, it selects three random nodes to persist the transaction state and waits for a consensus using Paxos – ensuring the transaction can be completed even if the transaction coordinator fails. Finally, during the COMMIT phase, all participating nodes persist their MVCC entries and remove any write locks.

In terms of consistency, as mentioned previously, Xpand defaults to snapshot isolation. In ANSI SQL terms, it is similar to REPEATABLE READ but does not exhibit phantom reads, because all reads within a transaction use the last data committed prior to the start of the transaction.
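As a rough illustration of the commit sequence just described, the toy walkthrough below prints the three phases in order. Node selection and Paxos are reduced to stubs; this shows only who participates in each phase, not a real consensus implementation.

```python
import random

def commit(xid: int, participants: list[int], all_nodes: list[int]) -> None:
    """Print the three commit phases for a distributed transaction."""
    # PREPARE: the coordinator asks every participating node to get
    # ready to commit the transaction's writes.
    print(f"xid {xid}: PREPARE -> nodes {participants}")

    # ACCEPT: the transaction state is persisted on three random nodes,
    # which reach consensus via Paxos, so the commit can complete even
    # if the coordinator fails.
    acceptors = random.sample(all_nodes, k=min(3, len(all_nodes)))
    print(f"xid {xid}: ACCEPT  -> state persisted on nodes {acceptors}")

    # COMMIT: participants persist their MVCC entries and release locks.
    print(f"xid {xid}: COMMIT  -> nodes {participants} persist MVCC entries, "
          "remove write locks")

commit(xid=42, participants=[1, 3], all_nodes=[1, 2, 3, 4, 5])
```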

High availability

Node failure

If a node fails, the cluster will automatically detect it and perform a group change. After the group change, the rebalancer will automatically replace the lost ranking replicas on the failed node by promoting replicas on the remaining nodes to be ranking replicas.

In addition, the rebalancer will replace any lost replicas on the failed node by using the remaining ranking replicas to create new ones.

Example #1: Promoting replicas upon node failure

Node 3 has the ranking replica for Slice 3. If Node 3 fails, the replica of Slice 3 on Node 1 will be promoted to ranking replica.

Figure 8: The replica of Slice 3 on Node 1 promoted to ranking replica after Node 3 fails

Example #2: Recreating lost replicas upon node failure

Node 3 also had a replica of Slice 2. While the ranking replica remains on Node 2, there is only one replica of Slice 2 now. The rebalancer will automatically use it to create a new replica on Node 1, thereby restoring full fault tolerance for Slice 2.

In addition, while the replica of Slice 3 on Node 1 has been promoted to ranking replica, there is only one replica of Slice 3 now. The rebalancer will automatically use it to create another on Node 2, thereby restoring full fault tolerance for Slice 3 as well.

When the rebalancing is complete, there will be two replicas of all slices, with every node storing the same amount of data.

Figure 9: New replicas of Slices 2 and 3 created on the surviving nodes
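The promotion and re-creation steps from the two examples can be condensed into one sketch. The placement dictionary mirrors Figures 8 and 9; the target-selection logic (any surviving node without a copy) is a simplification of whatever policy the rebalancer actually applies.

```python
# slice -> {node: role}, where "rr" is the ranking replica; mirrors Figure 8.
replicas = {1: {1: "rr", 2: "r"}, 2: {2: "rr", 3: "r"}, 3: {3: "rr", 1: "r"}}
nodes = {1, 2, 3}

def handle_node_failure(failed: int) -> None:
    nodes.discard(failed)
    for slice_id, copies in replicas.items():
        copies.pop(failed, None)                      # drop lost replicas
        if copies and "rr" not in copies.values():    # ranking replica lost
            survivor = next(iter(copies))
            copies[survivor] = "rr"                   # promote a survivor
        if len(copies) == 1:                          # redundancy lost
            target = next(n for n in nodes if n not in copies)
            copies[target] = "r"                      # re-create from survivor

handle_node_failure(3)
print(replicas)
# {1: {1: 'rr', 2: 'r'}, 2: {2: 'rr', 1: 'r'}, 3: {1: 'rr', 2: 'r'}}
```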

Zone failure

Xpand can be deployed across multiple zones in the cloud, or multiple racks on premises. When it is, it will automatically store replicas on nodes in separate zones in order to maintain availability even if an entire zone fails. The default number of replicas, 2, ensures each replica will be stored in a different zone. If the number of replicas is set to 3 and there are three zones, Xpand will ensure there is a replica in every zone.

Example #1: Replicas in different zones

If six nodes are deployed across three zones, there should be two nodes per zone. By default, there will be six slices with one slice per node. However, instead of a slice's replicas merely being on two different nodes, they will be in two different zones.

Figure 10: Six slices across three zones, with each slice's replicas in two different zones

Example #2: Re-creating lost replicas upon zone failure

If Zone A fails, Xpand will automatically promote replicas in Zone B to replace the ranking replicas lost when Zone A failed (P1 on Node 1 and P2 on Node 2). It will then use them to create additional replicas in Zone C in order to restore fault tolerance. Finally, it will use the ranking replicas in Zone C (P5 on Node 5 and P6 on Node 6) to create additional replicas in Zone B in order to restore fault tolerance for them as well. When the rebalancing is finished, there will be two replicas of every slice, and they will be evenly distributed across the remaining zones.

Figure 11: Replicas promoted and re-created across Zones B and C after Zone A fails
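A sketch of zone-aware placement under the six-node, three-zone topology of the example: with two replicas per slice, the two copies are always put in different zones, so losing an entire zone can never lose both. The round-robin placement rule here is illustrative only, not Xpand's actual assignment algorithm.

```python
# Six nodes across three zones, as in the example topology.
ZONES = {"A": [1, 2], "B": [3, 4], "C": [5, 6]}

def place_replicas(slice_id: int) -> list[tuple[str, int]]:
    """Pick two (zone, node) locations in distinct zones for one slice."""
    zone_names = list(ZONES)                       # ["A", "B", "C"]
    z1 = zone_names[slice_id % 3]
    z2 = zone_names[(slice_id + 1) % 3]            # always a different zone
    return [(z1, ZONES[z1][slice_id % 2]), (z2, ZONES[z2][slice_id % 2])]

for s in range(1, 7):
    print(f"slice {s}: replicas at {place_replicas(s)}")
# No slice ever has both of its replicas inside the same zone.
```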

Scalability

Xpand scales out with near-linear performance. It allows nodes to be added or removed at any time, and it continuously monitors node resource utilization and cluster workload distribution in order to optimize performance.

This is all handled by the rebalancer, a critical component of Xpand responsible for data and workload distribution. In addition to promoting replicas and recreating lost replicas when a node and/or zone fails, the rebalancer automatically makes changes to scale and maximize performance.

If the size of a slice grows beyond its threshold, the rebalancer will automatically split the slice into two. If a node grows to store more data than the others, the rebalancer will automatically move one or more of its slices to other nodes. In addition, as shown below, the rebalancer ensures an even distribution of data when nodes are added or removed, as well as an even distribution of the workload.

Example #1: Adding a node

When nodes are added to a cluster, the rebalancer automatically moves one or more slices from existing nodes to the new nodes in order to ensure an even distribution of data. If two nodes (Node 4 and Node 5) are added to the three-node cluster below, it will automatically move Slice 4 on Node 1 to Node 4 and Slice 5 on Node 2 to Node 5. After the rebalancing is complete, each of the five nodes will have one slice.

Figure 12: Slices 4 and 5 moved to the newly added Nodes 4 and 5

While not shown above, the same would be true if there were multiple replicas of slices. If there were two, every node would have two slices after the rebalancing is complete, with one of them being a ranking replica for reads and the other a replica for high availability.
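The data-rebalancing step of Example #1 can be sketched as a simple loop that moves slices from the fullest node to the emptiest until the distribution is as even as possible. The starting placement matches the figure; the loop is a simplification of the rebalancer's actual policy.

```python
# node -> slices stored there, matching the figure's starting layout.
placement = {1: [1, 4], 2: [2, 5], 3: [3]}

def rebalance(placement: dict[int, list[int]], new_nodes: list[int]) -> None:
    """Move slices from the fullest node to the emptiest until even."""
    for n in new_nodes:
        placement.setdefault(n, [])                 # new nodes start empty
    while True:
        fullest = max(placement, key=lambda n: len(placement[n]))
        emptiest = min(placement, key=lambda n: len(placement[n]))
        if len(placement[fullest]) - len(placement[emptiest]) <= 1:
            break                                   # as even as it gets
        placement[emptiest].append(placement[fullest].pop())

rebalance(placement, new_nodes=[4, 5])
print(placement)   # {1: [1], 2: [2], 3: [3], 4: [4], 5: [5]}
```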

Example #2: Workload management

It's possible for a slice, and thus a node, to end up doing much more work than the others. In fact, it's possible for a node to end up with multiple slices whose data is being accessed more frequently than those on other nodes. These types of hotspots can lead to inefficient resource utilization and suboptimal performance.

Xpand detects workload hotspots by continuously monitoring the resource utilization of each node. If one is detected, Xpand can automatically remove it by redirecting reads to other nodes – without having to rebalance the data. There is a ranking replica for every slice, and it is the one used for reads. If a node is overutilized due to a busy ranking replica, Xpand can promote another replica of that slice on a different node to ranking replica, effectively rerouting reads away from the overutilized node.

Ideally, each node in the three-node cluster below would handle about 33% of the workload. But despite an even distribution of data, Node 1 is handling 50% of the workload while Node 2 is handling only 10%. Xpand can simply change which replica is the ranking replica for one or more slices in order to even out the workload distribution.

Figure 13: Replicas reranked to even out the workload distribution

Slice 1 on Node 2 is promoted to ranking replica in order to transfer some of the workload from Node 1, which is overutilized, to Node 2, which is underutilized. In addition, Slice 2 on Node 3 and Slice 6 on Node 1 are both promoted to ranking replicas. After the replicas have been reranked, every node has two ranking replicas, and the workload is distributed 30/35/35 – a far more even distribution of the workload.

        | Node 1          | Node 2          | Node 3
        | Before | After  | Before | After  | Before | After
Slice 1 | 30%    | 0%     | 0%     | 30%    | N/A    | N/A
Slice 2 | N/A    | N/A    | 5%     | 0%     | 0%     | 5%
Slice 3 | 0%     | 0%     | N/A    | N/A    | 30%    | 30%
Slice 4 | 20%    | 20%    | 0%     | 0%     | N/A    | N/A
Slice 5 | N/A    | N/A    | 5%     | 5%     | 0%     | 0%
Slice 6 | 0%     | 10%    | N/A    | N/A    | 10%    | 0%
Total   | 50%    | 30%    | 10%    | 35%    | 40%    | 35%
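Finally, a sketch of reranking with the numbers from the table: promoting a different replica to ranking shifts that slice's read share to the new node, turning the 50/10/40 split into 30/35/35 without moving any data. The load figures come straight from the Before columns; everything else is illustrative.

```python
# Read share each slice attracts (taken from the Before columns above).
load = {1: 30, 2: 5, 3: 30, 4: 20, 5: 5, 6: 10}

# Current ranking-replica node per slice (before: 50/10/40 across nodes 1-3).
ranking = {1: 1, 2: 2, 3: 3, 4: 1, 5: 2, 6: 3}

# Nodes holding a replica of each slice; only these can be promoted.
replicas = {1: [1, 2], 2: [2, 3], 3: [3, 1], 4: [1, 2], 5: [2, 3], 6: [3, 1]}

def node_load(n: int) -> int:
    """Total read share of the slices this node currently ranks."""
    return sum(load[s] for s, rr in ranking.items() if rr == n)

# The three promotions described in the text:
for slice_id, new_rr in [(1, 2), (2, 3), (6, 1)]:
    assert new_rr in replicas[slice_id]   # promote an existing replica only
    ranking[slice_id] = new_rr

print({n: node_load(n) for n in (1, 2, 3)})   # {1: 30, 2: 35, 3: 35}
```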

CONCLUSION

Distributed SQL removes the last remaining limitation of standard relational databases: scalability. Distributed SQL databases retain standard SQL, ACID transactions and strong consistency – all the characteristics businesses have long relied upon to support mission-critical applications – while adding the scalability necessary to support those whose throughput and latency requirements exceed what standard relational databases are capable of.

With the introduction of distributed SQL, businesses no longer have to turn to NoSQL databases such as Apache Cassandra, giving up data integrity, easy data modeling and powerful querying in order to scale further. Nor do they have to spend millions of dollars on a hardware appliance such as Oracle Exadata, or get stuck building and maintaining their own in-house sharding solution – both diverting budget and engineering away from application development.

There are few distributed SQL databases available today because turning a standard relational database into a distributed database is a hard problem, and most of them are relatively new. However, Xpand is proven, robust and mature. It has been used in production to power mission-critical applications for years, including Samsung Cloud and the several hundred million Samsung phones connected to it.

Xpand implements the MySQL protocol and is compatible with standard MariaDB and MySQL connectors, making it easy to migrate MySQL applications to Xpand. In addition, Xpand can be used as a storage engine with MariaDB, providing DBAs with an easy way to add unlimited scalability to existing MariaDB deployments.

Get started today

Get an Xpand cluster up and running in minutes.

Free trial

Download the 45-day free trial and deploy an Xpand cluster with unlimited nodes and cores.
https://mariadb.com/downloads/#mariadb_platform-mariadb_xpand

SkySQL

Sign up for SkySQL and get a $500 credit to deploy a distributed SQL database within minutes to AWS or GCP, using any instance types, and with up to 18 …
