Goal Of The Presentation Is To Give An Introduction Of NoSQL Databases .


Upload by : Warren Adams


The goal of this presentation is to introduce NoSQL databases and explain why they exist. We present the "Why?" first, to explain the need for something like NoSQL, and then go into detail in the "What?". Many NoSQL databases are available, so we have chosen a few that are widely used in industry. We think it is important to be aware of these databases and to have a basic understanding of why they exist and how they differ.

To justify their usage, let's look at the trends of recent years.

1. Each year more and more data is created. Over two years we create more digital data than all the data created in history before that!
2. The rigidly defined, schema-based approach used by relational databases makes it difficult to quickly incorporate new types of data.
3. RDBMSs are really good at transactions, which have been perfected over the years, but a huge amount of today's data doesn't require transactional properties.
4. NoSQL provides a data model that maps better to these needs.

1. Data now has much more complex relations. It has evolved from hypertext, RSS, and blogs (which have backlinks) to highly complex social graphs.
2. It is no longer efficient to represent such data in strict tables. We need different data models, such as graph databases.

1. Relational databases are fundamentally centralized. In a 3-tier system, the database is the scale-up component.
2. To scale the application, you add more web servers.
3. To support more concurrent users and/or store more data, you need a bigger and bigger server with more CPUs, more memory, and more disk storage to keep all the tables.
4. Maintaining this single server becomes a headache in terms of both manpower and cost.

Now we are moving towards distributed databases. We'll talk more about ACID properties later. Relational databases aim for consistency. In a distributed environment we need to make a choice because of the CAP theorem.

A survey done by couchbase.com shows that the major reasons for choosing NoSQL databases are flexibility and scalability.

With lots of traffic, you either buy bigger boxes or use a lot of small boxes. SQL was designed to run on a single box.
SQL databases are very reliable and mature technologies. People have tried to extend their scope by adapting SQL databases to the new trends that we saw.
Distributed caching: offload reads to an in-memory cache, using memcached over SQL servers. This is highly common; a lot of big companies use it.
Example: Zynga, with roughly 600 memcached databases over 400 SQL databases. That is a massive amount of software, and it is difficult to manage.

A lot of vendors have tried to extend the scope, but what's evident is that one solution is not enough.

We will spend a minute or two on the ACID slides, basically a very quick review.

Single machines: partition tolerance is irrelevant, so consistency and availability can both be achieved on a single machine.
Consistency: you can read from or write to any node and get the same data.

We will not spend much time on this, since another group is presenting CAP in quite some detail. The only thing to take from this slide is that all three properties cannot be achieved at the same time.

An illustration of where most NoSQL and relational databases lie on the CAP spectrum. It is interesting to see that the databases following the CA model are primarily relational databases; this is because they are not built for partitioning and a distributed structure. NoSQL databases follow either the CP model or the AP model. We will discuss one database from each as a case study.

Not just SQL

1. A paradigm shift from the traditional data model. SQL databases enforce a strict schema, whereas NoSQL databases have a weak notion of schema. At the core, all NoSQL databases are key/value systems; the difference is whether the database understands the value or not. Different types of NoSQL databases have different properties. We'll see four major data models in a minute.
2. As we are moving towards distributed databases and not all data is transactional, we need a separate set of guarantees.

1. Key/value stores don't understand the data in the value. To query a key/value database, you must have the key.
2. Redis is a very popular database that supports special data structures as values. It can perform common operations on the stored data.
3. Another database that deserves a mention here is Membase. It is memory-first: data is served from an in-memory cache that is filled from disk, and nodes can be added or removed on the fly.
So you have datastores with different features, such as in-memory only, persistent, or with support for data structures. This shows the amount of diversity in NoSQL databases.

1. Key/value stores don't understand the data in the value. To query a key/value database, you must have the key.
2. Redis is a very popular database that supports special data structures as values. It can perform common operations on the stored data.
So you have datastores with different features, such as in-memory only, persistent, or with support for data structures. This shows the amount of diversity in NoSQL databases.
3. Amazon's Dynamo is also one of them; we will discuss it in detail as a case study.

Instead of an opaque value, the database takes in a document, which is semi-structured data. Some use JSON, some XML, and others BSON.
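The difference between a key/value store and a document store can be sketched in a few lines. This is a toy in-memory illustration, not the API of any real database; the class names `KeyValueStore` and `DocumentStore` and the `find` method are hypothetical.

```python
import json


class KeyValueStore:
    """Toy key/value store: values are opaque bytes, lookups need the key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class DocumentStore(KeyValueStore):
    """Toy document store: values are JSON documents the store can inspect."""

    def put_doc(self, key, doc: dict):
        self.put(key, json.dumps(doc).encode("utf-8"))

    def find(self, field, expected):
        # Because the store understands the value, it can query by field.
        matches = []
        for key, raw in self._data.items():
            doc = json.loads(raw)
            if doc.get(field) == expected:
                matches.append(key)
        return sorted(matches)


kv = KeyValueStore()
kv.put("user:1", b'{"name": "Ada", "city": "London"}')
# The only way to reach the data is through the key.
raw_value = kv.get("user:1")

docs = DocumentStore()
docs.put_doc("user:1", {"name": "Ada", "city": "London"})
docs.put_doc("user:2", {"name": "Alan", "city": "London"})
docs.put_doc("user:3", {"name": "Grace", "city": "New York"})
london_users = docs.find("city", "London")  # keys of matching documents
```

The key/value store can only hand back the bytes stored under a key, while the document store can answer a question about the contents of the value.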

1. BSON is a binary version of JSON objects. It gives higher performance on the wire and compact storage.
2. In Couchbase you need to materialize views to make ad-hoc queries: declare what your indexes will be, and then you can query. MongoDB doesn't require indexes to be declared in advance in order to query.
Ad-hoc queries are queries that are created on the fly with variable parameters.

The concept is still the same: key/value.
The notion of column families: instead of writing the whole document at a single physical location, the document is now written split across these column families.
Say a document has 10 columns or 10 attributes: you could write subsets of columns at particular locations so that queries on those columns are answered faster. This works well for a predefined schema, for example HP Vertica.
Cassandra is a little different from this type of storage. Cassandra writes these to different column family objects, which by themselves are column-dependent stores. This is driven not by the schema but by the queries that are expected to be answered.

BigTable coined the column-oriented structure. Joins as in relational databases are not supported. Usually a keyspace contains several column family objects, each supporting one or more queries. To achieve the effect of joins, some degree of denormalization is necessary.

1. HBase runs only on top of HDFS, while Cassandra can run on various file systems.
2. Both are modeled on BigTable.
3. HBase is CP: of the three CAP properties, it provides consistency and partition tolerance.
4. Cassandra is AP: of the three CAP properties, it provides availability and partition tolerance.
Cassandra supports reads and writes during a network partition and patches things up later, resulting in eventual consistency, whereas Couchbase prevents writes during a network partition, thus maintaining consistency at all times.

The concept is still the same: key/value.

When performing a write transaction on a slave, each write operation is synchronized with the master (locks are acquired on both master and slave). When the transaction commits, it is first committed on the master and then, if successful, on the slave. To ensure consistency, a slave has to be up to date with the master before performing a write operation.

Couchbase = Membase (front-end layer, for high availability) + CouchDB (deeper back end that provides query functionality).
BDB can be set up as a persistent database, depending on the configuration. It is mostly used as an embedded database.
Compared to Membase, BDB has much lower concurrency rates, supporting concurrency only in the lower tens.
Also, Membase is memcached-cluster compatible, whereas there is no implemented notion of a BDB cluster.

To address the above problems, a lot of big companies developed their own in-house solutions: non-relational, cluster-friendly, open-source.

Structured, because data is stored in an indexed map.
3-dimensional structure, because it is just a large map that is indexed by a row key, a column key, and a timestamp, which act as the dimensions. This will be clearer in the next slide.
Uninterpreted, because each value within the map is just an array of bytes that is eventually interpreted by the application.
Consistency over availability: BigTable will preserve the guarantees of its atomic reads and writes by refusing to respond to some requests. It may decide to shut down entirely (like the clients of a single-node data store), refuse writes (like Two-Phase Commit), or only respond to reads and writes for pieces of data whose "master" node is inside the partition component (like Membase). It responds only after having a quorum of locks [Paxos], which is managed by Chubby. [not in current scope]

Sparse: The table is sparse, meaning that different rows in a table may use different columns, with many of the columns empty for a particular row.
Distributed: BigTable's data is distributed among many independent machines. At Google, BigTable is built on top of GFS (Google File System). The Apache open source version of BigTable, HBase, is built on top of HDFS (Hadoop Distributed File System) or Amazon S3. The table is broken up among rows, with groups of adjacent rows managed by a server. A row itself is never distributed.
Scalable: Without changing applications, more and more nodes can be added to the network to scale the cluster.
Sorted: In most associative arrays, a key is hashed to a position in a table. BigTable instead sorts its data by keys. This helps keep related data close together, usually on the same machine, assuming that one structures keys in such a way that sorting brings the data together. For example, if domain names are used as keys in a BigTable, it makes sense to store them in reverse order (com.example.www rather than www.example.com) to ensure that related domains are close together.
Map: A map is an associative array, a data structure that allows one to quickly look up the value corresponding to a key. BigTable is a collection of (key, value) pairs where the key identifies a row and the value is the set of columns.
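The effect of reversing domain names before using them as sorted row keys can be seen in a short sketch. The domain names and the helper `reverse_domain` are made up for illustration.

```python
def reverse_domain(domain: str) -> str:
    """Turn 'www.example.com' into 'com.example.www'."""
    return ".".join(reversed(domain.split(".")))


domains = [
    "mail.example.com",
    "maps.other.org",
    "www.example.com",
    "www.other.org",
]

# Sorting the plain names interleaves the two sites.
plain_order = sorted(domains)

# Sorting the reversed names keeps each site's rows adjacent.
reversed_order = sorted(reverse_domain(d) for d in domains)
```

With plain names, `mail.example.com` and `www.example.com` are separated by a row from another site; with reversed names, all `com.example.*` keys sort next to each other.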

A table is indexed by rows. Each row contains one or more named column families. Column families are defined when the table is first created. Within a column family, one may have one or more named columns. All data within a column family is usually of the same type. The implementation of BigTable usually compresses all the columns within a column family together. Columns within a column family can be created on the fly. Rows, column families, and columns provide a three-level naming hierarchy for identifying data. To get data from BigTable, you need to provide a fully qualified name in the form column-family:column.
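The data model described in these notes (a map indexed by row key, column-family:column name, and timestamp, holding uninterpreted bytes) can be sketched as nested dictionaries. This is only a toy model under that description; `ToyTable` and its methods are hypothetical names, not BigTable's or HBase's client API.

```python
from collections import defaultdict


class ToyTable:
    """Toy (row, 'family:column', timestamp) -> bytes map."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = set(column_families)
        # row key -> qualified column name -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value: bytes):
        family, _, qualifier = column.partition(":")
        if family not in self.column_families or not qualifier:
            raise KeyError(f"unknown column family or bad name: {column!r}")
        # Columns inside a known family can be created on the fly.
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        versions = self._rows.get(row, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)  # newest version by default
        return versions.get(timestamp)


table = ToyTable(["contents", "anchor"])
table.put("com.example.www", "contents:html", 1, b"<html>v1</html>")
table.put("com.example.www", "contents:html", 2, b"<html>v2</html>")
table.put("com.example.www", "anchor:news.example.org", 1, b"Example")

latest = table.get("com.example.www", "contents:html")
older = table.get("com.example.www", "contents:html", timestamp=1)
```

Rows that never write to a column simply have no entry for it, which is one way to picture the sparseness of the table.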

Chubby is a highly available and persistent distributed lock service that manages leases for resources and stores configuration information.
In BigTable, Chubby is used to ensure there is only one active master, store the bootstrap location of BigTable data, and discover tablet servers.
Locating rows within a BigTable is managed in a three-level hierarchy. The root (top-level) tablet stores the location of all Metadata tablets in a special Metadata tablet. Each Metadata tablet contains the location of user data tablets. This table is keyed by node IDs, and each row identifies a tablet's table ID and end row. For efficiency, the client library caches tablet locations.

Need for Bloom filters: Typically, a read operation has to read from the user tables that make up the state of a tablet. If these are not in memory, we may end up doing many disk accesses. We reduce the number of accesses by allowing clients to specify that Bloom filters should be created for these user tables. A Bloom filter allows us to ask whether a user table might contain any data for a specified row/column pair. Thus, a small amount of tablet server memory used for storing Bloom filters drastically reduces the number of disk seeks required for read operations. Interesting, isn't it!
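A minimal Bloom filter sketch shows why this saves disk seeks: a "no" answer is always correct, so a table whose filter says "no" never has to be read. The sizes, the use of SHA-256, and the names `BloomFilter`, `filters`, and `tables_to_read` are assumptions for illustration and are not taken from BigTable.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item: str):
        # Derive several bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


# One filter per on-disk table, keyed by "row/column" strings.
tables = {
    "table-1": ["row1/colA", "row2/colA"],
    "table-2": ["row9/colB"],
}
filters = {}
for name, keys in tables.items():
    bloom = BloomFilter()
    for key in keys:
        bloom.add(key)
    filters[name] = bloom


def tables_to_read(key: str):
    """Only tables whose filter answers 'maybe' need a disk access."""
    return [name for name, bloom in filters.items() if bloom.might_contain(key)]
```

A filter can occasionally answer "maybe" for a key that is absent (a false positive), which costs one unnecessary read but never returns wrong data.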

To improve read performance, tablet servers use two levels of caching.
The Scan Cache is a higher-level cache that caches the key-value pairs returned by the user table interface to the tablet server code. It is most useful for applications that tend to read the same data repeatedly.
The Block Cache is a lower-level cache that caches row blocks that were read from GFS. It is useful for applications that tend to read data that is close to the data they recently read (e.g., sequential reads, or random reads of different columns in the same locality group within a hot row).

Dynamo is a database from Amazon that was designed to solve their availability issues. A lot of their services didn't need transactional capabilities and required only simple key/value access. They were ready to tolerate some inconsistency (for example, an item may appear in the shopping cart after you have deleted it); however, you should always be able to add items to the shopping cart, even in the presence of failures.

Low latency: an SLA (service level agreement) of serving 99.9% of requests with a response within 300 ms at a maximum rate of 500 requests/sec.

Key techniques that Dynamo uses.

Dynamo uses consistent hashing to distribute content to nodes. The ring is the core of consistent hashing: you map your data to points on the ring. The ring is divided into regions, and each region is mapped to a physical server. However, this approach may lead to load imbalance. Virtual nodes address this: they allow a diverse set of machines, since each machine can be assigned a different number of virtual nodes. Moreover, nodes can be added or removed on the fly.

Adding a node to a ring of n nodes requires, on average, 1/(n+1) of the keys to move.

Removing a node requires only the content of the removed node to be shifted.

Dynamo uses virtual nodes, where multiple virtual nodes are assigned to each physical node. This helps balance the load.
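The ring, virtual nodes, and the effect of adding or removing a node can be sketched as follows. This is a minimal illustration of consistent hashing, not Dynamo's implementation; the class `HashRing`, the choice of SHA-256, and 64 virtual nodes per machine are all assumptions.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a string to a point on the ring (here: a 64-bit integer)."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, vnodes_per_node=64):
        self.vnodes_per_node = vnodes_per_node
        self._points = []  # sorted ring positions of all virtual nodes
        self._owner = {}   # ring position -> physical node

    def add_node(self, node: str, vnodes=None):
        # A more capable machine could be given more virtual nodes.
        for i in range(vnodes or self.vnodes_per_node):
            point = ring_position(f"{node}#{i}")
            if point in self._owner:
                continue  # position already taken; skip it
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key.
        idx = bisect.bisect_left(self._points, ring_position(key))
        return self._owner[self._points[idx % len(self._points)]]


ring = HashRing()
for name in ["A", "B", "C"]:
    ring.add_node(name)

keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("D")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

ring.remove_node("D")
restored = {k: ring.node_for(k) for k in keys}
```

Every key in `moved` ends up on the new node `D`; no key is shuffled between the existing nodes, and removing `D` again hands its keys back to their previous owners.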

Now we know how to distribute data. Consistent hashing also makes it easier to replicate data: simply choose the next two nodes on the ring and replicate the data to those nodes. In the above figure N = 3, so the data is replicated to a total of 3 nodes. In the given example, if the hash maps to 3, then it lies in the region of A. We put the data in A, then follow the ring and replicate the data to two more available nodes.

"Sloppy quorums" choose the first N healthy nodes. This may lead to inconsistencies. Strict quorum systems become unavailable under even the simplest of failures, so sloppy quorums are used.

Key ranges, because there is one tree per key range. Merkle trees are used for synchronizing replicas.
Each node keeps routing information for all other nodes. Routing can be done by a load balancer or by a client library. With the client library, the request goes directly to a node in the "preference list"; with a load balancer, the receiving node routes the request to the first node in the list.
Dynamo also uses unreliable failure detection to identify failed nodes, and keeps checking in case of partitions as well. Failure detection is built into the nodes and is not a separate entity.
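Two of the ideas in these notes lend themselves to a short sketch: building a preference list from the first N healthy nodes on the ring, and comparing replicas of a key range through a Merkle tree root. The ring order, node names, and function names are hypothetical, and a real system would compare subtrees rather than only roots to find which keys differ.

```python
import hashlib


def preference_list(ring_order, start_index, n, healthy):
    """Return the first n healthy nodes walking clockwise from start_index."""
    chosen = []
    for step in range(len(ring_order)):
        node = ring_order[(start_index + step) % len(ring_order)]
        if node in healthy and node not in chosen:
            chosen.append(node)
        if len(chosen) == n:
            break
    return chosen


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(items: dict) -> bytes:
    """Root hash over the (key, value) pairs of one key range."""
    level = [_h(f"{k}={v}".encode("utf-8")) for k, v in sorted(items.items())]
    if not level:
        return _h(b"")
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last hash on odd-sized levels
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


ring_order = ["A", "B", "C", "D", "E"]
all_healthy = set(ring_order)

# The key hashes into A's region and N = 3: replicas go to A, B, and C.
strict = preference_list(ring_order, 0, 3, all_healthy)
# With B down, the walk skips it and uses D instead.
sloppy = preference_list(ring_order, 0, 3, all_healthy - {"B"})

replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = dict(replica_a)
in_sync = merkle_root(replica_a) == merkle_root(replica_b)
replica_b["k2"] = "stale"
diverged = merkle_root(replica_a) != merkle_root(replica_b)
```

Comparing one root hash per key range lets two replicas confirm they agree without exchanging the data itself; only when the roots differ is more work needed.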

A hot topic in the tech industry. More and more companies handling a lot of data are adding NoSQL to their workflow.

1. Social networks are often persisted in the form of trees and graphs.
2. Other NoSQL models resemble storing blobs against a key, or even complete XML documents against a key.
3. The main characteristic of these models is that, unlike relations, they do not interact with each other. Here "model" refers to the data structure used for storing data in the database. By not interacting, we mean that each data structure is independent in itself: it never needs to "join" with another data structure to get any other data.

Key techniques that Dynamo uses.

Each write to a key K is associated with a vector clock VC(K), which tracks the version of the data.
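A vector clock can be sketched as a dictionary from node name to counter. The helper names `increment`, `descends`, and `compare`, and the node names X, Y, and Z, are hypothetical; this shows the general technique rather than Dynamo's exact code.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with the node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a is newer"
    if b_ge_a:
        return "b is newer"
    return "conflict"  # concurrent writes: neither version supersedes the other


v1 = increment({}, "X")   # first write handled by node X
v2 = increment(v1, "X")   # second write, again through X
v3 = increment(v2, "Y")   # a later write through Y
v4 = increment(v2, "Z")   # a concurrent write through Z, also based on v2

linear = compare(v2, v1)
concurrent = compare(v3, v4)
```

When two versions are in conflict, the store cannot pick a winner from the clocks alone, so both versions have to be kept until they are reconciled.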

In an atomic transaction, a series of database operations either all occur or none occur. A guarantee of atomicity prevents updates to the database from occurring only partially, which can cause greater problems than rejecting the whole series outright. Atomicity is said to be fulfilled in the example if either A and B both occur or neither A nor B occurs, i.e. all or none.

Consistency of the transaction in the above example requires that the total sum of A and B remain constant before and after the transaction. If, after the transaction, the total sum of A and B becomes a + b - 10, then the database is not consistent.
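Atomicity and the "sum of A and B stays constant" rule can be tried out with Python's built-in `sqlite3` module. The table layout, the starting balances, and the `transfer` helper are assumptions made for this sketch; the slide's own example is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def total(connection):
    return connection.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]


def transfer(connection, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


before = total(conn)
ok = transfer(conn, "A", "B", 10)       # succeeds: A = 90, B = 60
failed = transfer(conn, "A", "B", 500)  # debit violates the CHECK, so the credit is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
after = total(conn)
```

In the failing transfer, the credit to B has already been executed when the debit from A is rejected, and the rollback undoes it, so the total never changes.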

Concurrency control comprises the underlying mechanisms in a DBMS that handle isolation and guarantee the related correctness. It is heavily used by the database and storage engines to guarantee the correct execution of concurrent transactions. (All discussed in detail in class.)

Durability is the ACID property which guarantees that transactions that have committed will survive permanently. For example, if a flight booking reports that a seat has successfully been booked, then the seat will remain booked even if the system crashes.


According to CAP, you can pick only two of the three properties. BASE focuses on availability and partition tolerance, whereas ACID focuses on consistency and availability.
