Architecture Of A Database System


Foundations and Trends® in Databases
Vol. 1, No. 2 (2007) 141–259
© 2007 J. M. Hellerstein, M. Stonebraker and J. Hamilton
DOI: 10.1561/1900000002

Architecture of a Database System

Joseph M. Hellerstein¹, Michael Stonebraker² and James Hamilton³

¹ University of California, Berkeley, USA, hellerstein@cs.berkeley.edu
² Massachusetts Institute of Technology, USA
³ Microsoft Research, USA

Abstract

Database Management Systems (DBMSs) are a ubiquitous and critical component of modern computing, and the result of decades of research and development in both academia and industry. Historically, DBMSs were among the earliest multi-user server systems to be developed, and thus pioneered many systems design techniques for scalability and reliability now in use in many other contexts. While many of the algorithms and abstractions used by a DBMS are textbook material, there has been relatively sparse coverage in the literature of the systems design issues that make a DBMS work. This paper presents an architectural discussion of DBMS design principles, including process models, parallel architecture, storage system design, transaction system implementation, query processor and optimizer architectures, and typical shared components and utilities. Successful commercial and open-source systems are used as points of reference, particularly when multiple alternative designs have been adopted by different groups.

1 Introduction

Database Management Systems (DBMSs) are complex, mission-critical software systems. Today’s DBMSs embody decades of academic and industrial research and intense corporate software development. Database systems were among the earliest widely deployed online server systems and, as such, have pioneered design solutions spanning not only data management, but also applications, operating systems, and networked services. The early DBMSs are among the most influential software systems in computer science, and the ideas and implementation issues pioneered for DBMSs are widely copied and reinvented.

For a number of reasons, the lessons of database systems architecture are not as broadly known as they should be. First, the applied database systems community is fairly small. Since market forces only support a few competitors at the high end, only a handful of successful DBMS implementations exist. The community of people involved in designing and implementing database systems is tight: many attended the same schools, worked on the same influential research projects, and collaborated on the same commercial products. Second, academic treatment of database systems often ignores architectural issues. Textbook presentations of database systems traditionally focus on algorithmic and theoretical issues, which are natural to teach, study, and test, without a holistic discussion of system architecture in full implementations. In sum, much conventional wisdom about how to build database systems is available, but little of it has been written down or communicated broadly.

In this paper, we attempt to capture the main architectural aspects of modern database systems, with a discussion of advanced topics. Some of these appear in the literature, and we provide references where appropriate. Other issues are buried in product manuals, and some are simply part of the oral tradition of the community. Where applicable, we use commercial and open-source systems as examples of the various architectural forms discussed. Space prevents, however, the enumeration of the exceptions and finer nuances that have found their way into these multi-million line code bases, most of which are well over a decade old. Our goal here is to focus on overall system design and stress issues not typically discussed in textbooks, providing useful context for more widely known algorithms and concepts. We assume that the reader is familiar with textbook database systems material (e.g., [72] or [83]) and with the basic facilities of modern operating systems such as UNIX, Linux, or Windows. After introducing the high-level architecture of a DBMS in the next section, we provide a number of references to background reading on each of the components in Section 1.2.

1.1 Relational Systems: The Life of a Query

The most mature and widely used database systems in production today are relational database management systems (RDBMSs). These systems can be found at the core of much of the world’s application infrastructure including e-commerce, medical records, billing, human resources, payroll, customer relationship management and supply chain management, to name a few. The advent of web-based commerce and community-oriented sites has only increased the volume and breadth of their use. Relational systems serve as the repositories of record behind nearly all online transactions and most online content management systems (blogs, wikis, social networks, and the like). In addition to being important software infrastructure, relational database systems serve as a well-understood point of reference for new extensions and revolutions in database systems that may arise in the future. As a result, we focus on relational database systems throughout this paper.

At heart, a typical RDBMS has five main components, as illustrated in Figure 1.1. As an introduction to each of these components and the way they fit together, we step through the life of a query in a database system. This also serves as an overview of the remaining sections of the paper.

Fig. 1.1 Main components of a DBMS.

Consider a simple but typical database interaction at an airport, in which a gate agent clicks on a form to request the passenger list for a flight. This button click results in a single-query transaction that works roughly as follows:

1. The personal computer at the airport gate (the “client”) calls an API that in turn communicates over a network to establish a connection with the Client Communications Manager of a DBMS (top of Figure 1.1). In some cases, this connection is established between the client and the database server directly, e.g., via the ODBC or JDBC connectivity protocol. This arrangement is termed a “two-tier” or “client-server” system. In other cases, the client may communicate with a “middle-tier server” (a web server, transaction processing monitor, or the like), which in turn uses a protocol to proxy the communication between the client and the DBMS. This is usually called a “three-tier” system. In many web-based scenarios there is yet another “application server” tier between the web server and the DBMS, resulting in four tiers. Given these various options, a typical DBMS needs to be compatible with many different connectivity protocols used by various client drivers and middleware systems. At base, however, the responsibility of the DBMS’ client communications manager in all these protocols is roughly the same: to establish and remember the connection state for the caller (be it a client or a middleware server), to respond to SQL commands from the caller, and to return both data and control messages (result codes, errors, etc.) as appropriate. In our simple example, the communications manager would establish the security credentials of the client, set up state to remember the details of the new connection and the current SQL command across calls, and forward the client’s first request deeper into the DBMS to be processed.

2. Upon receiving the client’s first SQL command, the DBMS must assign a “thread of computation” to the command. It must also make sure that the thread’s data and control outputs are connected via the communications manager to the client. These tasks are the job of the DBMS Process Manager (left side of Figure 1.1). The most important decision that the DBMS needs to make at this stage in the query regards admission control: whether the system should begin processing the query immediately, or defer execution until a time when enough system resources are available to devote to this query. We discuss Process Management in detail in Section 2.

3. Once admitted and allocated as a thread of control, the gate agent’s query can begin to execute. It does so by invoking the code in the Relational Query Processor (center, Figure 1.1). This set of modules checks that the user is authorized to run the query, and compiles the user’s SQL query text into an internal query plan. Once compiled, the resulting query plan is handled via the plan executor. The plan executor consists of a suite of “operators” (relational algorithm implementations) for executing any query. Typical operators implement relational query processing tasks including joins, selection, projection, aggregation, sorting and so on, as well as calls to request data records from lower layers of the system. In our example query, a small subset of these operators, as assembled by the query optimization process, is invoked to satisfy the gate agent’s query. We discuss the query processor in Section 4.

4. At the base of the gate agent’s query plan, one or more operators exist to request data from the database. These operators make calls to fetch data from the DBMS’ Transactional Storage Manager (Figure 1.1, bottom), which manages all data access (read) and manipulation (create, update, delete) calls. The storage system includes algorithms and data structures for organizing and accessing data on disk (“access methods”), including basic structures like tables and indexes. It also includes a buffer management module that decides when and what data to transfer between disk and memory buffers. Returning to our example, in the course of accessing data in the access methods, the gate agent’s query must invoke the transaction management code to ensure the well-known “ACID” properties of transactions [30] (discussed in more detail in Section 5.1). Before accessing data, locks are acquired from a lock manager to ensure correct execution in the face of other concurrent queries. If the gate agent’s query involved updates to the database, it would interact with the log manager to ensure that the transaction was durable if committed, and fully undone if aborted. In Section 5, we discuss storage and buffer management in more detail; Section 6 covers the transactional consistency architecture.

5. At this point in the example query’s life, it has begun to access data records, and is ready to use them to compute results for the client. This is done by “unwinding the stack” of activities we described up to this point. The access methods return control to the query executor’s operators, which orchestrate the computation of result tuples from database data; as result tuples are generated, they are placed in a buffer for the client communications manager, which ships the results back to the caller. For large result sets, the client typically will make additional calls to fetch more data incrementally from the query, resulting in multiple iterations through the communications manager, query executor, and storage manager. In our simple example, at the end of the query the transaction is completed and the connection closed; this results in the transaction manager cleaning up state for the transaction, the process manager freeing any control structures for the query, and the communications manager cleaning up communication state for the connection.

Our discussion of this example query touches on many of the key components in an RDBMS, but not all of them. The right-hand side of Figure 1.1 depicts a number of shared components and utilities that are vital to the operation of a full-function DBMS. The catalog and memory managers are invoked as utilities during any transaction, including our example query. The catalog is used by the query processor during authentication, parsing, and query optimization. The memory manager is used throughout the DBMS whenever memory needs to be dynamically allocated or deallocated. The remaining modules listed in the rightmost box of Figure 1.1 are utilities that run independently of any particular query, keeping the database as a whole well-tuned and reliable. We discuss these shared components and utilities in Section 7.
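
The data path of the gate agent's query can be illustrated with a small sketch. This is only a toy model under stated assumptions, not the design of any particular DBMS: the table contents, the operator functions (scan, select, project), and the Cursor class are hypothetical names introduced here, and authorization, optimization, locking, and logging are left out entirely. It shows operators pulling records from an access method and a client fetching results incrementally, as in step 5.

```python
from itertools import islice

# Hypothetical in-memory table; a real storage manager would read pages
# from disk through a buffer pool.
PASSENGERS = [
    {"flight": "UA100", "name": "Ada", "seat": "12A"},
    {"flight": "UA100", "name": "Grace", "seat": "14C"},
    {"flight": "DL200", "name": "Edgar", "seat": "3B"},
    {"flight": "UA100", "name": "Jim", "seat": "20F"},
]


def scan(table):
    """Access method stand-in: hands out records one at a time."""
    for record in table:
        yield record


def select(child, predicate):
    """Selection operator: passes on only the records matching the predicate."""
    for record in child:
        if predicate(record):
            yield record


def project(child, columns):
    """Projection operator: keeps only the requested columns."""
    for record in child:
        yield tuple(record[c] for c in columns)


class Cursor:
    """Stand-in for the per-connection state kept across client calls."""

    def __init__(self, plan):
        self._plan = plan

    def fetch(self, n):
        # Each call pulls at most n more result tuples through the plan.
        return list(islice(self._plan, n))


def passenger_list(flight):
    # A hand-built plan; a real system would compile and optimize SQL text.
    plan = project(
        select(scan(PASSENGERS), lambda r: r["flight"] == flight),
        ["name", "seat"],
    )
    return Cursor(plan)


cursor = passenger_list("UA100")
first_batch = cursor.fetch(2)   # first round trip from the client
second_batch = cursor.fetch(2)  # a later call resumes where the first stopped
```

Because each operator is a generator, no work is done until the client asks for tuples, and the second fetch continues from the position left by the first. This mirrors, in a very reduced form, the repeated trips through the communications manager, query executor, and storage manager described above.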

1.2 Scope and Overview

In most of this paper, our focus is on architectural fundamentals supporting core database functionality. We do not attempt to provide a comprehensive review of database algorithmics that have been extensively documented in the literature. We also provide only minimal discussion of many extensions present in modern DBMSs, most of which provide features beyond core data management but do not significantly alter the system architecture. However, within the various sections of this paper we note topics of interest that are beyond the scope of the paper, and where possible we provide pointers to additional reading.

We begin our discussion with an investigation of the overall architecture of database systems. The first topic in any server system architecture is its overall process structure, and we explore a variety of viable alternatives on this front, first for uniprocessor machines and then for the variety of parallel architectures available today. This discussion of core server system architecture is applicable to a variety of systems, but was to a large degree pioneered in DBMS design. Following this, we begin on the more domain-specific components of a DBMS. We start with a single query’s view of the system, focusing on the relational query processor. Following that, we move into the storage architecture and transactional storage management design. Finally, we present some of the shared components and utilities that exist in most DBMSs, but are rarely discussed in textbooks.

2 Process Models

When designing any multi-user server, early decisions need to be made regarding the execution of concurrent user requests and how these are mapped to operating system processes or threads. These decisions have a profound influence on the software architecture of the system, and on its performance, scalability, and portability across operating systems.¹ In this section, we survey a number of options for DBMS process models, which serve as a template for many other highly concurrent server systems. We begin with a simplified framework, assuming the availability of good operating system support for threads, and we initially target only a uniprocessor system. We then expand on this simplified discussion to deal with the realities of how modern DBMSs implement their process models. In Section 3, we discuss techniques to exploit clusters of computers, as well as multi-processor and multi-core systems.

¹ Many but not all DBMSs are designed to be portable across a wide variety of host operating systems. Notable examples of OS-specific DBMSs are DB2 for zSeries and Microsoft SQL Server. Rather than using only widely available OS facilities, these products are free to exploit the unique facilities of their single host.

The discussion that follows relies on these definitions:

• An Operating System Process combines an operating system (OS) program execution unit (a thread of control) with an address space private to the process. Included in the state maintained for a process are OS resource handles and the security context. This single unit of program execution is scheduled by the OS kernel and each process has its own unique address space.

• An Operating System Thread is an OS program execution unit without additional private OS context and without a private address space. Each OS thread has full access to the memory of other threads executing within the same multithreaded OS Process. Thread execution is scheduled by the operating system kernel scheduler and these threads are often called “kernel threads” or k-threads.

• A Lightweight Thread Package is an application-level construct that supports multiple threads within a single OS process. Unlike OS threads scheduled by the OS, lightweight threads are scheduled by an application-level thread scheduler. The difference between a lightweight thread and a kernel thread is that a lightweight thread is scheduled in user-space without kernel scheduler involvement or knowledge. The combination of the user-space scheduler and all of its lightweight threads runs within a single OS process and appears to the OS scheduler as a single thread of execution. Lightweight threads have the advantage of faster thread switches when compared to OS threads since there is no need to do an OS kernel mode switch to schedule the next thread. Lightweight threads have the disadvantage, however, that any blocking operation such as a synchronous I/O by any thread will block all threads in the process. This prevents any of the other threads from making progress while one thread is blocked waiting for an OS resource. Lightweight thread packages avoid this by (1) issuing only asynchronous (non-blocking) I/O requests and (2) not invoking any OS operations that could block. Generally, lightweight threads offer a more difficult programming model than writing software based on either OS processes or OS threads.

• Some DBMSs implement their own lightweight thread (LWT) packages. These are a special case of general LWT packages. We refer to these threads as DBMS threads, and simply as threads when the distinction between DBMS, general LWT, and OS threads is unimportant to the discussion.

• A DBMS Client is the software component that implements the API used by application programs to communicate with a DBMS. Some example database access APIs are JDBC, ODBC, and OLE/DB. In addition, there are a wide variety of proprietary database access API sets. Some programs are written using embedded SQL, a technique of mixing programming language statements with database access statements. This was first delivered in IBM COBOL and PL/I and, much later, in SQL/J which implements embedded SQL for Java. Embedded SQL is processed by preprocessors that translate the embedded SQL statements into direct calls to data access APIs. Whatever the syntax used in the client program, the end result is a sequence of calls to the DBMS data access APIs. Calls made to these APIs are marshaled by the DBMS client component and sent to the DBMS over some communications protocol. The protocols are usually proprietary and often undocumented. In the past, there have been several efforts to standardize client-to-database communication protocols, with Open Group DRDA being perhaps the best known, but none have achieved broad adoption.

• A DBMS Worker is the thread of execution in the DBMS that does work on behalf of a DBMS Client. A 1:1 mapping exists between a DBMS worker and a DBMS Client: the DBMS worker handles all SQL requests from a single DBMS Client. The DBMS client sends SQL requests to the DBMS server. The worker executes each request and returns the result to the client. In what follows, we investigate the different approaches commercial DBMSs use to map DBMS workers onto OS threads or processes. When the distinction is significant, we will refer to them as worker threads or worker processes. Otherwise, we refer to them simply as workers or DBMS workers.

2.1 Uniprocessors and Lightweight Threads

In this subsection, we outline a simplified DBMS process model taxonomy. Few leading DBMSs are architected exactly as described in this section, but the material forms the basis from which we will discuss current generation production systems in more detail. Each of the leading database systems today is, at its core, an extension or enhancement of at least one of the models presented here.

We start by making two simplifying assumptions (which we will relax in subsequent sections):

1. OS thread support: We assume that the OS provides us with efficient support for kernel threads and that a process can have a very large number of threads. We also assume that the memory overhead of each thread is small and that the context switches are inexpensive. This is arguably true on a number of modern OSs today, but was certainly not true when most DBMSs were first designed.
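
As a rough illustration of the lightweight thread package defined earlier in this section, the sketch below uses Python generators as cooperative threads driven by a user-space round-robin scheduler. The Scheduler class and worker function are hypothetical names introduced here, and the sketch assumes every thread yields voluntarily and never issues a blocking call; it is not how any specific DBMS implements its thread package.

```python
from collections import deque


class Scheduler:
    """User-space round-robin scheduler; the OS sees one thread of execution."""

    def __init__(self):
        self.ready = deque()  # runnable lightweight threads
        self.trace = []       # (thread name, step) in execution order

    def spawn(self, name, gen):
        self.ready.append((name, gen))

    def run(self):
        while self.ready:
            name, gen = self.ready.popleft()
            try:
                step = next(gen)  # resume the thread until its next yield
            except StopIteration:
                continue          # thread finished; do not reschedule it
            self.trace.append((name, step))
            self.ready.append((name, gen))


def worker(steps):
    """A lightweight thread: does one unit of work, then yields control."""
    for i in range(steps):
        # Yield point: switching here needs no kernel mode switch. A blocking
        # call placed here instead would stall every thread in the process.
        yield i


sched = Scheduler()
sched.spawn("A", worker(2))
sched.spawn("B", worker(3))
sched.run()
```

After the run, sched.trace records the two workers interleaved step by step until the shorter one finishes. The switch between them is an ordinary function return inside one OS process, which is the source of both the speed advantage and the blocking hazard noted in the definition.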
