Query Interactions In Database Workloads

3y ago
36 Views
3 Downloads
287.90 KB
6 Pages
Last View : 17d ago
Last Download : 3m ago
Upload by : Camille Dion
Transcription

Query Interactions in Database WorkloadsMumtaz AhmadUniversity of WaterlooAshraf AboulnagaUniversity of WaterlooABSTRACTDatabase workloads consist of mixes of queries that run concurrently and interact with each other. In this paper, wedemonstrate that query interactions can have a significantimpact on database system performance. Hence, we arguethat it is important to take these interactions into accountwhen characterizing workloads, designing test cases, or developing performance tuning algorithms for database systems. To capture and model query interactions, we proposeusing an experimental approach that is based on samplingthe space of possible interactions and fitting statistical models to the sampled data. We discuss using such an approachfor database testing and tuning, and we present some opportunities and research challenges.1.Shivnath BabuDuke Universityactions, consider the TPC-H benchmark, which is widelyused to test decision support systems. Studying the performance characteristics of individual TPC-H query typesgives us insights into database system performance, but itis not enough to answer questions such as the following: Ifwe run 5 concurrent TPC-H queries, namely, 2 instances ofQ1 with 3 instances of Q13 , then what would be the performance of the system? How would a single instance ofQ1 behave compared to the case of running 5 concurrentinstances of Q1 ? Looking at Q1 alone or Q13 alone doesnot allow us to accurately answer these questions. We needto incorporate interactions among concurrent queries intoour understanding of workload characteristics and databasesystem performance.INTRODUCTIONCharacterizing a database workload requires understanding the impact of this workload on all aspects of the system. Such workload characterization is required, for example, when designing workloads and test cases to test databasesystem features or performance, when tuning a database system in deployment, or when analyzing the performance ofa production system. To this end, many benchmark workloads exist that consist of queries and transactions that tryto stress different aspects of the database system [15].The typical workload in a database system consists ofmixes of queries of different types running concurrently andinteracting with each other. The interaction among queriescan have a significant impact on their performance, and thisimpact can be positive or negative. For example, a queryQ1 can bring data into the buffer pool that is then used bya concurrently running query Q2 . Alternatively, Q1 and Q2could interfere with each other on hardware resources suchas CPU or memory, or on internal database system resourcessuch as latches or locks. In this paper, we demonstrate theimpact of query interactions and discuss their implicationson database testing and performance tuning.To illustrate the need for reasoning about query inter-Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.DBTest’09, June 29, 2009, Providence, Rhode Island, USA.Copyright 2009 ACM 978-1-60558-551-2/09/06 . 5.00.Figure 1: Workload completion time for differentarrival orders.As an example of the effect of query interactions on performance, consider the two workloads shown in Figure 1. Boththese workloads consist of exactly the same 60 instances ofTPC-H queries running on a 10GB database on DB2 (detailsof our experimental setup are presented in the next section).The database physical design and the tuning parameters ofDB2 are exactly the same for both workloads. The onlydifference between the two workloads is the arrival orderof the queries, which results in different query mixes beingexecuted by the system. This simple change results in thecompletion time changing from 3.3 hours to 5.4 hours. InWorkload 1, queries that compete for resources get executedconcurrently, resulting in negative interactions. In Workload 2, queries that help each other get executed together,resulting in positive interactions. The 2.1 hour differencein performance is completely attributable to different query

Query TypeRun Time tj (sec)Q110.07Q75.76Q99.66Q136.12Q187.12Q217.3Table 1: Run time, tj , of different TPC-H query types on a 1GB database.Query TypeRun Time tj 570.37Table 2: Run time, tj , of different TPC-H query types on a 10GB database.interactions in the different runs. We show later that queryinteractions affect not only end-to-end workload completiontime but also resource consumption. Therefore, we arguethat it is important to consider query interactions whenanswering questions about different aspects of a databasesystem such as testing, tuning, capacity planning, and performance prediction.Surprisingly, very little work in the database literaturedeals with studying query interactions in the general sense.There are specific research works like work on multi-queryoptimization (e.g., [12]), and work on sharing scans in thebuffer pool (e.g., [10]). Some works deal with transactionmixes but define a transaction mix as the set of transactions observed during a monitoring interval, without considering the concurrent execution of these transactions and theinteractions that this concurrent execution induces. Suchinteraction-oblivious mix models have been used for performance prediction, capacity planning, and anomaly detection [14, 18]. In this paper we show that the interactionsignored by these works can have a significant impact onperformance.Our model of a database workload is that it consists ofa sequence of query mixes. Some previous work models aworkload as a sequence of queries, but ignores the effect ofconcurrent query execution [1]. To study the effect of queryinteractions on an entire workload, we start by studyingthe query interactions within individual query mixes. Weassume that the query types that appear in the mixes areknown a-priori. A query mix consists of a number of different instances of each query type, where different instancesof a query type may have different parameter values. Itis relatively simple to determine the query types a-prioriif the query workload is generated by a fixed set of applications (e.g., report generation applications in a businessintelligence setting). Having a fixed set of applications is acommon mode of operation for many database systems, sothe discussion in this paper is widely applicable. However,there are cases where a query workload is purely ad-hoc. Inthese cases, an extra step is required to classify the ad-hocqueries into a fixed set of query types. This extra step is beyond the scope of this paper and is an interesting directionfor future work.In this paper, we present some results from an experimental study of the impact of query interactions on performance and resource consumption. We present examples ofinteractions and their effect in Section 2, and we discuss theimplications of these interactions in Section 3.2.EXAMPLES OF QUERY INTERACTIONTo illustrate the effect of query interactions in differentquery mixes on run time and resource consumption, we conducted an experimental study using queries from the TPC-Hbenchmark with two database sizes, 1GB and 10GB. Thedatabase system we use is DB2 version 8.1, and we ranour experiments on machines with dual 3.4GHz Intel XeonCPUs and 4.0GB of RAM running Windows Server 2003.The buffer pool size of the database was set to 400MB forthe 1GB database, and 2.4GB for the 10GB database. Weused the DB2 Design Advisor to recommend a set of indexesfor our workload, and we ran the DB2 Configuration Advisorto ensure that the configuration parameters are well tuned.Let Q1 , Q2 , . . . , Q22 be the 22 TPC-H query types. Weuse query mixes consisting of different numbers of instancesof different TPC-H query types, where the instances havedifferent parameter values as required by the TPC-H specification (the queries are generated using the TPC-H QGENprogram). Table 1 shows the run time of the 6 longest running TPC-H queries on a 1GB database when they run alonein the system, which we denote by tj . Table 2 shows the runtime of these queries on a 10GB database. Each run timerepresents the average run time of 10 instances of the particular query type. There is little variance in run time for aspecific query type since TPC-H uses uniform distributionsfor data and query parameters. We also experimented withskewed (Zipfian) data distributions and observed significanteffects of query interactions in this setting. Our approach todealing with skewed data distributions is to sub-divide eachquery type into sub-types according to the range of parameter values to minimize the variance in run time within aquery type. We omit a detailed discussion of skewed distributions from this paper.The multi-programming level (MPL) of the database isM . The MPL represents the number of queries that executeconcurrently in the system at any time. A set of queriesthat execute concurrently in the system is referred to as aquery mix. Query mix mi can be represented as a vectorhNi1 , Ni2 , . . . , NiT i, where Nij is the number of instances ofPquery type Qj in mi , and Tj 1 Nij M . We denote theaverage run time of queries of type Qj in mix mi by Aij .We start with a simple example demonstrating the impactof interactions in a query mix on the completion time of agiven query type in this mix. Table 3 shows three mixesconsisting of the 6 long-running query types on the 10GBdatabase. The high variability in Aij illustrates the effectof query interactions. Consider the average run time of Q1and Q7 in the first two mixes. Both mixes have M 5,and both have one instance each of Q1 and Q7 , but there isan increase in Aij in m2 for all query types. In particular,the run time of Q7 is more than twice its time in m1 . Onemay be tempted to think that this is just because of thecharacteristics of Q13 which was introduced in m2 . Thenext mix m3 shows that this is not true. In this mix, bothQ1 and Q7 actually improve their performance from m2 ,even when we increase the number of instances of Q13 . Theeffect of query interactions in a 1GB database can be seenin Table 4. Consider the average run time of Q21 in the

ij013Q13Aij0.0432.12311.53Q18Nij Aij00.000.000.0Q21Nij Aij00.000.000.0Table 3: Aij for different query types in query mixes on a 10GB Nij43Q18Aij111.87114.21Nij11Q21Aij55.38159.95Table 4: Aij for different query types in query mixes on a 1GB 0.00.00.0Q13Nij j2111Q21Nij Aij00.000.000.000.0Table 5: Aij for different query types in query mixes on a 10GB database.two mixes m4 and m5 . Both these mixes have M 20, yetAij for Q21 in m5 is almost three times that for m4 . Theperformance of these mixes and all other mixes used in thispaper is repeatable and consistent across different runs ofthe experiment.Next, we present interesting cases of “positive interactions.” Table 5 shows some query mixes for this setting.Mix m6 in this table presents an example of positive interaction for Q7 . The average run time of Q7 in this mix, Aij ,is 72.7 seconds, while the run time of Q7 when it is run alonein the system is 102.06 seconds (Table 2). Thus, Q7 benefitsfrom being run in this mix, taking less time on average thanif it were run alone. Mix m7 presents another example ofpositive interaction, this time for Q18 . The average run timeof Q18 in this mix is 539.3 seconds, compared to a run timeof 554.56 seconds when it is run alone. Thus, Q18 benefitsfrom being run with 4 instances of Q1 . Mixes m7a and m7bshow repeated runs of mix m7 with different instances ofthe same query types. The results for all variants of m7 aresimilar, illustrating the repeatability of our results. Theabove examples show that query interactions can be negative(where Aij tj ) or positive (where Aij tj ). Interestingly,mix m6 exhibits both positive and negative interactions: Q7benefits from running in this mix, but the performance ofthe other three query types is severely degraded.Next, we demonstrate that query interactions can be fairlycomplex, with small changes in the query mix sometimeshaving a huge impact on performance that may be very difficult to predict. In Table 6, we focus on three-way interactions for mixes with one instance of Q21 on the 10GBdatabase. In all examples we have M 5. The completion time first increases with the introduction of an instanceof Q9 , then it decreases and increases alternatively as wekeep increasing the number of instance of Q9 and decreasing the number of instances of Q13 . The same behavior forthree-way interaction can be seen on the 1GB database inFigure 2. Here we fix Nij 3 for Q21 and vary the number of instances of Q7 and Q9 such that M is always fixedto be 30. Once again we can see that Aij for Q21 variessignificantly with no easily predictable pattern.Figure 2: Three-way interaction: effect on Q21 ofdifferent mixes.mixm8m9m10m11m12Q9 (Nij )01244Q13 (Nij )43210Q21 (Nij )11111Q21 (Aij )4188.205463.803476.13581.72782.4Table 6: Aij for Q21 .These examples demonstrate that we cannot accuratelypredict the performance of queries unless we are able tomodel the effect of other queries running concurrently withthem in the query mix. Focusing on individual query typesand ignoring interactions can lead to inaccurate conclusionsabout performance. Thus, it is important to develop mixbased characterization of query workloads to better understand the performance of database systems.Next, we present experiments measuring resource consumption in different query mixes. Here again we will seethat traditional approaches that profile the resource consumption of individual queries and workloads while ignoringinteractions may not be useful. When queries run concurrently, resource utilization and performance bottlenecks can

mixm13m14m15m16m17m18Q13 (Nij )012345Q9 (Nij )543210Q13 (Aij )0356422.59388.74289.1224.42Q9 (Aij )9191547.162079.722508.333762.550CPU Utilization24.1127.4319.684.9753.14586.055Sec / Disk Transfer0.02530.0250.02280.00980.0260.01655Table 7: Resource consumption for different mixes of Q13 and Q9 on a 10GB database.mixm19m20m21m22m23m24Q13 (Nij )012345Q21 (Nij )543210Q13 (Aij )0372.05436.62322.71206.88224.42Q21 (Aij )1300.7352196.812283.412576.064188.20CPU Utilization5.7859.314.3830.6859.3686.055Sec / Disk Transfer0.00720.01140.01280.01770.02150.01655Table 8: Resource consumption for different mixes of Q13 and Q21 on a 10GB database.change considerably from one mix to another. Tables 7 and 8show the resource consumption of different mixes with twoway query interaction and M 5 on the 10GB database.The tables report average CPU utilization (in %) and average seconds per disk transfer for the different mixes. Secondsper disk transfer is a direct measure of disk response timeincluding the queueing time (so it captures the effect of varying load). In both tables we run Q13 with one other querytype and observe the resource consumption of the mixes.As expected, resource consumption varies as we vary thequery mix. However, what is interesting is that even whenwe have only two query types, just replacing an instanceof one query type with an instance of the other can significantly change resource consumption, further demonstratingthe significance of query interactions. Consider mixes m13 tom18 which all consist of instances of query types Q13 and Q9 .In m16 , the CPU utilization and disk transfer time are significantly lower than m15 and m17 . The CPU utilization form18 is considerably higher than m17 . All these changes arethe result of changing just one query instance from one mixto the next. The pattern of resource consumption is complexand rapidly changing due to the nature of query interactions.In many cases the bottleneck resource is neither CPU nordisk, but some other resource not being monitored such aslocks, memory, database or operating system latches, etc.It is clear from these tables that considering query mixesis important for answering questions not only about queryrun time but also about resource consumption. Anotherinteresting observation from these tables is that there is little correlation between resource consumption and query runtime. This is further illustrated by Figures 3 and 4. Thefigures plot CPU utilization and seconds per disk transferin different query mixes (on the 10GB database) againstthe average completion time of Q9 in these mixes. There isno clear correlation between mix resource consumption andquery completion time.3.IMPLICATIONS OF QUERY INTERACTIONAfter presenting examples of query interactions and demonstrating the significant impact that they have on performance and resource consumption, we now ask the question:How do these interactions affect database testing and tuning? In this section, we present some implications of queryFigure 3: CPU utilization vs. Q9 completion time.Figure 4: Disk performance vs. Q9 completion time.interactions on database testing, and some challenges thatarise due to these interactions.Sampling and Test Case Design: An important consequence of query interactions is that the design of test casesand workloads for testing and evaluating database systemsshould take these interactions into account. For example,consider the two workloads in Figure 1. These workloadsuse the same queries, and the only difference between themis in query interactions. If database testing is not interaction aware, two test runs may use these queries in the twoorders shown in Figure 1, resulting in a 2.1 hour differencein run time that is not due to any feature of the databasesystem being tested, but rather due to query interactions.To make database testing and tuning interaction aware,

we need to view the problem of designing test cases andtest workloads as a problem of sampling from the space ofpossible query interactions. An important challenge is howto design the test workloads to maximize coverage of thespace of possible query interactions while minimizing thesampling budget. Principled sampling approaches such asLatin Hypercube Sampling (LHS) [8] can help address thischallenge, and it is important for database testers to employsuch approaches and design interaction-aware test cases.In addition to the “active sampling” approach proposedabove, it may be possible to obtain useful information aboutquery interactions through “passive sampling” from the workloads in a production system. If we (passively) monitorthe execution of production workloads, we could determinewhich query mixes were actually encountered in these workloads, how long each mix ran, what effect each mix had onresource consumption, etc. This passive sampling cannotguarantee the same comprehensive coverage of the space ofpossible query mixes as active sampling, since it is restrictedto mixes that have actually been observed and does not provide information about potential mixes that have not beenseen yet. However, passive sampling can provide a databasetester, DBA, or automatic tuning tool with a compact anduseful characterization of the workload on the system.Performance Modeling: Sampling the space of possiblequery interactions is a first step towards understanding theeffect of these interactions on performance. To build performance models that reflect the effect of query interactions, weadvocate the use of “black-box” modeling techniques that fitstatist

Characterizing a database workload requires understand-ing the impact of this workload on all aspects of the sys-tem. Such workload characterization is required, for exam-ple, when designing workloads and test cases to test database system features or performance, when tuning a database sys-tem in deployment, or when analyzing the performance of

Related Documents:

Why should you Query? Centers for Medicare and Medicaid Services supports the use of query forms as a supplement to the health care record. “Use of the physician query form is permissible to the extent it provides clarification and is consistent with other medical record documentation.” 3 File Size: 254KBPage Count: 26Explore furtherPhysician Query Examples Journal Of AHIMAjournal.ahima.org2019 update: Guidelines for achieving a compliant query .acdis.orgGuidelines for Achieving a Compliant Query Practice (2019 .bok.ahima.orgThe Physician Query Process Compliance Issuesassets.hcca-info.orgThe Physician Query: What Every Coder Wants You To Knowcapturebilling.comRecommended to you b

Tags:css media query, css media query examples, css media query for ipad, css media query for mobile, css media query value defined in the query. max-width Rules applied for any browser width below the value defined in the query. min-height Rules applied for any browser height over the value defined in the query. max-height Rules applied for any

for cloud query engines. The goal of Peregrine is to: (i) make it easy to analyze query workloads and build workload optimization features (or apps) using them, (ii) define the space of workloads optimizations that is relevant to typical production workloads, and (iii) provide design patterns to add workload awareness to the cloud query engines.

A graph query language is a query language designed for a graph database. When a graph database is implemented on top of a relational database, queries in the graph query language are translated into relational SQL queries [1]. Some graph query operations can be efficiently implemented by translating the graph query into a single SQL statement.

PeopleSoft Query Welcome to PeopleSoft Query! This versatile tool is simple to use and will allow Query Developers to create Queries in an effective and efficient manner. Introduction to PeopleSoft Query eopleSoft Query or PS Query is an end

Records for holidays, weather, eyeballs. Forecast is done one week ahead. Measure SMAPE: Query Previous Model Described Neural Network Query #1 10.60 13.05 Query #2 23.23 22.60 Query #3 48.57 18.23 Query #4 47.41 26.35 Query #5 19.40 16.87 Query #6 19.25 22.65 .

This system calls, Advanced SQL Query To Flink Translator This proposed system receives Ad-vanced SQL Query from the user then generate Flink Code for exe-cuting this Query. Finally, it returns the results of Query to the user. General Terms: SQL Query, Apache Flink Keywords Big data, Flink, SQL Translator, Hadoop, Hive, Advanced SQL Query 1 .

4.2 Retrieving Data with Queries The most basic SQL statement is a query. A query is a SQL statement that uses the SELECT command to retrieve information from a database. A query's result is the set of columns and rows that the query requests from a database server. For example, the following query retrieves all rows and