Lessons For The Optimizer From TPC-DS Benchmark - MariaDB


Lessons for the optimizer from TPC-DS benchmark
Sergei Petrunia, Query Optimizer developer, MariaDB Corporation
2019 MariaDB Developers Unconference, New York

The goals
1. Want to evaluate/measure the query optimizer
2. Hard to do; the optimizer should handle
   – Different query patterns
   – Different data distributions, etc.
3. How does one do it anyway?
   – Look at benchmarks
   – Or the "optimizer part" of the benchmarks

Benchmarks
1. sysbench
   – Popular
   – Does only basic queries, few query patterns
2. DBT-3 (aka TPC-H)
   – 6 tables, 22 analytic queries
   – Was used to see some optimizer problems
   – Limited: uniform data distribution, uncorrelated columns

TPC-DS benchmark
Obsoletes the DBT-3 benchmark
Richer dataset:
– 25 tables, 99 queries
– Non-uniform data distributions
Uses advanced SQL features:
– 32 queries use CTEs
– 27 queries use window functions
– etc.
Could not really run it until MariaDB 10.2 (or MySQL 8)

MariaDB still can't run all of TPC-DS
– 2 queries: FULL OUTER JOIN
– 10 queries: ROLLUP ORDER BY problem (MDEV-17807)

    select ... group by a,b,c with rollup order by a,b,c
    ERROR 1221 (HY000): Incorrect usage of CUBE/ROLLUP and ORDER BY

– 20 more queries have fixable problems
  – "Every derived table must have an alias", etc.

Oracle MySQL and TPC-DS
– ROLLUP ORDER BY is supported since 8.0.12
– Doesn't support FULL OUTER JOIN (2 queries)
– Doesn't support EXCEPT (1 query)
– Doesn't support INTERSECT (3 queries)

Running queries from TPC-DS
– The data generator creates CSV files
  – Adjust a #define for MySQL/MariaDB
– The query generator produces "streams" from templates
  – A set of queryNNN.tpl files
  – A stream is a text file with one instance of each of the 99 queries
  – One can add hooks at query start/end
– The queries have a few typos
– There's no tool to run the queries and measure time
  – Note that the read queries are only a subset of the full benchmark
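Since there is no bundled tool to run the queries and measure time, a small harness has to be written. A minimal sketch, assuming the query generator was configured to emit a "-- start query ..." comment hook before each query (the exact marker format is an assumption; adjust the regex to whatever your hooks produce):

```python
import re
import time

# Marker emitted before each query in a stream file. This format is an
# assumption about the dsqgen hook output; adapt it to your configuration.
MARKER = re.compile(r"^-- start query \d+ .* template (query\d+\.tpl)", re.M)

def split_stream(text):
    """Split one stream file into (template_name, sql_text) pairs."""
    parts = MARKER.split(text)
    # parts = [prefix, tpl1, sql1, tpl2, sql2, ...]
    return list(zip(parts[1::2], parts[2::2]))

def time_queries(stream_text, run_query):
    """Execute each query via the run_query callable (e.g. a DB-API cursor
    wrapper) and return {template_name: elapsed_seconds}."""
    timings = {}
    for tpl, sql in split_stream(stream_text):
        t0 = time.monotonic()
        run_query(sql)
        timings[tpl] = time.monotonic() - t0
    return timings
```

With per-template timings collected this way, runs against different servers or rngseed values can be diffed query by query instead of comparing only the total stream time.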

Getting it to run
– A collection of scripts at https://github.com/spetrunia/tpcds-run-tool
– The goal is a fully automated run
  – MariaDB, MySQL, PostgreSQL
  – Because we need to play with settings/options

Test runs done
The dataset:
– Scale 1
– 1.2 GB of CSV files
– 6 GB when loaded
The queries:
– 10-20 "streams"
Tuning:
– InnoDB buffer pool 8G (50% of RAM)
– shared_buffers 4G (25% of RAM)
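The tuning above follows a common rule of thumb: give the InnoDB buffer pool about half of RAM, and PostgreSQL's shared_buffers about a quarter (PostgreSQL also relies on the OS page cache). A trivial sketch of that sizing rule, generalized from the 8G/4G figures on a 16 GB machine (the generalization is an assumption, not something the slides state):

```python
# Rule-of-thumb cache sizing: InnoDB buffer pool at 50% of RAM,
# PostgreSQL shared_buffers at 25%. Values in whole GB.
def buffer_sizes_gb(ram_gb):
    return {"innodb_buffer_pool_size": ram_gb // 2,
            "shared_buffers": ram_gb // 4}
```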

Test results
[two slides of result charts, not captured in this transcription]

Test results: a bit inconclusive
– Query times varied across my runs (?)
– Time to run one stream: 20 min to 2 hours
Searching for the source of randomness:
– Started to work on full automation
  – (Did I run ANALYZE? Did I have the correct my.cnf parameters?)
– Started to look at rngseed in the dataset/query generator

MariaDB/MySQL

MariaDB 10.2, 10.4, MySQL 8
– Scale 1, 6.1 GB data, 8G buffer pool
– rngseed 1234 for both
– Benchmark takes 20 min
– Query times are very non-uniform

query name     QueryTime, ms
-------------  -------------
query72.tpl          678,321
query23.tpl           80,025
query2.tpl            65,156
query39.tpl           63,761
query78.tpl           63,473
query4.tpl            27,549
query31.tpl           24,344
query47.tpl           19,156
query11.tpl           17,484
query74.tpl           16,571
query21.tpl           16,212
query59.tpl           10,522
query88.tpl            9,965

Query #72 dominates
[bar chart of per-query times; query72.tpl (~680 s) towers over all other queries]

Without Query #72
[the same bar chart with query72.tpl excluded; remaining query times range up to ~80 s]

PostgreSQL 11

PostgreSQL 11
– There was one "fast" run
– Showing results from the last two runs (both were "slow")
  – rngseed 5678 for both: 121 min
  – rngseed 1234 (data), rngseed 4321 (query): 145.154 min

Heaviest queries in the run (times in ms; X = seed5678 time / seed1234 time)

query name     PG11-seed5678   PG11-seed1234        X
-------------  --------------  --------------  ------
query4.tpl          3,628,830       3,578,944  1.0139
query11.tpl         2,004,392       2,013,597  0.9954
query1.tpl             87,981       1,947,624  0.0452
query74.tpl           693,784         641,696  1.0812
query47.tpl           624,717         539,941  1.1570
query57.tpl           116,570         112,472  1.0364
query81.tpl            22,089          47,366  0.4663
query6.tpl             27,896          27,009  1.0328
query30.tpl            11,214          11,171  1.0038
query39.tpl            10,803          10,702  1.0094
query95.tpl            16,418          10,065  1.6312

Execution time varies. Is this a query optimizer issue?
Do we need a "representative collection of datasets"?
– Or different constants in a skewed dataset? Check N datasets?
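The X column is the ratio of the two runs' times, so queries whose X is far from 1.0 behaved very differently under the two seeds. A small sketch that recomputes the ratios from the times above and flags the unstable queries (the 1.5x threshold is an arbitrary choice for illustration):

```python
# Times in ms from the two PostgreSQL 11 runs (seed 5678, seed 1234).
times = {
    "query4.tpl":  (3628830, 3578944),
    "query11.tpl": (2004392, 2013597),
    "query1.tpl":  (87981,   1947624),
    "query74.tpl": (693784,  641696),
    "query81.tpl": (22089,   47366),
    "query95.tpl": (16418,   10065),
}

def unstable(times, threshold=1.5):
    """Flag queries whose run-to-run ratio deviates from 1.0 by more than
    `threshold`x in either direction."""
    flagged = {}
    for tpl, (a, b) in times.items():
        x = a / b
        if x > threshold or x < 1 / threshold:
            flagged[tpl] = round(x, 4)
    return flagged
```

On this data the sketch flags query1.tpl (22x slower under seed 1234), query81.tpl and query95.tpl; those are the queries worth an EXPLAIN diff between the two datasets.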

Compare the heaviest queries

MariaDB:

query name     QueryTime, ms
-------------  -------------
query72.tpl          678,321
query23.tpl           80,025
query2.tpl            65,156
query39.tpl           63,761
query78.tpl           63,473
query4.tpl            27,549
query31.tpl           24,344
query47.tpl           19,156
query11.tpl           17,484
query74.tpl           16,571
query21.tpl           16,212
query59.tpl           10,522

PostgreSQL:

query name     PG11-seed5678   PG11-seed1234        X
-------------  --------------  --------------  ------
query4.tpl          3,628,830       3,578,944  1.0139
query11.tpl         2,004,392       2,013,597  0.9954
query1.tpl             87,981       1,947,624  0.0452
query74.tpl           693,784         641,696  1.0812
query47.tpl           624,717         539,941  1.1570
query57.tpl           116,570         112,472  1.0364
query81.tpl            22,089          47,366  0.4663
query6.tpl             27,896          27,009  1.0328
query30.tpl            11,214          11,171  1.0038
query39.tpl            10,803          10,702  1.0094
query95.tpl            16,418          10,065  1.6312

Some queries are present in both lists, but some appear in only one. The picture is not clear.

Observations about the benchmark
– rngseed on the dataset matters A LOT
  – What is a representative set of rngseed values?
– rngseed on the query streams matters much less
– Hardware?
– Queries are not equal
  – Heavy vs. lightweight queries
  – Is SUM(query time) an adequate metric? It won't show that a fast query got 10x slower
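The last point can be made concrete: a total-time metric can report an overall improvement while an individual query regresses badly. A sketch with made-up numbers, comparing the total-time ratio against the geometric mean of per-query ratios (the metric used by TPC-style benchmarks precisely to avoid this problem):

```python
import math

# Illustrative (made-up) times in ms: the heavy query improves slightly
# while the fast query regresses 10x.
before = {"heavy.tpl": 600000, "fast.tpl": 500}
after  = {"heavy.tpl": 580000, "fast.tpl": 5000}

# Total-time ratio: below 1.0, i.e. "the benchmark got faster".
total_ratio = sum(after.values()) / sum(before.values())

# Geometric mean of per-query ratios: well above 1.0, i.e. "queries got
# slower on average" -- the 10x regression is no longer hidden.
ratios = [after[q] / before[q] for q in before]
geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```

Here total_ratio is about 0.97 (a win) while the geometric mean is about 3.1 (a clear loss), so which metric you sum over determines whether the regression is visible at all.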

Other observations
– Both DBT-3 and TPC-DS workloads are relevant for the optimizer
  – Condition selectivities
  – Semi-join optimizations
– But they don't match the optimizer issues we see:
  – ORDER BY ... LIMIT optimization
  – Long IN-lists
  – ...

Extra: parallel query in PG?

Extra: PostgreSQL 11, parallel query?
Trying on a run with rngseed 5678 for both:
Parallel settings:
– max_parallel_workers_per_gather = 8 (the default was 2)
– dynamic_shared_memory_type = posix
– max_worker_processes = 8
Results:
– Only saw one core occupied
– The run still took 121 min; didn't see any speedup

Try a parallel query

select sum(inv_quantity_on_hand * i_current_price)
from inventory, item
where i_item_sk = inv_item_sk;

With max_parallel_workers_per_gather = 0:

                              QUERY PLAN
------------------------------------------------------------------------
Aggregate  (cost=301495.25..301495.26 rows=1 width=32)
  ->  Hash Join  (cost=1635.00..213408.54 rows=11744894 width=10)
        Hash Cond: (inventory.inv_item_sk = item.i_item_sk)
        ->  Seq Scan on inventory  (cost=0.00..180935.94 rows=11744894 width=8)
        ->  Hash  (cost=1410.00..1410.00 rows=18000 width=10)
              ->  Seq Scan on item  (cost=0.00..1410.00 rows=18000 width=10)

Try a parallel query

select sum(inv_quantity_on_hand * i_current_price)
from inventory, item
where i_item_sk = inv_item_sk;

With max_parallel_workers_per_gather = 8:

                              QUERY PLAN
------------------------------------------------------------------------
Finalize Aggregate  (cost=125048.98..125048.99 rows=1 width=32)
  ->  Gather  (cost=125048.55..125048.96 rows=4 width=32)
        Workers Planned: 4
        ->  Partial Aggregate  (cost=124048.55..124048.56 rows=1 width=32)
              ->  Parallel Hash Join  (cost=1468.23..102026.87 rows=2936224 width=10)
                    Hash Cond: (inventory.inv_item_sk = item.i_item_sk)
                    ->  Parallel Seq Scan on inventory  (cost=0.00..92849.24 rows=2936224 width=8)
                    ->  Parallel Hash  (cost=1335.88..1335.88 rows=10588 width=10)
                          ->  Parallel Seq Scan on item  (cost=0.00..1335.88 rows=10588 width=10)

Try a parallel query

select sum(inv_quantity_on_hand * i_current_price)
from inventory, item
where i_item_sk = inv_item_sk;

Results:
– max_parallel_workers_per_gather = 8: 1.0 sec
– max_parallel_workers_per_gather = 0: 3.8 sec
Didn't see anything like that in the TPC-DS benchmark runs
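For the numbers above, the speedup and per-process efficiency work out as follows (the efficiency calculation assumes the leader process also executes the plan alongside the 4 planned workers, which is PostgreSQL's default behavior):

```python
# 3.8 s serial vs 1.0 s parallel, with 4 planned workers plus the leader.
serial_s, parallel_s, workers = 3.8, 1.0, 4

speedup = serial_s / parallel_s          # 3.8x faster
processes = workers + 1                  # leader participates by default
efficiency = speedup / processes         # fraction of linear scaling achieved
```

A 3.8x speedup over 5 processes is roughly 76% parallel efficiency on this single join, which makes it all the more notable that no such speedup appeared anywhere in the full TPC-DS run.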

Thanks!
