Solving Common PROC SQL Performance Killers When Using ODBC

MWSUG 2016 - Paper BB16

Solving Common PROC SQL Performance Killers when using ODBC

John Schmitz, Luminare Data LLC, Omaha, NE

ABSTRACT

The PROC SQL routine is commonly used by many SAS programmers. However, poorly formed queries can create needless performance killers. This paper reviews common SQL coding practices observed in code reviews and provides simple alternatives that can dramatically reduce query run time and required resources. The paper focuses on techniques that should be comfortable to SQL programmers and yet can shave 75 percent or more off of query run times.

INTRODUCTION

Much has been, and could be, written on the topic of PROC SQL optimization. Many of the worst impacts occur when developers write code without considering or understanding the basic elements of efficient PROC SQL coding. Sometimes it may be the rush to complete an urgent request; other times, code copied and pasted from another program may not be optimal for the new scenario. Regardless of the cause, the result can be serious degradation of job performance and, left unchecked, overall system performance issues.

This paper draws from more than 20 years' experience reviewing and optimizing SAS code. These simple changes can dramatically impact query run time and require minimal effort to implement. A few assumptions are made within this document:

- Examples will be for PROC SQL. Some of these scenarios apply to DATA steps as well, but the DATA step equivalents will not be discussed here.
- The examples assume source tables are being retrieved from a remote database (RDBMS), typically using SAS/ACCESS via ODBC or related access products. The specific access product should matter little, but it is important to note that many of the scenarios do not apply if the core source tables are SAS data sets.
- The scenarios presented are working and generally acceptable SQL code. The idea is to highlight elements specific to the SAS/SQL query connection running through PROC SQL.
- The system architecture has adequate resources to support properly configured processes. We will briefly talk about architecture, but it is not the focal point of this discussion.

Before diving too far into specifics of code, it seems appropriate to offer some high-level system perspectives in which to frame the conversation. Clearly, one paper cannot cover every scenario. Rather, the idea is to provide a framework through examples so that developers themselves can identify what may be impacting their specific query and how they may best address the performance concerns.

KNOW YOUR STUFF (ARCHITECTURE AND SQL PASS-THROUGH OVERVIEW)

There are many variations in system architecture, and each can impact query performance, especially when specific resources are strained. This section provides a very generalized overview of system architecture to establish a framework to discuss how query style impacts the overall process. Use these as rules of thumb; actual results may vary by system.

Figure 1 provides a simplistic view of a typical system architectural environment for SAS client/server that works in conjunction with remote database and remote file server components. This diagram depicts four primary hardware components:

- A SAS application server (where SAS code executes).
- RDBMS servers where SAS would access data via SAS/ACCESS.
- A file server where SAS may access network shared drives.

- An external drive array that provides the bulk of disc storage for all servers.

Figure 1. Typical SAS Architecture Environment

In this environment, servers communicate with their attached drives via drive channel connections. However, servers communicate with each other via network channel connections. In general, the primary resources to consider include:

- Available physical memory (RAM).
- Virtual memory utilization and the associated page swap activity.
- CPU load.
- Drive I/O performance.
- Network (Ethernet) performance.

It is important to distinguish between physical and virtual memory. Virtual memory utilizes drive space to buffer memory when physical memory is inadequate for system needs. Excessive page swaps on virtual memory can be an indicator that physical memory limits are having a notable impact on performance.

The key element to recognize in this is the flow of data and information through the network. The fastest data activity occurs in physical memory. If the system hardware has adequate memory, there are simple steps that can be leveraged to utilize in-memory processing. Drive channels are slower than memory, but they are relied upon heavily within many environments. Network channels are typically much slower than drive channels. I have worked in environments where drive channels provided 50 to 100 times faster transfer rates than network connections. The goal in such an environment clearly becomes one of reducing the amount of data moved over network connections. Remember, network connections include access to both the RDBMS systems and the remote file servers.

The system used for query processing is determined through SQL pass-through. In SQL pass-through, PROC SQL will send the query, or subsets of it, to the source RDBMS for execution. There are two types of pass-through to consider: EXPLICIT and IMPLICIT.
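As a sketch of what explicit pass-through looks like (the DSN name, table, and field names here are hypothetical placeholders, not from the paper), the developer opens a connection and hands native SQL directly to the database:

```sas
proc sql;
   /* hypothetical ODBC data source name */
   connect to odbc (datasrc="dwh_dsn");

   /* everything inside the inner parentheses is native RDBMS SQL,
      executed entirely on the database server before results return */
   create table sd1.target as
   select * from connection to odbc
      ( select key1, date1, field1
        from large_table1
        where date1 >= '2016-01-01' );

   disconnect from odbc;
quit;
```

Implicit pass-through, by contrast, requires no connection logic; SAS decides on its own what portion of the PROC SQL query to send to the database.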
Explicit pass-through occurs when the developer intentionally adds connection logic and includes native code for the source data system, while implicit pass-through is done by SAS inside PROC SQL. In implicit pass-through, SAS generates code based on all or a portion of the SQL query and sends that code to the source system to process. In most cases, this results in faster query execution, since it leverages the processing capacities of the source system and normally reduces the data traffic over the network connection. For a more detailed discussion of pass-through logic, refer to the paper by Johnson in last year's proceedings.

As a final note before wrapping up this section, most client/server system resources are actively monitored and would rarely be the cause of consistent performance degradation. However, if performance issues arise sporadically, these resources may be reaching limits and restricting the query. Performance issues may arise from capacity limits on any resource on any hardware system involved. When investigating such potential issues, it is important to consider where the query processing occurs and the overall process load from all active SAS and non-SAS jobs. Developers facing resource-based performance issues should work closely with their SAS platform administrators and internal hardware support teams to identify and resolve the causes.

"KILLER" SQL QUERIES (AS IN THEY KILL PERFORMANCE)

With this framework in mind, it is time to turn attention to common query coding practices that can be performance killers in an RDBMS environment. These have been gleaned from reviewing thousands of process jobs and categorized into a few sample cases. Before moving to examples, here are some coding conventions used below:

- A libref DBx refers to a library on a remote database using SAS/ACCESS via ODBC or similar.
- A libref FSx refers to a library on a remote file server connection.
- A libref SDx refers to a library on a drive mounted to the SAS application server.
- Table names are generic: LARGE_TABLEn, SMALL_TABLEn, etc.
- The table name TARGET is used for output and is assumed to be a SAS data set.
- Field names are generic: FIELDn for general fields, DATEn for a date field, DTn for a datetime field, KEYn for table keys.

USING SELECT *

This comes up early on many people's lists, but why, especially when you often need most of the fields anyway?
Consider the simple example:

   proc sql;
      create table sd1.target as
      select *
      from db1.large_table
      where date >= '01JAN2016'd;
   quit;

The query selects all fields from the source table and all records beginning this year, and writes them to a table named TARGET on the SAS server. The concern over this approach primarily arises when the fields you do not need are large character fields, perhaps names, addresses, descriptions, notes, or even larger free-text fields.

A related practice is to combine multiple extracts from the same large table into a single DATA step pass, writing each subset to its own target:

   data sd1.target1 sd1.target2;
      set db1.large_table;
      if date >= '01jan2016'd and field1 in ('A','B','C')  /* first condition partially reconstructed */
         then output sd1.target1;
      if date >= '01jul2016'd and field1 in ('B','C','D')
         then output sd1.target2;
   run;

Even in cases where queries are substantially different, code may benefit from completing all data extracts from a given large table before proceeding to another. Most RDBMSs have substantial memory and buffering space. When a table query completes, some or all of that table may be accessible from buffers. Executing a second query on that table immediately following the first may allow the RDBMS to utilize data in memory rather than forcing a second pull of data from spinning discs.
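The buffering point above can be sketched as two back-to-back PROC SQL pulls from the same table (table and field names are the paper's generic placeholders; whether the second pull actually hits the database's buffer cache depends on the RDBMS and its memory configuration):

```sas
/* run both extracts from LARGE_TABLE back to back, so the second
   query may find some of the table still in the RDBMS buffers */
proc sql;
   create table sd1.target1 as
   select key1, date, field1
   from db1.large_table
   where date >= '01JAN2016'd;

   create table sd1.target2 as
   select key1, date, field2
   from db1.large_table
   where date >= '01JUL2016'd;
quit;

/* only then move on to queries against other tables */
```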

INCLUDING DATA STEP OPTIONS WITH SOURCE TABLES

Most SAS DATA step users are familiar with data set options such as DROP=, KEEP=, WHERE=, RENAME=, and OBS= that can follow the data set name, enclosed in parentheses. They are valid in most cases where a data set is referenced, including in PROC SQL. However, when these are associated with SQL source tables that are being retrieved from a remote database, the implicit pass-through logic is bypassed. Consider the query:

   proc sql;
      create table sd1.target as
      select *
      from db1.large_table (drop=notes) a
      where a.date >= '01JAN2016'd;
   quit;

Here, the user is trying to drop the NOTES field rather than entering the desired fields in place of *. Without implicit pass-through, the entire table is transferred to SAS before the NOTES field is dropped and before the where clause executes. This can in fact make the query slower rather than faster.

USING NON-INDEXED FIELDS FOR WHERE CLAUSES AND JOINS

SQL databases are optimized to leverage indexes within queries. However, due to maintenance requirements, it is not practical to index every field a user may wish to include in joins or where clauses. Therefore, the developer should consider the indexes that are available when writing the query. As an example, suppose a table has two date fields: DATE, which records the day when the transaction occurred, and DT, which includes the date and time when the transaction occurred. Only the DATE field is indexed. The analyst is interested in transactions that occur between 8:00 AM and 11:00 AM on August 1, 2016. The query could look something like:

   proc sql;
      create table sd1.target as
      select b.dt
      from db1.large_table b
      where b.dt between '01AUG2016:08:00:00'dt and '01AUG2016:11:00:00'dt;
   quit;

To execute this query, the database must read every transaction to check the where clause criteria, since there is no index.
A simple addition of the date field to the where clause allows the database to leverage the index on DATE:

   proc sql;
      create table sd1.target as
      select b.dt
      from db1.large_table b
      where b.date = '01AUG2016'd
        and b.dt between '01AUG2016:08:00:00'dt and '01AUG2016:11:00:00'dt;
   quit;

A similar scenario can often arise within data warehouse systems. The warehouse has business keys, derived from the original source data, and warehouse keys, generated and maintained during the warehouse construction. Analysts need to know whether business keys are indexed. In many cases, warehouse systems will be optimized to join on the warehouse keys and may or may not maintain indexing on business keys.

RESOLVING LOOKUP TABLES ON THE REMOTE DATABASE

Implicit pass-through is not always a good thing. Consider the case of a lookup table. A lookup table typically has few rows but may have one or more lengthy character fields. Perhaps the table contains a 50-byte character description associated with 10 distinct code values. One can download the lookup table and transfer the description a total of 10 times. If instead the lookup table is resolved on the remote side, the description is transferred one or more times for each row returned. This could result in downloading millions of copies. Consider the query:

   proc sql;
      create table sd1.target as
      select a.key, a.key2, a.date, b.description
      from db1.large_table a,
           db1.lookup_table b
      where a.key2 = b.key2;
   quit;

In this simple query, the easiest adjustment may be to turn off the SQL pass-through option and transfer processing to the SAS application server. That can be done by adding the option NOIPASSTHRU to the PROC SQL statement.

Although NOIPASSTHRU may be appropriate in the simple query above, we lose access to all the potential benefits of implicit pass-through. More advanced queries may benefit from an approach more in line with the approach used for mixed librefs. An improved solution to the original code could be obtained using the logic:

   proc sql;
      create view t1_vw as
      select a.key, a.key2, a.date
      from db1.large_table a
      order by a.key;

      create view t2_vw as
      select b.key2, b.description
      from db1.lookup_table b
      order by b.key2;

      create table sd1.target as
      select a.key, a.key2, a.date, b.description
      from t1_vw a,
           t2_vw b
      where a.key2 = b.key2
      order by a.key;
   quit;

This reduces the data transferred in DESCRIPTION but has now introduced sorting into the process. Left without sorting, the resulting table will typically be sorted in KEY2 order, which will likely require some effort to manage in subsequent processing. This sort can potentially offset the value of the data reduction, generating no added benefit.

A potentially better solution is to leverage custom formats, where the user converts the lookup table to one or more custom formats and applies the format to the table rather than imposing a join. This is very flexible and works well for simple lookup tables.
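A minimal sketch of that format-based approach follows (table and field names are the paper's generic placeholders; the format name $DESC is hypothetical, while FMTNAME, START, and LABEL are the standard PROC FORMAT CNTLIN= variable names):

```sas
data fmt;                               /* build format control data from the lookup table */
   set db1.lookup_table (rename=(key2=start description=label));
   retain fmtname '$desc';              /* $ prefix assumes KEY2 is character */
run;

proc format cntlin=fmt;                 /* create the $DESC. format from the control data */
run;

proc sql;                               /* resolve the lookup with a format, no join needed */
   create table sd1.target as
   select a.key, a.key2, a.date,
          put(a.key2, $desc.) as description
   from db1.large_table a;
quit;
```

Only the small lookup table crosses the network once, while the description is attached locally on the SAS server.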
Also, hash tables and SET statements with the KEY= option can be used effectively within a DATA step, and they have the ability to handle larger lookup tables. The specific code examples to complete these alternatives are outside the scope of this document, but for those familiar with the techniques, they are fairly simple extensions.

KNOW WHEN TO SAY WHEN

These techniques can dramatically reduce job execution time. I have observed job execution times drop from 10-15 hours to under an hour. That offers clear value in most business scenarios. The job not only runs quicker, but there is often a corresponding drop in the use of limiting resources such as drive channel traffic,
network traffic, CPU and memory utilization on servers, table lock times on databases, and job scheduler slots. The benefits can be especially valuable when the resources released are reaching maximum capacity. For instance, if your SAS application server is consistently running near maximum CPU, code changes that reduce CPU load, such as implicit or explicit pass-through, can be a huge benefit. If the network or drive channels are nearing maximum capacity, processes that reduce data transfers on those channels can be equally valuable.

However, not all efforts translate to clear benefit. While reducing a job from 5 hours to 1 hour may be a win, getting a job that runs in 5 seconds to run in 1 may not be greeted with the same enthusiasm. The value may be situational. While reducing run time from 5 minutes to 1 minute for a daily batch job may not offer much value, the same gain on a query executed multiple times per day on demand may be a major win, especially if managers trigger the query and are waiting for results to be returned. For maximum benefit, focus effort on jobs that:

- Take longer to run,
- Run more frequently,
- Are more often used in interactive sessions,
- Tax resources that are heavily strained,
- Run during peak load periods.

And then, know when to say when. There is always more that can be done to improve code efficiency, but not all changes justify the effort to find, alter, and test them.

CONCLUSION

The coding examples discussed here highlight common constructs that can significantly hamper query and even system performance. The goal is to provide coders with specific examples to help them identify when their queries may be needlessly inefficient, without getting into complex, system-specific topics or the level of performance tuning that can be accomplished through more detailed platform administration. This should give users some tools to use, but these must be tested in your environment and with your data. Every configuration and every case is unique.
Take these examples, try them in your environment, and implement what is applicable to your situation.

REFERENCES

Johnson, Misty. 2015. "Just Passing Through, or Are You? Determine when SQL Pass-Through Occurs to Optimize Your Queries." MWSUG 2015 Conference Proceedings, Omaha, NE. Available at 5-BB-03.pdf

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

John Schmitz
Sr. Manager, Luminare Data LLC
john.schmitz@luminaredata.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
