Proc SQL – A Primer For SAS Programmers

2y ago
14 Views
2 Downloads
315.99 KB
10 Pages
Last View : 2m ago
Last Download : 3m ago
Upload by : Sabrina Baez
Transcription

Proc SQL – A Primer for SAS ProgrammersJimmy DeFoorCiti CardIrving, Texasor her SAS code; 2) understanding Proc SQL canaid the programmer in understanding other DB2,T-SQL, PL-SQL and other SQL code.The Structured Query Language (SQL) has a verydifferent syntax and, often, a very different methodof creating the desired results than the SAS DataStep and the SAS procedures. Only a verythorough manual, such as the SAS Guide to theProc SQL Procedure, could even begin todescribe well the complete syntax and the fullcapabilities of Proc SQL. Still, there is value inpresenting some of the simpler capabilities of ProcSQL, especially those that are more efficient oreasier to code than the SAS Data Step. Thereasons: 1) the tables created by Proc SQL can beread by the SAS Data Step or SAS procedures, sothe SAS programmer can choose to use onlysome SQL code without impacting the rest of hisThis paper will assume that the reader is acapable SAS programmer, but is fairly uninformedabout Proc SQL. It will use familiar codingtechniques in the SAS Data Step and SASprocedures to teach the syntax and function ofProc SQL as it shows some of the uniquecapabilities of the Proc SQL.The first example involves creating a simple outputlisting in Proc SQL vs Proc Print.Creating an Output Listing with Proc SQL vs a Data StepFilename out ‘C’:\temp.txt’ new;/* assign output to a file */Proc Printo print out;Run;/* print contents of variables */Proc SQL;Select montonic() as obs,a.State, a.City as Town,a.Store, a.Year,a.Month, a.Sales,a.VarCost, a.FixedCostfrom Datamart.Measures a;quit;Filename out ‘C’:\temp.txt’ new;/* assign output to a file */Proc Printo print out;Run;/* print contents of variables */Proc Print data Datamart.Measures;Label City ‘Town’;Var State City StoreYear Month SalesVarCost FixedCost;Run;Compare the syntax of the Proc Print to the syntaxof Proc SQL. Commas separate the variableslisted in the Select statement of Proc SQL. TheQuit statement is the terminator of Proc SQL, notthe Run statement. The semi-colon does not endevery instruction, as it does in usual SAS code;instead, it ends only the SQL Select statement,which is really the only statement in SQLSince Proc SQL is a procedure, it requires a ProcPrintto to be used before it is executed if the listingoutput is to be directed to a text file instead of thedefault list file SAS uses. Proc Print has the samerequirementProc Print generates observation numbersautomatically; Proc SQL doesn’t. Only by addingthe little documented function of monotonic canobservation numbers be generated. Notice the‘as’ reference that follows the function. An ‘as’reference uses the name that follows to name theresult of the function. It can also be used torename variables: here, it is used to rename Cityto Town. A Label statement in the Proc Printexample accomplishes the same result.When SAS encounters the semi-colon that endsthe Select statement, it passes all of the precedingSQL code to the SQL compiler. The compilerseparates the Select statement into individual SQLclauses as it encounters such key words as From,Where, and On. It then evaluates the clauses andpasses them to the execution module if they are84

to be manipulated by Proc SQL. This will beshown in a later example when a data set iscreated by joining two data sets.syntactically correct. Thereafter, SQL processesall of the clauses and waits for another Selectstatement unless it encounters a Quit statement oranother SAS step. Thus, Proc SQL can createmultiple tables and/or listings until it is closed.Our next example shows the simplest way ofcreating a SAS data set: reading into it all thevariables and records of another SAS data set.Notice that the Select * statement is used tospecify that all variables are to be retrieved fromthe Measures data set. The SAS Data Stepretrieves all variables by default. Only thepresence of a Keep or Drop statement as a DataSet option prevents all variables in the input dataset from being read into the SAS Data Step. SQLeither retrieves all variables or only the specifiedvariables. It does not explicitly drop variables.The variables listed in Proc SQL can be definedrelative to the data set in which they occur. Theprogrammer can create an alias by following thetable name with a shorter name (‘a’, in theexamples below). He or She can then attach thealias to the front of the variable name with aperiod.This ability to reference variables by their datasets allows multiple variables with the same nameCreating a Table with Proc SQL compared to Creating It with a Data Step, Example 1Proc SQL;Create Table Work1 asSelect *from Datamart.Measures a;quit;Data Work1;Set Datamart.Measures;Output;Run;The syntax of the SAS Data Step states that itcreates a data set, while the syntax of Proc SQLstates that it creates a table, but SAS treats themas equals. The SAS Data Step can read a ProcSQL table and Proc SQL can read a SAS DataSet. Both entities are called tables by SAS whenviewed in its explorer window (when details istuned on).Format statements set order in a Data Step andthey must be placed before the Set statement ifthey are going to determine the order of allvariables in the data set. The problem is,however, that they require some knowledge of thevariables’ content and existing formats if they areto be used effectively. The SQL Select statementdoes not require such knowledge.The example that follows has the Select statementretrieving particular variables from the Measuresdata set and writing them to the table Sales. Asdiscussed earlier, the Data Step must accomplishthe same result with a keep action - in this case,the Keep data set option. However, the Selectstatement has another result that cannot be easilyduplicated in the SAS data step. It orders thevariables as it retrieves them. Keep, in any form,does not specify order. Only Length statements orOn the other hand, though the Data Step cannotmatch the ease of variable ordering that can bedone in Proc SQL, it can perform multiple recordoutputs to the same data set or record outputs tomultiple data sets with great ease. For example,the data statement below could easily be modifiedto write part of the record to Sales and part to adata set named CityList.Data Sales(keep Sales) CityList(keep City);Creating a Table with Proc SQL compared to Creating It with a Data Step, Example 2Data Sales;Proc SQL;Length State 12 City 20Create Table Sales asStore 06 Year 04SelectMonth 02 Sales 8;a.State, a.City,Set Datamart.Measuresa.Store, a.Year,(keep State City Storea.Month, a.SalesYear Month Sales);fromOutput;work.Datamart.Measures a;Run;quit;85

The Match-Merge Syntax of Proc SQL Compared to the Data StepProc SQL;Create Table Joined asSelecta.City, b.State,b.Store, b.Year,b.Month, b.Sales,b.VarCost, b.FixedCostfromwork.ParticularCities aleft outer joinDatamart.Measures bOn a.City b.City;quit;Data Joined;Merge Particular.Cities(in c keep City)Datamart.Measures(in mkeep State Store YearMonth SalesVarCost FixedCost)By city;If c thenoutput;run;These SAS code match-merges a list of selectedcities with the Measures data set and keeps onlythe cities in Measures that match the cities inParticularCities. The SQL code does the samething, but the action is called an equijoin instead ofa match-merge. This purpose of both sets of codeis to create a data set that has sales and costvalues for only particular cities. The same resultcould have been achieved in each example byusing a Where statement that named each city, butthat would have required more coding and,potentially, more errors.in order that the user can know if sales and costvalues were not found for a particular city.The SQL code does this because of the left outerjoin clause. The Data Step does it because the Ifstatement will output a record only if city in thatrecord was also on Particular.Cities. It use thespecial variable, C, that has a value of 1 only whenthe value of City comes from that data set.Below are examples of the coding needed toobtain particular results from a match-merge or anequijoin. The desired result is shown under theheading of function. Notice that the Match-Mergecode references the special variable of M whenrecords from Measures are being output.Notice, however, that the SQL code and Data Stepdo more than just keep the records in Measuresthat match the cities from ParticularCities. Theyalso keep the cities that aren’t found in MeasuresComparison of SQL Equijoins to Data Step Match-MergesFunctionKeep all values of themerging variable from thefirst file and matchingrecords from the second.Keep all values of themerging variable fromthe second file andmatching records from thefirst.Keep only the values ofthe merging variablethat are on both files.Keep all values of themerging variable fromboth files regardless ofwhether they match.EquijoinLeft outer joinMatch-MergeIf C then output;Right outer joinIf M then output;Inner joinIf C and M then output;Full JoinIf C or M then output;86

Indexes can speed selection of particular recordsby a factor of 10 to 20, but they require additionalprocessing time to create and additional storagespace to store. In addition, each record retrievedvia the index takes more processing time thanretrieving the same record directly from the dataset. Thus, indexes improve the overall speed ofaccessing data only when they can be used toavoid reading the majority of the records in thedata set. Creating and using an index is,therefore, only warranted if normal process is topull only subsets of the data set and not the fulldata set.Based upon the coding above, Proc SQL and theSAS Data Step join records on a matchingvariable with almost equal ease of coding. Theyalso join such records with almost equal efficiency,as long as the right outer data set is not indexed.Index the right data set, however, and Proc SQLexecutes an inner join with far more speed thandoes the SAS Data Step. The reason: the matchmerge of the SAS Data Step will not use the indexto retrieve the records on the right (second) dataset while Proc SQL will.Creating an Index is SAS is easy to code. Thereare several methods. Below is the Proc Datasetsapproach.Because a data set has been indexed, however,the programmer doesn’t have to use Proc SQL inorder to use the index. Besides coding Wherestatements with the values desired of the indexedvariable, the programmer can use the Setstatement with the Key option to pull records thatmatch a list of values. The code below pulls citiesfrom the Datamart Measures data set that matchthe values in Particular.Cities.Proc Datasets library Datamart;Modify Measures;Index Create City;Quit;No matter the method of creation, the index iscreated in the same way. A data set is built thathas the locations of each record matching aparticular value of the indexed variable. That dataset is accessed first whenever the indexedvariable is used to retrieve particular values of theindexed variable. Where statements that selectparticular values of the indexed variable will likelyuse the index, as will sorts on the indexedvariable, as will SQL joins. In most cases, the SASprogrammer needs do nothing special to causeSAS to use the index. If the data set is indexed onthat variable, SAS will evaluate whether the indexwill save processing time and use it if it will. Tomake certain, however, that SAS even considersusing the index in Proc SQL, however, the indexeddata set must be placed last in the join.Data Subset;Set Work.ParticularCities;Set Datamart.Measures key City;If iorc 0 thenFound ‘Yes’;ElseDo;error 0;Found ‘No’;End;Output;Run;But this approach requires special handling usingthe automatic IORC variable when the valuepassed to the index is not found. Proc SQL makesno such demand on the programmer.Select *fromwork.ParticularCities ainner joinDatamart.Measures bOn a.City b.City;Finally, it should be no

The Match-Merge Syntax of Proc SQL Compared to the Data Step Proc SQL; Create Table Joined as Select a.City, b.State, b.Store, b.Year, b.Month, b.Sales, b.VarCost, b.FixedCost from work.ParticularCities a left outer join Datamart.Measures b On a.City b.City; quit; Data Joined; Merge Particular.Cities (in c keep City) Datamart.MeasuresFile Size: 315KBPage Count: 10

Related Documents:

2. proc sql statement 1 ; 3. proc sql statement 2 ; 4. quit; /* required */ Lines 2, 3: all of the examples in the e-Guide are expected to be embedded between lines 1 and 4. SAS accepts one or more PROC SQL statements within each PROC SQL block of code. Eight common benefits for using PROC SQL:

USING PROC SQL As stated above, PROC SQL is SAS's implementation of the Structured Query Language. This procedure takes on the following general form: PROC SQL; sql-statement; QUIT; The sql-statement referenced above can be any SQL clause (e.g., ALTER, CREATE, DELETE, DESCRIBE, DROP, INSERT, SELECT, UPDATE, or VALIDATE).

proc gplot, proc sgplot, proc sgscatter, proc sgpanel, . In SAS/Graph: proc gcontour, proc gchart, proc g3d, proc gmap, Stat 342 Notes. Week 12 Page 26 / 58. KDE stands for Kernel Density Estimation. It's used to make a smooth estimation of the probability density of a distribution from the points in a data set.

Agenda PROC SQL VS. DATA STEP PROCESSING Comparison of DATA Step and PROC SQL capabilities Joining SAS data using the DATA Step and PROC SQL Data Step and SQL Output Additional Comparisons Additional Resources SQL Data Step VS

Latin Primer 1: Teacher's Edition Latin Primer 1: Flashcard Set Latin Primer 1: Audio Guide CD Latin Primer: Book 2, Martha Wilson (coming soon) Latin Primer 2: Student Edition Latin Primer 2: Teacher's Edition Latin Primer 2: Flashcard Set Latin Primer 2: Audio Guide CD Latin Primer: Book 3, Martha Wilson (coming soon) Latin Primer 3 .

Bruksanvisning för bilstereo . Bruksanvisning for bilstereo . Instrukcja obsługi samochodowego odtwarzacza stereo . Operating Instructions for Car Stereo . 610-104 . SV . Bruksanvisning i original

SQL Server supports ANSI SQL, which is the standard SQL (Structured Query Language) language. However, SQL Server comes with its own implementation of the SQL language, T-SQL (Transact- SQL). T-SQL is a Microsoft propriety Language known as Transact-SQL. It provides further capab

MS SQL Server: MS SQL Server 2017, MS SQL Server 2016, MS SQL Server 2014, MS SQL Server 2012, MS SQL Server 2008 R2, 2008, 2008 (64 bit), 2008 Express, MS SQL Server 2005, 2005 (64 bit), 2005 Express, MS SQL Server 2000, 2000 (64 bit), 7.0 and mixed formats. To install the software, follow the steps: 1. Double-click Stellar Repair for MS SQL.exe.