Exploring HASH Tables Vs. SORT/Data Step Vs. PROC SQL

PharmaSUG 2016 - Paper TT11
Richann Watson, Experis, Batavia, Ohio
Lynn Mullins, PPD, Cincinnati, Ohio

ABSTRACT
Programmers often need to merge multiple SAS data sets to combine data into one single source data set. As with many other processes, SAS software offers various techniques to accomplish this. This paper explores which method is most efficient under varying assumptions. We describe the differences, advantages, and disadvantages of HASH tables, the SORT procedure with a DATA step merge, and the SQL procedure, and we display benchmarks for each.

INTRODUCTION
Programmers commonly merge data sets to combine them based on key fields. There are a number of possible solutions for merging SAS data sets, including the PROC SORT/DATA step merge, the PROC SQL join, and HASH table lookups. Factors that determine which method to use include the size of the data sets, resource availability, and the programmer's experience with the different techniques. This paper discusses these three methods in detail. It begins with the syntax for HASH table lookups, including the options associated with this method. We then describe the differences between the three methods in complexity, memory type, data set size, and other attributes. We also discuss benchmarks for three data set sizes: small, medium, and large. Lastly, we describe the ideal situations for using each method.

INTRODUCTION TO HASH TABLES
A hash table, also referred to as a hash object, is an in-memory lookup table that can only be accessed from within the DATA step that creates it. Once the DATA step ends, the hash table is deleted. A hash table provides an efficient way to search the data because a lookup locates a key directly in memory, without the lookup data set having to be sorted first.

The hash object has two parts. The first part is the key, which can consist of a single variable or multiple variables that will be used to perform a lookup.
The key part can consist of character and/or numeric values. The second part of a hash object is the data part, which holds the data value(s) associated with the key. The data part can also consist of character and/or numeric values.

SYNTAX AND SOME METHODS OF HASH TABLES
The hash table is defined in a DATA step and is only available during that DATA step. The syntax of a hash object can be difficult, and it can take some time to get used to. Once the hash table is defined, it can be used to add, find, replace, check, remove, and output data. Below is generic code that shows how a hash table is defined.

    data lib.outdsn;
      /* define attributes for variables that will be retrieved, i.e., data part */
      /* (e.g., with a LENGTH statement or IF 0 THEN SET lib.indsn;)             */
      if _n_ = 1 then do;
        /* declare name for hash table with ascending sort order */
        declare hash hashobj(dataset: "lib.indsn", ordered: "a");
        /* define variables that will be used as a key for lookup (key part) */
        hashobj.definekey('keyvar1', 'keyvar2', 'keyvar3');
        /* define variables that will be retrieved (data part) */
        hashobj.definedata('datavar1', 'datavar2');
        /* end definition of hash table */
        hashobj.definedone();
      end;

      /* specify the main table(s) that are going to use the lookup table */
      set inlibnm.indsn;

      /* one or more hash methods can be used to add, find, replace, check, etc.; */
      /* each method returns 0 when it succeeds                                   */
      rc = hashobj.check();
      if hashobj.find() = 0 then output;
      drop rc;
    run;

The table below lists some of the methods that can be used with hash tables, with a description and the syntax of each.

    Method       Description and syntax
    Add          Adds the data associated with the key to the hash table.
                 hashobj.add();
                 hashobj.add(key: keyvar1, ..., key: keyvarN, data: datavar1, ..., data: datavarN);
    Check        Checks whether the key is stored in the hash table.
                 hashobj.check();
                 hashobj.check(key: keyvar1, ..., key: keyvarN);
    Clear        Removes all entries in the hash table without deleting the hash table.
                 hashobj.clear();
    Definedata   Defines the data that is to be stored in the hash table.
                 hashobj.definedata('datavar1', ..., 'datavarN');
    Definedone   Indicates that the key part and data part of the hash table are complete.
                 hashobj.definedone();
    Definekey    Defines the variables that will be used as the key in the hash table.
                 hashobj.definekey('keyvar1', ..., 'keyvarN');
                 hashobj.definekey(all: 'yes');
    Equals       Determines whether two hash tables are equal and stores the result in the
                 indicated DATA step variable.
                 hashobj.equals(hash: 'hashobj1', result: resvar);
    Find         Determines whether the key is stored in the hash table and, if so, copies its
                 data part into the DATA step variables.
                 hashobj.find();
                 hashobj.find(key: keyvar1, ..., key: keyvarN);
    Output       Creates a data set that contains the data from the hash table.
                 hashobj.output(dataset: 'lib.outdsn');
    Ref          Performs a find on the current key and, if the key is not found, adds it to the
                 hash table.
                 hashobj.ref();
                 hashobj.ref(key: keyvar1, ..., key: keyvarN);
    Remove       Removes the data associated with the key.
                 hashobj.remove();
                 hashobj.remove(key: keyvar1, ..., key: keyvarN);
    Replace      Replaces the data associated with the key with new data.
                 hashobj.replace();
                 hashobj.replace(key: keyvar1, ..., key: keyvarN, data: datavar1, ..., data: datavarN);
    Sum          Gets the key summary for the indicated key and stores it in the indicated DATA
                 step variable (key summaries are kept only when the SUMINC: argument is given
                 on the DECLARE statement).
                 hashobj.sum(sum: sumvar);
                 hashobj.sum(key: keyvar1, ..., key: keyvarN, sum: sumvar);
    Table 1. Hash table methods
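Several of the methods in Table 1 can be combined in one DATA step. The example below is a minimal sketch that builds a small hash table in memory with ADD, retrieves an entry with FIND, tests for an absent key with CHECK, uses REF to add a key that is not yet present, and writes the table out with OUTPUT. The variables KEY and VAL and the data set WORK.HASH_DEMO are hypothetical names chosen only for illustration.

```sas
data _null_;
  /* hypothetical key and data variables; LENGTH gives them attributes */
  length key $8 val 8;
  call missing(key, val);

  /* no DATASET: argument, so the hash table starts out empty */
  declare hash h(ordered: "a");
  h.definekey('key');
  h.definedata('key', 'val');
  h.definedone();

  /* ADD stores the current values of KEY and VAL */
  key = 'A'; val = 1; rc = h.add();
  key = 'B'; val = 2; rc = h.add();

  /* FIND copies the data part for key 'B' back into VAL; rc = 0 on success */
  key = 'B'; val = .;
  rc = h.find();
  put 'FIND:  ' key= val= rc=;

  /* CHECK only tests whether a key exists; a non-zero rc means not found */
  rc = h.check(key: 'Z');
  put 'CHECK: ' rc=;

  /* REF looks up key 'C' and, because it is absent, adds it */
  key = 'C'; val = 3;
  rc = h.ref();
  n = h.num_items;
  put 'REF:   ' rc= n=;

  /* OUTPUT writes the data part (KEY and VAL) to a data set in key order */
  rc = h.output(dataset: 'work.hash_demo');
run;
```

The log should show VAL=2 after FIND, a non-zero return code from CHECK, and N=3 after REF. WORK.HASH_DEMO should then hold the three keys A, B, and C.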

ILLUSTRATION OF HASH TABLE
Below is example code that creates a hash table.

    /* 1. not applicable - sorting is not required when using HASH OBJECTS */
    /* 2. combine the result records into one data set by CASEID */
    data _null_;
      /* specify the lookup table */
      if _n_ = 1 then do;
        /* define the attributes for the variables that are added to main data set */
        if 0 then set indsn.femresp1 (drop = ...);
        /*** BEGIN SECTION TO DECLARE HASH OBJECT ***/
        /* declare name for hash table with ascending sort order */
        declare hash fresp(dataset: "indsn.femresp1", ordered: "a");
        /* define variables that will be used as a key for lookup (key part) */
        fresp.definekey('CASEID');
        /* define variables that will be retrieved (data part) */
        fresp.definedata(all: 'yes');
        /* end definition of hash table */
        fresp.definedone();
        /*** END SECTION TO DECLARE HASH OBJECT ***/
      end;

      /* specify the main tables that are going to use the lookup table */
      set indsn.femresp end=eof;

      /* if there is a match fresp.find() returns a 0 for success */
      /* otherwise it returns non-zero value for failure          */
      /* at the end of the file output the hash table to data set */
      if eof and fresp.find() = 0 then fresp.output(dataset: 'femresp_hash');
    run;

DIFFERENCES BETWEEN THE THREE METHODS
The following table highlights the differences between the three merging methods that we benchmarked (DATA step merge, SQL procedure, and HASH table).

    Attribute                   Standard DATA Step   PROC SQL                      DATA Step HASH
    Syntax complexity           Straightforward      Straightforward to Moderate   Very Confusing
    Memory or disk-based        Disk                 Disk                          Memory
    Ideal size of data sets     Any (1)              Small to Moderate (2)         Large to Very Large
    Memory allocation           Upfront              Upfront                       Only when needed
    Sorting/indexing required   Yes                  No                            No
    Additional calculations     Yes                  Maybe                         Yes
    (1) Can be a resource hog for very large data sets and may not be very efficient.
    (2) Can be a resource hog for very large data sets.
    Table 2. Differences between the three methods
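For comparison with the hash illustration above, the same combination by CASEID can be written with the first method in Table 2: the SORT procedure followed by a DATA step merge. This is a minimal sketch and not the authors' benchmark code. It assumes that INDSN.FEMRESP and INDSN.FEMRESP1 share no variables other than CASEID, and that only CASEIDs present in both data sets are wanted. The WORK data set names are hypothetical.

```sas
/* 1. sort both data sets by the key; OUT= leaves the originals untouched */
proc sort data=indsn.femresp out=work.femresp_srt;
  by caseid;
run;

proc sort data=indsn.femresp1 out=work.femresp1_srt;
  by caseid;
run;

/* 2. combine the records into one data set by CASEID */
data work.femresp_merge;
  merge work.femresp_srt  (in=inmain)
        work.femresp1_srt (in=inlkup);
  by caseid;
  /* keep only CASEIDs found in both data sets (inner-join behavior) */
  if inmain and inlkup;
run;
```

The two PROC SORT steps correspond to steps 1.1 and 1.2 in the benchmark tables. The hash version skips them because the hash object does not need sorted input.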

BENCHMARKS OF THE THREE METHODS
The tables below summarize real time, which is the amount of time spent to process the SAS job. Real time is also referred to as elapsed time. Three different data set sizes (number of observations) were used for comparison. The lowest real time is displayed in red.

The following three tables display the real-time results using two data sets with many formatted variables.

    Real Time (seconds)
    Step              Standard DATA Step   PROC SQL   DATA Step HASH
    1.1 (1st Sort)    0.10                 N/A        N/A
    1.2 (2nd Sort)    0.14                 N/A        N/A
    2 (Join)          4.27                 7.18       4.31
    Total             4.51                 7.18       4.31

    Data set          # of Obs   # of Vars   Size (KB)
    Data set 1        100        5,754       5,311
    Data set 2        100        5,754       5,311
    Final             100        11,507      4,096*
    * Compressed using binary option
    Table 3. Real-time statistics of small size data sets with many variables

    Real Time (seconds)
    Step              Standard DATA Step   PROC SQL   DATA Step HASH
    1.1 (1st Sort)    1.72                 N/A        N/A
    1.2 (2nd Sort)    1.66                 N/A        N/A
    2 (Join)          …                    …          …
    Total             …                    …2.27      7.72

    Data set          # of Obs   # of Vars   Size (KB)
    Data set 1        10,847     5,754       488,926
    Data set 2        10,847     5,754       488,926
    Final             10,847     11,507      …
    * Compressed using binary option
    Table 4. Real-time statistics of moderate size data sets with many variables

    Real Time (seconds)
    Step              Standard DATA Step   PROC SQL   DATA Step HASH
    1.1 (1st Sort)    9.31                 N/A        N/A
    1.2 (2nd Sort)    9.40                 N/A        N/A
    2 (Join)          …                    …          …
    Total             39.29                301.31     38.09

    Data set          # of Obs   # of Vars   Size (KB)
    Data set 1        45,103     5,754       2,053,824
    Data set 2        45,103     5,754       2,053,824
    Final             …          …           …
    * Compressed using binary option
    Table 5. Real-time statistics of large size data sets with many variables

The following three tables display the real-time results using three data sets with few variables.

    Real Time (seconds)
    Step                     Standard DATA Step   PROC SQL   DATA Step HASH
    1.1 (1st Sort*)          0.05                 N/A        N/A
    1.2 (2nd Sort†)          1.00                 N/A        N/A
    1.3 (3rd Sort‡)          0.43                 N/A        N/A
    2.1 (1st Pre-join†)      0.24                 0.55       N/A
    2.2.1 (2nd Pre-join‡)    0.04                 N/A        N/A
    2.2.2 (2nd Pre-join‡)    0.02                 0.04       N/A
    3 (Join)                 0.36                 0.36       1.10
    Total                    2.14                 0.95       1.10

    Data set                 # of Obs   # of Vars   Size (KB)
    * Lookup                 86         5           128
    † Lab Results            40,210     19          98,784
    ‡ Cancelled              786        19          2,156
    Final                    40,996     21          20,480*
    * Compressed using binary option
    Table 6. Real-time statistics of small size data sets with few variables

    Real Time (seconds)
    Step                     Standard DATA Step   PROC SQL   DATA Step HASH
    1.1 (1st Sort*)          0.15                 N/A        N/A
    1.2 (2nd Sort†)          14.79                N/A        N/A
    1.3 (3rd Sort‡)          0.42                 N/A        N/A
    2.1 (1st Pre-join†)      2.09                 5.63       N/A
    2.2.1 (2nd Pre-join‡)    0.38                 N/A        N/A
    2.2.2 (2nd Pre-join‡)    0.02                 0.06       N/A
    3 (Join)                 1.96                 2.07       4.12
    Total                    19.81                7.76       4.12

    Data set                 # of Obs   # of Vars   Size (KB)
    * Lookup                 86         5           128
    † Lab Results            337,453    21          844,000
    ‡ Cancelled              786        19          2,156
    Final                    338,239    23          92,160*
    * Compressed using binary option
    Table 7. Real-time statistics of moderate to large size data sets with few variables

    Real Time (seconds)
    Step                     Standard DATA Step   PROC SQL   DATA Step HASH
    1.1 (1st Sort*)          0.08                 N/A        N/A
    1.2 (2nd Sort†)          1,257.10             N/A        N/A
    1.3 (3rd Sort‡)          0.13                 N/A        N/A
    2.1 (1st Pre-join†)      82.49                3,004.86   N/A
    2.2.1 (2nd Pre-join‡)    1.91                 N/A        N/A
    2.2.2 (2nd Pre-join‡)    …                    …          N/A
    3 (Join)                 …                    …          …
    Total                    …                    …112.33    275.36

    Data set                 # of Obs     # of Vars   Size (KB)
    * Lookup                 86           5           128
    † Lab Results            10,620,791   21          26,552,200
    ‡ Cancelled              786          19          …
    Final                    10,621,577   23          2,580,480*
    * Compressed using binary option
    Table 8. Real-time statistics of extremely large size data sets with few variables
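The statistics in these tables (real time here, and CPU time and memory in Appendix C) are the same ones SAS writes to the log for each step when the FULLSTIMER system option is in effect. This is one way to collect such numbers; the paper does not state how its own were gathered. The sketch below also lets you vary the two size dimensions used above. Its macro %MAKEDATA, the parameters OUT=, NOBS=, and NVARS=, and the random numeric values are all hypothetical. They reproduce only the observation and variable counts of the test data, not its actual content.

```sas
/* write real time, user and system CPU time, memory, and OS memory */
/* for every step to the log                                         */
options fullstimer;

/* hypothetical generator: an ID plus NVARS random numeric variables */
%macro makedata(out=, nobs=, nvars=);
  data &out (compress=binary);
    array v{&nvars} v1-v&nvars;
    do id = 1 to &nobs;
      do i = 1 to &nvars;
        v{i} = ranuni(123);
      end;
      output;
    end;
    drop i;
  run;
%mend makedata;

/* for example, 100 observations and 5,754 variables, the counts in Table 3 */
%makedata(out=work.small, nobs=100, nvars=5754);
```

Any of the three merge methods run after this point has its own FULLSTIMER block in the log. You can compare those blocks step by step, as the tables above do.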

WHEN IT IS IDEAL TO USE ONE METHOD OVER THE OTHER
Data set size affects the efficiency of the different methods. If the data sets to be joined are relatively small, a standard sort and DATA step merge is sufficient. The larger the data sets are, the less efficient the DATA step and PROC SQL methods become.

The number of variables should be considered along with the number of records. Below are recommended approaches for choosing the best method based on data set size and number of variables. Again, small data sets can be merged using a standard sort and DATA step merge regardless of the number of variables, but the SQL procedure works just as well. Interestingly, the DATA step with the HASH table method is more efficient with larger data sets that have fewer variables.

    Methods compared: Standard DATA Step, PROC SQL, DATA Step HASH
    Scenarios:
      Small size data sets with many variables
      Moderate size data sets with many variables
      Large size data sets with many variables
      Small size data sets with few variables
      Moderate size data sets with few variables
      Extremely large size data sets with few variables
    Legend:
      The use of the indicated method is not recommended.
      Use caution with the indicated method(s) in this scenario.
      Ideal method(s) for the indicated scenario.
    Table 9. Recommendation based on number of records and number of variables

LIMITATIONS
These benchmarks were run in a Windows 7 environment using PC SAS 9.4. Results may differ in other environments and SAS versions. The SAS system option COMPRESS=BINARY was used to make the programs run more quickly.

SQL uses different algorithms to execute different types of joins. Depending on the circumstances, the SQL optimizer may choose to execute an inner join using a hash, index, or sort-and-merge technique.
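Because the SQL optimizer chooses the join algorithm itself, it can help to check which one it picked. The example below is a sketch built on two small hypothetical tables, WORK.A and WORK.B. It uses the _METHOD option of PROC SQL, which is undocumented but widely described in SAS user papers. That option writes the query plan to the log, where codes such as SQXJHSH (hash join), SQXJM (sort-merge join), and SQXJNDX (index join) identify the technique used. The COMPRESS=BINARY setting mirrors the option named above.

```sas
/* compress new data sets with the binary (RDC) algorithm */
options compress=binary;

/* two hypothetical keyed tables: A has every ID, B has every second ID */
data work.a (keep=id x) work.b (keep=id y);
  do id = 1 to 1000;
    x = id * 10;
    output work.a;
    if mod(id, 2) = 0 then do;
      y = id * 100;
      output work.b;
    end;
  end;
run;

/* _METHOD prints the plan the optimizer chose (e.g., SQXJHSH or SQXJM) */
proc sql _method;
  create table work.ab as
    select a.id, a.x, b.y
      from work.a as a inner join work.b as b
      on a.id = b.id;
quit;
```

On data this small, SAS may note that compression increased the data set size. That note is harmless; compression mainly pays off for wide data sets.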
In our test, we used a left outer join on the small data sets to add two variables from the lookup table, and we kept only the lookup table data that matched. We used an inner join on the large data sets because there was a 1-1 match of the key variable in each data set.

Results from a DATA step merge can vary based on the environment (e.g., compression ON or OFF). Results from PROC SORT can vary based on the options used (e.g., TAGSORT vs. non-TAGSORT, or whether OUT= is used).

CONCLUSION
This was just a small test. Other factors could be considered when benchmarking, but for our purposes we looked only at a basic DATA step, PROC SQL with a left outer join, and the hash object. We tested various data set sizes and numbers of variables to see which process was the most efficient. We then compared the resulting data sets to make sure that all three processes produced the same results. There are several factors to consider when deciding which approach to use. It may sometimes be worthwhile to learn a new method, even if it is a bit cumbersome, if in the end it will save you a lot of processing time.

ACKNOWLEDGMENTS
Thanks to Jamie Mabry, Lindsay Dean, Ken Borowiak, David Gray, Richard D’Amato, Lynn Clipstone, and PPD and Experis Management for their reviews and comments. Thanks to our families for their support.

DISCLAIMERS
The contents of this paper are the work of the authors and do not necessarily represent the opinions, recommendations, or practices of PPD or Experis.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are registered trademarks or trademarks of their respective companies.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the authors at:

Richann Watson
Experis
(513) 843-4081
Richann.watson@experis.com

Lynn Mullins
PPD
(910) 558-4343
Lynn.mullins@ppdi.com

APPENDIX A – ADDITIONAL HASH TABLE EXAMPLE

    /* 1. Add in the LBCAT and LBTESTCD from the lookup table */
    data labhash;
      /* specify the lookup table(s) */
      if _n_ = 1 then do;
        /* this will define the attributes for the variables */
        /* to be retrieved in the hash object                */
        /* (i.e., defines attributes for LBCAT and LBTESTCD) */
        if 0 then set indsn.lookup (keep = LBCAT LBTESTCD);

        /*** BEGIN SECTION TO DECLARE HASH OBJECT FOR LABS WITH RESULTS ***/
        /* declare name for hash table with ascending sort order */
        declare hash lrslt(dataset: "indsn.lookup", ordered: "a");
        /* define variables that will be used as a key for lookup (key part) */
        lrslt.definekey('PANEL', 'TEST', 'UNIT');
        /* define variables that will be retrieved (data part) */
        lrslt.definedata('LBCAT', 'LBTESTCD');
        /* end definition of hash table */
        lrslt.definedone();
        /*** END SECTION TO DECLARE HASH OBJECT FOR LABS WITH RESULTS ***/

        /*** BEGIN SECTION TO DECLARE HASH OBJECT FOR CANCELLED LABS ***/
        /* declare name for hash table with ascending sort order */
        /* want to keep only tests that don't have 'LE' in name  */
        /* this is due to heme diffs not being calculated b/c    */
        /* there are no results to determine the diffs           */
        declare hash lcncl(dataset: "indsn.lookup (where = (not (LBTESTCD ? 'LE')))",
                           ordered: "a");
        /* define variables that will be used as a key for lookup */
        lcncl.definekey('PANEL', 'TEST');
        /* define variables that will be retrieved */
        lcncl.definedata('LBCAT', 'LBTESTCD');
        /* end definition of hash table */
        lcncl.definedone();
        /*** END SECTION TO DECLARE HASH OBJECT FOR CANCELLED LABS ***/
      end;

      /* specify the main tables that are going to use the lookup table */
      set indsn.labrslt (in = rslt)
          indsn.labcncl (in = cncl);

      /* if there is a match lxxxx.find() returns a 0 for success */
      /* otherwise it returns non-zero value for failure          */
      if rslt and lrslt.find() = 0 then output;
      if cncl and lcncl.find() = 0 then output;
    run;
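For comparison, the lab-results half of this example could also be written as a PROC SQL join. This is a sketch only, and it rests on three assumptions:

- INDSN.LOOKUP holds at most one row per PANEL/TEST/UNIT combination. Otherwise the join would duplicate lab records, whereas a hash object loaded through DATASET: keeps only the first row for each key by default.
- INDSN.LABRSLT does not already contain LBCAT or LBTESTCD.
- The output table name WORK.LABSQL is a hypothetical name.

An inner join is used because the hash code outputs only records for which FIND() returns 0.

```sas
proc sql;
  /* add LBCAT and LBTESTCD to each lab result with a matching lookup key */
  create table work.labsql as
    select r.*, l.lbcat, l.lbtestcd
      from indsn.labrslt as r
           inner join indsn.lookup as l
           on  r.panel = l.panel
           and r.test  = l.test
           and r.unit  = l.unit;
quit;
```

Unlike the hash version, PROC SQL does not require the key variables to be defined in a separate declaration step. However, the optimizer decides internally whether to sort, index, or hash the inputs.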

APPENDIX B – BENCHMARK METHODS
The statistics compared during the benchmarking, with a description of each, are listed in the table below.

    Statistic          Description
    Real Time          The amount of time spent to process the SAS job. Real time is also
                       referred to as elapsed time.
    User CPU Time      The CPU time spent to execute SAS code.
    System CPU Time    The CPU time spent to perform operating system tasks (system overhead
                       tasks) that support the execution of SAS code.
    Memory             The amount of memory required to run a step.
    OS Memory          The maximum amount of memory that a step requested from the system.
    Table 11. Benchmarks of the three methods

APPENDIX C – FULL BENCHMARK RESULTS
The full benchmark results using data sets with many formatted variables are displayed in the tables below.

    Step                    Standard DATA Step   PROC SQL       DATA Step HASH
    Real Time (seconds)
    1.1 (1st Sort)          0.10                 N/A            N/A
    1.2 (2nd Sort)          0.14                 N/A            N/A
    2 (Join)                4.27                 7.18           4.31
    Total                   4.51                 7.18           4.31
    User CPU Time (seconds)
    1.1 (1st Sort)          0.01                 N/A            N/A
    1.2 (2nd Sort)          0.01                 N/A            N/A
    2 (Join)                …                    …              …
    Total                   …                    …6.98          1.68
    System CPU Time (seconds)
    1.1 (1st Sort)          0.01                 N/A            N/A
    1.2 (2nd Sort)          0.06                 N/A            N/A
    2 (Join)                …                    …              …
    Total                   2.50                 0.10           2.55
    Memory (k)
    1.1 (1st Sort)          7,681.96             N/A            N/A
    1.2 (2nd Sort)          7,161.06             N/A            N/A
    2 (Join)                …                    …              …
    Total                   49,882.64            60,757.40      33,228.14
    OS Memory (k)
    1.1 (1st Sort)          37,824               N/A            N/A
    1.2 (2nd Sort)          37,824               N/A            N/A
    2 (Join)                …                    …              …
    Total                   …                    …35,096        92,952

    Data set        # of Obs   # of Vars
    Data set 1      100        5,754
    Data set 2      100        5,754
    Table 12. Benchmark statistics of small size data sets with many variables

    Step                    Standard DATA Step   PROC SQL       DATA Step HASH
    Real Time (seconds)
    1.1 (1st Sort)          1.72                 N/A            N/A
    1.2 (2nd Sort)          1.66                 N/A            N/A
    2 (Join)                …                    …              …
    Total                   …                    …2.27          7.72
    User CPU Time (seconds)
    1.1 (1st Sort)          1.31                 N/A            N/A
    1.2 (2nd Sort)          1.27                 N/A            N/A
    2 (Join)                …                    …              …
    Total                   …                    …1.21          4.32
    System CPU Time (seconds)
    1.1 (1st Sort)          0.40                 N/A            N/A
    1.2 (2nd Sort)          0.34                 N/A            N/A
    2 (Join)                …                    …              …
    Total                   3.89                 1.43           3.29
    Memory (k)
    1.1 (1st Sort)          7,653.50             N/A            N/A
    1.2 (2nd Sort)          7,155.93             N/A            N/A
    2 (Join)                …                    …              …
    Total                   49,593.22            1,410,280.12   580,830.31
    OS Memory (k)
    1.1 (1st Sort)          34,504               N/A            N/A
    1.2 (2nd Sort)          34,504               N/A            N/A
    2 (Join)                53,824               92,952         58,124
    Total                   122,832              92,952         58,124

    Data set        # of Obs   # of Vars
    Data set 1      10,847     5,754
    Data set 2      10,847     5,754
    Final           10,847     11,507
    Table 13. Benchmark statistics of moderate size data sets with many variables

    Step                    Standard DATA Step   PROC SQL       DATA Step HASH
    Real Time (seconds)
    1.1 (1st Sort)          9.31                 N/A            N/A
    1.2 (2nd Sort)          9.40                 N/A            N/A
    2 (Join)                …                    …              …
    Total                   39.29                301.31         38.09
    User CPU Time (seconds)
    1.1 (1st Sort)          8.62                 N/A            N/A
    1.2 (2nd Sort)          8.73                 N/A            N/A
    2 (Join)                …                    …              …
    Total                   33.18                44.03          19.61
    System CPU Time (seconds)
    1.1 (1st Sort)          0.62                 N/A            N/A
    1.2 (2nd Sort)          0.59                 N/A            N/A
    2 (Join)                …                    …              …
    Total                   …                    …              …
    Memory (k)
    1.1 (1st Sort)          7,981.21             N/A            N/A
    1.2 (2nd Sort)          8,005.25             N/A            N/A
    2 (Join)                …                    …              …
    Total                   …                    …              …
    OS Memory (k)
    1.1 (1st Sort)          34,516               N/A            N/A
    1.2 (2nd Sort)          34,516               N/A            N/A
    2 (Join)                …                    …              …
    Total                   122,812              1,137,572      2,077,800

    Data set        # of Obs   # of Vars
    Data set 1      45,103     5,754
    Data set 2      45,103     5,754
    Table 14. Benchmark statistics of large size data sets with many variables

The full benchmark results using data sets with few variables are displayed in the tables below.

    Step                     Standard DATA Step   PROC SQL       DATA Step HASH
    Real Time (seconds)
    1.1 (1st Sort*)          0.05                 N/A            N/A
    1.2 (2nd Sort†)          1.00                 N/A            N/A
    1.3 (3rd Sort‡)          0.43                 N/A            N/A
    2.1 (1st Pre-join†)      0.24                 0.55           N/A
    2.2.1 (2nd Pre-join‡)    0.04                 N/A            N/A
    2.2.2 (2nd Pre-join‡)    0.02                 0.04           N/A
    3 (Join)                 0.36                 0.36           1.10
    Total                    2.14                 0.95           1.10
    User CPU Time (seconds)
    1.1 to 2.2.1             …                    …              N/A
    2.2.2 (2nd Pre-join‡)    0.00                 0.01           N/A
    3 (Join)                 0.24                 0.26           0.31
    Total                    0.76                 0.66           0.31
    System CPU Time (seconds)
    1.1 to 2.2.1             …                    …              N/A
    2.2.2 (2nd Pre-join‡)    0.01                 0.01           N/A
    3 (Join)                 0.01                 0.03           0.10
    Total                    0.18                 0.35           0.10
    Memory (k)
    1.1 to 3 (Join)          …                    …              …
    Total                    122,763.29           136,328.84     17,482.34
    OS Memory (k)
    1.1 to 3 (Join)          …                    …              …
    Total                    360,536              235,820        51,468

    Data set                 # of Obs.   # of Vars
    * Test Code Lookup       86          5
    † Lab Results            40,210      19
    ‡ Cancelled Lab Test     786         19
    Final                    40,996      21
    Table 15. Benchmark statistics of small size data sets with few variables

    Step                     Standard DATA Step   PROC SQL       DATA Step HASH
    Real Time (seconds)
    1.1 (1st Sort*)          0.15                 N/A            N/A
    1.2 (2nd Sort†)          14.79                N/A            N/A
    1.3 (3rd Sort‡)          0.42                 N/A            N/A
    2.1 (1st Pre-join†)      2.09                 5.63           N/A
    2.2.1 (2nd Pre-join‡)    0.38                 N/A            N/A
    2.2.2 (2nd Pre-join‡)    0.02                 0.06           N/A
    3 (Join)                 1.96                 2.07           4.12
    Total                    19.81                7.76           4.12
    User CPU Time (seconds)
    1.1 (1st Sort*)          0.01                 N/A            N/A
    1.2 (2nd Sort†)          2.68                 N/A            N/A
    1.3 (3rd Sort‡)          0.00                 N/A            N/A
    2.1 (1st Pre-join†)      1.80                 3.82           N/A
    2.2.1 (2nd Pre-join‡)    0.00                 N/A            N/A
    2.2.2 (2nd Pre-join‡)    0.00                 0.01           N/A
    3 (Join)                 …                    …              …
    Total                    …                    …              …
    System CPU Time (seconds)
    1.1 (1st Sort*)          0.00                 N/A            N/A
    1.2 (2nd Sort†)          1.49                 N/A            N/A
    1.3 (3rd Sort‡)          0.04                 N/A            N/A
    2.1 (1st Pre-join†)      0.09                 2.09           N/A
    2.2.1 (2nd Pre-join‡)    0.01                 N/A            N/A
    2.2.2 (2nd Pre-join‡)    0.01                 0.03           N/A
    3 (Join)                 …                    …              …
    Total                    …                    2.26           0.32
    Memory (k)
    1.1 (1st Sort*)          1,153.03             N/A            N/A
    1.2 (2nd Sort†)          920,448.01           N/A            N/A
    1.3 (3rd Sort‡)          4,850.18             N/A            N/A
    2.1 (1st Pre-join†)      1,865.40             915,295.29     N/A
    2.2.1 (2nd Pre-join‡)    1,321.31             N/A            N/A
    2.2.2 (2nd Pre-join‡)    1,881.18             15,494.03      N/A
    3 (Join)                 …                    …              …
    Total                    933,715.51           932,984.25     22,629.90
    OS Memory (k)
    1.1 (1st Sort*)          35,564               N/A            N/A
    1.2 (2nd Sort†)          955,048              N/A            N/A
    1.3 (3rd Sort‡)          39,568               N/A            N/A
    2.1 (1st Pre-join†)      35,308               948,256        N/A
    2.2.1 (2nd Pre-join‡)    35,564               N/A            N/A
    2.2.2 (2nd Pre-join‡)    35,308               48,512         N/A
    3 (Join)                 …                    …              …
    Total                    …                    …              …

    Data set                 # of Obs.   # of Vars
    * Test Code Lookup       86          5
    † Lab Results            337,453     21
    ‡ Cancelled Lab Test     786         19
    Final                    338,239     23
    Table 16. Benchmark statistics of moderate size data sets with few variables

    Step                     Standard DATA Step   PROC SQL       DATA Step HASH
    Real Time (seconds)
    1.1 (1st Sort*)          0.08                 N/A            N/A
    1.2 (2nd Sort†)          1,257.10             N/A            N/A
    1.3 (3rd Sort‡)          0.13                 N/A            N/A
    2.1 (1st Pre-join†)      82.49                3,004.86       N/A
    2.2.1 (2nd Pre-join‡)    1.91                 N/A            N/A
    2.2.2 (2nd Pre-join‡)    …                    …              N/A
    3 (Join)                 …                    …              …
    Total                    …                    …              …
    User CPU Time (seconds)
    1.1 (1st Sort*)          0.00                 N/A            N/A
    1.2 (2nd Sort†)          120.60               N/A            N/A
    1.3 (3rd Sort‡)          0.00                 N/A            N/A
    2.1 (1st Pre-join†)      59.20                219.54         N/A
    2.2.1 (2nd Pre-join‡)    0.00                 N/A            N/A
    2.2.2 (2nd Pre-join‡)    …                    …              N/A
    3 (Join)                 …                    …              …
    Total                    …                    …              …
    System CPU Time (seconds)
    1.1 (1st Sort*)          0.01                 N/A            N/A
    1.2 (2nd Sort†)          59.99                N/A            N/A
    1.3 (3rd Sort‡)          0.01                 N/A            N/A
    2.1 (1st Pre-join†)      …                    …              N/A
    2.2.1 (2nd Pre-join‡)    0.00                 N/A            N/A
    2.2.2 (2nd Pre-join‡)    0.03                 0.10           N/A
    3 (Join)                 …                    …              …
    Total                    …                    …6.65          14.15
    Memory (k)
    1.1 (1st Sort*)          1,672.59             N/A            N/A
    1.2 (2nd Sort†)          1,054,363.43         N/A            N/A
    1.3 (3rd Sort‡)          4,849.71             N/A            N/A
    2.1 (1st Pre-join†)      …                    …              N/A
    2.2.1 (2nd Pre-join‡)    1,321.75             N/A            N/A
    2.2.2 (2nd Pre-join‡)    1,880.59             15,489.28      N/A
    3 (Join)                 …                    …              …
    Total                    1,068,373.06         1,076,601.87   22,629.78
    OS Memory (k)
    1.1 (1st Sort*)          …                    N/A            N/A
    1.2 (2nd Sort†)          1,088,704            N/A            N/A
    1.3 (3rd Sort‡)          39,312               N/A            N/A
    2.1 (1st Pre-join†)      35,308               1,092,548      N/A
    2.2.1 (2nd Pre-join‡)    35,564               N/A            N/A
    2.2.2 (2nd Pre-join‡)    35,308               48,768         N/A
    3 (Join)                 …                    …              …
    Total                    …                    …              …

    Data set                 # of Obs.    # of Vars
    * Test Code Lookup       86           5
    † Lab Results            10,620,791   21
    ‡ Cancelled Lab Test     786          19
    Final                    10,621,577   23
    Table 17. Benchmark statistics of large size data sets with few variables
