Exploring HASH Tables vs. SORT/DATA Step vs. PROC SQL


PharmaSUG 2016 - Paper TT11

Richann Watson, Experis, Batavia, Ohio
Lynn Mullins, PPD, Cincinnati, Ohio

ABSTRACT

There are often times when programmers need to merge multiple SAS data sets to combine data into a single source data set. Like many other processes, there are various techniques to accomplish this using SAS software. This paper explores which method is most efficient under varying assumptions. We describe the differences, advantages, and disadvantages of HASH tables, the SORT procedure with a DATA step merge, and the SQL procedure, and we display benchmarks for each.

INTRODUCTION

Merging data sets is a common practice for combining data based on key fields. There are a number of possible ways to merge SAS data sets, including the PROC SORT/DATA step merge, the PROC SQL join, and HASH table lookups. Some of the factors that determine which method to use are the size of the data sets, resource availability, and the programmer's experience with the different techniques. This paper discusses the three methods in detail, beginning with the syntax for HASH table lookups and the options associated with this method. We then describe how the three methods differ in complexity, memory type, data set size, and other attributes. Benchmarks are presented for three data set sizes: small, medium, and large. Lastly, we describe the situations in which each method is ideal.

INTRODUCTION TO HASH TABLES

A hash table, also referred to as a hash object, is an in-memory lookup table that can only be accessed from within the DATA step that creates it. Once the DATA step ends, the hash table is deleted. A hash table provides an efficient way to search the data.

The hash object has two parts. The first part is the key. The key can consist of a single variable or multiple variables that are used to perform a lookup, and it can hold character and/or numeric values. The second part is the data part, which holds the data value(s) associated with the key. The data part can also consist of character and/or numeric values.

SYNTAX AND SOME METHODS OF HASH TABLES

The hash table is defined in a DATA step and is only available during that DATA step. The syntax of a hash object can be difficult and can take some time to get used to. Once the hash table is defined, it can be used to add, find, replace, check, remove, and output data. Below is generic code that shows how a hash table is defined.

    data _null_;  /* name an output data set here if OUTPUT should write rows */
      /* define attributes for variables that will be retrieved, i.e., data part */
      if _n_ = 1 then do;
        /* declare name for hash table with ascending sort order */
        declare hash hashobj(dataset: "lib.indsn", ordered: "a");
        /* define variables that will be used as a key for lookup (key part) */
        hashobj.definekey ('keyvar1', 'keyvar2', 'keyvar3');
        /* define variables that will be retrieved (data part) */
        hashobj.definedata ('datavar1', 'datavar2');
        /* end definition of hash table */
        hashobj.definedone();
      end;
      /* specify the main table(s) that are going to use the lookup table */
      set inlibnm.indsn;
      /* one or more hash methods can be used to add, find, replace, check, etc. */
      /* each method returns 0 on success and a non-zero value on failure */
      hashobj.check();
      if hashobj.find() = 0 then output;
    run;

Some of the methods that can be used with hash tables, along with a description and the syntax of each, are in the table below.

Method | Description | Syntax
Add | Adds the data associated with the key to the hash table | hashobj.add(); hashobj.add(key: keyvar1, …, key: keyvarN, data: datavar1, …, data: datavarN);
Check | Checks to see if key is stored in hash table | hashobj.check(); hashobj.check(key: keyvar1, …, key: keyvarN);
Clear | Removes all entries in hash table without deleting the hash table | hashobj.clear();
Definedata | Defines the data that is to be stored in hash table | hashobj.definedata(); hashobj.definedata('datavar1', …, 'datavarN');
Definedone | Indicates that the key part and data part of the hash table are complete | hashobj.definedone();
Definekey | Defines the variables that will be used as the key in the hash table | hashobj.definekey('keyvar1', …, 'keyvarN'); hashobj.definekey(all: 'yes');
Equals | Determines if two hash tables are equal and stores result in indicated DATA step variable | hashobj.equals(hash: 'hashobj1', result: resvar);
Find | Determines if key is stored in hash table and, if it is, retrieves the associated data | hashobj.find(); hashobj.find(key: keyvar1, …, key: keyvarN);
Output | Creates a data set which will contain data from hash table | hashobj.output(dataset: 'lib.outdsn');
Ref | Performs a find on the current key and if the key is not found it is added to the hash table | hashobj.ref(); hashobj.ref(key: keyvar1, …, key: keyvarN);
Remove | Removes the data associated with the key | hashobj.remove(); hashobj.remove(key: keyvar1, …, key: keyvarN);
Replace | Replaces the data associated with the key with new data | hashobj.replace(); hashobj.replace(key: keyvar1, …, key: keyvarN, data: datavar1, …, data: datavarN);
Sum | Gets the key summary for the indicated key and stores it in the indicated DATA step variable | hashobj.sum(sum: sumvar); hashobj.sum(key: keyvar1, …, key: keyvarN, sum: sumvar);

Table 1. Hash table methods
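To make the key part, the data part, and the return code of find() concrete, here is a minimal sketch of a hash lookup on two tiny in-line data sets. The data set names (work.lookup, work.results, work.matched, work.unmatched), the variables, and the values are all hypothetical and are not taken from the paper's benchmarks.

```sas
/* Hypothetical lookup table: one row per test code */
data work.lookup;
  length testcd $8 testname $20;
  input testcd $ testname $;
  datalines;
ALT Alanine
AST Aspartate
GLUC Glucose
;
run;

/* Hypothetical main table that needs TESTNAME added */
data work.results;
  length subjid $4 testcd $8;
  input subjid $ testcd $ value;
  datalines;
S001 ALT 31
S001 GLUC 5.4
S002 XYZ 9.9
S002 AST 28
;
run;

data work.matched work.unmatched;
  /* never executed; only gives TESTNAME its type and length at compile time */
  if 0 then set work.lookup;
  if _n_ = 1 then do;
    declare hash lk(dataset: "work.lookup");
    lk.definekey('testcd');      /* key part  */
    lk.definedata('testname');   /* data part */
    lk.definedone();
  end;
  set work.results;
  /* find() returns 0 when the key exists and copies TESTNAME into the row */
  if lk.find() = 0 then output work.matched;
  else do;
    /* clear the value carried over from the previous successful lookup */
    call missing(testname);
    output work.unmatched;
  end;
run;
```

With these rows, the three known test codes go to work.matched and the XYZ row goes to work.unmatched. Neither input was sorted, which is the property the paper contrasts with the DATA step merge.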

ILLUSTRATION OF HASH TABLE

Below is example code that creates a hash table.

    /* 1. not applicable - sorting is not required when using HASH OBJECTS */
    /* 2. combine the result records into one data set by CASEID */
    data _null_;
      /* specify the lookup table */
      if _n_ = 1 then do;
        /* define the attributes for the variables that are added to main data set */
        if 0 then set indsn.femresp1 (drop = ...);

        /*** BEGIN SECTION TO DECLARE HASH OBJECT ***/
        /* declare name for hash table with ascending sort order */
        declare hash fresp(dataset: "indsn.femresp1", ordered: "a");
        /* define variables that will be used as a key for lookup (key part) */
        fresp.definekey ('CASEID');
        /* define variables that will be retrieved (data part) */
        fresp.definedata (all: 'yes');
        /* end definition of hash table */
        fresp.definedone();
        /*** END SECTION TO DECLARE HASH OBJECT ***/
      end;
      /* specify the main tables that are going to use the lookup table */
      set indsn.femresp end = eof;
      /* if there is a match fresp.find() returns a 0 for success */
      /* otherwise it returns non-zero value for failure */
      /* at the end of the file output the hash table to data set */
      if eof and fresp.find() = 0 then fresp.output(dataset: 'femresp_hash');
    run;

DIFFERENCES BETWEEN THE THREE METHODS

The differences between the three merging methods that we benchmarked (DATA step merge, SQL procedure, and HASH table) are highlighted in the following table.

Attribute | Standard DATA Step | PROC SQL | DATA Step HASH
Syntax Complexity | Straightforward | Straightforward to Moderate | Very Confusing
Memory or Disk-Based | Disk | Disk | Memory
Ideal size of data sets | Any. Can be a resource hog for very large data sets and may not be very efficient. | Small to Moderate. Can be a resource hog for very large data sets. | Large to Very Large
Memory Allocation | Upfront | Upfront | Only when needed
Sorting/Indexing Required | Yes | No | No
Additional calculations | Yes | Maybe | Yes

Table 2. Differences between the three methods
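For comparison with the hash syntax, the other two methods in Table 2 can be sketched on a small example. This is only an illustration under assumed names: the data sets (work.lookup, work.results and the outputs), variables, and values are hypothetical, and an inner join is shown for simplicity.

```sas
/* Hypothetical inputs, deliberately not sorted by the key */
data work.lookup;
  length testcd $8 testname $20;
  input testcd $ testname $;
  datalines;
GLUC Glucose
ALT Alanine
;
run;

data work.results;
  length subjid $4 testcd $8;
  input subjid $ testcd $ value;
  datalines;
S001 GLUC 5.4
S001 ALT 31
S002 XYZ 9.9
;
run;

/* Method 1: sort both inputs by the key, then match-merge in a DATA step */
proc sort data=work.lookup out=work.lookup_s;
  by testcd;
run;

proc sort data=work.results out=work.results_s;
  by testcd;
run;

data work.merged;
  merge work.results_s (in=inres) work.lookup_s (in=inlk);
  by testcd;
  if inres and inlk;   /* keep only keys present in both inputs */
run;

/* Method 2: PROC SQL inner join; no separate sort step is coded */
proc sql;
  create table work.joined as
    select r.subjid, r.testcd, r.value, l.testname
    from work.results as r
         inner join work.lookup as l
         on r.testcd = l.testcd
    order by r.testcd;
quit;

/* Check that both methods produced the same rows */
proc compare base=work.merged compare=work.joined;
run;
```

The sort steps are the extra work that the "Sorting/Indexing Required" row of Table 2 attributes to the standard DATA step. The ORDER BY clause is included only so the two outputs line up row by row for PROC COMPARE.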

BENCHMARKS OF THE THREE METHODS

Real time, the amount of time spent to process the SAS job, is summarized in the tables below. Real time is also referred to as elapsed time. Three different data set sizes (number of observations) were used for comparison. The lowest real time is displayed in red in the original tables. Cells that are not legible in the source are shown as "…".

The following three tables display the real-time results using two data sets with many formatted variables.

Real Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 0.10 | N/A | N/A
1.2 (2nd Sort) | 0.14 | N/A | N/A
2 (Join) | 4.27 | 7.18 | 4.31
Total | 4.51 | 7.18 | 4.31

Data set | # of Obs | # of Vars | Size (KB)
Data set 1 | 100 | 5,754 | 5,311
Data set 2 | 100 | 5,754 | 5,311
Final | 100 | 11,507 | 4,096*
* Compressed using binary option

Table 3. Real-time statistics of small size data sets with many variables

Real Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 1.72 | N/A | N/A
1.2 (2nd Sort) | 1.66 | N/A | N/A
2 (Join) | … | … | …
Total | … | … | …

Data set | # of Obs | # of Vars | Size (KB)
Data set 1 | 10,847 | 5,754 | 488,926
Data set 2 | 10,847 | 5,754 | 488,926
Final | … | … | …
* Compressed using binary option

Table 4. Real-time statistics of moderate size data sets with many variables

Real Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 9.31 | N/A | N/A
1.2 (2nd Sort) | 9.40 | N/A | N/A
2 (Join) | … | … | …
Total | 39.29 | 301.31 | 38.09

Data set | # of Obs | # of Vars | Size (KB)
Data set 1 | 45,103 | 5,754 | 2,053,824
Data set 2 | 45,103 | 5,754 | 2,053,824
Final | … | … | …
* Compressed using binary option

Table 5. Real-time statistics of large size data sets with many variables

The following three tables display the real-time results using three data sets with few variables.

Real Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | 0.05 | N/A | N/A
1.2 (2nd Sort†) | 1.00 | N/A | N/A
1.3 (3rd Sort‡) | 0.43 | N/A | N/A
2.1 (1st Pre-join†) | 0.24 | 0.55 | N/A
2.2.1 (2nd Pre-join‡) | 0.04 | N/A | N/A
2.2.2 (2nd Pre-join‡) | 0.02 | 0.04 | N/A
3 (Join) | 0.36 | 0.36 | 1.10
Total | 2.14 | 0.95 | 1.10

Data set | # of Obs | # of Vars | Size (KB)
* Lookup | 86 | 5 | 128
† Lab Results | 40,210 | 19 | 98,784
‡ Cancelled | 786 | 19 | 2,156
Final | 40,996 | 21 | 20,480*
* Compressed using binary option

Table 6. Real-time statistics of small size data sets with few variables

Real Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | 0.15 | N/A | N/A
1.2 (2nd Sort†) | 14.79 | N/A | N/A
1.3 (3rd Sort‡) | 0.42 | N/A | N/A
2.1 (1st Pre-join†) | 2.09 | 5.63 | N/A
2.2.1 (2nd Pre-join‡) | 0.38 | N/A | N/A
2.2.2 (2nd Pre-join‡) | 0.02 | 0.06 | N/A
3 (Join) | 1.96 | 2.07 | 4.12
Total | 19.81 | 7.76 | 4.12

Data set | # of Obs | # of Vars | Size (KB)
* Lookup | 86 | 5 | 128
† Lab Results | 337,453 | 21 | 844,000
‡ Cancelled | 786 | 19 | 2,156
Final | 338,239 | 23 | 92,160*
* Compressed using binary option

Table 7. Real-time statistics of moderate to large size data sets with few variables

Real Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | 0.08 | N/A | N/A
1.2 (2nd Sort†) | 1,257.10 | N/A | N/A
1.3 (3rd Sort‡) | 0.13 | N/A | N/A
2.1 (1st Pre-join†) | 82.49 | 3,004.86 | N/A
2.2.1 (2nd Pre-join‡) | 1.91 | N/A | N/A
2.2.2 (2nd Pre-join‡) | … | … | N/A
3 (Join) | … | … | …
Total | … | … | …

Data set | # of Obs | # of Vars | Size (KB)
* Lookup | 86 | 5 | 128
† Lab Results | 10,620,791 | 21 | 26,552,200
‡ Cancelled | 786 | 19 | …
Final | 10,621,577 | 23 | 2,580,480*
* Compressed using binary option

Table 8. Real-time statistics of extremely large size data sets with few variables

WHEN IT IS IDEAL TO USE ONE METHOD OVER THE OTHER

The sizes of the data sets have an impact on the efficiency of the different methods. If the data sets to be joined are relatively small, then a standard sort and DATA step merge is sufficient. The larger the data sets are, the less efficient the DATA step and PROC SQL methods become.

The number of variables should be considered along with the number of records. Below are recommended approaches for determining which method is best given the data set size and number of variables. Again, we see that small data sets, independent of the number of variables, can be merged using a standard sort and DATA step merge, but the SQL procedure works just as well. Interestingly, the DATA step with the HASH table method is more efficient with larger data sets that have fewer variables.

Table 9 rates each of the three methods (Standard DATA Step, PROC SQL, DATA Step HASH) for the following scenarios:
Small size data sets with many variables
Moderate size data sets with many variables
Large size data sets with many variables
Small size data sets with few variables
Moderate size data sets with few variables
Extremely large size data sets with few variables

Each cell carries one of three ratings: the use of the indicated method is not recommended; use caution with the indicated method(s) in this scenario; ideal method(s) for the indicated scenario.
[The rating symbols in the individual cells are not recoverable from the source.]

Table 9. Recommendation based on number of records and number of variables

LIMITATIONS

These benchmarks were run in a Windows 7 environment using PC SAS v9.4. Different results may occur in other environments and SAS versions. The SAS option COMPRESS=BINARY was used to make the programs run more quickly.

SQL uses different algorithms to execute different types of joins. The SQL optimizer may choose to execute an inner join using a hash, index, or sort-and-merge technique under different circumstances. In our test, we used a left outer join on the small data sets to add two variables from the lookup table and only kept the lookup table data that matched. We did an inner join on the large data sets because there was a 1-1 match of the key variable in each data set.

Results from a DATA step merge can vary based on the environment (i.e., compression ON or OFF), and results from PROC SORT can vary based on any options used (i.e., TAGSORT vs. non-TAGSORT vs. OUT= used).

CONCLUSION

This was just a small test, and there are other factors that can be considered when benchmarking, but for our purposes we only looked at a basic DATA step merge, PROC SQL with a left outer join, and the hash object. We looked at various data set sizes and numbers of variables to see which process was the most efficient, and we then compared the output data sets to make sure that all three processes produced the same results. There are several factors to consider when deciding which approach to use. It may sometimes be worthwhile to learn a new method, even if it is a bit cumbersome, if in the end it will save you a lot of processing time.
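The settings named under LIMITATIONS can be tried on a small scale. The sketch below is an assumption-laden illustration, not the authors' benchmark program: the data sets work.a and work.b are invented, and the PROC SQL option _METHOD is used here only because it asks the procedure to write the join technique it selected to the log. Which technique is chosen, and how the codes in the log read, depends on the SAS release and the data.

```sas
/* The compression setting the paper reports using */
options compress=binary;

/* Hypothetical inputs: 1,000 keys in A, every other key in B */
data work.a;
  do id = 1 to 1000;
    x = id * 2;
    output;
  end;
run;

data work.b;
  do id = 1 to 1000 by 2;
    y = id * 3;
    output;
  end;
run;

/* _METHOD writes the chosen join technique to the SAS log */
proc sql _method;
  create table work.ab as
    select a.id, a.x, b.y
    from work.a as a
         left join work.b as b
         on a.id = b.id;
quit;

/* TAGSORT variant of PROC SORT, one of the options the paper says can change timings */
proc sort data=work.a out=work.a_sorted tagsort;
  by id;
run;
```

Running the same join with and without COMPRESS=BINARY, or the same sort with and without TAGSORT, is one way to see how much these settings move the timings in a given environment.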

REFERENCES

SAS Institute. SAS9 Hash Object Tip Sheet. Available at tipsheet.pdf.
Burlew, Michele M. 2012. SAS Hash Object Programming Made Easy. Cary, NC: SAS Institute Inc.
Lafler, Kirk P. 2010-2015. Exploring SAS DATA Step Hash Programming Techniques. Software Intelligence Corporation.
Secosky, Jason and Bloom, Janice. 2007. "Getting Started with the DATA Step Hash Object". Cary, NC: SAS Institute Inc.

ACKNOWLEDGMENTS

Thanks to Jamie Mabry, Lindsay Dean, Ken Borowiak, David Gray, Richard D'Amato, Lynn Clipstone, and PPD and Experis Management for their reviews and comments. Thanks to our families for their support.

DISCLAIMERS

The contents of this paper are the work of the authors and do not necessarily represent the opinions, recommendations, or practices of PPD or Experis.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are registered trademarks or trademarks of their respective companies.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Richann Watson
Experis
(513) 843-4081
Richann.watson@experis.com

Lynn Mullins
PPD
(910) 558-4343
Lynn.mullins@ppdi.com

APPENDIX A – ADDITIONAL HASH TABLE EXAMPLE

    /* 1. Add in the LBCAT and LBTESTCD from the lookup table */
    data labhash;
      /* specify the lookup table(s) */
      if _n_ = 1 then do;
        /* this will define the attributes for the variables */
        /* to be retrieved in the hash object                */
        /* (i.e. defines attributes for LBCAT and LBTESTCD)  */
        if 0 then set indsn.lookup (keep = LBCAT LBTESTCD);

        /*** BEGIN SECTION TO DECLARE HASH OBJECT FOR LABS WITH RESULTS ***/
        /* declare name for hash table with ascending sort order */
        declare hash lrslt(dataset: "indsn.lookup", ordered: "a");
        /* define variables that will be used as a key for lookup (key part) */
        lrslt.definekey ('PANEL', 'TEST', 'UNIT');
        /* define variables that will be retrieved (data part) */
        lrslt.definedata ('LBCAT', 'LBTESTCD');
        /* end definition of hash table */
        lrslt.definedone();
        /*** END SECTION TO DECLARE HASH OBJECT FOR LABS WITH RESULTS ***/

        /*** BEGIN SECTION TO DECLARE HASH OBJECT FOR CANCELLED LABS ***/
        /* declare name for hash table with ascending sort order      */
        /* want to keep only tests that don't have 'LE' in name       */
        /* this is due to heme diffs not being calculated b/c there   */
        /* are no results to determine the diffs                      */
        declare hash lcncl(dataset: "indsn.lookup (where = (not(LBTESTCD ? 'LE')))",
                           ordered: "a");
        /* define variables that will be used as a key for lookup */
        lcncl.definekey ('PANEL', 'TEST');
        /* define variables that will be retrieved */
        lcncl.definedata ('LBCAT', 'LBTESTCD');
        /* end definition of hash table */
        lcncl.definedone();
        /*** END SECTION TO DECLARE HASH OBJECT FOR CANCELLED LABS ***/
      end;
      /* specify the main tables that are going to use the lookup table */
      set indsn.labrslt (in = rslt)
          indsn.labcncl (in = cncl);
      /* if there is a match lxxxx.find() returns a 0 for success */
      /* otherwise it returns non-zero value for failure          */
      if rslt and lrslt.find() = 0 then output;
      if cncl and lcncl.find() = 0 then output;
    run;

APPENDIX B – BENCHMARK METHODS

The statistics compared during the benchmarking and a description of each are highlighted in the table below.

Statistic | Description
Real-Time | The amount of time spent to process the SAS job. Real-time is also referred to as elapsed time.
User CPU Time | The CPU time spent to execute SAS code.
System CPU Time | The CPU time spent to perform operating system tasks (system overhead tasks) that support the execution of SAS code.
Memory | The amount of memory required to run a step.
OS Memory | The maximum amount of memory that a step requested from the System.

Table 11. Benchmarks of the three methods

APPENDIX C – FULL BENCHMARK RESULTS

The full benchmark results using data sets with many formatted variables are displayed in the tables below. Cells that are not legible in the source are shown as "…".

Real Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 0.10 | N/A | N/A
1.2 (2nd Sort) | 0.14 | N/A | N/A
2 (Join) | 4.27 | 7.18 | 4.31
Total | 4.51 | 7.18 | 4.31

User CPU Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 0.01 | N/A | N/A
1.2 (2nd Sort) | 0.01 | N/A | N/A
2 (Join) | … | … | …
Total | … | … | …

System CPU Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 0.01 | N/A | N/A
1.2 (2nd Sort) | 0.06 | N/A | N/A
2 (Join) | … | … | …
(row label lost) | 2.50 | 0.10 | 2.55

Memory (k)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 7,681.96 | N/A | N/A
1.2 (2nd Sort) | 7,161.06 | N/A | N/A
2 (Join) | … | … | …
(row label lost) | 49,882.64 | 60,757.40 | 33,228.14

OS Memory (k)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 37,824 | N/A | N/A
1.2 (2nd Sort) | 37,824 | N/A | N/A
2 (Join) | … | … | …
Total | … | … | …

Data set | # of Obs | # of Vars
Data set 1 | 100 | 5,754
Data set 2 | 100 | 5,754
Final | … | …

Table 12. Benchmark statistics of small size data sets with many variables
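The five statistics described in Appendix B correspond to what SAS writes to the log for each step when the FULLSTIMER system option is in effect. The paper does not show how its statistics were collected, so the following is only a sketch of one way to obtain comparable numbers; the data set work.demo and its size are hypothetical.

```sas
/* Request the expanded per-step resource statistics in the log */
options fullstimer;

/* Hypothetical test data: one million rows with a random sort key */
data work.demo;
  do i = 1 to 1000000;
    x = ranuni(1);
    output;
  end;
run;

/* The log note for this step lists real time, user cpu time, */
/* system cpu time, memory, and OS Memory                      */
proc sort data=work.demo out=work.demo_sorted;
  by x;
run;

/* Return to the default, shorter timing notes */
options nofullstimer;
```

Timings from a single run vary with system load, so repeating each step several times before comparing methods gives a steadier picture.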

Real Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 1.72 | N/A | N/A
1.2 (2nd Sort) | 1.66 | N/A | N/A
2 (Join) | … | … | …
Total | … | … | …

User CPU Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 1.31 | N/A | N/A
1.2 (2nd Sort) | 1.27 | N/A | N/A
2 (Join) | … | … | …
Total | … | … | …

System CPU Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 0.40 | N/A | N/A
1.2 (2nd Sort) | 0.34 | N/A | N/A
2 (Join) | … | … | …
Total | 3.89 | 1.43 | 3.29

Memory (k)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 7,653.50 | N/A | N/A
1.2 (2nd Sort) | 7,155.93 | N/A | N/A
2 (Join) | … | … | …
Total | 49,593.22 | 1,410,280.12 | 580,830.31

OS Memory (k)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 34,504 | N/A | N/A
1.2 (2nd Sort) | 34,504 | N/A | N/A
2 (Join) | 53,824 | 92,952 | 58,124
Total | 122,832 | 92,952 | 58,124

Data set | # of Obs | # of Vars
Data set 1 | 10,847 | 5,754
Data set 2 | 10,847 | 5,754
Final | 10,847 | 11,507

Table 13. Benchmark statistics of moderate size data sets with many variables

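Real Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 9.31 | N/A | N/A
1.2 (2nd Sort) | 9.40 | N/A | N/A
2 (Join) | … | … | …
Total | 39.29 | 301.31 | 38.09

User CPU Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 8.62 | N/A | N/A
1.2 (2nd Sort) | 8.73 | N/A | N/A
2 (Join) | … | … | …
Total | 33.18 | 44.03 | 19.61

System CPU Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 0.62 | N/A | N/A
1.2 (2nd Sort) | 0.59 | N/A | N/A
2 (Join) | … | … | …
Total | … | … | …

Memory (k)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 7,981.21 | N/A | N/A
1.2 (2nd Sort) | 8,005.25 | N/A | N/A
2 (Join) | … | … | …
Total | … | … | …

OS Memory (k)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort) | 34,516 | N/A | N/A
1.2 (2nd Sort) | 34,516 | N/A | N/A
2 (Join) | … | … | …
Total | 122,812 | 1,137,572 | 2,077,800

Data set | # of Obs | # of Vars
Data set 1 | 45,103 | 5,754
Data set 2 | 45,103 | 5,754
Final | … | …

Table 14. Benchmark statistics of large size data sets with many variables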

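The full benchmark results using data sets with few variables are displayed in the tables below. Cells that are not legible in the source are shown as "…".

Real Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | 0.05 | N/A | N/A
1.2 (2nd Sort†) | 1.00 | N/A | N/A
1.3 (3rd Sort‡) | 0.43 | N/A | N/A
2.1 (1st Pre-join†) | 0.24 | 0.55 | N/A
2.2.1 (2nd Pre-join‡) | 0.04 | N/A | N/A
2.2.2 (2nd Pre-join‡) | 0.02 | 0.04 | N/A
3 (Join) | 0.36 | 0.36 | 1.10
Total | 2.14 | 0.95 | 1.10

User CPU Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | … | N/A | N/A
1.2 (2nd Sort†) | … | N/A | N/A
1.3 (3rd Sort‡) | … | N/A | N/A
2.1 (1st Pre-join†) | … | … | N/A
2.2.1 (2nd Pre-join‡) | … | N/A | N/A
2.2.2 (2nd Pre-join‡) | 0.00 | 0.01 | N/A
3 (Join) | 0.24 | 0.26 | 0.31
Total | 0.76 | 0.66 | 0.31

System CPU Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | … | N/A | N/A
1.2 (2nd Sort†) | … | N/A | N/A
1.3 (3rd Sort‡) | … | N/A | N/A
2.1 (1st Pre-join†) | … | … | N/A
2.2.1 (2nd Pre-join‡) | … | N/A | N/A
2.2.2 (2nd Pre-join‡) | 0.01 | 0.01 | N/A
3 (Join) | 0.01 | 0.03 | 0.10
Total | 0.18 | 0.35 | 0.10

Memory (k)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | … | N/A | N/A
1.2 (2nd Sort†) | … | N/A | N/A
1.3 (3rd Sort‡) | … | N/A | N/A
2.1 (1st Pre-join†) | … | … | N/A
2.2.1 (2nd Pre-join‡) | … | N/A | N/A
2.2.2 (2nd Pre-join‡) | 1,859.43 | … | N/A
3 (Join) | 2,185.84 | … | 17,482.34
Total | 122,763.29 | 136,328.84 | 17,482.34

OS Memory (k)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | … | N/A | N/A
1.2 (2nd Sort†) | … | N/A | N/A
1.3 (3rd Sort‡) | … | N/A | N/A
2.1 (1st Pre-join†) | … | … | N/A
2.2.1 (2nd Pre-join‡) | … | N/A | N/A
2.2.2 (2nd Pre-join‡) | … | … | N/A
3 (Join) | … | … | …
Total | … | … | …

Data set | # of Obs | # of Vars
* Test Code Lookup | 86 | 5
† Lab Results | 40,210 | 19
‡ Cancelled Lab Test | 786 | 19
Final | 40,996 | 21

Table 15. Benchmark statistics of small size data sets with few variables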

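Real Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | 0.15 | N/A | N/A
1.2 (2nd Sort†) | 14.79 | N/A | N/A
1.3 (3rd Sort‡) | 0.42 | N/A | N/A
2.1 (1st Pre-join†) | 2.09 | 5.63 | N/A
2.2.1 (2nd Pre-join‡) | 0.38 | N/A | N/A
2.2.2 (2nd Pre-join‡) | 0.02 | 0.06 | N/A
3 (Join) | 1.96 | 2.07 | 4.12
Total | 19.81 | 7.76 | 4.12

User CPU Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | 0.01 | N/A | N/A
1.2 (2nd Sort†) | 2.68 | N/A | N/A
1.3 (3rd Sort‡) | 0.00 | N/A | N/A
2.1 (1st Pre-join†) | 1.80 | 3.82 | N/A
2.2.1 (2nd Pre-join‡) | 0.00 | N/A | N/A
2.2.2 (2nd Pre-join‡) | 0.00 | 0.01 | N/A
3 (Join) | … | … | …
Total | … | … | …

System CPU Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | 0.00 | N/A | N/A
1.2 (2nd Sort†) | 1.49 | N/A | N/A
1.3 (3rd Sort‡) | 0.04 | N/A | N/A
2.1 (1st Pre-join†) | 0.09 | 2.09 | N/A
2.2.1 (2nd Pre-join‡) | 0.01 | N/A | N/A
2.2.2 (2nd Pre-join‡) | 0.01 | 0.03 | N/A
3 (Join) | … | … | …
Total | … | 2.26 | 0.32

Memory (k)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | 1,153.03 | N/A | N/A
1.2 (2nd Sort†) | 920,448.01 | N/A | N/A
1.3 (3rd Sort‡) | 4,850.18 | N/A | N/A
2.1 (1st Pre-join†) | 1,865.40 | 915,295.29 | N/A
2.2.1 (2nd Pre-join‡) | 1,321.31 | N/A | N/A
2.2.2 (2nd Pre-join‡) | 1,881.18 | 15,494.03 | N/A
3 (Join) | … | … | …
Total | 933,715.51 | 932,984.25 | 22,629.90

OS Memory (k)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | 35,564 | N/A | N/A
1.2 (2nd Sort†) | 955,048 | N/A | N/A
1.3 (3rd Sort‡) | 39,568 | N/A | N/A
2.1 (1st Pre-join†) | 35,308 | 948,256 | N/A
2.2.1 (2nd Pre-join‡) | 35,564 | N/A | N/A
2.2.2 (2nd Pre-join‡) | 35,308 | 48,512 | N/A
3 (Join) | … | … | …
Total | … | … | …

Data set | # of Obs | # of Vars
* Test Code Lookup | 86 | 5
† Lab Results | 337,453 | 21
‡ Cancelled Lab Test | 786 | 19
Final | 338,239 | 23

Table 16. Benchmark statistics of moderate size data sets with few variables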

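Real Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | 0.08 | N/A | N/A
1.2 (2nd Sort†) | 1,257.10 | N/A | N/A
1.3 (3rd Sort‡) | 0.13 | N/A | N/A
2.1 (1st Pre-join†) | 82.49 | 3,004.86 | N/A
2.2.1 (2nd Pre-join‡) | 1.91 | N/A | N/A
2.2.2 (2nd Pre-join‡) | … | … | N/A
3 (Join) | … | … | …
Total | … | … | …

User CPU Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | 0.00 | N/A | N/A
1.2 (2nd Sort†) | 120.60 | N/A | N/A
1.3 (3rd Sort‡) | 0.00 | N/A | N/A
2.1 (1st Pre-join†) | 59.20 | 219.54 | N/A
2.2.1 (2nd Pre-join‡) | 0.00 | N/A | N/A
2.2.2 (2nd Pre-join‡) | … | … | N/A
3 (Join) | … | … | …
Total | … | … | …

System CPU Time (seconds)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | 0.01 | N/A | N/A
1.2 (2nd Sort†) | 59.99 | N/A | N/A
1.3 (3rd Sort‡) | 0.01 | N/A | N/A
2.1 (1st Pre-join†) | … | … | N/A
2.2.1 (2nd Pre-join‡) | 0.00 | N/A | N/A
2.2.2 (2nd Pre-join‡) | 0.03 | 0.10 | N/A
3 (Join) | … | … | …
Total | … | … | 14.15

Memory (k)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | 1,672.59 | N/A | N/A
1.2 (2nd Sort†) | 1,054,363.43 | N/A | N/A
1.3 (3rd Sort‡) | 4,849.71 | N/A | N/A
2.1 (1st Pre-join†) | … | … | N/A
2.2.1 (2nd Pre-join‡) | 1,321.75 | N/A | N/A
2.2.2 (2nd Pre-join‡) | 1,880.59 | 15,489.28 | N/A
3 (Join) | … | … | …
Total | 1,068,373.06 | 1,076,601.87 | 22,629.78

OS Memory (k)
Step | Standard DATA Step | PROC SQL | DATA Step HASH
1.1 (1st Sort*) | … | N/A | N/A
1.2 (2nd Sort†) | 1,088,704 | N/A | N/A
1.3 (3rd Sort‡) | 39,312 | N/A | N/A
2.1 (1st Pre-join†) | 35,308 | 1,092,548 | N/A
2.2.1 (2nd Pre-join‡) | 35,564 | N/A | N/A
2.2.2 (2nd Pre-join‡) | 35,308 | 48,768 | N/A
3 (Join) | … | … | …
Total | … | … | …

Data set | # of Obs | # of Vars
* Test Code Lookup | 86 | 5
† Lab Results | 10,620,791 | 21
‡ Cancelled Lab Test | 786 | 19
Final | 10,621,577 | 23

Table 17. Benchmark statistics of large size data sets with few variables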
