Running Stata Parallel Efficiently


multishell
Running Stata parallel efficiently
or ... I was a final year PhD student and needed computational power.

Jan Ditzen
Heriot-Watt University, Edinburgh, UK
Center for Energy Economics Research and Policy (CEERP)
September 7, 2018

Introduction

- Time is limited.
- Simulations to assess the bias of an estimator run over a huge variety of different parameters. This is very time consuming.
- Likewise, running many large do files to process datasets or create tables can take a lot of time. Often it does not matter in which order they run.
- Does Stata help with it?
  - Only a single do file can be run at a time.
  - Multi-core systems would allow parallel computing. Stata/IC and Stata/SE use only one core. Stata/MP supports multiple cores, but only individual commands are sped up.
- Why not run a simulation or several do files in parallel?

Introduction - Example

Bias (x100)

  (N,T)        40        50
  $\phi = \frac{1}{N}\sum_{i=1}^{N}\phi_i$
  40       -42.85    -31.69
  50       -42.91    -30.28
  100      -43.33    -31.51
  150      -42.16    -31.11
  200      -43.65    -31.43
  $\beta_0 = \frac{1}{N}\sum_{i=1}^{N}\beta_{0i}$
  [...] .214.07
  $\beta_1 = \frac{1}{N}\sum_{i=1}^{N}\beta_{1i}$
  40        -8.61     -6.20
  50        -6.82     -5.27
  100       -5.12     -3.04
  150       -6.78     -2.76
  200       -5.13     -2.88
  RMSE       2.15      1.85

Table: Monte Carlo results for coefficients φ, β0 and β1, estimating a dynamic common correlated effects model using xtdcce2. The DGP is $y_{i,t} = c_{yi} + \phi_i y_{i,t-1} + \beta_{0i} x_{i,t} + \beta_{1i} x_{i,t-1} + \gamma_i' f_t + \epsilon_{i,t}$. Example taken from Table 1 in Ditzen (2017).

- Example: Monte Carlo to assess the bias of an estimator with 5 parametrisations each for the number of time periods (T) and cross sections (N).
- 5 * 5 runs with 1000 repetitions each are necessary to generate this table, with no other parameters changed.
- Assume 1 estimation takes 1 second: 1000 seconds are needed for one parametrisation, and 25,000 seconds or about 7 hours for all simulations.
- If 5 runs could be run in parallel, 5000 seconds or about 1.4 hours would be needed.
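The run-time estimate on this slide can be checked with a few lines of Stata. This is only a back-of-the-envelope sketch: the local names are made up for illustration, and the 1 second per estimation is the slide's own assumption.

```stata
* Rough run time, assuming 1 second per estimation as on the slide.
local reps     = 1000       // repetitions per parametrisation
local runs     = 5 * 5      // 5 values of N times 5 values of T
local parallel = 5          // runs executed at the same time

local total_sec = `reps' * `runs'
local par_sec   = `total_sec' / `parallel'

* Print seconds and the equivalent in hours for both cases.
display "sequential: " `total_sec' " s = " %4.1f `total_sec'/3600 " h"
display "parallel:   " `par_sec'   " s = " %4.1f `par_sec'/3600   " h"
```

The sequential case comes to 25,000 seconds (roughly 6.9 hours) and the parallel case to 5,000 seconds (roughly 1.4 hours), ignoring any overhead for starting instances.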

Introduction - What exists?

- parallel
  - Inspired by the R library "snow"; implements parallel computing through Stata's batch mode.
  - Can be used to speed up commands like simulate or bootstrap and speeds up computations on datasets.
- qsub
  - Queues a list of jobs and submits them to different Stata instances.
- multishell
  - A mix of both routines, with the extension to use it across computers.
  - Loops (forvalues and foreach) are dissected into variations and queued.
  - The queue is then processed on multiple instances of Stata on one or more computers.

Example

- Assume a Monte Carlo to assess the bias of the OLS estimator is planned with an increasing number of observations. Results for each run are saved in a separate dataset.
- A straightforward way to code this would be:

simulation.do

    program define MCprog, rclass
        syntax anything(name=N)
        clear
        set obs `N'
        drawnorm x e
        gen y = 2 + 0.5*x + e
        reg y x
        return scalar x = _b[x]
    end

    clear
    forvalues n = 10 (10) 130 {
        simulate bx = r(x), reps(1000): MCprog `n'
        save results_`n', replace
    }

multishell set-up

    multishell exepath "C:/Stata/Stata.exe"
    multishell path "C:/documents/multishell/temp"
    multishell add "./simulation.do"
    multishell start , threads(3) sleep(2000)
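Because each run writes its own dataset, the results still have to be combined afterwards. The following is a minimal sketch of that step in plain Stata, not part of multishell. It assumes the files are named results_10.dta to results_130.dta, sit in the working directory and contain the variable bx; the variable names nobs and bias are made up here, and the true slope of 0.5 is taken from the data generating process in the example.

```stata
* Stack the per-run result files and summarise the bias of the slope.
clear
tempfile all
save `all', emptyok

forvalues n = 10(10)130 {
    capture confirm file "results_`n'.dta"
    if _rc continue                     // skip runs that are not there yet
    use "results_`n'.dta", clear
    gen int nobs = `n'                  // remember which run this was
    append using `all'
    save `all', replace
}

use `all', clear
capture confirm variable bx
if !_rc {
    gen bias = bx - 0.5                 // 0.5 is the slope used in MCprog
    tabstat bias, by(nobs) statistics(mean sd n)
}
```

Missing files are skipped rather than treated as errors, so the same lines can be run while some variations are still queued.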

Example

What does multishell do?

- multishell will create a new sub folder for each variation, i.e. n = 10, n = 20, n = 30, ..., n = 130. In each folder a do file with the corresponding variation, a log file and a file containing the status are saved.
- The do files are queued and further do files can be added.
- The running Stata instance acts as the main multishell instance. It creates a batch file for each job and coordinates the number of parallel Stata instances.
- As soon as a job (or variation) is completed, the status in the sub folder is changed and the instance will be closed.
- The multishell main instance will scan the folders and check if additional instances can be started.

Single Computer

Diagram: the main instance runs

    multishell run, threads(3) sleep(1000)

It dissects the loop forvalues i = 10 (10) 130 into a queue of variations, starts the Stata instances and receives their reports.

  id     Variation
  #1     i = 10
  #2     i = 20
  #3     i = 30
  ...
  #13    i = 130

Instances 1 to 3 work through the queue: #1 (i = 10), #2 (i = 20) and #3 (i = 30) first, then #4 (i = 40), #5 (i = 50), #6 (i = 60), and so on.

Single Computer

In Stata:

    . multishell run, threads(4) ...

...ation.do (queued and running)

  Variation   Status     Time
  n = 10      finished   2 Sep 2018 - 15:25:29
  n = 20      finished   2 Sep 2018 - 15:25:29
  n = 30      finished   2 Sep 2018 - 15:25:29
  n = 40      finished   2 Sep 2018 - 15:25:29
  n = 50      running    2 Sep 2018 - 15:25:31
  n = 60      running    2 Sep 2018 - 15:25:31
  n = 70      running    2 Sep 2018 - 15:25:31
  n = 80      running    2 Sep 2018 - 15:25:31
  n = 90      queued     2 Sep 2018 - 15:25:14
  n = 100     queued     2 Sep 2018 - 15:25:14
  n = 110     queued     2 Sep 2018 - 15:25:14
  n = 120     queued     2 Sep 2018 - 15:25:15
  n = 130     queued     2 Sep 2018 - 15:25:15

Computername: HPJD
as of 2 Sep 2018 - 15:25:32; started at 2 Sep 2018 - 15:25:15
next refresh in 2s.

Cluster of Computers

- In case of multiple computers, one computer acts as the server.
- Prerequisite: the computers must have shared access to the folder multishell uses to save do files.
- The main instance of the server allocates tasks to the clients, so a cluster is set up.
- Each computer has a main instance, which then starts new instances of Stata processing the allocated tasks.

Cluster of Computers

Diagram: the main instance (server) runs

    multishell run, threads(3) sleep(1000)

and the client instance runs

    multishell run client , threads(2) sleep(1000)

The server dissects the loop forvalues i = 10 (10) 130 into the queue below and assigns tasks to the client. Each main instance starts its own Stata instances (Instance 1 to 3 on the server, Instance 1 and 2 on the client) and receives their reports. The diagram shows tasks #1 (i = 10) to #10 (i = 100) distributed over these five instances.

  id     Variation
  #1     i = 10
  #2     i = 20
  #3     i = 30
  ...
  #13    i = 130

Cluster of Computers

In Stata (from help ...):

...CarloSimulation.do (running and finished)

  Variation   Status     Time
  n = 50      finished   17 Jul
  n = 60      finished   17 Jul
  n = 70      finished   17 Jul
  n = 80      finished   17 Jul
  n = 90      finished   17 Jul
  n = 100     running    17 Jul
  n = 110     finished   17 Jul
  n = 120     finished   17 Jul
  n = 130     finished   17 Jul

MonteCarloSimulation_panel.do (queued and running)

  Variation        Status     Time
  n = 30, t = 30   running    17 Jul
  n = 30, t = 40   running    17 Jul
  n = 30, t = 50   assigned   17 Jul
  n = 40, t = 30   assigned   17 Jul
  n = 40, t = 40   running    17 Jul
  n = 40, t = 50   assigned   17 Jul
  n = 50, t = 30   assigned   17 Jul
  n = 50, t = 40   queued     17 Jul
  n = 50, t = 50   queued     17 Jul

Machine names appearing in the output: HPJD, Research181
as of 17 Jul 2018 - 14:26:54; started at 17 Jul 2018 - 14:26:33
next refresh in 2s.

Syntax and set-up I

1 Set paths.
  - multishell path "C:/Documents/Multishell"
    Path for the folder to store files.
  - multishell exepath "C:/Programs/Stata/Stata.exe"
    Path to the Stata exe.
2 Add do files.
  - multishell add "C:/Documents/Multishell/simulation.do"
    Do file to be queued. For each job, a sub folder in the path set above is created, and the do file and status file are saved there.
3 Additional parameters
  - multishell adopath "C:/Documents/myado"
    Load additional ados.
  - multishell alttext "old text @ new text"
    Replace old text in the do file with new text. This makes it possible to adjust paths in the do file for each computer.

Syntax and set-up II

  - multishell seed type filename, [fill]
    (... yes, I am using Stata 14 and not Stata 15)
    Setting up the seed using dataset filename. type can be
    - create: creates a dataset with empty seeds for each variation. If option fill is used, then seeds are random numbers.
    - save: saves the dataset with the seeds used for each variation in filename.
    - load: uses seeds from dataset filename.
4 Start the multishell server (or client).
  - multishell run [client] , threads(integer) sleep(integer) [nostop networkdrive]
    Starts the multishell main instance. If option client is used, then the instance is started as a client and waits for a server to assign tasks to the computer.
    Options
    - threads(integer): sets the number of parallel Stata instances.
    - sleep(integer): milliseconds until the status of tasks is refreshed.
    - nostop: client is restarted if all tasks are finished.
    - networkdrive: log file is saved in the path folder.
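The idea behind a seed dataset, one reproducible seed per variation, can be illustrated in plain Stata. The sketch below is not the file format multishell writes: the file name seeds_demo, the variable names n and seed, and the seed 20180907 are all hypothetical and only serve to show the principle.

```stata
* Hypothetical seed table: one row per variation n = 10, 20, ..., 130.
clear
set obs 13
gen int n = 10 * _n
set seed 20180907                           // makes the seed draws repeatable
gen long seed = floor(runiform() * 1e9)     // integer between 0 and 999,999,999
save seeds_demo, replace

* Inside the run for one variation: look up its seed and apply it.
local n 30
use if n == `n' using seeds_demo, clear
local s = seed[1]
set seed `s'
```

Storing the seeds in a dataset means a single variation can be rerun later with exactly the same random draws, independent of the order in which the parallel instances happened to run.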

Syntax and set-up III

5 Diagnosis
  - multishell status
    Shows the status of the multishell, including the number of tasks, clients and path set up.
  - multishell reset type, computer(Computername)
    Re-queues tasks for computer, where type is assigned, running, finished, error or id(#).
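As an illustration of the two diagnosis commands, the lines below follow the syntax on this slide. The computer name HPJD is borrowed from the status output shown earlier; whether the name has to be quoted or abbreviated is not stated in the slides, so treat the exact form as an assumption.

```stata
* Show the current queue, clients and paths.
multishell status

* Re-queue the tasks that ended with an error on the computer HPJD
* (computer name taken from the earlier status output).
multishell reset error, computer(HPJD)
```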

Example

multishell_server.do

    local google_drive "C:/Users//`c(username)'/Google ..."
    multishell path "`google_drive'/Code/simulation/temp" , clear
    multishell exepath "C:/Program Files (x86)/Stata14/StataSE-64.exe"
    multishell adopath "`google_drive'/Code/ados/"
    multishell alttext "GOOGLE_FILE @ `google_drive'"
    multishell add "`google_drive'/Code/simulation/simulation_loop.do"
    multishell seed create seed_all , fill
    multishell run , threads(7) sleep(1000) network

multishell_client.do

    local google_drive "C:/Users//`c(username)'/Google ..."
    multishell path "`google_drive'/Code/simulation/temp"
    multishell exepath "C:/Program Files (x86)/Stata14/StataSE-64.exe"
    multishell adopath "`google_drive'/Code/ados/"
    multishell alttext "GOOGLE_FILE @ `google_drive'"
    multishell run client, threads(4) sleep(1000) network

Performance

Is it all worth it?

- Simulation from above repeated with a varying number of threads on an Intel Core i5-2450M with 4 cores, Windows 7 and Stata 14.2.

[Results table by number of threads (1, 2, 3, 4, 5); the timings are not preserved in this transcription.]

Limitations

(Sadly) there are some limitations:

- Only Windows is supported.
- multishell only speeds up loops or processing multiple do files. It does not improve the speed of Stata commands.
- If there are sync or speed problems with cloud services such as Google Backup and Sync, Dropbox, etc. or with the local network, multishell will slow down or stop. Read/write errors on a local network can have the same effect.
- If run on a mapped network drive, then the log files may be saved in My Documents or the Stata folder.
- No locals in loops are supported (such as foreach type in `one' `two' `three').
- All loops are dissected.
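One way to read the limitation on locals: the loop header should list its values literally. The sketch below contrasts the two forms. The values ols, fe and re are placeholders, and the claim that the literal form is dissected is an inference from the slides (foreach loops are dissected, locals in loops are not supported), not something tested here.

```stata
* Not supported according to the slide: the list is built from locals.
* foreach type in `one' `two' `three' { ... }

* Literal list written out in the loop header (placeholder values).
foreach type in ols fe re {
    display "variation: `type'"
}
```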

Conclusion

- multishell helps to speed up simulations or running multiple large do files.
- Parallel instances of Stata can be run on a single machine. The number depends on the number of cores.
- Computational power from multiple machines can be combined by mimicking a cluster.
- On SSC since July.
- Outlook
  - More robust for networks and fewer tempfiles.
  - Ordering the tasks better.
  - Allow to preserve loops.

References

Ditzen, J. (2017): "XTDCCE2: Stata module to estimate heterogeneous coefficient models using common correlated effects in a dynamic panel."
