FRED-MD: A Monthly Database for Macroeconomic Research


Michael W. McCracken    Serena Ng†

December 20, 2014

Abstract

This paper presents and describes a large, monthly frequency, macroeconomic database with the goal of establishing a convenient starting point for empirical analysis that requires "big data." The dataset mimics the coverage of those already used in the literature but has three appealing features. First, it is designed to be updated in real time using the FRED database. Second, it will be publicly accessible, facilitating the replication of empirical work. Third, it will relieve researchers of the burden of dealing with data changes and revisions; this task will be handled by the data desk at the Federal Reserve Bank of St. Louis. We show that factors extracted from our dataset share the same predictive content as those based on the various vintages of the so-called Stock-Watson data. In addition, we suggest that diffusion indexes constructed as the partial sum of the factor estimates can potentially be useful for the study of business cycle chronology.

JEL Classification: C30, C33, G11, G12.
Keywords: diffusion index, forecasting, big data, factors.

Research Division, Federal Reserve Bank of St. Louis, P.O. Box 442, St. Louis, MO. †Department of Economics, Columbia University, 420 W. 118 St., Room 1117, New York, NY 10025; Serena.Ng at Columbia.edu. We thank Kenichi Shimizu and Joseph McGillicuddy for excellent research assistance. Financial support to the second author is provided by the National Science Foundation, SES-0962431. The views expressed here are those of the individual authors and do not necessarily reflect official positions of the Federal Reserve Bank of St. Louis, the Federal Reserve System, or the Board of Governors.

1 Introduction

A new trend in research is to make use of data that two decades ago were either not available or too computationally costly to use. This is true not just in medical science and engineering research, but also in many disciplines of social science. Economic research is no exception. Instead of working with T time series observations of N variables, where T is large and N is quite small, macroeconomic policy and forecasting can now consider many more variables without compromising information in the time series dimension. When we work with datasets that have large N and large T, we are in what Bernanke and Boivin (2003) referred to as a data rich environment. Of course, more data is not always desirable unless the data are informative about the economic variables that we seek to explain. As such, assembling a good database is an important part of economic research. However, not only is the process time consuming, it often involves judgment on details with which academic researchers have little expertise. The task can be overwhelming when N is large.

Over the course of the year, we have worked with the FRED data desk at the Federal Reserve Bank of St. Louis to develop FRED-MD, a macroeconomic database of 135 monthly U.S. indicators. The data will be updated in a timely manner and can be downloaded for free. We hope that easy access to the data will stimulate more research that exploits the data rich environment. Working with a more or less standard database should also facilitate replication and comparison of results. This paper provides background information about FRED-MD.

To better understand the motivation of this project, it is useful to give some history of big data analysis in macroeconomic research. The first personalized U.S. macroeconomic database appears to be the one compiled by Stock and Watson (1996) for analyzing parameter instability over the sample 1959:1-1993:12. Their data collection was guided by four considerations:

    First, the sample should include the main monthly aggregates and coincident indicators. Second, the data should include important leading economic indicators. Third, the data should represent a broad class of variables with differing time series properties. Fourth, the data should have consistent historical definitions or, when the definitions are inconsistent, it should be possible to adjust the series with a simple additive or multiplicative splice. [Stock and Watson (1996), p. 12]

Using these criteria, Stock and Watson collected 76 series, mostly drawn from CITIBASE. The data included industrial production, weekly hours, personal income, inventories, monetary aggregates, interest rates and interest-rate spreads, stock prices, and consumer expectations.

The data were then classified into 8 categories: output and sales, employment, orders, inventories, prices, interest rates, exchange rates, government spending/taxes, and miscellaneous leading indicators. This dataset was expanded in Stock and Watson (1998, 2002) to include 215 series, subsequently classified into 14 categories. In this iteration, the data were taken from the DRI/McGraw Hill database. Although over 200 series were collected, the statistical analysis was based on a balanced panel of 149 series. The exercise consists of compressing the information in the 149 series into a handful of factors, and then using the factor estimates as predictors. This methodology has come to be known as 'diffusion index forecasting'. Marcellino et al. (2006) analyzed 171 series for the sample 1959:1-2002:12 to assess different implementations of diffusion index forecasting.

In an influential paper, Bernanke and Boivin (2003) considered the use of big data in monetary policy analysis.[1] This marked the beginning of using big data not just for forecasting, but also in structural macroeconomic modeling. Bernanke et al. (2005) used 120 series to estimate a factor-augmented vector autoregression (FAVAR). Boivin and Giannoni (2006) considered estimation of DSGE models using 91 variables and interpreted measurement error as the difference between the data and the model concepts. Data for these exercises were taken from the DRI database.

Up to this point, more data were collected than used in analysis because some of the series were available only from 1967:01. The next phase of this work focused primarily on balanced panels. Stock and Watson (2005, 2006) constructed data for 132 macroeconomic time series over the sample 1959:01-2003:12. The data, used to estimate structural FAVARs, were organized into 14 categories: real output and income; employment and hours; real retail, manufacturing and trade sales; consumption; housing starts and sales; real inventories; orders; stock prices; exchange rates; interest rates and spreads; money and credit quantity aggregates; price indexes; average hourly earnings; and miscellaneous. The data were drawn primarily from the Global Insights Basic Economics Database (GSI), with a few series from the Conference Board and a few series based on the authors' calculations. This database of 132 series is sometimes referred to as the "Stock-Watson dataset" in the research community. Bai and Ng (2008) used the data to compare diffusion index forecasting with predictors selected by hard thresholding.

Ludvigson and Ng (2011) updated the Stock-Watson data to 2007:12 and more broadly classified the data into 8 groups: output and income; labor market; housing; consumption, orders and inventories; money and credit; interest rates and exchange rates; prices; and the stock market. Factors estimated using the entire dataset were compared with an alternative estimator that takes advantage of the structure of the eight blocks.

[1] They used three datasets to assess the robustness of their results. The first combined real-time data based on Stark and Croushore (2001). The second was a version of the first but with revised data. The third used the 215 variables used in Stock and Watson (1998).

The data were again updated in Jurado et al. (2013) to 2011:12 and merged with 147 monthly financial time series to construct an index of macroeconomic uncertainty. The database has since been updated to 2013:05. Hereafter, we distinguish the vintages of GSI data by the end of sample. The 2003 vintage is the original data used in Stock and Watson (2005), and the 2011 vintage is the data used in Jurado et al. (2013).

Many researchers have collected larger or smaller datasets, but the coverage of the data is quite similar to the original Stock-Watson data. This is not surprising because most of the data come from the statistical agencies. Whether a database has more or fewer data series depends on the desired level of disaggregation. For example, Stock and Watson (2014a) collected 270 disaggregated monthly series for the sample 1959:01-2010:08 to estimate turning points. For macroeconomic forecasting, most analyses use between 100 and 150 series.

2 FRED-MD

If the same variables were reported year after year, the data updating exercise would be straightforward. Assuming one has access to GSI, one would download the data and run a few programs. A dataset satisfying the first three criteria outlined in Stock and Watson (1996) should then be available. But the process is more involved in practice. The main difficulty is almost entirely due to changing definitions and data availability. Even with careful selection of variables that meet the fourth criterion of Stock and Watson (1996), researchers often have to deal with data revisions that took place for one reason or another. As an example, an oil price variable is widely used in empirical work. Yet the OILPRICE series in FRED, which had existed since 1946:1, has recently been discontinued. In its place is a WTI series that starts only in 1986:1. If one were to analyze 50 years of monthly data, one cannot avoid having to meld or splice data from different sources, which is what makes the data updating process difficult.
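To make the splicing idea concrete, here is a minimal sketch of a multiplicative splice in Python. It is not from the paper; the series values, the overlap date, and the convention of rescaling the old series to match the new one at the first common observation are all illustrative assumptions.

    import pandas as pd

    def splice(old: pd.Series, new: pd.Series) -> pd.Series:
        """Multiplicatively splice a discontinued series onto its replacement.

        The old series is rescaled so the two agree at the first date where
        both are observed; the new series is used from that date onward.
        """
        overlap = old.index.intersection(new.index)
        if len(overlap) == 0:
            raise ValueError("series do not overlap; cannot splice")
        t0 = overlap[0]
        scale = new[t0] / old[t0]  # ratio at the first common date
        return pd.concat([old[old.index < t0] * scale,
                          new[new.index >= t0]]).sort_index()

    # Hypothetical numbers in the spirit of extending the discontinued
    # OILPRICE series (from 1946:1) with the WTI series (from 1986:1).
    old = pd.Series([3.0, 3.1, 3.2],
                    index=pd.period_range("1985-11", periods=3, freq="M"))
    new = pd.Series([3.3, 3.4],
                    index=pd.period_range("1986-01", periods=2, freq="M"))
    print(splice(old, new))

An additive splice, the other option mentioned in the Stock and Watson (1996) criteria, would instead shift the old series by the level gap new[t0] - old[t0] rather than rescaling it.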

To get a sense of the problems involved, consider the process of updating the data from the vintage which ended in 2011:12 to 2013:12. Based on the mnemonics of the 2011 data used in Jurado et al. (2013), we started by retrieving from GSI the same data but for the extended sample. It was found that some series had changed names, so the first task was to locate the variables under their new names. Then quarterly implicit price deflators from the NIPA tables and monthly nominal consumption from the BLS were used to construct real monthly consumption. Next, we gathered data for business loans from FRED, the nominal effective exchange rates from the IMF, and the Michigan index of consumer sentiment from the Institute of Survey Research, and merged the GSI help wanted index with the calculations of Barnichon (2010). This completed the data collection exercise. The next step was to compare the new and old data over the overlapping sample to check for irregularities. It was found that the housing series in the 2014 dataset starts at a later date, orders and inventories have a new chain base, the exchange rate variables have been revised because of changes in trade weights, and several other series have gone through minor data revisions. To deal with such problems, replacing non-existing data by close substitutes or splicing seems routine. It is difficult, if not impossible, to automate the process, as judgment is involved. Two researchers starting with the same raw data can end up using different data for analysis.

One advantage of taking the data from GSI is that it is 'one-stop shopping': over 100 series can be retrieved from one source, albeit with missing values for some variables. But the data are available only on a subscription basis; researchers without access will have to look to alternatives, which inevitably involve multiple sources. There is also a catch to using the GSI data. The licensing agreement understandably prohibits redistribution of the data. Yet it is increasingly common for journals to require that the data used in empirical work be posted. Authors are often at a loss as to what can and cannot be posted.

FRED-MD seeks to make available a database with three objectives in mind. First, it will be publicly available, so that US and international researchers alike have access to the same data satisfying the four criteria established in Stock and Watson (1996). Second, it will be updated on a timely basis. Third, it will relieve researchers of the burden of handling data changes and revisions. With these objectives in mind, we collect 135 monthly series with coverage similar to the original Stock-Watson data. A full list of the data is given in Appendix I, along with the comparable series in the GSI database. The suggested data transformation for each series is given in the column under tcode. As of the writing of this paper, the latest vintage is 2014:10. While we provide a csv file with data for this sample, FRED-MD is not a balanced panel, for a number of reasons:

(1) The S&P PE ratio (series 84) is taken from Shiller's website and is released with roughly a 6-month lag. Hence observations are missing at the end of the sample.

(2) The Michigan Survey of Consumer Sentiment (series 131) is available only quarterly prior to 1977:11, and recent data are available in FRED only with a 1-year lag.

(3) The trade-weighted exchange rate (series 102) is available in FRED only from 1973:1, and we have not found other documented sources with which to splice the series.

(4) Seasonally adjusted housing permits (series 55-59) only begin in 1960:01.

(5) Currently, FRED primarily holds NAICS data (though some older SIC data exist and are used whenever possible) from the Census Manufacturers Survey, and hence a few Value of Manufacturers' Orders components, such as Nondefense Capital Goods (series 66) and especially Consumer Goods (series 64), have a limited history.

Of course, the dataset can easily be turned into a balanced panel by removing the series involved. In MATLAB, these series can be identified by checking whether the mean over the full sample is a NaN; a sketch of this check appears below. We have not made outlier adjustments to the data. To be consistent with the previous GSI data used in empirical work, we start the data in 1959:01. In the first vintage of FRED-MD, with the sample ending in 2014:08, the balanced panel has 122 series. A balanced panel consisting of 128 series can be constructed if the sample terminates in 2014:05.
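The following is a minimal Python sketch of loading a FRED-MD style csv, applying the tcode transformations, and identifying the series that break the balanced panel. It is not the authors' code: the file name is a placeholder, and the layout (a header row, a row of transformation codes, then one row per month) and the tcode mapping follow the usual FRED-MD conventions but should be verified against the appendix.

    import numpy as np
    import pandas as pd

    def transform(x: pd.Series, tcode: int) -> pd.Series:
        """Apply a FRED-MD style tcode transformation to one series."""
        if tcode == 1: return x                          # level
        if tcode == 2: return x.diff()                   # first difference
        if tcode == 3: return x.diff().diff()            # second difference
        if tcode == 4: return np.log(x)                  # log
        if tcode == 5: return np.log(x).diff()           # log first difference
        if tcode == 6: return np.log(x).diff().diff()    # log second difference
        if tcode == 7: return x.pct_change().diff()      # diff of percent change
        raise ValueError(f"unknown tcode {tcode}")

    raw = pd.read_csv("2014-10.csv", index_col=0)        # placeholder file name
    tcodes = raw.iloc[0].astype(int)                     # row of transformation codes
    levels = raw.iloc[1:].astype(float)
    data = pd.DataFrame({c: transform(levels[c], tcodes[c])
                         for c in levels.columns})

    # Drop the first two rows lost to differencing, then flag any series with
    # remaining NaNs -- the analogue of checking whether the full-sample mean
    # is NaN in MATLAB.
    interior = data.iloc[2:]
    unbalanced = interior.columns[interior.isna().any()]
    balanced = interior.drop(columns=list(unbalanced))
    print(f"{len(unbalanced)} series removed; {balanced.shape[1]} remain")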

In addition to data revisions and definitional changes, going from GSI to FRED necessitates finding close substitutes to replace the proprietary variables constructed by GSI. A major appeal of FRED-MD is that this task is left to the data experts. In the first vintage of FRED-MD, 21 out of 135 series require some adjustments to the raw data available in FRED. We tag these variables with an "x" to indicate that they have been adjusted and thus differ from the series at source. A summary of the adjustments is as follows:

Variable                      Adjustments
Real Manu. and Trade Sales    (i) adjust M0602BUSM144NNBR for inflation using PCEPI; (ii) seasonally adjust with X-12-ARIMA; (iii) splice with NAICS series CMRMTSPL
Retail/Food Sales             splice SIC series RETAIL with NAICS series RSAFS
IP: Resid. Utilities          FRB series IP.B51222.S
Capacity Utilization          FRB series CAPUTL.B00004.S
Help Wanted                   from Barnichon (2010)
Help Wanted to Unemployed     HWI/UNEMPLOY
Initial Claims                splice monthly series M08297USM548NNBR with weekly ICNSA
New Orders (Durables)         splice SIC series AMDMNO and NAICS series DGORDER
New Orders (Nondefense)       splice SIC series ANDENO and NAICS series ANDENO
Unfilled Orders (Durables)    splice SIC series AMDMUO and NAICS series AMDMUO
Business Inventories          splice SIC series and NAICS series BUSINV
Inventory to Sales            splice SIC series and NAICS series ISRATIO
Consumer Credit to P.I.       NONREVSL to PI
3-Month Comm. Paper           splice M13002US35620M156NNBR, CP3M with CPF3M
3-Month CP minus FF           splice CP3M minus FEDFUNDS
Switzerland/US FX             filled back to 1959 from Banking and Monetary Statistics
Japan/US FX                   filled back to 1959 from Banking and Monetary Statistics
UK/US FX                      filled back to 1959 from Banking and Monetary Statistics
Cdn/US FX                     filled back to 1959 from Banking and Monetary Statistics
Crude Oil                     splice OILPRICE with MCOILWTICO
Consumer Sentiment            splice UMCSENT1 with UMCSENT

Some comments on these adjustments are in order. To replace the GSI data for the manufacturing and trade series, we have to deal with the fact that data for orders, sales, and inventories are available from FRED starting in 1992, when the Standard Industrial Classification (SIC) was changed to the North American Industry Classification System (NAICS). These series in FRED-MD have been spliced with the SIC historical data when available from the Census. Consumer credit outstanding in GSI is replaced by non-revolving consumer credit. The exchange rate data in FRED start from 1971. The 3-month commercial paper rate series has been discontinued since 1997:08, though a 3-month financial commercial paper rate series has existed since 1997:01. The FRED-MD data splice these series with historical data from the Banking and Monetary Statistics produced by the Federal Reserve Board of Governors and obtained from FRASER. The West Texas oil price, which was discontinued in 2013:07, is spliced with a West Texas-Oklahoma series available since 1986:01. We note that some of these adjusted series are of independent interest even if the entire database is not.

Going forward, the FRED-MD data will come in one (csv) file available for download. The series listed in the Appendix are the core of FRED-MD, but it is likely that some series will eventually be retired and new ones will gradually be added. The help-wanted column of newspapers is no longer as good a measure of labor market slack as it once was, as job-search websites like monster.com have become more popular. At the moment, there is not enough data to build a HWI series based on internet data alone, but it should eventually be possible to splice the old help wanted index with one that better reflects the modern economy. This work will be handled by the experts at the data desk at FRED.

3 Factor Estimates

A primary use of big macro datasets is diffusion index forecasting and the FAVAR, which augments an otherwise standard vector autoregression with factors estimated from the big panel of data. This methodology has been found to produce superior forecasts over competing methods, especially those based on a small set of predictors. The factors serve the purpose of dimension reduction. In a large N and large T setting, the space spanned by the latent factors can be consistently estimated by static or dynamic principal components.[2]

[2] See Forni et al. (2000, 2005), Boivin and Ng (2005), Bai and Ng (2008), Stock and Watson (2006).
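For concreteness, a diffusion index forecast in the Stock-Watson tradition has roughly the form below, where f̂_t denotes the estimated factors; this equation is not in the original text, and the specific lag structure is an illustrative assumption.

    % h-step-ahead diffusion index forecast: a handful of estimated factors
    % stand in for the large panel of individual predictors.
    \hat{y}_{t+h} = \hat{\alpha} + \hat{\beta}'\hat{f}_t
                    + \sum_{j=0}^{p} \hat{\gamma}_j \, y_{t-j}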

We begin by examining the properties of the factors estimated from the vintage of FRED-MD that spans 1959:01 to 2014:08. After transforming the data, our estimation is based on the sample 1960:03-2014:08, for a total of T = 655 observations. As mentioned earlier, a few series have missing observations at the beginning or the end of the sample. We estimate the static factors by PCA adapted to allow for missing values; it is essentially the EM algorithm given in Stock and Watson (2002). In brief, observations that are missing are initialized to the unconditional mean based on the non-missing values (which is zero, since the data are demeaned and standardized) so that the panel is re-balanced. An r × 1 vector of factors f_t and an N × r matrix of loadings λ are estimated from this panel using the normalization λ′λ/N = I_r. The missing value for series i at time t is then updated from zero to λ̂_i′ f̂_t. This is multiplied by the standard deviation of the series, and the mean is re-added back. Treating the resulting value as an observation for series i and time t, the mean and variance of the complete sample are re-calculated. The data are demeaned and standardized again, and the factors and loadings are re-estimated from the updated panel. The iteration stops when the factor estimates do not change.[3] After the factors are estimated, we regress the i-th series in the dataset on a set of r (orthogonal) factors. For k = 1, …, r, this yields R²_i(k) for series i.
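A minimal Python sketch of the EM-style PCA just described is given below. It is not the authors' code: the panel X is assumed to be a T × N array with NaNs marking missing values, and r, the tolerance, and the iteration cap are illustrative choices.

    import numpy as np

    def em_pca(X: np.ndarray, r: int, tol: float = 1e-8, max_iter: int = 500):
        """EM-style PCA for a T x N panel with missing values (NaNs).

        Standardize using the observed entries, set missing entries to zero
        (the mean of standardized data), estimate factors and loadings by
        principal components, refill the missing entries with their
        common-component fit, re-standardize, and iterate to convergence.
        """
        T, N = X.shape
        miss = np.isnan(X)
        Xf = X.copy()
        mu, sd = np.nanmean(Xf, axis=0), np.nanstd(Xf, axis=0)
        Z = (Xf - mu) / sd
        Z[miss] = 0.0                  # missing values set to the mean (zero)
        prev = None
        for _ in range(max_iter):
            # Loadings: top-r eigenvectors of Z'Z, scaled so lam'lam/N = I_r;
            # the factors then follow as F = Z lam / N.
            _, eigvec = np.linalg.eigh(Z.T @ Z)
            lam = np.sqrt(N) * eigvec[:, -r:]          # N x r loadings
            F = Z @ lam / N                            # T x r factors
            # Refill missing entries with the common component, in original units.
            Xf[miss] = (F @ lam.T * sd + mu)[miss]
            # Re-standardize the completed panel and check convergence.
            mu, sd = Xf.mean(axis=0), Xf.std(axis=0)
            Z = (Xf - mu) / sd
            if prev is not None and np.max(np.abs(F - prev)) < tol:
                break
            prev = F
        return F, lam

    def r2_by_factor(Z: np.ndarray, F: np.ndarray) -> np.ndarray:
        """(i, k-1) entry: R^2 from regressing series i on the first k factors."""
        out = np.zeros((Z.shape[1], F.shape[1]))
        for k in range(1, F.shape[1] + 1):
            beta, *_ = np.linalg.lstsq(F[:, :k], Z, rcond=None)
            resid = Z - F[:, :k] @ beta
            out[:, k - 1] = 1 - resid.var(axis=0) / Z.var(axis=0)
        return out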
