Forecasting with moving averages
Robert Nau
Fuqua School of Business, Duke University
August 2014

1. Simple moving averages
2. Comparing measures of forecast error between models
3. Simple exponential smoothing
4. Linear exponential smoothing
5. A real example: housing starts revisited
6. Out-of-sample validation

1. SIMPLE MOVING AVERAGES

In previous classes we studied two of the simplest models for predicting a series from its own history: the mean model and the random walk model. These models represent two extremes as far as time series forecasting is concerned. The mean model assumes that the best predictor of what will happen tomorrow is the average of everything that has happened up until now. The random walk model assumes that the best predictor of what will happen tomorrow is what happened today, and all previous history can be ignored. Intuitively there is a spectrum of possibilities in between these two extremes. Why not take an average of what has happened in some window of the recent past? That's the concept of a "moving" average.

You will often encounter time series that appear to be "locally stationary" in the sense that they exhibit random variations around a local mean value that changes gradually over time in a non-systematic way. Here's an example of such a series and the forecasts that are produced for it by the mean model, yielding a root-mean-squared error (RMSE)¹ of 121:

[Figure: time sequence plot for X with constant-mean forecasts (constant mean = 455.074) and 50% limits, and residual autocorrelations for the mean model at lags 1 through 6.]

¹ The mean squared error (MSE) statistic that is reported in the output of various statistical procedures is the simple average of the squared errors, which is equal to the population variance of the errors plus the square of the mean error, and RMSE is its square root. RMSE is a good statistic to use for comparing models in which the mean error is not necessarily zero, because it penalizes bias (non-zero mean error) as well as variance. RMSE does not include any adjustment for the number of parameters in the model, but very simple time series models usually have at most one or two parameters, so this doesn't make much difference.

(c) 2014 by Robert Nau, all rights reserved. Main web site: people.duke.edu/~rnau/forecasting.htm

Here the local mean value displays a cyclical pattern. The (global) mean model doesn't pick this up, so it tends to overforecast for many consecutive periods and then underforecast for many consecutive periods. This tendency is revealed in statistical terms by the autocorrelation plot of the residuals (errors). We see a pattern of strong positive autocorrelation that gradually fades away, rather than a random pattern of insignificant values. In particular, the autocorrelations at lags 1 and 2 are both around 0.5, which is far outside the 95% limits for testing a significant departure from zero (the red bands). The 50% (not 95%) confidence limits for the forecasts are also shown on the time series plot, and they are clearly not realistic. If the model is obviously wrong in its assumptions, then neither its point forecasts nor its confidence limits can be taken seriously.

Now let's try fitting a random walk model instead. Here are the forecasts, 50% limits, and residual autocorrelations:

[Figure: time sequence plot for X with random walk forecasts and 50% limits, and residual autocorrelations for the random walk model at lags 1 through 6.]

At first glance this looks like a much better fit, but its RMSE is 122, about the same as the mean model. (122 is not "worse" than 121 in any practical sense. You shouldn't split hairs that finely.) If you look closer you will see that this model perfectly tracks each jump up or down, but it is always one period late in doing so. This is characteristic of the random walk model, and sometimes it is the best you can do (as in the case of asset prices), but here it seems to be over-responding to period-to-period changes and doing more zigging and zagging than it should. In the residual autocorrelation plot we see a highly significant "negative spike" at lag 1, indicating that the model tends to make a positive error following a negative error, and vice versa. This means the errors are not statistically independent, so there is more signal that could be extracted from the data. The 50% confidence limits for the forecasts are also shown, and as is typical of a random walk model they widen rapidly for forecasts more than 1 period ahead according to the square-root-of-time rule.² Here they are too wide: the series appears to have some "inertia" and does not change direction very quickly. Again, if the model assumptions are wrong, its confidence limits don't reflect the true uncertainty about the future.

It's intuitively plausible that a moving-average model might be superior to the mean model in adapting to the cyclical pattern and also superior to the random walk model in not being too sensitive to random shocks from one period to the next.

² 95% limits would be three times as wide and way off the chart!
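For readers who want to experiment, here is a minimal Python sketch of the two baseline models as one-step-ahead forecasters. It uses a made-up locally stationary series, so it will not reproduce the RMSE values of 121 and 122 from the example above; the series, its parameters, and the variable names are illustrative assumptions, not the document's data.

```python
# Compare the mean model and the random walk model as one-step-ahead forecasters
# on an illustrative locally stationary series (not the document's data).
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(100)
# Slow cyclical drift in the local mean, plus noise
y = 500 + 100 * np.sin(t / 8.0) + rng.normal(0, 60, size=t.size)

# Mean model: every forecast is the sample mean (as in a fitted constant-mean model).
mean_forecast = np.full(y.size, y.mean())

# Random walk model: the forecast for each period is simply the previous observation.
rw_forecast = np.empty(y.size)
rw_forecast[0] = y[0]          # no prior observation for the first period
rw_forecast[1:] = y[:-1]

def rmse(actual, forecast):
    """Root mean squared error: square root of the average squared error."""
    return np.sqrt(np.mean((actual - forecast) ** 2))

print("Mean model RMSE:  ", round(rmse(y, mean_forecast), 1))
print("Random walk RMSE: ", round(rmse(y, rw_forecast), 1))
```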

There are a number of different ways in which a moving average might be computed, but the most obvious is to take a simple average of the most recent m values, for some integer m. This is the so-called simple moving average model (SMA), and its equation for predicting the value of Y at time t+1 based on data up to time t is:

    Ŷ_{t+1} = (Y_t + Y_{t-1} + ... + Y_{t-m+1}) / m

The RW model is the special case in which m = 1. The SMA model has the following characteristic properties:

• Each of the past m observations gets a weight of 1/m in the averaging formula, so as m gets larger, each individual observation in the recent past receives less weight. This implies that larger values of m will filter out more of the period-to-period noise and yield a smoother-looking series of forecasts.

• The first term in the average is "1 period old" relative to the point in time for which the forecast is being calculated, the 2nd term is two periods old, and so on up to the mth term. Hence, the "average age" of the data in the forecast is (m+1)/2. This is the amount by which the forecasts will tend to lag behind in trying to follow trends or respond to turning points. For example, with m = 5, the average age is 3, so that is the number of periods by which forecasts will tend to lag behind what is happening now.

(A small code sketch of this forecasting rule appears below, after the footnote.)

In choosing the value of m, you are making a tradeoff between these two effects: filtering out more noise vs. being too slow to respond to trends and turning points. The following sequence of plots shows the forecasts, 50% limits, and residual autocorrelations of the SMA model for m = 3, 5, 9, and 19. The corresponding average age factors are 2, 3, 5, and 10. If you look very closely, you'll see that the forecasts of the models tend to lag behind the turning points in the data by exactly these amounts. Notice as well that the forecasts get much smoother-looking and the errors become more positively autocorrelated for higher values of m.³

[Figure: time sequence plot for X with 3-term SMA forecasts and 50% limits, and residual autocorrelations for the 3-term model.] With m = 3 the plot of SMA forecasts is quite choppy.

³ The oddball negative spike at lag 3 in the 3-term model is of no consequence unless we have some a priori reason to believe there is something special about a 3-period time lag. What we are concerned with here is whether there is significant autocorrelation at the first couple of lags and whether there is some kind of overall pattern in the autocorrelations. In any case, residual autocorrelations are not the bottom line, just a red flag that may wave to indicate that there may be a better model out there somewhere.
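A minimal sketch of the SMA forecasting rule and the "average age" calculation above, assuming the data arrive as a 1-D numpy array (the function name is mine, not the text's):

```python
import numpy as np

def sma_forecasts(y, m):
    """One-step-ahead SMA forecasts: forecasts[t] is the plain average of the m
    values before period t. Periods with fewer than m prior values get NaN."""
    y = np.asarray(y, dtype=float)
    forecasts = np.full(y.size, np.nan)
    for t in range(m, y.size):
        forecasts[t] = y[t - m:t].mean()
    return forecasts

# The average age of the data in each forecast is (m + 1) / 2, which is roughly
# how many periods the forecasts lag behind trends and turning points.
for m in (3, 5, 9, 19):
    print("m =", m, " average age =", (m + 1) / 2)
```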

[Figure: time sequence plot for X with 5-term SMA forecasts and 50% limits, and residual autocorrelations for the 5-term model.] With m = 5 it looks a little smoother.

[Figure: time sequence plot for X with 9-term SMA forecasts and 50% limits, and residual autocorrelations for the 9-term model.] With m = 9 the forecasts are even smoother but starting to lag behind turning points noticeably: the average age of data in the forecast is 5. The errors are also starting to be positively autocorrelated.

[Figure: time sequence plot for X with 19-term SMA forecasts and 50% limits, and residual autocorrelations for the 19-term model.] With m = 19 the forecasts have a nice smooth cyclical pattern, but they lag behind turning points by 10 periods, alas. There is now very significant positive autocorrelation in the errors, indicating long runs of consecutive errors with the same sign.
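The residual autocorrelation checks shown in these plots can also be computed by hand. A minimal sketch, assuming the one-step errors are in a numpy array and using a rough ±2/√n band in place of the exact 95% limits that Statgraphics draws:

```python
import numpy as np

def residual_autocorrelations(errors, max_lag=6):
    """Sample autocorrelations of the one-step forecast errors at lags 1..max_lag."""
    e = np.asarray(errors, dtype=float)
    e = e[~np.isnan(e)]                  # drop periods that had no forecast
    e = e - e.mean()
    denom = np.sum(e ** 2)
    return [float(np.sum(e[k:] * e[:-k]) / denom) for k in range(1, max_lag + 1)]

# Example use with the sma_forecasts() sketch above (y is an assumed data array):
# errors = y - sma_forecasts(y, m)
# acf = residual_autocorrelations(errors)
# band = 2 / np.sqrt(np.count_nonzero(~np.isnan(errors)))   # rough 95% limits
```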

2. COMPARING MEASURES OF FORECAST ERROR BETWEEN MODELS

What's the best value of m in the simple moving average model? A good value is one that yields small errors and which otherwise makes good sense in the decision-making environment in which it will be used. In the Forecasting procedure in Statgraphics there is a nifty (if I do say so myself) model-comparison report that lets you make side-by-side comparisons of error stats for 1-step-ahead forecasts for up to 5 different time series models, which could be SMA models with different values of m or different types of models altogether. Here is the model comparison table for the random walk model and the four SMA models shown above:

[Table: Model Comparison for data variable X (number of observations = 99, start index = 1.0, sampling interval = 1.0). Models: (A) random walk, (B) simple moving average of 3 terms, (C) simple moving average of 5 terms, (D) simple moving average of 9 terms, (E) simple moving average of 19 terms, each with its estimation-period error statistics. Models B, C, and D (m = 3, 5, 9) have about equally good error stats.]

The various error stats are as follows:

RMSE: root mean squared error⁴ (the most common standard of goodness-of-fit; it penalizes big errors relatively more than small errors because it squares them first, and it is approximately the standard deviation of the errors if the mean error is close to zero)

MAE: mean absolute error (the average of the absolute values of the errors, more tolerant of the occasional big error because errors are not squared)

MAPE: mean absolute percentage error (perhaps better to focus on if the data varies over a wide range due to compound growth or inflation or seasonality, in which case you may be more concerned about measuring errors in percentage terms)

ME: mean error (this indicates whether forecasts are biased high or low; it should be close to 0)

MPE: mean percentage error (ditto in percentage terms)

⁴ For a regression model, the RMSE is almost the same thing as the standard error of the regression; the only difference is the minor adjustment for the number of coefficients estimated. In calculating the RMSE of a forecasting model, the sum of squared errors is divided by the sample size, n, before the square root is taken. In calculating the standard error of the regression, the sum of squared errors is divided by n-p, where p is the number of coefficients estimated, including the constant. If n is large and p is small, the difference is negligible. So, focusing on RMSE as the bottom line is the same thing as focusing on the standard error of the regression as the bottom line.
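These five statistics are easy to compute directly from the errors. A minimal sketch (the function name is mine; MAPE and MPE assume strictly positive data, as in the example series):

```python
import numpy as np

def error_stats(actual, forecast):
    """RMSE, MAE, MAPE, ME, and MPE for one-step-ahead forecasts."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    keep = ~np.isnan(f)                     # ignore periods with no forecast
    e = a[keep] - f[keep]                   # forecast errors
    pe = 100.0 * e / a[keep]                # percentage errors
    return {
        "RMSE": np.sqrt(np.mean(e ** 2)),   # penalizes big errors more heavily
        "MAE":  np.mean(np.abs(e)),         # average absolute error
        "MAPE": np.mean(np.abs(pe)),        # average absolute percentage error
        "ME":   np.mean(e),                 # bias: should be close to zero
        "MPE":  np.mean(pe),                # bias in percentage terms
    }

# Side-by-side comparison for several SMA windows, using the earlier sketches
# (y is an assumed data array, sma_forecasts() is the SMA sketch above):
# for m in (3, 5, 9, 19):
#     print(m, error_stats(y, sma_forecasts(y, m)))
```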

Usually the best measure of the size of a typical error is the RMSE, provided that the errors are approximately normally distributed and that you worry about making a few big mistakes more than you worry about making a lot of little ones. At any rate, your software assumes that this is what you want to minimize, because it estimates the model parameters by the least-squares method.⁵ However, MAE and MAPE are easier for non-specialists to understand, so they might be useful numbers for a presentation. They are also less sensitive to the effects of big outliers and so might give a better estimate of the size of an "average" error when the distribution of errors is far from normal. Also, MAPE gives relatively more weight to accuracy in predicting small values because it is computed in percentage terms. ME and MPE are usually not very important, because bias (a non-zero value for the average error) is usually small when parameters are estimated by minimizing squared error.⁶ However, if there is a consistent upward or downward trend in the data, then models that do not include a trend component (like the SMA model) will have biased forecasts. Here the mean error is very slightly positive, because there is a very slight positive trend.

Models B, C, and D are very similar on all the error-size statistics, and their residual autocorrelation plots are OK (although a bit of positive autocorrelation is creeping in when you hit m = 9). Only models A and E are obviously bad in terms of error measures and residual autocorrelations. C is a little better than B or D in terms of RMSE, but the difference is hairsplitting. You don't need to go with the model whose RMSE is the absolute lowest in a case like this; you can exercise some discretion based on other considerations. In this case there are qualitative differences between these 3 models that perhaps also should be considered. Model B (m = 3) is the most responsive to the last few data points (which might be a good thing), but it also does the most zigging and zagging (which might be a bad thing). Model D (m = 9) is much more conservative in revising its forecasts from period to period, and model C (m = 5) strikes a balance between the two.

The SMA model can be easily customized in several ways to fine-tune its performance. If there is a consistent trend in the data, then the forecasts of any of the SMA models will be biased because they do not contain any trend component. The presence of a trend will tend to give an edge to models with lower values of m, regardless of the amount of noise that needs to be filtered out. You can fix this problem by simply adding a constant to the SMA forecasting equation, analogous to the drift term in the random-walk-with-drift model:

    Ŷ_{t+1} = (Y_t + Y_{t-1} + ... + Y_{t-m+1}) / m + d          (Simple Moving Average with Trend)

(A small code sketch of this adjustment appears after the footnotes below.)

⁵ The simple moving average model does not have any continuous-ranged parameters for Statgraphics to estimate for you, but you can do your own estimation on the basis of RMSE (or other criteria) by manual means, as done here.

⁶ ME is likely to be significantly different from zero only in cases where a log or deflation transformation has been used, with model parameters estimated by minimizing the squared error in transformed units. In Statgraphics, if you specify a log transformation or fixed-rate deflation transform as a model option inside the Forecasting procedure, this is what is done, and you shouldn't be surprised if you end up with some bias in the untransformed forecasts.
If you use a log transformation, then you are implicitly minimizing mean squared percentage error, so you should expect MAPE to be relatively lower and MPE to be relatively closer to zero than without a log transformation.
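A minimal sketch of the "SMA with trend" adjustment above. The text does not say how the constant d should be chosen; as an illustrative assumption, it is estimated here as the average period-to-period change in the data, by analogy with the random-walk-with-drift model:

```python
import numpy as np

def sma_with_trend_forecasts(y, m):
    """One-step-ahead SMA forecasts with a constant trend (drift) term added."""
    y = np.asarray(y, dtype=float)
    d = np.mean(np.diff(y))                    # assumed estimate of the drift constant
    forecasts = np.full(y.size, np.nan)
    for t in range(m, y.size):
        forecasts[t] = y[t - m:t].mean() + d   # SMA plus the constant trend term
    return forecasts
```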

Another way to fine-tune the SMA model is to use a tapered moving average rather than an equally-weighted moving average. For example, in the 5-term moving average model, you could choose to put only half as much weight on the newest and oldest values, like this:

    Ŷ_{t+1} = (½Y_t + Y_{t-1} + Y_{t-2} + Y_{t-3} + ½Y_{t-4}) / 4          (Tapered Moving Average, 5-term)

This average is centered 3 periods in the past, like the 5-term SMA model, but when an unusually large or small value is observed, it doesn't have as big an impact when it first arrives or when it is finally dropped out of the calculation, because its weight is ramped up or down over two periods. So, a tapered moving average is more robust to outliers in the data.
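A minimal sketch of this 5-term tapered moving average, with weights of 1/2, 1, 1, 1, 1/2 on the five most recent values, divided by 4 (their total):

```python
import numpy as np

def tapered_ma_forecasts(y):
    """One-step-ahead 5-term tapered moving average forecasts."""
    y = np.asarray(y, dtype=float)
    w = np.array([0.5, 1.0, 1.0, 1.0, 0.5])       # weights, oldest ... newest
    forecasts = np.full(y.size, np.nan)
    for t in range(5, y.size):
        forecasts[t] = np.dot(w, y[t - 5:t]) / w.sum()
    return forecasts
```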

3. SIMPLE EXPONENTIAL SMOOTHING

The SMA model is an easy-to-understand method for estimating the local mean value around which a time series is thought to be randomly varying, but putting equal weight on the last m observations and no weight on any previous observations is usually not the best way to average values that are arriving consecutively in time. Intuitively, all the past values have some relevance, but each newer one is more relevant than older ones for predicting what is going to happen next. It would make more sense to gradually decrease the weights placed on the older values. And for the same reason, we should expect forecasts to be less accurate (and therefore to have wider confidence intervals) the farther into the future they are extended. The SMA model does not reflect this. It lacks any underlying theory (a "stochastic equation of motion") to explain why or by how much it should be harder to predict 2 or 3 periods ahead than to predict 1 period ahead, and therefore, very implausibly, its confidence intervals for long-horizon forecasts do not widen at all.

These shortcomings of the SMA model are addressed by the simple exponential smoothing model (SES), which is otherwise known as the exponentially weighted moving average model, because it weights the past data in an exponentially decreasing manner, analogous to the discounting of cash flows over time. The SES model is the most widely used time series model in business applications, partly because it does a good job of forecasting under a wide range of conditions and partly because computationally it is extremely simple. You can easily forecast 10,000 different things in parallel using this model. The latter property isn't quite as important as it once was, given the size and speed of modern computers, but it has contributed to the popularity of this model over the last 50 years.

There are different ways in which you can write the forecasting equation for the SES model. You don't need to memorize them, but it is at least worth seeing them in order to appreciate the intuitive appeal of this model. One way to write the model is to define a series L that represents the current level (i.e., local mean value) of the series as estimated from data up to the present. The value of L at time t is computed recursively (i.e., from its own previous value) like this:

    L_t = αY_t + (1-α)L_{t-1}

where α is a "smoothing constant" that is between 0 and 1. Thus, the estimated level at time t is computed by interpolating between the just-observed value and the previous estimated level, with weights of α and 1-α, respectively. This seems like an intuitively reasonable way to use the latest information to update the estimate of the current level. The series L gets smoother as α approaches zero, because it doesn't change as fast with each new observation of Y. The model assumes that the series has no trend, so it predicts zero change in the level from one period to the next. Given this assumption, the forecast for period t+1 is simply the estimated level of the series at time t:

    Ŷ_{t+1} = L_t

The only reason for defining the separate series L here is to emphasize that what we are doing is estimating a local mean before turning around and using it as a forecast for the next period. (In more general versions of the model, we will also estimate a local trend.) It is equivalent to just say that the next forecast is computed by interpolating between the last observed value and the forecast that had been made for it:

    Ŷ_{t+1} = αY_t + (1-α)Ŷ_t          (Simple Exponential Smoothing, version 1)

Written in this way, it is clear that the random walk model is an SES model with α = 1, and the constant-forecast model (of which the mean model is a special case) is an SES model with α = 0. Hence the SES model is an interpolation between the mean model and the random walk model with respect to the way it responds to new data. As such it might be expected to share some of the advantages of both.
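A minimal sketch of the SES recursion above. The text does not specify how the level should be initialized; starting it at the first observation is one common choice and is an assumption here:

```python
import numpy as np

def ses_forecasts(y, alpha):
    """One-step-ahead simple exponential smoothing forecasts:
    L_t = alpha*Y_t + (1 - alpha)*L_{t-1}, and the forecast for t+1 is L_t."""
    y = np.asarray(y, dtype=float)
    forecasts = np.full(y.size, np.nan)
    level = y[0]                                   # assumed initialization of L
    for t in range(1, y.size):
        forecasts[t] = level                       # forecast for period t is L_{t-1}
        level = alpha * y[t] + (1 - alpha) * level # update the level with Y_t
    return forecasts

# alpha = 1 reproduces the random walk forecasts (each forecast is the last
# observed value); alpha = 0 keeps the forecasts fixed at the initial level,
# mirroring the constant-forecast special case described above.
```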
