The Difference Between AR(1) and Lagged Dependent Terms
Recently, a utilities regulator tried replicating a client’s regression model used for forecasting customer growth. The regulator wished to build the regression model from scratch, and so the client explained what variables were used, including an autocorrelation correction term of order 1. The regulator then attempted to estimate the same coefficients on each of the variables, but kept getting different numbers. As it turned out, the regulator had used a lagged dependent variable instead of an AR(1). Once we discovered this, it put a lot of minds at ease, and we thought it would be a good topic to address in our blog.
So what exactly is the difference between using an AR(1) term versus a lagged dependent variable?

To implement an AR(1) model in MetrixND, there is a checkbox for activating ARMA Errors in the lower left-hand corner of the Regression and Neural Network objects. To correct for first-order autocorrelation, you check the ARMA Errors box and set the value for P equal to 1. With a single X variable, the resulting model is as follows:

(1)  Y(t) = a + b×X(t) + e(t)
     e(t) = r×e(t-1) + u(t)

To implement a lagged dependent, you go to the X Variables list and do a Right Click>Insert LagDep operation. With a single X variable, the resulting model is:

(2)  Y(t) = a + b×X(t) + c×Y(t-1) + u(t)

In both cases there are two parameters to be estimated in addition to the intercept: b and r in the AR(1) case, and b and c in the LagDep case. The two equations in the AR(1) specification can be combined to give the following:

(3)  Y(t) = a×(1 - r) + b×X(t) - r×b×X(t-1) + r×Y(t-1) + u(t)

The first thing to notice about equation (3) is that it is nonlinear in the parameters. The third term on the right-hand side has the lagged value of the explanatory variable, X, multiplied by both b and r. This type of nonlinearity requires a nonlinear optimization approach, which MetrixND handles automatically using nonlinear estimation with the conditional sum of squares approach.
In contrast, the lagged dependent model in equation (2) can be estimated directly using ordinary least squares. In the presence of a lagged dependent, the Durbin-H Statistic is used in place of the Durbin-Watson to test for first-order autocorrelation.
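To make the distinction concrete, here is a minimal sketch of the two specifications fit to simulated monthly data. This is Python with statsmodels, not MetrixND, and the data, variable names, and coefficient values are made up for the example; statsmodels also estimates the AR(1)-error model by maximum likelihood rather than the conditional sum of squares approach, but the parameters map onto equations (1) through (3) above.

```python
# Illustrative only -- Python/statsmodels, not MetrixND; hypothetical data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 120                                                   # 10 years of monthly data
households = 1000 + np.cumsum(rng.normal(2.0, 1.0, n))    # hypothetical driver X

errors = np.zeros(n)
for t in range(1, n):                                     # AR(1) errors with r = 0.8
    errors[t] = 0.8 * errors[t - 1] + rng.normal(0.0, 5.0)
customers = 50 + 0.9 * households + errors                # structural model, equation (1)

df = pd.DataFrame({"customers": customers, "households": households})

# AR(1)-error model: Y(t) = a + b*X(t) + e(t), e(t) = r*e(t-1) + u(t).
# SARIMAX with order=(1, 0, 0) treats the regression errors as AR(1) and
# estimates b and r jointly by a nonlinear (maximum likelihood) method.
ar1 = sm.tsa.SARIMAX(df["customers"],
                     exog=sm.add_constant(df["households"]),
                     order=(1, 0, 0)).fit(disp=False)

# Lagged dependent model: Y(t) = a + b*X(t) + c*Y(t-1) + u(t), plain OLS.
df["customers_lag1"] = df["customers"].shift(1)
lagdep = sm.OLS(df["customers"],
                sm.add_constant(df[["households", "customers_lag1"]]),
                missing="drop").fit()

print(ar1.params)     # intercept, slope b, AR(1) coefficient r, error variance
print(lagdep.params)  # intercept, slope b, lagged-dependent coefficient c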
The second thing to notice is that there is a Y(t-1) on the right-hand side of equation (3), but it is combined with an X(t-1). To understand this, set both the lag coefficient c in equation (2) and the autoregressive parameter r in equation (3) to 0.99 and compare. Equation (2) says to add 99% of the lagged value of Y, which will be a large number. Equation (3) says to add 99% of the lagged structural-model residual, Y(t-1) - a - b×X(t-1), which will be a small number. Obviously, these are very different equations.
This difference comes home when we look at the behavior of the models in the forecast period, after we run out of actual Y values. First, think about estimating equation (1) without the AR(1) adjustment. The estimated slope coefficient is an unbiased estimate of the true slope, but you are likely to see strong autocorrelation in the model errors, because the time profile of the driving variable (say, households) has somewhat different cycles than monthly customers. The residuals may be relatively small, but the residual pattern will show runs of positive values followed by runs of negative values. This is strong positive autocorrelation, indicated by a Durbin-Watson Statistic far below the neutral value of 2.0.
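That diagnostic is easy to check directly. The snippet below continues the hypothetical statsmodels example above: it fits equation (1) by plain OLS, with no autocorrelation correction, and prints the Durbin-Watson statistic.

```python
# Continuing the hypothetical example: equation (1) by plain OLS, no correction.
from statsmodels.stats.stattools import durbin_watson

static = sm.OLS(df["customers"], sm.add_constant(df["households"])).fit()

# Runs of positive residuals followed by runs of negative residuals push the
# statistic well below the neutral value of 2.0.
print("Durbin-Watson:", durbin_watson(static.resid))
```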
It can be argued that the inclusion of an AR(1) term provides an improved estimate of the model slopes and more reliable standard errors and t-statistics for those slopes. In the forecast period, the influence of the AR(1) process will die out geometrically. But the key thing is that changes in X will pass through immediately to changes in Y when they occur.
The behavior of the lagged dependent model is entirely different. This is a dynamic model with initial effects and feedback effects. The immediate impact of a change in X comes through the slope coefficient b. In following periods, the feedback effects gradually work themselves out through the lagged dependent variable, and these effects are of size b×c, b×c², b×c³, … So the ultimate change in Y caused by a 1 unit change in X is b × (1 + c + c² + c³ + …) = b/(1 - c). For a customer model, the coefficient on the lagged term is likely to be large (as in 0.99). In this case, the ultimate effect is 100 times the first month effect.
So that is the biggest difference. With the AR(1) model, if households jump up by 10%, customers will increase by 10% with the same timing if the elasticity is close to 1.0. With the lagged dependent model having a lag parameter of 0.99 and a long-run elasticity close to 1.0, a 10% increase in households would cause a 0.1% increase in the first month. After 5 years, the result would be about a 4.5% increase, and after 10 years the result would be about a 7% increase. The long-run elasticity is close to 1.0, but it takes a long time for the feedback effects to work out.
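The arithmetic behind those numbers is just the geometric series above: with a monthly lag coefficient c = 0.99 and a long-run elasticity of 1.0, the cumulative pass-through of the shock after n months is 1 - 0.99^n. A quick check, reusing nothing but those two assumptions:

```python
# Pass-through of a permanent 10% jump in households under the lagged dependent
# model, with monthly lag coefficient c = 0.99 and long-run elasticity of 1.0.
c = 0.99
shock = 10.0                      # percent change in households

for months in (1, 60, 120):       # first month, 5 years, 10 years
    effect = shock * (1 - c ** months)
    print(f"after {months:3d} months: about a {effect:.1f}% increase in customers")

# Long-run effect relative to the first-month effect: b/(1-c) = 100*b here.
print("long-run multiplier:", 1 / (1 - c))
```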
A final warning for the AR(1) model: If inclusion of the AR(1) term causes a large change in the slope coefficient and the AR(1) coefficient is close to 1.0, we would recommend switching to a moving average process (say an MA(6) or an MA(12)) instead. This will behave much the same as the AR(1) model, but will preserve the structural model elasticity.
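For completeness, the analogous switch in the statsmodels sketch above would simply change the error specification from AR(1) to a moving average; the order argument below is the statsmodels way of expressing an MA(12) error process and is only an illustration of the idea, not the MetrixND workflow.

```python
# Continuing the statsmodels sketch: same structural regression, but with the
# errors modeled as a moving average, MA(12), instead of AR(1).
ma12 = sm.tsa.SARIMAX(df["customers"],
                      exog=sm.add_constant(df["households"]),
                      order=(0, 0, 12)).fit(disp=False)
print(ma12.params[["const", "households"]])  # structural intercept and slope
```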