Much business forecasting relies on classical time series methods that are based solely on past history. Methods such as exponential smoothing and Box-Jenkins are good at extracting trends and seasonal patterns and projecting them forward. However, the future of a business measure (sales, demand, etc.) cannot rest solely on its past pattern; external factors always influence its behavior. In addition to the historical data, we can use relationships between what we are trying to forecast (e.g. sales) and explanatory variables such as prices, measures of the economy (GDP growth and inflation), promotional schedules and market opportunity.
Unlike in an ordinary regression, the sequence of measurements of a process over time in a time series model introduces autocorrelation between the measured values. This serial dependence must be addressed by carefully examining the lag structure of the model.
Regression differs from pure time series methods in that it allows explanatory variables to be introduced. The model can therefore not only generate forecasts but also provide insight into the relationships between variables. It also supports what-if scenarios: what if we lower the price, or what if we keep it constant? By varying the scenarios for the explanatory variables, we can see their impact on the forecasts.
Dynamic regression can also exploit leading indicators. A key question when using explanatory variables is when the impact occurs. Take price and demand as a simple example: if we change the price this month, when does demand change? If the relationship is concurrent, the change is immediate. Other effects are not so immediate, and in those cases we need leading indicators, which take into account the time lag before the change shows up.
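One rough way to check whether a relationship is concurrent or lagged is to correlate the dependent variable with the explanatory variable shifted by different numbers of periods. The sketch below is only an illustration: the price and demand figures are invented, demand is constructed to respond two periods after a price change, and the helper names `pearson` and `lagged_correlation` are hypothetical. Correlation screening is an assumption of this example, not a method prescribed by the text.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def lagged_correlation(x, y, lag):
    """Correlation between x[t - lag] and y[t]; lag 0 is the concurrent case."""
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


# Invented data: demand reacts to the price set two periods earlier.
price = [10, 12, 9, 11, 13, 8, 10, 12, 9, 11, 14, 10]
demand = [100, 100] + [200 - 10 * p for p in price[:-2]]

# Pick the lag (0 to 3 periods) with the strongest absolute correlation.
best_lag = max(range(4), key=lambda k: abs(lagged_correlation(price, demand, k)))
print("strongest relationship at lag", best_lag)
```

With real data the pattern is rarely this clean, so such a screen is best treated as a starting point for the experimentation with lags that the model-building step involves.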
It is worth noting that dynamic regression is not automated and requires some expertise and experimentation. Moreover, even if the model is good, it can be difficult to forecast the exogenous predictors (explanatory variables), which in turn results in poor forecasts of the dependent variable. For instance, consider building a forecasting model for the demand for an existing product. Demand may be price sensitive, so changes in price affect sales volume. The downside is that if the relationship is concurrent, we have to supply the price in the forecast period, i.e. the future price. Put another way, we have to forecast prices before we can forecast sales, which is not easy, and if we do a poor job of it we will get poor sales forecasts down the line. Finally, dynamic regression requires a lot of data: the more data we have, the better we can understand the relationships among the variables.
The Classic Regression Model
Sales = b × Advertising + Constant + Error
The classic regression model is the basis of the dynamic regression model. The variable we are trying to forecast, sales in our case, is called the dependent variable. In the example above we relate sales to advertising, the independent or explanatory variable. The relationship between sales and advertising is quantified by the regression coefficient, symbolized here by “b”. In simple terms, each additional dollar spent on advertising is associated with b additional units of sales. The equation also has a constant term, which is just a number; some people call it the intercept. It answers the question of what we would sell if the independent variable, advertising in our example, were zero. In a month with no advertising expense we would still sell something, and the constant term tells us how much. Finally, there is the error term, the part of sales that the model does not explain. In textbooks the model is usually written as:
$$Y_t = b X_t + C + E_t$$
- $Y_t$: Dependent variable
- $X_t$: Independent (explanatory) variable
- $b$: Regression coefficient
- $C$: Constant term
- $E_t$: Error term
- $t$: Time index (period) of the series
This is the framework that the dynamic regression model extends; it is where everything begins.
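To make the roles of the coefficient, the constant and the error concrete, here is a minimal sketch that fits the classic model by ordinary least squares using the closed-form formulas for a single explanatory variable. The advertising and sales figures are invented for illustration, and `fit_simple_regression` is a hypothetical helper name.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Ordinary least squares for y = b * x + c with one explanatory variable."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    c = my - b * mx  # the fitted line passes through the point of means
    return b, c


# Invented monthly data: advertising spend and sales.
advertising = [0, 10, 20, 30, 40, 50]
sales = [52, 69, 91, 108, 132, 148]

b, c = fit_simple_regression(advertising, sales)

# The error term is whatever the fitted line leaves unexplained in each month.
errors = [y - (b * x + c) for x, y in zip(advertising, sales)]

print(f"coefficient b = {b:.2f}, constant C = {c:.2f}")
```

For these numbers the fit gives b = 1.96 and C = 51: a month with no advertising is expected to sell about 51 units, and each extra unit of advertising is associated with about 1.96 more units of sales.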
The Variables – Internal and External
The selection of variables is the most challenging part of the dynamic regression process. Here we need to ask what our key sales drivers are and what data or variables we have to capture them. For forecasting, it is important to focus on a handful of important sales drivers. Plotting the data helps to identify visually the explanatory variables that have a direct impact on sales. The variables can be categorised broadly as internal or external.
Internal Variables: These are the ones that we can control, for example price, promotion and the number of sales representatives. Anything that we do to drive sales is an internal variable that we need to keep in mind when collecting data.
External Variables: In contrast, external variables have the potential to affect sales but are outside our control. They include the weather, the economy, competition and demographic trends.
For both kinds we need to provide forecasts. This is fairly straightforward for internal variables, since we can plan prices, promotions and so on. External variables are much more challenging, and we are in much better shape if we have a leading indicator. If the relationship is concurrent, things get harder again, because many of the external variables mentioned above are difficult to forecast. This is where “what if?” analysis comes in handy. For example, we can create a range of possibilities for the external variables: if the economy is booming, we get a very optimistic sales outlook; if the economy is slowing down, we get a very pessimistic one. The realistic range of sales outcomes lies between these two.
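The scenario idea can be sketched in a few lines. In the example below the regression coefficients are simply assumed rather than estimated, the planned prices and GDP growth paths are invented, and `forecast_sales` is a hypothetical helper; the point is only to show how one fitted equation yields a band of forecasts when the external variable is varied.

```python
# Assumed (not estimated) model:
# sales_t = constant + b_price * price_t + b_gdp * gdp_growth_t
COEFFICIENTS = {"constant": 500.0, "price": -20.0, "gdp_growth": 15.0}


def forecast_sales(price, gdp_growth):
    """Evaluate the assumed regression equation for one period."""
    return (
        COEFFICIENTS["constant"]
        + COEFFICIENTS["price"] * price
        + COEFFICIENTS["gdp_growth"] * gdp_growth
    )


# Internal variable: we plan the price ourselves for the next three periods.
planned_price = [10.0, 10.0, 9.5]

# External variable: invented GDP growth paths (percent) for each scenario.
scenarios = {
    "pessimistic": [-1.0, -0.5, 0.0],
    "baseline": [1.5, 1.5, 1.5],
    "optimistic": [3.0, 3.5, 4.0],
}

forecasts = {
    name: [forecast_sales(p, g) for p, g in zip(planned_price, gdp_path)]
    for name, gdp_path in scenarios.items()
}

for name, path in forecasts.items():
    print(f"{name:12s}", path)
```

The pessimistic and optimistic paths bracket the baseline, which gives a range for planning instead of a single number.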
We also need to decide how many variables to use in the model: the more variables we put in, the more unstable the model becomes and the less accurate its output is likely to be. For a reasonable forecast we should therefore use only the most important, carefully filtered variables.
To summarise, we start by building an exponential smoothing model for the data and use it as a baseline that dynamic regression should improve upon. It is often useful to create a time series model for the dependent variable before building the dynamic regression model. This is quick and easy and gives us a benchmark to outperform.
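A baseline of this kind can be as simple as the sketch below, which implements simple (single) exponential smoothing and scores its one-step-ahead forecasts with the mean absolute error. The sales figures, the smoothing weight `alpha=0.5`, the choice to initialise the level at the first observation and the use of mean absolute error as the benchmark metric are all assumptions made for illustration.

```python
def simple_exponential_smoothing(series, alpha):
    """Return one-step-ahead forecasts for each period and the next forecast."""
    level = series[0]  # assumption: initialise the level at the first observation
    fitted = []
    for y in series:
        fitted.append(level)  # forecast made before observing y
        level = alpha * y + (1 - alpha) * level
    return fitted, level


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented sales history.
sales = [100, 110, 105, 120, 115]

fitted, next_forecast = simple_exponential_smoothing(sales, alpha=0.5)

# Skip the first period, whose forecast is just the initial value itself.
baseline_mae = mean_absolute_error(sales[1:], fitted[1:])

print("next-period forecast:", next_forecast)
print("baseline MAE:", baseline_mae)
```

A dynamic regression model is worth its extra effort only if its error on the same periods is clearly below this baseline figure.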
What is the work involved in regression?
The most important chore is to collect all the independent variables that we think might matter and, if they are not leading indicators, to forecast future scenarios for them. Once we have that, we need to build the model. Many software packages on the market automate dynamic regression. Dynamic models use the past history as well as the explanatory variables to generate forecasts. They can take different forms, but common ones include lags of the dependent variable and Cochrane-Orcutt terms. Any forecasting textbook that covers dynamic regression will describe these models. What we are doing is combining the use of past history, much as exponential smoothing and Box-Jenkins do, with the use of explanatory variables.
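As an illustration of one of these forms, the sketch below fits a regression that includes a lag of the dependent variable next to a concurrent explanatory variable, sales_t = C + φ·sales_{t-1} + b·advertising_t, and then forecasts by feeding each forecast back in as the next lag. The data are invented and generated without noise from known coefficients so that the fit can be checked by eye. The helper names are hypothetical, the solver is a plain normal-equations implementation, and Cochrane-Orcutt terms are not covered here.

```python
def solve_linear_system(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented, noise-free data generated from
# sales_t = 20 + 0.5 * sales_{t-1} + 2 * advertising_t
advertising = [5, 8, 6, 10, 7, 12, 9, 4, 11, 6, 8, 10]
sales = [60.0]
for adv in advertising[1:]:
    sales.append(20 + 0.5 * sales[-1] + 2 * adv)

# Regress sales_t on a constant, the lagged dependent variable and advertising_t.
rows = [[1.0, sales[t - 1], advertising[t]] for t in range(1, len(sales))]
constant, phi, b = ols(rows, sales[1:])


def forecast(last_sales, future_advertising):
    """Iterate the fitted equation forward, feeding each forecast back in."""
    path = []
    for adv in future_advertising:
        last_sales = constant + phi * last_sales + b * adv
        path.append(last_sales)
    return path


# Advertising is an internal variable, so its future values are planned.
sales_forecast = forecast(sales[-1], future_advertising=[10, 10])
print(round(constant, 4), round(phi, 4), round(b, 4))
print([round(v, 2) for v in sales_forecast])
```

Because the data contain no noise, the estimates recover the generating coefficients up to rounding; with real data the residuals would then need the diagnostic checks that are part of model validation.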
Building regression models is generally an iterative process. We start with an initial model and experiment with adding or removing variables, lags and dynamic terms. Intuition, hypothesis tests and other diagnostics can guide the process. Historically, automatic algorithms have not performed well for dynamic regression modelling, but significant strides are now being made. Validate the model by making sure that it makes economic sense (e.g. the coefficients have the proper sign). The coefficients should be significant (the t-statistic probability should ideally be .99 or above, assuming the probability is reported as a confidence level, i.e. one minus the p-value). Errors should not be autocorrelated (Ljung-Box probability, read on the same scale, less than .95, and an unpatterned error autocorrelation function). In the end, dynamic regression should be considered when there are important explanatory variables. It lends insight into relationships between variables and allows for “what-if” scenarios.
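The autocorrelation check can be sketched with the Ljung-Box statistic, Q = n(n+2) Σ r_k² / (n − k) summed over lags k = 1..h, where r_k is the lag-k autocorrelation of the residuals. The example below computes Q for a deliberately badly behaved, invented residual series and compares it with 3.841, the 5% critical value of the chi-square distribution with one degree of freedom. The function names are hypothetical, and in practice the degrees of freedom are often reduced by the number of dynamic terms fitted, which this sketch ignores.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    m = sum(series) / n
    denom = sum((v - m) ** 2 for v in series)
    num = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Invented residuals that alternate in sign: a clear pattern the model missed.
residuals = [1.0 if t % 2 == 0 else -1.0 for t in range(20)]

q = ljung_box_q(residuals, max_lag=1)
CHI2_CRITICAL_5PCT_DF1 = 3.841  # 5% critical value, chi-square with 1 d.o.f.

print("lag-1 autocorrelation:", autocorrelation(residuals, 1))
print("Q =", q, "-> autocorrelated" if q > CHI2_CRITICAL_5PCT_DF1 else "-> acceptable")
```

A Q value far above the critical value, as here, indicates that the errors still carry structure and that more lags or dynamic terms should be tried.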
Alternative approaches such as vector autoregression (VAR) or the vector error correction model (VECM) are relevant for forecasting a system of time series with all of the variables within one model. One-step-ahead forecasts from a VAR are straightforward, while multiple-step-ahead forecasts can be constructed iteratively.
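The iterative construction can be illustrated with a first-order VAR, y_t = c + A·y_{t-1}, for two series. In the sketch below the constant vector and coefficient matrix are assumed rather than estimated, the two series are labelled sales and advertising purely for illustration, and `var1_forecast` is a hypothetical helper name.

```python
def var1_forecast(const, coef, last_obs, steps):
    """Iterate y_t = const + coef @ y_{t-1}, feeding each forecast back in."""
    forecasts = []
    current = list(last_obs)
    for _ in range(steps):
        current = [
            const[i] + sum(coef[i][j] * current[j] for j in range(len(current)))
            for i in range(len(const))
        ]
        forecasts.append(current)
    return forecasts


# Assumed VAR(1) parameters for the vector [sales, advertising].
const = [10.0, 1.0]
coef = [
    [0.5, 2.0],  # sales depends on lagged sales and lagged advertising
    [0.0, 0.5],  # advertising depends only on its own lag
]
last_observation = [100.0, 4.0]

path = var1_forecast(const, coef, last_observation, steps=3)
for step, (sales_fc, adv_fc) in enumerate(path, start=1):
    print(f"h={step}: sales={sales_fc}, advertising={adv_fc}")
```

The first row of the output is the one-step-ahead forecast; every later row reuses the previous forecast in place of the unavailable observation, so no separate forecast of the explanatory variable is needed.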
Intervention Analysis using Dummy Variables
Sales promotions are often used by businesses to highlight their products and services and to increase demand for them. These promotions cost money, and the costs must be justified by structured analysis. Additionally, proposed sales promotions affect future demand (sometimes adversely, because of the cannibalization effect), so resources must be allocated accordingly to satisfy the promoted demand. In many cases, promotions can be powerful for business revenue but disruptive for production planning and inventory management.
Businesses often want to understand whether a promotional activity was successful or profitable, and whether it built a larger customer base for the longer term or just capitalized on short-term buying. This is where promotional analysis comes into the picture. By encoding the promotional activity as an intervention event (ξt) over a limited duration, and looking at historical data to see how the event caused a deviation, we can isolate the effect of the promotion. Intervention events are essentially dummy regressors, or indicator variables (determined explicitly by the time, duration and type of the event), that are introduced through a transfer function filter (specified by the response), resulting in an intervention effect.
Types of Intervention Events
- Point Intervention: Used for events that occur at a single, specific point in time t, like a pulse. An example is a buy-two-get-one-free offer on certain items during a clearance sale. The value of the exogenous variable ξt is 1 during the event, and 0 before and after it.
- Step Intervention: Used for events that start at a certain time t and stay on permanently. ξt starts at 0 and steps up to 1 once the event starts.
- Ramp Intervention: A dummy regressor whose values are zero before and at the time of the intervention and increase linearly thereafter, where time denotes the period in which the intervention occurs:
  - ξt = 0, if t < time
  - ξt = (t – time), otherwise
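The three intervention types above can be sketched as small functions that build the dummy regressor for a series of n periods. The function names, the zero-based period numbering and the optional `duration` argument for the point intervention are assumptions made for this illustration.

```python
def point_intervention(n, start, duration=1):
    """1 while the event is on (from start for `duration` periods), else 0."""
    return [1 if start <= t < start + duration else 0 for t in range(n)]


def step_intervention(n, start):
    """0 before the event starts, 1 from then on."""
    return [1 if t >= start else 0 for t in range(n)]


def ramp_intervention(n, start):
    """0 up to and including the start period, then t - start."""
    return [max(0, t - start) for t in range(n)]


n_periods, event_start = 8, 3

print("point:", point_intervention(n_periods, event_start))
print("step: ", step_intervention(n_periods, event_start))
print("ramp: ", ramp_intervention(n_periods, event_start))
```

Each list can be used directly as one more explanatory variable in the regression, so the fitted coefficient on it measures the size of the event's effect.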
Interventions Response/ Transfer Function
We can define a transfer function to explain the effect of introducing an exogenous variable (ξt in this case). In general, a typical transfer function model has the form:
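$$\upsilon(B) = \frac{\omega_0\,\omega(B)\,B^b}{\delta(B)}$$
$$\omega(B) = 1 - \omega_1 B - \omega_2 B^2 - \dots - \omega_s B^s$$
$$\delta(B) = 1 - \delta_1 B - \delta_2 B^2 - \dots - \delta_r B^r$$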
where υ(B) is the transfer function filter, ω0 is the scaling factor, ω(B) is the numerator polynomial of order s, δ(B) is the denominator polynomial of order r, and b is the delay that accounts for lagged effects. B is the backshift (lag) operator, so that Bξt = ξt−1. If r = 0, υ(B) is of finite order; otherwise, it is of infinite order. The overall influence of the intervention event is referred to as the intervention effect, υ(B)ξt, which describes the promotion’s influence over time.
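To see what the filter does, the intervention effect e_t = υ(B)ξt can be computed recursively: multiplying both sides by δ(B) gives e_t = δ1·e_{t−1} + … + δr·e_{t−r} + ω0·(ξ_{t−b} − ω1·ξ_{t−b−1} − … − ωs·ξ_{t−b−s}). The sketch below implements that recursion; the function name, the parameter values and the assumption that everything before the first period is zero are illustrative only.

```python
def intervention_effect(xi, omega0, omegas=(), deltas=(), delay=0):
    """Apply upsilon(B) = omega0 * omega(B) * B**delay / delta(B) to the dummy xi.

    omegas holds (omega_1, ..., omega_s) and deltas holds (delta_1, ..., delta_r).
    Values before the start of the series are assumed to be zero.
    """
    effect = []
    for t in range(len(xi)):

        def lagged_xi(k):
            idx = t - delay - k
            return xi[idx] if idx >= 0 else 0.0

        # Numerator part: omega0 * (xi_{t-b} - omega_1 * xi_{t-b-1} - ...)
        numerator = omega0 * (
            lagged_xi(0) - sum(w * lagged_xi(k) for k, w in enumerate(omegas, start=1))
        )
        # Denominator part: feedback from earlier values of the effect itself.
        feedback = sum(
            d * effect[t - k] for k, d in enumerate(deltas, start=1) if t - k >= 0
        )
        effect.append(numerator + feedback)
    return effect


# A one-period promotion (pulse) in period 2 of a seven-period window.
pulse = [0, 0, 1, 0, 0, 0, 0]

# r = 1: the effect starts one period late (b = 1) and decays by half each period.
decaying = intervention_effect(pulse, omega0=40.0, deltas=(0.5,), delay=1)

# r = 0: finite-order filter, the effect lasts only as long as the pulse.
immediate = intervention_effect(pulse, omega0=40.0)

print("decaying: ", decaying)
print("immediate:", immediate)
```

With a denominator term the response to a single pulse tails off geometrically and never quite reaches zero, which is the infinite-order case; without one the response ends as soon as the inputs do.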