Introduction

This paper makes the case that the familiar Koyck (1954) or geometric lag model, which is often used to model marketing response to, for example, advertising, is robust to temporal aggregation. The assumption is that there is an adequate Koyck model for the high-frequency data. The focus is on four cases: (a) high-frequency sales and high-frequency advertising, (b) high-frequency sales and aggregated advertising, (c) aggregated sales and higher frequency advertising, and (d) the well-known case of both low-frequency sales and low-frequency advertising. The two cases (b) and (c), in which only one of the two variables is aggregated, have not been studied before.

Temporal aggregation is relevant for the analysis of marketing response, in particular when examining carry-over effects, see also Leone (1995). When the true marketing response process occurs at a high-frequency level, say hours, and the data are only available at a lower frequency level, say days, the estimation results from models for lower frequency data cannot be translated one-to-one to the true higher frequency process. This point was already made in the seminal paper by Clarke (1976), who argued that the model for aggregated data must differ from the model for the high-frequency data.

A key issue for temporal aggregation is that aggregation changes the model. That is, if one fits an econometric time series model, like the Koyck model, to sales and advertising data, then the form of the model changes once the data are aggregated. Clarke (1976) was the first to recognize that not modifying the model leads to biased results and incorrect advertising duration intervals. In the present paper, it is demonstrated that the Koyck model is very useful even when the data are all or partly available in temporally aggregated format. In fact, it is shown that the Koyck model is robust to such aggregation. The focus is on just two variables for notational convenience, but extensions to more variables are conceptually straightforward.

The outline of the paper is as follows. Section 2 presents the Koyck model. Section 3 discusses four variants of temporal aggregation. Each variant is illustrated with a real-life example. The data source is Tellis et al. (2000). A scatter plot of sales against advertising in Fig. 2 suggests that there is a positive correlation between the two variables. Mathematical derivations are relegated to the technical appendix. Section 4 presents the results of some simulation experiments, and Sect. 5 concludes.

The Koyck model

The Koyck (1954) model, or geometric lag model, yields insights into the key parameters of marketing response. When sales are denoted as \(y_{t}\), and advertising (or any other marketing-mix variable) as \(x_{t}\), and L is the familiar lag operator with

$$L^{k} y_{t} = y_{t - k} ,\,k = \cdots - 2, - 1, 0, 1, 2, \ldots ,$$

and deleting the intercept for notational convenience, the original Koyck model reads as follows:

$$y_{t} = \beta x_{t} + \beta \lambda x_{t - 1} + \beta \lambda^{2} x_{t - 2} + \ldots + \varepsilon_{t} ,$$
(1)

where \(\varepsilon_{t}\) is an uncorrelated white noise process with mean 0 and variance \(\sigma_{\varepsilon }^{2}\) and \(\left| \lambda \right| < 1\). When using the L operator, it reads as follows:

$$y_{t} = \left( {\beta + \beta \lambda L + \beta \lambda^{2} L^{2} + \ldots } \right)x_{t} + \varepsilon_{t} .$$
(2)

Because \(\left| \lambda \right| < 1\), it holds that

$$\beta + \beta \lambda L + \beta \lambda^{2} L^{2} + \cdots = \beta \left( {1 + \lambda L + \lambda^{2} L^{2} + \cdots } \right) = \frac{\beta }{1 - \lambda L}.$$
(3)

Hence, the infinite regression model in (1) can be written as follows:

$$y_{t} = \frac{\beta }{1 - \lambda L}x_{t} + \varepsilon_{t} .$$
(4)

This expression suggests what became known as the “Koyck transformation,” i.e., when both sides of (4) are multiplied with \(1 - \lambda L\), one obtains

$$y_{t} = \lambda y_{t - 1} + \beta x_{t} + \varepsilon_{t} - \lambda \varepsilon_{t - 1} .$$
(5)
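The equivalence between the infinite lag form (1) and the transformed form (5) can be checked numerically. The sketch below is only an illustration under stated assumptions: the parameter values, the sample length, the normal draws for \(x_{t}\), and the convention that \(x_{t} = 0\) before the sample starts are all choices made here and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, lam = 200, 0.5, 0.8      # illustrative values (assumed, not from the paper)

x = rng.normal(size=n)            # advertising, assumed to be 0 before t = 0
eps = rng.normal(size=n)          # white noise

# Eq. (1): geometric distributed lag, truncated at the start of the sample
weights = beta * lam ** np.arange(n)
y_lag = np.array([weights[: t + 1] @ x[t::-1] for t in range(n)]) + eps

# Eq. (5): Koyck-transformed recursion with the MA(1) error term
y_rec = np.empty(n)
y_rec[0] = beta * x[0] + eps[0]
for t in range(1, n):
    y_rec[t] = lam * y_rec[t - 1] + beta * x[t] + eps[t] - lam * eps[t - 1]

# Largest absolute difference between the two representations
max_gap = np.max(np.abs(y_lag - y_rec))
print(max_gap)
```

Under these assumptions the two series coincide up to floating-point error, which is what multiplying (4) by \(1 - \lambda L\) implies.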

The Koyck model has an autoregressive term \(\lambda y_{t - 1}\), a term involving current advertising \(\beta x_{t}\) and a so-called moving average term \(\varepsilon_{t} - \lambda \varepsilon_{t - 1}\). From the model parameters, one can derive the short-run (or current or direct) effect of advertising, using the partial derivative:

$$\frac{{\partial y_{t} }}{{\partial x_{t} }} = \beta .$$
(6)

The total (or carry-over) effect of advertising follows from

$$\frac{{\partial y_{t} }}{{\partial x_{t} }} + \frac{{\partial y_{t} }}{{\partial x_{t - 1} }} + \frac{{\partial y_{t} }}{{\partial x_{t - 2} }} + \cdots = \beta + \beta \lambda + \beta \lambda^{2} + \cdots = \frac{\beta }{1 - \lambda }.$$
(7)
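As a small numerical illustration of (6) and (7), the following sketch computes the short-run and total effects and compares the closed form with a long partial sum of the geometric weights. The function name `koyck_effects` is hypothetical; the values \(\beta = 5\) and \(\lambda = 0.8\) are the ones used in the simulation section of this paper.

```python
def koyck_effects(beta, lam):
    """Return (short-run effect, total effect) implied by a Koyck model."""
    if not abs(lam) < 1:
        raise ValueError("the geometric lag weights only sum when |lambda| < 1")
    return beta, beta / (1.0 - lam)

short_run, long_run = koyck_effects(5.0, 0.8)

# Partial sum beta * (1 + lam + lam^2 + ...) over 200 lags, as in Eq. (7)
partial_sum = sum(5.0 * 0.8 ** k for k in range(200))

print(short_run, long_run, partial_sum)
```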

As the focus is on the direct effect and the carry-over effect, in practice, one usually considers the unrestricted version of (5), i.e.,

$$y_{t} = \lambda y_{t - 1} + \beta x_{t} + \varepsilon_{t} - \theta \varepsilon_{t - 1} ,$$
(8)

where \(\lambda\) and \(\theta\) are not restricted to be equal from the start. Suppose that an analyst knows that the advertising response process works at the high-frequency data level, denoted as t, with \(t = 1,2, \ldots ,N\). For example, t can be associated with weeks within a period of 4 weeks. Suppose further that the sales and advertising data may be available only after temporal aggregation at a lower frequency, denoted as T. For example, weekly data could be aggregated to four-weekly data. To introduce some formal notation, consider the polynomial \(S\left( L \right)\) defined as

$$S\left( L \right) = 1 + L + L^{2} + \cdots + L^{K - 1}$$
(9)

which amounts to a temporal aggregation of the high-frequency data over K periods. In the case of weeks and hours, K would be equal to 168. Hence, \(T = 1,2, \ldots ,\frac{N}{K}\). Further, consider the notion of skip sampling at every Kth observation at frequency t. This means that, for t equal to K, 2K, 3K, and so on, there is an observation at the lower frequency T, with \(T = 1,2,3, \ldots , \frac{N}{K}\). For the hourly case, where the first hour of the week can be 1.00AM on Monday morning, K = 168 corresponds to 12.00AM (midnight) at the end of Sunday evening.
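The two operations just described, summing over K periods with \(S\left( L \right)\) and skip sampling at t = K, 2K, 3K, and so on, can be sketched as follows. The helper names `aggregate` and `skip_sample` and the toy series of twelve weekly values are hypothetical and serve only as an illustration; incomplete trailing blocks are dropped, which is an assumption made here.

```python
import numpy as np

def aggregate(series, K):
    """Apply S(L) = 1 + L + ... + L^(K-1) and keep the values at t = K, 2K, ..."""
    series = np.asarray(series, dtype=float)
    n_low = len(series) // K                  # number of complete low-frequency periods
    return series[: n_low * K].reshape(n_low, K).sum(axis=1)

def skip_sample(series, K):
    """Keep every Kth observation, that is, t = K, 2K, 3K, ..."""
    return np.asarray(series)[K - 1 :: K]

weekly = np.arange(1, 13)                     # toy weekly series 1, 2, ..., 12
four_weekly = aggregate(weekly, 4)            # block sums: 10, 26, 42
every_fourth = skip_sample(weekly, 4)         # 4, 8, 12
print(four_weekly, every_fourth)
```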

Four cases of aggregation

In relation to the frequencies t and T, there are now four cases of potential interest and practical relevance.

High-frequency sales and high-frequency advertising

The first and simplest case is when the analyst has data on sales and advertising both at the high frequency t. A Koyck model as in (8) can be estimated using maximum likelihood for the illustrative data, where now also an intercept is included. This results in the following estimates (with estimated standard errors in parentheses) of the two key parameters:

$$\hat{\lambda } = 0.939 \left( {0.031} \right)$$
$$\hat{\beta } = 0.279 \left( {0.139} \right).$$

The \(R^{2}\) of this model is 0.682. The short-run effect is 0.279, and the total long-run effect is

$$\frac{0.279}{{1 - 0.939}} = 4.543.$$

Suppose now that this model for the weekly data corresponds with the true frequency of the sales and advertising relationship. The topic of interest in this paper is that one may not have the weekly data but, for example, only four-weekly data. This can occur when commercials are broadcast only once per four weeks, while sales are measured per week, or, the other way around, when commercials are broadcast once per week, while sales are only measured at a four-weekly level.

It might be the case that one has (a) weekly data for both sales and advertising as above, (b) weekly data on sales but only four-weekly data for advertising, (c) four-weekly data on sales and weekly data on advertising, or (d) four-weekly data for both sales and advertising. The question is now whether in cases (b), (c), and (d), one can estimate the parameters of the true high-frequency (weekly) relationship between sales and advertising. The key assumption is of course that (a) amounts to the correct frequency, but, note again, this is here for illustration only. Whether it is true for the illustrative data is unknown, and therefore a simulation experiment is carried out later on. In the high-frequency case, skip sampling will lead to suboptimal inference in terms of efficiency as information will be lost. Consider

$$y_{t} = \lambda y_{t - 1} + \beta x_{t} + \varepsilon_{t} - \lambda \varepsilon_{t - 1}$$

and K periods later:

$$y_{t + K} = \lambda y_{t + K - 1} + \beta x_{t + K} + \varepsilon_{t + K} - \lambda \varepsilon_{t + K - 1} .$$

Skip-sampling towards the frequency implied by K would allow the inclusion of \(y_{t - 1}\) and \(x_{t}\) in the model, but not the moving average term with \(\varepsilon_{t - 1}\), \(\varepsilon_{t + K - 1}\), and so on. This leads to bias in estimating \(\lambda\). So, when all high-frequency data are available, it is recommended to consider a model for the high-frequency data and not to temporally aggregate the high-frequency data. See also Tellis and Franses (2006) for evidence based on simulations.

High-frequency sales and low-frequency advertising

The second case (b) is where sales are observed at frequency t, while advertising is observed at the lower frequency T after aggregating over K units. In the Appendix, it is derived that the modified Koyck model becomes

$$Y_{T} = \lambda S\left( L \right)y_{t + K - 1} + \beta X_{T} + u_{T} - \theta u_{T - 1} .$$
(10)

The parameters in (10) can be estimated using the unrestricted maximum likelihood method. Note that the parameters in (10) are estimated using N/K observations instead of N, so temporal aggregation entails a loss of efficiency.

For the running example with the data in Figs. 1 and 2, the key estimation results for (10) (with an intercept) for 25 effective observations are

$$\hat{\lambda } = 0.942 \left( {0.023} \right)$$
$$\hat{\beta } = 0.363 \left( {0.209} \right)$$

The \(R^{2}\) of this model is 0.974. The short-run effect is 0.363, whereas the total long-run effect at the high frequency is

$$\frac{0.363}{{1 - 0.942}} = 6.259.$$

We see that this long-run effect is a bit larger than the “true” high-frequency effect of 4.543, while the two short-run effects are close to each other.

Fig. 1 Weekly sales and advertising data. The data source is Tellis et al. (2000)

Fig. 2 Scatter plot of weekly sales against weekly advertising

Low-frequency sales and high-frequency advertising

The third case (c) is when sales are observed at frequency T, after aggregating over K units, while advertising is observed at the higher frequency t. In the Appendix, it is derived that the modified Koyck model becomes

$$Y_{T} = \lambda^{K} Y_{T - 1} + \beta \left( {1 + \lambda L + \lambda^{2} L^{2} + \cdots + \lambda^{K - 1} L^{K - 1} } \right)S\left( L \right)x_{t} + \varepsilon_{T} - \lambda^{K} \varepsilon_{T - 1} .$$
(11)

When the unrestricted version of this model is estimated, that is, when we replace \(\varepsilon_{T} - \lambda^{K} \varepsilon_{T - 1}\) in (11) by \(\varepsilon_{T} - \theta \varepsilon_{T - 1}\), then, for the running example, the estimation results obtained using iterative maximum likelihood for (11) are

$$\hat{\lambda }^{4} = 0.947 \left( {0.191} \right),$$
$$\hat{\beta } = - 0.133 \left( {0.361} \right),$$

which gives \(\hat{\lambda } = \sqrt[4]{0.947} = 0.986\). The \(R^{2}\) of this model is 0.769. The short-run effect is, however, not significant. This may perhaps reflect that the weekly frequency cannot be assumed to be the true frequency.
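The step from the estimated low-frequency coefficient \(\hat{\lambda }^{K}\) back to the high-frequency decay parameter can be sketched as below. The function name `high_frequency_decay` is hypothetical. The standard error it returns is a first-order (delta method) approximation added here purely for illustration; it is not a result reported in the paper.

```python
def high_frequency_decay(lam_K, se_lam_K, K):
    """Recover lambda from an estimate of lambda^K, with a delta-method standard error."""
    if not 0.0 < lam_K < 1.0:
        raise ValueError("the K-th root is only taken here for 0 < lambda^K < 1")
    lam = lam_K ** (1.0 / K)
    # Derivative of lam_K^(1/K) with respect to lam_K is (1/K) * lam_K^(1/K - 1)
    se_lam = se_lam_K * (1.0 / K) * lam_K ** (1.0 / K - 1.0)
    return lam, se_lam

# Case (c) of the running example: estimate 0.947 with standard error 0.191, K = 4
lam_hat, se_hat = high_frequency_decay(0.947, 0.191, 4)
print(round(lam_hat, 3), round(se_hat, 3))
```

The point estimate reproduces the value 0.986 given above. The approximate standard error should be read with caution, as it relies on a linearization around the estimate.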

Low-frequency sales and low-frequency advertising

Finally, the fourth case (d) arises where both sales and advertising are observed only after temporal aggregation at the low-frequency T. Tellis and Franses (2006) show that when it is assumed that an advertising impulse occurs only once in each Kth period, and at the same time within that Kth period, (5) becomes

$$Y_{T} = \lambda^{K} Y_{T - 1} + \beta_{1} X_{T} + \beta_{2} X_{T - 1} + \varepsilon_{T} - \lambda^{K} \varepsilon_{T - 1} ,$$
(12)

where \(\beta_{1}\) and \(\beta_{2}\) are functions of \(\beta\) and \(\lambda\), such that

$$\frac{{\beta_{1} + \beta_{2} }}{{1 - \lambda^{K} }} = \frac{\beta }{1 - \lambda }.$$

Tellis and Franses (2006) recommend that if aggregation is necessary, one should collect data such that the key assumption on the advertising process holds.

For the illustrative four-weekly data, the key estimation results for (12), where we replace \(\varepsilon_{T} - \lambda^{K} \varepsilon_{T - 1}\) in (12) by \(\varepsilon_{T} - \theta \varepsilon_{T - 1}\), are

$$\hat{\lambda }^{4} = 0.902 \left( {0.176} \right)$$
$$\hat{\beta }_{1} = 1.195 \left( {0.967} \right)$$
$$\hat{\beta }_{2} = - 1.109 \left( {0.763} \right)$$

which gives \(\hat{\lambda } = \sqrt[4]{0.902} = 0.975\). The \(R^{2}\) of this model is 0.796. The short-run effect \(\beta = \frac{{\left( {1 - \lambda } \, \right)\left( {\beta_{1} + \beta_{2} } \right)}}{{1 - \lambda^{4} }}\) is 0.022, whereas the total long-run effect for the high-frequency data would be

$$\frac{1.195 - 1.109}{{1 - 0.902}} = 0.966.$$

We now see that this long-run effect is about one fifth of the “true” high-frequency effect of 4.543. This result may perhaps be driven by the potential fact that the advertising impulse does not occur only once in each four-weekly period, at least for these illustrative data.

Simulation experiments

The empirical results in the previous section for just a single illustrative case in part seem to confirm that the Koyck model is robust to temporal aggregation, at least, after proper modification. Cases (c) and (d) did not work so well in the illustration, although the parameter \(\lambda\) is estimated at a consistent value across the four cases. As this is just a single empirical case with actual data, we now turn to simulation experiments.

The data-generating process (DGP) is

$$y_{t} = \lambda y_{t - 1} + \beta x_{t} + \varepsilon_{t} - \lambda \varepsilon_{t - 1}$$

with \(\varepsilon_{t} \sim N\left( {0,1} \right)\), \(y_{0} = 0\), and \(x_{t}\) equal to the absolute value of a draw from a \(N\left( {0,1} \right)\) distribution halfway through each K-period and zero otherwise. So, \(x_{t}\) takes a positive value once in every K periods, where the size of the value can change over time. Tellis and Franses (2006) use the same format for the simulations. Here, we set \(K = 5\).

The sample size is set at 1000. The variable \(x_{t}\) takes a positive value at observation 3 within each block of K = 5 observations. The short-run effect is set at \(\beta = 5\), and we set the decay parameter at \(\lambda = 0.8.\) Hence, the true carry-over effect is \(\frac{5}{1 - 0.8} = 25\).
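A minimal sketch of this data-generating process is given below. It follows the description above (sample size 1000, \(K = 5\), a pulse at observation 3 of each block, \(\beta = 5\), \(\lambda = 0.8\), \(y_{0} = 0\)). The function name `simulate_koyck`, the seed, and the choice to draw a pre-sample shock \(\varepsilon_{0}\) from the same \(N\left( {0,1} \right)\) distribution are assumptions made here, as the text does not specify them.

```python
import numpy as np

def simulate_koyck(n=1000, K=5, pulse_pos=3, beta=5.0, lam=0.8, seed=0):
    """Simulate y_t = lam*y_{t-1} + beta*x_t + eps_t - lam*eps_{t-1}, for t = 1..n."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=n + 1)              # eps[0] is an assumed pre-sample shock
    x = np.zeros(n + 1)
    pulses = np.arange(pulse_pos, n + 1, K)   # t = 3, 8, 13, ... when K = 5
    x[pulses] = np.abs(rng.normal(size=pulses.size))
    y = np.zeros(n + 1)                       # y[0] = 0 as in the text
    for t in range(1, n + 1):
        y[t] = lam * y[t - 1] + beta * x[t] + eps[t] - lam * eps[t - 1]
    return y[1:], x[1:]

y, x = simulate_koyck()

# Temporally aggregated counterparts (sums over blocks of K = 5 observations)
Y = y.reshape(-1, 5).sum(axis=1)
X = x.reshape(-1, 5).sum(axis=1)
print(len(y), len(Y), np.count_nonzero(x))
```

The high-frequency series `y` and `x` and their aggregated counterparts `Y` and `X` can then be combined as needed for cases (a) to (d).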

Table 1 reports the estimates of \(\lambda\), \(\beta , \theta\) and \(\frac{\beta }{1 - \lambda }\), averaged over 100 replications, which is a reasonable number for a sample size of 1000. Each time, as in the illustration before, we use an unrestricted version of the Koyck model, in terms of the moving average part. The simulation results seem to confirm the theory that the Koyck model is robust to temporal aggregation for cases (b) and (d), although we observe some bias for case (c). This last bias seems to be caused by (on average) too small an estimate of \(\lambda\) and too small an estimate of \(\beta\).

Table 1 Average estimates of \(\lambda\), \(\beta , \theta ,\) and \(\frac{\beta }{1 - \lambda }\), when averaged over 100 replications, \(K = 5\), sample size is 1000

Conclusion

This paper has shown that the Koyck (1954) model is a useful model to estimate advertising response at the true high frequency, even when the analyst has temporally aggregated sales data or temporally aggregated advertising data, or both. Inference using the Koyck model is robust to temporal aggregation. An empirical example, in part, and a simulation exercise, almost fully, supported the theoretical claims. Further research should concern more illustrations to see how the Koyck model fares in other empirical settings. Also, more theoretical results can be derived for the case in which the Koyck model is extended to more than a single explanatory variable.

The practical implications are that, given a situation of partial or full temporal aggregation of the data, a practitioner can retrieve the proper current and carry-over effects of marketing efforts on marketing response at the high frequency.