Nonparametric estimation of customer segments from censored sales panel data

  • Research Article
Journal of Revenue and Pricing Management

Abstract

Addressing different customer segments specifically, via revenue management or customer relationship management, lets firms optimize their market response. Identifying such segments requires analysing large amounts of transactional data. We present a nonparametric approach to estimate the number of customer segments from censored panel data. We evaluate several model selection criteria and imputation methods to compensate for censored observations under different demand scenarios. We measure estimation performance in a controlled environment via simulated data samples, benchmark it against common clustering indices and imputation methods, and analyse an empirical data sample to validate practical applicability.

Notes

  1. The bootstrap was introduced by Efron et al. (1979). It enlarges a limited data set by sampling new data sets from the original one via resampling with replacement.
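
As a minimal sketch of resampling with replacement, the following Python snippet draws bootstrap data sets from an original sample. The function name `bootstrap_samples` and its parameters are illustrative assumptions and are not taken from the article.

```python
import numpy as np


def bootstrap_samples(data, n_resamples, seed=0):
    """Draw bootstrap data sets by resampling rows with replacement.

    Hypothetical helper for illustration; every resample has the same
    number of rows as the original data set.
    """
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    n_obs = data.shape[0]
    # Row indices are drawn uniformly with replacement, so a row may
    # appear several times in one resample or not at all.
    idx = rng.integers(0, n_obs, size=(n_resamples, n_obs))
    return data[idx]
```

For a panel stored as an array with one row per customer, `bootstrap_samples(panel, 1000)` would return 1000 resampled panels stacked along the first axis.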

References

  • Agresti, Alan, Brian Caffo, and Pamela Ohman-Strickland. 2004. Examples in which misspecification of a random effects distribution reduces efficiency, and possible remedies. Computational Statistics & Data Analysis 47 (3): 639–653.

  • Azadeh, Shadi Sharif. 2013. Demand Forecasting in Revenue Management Systems. École Polytechnique de Montréal: PhD diss.

  • Azadeh, Shadi Sharif, M. Hosseinalifam, and G. Savard. 2015. The impact of customer behavior models on revenue management systems. Computational Management Science 12 (1): 99–109.

  • Azadeh, Shadi Sharif, P. Marcotte, and G. Savard. 2015. A non-parametric approach to demand forecasting in revenue management. Computers & Operations Research 63: 23–31.

  • Bronnenberg, Bart J., Michael W. Kruger, and Carl F. Mela. 2008. Database paper: The IRI marketing data set. Marketing Science 27 (4): 745–748.

  • Caliński, Tadeusz, and Jerzy Harabasz. 1974. A dendrite method for cluster analysis. Communications in Statistics - Theory and Methods 3 (1): 1–27.

  • Cohen, Joel E., and Uriel G. Rothblum. 1993. Nonnegative ranks, decompositions, and factorizations of nonnegative matrices. Linear Algebra and its Applications 190: 149–168.

  • Efron, B., et al. 1979. Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7 (1): 1–26.

  • Farias, Vivek F., Srikanth Jagabathula, and Devavrat Shah. 2013. A nonparametric approach to modeling choice with limited data. Management Science 59 (2): 305–322.

  • Haensel, Alwin, and Ger Koole. 2011. Estimating unconstrained demand rate functions using customer choice sets. Journal of Revenue & Pricing Management 10 (5): 438–454.

  • Halkidi, Maria, and Michalis Vazirgiannis. 2001. “Clustering validity assessment: Finding the optimal partitioning of a data set.” In Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, 187–194. IEEE.

  • Härdle, Wolfgang, and Enno Mammen. 1993. Comparing nonparametric versus parametric regression fits. The Annals of Statistics, 1926–1947.

  • Hubert, Lawrence, and Phipps Arabie. 1985. Comparing partitions. Journal of Classification 2 (1): 193–218.

  • Jagabathula, Srikanth, and Gustavo Vulcano. 2015. “A Model to Estimate Individual Preferences Using Panel Data.” Available at SSRN 2560994.

  • Kasahara, Hiroyuki, and Katsumi Shimotsu. 2014. Non-parametric identification and estimation of the number of components in multivariate mixtures. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76 (1): 97–111.

  • Kunnumkal, Sumit. 2014. Randomization approaches for network revenue management with customer choice behavior. Production and Operations Management 23 (9): 1617–1633.

  • Linoff, Gordon S, and Michael JA Berry. 2011. Data mining techniques: for marketing, sales, and customer relationship management. John Wiley & Sons.

  • Litière, Saskia, Ariel Alonso, and Geert Molenberghs. 2008. The impact of a misspecified random-effects distribution on the estimation and the performance of inferential procedures in generalized linear mixed models. Statistics in Medicine 27 (16): 3125–3144.

  • Little, Roderick JA., and Donald B. Rubin. 2014. Statistical analysis with missing data. Hoboken: Wiley.

  • Liu, Yanchi, Zhongmou Li, Hui Xiong, Xuedong Gao, and Junjie Wu. 2010. “Understanding of internal clustering validation measures.” In Data Mining (ICDM), 2010 IEEE 10th International Conference on, 911–916. IEEE.

  • McLachlan, Geoffrey, and David Peel. 2000. Finite mixture models. Hoboken: Wiley.

  • Meissner, Joern, Arne Strauss, and Kalyan Talluri. 2013. An enhanced concave program relaxation for choice network revenue management. Production and Operations Management 22 (1): 71–87.

  • Moe, Wendy W., and Peter S. Fader. 2004. Dynamic conversion behavior at e-commerce sites. Management Science 50 (3): 326–335.

  • Müller, Sven, and Knut Haase. 2014. Customer segmentation in retail facility location planning. Business Research 7 (2): 235–261.

  • Queenan, Carrie Crystal, Mark Ferguson, Jon Higbie, and Rohit Kapoor. 2007. A comparison of unconstraining methods to improve revenue management systems. Production and Operations Management 16 (6): 729–746.

  • Ratliff, Richard M., B. Venkateshwara Rao, Chittur P. Narayan, and Kartik Yellepeddi. 2008. A multi-flight recapture heuristic for estimating unconstrained demand from airline bookings. Journal of Revenue and Pricing Management 7 (2): 153–171.

  • Revolution Analytics and Steve Weston. 2015. doParallel: Foreach Parallel Adaptor for the ‘parallel’ Package. R package version 1.0.10. https://CRAN.R-project.org/package=doParallel.

  • Robin, Jean-Marc, and Richard J. Smith. 2000. Tests of rank. Econometric Theory 16 (2): 151–175.

  • Rousseeuw, Peter J. 1987. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20: 53–65.

  • Rusmevichientong, Paat, David Shmoys, Chaoxu Tong, and Huseyin Topaloglu. 2014. Assortment optimization under the multinomial logit model with random choice parameters. Production and Operations Management 23 (11): 2023–2039.

  • Salch, J. 1997. Unconstraining passenger demand using the EM algorithm. In Proceedings of the INFORMS Conference.

  • Saleh, R. 1997. Estimating lost demand with imperfect availability indicators. In AGIFORS Reservations and Yield Management Study Group Meeting Proceedings. Montreal, Canada.

  • Skwarek, Daniel Kew. 1996. Competitive impacts of yield management system components: forecasting and sell-up models. Master’s thesis. Cambridge, Mass.: Massachusetts Institute of Technology, Flight Transportation Laboratory.

  • Talluri, Kalyan. 2009. A finite-population revenue management model and a risk-ratio procedure for the joint estimation of population size and parameters. Available at SSRN 1374853.

  • Talluri, Kalyan T., and Garrett J. Van Ryzin. 2004. The theory and practice of revenue management, vol. 68. New York: Springer.

  • Van Buuren, Stef. 2012. Flexible imputation of missing data. Boca Raton: CRC Press.

  • Van Ryzin, Garrett, and Gustavo Vulcano. 2015. A market discovery algorithm to estimate a general class of nonparametric choice models. Management Science 61 (2): 281–300.

  • Weatherford, Larry R., and Stefan Pölt. 2002. Better unconstraining of airline demand data in revenue management systems for improved forecast accuracy and greater revenues. Journal of Revenue and Pricing Management 1 (3): 234–254.

Author information

Corresponding author

Correspondence to Johannes F. Jörg.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Technical details

This appendix provides technical results for the methodology presented in section “Methodology”. Theorem 1 describes the asymptotic distribution of the critical root statistic CRT.

Let \(C = (c_1, \ldots , c_p)\) be a \(p \times p\) matrix, where \(c_i\) is the eigenvector of matrix \(PP^T\) that belongs to the i-th largest eigenvalue \(\lambda _i\) and let \(D = (d_1, \ldots , d_q)\) be the \(q \times q\) matrix, where \(d_i\) is the eigenvector of matrix \(P^TP\) that belongs to its i-th largest eigenvalue. Partition both C and D such that \(C = (C_r, C_{p-r}) = (c_1, \ldots , c_r, c_{r+1}, \ldots , c_p)\) and \(D = (D_r, D_{q-r}) = (d_1, \ldots , d_r, d_{r+1}, \ldots , d_q)\).

Theorem 1

(Robin and Smith (2000)) Assume that

$$\begin{aligned}&(i)&\sqrt{n} \; vec(\hat{P} - P) \overset{d}{\longrightarrow } \mathcal {N}(0, \Omega ), \text { where } \Omega \text { is finite and of rank } s, 0< s \le pq, \\&(ii)&\text {If } r_0 < q \le p, \text { the matrix } (D_{q - r_0} \otimes C_{p - r_0})^T \Omega (D_{q - r_0} \otimes C_{p - r_0}) \text { is non-zero}. \end{aligned}$$

If \(r_0 < q\), then the statistic \(CRT(r_0)\) in (5) has an asymptotic distribution described by \(\sum _{i = 1}^t \gamma _i Z_i^2\), where \({t \le \min \{s, (p - r_0)(q - r_0)\}}\), \(\gamma _1 \ge \ldots \ge \gamma _t\) are the non-zero ordered eigenvalues of the matrix \((D_{q - r_0} \otimes C_{p - r_0})^T \Omega (D_{q - r_0} \otimes C_{p - r_0})\), and \(\{Z_i\}_{i = 1}^t\) are independent standard normal random variables.

Here, \(\overset{d}{\longrightarrow }\) indicates convergence in distribution and \(A \otimes B\) denotes the Kronecker (tensor) product of matrix A with matrix B.
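
The limiting distribution \(\sum _{i = 1}^t \gamma _i Z_i^2\) is a weighted sum of independent chi-square variables and generally has no closed-form quantiles. One way to obtain critical values is Monte Carlo simulation; the Python sketch below illustrates this under the assumption that estimates of the eigenvalues \(\gamma _i\) are already available. The function name and the example eigenvalues are hypothetical and not taken from the article.

```python
import numpy as np


def weighted_chi2_quantile(gammas, level, n_draws=200_000, seed=0):
    """Monte Carlo estimate of a quantile of sum_i gamma_i * Z_i**2."""
    gammas = np.asarray(gammas, dtype=float)
    rng = np.random.default_rng(seed)
    # One row per simulated draw, one column per standard normal Z_i.
    z = rng.standard_normal((n_draws, gammas.size))
    draws = (z ** 2) @ gammas
    return np.quantile(draws, level)


# Hypothetical eigenvalues, for illustration only.
critical_value = weighted_chi2_quantile([1.5, 0.8, 0.2], level=0.95)
```

A test at level \(\alpha\) would then compare the observed statistic with the simulated \((1 - \alpha )\)-quantile. With a single eigenvalue equal to one, the estimate approximates the corresponding chi-square quantile with one degree of freedom.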

Proposition 1 establishes the convergence properties of the information criteria in section “Model selection”.

Proposition 1

(Kasahara and Shimotsu (2014), Proposition 6) Let

$$\begin{aligned} \tilde{r} = \text {arg\, min}_{1 \le r \le q} \; (CRT(r) - f(N) g(r)) \end{aligned}$$

be the selected rank for an information criterion. If \(f(N) \rightarrow \infty\), \(f(N)/N \rightarrow 0\) and \(P(g(r) - g(r_0) < 0) \rightarrow 1\) for all \(r > r_0\) as \(N \rightarrow \infty\), then \(\tilde{r} \overset{p}{\rightarrow } r_0\).

Here, \(\overset{p}{\rightarrow }\) indicates convergence in probability.

Now, \(g(r) = (p-r)(q-r)\) satisfies these conditions if either \((d_r \otimes c_r)^T \Omega (d_r \otimes c_r) > 0\) for \(1 \le r \le q\), or if there exists a pair \((i, j)\) such that \((d_i \otimes c_j)^T \Omega (d_i \otimes c_j) > 0\) with \(r+1 \le i \le q\) and \(r+1 \le j \le p\) for any \(1 \le r \le q\). With \(g(r) = (p-r)(q-r)\), choosing \(f(N) = 2\) yields AIC and choosing \(f(N) = \log (N)\) yields BIC. Note that AIC does not satisfy the assumption \(f(N) \rightarrow \infty\) and therefore does not consistently estimate the true rank.
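
The selection rule of Proposition 1 can be sketched in Python as follows. Since equation (5) is not reproduced in this appendix, the sketch assumes an unweighted form of the statistic, \(CRT(r) = N \sum _{i = r+1}^{q} \hat{\lambda }_i\), where \(\hat{\lambda }_i\) are the ordered eigenvalues of \(\hat{P}^T \hat{P}\); this form, the function names, and the assumption \(q \le p\) are illustrative and may differ from the definition used in the main text.

```python
import numpy as np


def crt_statistic(p_hat, n_obs, r):
    """Assumed form: N times the sum of the q - r smallest eigenvalues of P^T P."""
    eigvals = np.linalg.eigvalsh(p_hat.T @ p_hat)  # ascending order
    q = p_hat.shape[1]
    return n_obs * eigvals[: q - r].sum()


def select_rank(p_hat, n_obs, criterion="bic"):
    """Return the r in 1..q that minimises CRT(r) - f(N) * g(r)."""
    p, q = p_hat.shape
    # f(N) = 2 gives AIC, f(N) = log(N) gives BIC.
    f_n = 2.0 if criterion == "aic" else np.log(n_obs)
    ranks = np.arange(1, q + 1)
    scores = [
        crt_statistic(p_hat, n_obs, r) - f_n * (p - r) * (q - r)
        for r in ranks
    ]
    return int(ranks[np.argmin(scores)])
```

The penalty term \(f(N) g(r)\) is subtracted rather than added because \(g(r)\) decreases in \(r\): larger ranks lose more of the reward, which counteracts the monotone decrease of \(CRT(r)\).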

Estimation from observations with continuous attributes

If we are interested in segmenting demand when observing continuous attributes, we may apply the original continuous version of the approach by Kasahara and Shimotsu (2014). To do so, we need to slightly change the way we count observations. In this setting, segments may be interpreted as groups of customers that purchase products in similar price categories or products with similar quality. Let \(x_t^i\) be observations with continuous attributes at time \(t \in \{1,2\}\). Since attributes are continuous, we cannot count the respective observations as individual elements, as in (4). To arrange observations into a matrix, we create partitions \(I_1^1, I_2^1, \ldots , I_{n_1}^1\) and \(I_1^2, I_2^2, \ldots , I_{n_2}^2\) with \(n_1, n_2 \in \mathbb {N}^+\) such that, for each \(t\), the sets \(I_1^t, \ldots , I_{n_t}^t\) are pairwise disjoint and their union covers the range of the attribute observed at time \(t\). Without loss of generality, choose \(n_1 > n_2\). For example, when observing product prices, we may create non-overlapping price intervals, in which we count observations.

Similar to (4), we define the empirical probability as

$$\begin{aligned} \hat{P}(x_1 \in I_k^1, x_2 \in I_l^2) = \frac{1}{N} \sum _{i = 1}^N \mathbb {1}_{(I_k^1, I_l^2)}(x_1^i, x_2^i), \qquad \forall k \in \{1, \ldots , n_1\}, l \in \{1, \ldots , n_2\}. \end{aligned}$$
(6)

Here, we use sample quantiles of observations to partition the continuous attribute space. The remaining estimation procedure is then performed identically to the discrete case.
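
The counting step in (6) can be sketched in Python as follows. The sketch assumes equally spaced sample quantiles as interval boundaries, which is one possible choice; the function name `empirical_probability_matrix` is hypothetical.

```python
import numpy as np


def empirical_probability_matrix(x1, x2, n1, n2):
    """Estimate P(x1 in I_k^1, x2 in I_l^2) on quantile-based intervals.

    x1, x2: first and second observation of each of the N customers.
    n1, n2: number of intervals per observation (assumed design choice).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n_obs = x1.size
    # Interior cut points at equally spaced sample quantiles.
    cuts1 = np.quantile(x1, np.arange(1, n1) / n1)
    cuts2 = np.quantile(x2, np.arange(1, n2) / n2)
    # Interval index of every observation, in 0..n1-1 and 0..n2-1.
    k = np.searchsorted(cuts1, x1, side="right")
    l = np.searchsorted(cuts2, x2, side="right")
    # Count joint occurrences and normalise by the sample size N.
    counts = np.zeros((n1, n2))
    np.add.at(counts, (k, l), 1.0)
    return counts / n_obs
```

The resulting \(n_1 \times n_2\) matrix sums to one and can be passed to the same rank estimation procedure as in the discrete case.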

Supplementary results on estimation performance

Here, we report the complete set of results. All figures contain diagrams for \(\alpha\)-levels 0.01, 0.05, and 0.1.

We report the full results for observations of discrete attributes complementing Fig. 1 in the main text. The diagrams show the performance of the three model selection approaches AIC (dashed line), BIC (dotted line), and SHT (solid line). The x-axes show the sample size. Note that sample sizes vary depending on the number of segments and the gaps between data points are not equidistant. The y-axis displays the selection ratio of the correct number of segments (Figs. 6, 7, 8).

Fig. 6 Discrete attributes: selection ratio of \(\hat{M}=3\) for \(D_3^*\)

Fig. 7 Discrete attributes: selection ratios for \(D_4^*\) (a) and \(D_5^*\) (b)

Fig. 8 Discrete attributes: selection ratios for \(D_6^*\) (a) and \(D_6'\) (b)

Supplementary results on sensitivity to segment similarity

Discrete attributes

This appendix reports results for different degrees of segment similarity complementing Fig. 2 in the main text. The results show selection ratios for \(\hat{M} = 2\). The x-axis shows the overlap of the two customer segments, the y-axis shows the sample size. Darker shades indicate a low selection ratio, whereas lighter shades indicate a high selection ratio (Fig. 9).

Fig. 9 Selection ratios for two segments with increasingly overlapping consideration sets

Continuous attributes

This appendix reports results for different degrees of segment similarity, complementing the discussion in the appendix section “Estimation performance for choices characterized by continuous attributes”. The results show selection ratios for \(\hat{M} = 2\). The x-axis shows the expected value \(\mu ^1_2\), the y-axis shows the expected value \(\mu ^2_2\). Values towards the lower left corner are closer to \(\mu _1 = (0, 0)\). Darker shades indicate a low selection ratio, whereas lighter shades indicate a high selection ratio (Fig. 10).

Fig. 10 Selection ratios for bivariate, normally distributed random variables

Supplementary results on performance on censored panel data

This appendix reports results for censored sales panel data of discrete attributes complementing Fig. 3 in the main text. The diagrams show the performance of the two model selection approaches BIC and SHT. The x-axes show the estimated number of segments. The y-axis displays the selection ratio. The sample size is indicated by the shading of the bars (Figs. 11, 12).

Fig. 11 Selection ratios for heuristics with censored panel data and three customer segments

Fig. 12 Selection ratios for heuristics with censored panel data and six customer segments

Estimation performance for choices characterized by continuous attributes

This appendix assesses estimation performance for observations differentiated by common continuous attributes C. In this case, we define customer segments through bivariate normal distributions, e.g., creating segments with a normally distributed willingness to pay. To validate the performance in other settings, we also analysed data sets with exponentially distributed random variables. On these, the approach yields results similar to those shown in this section.

Again, we consider scenarios with up to six segments, where the basic approach to generating simulated observations is similar to that for discrete attributes. We first draw the allocation to a segment and then a realisation of a bivariate, normally distributed random variable \(X \sim \mathcal {N}_2(\mu , \Sigma )\), with segment-specific expected value \(\mu\) and covariance matrix \(\Sigma\). Here, we assume uncorrelated random variables with variance 1, i.e., \(\Sigma = I_2\), where \(I_2\) denotes the two-dimensional identity matrix. This variable defines the allocated segment. The segments are specified as follows:

$$\begin{aligned}&X_i \sim \mathcal {N}_2\left( (\mu _i^1, \mu _i^2), \Sigma _i \right) \; \text {with } \Sigma _i = I_2 \; \text { and} \\&\mu _1 = (0,0), \quad \mu _2 = (2,1), \quad \mu _3 = (4,3), \\&\mu _4 = (5,6), \quad \mu _5 = (-2,-3), \quad \mu _6 = (-1,3). \end{aligned}$$
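
The two-step data generation described above can be sketched in Python as follows. The segment means are those specified above. Two points are assumptions made for illustration only: segments are drawn with equal probability unless weights are given, and a scenario with \(k\) segments uses the first \(k\) means. The function name `simulate_observations` is hypothetical.

```python
import numpy as np

# Segment-specific expected values mu_1, ..., mu_6 as specified above.
SEGMENT_MEANS = np.array(
    [[0, 0], [2, 1], [4, 3], [5, 6], [-2, -3], [-1, 3]], dtype=float
)


def simulate_observations(n_obs, n_segments, weights=None, seed=0):
    """Draw a segment per customer, then a bivariate normal observation."""
    rng = np.random.default_rng(seed)
    if weights is None:
        # Assumption: equally likely segments.
        weights = np.full(n_segments, 1.0 / n_segments)
    # Step 1: allocate each observation to a segment.
    segments = rng.choice(n_segments, size=n_obs, p=weights)
    # Step 2: add uncorrelated unit-variance noise (Sigma = I_2) to the
    # mean of the allocated segment.
    noise = rng.standard_normal((n_obs, 2))
    return segments, SEGMENT_MEANS[segments] + noise
```

Each row of the returned array holds the two observations of one customer, so its columns correspond to the first and second purchase.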

Note that \(\mu _3^2 = \mu _6^2\), such that these two segments generate similar data for the second purchase decision. This is deliberately chosen to assess the behaviour of the estimation approach in such a case. We create data sets from scenarios including three to six customer segments and refer to these scenarios by \(C_3\) to \(C_6\). Table 7 lists all analysed scenarios. Again, the number of observations varies per data set, depending on the scenario. As previously, we report our findings only for \(C_3\) and \(C_6\). We omit any further discussion of the results from the benchmarked cluster indices as given in section “Estimation performance”, limiting ourselves to the conclusion that the findings from applying these indices to continuous data are similar to those from applying them to discrete data.

Table 7 Overview of scenarios with continuous attributes

Fig. 13 Continuous attributes: selection ratios for \(C_3\) (a) and \(C_6\) (b)

Figure 13 shows the selection ratio of the correct number of segments for different sample sizes. Different line patterns mark the performance of the model selection criteria AIC, BIC, and SHT. The x-axis shows the sample size. Note that the gaps between data points are not equidistant. The y-axis displays the selection ratio.

Figure 13a shows the selection ratio of \(\hat{M}=3\) for \(C_3\). Sample sizes include \(\{100;\ 500;\ 1000;\ 5000;\ 10,000\}\). AIC performs relatively well for small samples, but exhibits its known overfitting for larger samples. SHT outperforms BIC for smaller samples and is slightly less accurate for larger samples. Overall, we can deduce that for three customer segments, 1000 individual observations suffice to accurately estimate the number of segments with both BIC and SHT. At this point, the selection ratio for SHT is \(91\%\), whilst BIC selects three segments in \(86\%\) of cases.

Because \(\mu _3^2\) and \(\mu _6^2\) are identical in \(C_6\), the empirical probability matrix \(\hat{P}\) (6) counts these observations in the same row or column. This results in a linear dependence within the matrix and thus a reduced rank. Consequently, segments 3 and 6 are expressed as a single segment in the finite mixture model, and Fig. 13b shows the selection ratio of \(\hat{M} = 5\) for \(C_6\). Sample sizes include \(\{100;\ 500;\ 1000;\ 5000;\ 10,000;\ 20,000\}\).
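
The rank reduction can be illustrated numerically. The Python sketch below builds a population mixture matrix \(P = \sum _m w_m \, p_m^1 (p_m^2)^T\) from randomly drawn interval probabilities and compares its rank when all six segments differ with its rank when segments 3 and 6 share the same second-purchase distribution. The probabilities, the number of intervals, and the equal weights are hypothetical and serve only to demonstrate the linear dependence.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_segments = 8, 6

# Hypothetical interval probabilities per segment (rows) for the first
# and second purchase; equal segment weights are an assumption.
first = rng.dirichlet(np.ones(n_bins), size=n_segments)
second = rng.dirichlet(np.ones(n_bins), size=n_segments)
weights = np.full(n_segments, 1.0 / n_segments)


def mixture_matrix(first, second, weights):
    """P = sum_m w_m * outer(first_m, second_m) for a finite mixture."""
    return (first * weights[:, None]).T @ second


rank_distinct = np.linalg.matrix_rank(
    mixture_matrix(first, second, weights), tol=1e-10
)

# Segments 3 and 6 now share the same second-purchase distribution.
second_shared = second.copy()
second_shared[5] = second_shared[2]
rank_shared = np.linalg.matrix_rank(
    mixture_matrix(first, second_shared, weights), tol=1e-10
)
```

With generic probabilities, the first matrix has rank six and the second rank five, because the two identical second-purchase distributions contribute only one linearly independent direction.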

Again, AIC performs comparatively well for small samples, but overfits the number of segments for more than 5000 observations. Similar to \(C_3\), SHT slightly outperforms BIC for smaller samples and is slightly less accurate for larger samples. Overall, their performance is comparable for these settings. The analysis suggests that for five segments, 5000 observations suffice to obtain reasonable estimates. For that sample size, SHT selects five segments in \(98\%\) of the samples, whereas BIC exhibits a selection ratio of \(95\%\).

In the following, we also report the results for the remaining \(\alpha\)-values. Figures 14 and 15 show the performance of the three model selection approaches AIC (dashed line), BIC (dotted line), and SHT (solid line). The x-axes show the sample size. Note that sample sizes vary depending on the number of segments and the gaps between data points are not equidistant. The y-axis displays the selection ratio of the correct number of segments.

Fig. 14 Continuous attributes: selection ratios for \(C_3^*\) (a) and \(C_4^*\) (b)

Fig. 15 Continuous attributes: selection ratios for \(C_5^*\) (a) and \(C_6^*\) (b)

Supplementary results for the empirical data sample of airline bookings

This appendix reports results for the empirical data set complementing Fig. 5 in the main text. We resampled the data set 1000 times and report the selection ratios of BIC and SHT. The x-axes show the estimated number of segments. The y-axes display the selection ratio. The lighter grey bars indicate the performance of BIC, whilst the darker grey bars show the performance of SHT. AIC is not represented due to its overfitting behaviour (Figs. 16, 17).

Fig. 16 Artificially censored empirical data sample

Fig. 17 Empirical data sample

Cite this article

Jörg, J.F., Cleophas, C. Nonparametric estimation of customer segments from censored sales panel data. J Revenue Pricing Manag 21, 393–417 (2022). https://doi.org/10.1057/s41272-021-00339-6

  • DOI: https://doi.org/10.1057/s41272-021-00339-6
