Clustering discrete-valued time series

  • Regular Article
Advances in Data Analysis and Classification

Abstract

Models are needed that account for the discreteness of data together with its time series properties and correlation. We focus on INteger-valued AutoRegressive (INAR) type models, which can be used in conjunction with existing model-based clustering techniques to cluster discrete-valued time series data. Because the approach is built on a finite mixture model, several existing techniques remain applicable, including selection of the number of clusters, estimation via expectation-maximization, and model selection. The proposed model is demonstrated on real data to illustrate its clustering applications.
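
The abstract does not state which INAR specification is used, so the following is only an illustrative sketch of the simplest member of the family, assuming a Poisson INAR(1) process with binomial thinning: each count is the number of "survivors" from the previous count (each kept with probability `alpha`) plus a Poisson innovation with mean `lam`. The function names and parameters are hypothetical and are not taken from the article.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def binomial_thinning(rng, count, alpha):
    """Binomial thinning: keep each of `count` units independently with probability alpha."""
    return sum(1 for _ in range(count) if rng.random() < alpha)


def simulate_poisson_inar1(alpha, lam, n, seed=0):
    """Simulate n observations of X_t = (alpha thinned X_{t-1}) + e_t, e_t ~ Poisson(lam).

    Assumes 0 <= alpha < 1, so the process is stationary with marginal
    mean lam / (1 - alpha); the first value is drawn from that marginal.
    """
    rng = random.Random(seed)
    x = [poisson_draw(rng, lam / (1.0 - alpha))]
    for _ in range(n - 1):
        survivors = binomial_thinning(rng, x[-1], alpha)
        x.append(survivors + poisson_draw(rng, lam))
    return x


if __name__ == "__main__":
    series = simulate_poisson_inar1(alpha=0.5, lam=2.0, n=20)
    print(series)
```

In a mixture setting, each cluster would carry its own parameters (here `alpha` and `lam`), and series simulated under different parameter pairs would be the objects to be clustered.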


Acknowledgements

The authors are grateful to anonymous reviewers for their very helpful comments. This work was supported by the Canada Research Chairs program and an E.W.R. Steacie Memorial Fellowship (McNicholas).

Author information

Corresponding author

Correspondence to Paul D. McNicholas.

About this article

Cite this article

Roick, T., Karlis, D. & McNicholas, P.D. Clustering discrete-valued time series. Adv Data Anal Classif 15, 209–229 (2021). https://doi.org/10.1007/s11634-020-00395-7

  • DOI: https://doi.org/10.1007/s11634-020-00395-7
