Association measures for interval variables

Oliveira, M. Rosário; Azeitona, Margarida; Pacheco, António; Valadas, Rui

doi:10.1007/s11634-021-00445-8

Regular Article
Published: 03 July 2021

Volume 16, pages 491–520 (2022)

Advances in Data Analysis and Classification

M. Rosário Oliveira¹ (ORCID: orcid.org/0000-0002-5234-3713), Margarida Azeitona², António Pacheco¹ & Rui Valadas³

Abstract

Symbolic Data Analysis (SDA) is a relatively new field of statistics that extends conventional data analysis by taking into account intrinsic data variability and structure. Unlike conventional data analysis, in SDA the features characterizing the data can be multi-valued, such as intervals or histograms. SDA has mainly been approached from a sampling perspective. In this work, we take a populational perspective and propose a model that links the micro-data and macro-data of interval-valued symbolic variables. Using this model, we derive the micro-data assumptions underlying the various definitions of symbolic covariance matrices proposed in the literature, and show that these assumptions can be too restrictive, which raises concerns about their applicability. We analyze the various definitions using worked examples and four datasets. Our results show that the presence or absence of correlations in the macro-data may not be correctly captured by the definitions of symbolic covariance matrices and that, in real data, these definitions can diverge strongly. Thus, to select the most appropriate definition, one must have some knowledge of the micro-data structure.
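To make the idea of a symbolic covariance concrete, the Python sketch below computes three quantities for interval-valued macro-data, using the formulas as they are commonly stated in the SDA literature (not the model proposed in this article): the Bertrand and Goupil (2000) symbolic variance, which assumes the micro-data are uniformly spread within each interval; the Billard and Diday (2003) covariance, which reduces to the covariance of interval midpoints; and the Billard (2008) covariance, which combines a sign term G and a spread term Q per interval. Each definition implicitly encodes an assumption about the micro-data, which is the point the abstract raises. The helper to_interval, which summarises a group of micro-data by its minimum and maximum, is a hypothetical illustration of one way interval macro-data can arise, and all function names are our own assumptions.

```python
import math
from typing import List, Sequence, Tuple

# An interval observation (a, b) with a <= b.
Interval = Tuple[float, float]


def to_interval(micro: Sequence[float]) -> Interval:
    """Hypothetical aggregation: summarise a group of micro-data by its range."""
    return (min(micro), max(micro))


def symbolic_mean(x: List[Interval]) -> float:
    """Symbolic sample mean: the average of the interval midpoints."""
    return sum(a + b for a, b in x) / (2 * len(x))


def variance_bertrand_goupil(x: List[Interval]) -> float:
    """Variance of an equal-weight mixture of Uniform(a, b) distributions,
    i.e. it assumes the micro-data are spread uniformly within each interval."""
    n = len(x)
    second_moment = sum(a * a + a * b + b * b for a, b in x) / (3 * n)
    return second_moment - symbolic_mean(x) ** 2


def covariance_billard_diday_2003(x: List[Interval], y: List[Interval]) -> float:
    """Covariance of the interval midpoints (divisor n); widths are ignored."""
    n = len(x)
    cx = [(a + b) / 2 for a, b in x]
    cy = [(a + b) / 2 for a, b in y]
    mx, my = sum(cx) / n, sum(cy) / n
    return sum((u - mx) * (v - my) for u, v in zip(cx, cy)) / n


def covariance_billard_2008(x: List[Interval], y: List[Interval]) -> float:
    """Billard-type covariance: (1/(3n)) * sum of G_x * G_y * sqrt(Q_x * Q_y)."""
    n = len(x)
    mx, my = symbolic_mean(x), symbolic_mean(y)

    def q(a: float, b: float, m: float) -> float:
        # Spread of interval (a, b) about the symbolic mean m; always >= 0
        # because u^2 + u*v + v^2 >= 0 for any real u, v.
        return (a - m) ** 2 + (a - m) * (b - m) + (b - m) ** 2

    def g(a: float, b: float, m: float) -> float:
        # Sign of the interval midpoint relative to the symbolic mean.
        return -1.0 if (a + b) / 2 <= m else 1.0

    total = 0.0
    for (a1, b1), (a2, b2) in zip(x, y):
        total += g(a1, b1, mx) * g(a2, b2, my) * math.sqrt(q(a1, b1, mx) * q(a2, b2, my))
    return total / (3 * n)


if __name__ == "__main__":
    x = [(1.0, 3.0), (2.0, 6.0), (4.0, 5.0)]
    y = [(0.0, 2.0), (1.0, 5.0), (3.0, 7.0)]
    print("variance (Bertrand-Goupil):", variance_bertrand_goupil(x))
    print("covariance (midpoints):", covariance_billard_diday_2003(x, y))
    print("covariance (Billard 2008):", covariance_billard_2008(x, y))
```

Two properties are worth noting when comparing these definitions. The Bertrand and Goupil variance equals the variance of the midpoints plus the average within-interval uniform variance (b − a)²/12, so it is never smaller than the midpoint variance; and when every interval is degenerate (a = b), the Billard (2008) covariance and the midpoint covariance both collapse to the classical covariance of the point values.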

Acknowledgements

This research has been supported by Fundação para a Ciência e Tecnologia (FCT), Portugal, through the projects UIDB/04621/2020, UIDB/50008/2020, PTDC/EEI-TEL/32454/2017, and PTDC/EGE-ECO/30535/2017. We thank the reviewers for their constructive comments and suggestions, which greatly enriched the paper.

Author information

Authors and Affiliations

CEMAT and Mathematics Department, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
M. Rosário Oliveira & António Pacheco
CEMAT, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
Margarida Azeitona
Department of Electrical and Computer Engineering, Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
Rui Valadas

Corresponding author

Correspondence to M. Rosário Oliveira.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Oliveira, M.R., Azeitona, M., Pacheco, A. et al. Association measures for interval variables. Adv Data Anal Classif 16, 491–520 (2022). https://doi.org/10.1007/s11634-021-00445-8

Received: 04 January 2019
Revised: 21 May 2021
Accepted: 25 May 2021
Published: 03 July 2021
Issue Date: September 2022
DOI: https://doi.org/10.1007/s11634-021-00445-8
