Association measures for interval variables
Advances in Data Analysis and Classification (IF 1.4), Pub Date: 2021-07-03, DOI: 10.1007/s11634-021-00445-8
M. Rosário Oliveira¹, António Pacheco¹, Margarida Azeitona², Rui Valadas³

Symbolic Data Analysis (SDA) is a relatively new field of statistics that extends conventional data analysis by taking into account intrinsic data variability and structure. Unlike conventional data analysis, in SDA the features characterizing the data can be multi-valued, such as intervals or histograms. SDA has mainly been approached from a sampling perspective. In this work, we take a population perspective and propose a model that links the micro-data and macro-data of interval-valued symbolic variables. Using this model, we derive the micro-data assumptions underlying the various definitions of symbolic covariance matrices proposed in the literature, and show that these assumptions can be too restrictive, raising applicability concerns. We analyze the various definitions using worked examples and four datasets. Our results show that the presence or absence of correlations in the macro-data may not be correctly captured by the definitions of symbolic covariance matrices and that, in real data, these definitions can diverge strongly. Thus, to select the most appropriate definition, one must have some knowledge of the micro-data structure.
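The point that covariance definitions for interval variables rest on different micro-data assumptions can be made concrete with a small Python sketch. It is not the authors' model. The aggregation of micro-data into intervals (each unit's interval is assumed to be the range of its micro-observations), all function names, and the two formulas are our own illustrative assumptions: a midpoint covariance of the kind usually attributed to Billard and Diday (2003), and a sign-weighted covariance in the style of Billard (2008), which is consistent with the uniform-within-interval variance of Bertrand and Goupil. Check the formulas against the paper before relying on them.

```python
import math
import random
from typing import List, Sequence, Tuple

# An interval observation [a_i, b_i] with a_i <= b_i.
Interval = Tuple[float, float]


def intervals_from_micro(units: Sequence[Sequence[float]]) -> List[Interval]:
    """Hypothetical aggregation: each unit's interval is the range of its micro-data."""
    return [(min(u), max(u)) for u in units]


def symbolic_mean(x: Sequence[Interval]) -> float:
    """Average of the interval midpoints."""
    return sum(a + b for a, b in x) / (2 * len(x))


def symbolic_variance(x: Sequence[Interval]) -> float:
    """Bertrand-Goupil-style variance (assumes micro-data uniform within each interval)."""
    m = symbolic_mean(x)
    return sum(a * a + a * b + b * b for a, b in x) / (3 * len(x)) - m * m


def cov_midpoints(x: Sequence[Interval], y: Sequence[Interval]) -> float:
    """Midpoint covariance (Billard-Diday 2003 style), divisor n."""
    mx, my = symbolic_mean(x), symbolic_mean(y)
    s = sum(((ax + bx) / 2) * ((ay + by) / 2) for (ax, bx), (ay, by) in zip(x, y))
    return s / len(x) - mx * my


def cov_billard_2008(x: Sequence[Interval], y: Sequence[Interval]) -> float:
    """Sign-weighted covariance (Billard 2008 style); cov(x, x) == symbolic_variance(x)."""
    mx, my = symbolic_mean(x), symbolic_mean(y)

    def q(a: float, b: float, m: float) -> float:
        # Spread of [a, b] around m; never negative because u^2 + uv + v^2 >= 0.
        return (a - m) ** 2 + (a - m) * (b - m) + (b - m) ** 2

    def g(a: float, b: float, m: float) -> float:
        # -1 if the interval midpoint lies at or below the symbolic mean, else +1.
        return -1.0 if (a + b) / 2 <= m else 1.0

    total = sum(
        g(ax, bx, mx) * g(ay, by, my) * math.sqrt(q(ax, bx, mx) * q(ay, by, my))
        for (ax, bx), (ay, by) in zip(x, y)
    )
    return total / (3 * len(x))


def cov_micro(pairs: Sequence[Tuple[float, float]]) -> float:
    """Population covariance of pooled micro-data pairs (x, y)."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    return sum((px - mx) * (py - my) for px, py in pairs) / n


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical micro-data: units differ in location, but within a unit
    # x and y are drawn independently of each other.
    units = [
        [(c + rng.uniform(-1, 1), c + rng.uniform(-1, 1)) for _ in range(50)]
        for c in (0.0, 2.0, 5.0)
    ]
    x = intervals_from_micro([[p[0] for p in u] for u in units])
    y = intervals_from_micro([[p[1] for p in u] for u in units])
    pooled = [p for u in units for p in u]
    print("micro-data covariance:", cov_micro(pooled))
    print("midpoint covariance  :", cov_midpoints(x, y))
    print("sign-weighted (2008) :", cov_billard_2008(x, y))
```

As a small check of how the definitions can disagree, take x = [(0, 2), (1, 5), (4, 6)]: `cov_billard_2008(x, x)` reproduces `symbolic_variance(x)` (10/3), whereas `cov_midpoints(x, x)` returns only the variance of the midpoints (8/3), because it ignores the spread inside each interval.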




Updated: 2021-07-04