Detecting Unbiased Associations in Large Data Sets
Big Data (IF 2.6), Pub Date: 2022-08-12, DOI: 10.1089/big.2021.0193
Chuanlu Liu 1, Shuliang Wang 1,2, Hanning Yuan 1, Xiaojia Liu 1

The maximal information coefficient (MIC) measures associations between pairs of variables in complex relationships. It estimates the correlation by optimizing the partition of the axes. However, when a relationship is affected by certain kinds of noise, MIC may overestimate the correlation value, so that a noisy relationship is misidentified as a noiseless one. In this article, a novel method, the weighted information coefficient mean (WICM), is proposed to detect unbiased associations in large data sets. First, we mathematically analyze why a noisy relationship can receive an abnormal correlation value. Then, WICM is presented in two core steps: the first detects potential overestimation among relationships with high values, and the second rectifies the overestimation by calculating the information coefficient mean instead of simply selecting the maximum element of the characteristic matrix. Finally, experiments on functional relationships and real-world data relationships show that WICM resolves the overestimation feasibly and effectively.
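The contrast between taking the maximum of the characteristic matrix and averaging over it can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives neither the WICM weights nor the overestimation-detection step, so the sketch uses uniform weights as a placeholder and skips detection entirely. It also replaces MIC's optimized partition with simple equal-frequency grids, and the function names (`normalized_mi`, `characteristic_values`, `max_score`, `mean_score`) and the grid budget exponent `alpha=0.6` are assumptions made for illustration.

```python
import numpy as np


def normalized_mi(x, y, nx, ny):
    """Mutual information on an nx-by-ny grid, divided by log(min(nx, ny)).

    Assumption: equal-frequency (quantile) bins stand in for the optimized
    partition that MIC searches for.
    """
    x_edges = np.quantile(x, np.linspace(0, 1, nx + 1))
    y_edges = np.quantile(y, np.linspace(0, 1, ny + 1))
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    p = counts / counts.sum()              # joint cell probabilities
    px = p.sum(axis=1, keepdims=True)      # marginal over x bins, shape (nx, 1)
    py = p.sum(axis=0, keepdims=True)      # marginal over y bins, shape (1, ny)
    nz = p > 0                             # skip empty cells to avoid log(0)
    mi = np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz]))
    return float(mi / np.log(min(nx, ny)))


def characteristic_values(x, y, alpha=0.6):
    """Simplified characteristic matrix: one score per grid with nx * ny <= n**alpha."""
    budget = max(len(x) ** alpha, 4)
    values = {}
    for nx in range(2, int(budget // 2) + 1):
        for ny in range(2, int(budget // nx) + 1):
            values[(nx, ny)] = normalized_mi(x, y, nx, ny)
    return values


def max_score(values):
    """MIC-style score: the largest entry of the characteristic matrix."""
    return max(values.values())


def mean_score(values, weights=None):
    """Weighted mean of the entries.

    Hypothetical placeholder: uniform weights unless the caller supplies
    one weight per grid; the weights used by WICM are not given in the abstract.
    """
    v = np.array(list(values.values()))
    w = np.ones_like(v) if weights is None else np.asarray(weights, dtype=float)
    return float(np.average(v, weights=w))


# Usage: compare both scores on a noiseless line and on independent noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
for name, y in [("linear", x), ("independent", rng.uniform(-1, 1, 500))]:
    vals = characteristic_values(x, y)
    print(name, "max:", round(max_score(vals), 3), "mean:", round(mean_score(vals), 3))
```

Because a mean can never exceed the maximum of the same values, a single inflated grid cell has less influence on the averaged score than on the maximum, which is the intuition behind replacing the maximum with a mean.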

Updated: 2022-08-16