A general index for linear and nonlinear correlations for high dimensional genomic data
BMC Genomics (IF 3.5), Pub Date: 2020-11-30, DOI: 10.1186/s12864-020-07246-x
Zhihao Yao (1, 2), Jing Zhang (1, 2), Xiufen Zou (1, 2)

High-throughput sequencing now generates large volumes of high-dimensional data. Detecting dependence or correlation between such datasets has become one of the most important problems in multi-dimensional data integration and co-expression network construction. RNA-sequencing data are widely used to construct gene regulatory networks, and these networks could be more accurate if methylation data, copy number aberration data and other data types were also incorporated. A general index for detecting relationships between high-dimensional datasets is therefore indispensable. We proposed a Kernel-Based RV-coefficient, named KBRV, which tests for both linear and nonlinear correlation between two matrices by introducing kernel functions into RV2 (the modified RV-coefficient). We applied permutation tests and other validation methods to simulated data to assess the significance and rationality of KBRV. To demonstrate the advantages of KBRV in constructing gene regulatory networks, we applied the index to real datasets (ovarian cancer datasets and exon-level RNA-Seq data from human myeloid differentiation), where it outperformed vector correlation. We concluded that KBRV is an efficient index for detecting both linear and nonlinear relationships in high-dimensional data, and that this correlation method has potential applications in constructing gene regulatory networks.
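The abstract does not give the KBRV formula, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' definition. It starts from RV2, which compares two n x n sample-by-sample matrices A and B while ignoring their diagonals: RV2 = tr(Ã B̃) / sqrt(tr(Ã²) tr(B̃²)), where Ã = A − diag(A). With the linear cross-product matrix A = XXᵀ of column-centred data, this is the modified RV-coefficient. Replacing XXᵀ with a kernel matrix gives a kernelised variant that can pick up nonlinear dependence. The Gaussian kernel, its bandwidth `sigma=1.0`, the double-centring step, and the names `kernel_rv` and `permutation_pvalue` are all hypothetical choices made for illustration. The significance test shuffles the sample order of one dataset, in the spirit of the permutation test mentioned above.

```python
import math
import random


def linear_kernel(X):
    """Cross-product matrix X X^T between samples (rows of X), as in classical RV/RV2."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def gaussian_kernel(X, sigma=1.0):
    """Gaussian (RBF) kernel matrix between samples; sigma is an assumed bandwidth."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2.0 * sigma ** 2))
             for xj in X] for xi in X]


def double_center(K):
    """Return H K H with H = I - 11^T/n (subtract row, column and grand means)."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def rv2(A, B):
    """RV2-style coefficient of two symmetric n x n matrices, ignoring the diagonals."""
    n = len(A)
    off = [(i, j) for i in range(n) for j in range(n) if i != j]
    num = sum(A[i][j] * B[i][j] for i, j in off)
    den = math.sqrt(sum(A[i][j] ** 2 for i, j in off) * sum(B[i][j] ** 2 for i, j in off))
    return num / den


def kernel_rv(X, Y, kernel=gaussian_kernel):
    """Hypothetical kernel-based RV2: RV2 of the double-centred kernel matrices.

    ASSUMPTION: kernels are double-centred, mirroring the column-centring of X in RV2.
    With linear_kernel this reduces to RV2 on column-centred data.
    """
    return rv2(double_center(kernel(X)), double_center(kernel(Y)))


def permutation_pvalue(X, Y, kernel=gaussian_kernel, n_perm=199, seed=0):
    """One-sided permutation test: shuffle the sample order of Y and keep X fixed."""
    rng = random.Random(seed)
    Kx = double_center(kernel(X))
    Ky = kernel(Y)
    observed = rv2(Kx, double_center(Ky))
    n = len(Y)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        # Reordering the samples of Y permutes both the rows and columns of its kernel matrix.
        Kp = [[Ky[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if rv2(Kx, double_center(Kp)) >= observed:
            hits += 1
    # The +1 terms count the observed statistic itself, so p is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

In this sketch, the only difference between the linear RV2-style index and its kernelised counterpart is the choice of `kernel` (for example `kernel_rv(X, Y, kernel=linear_kernel)`). To try a different bandwidth, pass something like `lambda M: gaussian_kernel(M, sigma=0.5)`. The pure-Python loops keep the example dependency-free but scale as O(n²) per statistic. Real genomic matrices would call for vectorised linear algebra.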

Updated: 2020-12-01