A general index for linear and nonlinear correlations for high dimensional genomic data
BMC Genomics (IF 4.4), Pub Date: 2020-11-30, DOI: 10.1186/s12864-020-07246-x
Zhihao Yao, Jing Zhang, Xiufen Zou

Advances in high-throughput sequencing generate high-dimensional data. Detecting dependence or correlation between such datasets is becoming one of the most important issues in multi-dimensional data integration and co-expression network construction. RNA-sequencing data are widely used to construct gene regulatory networks, and such networks could be more accurate when methylation data, copy number aberration data and other data types are introduced. Consequently, a general index for detecting relationships between high-dimensional datasets is indispensable. We proposed a kernel-based RV-coefficient, named KBRV, for testing both linear and nonlinear correlation between two matrices by introducing kernel functions into RV2 (the modified RV-coefficient). A permutation test and other validation methods were applied to simulated data to assess the significance and validity of KBRV. To demonstrate the advantages of KBRV in constructing gene regulatory networks, we applied the index to real datasets (ovarian cancer datasets and exon-level RNA-Seq data from human myeloid differentiation) and illustrated its superiority over vector correlation. We concluded that KBRV is an efficient index for detecting both linear and nonlinear relationships in high-dimensional data. This correlation method for high-dimensional data has possible applications in the construction of gene regulatory networks.
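
The abstract does not give the formula for KBRV, so the following is only a rough sketch of the general idea it describes: replace the sample cross-product matrices in an RV2-style coefficient with kernel Gram matrices, then judge significance by permuting samples. Everything specific here is an assumption for illustration and not the authors' definition: the Gaussian kernel, the median-heuristic bandwidth, the double centering, and the function names `gaussian_gram`, `center`, `rv2_like`, `kernel_rv2` and `permutation_pvalue` are all hypothetical choices.

```python
import numpy as np


def gaussian_gram(X, sigma=None):
    """Gaussian-kernel Gram matrix of the rows (samples) of X.

    Assumption: if sigma is None, the squared bandwidth is set to the median
    squared pairwise distance (a common heuristic, not taken from the paper).
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    if sigma is None:
        med = np.median(d2[np.triu_indices_from(d2, k=1)])
        sigma2 = med if med > 0 else 1.0
    else:
        sigma2 = float(sigma) ** 2
    return np.exp(-d2 / (2.0 * sigma2))


def center(K):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def rv2_like(K, L):
    """RV2-style coefficient: drop the diagonals, then take the normalized
    Frobenius inner product of the two matrices."""
    K0 = K - np.diag(np.diag(K))
    L0 = L - np.diag(np.diag(L))
    denom = np.sqrt(np.sum(K0 * K0) * np.sum(L0 * L0))
    return float(np.sum(K0 * L0) / denom) if denom > 0 else 0.0


def kernel_rv2(X, Y):
    """Kernelized RV2-style index between two sample-by-feature matrices."""
    return rv2_like(center(gaussian_gram(X)), center(gaussian_gram(Y)))


def permutation_pvalue(X, Y, n_perm=200, seed=0):
    """Permutation test: shuffle the sample order of Y and recompute the index."""
    rng = np.random.default_rng(seed)
    K = center(gaussian_gram(X))
    L = center(gaussian_gram(Y))
    observed = rv2_like(K, L)
    n = K.shape[0]
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permuting rows and columns of L together relabels the samples of Y.
        if rv2_like(K, L[np.ix_(p, p)]) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-2.0, 2.0, size=(80, 1))
    Y = X ** 2 + 0.1 * rng.normal(size=(80, 1))  # purely nonlinear dependence
    print(permutation_pvalue(X, Y))
```

With a linear kernel in place of `gaussian_gram`, the same code reduces to an RV2-type coefficient on centered data, which is why a nonlinear kernel is the natural place to add sensitivity to nonlinear relationships.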

Updated: 2020-12-01