Mixed Distribution Models Based on Single-Cell RNA Sequencing Data
Interdisciplinary Sciences: Computational Life Sciences ( IF 3.9 ) Pub Date : 2021-03-22 , DOI: 10.1007/s12539-021-00427-6
Min Wu 1 , Junhua Xu 1 , Tao Ding 2 , Jie Gao 1

Advances in single-cell RNA sequencing (scRNA-seq) have produced a large volume of valuable data, and analyzing these data offers a new perspective on intratumoral heterogeneity and the identification of gene markers. In this paper, scRNA-seq data from colorectal cancer (CRC) are analyzed, and the gene expression difference (GED) data are found to follow a regular distributional shape. To study this regularity, a mixed stable-normal distribution (MSND) model and a mixed stable-exponential distribution (MSED) model are constructed to fit the GED data, and their estimated parameters are used to describe characteristics of the fitted distributions. Comparison by root mean square error and the chi-squared goodness-of-fit test shows that both MSND and MSED fit the data better than the stable distribution and the Cauchy distribution. With given quantile thresholds, MSND and MSED can be used to identify tumor-related genes, and functional analysis indicates that the selected genes are highly correlated with CRC. In addition, the parameters of MSND and MSED show a trend as CRC progresses. To explore this association, gene set enrichment analysis (GSEA) is performed; its results show that the trend characterizes the intratumoral heterogeneity of CRC well. Applying the MSED model to hepatocellular carcinoma further shows that the approach can be used to analyze other cancers. Overall, the MSND and MSED models fit the GED data well across disease stages, their parameters characterize the heterogeneity of CRC tumor cells, and both models can be used to identify genes highly correlated with tumors.
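As a rough illustration of the workflow the abstract describes (mixture density, root mean square error against the empirical density, quantile thresholds), the following minimal Python sketch may help. It is not the authors' implementation. It assumes the mixed model is a two-component finite mixture with weight `w`, and it uses the Cauchy distribution (the stable law with alpha = 1, beta = 0) as a closed-form stand-in for the general stable component. All parameter values and the simulated GED values are hypothetical, and parameter estimation is not shown.

```python
import math
import random


def cauchy_pdf(x, loc, scale):
    """Cauchy density, i.e. the stable law with alpha=1, beta=0 (stand-in)."""
    z = (x - loc) / scale
    return 1.0 / (math.pi * scale * (1.0 + z * z))


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, w, loc, scale, mu, sigma):
    """Assumed mixture form: w * heavy-tailed part + (1 - w) * normal part."""
    return w * cauchy_pdf(x, loc, scale) + (1.0 - w) * normal_pdf(x, mu, sigma)


def mixture_cdf(x, w, loc, scale, mu, sigma):
    heavy = 0.5 + math.atan((x - loc) / scale) / math.pi
    gauss = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return w * heavy + (1.0 - w) * gauss


def mixture_quantile(q, params, lo=-1e6, hi=1e6):
    """Invert the monotone mixture CDF by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, *params) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def histogram_density(values, lo, hi, bins):
    """Bin centres and empirical densities on [lo, hi), normalised by all values."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    dens = [c / (len(values) * width) for c in counts]
    return centres, dens


def rmse(observed, predicted):
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical parameters (w, loc, scale, mu, sigma) and simulated GED values.
random.seed(0)
params = (0.3, 0.0, 1.0, 0.0, 2.0)
w, loc, scale, mu, sigma = params
ged = [
    loc + scale * math.tan(math.pi * (random.random() - 0.5))
    if random.random() < w
    else random.gauss(mu, sigma)
    for _ in range(5000)
]

# Compare the mixture and a single Cauchy against the empirical density.
centres, dens = histogram_density(ged, -10.0, 10.0, 40)
print("RMSE mixture:", rmse(dens, [mixture_pdf(x, *params) for x in centres]))
print("RMSE Cauchy :", rmse(dens, [cauchy_pdf(x, loc, scale) for x in centres]))

# Flag values beyond assumed 2.5% / 97.5% quantile thresholds of the mixture.
low, high = mixture_quantile(0.025, params), mixture_quantile(0.975, params)
flagged = [i for i, v in enumerate(ged) if v < low or v > high]
print("thresholds:", low, high, "flagged:", len(flagged))
```

In a real analysis the heavy-tailed component would be a general stable law with estimated alpha and beta, and the chi-squared goodness-of-fit test mentioned in the abstract would complement the RMSE comparison; the 2.5% and 97.5% cut-offs here are placeholders, not values from the paper.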


Updated: 2021-03-23