Abstract
Progress in single-cell RNA sequencing (scRNA-seq) has produced a large body of valuable data. Analyzing these data offers a new perspective for studying intratumoral heterogeneity and identifying gene markers. In this paper, we analyze scRNA-seq data from colorectal cancer (CRC) and find that the distribution of the gene expression difference (GED) data has a regular shape. To study this regularity, we construct a mixed stable-normal distribution (MSND) model and a mixed stable-exponential distribution (MSED) model and fit both to the GED data. The estimated parameters of MSND and MSED are then used to describe characteristics of the distribution. Comparisons of root mean square error (RMSE) and chi-squared goodness-of-fit tests show that both MSED and MSND fit the data better than the stable distribution and the Cauchy distribution. Given quantile thresholds, MSND and MSED can be used to identify tumor-related genes, and functional analysis indicates that the selected genes are highly correlated with CRC. In addition, the parameters of MSND and MSED follow a trend as CRC develops. To explore this association, we perform gene set enrichment analysis (GSEA), whose results reveal that the trend characterizes the intratumoral heterogeneity of CRC well. Finally, applying the MSED model to hepatocellular carcinoma shows that the model can also be used to analyze other cancers. Overall, the MSND and MSED models fit the GED data well across disease stages, their parameters characterize the heterogeneity of CRC tumor cells, and both models can identify genes that are highly correlated with tumors.
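The abstract names the MSND and MSED models but does not give their densities. The sketch below assumes, for illustration only, that each is a two-component mixture with weight w. The first component is a symmetric alpha-stable law (skewness beta = 0, characteristic function exp(i*delta*t - |gamma*t|^alpha)). The second is a normal component (MSND) or an exponential component (MSED). The function names, the symmetric-stable simplification, and the numerical inversion by Simpson's rule are assumptions of this sketch, not the authors' formulation.

```python
import math


def stable_sym_pdf(x, alpha, gamma=1.0, delta=0.0, n=4000):
    """Density of a symmetric alpha-stable law (beta = 0, 0 < alpha <= 2).

    Inverts the characteristic function exp(i*delta*t - |gamma*t|**alpha):
        f(x) = (1/pi) * integral_0^inf exp(-(gamma*t)**alpha) * cos(t*(x - delta)) dt
    using Simpson's rule with n (even) subintervals. The integral is cut off
    where (gamma*t)**alpha reaches 40, i.e. where the integrand has decayed
    below exp(-40). Accuracy degrades for small alpha or very large |x - delta|.
    """
    t_max = 40.0 ** (1.0 / alpha) / gamma
    h = t_max / n
    z = x - delta
    total = 0.0
    for k in range(n + 1):
        t = k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)  # Simpson weights
        total += weight * math.exp(-((gamma * t) ** alpha)) * math.cos(t * z)
    return total * h / (3.0 * math.pi)


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def exponential_pdf(x, lam):
    # Support is x >= 0; negative GED values get zero density from this component.
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def msnd_pdf(x, w, alpha, gamma, delta, mu, sigma):
    """Hypothetical MSND density: w * stable + (1 - w) * normal."""
    return w * stable_sym_pdf(x, alpha, gamma, delta) + (1.0 - w) * normal_pdf(x, mu, sigma)


def msed_pdf(x, w, alpha, gamma, delta, lam):
    """Hypothetical MSED density: w * stable + (1 - w) * exponential."""
    return w * stable_sym_pdf(x, alpha, gamma, delta) + (1.0 - w) * exponential_pdf(x, lam)
```

For example, `msed_pdf(0.3, 0.7, 1.4, 0.5, 0.0, 2.0)` evaluates the assumed MSED density at GED = 0.3. Two checks are built into the choice of stable component. With alpha = 1 it reduces to a Cauchy density. With alpha = 2 it reduces to a normal density with variance 2*gamma^2. Parameter estimation is not shown; maximizing the log-likelihood of these densities over (w, alpha, gamma, delta, ...) would be one option.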
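The abstract compares fitted models by root mean square error and a chi-squared goodness-of-fit test. What follows is a minimal standard-library sketch of both criteria for any fitted density and CDF, under two assumptions. First, RMSE is taken between the histogram density and the model density at bin centers. Second, the chi-squared statistic uses equal-width bins plus two tail bins. The binning choices and function names are hypothetical. Cauchy and normal laws stand in for fitted models only because their CDFs have closed forms.

```python
import math


def histogram(data, bins, lo, hi):
    """Equal-width bin counts over [lo, hi); values outside the range are ignored."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in data:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    edges = [lo + i * width for i in range(bins)] + [hi]
    return counts, edges


def rmse(data, pdf, bins=50, lo=-5.0, hi=5.0):
    """RMSE between the empirical density count / (n * width) and pdf at bin centers."""
    counts, edges = histogram(data, bins, lo, hi)
    n = len(data)
    total = 0.0
    for i, c in enumerate(counts):
        width = edges[i + 1] - edges[i]
        center = 0.5 * (edges[i] + edges[i + 1])
        total += (c / (n * width) - pdf(center)) ** 2
    return math.sqrt(total / bins)


def pearson_chi2(data, cdf, bins=20, lo=-5.0, hi=5.0):
    """Pearson chi-squared statistic; two tail bins make the expected counts sum to n."""
    counts, edges = histogram(data, bins, lo, hi)
    observed = [sum(v < lo for v in data)] + counts + [sum(v >= hi for v in data)]
    probs = ([cdf(lo)]
             + [cdf(edges[i + 1]) - cdf(edges[i]) for i in range(bins)]
             + [1.0 - cdf(hi)])
    n = len(data)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs) if p > 0)


# Closed-form stand-ins for fitted models.
def cauchy_pdf(x, loc=0.0, scale=1.0):
    return 1.0 / (math.pi * scale * (1.0 + ((x - loc) / scale) ** 2))


def cauchy_cdf(x, loc=0.0, scale=1.0):
    return 0.5 + math.atan((x - loc) / scale) / math.pi


def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
```

A lower RMSE and a lower chi-squared statistic both indicate a better fit. To turn the statistic into a p-value, compare it against a chi-squared distribution with (number of bins - 1 - number of fitted parameters) degrees of freedom. That step is not shown here because it needs a chi-squared CDF, which the Python standard library does not provide.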
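The abstract says tumor-related genes are identified using quantile thresholds of the fitted models, but it gives neither the thresholds nor the exact selection rule. One plausible reading, sketched here as an assumption, flags genes whose GED falls below the p-quantile or above the (1 - p)-quantile of the fitted distribution. Quantiles are found by bisection on the CDF. A standard Cauchy CDF stands in for a fitted MSND or MSED, and the gene names and p = 0.025 are hypothetical.

```python
import math


def quantile(cdf, p, lo=-1e6, hi=1e6, tol=1e-9):
    """Invert a continuous, increasing CDF by bisection on [lo, hi]."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def select_genes(ged, cdf, p=0.025):
    """Return genes whose GED lies outside the [p, 1 - p] quantile band of the model."""
    lower, upper = quantile(cdf, p), quantile(cdf, 1.0 - p)
    return sorted(g for g, v in ged.items() if v < lower or v > upper)


def cauchy_cdf(x, loc=0.0, scale=1.0):
    # Stand-in for the CDF of a fitted model.
    return 0.5 + math.atan((x - loc) / scale) / math.pi
```

With the standard Cauchy stand-in and p = 0.025, the band is roughly [-12.7, 12.7]. So `select_genes({"GENE_A": 20.0, "GENE_B": 0.5, "GENE_C": -15.0}, cauchy_cdf)` keeps GENE_A and GENE_C. Bisection is used because it only needs CDF evaluations, which suits mixture models without a closed-form inverse.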
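GSEA, as introduced by Subramanian et al. (2005), scores a gene set by walking down a ranked gene list and keeping a running sum. The sum rises at genes in the set, weighted by |score|^p with p = 1 by default, and falls at genes outside it. The enrichment score is the running-sum value with the largest deviation from zero. The sketch below computes only this enrichment score. Full GSEA also includes permutation-based significance testing and score normalization, which are omitted here. The abstract does not state how the paper ranks genes, and the gene names in the example are hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """GSEA running-sum enrichment score.

    ranked:   list of (gene, score) pairs sorted by score, descending.
    gene_set: set of gene names; hit scores must not all be zero.
    Hits add |score|**p / N_R and misses subtract 1 / (N - N_H). The result
    is the running-sum value with the largest absolute deviation from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must be a non-empty proper subset of the ranked list")
    n_r = sum(abs(score) ** p for (_, score), hit in zip(ranked, hits) if hit)
    running, es = 0.0, 0.0
    for (_, score), hit in zip(ranked, hits):
        running += abs(score) ** p / n_r if hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es
```

A set concentrated at the top of the ranking yields a positive score, and one concentrated at the bottom yields a negative score. For example, with genes a to e scored 3, 2, 1, -1, -2, the set {a, b} scores 1.0 and the set {d, e} scores -1.0.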
Acknowledgements
This research was supported by the Key Project of the National Natural Science Foundation of China (Grant no. 11831015), the Major Research Plan of the National Natural Science Foundation of China (Grant no. 91730301), the Postgraduate Research & Practice Innovation Program of Jiangnan University (Grant no. JNKY19_051), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant no. KYCX18_1864).
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
About this article
Cite this article
Wu, M., Xu, J., Ding, T. et al. Mixed Distribution Models Based on Single-Cell RNA Sequencing Data. Interdiscip Sci Comput Life Sci 13, 362–370 (2021). https://doi.org/10.1007/s12539-021-00427-6
DOI: https://doi.org/10.1007/s12539-021-00427-6