Published by De Gruyter, May 30, 2019

A penalized regression approach for DNA copy number study using the sequencing data

  • Jaeeun Lee and Jie Chen

Abstract

Modeling high-throughput next generation sequencing (NGS) data, which result from experiments that profile tumor and control samples for the study of DNA copy number variants (CNVs), remains a challenge in various ways. In this applied work, we provide an efficient method for detecting multiple CNVs using NGS reads ratio data. The method is based on a multiple statistical change-point model fitted with a penalized regression approach, the 1d fused LASSO, which is designed for ordered data with a one-dimensional structure. In addition, since the path algorithm traces the solution as a function of a tuning parameter, the number and locations of potential CNV region boundaries can be estimated simultaneously and efficiently. For tuning parameter selection, we propose a new modified Bayesian information criterion, called JMIC, and compare it with three different Bayesian information criteria used in the literature. Simulation results show that JMIC performs better for tuning parameter selection than the other three criteria. We applied our approach to the sequencing reads ratio data between the breast tumor cell line HCC1954 and its matched normal cell line BL 1954, and the results are in line with those reported in the literature.
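To make the idea in the abstract concrete, the following is a minimal sketch under several assumptions, not the authors' implementation. It assumes the signal-approximation form of the 1d fused LASSO with only the fusion penalty, minimizing 0.5 * sum((y_i - b_i)^2) + lam * sum(|b_{i+1} - b_i|). It solves each problem by coordinate descent on the dual over a fixed grid of tuning parameters instead of the path algorithm, and it scores each fit with the ordinary Schwarz BIC as a stand-in, because the JMIC formula is not given in the abstract. All function names are hypothetical.

```python
import math


def fused_lasso_1d(y, lam, max_sweeps=10000, tol=1e-12):
    """Fit the fusion-penalty-only 1d fused lasso by dual coordinate descent.

    Assumed objective: 0.5*sum((y_i-b_i)^2) + lam*sum(|b_{i+1}-b_i|).
    """
    n = len(y)
    b = list(y)          # primal fit; equals y while all dual variables are 0
    u = [0.0] * (n - 1)  # one dual variable per adjacent pair, kept in [-lam, lam]
    for _ in range(max_sweeps):
        biggest = 0.0
        for i in range(n - 1):
            # Exact one-dimensional minimiser of the dual, clipped to the box.
            new_u = max(-lam, min(lam, u[i] + 0.5 * (b[i + 1] - b[i])))
            delta = new_u - u[i]
            u[i] = new_u
            b[i] += delta      # keep the primal fit in sync with the dual
            b[i + 1] -= delta
            biggest = max(biggest, abs(delta))
        if biggest < tol:
            break
    return b


def change_points(b, eps=1e-6):
    """Indices where a new constant segment starts."""
    return [i + 1 for i in range(len(b) - 1) if abs(b[i + 1] - b[i]) > eps]


def bic(y, b):
    """Schwarz BIC with the number of segments as model dimension (stand-in for JMIC)."""
    n = len(y)
    rss = max(sum((yi - bi) ** 2 for yi, bi in zip(y, b)), 1e-300)
    k = len(change_points(b)) + 1
    return n * math.log(rss / n) + k * math.log(n)


def select_lambda(y, grid):
    """Return (lam, fit) with the smallest BIC over a grid of tuning parameters."""
    fits = [(lam, fused_lasso_1d(y, lam)) for lam in grid]
    return min(fits, key=lambda pair: bic(y, pair[1]))


if __name__ == "__main__":
    # Toy reads-ratio-like signal: one jump plus small alternating noise.
    y = [(0.0 if i < 10 else 2.0) + (0.1 if i % 2 == 0 else -0.1) for i in range(20)]
    lam, fit = select_lambda(y, [0.02, 0.2, 1.0, 5.0, 20.0])
    print("selected lambda:", lam)
    print("estimated change points:", change_points(fit))
```

The grid search here only illustrates how one tuning parameter determines both the number and the locations of the estimated boundaries. A path algorithm would obtain the same family of fits without refitting at each value.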


Supplementary Material

The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/sagmb-2018-0001).


Published Online: 2019-05-30

©2019 Walter de Gruyter GmbH, Berlin/Boston
