Abstract
In recent years, a large amount of time series microbial community data has been produced in molecular biological studies, especially in metagenomics. Among the statistical methods for time series, local similarity analysis is used in a wide range of environments to capture potential local and time-shifted associations that traditional correlation analysis cannot detect. The permutation test was initially the usual way to obtain the statistical significance of local similarity analysis, and a theoretical approximation has been developed more recently for the same purpose. However, both methods assume that the time series are independent and identically distributed. In this paper, we propose a new approach based on the moving block bootstrap to approximate the statistical significance of local similarity scores for dependent time series. Simulations show that our method controls the type I error rate reasonably well, while the theoretical approximation and the permutation test perform less well. Finally, we apply our method to human and marine microbial community datasets; the results indicate that it can identify potential relationships among operational taxonomic units (OTUs) and significantly decrease the rate of false positives.
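As an illustration of the idea summarized above, the following is a minimal Python sketch, not the authors' implementation. It assumes both series are already normalized, uses the commonly described dynamic-programming recursion for the local similarity score, and resamples each series independently in overlapping blocks so that within-series dependence is retained while any cross-series association is broken. The function names, the block length, the number of bootstrap replicates and the choice to resample both series are all assumptions made for this example.

```python
import random


def local_similarity(x, y, max_delay):
    """Local similarity score of two equal-length, pre-normalized series.

    Assumed recursion: accumulate products along each diagonal whose
    offset is at most max_delay, resetting to zero when the running sum
    turns negative; track positive and negative association separately.
    """
    n = len(x)
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        lo = max(1, i - max_delay)
        hi = min(n, i + max_delay)
        for j in range(lo, hi + 1):
            prod = x[i - 1] * y[j - 1]
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            best = max(best, pos[i][j], neg[i][j])
    return best / n


def moving_block_resample(series, block_len, rng):
    """Concatenate randomly chosen overlapping blocks, then trim to length n."""
    n = len(series)
    n_starts = n - block_len + 1  # number of overlapping blocks available
    out = []
    while len(out) < n:
        start = rng.randrange(n_starts)
        out.extend(series[start:start + block_len])
    return out[:n]


def mbb_p_value(x, y, max_delay, block_len, n_boot=200, seed=0):
    """Bootstrap p-value for the observed score (hypothetical scheme)."""
    rng = random.Random(seed)
    observed = local_similarity(x, y, max_delay)
    exceed = 0
    for _ in range(n_boot):
        xb = moving_block_resample(x, block_len, rng)
        yb = moving_block_resample(y, block_len, rng)
        if local_similarity(xb, yb, max_delay) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_boot + 1)
```

A call such as `mbb_p_value(x, y, max_delay=3, block_len=5)` returns an approximate p-value for one pair of series. In practice the block length would need to be tuned to the strength of the serial dependence, which this sketch does not attempt.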
Funding statement: The research was supported by the Natural Science Foundation of China (Funder Id: 10.13039/501100001809; grants 11371227, 61432010 and 11626247).
Appendix A. Supplementary Materials
The type I error rates of the three models under different time delays are shown in the Supplementary Materials.
Supplementary Material
The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/sagmb-2018-0019).
©2018 Walter de Gruyter GmbH, Berlin/Boston