Abstract
Genetic sequence data of pathogens are increasingly used to investigate transmission dynamics in both endemic diseases and disease outbreaks. Such research can aid in the development of appropriate interventions and in the design of studies to evaluate them. Several computational methods have been proposed to infer transmission chains from sequence data; however, existing methods generally do not reconstruct transmission trees reliably, because genetic sequence data, or phylogenetic trees inferred from such data, contain insufficient information for accurate estimation of transmission chains. Here, we show by simulation studies that incorporating infection times, even when they are uncertain, can greatly improve the accuracy of reconstructed transmission trees. To achieve this improvement, we propose a Bayesian inference method using Markov chain Monte Carlo that draws samples directly from the space of transmission trees under the assumption of complete sampling of the outbreak. The likelihood of each transmission tree is computed with a phylogenetic model by treating its internal nodes as transmission events. In a simulation study, we demonstrate that the accuracy of the reconstructed transmission trees depends mainly on the amount of information available on times of infection; we show that the proposed method outperforms two alternative approaches when infection times are known up to specified degrees of certainty. In addition, we illustrate the use of a multiple imputation framework to study features of epidemic dynamics, such as the relationship between characteristics of nodes and the average number of outbound or inbound edges, which signify possible transmission events from and to nodes. We apply the proposed method to a transmission cluster in San Diego and to a dataset from the 2014 Sierra Leone Ebola virus outbreak, and investigate the impact of biological, behavioral, and demographic factors.
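To make the sampling idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that infection times are known exactly and are distinct, that every case is sampled, and it replaces the phylogenetic likelihood described above with a much cruder per-edge Jukes-Cantor term evaluated on the time between the infector's and the infectee's infection. The function names (`edge_loglik`, `sample_trees`, `mean_out_degree`), the rate `mu`, and the toy sequences are all hypothetical. Because infectors are restricted to cases infected earlier, every proposed state is a valid tree, and the uniform proposal is symmetric.

```python
import math
import random
from collections import Counter


def hamming(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def edge_loglik(seq_infector, seq_infectee, dt, mu):
    """Jukes-Cantor log-probability of the infectee's sequence given the
    infector's, after elapsed time dt > 0 at substitution rate mu (assumed)."""
    p = 0.75 * (1.0 - math.exp(-4.0 * mu * dt / 3.0))  # P(site differs)
    d = hamming(seq_infector, seq_infectee)
    return d * math.log(p / 3.0) + (len(seq_infectee) - d) * math.log(1.0 - p)


def sample_trees(seqs, times, mu=0.05, n_iter=5000, seed=0):
    """Metropolis sampler over transmission trees stored as {infectee: infector}."""
    rng = random.Random(seed)
    n = len(seqs)
    # Admissible infectors of case i: cases infected strictly earlier.
    candidates = {i: [j for j in range(n) if times[j] < times[i]] for i in range(n)}
    non_root = [i for i in range(n) if candidates[i]]
    parent = {i: rng.choice(candidates[i]) for i in non_root}

    def ll(i, j):
        return edge_loglik(seqs[j], seqs[i], times[i] - times[j], mu)

    samples = []
    for _ in range(n_iter):
        i = rng.choice(non_root)         # case whose infector is re-proposed
        new = rng.choice(candidates[i])  # symmetric uniform proposal
        log_ratio = ll(i, new) - ll(i, parent[i])
        if rng.random() < math.exp(min(0.0, log_ratio)):
            parent[i] = new
        samples.append(dict(parent))
    return samples


def mean_out_degree(samples, n_cases):
    """Average number of outbound edges per case across sampled trees."""
    totals = Counter()
    for tree in samples:
        totals.update(tree.values())
    return [totals[i] / len(samples) for i in range(n_cases)]


# Toy data (invented for illustration): case 2 shares its sequence with case 1.
seqs = ["A" * 20, "A" * 16 + "CCCC", "A" * 16 + "CCCC"]
times = [0.0, 1.0, 2.0]
samples = sample_trees(seqs, times, mu=0.05, n_iter=2000, seed=1)
out_deg = mean_out_degree(samples, len(seqs))
print(out_deg)
```

In this simplified model the likelihood factorizes over cases, so the posterior could be computed exactly; the sampler is shown only to illustrate how draws of whole trees can be summarized, here by averaging each case's out-degree over the sampled trees.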
Funding source: NIH Clinical Center
Award Identifier / Grant number: AI106039 MH100974 GM110749
Funding source: NIH Clinical Center
Award Identifier / Grant number: R37 AI 51164
Acknowledgment
We would like to thank Dr. Davey Smith for his contribution in providing the HIV data and his insightful comments (grant number: CFAR AI03621).
Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This work was supported by the National Institutes of Health under award numbers AI106039, MH100974, GM110749 (to Susan Little) and R37 AI 51164 (to Hesam Montazeri and Victor DeGruttola). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
Supplementary material
The online version of this article offers supplementary material (https://doi.org/10.1515/sagmb-2019-0026).
© 2020 Walter de Gruyter GmbH, Berlin/Boston