Published by De Gruyter October 22, 2020

Bayesian reconstruction of transmission trees from genetic sequences and uncertain infection times

  • Hesam Montazeri, Susan Little, Mozhgan Mozaffarilegha, Niko Beerenwinkel and Victor DeGruttola

Abstract

Genetic sequence data of pathogens are increasingly used to investigate transmission dynamics in both endemic diseases and disease outbreaks. Such research can aid in the development of appropriate interventions and in the design of studies to evaluate them. Several computational methods have been proposed to infer transmission chains from sequence data; however, existing methods generally do not reconstruct transmission trees reliably, because genetic sequence data, or phylogenetic trees inferred from such data, contain insufficient information for accurate estimation of transmission chains. Here, we show by simulation studies that incorporating infection times, even when they are uncertain, can greatly improve the accuracy of reconstruction of transmission trees. To achieve this improvement, we propose a Bayesian inference method using Markov chain Monte Carlo that directly draws samples from the space of transmission trees under the assumption of complete sampling of the outbreak. The likelihood of each transmission tree is computed by a phylogenetic model that treats its internal nodes as transmission events. By a simulation study, we demonstrate that the accuracy of the reconstructed transmission trees depends mainly on the amount of information available on times of infection; we show superiority of the proposed method to two alternative approaches when infection times are known up to specified degrees of certainty. In addition, we illustrate the use of a multiple imputation framework to study features of epidemic dynamics, such as the relationship between characteristics of nodes and the average number of outbound or inbound edges, which signify possible transmission events from and to nodes. We apply the proposed method to a transmission cluster in San Diego and to a dataset from the 2014 Sierra Leone Ebola virus outbreak and investigate the impact of biological, behavioral, and demographic factors.
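The sampling idea described in the abstract can be illustrated with a small Python sketch under strong simplifying assumptions. It is not the authors' method: the phylogenetic likelihood is swapped for a toy Poisson model on pairwise SNP distances, the index case is assumed known, priors are flat, and each infection time is assumed uniform within a given interval. All names (sample_trees, mean_out_degree, the rate parameter) and the four-case data set are hypothetical.

```python
import math
import random


def log_poisson(k, lam):
    """Log probability of k events under a Poisson distribution with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def log_likelihood(infector, times, dist, rate):
    """Toy likelihood (assumption): SNP distance to infector ~ Poisson(rate * gap)."""
    total = 0.0
    for case, src in infector.items():
        if src is None:  # the index case has no infector
            continue
        gap = times[case] - times[src]
        if gap <= 0:  # an infector must be infected strictly earlier
            return -math.inf
        total += log_poisson(dist[case][src], rate * gap)
    return total


def sample_trees(intervals, dist, index_case, rate=1.0,
                 n_iter=4000, burn_in=1000, seed=0):
    """Metropolis sampler over infector assignments and infection times."""
    rng = random.Random(seed)
    cases = list(intervals)
    times = {c: (lo + hi) / 2 for c, (lo, hi) in intervals.items()}
    infector = {c: None if c == index_case else index_case for c in cases}
    current = log_likelihood(infector, times, dist, rate)
    if current == -math.inf:
        raise ValueError("initial tree violates the time ordering")
    samples = []
    for step in range(n_iter):
        new_infector, new_times = dict(infector), dict(times)
        case = rng.choice(cases)
        if case != index_case and rng.random() < 0.5:
            # Propose a new infector uniformly among the other cases.
            new_infector[case] = rng.choice([c for c in cases if c != case])
        else:
            # Propose a new infection time uniformly within the case's interval.
            new_times[case] = rng.uniform(*intervals[case])
        proposed = log_likelihood(new_infector, new_times, dist, rate)
        # Symmetric proposals and flat priors: accept on the likelihood ratio.
        if rng.random() < math.exp(min(0.0, proposed - current)):
            infector, times, current = new_infector, new_times, proposed
        if step >= burn_in:
            samples.append(dict(infector))
    return samples


def mean_out_degree(samples):
    """Average number of outbound edges (onward transmissions) per case."""
    counts = {c: 0 for c in samples[0]}
    for tree in samples:
        for src in tree.values():
            if src is not None:
                counts[src] += 1
    return {c: n / len(samples) for c, n in counts.items()}


# Hypothetical data: infection-time intervals and pairwise SNP distances.
intervals = {"A": (0.0, 1.0), "B": (2.0, 4.0), "C": (3.0, 6.0), "D": (5.0, 8.0)}
pairs = {("A", "B"): 2, ("A", "C"): 5, ("A", "D"): 7,
         ("B", "C"): 2, ("B", "D"): 4, ("C", "D"): 2}
dist = {c: {} for c in intervals}
for (u, v), d in pairs.items():
    dist[u][v] = dist[v][u] = d

samples = sample_trees(intervals, dist, index_case="A")
print(mean_out_degree(samples))
```

Because every proposed infector must have a strictly earlier infection time, each accepted state is a tree rooted at the index case. Narrowing the intervals removes candidate infectors, which is one way to see why information on infection times constrains the reconstruction.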


Corresponding author: Victor DeGruttola, Harvard TH Chan School of Public Health, 665 Huntington Ave, Boston, MA 02115, USA, E-mail:

Funding source: NIH Clinical Center

Award Identifier / Grant number: AI106039 MH100974 GM110749

Funding source: NIH Clinical Center

Award Identifier / Grant number: R37 AI 51164

Acknowledgment

We would like to thank Dr. Davey Smith for his contribution in providing the HIV data and his insightful comments (grant number: CFAR AI03621).

  1. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: This work was supported by the National Institutes of Health under award numbers AI106039, MH100974, GM110749 (to Susan Little) and R37 AI 51164 (to Hesam Montazeri and Victor DeGruttola). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.


Supplementary material

The online version of this article offers supplementary material (https://doi.org/10.1515/sagmb-2019-0026).


Received: 2019-05-20
Accepted: 2020-09-16
Published Online: 2020-10-22

© 2020 Walter de Gruyter GmbH, Berlin/Boston
