Elsevier

Water Research

Volume 196, 15 May 2021, 117001
Prediction of antibiotic-resistance genes occurrence at a recreational beach with deep learning models

https://doi.org/10.1016/j.watres.2021.117001

Highlights

  • Deep learning models were proposed to predict ARG occurrence at a recreational beach.

  • LSTM-CNN improved prediction accuracy of ARG occurrence over conventional LSTM.

  • IA-LSTM was superior to LSTM-CNN in predicting multiple ARGs simultaneously.

  • Rainfall, tides, and salinity affected the prediction of ARG occurrence.

Abstract

Antibiotic resistance genes (ARGs) have been reported to threaten the public health of beachgoers worldwide. Although ARG monitoring and beach guidelines are necessary, ARG sampling and analysis require substantial effort. Accordingly, in this study, we predicted the occurrence of ARGs, which are primarily found on the coast after rainfall, using a conventional long short-term memory (LSTM) model, an LSTM-convolutional neural network (CNN) hybrid model, and an input attention (IA)-LSTM model. To develop the models, we used 10 categories of environmental data collected at 30-min intervals and concentration data for 4 major ARGs (i.e., aac(6′-Ib-cr), blaTEM, sul1, and tetX) obtained at Gwangalli Beach, South Korea, between 2018 and 2019. When predicting the occurrence of individual ARGs, the conventional LSTM and IA-LSTM exhibited poor R² values during training and testing. In contrast, the LSTM-CNN exhibited a 2- to 6-fold improvement in accuracy over the conventional LSTM and IA-LSTM. However, when predicting the occurrence of all ARGs simultaneously, the IA-LSTM model outperformed the LSTM-CNN overall. Additionally, the influence of environmental variables on prediction was investigated using the IA-LSTM model, and the ranges of input variables that affect each ARG were identified. Overall, this study demonstrated the possibility of predicting the occurrence and distribution of major ARGs at the beach based on various environmental variables, and the results are expected to contribute to the management of ARG occurrence at recreational beaches.

Introduction

The emergence of antibiotic resistance genes (ARGs) as aquatic environment contaminants (Pruden et al., 2006) has become a significant global threat to public health. ARGs are released from landfills or sludge through runoff, and they can flow into recreational areas along the coast (Zhang et al., 2016b). Specifically, recreational beaches are susceptible to ARG contamination from various sources such as wastewater treatment plants (Proia et al., 2018), animal feed mills (Fang et al., 2018), and storm runoff (Joy et al., 2013). In a previous study, surfers were found to be 4.2 times more likely to be exposed to ARGs than non-surfers in swimming areas in England (Leonard et al., 2018). Rainfall is known to dilute ARGs naturally; however, ARGs are not sufficiently managed in marine environments because of current global wastewater management practices (Bedri et al., 2015; Law and Tang, 2016).

Monitoring of ARGs at recreational beaches is required for beach user safety. However, ARG monitoring has the following limitations. Conventional analysis methods are time-consuming; it takes 5.2 d on average to verify incubation results (McAdam et al., 2012). Current molecular biological techniques such as quantitative polymerase chain reaction (qPCR) have been used to identify and quantify certain ARGs (de Castro et al., 2014; Schmieder and Edwards, 2012). Although qPCR is simpler and faster compared to conventional techniques such as the culture method or traditional PCR (Kralik and Ricchi, 2017; Smith and Osborn, 2009), regular monitoring is restricted due to the high cost of qPCR analysis (Sakthivel et al., 2012). Although multiplex PCR has been developed to save time and effort by reacting multiple single PCRs simultaneously, it is less accurate because it responds to nonspecific amplification products (Jansen et al., 2011; Sakthivel et al., 2012). Therefore, for preemptive responses within a limited timeframe for ARG occurrences at beaches, prediction through modeling can be more efficient than through monitoring.

Long short-term memory (LSTM), a type of recurrent neural network (RNN), has been widely used as an efficient tool to simulate and predict water quality owing to its ability to extract features from time-series data (Hsu et al., 1997). For example, Barzegar et al. (2020) recently utilized LSTM and LSTM hybrid models to predict water quality variables in a lake. An advantage of LSTM is that it can use memory to learn features over time. Accordingly, it is considered a suitable neural network (NN) for predicting pollutant distributions and water quality over time (Wang et al., 2019; Wang et al., 2017). In contrast, hydrological models suffer from higher uncertainties because of their inability to simulate complex mechanistic relationships among environmental variables (Abimbola et al., 2020). Although deep learning models are black box models, they can improve performance by training on observation data (Andrychowicz et al., 2016) and can simulate nonlinear phenomena occurring in the environment. In particular, deep learning models have been widely used to enhance the prediction performance of hydrological models (Parmar et al., 2017; Sumi et al., 2012). We therefore hypothesized that deep learning would predict ARGs more accurately than hydrological models at a recreational beach in Korea that is affected by rain over short periods.
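As a rough illustration of how an LSTM carries memory across time steps, the following NumPy sketch runs one standard LSTM cell over a short multivariate series. It is not the model used in this study: the gate ordering, the random weights, the hidden size of 8, and the helper names `lstm_step` and `run_lstm` are assumptions chosen for the example, and the input is synthetic data shaped like one day of 30-min readings of 10 variables.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """Advance a standard LSTM cell by one time step.

    Shapes: x_t (n_in,), h_prev and c_prev (n_hidden,),
    W (4*n_hidden, n_in), U (4*n_hidden, n_hidden), b (4*n_hidden,).
    Gate order in the stacked weights (an arbitrary choice for this sketch):
    input, forget, candidate, output.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:n])            # input gate: how much new information to write
    f = sigmoid(z[n:2 * n])        # forget gate: how much old memory to keep
    g = np.tanh(z[2 * n:3 * n])    # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])    # output gate: how much memory to expose
    c_t = f * c_prev + i * g       # updated cell state (the "memory")
    h_t = o * np.tanh(c_t)         # updated hidden state
    return h_t, c_t


def run_lstm(X, n_hidden, seed=0):
    """Run the cell over a (time, n_in) array with random, untrained weights."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in))
    U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
    b = np.zeros(4 * n_hidden)
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x_t in X:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h, c


# Synthetic input: 48 half-hour steps (one day) of 10 variables.
X = np.random.default_rng(1).normal(size=(48, 10))
h, c = run_lstm(X, n_hidden=8)
print(h.shape, c.shape)
```

The cell state `c` is what lets information from early time steps influence the final hidden state `h`; in a trained model, a dense layer on top of `h` would map it to the predicted quantity.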

Based on the collected literature, however, the potential of LSTM has yet to be utilized to estimate ARGs released into the environment. We previously observed the occurrence of ARGs at a combined sewer overflow (CSO) site in Gwangalli Beach over time, which varied in relation to rainfall and tides (Jang et al., 2021). Recreational activities at the beach are concentrated in the summer, and the beach is affected by monsoon weather every year. ARG prediction is thus important for protecting the health of beachgoers, and LSTM is a promising approach for predicting the occurrence of ARGs over time. Therefore, in this study, we propose an approach based on NN techniques to predict ARG occurrence quickly and accurately for managing and monitoring ARGs in beach environments. This study compared conventional LSTM, LSTM-convolutional NN (CNN), and input attention (IA)-LSTM models (Fig. 1) with the following objectives: 1) to propose applicable models for predicting four major ARGs (i.e., aac(6′-Ib-cr), blaTEM, sul1, and tetX) at a recreational beach, 2) to compare model accuracies when predicting a single ARG individually and multiple ARGs simultaneously, and 3) to determine critical environmental features for predicting ARG occurrences.

Sampling location and period

Gwangalli Beach, a popular beach in South Korea, was selected as the study area. The eastern coast of Gwangalli Beach is adjacent to the Suyeong River estuary, which is surrounded by urban areas and has a wastewater treatment plant and several sewer outlets along the river (Fig. 2). The total area of the beach is 82,000 m²; the beach is 1.4 km in length and 25–110 m in width along the coastline (Choi et al., 2016). Seawater sampling was conducted at a CSO outfall on the right side of the beach (

Hyperparameter optimization for single ARG prediction

By comparing the partial dependence plots in objective plots of all the models (Figs. S4 and S5), we can infer that the learning rate was the most sensitive parameter during optimization. In contrast, the activation function in the CNN layer was the least sensitive parameter for the NN across models. The optimization results also demonstrated that a higher lookback value resulted in a greater reduction in the model test MSEs in all cases except for blaTEM. No uniform trend for batch size can be
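To make the lookback hyperparameter concrete, the sketch below slices a multivariate time series into fixed-length input windows, which is the usual way sequence models such as LSTMs are fed. It is an illustration under assumptions, not the preprocessing used in this study: the helper name `make_windows`, the choice to pair each window with the target at the window's last time step, and the synthetic arrays (96 half-hour steps, 10 input variables, 4 targets) are all hypothetical.

```python
import numpy as np


def make_windows(features, targets, lookback):
    """Build (samples, lookback, n_features) windows from a time-ordered series.

    Assumption for this sketch: window i covers steps i .. i+lookback-1 and is
    paired with the target observed at the window's final step.
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same length")
    if lookback < 1 or lookback > len(features):
        raise ValueError("lookback must be between 1 and the series length")
    n_samples = len(features) - lookback + 1
    X = np.stack([features[i:i + lookback] for i in range(n_samples)])
    y = targets[lookback - 1:]
    return X, y


# Synthetic data: two days of 30-min steps, 10 input variables, 4 target series.
rng = np.random.default_rng(0)
features = rng.normal(size=(96, 10))
targets = rng.normal(size=(96, 4))

X, y = make_windows(features, targets, lookback=6)  # 6 steps = 3 h of history
print(X.shape, y.shape)
```

A larger lookback gives each sample more history at the cost of fewer samples and a longer sequence to process, which is one reason it is typically tuned together with the learning rate and batch size.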

Conclusions

The goal of this study was to improve the accuracy of predictions for ARG occurrence and to identify the variables that affect these predictions. To this end, the conventional LSTM, LSTM-CNN hybrid, and IA-LSTM models were compared in terms of their ability to predict ARG occurrence from environmental variables. The primary results of this study are as follows:

  • 1) The sequential convergence of LSTM and CNN resulted in improved performance compared to that of conventional LSTM to predict single ARGs. We show

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This study was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2017R1D1A1B04033074), and Korea Environment Industry and Technology Institute (KEITI) through the Aquatic Ecosystem Conservation Research Program funded by Korea Ministry of Environment (MOE) (No. 2020003030003).

References (53)

  • S.K. Sakthivel et al.

    Comparison of fast-track diagnostics respiratory pathogens multiplex real-time RT-PCR assay with in-house singleplex assays for comprehensive detection of human respiratory viruses

    Journal of Virological Methods

    (2012)
  • J. Shin et al.

    Thermophilic anaerobic digestion: Effect of start-up strategies on performance and microbial community

    Science of The Total Environment

    (2019)
  • P. Wang et al.

    Exploring the application of artificial intelligence technology for identification of water pollution characteristics and tracing the source of water quality pollutants

    Science of The Total Environment

    (2019)
  • Y. Wang et al.

    Water quality prediction method based on LSTM neural network

    (2017)
  • X.-H. Zhang et al.

    Occurrence of antibiotic resistance genes in landfill leachate treatment plant and its effluent-receiving soil and surface water

    Environmental Pollution

    (2016)
  • M. Abadi et al.

    Tensorflow: A system for large-scale machine learning

    (2016)
  • A. Zheng et al.

    Feature engineering for machine learning: principles and techniques for data scientists

    (2018)
  • Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., … De Freitas, N. (2016). Learning to...
  • D. Bahdanau et al.

    End-to-end attention-based large vocabulary speech recognition

    (2016)
  • R. Barzegar et al.

    Short-term water quality variable prediction using a hybrid CNN–LSTM deep learning model

    Stochastic Environmental Research and Risk Assessment

    (2020)
  • S.-H. Choi et al.

    Effects of Rainfall on Microbial Water Quality on Haeundae and Gwangan Swimming Beach

    Journal of Bacteriology and Virology

    (2016)
  • F. Chollet

    Deep Learning mit Python und Keras: Das Praxis-Handbuch vom Entwickler der Keras-Bibliothek

    (2018)
  • T.M. Cover

    Elements of information theory

    (1999)
  • A.P. de Castro et al.

    Insights into novel antimicrobial compounds and antibiotic resistance genes from soil metagenomes

    Frontiers in Microbiology

    (2014)
  • K. Fukushima

    Neural network model for a mechanism of pattern recognition unaffected by shift in position: Neocognitron

    Trans. IECE

    (1979)
  • George, D. and Mallery, M. (2010) SPSS for Windows Step by Step: A Simple Guide and...
1 These authors contributed equally to this study.