Introduction

Foot-and-mouth disease (FMD), a contagious disease that affects cloven-hoofed animals, is caused by FMD virus (FMDV), a member of the Aphthovirus genus within the Picornaviridae family [1]. There are seven distinct types of FMDV serotypes: O, A, C, Asia 1, SAT 1, SAT 2, and SAT 3. Among them, serotype A is widely distributed and considered to be highly genetically and antigenically diverse, making vaccination control difficult [2, 3].

The nucleotide sequence encoding VP1, one of the structural proteins constituting the capsid, is used for characterization and phylogenetic analysis of FMDV [2, 4]. Based on the analysis of VP1 sequences, serotype A was classified into 26 genotypes [5]. 11 out of the 26 genotypes belonged to the ASIA topotype, and three of them seem to be the major lineages until recently. Several subtypes of A/ASIA/Iran-05 and G-VII were identified, and many studies were conducted on them [3, 6,7,8,9,10,11,12,13]. However, although A/ASIA/Sea-97 was reported in various countries in East Asia and Southeast Asia [14], only a few studies were conducted using a small number of sequence data [15, 16].

In this study, we investigated the phylogeny and evolution of Sea-97 lineage. All publicly available FMDV A/ASIA VP1 sequences were collected and used to reconstruct the phylogenetic tree. We subdivided Sea-97 into five groups based on the phylogeny and analyzed their phylodynamics.

Materials and methods

We collected VP1 coding region, polyprotein, and full genome sequences of FMDV A/ASIA from GenBank (www.ncbi.nlm.nih.gov). All nucleotide and protein sequences were aligned using MAFFT v7.419 and trimmed manually using MEGA X [17, 18]. Based on aligned protein sequences, multiple codon alignment was performed using PAL2NAL web server [19]. The final dataset contained 224 VP1 sequences of 633 bp isolated from eighteen countries between 1960 and 2018. The GenBank accession numbers are provided in Table S1.

Pairwise distances between 224 VP1 sequences were calculated and an unweighted pair group mean average (UPGMA) tree was constructed using DNADIST and NEIGHBOR in PHYLIP package v3.698 [20]. The F84 model was used to compute the distance matrix, which assumed unequal base frequencies and different transition and transversion rates. The resulting tree was visualized in FigTree v1.4.4 [21]. Then, we distinguished nine lineages based on the prototype strains specified in FAO World Reference Laboratory for FMD (WRLFMD) [22].

Bayesian evolutionary analysis was performed using BEAST v2.6.3 [23]. The best-fit nucleotide substitution model was determined using ModelTest-NG [24]. We used relaxed clock log normal and coalescent Bayesian skyline model as a prior. Four independent Markov chain Monte Carlo (MCMC) chains were run for 50 million steps, sampled every 5,000 steps, and then combined using LogCombiner [23]. The first 10% of chain in each run were discarded as burn-in. We analyzed the MCMC output log file using Tracer v1.7.1 [25]. We construct the maximum clade credibility (MCC) tree using TreeAnnotator [23].

Then, we extracted VP1 sequences of Sea-97 and conducted Bayesian analysis (two chains for 50 million MCMC iterations, sampled every 5000 steps). We inferred phylogeographical history of Sea-97 and constructed the corresponding Bayesian tree. The MCC tree with location traits was visualized using SpreadD3 v0.9.6 [26]. Bayesian skyline plot (BSP) was reconstructed in Tracer v1.7.1 [25].

We investigated selection pressures on VP1 gene of Sea-97 using Datamonkey 2.0 webserver [27]. Fixed Effects Likelihood (FEL), Fast, Unconstrained Bayesian AppRoximation (FUBAR), Single-Likelihood Ancestor Counting (SLAC) methods, and Mixed Effects Model of Evolution (MEME) were used to detect sites under pervasive and episodic positive selections [28,29,30].

Results and discussion

In general, when classifying the subtype of FMDV, the UPGMA tree is constructed using the VP1 sequences and the percent nucleotide divergence (ND) is measured [5, 31,32,33]. The threshold for classifying the sub-lineage is not clearly established, but it seems that lineages can be divided if they show at least 2.7–3.5% ND [34]. Figure 1a shows the UPGMA tree for 224 VP1 sequences of ASIA topotype. A15, Thai-87, and Sea-97 were isolated in East Asia and Southeast Asia countries, and only Sea-97 appears to be circulating at present time. As shown in the UPGMA tree, Sea-97 was clustered into five major groups denoted G1 to G5 (Table 1). G4 was distinguished from the other groups of Sea-97 at 4.5% ND. G1 and G5 were separated at the ND of 4.3% and 4.2%, respectively. The clades of G2 and G3 were divided at 2.7% ND. WRLFMD prototype strains of Sea-97 (A/TAI/2/97 and A/TAI/7/2003) belonged to G1 group. G1, G2 and G5 only included isolates from Southeast Asia, but G3 and G4 expanded to East Asia.

Fig. 1
figure 1

a UPGMA tree derived from 224 VP1 nucleotide sequences of FMDV A/ASIA. The x-axis represents the %ND. Branches are colored according to the lineage. b MCC tree derived from 224 VP1 nucleotide sequences of FMDV A/ASIA. The x-axis represents the year. Branches are colored according to the lineage and branches of G-VII and Iran-05 are collapsed. c BSP of A/ASIA/Sea-97. The x-axis and y-axis represent the year and the effective population size, respectively. The thick blue line indicates median effective population size, and the light blue region means their 95% HPD interval. d Spatial distribution of A/ASIA/Sea-97. The lines show the transmission between locations. The size of the red circles is proportional to the intensity of the virus presence in that region

Table 1 The summary of groups of FMDV A/ASIA/Sea-97 classified based on UPGMA tree

The MCC tree for VP1 sequences of the ASIA topotype is shown in Fig. 1b. The most recent common ancestor (MRCA) of Sea-97 was dated to be around 1993 [95% HPD = 1990.3525–1996.5827]. Then, G2 emerged from G1 in 2001.5275 [95% HPD = 2000.5886–2002.3205], and G3, G4, and G5 emerged from G2 in 2005.7843 [95% HPD = 2004.9717–2006.5114], 2010.395 [95% HPD = 2008.8512–2011.5133], and 2012.0403 [95% HPD = 2011.2336–2012.7064], respectively.

A set of 124 VP1 sequences of Sea-97 was further analyzed to infer evolutionary history. The mean evolutionary rate of Sea-97 (\(1.2\times {10}^{-2}\) s/s/y [95% HPD = \(9.29\times {10}^{-3}-1.51\times {10}^{-2}\)]) was estimated to be much greater than that of global FMDV serotype A samples (\(4.26\times {10}^{-3}\) [35] and \(5.77\times {10}^{-3}\) [36] s/s/y), and it was similar to the rates of other lineages of topotype ASIA (\(1.25\times {10}^{-2}\) for Iran-05 [7] and \(1.1\times {10}^{-2}\) for G-VII clade C [13] s/s/y). The BSP showed that the genetic diversity of Sea-97 isolates was constant until 2001, then increased sharply around 2002–2003, and then remained constant again (Fig. 1c). This pattern of population size is probably due to the emergence of G2 around 2001 and subsequent emergences of G3, G4, and G5. The reconstructed spatial diffusion of Sea-97 showed that this lineage first occurred in Thailand and spread to Malaysia and neighboring countries, and then from Vietnam to China and later to South Korea (Fig. 1d). It seems that this lineage has been actively circulating within Southeast Asian countries.

Amino acid sequences of each group of Sea-97 were compared with the sequence of A22 (A22/IRQ/24/64), one of the widely used vaccine strains [37]. The sequence logo of each group obtained using Jalview v2.11.1.3 [38] was used for comparison (Fig. S1). The length of VP1 sequence of A22 was 211 amino acids and there was a gap at position 140 in Sea-97. Positions 43–45, 83, 96, 141–160 (G-H loop), 169–173 and 200–211 (C-terminus) of VP1 were previously suggested as candidate regions that may affect antigenic properties of FMDV serotype A [9, 39,40,41,42,43,44]. Compared to A22, Sea-97 had many substitutions at these residues (Q43K, N44P, L45V, D83T/A, T141E/G/V/A/Q, G142T/N/P/S, P149S, V154I/L, A160T, T171E, H173Q/R, H201Y and K204R). Most of sites under positive selection also belonged to these candidate regions (Table S2). In particular, Q43K, N44P, D81T/A, T141E/G/V/A, G142T/N/S, P149S, A160T, T171E, H173Q and H201Y have been substituted with amino acids with different biochemical properties. Therefore, it has the potential to have a major impact on antigenic properties. There is already a study suggesting that P149 rather than S149 matches well with the A22 vaccine [43]. It seems necessary to test whether these substitutions actually affect the efficacy of the currently used vaccine. If the existing vaccine does not match Sea-97 well, an appropriate vaccine for this lineage needs to be developed.

This study provides information to understand comprehensive characteristic of the Sea-97 lineage. However, the number of sequences available in the current public database is insufficient. Although the number of recent sequence data are small, our findings could provide meaningful basic information on strategies to control FMDV in Asia. Based on our results, it seems necessary to collect more data and perform an extended analysis in future studies.