Introduction

The COVID-19 pandemic caused by SARS-CoV-2, which has already infected 151 million people and claimed more than 3.1 million lives across the world in a span of 15 months, is undeniably among the greatest crises facing mankind this century. This novel coronavirus continues to challenge the health care and administrative systems of countries worldwide with its high infectivity (rate of spread) (Aguilar et al. 2020; He et al. 2020; Petersen et al. 2020).

As is typical of influenza virus infections, the majority of fatalities in SARS-CoV-2 infections are observed in older people (over 65 years) with weakened immune systems. Among persons under 21 years of age, fatality was higher in individuals with preexisting medical conditions and in very young children (Bixler et al. 2020). Similar to influenza viruses such as human influenza A (H5N1), SARS-CoV-2 infection also induces a cytokine storm in the host, which may cause severe acute respiratory syndrome, multiple organ failure and death (Ratajczak and Kucia 2020; Song et al. 2020). These inflammatory responses are typical of influenza virus infections with heavy viral load in the host (Boon et al. 2011; De Jong et al. 2006).

Interestingly, a large proportion of the individuals infected with SARS-CoV-2 are asymptomatic, harboring relatively low viral loads (Zhou et al. 2020) while still being capable of spreading the infection. It is the strong prevalence of such asymptomatic carriers that makes containment measures difficult in the COVID-19 pandemic (Yu and Yang 2020). In a study of people evacuated from China to Japan, the asymptomatic ratio was nearly 30% (Nishiura et al. 2020). Though the SARS-CoV-2 genome sequence, its mutants and their potential impact on disease management have been investigated (Wu et al. 2020; Leung et al. 2020; Li et al. 2021; Starr et al. 2021), the molecular mechanism behind the prevalence of low viral load and asymptomatic cases is largely unexplored. Here we attempted an in silico dissection of the molecular peculiarities of the SARS-CoV-2 genome using bioinformatic tools to develop a theoretical hypothesis for the prevalence of asymptomatic cases in COVID-19.

Materials and method

In silico analysis of the molecular architecture of ORF1ab was carried out for 27 coronaviruses, including Middle East respiratory syndrome coronavirus, SARS coronavirus and multiple novel coronavirus isolates (Table 1). The ORF1ab sequences of these coronaviruses were submitted to the Simple Modular Architecture Research Tool (SMART) for identification and annotation of protein domains and architectures (Letunic et al. 2015). The online web server of the Sequence Manipulation Suite (https://www.bioinformatics.org/sms2/codon_usage.html) was used to estimate the codon usage frequency for each amino acid in each coronavirus genome. Putative introns between ORF1a and ORF1b were identified based on the standard GT-AG rule and the presence of a branch site (Wu and Krainer 1996). Generunner software (http://www.generunner.net/) was used for in silico sequence analyses, namely to check the translation frame, identify putative introns and determine their length. Excision of the identified putative introns and verification of the reading frame after rejoining ORF1a and ORF1b were also performed in silico using Generunner. Protein Homology/analogy Recognition Engine V 2.0 (Phyre2) is a free web-based service for protein structure modeling, prediction and analysis (Kelley et al. 2015). The in silico protein sequence derived from ORF1ab of SARS-CoV-2 was submitted to Phyre2 to identify putative enzymes for RNA splicing encoded in the genome.
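The codon usage estimate described above can be illustrated with a minimal Python sketch. It is not the Sequence Manipulation Suite implementation; it assumes that "codon usage frequency for each amino acid" means the fraction of each synonymous codon among all codons for that amino acid, read in frame 0 of a coding sequence. The function name `glycine_codon_usage` and the toy sequence are hypothetical and chosen only for illustration.

```python
from collections import Counter

# The four synonymous codons for glycine (RNA alphabet).
GLYCINE_CODONS = ("GGA", "GGC", "GGG", "GGU")


def glycine_codon_usage(coding_seq: str) -> dict:
    """Fraction of each glycine codon among all glycine codons.

    Assumption: the sequence is read in frame 0 from its first
    nucleotide; trailing nucleotides that do not fill a codon are ignored.
    """
    seq = coding_seq.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in GLYCINE_CODONS)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in GLYCINE_CODONS}
    return {c: counts[c] / total for c in GLYCINE_CODONS}


# Hypothetical toy ORF (not a viral sequence): ATG GGT GGA GGC GGG GGT TAA
toy_orf = "ATGGGTGGAGGCGGGGGTTAA"
usage = glycine_codon_usage(toy_orf)
print(f"GGG fraction among glycine codons: {usage['GGG']:.2f}")
```

In this toy sequence one of five glycine codons is GGG, so the reported fraction is 0.20. Applied to a full ORF1ab coding sequence, the same calculation would give a percentage of the kind compared across viruses here.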

Table 1 Sequence features of beta-coronaviruses

Results and discussion

This study attempted an in silico exploration of the genomic features of the novel coronavirus that underlie the high prevalence of asymptomatic carriers. A basic feature of the ORF1ab of coronaviruses appears to be the presence of a conserved ribosomal slippage site. Closer examination also reveals that the ribosomal slippage junction of all the studied coronaviruses consistently features a ‘GGG’ codon (Table 1). Although this GGG codon at the ribosomal slippage site lies within the correct frame, translation machinery reading through it would invariably encounter a premature termination codon (PTC). In other words, reading through the ‘GGG’ at the ribosomal slippage site disrupts the translation of key proteins such as viral RNA polymerase (RPol), RNA-dependent RNA polymerase (RdRP), helicase, non-structural protein 11 (Nsp11) and Nsp13, which are located downstream of this junction in ORF1b (Fig. 1). In addition, while introns in ORF1ab have not been reported in coronaviruses, we observed multiple putative introns in silico between the coding regions of Nsp10 and RPol based on the standard GT-AG rule (Wu and Krainer 1996). In silico excision of the putative introns observed in this region (which would also remove the ‘GGG’ codon from the ribosomal slippage site) could place ORF1b in the correct frame with ORF1a without affecting the size of the preceding and succeeding domains (Nsp10 and RPol). Intriguingly, reading the viral RNA in a −1 frame at this ribosomal slippage site produces the same result. This molecular arrangement binds the virus to one of two options for successful translation of ORF1ab for replication in the host: either intron excision by RNA splicing, or reading the template from a −1 frame at the ribosomal slippage site to generate a read-through ORF (Fig. 2).
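The two routes to an uninterrupted reading frame described above can be illustrated with a minimal Python sketch on a toy sequence. It is not the Generunner workflow. It assumes that the downstream ORF lies in the −1 frame relative to the upstream ORF, it applies only the GT-AG boundary rule (the branch-site check is omitted), and all function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq, start=0):
    """Index of the first in-frame stop codon read from `start`, else None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def gt_ag_candidates(seq, min_len, max_len):
    """Yield (start, end) of spans that begin with GT and end with AG.

    `end` is exclusive. Only the GT-AG boundary rule is applied here.
    """
    for s in range(len(seq) - 1):
        if seq[s:s + 2] != "GT":
            continue
        for e in range(s + min_len, min(s + max_len, len(seq)) + 1):
            if seq[e - 2:e] == "AG":
                yield (s, e)


def excise(seq, start, end):
    """Remove seq[start:end] and rejoin the flanking sequence."""
    return seq[:start] + seq[end:]


def minus_one_shift(seq, slip_at):
    """Sequence as read when the ribosome backs up one nucleotide at
    `slip_at`, so the nucleotide at slip_at - 1 is read twice."""
    return seq[:slip_at] + seq[slip_at - 1:]


# Hypothetical toy sequence: upstream ORF, a 5-nt GT...AG span, then a
# downstream ORF that is out of frame when read straight through.
toy = "ATGGCCAAA" + "GTCAG" + "CTGAACCCGGGT"

print("straight read, first stop at:", first_stop(toy))
for s, e in gt_ag_candidates(toy, min_len=5, max_len=10):
    spliced = excise(toy, s, e)
    print("excise", (s, e), "length", e - s, "stop:", first_stop(spliced))
print("-1 shift at 12, stop:", first_stop(minus_one_shift(toy, 12)))
```

Under the stated −1 frame assumption, an excised span restores the downstream frame when its length leaves a remainder of 2 on division by 3, as the 5-nucleotide toy span does. The putative intron lengths reported in this study (26, 44 and 89 nucleotides) satisfy the same condition.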

Fig. 1

Schematic representation of the PFAM domains of the ORF1ab gene of SARS-CoV-2 (NCBI Accession no.: MN908947). The PFAM domains on ORF1a before the ribosomal slippage site are indicated by upright triangles; the downturned triangles indicate the PFAM domains on ORF1b after the point of slippage. The image represents the actual sequential order of the PFAM domains but not the genomic distance between them. Transmembrane domain features have been excluded for clarity. (A.A.: amino acid; nt: nucleotide)

Fig. 2

Pictorial representation of the theoretical analysis of differential transmissibility in human beta-coronaviruses based on GGG codon usage frequency and putative introns

An interesting twist to this simplistic model is that SARS coronaviruses are known to replicate in the host cell cytoplasm (Klein et al. 2020; Knoops et al. 2008; Snijder et al. 2006; Stertz et al. 2007), while the spliceosome complex required for intron removal resides inside the cell nucleus (Pessa et al. 2008). However, proteins homologous to enzymes of intron excision pathways have been identified in coronaviruses, including SARS-CoV (Snijder et al. 2003). Curiously, in silico protein folding prediction models for the polypeptide of the ORF1b segment of SARS-CoV-2 (Accession number: MN908947), trained on 2’-O-MT, intron binding protein and pre-mRNA splicing factors, also indicate 100% probability of homology (Suppl. file 1, 2, 3 & 4). Taken together, the bioinformatic analysis leaves scope to speculate that these viral genomes encode their own splicing enzymes, albeit with limited experimental evidence.

Frameshift translation in eukaryotic systems occurs either by programmed ribosomal slippage or through stalling of the ribosomes during translation when a specific tRNA matching the codon on the RNA template is unavailable. Programmed ribosomal slippage in association with an RNA pseudoknot has been reported in coronaviruses (Brierley et al. 1989). Curiously, we also observed that coronavirus genomes have a low frequency (10%) of GGG codon usage for glycine (Table 1) compared to other common human viruses (Table 2). The GGG codon usage frequency was especially low for the SARS group of viruses, with the lowest in SARS-CoV-2 (3%). This would imply that the tRNA corresponding to the ‘GGG’ codon in the viral genome would be abundant in the tRNA pool of the host cell, leading to an extremely low probability of ribosomal slippage events. Viral replication in the host would thus remain at low levels. The intense inflammatory response to influenza-like viral infections that leads to clinical disease manifestations is significantly correlated with viral load in the host (Boon et al. 2011; De Jong et al. 2006). Basal-level replication would therefore ensure that the virus triggers a negligibly low immune reaction in otherwise healthy hosts, resulting in asymptomatic cases. Indeed, the SARS-CoV-2 viral load in nasopharyngeal swabs has been observed to be several fold lower in 'asymptomatic patients' than in 'asymptomatic patients in the incubation period' (Zhou et al. 2020). At the same time, these asymptomatic patients also demonstrate a period of viral shedding (Zhou et al. 2020), during which viral transmission is a strong possibility that complicates containment (Yu and Yang 2020).

Table 2 GGG codon usage frequency in common human viruses

A similar strategy is observed in Rous sarcoma virus (RSV), where the frameshift site features a stop codon (Jacks et al. 1988). However, placing a functional codon that is used sparsely in the genome at the frameshift site, as in SARS-CoV-2, reduces the probability of frameshift translation further. We speculate that this is the reason for the high prevalence of asymptomatic carriers of SARS-CoV-2. In support of our hypothesis, the closely related MERS beta-coronavirus (GGG codon usage 7%) exhibits quicker progression of disease in infected individuals (Hilgenfeld and Peiris 2013).

In a yeast model, natural modification by addition of methyl derivatives to uridines at the wobble position promotes decoding of G-ending codons (Johansson et al. 2008). In silico analysis of the polypeptide of the ORF1b segment of SARS-CoV-2 (Accession number: MN908947) predicts the presence of an S-adenosyl-L-methionine-dependent methyltransferase domain in the viral genome. Assuming that a phenomenon similar to that in yeast occurs in human cells, this could help in unhindered decoding of other GGG codons in the SARS coronavirus genome despite poor abundance of cytoplasmic tRNA corresponding to the ‘GGG’ codon. On the other hand, the same mechanism could also help SARS coronaviruses avoid ribosomal slippage and so produce more asymptomatic cases. Coronavirus replication in vitro is inhibited after supplementation with 'D, L-lysine acetylsalicylate and glycine' (Muller et al. 2016). These two studies invite further investigation into the evolution of molecular mechanisms for coronavirus replication strategies and their relation to the prevalence of asymptomatic carriers.

Multiple short introns (26 and 44 nucleotides) in this genomic region were also characteristic of coronaviruses with lower GGG codon usage preference, such as SARS and SARS-CoV-2. In addition, the intron sizes appeared to be conserved among several viruses in our study, suggesting a definite selective basis for these molecular features. In contrast, we could observe only a single, 89-nucleotide-long putative intron in MERS. In fact, in silico excision of even this putative intron in MERS, assessed using SMART, resulted in the disruption of either the Nsp10 or the RPol domain. Notably, SARS-CoV-2 is more infective (transmissible) than SARS and MERS (Chu et al. 2020; Petersen et al. 2020).

Since multiple introns offer a wider probability of generating correct reading frames, it may be argued that the SARS and SARS-CoV-2 viruses should preferably resort to intron excision rather than frameshift translation for rapid replication. In fact, influenza viruses are known to hijack host splicing machinery to process some of their own RNA (Dubois et al. 2014) and also possess features that aid programmed ribosomal frameshifting (Firth et al. 2012). Quantification of subgenomic RNAs (sgRNA) from people infected with SARS-CoV-2 has shown that transcription is repressed in asymptomatic cases compared to symptomatic cases (Wong et al. 2021). The same study also revealed a higher prevalence of structural deletions in SARS-CoV-2 RNAs in symptomatic cases. Together, these two observations support our hypothesis of more active transcription and splicing of the viral RNA in symptomatic cases. However, it needs to be remembered that the ssRNA genome of coronaviruses is the positive-sense strand for viral protein translation (Wu et al. 2020). Excision of the introns from the initial viral particles would therefore destroy the true copies of the original genetic material in the host system, eliminating the raw material for further mutation and evolution. By this logic, it is tempting to suggest the presence of a molecular switch that dictates which of the two mechanisms the virus adopts for replication at a given time or tissue location. We also suggest that the presence of these combined hindrances to replication is in fact the major selective advantage of SARS-CoV-2, resulting in the rapid spread of the disease. Indeed, viruses that replicate rapidly and send the host immune system into overdrive within a short duration are at a disadvantage, since rapid development of symptoms helps eliminate infected individuals before the virus has a chance to spread in the population (Fig. 2).

Based on bioinformatic analyses of the SARS-CoV-2 genome, we suggest that SARS-CoV-2 replication in host cells is strongly dependent on either a programmed frameshift translation at a specific ribosomal slippage site in the ORF1ab region or excision of introns within this region. The inherent presence of these two hindrances to viral replication appears to be the reason for its slower pace of replication, resulting in a high prevalence of asymptomatic carriers in the host population. Though our study provides insight into the molecular peculiarities of SARS-CoV-2 underlying the high prevalence of asymptomatic cases, our findings derive exclusively from in silico observations and require experimental testing and validation.