1 INTRODUCTION

Coronaviruses belong to the family Coronaviridae; the virus studied here falls in the genus Betacoronavirus and the subgenus Sarbecovirus. The Coronaviridae include numerous avian and mammalian coronaviruses [1, 2]. A coronavirus capable of human-to-human transmission was detected after an outbreak in Southern China in 2003 [3–5]. It was associated with severe acute respiratory syndrome (SARS) and was therefore named SARS coronavirus (SARS-CoV) [1, 6]. Its worldwide spread during the 2003 outbreak caused more than 8000 infections and more than 774 confirmed deaths [1]. The virus was detected in Himalayan palm civets [7]. Genome comparison showed that the civet isolate carried 29 nucleotides in open reading frame 10 (orf10) that were missing from most characterized human isolates of the 2003 outbreak [7]. This led to the suggestion that the missing nucleotides enabled transmission of the virus from civets to humans [1]. Another version of the virus (Bat-SARS-CoV) was isolated from horseshoe bats [8]; it had a 29-nucleotide insertion in orf8 compared to most characterized human isolates. This genomic relationship suggested a common ancestor for the civet, bat, and human SARS-CoV genomes [8]. After the SARS outbreak in 2003, bats were considered the reservoir for future human CoV pandemics [9]. In 2012, the Middle East respiratory syndrome coronavirus (MERS-CoV) was detected in Saudi Arabia [10, 11]. It is believed to have been transmitted from dromedary camels to humans [12], but its origin has also been linked to bats [13]. It caused 2521 infections and 919 deaths (35%) [14].

In 2019, a novel coronavirus (COVID-19) appeared in China (Wuhan City, Hubei Province). It is believed to have originated from fresh seafood [15, 16]. This coronavirus was able to transmit from human to human [17, 18]. It has spread to 193 countries, with more than 10 million confirmed infections and more than 500 000 confirmed deaths [19].

Analysis of the full COVID-19 genome showed that it is a betacoronavirus, yet distinct from the earlier SARS-CoV and MERS-CoV [15]. COVID-19 clustered with Bat-SARS-CoV in a separate group of sarbecoviruses [15]. A genome study of COVID-19 and the bat SARS-CoV isolate BatCoV RaTG13 revealed a degree of genetic similarity indicating that RaTG13 is not the exact variant that led to the outbreak in China, although COVID-19 could have originated from bats. The same study confirmed that COVID-19 did not result from recombination and is not a mosaic [14]. Bioinformatic analysis of COVID-19 genome sequences isolated from patients revealed 89% nt identity with a bat coronavirus (Bat SARS-like-CoVZXC21) and 82% with SARS-CoV. Comparison of the amino acid sequences of the expected COVID-19 orfs showed that it diverged from bat, civet, and human SARS-CoV. Yet, unlike other coronaviruses, its orf3b encodes a shorter protein and its orf8 encodes a secreted protein, leaving the source of COVID-19 undetermined [20].

The interaction between the COVID-19 spike (S) protein and its host receptor, angiotensin-converting enzyme 2 (ACE2), was investigated based on similar information obtained from SARS-CoV. The amino acid (aa) sequence of the COVID-19 S protein, including the receptor-binding domain (RBD) that interacts with ACE2, is similar to that of SARS-CoV. This supports the view that COVID-19 uses ACE2 as its receptor and has higher affinity for human ACE2 and that of other animals, explaining its capability for human cell infection and human-to-human transmission [21].

The first question now is where COVID-19 came from and how similar the isolates from different patients and different countries are. The wide spectrum of symptoms, ranging from no symptoms to death, raises a second key question. These fundamental questions need to be answered for a better understanding of the origin, transmission, and severity of the virus. In this study, we investigated the similarity of the nucleotide sequences of 38 COVID-19 isolates from 6 countries to evaluate the differences among them. Similarity among COVID-19 isolates was investigated at the level of the nt sequence and of the predicted orfs. The role of human endogenous retroviruses (HERVs) in the wide range of COVID-19 symptoms is also discussed.

2 MATERIALS AND METHODS

2.1 Nucleotide and Protein Sequences

All complete-genome nucleotide sequences of COVID-19 and SARS-CoV isolates were obtained from the NCBI nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore). The isolates included 17 from China, 10 from USA, 5 from Japan, 2 from Hong Kong, 2 from Taiwan, 1 from South Korea, and 1 from Australia (Table 1).

Table 1.   Nucleotide sequence identity to the first reported case from China, isolate HZ-1 (Accession no. MT039873.1)

2.2 Blast and Multiple Alignment Analysis of COVID-19 Isolates

The sequence of the first reported COVID-19 isolate from China (HZ-1, MT039873.1) was used in a BLAST search to determine its identity with other sequences in the nucleotide database reported from China or other countries. The nt sequences of the isolates were aligned using the Clustal Omega (ClustalO) multiple alignment service (https://www.ebi.ac.uk/Tools/msa/clustalo/). A phylogenetic tree of the isolate sequences was constructed using the same ClustalO service. Nucleotide SNPs were detected manually in the aligned sequences.
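The manual SNP detection described above can be illustrated with a small script. This is a minimal sketch and not the procedure used in this study: it assumes the aligned sequences are already available as equal-length strings with `-` marking gaps, the function name `count_changes` is hypothetical, and deletions are counted per deleted base rather than per deletion event.

```python
from collections import Counter


def count_changes(reference: str, isolate: str) -> Counter:
    """Count SNPs and deleted bases in one aligned isolate relative to the reference.

    Both strings must come from the same multiple alignment (equal length,
    '-' for gaps). Ambiguous bases such as N are ignored.
    """
    if len(reference) != len(isolate):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter(snp=0, deleted_bases=0)
    for ref_base, iso_base in zip(reference.upper(), isolate.upper()):
        if ref_base == "-" or ref_base == iso_base:
            continue  # gap in the reference, or identical column
        if iso_base == "-":
            counts["deleted_bases"] += 1  # base present in reference, absent in isolate
        elif ref_base in "ACGT" and iso_base in "ACGT":
            counts["snp"] += 1  # unambiguous single-base substitution
    return counts


# Toy alignment (invented sequences, for illustration only)
toy_reference = "ATGCGTACGT"
toy_isolate = "ATGTGTA-GT"
print(count_changes(toy_reference, toy_isolate))
```

In the toy alignment, the isolate differs from the reference by one substitution (C to T) and one deleted base.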

2.3 Expected ORFs of Different COVID-19 Isolates

The expected orfs of each COVID-19 isolate were obtained from the NCBI graphics view of the nucleotide accession at the NCBI nucleotide database website (https://www.ncbi.nlm.nih.gov/nuccore).

3 RESULTS

3.1 Nucleotide Sequence Identity of COVID-19 and Other Corona Viruses

The first reported Chinese sequence of COVID-19 (MT039873.1) was used in a BLAST search. This search revealed high identity to the other 38 COVID-19 isolates (Table 1). These included 16 other reported sequences from China, 11 from USA, 5 from Japan, 2 from Hong Kong, 2 from Taiwan, and 1 from Australia. The identity of these isolates to the Chinese isolate was high, ranging from 99.91 to 100% (Table 1), with query coverage of 99–100%. Interestingly, the first reported Chinese case showed 96.11% identity and 99% coverage with the Chinese BatCoV-RaTG13 isolate (MN996532.1), the closest non-COVID-19 sequence in this study. More importantly, its identity to the closest SARS-CoV isolate (AY395003.1) was 82.34%, with 88% query coverage (Table 1).
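The relation between percent identity and percent difference used in this comparison can be sketched as follows. The example is only an illustration under stated assumptions: BLAST computes identity over its own local alignments, whereas the hypothetical helper `percent_identity` below simply compares two pre-aligned strings and skips gapped columns. The identity values passed to `percent_difference` are the ones reported above.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical bases over the ungapped columns of a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # simplification: gapped columns are not scored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def percent_difference(identity: float) -> float:
    """Percent difference is the complement of percent identity."""
    return round(100.0 - identity, 2)


# Identity values reported in this section for isolate HZ-1 (MT039873.1)
print(percent_difference(96.11))  # versus BatCoV-RaTG13
print(percent_difference(82.34))  # versus the closest SARS-CoV isolate
```

With the reported identities, the complement gives a 3.89% difference from BatCoV-RaTG13 and a 17.66% difference from the closest SARS-CoV isolate.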

3.2 Phylogenetic Relationship among COVID-19 Isolates

The phylogenetic tree of the 38 COVID-19 isolates reported from different countries showed random clustering, with no noticeable grouping of isolates from China or any other country in the various clades (Fig. 1). Clade A has 1 Chinese isolate. Clade B has 2 Chinese isolates. Clade C has 14 isolates: 1 from Australia, 3 from USA, 6 from China, 1 from Taiwan, 2 from Japan, and 1 from Korea. Clade D has 3 isolates: 2 from China and 1 from USA. Clade E has 18 isolates: 7 from USA, 6 from China, 2 from Hong Kong, and 3 from Japan (Fig. 1). This random distribution of isolates from the same country, specifically the Chinese isolates, indicated that they belong to the same strain.

Fig. 1. Phylogenetic relationship among COVID-19 isolates from different countries.

3.3 Nucleotide Sequence Alignment of COVID-19 Isolates

In the BLAST search, the first reported Chinese COVID-19 isolate showed a 17.66% difference from the closest SARS-CoV isolate and a 3.89% difference from the closest bat coronavirus isolate (Table 1). Similarly, alignment of the COVID-19 and SARS-CoV isolates as one group revealed extensive differences in nt sequence spread over the whole genome. We therefore investigated the nucleotide SNPs within each group: the 38 COVID-19 isolates and the 3 SARS-CoV isolates were compared as separate groups.

Among the 38 COVID-19 isolates, 108 nucleotide changes (103 SNPs and 5 deletions) were detected (Table 2). Seven Chinese isolates did not have any SNPs, whereas the other isolates had between 1 and 9 SNPs (Table 2). The Korean isolate SNU01 came first with 9 SNPs, followed by USA isolate USA-IL1, USA isolate USA-IL1, and the Chinese isolate IPBCAMS-WH-02 with 8, 7, and 6 SNPs, respectively (Table 2). All Japanese isolates had between 3 and 5 SNPs. The nucleotide SNPs were distributed between transitions (66) and transversions (37). The number of detected SNPs indicated a base substitution (SNP) rate across all studied COVID-19 isolates of 103/1 135 284 = 9.07 × 10⁻⁵. A similar alignment of three SARS-CoV isolates (DQ182595.1, China; AY323977.2, Italy; AY310120.1, Germany) revealed that the nucleotide sequence of the Chinese isolate (DQ182595.1) had 99.97 and 99.95% identity with the Italian (AY323977.2) and German (AY310120.1) isolates, respectively. The alignment yielded 12 SNPs and 1 deletion among the three SARS-CoV isolates (Table 2), indicating a base substitution rate of 12/89197 = 12.22 × 10⁻⁵ among SARS-CoV isolates. This appears higher than the SNP rate in the COVID-19 isolates, probably because of the small number of SARS-CoV isolates used.
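The transition/transversion split and the rate calculation above can be reproduced with a short sketch. It is illustrative only: the helper names are hypothetical, sequences are assumed to use the DNA alphabet (A, C, G, T) as deposited in GenBank, and the denominator 1 135 284 is taken as reported above.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref_base: str, alt_base: str) -> str:
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine) or 'transversion'."""
    pair = {ref_base.upper(), alt_base.upper()}
    if len(pair) != 2 or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def substitution_rate(n_snps: int, total_bases: int) -> float:
    """SNPs per compared base."""
    return n_snps / total_bases


# Counts reported for the 38 COVID-19 isolates
transitions, transversions = 66, 37
rate = substitution_rate(transitions + transversions, 1_135_284)
print(f"rate = {rate:.2e}")
print(f"transition/transversion ratio = {transitions / transversions:.2f}")
print(classify_substitution("C", "T"), classify_substitution("A", "C"))
```

The 66 transitions and 37 transversions sum to the 103 reported SNPs, and the computed rate rounds to the 9.07 × 10⁻⁵ given in the text.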

Table 2.   Summary of detected nucleotide SNPs among COVID-19 isolates

3.4 COVID-19 Open Reading Frames (orfs)

Five main orfs are usually produced by all coronavirus isolates: the orf1ab polyprotein, orfS, orfN, orfM, and orfE. Another seven orfs have been reported in various isolates: the orf1a polyprotein, orf3a, orf6, orf7a, orf7b, orf8, and orf10 (Table 3). Usually, the polyproteins 1ab and 1a are processed into smaller accessory orfs (Table 4). The accessory orfs are not produced in all coronavirus isolates.

Table 3.   Common coronavirus orfs
Table 4.   Accessory orfs produced from polyprotein orf1ab and orf1a

3.5 Expected orfs from COVID-19 Isolates

We investigated the expected orfs of isolates from the same country and from different countries to check whether coronavirus isolates differ in their expected orf pattern despite their similar genome size and the high identity of their genome nucleotide sequences (Tables 1, 2). Interestingly, the orf pattern of isolates from the same country or from different countries differed greatly (Table 5, Fig. 2). All COVID-19, SARS-CoV, and BatCoV-RaTG13 isolates have the five main orfs (1ab, S, E, M, N). All of these isolates also have orf3a, except the Chinese isolate WHU01 (MN988668.1). This isolate is expected to produce only the five main orfs, the minimum number of orfs detected in this study. Of the 38 COVID-19 isolates, only two Chinese isolates (Wuhan-Hu-1 and Yunnan-01) had orf1a, which is also expected in the three SARS-CoV isolates and the BatCoV-RaTG13 isolate (Table 5). Orf6 and orf7a are expected in all isolates except the Chinese isolate Wuhan-Hu-1. Orf7b is expected only in 7 Chinese isolates, the three SARS-CoV isolates, and the BatCoV-RaTG13 isolate, whereas orf8 is not expected in the three SARS-CoV isolates or the Chinese isolate Wuhan-Hu-1 (Table 5). Orf10 is not expected in 6 Chinese COVID-19 isolates, the three SARS-CoV isolates, or the BatCoV-RaTG13 isolate. Four extra accessory orfs (3b, 8a, 8b, 9b) are expected only in the three SARS-CoV isolates and the BatCoV-RaTG13 isolate (Table 5). Among isolates from the same country, the USA isolates and the Japanese isolates did not show differences within their groups in the expected orf pattern. On the other hand, the Chinese isolates showed differences in orfs 1a, 3a, 6, 7a, 7b, 8, and 10, with the Chinese isolate WHU01 (MN988668.1) expected to produce only the five main orfs (Table 5). The orf patterns of 4 selected Chinese COVID-19 isolates, one SARS-CoV isolate, and the BatCoV-RaTG13 isolate are shown in Fig. 2.
The first reported Chinese isolate (HZ-1, MT039873.1) has 10 expected orfs in its genome: 1ab, N, S, E, M, 3a, 6, 7a, 8, and 10. Orf1a is not expected from the genome of this isolate (Fig. 2). On the other hand, another Chinese isolate (Yunnan-01, MT049951.1) is expected to produce orf1a and orf7b in addition to the 10 orfs expected in isolate HZ-1 (Fig. 2). The expected orf pattern of the Chinese isolate WIV02 (MN996527.1) is similar to that of isolate Yunnan-01 except for the absence of orf1a. Interestingly, the bat isolate BatCoV-RaTG13 (MN996532.1) has exactly the same expected orf pattern as the Chinese isolate WIV02. The Chinese isolate WHU01 (MN988668.1) has only 5 expected orfs (1ab, S, M, N, E). The Chinese SARS-CoV isolate SARS-CoV-1-ZJ0301 has 32 expected orfs, comprising the 5 main orfs and 27 accessory orfs (Fig. 2).
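The comparison of expected orf patterns between isolates amounts to a set comparison, sketched below. This is an illustration only and not the tool used in the study: the orf sets are transcribed from the description of Fig. 2 for four Chinese COVID-19 isolates, and the function name `orf_difference` is hypothetical.

```python
MAIN_ORFS = frozenset({"1ab", "S", "E", "M", "N"})

# Expected orf sets as described for the isolates shown in Fig. 2
HZ1_ORFS = MAIN_ORFS | {"3a", "6", "7a", "8", "10"}
orf_patterns = {
    "HZ-1": HZ1_ORFS,
    "Yunnan-01": HZ1_ORFS | {"1a", "7b"},  # HZ-1 pattern plus orf1a and orf7b
    "WIV02": HZ1_ORFS | {"7b"},            # Yunnan-01 pattern without orf1a
    "WHU01": MAIN_ORFS,                    # five main orfs only
}


def orf_difference(isolate_a: str, isolate_b: str) -> tuple[list[str], list[str]]:
    """Return (orfs only in isolate_a, orfs only in isolate_b), each sorted."""
    a, b = orf_patterns[isolate_a], orf_patterns[isolate_b]
    return sorted(a - b), sorted(b - a)


for name, orfs in orf_patterns.items():
    print(f"{name}: {len(orfs)} expected orfs")
print(orf_difference("Yunnan-01", "HZ-1"))
print(orf_difference("HZ-1", "WHU01"))
```

Representing each isolate as a set makes presence/absence tables such as Table 5 straightforward to derive by testing each orf for membership.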

Table 5.   Summary of predicted ORFs in reported nCoV-2 isolates (+ indicates the presence of orf, – indicates the absence of orf)
Fig. 2. Map of the expected orf patterns of 4 selected COVID-19 isolates and 1 SARS-CoV isolate compared to the bat BatCoV-RaTG13 isolate. Accession number and isolate name are shown in each map panel.

4 DISCUSSION

The high identity (99.91 to 100%) in nucleotide sequence among COVID-19 isolates from various countries or the same country (Table 1) and their random clustering on the phylogenetic tree (Fig. 1) indicated that the reported COVID-19 isolates from different countries are highly similar and belong to one COVID-19 strain. Also, the differences between COVID-19 and SARS-CoV (17.66%) and between COVID-19 and the bat coronavirus isolate BatCoV-RaTG13 (3.89%) set COVID-19 apart as a novel viral strain, not identified before, with a different genomic context. In addition, the low number of nt SNPs among COVID-19 isolates and their distinction from SARS-CoV and the bat coronavirus support the same idea. Interestingly, the collective base substitution rate for the studied isolates was 9.07 × 10⁻⁵. The base substitution rate of RNA viruses is the number of changed bases per cellular infection (generation). This is very difficult to determine because it is not known how many generations (infections) these isolates went through before they were sequenced. Our figure is therefore an overestimate of the SNP rate in the studied strains, because the viruses must have gone through a huge number of infections before being isolated from symptomatic patients. RNA viruses have mutation rates from 1 × 10⁻⁶ to 1 × 10⁻⁴ [22–24]. Our overestimated mutation rate for COVID-19 is still within the range of RNA virus mutation rates, indicating that COVID-19 is a new viral strain.

COVID-19 isolates showed differences in the expected orf pattern despite their highly similar genomes, suggesting a high level of complexity in the interaction between the COVID-19 genome and its host cells. This is in agreement with previous reports. Production of extra orfs besides the main orfs has been reported previously for different retroviruses. Human endogenous retrovirus K (HERV-K) produces two variant proteins (np9, rec) from its full sequence or from the 292 bp deficient gene, respectively [25].

Our results agree with those of several other studies indicating that COVID-19 is a novel coronavirus that did not originate from previously existing strains [15]. Similarly, it was reported that COVID-19 is not a mosaic virus and did not originate from recombination events [14]. Along the same lines, a third study revealed that COVID-19 had 89% nt identity with a bat coronavirus (Bat SARS-like-CoVZXC21) and 82% with SARS-CoV. Its orf3b encodes a shorter protein and its orf8 encodes a secreted protein, leaving the source of COVID-19 undetermined [20].

Therefore, the most probable scenario is that this strain was transmitted from an unknown organism and developed the ability to infect humans and to transmit from human to human [16]. Based on this scenario, future studies are needed to screen a wide range of animals that come in contact with humans in search of the possible source of this viral strain, COVID-19. On the other hand, in the absence of a biological source, the possibility that it is synthetic and became public through a leak from an unknown biological facility cannot be ruled out at this time. This possibility is supported by the detection of a unique isolate reported in 2004. The sequence of a new SARS-CoV strain was reported in 2004 and filed as a patent with the European Patent Office by the Centre National de la Recherche Scientifique (CNRS), Institut Pasteur, and Universite Paris Diderot (Patent no. EP1694829B1). This strain was isolated from a patient in Hanoi, Vietnam. The sequence of this strain was not deposited in the nucleotide database or anywhere else except in the patent itself. When we ran a BLAST search of the nt sequence of this strain against the nucleotide database, the closest sequence was the SARS-CoV Urbani isolate icSARS-MA (Acc no. MK062180.1), with only 89.65% identity, indicating its difference from the SARS-CoV isolates reported at that time and consequently from any other reported coronavirus or COVID-19 isolate.

4.1 COVID-19 Symptoms Implicate Its Unique Interaction with Human Biology

It is well known that COVID-19 causes a wide range of symptoms in humans, from no symptoms to death. The valid question here is what makes people differ in their response to COVID-19 infection. Based on the distinction of the COVID-19 genome from SARS-CoV and bat CoV, the unique characteristics of COVID-19, and the similarity among COVID-19 isolates at the nt level, some possible scenarios can be suggested for the discrepancies among humans in their response to infection. In addition to the age and health of the host, some genomic scenarios are summarized in the following sections based on current studies of human endogenous retroviruses (HERVs).

4.1.1 Human endogenous retroviruses (HERVs). HERVs are DNA sequences that originated from recurrent integrations of earlier exogenous retroviruses [26, 27]. HERVs are one type of highly conserved transposable element (TE). TEs and HERVs make up 40 and 8% of our genome, respectively [28]. HERVs were first detected in the human genome in the 1970s [29]. They are classified into three main groups based on their phylogenetic relationship: I (gammaretrovirus- and epsilonretrovirus-like), II (betaretrovirus-like), and III (spumaretrovirus-like) [30, 31]. Their integration allowed the vertical transmission of retroviral genomes along with the human genome across generations [32]. HERVs are inserted in the genome through reverse transcription of viral RNA into double-stranded DNA (the provirus) by the viral reverse transcriptase [33], followed by integration of the provirus in the host genome by the viral integrase and other host proteins [34]. Integrated copies can be activated and give rise to active infection. After integration, the proviral DNA produces mRNA that either encodes various viral proteins or is reverse transcribed by the viral reverse transcriptase into proviral DNA capable of a new integration cycle. HERVs have a structure similar to that of exogenous retroviruses, comprising two long terminal repeats (LTRs) flanking the internal viral genes gag (matrix protein), pro-pol (protease, reverse transcriptase, and integrase), and env (envelope) [32]. Besides these main retroviral proteins, some retroviruses produce extra proteins. For instance, the env gene of HERV-K encodes two different protein variants (np9, rec) using its full sequence or the 292 bp deficient variant, respectively [25].

4.1.2. Impact of HERVs on human cells. HERVs have several different impacts on their host cells. RNA and proteins produced from HERV sequences could play a role in the regulation of human genes and modulate the immunity of the host [35, 36]. Although most TEs have been silenced by accumulation of mutations or hypermethylation, some of them have been domesticated and are still active in human biology [37]. For example, syncytins are a group of env proteins produced by different HERVs in mammals [38]. In the human genome, two env genes, HERV-W and HERV-FRD, are involved in the production of the env proteins syncytin-1 and -2, respectively [39]. They are involved in placental syncytiotrophoblast development, homeostasis [39, 40], and maternal immune tolerance to the growing fetus [41].

4.1.3. HERVs and regulation of human gene expression. At the DNA level, a huge number of HERVs are integrated in the human genome and function as binding sites for transcription factors, alternative promoters, or splicing signals for cellular genes [37, 42–46], which indicates their role in the regulation of transcription and in human genome development. This could lead to upregulation, downregulation, suppression, or tissue-specific splicing of cellular genes [42, 45, 47]. They also represent a plethora of cis-acting regulatory elements that function as binding sites for the host trans-acting elements. The interplay between both types of elements makes up the gene regulation network in a cell [48, 49]. Along the same lines, solitary LTRs, remnants of complete HERVs, can also regulate host gene expression. Recurrent insertions of HERVs cause insertional mutations in the target genes and allelic homologous recombination [32]. For example, recombination between homologous HERV-I copies on chromosome Y causes a microdeletion in the azoospermia factor and consequently male infertility [50]. In addition, HERVs can produce non-coding RNAs (ncRNAs), including microRNA and long ncRNA, which furnish recognition motifs for RNA-binding proteins or modulate the function of transcription factors [32]. Accordingly, HERV ncRNAs that have sequence similarity to human miRNA work as RNA sponges that bind other miRNAs involved in the post-transcriptional regulation of gene expression [51]. This was the case in the regulation of embryonic stem cells, in which the ncRNA HPAT5 produced by HERVH interacts with the let-7 miRNA sequence [52]. Furthermore, a HERV may produce a protein that functions as a regulator of host gene expression during the virus life cycle and provides cellular functions during the cycle [36]. An interesting example is the HERV Gag and Rec proteins, which are involved in the stability and translation of host cell mRNA [36].
For example, HML2 Rec was able to bind to 1 600 nt mRNAs of host embryonic cells and regulate their translation by the ribosome in an early developmental process [53]. Along the same lines, the Arc Gag-like protein produced by the Ty3/gypsy retrotransposon was suggested to coordinate communication between brain neural cells, indicating its role in nervous system development [54, 55]. Specifically, Arc has been proposed to form capsids that carry mRNA between neurons via extracellular vesicles to be translated in the target neuron [56].

A group of HERVs spread across the human genome can form a coordinated regulatory network that simultaneously regulates the expression of many host genes involved in the same pathway [35, 47, 57]. For example, more than 30% of the binding sites for the protein p53 in the human genome were distributed by HERV sequences and became the target network of p53 [58], leading to human genome plasticity and cellular networking. An interesting example of this plasticity is the MHC (major histocompatibility complex) locus, which has been shown to carry heavy integration of HERVs, leading to its tremendous plasticity and hyper genetic variability [59]. Likewise, HERVK (HERVKC4) was integrated in the 9th intron of the human complement C4A gene, leading to its hypervariation [60, 61]. One vital example is the role of HERVs in the interferon (IFN) antiviral pathway of innate immunity and in the induction of the adaptive immune response [62]. HERV integrations were involved in the development of the IFN network of IFN-inducible transcription enhancers in various mammalian genomes [35]. It was shown that deletion of a HERV sequence near an IFN gene suppressed the linked pathway [35]. Also, sequences of the HERV LTRs function as promoter or enhancer sites in response to IFN-based activation [63]. The HERVK LTRs, which have two IFN-stimulated response elements (ISREs), were induced by the IFN cascade in response to inflammation [64].

4.1.4. HERVs and human immune modulation. Products of ancient integrated HERVs represent the borderline between human self and microbial non-self molecules; they can be tolerated by the human immune system or induce an immune response, giving rise to autoimmune diseases. The innate immune pathways induced by HERV products are the same ones that function against exogenous viral infection [65]. In humans, Toll-like receptors (TLRs) and cytosolic pattern recognition receptors (cytPRRs) can recognize HERV products and induce an immune response. This was reported in autoimmune diseases and cancer [66, 67].

Recognition of viral molecules by innate immune receptors induces inflammatory molecules, including IFN, cytokines, and chemokines, invoking the antiviral response. This group of molecules activates the adaptive immune response through the activation of T and B cells. Both immune responses are required to fight exogenous viral infection, and the activated response is finally stopped after infection. In the case of HERV products, their continuous presence in the host cells provokes chronic stimulation of the host immune response, resembling the chronic immune stimulation in autoimmune and inflammatory diseases caused by exogenous retroviral molecules [67–70]. The antiviral response activated by HERV products causes a vicious circle in which the produced inflammatory molecules and epigenetic dysregulation further upregulate HERV expression [65, 71, 72]. Peptides produced from HERVs were also implicated in the suppression of the immune response. These include the env proteins that carry the immunosuppressive domain (ISD) conserved in retroviral env proteins. For example, the ISD from HERVs functions in maternal immune tolerance during pregnancy [38, 41].

4.1.5. HERVs and exogenous viral infection. It is well documented that HERVs can contribute negatively or positively during exogenous viral infection [67]. Infection by some viruses, including HIV, herpesviruses, and influenza, changed HERV expression [73–75]. In this regard, exogenous infection could cooperatively upregulate HERV expression and increase the immune response [67]. HERV products could also play a protective role against exogenous viral infection [36]. For example, production of HERV antisense RNA confers protection against exogenous infection by viruses with complementary RNA [65, 76]. Some studies reported that HERV products function as pathogen-associated molecular patterns (PAMPs), which are able to induce receptors of the host defense system [49, 65]. In addition, some of their products mimic antigens that stimulate specific B and T cells [77, 78]. This explains the role of HERVs in autoimmune and inflammatory diseases. On the other hand, they have a role in suppressing the immunity of host cells, as they have been implicated in maternal immune suppression and in protection from excessive immune activation [79, 80].

4.2 Possible Role for HERVs in COVID-19 Infection and Symptoms

HERVs could modulate the infection and symptoms of exogenous COVID-19 infection in several possible ways. First, HERVs or their products could compromise the immune system and facilitate the infection and penetration of human cells by the virus. Also, individuals with high levels of the ACE2 receptor could be an easy target for the virus, especially those with high blood pressure and various types of stress. Second, different isolates of the virus can use the host cell to produce different protein sets (orf patterns) that exploit the host cells and compromise the host immune system with different efficiencies. This would result in a spectrum of disease severity and possibly death. In this study, different isolates from the same country (China) or from different countries are expected to produce various orf patterns. One of the produced orfs encodes the enzyme responsible for methylation of the 2' carbon of the ribose sugar of viral RNA. This modification makes the viral RNA undetectable by the host immune system and allows it to infect human cells effectively [81]. Third, HERVs could produce protein products that complement the viral set of orfs in entry, infection, replication, packaging, and integration in the human genome. In addition, partial proviral genomes from previous integrations can produce some of the enzymes required for the replication of viral isolates that lack the ability to infect. For example, an animal isolate that is not capable of infecting humans could transfer to a human and find in this individual's genome some proviral genes that complement the animal strain, making it infectious and able to cause symptoms. Fourth, the coronavirus genome can only produce the proteins it needs for viral reproduction through -1 ribosomal slippage at the translation start site. HERVs may produce proteins or miRNA that modulate the translation start for the ribosome, changing the pattern of COVID-19 orfs in different human hosts.
This leads to a different course of symptoms and severity of COVID-19 infection.

Long-term studies on COVID-19 and other retroviruses that attack humans are urgently needed to validate all of these possibilities, for future safety and better management of future pandemics like COVID-19. Intensive studies are also needed to survey human populations (especially the elderly and the immunocompromised) for their HERV loads and to link these to their predisposition to autoimmune diseases and cancer and to their risk of exogenous viral infection.

5 CONCLUSIONS

Our results indicate that COVID-19 did not originate from a known biological source or from other previously characterized strains. The COVID-19 isolates used in this study showed high similarity at the nt sequence level, yet they differed greatly in the expected orf pattern despite their similar genomes. The most probable scenario is that this strain was transmitted from an unknown organism and has, or has developed, the ability to infect human cells and to transmit from human to human. On the other hand, in the absence of a biological source, the possibility that it is synthetic and became public from an unknown biological facility cannot be ruled out at this time.