Fractal genomics of SOD1 evolution

  • Original Article
  • Published in: Immunogenetics

Abstract

To understand fundamental processes of gene evolution, such as the impact of point mutations and segmental duplications on statistical topography, superoxide dismutase-1 (SOD1) orthologous sequences (n = 50) are studied. These demonstrate scale-invariant self-similarity patterns and long-range correlations (LRCs), indicating fractal organization. Phylogenetic hierarchies change when SOD1 orthologs are grouped according to fractal measures, indicating that statistical topographies can be used to study gene evolution. Sliding window k-mer analysis shows that the majority of k-mers across all SOD1 orthologs are unique, with very few duplications. Orthologs from simpler species contribute minimally (< 1% of k-mers) to those of more complex species. Both simple and complex random processes fail to produce significant matching k-mer sequences for SOD1 orthologs. Point mutations causing amyotrophic lateral sclerosis do not impact the fractal organization of human SOD1. Hence, SOD1 did not evolve by a patchwork of repetitive sequences modified by point mutations. Moreover, the fractal and other methods described here can be used to study the origin and evolution of genomes.

Data availability

All data relevant to the study are included in the article or uploaded as supplementary information. All datasets are in publicly available repositories, and their accession numbers are given in the manuscript.

Acknowledgments

I am grateful to Aneela Pasha for her insightful comments on the manuscript and valuable discussion and encouragement throughout this project.

Author information

Corresponding author

Correspondence to Mohammad Saeed.

Ethics declarations

Conflict of interest

The author declares that there is no conflict of interest.

Additional information

Web Resources

GeneFractals software: https://github.com/dr-saeed/GeneFractals

GeneFractals SOD1 Images: https://www.immunocure.pk/research/ and https://drive.google.com/drive/folders/1XPFG8FTtOZAet-saFOQM34Vo3WVfq_eT?usp=sharing

SOD1 mutations causing ALS and their genomic positions were obtained from the ALSoD database: https://alsod.ac.uk/output/gene.php#geneSummary

Electronic supplementary material

Below is the link to the electronic supplementary material.

251_2020_1184_MOESM1_ESM.jpg

Supplementary file1: Figure S1. Varying self-similarity of SOD1 orthologs. The GeneFractal diagrams for SOD1 sequences of Apis mellifera, Arabidopsis thaliana and Homo sapiens are shown. They demonstrate varying visual self-similarity and fractal measures as indicated. (JPG 294 KB)

251_2020_1184_MOESM2_ESM.jpg

Supplementary file2: Figure S2. Phylogenetic analysis of SOD1 orthologs. The evolutionary history is shown as an unrooted tree of 50 SOD1 orthologous sequences. It is inferred using the Maximum Likelihood method and the General Time Reversible model in MEGA X. Initial tree(s) for the heuristic search are obtained by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with the superior log likelihood value. (JPG 173 KB)

251_2020_1184_MOESM3_ESM.jpg

Supplementary file3: Figure S3. SOD1 Orthologs graphed by DFA. The SOD1 orthologs are ordered according to increasing DFA α-values. The graph shows the rearrangement of orthologs into 9 groups according to their varying gradients. These groups (A-I) are further combined into 3 larger groups (Groups D, E and F are classified as Group 2) that are used for phylogenetic analysis in Figure S4. (JPG 70 KB)
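
The DFA α-value used to order the orthologs can be illustrated with a minimal sketch of detrended fluctuation analysis. The sketch rests on assumptions that this caption does not state: a purine/pyrimidine mapping (C, T as +1; A, G as -1), first-order (linear) detrending, and an arbitrary set of box sizes. The names `linear_fit` and `dfa_alpha` are hypothetical and are not taken from the GeneFractals software.

```python
import math
import random


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_alpha(seq, box_sizes=(4, 8, 16, 32, 64)):
    """Estimate the DFA scaling exponent alpha of a DNA sequence."""
    # Assumed mapping: pyrimidines step up, purines step down; other symbols are skipped.
    steps = [1.0 if base in "CT" else -1.0 for base in seq.upper() if base in "ACGT"]
    if not steps:
        raise ValueError("sequence contains no A, C, G or T")
    mean_step = sum(steps) / len(steps)

    # Integrated, mean-centred profile (the "DNA walk").
    profile, running = [], 0.0
    for s in steps:
        running += s - mean_step
        profile.append(running)

    log_n, log_f = [], []
    for n in box_sizes:
        n_boxes = len(profile) // n
        if n_boxes < 2:
            continue  # too few boxes of this size to be informative
        xs = list(range(n))
        squared_residuals = 0.0
        for b in range(n_boxes):
            segment = profile[b * n:(b + 1) * n]
            slope, intercept = linear_fit(xs, segment)
            # Residuals of the profile around the local linear trend.
            squared_residuals += sum(
                (y - (slope * x + intercept)) ** 2 for x, y in zip(xs, segment)
            )
        fluctuation = math.sqrt(squared_residuals / (n_boxes * n))
        if fluctuation > 0:
            log_n.append(math.log(n))
            log_f.append(math.log(fluctuation))

    if len(log_n) < 2:
        raise ValueError("sequence too short for the chosen box sizes")
    # alpha is the slope of log F(n) against log n.
    alpha, _ = linear_fit(log_n, log_f)
    return alpha


# Synthetic, uncorrelated sequence (not SOD1 data): alpha is expected to be close to 0.5.
rng = random.Random(1)
random_seq = "".join(rng.choice("ACGT") for _ in range(4000))
demo_alpha = dfa_alpha(random_seq)
```

Under these assumptions, an uncorrelated sequence yields α close to 0.5, while values further from 0.5 suggest long-range correlations. Ordering sequences by the returned value would mimic the ranking described in the caption, although the exact α-values depend on the mapping and box sizes chosen.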

251_2020_1184_MOESM4_ESM.jpg

Supplementary file4: Figure S4. SOD1 Clades according to DFA-based fractal organization. The evolutionary history of SOD1 orthologs (n=15) is evaluated with MEGA X, using the same processes as in Figure S2. a) In this unrooted tree (Group 1), Emiliania huxleyi switches its position from the group of Xenopus tropicalis (Figure S2) to that of Meleagris gallopavo. This shows that fractal correlations can be used to study phylogenetic organization. b) There are 4 clades in Group 2, in line with careful observation of the gradients in Figure S3. c) Group 3, containing mammals, remains unchanged from Figure S2. (JPG 223 KB)

251_2020_1184_MOESM5_ESM.xlsx

Supplementary file5: Table S1. Fractal measures of SOD1 orthologs. Shows the species name (n=50), their SOD1 NCBI gene ID, NCBI reference sequence ID, NCBI taxonomic order, DFA Groups according to Figures S3 and S4, percentages of AT and GC content and the DFA of CorGen generated random sequences with the natural SOD1 ortholog as the input sequence. (XLSX 16 KB)

251_2020_1184_MOESM6_ESM.xlsx

Supplementary file6: Table S2. Phylogenetic sequence reconstruction of SOD1. Shows the calculated frequency and number of matched k-mers (k=10) across 50 SOD1 orthologs. The highlighted species_gene ID sequence is analyzed by sliding window analysis and used as a reference for matching the k-mers generated by similar analysis of all 50 orthologs (testing sequences). The percentage match refers to the percentage of k-mers in the reference sequence that match k-mers of the testing orthologous sequence. (XLSX 82 KB)
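
The percentage match between a reference and a testing sequence can be sketched as follows. This is an illustration only: the window step of 1, the treatment of repeated k-mers (each reference window is counted separately), and the function names `sliding_kmers` and `percent_match` are assumptions, not details taken from the article or from GeneFractals. The toy sequences are synthetic.

```python
def sliding_kmers(seq, k=10, step=1):
    """Return all k-mers from a sliding window over seq (assumed step of 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def percent_match(reference, testing, k=10):
    """Percentage of reference k-mers that also occur in the testing sequence."""
    reference_kmers = sliding_kmers(reference, k)
    if not reference_kmers:
        return 0.0  # reference shorter than k: nothing to match
    testing_kmers = set(sliding_kmers(testing, k))
    matched = sum(1 for kmer in reference_kmers if kmer in testing_kmers)
    return 100.0 * matched / len(reference_kmers)


# Toy sequences (not SOD1 data): 2 of the 5 reference 10-mers occur in the testing sequence.
toy_reference = "ACGTACGTACGTAA"
toy_testing = "ACGTACGTACGG"
toy_percent = percent_match(toy_reference, toy_testing)  # 40.0
```

Because the measure is computed relative to the reference, swapping the reference and testing sequences can give a different percentage.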

251_2020_1184_MOESM7_ESM.xlsx

Supplementary file7: Table S3. Analysis of natural and random oligomers for SOD1 orthologs. Sliding window k-mer (k=10) analysis of natural SOD1 orthologs produces predominantly unique k-mers. For example, Emiliania huxleyi (Gene ID: 17260763), with a gene length of 658 bp, produces a total of 555 k-mers, of which 530 are unique (95%); 1 k-mer occurs 9 times and 2 k-mers occur 8 times in the sequence. Randomly generating 555 k-mers (k=10) over 10 cycles leads to a total of 43 k-mers that match the natural k-mers of Emiliania huxleyi. Each cycle generated a mean of ~4 matching k-mers (range 2-6). (XLSX 15 KB)
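
The two steps described in this table, tallying how often each natural k-mer occurs and counting how many randomly generated k-mers match the natural set, can be sketched as below. The sketch assumes a window step of 1 and draws random k-mers uniformly from A, C, G and T; the article does not specify either choice here, so the output need not reproduce the reported counts. All function names are hypothetical.

```python
import random
from collections import Counter


def sliding_kmers(seq, k=10, step=1):
    """Return all k-mers from a sliding window over seq (assumed step of 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def kmer_summary(seq, k=10):
    """Total k-mers, distinct k-mers, and k-mers that occur exactly once."""
    counts = Counter(sliding_kmers(seq, k))
    return {
        "total": sum(counts.values()),
        "distinct": len(counts),
        "occurring_once": sum(1 for c in counts.values() if c == 1),
    }


def random_match_counts(natural_kmers, n_random, k=10, cycles=10, seed=0):
    """Per cycle, count uniformly random k-mers that match any natural k-mer."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same result
    natural = set(natural_kmers)
    hits_per_cycle = []
    for _ in range(cycles):
        hits = 0
        for _ in range(n_random):
            candidate = "".join(rng.choice("ACGT") for _ in range(k))
            if candidate in natural:
                hits += 1
        hits_per_cycle.append(hits)
    return hits_per_cycle


# Toy sequence (not SOD1 data).
toy_summary = kmer_summary("ACGTACGTACGTAA")
```

A call such as `random_match_counts(sliding_kmers(gene), n_random=555)` would return ten per-cycle counts whose mean and range could be compared with the values in the table, keeping in mind that a different random model would give different numbers.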

251_2020_1184_MOESM8_ESM.xlsx

Supplementary file8: Table S4. Reconstruction of Human SOD1 sequence by simple random oligomer generation. Homo sapiens SOD1 (Gene ID: 6647) is subjected to sliding window k-mer (k=10) analysis. SOD1 orthologs from 49 other species are similarly analyzed and their k-mers matched with the k-mers of Human SOD1. Random k-mers are generated over 10x cycles and the mean number per cycle is used for calculating the odds ratio. All natural sequences contribute a higher frequency of matching k-mers to Human SOD1 than randomly generated sequences. Three species are exceptions (Sordaria macrospora, Emiliania huxleyi and Apiotrichum porosum), indicating that their contribution can be due to chance. The experiment is repeated for a limited number of species (due to the computing time required) for 100x and 1000x cycles; however, the mean number of matching k-mers remains the same as for 10x cycles. This process mimicked increasing the gene length, and the number of random matching k-mers can be calculated using the equation: Number of random 10-mers = 0.0062 × gene length (nucleotides); R² = 0.99. (XLSX 16 KB)
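
As a worked check of the fitted relation, the snippet below evaluates it for the 658 bp Emiliania huxleyi gene described in Table S3. Only the coefficient 0.0062 comes from the caption; the function name is hypothetical, and the fit is assumed to apply only within the range of gene lengths that were analyzed.

```python
SLOPE_PER_NUCLEOTIDE = 0.0062  # fitted coefficient reported in the caption (R² = 0.99)


def expected_random_10mer_matches(gene_length_nt):
    """Expected number of random matching 10-mers for a gene of the given length."""
    if gene_length_nt < 0:
        raise ValueError("gene length must be non-negative")
    return SLOPE_PER_NUCLEOTIDE * gene_length_nt


# 0.0062 x 658 is approximately 4.08.
emiliania_estimate = expected_random_10mer_matches(658)
```

The result, about 4.08, is consistent with the mean of ~4 matching k-mers per cycle reported for Emiliania huxleyi in Table S3.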

251_2020_1184_MOESM9_ESM.xlsx

Supplementary file9: Table S5. Matching of complex random oligomers generated using CorGen with natural SOD1 orthologs. CorGen-generated random gene sequences (prefixed with ‘r’) are matched with random and natural SOD1 orthologs by k-mer analysis as described above. Minimal k-mer matching was produced for natural SOD1 sequences. (XLSX 136 KB)

Cite this article

Saeed, M. Fractal genomics of SOD1 evolution. Immunogenetics 72, 439–445 (2020). https://doi.org/10.1007/s00251-020-01184-4
