Abstract
To understand the fundamental processes of gene evolution, such as the impact of point mutations and segmental duplications on statistical topography, superoxide dismutase-1 (SOD1) orthologous sequences (n = 50) are studied. These demonstrate scale-invariant self-similarity patterns and long-range correlations (LRCs), indicating fractal organization. Phylogenetic hierarchies change when SOD1 orthologs are grouped according to fractal measures, indicating that statistical topographies can be used to study gene evolution. Sliding window k-mer analysis shows that the majority of k-mers across all SOD1 orthologs are unique, with very few duplications. Orthologs from simpler species contribute minimally (< 1% of k-mers) to more complex species. Both simple and complex random processes fail to produce significant matching k-mer sequences for SOD1 orthologs. Point mutations causing amyotrophic lateral sclerosis do not impact the fractal organization of human SOD1. Hence, SOD1 did not evolve by a patchwork of repetitive sequences modified by point mutations. Moreover, the fractal and other methods described here can be used to study the origin and evolution of genomes.
Data availability
All data relevant to the study are included in the article or uploaded as supplementary information. All datasets are in publicly available repositories, with their accession numbers given in the manuscript.
References
Albrecht-Buehler G (2012) Fractal genome sequences. Gene 498(1):20–27 (PMID: 22342253)
Blount ZD, Barrick JE, Davidson CJ et al (2012) Genomic analysis of a key innovation in an experimental Escherichia coli population. Nature 489(7417):513–518 (PMID: 22992527)
Buldyrev SV, Goldberger AL, Havlin S et al (1993) Fractal landscapes and molecular evolution: modeling the myosin heavy chain gene family. Biophys J 65(6):2673–2679 (PMID: 8312501)
de Visser JA, Krug J (2014) Empirical fitness landscapes and the predictability of evolution. Nat Rev Genet 15(7):480–490 (PMID: 24913663)
Glenny RW, Robertson HT, Yamashiro S et al (1991) Applications of fractal analysis to physiology. J Appl Physiol 70(6):2351–2367 (PMID: 1885430)
Gompel N, Prud’homme B (2009) The causes of repeated genetic evolution. Dev Biol 332(1):36–47 (PMID: 19433086)
Hsieh LC, Luo L, Ji F et al (2003) Minimal model for genome evolution and growth. Phys Rev Lett 90(1):018101 (PMID: 12570650)
Kumar S, Stecher G, Li M et al (2018) MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol 35:1547–1549 (PMID: 29722887)
Li W, Kaneko K (1992) DNA correlations. Nature 360(6405):635–636 (PMID: 1465125)
Maddox J (1992) Long-range correlations within DNA. Nature 358(6382):103 (PMID: 1614541)
Mandelbrot BB (1982) The fractal geometry of nature. Freeman, San Francisco, USA
Map of Life (2020) Map of life: convergent evolution online. https://www.mapoflife.org Accessed 8 May 2020.
Messer PW, Arndt PF (2006) CorGen--measuring and generating long-range correlations for DNA sequence analysis. Nucleic Acids Res 34(Web Server issue):W692–W695 (PMID: 16845099)
Messer PW, Arndt PF, Lässig M (2005) Solvable sequence evolution models and genomic correlations. Phys Rev Lett 94(13):138103 (PMID: 15904043)
Moreno PA, Vélez PE, Martínez E et al (2011) The human genome: a multifractal analysis. BMC Genomics 12:506 (PMID: 21999602)
Orgogozo V (2015) Replaying the tape of life in the twenty-first century. Interface Focus 5(6):20150057 (PMID: 26640652)
Peng CK, Buldyrev SV, Goldberger AL et al (1992) Long-range correlations in nucleotide sequences. Nature 356(6365):168–170 (PMID: 1301010)
Peitgen H-O, Jürgens H, Saupe D (2004) Chaos and Fractals: New Frontiers of Science. Springer, New York, USA
Saeed M, Yang Y, Deng HX et al (2009) Age and founder effect of SOD1 A4V mutation causing ALS. Neurology 72(19):1634–1639 (PMID: 19176896)
Saeed M (2005) Fractals analysis of cardiac arrhythmias. ScientificWorldJournal 5:691–701 (PMID: 16155684)
Searls DB (2002) The language of genes. Nature 420(6912):211–217 (PMID: 12432405)
Stern DL, Orgogozo V (2009) Is genetic evolution predictable? Science 323(5915):746–751 (PMID: 19197055)
Tamura K, Battistuzzi FU, Billing-Ross P et al (2012) Estimating divergence times in large molecular phylogenies. Proc Natl Acad Sci 109:19333–19338 (PMID: 23129628)
Voss RF (1992) Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. Phys Rev Lett 68(25):3805–3808 (PMID: 10045801)
Acknowledgments
I am grateful to Aneela Pasha for her insightful comments on the manuscript and valuable discussion and encouragement throughout this project.
Ethics declarations
Conflict of interest
The author declares that there is no conflict of interest.
Additional information
Web Resources
GeneFractals software: https://github.com/dr-saeed/GeneFractals
GeneFractals SOD1 Images: https://www.immunocure.pk/research/ and https://drive.google.com/drive/folders/1XPFG8FTtOZAet-saFOQM34Vo3WVfq_eT?usp=sharing
SOD1 mutations causing ALS and their genomic positions were obtained from ALSoD database: https://alsod.ac.uk/output/gene.php#geneSummary
Electronic supplementary material
Below is the link to the electronic supplementary material.
251_2020_1184_MOESM1_ESM.jpg
Supplementary file1: Figure S1. Varying self-similarity of SOD1 orthologs. The GeneFractal diagrams for SOD1 sequences of Apis mellifera, Arabidopsis thaliana and Homo sapiens are shown. They demonstrate varying visual self-similarity and fractal measures as indicated. (JPG 294 KB)
251_2020_1184_MOESM2_ESM.jpg
Supplementary file2: Figure S2. Phylogenetic analysis of SOD1 orthologs. The evolutionary history is shown as an unrooted tree of 50 SOD1 orthologous sequences. It is inferred using the Maximum Likelihood method and the General Time Reversible model in MEGA X. Initial tree(s) for the heuristic search are obtained by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with the superior log likelihood value. (JPG 173 KB)
251_2020_1184_MOESM3_ESM.jpg
Supplementary file3: Figure S3. SOD1 orthologs graphed by DFA. The SOD1 orthologs are ordered according to increasing DFA α-values. The graph shows the rearrangement of orthologs into 9 groups according to their varying gradients. These groups (A-I) are further combined into 3 larger groups (Groups D, E, F are classified as Group 2) that are used for phylogenetic analysis in Figure S4. (JPG 70 KB)
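As an illustration of how a DFA α-value of the kind plotted in Figure S3 can be estimated, the following is a minimal sketch and not the GeneFractals implementation. It assumes a purine/pyrimidine numeric mapping (A, G as +1; C, T as -1), first-order (linear) detrending, and non-overlapping boxes; the function names `dna_walk` and `dfa_alpha` and the choice of box sizes are hypothetical.

```python
import math

def dna_walk(seq):
    """Assumed mapping: purines (A, G) -> +1, everything else -> -1."""
    return [1.0 if base in "AG" else -1.0 for base in seq.upper()]

def _linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def dfa_alpha(steps, box_sizes):
    """Estimate the DFA scaling exponent from a list of numeric steps.

    box_sizes should each be at least 4 and no longer than the sequence,
    otherwise the fluctuation can be zero or undefined.
    """
    mean_step = sum(steps) / len(steps)
    # Integrated profile of the mean-centred steps.
    profile, total = [], 0.0
    for s in steps:
        total += s - mean_step
        profile.append(total)
    log_n, log_f = [], []
    for n in box_sizes:
        xs = list(range(n))
        sq_sum, count = 0.0, 0
        # Non-overlapping boxes; a trailing partial box is ignored.
        for start in range(0, len(profile) - n + 1, n):
            box = profile[start:start + n]
            slope, intercept = _linear_fit(xs, box)
            sq_sum += sum((y - (slope * x + intercept)) ** 2
                          for x, y in zip(xs, box))
            count += n
        log_n.append(math.log(n))
        log_f.append(math.log(math.sqrt(sq_sum / count)))
    # alpha is the slope of log F(n) against log n.
    alpha, _ = _linear_fit(log_n, log_f)
    return alpha
```

Under these assumptions an uncorrelated sequence gives α close to 0.5, and larger values indicate long-range correlation. For genes of a few hundred nucleotides the usable range of box sizes is narrow, so the estimate is sensitive to which sizes are chosen.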
251_2020_1184_MOESM4_ESM.jpg
Supplementary file4: Figure S4. SOD1 clades according to DFA-based fractal organization. The evolutionary history of SOD1 orthologs (n=15) is evaluated with MEGA X, using the same processes as in Figure S2. a) In this unrooted tree (Group 1), Emiliania huxleyi switches its position from the group of Xenopus tropicalis (Figure S2) to that of Meleagris gallopavo. This shows that fractal correlations can be used to study phylogenetic organization. b) There are 4 clades in Group 2, in line with careful observation of the gradients in Figure S3. c) Group 3, containing mammals, remains unchanged from Figure S2. (JPG 223 KB)
251_2020_1184_MOESM5_ESM.xlsx
Supplementary file5: Table S1. Fractal measures of SOD1 orthologs. Shows the species name (n=50), their SOD1 NCBI gene ID, NCBI reference sequence ID, NCBI taxonomic order, DFA Groups according to Figures S3 and S4, percentages of AT and GC content and the DFA of CorGen generated random sequences with the natural SOD1 ortholog as the input sequence. (XLSX 16 KB)
251_2020_1184_MOESM6_ESM.xlsx
Supplementary file6: Table S2. Phylogenetic sequence reconstruction of SOD1. Shows the calculated frequency and number of matched k-mers (k=10) across 50 SOD1 orthologs. The highlighted species_gene ID sequence is analyzed by sliding window analysis and used as a reference for matching the k-mers generated by similar analysis of all 50 orthologs (testing sequences). The percentage match refers to the percentage of k-mers in the reference sequence that match the testing orthologous sequence. (XLSX 82 KB)
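The percentage match described for Table S2 can be sketched as follows. This is a minimal illustration and not the author's code: the function names `sliding_kmers` and `percent_match` are hypothetical, and the window step of one nucleotide is an assumption, since the step size is not stated here.

```python
def sliding_kmers(seq, k=10):
    """Return every k-mer from a sliding window (assumed step of 1 nt)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def percent_match(reference, testing, k=10):
    """Percentage of reference k-mers that also occur in the testing sequence."""
    ref_kmers = sliding_kmers(reference, k)
    if not ref_kmers:
        return 0.0  # reference shorter than k: nothing to match
    test_set = set(sliding_kmers(testing, k))
    matched = sum(1 for kmer in ref_kmers if kmer in test_set)
    return 100.0 * matched / len(ref_kmers)
```

Calling `percent_match(human_sod1, other_sod1)` with two nucleotide strings (hypothetical variable names) would give one cell of a table like Table S2. Counting every occurrence of a reference k-mer, rather than each distinct k-mer once, is also an assumption of this sketch.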
251_2020_1184_MOESM7_ESM.xlsx
Supplementary file7: Table S3. Analysis of natural and random oligomers for SOD1 orthologs. Sliding window k-mer (k=10) analysis of natural SOD1 orthologs produces predominantly unique k-mers. For example, Emiliania huxleyi (Gene ID: 17260763), with a gene length of 658 bp, produces a total of 555 k-mers, of which 530 are unique (95%); 1 k-mer occurs 9 times and 2 k-mers occur 8 times in the sequence. Randomly generating 555 k-mers (k=10) over 10 cycles leads to a total of 43 k-mers that match the natural k-mers of Emiliania huxleyi. Each cycle generated a mean of ~4 matching k-mers (range 2-6). (XLSX 15 KB)
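The two steps summarized for Table S3, counting how often each natural k-mer occurs and then matching randomly generated k-mers against the natural set, can be sketched as below. This is an illustration under stated assumptions, not the procedure used in the study: random k-mers are drawn uniformly from A, C, G and T, which may differ from the random process behind the table, so the sketch is not expected to reproduce the reported counts. The names `kmer_counts` and `random_kmer_matches` are hypothetical.

```python
import random
from collections import Counter

def kmer_counts(seq, k=10):
    """Count occurrences of each k-mer in a sliding window (assumed step of 1 nt)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def random_kmer_matches(natural_kmers, n_kmers, k=10, cycles=10, seed=0):
    """Per cycle, count uniformly random k-mers that equal a natural k-mer."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    natural = set(natural_kmers)
    matches_per_cycle = []
    for _ in range(cycles):
        hits = 0
        for _ in range(n_kmers):
            candidate = "".join(rng.choice("ACGT") for _ in range(k))
            if candidate in natural:
                hits += 1
        matches_per_cycle.append(hits)
    return matches_per_cycle
```

With `counts = kmer_counts(sequence)`, the number of unique k-mers is `sum(1 for c in counts.values() if c == 1)`, and the mean of the list returned by `random_kmer_matches(counts, n_kmers=sum(counts.values()))` corresponds to the per-cycle mean reported in the table.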
251_2020_1184_MOESM8_ESM.xlsx
Supplementary file8: Table S4. Reconstruction of Human SOD1 sequence by simple random oligomer generation. Homo sapiens SOD1 (Gene ID: 6647) is subjected to sliding window k-mer (k=10) analysis. SOD1 orthologs from 49 other species are similarly analyzed and their k-mers matched with the k-mers of Human SOD1. Random k-mers are generated over 10x cycles and the mean number per cycle is used for calculating the odds ratio. All natural sequences contribute a higher frequency of matching k-mers to Human SOD1 than randomly generated sequences. Three species are exceptions (Sordaria macrospora, Emiliania huxleyi and Apiotrichum porosum), indicating that their contribution can be due to chance. The experiment is repeated for a limited number of species (due to the computing time required) for 100x and 1000x cycles; however, the mean number of matching k-mers remains the same as for 10x cycles. This process mimics increasing the gene length, and the number of random matching k-mers can be calculated using the equation: Number of random 10-mers = 0.0062 × gene length (nucleotides); R² = 0.99. (XLSX 16 KB)
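The linear relation quoted for Table S4 can be applied directly, as in this short sketch. The function name `expected_random_10mers` is hypothetical, and the slope is simply the 0.0062 from the caption, with no intercept, as the equation is written.

```python
def expected_random_10mers(gene_length_nt, slope=0.0062):
    """Expected number of random matching 10-mers for a gene of the given length."""
    return slope * gene_length_nt

# Worked example: the 658 bp Emiliania huxleyi gene described for Table S3.
estimate = expected_random_10mers(658)  # 0.0062 * 658 = 4.0796
```

The result of about 4.08 agrees with the mean of ~4 random matches per cycle reported for Emiliania huxleyi in the Table S3 caption.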
251_2020_1184_MOESM9_ESM.xlsx
Supplementary file9: Table S5. Matching complex random oligomers generated using CorGen with natural SOD1 orthologs. CorGen-generated random gene sequences (prefixed with ‘r’) are matched with random and natural SOD1 orthologs by k-mer analysis as described above. Minimal k-mer matching was produced for natural SOD1 sequences. (XLSX 136 KB)
About this article
Cite this article
Saeed, M. Fractal genomics of SOD1 evolution. Immunogenetics 72, 439–445 (2020). https://doi.org/10.1007/s00251-020-01184-4
DOI: https://doi.org/10.1007/s00251-020-01184-4