Multifractal and cross-correlation analysis on mitochondrial genome sequences using chaos game representation
Introduction
Understanding the dynamics of long, complex genomic sequences has always been intriguing, because a subtle change in a sequence can bring about considerable variation in physiology, behaviour, anatomy, and evolution as a whole. Although addressing this question is cumbersome and complicated, the advent of bioinformatics tools has not only made it easier but has also offered multifaceted solutions to, and perspectives on, these problems. Genome sequences show a fractal nature, and several methods have been put forward to comprehend and visualize their dynamics (Almassalha et al., 2017, Campbell et al., 1999, Sandberg et al., 2001). Chaos Game Representation (CGR) has been widely used to visualize linear DNA sequences in graphical form since its inception (Jeffrey, 1990). It allows easy recognition of patterns in long nucleotide sequences and supports both alignment-free and alignment-based comparison of different genomes (Joseph and Sasikumar, 2006). CGR has also been applied to protein sequences and structures (Fiser et al., 1994).
Several methods, such as the structure function, R/S analysis, the average wavelet coefficient method, wavelet transform modulus maxima (WTMM), detrended moving average analysis and its variants, and detrended fluctuation analysis and its variants, have been used to characterize the fractal nature and correlation behaviour of one-dimensional and two-dimensional data sets (Hurst, 1951, Muzy et al., 1991, Peng et al., 1994, Simonsen et al., 1998, Guihong et al., 2001, Kantelhardt et al., 2002, Alessio et al., 2002, Patrick, 2002, Manimaran et al., 2005, Jose et al., 2006, Jose et al., 2008, Manimaran et al., 2008, Manimaran et al., 2009, Gu and Zhou, 2010, Zhou et al., 2013, Alpatov et al., 2013, Wang et al., 2015). Later, various multifractal cross-correlation analysis methods were introduced to explore the multifractal nature of power-law cross-correlations between non-stationary time series (Podobnik and Stanley, 2008, Zhou, 2008, Podobnik et al., 2009, Jiang and Zhou, 2011, Kristoufek, 2011, Xie et al., 2015, Qian et al., 2015). Multifractal detrended cross-correlation analysis (MF-X-DFA) has been widely used in many fields, including biology, the social sciences, and physics. However, MF-X-DFA requires the data sets being compared to have equal lengths (Manimaran and Narayana, 2018, Hema Sri Sai et al., 2019). Recently, Pal and co-workers developed an integrative approach that combines CGR and 2D MF-X-DFA to examine multifractal behaviour in the power-law cross-correlation of coding and non-coding DNA sequences, which have varying lengths, in various prokaryotes (Pal et al., 2015). We have also examined some prokaryote genomes and shown the existence of multifractal nature and power-law cross-correlation behaviour among them (Pal et al., 2016).
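To illustrate why CGR sidesteps the equal-length requirement, the following minimal sketch counts CGR points on a fixed grid, so that sequences of any length produce matrices of the same size. The function name `fcgr`, the grid parameter `k`, and the corner assignment of the four nucleotides are assumptions made for this illustration (corner conventions vary between papers), not details taken from the paper.

```python
# Assumed corner assignment on the unit square; conventions differ between papers.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def fcgr(sequence, k=4):
    """Count CGR points on a 2**k x 2**k grid (hypothetical helper)."""
    size = 2 ** k
    counts = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5  # start at the centre of the unit square
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip symbols other than A, C, G, T
        cx, cy = CORNERS[base]
        # Chaos game step: move halfway towards the corner of the current base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        counts[row][col] += 1
    return counts


short_matrix = fcgr("ACGTTGCA" * 5)    # 40 nucleotides
long_matrix = fcgr("ACGTTGCA" * 500)   # 4000 nucleotides
```

Both `short_matrix` and `long_matrix` are 16 × 16 here even though the input lengths differ by a factor of 100, which is what makes a matrix-based cross-correlation analysis applicable to genomes of different sizes. Note that this sketch counts every CGR point, including the first few that do not yet correspond to a full k-mer.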
In the current paper, we extend our approach to characterize the cross-correlation and multifractal nature of mitochondrial genomes in various species. Mitochondria, the powerhouses of the cell, are thought to descend from an endosymbiotic α-proteobacterium that was engulfed by a eukaryotic or archaebacteria-like cell (Bullerwell and Lang, 2005). Mitochondria mainly serve in energy homeostasis and have double-stranded circular DNA. The mitochondrial genome is used for phylogenetic analysis alongside the nuclear genome (van de Sande, 2012). It is regarded as a reliable phylogenetic marker because it does not undergo recombination, in contrast to the nuclear genome, which shows high recombination and mutation (da Silva et al., 2020). On the other hand, for the same reasons, mitochondrial DNA has also been considered the worst choice for phylogenetic analysis (Galtier et al., 2009). Nevertheless, several studies have shown the importance of the mitochondrial genome for phylogenetic analysis in vertebrates and invertebrates (da Silva et al., 2020, Avise, 2009). Most phylogenetic studies of mitochondria rely on alignment-based methods. Our interest is to use an alignment-free method to analyse the mitochondrial genomes of various species and to construct a phylogenetic tree using the novel approach of combining chaos game representation and two-dimensional multifractal detrended cross-correlation analysis.
Methods
We combined the Chaos game representation (CGR) algorithm and the two-dimensional multifractal detrended cross-correlation analysis (2D MF-X-DFA) method (Pal et al., 2015) to analyse mitochondrial genomes for class affiliation and classification. The details of the methodology steps are described here.
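As a rough illustration of how a cross-correlation step can operate on two equally sized CGR matrices, the sketch below follows the general two-dimensional detrended-fluctuation recipe: partition both matrices into s × s blocks, take the cumulative sum within each block, remove a least-squares plane, average the product of the two residual surfaces, and combine the block values into a q-th order fluctuation function. The function names, the first-order (planar) detrending, and the use of the absolute value of the local covariance are assumptions made for this sketch; the exact procedure used here is the one given in Pal et al. (2015).

```python
import numpy as np


def _detrended_residual(block, design):
    """Cumulative-sum surface of one block minus its least-squares plane."""
    surface = np.cumsum(np.cumsum(block, axis=0), axis=1).ravel()
    coeffs, *_ = np.linalg.lstsq(design, surface, rcond=None)
    return surface - design @ coeffs


def fluctuation_2d(x, y, scale, q):
    """q-th order cross-fluctuation F_q(s) of two equally sized matrices."""
    if x.shape != y.shape:
        raise ValueError("both matrices must have the same shape")
    rows, cols = x.shape
    # Design matrix for the plane a*i + b*j + c over one scale x scale block.
    i, j = np.meshgrid(np.arange(scale), np.arange(scale), indexing="ij")
    design = np.column_stack([i.ravel(), j.ravel(), np.ones(scale * scale)])
    local = []
    for r in range(0, rows - scale + 1, scale):
        for c in range(0, cols - scale + 1, scale):
            ex = _detrended_residual(x[r:r + scale, c:c + scale], design)
            ey = _detrended_residual(y[r:r + scale, c:c + scale], design)
            # Assumption: absolute value of the local detrended covariance.
            local.append(abs(np.mean(ex * ey)))
    local = np.asarray(local)
    local = local[local > 0]  # drop exact zeros so logs and negative q are defined
    if q == 0:
        return float(np.exp(0.5 * np.mean(np.log(local))))
    return float(np.mean(local ** (q / 2.0)) ** (1.0 / q))


def generalized_hurst(x, y, scales, q):
    """Slope of log F_q(s) against log s over the given scales."""
    logs = [np.log(fluctuation_2d(x, y, s, q)) for s in scales]
    slope, _ = np.polyfit(np.log(scales), logs, 1)
    return float(slope)
```

Under these assumptions, `generalized_hurst(x, y, [4, 8, 16], q)` estimates the scaling exponent for one value of q; an exponent that varies with q is the usual indication of multifractal cross-correlation.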
Results and discussion
We obtained the mitochondrial genomes of various species from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov/). The details of the species are given in Table 1. Note that we removed the ‘N’ at position 3107 in the human mitochondrial genome (NC_012920.1) before performing CGR analysis.
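The preprocessing step mentioned above can be sketched generically as follows. The helper name `strip_ambiguous` and the toy sequence are hypothetical and serve only to show the idea of dropping symbols outside A, C, G, and T while recording where they occurred; they are not the authors' code or real genome data.

```python
def strip_ambiguous(sequence, allowed="ACGT"):
    """Return the cleaned sequence and the 1-based positions that were removed."""
    kept, removed = [], []
    for position, base in enumerate(sequence.upper(), start=1):
        if base in allowed:
            kept.append(base)
        else:
            removed.append(position)
    return "".join(kept), removed


# Toy input, not real genome data.
clean, dropped = strip_ambiguous("ACGTNACGT")
print(clean, dropped)  # prints: ACGTACGT [5]
```

Reporting the removed positions makes it easy to confirm that only the expected symbol (for example, a single ‘N’) was dropped before the CGR step.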
We applied CGR to all the mitochondrial genome sequences to obtain 2D images; a couple of representative examples are shown in Fig. 1. The frequency CGR matrix
Conclusions
In conclusion, by combining chaos game representation and 2D multifractal detrended cross-correlation analysis, we observed that the mitochondrial genomes of all the analysed species exhibit a multifractal nature. Our work corroborates that the mitochondrial genome can be used as a marker for phylogenetic study across species and to generate a phylogenetic tree. However, thorough research within a genus or family is imperative for a detailed understanding of evolution. We suggest that the
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
The author PM would like to thank the Department of Science and Technology, Government of India (DST-MATRICS GoI Project No. SERB/F/506/2019-2020, dated 15th May 2019), for their financial support.
References (47)
- et al. Chaos game representation of protein structures. J. Mol. Graph. (1994)
- et al. Multifractal detrended fluctuation analysis of non-stationary time series. Phys. A (2002)
- et al. Difference in nature of correlation between NASDAQ and BSE indices. Phys. A (2008)
- et al. Multiresolution analysis of fluctuations in non-stationary time series through discrete wavelets. Phys. A (2009)
- et al. Multifractal detrended cross-correlation analysis on air pollutants of University of Hyderabad Campus, India. Physica A (2018)
- et al. Multifractal detrended cross-correlation analysis of coding and non-coding DNA sequences through chaos-game representation. Phys. A (2015)
- et al. Multifractal detrended cross-correlation analysis of genome sequences using chaos-game representation. Phys. A (2016)
- et al. Fungal evolution: the case of the vanishing mitochondrion. Curr. Opin. Microbiol. (2005)
- et al. The Global Relationship between Chromatin Physical Topology, Fractal Structure, and Gene Expression. Sci. Rep. (2017)
- et al. Genome signature comparisons among prokaryote, plasmid, and mitochondrial DNA. Proc. Natl. Acad. Sci. USA (1999)
- Capturing whole-genome characteristics in short sequences using a naive Bayesian classifier. Genome Res.
- Chaos game representation of gene structure. Nucleic Acids Res.
- Chaos game representation for comparison of whole genomes. BMC Bioinf.
- Long-term storage capacity of reservoirs. Trans. Am. Soc. Civ. Eng.
- Wavelets and multifractal formalism for singular signals: Application to turbulence data. Phys. Rev. Lett.
- Mosaic organization of DNA nucleotides. Phys. Rev. E
- Determination of the Hurst exponent by use of wavelet transforms. Phys. Rev. E
- Medical image fusion by wavelet transform modulus maxima. Opt. Express
- Second-order moving average and scaling of stochastic time series. Eur. Phys. J. B
- Two-dimensional turbulence: A Physicist approach. Phys. Rep.
- Wavelet analysis and scaling properties of time series. Phys. Rev. E
- Scaling properties of image textures: A detrending fluctuation analysis approach. Phys. A
- Performance of a high-dimensional R/S method for Hurst exponent estimation. Physica A