Abstract
In evolutionary genome analysis, biologists seek to identify important genes or chromosome regions by comparing phylogenetic trees and analyzing which loci carry mutations that might affect phenotypic traits. Unfortunately, the tree comparison and the accompanying analysis are often performed manually. In this paper, we characterize the workflow of evolutionary genome analysis and present a task analysis of the fundamental questions biologists ask during the analysis procedure. We propose two algorithms to enable quantitative tree comparison: one measures the differences between corresponding leaf nodes on two trees, and the other computes the classification inconsistency of each leaf node by comparing the tree structure with a given biological classification. Building on the computed difference and inconsistency, we present VEGA (visual comparison of phylogenetic trees for evolutionary genome analysis), a visual analysis system that enables biologists not only to explore trees intuitively but also to identify loci that affect traits by comparing the SNP variants of selected leaf nodes. We conclude with case studies from two biologists who used our system to augment their previous manual analysis workflow, which demonstrate that our system can reveal more insight.
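The abstract does not spell out how the two measures are computed, so the following is only an illustrative sketch under our own assumptions, not the algorithms proposed in the paper. It assumes trees are given as nested tuples of leaf names, takes a leaf's "difference" to be the Jaccard distance between the leaf sets of its parent clade in the two trees, and takes a leaf's "classification inconsistency" to be the fraction of its parent-clade neighbours that carry a different class label. The function names `leaf_difference` and `classification_inconsistency` are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple, Union

# Assumed representation: a tree is a leaf name or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names under a (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def parent_clades(
    tree: Tree, out: Optional[Dict[str, FrozenSet[str]]] = None
) -> Dict[str, FrozenSet[str]]:
    """Map every leaf to the leaf set of its immediate parent node."""
    if out is None:
        out = {}
    if isinstance(tree, str):
        return out
    clade = leaves(tree)
    for child in tree:
        if isinstance(child, str):
            out[child] = clade
        else:
            parent_clades(child, out)
    return out


def leaf_difference(tree_a: Tree, tree_b: Tree) -> Dict[str, float]:
    """Assumed measure: Jaccard distance between a leaf's parent clades."""
    clades_a, clades_b = parent_clades(tree_a), parent_clades(tree_b)
    shared = clades_a.keys() & clades_b.keys()
    return {
        leaf: 1.0 - len(clades_a[leaf] & clades_b[leaf])
        / len(clades_a[leaf] | clades_b[leaf])
        for leaf in shared
    }


def classification_inconsistency(
    tree: Tree, labels: Dict[str, str]
) -> Dict[str, float]:
    """Assumed measure: share of parent-clade neighbours with another label."""
    result = {}
    for leaf, clade in parent_clades(tree).items():
        others = clade - {leaf}
        mismatched = sum(labels[o] != labels[leaf] for o in others)
        result[leaf] = mismatched / len(others) if others else 0.0
    return result


# Toy data, invented for illustration only.
tree_a = (("a", "b"), ("c", "d"))
tree_b = (("a", "c"), ("b", "d"))
labels = {"a": "wild", "b": "wild", "c": "cultivated", "d": "cultivated"}

diff = leaf_difference(tree_a, tree_b)
incons_a = classification_inconsistency(tree_a, labels)
incons_b = classification_inconsistency(tree_b, labels)
```

In this toy example every leaf changes its sibling between the two trees, so all differences are equal and nonzero, while the inconsistency is zero on `tree_a` (siblings share a label) and one on `tree_b` (siblings always differ). Per-leaf scores of this kind are what a visual system could map to color or size on the tree views.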
Acknowledgements
This work is supported by NSFC grants (61772315, 61602273), the Shenzhen Science and Technology Program (JSGG20170412170711532), and the Open Research Fund of the Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University (BKBD-2017KF02).
About this article
Cite this article
Ge, T., Lu, Y., Lu, K. et al. VEGA: visual comparison of phylogenetic trees for evolutionary genome analysis (ChinaVis 2019). J Vis 23, 523–537 (2020). https://doi.org/10.1007/s12650-020-00635-0
DOI: https://doi.org/10.1007/s12650-020-00635-0