Abstract
Genomics has emerged as one of the major sources of big data. The data-driven challenges this creates for bioinformatics can be met with parallel and distributed computing technologies. GATK4 tools for genomic variant detection have been enabled for high-performance computing platforms such as the SPARK MapReduce framework. We propose GATK4+WDL+CROMWELL+SPARK+DOCKER as the way forward in achieving automation, reproducibility, reusability, customization, portability and scalability. SPARK-based tools perform as well in genomic variant detection as the standard implementations of GATK4 tools run from a command-line interface. Implementing these workflows on cloud-based high-performance computing platforms will enhance usability and will advance community research and infrastructure development for genomic variant discovery.
Competing Interest Statement
The authors have declared no competing interest.