Leveraging Nextflow for the Development of FAIR-Compliant Somatic Variant Calling Workflows

Background: The National Center for Tumor Diseases (NCT) Heidelberg and the German Cancer Research Center (DKFZ) have developed, maintained, and used somatic variant calling pipelines for high-throughput data analysis. These pipelines have played a significant role in large consortia, including the International Cancer Genome Consortium (ICGC). The workflows combine diverse software components for calling somatic single nucleotide variants (SNVs), short insertions and deletions (indels), structural variants (SVs), and allele-specific somatic copy number aberrations (sCNAs). Methods: As part of the federated European Genome-Phenome Archive (EGA), the German Human Genome-Phenome Archive (GHGA) aims to provide adequate resources for data sharing and to harmonize data processing. To that end, GHGA is committed to using FAIR-compliant bioinformatics workflows, which requires secure portability of data and workflows as well as flexible, scalable, and automated processes. Nextflow, a widely used workflow management system, has emerged as a solution for such reproducible workflows and aligns with the goals of GHGA. In collaboration with nf-core, a community effort that supports Nextflow, GHGA is working to standardize this development and avoid duplicated effort. Translating the DKFZ somatic variant calling workflows to Nextflow enables us to create a framework for standardizing workflow processes. Results: The initial releases of the standardized nf-platypusindelcalling, nf-snvcalling, and nf-aceseq workflows are available in the GHGA GitHub repository. Conclusion: GHGA collaborates with the nf-core community to provide FAIR-optimized versions of the DKFZ/NCT somatic variant calling pipelines.