GHGA: Standardizing and harmonizing NGS analysis workflows to create a unified omics data resource

As the volume of human omics data grows, data management must facilitate secondary use. Therefore, the German Human Genome-Phenome Archive (GHGA), a node of the Federated European Genome-Phenome Archive (FEGA), is developing a scalable and secure IT infrastructure for Germany, an ethico-legal framework to handle omics data in a data-protection-compliant but open and FAIR manner, a harmonized metadata schema, and standardized workflows to process all incoming omics data. We build on the nf-core and Nextflow communities to provide NGS analysis workflows, such as nf-core/SAREK, for all incoming data modalities. GHGA enables the creation of harmonized and standardized NGS resources across datasets and projects by maintaining and developing workflows, creating runtime configurations with stable identifiers for each data modality, and continuously evaluating workflow performance. In collaboration with the nf-core community and beyond, GHGA is maintaining and co-developing six workflows. To evaluate the performance of the workflows, GHGA, together with the German NGS Competence Network, developed a continuous benchmarking framework, NCBench. GHGA is creating a runtime configuration to process NGS data uniformly while guaranteeing the highest standards and quality of the workflows. By maintaining scalable, reproducible, and continuously benchmarked workflows, GHGA will create a harmonized and standardized NGS data resource ready to be used by the German research community. Such a resource will enable cross-project analyses and population-scale studies, promote new collaborations and research projects, and establish the foundation for developing a German variant frequency database.

Grants: German Research Foundation (DFG) 441914366 (NFDI 1/1)