Standardizing and harmonizing NGS analysis workflows in the German Human Genome-Phenome Archive
Kubra Narci, Florian Heyl, Christian Mertes, Paul Menges, Luiz Gadelha, Vangelis Theodorakis, Daniel Huebschmann, and Ivo Buchhalter
With the increasing volume of human omics data, there is an urgent need for adequate resources for data sharing, as well as for standardized and harmonized data processing. Within the federated European Genome-Phenome Archive (EGA), the German Human Genome-Phenome Archive (GHGA) strives to provide (i) the necessary secure IT infrastructure for Germany, (ii) an ethico-legal framework to handle omics data in a data-protection-compliant yet open and FAIR manner, (iii) harmonized metadata schemas, and (iv) standardized workflows to process incoming omics data uniformly.
GHGA aims to be more than an archive. It will build on cloud computing infrastructures managed within a network of data generators. Researchers will have controlled access to raw and processed sequence data produced by recognized, GA4GH-compliant NGS workflows. To this end, GHGA is working with the research community, especially nf-core, to co-develop and standardize bioinformatics workflows for data analysis, benchmarking, statistical analysis, and visualization. In such an international environment, it is essential to follow the principles of continuous integration and continuous deployment (CI/CD), testing and benchmarking the workflows with synthetic and experimental datasets such as CHM cell lines and Genome in a Bottle (GiaB), in order to ensure the highest quality of the developed workflows on both the technical and the biological side. Finally, by delivering on the aforementioned goals, namely a secure IT infrastructure, an ethico-legal framework, harmonized metadata schemas, and standardized and reproducible workflows, GHGA will enable cross-project analyses and promote new collaborations and research projects.
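The benchmarking step described above can be sketched in miniature: a CI job compares a workflow's variant calls against a truth set such as GiaB and fails the build if quality metrics regress. The Python sketch below is purely illustrative; the variant representation, function name, and threshold are hypothetical simplifications (production pipelines use dedicated tools such as hap.py over full VCFs), not GHGA code.

```python
# Illustrative sketch: scoring a caller's variant calls against a
# truth set (e.g. GiaB) as part of a CI/CD benchmark. Variants are
# simplified to (chrom, pos, ref, alt) tuples with exact matching.

def benchmark_calls(called, truth):
    """Return (precision, recall, F1) for a set of variant calls."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true positives: called and in truth set
    fp = len(called - truth)   # false positives: called but not in truth
    fn = len(truth - called)   # false negatives: missed truth variants
    precision = tp / (tp + fp) if called else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    truth = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 50, "G", "A")]
    calls = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 75, "T", "C")]
    p, r, f1 = benchmark_calls(calls, truth)
    # A CI job could fail the build when a metric drops below a
    # (hypothetical) threshold, guarding workflow quality over time:
    assert r >= 0.5, "recall regression"
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In a real setup, such a check would run automatically on every workflow change, so regressions in variant-calling quality are caught before a release.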