Oct 31, 09:30 - 09:45

CLIMB-TRE: Nextflow-powered analysis of public health big data in a trusted research environment

The SARS-CoV-2 pandemic highlighted how crucial platforms for sharing and analysing public health data are. At the University of Birmingham we developed a novel infrastructure, CLIMB-COVID, as a trusted research environment in which academics studying and/or sequencing SARS-CoV-2, public health professionals involved in the national pandemic response, and genomic scientists involved in sequencing efforts could collaborate on a national-scale dataset covering >3.5M genome sequences over the period 2020-2024. The accessible submission method and the availability of the dataset provided a much-needed service to the UK. Building on these efforts, we have developed an umbrella pathogen genome surveillance infrastructure called CLIMB-TRE, which is broadly applicable to numerous infectious pathogens as well as to metagenomics datasets. CLIMB-TRE uses Nextflow pipelines to maintain a continually integrating dataset from sequencing data ingested from multiple sources (diagnostic labs, public health agencies and public data). It provides quality control and primary analysis, generating outputs that can be used for downstream public health surveillance activities and academic research. In this session I will cover some of the technical aspects of the system, with a specific focus on how we use Nextflow workflows to process submitted data and enable CLIMB-TRE users to conduct their own analyses on our cloud computing infrastructure.

Speaker

Co-authors

Thomas Brier, Radoslaw Poplawski, Nicholas Loman, Andrew Smith, Rachel Colquhoun