Automated bioinformatics infrastructure for large-scale SARS-CoV-2 genomic surveillance at QIB
Between 2020 and 2022, QIB sequenced and analysed more than 87,000 SARS-CoV-2 genomes collected in the UK and from more than a dozen international collaborators. To analyse large numbers of high-priority sequences rapidly, we implemented an automated bioinformatics infrastructure using open-source, Nextflow-based bioinformatics pipelines on an on-premises OpenStack platform of the CLIMB project. The result is a robust, modularised, end-to-end workflow. It monitors the sequencing instruments and starts automatically when they finish a run, scheduling the compute jobs on a highly portable SLURM cluster. The main processes are demultiplexing the raw sequence data, generating viral consensus genomes, and identifying the viral lineages of the sequences. The resulting files and reports are then loaded into a central database, and the working group is notified on an instant messaging platform. The pipelines ran robustly and at scale whenever sequence data were produced.
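The run-detection and scheduling step described above can be illustrated with a minimal Python sketch. It is not the QIB implementation: the marker file names, the pipeline name `example/sarscov2-pipeline`, and the `--run_dir` pipeline parameter are all hypothetical placeholders chosen for illustration. The sketch assumes only that a finished run can be recognised by a completion file in its directory, and that analyses are submitted to SLURM with `sbatch`.

```python
import shlex
from pathlib import Path
from typing import List

# Hypothetical names: the abstract does not specify any of these.
COMPLETION_MARKER = "RunComplete.txt"     # assumed to appear when sequencing finishes
SUBMITTED_MARKER = ".analysis_submitted"  # written here so each run is submitted once


def find_finished_runs(runs_root: Path) -> List[Path]:
    """Return run directories that are complete but not yet submitted."""
    finished = []
    for run_dir in sorted(p for p in runs_root.iterdir() if p.is_dir()):
        is_complete = (run_dir / COMPLETION_MARKER).exists()
        is_submitted = (run_dir / SUBMITTED_MARKER).exists()
        if is_complete and not is_submitted:
            finished.append(run_dir)
    return finished


def build_sbatch_command(run_dir: Path,
                         pipeline: str = "example/sarscov2-pipeline") -> List[str]:
    """Build (but do not execute) an sbatch command that launches Nextflow.

    The pipeline name and the --run_dir parameter are placeholders.
    """
    nextflow_cmd = "nextflow run {} --run_dir {}".format(
        shlex.quote(pipeline), shlex.quote(str(run_dir))
    )
    return ["sbatch", "--job-name=cov-" + run_dir.name, "--wrap=" + nextflow_cmd]


def mark_submitted(run_dir: Path) -> None:
    """Record that a run has been handed to the scheduler."""
    (run_dir / SUBMITTED_MARKER).touch()
```

In use, a scheduled job or long-running loop would call `find_finished_runs` on the instrument output directory, pass each command from `build_sbatch_command` to `subprocess.run`, and then call `mark_submitted`. Returning the command as a list rather than executing it keeps the sketch testable without a SLURM installation.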