Oct 31, 15:30 - 16:30

Automating reproducible workflows with Nextflow for strain resolution from long-read metagenomes

The ability to distinguish strains in clinical samples is of enduring medical significance. Recent advances in long-read sequencing technology promise substantial improvements in the precision of strain-level identification from metagenomic data. In this study, we used Nextflow pipelines to evaluate the efficiency of three current bioinformatics tools (Trac'm, Strainberry, and Strainy) in resolving bacterial strains from mock microbial communities and authentic metagenomes derived from long-read sequencing. Mock and actual microbial communities were prepared for long-read sequencing on the GridION platform. Human-DNA-free raw reads were processed with a custom Nextflow pipeline for each strain resolution tool, run on Ubuntu Linux (v20.04) with x86_64 architecture. Trac'm aligned these reads to a custom reference database, while Strainberry and Strainy mapped reads to metagenome assemblies for strain resolution. We assessed the task execution time, physical memory usage, and single-core CPU utilization of each tool using the pipeline information generated by each Nextflow workflow. Trac'm exhibited the highest strain completeness in both mock and authentic metagenomes, while Strainy demonstrated the highest strain accuracy. Despite its higher single-core CPU usage, Trac'm provided faster strain resolution and better computational efficiency than Strainberry and Strainy. The longer execution time and higher memory usage of Strainberry and Strainy can be attributed to their reliance on metagenomic assembly prior to strain resolution. The automated and reproducible workflows facilitated by Nextflow enable seamless integration of diverse bioinformatics tools, enhancing the scalability and reliability of strain resolution processes.
Of the three tools tested, Trac'm holds the greatest potential for applications such as real-time pathogen tracking, outbreak identification, transmission monitoring, intervention evaluation, and other public health initiatives. Moving forward, we aim to integrate Trac'm into our current workflows to expedite pathogen tracking at the strain level, thereby guiding actionable decisions for patient care.
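The per-tool comparison of execution time, memory, and CPU usage described in the abstract can be illustrated with a minimal sketch that summarizes a Nextflow trace report. This is not the authors' analysis code. It assumes a tab-separated trace file in Nextflow's default human-readable format with the `name`, `status`, `realtime`, `%cpu`, and `peak_rss` fields enabled, and it assumes that task names look like `PROCESS (sample)`. The function names are hypothetical.

```python
import csv
import io
import re
from collections import defaultdict

# Unit tables for the human-readable values assumed in the trace file.
_TIME_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}
_MEM_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_duration(text):
    """Convert a string such as '1h 2m 3.5s' or '450ms' to seconds."""
    total = 0.0
    # 'ms' is listed first so that '450ms' is not read as minutes.
    for value, unit in re.findall(r"([\d.]+)\s*(ms|s|m|h|d)", text):
        total += float(value) * _TIME_UNITS[unit]
    return total


def parse_memory(text):
    """Convert a string such as '1.5 GB' to bytes; unparseable values give 0."""
    match = re.fullmatch(r"([\d.]+)\s*(B|KB|MB|GB|TB)", text.strip())
    if not match:
        return 0
    return int(float(match.group(1)) * _MEM_UNITS[match.group(2)])


def parse_percent(text):
    """Convert a string such as '95.3%' to a float; unparseable values give 0.0."""
    try:
        return float(text.strip().rstrip("%"))
    except ValueError:
        return 0.0


def summarize_trace(trace_text):
    """Aggregate completed tasks per process name.

    Returns a dict mapping process name to task count, summed realtime in
    seconds, maximum peak RSS in bytes, and mean %cpu across tasks.
    """
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    acc = defaultdict(lambda: {"tasks": 0, "realtime_s": 0.0,
                               "peak_rss_bytes": 0, "cpu_sum": 0.0})
    for row in reader:
        if row["status"] != "COMPLETED":
            continue  # skip failed or cached tasks
        # Assumption: task names are 'PROCESS (sample)'; keep the process part.
        process = row["name"].split(" (")[0]
        entry = acc[process]
        entry["tasks"] += 1
        entry["realtime_s"] += parse_duration(row["realtime"])
        entry["peak_rss_bytes"] = max(entry["peak_rss_bytes"],
                                      parse_memory(row["peak_rss"]))
        entry["cpu_sum"] += parse_percent(row["%cpu"])
    return {
        name: {
            "tasks": e["tasks"],
            "realtime_s": e["realtime_s"],
            "peak_rss_bytes": e["peak_rss_bytes"],
            "mean_cpu_pct": e["cpu_sum"] / e["tasks"],
        }
        for name, e in acc.items()
    }
```

Calling `summarize_trace(open("trace.txt").read())` on the trace file from each tool's workflow would give one summary per process, which could then be compared across tools. The file name `trace.txt` is a placeholder.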

Speaker

Co-authors

Ayorinde O. Afolayan, Stefany Ayala Montaño, Ifeoluwa Akintayo, Leonardo Duarte dos Santos, Sandra Reuter