Nov 01, 10:15 - 10:30

Seamless deployment and benchmark of multiple sequence aligners with nf-core/multiplesequencealign

The massive generation of biological data has compelled bioinformatics tools to rely increasingly on approximations to keep the processing and analysis of large datasets computationally feasible. By definition, these approximations cannot be expected to deliver exact solutions. As a natural consequence, a variety of alternative tools now address similar challenges, each performing differently across datasets, and none is universally recognized as the best solution. This trend is expected to continue for the foreseeable future, and it presents new challenges for both users and developers. Users face the complex task of navigating a diverse array of tools to select the most suitable option, considering factors such as accuracy, speed, and data compatibility. Developers, meanwhile, must ensure that their tools meet diverse user needs, remain robust and usable across computational environments, and can be compared fairly with the state of the art. A prime example of a problem where heuristic solutions have become essential is multiple sequence alignment (MSA), a widely used yet computationally intensive modelling tool. We introduce nf-core/multiplesequencealign, a pipeline designed to facilitate seamless MSA computation while providing rigorous performance evaluation and benchmark reporting. We also highlight the aspects of Nextflow and nf-core that proved beneficial in the design and implementation process, along with the most challenging components we encountered. Overall, we anticipate that nf-core/multiplesequencealign will serve as a model for future benchmarking efforts and become a central resource for advancing MSA methodologies.

Speaker