Oct 31, 15:30 - 16:30
Introducing nf-core/phaseimpute -r dev from idea to release
Genome imputation is a statistical technique that enhances the resolution of genotyping arrays and low-pass sequencing (<1x) by filling in missing data with information from reference panels. While existing pipelines focus primarily on the imputation step and on human data, crucial steps such as panel preparation, phasing, and imputation assessment are often overlooked.
To address this gap, we introduce nf-core/phaseimpute, a comprehensive pipeline that performs panel preparation, genetic simulation, imputation, and tool assessment. Each step is designed for independent execution, so users can save intermediate outputs and avoid recomputing them in subsequent analyses. In addition, we took advantage of Nextflow’s workflow distribution capabilities by splitting each dataset by chromosome or by chunk, so that tasks run in parallel and overall execution time is reduced. With support for several imputation tools (GLIMPSE1, GLIMPSE2, STITCH, and QUILT), the pipeline accommodates diverse research needs. Moreover, it can run with or without a reference panel, which makes it particularly valuable for non-model species where phased haplotypes may not be available.
The journey from the initial idea to the first release of the nf-core/phaseimpute pipeline in the nf-core community has been an extensive one. Starting with the aim of creating an efficient, reproducible solution for genomic phasing and imputation, we developed the essential imputation and phasing processes within the nf-core modules repository before integrating them into subworkflows. Reusing existing nf-core modules significantly accelerated development. During implementation, we observed that some nf-core modules required design modifications when first tested within the pipeline context. These adjustments were necessary to accommodate all required parameters and to ensure the modules met users’ specific requirements. Consequently, we contributed back to the community by adding functionality that was not previously available. Additionally, we benefited from advancements in the Nextflow ecosystem, such as the nf-validation plugin and the nf-test testing framework. The nf-validation plugin enabled us to enforce schema validation, ensuring that our pipeline configurations met predefined standards and reducing the likelihood of errors. Continuous integration testing with nf-test was used to verify that each update maintained the pipeline's accuracy and stability, so that the final output files stay the same regardless of which developer changes the code. Collaborative efforts within the bioinformatics community facilitated the integration of optimal tools and rigorous testing, resulting in a reliable, high-performance pipeline now accessible to the nf-core community for advanced genomic research.
Speaker
Co-authors
Anabella Trigila