Oct 31, 14:00 - 15:30

Umi-pipeline-nf: Accurate Consensus Sequence Creation for UMI-Tagged Nanopore Data

Long-range nanopore sequencing allows single-molecule sequencing but still faces challenges due to its high error rate. Umi-pipeline-nf addresses this by using unique molecular identifiers (UMIs) to create highly accurate single-molecule consensus sequences, reducing error rates by over 100-fold. Umi-pipeline-nf is a Nextflow-based pipeline designed to process UMI-tagged nanopore sequencing data. The pipeline applies quality control to the input FASTQ files and filters for full-length reads by aligning them against a reference sequence. The terminal UMIs are then extracted and used to cluster the reads, and a highly accurate, polished consensus sequence is produced for each UMI cluster (i.e., for every tagged input molecule). The pipeline also offers optional low-frequency variant calling within the obtained consensus sequences. Key features include real-time monitoring of the number of clusters per sample during sequencing, optional GPU acceleration for cluster polishing, and detailed quality control of the UMI clusters to remove potentially admixed clusters. Additionally, Umi-pipeline-nf supports multiple variant callers to provide flexibility in data analysis, and Docker/Singularity containers to ensure reproducibility and portability to HPC clusters. Umi-pipeline-nf is particularly useful in applications requiring virtually error-free sequencing at clonal or single-molecule resolution, such as sequencing repetitive genome regions, studying intra-host viral evolution, investigating cancer clonal evolution, or determining detailed metagenomic profiles. In a recent preprint, we showcase its ability to generate highly accurate, full-length haplotypes at single-repeat resolution for a long, complex, and repetitive human repeat element, the LPA KIV-2 VNTR (https://doi.org/10.1101/2024.03.01.582741).
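
The principle behind UMI-based consensus calling can be illustrated with a small toy example. The sketch below is not the pipeline's implementation: it assumes a hypothetical fixed UMI length, exact UMI matching, and reads of equal length, and it replaces alignment-based clustering and polishing with a simple per-position majority vote. All function names and sequences are made up for illustration.

```python
from collections import Counter, defaultdict

UMI_LEN = 4  # hypothetical UMI length, chosen only for this toy example


def extract_umi(read: str, umi_len: int = UMI_LEN) -> str:
    """Concatenate the first and last `umi_len` bases as the read's UMI pair."""
    return read[:umi_len] + read[-umi_len:]


def cluster_by_umi(reads):
    """Group reads whose terminal UMIs match exactly (a simplifying assumption)."""
    clusters = defaultdict(list)
    for read in reads:
        clusters[extract_umi(read)].append(read)
    return dict(clusters)


def majority_consensus(cluster):
    """Per-position majority vote; assumes all reads in the cluster have equal length."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


# Two simulated molecules, three reads each, one read per molecule with an error.
reads = [
    "AAAAGATTACACCCC",
    "AAAAGATTACACCCC",
    "AAAAGATGACACCCC",  # simulated sequencing error (T -> G)
    "GGGGCATCATCTTTT",
    "GGGGCATCATCTTTT",
    "GGGGCAACATCTTTT",  # simulated sequencing error (T -> A)
]

consensus = {umi: majority_consensus(c) for umi, c in cluster_by_umi(reads).items()}
for umi, seq in consensus.items():
    print(umi, seq)
```

Because each cluster contains several reads of the same tagged molecule, an error present in only one read is outvoted at that position. This is the intuition for why consensus accuracy improves with cluster size; real nanopore data additionally contain insertions and deletions, which is why reads are aligned and polished rather than compared column by column.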