Oct 31, 11:30 - 13:00

Testing Different Approaches in Analysis of Shotgun Metagenomics Data from Epilepsy Rat Model

Introduction
Shotgun metagenomic sequencing is an informative but complex method that poses several challenges. No gold standard has been established for the bioinformatic analysis of shotgun metagenomics data, and existing methods and pipelines have varying advantages and disadvantages. Researchers choose an analysis methodology by weighing aspects such as sensitivity, ease of use of the tools, and the computational power and time needed to perform the whole analysis. In this study, shotgun metagenomic sequencing data, obtained from rat faecal samples to investigate the gut-brain axis in kainic acid-induced epilepsy development, were used to test the taxonomic classification efficiency of different analysis algorithms and methods.

Methods
Three rat subgroups were defined: Sham controls, and kainic acid-administered rats with and without seizure development (Epi and No-Epi, respectively). DNA extracted from 66 rat faecal samples was sequenced on the Illumina NovaSeq 6000 platform with 2x100 bp reads by CeGaT GmbH (Tübingen, Germany). Two main approaches were employed: assembly-free analysis, and de novo assembly of the short reads to obtain metagenome-assembled genomes (MAGs). For the assembly-free methods, pre-processed reads were analyzed using three different tools: Kraken2, METAgenomic PHyLogenetic Analysis 4 (MetaPhlAn4), and Kaiju. Summary tables were prepared either manually or automatically, using built-in functions of the tools where available. For the generation of MAGs, the pre-processing, hybrid-assembly and binning steps were carried out using the nf-core/mag analysis pipeline (ver. 2.2.1). For the assembly step, reads of the samples belonging to the same rat group were co-assembled. Independently of nf-core, a third metagenomic binning tool, MetaBinner, was used.
All the bins produced by the three binning tools were combined and processed by the bin refinement module of MetaWRAP using CheckM, with a minimum completeness of 70% and a maximum contamination of 10%. After the first set of MAGs was obtained, pre-processed reads were mapped to them, and the unmapped reads were used for a second iteration of MAG generation following the steps above, except that all the unmapped reads were pooled and co-assembled. The two sets of MAGs were combined and dereplicated with dRep using the default ANI threshold parameters. Taxonomic annotation of the final set of MAGs was carried out with the Genome Taxonomy Database Toolkit (GTDB-Tk, ver. 2.1.1) using reference database version R207_v2.

Results
When the annotations were aggregated into two categories based on classification rates, Kraken2 gave median rates of 17.46% and 82.53% for classified and unclassified reads, respectively. With MetaPhlAn4, the classified and unclassified portions of the reads were 56.76% and 43.24%, respectively. For Kaiju classification based on phylum-level annotations, the median rates were 49.4% for bacteria and 39.19% for unassigned reads. In the MAG generation strategy, which used a hybrid workflow composed of the nf-core/mag pipeline and additional custom tools and steps, the rat group-based co-assembly in the first round yielded 176, 179, and 175 MAGs from the Sham, No-Epi, and Epi group sample reads, respectively. In the second round, all sample reads that did not map to those three MAG groups were pooled and assembled, and 109 MAGs were produced. After dereplication of all the MAGs obtained, a final catalogue of 324 unique MAGs was obtained. When the pre-processed reads of the 66 samples were mapped to the MAG catalogue, a median mapping rate of 52.42% was obtained based on unique and concordant mapping settings.
Based on GTDB-Tk classification, 100% of the MAGs were annotated at the phylum, class, order, and family levels, while 97.5% and 62% of them were annotated at the genus and species levels, respectively.

Discussion
In this study, the taxonomic composition of the gut microbiota in three rat groups was examined by analysis of deep shotgun metagenomic sequencing data with different approaches, including assembly-free and de novo assembly methods. Relatively conservative parameters were set to reduce the risk of false-positive annotation, which further reduced the classification rates of Kraken2 and Kaiju; among the assembly-free tools, MetaPhlAn4 showed the best performance in terms of classification success. On the other hand, the metagenomic assembly approach outperformed the short-read-based methods. We obtained a catalogue of 324 MAGs representing 52.42% of the pre-processed reads, with much higher resolution at the different taxonomic levels.
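
The bin quality criterion stated in the Methods (completeness of at least 70%, contamination of at most 10%) can be illustrated with a minimal Python sketch. This is not the MetaWRAP or CheckM implementation; the `BinQuality` class, the function names, the example bin values, and the choice to treat both thresholds as inclusive are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Thresholds taken from the Methods; treating them as inclusive is an assumption.
MIN_COMPLETENESS = 70.0   # percent
MAX_CONTAMINATION = 10.0  # percent


@dataclass(frozen=True)
class BinQuality:
    """Hypothetical record holding quality estimates for one bin."""
    name: str
    completeness: float   # percent, e.g. as estimated by CheckM
    contamination: float  # percent, e.g. as estimated by CheckM


def passes_thresholds(bin_quality: BinQuality,
                      min_completeness: float = MIN_COMPLETENESS,
                      max_contamination: float = MAX_CONTAMINATION) -> bool:
    """Return True if the bin meets both quality criteria."""
    return (bin_quality.completeness >= min_completeness
            and bin_quality.contamination <= max_contamination)


def filter_bins(bins: list[BinQuality]) -> list[BinQuality]:
    """Keep only the bins that satisfy the completeness/contamination criteria."""
    return [b for b in bins if passes_thresholds(b)]


# Made-up example values, not data from the study.
example_bins = [
    BinQuality("bin_a", completeness=95.2, contamination=1.3),
    BinQuality("bin_b", completeness=68.0, contamination=2.0),   # too incomplete
    BinQuality("bin_c", completeness=88.4, contamination=12.5),  # too contaminated
    BinQuality("bin_d", completeness=70.0, contamination=10.0),  # on both boundaries
]

kept = filter_bins(example_bins)
print([b.name for b in kept])
```

In this sketch a bin must satisfy both conditions at once, so a highly complete but contaminated bin such as the hypothetical `bin_c` is discarded.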