Oct 31, 15:30 - 16:30

SAMURAI: A pipeline for DNA copy number analysis of shallow whole-genome sequencing experiments

Copy number alterations (CNAs), that is, alterations in the number of copies of DNA in a cell, occur in tumors (and in some genetic disorders) as a result of chromosomal instability, and are present to varying degrees in many different types of cancer. These structural changes are not only an indication of the state of the tumor genome, but may also be exploited for diagnostic, prognostic, or treatment purposes. For example, the presence or absence of specific alterations may influence response to drugs, or indicate a patient's risk of relapse after surgical or pharmacological treatment. In addition, copy number alterations can be tracked not only in tissues, but also in other biological samples, such as plasma, or other specimens routinely used in the clinic. The standard approaches to detecting CNAs rely mostly on whole-exome sequencing (WES) or whole-genome sequencing (WGS). Although precise and sensitive, both WGS and WES are poorly suited to a clinical setting because of their sequencing costs and their storage and computing requirements. An alternative that has emerged in the past decade is shallow whole-genome sequencing (sWGS), also called ultra-low-pass whole-genome sequencing (ULP-WGS). sWGS involves sequencing the entire genome at very low depth of coverage (from 0.1X to 1X), and allows identification of CNAs with reasonable precision, with much lower computing requirements and at a fraction of the cost of WES and WGS. The bioinformatics community has developed a range of tools to handle these data, whether they come from sequencing of bulk tissue, other biological samples, or single cells, but dedicated, fully reproducible pipelines are still lacking despite the maturity of the approach.
To fill this gap, we have developed SAMURAI (Shallow Analysis of Copy nuMber alterations Using a Reproducible And Integrated bioinformatics pipeline), a pipeline for the analysis of sWGS data (from tissue samples or other biological specimens) that leverages both Nextflow and the tools developed by the nf-core community to ensure robustness and reproducibility of the results. Currently, SAMURAI is used for all sWGS analyses in the Cancer Pharmacology laboratory at Humanitas Research Hospital. Here I will describe the design principles of SAMURAI, how it was developed, and the advantages it has brought over custom scripts. This session was funded by AIRC 2019 IG, project code 23059.

Speaker

Co-authors

Sara Potente, Diego Boscarino, Dino Paladin, Chiara Romualdi, Sergio Marchini