Oct 31, 17:00 - 17:15

STIMULUS: A Nextflow-based pipeline for training deep learning models

Deep learning model development in the natural sciences is an empirical and costly process. Users must define a pre-processing pipeline, choose an architecture, find the best parameters for that architecture, and iterate over this process. Leveraging the strengths of Nextflow (polyglotism, container integration, cloud scalability), we propose STIMULUS, an open-source software built to automate deep learning model development for genomics. STIMULUS takes as input a user-defined PyTorch model, a dataset, a configuration file describing the pre-processing steps to be performed, and a range of parameters for the PyTorch model. It then transforms the data according to all possible pre-processing steps, finds the best architecture parameters for each of the transformed datasets, performs sanity checks on the models, and trains a minimal deep learning model for each dataset/architecture combination. These experiments are then compiled into an intuitive report, making it easier for scientists to pick the best design to send to large-scale training. STIMULUS is available at: https://github.com/mathysgrapotte/stimulus
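
The experiment grid described above (every pre-processing variant paired with a search over model parameters) can be illustrated with a small sketch. This is not STIMULUS code and does not use its configuration format: the option names, the parameter ranges, and the helpers `expand`, `plan_experiments`, and `best_per_variant` are all hypothetical, chosen only to show how the number of experiments grows and how a best setting might be kept per transformed dataset.

```python
from itertools import product

# Hypothetical pre-processing options (not taken from STIMULUS's config schema).
PREPROCESSING = {
    "noise": ["none", "gaussian"],
    "split": ["random", "by_chromosome"],
}

# Hypothetical parameter ranges for a user-defined model.
HYPERPARAMETERS = {
    "learning_rate": [1e-3, 1e-4],
    "hidden_units": [32, 64],
}


def expand(options):
    """Return every combination of the given options as a list of dicts."""
    keys = list(options)
    return [dict(zip(keys, values)) for values in product(*(options[k] for k in keys))]


def plan_experiments(preprocessing, hyperparameters):
    """Pair each pre-processing variant with every parameter setting."""
    return [
        {"preprocessing": p, "hyperparameters": h}
        for p in expand(preprocessing)
        for h in expand(hyperparameters)
    ]


def best_per_variant(experiments, score):
    """Keep the highest-scoring parameter setting for each pre-processing variant."""
    best = {}
    for exp in experiments:
        key = tuple(sorted(exp["preprocessing"].items()))
        s = score(exp)
        if key not in best or s > best[key][0]:
            best[key] = (s, exp["hyperparameters"])
    return best


experiments = plan_experiments(PREPROCESSING, HYPERPARAMETERS)
print(len(experiments))  # 4 variants x 4 settings = 16
```

In a real run, the `score` callable passed to `best_per_variant` would be a validation metric from a short training job; here it is left as a parameter so the sketch stays independent of any training framework.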

Speaker

Co-authors

Alessio Vignoli, Suzanne Jin, Luisa Santus, Jose Espinosa-Carrasco, Ionas Erb, Cedric Notredame