Oct 28, 16:45 - 17:20

Computational rescue strategy for an orphan codebase

The amount of newly developed research software in the biomedical field has increased in recent years, but reuse of this software remains a significant challenge. This problem was recognised and addressed by the introduction of the FAIR principles (Findable, Accessible, Interoperable, Reusable) for research software (Barker et al., Scientific Data 2022; Patel et al., Scientific Data 2023), which highlight the need for more sustainable software development. The goal is to enable broader use of existing tools, to prevent the redundancy that arises when each research group develops its own tools, and to avoid dependence on proprietary software solutions. In this work, we present a computational strategy to rescue an “orphan” codebase, a repository with no sign of developer activity, and to improve the FAIRness of published software. We focus our efforts on the field of biological imaging, where datasets can be in the terabyte range and acquisition times keep getting shorter. Data analysis in this field poses particular challenges: there are no clear standards or best practices, analyses in GUIs such as FIJI involve many manual steps, scientists often rely on proprietary software such as MATLAB or Imaris, and FAIR open-source end-to-end processing tools are lacking (Istrate et al., 2022). Specifically, we demonstrate the rescue of the “orphan” pipeline NuMorph (Krupa et al., Cell Reports 2021) for processing and analysis of tissue-cleared whole mouse brain images. This extensive pipeline is used for processing light-sheet microscopy datasets in the terabyte range, but since its publication we could observe neither maintenance nor usage outside the authors' lab. We reimplemented the existing pipeline in Nextflow with the final goal of integrating it into the nf-core community, which ultimately improves its adherence to each of the FAIR principles.
With this, we want to contribute to a more sustainable research software development process and to introduce one of the first best-practice pipelines for image processing and analysis into the nf-core community, thereby moving towards standardized and reproducible image analysis.