May 23, 10:00 - 10:15

Parallelization of computer vision over large corpora of gigapixel biomedical images

Biomedical imaging presents a case where analyses not only become very slow as the number of images grows, but the large memory footprint of individual data points (gigapixel images) can also preclude more traditional acceleration options such as GPUs. NextFlow, however, substantially improves the scalability of analyses that can be split into many independent parts, such as the individual images of a large image set, with little additional effort. Using NextFlow, we have created reusable, portable analysis pipelines that adapt both to the available resources and to the computational needs of different datasets. In this talk, I will cover our use of NextFlow to analyze an 8.1 TB dataset of whole-slide images. I will discuss both the substantial gains from NextFlow-enabled parallelization and some of the pitfalls specific to our large-format images, including getting automatic job submission to work with our Slurm setup and some of the issues we encountered with our intermediate files.

Speaker

Co-authors

Susan Sheehan, John M Mahoney