Kathryn Gorski

Nov 30, 2023, 2:45 PM CET


An event-driven solution to integrate pipeline executions within complex informatics infrastructures

Ecosystem

Kathryn Gorski, Manuele Simi, Princesca Delpé, Evan Fernandez, James Solomon, Jeffrey M. Tang, Pantelis Zisimopoulus, Alexandros Sigaras, and Andrea Sboner


Computational pipelines are often part of an ecosystem of tools that generate, process, store, and record data in several rounds. Typically, data generated by one system are processed by another, and the output is consumed by yet another. Interactions are frequently managed manually by users. To increase automation, mitigate the possibility of errors, and reduce idle times, we sought a flexible, scalable, and platform-independent solution to react to changes in these systems.

We present DispatcherSuite, an event-driven solution based on the Kafka messaging protocol with simple integration into Nextflow pipelines. Events are changes of state worthy of notification, each wrapped in a message whose payload holds the relevant information. The key component is Kafka-Dispatcher (KD), a microservice designed to work with multiple systems, including the analytical environment. Each KD acts as both a producer and a consumer of messages: it can send messages and subscribe to brokers to be notified on selected topics. A lightweight Groovy library (Zero-Mess) has been developed to facilitate the publication of messages during pipeline execution. Messages can be sent from the main workflow, subworkflows, or individual processes.
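The producer/consumer role described above can be illustrated with a minimal in-memory sketch. It is not the DispatcherSuite or Zero-Mess API and does not use Kafka: the `Event`, `InMemoryBroker`, and `Dispatcher` classes, the topic name, and the payload fields are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Event:
    """A state change worth notifying, wrapped with a payload (hypothetical schema)."""
    topic: str
    payload: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize the message as it might travel over a messaging system.
        return json.dumps(asdict(self), sort_keys=True)


class InMemoryBroker:
    """Stand-in for a message broker: routes each event to its topic's subscribers."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, handler):
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, event):
        handlers = list(self._subscribers.get(event.topic, []))
        for handler in handlers:
            handler(event)
        return len(handlers)  # number of subscribers notified


class Dispatcher:
    """Acts as both producer and consumer, mirroring the KD role in the abstract."""

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.received = []

    def listen(self, topic):
        # Consumer side: be notified of events on the selected topic.
        self.broker.subscribe(topic, self.received.append)

    def send(self, topic, **payload):
        # Producer side: wrap the state change in a message and publish it.
        return self.broker.publish(Event(topic, payload))


broker = InMemoryBroker()
lims = Dispatcher("lims", broker)
pipeline = Dispatcher("pipeline", broker)

lims.listen("pipeline.completed")
delivered = pipeline.send("pipeline.completed", run_id="run-001", status="OK")
```

Here the `pipeline` dispatcher only produces and the `lims` dispatcher only consumes, but either could do both; in a real deployment the broker would be a Kafka cluster rather than a dictionary of callbacks.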

KDs can also execute actions defined in configuration files, such as emailing users the pipeline status or updating a Laboratory Information Management System (LIMS) as the pipeline progresses. Conversely, pipeline executions can be triggered by changes in other systems: for example, when new input files are recorded in the LIMS, a message carrying the pipeline’s input can notify Nextflow to start. The same mechanism allows pipelines to be chained into complex sequences. This event-driven framework enables automation while providing robustness, flexibility, and scalability. KDs can be extended to create new ones with specialized scopes and actions. With future extensions of the DispatcherSuite framework, we aim to achieve full automation when processing data.
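A rough sketch of configuration-driven actions and event chaining follows. The configuration format, topic names, action names, and file name are hypothetical and do not reflect the actual KD configuration files; the pipeline launch is simulated rather than invoking Nextflow.

```python
from collections import defaultdict

# Hypothetical configuration: each topic maps to the actions it should trigger.
CONFIG = {
    "lims.sample_registered": ["start_pipeline"],
    "pipeline.completed": ["update_lims", "email_user"],
}

log = []                         # records what each action did, in order
subscribers = defaultdict(list)  # topic -> list of action callables


def publish(topic, payload):
    """Deliver the payload to every action configured for this topic."""
    for action in list(subscribers[topic]):
        action(payload)


def start_pipeline(payload):
    log.append(f"start pipeline on {payload['input']}")
    # A real deployment would launch the pipeline here; this sketch only
    # emits the completion event so the chained actions can be observed.
    publish("pipeline.completed", {"sample": payload["sample"], "status": "OK"})


def update_lims(payload):
    log.append(f"LIMS: {payload['sample']} -> {payload['status']}")


def email_user(payload):
    log.append(f"email: {payload['sample']} finished with status {payload['status']}")


ACTIONS = {
    "start_pipeline": start_pipeline,
    "update_lims": update_lims,
    "email_user": email_user,
}


def configure(config):
    """Wire actions to topics according to the configuration."""
    for topic, action_names in config.items():
        for name in action_names:
            subscribers[topic].append(ACTIONS[name])


configure(CONFIG)
publish("lims.sample_registered", {"sample": "S1", "input": "S1.fastq.gz"})
```

A single event from the LIMS side starts the (simulated) pipeline, whose completion event in turn updates the LIMS and notifies the user. Changing which actions run on which topic only requires editing `CONFIG`, which is the property the configuration-file approach is meant to provide.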


Kathryn Gorski

Bioinformatics Analyst at Weill Cornell Medicine
