Batch processing

JIPipe is a workflow language designed for batch processing. It therefore provides a variety of features that make batch processing convenient to implement while retaining maximum flexibility.

Generally, JIPipe relies on two core concepts for batch processing:

The inputs and outputs are tables, thus making the processing of multiple data the default case
Nodes process all of their inputs before the pipeline continues with the next step

These concepts ensure that existing single-data pipelines can be easily converted into batch processing workflows (zero-cost scalability), albeit at the expense of efficiency (in terms of both memory and runtime).
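The two concepts can be illustrated with a minimal Python sketch. It is not JIPipe code: the names `Row`, `run_node`, and `run_pipeline` are hypothetical, and a table is modeled simply as a list of rows that carry data plus text annotations.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Row:
    """One table row: a data item plus its text annotations (hypothetical model)."""
    data: Any
    annotations: dict[str, str] = field(default_factory=dict)


def run_node(workload: Callable[[Any], Any], table: list[Row]) -> list[Row]:
    """Apply the workload to every row and return the complete output table."""
    return [Row(workload(row.data), dict(row.annotations)) for row in table]


def run_pipeline(workloads: list[Callable[[Any], Any]], table: list[Row]) -> list[Row]:
    """Run the nodes one after another; each node finishes its whole table first."""
    for workload in workloads:
        table = run_node(workload, table)
    return table


# The same pipeline definition handles one row or many rows.
pipeline = [lambda x: x + 1, lambda x: x * 2]
single = [Row(3)]
batch = [Row(3, {"#Dataset": "a"}), Row(5, {"#Dataset": "b"})]

single_result = [row.data for row in run_pipeline(pipeline, single)]
batch_result = [row.data for row in run_pipeline(pipeline, batch)]
```

Because every intermediate table is built completely before the next node starts, the sketch also hints at where the memory and runtime cost mentioned above comes from.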

Iteration modes

Because nodes can process data in various ways (simple iteration, merging, etc.), most JIPipe nodes follow one of the basic iteration modes. The mode determines how JIPipe creates the iteration steps that are applied when the workflow runs.

An iteration step is a bundle of inputs that will be processed by the actual workload function. For example, a simple iteration step could consist only of one image that will be thresholded. A more complex iteration step might be required for training deep learning networks where raw data, labels, and other information need to be present.

The three basic ways in which JIPipe iterates through data are as follows:

Row-wise iteration through a single input (simple iteration)
Matching the data of multiple inputs to ensure that the underlying workload receives exactly one data item per slot
Matching the data of multiple inputs, but without any restrictions, so the workload can merge or work with multiple data items per slot

Some JIPipe functionality, such as external parameters or node-specific setups (e.g., deep learning models), introduces additional parametric iteration.

Simple iteration

This mode is a special case of Single data per slot iteration in which a node has exactly one (non-parametric) input. Because such a node receives a single table of data, it is trivial to iterate through the table rows and apply the underlying workload to each row.

An example of such a node is Auto threshold 2D.

Single data per slot iteration

The node requires that each iteration step has exactly one data item per slot (for example, one image needs to be assigned to one set of ROIs). To ensure this, JIPipe analyzes the text annotations of the inputs and uses them to group the data.

By default, JIPipe only considers text annotations that start with a "#" sign, thus making a clear distinction between "important" and "non-important" annotations. You can configure this behavior via the input manager interface.
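The grouping idea can be sketched in Python as follows. This is an illustration under stated assumptions, not JIPipe's actual algorithm: the function names are hypothetical, the "#" prefix is assumed to apply to the annotation name, and groups that do not contain exactly one item per slot are simply skipped here.

```python
from collections import defaultdict


def matching_key(annotations, prefix="#"):
    """Build a hashable key from the annotations whose name starts with the prefix."""
    return tuple(sorted((k, v) for k, v in annotations.items() if k.startswith(prefix)))


def single_data_steps(slots):
    """Create iteration steps with exactly one data item per slot.

    slots maps a slot name to a list of (data, annotations) rows.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for slot_name, rows in slots.items():
        for data, annotations in rows:
            groups[matching_key(annotations)][slot_name].append(data)

    steps = []
    for per_slot in groups.values():
        # Assumption of this sketch: incomplete or ambiguous groups are skipped.
        if all(len(per_slot[name]) == 1 for name in slots):
            steps.append({name: per_slot[name][0] for name in slots})
    return steps


images = [
    ("img_a", {"#Dataset": "a", "Note": "first"}),
    ("img_b", {"#Dataset": "b"}),
]
rois = [
    ("roi_b", {"#Dataset": "b"}),
    ("roi_a", {"#Dataset": "a", "Note": "other"}),
]

steps = single_data_steps({"Image": images, "ROI": rois})
```

The "Note" annotations differ between the image and the ROI of dataset "a", yet both still end up in the same step because only the "#"-prefixed annotation is used for matching.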

Multiple data per slot iteration (merging)

Similar to Single data per slot iteration, JIPipe uses text annotations to find groups within the data. However, because the node can handle multiple data items per slot, JIPipe does not apply any further postprocessing.

This type of iteration is usually encountered in nodes that merge data, for example Merge table rows.
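A minimal Python sketch of this mode, again with hypothetical names and the assumption that annotations whose name starts with "#" define the groups: each group becomes one iteration step, and every slot of that step holds a list of data items. Tables are modeled as lists of row dictionaries, and merging is plain concatenation.

```python
from collections import defaultdict


def merging_steps(slots, prefix="#"):
    """Group rows by their prefixed annotations; each slot keeps all matching items."""
    groups = defaultdict(lambda: {name: [] for name in slots})
    for slot_name, rows in slots.items():
        for data, annotations in rows:
            key = tuple(
                sorted((k, v) for k, v in annotations.items() if k.startswith(prefix))
            )
            groups[key][slot_name].append(data)
    return list(groups.values())


tables = [
    ([{"area": 10}], {"#Experiment": "A"}),
    ([{"area": 12}], {"#Experiment": "A"}),
    ([{"area": 7}], {"#Experiment": "B"}),
]

steps = merging_steps({"Input": tables})

# A merging workload receives all tables of a step and concatenates their rows.
merged = [
    [row for table in step["Input"] for row in table]
    for step in steps
]
```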

Parametric iteration

Features like external parameters introduce parametric input slots that are highlighted by a different color.

Such slots may be excluded from the calculations applied by JIPipe's standard iteration step generation method and act as an additional layer of iteration.

For example, suppose an Auto threshold 2D node receives 10 images and would therefore execute its workload 10 times. If the Parameters slot additionally supplies 3 parameter sets, the workload is executed 30 times: once per image for each parameter set.
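The arithmetic can be reproduced with a short Python sketch that treats the parameter sets as an extra layer on top of the regular iteration steps. The `threshold` function and its `level` parameter are hypothetical stand-ins with a fixed cutoff; they do not reflect the actual Auto threshold 2D implementation or its parameters.

```python
from itertools import product


def threshold(image, level):
    """Hypothetical fixed-level threshold on an image given as nested lists."""
    return [[1 if pixel >= level else 0 for pixel in row] for row in image]


# 10 regular iteration steps (one tiny 2x2 image each).
images = [[[i, i + 5], [i + 10, i + 15]] for i in range(10)]

# 3 parameter sets, as they might arrive through a parametric slot.
parameter_sets = [{"level": 5}, {"level": 10}, {"level": 15}]

# Every iteration step is combined with every parameter set.
results = [
    threshold(image, **params)
    for image, params in product(images, parameter_sets)
]

executions = len(results)  # 10 images * 3 parameter sets
```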

Last modified: 27 March 2025