Block Executors

RZT aiOS blocks, whether pre-built or custom, can be executed either individually as a single block or as part of a pipeline. When running a block, a user is also concerned with two other aspects: the execution environment, and the mechanism by which data is moved from one block to another. This document covers the different execution environments in detail; to learn about the different data transfer methods, see data transport between blocks. The tutorial starts with the simplest environment, a single-threaded process, and works up to complex distributed architectures like Spark and Horovod.

A block or a pipeline can be run in any of the following execution environments:

  • ThreadExecutor
  • SubprocessExecutor (specialization of ThreadExecutor)
  • ProcessExecutor (specialization of ThreadExecutor)
  • ContainerExecutor
  • BlockPickleExecutor
  • PipelineEngineExecutor (specialization of BlockPickleExecutor)
  • SparkExecutor (specialization of ContainerExecutor)
  • HorovodExecutor (specialization of ContainerExecutor)
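The inheritance relationships in the list above can be sketched in Python. The classes below are empty, hypothetical stand-ins that only mirror the names in the list; they are not the razor SDK classes, and the sketch assumes that "specialization" means subclassing. render_tree walks __subclasses__() to print each root executor with its specializations indented beneath it.

```python
# Hypothetical, empty stand-ins mirroring the executor names listed above.
# They are NOT the razor SDK classes; only the inheritance shape is modelled.
class ThreadExecutor: pass
class SubprocessExecutor(ThreadExecutor): pass
class ProcessExecutor(ThreadExecutor): pass

class ContainerExecutor: pass
class SparkExecutor(ContainerExecutor): pass
class HorovodExecutor(ContainerExecutor): pass

class BlockPickleExecutor: pass
class PipelineEngineExecutor(BlockPickleExecutor): pass

# Executors in the list that are not a specialization of another one
ROOTS = (ThreadExecutor, ContainerExecutor, BlockPickleExecutor)


def render_tree(cls, depth=0):
    """Return the class name plus all of its subclasses, indented by depth."""
    lines = ["    " * depth + cls.__name__]
    for sub in cls.__subclasses__():
        lines.extend(render_tree(sub, depth + 1))
    return lines


if __name__ == "__main__":
    for root in ROOTS:
        print("\n".join(render_tree(root)))
```

Modelling the relationships as real subclasses, rather than as a hand-written diagram, means the printed tree cannot drift out of sync with the class definitions.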


By default, when no container is specified, a block runs as a subprocess forked from the Jupyter kernel process. This is quick and well suited to trying out small prototype code. For example:

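import razor.flow as rf
import pandas as pd

# Assumption: blocks are declared with razor.flow's @rf.block decorator; a plain
# class would not accept filename=... as a constructor keyword.
@rf.block
class CsvReader:
    # Path of the CSV file, relative to the project space
    filename: str
    # Output that emits a series of DataFrames, one per chunk
    output: rf.SeriesOutput[pd.DataFrame]

    def run(self):
        # project_space_path is not defined or imported in the original snippet;
        # it is assumed to resolve self.filename inside the project space
        file_path = project_space_path(self.filename)
        # Read the file lazily, 10 rows at a time, and stream each chunk out
        chunks = pd.read_csv(file_path, chunksize=10)
        for df in chunks:
            self.output.put(df)

CsvReader(filename="titanic/train.csv").execute()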
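To see the mechanics behind this default behaviour without the razor runtime, here is a standard-library sketch under stated assumptions: the child process is created with multiprocessing's "fork" start method (POSIX only), a multiprocessing queue stands in for rf.SeriesOutput, lists of csv rows stand in for DataFrame chunks, and SAMPLE_CSV is made-up data in place of titanic/train.csv. It is not how razor's executors are implemented.

```python
import csv
import io
import multiprocessing as mp

# Hypothetical sample data standing in for titanic/train.csv (25 rows)
SAMPLE_CSV = "PassengerId,Survived\n" + "".join(f"{i},{i % 2}\n" for i in range(1, 26))

_DONE = None  # sentinel marking the end of the series


def read_in_chunks(text, chunksize, out_queue):
    """Mimic CsvReader.run: put one list of rows per chunk, then a sentinel."""
    reader = csv.DictReader(io.StringIO(text))
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == chunksize:
            out_queue.put(chunk)
            chunk = []
    if chunk:  # flush the final, partial chunk
        out_queue.put(chunk)
    out_queue.put(_DONE)


def run_in_subprocess(text, chunksize=10):
    """Run the reader in a forked child process and collect its chunks."""
    ctx = mp.get_context("fork")  # fork the current process (POSIX only)
    queue = ctx.Queue()  # stand-in for the block's series output
    child = ctx.Process(target=read_in_chunks, args=(text, chunksize, queue))
    child.start()
    chunks = []
    # Drain the queue until the sentinel arrives, then wait for the child
    while (chunk := queue.get()) is not _DONE:
        chunks.append(chunk)
    child.join()
    return chunks


if __name__ == "__main__":
    print([len(c) for c in run_in_subprocess(SAMPLE_CSV)])  # [10, 10, 5]
```

The queue is drained before join() on purpose: a child process that still has items buffered for a queue may not exit until they are consumed, so joining first can deadlock.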