Block Executors

RZT aiOS blocks, whether pre-built or custom, can be executed individually or as part of a pipeline. When running a block, a user also needs to consider the execution environment and the mechanism by which data moves from one block to another. This document covers the execution environments in detail; to learn about the data transfer methods, see data transport between blocks. The tutorial starts with the simplest environment, a single-threaded process, and moves on to more complex environments such as Spark and Horovod.

A block can be run in the following types of execution environments:

  • ThreadExecutor
  • SubprocessExecutor: specialization of ThreadExecutor
  • ProcessExecutor: specialization of ThreadExecutor
  • ContainerExecutor
  • SparkExecutor: specialization of ContainerExecutor
  • HorovodExecutor: specialization of ContainerExecutor

Subprocess Executor

By default, when no container is specified, a block is executed as a subprocess forked from the Jupyter kernel process. This is ideal for quickly trying out small prototype code. The CsvReader block in the code below reads a file from the project space in chunks using pandas and outputs the shape of each chunk.

import pandas as pd
import razor.flow as rf
from razor.api import project_space_path


@rf.block
class CsvReader:
    filename: str
    output: rf.SeriesOutput[tuple]

    def run(self):
        # Resolve the file name relative to the project space.
        file_path = project_space_path(self.filename)
        # With chunksize set, read_csv yields DataFrames of up to 100 rows each.
        chunks = pd.read_csv(file_path, chunksize=100, nrows=None, delimiter=None)
        for df in chunks:
            # Emit the (rows, columns) shape of each chunk.
            self.output.put(df.shape)


csv_reader = CsvReader("Read csv file", filename="titanic/train.csv")
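
To see what the chunked read inside run produces, independently of RZT aiOS, the following is a minimal plain-pandas sketch. It assumes a small in-memory CSV (hypothetical sample data standing in for titanic/train.csv) and a hypothetical helper named chunk_shapes; neither is part of razor.flow.

```python
import io

import pandas as pd


def chunk_shapes(source, chunksize):
    """Return the (rows, columns) shape of each chunk read from a CSV source."""
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of loading the whole file at once.
    return [df.shape for df in pd.read_csv(source, chunksize=chunksize)]


# Hypothetical sample: a header line plus 250 data rows with two columns.
sample_csv = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(250))

shapes = chunk_shapes(io.StringIO(sample_csv), chunksize=100)
print(shapes)  # [(100, 2), (100, 2), (50, 2)]
```

The last chunk is shorter than the others whenever the row count is not a multiple of the chunk size, so a downstream block should not assume a fixed number of rows per shape.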

Container Executor

The above example works well for small files (less than 100 MB). For larger files, one might want to assign more CPU cores and memory. RZT aiOS provides a ContainerExecutor for this purpose:

csv_reader.filename = "mnist/mnist_train.csv"
csv_reader.executor = rf.ContainerExecutor(cores=2, memory=1000)
csv_reader.execute()
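
When choosing a memory value, it can help to estimate how large a file becomes once loaded into pandas. The sketch below extrapolates from a sample of rows using plain pandas. The helper estimate_memory_mb and the sample data are hypothetical, and the unit expected by the memory argument of ContainerExecutor is not stated in this document, so the result should be treated only as a rough guide.

```python
import io

import pandas as pd


def estimate_memory_mb(source, sample_rows, total_rows):
    """Extrapolate the in-memory size (in MiB) of a CSV from its first rows."""
    sample = pd.read_csv(source, nrows=sample_rows)
    # deep=True also counts the contents of object (string) columns.
    bytes_per_row = sample.memory_usage(index=False, deep=True).sum() / len(sample)
    return bytes_per_row * total_rows / (1024 ** 2)


# Hypothetical sample: two integer columns, 1000 data rows.
sample_csv = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(1000))

estimate = estimate_memory_mb(
    io.StringIO(sample_csv), sample_rows=500, total_rows=1_000_000
)
print(f"Estimated in-memory size: {estimate:.1f} MiB")
```

Because CsvReader reads the file in chunks, its peak usage is likely lower than this figure; the estimate is more relevant for blocks that load a whole file in one call.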

Spark Executor

RZT aiOS allows one to configure and add distributed environments such as Apache Spark and Horovod for larger data processing tasks. To learn how to use a Spark engine to run PySpark code, see the section Building and running a spark block. Support for Horovod is not available in the current release.