RZT aiOS blocks, whether pre-built or custom, can be executed either individually as a single block or as part of a pipeline. When running a block, a user must also consider two other aspects: the execution environment, and the mechanism by which data is moved from one block to another. This document covers the different execution environments in detail. To learn about the data transfer methods, see data transport between blocks. The tutorial progresses from the simplest environment, a single-threaded process, to complex distributed architectures such as Spark and Horovod.
A block or a pipeline can be run in the following types of execution environments:
ThreadExecutor
SubprocessExecutor (specialization of ThreadExecutor)
ProcessExecutor (specialization of ThreadExecutor)
ContainerExecutor
BlockPickleExecutor
PipelineEngineExecutor (specialization of BlockPickleExecutor)
SparkExecutor (specialization of ContainerExecutor)
HorovodExecutor (specialization of ContainerExecutor)
TODO: Add a hierarchical class diagram showing the inheritance hierarchy.
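The inheritance relationships in the list above can be written down as a minimal Python sketch. The classes below are hypothetical empty stand-ins that mirror only the stated parent/child relationships. They are not the razor.flow implementations, and the real classes' module path and constructor arguments are not covered here. The helper print_tree (also hypothetical) prints the hierarchy as an indented text tree.

```python
# Hypothetical stand-in classes: they reproduce only the inheritance
# relationships listed in the text, not the behavior of the real executors.

class ThreadExecutor:
    pass

class SubprocessExecutor(ThreadExecutor):
    pass

class ProcessExecutor(ThreadExecutor):
    pass

class ContainerExecutor:
    pass

class SparkExecutor(ContainerExecutor):
    pass

class HorovodExecutor(ContainerExecutor):
    pass

class BlockPickleExecutor:
    pass

class PipelineEngineExecutor(BlockPickleExecutor):
    pass


def print_tree(cls, depth=0):
    """Print cls and, recursively, its direct subclasses as an indented tree."""
    print("    " * depth + cls.__name__)
    for sub in cls.__subclasses__():
        print_tree(sub, depth + 1)


# The three root classes of the hierarchy described above.
for root in (ThreadExecutor, ContainerExecutor, BlockPickleExecutor):
    print_tree(root)
```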
By default, when no container is specified, a block runs as a subprocess forked from the Jupyter kernel process. This is quick and well suited to trying out small prototype code. For example:
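import pandas as pd
import razor.flow as rf
# Assumption: project_space_path is importable from razor.api in your SDK version.
from razor.api import project_space_path


@rf.block  # assumption: the @rf.block decorator declares the class as a block
class CsvReader:
    filename: str
    output: rf.SeriesOutput[pd.DataFrame]

    def run(self):
        # Resolve the project-relative file name to a full path
        file_path = project_space_path(self.filename)
        # Read the CSV lazily, 10 rows at a time
        chunks = pd.read_csv(file_path, chunksize=10)
        for df in chunks:
            # Emit each chunk on the series output
            self.output.put(df)


# No executor is specified, so this runs as a subprocess of the Jupyter kernel
CsvReader(filename="titanic/train.csv").execute()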
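The chunking behavior that CsvReader relies on can be checked without the razor runtime. The sketch below is plain pandas and makes two assumptions for illustration: an in-memory CSV (hypothetical data) stands in for titanic/train.csv, and a Python list named emitted stands in for the block's SeriesOutput. Neither reflects how razor actually moves data between blocks.

```python
import io

import pandas as pd

# Hypothetical 25-row CSV standing in for a file in the project space.
csv_text = "id,value\n" + "\n".join(f"{i},{i * i}" for i in range(25))

# Stand-in for the block's output port: collect each emitted chunk.
emitted = []

# With chunksize=10, read_csv returns an iterator of DataFrames of at most 10 rows.
for df in pd.read_csv(io.StringIO(csv_text), chunksize=10):
    emitted.append(df)  # in the block, this is where self.output.put(df) is called

print([len(df) for df in emitted])  # prints [10, 10, 5]

# A consumer that needs the whole dataset can reassemble the chunks.
full = pd.concat(emitted)
```

In the block above, each put call would likewise emit one chunk of at most 10 rows rather than the whole file, which keeps memory use small for large CSVs.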