RZT aiOS blocks, whether pre-built or custom, can be executed either individually as a single block or as part of a pipeline. Two other aspects matter when running a block: the execution environment, and the mechanism by which data is moved from one block to another. This document covers the different execution environments in detail. To learn about the different data transfer methods, see data transport between blocks. This tutorial starts with running a block in the simplest environment, a single-threaded process, and works up to more complex environments such as Spark and Horovod.
A block or a pipeline can be run in different types of execution environments, such as:
- ThreadExecutor
- SubprocessExecutor: specialization of ThreadExecutor
- ProcessExecutor: specialization of ThreadExecutor
- ContainerExecutor
- BlockPickleExecutor
- PipelineEngineExecutor: specialization of BlockPickleExecutor
- SparkExecutor: specialization of ContainerExecutor
- HorovodExecutor: specialization of ContainerExecutor

TODO: Add a hierarchical class diagram showing the inheritance hierarchy.
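The specialization relationships listed above can be sketched with plain Python stub classes. This is only an illustration: the classes below are hypothetical stand-ins that mirror the executor names, not the actual RZT aiOS implementations, and the helper `describe` is introduced here purely to show the tree.

```python
# Hypothetical stubs that mirror the executor names listed above.
# They carry no behavior; only the inheritance relationships matter.
class ThreadExecutor: ...
class SubprocessExecutor(ThreadExecutor): ...
class ProcessExecutor(ThreadExecutor): ...

class ContainerExecutor: ...
class SparkExecutor(ContainerExecutor): ...
class HorovodExecutor(ContainerExecutor): ...

class BlockPickleExecutor: ...
class PipelineEngineExecutor(BlockPickleExecutor): ...


def describe(cls, depth=0):
    """Return indented lines naming cls and all of its subclasses."""
    lines = ["  " * depth + cls.__name__]
    for sub in sorted(cls.__subclasses__(), key=lambda c: c.__name__):
        lines.extend(describe(sub, depth + 1))
    return lines


# The three root executors named in the list.
roots = [ThreadExecutor, ContainerExecutor, BlockPickleExecutor]
tree = [line for root in roots for line in describe(root)]
print("\n".join(tree))
```

Reading the printed tree top to bottom gives each root executor followed by its specializations, indented by two spaces.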
By default, when no container is specified, a block runs as a subprocess forked from the Jupyter kernel process. This is ideal for quickly trying out small prototype code. Example:
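import razor.flow as rf
import pandas as pd

# NOTE: project_space_path is assumed to be provided by the SDK;
# its import is not shown in the original example.

class CsvReader:
    filename: str
    output: rf.SeriesOutput[pd.DataFrame]

    def run(self):
        # Resolve the file name to a path inside the project space.
        file_path = project_space_path(self.filename)
        # Read the CSV lazily, 10 rows per chunk.
        chunks = pd.read_csv(file_path, chunksize=10, nrows=None, delimiter=None)
        # Emit each chunk on the series output as soon as it is read.
        for df in chunks:
            self.output.put(df)

# No container is specified, so the block runs in the default environment.
CsvReader(filename="titanic/train.csv").execute()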
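To make the default execution model more concrete, here is a minimal sketch of running work in a child process using only the Python standard library. It is not how RZT aiOS launches blocks: the helper `run_in_subprocess` is hypothetical, and it starts a fresh interpreter with `subprocess.run` rather than forking the Jupyter kernel. It only shows that the work runs in a process separate from the caller.

```python
import os
import subprocess
import sys


def run_in_subprocess(code: str) -> str:
    """Run a Python snippet in a child interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],  # same interpreter as the parent
        capture_output=True,           # collect stdout/stderr instead of printing
        text=True,                     # decode output as str
        check=True,                    # raise if the child exits with an error
    )
    return result.stdout.strip()


parent_pid = os.getpid()
# The child reports its own process ID, which differs from the parent's.
child_pid = int(run_in_subprocess("import os; print(os.getpid())"))
print(f"parent={parent_pid} child={child_pid}")
```

Because the child is a separate process, a crash or heavy memory use in the snippet does not take down the calling process, which is one reason a subprocess suits quick experiments.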