Introducing Blocks

Blocks and pipelines are foundational to any robust design of an Intelligent System. How they are designed and constructed has a wide-ranging impact on the accuracy, robustness, and flexibility of an AI system. Specifically, they determine the:

  1. Ability to rapidly put together complex workflows
  2. Ability to share code between multiple users and applications
  3. Ability to encapsulate complex algorithms and make them available with ease
  4. Ability to run multiple technology blocks in one pipeline
  5. Ability to control execution mode (in process, container) and transport (file, streaming, socket) at a granular level
  6. Ability to scale the execution of experiments and compare them

Blocks and pipelines may seem like overkill to a beginning data scientist, but their value shows quickly once you build intelligent systems with them in a framework like Razorthink's. As soon as your system grows beyond a toy implementation, this approach becomes critical.

A Block in 60 Seconds

Problem

You want to define a basic Block that performs a simple operation. For example, one that takes two inputs, a string and a delimiter, splits the string by the delimiter, and outputs an array of substrings.

Solution

The definition of a simple Block comprises the following components:

  • The @rf.block annotation on the block class
  • Inputs, defined as variables within the block class. An input can be given a Python type annotation. Variables without an explicit input type are treated as rf.Input. An input can be one of two kinds: Atomic or Series
  • Outputs, defined as variables within the class and declared as type rf.Output. An output can be one of two kinds: Atomic or Series
  • An implementation of the run method

Example
A block performs a function that can take multiple inputs and provide multiple outputs. In the following example we define a block that splits a string based on a delimiter. The block takes two inputs and provides a single output.

  1. Inputs: A block input is defined as a variable within the class. The variable need not be explicitly marked as a block input: all variables defined within the block class are assumed to be of type rf.Input, which is an Atomic input. There are two types of input a block can take

    • Atomic, where the input to the function is a single object defined as type rf.Input
    • Series, where the input is a list of values which are streamed to the block. Defined as type rf.SeriesInput

  2. Outputs: A block output is defined using the rf.Output type. There are two types of output a block can provide

    • Atomic, where the output is a single value which is streamed out of the block. Atomic outputs are defined as rf.Output
    • Series, where the output is a series of values which are streamed from the block. Series outputs are defined as rf.SeriesOutput

Defining the block class

The block class contains the run method, which houses the logic for performing the operation. The input and output parameters are declared as attributes of the class.

Once the operation is performed, the results need to be placed into the output variable/stream using the put function, as shown in the following example:

import razor.flow as rf

@rf.block
class SplitString:
    # Atomic inputs of type str. Plain annotated variables are treated as rf.Input by default
    text: str
    delimiter: str
    
    # Atomic output of type list. Provides the results as a list
    data: rf.Output[list]
    
    def run(self):
        result = self.text.split(self.delimiter)
        self.data.put(result)

Executing the block

An instance of the block defined above can be executed directly by supplying the required inputs.

SplitString(text="The,output,of,the,block,should,be,a,list,of,words",
            delimiter=","
           ).execute()
{'data': ['The',
  'output',
  'of',
  'the',
  'block',
  'should',
  'be',
  'a',
  'list',
  'of',
  'words']}

Publishing a block

The block can be published so that it can be accessed later from a Jupyter notebook or from the PIPELINES page in the IDE for building pipelines. Every block is published with an associated bundle name, which is analogous to a Python module name. To publish a block, its code must be placed in a specific directory hierarchy. A block can be published with one of two scopes:

  • project scope: A block published with project scope is available only in the project from which it is published. All Python code for a block with project scope should be placed inside the following directory, replacing <bundle_name> with the actual bundle name of the block:

    • __blocks/project/<bundle_name>
  • org scope: A block published with org scope is available in all projects for that tenant. Custom blocks with org scope should use the following directory structure:

    • __blocks/org/<bundle_name>

Create the block code with publish metadata

To publish a block, certain additional attributes must be set. Create a Python file __blocks/project/string_processors/split_string.py containing the block's class definition along with the metadata attributes required for publishing the block:

import razor.flow as rf

@rf.block
class SplitString:
    # Metadata attributes for publishing the block
    __publish__ = True
    __label__ = "SplitString"
    __description__ = "Splits a string into multiple strings delimited with given character"
    __tags__ = []
    __category__ = 'string_processors'

    # Atomic inputs of type str. Plain annotated variables are treated as rf.Input by default
    text: str
    delimiter: str
    
    # Atomic output of type list. Provides the results as a list
    data: rf.Output[list]
    
    def run(self):
        result = self.text.split(self.delimiter)
        self.data.put(result)

Import the block and add version information

Create a Python file __blocks/project/string_processors/__init__.py that imports the block class SplitString and sets the bundle version:

from .split_string import SplitString
from razor import block_setup
__metadata__ = block_setup(version="0.0.1")
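
The directory layout described above can be sketched with plain Python. This is only an illustration of where the files go: scaffold_bundle is a hypothetical helper, not part of the Razor SDK, and it creates empty placeholder files in a temporary directory rather than in a real project.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def scaffold_bundle(root: Path, scope: str, bundle_name: str,
                    modules: Iterable[str]) -> Path:
    """Create __blocks/<scope>/<bundle_name>/ with empty module files.

    Hypothetical helper for illustration; the Razor SDK does not provide it.
    """
    if scope not in ("project", "org"):
        raise ValueError("scope must be 'project' or 'org'")
    bundle_dir = root / "__blocks" / scope / bundle_name
    bundle_dir.mkdir(parents=True, exist_ok=True)
    # Every bundle needs an __init__.py next to its block modules.
    for name in ["__init__.py", *modules]:
        (bundle_dir / name).touch()
    return bundle_dir


workspace = Path(tempfile.mkdtemp())  # stand-in for the project root
bundle_dir = scaffold_bundle(workspace, "project", "string_processors",
                             ["split_string.py"])
created = sorted(p.relative_to(workspace).as_posix()
                 for p in bundle_dir.iterdir())
print(created)
```

For an org-scoped bundle, the same sketch would be called with "org" as the scope, giving __blocks/org/<bundle_name>.
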

Publish the block

Finally, publish the block by running the following code:

import razor
from razor import BlockScope

razor.api.blocks.publish(scope=BlockScope.PROJECT, bundle='string_processors', overwrite=True)
INFO:razor.api.impl.block_manager_impl:Packaging the bundle...
INFO:razor.api.impl.block_manager_impl:Publishing block bundle...
INFO:razor.api.impl.block_manager_impl:Block bundle published.
INFO:razor.api.impl.block_manager_impl:Make sure to restart the Jupyter kernel and then you can use the blocks as follows:
INFO:razor.api.impl.block_manager_impl:from razor.project.blocks.string_processors import SplitString

The published block can be imported later and used in a Jupyter notebook. The block will also appear in the IDE.

from razor.project.blocks.string_processors import SplitString

Discussion

Conceptually, blocks are akin to functions in a programming language. They take in inputs as parameters, perform some kind of operation on them, and then return an output. Unlike most functions, a block can have multiple outputs, and each input and output of a block is configured individually. Examples of blocks with multiple outputs will follow shortly.
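
To make the analogy concrete, here is a minimal plain-Python sketch of the same operation written as ordinary functions. It does not use the Razor SDK. The names split_string and split_and_count are hypothetical, and returning a dict keyed by output name is only meant to mirror the shape of the execute() result shown earlier.

```python
def split_string(text: str, delimiter: str) -> dict:
    """Function analogue of the SplitString block: two inputs, one output."""
    return {"data": text.split(delimiter)}


def split_and_count(text: str, delimiter: str) -> dict:
    """Hypothetical analogue of a block with two outputs, each named separately."""
    parts = text.split(delimiter)
    return {"data": parts, "count": len(parts)}


single = split_string("91-97384-20742", "-")
multiple = split_and_count("91-97384-20742", "-")
print(single)    # {'data': ['91', '97384', '20742']}
print(multiple)  # {'data': ['91', '97384', '20742'], 'count': 3}
```

A block differs from these functions in that each output is a named attribute that run fills with put, rather than a value returned at the end.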

Notice that the run method performs the operations defined within the block. It has access to the inputs and outputs defined within the class and uses them to perform the operation.

Also notice how the block is instantiated and used. The Razor SDK uses conventional Python constructs: the inputs to the block are passed as keyword arguments to the constructor when initialising the block, as in the example below:

split_str = SplitString(text="91-97384-20742", delimiter="-")

Streaming inputs and outputs

Problem

You want to define a block that works on a stream of data.

Solution

Any variable can be declared as type rf.SeriesInput[type], which makes it a series input. The Razor SDK treats a series input as a queue. There are two ways to retrieve values from this queue:

  1. by treating the input as an iterator
  2. by using the .get method to take one value at a time
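
The two retrieval styles can be illustrated without the SDK using Python's standard queue module. SeriesStandIn below is a hypothetical stand-in written for this sketch, not the Razor implementation. The end-of-stream sentinel and the EOFError raised by get on an exhausted series are assumptions of the sketch, not documented Razor behaviour.

```python
import queue

_END = object()  # assumed end-of-stream marker, specific to this sketch


class SeriesStandIn:
    """Hypothetical queue-backed stand-in for a series input."""

    def __init__(self, values):
        self._queue = queue.Queue()
        for value in values:
            self._queue.put(value)
        self._queue.put(_END)

    def get(self):
        """Take one value from the queue; raise EOFError once exhausted."""
        item = self._queue.get()
        if item is _END:
            self._queue.put(_END)  # keep the marker for any later reads
            raise EOFError("series is exhausted")
        return item

    def __iter__(self):
        """Yield values one by one until the end marker is reached."""
        while True:
            try:
                yield self.get()
            except EOFError:
                return


# 1. Treat the input as an iterator.
texts = SeriesStandIn(["a-b", "c-d"])
by_iteration = [text.split("-") for text in texts]

# 2. Take one value at a time with get.
texts = SeriesStandIn(["a-b", "c-d"])
first = texts.get().split("-")
second = texts.get().split("-")
```

Iteration suits blocks that process every value in the same way, while get gives the block control over when the next value is consumed.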

The following example shows you how to rewrite the block from the previous recipe to one that accepts a stream of strings as input:

import razor.flow as rf

@rf.block
class SplitString:
    # texts is a series input of str values; delimiter is an atomic input (rf.Input by default)
    texts: rf.SeriesInput[str]
    delimiter: str
    
    # Series output of type list. Each value put on it is the list of substrings for one input string
    data: rf.SeriesOutput[list]
    
    def run(self):
        for text in self.texts:
            result = text.split(self.delimiter)
            self.data.put(result)

Stream inputs and outputs are usually used in pipelines, where blocks perform a series of operations one after another on chunks of data. However, you can also feed any iterable, such as a list, to the series input of a block.

split_str = SplitString(texts=['1/1/2020', '3/1/2020', '23/1/2020'], delimiter="/")
split_str.execute()
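
The delimiter has to match the separator that actually appears in the data. The plain-Python check below, which does not use the SDK, shows what str.split produces for these date strings with "/" and with "-". It only illustrates the per-item split performed inside run, not the exact structure returned by execute().

```python
texts = ['1/1/2020', '3/1/2020', '23/1/2020']

# Splitting on the separator present in the data yields day, month, year.
with_slash = [text.split("/") for text in texts]

# Splitting on a character that never occurs leaves each string whole.
with_dash = [text.split("-") for text in texts]

print(with_slash)  # [['1', '1', '2020'], ['3', '1', '2020'], ['23', '1', '2020']]
print(with_dash)   # [['1/1/2020'], ['3/1/2020'], ['23/1/2020']]
```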

Accessing Pre Built Blocks

Problem

You want to use a block that has been pre-built. This includes blocks built by Razorthink or its partners, blocks that you have downloaded from the marketplace, and blocks that you have custom built.

Solution

Using pre-built blocks easily involves a few tasks, specifically:

  1. List available blocks filtered by descriptions, tags, authors etc.
  2. Review the block documentation
  3. Import and use a block

There are multiple APIs and mechanisms that allow you to perform the above operations.

List all available blocks

Using the block widget: Blocks can be listed using the block widget on the left-hand side of the Jupyter notebook. Dragging a block from the widget and dropping it into a notebook cell inserts the block's import statement along with an initialisation skeleton.

Using the SDK API: Blocks can also be listed and accessed using the SDK API. The following call lists all available blocks and provides the usage (import) statement for each block within the notebook:

import razor
razor.api.blocks()