Blocks and pipelines are foundational to any robust design of an Intelligent System. How they are conceived and constructed has a wide-ranging impact on the accuracy, robustness and flexibility of an AI.
While they may seem like overkill to the beginning data scientist, you will quickly come to appreciate the elegance of building intelligent systems from blocks and pipelines in a framework like Razorthink's. Once your system goes beyond a toy implementation, this approach becomes critical.
You want to define a basic Block that performs a simple operation. For example, one that takes a string and a delimiter as inputs, splits the string by the delimiter, and outputs an array of substrings.
The definition of a simple Block comprises the following components:
- The @rf.block annotation
- rf.Input: the input for a block can be of two types, Atomic or Series
- rf.Output: the output of a block can be of two types, Atomic or Series
- The run method
Example
A block performs a function that can take multiple inputs and provide multiple outputs. In the following example we define a block that splits a string based on a delimiter. The block takes two inputs and provides a single output.
Inputs: A block input is defined as a variable within the class. The variable need not be explicitly declared as a block input: variables in the block class that are not annotated with another rf type are assumed to be of type rf.Input, which is an Atomic input. There are two types of input a block can take:
- rf.Input
- rf.SeriesInput
Output: A block output is defined using the rf.Output type. There are two types of output a block can provide:
- rf.Output
- rf.SeriesOutput
The block class contains the run function, which houses the logic for performing the necessary operation. The input and output parameters are defined as attributes of the block class.
Once the operation is performed, the results must be placed into the output variable/stream using the put function, as shown in the following example:
import razor.flow as rf

@rf.block
class SplitString:
    # Atomic inputs of type str. These inputs are by default initialised to the rf.Input class
    text: str
    delimiter: str
    # Atomic output of type list. Provides the results as a list
    data: rf.Output[list]

    def run(self):
        # Split the text on the delimiter and publish the result on the output
        result = self.text.split(self.delimiter)
        self.data.put(result)
An instance of the block created above can be executed directly by supplying the required inputs.
SplitString(
    text="The,output,of,the,block,should,be,a,list,of,words",
    delimiter=",",
).execute()
{'data': ['The',
'output',
'of',
'the',
'block',
'should',
'be',
'a',
'list',
'of',
'words']}
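The operation inside run is an ordinary Python str.split call, so its edge cases carry over to the block. The following is a minimal plain-Python sketch of those cases. It does not use the Razor SDK, and the helper name split_text is introduced here only for illustration.

```python
# Plain-Python check of the operation performed in SplitString.run.
# No Razor SDK calls are used; split_text is a hypothetical helper.

def split_text(text: str, delimiter: str) -> list:
    """Mirror the body of run: split text on delimiter."""
    return text.split(delimiter)

basic = split_text("The,output,of,the,block", ",")

# Adjacent delimiters produce empty strings in the result.
with_gap = split_text("a,,b", ",")

# A delimiter that never occurs leaves the text as a single element.
no_match = split_text("91-97384-20742", ",")

# An empty delimiter is rejected by str.split with a ValueError.
try:
    split_text("abc", "")
    empty_delimiter_ok = True
except ValueError:
    empty_delimiter_ok = False
```

If a block like this may receive an empty delimiter, it is probably worth validating the input inside run before calling split.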
The block can be published so that it can be accessed later from a Jupyter notebook or from the PIPELINES page in the IDE for building pipelines. Every block is published with an associated bundle name, which is analogous to a Python module name. To publish a block, the block code must be placed in a specific directory hierarchy. A block can be published with one of two scopes:
- project scope: A block published with project scope is available only in the project from which it is published. All Python code for a block with project scope should be placed inside the following directory, replacing <bundle_name> with the actual bundle name of the block:
__blocks/project/<bundle_name>
- org scope: A block published with org scope is available in all projects for that tenant. A custom block with org scope should use the following directory structure:
__blocks/org/<bundle_name>
To publish a block, certain additional attributes must be set. Create a Python file __blocks/project/string_processors/split_string.py containing the block's class definition along with the metadata attributes required for publishing the block:
import razor.flow as rf

@rf.block
class SplitString:
    # Metadata attributes for publishing the block
    __publish__ = True
    __label__ = "SplitString"
    __description__ = "Splits a string into multiple strings delimited with given character"
    __tags__ = []
    __category__ = 'string_processors'

    # Atomic inputs of type str. These inputs are by default initialised to the rf.Input class
    text: str
    delimiter: str
    # Atomic output of type list. Provides the results as a list
    data: rf.Output[list]

    def run(self):
        result = self.text.split(self.delimiter)
        self.data.put(result)
Create a Python file __blocks/project/string_processors/__init__.py that imports the block class SplitString and sets the bundle metadata:
from .split_string import SplitString
from razor import block_setup
__metadata__ = block_setup(version="0.0.1")
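The directory layout described above can be sketched with the standard library. This is only an illustration that builds the layout in a temporary directory with empty placeholder files. The helper bundle_dir is hypothetical and is not part of the Razor SDK.

```python
import tempfile
from pathlib import Path

def bundle_dir(root: Path, scope: str, bundle_name: str) -> Path:
    """Hypothetical helper: return <root>/__blocks/<scope>/<bundle_name>."""
    if scope not in ("project", "org"):
        raise ValueError("scope must be 'project' or 'org'")
    return root / "__blocks" / scope / bundle_name

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    target = bundle_dir(root, "project", "string_processors")
    target.mkdir(parents=True)
    # Empty placeholders; the real files hold the block class and the bundle metadata.
    (target / "__init__.py").touch()
    (target / "split_string.py").touch()
    # Collect the created files as paths relative to the root directory.
    created = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))
```

In a real project the root would be the project's working directory rather than a temporary one.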
Finally, publish the block by running the following code:
import razor
from razor import BlockScope
razor.api.blocks.publish(scope=BlockScope.PROJECT, bundle='string_processors', overwrite=True)
INFO:razor.api.impl.block_manager_impl:Packaging the bundle...
INFO:razor.api.impl.block_manager_impl:Publishing block bundle...
INFO:razor.api.impl.block_manager_impl:Block bundle published.
INFO:razor.api.impl.block_manager_impl:Make sure to restart the Jupyter kernel and then you can use the blocks as follows:
INFO:razor.api.impl.block_manager_impl:from razor.project.blocks.string_processors import SplitString
The published block can later be imported and used in a Jupyter notebook. The block will also appear in the IDE.
from razor.project.blocks.string_processors import SplitString
Conceptually, blocks are akin to functions in a programming language. They take inputs as parameters, perform some operation on them, and return an output. Unlike a typical function, a block can have multiple outputs, and each input and output of a block is configured individually. Examples of blocks with multiple outputs will follow shortly.
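To make the function analogy concrete, here is a toy stand-in written in plain Python. It is not the Razor SDK and does not reflect how the SDK is implemented. The classes ToyOutput and ToySplitAndCount are hypothetical and only mimic the shape of a block with two inputs and two individually named outputs.

```python
# Toy stand-in, NOT the Razor SDK: plain classes that mimic the shape of a block.

class ToyOutput:
    """Hypothetical output slot that stores the value passed to put()."""

    def __init__(self):
        self.value = None

    def put(self, value):
        self.value = value


class ToySplitAndCount:
    """Hypothetical block-like class with two inputs and two outputs."""

    def __init__(self, text: str, delimiter: str):
        self.text = text
        self.delimiter = delimiter
        self.parts = ToyOutput()
        self.count = ToyOutput()

    def run(self):
        # The logic lives in run and writes each result to its own output.
        pieces = self.text.split(self.delimiter)
        self.parts.put(pieces)
        self.count.put(len(pieces))

    def execute(self) -> dict:
        # Run the logic, then collect every output under its name.
        self.run()
        return {"parts": self.parts.value, "count": self.count.value}


result = ToySplitAndCount(text="91-97384-20742", delimiter="-").execute()
```

A plain function would return a tuple here. The block style instead gives each output a name that downstream consumers can refer to.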
Notice that the run method performs the operations defined within the block. It has access to the inputs and outputs defined within the class and uses them to carry out the operation.
Also notice how the block is instantiated and used. Razor SDK uses conventional Python constructs: the inputs to the block are passed as keyword arguments to the constructor when the block is initialised. An example is listed below:
split_str = SplitString(text="91-97384-20742", delimiter="-")
You want to define a block that works on a stream of data.
Any variable can be declared with the type rf.SeriesInput[type], which makes that input a series input. Razor SDK treats a series input as a queue. There are a couple of ways to retrieve values from this queue:
- Use the .get method to take one value at a time
- Iterate over the input with a for loop, as the example's run method does
The following example shows how to rewrite the block from the previous recipe as one that accepts a stream of strings as input:
import razor.flow as rf

@rf.block
class SplitString:
    # Series input of strings
    texts: rf.SeriesInput[str]
    # Atomic input of type str. By default initialised to the rf.Input class
    delimiter: str
    # Series output of type list. Each result is put on the stream as a list
    data: rf.SeriesOutput[list]

    def run(self):
        # Consume the input stream one string at a time
        for text in self.texts:
            result = text.split(self.delimiter)
            self.data.put(result)
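The queue behaviour described above can be illustrated with the standard library. This sketch uses queue.Queue as a stand-in and assumes an end-of-stream marker. How the Razor SDK actually signals the end of a series is not covered in this document, so _END and the helper functions are hypothetical.

```python
import queue

_END = object()  # hypothetical end-of-stream marker used only in this sketch

def fill(items):
    """Build a queue holding the items followed by the end marker."""
    q = queue.Queue()
    for item in items:
        q.put(item)
    q.put(_END)
    return q

def drain_with_get(q):
    """Take one value at a time with get() until the marker is seen."""
    out = []
    while True:
        item = q.get()
        if item is _END:
            break
        out.append(item)
    return out

def iterate(q):
    """Expose the queue as an iterator so it can be used in a for loop."""
    return iter(q.get, _END)

texts = ["a-b", "c-d-e"]
via_get = [t.split("-") for t in drain_with_get(fill(texts))]
via_loop = [t.split("-") for t in iterate(fill(texts))]
```

Both styles consume the values in the same order. The for loop form is usually shorter, while explicit get calls give finer control over when each value is taken.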
Stream inputs and outputs are usually used in pipelines, where blocks perform a series of operations one after another on chunks of data. However, you can also feed any iterable, such as a list, to the series input of a block.
split_str = SplitString(texts=['1/1/2020', '3/1/2020', '23/1/2020'], delimiter="/")
split_str.execute()
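The result for each item depends on whether the delimiter actually occurs in the input strings. The following plain-Python sketch applies the same per-item split to the date strings without using the Razor SDK. The generator split_each is a hypothetical stand-in for the loop in the block's run method.

```python
dates = ["1/1/2020", "3/1/2020", "23/1/2020"]

def split_each(texts, delimiter):
    """Hypothetical stand-in for the run loop: split every item of an iterable."""
    for text in texts:
        yield text.split(delimiter)

# "/" occurs in every date, so each item is split into day, month and year.
with_slash = list(split_each(dates, "/"))

# "-" never occurs, so each item comes back as a single-element list.
with_dash = list(split_each(dates, "-"))
```

For date strings in this format, "/" is the delimiter that produces separate day, month and year parts.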
You want to use a pre-built block. Pre-built blocks include those built by Razorthink or its partners, blocks that you have downloaded from the marketplace, and blocks that you have custom built.
Using pre-built blocks easily involves several tasks, and multiple APIs and mechanisms are available for performing them.
List all available blocks
Using the block widget: Blocks can be listed using the block widget, located on the left-hand side of the Jupyter notebook. Dragging a block from the widget and dropping it into a notebook cell inserts the block's import statement along with an initialisation skeleton.
Using the SDK API: Blocks can also be listed and accessed using the SDK API. The following call lists all available blocks and provides the usage (import) statement for each block within the notebook:
import razor
razor.api.blocks()