Managing Machine Learning Dependencies In Distributed Systems

Recently, I was struggling with dependency issues while building NBoost. One of the important features of NBoost is the portability of the models; we wanted the platform to be agnostic between many different types of finetuned models: Tensorflow, Pytorch, Transformers, whatever. But I really wanted NBoost to be packageable and pip-installable as well! This meant that all of the python classes needed to be added to the package. But machine learning dependencies are HEAVY; you shouldn’t have to import pytorch (which is a heavy dependency) if you want to use a tensorflow model, and vice versa. Users should be able to cherry-pick the dependencies they need. The solution wasn’t very complicated, but it worked well for our use case, so I thought I’d share it. Here it is:

1. Add the dependencies in extras_require

Nboost comes in four flavors: nboost (the base package), nboost[tf] (w/ tensorflow), nboost[torch] (w/ pytorch), and nboost[all] (with both). Our setup.py is set up so that users can pip install any of these options depending on their model:

from setuptools import setup

setup(
    # ...
    extras_require={
        'torch': ['torch', 'transformers'],
        'tf': ['tensorflow==1.15', 'sentencepiece'],
        'all': ['torch', 'tensorflow==1.15', 'transformers'],
    },
    # ...
)

2. Create a model mapping in the __init__.py

In the base of our package I added a mapping that reveals which module each model class can be found.

# component => class => module
CLASS_MAP = {
    'protocol': {
        'TestProtocol': 'test',
        'ESProtocol': 'es'
    },
    'model': {
        'TestModel': 'test',
        'TransformersModel': 'transformers',
        'BertModel': 'bert_model',
        'AlbertModel': 'albert_model'
    }
}

This way, when a user types in nboost --model_dir BertModel, NBoost knows that it should only import the bert_model module.

3. Import the classes dynamically from the cli

In our command line entrypoint, I added an import_class function that takes “model” (the module) and “BertModel” (the class) as arguments and returns that class (but doesn’t import anything else!).

import importlib, CLASS_MAP

def import_class(module: str, name: str):
    """Dynamically import class from a module in the CLASS_MAP. This is used
    to manage dependencies within nboost. For example, you don't necessarily
    want to import pytorch models everytime you boot up tensorflow..."""
    if name not in CLASS_MAP[module]:
        raise ImportError('Cannot locate %s with name "%s"' % (module, name))

    file = 'nboost.%s.%s' % (module, CLASS_MAP[module][name])
    return getattr(importlib.import_module(file), name)

And that’s it! In NBoost, this model class gets handed to the proxy so that the proxy can initialize the model. Good luck!

Written on November 23, 2019