MLflow plugin for deploying your models from MLflow to Triton Inference Server. Scripts are included for publishing TensorRT, ONNX and FIL models to your MLflow Model Registry.
Requirements:
- MLflow (tested on 2.11.3)
- Python (tested on 3.11)
Before you can use the Triton Docker image, you must install Docker. If you plan to use a GPU for inference, you must also install the NVIDIA Container Toolkit. DGX users should follow the Preparing to use NVIDIA Containers guide.
Pull the image using the following command.
docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3
Where <xx.yy> is the version of Triton that you want to pull.
Create a directory on your host machine to serve as your Triton model repository. This directory will hold the models used by Morpheus and will be mounted as a volume into your Triton Inference Server container.
Example:
mkdir -p /opt/triton_models
The Morpheus reference models can be found in the Morpheus repo. Because of their size, they are fetched with git-lfs using a provided script.
Before running the MLflow plugin container, fetch the models and copy them to a local path on your host (for example, /opt/triton_models):
git clone https://github.com/nv-morpheus/Morpheus.git morpheus
cd morpheus
scripts/fetch_data.py fetch models
cp -RL models /opt/triton_models
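To confirm that the copy worked, the following Python sketch lists every directory under the host path that contains a config.pbtxt file. It assumes the usual Triton convention of one directory per model with a config.pbtxt at its top level; list_triton_models is a hypothetical helper, not part of Morpheus. The search is recursive because cp -RL copies the models directory itself, so the Triton repository may sit one or more levels below /opt/triton_models.

```python
from pathlib import Path


def list_triton_models(root: str) -> list[str]:
    """Return model directories (relative to root) that contain a config.pbtxt.

    Hypothetical helper: assumes one directory per model with a config.pbtxt
    at its top level, and searches recursively because the copied tree may
    nest the repository below root.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        str(cfg.parent.relative_to(base))
        for cfg in base.rglob("config.pbtxt")
    )
```

For example, list_triton_models("/opt/triton_models") should include a path ending in sid-minibert-onnx if the fetch succeeded.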
Use the following command to run Triton with the model repository you just created. The NVIDIA Container Toolkit must be installed for Docker to recognize the GPU(s). The --gpus=1 flag makes one system GPU available to Triton for inference.
docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /opt/triton_models:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver --model-repository=/models --model-control-mode=explicit
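Because Triton is started with --model-control-mode=explicit, no models are loaded until a deployment is created, so server readiness and per-model readiness are separate checks. The sketch below probes Triton's KServe v2 HTTP endpoints GET /v2/health/ready and GET /v2/models/<name>/ready on port 8000, using only the standard library; the function names are illustrative.

```python
import http.client
import urllib.request


def _http_ok(url: str, timeout: float) -> bool:
    """Return True if GET url answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (non-2xx) and connection errors are OSError subclasses.
        return False


def triton_server_ready(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Check Triton's server readiness endpoint."""
    return _http_ok(f"{base_url.rstrip('/')}/v2/health/ready", timeout)


def triton_model_ready(model_name: str, base_url: str = "http://localhost:8000",
                       timeout: float = 2.0) -> bool:
    """Check whether a single model is loaded and ready to serve."""
    return _http_ok(f"{base_url.rstrip('/')}/v2/models/{model_name}/ready", timeout)
```

With the container running, triton_server_ready() should return True, while triton_model_ready("sid-minibert-onnx") is expected to stay False until the deployment is created.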
Build the MLflow image from the Dockerfile, starting from the root of the Morpheus repo:
cd models/mlflow
docker build -t mlflow-triton-plugin:latest -f docker/Dockerfile .
Start an MLflow container with the Triton model repository mounted as a volume:
docker run -it -v /opt/triton_models:/triton_models \
--env TRITON_MODEL_REPO=/triton_models \
--env MLFLOW_TRACKING_URI="http://localhost:5000" \
--gpus '"device=0"' \
--net=host \
--rm \
-d mlflow-triton-plugin:latest
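The container receives its configuration through the two --env flags above. The following sketch, meant to be run inside the container, reports whether TRITON_MODEL_REPO and MLFLOW_TRACKING_URI are set and whether the repository path is actually mounted. check_plugin_env is a hypothetical helper; it checks only these two variables and makes no assumption about any other settings the plugin may read.

```python
import os
from collections.abc import Mapping


def check_plugin_env(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of problems with the variables passed via --env above."""
    problems = []
    repo = environ.get("TRITON_MODEL_REPO")
    if not repo:
        problems.append("TRITON_MODEL_REPO is not set")
    elif not os.path.isdir(repo):
        # The -v flag should have mounted the host repository at this path.
        problems.append(f"TRITON_MODEL_REPO={repo} is not a directory (is the volume mounted?)")
    if not environ.get("MLFLOW_TRACKING_URI"):
        problems.append("MLFLOW_TRACKING_URI is not set")
    return problems
```

An empty list from check_plugin_env() means both variables look usable.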
Open a Bash shell in the container (use docker ps to find the container name):
docker exec -it <container_name> bash
Start the MLflow tracking server in the background:
nohup mlflow server --backend-store-uri sqlite:////tmp/mlflow-db.sqlite --default-artifact-root /mlflow/artifacts --host 0.0.0.0 &
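Because the server is started in the background with nohup, it may take a few seconds before it accepts requests. This sketch polls the tracking server's /health endpoint until it answers HTTP 200 or a deadline passes. It assumes the default port 5000, matching the MLFLOW_TRACKING_URI set above; wait_for_mlflow is a hypothetical helper.

```python
import http.client
import time
import urllib.request


def wait_for_mlflow(base_url: str = "http://localhost:5000", timeout: float = 30.0,
                    interval: float = 0.5, request_timeout: float = 2.0) -> bool:
    """Poll the MLflow server's /health endpoint until it returns 200 or timeout expires."""
    url = f"{base_url.rstrip('/')}/health"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # Server not accepting requests yet; retry after a short sleep.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_mlflow() before publishing avoids connection errors from the publish script while the server is still starting.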
The publish_model_to_mlflow script publishes triton flavor models to MLflow. A triton flavor model is a directory containing the model files arranged according to the Triton model repository layout. Example usage:
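Before publishing, it can help to check that a model directory matches the layout Triton expects. The sketch below checks only two minimal assumptions: a config.pbtxt at the top level and at least one numeric version subdirectory (such as 1/). Triton can auto-complete configuration for some backends, so these rules are stricter than necessary in some cases; layout_problems is a hypothetical helper.

```python
from pathlib import Path


def layout_problems(model_dir: str) -> list[str]:
    """Return a list of layout problems; an empty list means the minimal checks pass."""
    path = Path(model_dir)
    if not path.is_dir():
        return [f"{model_dir} is not a directory"]
    problems = []
    if not (path / "config.pbtxt").is_file():
        problems.append("missing config.pbtxt")
    # Triton serves each model version from a numerically named subdirectory.
    versions = [p.name for p in path.iterdir() if p.is_dir() and p.name.isdigit()]
    if not versions:
        problems.append("no numeric version subdirectory (e.g. 1/)")
    return problems
```

For example, layout_problems("/triton_models/triton-model-repo/sid-minibert-onnx") is expected to return an empty list for the model published below.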
python publish_model_to_mlflow.py \
--model_name sid-minibert-onnx \
--model_directory /triton_models/triton-model-repo/sid-minibert-onnx \
--flavor triton
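The deployment commands below refer to the published model by a models:/<name>/<version> URI, and publishing the same name again registers a new version. This sketch looks up the highest registered version with MlflowClient.search_model_versions and builds the matching URI. The client is passed in (normally mlflow.tracking.MlflowClient()) so the function does not depend on a running server; latest_model_uri is a hypothetical helper.

```python
def latest_model_uri(client, model_name: str) -> str:
    """Return a models:/ URI for the highest registered version of model_name.

    client is expected to be an mlflow.tracking.MlflowClient, or any object
    with a compatible search_model_versions(filter_string) method.
    """
    versions = client.search_model_versions(f"name='{model_name}'")
    if not versions:
        raise LookupError(f"no registered versions of {model_name!r}")
    # Version numbers may be returned as strings, so compare them as integers.
    latest = max(int(mv.version) for mv in versions)
    return f"models:/{model_name}/{latest}"
```

After the first publish, latest_model_uri(MlflowClient(), "sid-minibert-onnx") should return "models:/sid-minibert-onnx/1", the URI used in the examples below.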
The Triton mlflow-triton-plugin is installed in this container and can be used to deploy your models from MLflow to Triton Inference Server. The following examples show how to use the plugin with the sid-minibert-onnx model published to MLflow above. For more information about the mlflow-triton-plugin, refer to Triton's documentation.
Each deployment operation below is shown first with the mlflow CLI and then with the equivalent Python API. To create a deployment:
mlflow deployments create -t triton --flavor triton --name sid-minibert-onnx -m "models:/sid-minibert-onnx/1"
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.create_deployment("sid-minibert-onnx", "models:/sid-minibert-onnx/1", flavor="triton")
To delete a deployment:
mlflow deployments delete -t triton --name sid-minibert-onnx
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.delete_deployment("sid-minibert-onnx")
To update a deployment:
mlflow deployments update -t triton --flavor triton --name sid-minibert-onnx -m "models:/sid-minibert-onnx/1"
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.update_deployment("sid-minibert-onnx", "models:/sid-minibert-onnx/1", flavor="triton")
To list all deployments:
mlflow deployments list -t triton
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.list_deployments()
To get the details of a single deployment:
mlflow deployments get -t triton --name sid-minibert-onnx
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.get_deployment("sid-minibert-onnx")