
Deploying multiple pre-trained models (tar.gz files) on SageMaker in a single endpoint

Data Science Asked by Subh2608 on May 1, 2021

We followed these steps:

  1. Trained 5 TensorFlow models on a local machine using 5 different training sets.
  2. Saved them in .h5 format.
  3. Converted them into tar.gz archives (Model1.tar.gz, …, Model5.tar.gz) and uploaded them to an S3 bucket.
  4. Successfully deployed a single model in an endpoint using the following code:
from sagemaker.tensorflow import TensorFlowModel

# Deploy one model from its tar.gz archive on S3
sagemaker_model = TensorFlowModel(
    model_data=tarS3Path + 'model{}.tar.gz'.format(1),
    role=role,
    framework_version='1.13',
    sagemaker_session=sagemaker_session)
predictor = sagemaker_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge')
predictor.predict(data.values[:, 0:])

The output was:
{'predictions': [[153.55], [79.8196], [45.2843]]}

The problem is that we cannot issue 5 separate deploy statements and create 5 different endpoints for the 5 models. To serve all of them from a single endpoint, we tried two approaches:

i) Used SageMaker's MultiDataModel

from sagemaker.multidatamodel import MultiDataModel

sagemaker_model1 = MultiDataModel(
    name="laneMultiModels",
    model_data_prefix=tarS3Path,
    model=sagemaker_model,  # the same sagemaker_model trained above
    # role=role, framework_version='1.13',
    sagemaker_session=sagemaker_session)
predictor = sagemaker_model1.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge')
predictor.predict(data.values[:, 0:], target_model='model{}.tar.gz'.format(1))

Here we got an error at the deploy stage:
An error occurred (ValidationException) when calling the CreateModel operation: Your Ecr Image 763104351884.dkr.ecr.us-east-2.amazonaws.com/tensorflow-inference:1.13-cpu does not contain required com.amazonaws.sagemaker.capabilities.multi-models=true Docker label(s).
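
The TensorFlow 1.13 inference image does not carry the multi-model Docker label, which is exactly what this error says; the TF 2.x serving images do (the answer below uses one successfully). Below is a minimal sketch, assuming the SageMaker Python SDK v2, of looking up a multi-model-capable image to use instead:

import sagemaker

# Look up a TF 2.x serving image for the region; these images carry the
# com.amazonaws.sagemaker.capabilities.multi-models=true label
image = sagemaker.image_uris.retrieve(
    framework='tensorflow',
    region='us-east-2',
    version='2.2.0',
    image_scope='inference',
    instance_type='ml.m4.xlarge')  # resolves the CPU variant of the image
print(image)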

ii) Created the endpoint manually

import boto3
import botocore
import sagemaker
sm_client = boto3.client('sagemaker')
image = sagemaker.image_uris.retrieve('knn', 'us-east-2')
container = {
    "Image": image,
    "ModelDataUrl": tarS3Path,
    "Mode": "MultiModel"
}
# Note: if we replace 'knn' with 'tensorflow' here, this call itself raises an error
response = sm_client.create_model(
              ModelName        = 'multiple-tar-models',
              ExecutionRoleArn = role,
              Containers       = [container])
response = sm_client.create_endpoint_config(
    EndpointConfigName = 'multiple-tar-models-endpointconfig',
    ProductionVariants=[{
        'InstanceType':        'ml.t2.medium',
        'InitialInstanceCount': 1,
        'InitialVariantWeight': 1,
        'ModelName':            'multiple-tar-models',
        'VariantName':          'AllTraffic'}])
response = sm_client.create_endpoint(
              EndpointName       = 'tarmodels-endpoint',
              EndpointConfigName = 'multiple-tar-models-endpointconfig')

The endpoint could not be created with this approach either.
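
Since create_endpoint is asynchronous, the call can succeed while the endpoint still ends up in a Failed state. A small diagnostic sketch that reads the status and failure reason for the endpoint created above:

import boto3

sm_client = boto3.client('sagemaker')
desc = sm_client.describe_endpoint(EndpointName='tarmodels-endpoint')
print(desc['EndpointStatus'])            # 'Creating', 'InService', or 'Failed'
print(desc.get('FailureReason', 'n/a'))  # populated when the status is 'Failed'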

One Answer

I had also been looking for an answer to this, and after several days of trying with a friend, we managed to do it. I've attached the code snippets we used; you can modify them for your use case.

import boto3
import json

image = '763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.2.0-cpu'
container = {
    'Image': image,
    'ModelDataUrl': model_data_location,  # S3 prefix that holds the model tar.gz files
    'Mode': 'MultiModel'
}

sagemaker_client = boto3.client('sagemaker')

# Create Model
response = sagemaker_client.create_model(
              ModelName = model_name,
              ExecutionRoleArn = role,
              Containers = [container])

# Create Endpoint Configuration
response = sagemaker_client.create_endpoint_config(
    EndpointConfigName = endpoint_configuration_name,
    ProductionVariants=[{
        'InstanceType': 'ml.t2.medium',
        'InitialInstanceCount': 1,
        'InitialVariantWeight': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic'}])

# Create Endpoint
response = sagemaker_client.create_endpoint(
              EndpointName = endpoint_name,
              EndpointConfigName = endpoint_configuration_name)
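
# Wait for the endpoint to come up (an optional sketch: endpoint creation is
# asynchronous, and boto3 ships an 'endpoint_in_service' waiter for the
# SageMaker client that blocks until the endpoint reaches InService)
waiter = sagemaker_client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)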

# Invoke Endpoint
sagemaker_runtime_client = boto3.client('sagemaker-runtime')

content_type = "application/json" # The MIME type of the input data in the request body.
accept = "application/json" # The desired MIME type of the inference in the response.
payload = json.dumps({"instances": [1.0, 2.0, 5.0]}) # Payload for inference.
target_model = 'model1.tar.gz'


response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name, 
    ContentType=content_type,
    Accept=accept,
    Body=payload,
    TargetModel=target_model,
)

response
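
The Body field in the response is a streaming object rather than a plain dict; a short sketch for decoding it into the usual predictions payload:

# response['Body'] is a botocore StreamingBody; read and decode it
result = json.loads(response['Body'].read().decode('utf-8'))
print(result)  # e.g. {'predictions': [...]}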

Also, make sure your model tar.gz files have this structure:

└── model1.tar.gz
    └── <version number>
        ├── saved_model.pb
        └── variables
            └── ...
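
Note that the question's models were saved as .h5, while the tensorflow-inference container expects TensorFlow SavedModel format under a numeric version directory. A repackaging sketch, assuming TensorFlow 2.x (the file names are illustrative):

import tarfile
import tensorflow as tf

# Convert a Keras .h5 model to SavedModel format under version directory "1"
model = tf.keras.models.load_model('model1.h5')
model.save('export/1')  # in TF 2.x this writes saved_model.pb and variables/

# Package it so the archive root is the version number, as shown above
with tarfile.open('model1.tar.gz', 'w:gz') as tar:
    tar.add('export/1', arcname='1')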

More info regarding this can be found in the AWS documentation on SageMaker multi-model endpoints.

Correct answer by Kevin Yauris on May 1, 2021
