
Deploying Models in a Simulated Edge Environment


Edge computing is growing in popularity and capability, bringing new ML and AI business opportunities to industries of all types. But what is “the edge”? Machine learning at the edge means running ML models locally, close to the source of the data, to minimize latency and network transport requirements.

However, ML at the edge runs into the same operational challenges as traditional cloud deployments: compute and operational efficiency, scale, flexibility across different workloads, and actionability (getting ahead of issues, tightening the feedback loop, and taking preventive and corrective measures in a timely manner). All of these are blockers to realizing value from ML.

The Wallaroo Edge stack helps overcome these last-mile issues by enabling deployment on-device, to local servers, and in cloud environments, using the same engine and providing the same advanced observability capabilities in each.

Deploying and managing ML models to edge devices

When it comes to deploying and managing ML models on edge devices, Wallaroo provides two key capabilities:

  1. Since the same engine is used in both environments, the model behavior can often be simulated accurately using Wallaroo in a data center for testing prior to deployment. This notebook demonstrates how.
  2. Wallaroo makes edge deployments “observable” so the same tools used to monitor model performance can be used in both kinds of deployments.

Testing an edge deployment tutorial

In the tutorial below, we will step through testing an edge deployment in the same manner as a non-edge configuration. The primary difference is that, instead of giving the pipeline ample resources for high-throughput operation, we will specify a resource budget matching what is expected in the final deployment. Then we can apply the expected load to the model and observe how it behaves given the available resources.

You can try this tutorial yourself by downloading the free Wallaroo Community Edition, going through the ML Edge Simulation tutorial, and following along with the tutorial video.

We will be using the open source Aloha CNN LSTM model, which classifies domain names as either legitimate or used for nefarious purposes such as malware distribution. A model like this could be deployed on a network router to detect suspicious domains in real time. Of course, it is important to monitor the behavior of the model across all of the deployments so we can see if the detection rate starts to drift over time.

For our example, we will perform the following:

  • Create a workspace for our work.
  • Upload the Aloha model.
  • Define a resource budget for our inference pipeline.
  • Create a pipeline that can ingest our submitted data, submit it to the model, and export the results.
  • Run a sample inference through our pipeline by loading a file.
  • Run a batch inference through our pipeline’s URL, store the results in a file, and find that the original memory allocation is too small.
  • Redeploy the pipeline with a larger memory budget and attempt sending the same batch of requests through again.

All sample data and models are available through the Wallaroo Quick Start Guide Samples repository.

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Note that this example is not intended for production use and is meant as an example of running Wallaroo in a constrained environment. The environment is based on the Wallaroo AWS EC2 Setup guide.

For full details on how to configure a deployment through the SDK, see the Wallaroo SDK guides.

Operating in a Simulated Edge Environment

Step 1: Connect to Wallaroo

Begin by connecting to Wallaroo through the Wallaroo client using the wallaroo.Client() command as shown below. The integrated Python library is available through the JupyterHub interface in your Wallaroo environment.

import wallaroo
from wallaroo.object import EntityNotFoundError
wl = wallaroo.Client()

On entering the command, a URL will be generated, providing SDK permission to your Wallaroo environment as appears below: 

Please log into the following URL in a web browser:
Login successful!

Copy the URL to your browser to confirm permissions, then save the connection as a variable for future reference. 

Step 2: Specify Variables

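Next, set the variables below and define two helper functions that fetch or create the workspace and the pipeline. These names are used in the following steps to create the workspace and pipeline and to upload your model.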

pipeline_name = 'edgepipelineexample'
workspace_name = 'edgeworkspaceexample'
model_name = 'alohamodel'
model_file_name = './'  # set to the path of the Aloha model .ZIP file

def get_workspace(name):
    # Return the workspace with this name, creating it if it does not exist.
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # Return the existing pipeline with this name, or build a new one.
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline

Step 3: Create and Set the Default Workspace

Run the following command to create or connect to a workspace, setting it as your current workspace: 

workspace = get_workspace(workspace_name)
wl.set_current_workspace(workspace)

You will receive the following response: 

{'name': 'edgeworkspaceexample', 'id': 2, 'archived': False, 'created_by': 'ac217b38-6f50-46fd-9c04-f790ffc5cb0e', 'created_at': '2022-10-13T17:10:35.150766+00:00', 'models': [], 'pipelines': []}

Step 4: Upload Models

The following command uploads your protobuf Aloha model from a .ZIP file and configures it to run as a tensorflow model:

model = wl.upload_model(model_name, model_file_name).configure("tensorflow")

Step 5: Allocate Resources

Next, use the DeploymentConfig object to allot resources to the model pipeline. The following command allocates a low budget of 1 CPU and 150 MiB of RAM (the "150Mi" in the command), which can be expanded after testing:

deployment_config = wallaroo.DeploymentConfigBuilder().replica_count(1).cpus(1).memory("150Mi").build()
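
To make the units in the budget concrete, the sketch below converts memory strings such as "150Mi" into bytes. It assumes the string follows the Kubernetes quantity convention, where Mi denotes mebibytes (1024 × 1024 bytes) and M denotes megabytes (1000 × 1000 bytes). The memory_to_bytes helper is hypothetical and is not part of the Wallaroo SDK.

```python
# Binary (power-of-two) and decimal suffixes, assuming Kubernetes-style quantities.
_UNITS = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
    "k": 1000, "M": 1000**2, "G": 1000**3,
}

def memory_to_bytes(quantity: str) -> int:
    """Convert a quantity such as "150Mi" or "300M" to a byte count."""
    # Check two-character suffixes first so "Mi" is never mistaken for "M".
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * _UNITS[suffix]
    return int(quantity)  # no suffix: a plain number of bytes

print(memory_to_bytes("150Mi"))  # 157286400
print(memory_to_bytes("150M"))   # 150000000
```

Under that assumption, "150Mi" is roughly 5% larger than 150 MB, which is worth keeping in mind when matching the simulated budget to the memory actually available on a target device.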

Step 6: Deploy the Model

You can now create a deployment for your model, stating the pipeline name and the deployment configuration. The command below creates a pipeline called edgepipelineexample that is deployed to ingest data and pass it to the Aloha model. Deployment should complete in about 45 seconds:

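pipeline = wl.build_pipeline(pipeline_name).add_model_step(model)  # attach the Aloha model as the pipeline's step
pipeline.deploy(deployment_config=deployment_config)  # deploy with the resource budget from Step 5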

If deployment fails with an error citing insufficient resources, run the following command to undeploy all running pipelines and free their resources, then redeploy this pipeline:

for p in wl.list_pipelines(): p.undeploy()

If the deployment succeeds, the results will appear as follows:

[Image: ML model deployment]

To confirm that your pipeline is active and to list the models connected to it, check its status:

pipeline.status()

The following response will be generated: 

[Image: machine learning model pipeline status]


Step 7: Infer 1 Row

With the pipeline deployed and the model in place, you can run a smoke test to confirm that it is functioning correctly. Use the pipeline's infer_from_file command to feed the inference engine a file containing a single encoded URL. The result indicates whether the URL is authentic (0) or fraudulent (1).


This test data should yield a result close to 0. The response will appear as below: 

[Image: machine learning model inferences]

Next, send the batch of 1,000 inferences in data-1k.json to the pipeline's URL with curl, saving the response to curl_response.txt:

!curl -X POST http://engine-lb.edgepipelineexample-1:29502/pipelines/edgepipelineexample -H "Content-Type:application/json" --data @data-1k.json > curl_response.txt
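
If you would rather stay in Python than shell out to curl, the same request can be sketched with the standard library. The URL, header, and file names below are taken from the curl command above; the function names build_inference_request and run_batch are hypothetical, and this is an illustrative equivalent rather than part of the Wallaroo SDK.

```python
import urllib.error
import urllib.request

# URL copied from the curl command in this step.
PIPELINE_URL = "http://engine-lb.edgepipelineexample-1:29502/pipelines/edgepipelineexample"

def build_inference_request(payload: bytes, url: str = PIPELINE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl command issues."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def run_batch(input_path: str = "data-1k.json", output_path: str = "curl_response.txt") -> None:
    """Send the batch file and save the raw response body to output_path."""
    with open(input_path, "rb") as f:
        request = build_inference_request(f.read())
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
    except urllib.error.HTTPError as err:
        # curl writes the body even for an HTTP error status, so do the same here.
        body = err.read()
    with open(output_path, "wb") as out:
        out.write(body)
```

Calling run_batch() from a notebook cell inside the Wallaroo environment should leave curl_response.txt in the same state as the curl command, including when the engine returns an error body.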

The response body is written to curl_response.txt rather than displayed in the notebook.

Step 8: Redeploy with a Larger Budget

Although one inference passed, the inference of a larger batch failed, as indicated by the error message in the curl_response.txt file: 

  • “upstream connect error or disconnect/reset before headers. reset reason: connection termination”
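
A quick way to tell a failed batch from a successful one is to check whether the saved response parses as JSON. The sketch below assumes that a successful response is a JSON document, while a failure like the one above is plain text; batch_succeeded is a hypothetical helper, and the JSON string in the second call is a placeholder rather than real pipeline output.

```python
import json

def batch_succeeded(response_text: str) -> bool:
    """Return True if the saved response parses as JSON, False otherwise."""
    try:
        json.loads(response_text)
    except json.JSONDecodeError:
        return False
    return True

# The error text reported in curl_response.txt is not valid JSON.
error_text = (
    "upstream connect error or disconnect/reset before headers. "
    "reset reason: connection termination"
)
print(batch_succeeded(error_text))          # False
print(batch_succeeded('[{"example": 1}]'))  # True (placeholder JSON)
```

In practice you would pass in the contents of curl_response.txt, for example open("curl_response.txt").read(). Note that this check only distinguishes plain-text transport errors from JSON; an error reported as JSON would still need to be inspected.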

To give the pipeline more headroom, use the DeploymentConfig command below to allocate 300 MiB of memory instead of the initial 150 MiB:

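deployment_config = wallaroo.DeploymentConfigBuilder().replica_count(1).cpus(1).memory("300Mi").build()
pipeline = wl.build_pipeline(pipeline_name).add_model_step(model)
pipeline.deploy(deployment_config=deployment_config)  # redeploy with the larger memory budget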

The response will look like this: 

Step 9: Re-run the inference

If you re-run the curl command, the curl_response.txt file will now contain the appropriate results:

!curl -X POST http://engine-lb.edgepipelineexample-1:29502/pipelines/edgepipelineexample -H "Content-Type:application/json" --data @data-1k.json > curl_response.txt
[Image: re-run the inference]

Note: The larger memory budget was required only because a batch of 1,000 inferences had to be run simultaneously. For a use case with lighter load patterns, Wallaroo lets you test smaller memory budgets so you can find one that leaves an adequate operational buffer without over-allocating resources.
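
One way to approximate a lighter load pattern is to split the batch into smaller requests before sending it. The sketch below shows only the splitting step and assumes the batch can be treated as a sequence of independent rows; the actual layout of data-1k.json is not described here, and split_batch is a hypothetical helper.

```python
from typing import Iterator, List, Sequence

def split_batch(rows: Sequence, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of rows, each holding at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])

# Stand-in data: 1,000 placeholder rows instead of real encoded domains.
rows = list(range(1000))
batches = list(split_batch(rows, 250))
print(len(batches), len(batches[0]))  # 4 250
```

Sending each smaller batch as its own request, while keeping the pipeline at a given memory budget, gives a rough picture of how much load that budget can absorb.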

Step 10: Undeploy Pipeline

Finish up by entering the following command to undeploy your pipeline and return its Kubernetes resources for other tasks:

pipeline.undeploy()

If you redeploy later without updating the deployment_config variable, pipeline.deploy() will restart the inference engine with the previous configuration.

The following response will be generated: 

You can effectively deploy models in a simulated edge environment by following the above steps. For more information or assistance, please visit the Wallaroo documentation site.
