Efficient ML Model Deployment Anywhere with One Line of Python

January 20, 2022

Wallaroo’s Big Idea for Efficient ML Model Deployment

Data scientists and ML engineers need to deploy ML models to production, and this is currently much more challenging and manual than it should be, sometimes taking days or even months. At Wallaroo, we’re focused on simplifying and automating that task so models can be deployed at scale in a few seconds.

We had a big idea: what if we could give data scientists a “big green button” for production AI? 

Our first version of this “big green button” is a single, simple line of Python that data scientists can use to deploy ML models into their cloud, on-premises, or edge environment running the Wallaroo platform. Essentially, one-click ML model deployment.

Deploy a straightforward linear regression model to a single CPU, or a complex neural network model to hundreds of CPUs in a cluster, using the same Python method. And once deployed, inference performance is blazingly fast, every result is captured in a complete audit trail, and you can update and redeploy a model with no downtime.
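As a small illustration, here’s roughly what that single line might look like, assuming you already have a client handle (wl here) and an uploaded model; treat the exact method name as illustrative rather than the definitive API:

    # Hypothetical sketch: deploy an already-uploaded model with one line.
    # "wl" is a Wallaroo client handle; exact method names may differ.
    deployment = wl.deploy("ccfraud-deploy", model)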

Now, there are other solutions that provide a Python SDK for deploying models, such as AWS SageMaker. The difference with Wallaroo is that:

  1. We don’t care where the models were trained
  2. You can use the same SDK to deploy to a variety of environments
  3. You get far better production inference performance than any other solution 
  4. Wallaroo is hyper-focused on being as simple and powerful as possible 

You can read more about our high-performance compute engine here, and our vision for fitting within diverse ecosystems here.

This post will introduce the essential parts of our SDK that allow a data scientist to deploy a model, run some inferences, and get a complete audit log back simply and rapidly. We’ll also preview some of the more powerful features, such as creating, deploying, and analyzing an experiment.

Wallaroo Core Concepts 

Before introducing the Wallaroo Python SDK, it’s helpful to clarify some core terms:

Model – A Model is a binary artifact that results from training a machine learning algorithm. A Model can be run to generate Inferences.

Inference (aka “prediction”) – A Model’s output, returned to external business systems after the Model is fed some input. For example, a credit card fraud model will accept the details of a transaction as input and return a floating-point number between 0.0 and 1.0 denoting the probability of fraud as its inference.

Pipeline – A linear chain of one or more models, where the output of one model becomes the input of the next; the output of the last model is the inference. (Arbitrary Python code can also be included as a Pipeline step, for tasks like data reformatting or validation.) A Pipeline can be turned on (“live”) and made ready to do inferencing, or left dormant, consuming no resources until it is needed.

Experiment – A specific type of pipeline that allows for the comparison of multiple models, enabling A/B testing and experiments. For example, a Champion/Challenger experiment might route 90% of inference requests to the Champion and 10% of requests to a set of experimental Challenger models. More complex setups such as multi-armed bandits are also possible.

Operational Metrics – Wallaroo provides real-time monitoring of basic operational model and pipeline properties such as throughput and latency. 

Model Insights – Higher-order semantic metrics that describe the business accuracy of your models, such as data drift and anomaly detection: statistical measurements of whether inputs and outputs, respectively, are deviating from historical baselines.

Audit Log – Wallaroo logs all inference results, along with the input data that led to the inference. Metadata in the log notes which model, model version, user ID, and experiment path were used. This information can be exported to existing business intelligence (BI) systems for further analysis. 

Wallaroo Engine – A unit of compute power (e.g., 4 CPUs) running the Wallaroo software. There can be multiple engines running concurrently on a VM or hardware server. 

Wallaroo Instance – One or more engines, plus the support systems necessary to do inferencing. An Instance contains models, pipelines, deployments, and the resulting audit and metrics logs. 

(Conceptual diagram: a Wallaroo instance running in a customer environment.)

Wallaroo Python SDK Quickstart

A data scientist can get a handle to Wallaroo, deploy a model, run some inferencing, and get full audit logs back with some basic Python commands. 
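For instance, a minimal end-to-end sketch might look like the following. The method names here reflect the SDK’s style, but treat the exact signatures as illustrative and consult the docs for the current API:

    import wallaroo

    # Get a handle to the Wallaroo instance.
    wl = wallaroo.Client()

    # Upload a trained model artifact (e.g., an ONNX file).
    model = wl.upload_model("ccfraud-model", "ccfraud.onnx")

    # Deploy it -- the "big green button".
    deployment = wl.deploy("ccfraud-deploy", model)

    # Run an inference on a sample transaction.
    result = deployment.infer_from_file("sample_transaction.json")

    # Pull back the audit log of recent inferences.
    logs = deployment.logs(limit=100)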

The SDK also lets you work directly with the models, deployments, and audit logs within the Wallaroo instance.

Wallaroo Python SDK Advanced Usage Example 

Of course, once you can easily deploy models, you naturally want to do additional tasks, such as validating the models on some test data, chaining multiple models together into a pipeline, or creating and running A/B tests. So, we decided to extend that “big green button” to a full Python SDK. That means data scientists can deploy complex pipelines, perform experiments, and analyze results, all using a few powerful lines of Python (here we give just a small taste of the power and flexibility of this approach).

Example: Credit Card Fraud Model

We’re going to create an experiment involving a credit card fraud model. 

Credit card fraud is an unavoidable reality in the modern age of online shopping, and machine learning is quickly becoming a fundamental approach to preventing fraud and improving e-commerce.

The input data for this model comes from an actual data set in which the meaning of each field has been obscured. That’s perfectly fine for our machine learning algorithm, since the model only cares about which values are signals for fraud; in practice, you would expect the fields to be things like past purchase amounts, credit scores, or zip codes.

The model’s output is a score predicting the likelihood that the given transaction is fraudulent.

Before we build the experiment, we need to upload the models to Wallaroo as a step separate from creating the deployment. That’s because our experiment involves three model versions deployed in one pipeline (there is a configure step here that we will explain in a future tutorial). Let’s say we’ve already trained these three versions.
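A sketch of that upload step might look like this; the model and file names are hypothetical, and configure() is the step we’ll cover in that future tutorial:

    # Upload the three trained versions of the fraud model.
    # Names and file paths are hypothetical.
    default_model  = wl.upload_model("ccfraud-default", "ccfraud_default.onnx").configure()
    gold_model     = wl.upload_model("ccfraud-gold", "ccfraud_gold.onnx").configure()
    platinum_model = wl.upload_model("ccfraud-platinum", "ccfraud_platinum.onnx").configure()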

Wallaroo supports a few different types of experiments. Here, we will give an example of a “Value Split” experiment: based on the card_type field in each transaction, the pipeline routes the request to one of the models for inference.
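Building and deploying that experiment could look something like the sketch below; the key-split step and its parameters are illustrative, not the definitive API:

    # Build a pipeline that splits on the card_type metadata field.
    # The step name and parameters below are illustrative.
    pipeline = wl.build_pipeline("ccfraud-experiment")
    pipeline.add_key_split(
        default=default_model,   # used when card_type matches nothing below
        meta_key="card_type",
        options={"gold": gold_model, "platinum": platinum_model},
    )
    pipeline.deploy()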

Once we have the pipeline deployed, it can run inference on incoming transactions. We’re not going to show a lot of detail here; that’s for another post. Let’s get the logs for the last hour.
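A sketch of pulling the last hour of audit logs, with illustrative parameter names:

    from datetime import datetime, timedelta, timezone

    # Fetch audit log entries for the last hour.
    # The parameter names here are illustrative.
    now = datetime.now(timezone.utc)
    logs = pipeline.logs(start_datetime=now - timedelta(hours=1), end_datetime=now)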

We can turn this into a Pandas dataframe and run all sorts of interesting analytics.
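For example, assuming each log entry exposes its fields as a dict (the to_dict() accessor here is hypothetical), we might look at high-risk transactions per card type:

    import pandas as pd

    # Convert log entries to a DataFrame; to_dict() is a hypothetical accessor.
    df = pd.DataFrame([entry.to_dict() for entry in logs])

    # Example analysis: count of high-risk transactions by card type,
    # assuming the log records the fraud score and the card_type input field.
    high_risk = df[df["fraud_score"] > 0.95]
    print(high_risk.groupby("card_type").size())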

Learn More

In this post, we demonstrated how to use Wallaroo’s Python SDK to deploy a machine learning model quickly and easily. We’re incredibly excited about our Python SDK, and we’re improving and extending it quickly as we get feedback from users.

If you want to explore this further with us, whether that’s getting access to the full SDK, giving us specific feedback about functionality/semantics/integration, or discussing how it can help with your use case, email us at deployML@wallaroo.ai.

Read our docs