Monitoring ML models for drift through Model Insights in Wallaroo

Productionizing machine learning models involves a significant investment of time, effort, and resources for a business. But the work is not done once the model is in production: to keep delivering value to the business, the model’s predictions must stay accurate.

The conditions that existed when the model was created, trained and tested can change over time due to various factors, whether they are controlled or uncontrolled. These factors could be unexpected external market changes such as interest rates in the real estate industry, sensor issues in IoT deployments, or seasonality, whether in consumer spending habits in the retail industry or in environmental conditions such as temperature and humidity.

In the ML space, this change in the data a model sees is known as model drift, and it leads to degradation of model accuracy and performance over time. So after spending all this time preparing our models and deploying them to production, we find that we must watch over them to ensure they continue to be accurate and provide value to the business. How do we monitor our models for this drift?

Introducing Wallaroo’s Model Monitoring with Assays

In Wallaroo you can monitor your models for drift and accuracy through the model monitoring and insight capability called Assays. Assays help you track how the environment that your model operates within changes in ways that affect the model’s outcome. They do this by tracking a model’s predictions, or the data coming into the model, against an established baseline. Changes in the distribution of this data can be an indication of model drift, or of a change in the environment that the model was trained in. This can indicate whether a model needs to be retrained or the environment data needs to be analyzed for a change in state. Having this information allows you to intervene (for example, by retraining a model) in an efficient and timely manner.

One way to identify drift is to compare model predictions with actual outcomes. This is not always possible, or the outcomes may arrive with too much of a lag. For example, suppose you have trained a model to detect fraudulent transactions; once this model rejects a transaction, there’s no way to tell if that transaction was actually fraud or if the model is drifting. If your model is rejecting too many good transactions, you might not find out until the complaints start rolling in, which is far too late.

You may be wondering: why should we do this in the production platform? Why can’t this be done offline by a Data Scientist? Doing this task in production shortens the delay between when things happen and when you are informed about them. You can monitor performance in real time rather than having to spend resources to take the model out of production. This is especially helpful when there are a lot of models and they are all running in production. Instead of a data scientist manually logging in to run a benchmark, the Assay will automatically show the drift, alerting the data scientist that they need to take a look. From there they can check if there are changes in parameters or if the model needs to be retrained.

Advanced Model Observability Features in Wallaroo

Now we will take a quick look at how model observability works in the Wallaroo platform, using examples from the Model Insights Tutorial. If you would prefer to watch a video first, we have a 3-part series on this topic: Model Insights.

Alright, back to the tutorial. In Wallaroo the Data Scientist can perform model observability through the SDK, UI, or API. The first step is to establish a baseline from the inferences against which the data drift can be compared. For example, suppose we have a model that predicts house prices in a certain market. We want to establish the model’s typical behavior, so we will observe the distribution of predictions that the model makes over a specific period of time (say, 24 hours). In the graph below, we can see that this distribution ranges from around $126-$130k over our period of observation. We’ll use this “typical” distribution of predictions as a baseline to compare against future model behavior.

ML model predictions distributions graph

Before you build the Assay you have an opportunity to define and preview it in the tool. In the example above we started with Jan 1-2 to establish an acceptable baseline.

Comparing the baseline to future distributions involves defining a binning scheme using the baseline data, then binning future data according to that baseline binning scheme, and calculating how the resulting distribution compares to the baseline distribution.

The default binning scheme is quantile based, with five bins. This means the bins are defined so that each bin holds an equal share of the baseline data: 20%. You have the flexibility in the tool to edit the number of bins to align with your needs. We can see from the January 6th image below that there is no drift, and from the following image for January 21st that drift has occurred.

January 6th bin distribution showing no drift
January 21st bin distribution showing drift
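
The quantile binning described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not Wallaroo’s implementation: the function names (quantile_bin_edges, bin_fractions) are hypothetical, and the predictions are synthetic values drawn to resemble the house-price example.

```python
import numpy as np

def quantile_bin_edges(baseline, num_bins=5):
    """Interior bin edges chosen so each bin holds ~1/num_bins of the baseline."""
    probs = np.linspace(0, 1, num_bins + 1)[1:-1]  # e.g. 0.2, 0.4, 0.6, 0.8
    return np.quantile(baseline, probs)

def bin_fractions(values, edges):
    """Fraction of values falling in each bin; the two outer bins are open-ended."""
    idx = np.searchsorted(edges, values, side="right")
    counts = np.bincount(idx, minlength=len(edges) + 1)
    return counts / counts.sum()

# Synthetic stand-in for one baseline period of house-price predictions.
rng = np.random.default_rng(0)
baseline = rng.normal(128_000, 1_000, size=1_000)

edges = quantile_bin_edges(baseline)           # defined once, from the baseline
baseline_fracs = bin_fractions(baseline, edges)

# A later day whose predictions have shifted upward (assumed shift, for illustration).
shifted_fracs = bin_fractions(baseline + 1_500, edges)

print("baseline:", np.round(baseline_fracs, 3))
print("shifted: ", np.round(shifted_fracs, 3))
```

The key design point is that the edges come from the baseline only. Future data is counted into those same bins, so a shift in the predictions shows up as bins that no longer hold roughly 20% each.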

Next we are going to extend the view for a week after the baseline period. From the resulting output we can see that the distribution of predictions changes from day to day.

Assay one week model distribution view

Each dot represents how the daily distribution of predictions differs from the baseline.

The metric we calculate here is the Population Stability Index (PSI). This is a data science model monitoring metric that helps measure how a distribution changes over time; as a rule of thumb, a PSI measurement over 0.1 indicates an appreciable change of the distribution, compared to the baseline distribution.
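
For readers who want to see the arithmetic, here is a minimal sketch of PSI using its commonly cited definition: the sum over bins of (actual - expected) * ln(actual / expected), where each term uses the fraction of data in that bin. The function name psi, the eps guard for empty bins, and the example fractions are assumptions made for illustration; the exact computation inside Wallaroo may differ.

```python
import numpy as np

def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Population Stability Index between two binned distributions.

    Both inputs are per-bin fractions over the same bins. Fractions are
    clipped at eps (an assumed guard) so empty bins do not produce log(0).
    """
    e = np.clip(np.asarray(expected_fracs, dtype=float), eps, None)
    a = np.clip(np.asarray(actual_fracs, dtype=float), eps, None)
    return float(np.sum((a - e) * np.log(a / e)))

# Five quantile bins: the baseline holds 20% in each by construction.
baseline_fracs = [0.2, 0.2, 0.2, 0.2, 0.2]
stable_day = [0.2, 0.2, 0.2, 0.2, 0.2]          # illustrative values
drifted_day = [0.05, 0.10, 0.15, 0.30, 0.40]    # illustrative values

print("stable day PSI: ", psi(baseline_fracs, stable_day))
print("drifted day PSI:", psi(baseline_fracs, drifted_day))
```

An unchanged distribution scores 0, and every bin that gains or loses share adds a positive term, so the drifted day lands well above the 0.1 rule of thumb.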

You can set the length of the observation window (how long observations are taken to define the distribution), as well as the frequency of measurement (how often you compare the current distribution to the baseline) to suit your needs. Once this is done you can preview the Assay and then hit Build. In the example below we can see that there is drift in the latter part of the month, which we can investigate and act on without taking the model out of production.

Assay one week model distribution view
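
To make the window and frequency settings concrete, the sketch below scores a stream of predictions one window at a time against a fixed baseline and flags windows whose PSI exceeds 0.1. Everything here is assumed for illustration: the names (psi_against_baseline, score_windows, width_h, interval_h), the synthetic data, and the simulated shift on day 4. It does not use or mirror the Wallaroo SDK.

```python
import numpy as np

def psi_against_baseline(baseline, window, num_bins=5, eps=1e-6):
    """PSI of one window versus the baseline, using baseline quantile bins."""
    edges = np.quantile(baseline, np.linspace(0, 1, num_bins + 1)[1:-1])

    def fracs(values):
        idx = np.searchsorted(edges, values, side="right")
        counts = np.bincount(idx, minlength=num_bins)
        return np.clip(counts / counts.sum(), eps, None)  # avoid log(0)

    e, a = fracs(baseline), fracs(window)
    return float(np.sum((a - e) * np.log(a / e)))

def score_windows(hours, predictions, baseline, end_h,
                  width_h=24, interval_h=24, alert=0.1):
    """Return (window_start_hour, psi, drift_flag) for each window."""
    results = []
    for start in np.arange(0, end_h, interval_h):
        mask = (hours >= start) & (hours < start + width_h)
        if mask.any():  # skip windows with no inferences
            score = psi_against_baseline(baseline, predictions[mask])
            results.append((float(start), score, score > alert))
    return results

# Synthetic week: 500 predictions per day, with an upward shift from day 4 on.
rng = np.random.default_rng(1)
baseline = rng.normal(128_000, 1_000, size=1_000)
days = np.repeat(np.arange(7), 500)
hours = days * 24 + rng.uniform(0, 24, size=days.size)
predictions = rng.normal(128_000, 1_000, size=days.size) + np.where(days >= 4, 1_500, 0)

results = score_windows(hours, predictions, baseline, end_h=7 * 24)
for start, score, drifted in results:
    print(f"day {int(start // 24)}: PSI={score:.3f} drift={drifted}")
```

Here width_h plays the role of the observation window and interval_h the measurement frequency. Setting interval_h smaller than width_h would give overlapping windows that react sooner, at the cost of more comparisons.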

If you would like to learn more about model monitoring in production and Assays you can watch this short video series and also practice yourself using the Model Insights Tutorial in the Free Community Edition at the links below.

  1. Free Community Edition
  2. Model Insights Video Series
  3. Model Insights Tutorial
