Redefining MLOps as a value-driven approach to achieving the last mile of ML

June 8, 2022

As companies move toward data-driven decision making, and with the emergence of Big Data, Machine Learning (ML), and Artificial Intelligence (AI), Machine Learning Operations (MLOps) has emerged to help make data insights actionable in the real world. In other words, the mission of MLOps is to deliver AI that creates value for the business by operationalizing machine learning.

This goal implies that teams of data engineers, data scientists, and ML engineers collaborate, using appropriate tools and processes to automate and integrate the insights from machine learning within core business systems and operations to drive strategic outcomes. For example, to drive growth and sales, retailers use historical consumer behavior insights to develop machine learning models that are integrated into their ecommerce platforms to tailor shopping experiences to their target consumer segments. Similarly, manufacturers are relying more and more on machine learning to optimize their plant operations. A common use case is predictive maintenance, where ML is used to predict when a piece of equipment will need to be replaced or repaired, reducing manufacturing downtime.

When we look at how MLOps is managed today as a discipline, data scientists are tasked with developing and training models leveraging a variety of technologies and frameworks, while ML engineers are tasked with taking the models created by the data scientists and making them operational and actionable in “production”. This comes from the assumption that there is a handoff between data science and engineering when it comes to “productionizing” models. 

However, while many data scientists may be happy to hand their models over to the ML engineering team for a production rollout, this process, as currently implemented, tends to create technological and cultural silos that are hard to break down as teams and complexity grow. It also leaves ownership of the end product unclear.

How is MLOps applied today? Data load & Prep > Model training > Model deployment > Model monitoring

As a result, when the models start misbehaving in production, it can often be a significant effort to diagnose the issue:

  • Is it an error in the production stack that caused the model’s performance to drop?
  • Or is something wrong with the data or the model itself?

This can lead to communication and coordination bottlenecks, as data scientists often do not have access to the production environment, and the troubleshooting tools available to them are not easy for a data scientist to use.

With that in mind, it is important to take a step back and revisit the real goal of MLOps for any organization, which goes beyond the mechanics and tactics of “productionizing” ML models. By focusing on measuring the impact of ML on business outcomes through MLOps, we transition from a task-driven approach to a value-driven approach, which helps deliver the last mile of ML to the business and realize the ROI from ML.

The last mile of ML is focused on closing the gap between insight creation (that is, the trained model) and the value or ROI it creates. In other words, the last mile of ML facilitates the transition from experimental AI to industrialized AI.

Machine learning insight creation & value realization

Today, with the exception of a few companies, most MLOps efforts are focused on the insight creation part, which explains the variety of solutions and technologies available for data prep, model training, and, to a certain extent, model deployment. Value realization from ML, which revolves around making AI-enabled decisions and easily measuring the impact of AI in order to take the necessary actions, receives far less attention. This could be attributed to an existing perception that practices currently applied for standard software deployment, or DevOps, can be repurposed for machine learning.

This perception creates several challenges:

  • Scale: How do you scale your current tools/infrastructure to deploy and serve thousands of complex models across different data modalities?
  • Operational efficiency: As you scale your models and data, does your MLOps team need to grow exponentially too?
  • Repeatability: How do you find consistent processes and methods that allow you to get value out of your ML at scale? 
  • Actionability: As you scale your Machine Learning operations, do you have the necessary visibility to be able to take corrective and preventative measures in a timely fashion?

At Wallaroo, our mission is to help our customers realize the value of their investments in AI and Machine Learning. This is why we are constantly investing in building a technology platform that streamlines the last mile of the ML journey. Because we know what it takes to industrialize ML, we have come up with a purpose-built platform that allows MLOps teams to deploy, manage, observe and optimize their ML models in production at scale, in a repeatable manner, and with optimal efficiency. 

Wallaroo machine learning tech stack

With that in mind, we are not looking to dictate a new way of doing machine learning end-to-end, but rather integrate seamlessly with the tools and ecosystems our customers are using to get their ML over the finish line and be able to scale it efficiently to drive meaningful ROI for their business.  

The Wallaroo platform consists of three integrated components:

Three components of the Wallaroo machine learning platform: self-service toolkit, blazingly fast compute engine, advanced observability
  • Self-service toolkit: This is the component that enables data scientists to upload, deploy, and manage their ML models in the Wallaroo platform. It comes with an SDK, UI, and API to allow a seamless transition from training to production.
  • Distributed computing engine: Wallaroo’s performance comes from its purpose-built inference engine that supports distributed computing. The Wallaroo engine can analyze up to 100K events per second on a single server vs. the industry average of 5K events per second. On average, our customers see complex deep learning models run 5X – 12.5X faster using 80% less infrastructure compared to their previous deployments.
  • Observability: A key requirement for achieving the last mile of ML is the ability to measure the impact of ML and take the appropriate actions. Wallaroo’s advanced observability gives data scientists and ML engineers the model monitoring and explainability insights they need to know exactly what is happening and address it in a timely fashion.

What does the last mile of ML look like in Wallaroo?

We envisioned a platform where data scientists and ML engineers collaborate effectively to launch their models into production faster and to scale the deployment, management, and observability of ML in order to deliver actionable AI. To that end, we have focused the product experience in Wallaroo on four key elements: model deployment, model management, model observability, and model optimization.

Last mile of ML in Wallaroo: model deployment > model management > model observability > model optimization

Model deployment is centered around the following capabilities:

  • Model upload: This consists of using the Wallaroo self-service toolkit to convert models from any framework into an open format that Wallaroo uses to run the models. This can be done via the UI, SDK, or API, depending on where the model artifacts are managed after training.
  • Deployment pipelines: This is a core concept in Wallaroo, and it is where the models run to produce single or batch inferences. For example, if you have chained models producing a single output, Wallaroo pipelines simplify the deployment of these models and their integration with the systems that consume their outputs, using a single cohesive interface and no overhead.
  • Autoscaling: Each deployment pipeline can scale its resource utilization up or down to ensure optimal performance, both for the downstream systems consuming model outputs and for the data scientists and ML engineers looking to measure the performance of their deployments.

Basic Wallaroo machine learning model pipeline

Model Deployment in Wallaroo
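
To make the idea of chained models behind a single interface more concrete, here is a minimal sketch in plain Python. It does not use the Wallaroo SDK: the `Pipeline` class, its `add_step`, `infer`, and `infer_batch` methods, and the two toy step functions are hypothetical names chosen purely for illustration.

```python
from typing import Callable, Dict, List

# A step takes one record and returns an enriched record.
Record = Dict[str, float]
Step = Callable[[Record], Record]


class Pipeline:
    """Hypothetical stand-in for a deployment pipeline: an ordered chain of steps."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.steps: List[Step] = []

    def add_step(self, step: Step) -> "Pipeline":
        self.steps.append(step)
        return self  # returning self allows calls to be chained

    def infer(self, record: Record) -> Record:
        # Single inference: feed the output of each step into the next one.
        out = dict(record)
        for step in self.steps:
            out = step(out)
        return out

    def infer_batch(self, records: List[Record]) -> List[Record]:
        # Batch inference: the same chain applied to every record.
        return [self.infer(r) for r in records]


def scale_features(record: Record) -> Record:
    # Toy preprocessing step (assumption): express spend in thousands.
    return {**record, "spend_k": record["spend"] / 1000.0}


def score_churn(record: Record) -> Record:
    # Toy scoring step (assumption): a hand-written rule, not a trained model.
    score = 0.8 - 0.1 * record["spend_k"]
    return {**record, "churn_score": max(0.0, min(1.0, score))}


pipeline = Pipeline("churn-demo").add_step(scale_features).add_step(score_churn)
single = pipeline.infer({"spend": 3000.0})
batch = pipeline.infer_batch([{"spend": 1000.0}, {"spend": 9000.0}])
```

The consuming system only ever calls `infer` or `infer_batch`; how many models sit behind that call is an internal detail of the pipeline.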

As part of Model Management in Wallaroo, we have introduced the concept of workspaces, a supercharged version of the typical model registry offered by other deployment solutions, to help data scientists and ML engineers collaborate effectively on deployments. Workspaces also help MLOps teams as they scale their deployments by adding more context and security around their model rollouts. For example, our customers use this concept to organize their rollouts by market, region, location, or even by team, depending on the use case. As part of managing their models in production at scale, data scientists can try different rollout strategies, such as A/B testing, canary deploys, shadow deploys, and blue/green deploys, to identify the best models to run. Once the best model is identified, replacing an existing model with a better-performing one can take place in seconds with a hot-swap and without any interruption.

Model management in Wallaroo
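
As a rough illustration of two of the rollout strategies mentioned above, the sketch below shows a canary split and a shadow deploy in plain Python. It is not Wallaroo's implementation: `champion`, `challenger`, `bucket`, `canary_route`, and `shadow_route` are hypothetical names, and the two "models" are toy functions.

```python
import hashlib
from typing import Dict, List, Tuple

Record = Dict[str, float]


def champion(record: Record) -> float:
    # Toy stand-in for the model currently in production.
    return 0.2 * record["x"]


def challenger(record: Record) -> float:
    # Toy stand-in for the candidate model being evaluated.
    return 0.25 * record["x"]


def bucket(request_id: str, buckets: int = 100) -> int:
    """Map a request ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def canary_route(request_id: str, record: Record, canary_percent: int) -> Tuple[str, float]:
    """Canary deploy: send roughly canary_percent of requests to the challenger.

    Hashing the request ID keeps routing deterministic, so the same request
    always reaches the same model.
    """
    if bucket(request_id) < canary_percent:
        return "challenger", challenger(record)
    return "champion", champion(record)


def shadow_route(record: Record, shadow_log: List[Tuple[float, float]]) -> float:
    """Shadow deploy: serve the champion, log the challenger for offline comparison."""
    served = champion(record)
    shadow_log.append((served, challenger(record)))
    return served


shadow_log: List[Tuple[float, float]] = []
served = shadow_route({"x": 10.0}, shadow_log)
routed_model, routed_value = canary_route("req-42", {"x": 10.0}, canary_percent=10)
```

In the shadow case the challenger never affects what the caller receives, which makes it a low-risk way to compare models on live traffic before promoting one.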

When it comes to Model Observability in Wallaroo, it is important to make sure that deployment pipelines are healthy and meeting the throughput and latency requirements that were originally defined. Having access to these metrics provides insights into any model or infrastructure actions that may need to be taken.
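
A check of this kind can be sketched in a few lines of Python. This is an illustrative example and not the way Wallaroo computes its metrics: the function names, the nearest-rank percentile method, and the thresholds are all assumptions.

```python
import math
from typing import Dict, List, Union


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile for pct in (0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]


def check_pipeline_health(
    latencies_ms: List[float],
    window_seconds: float,
    max_p95_ms: float,
    min_throughput_per_s: float,
) -> Dict[str, Union[float, bool]]:
    """Compare observed latency and throughput with the agreed requirements."""
    p95 = percentile(latencies_ms, 95)
    throughput = len(latencies_ms) / window_seconds
    return {
        "p95_ms": p95,
        "throughput_per_s": throughput,
        "latency_ok": p95 <= max_p95_ms,
        "throughput_ok": throughput >= min_throughput_per_s,
    }


# Hypothetical window: 100 inferences observed over 10 seconds.
observed = [float(ms) for ms in range(1, 101)]
health = check_pipeline_health(
    observed, window_seconds=10.0, max_p95_ms=100.0, min_throughput_per_s=20.0
)
```

In this made-up window the latency requirement is met but the throughput requirement is not, which points toward an infrastructure action (more capacity) rather than a model change.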

At the same time, data scientists can configure a set of validation checks and alerts using Wallaroo assays to monitor model and data drift. Additionally, data scientists can generate local and global explainability reports against their deployed models to understand which attributes contributed to a certain prediction or set of predictions.

Model observability in Wallaroo
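
To give a sense of what a data drift check involves, here is a minimal sketch that compares a baseline window of a feature with a current window using the Population Stability Index (PSI). This is a generic technique swapped in for illustration and is not a description of how Wallaroo assays work. The function names, the bin edges, and the 0.2 alert threshold (a commonly quoted rule of thumb) are assumptions.

```python
import bisect
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values per bin; values outside the edges fall into the end bins."""
    if not values:
        raise ValueError("values must not be empty")
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
        counts[idx] += 1
    return [c / len(values) for c in counts]


def psi(baseline: Sequence[float], current: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two samples over shared bins."""
    total = 0.0
    for b, c in zip(bin_fractions(baseline, edges), bin_fractions(current, edges)):
        b, c = max(b, eps), max(c, eps)  # avoid log(0) for empty bins
        total += (c - b) * math.log(c / b)
    return total


def drift_alert(baseline: Sequence[float], current: Sequence[float],
                edges: Sequence[float], threshold: float = 0.2) -> bool:
    # The threshold is an assumption; tune it for the feature being monitored.
    return psi(baseline, current, edges) > threshold


edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
baseline = [0.1, 0.3, 0.5, 0.7, 0.9] * 20   # evenly spread feature values
stable = list(baseline)                      # same distribution as the baseline
shifted = [0.9] * 100                        # everything moved into the top bin

stable_alert = drift_alert(baseline, stable, edges)
shifted_alert = drift_alert(baseline, shifted, edges)
```

The same comparison can be applied to model outputs instead of input features to watch for model drift.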

As part of our Model Optimization experience, we are investing in delivering actionable insights that let data scientists tune a given model both reactively and proactively. Additionally, we will be investing in automating the retraining and redeployment of models, either partially or fully, depending on the use case.

Model Optimization in Wallaroo

In conclusion, the new way of MLOps is grounded in the understanding that an ML model is not like a software application release: there is no deploying and then moving on to the next sprint. It is a common misunderstanding that a model will keep working properly forever after deployment. In the age of big data, more rapid iterations are required, as the information will inevitably keep changing and a model deployed in production cannot adapt to these changes by itself. For ML models to stay relevant, ML engineers and data scientists need to work together to select the best rollout strategy, set performance boundaries, monitor and troubleshoot the ongoing performance of the models in production, and then work to continuously optimize their models as the data or the environment changes. There are powerful platforms available that are designed to support collaboration among production teams, data scientists, and ML engineers, even for smaller teams struggling to maintain their models. Wallaroo Community Edition allows your MLOps engineers to collaborate and gives data scientists more hands-on experience with the process.