
How We’re Helping ML Forecasting Clients Tackle the Challenges of Scaling to Thousands of Forecasts

Helping ML Forecasting Clients Scale to Thousands of Forecasts | Wallaroo.AI Blog

Forecasting is one of the most common and important applications of machine learning. It’s used in many industries, including manufacturing, pharmaceutical, life sciences, advertising, retail, consumer products, and more.

The goal of forecasting is usually to align production and demand, to maximize profit and improve efficiencies, or sometimes just to predict what kind of ROI a particular strategy (e.g. an advertising campaign) might yield. But beyond these tactical benefits, forecasting helps enterprises navigate market volatility, fickle customer preferences, and the pressure to deliver consistent and increasing enterprise business value.

But achieving an industrial-grade ability to accurately and repeatedly forecast various important business indicators is much easier said than done.

First comes the data cleansing and ML modeling itself, which is a major challenge for some firms. Then you have to put these models into production, something that happens in only 54% of all AI initiatives, according to Gartner [1]. Once that is done, you often have to create new processes, sort out new technology, or otherwise adapt your process or environment to scale to more situations (more sites, more locales). And throughout all these steps, you have to keep the model as accurate as possible, despite whatever ad hoc processes and technology you have created along the way.

Just getting a forecast into production is incredibly complex, as shown by the diagram below. Needless to say, this complexity is magnified when scaling to more scenarios – which often causes long feedback loops and otherwise impairs the ability to move quickly and respond to market or organizational changes.

Figure: Typical Steps to Run an ML Workload Combining Data and Inferencing

What Makes ML Forecasting Even Harder

What makes ML forecasting even more difficult is that a forecasting report involves multiple steps on a specific data set (e.g. a specific manufacturing site, a specific country, a specific set of customer demographics, etc.). These steps involve data ingestion, data treatment/processing, model inferencing, forecast logging – for each and every single report. 
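To make those four steps concrete, here is a minimal sketch in plain Python. It is not Wallaroo code: the function names, the in-memory `demand` data, and the moving-average "model" are all hypothetical stand-ins chosen for illustration. The point is only that every report repeats the same ingest, treat, infer, log chain for its own data set.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class ForecastReport:
    """One forecast for one data set (a site, a country, a segment)."""
    dataset: str
    forecast: float


def ingest(dataset: str, source: Dict[str, List[Optional[float]]]) -> List[Optional[float]]:
    # Step 1: data ingestion (a dict lookup stands in for a warehouse query).
    return source[dataset]


def treat(raw: List[Optional[float]]) -> List[float]:
    # Step 2: data treatment; here we only drop missing readings.
    return [x for x in raw if x is not None]


def infer(history: List[float], window: int = 3) -> float:
    # Step 3: model inference; a moving average stands in for a trained model.
    return mean(history[-window:])


def run_report(dataset: str, source, log: List[ForecastReport]) -> ForecastReport:
    # Chain steps 1-3, then step 4: log the forecast.
    report = ForecastReport(dataset, infer(treat(ingest(dataset, source))))
    log.append(report)
    return report


# Hypothetical weekly demand per site (None marks a missing reading).
demand = {
    "plant_a": [100.0, None, 110.0, 120.0, 130.0],
    "plant_b": [40.0, 42.0, None, 44.0, 46.0],
}

log: List[ForecastReport] = []
for site in demand:
    run_report(site, demand, log)
```

With two sites the loop is trivial. With thousands of data sets, each step typically belongs to a different team and tool, which is where the effort goes.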

Because these steps cross multiple functional domains in an enterprise (data engineering, data science, ML engineering, business intelligence, and ops), it’s a very high-touch effort that involves different teams in the organization and sometimes external tools (like Airflow or DBT) on top of or within the enterprise’s existing data pipelines.

For organizations that depend on 2-3 forecasts per week, this level of complexity is usually managed without too much trouble. However, some forecasting teams have to scale to 100 or even 100,000 forecasts a week to meet the enterprise’s business objectives, and our forecasting customers who have used this high-touch approach in the past tell us the ad hoc method does not scale to these levels.

On top of that, because these forecasting models run over many different data sets, maintaining them requires more bandwidth from the AI team than is often available. It is also difficult to manually monitor these processes and model performance over time.

Why Workload Orchestration is So Valuable in ML Forecasting 

One major technique that removes many of these production and scaling problems is workload orchestration. Workload orchestration provides a way for the data scientist to express the different steps in the workflow themselves within the production environment, in a way that will generalize beyond the original situation.

There are some great strides being made in the ML production space today. At Wallaroo.AI, in response to customer needs, we’ve developed several unique, purpose-built capabilities to make it easy to not only deploy and scale forecasting workloads but help monitor and maintain forecast accuracy via integrated workload orchestration.

The recently announced workload orchestration features of the Wallaroo Enterprise Edition ML production software automate the orchestration of all the steps in the process, as well as the underlying resources, so your team can get results when they need them, scale to far more forecasts, and increase productivity.

This capability provides an easy way for the data scientist to define and automate the different steps in the workflow themselves within the production environment in a way that will generalize beyond the initial model deployment. Then, once that workload is defined, it can be run once, scheduled for a recurring run, or quickly copied and modified for a new workflow with the Wallaroo.AI platform.
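The "define once, then run, schedule, or copy" idea can be illustrated with a small generic sketch. This is not the Wallaroo SDK: the `Workload` class, its fields, and `next_runs` are hypothetical names invented for this example, and a real orchestrator would also manage the underlying compute resources.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Workload:
    """A forecasting workload described as data rather than hand-wired steps."""
    name: str
    dataset: str
    horizon_weeks: int
    every: Optional[timedelta] = None  # None means "run once"


def next_runs(workload: Workload, start: datetime, count: int) -> List[datetime]:
    """Planned run times: a single run, or `count` recurring runs."""
    if workload.every is None:
        return [start]
    return [start + i * workload.every for i in range(count)]


# Define the workload once.
base = Workload("demand_forecast", dataset="plant_a", horizon_weeks=4)

# Schedule it for a recurring weekly run.
weekly = replace(base, every=timedelta(weeks=1))

# Copy and modify it for a new site; everything else is inherited.
plant_b = replace(weekly, name="demand_forecast_plant_b", dataset="plant_b")

start = datetime(2024, 1, 1)
once_plan = next_runs(base, start, 3)
weekly_plan = next_runs(weekly, start, 3)
```

Because the new site differs from the original only in its parameters, adding a forecast becomes a one-line change rather than a new hand-built pipeline.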

In terms of productivity, our customers find that these new features reduce the time spent building and maintaining manual steps in a forecasting workflow by up to 40%. Recovering this much time allows AI teams to be more agile and scale much more easily. The features also provide critical insights that help teams avoid costly business impact from bad forecasts and spend more time optimizing towards business goals.

Maintaining ML Forecasting Data Quality 

Even before large scale adoption of AI, forecasting was an important part of planning and production. Producing a forecast that’s way off and acting on it can be devastating to an enterprise’s bottom line. Too much product might go to waste, there might be too few sales, resources might be diverted from a high-ROI activity to a low-ROI activity – the nightmare scenarios are endless.

That’s why making sure you scale forecasting without a reduction in data quality is critically important. That’s harder than it sounds, though, as you might have customizations or other unique elements that you need to account for. For example, you might have a model that you feel confident in for a certain factory, but now you make a slightly tweaked version for another factory. How do you gain confidence in the forecasting for the new factory before subjecting the business to it?

In addition, if you run a forecast on a regular basis, there are a lot of ways the model can go wrong over time. Often, the cause is that the data used for the actual inference has become materially different from the data the model was trained on, so the model is no longer effective on the new data.

How Do I Reduce the Length of the Feedback Loop?

How do you understand if drift is happening so you can quickly fix it?

Obviously, monitoring the input data and the forecasts to see if they are deviating from what you expect is a good way to put some guardrails on what’s happening. But you also need benchmarks. In ordinary situations, you can expect that the inputs to your model, and its predictions, will follow expected patterns. In other words, the values of inputs and predictions will fall into typical, observable distributions that you can use to create a benchmark for your inference. Let’s call these typical distributions ‘baselines’.

When things go awry, the values of the inputs to the model and/or the model’s predictions will start to drift from their baselines. You want to catch signs of drift early, so that you can intervene in a timely manner. As we discussed above, data drift can be a sign that the environment has changed from what your model expects – and that’s an indication that it is time to retrain or otherwise improve your model. 
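As an illustration of comparing new data against a baseline, the sketch below computes a population stability index (PSI), one commonly used drift score. This is an assumption made for the example: the post does not say which algorithms the platform uses, and the bin edges, sample values, and the 0.2 alert threshold (a common rule of thumb) are all hypothetical.

```python
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values per bin; values beyond the edges land in the end bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        idx = sum(v > e for e in edges)  # number of edges below v is the bin index
        counts[idx] += 1
    total = len(values)
    # A small floor avoids log(0) when a bin is empty.
    return [max(c / total, 1e-6) for c in counts]


def psi(baseline: Sequence[float], current: Sequence[float], edges: Sequence[float]) -> float:
    """Population stability index between a baseline sample and a current sample."""
    b = bin_fractions(baseline, edges)
    c = bin_fractions(current, edges)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb alert level, not a platform default

# Hypothetical model input values: a baseline window and a shifted window.
baseline = [10, 12, 14, 16, 18, 20, 22, 24]
shifted = [20, 22, 24, 26, 28, 30, 32, 34]
edges = [13, 17, 21]  # bin boundaries derived from the baseline

stable_score = psi(baseline, baseline, edges)
drift_score = psi(baseline, shifted, edges)
needs_attention = drift_score > DRIFT_THRESHOLD
```

The same comparison can be applied to the model’s predictions as well as its inputs. A score that stays near zero means the current window still looks like the baseline.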

That’s why Wallaroo Enterprise Edition includes a built-in capability to define baselines for a model’s inputs and predictions. It also continuously monitors for drift away from that baseline. That means a data scientist can configure sophisticated statistical monitoring either via the platform UI or the Python SDK to easily define and schedule drift detection algorithms.

They can also graphically view the divergence of new forecast runs versus baseline expectations, import data to do further analysis in their Python notebook or any BI tool, as well as get alerts when deviations above a threshold are detected. This gives AI teams the confidence to focus on other tasks, knowing they will be notified when a forecast needs attention or a model needs retraining. 

If an updated model is needed, you still want to validate that the new model will behave within acceptable bounds against production data. The Wallaroo Enterprise Edition provides a key capability that helps answer these questions: the ability to shadow deploy a model.

With shadow deployment, you can vet a new candidate model by running it in parallel to the original model in a shadow or “quiet” mode. The Wallaroo Enterprise Edition software will log the forecasts from both models, but only use the results from the original model for downstream processes. Once you’ve run both models long enough to be confident that the new model does not introduce any unwanted behavior, it’s easy to switch over to using it for the actual forecast.
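The shadow pattern itself is simple enough to sketch in a few lines. The code below is a generic illustration, not the Wallaroo implementation: `ShadowDeployment`, `promote`, and the two toy models are hypothetical names and stand-ins.

```python
from typing import Callable, Dict, List, Optional

Model = Callable[[List[float]], float]


class ShadowDeployment:
    """Serve the champion's forecast while logging the challenger's alongside it."""

    def __init__(self, champion: Model, challenger: Model) -> None:
        self.champion = champion
        self.challenger = challenger
        self.log: List[Dict[str, Optional[float]]] = []

    def forecast(self, history: List[float]) -> float:
        served = self.champion(history)
        try:
            shadow: Optional[float] = self.challenger(history)
        except Exception:
            shadow = None  # a failing challenger must never break the served forecast
        self.log.append({"champion": served, "challenger": shadow})
        return served  # downstream processes only ever see the champion

    def promote(self) -> None:
        # Switch over once the logged comparison looks acceptable.
        self.champion = self.challenger


def last_value(history: List[float]) -> float:
    # Toy champion: naive "same as last period" forecast.
    return history[-1]


def moving_average(history: List[float]) -> float:
    # Toy challenger: average of the last three periods.
    return sum(history[-3:]) / 3


deployment = ShadowDeployment(champion=last_value, challenger=moving_average)
served_before = deployment.forecast([100.0, 110.0, 120.0])
deployment.promote()
served_after = deployment.forecast([100.0, 110.0, 120.0])
```

Comparing the two columns of the log over enough runs is what provides the evidence for, or against, the switch.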

How Do You Implement Workload Orchestration?

Wallaroo.AI has created a low-code, low-ops way to implement workload orchestration, with just a few lines of Python code. Our goal was to simplify the plumbing portion of the ML forecasting process so AI teams can focus on providing business value.

Even better than avoiding a big engineering lift, the new capabilities of our Enterprise Edition work alongside your existing ML ecosystem, so there is no need to rip-and-replace. For example, the enterprise could be using Airflow or DBT, building complex flows with Kafka, or working with other discrete tools as part of their AI environment. The Wallaroo Enterprise Edition software suite does 80% of what these stand-alone components do without touching these tools, so there is no need to spend time on re-plumbing your IT environment.

This makes it easier for customers to get started with a new approach, then scale and run either ad hoc forecast analysis or many recurring forecasts. Clients tell us they can generate 5-10X more reports per week, even as they free up as much as 40% of their data scientists’ and engineers’ time to work on more important tasks, as noted earlier.

Interested in learning more? Speak to an expert.

1 –
