Introducing Wallaroo: The Go-to Platform for Production ML
March 21, 2023
Wallaroo is the unified platform for easily deploying, observing, and optimizing machine learning in production at scale, in any cloud, on-prem, or at the edge.
“We see AI and ML as critical for helping our clients maximize renter acquisition and retention. But our data scientists were spending more time on managing infrastructure for deployment than actually building models. And once we deployed these models, they had a hard time monitoring their models for drift as market conditions changed at the local level. Wallaroo resolves the deployment, scale, and observability issues so we can focus on solving the business problems with machine learning.”
Director of Data Science at RealPage
How we got here
We began in 2017 as a tight-knit group of engineers dedicated to solving the increasingly common problem of analyzing large amounts of data via computational algorithms efficiently and at scale. By applying our collective expertise in building distributed computing systems in industries such as high-frequency financial trading and adTech, we built a high-performance compute engine like nothing else on the market. While our customers could now efficiently analyze their data and use it to run machine learning (ML) models at scale, they soon pointed out their next biggest challenge: bringing those models online easily, and then understanding and optimizing how the models were performing so they could quickly and sustainably generate business value.
Like most organizations, they were doing everything by the book: data scientists would build ML models to solve a business problem, and engineers would launch them using a patchwork of open-source software and containerization approaches. What they found, however, was that getting each model operationalized was like pulling teeth.
Models often had to be painstakingly re-engineered, the deployment software couldn’t process data fast enough — even when running on an alarming amount of computing resources — and it was unnecessarily difficult to see how models were performing to measure their ongoing accuracy or to optimize them.
We knew there had to be a better way.
Machine Learning’s production problem
Data science is a modern-day superpower, and enterprises around the world know it. Over 90% of Fortune 1000 companies are investing in Big Data, analytics, and artificial intelligence (AI), with over $700 billion being poured into teams of data scientists and engineers to revolutionize the way they do business.
Yet machine learning is hard, and the last mile of ML, getting models into production to impact the bottom line, is especially hard. If businesses can’t do this easily or at scale, their AI initiatives will fail, resulting in significant costs in terms of budget, manpower, and disillusionment. According to Gartner, fewer than half of AI prototypes make it to production, and in the end, only about 10% generate meaningful ROI.
Deployment solutions, whether containerization, cobbling together various existing technologies, or customizing an analytics workhorse like Apache Spark, are cumbersome, limited in scope, expensive at scale, prone to failure, and unable to run ML models against both batch and streaming data.
And even once models are live, they face the challenge of staying accurate and performant as the world keeps changing. Observability alone is not enough: enterprises need the ability to observe drift and also to act on it quickly (testing new versions and deploying updated models without disrupting the business).
With investments in AI only trending upwards, companies hoping to turn a profit with their data will not reach their full potential while long deployment lead times and the high cost of running and maintaining the necessary infrastructure outweigh the benefits.
Introducing Wallaroo: The go-to platform for production ML
Wallaroo is the only unified platform for production machine learning. It solves operational challenges for production ML so you stay focused on business outcomes.

Wallaroo gets your ML to deliver business results faster, more easily, and with far lower investment. By streamlining the deploy/run/observe parts of the ML lifecycle and giving data scientists the freedom to use the tools they already know, Wallaroo enables your team to:
- Deploy models in seconds with 1 line of Python
- Analyze data up to 12.5X faster using 80% less infrastructure
- Observe and optimize your ML models in real-time
- Run in any cloud, edge, on-prem, or hybrid environment
Furthermore, with our deep integrations, you can add industrial-grade production workflows to your existing Databricks or Cloud ML tooling with just 1-2 lines of Python.
MLOps that work for Data Scientists, not vice versa.
Deploy, Observe, Optimize and Scale ML with minimal fuss or engineering. The Wallaroo platform provides three main benefits:
- Self-service ML operations (MLOps) for easy model deployment and management
- Blazing fast inference serving that lets you run inferences faster on fewer servers
- Continuous optimization through tight integrations within your ecosystem for deployment, observability, and insights, so you can monitor the performance of your models in production and optimize them when needed
1. Self-service MLOps: Fast and easy ML deployments
Wallaroo’s Model Operations Center provides full production ML capabilities (deployment, observability, testing and optimization) through an integrated, centralized hub. This is the component that enables data scientists to deploy their ML models against live data in two clicks of a button—whether it’s to a testing, staging, or production environment. With an intuitive SDK, UI, and API and support for common data workflows, Wallaroo takes care of the details to let data teams focus on the bigger picture.
- Easily deploy, test, and iterate ML models using the frameworks your team already knows (e.g., TensorFlow, PyTorch, Scikit-learn, and XGBoost).
- Run batch jobs or streaming to capture valuable market insights as they happen.
- Refine and immediately redeploy new and improved models without complex re-engineering or operational headaches.
- Collaborate easily and compliantly with our workspaces paradigm.
2. Blazing fast inference serving: Lightning-speed inference at lower cost
Here’s where the magic happens. This highly performant, easily scalable engine can analyze up to 100K events per second on a single server (beating the industry average of 5,000 events per second), making Wallaroo the fastest platform on the market for production ML.
- Run multiple models on a single server to drastically reduce computing costs and maintenance overhead.
- Analyze data at record-breaking speed and react to market changes in real-time for a sharper competitive edge.
- Leverage an ultrafast environment for production model scoring and pre/post-processing, with support for custom data operations.
- Scale down to run at the edge.
With customers’ transformer, computer vision, complex neural network, and NLP models, we have typically seen 5X to 12.5X faster analysis using 80% less infrastructure than the customers’ previous deployments.
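To put the single-server figures quoted in this section in perspective, here is a rough back-of-the-envelope sketch in Python. The workload of 1,000,000 events per second and the helper name `servers_needed` are illustrative assumptions, not measurements, and both throughput numbers are the "up to" and "average" figures stated above. Real capacity depends on model size, hardware, and payloads.

```python
import math


def servers_needed(events_per_second: int, per_server_capacity: int) -> int:
    """Smallest number of servers that can absorb the given event rate."""
    return math.ceil(events_per_second / per_server_capacity)


# Hypothetical workload, chosen only for illustration.
workload = 1_000_000  # events per second

# Per-server figures quoted in this section.
at_industry_average = servers_needed(workload, 5_000)
at_quoted_peak = servers_needed(workload, 100_000)

print(at_industry_average, at_quoted_peak)
```

Under these assumptions the same workload needs far fewer servers at the higher per-server rate, which is where the infrastructure savings come from.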
3. Continuous optimization
This is where it all ties together. A simple, easy-to-use interface allows anyone on your team to explore powerful metrics and detailed analytics so they can track, measure, and help improve your ML’s performance and, most importantly, act on these insights through our integrated deployment and testing capabilities.
- Drill down into computing specifics like throughput, model latency, and benchmark performance for in-depth analysis.
- Validate model inputs to guard against invalid or unexpected data.
- Monitor the behavior of models and their inputs over time to understand when changes in the environment might require a model refresh.
- Use A/B testing along with shadow and staged deployments to make sure you’re always using the highest-performing models.
- Simplify audits with detailed event logs and visibility into everything your compliance and risk management team needs to do their jobs more efficiently.
- Automate the process of retraining models and deploying them via alerts and a comprehensive API.
- Hotswap obsolete models with new, validated models without interfering with data connectors or downstream business units or systems depending on the model inferences.
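As an illustration of what monitoring model inputs over time can look like, here is a minimal Python sketch of one common drift measure, the population stability index (PSI). It is a generic technique, not a description of how Wallaroo computes drift. The function name, the sample data, and the 0.2 threshold (a widely used rule of thumb) are assumptions made for this example.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float], current: Sequence[float], bins: int = 10
) -> float:
    """PSI between a baseline sample and a current sample of one feature."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    expected = bin_fractions(baseline)
    actual = bin_fractions(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb cutoff, not a Wallaroo setting

baseline = [i / 100 for i in range(100)]       # inputs seen at training time
shifted = [0.5 + i / 200 for i in range(100)]  # inputs after a market shift

stable_score = population_stability_index(baseline, baseline)
drift_score = population_stability_index(baseline, shifted)
needs_refresh = drift_score > DRIFT_THRESHOLD
print(stable_score, needs_refresh)
```

A score above the threshold would be the signal to test a retrained model, for example in a shadow or A/B deployment, before swapping it in.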
Your data. Your tools. Your ecosystem.
Enterprises will often look to all-in-one MLOps platforms such as SageMaker, Databricks, or DataRobot to simplify deployment. However, these platforms force data teams to standardize on proprietary tools, processes, and formats. That standardization leads to complexity when different business units within the same company use different data platforms. One of our customers, for example, is all-in on a certain cloud, but because of mergers and acquisitions, its data engineering teams are supporting different deployment processes for multiple clouds.

In response, companies will spend countless resources building their platform in-house, cobbling together open-source technologies such as Spark and MLflow, which might work within the current ecosystem but at the expense of performance and model observability.
Seamless integration with data systems
Wallaroo is designed to click into your ecosystem and seamlessly connect with everything around it. We provide a standardized process that ML engineering teams can use to deploy, run and observe models across platforms, clouds, and environments (in the cloud, on-premises, or at the edge).
Our Connector Framework plugs our platform into your incoming and outgoing data points and takes care of the integration to get you up and running in no time.
- Quickly connect with popular data sources and sinks, like Apache Kafka and Amazon S3.
- Plug in custom integrations and even your own in-house solutions.
- Rely on rapid support if you need to integrate something that isn’t available out of the box.
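To make the source-and-sink idea concrete, here is a small, generic Python sketch of a pluggable connector pattern. It does not use or describe the Wallaroo Connector Framework API. The `Source` and `Sink` protocols, the in-memory `ListSource` and `ListSink` stand-ins, and `run_pipeline` are all hypothetical names invented for this illustration. In practice, a Kafka or S3 connector would sit behind the same two interfaces.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Protocol

Record = Dict[str, float]


class Source(Protocol):
    def read(self) -> Iterator[Record]: ...


class Sink(Protocol):
    def write(self, record: Record) -> None: ...


class ListSource:
    """In-memory stand-in for an incoming data connector."""

    def __init__(self, records: Iterable[Record]) -> None:
        self._records = list(records)

    def read(self) -> Iterator[Record]:
        return iter(self._records)


class ListSink:
    """In-memory stand-in for an outgoing data connector."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def write(self, record: Record) -> None:
        self.records.append(record)


def run_pipeline(source: Source, model: Callable[[Record], float], sink: Sink) -> int:
    """Score every record from the source and write the result to the sink."""
    count = 0
    for record in source.read():
        sink.write({**record, "prediction": model(record)})
        count += 1
    return count


source = ListSource([{"x": 1.0}, {"x": 3.0}])
sink = ListSink()
processed = run_pipeline(source, lambda r: r["x"] * 2, sink)
print(processed, sink.records)
```

Because the model only sees the two interfaces, swapping one data system for another does not require touching the model code.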

Wallaroo and Databricks: Better together
Data teams working on Azure Databricks appreciate its powerful data engineering and model development capabilities, but often find that the subsequent steps to get models into production and drive business outcomes require substantial additional engineering and processes. But with our newly launched native integration into the Databricks notebook environment, data scientists and ML engineers can now deploy, observe, troubleshoot, and scale production ML in Azure with just a few simple Python commands via the Wallaroo SDK.
Instead of integrating a variety of ML tools specific to deployment, testing, collaboration, and observability, Wallaroo offers a unified production ML platform that brings those combined capabilities into Databricks without forcing data scientists and engineers to give up what they like about Azure Databricks.
Built from the ground up for speed, efficiency, and ease-of-use
Wallaroo was specifically engineered for modern machine learning deployments, unlike Apache Spark or heavyweight containers. The core distributed computing engine is written in Rust, not Java, so it runs at C-like speeds and is Python-friendly. Our SDK was designed with data scientists in mind and has incorporated direct feedback from our customers.
You can also rest assured that all your data will remain yours. Everything that goes in and out of Wallaroo is private, secure, and only visible to those with permission to see it.
Run inference anywhere, manage centrally
Organizations increasingly look to deploy AI-enabled applications in restricted environments, whether for performance and latency reasons to drive better experiences, or for compliance reasons to ensure secure deployments and data protection. Deploying, observing, and optimizing ML models in these environments becomes challenging and more costly to scale because of growing infrastructure incompatibility and a lack of repeatability and automation in model operations.
Wallaroo’s modular and flexible technology architecture offers the ability to:
- Manage and package your models within the model operations center for deployment in any target environment (cloud, edge, on-premises) and ensure compliance with security or performance requirements.
- Serve inferences in any target environment with optimal latency and throughput by leveraging the Wallaroo inference engine.
- Monitor and observe performance in the model operations center across all deployments.
As a result, AI teams are able to:
- Efficiently scale their ML workflows by ensuring repeatable processes and tools across use cases with less technical and operational overhead
- Mitigate non-compliance risks
- Get measurable insights and take corrective and preventive actions to maximize the impact of ML on the bottom line
Bring your boldest AI projects online with Wallaroo
Wallaroo is the platform we always wished we had: one where cutting-edge AI and ML can be deployed in seconds, and AI teams can deliver significant business value quickly and at low cost. We built it so your team can spend less time making your data and ML work with your software and infrastructure, and more time making your data and ML work for your business.
Reach out if your team is looking to start getting models into production, or if you already have a few models in production but are looking to scale.