Wallaroo ML Production Performance Benchmarks vs. Google Vertex and AWS SageMaker

January 27, 2022

How Wallaroo.AI delivers better ML Production Performance

Wallaroo facilitates the last mile of your machine learning journey: getting ML into your production environment with incredible speed and efficiency, so that it impacts the bottom line and boosts your ML production performance metrics.

The Wallaroo ML platform includes an efficient and low-footprint execution engine, a framework for A/B testing, anomaly detection, and a web dashboard. These components are accessed through a familiar Jupyter notebook/Python SDK interface that data scientists love.

For most use cases, Wallaroo can deliver faster time-to-market (typically 3X faster) and a much lower infrastructure footprint (typically 80% lower).

Wallaroo is Designed for Business-critical ML Workflows

In many organizations, identifying the resources required to run machine learning models and execute machine learning projects is becoming increasingly difficult.

Even though ML systems are among the most data-intensive applications, there is a blind spot around the infrastructure and capabilities required to run them at large scale.

Wallaroo is fast and easy to use. Wallaroo provides better per-worker throughput and dramatically lower latencies. This means less hardware and significantly lower ongoing infrastructure costs for a given workload. 
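To see why per-worker throughput translates into hardware and cost, here is a minimal sketch of the capacity arithmetic. Every number in it (the target rate, the per-worker rates, and the hourly price) is a hypothetical placeholder chosen for illustration, not a Wallaroo benchmark result, and the function names are our own.

```python
import math


def workers_needed(target_rate: float, per_worker_rate: float) -> int:
    """Smallest number of workers whose combined throughput meets target_rate."""
    if target_rate < 0 or per_worker_rate <= 0:
        raise ValueError("target_rate must be >= 0 and per_worker_rate must be > 0")
    return math.ceil(target_rate / per_worker_rate)


def annual_cost(workers: int, hourly_price: float, hours_per_year: int = 8760) -> float:
    """Yearly compute cost for a fleet of identical, always-on workers."""
    return workers * hourly_price * hours_per_year


# Hypothetical workload and engines (assumptions, not measured values).
target_rate = 10_000        # inferences per second the business needs
slow_worker_rate = 500      # inferences per second per worker, baseline engine
fast_worker_rate = 2_000    # inferences per second per worker, faster engine
hourly_price = 1.0          # price per worker-hour, same instance type for both

slow_fleet = workers_needed(target_rate, slow_worker_rate)
fast_fleet = workers_needed(target_rate, fast_worker_rate)

slow_cost = annual_cost(slow_fleet, hourly_price)
fast_cost = annual_cost(fast_fleet, hourly_price)
cost_reduction = 1 - fast_cost / slow_cost

print(f"baseline fleet: {slow_fleet} workers, faster fleet: {fast_fleet} workers")
print(f"annual cost reduction: {cost_reduction:.0%}")
```

In this simplified model the saving depends only on the throughput ratio and on how worker counts round up. Real deployments also differ in instance types, pricing, and utilization, so measured savings will not follow this formula exactly.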

The operational experience Wallaroo offers is much simpler than that of existing tools or homegrown solutions. Wallaroo gives you push-button production deployment right from a notebook. Wallaroo’s high-level Python SDK and lower-level raw APIs provide you with the broadest range of integration options for your model deployment strategy, all from the convenience of your familiar tools and workflows. Additionally, machine learning tools like PyTorch, RoBERTa, TensorFlow, XGBoost, scikit-learn, and others are integrated within Wallaroo.

Wallaroo Has the Numbers to Prove It: Boosting ML Production Performance Metrics

Wallaroo built a benchmarking proof-of-concept for a large financial company looking for ways to modernize its ML infrastructure. We took the Aloha Model and tested Wallaroo against two competitors in the market, AWS SageMaker and Google Vertex.

The Aloha Model (ALOHA: Auxiliary Loss Optimization for Hypothesis Augmentation) is a complex open-source model that attempts to classify a given file as either “malware” or “benign”. It’s so complex that many now use Aloha for benchmarking and performance testing to understand how well their ML systems are executing.

  • Wallaroo vs. AWS SageMaker
    • 4.5x faster inferences per second
    • Reduction of inference server costs by 85%
    • Projected $21k in annual compute cost savings
  • Wallaroo vs. Google Vertex
    • 13.5x faster inferences per second
    • Reduction of inference server costs by 85%
    • Projected $50k in annual compute cost savings

How is Wallaroo able to do this?

At a high level, Wallaroo’s performance comes from a combination of design choices and constant vigilance. We built the system in the ultrafast Rust language, which provides a high level of robustness and safety while executing at C-like speeds.

If you want to get into the details of Wallaroo’s technology, please refer to our whitepaper An Introduction to Wallaroo Performance.


When it comes to ML production performance, Wallaroo stands out as a front-runner, delivering unmatched speed and efficiency. Wallaroo’s unique approach to your data science deployments will change how you build and deploy systems. Wallaroo provides better per-worker throughput and dramatically lower latencies, which means less hardware is needed at a given workload size. Save time and money with Wallaroo! We know you’ll be amazed at how simple it is to get started and how fast it works compared to existing solutions.

Reach out to us at deployML@wallaroo.ai to learn more.