Wallaroo.AI is Partnering with Ampere Computing to Provide Advanced AI Options to Oracle Cloud Infrastructure Users

October 19, 2023

By 2030, AI is expected to contribute a massive $15.7 trillion to the global economy, highlighting its potential as a major force for growth and innovation. As more businesses across different sectors start to use AI technologies, the need for advanced AI deployment options is growing.

However, organizations face tremendous challenges when deploying AI models on a large scale. A particularly big hurdle is managing computing power efficiently to keep costs reasonable. 

A recent survey revealed that 85% of Chief Data Officers (CDOs) are concerned with cloud infrastructure-related challenges. These challenges include the costs of operating the infrastructure, the complexity of deployment, and integrating new systems and tools. Essentially, organizations need the right balance of computing resources, scale, and automation to handle their AI-related tasks cost-effectively.

Reliance on GPUs is a Bottleneck

GPUs are commonly used to train AI models due to their processing capabilities. However, the cost of deploying AI models at a large scale with GPUs, especially for complex tasks like computer vision, deep learning recommendation models (DLRMs), large language models (LLMs) like ChatGPT, or natural language processing, is quickly becoming prohibitive for many organizations. And recent shortages in GPU supplies make it even more challenging to obtain the resources needed, putting many business-critical AI initiatives on hold.

But it’s not just the cost and availability of GPUs that are a bottleneck. The energy consumption and carbon footprint of these powerful computing systems is also a growing concern. Due to their size and complexity, AI workloads demand a significant amount of energy, particularly when run on GPUs and other dedicated AI accelerators. This not only affects the total cost of ownership (TCO) but also prevents organizations from achieving their environmental, social, and governance (ESG) goals.

As a result, organizations are actively seeking more efficient and cost-effective ways to deploy AI models at scale. One promising option is lower-power CPUs, which are more readily available and less expensive. The downside is that they process tasks more slowly, which can significantly extend model training time.

To tackle these challenges, Ampere Computing and Wallaroo.AI have teamed up to enhance the AI options available on Oracle Cloud Infrastructure (OCI). This collaboration aims to provide a more cost-effective and efficient solution for organizations looking to scale their AI operations.

Benchmarks indicate a significant reduction in inference time from 100 ms to 17 ms when using the OCI A1 compute with the optimized Wallaroo.AI and Ampere solution.

How it works

For many businesses, using less power-hungry and more affordable CPUs together with a well-optimized machine learning (ML) platform can be an optimal choice. An example of this is the Oracle Cloud Infrastructure (OCI) A1 compute, powered by Ampere Computing’s Altra Family processors, paired with the Wallaroo.AI Enterprise Edition ML platform. This setup can help cut costs and unblock projects that were stuck waiting for GPUs.

Oracle Cloud Infrastructure (OCI) A1

One of the main advantages of using Oracle Cloud Infrastructure for AI workloads is its high-performance computing capabilities. The OCI A1 instances are specifically designed for memory-intensive workloads, making them an excellent option for AI applications that require large amounts of data processing and memory.
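
As a rough illustration of how an A1 shape is provisioned, the sketch below uses the OCI Python SDK to launch an A1.Flex VM. The OCIDs are placeholders and the OCPU and memory values are hypothetical, chosen only to show how the flexible shape can be sized to a workload.

import oci

# Illustrative sketch only: launching an Ampere A1.Flex VM with the OCI Python SDK.
# All OCIDs below are placeholders.
config = oci.config.from_file()              # reads credentials from ~/.oci/config
compute = oci.core.ComputeClient(config)

launch_details = oci.core.models.LaunchInstanceDetails(
    compartment_id="ocid1.compartment.oc1..<placeholder>",
    availability_domain="<availability-domain>",
    display_name="ampere-a1-ml-node",
    shape="VM.Standard.A1.Flex",             # Ampere Altra Arm-based flexible shape
    shape_config=oci.core.models.LaunchInstanceShapeConfigDetails(
        ocpus=4,                             # pick the core count the workload needs
        memory_in_gbs=24,                    # size memory independently of cores
    ),
    source_details=oci.core.models.InstanceSourceViaImageDetails(
        image_id="ocid1.image.oc1..<arm-image-placeholder>"
    ),
    create_vnic_details=oci.core.models.CreateVnicDetails(
        subnet_id="ocid1.subnet.oc1..<placeholder>"
    ),
)

instance = compute.launch_instance(launch_details).data
print(instance.id, instance.lifecycle_state)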

Ampere Computing Altra Family Arm-Based Processors

One approach to tackling demanding computing workloads is to employ legacy x86-based processors. These processors aim to enhance performance by ramping up the speed and sophistication of each CPU, enabling them to handle more demanding tasks. However, this complexity increases the CPU's energy demands and generates more heat than traditional heating, ventilation, and air conditioning systems are designed to handle.

In contrast, the OCI Ampere A1 compute leverages the design principles of Arm architecture, distributing processing tasks across a larger number of smaller cores instead of depending on a few, higher-capacity processors. This approach, often referred to as ‘scaling out,’ consumes less power and demands less from facility support systems compared to the ‘scaling up’ characteristic of legacy x86-based processors.

With the addition of Ampere’s Altra CPUs, OCI users can now run AI workloads more efficiently and cost-effectively. Ampere Computing’s Altra Family processors are built with a focus on energy efficiency without sacrificing performance. This makes them ideal for use in power-constrained environments such as cloud data centers. By providing a powerful yet energy-efficient platform, Ampere Computing enables organizations to run AI workloads at a lower cost.

“Amid today’s AI boom, customers are looking for more efficient and economical AI inferencing. With Wallaroo.AI’s migration to Ampere-based OCI A1 instances, we are providing them with a solution, bringing six times the AI inferencing performance while using significantly less money and power,” said Jeff Wittich, Chief Product Officer at Ampere.

Wallaroo.AI Enterprise Edition

The Wallaroo.AI Enterprise Edition is an optimized machine learning platform that offers scalable and efficient solutions for running AI applications on OCI A1 instances. Thanks to the Wallaroo.AI platform, the software-derived performance gains are even larger, and the efficiently managed workloads provide additional cost savings on top of the already competitively priced Ampere OCI A1 instances. Wallaroo.AI leverages the full power of the Altra Family processors to deliver further performance gains and cost savings for businesses.

  • Wallaroo.AI provides AI teams with diverse setup options, supporting both Arm-based and x86-based processors, with or without a GPU.
  • The Wallaroo.AI platform running on Ampere Altra cloud-native processors enables shorter feedback loops and a more agile enterprise.
  • Organizations realize a faster return on investment (ROI) on AI projects that consume less power and cost less per inference.

Ampere-based A1.Flex instances on OCI are open to all OCI clients. When integrated with Wallaroo.AI’s tailored ML production platform and its Rust-based inference engine, deploying production ML to OCI is both simpler and more energy-efficient. The collaborative solution from Wallaroo.AI, Ampere, and OCI empowers you to achieve more with fewer resources.
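
To give a sense of what this looks like in practice, here is a minimal sketch following the general pattern of Wallaroo's Python SDK (connect, upload a model, build a pipeline, deploy). The model name, file path, and resource sizes are hypothetical, and exact method signatures can differ between SDK versions.

import wallaroo

# Illustrative sketch only: client -> upload -> pipeline -> deploy, following
# the general Wallaroo Python SDK workflow. Names and paths are hypothetical.
wl = wallaroo.Client()

# Upload a computer-vision model (for example, a ResNet-50 exported to ONNX).
model = wl.upload_model(
    "resnet50-cv",                    # hypothetical model name
    "./models/resnet50.onnx",         # hypothetical local path
    framework=wallaroo.framework.Framework.ONNX,
)

# Size the deployment for a modest slice of an A1.Flex instance.
deployment_config = (
    wallaroo.DeploymentConfigBuilder()
    .cpus(4)          # CPU cores reserved for the inference engine
    .memory("8Gi")    # memory reserved for the inference engine
    .build()
)

# Build a pipeline, attach the model as a step, and deploy it to the
# Ampere-backed OCI cluster the Wallaroo instance is running on.
pipeline = wl.build_pipeline("cv-arm-pipeline")
pipeline.add_model_step(model)
pipeline.deploy(deployment_config=deployment_config)

# Inference requests can then be sent to the deployed pipeline, e.g.:
# results = pipeline.infer(input_dataframe)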

“This breakthrough Wallaroo.AI/Ampere solution running on OCI A1 allows enterprises to improve inference performance up to six times, increase energy efficiency, and balance their ML workloads across available compute resources much more effectively, all of which is critical to meeting the huge demand for AI computing resources today, while also addressing the sustainability impact of the explosion in AI,” said Vid Jain, CEO at Wallaroo.AI.

Benchmark test results for the solution

Recently, the engineering teams at Wallaroo.AI conducted benchmark tests to validate the enhanced solution developed in collaboration with Ampere. Test results on the OCI platform using the computer vision ResNet-50 model showed that the joint Wallaroo.AI and Ampere solution reduced inference time from 100 ms to 17 ms.

The testing revealed that the Wallaroo.AI and Ampere solution significantly outperformed typical containerized x86 deployments on OCI, delivering a six-fold increase in performance. Additionally, it required less power to manage complex ML use cases in a production environment.

Performance benchmarks:

  • The Wallaroo.AI Enterprise Edition ML production platform running on an Ampere Altra 64-bit A1 Flex VM on OCI with the Ampere Optimized AI Framework needs only 17 ms per inference (chart bar A).
  • The Wallaroo.AI ML production platform running on x86 needs 53 ms per inference (chart bar B).
  • A common containerized ML deployment on x86 (without Wallaroo.AI) needs more than 100 ms per inference (chart bar C).
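
To see where the six-fold figure comes from, the short calculation below converts the published per-inference latencies into single-stream throughput and speedup relative to the common x86 container baseline (100 ms / 17 ms ≈ 5.9x).

# Quick arithmetic on the published per-inference latencies
# (labels mirror chart bars A, B, and C above).
latencies_ms = {
    "A: Wallaroo.AI on Ampere A1 + Ampere Optimized AI Framework": 17,
    "B: Wallaroo.AI on x86": 53,
    "C: Containerized x86 deployment without Wallaroo.AI": 100,
}

baseline = latencies_ms["C: Containerized x86 deployment without Wallaroo.AI"]
for label, ms in latencies_ms.items():
    throughput = 1000 / ms        # single-stream inferences per second
    speedup = baseline / ms       # relative to the common x86 container baseline
    print(f"{label}: {ms} ms/inference, ~{throughput:.0f} inf/s, {speedup:.1f}x baseline")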