Wallaroo’s Answer to Containerization Problems in ML Production

November 12, 2021

Facing challenges with traditional containerization in ML deployment? Dive into Wallaroo’s innovative approach, designed to streamline the machine learning lifecycle, optimize costs, and enhance operational speed. Learn how Wallaroo’s platform offers a more efficient and secure method for deploying multiple models without the complexities and overheads of typical containerized solutions.

One of the biggest issues in the lifecycle of a machine learning (ML) model is putting it into production. Productionizing is the process of taking a model and turning it into a robust, high-traffic system consumed by end users. Such a system needs a myriad of supporting capabilities: audit logging, error logging, monitoring and anomaly detection, high traffic capacity, high throughput, low latency, and more.

Models are typically developed in a Jupyter Notebook, an interactive development environment most often used with Python. Notebooks are optimized for ease of development and the convenience of the data scientist; consequently, they're not suitable for serving production traffic.

Another complicating factor in production is that model developers sometimes want to string multiple models together into a model chain, or complex pipeline. Chaining can bring convenience, enable joint parameter selection, and help avoid statistics leaking from your test data into your trained model. However, getting model chaining right can be challenging and expensive. With the right approach (and the right AI/ML platform), you can remove a lot of the painstaking work and shift that time and effort to higher-value tasks.
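At its simplest, a model chain is just function composition: each stage's output becomes the next stage's input. The sketch below uses hypothetical stand-in stages (a scaler and a threshold "model") in place of real preprocessing steps and trained models.

```python
def scale(features, factor=0.1):
    """Hypothetical preprocessing stage: scale raw feature values."""
    return [x * factor for x in features]

def classify(features, threshold=0.5):
    """Hypothetical model stage: label each scaled value."""
    return ["high" if x > threshold else "low" for x in features]

def pipeline(features):
    """Chain the stages: each stage's output is the next stage's input."""
    return classify(scale(features))

print(pipeline([2.0, 9.0]))  # -> ['low', 'high']
```

Real pipelines add complications (batching, error handling, schema validation between stages), but the data-flow shape stays the same.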

The common approach of containerization

The most common production approach is containerization; most MLOps processes rely on it. Containers enable more efficient use of expensive physical computing hardware by allowing many isolated workloads to run alongside one another on the same physical machine.

Most applications only actively compute something part of the time. If they run directly on physical hardware, that expensive CPU will spend a lot of time sitting idle. With containerization, the different containers share the physical CPU and other hardware resources. This sharing allows the overall system to come close to continuous utilization of this expensive investment, lowering costs all around.

But this power doesn’t come for free. A container is still a relatively heavyweight abstraction: virtual machines abstracted away dedicated hardware, and containers took that a step further by hiding the complexities of the operating system. Each container packages an application together with its own complete userland (libraries, runtimes, and dependencies) on top of the shared host kernel. When a large number of containers run at once, the performance of individual containers can degrade as they compete for resources. Cross-container communication also incurs overhead as data transits in and out of each container.

In the ML pipeline use case, many MLOps systems wrap each of your models into a separate heavyweight Docker container. If you need multiple models in a pipeline, each model gets its own container, and the output of one container becomes the input of the next container. 
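The container-per-model pattern means every hop in the pipeline crosses a process boundary, so data must be serialized out of one container and deserialized into the next. The sketch below imitates that boundary with JSON round trips; the two "containers" are hypothetical stand-ins for models served in separate Docker containers, and the model logic is made up for illustration.

```python
import json

def container_a(payload: str) -> str:
    """First model's container: parse the request, run the model, serialize."""
    features = json.loads(payload)["features"]
    scaled = [x * 0.1 for x in features]  # hypothetical model A
    return json.dumps({"features": scaled})

def container_b(payload: str) -> str:
    """Second model's container: the same parse/compute/serialize round trip."""
    features = json.loads(payload)["features"]
    labels = ["high" if x > 0.5 else "low" for x in features]  # hypothetical model B
    return json.dumps({"labels": labels})

# The orchestrator routes container A's serialized output into container B.
request = json.dumps({"features": [2.0, 9.0]})
response = container_b(container_a(request))
print(response)  # -> {"labels": ["low", "high"]}
```

In a real deployment those calls would go over HTTP or a message queue, adding network latency on top of the serialization cost shown here.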

The benefits of containers are that they’re easy to set up and offer broad software compatibility. You can run anything inside a container that can run on an actual machine: Python, R, Mathematica, C++ … anything.

Why containerization has hidden drawbacks

The main drawbacks of containerization are:

Scale of resources: Each container carries its own full set of libraries, runtimes, and dependencies, which consumes memory, CPU, disk space, and more. Each additional container multiplies that resource consumption, adding up quickly. This is manageable if your container images are a few megabytes to several hundred megabytes at most, since small images deploy quickly. But if you have large monolithic applications, especially ones with complicated inter-dependencies, container images can reach into the gigabytes, which is far from ideal for deployment and execution.

Cost: You have to pay for the operational costs of the underlying hardware that’s running all these containers. There’s also nontrivial overhead in communicating and routing between entirely separate containers.

Speed: Stringing containers together, one container per model, is a common form of model chaining. But because containers are a heavyweight construct that consumes significant resources, they slow down the operation of the pipeline. Much of that slowdown comes from the communications overhead of data crossing one container boundary, passing through the enclosing system, and then entering the next container.

Security: Since setting up containerization requires significant expertise, there is a real risk of misconfigurations, such as using default container names or leaving default service ports exposed to the public. This leaves organizations exposed to attackers, and security incidents involving containers can lead to downtime or system outages.

How Wallaroo offers a different – and better – solution

Wallaroo offers a lot of punch in a small package. It’s a platform for production AI and analytics designed to make ML faster, more affordable, and accessible to organizations of all sizes. 

Wallaroo understands that speed, cost, scale, and security are real blockers for your ML deployment, so we’ve streamlined the ML lifecycle to deliver solutions while giving data scientists the freedom to use the tools they already know. Using Wallaroo’s sophisticated platform, you can:

  • Deploy models against live data in seconds
  • Analyze data 100x faster
  • Reduce compute costs by up to 80%

Wallaroo gives you the tools to iterate quickly and scale to process more data and run more models on less infrastructure. We’ve created a custom-built, high-speed Rust execution engine that can run many models within just one container. Your models are loaded into this engine and Wallaroo does the rest.

This is far more efficient in terms of operational costs because you’re not paying to run a separate container, with its own runtime and dependencies, for each model in a complex pipeline, which may have many models. We’re talking about an impressive 8X to 12X reduction in operational costs to run the same volume of inferencing.

It’s also faster. Combining our high-speed Rust execution engine with many models in a single container makes it possible to process hundreds of thousands of events per second on a single server. That makes Wallaroo the quickest platform on the market for production AI. On top of raw Rust speed, the output of each model is routed to become the input of the next within Wallaroo’s single engine, rather than across multiple container boundaries.
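To make the contrast concrete, here is a conceptual sketch (not Wallaroo’s actual API) of the single-engine idea: all the model stages live in one process, and each stage’s output is handed to the next as an in-memory object, with no serialization or container boundary in between. The `Engine` class and the lambda "models" are invented for illustration.

```python
class Engine:
    """Hypothetical single-process engine holding a chain of model steps."""

    def __init__(self):
        self.steps = []

    def add_model(self, fn):
        """Register a model step; returns self so calls can be chained."""
        self.steps.append(fn)
        return self

    def infer(self, data):
        """Route each model's output directly into the next model's input."""
        for step in self.steps:
            data = step(data)
        return data

engine = (
    Engine()
    .add_model(lambda xs: [x * 0.1 for x in xs])                        # stand-in model A
    .add_model(lambda xs: ["high" if x > 0.5 else "low" for x in xs])   # stand-in model B
)

print(engine.infer([2.0, 9.0]))  # -> ['low', 'high']
```

Because the hand-off is a plain function call instead of a network hop plus serialization, the per-stage overhead shrinks to almost nothing, which is the general principle behind running many models inside one engine.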

With your data security always in mind, Wallaroo ensures that everything processed is private, secure, and only visible to those with permission to see it. And unlike other tools on the market, Wallaroo is designed to seamlessly integrate with your existing systems and connect with everything around it, so you don’t have to go to great time and expense building your own platform in-house.

Ready to work smarter with machine learning and harness the power of many models within one container? Get in touch to get started.