How to Deploy a Machine Learning Model

October 11, 2021

Deploying machine learning models can often seem like a daunting task. In this guide, we demystify the process and provide you with the tools and strategies you need to confidently and effectively bring your model from the sandbox environment to real-world applications.

So, you’ve cleaned raw data, created a machine learning model, and trained it…now what? 

Data Science education programs and universities offering Data Science degrees rarely teach the steps required after building and training an ML model; consequently, most Data Scientists are not trained in productionizing these models. Putting ML models into production requires skills more in line with DevOps and software engineering. Data Scientists who add this skill to their profile will have an advantage, especially with lean organizations. 

To get started, we must research the following questions and create a plan accordingly: 

  • How will your data be stored and retrieved?
  • What frameworks and tooling are appropriate for your project?
  • How will you receive feedback from the productionized model and continue to iterate?

Data storage and retrieval. You will need to understand how large your data is, how it will be stored, and how it will be retrieved for specific purposes such as training and prediction. From there, decide whether the data will be stored on-premises, in the cloud (AWS, GCP, Azure), or in a hybrid of both. It is good practice to store your data where your model training occurs and where the results will be served. Keep in mind that planning for the size of your data is very important, since you will need enough compute power to handle your datasets: if you are operating locally, you will need to add more compute resources; if you are using the cloud, autoscaling is a great option. As you continue through the model optimization phases, monitor these resources to ensure the system is performing optimally. With either storage decision, costs can balloon if you have not thought through your data needs and planned carefully.
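As a rough illustration of that sizing exercise, a back-of-the-envelope check like the sketch below can flag early whether a dataset will fit in local memory or whether you should plan for out-of-core processing or cloud autoscaling. The function names, the 8-bytes-per-value figure, and the 3x overhead factor are illustrative assumptions, not a real sizing formula.

```python
# Hypothetical back-of-the-envelope capacity check; numbers are
# illustrative assumptions, not a tuned sizing formula.

def estimate_memory_gb(n_rows: int, n_features: int,
                       bytes_per_value: int = 8) -> float:
    """Estimate raw in-memory size of a dense float64 dataset in GB."""
    return n_rows * n_features * bytes_per_value / 1e9

def fits_in_memory(n_rows: int, n_features: int, available_gb: float,
                   overhead_factor: float = 3.0) -> bool:
    """Leave headroom: training frameworks often need several times
    the raw data size for copies and intermediate buffers."""
    return estimate_memory_gb(n_rows, n_features) * overhead_factor <= available_gb

# 100M rows x 50 float64 features is ~40 GB raw; with 3x overhead it
# will not fit on a 64 GB machine, so plan for out-of-core or cloud.
print(fits_in_memory(100_000_000, 50, available_gb=64.0))  # False
```

A check this simple will not replace load testing, but running it before committing to on-premises hardware is cheap insurance.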

Next, you will need to think about how your data will be retrieved and processed. This can happen either in batches (data is retrieved in chunks from storage) or in real time (data is retrieved as soon as it becomes available). How will your model receive data at inference time? From webpages? API requests? Answering these questions up front makes your infrastructure planning more robust and better able to handle each situation, especially for prediction data.
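The two retrieval patterns can be sketched in a few lines. This is an illustrative stand-in for a real storage layer: the record source is a plain list here, and in practice the batch reader would pull from a database or object store while the real-time reader would sit behind a message queue or API endpoint.

```python
# Illustrative sketch of batch vs. real-time retrieval; the in-memory
# `records` list is a hypothetical stand-in for your storage layer.
from typing import Iterable, Iterator, List

def batch_retrieval(records: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Batch mode: pull fixed-size chunks from storage (e.g. for training)."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

def realtime_retrieval(stream: Iterable[dict]) -> Iterator[dict]:
    """Real-time mode: handle each record as soon as it arrives
    (e.g. prediction requests from a webpage or API)."""
    for record in stream:
        yield record  # in practice: validate, featurize, predict

records = [{"id": i} for i in range(10)]
batches = list(batch_retrieval(records, batch_size=4))
print([len(b) for b in batches])  # three chunks: [4, 4, 2]
```

The same featurization code should run in both paths; skew between training-time and serving-time processing is a common source of silent prediction errors.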

Frameworks and tooling. Once your data is prepared and ready for use, you will need to decide which frameworks and tools to use. When choosing a framework, consider the task at hand, whether the tools are open source or closed, and how many platforms support them. This choice will shape the maintenance, stability, and use of your model. The framework needs to be flexible enough that you do not box yourself into a corner with limited options, yet not so sprawling that you become mired in choosing among its possibilities. 

Feedback and iteration. It is important to be able to track and monitor your model’s performance in production, so you are immediately aware of poor performance, data drift, bias, and similar issues. Alerts for these allow you to mitigate risk proactively and fix issues before end users notice. Lastly, figuring out how to experiment, retrain, and deploy new models without interrupting the current production model (typically handled through continuous integration and delivery, or CI/CD) is key to a successful ML deployment strategy. 
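As a minimal sketch of what a drift alert can look like, the check below flags a feature whose live mean has shifted far from its training-time baseline. The z-score threshold, the feature, and the sample values are illustrative assumptions; production systems typically use richer statistical tests and per-feature baselines.

```python
# Minimal mean-shift drift check; threshold and data are illustrative
# assumptions, not tuned values.
from statistics import mean, stdev

def drift_alert(baseline: list, live: list, z_threshold: float = 3.0) -> bool:
    """Flag drift when the live mean sits more than `z_threshold`
    standard errors away from the training-time baseline mean."""
    base_mu, base_sigma = mean(baseline), stdev(baseline)
    if base_sigma == 0:
        return mean(live) != base_mu
    standard_error = base_sigma / len(live) ** 0.5
    z = abs(mean(live) - base_mu) / standard_error
    return z > z_threshold

training_ages = [34, 29, 41, 38, 30, 45, 27, 36, 33, 40]
production_ages = [61, 58, 65, 59, 63, 60, 62, 64, 57, 66]  # clearly shifted
print(drift_alert(training_ages, production_ages))  # True
```

Wiring a check like this into your monitoring means drift shows up as an alert you can act on, rather than as a slow, unexplained decline in prediction quality.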

These questions cover just a few aspects of productionizing your model. Other potential questions to consider are:

  • How do we connect to the live systems that require inferences (predictions)?
  • Is access control/governance required? 
  • How many inferences per second do we have to support? 
  • Is downtime acceptable, and if so, how much? 
  • What do we do when something goes wrong — fallback to failsafe model, return ‘no inference’, page a human, just log the error and do nothing?
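The last bullet's options can be combined into a single fallback chain, sketched below. The primary model, failsafe model, and logger here are hypothetical placeholders (the primary deliberately fails to simulate an outage), not a specific library's API.

```python
# Sketch of the failure-handling options above; the models and logger
# are hypothetical placeholders, not a specific library's API.
import logging

logger = logging.getLogger("inference")

def primary_model(features):
    raise RuntimeError("model server unreachable")  # simulate an outage

def failsafe_model(features):
    return 0.5  # e.g. a conservative prior or rules-based score

def predict_with_fallback(features):
    """Try the primary model; on failure, log the error and fall back
    rather than surfacing an exception to the live system."""
    try:
        return primary_model(features)
    except Exception as exc:
        logger.error("primary model failed: %s", exc)
        try:
            return failsafe_model(features)
        except Exception:
            return None  # 'no inference': let the caller decide

print(predict_with_fallback({"amount": 120.0}))  # 0.5 from the failsafe
```

Which rung of the chain is acceptable (failsafe score, no inference, or paging a human) is a business decision, which is exactly why it belongs in the planning questions above.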

Does a more simplified route exist?

There are many MLOps platforms and open-source libraries available that address the MLOps challenge. Each comes with its own pros and cons. The best choice for you depends on your business strategy, timeline, and resources. 

Google, Microsoft, and AWS each offer an MLOps addition to their platforms. As an example, the Kubeflow project, which originated at Google, provides a set of open-source tools for MLOps and assembles them on Kubernetes. If a homegrown solution is preferred, options such as Sacred, DVC, or MLflow are readily available, but you must then own significant ongoing maintenance in-house. 

Analyze each use case carefully when deciding on an MLOps strategy. For example, if your business is in banking or healthcare, you will need to ensure your MLOps tech stack meets strict regulatory compliance requirements. Your MLOps tech stack should include tooling for the following: 

  • Version control
  • CI/CD pipelines
  • Automation
  • A/B Testing and Experiments
  • Anomaly detection
  • Monitoring and performance
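To make one of those items concrete, A/B testing a challenger model usually starts with a deterministic traffic split. The sketch below hashes a user id to a stable bucket so each user always sees the same model; the 10% challenger share and the id scheme are illustrative assumptions.

```python
# Hedged sketch of deterministic traffic splitting for model A/B
# tests; the 10% challenger share is an illustrative assumption.
import hashlib

def assign_variant(user_id: str, challenger_share: float = 0.10) -> str:
    """Hash the user id to a stable bucket so each user consistently
    sees the same model, independent of request ordering."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "challenger" if bucket < challenger_share else "champion"

assignments = [assign_variant(f"user-{i}") for i in range(1000)]
share = assignments.count("challenger") / len(assignments)
print(f"challenger share: {share:.2f}")  # close to 0.10
```

Hash-based assignment (rather than random per-request routing) keeps each user's experience consistent and makes experiment results reproducible.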

At Wallaroo, we have been thinking about these issues for many years. Our solutions are readily available to anyone interested in productionizing their ML projects simply and quickly. We empower companies to easily get their ideas into production and continuously improve to maximize business impact.

About Wallaroo. Wallaroo enables data scientists and ML engineers to deploy enterprise-level AI into production more simply, more quickly, and with incredible efficiency. Our platform provides powerful self-service tools, a purpose-built ultrafast engine for ML workflows, observability, and an experimentation framework. Wallaroo runs in cloud, on-prem, and edge environments while reducing infrastructure costs by 80 percent.

Wallaroo’s unique approach to production AI gives any organization the desired fast time-to-market, audited visibility, scalability – and ultimately measurable business value – from their AI-driven initiatives, and allows data scientists to focus on value creation, not low-level “plumbing.”