Model Interoperability with ONNX

February 9, 2023

Explore the seamless sharing and deployment of machine learning models with ONNX, an open-source framework that extends beyond neural nets to traditional algorithms. Discover how ONNX enhances model interoperability, helping data scientists and ML engineers with model serialization, deployment, and execution across diverse runtime platforms, and making it easier to share and deploy your machine learning models.

ONNX (the Open Neural Network Exchange) is an open source framework for serializing machine learning models. While it was originally developed for representing neural nets, it has been extended to a variety of other traditional machine learning algorithms. Because the model representation is independent of any specific environment, ONNX allows data scientists to share machine learning models that they produce, regardless of their preferred modeling framework, and to deploy them across a variety of runtime platforms.

In this article, we’ll give an overview of ONNX, and talk about why it’s an important tool for sharing and deploying machine learning models. We’ll also provide some tips and resources for converting models to ONNX.

The Basic Idea

Let’s take a simple linear model:

y = b0 + b1*x1 + b2*x2 + …


This expression can be represented by a computation graph, made up of features (inputs), edges, weights, and operators:

A notional computational graph for a linear model


An ONNX model is a description of this graph. The graph can then be “executed” by any runtime that understands the representation.
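As a toy illustration (plain Python, not an actual ONNX graph), executing this computation graph just means walking the inputs through the weighted edges and the addition operators:

```python
# Toy evaluation of the linear-model graph: y = b0 + b1*x1 + b2*x2 + ...
def linear_model(x, weights, intercept):
    y = intercept
    for xi, bi in zip(x, weights):
        y += bi * xi
    return y

# Example: y = 1.0 + 0.5*2.0 + (-1.0)*3.0
print(linear_model([2.0, 3.0], weights=[0.5, -1.0], intercept=1.0))  # -1.0
```

An ONNX runtime does essentially this, but from a serialized, framework-neutral description of the graph.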

The beauty of this representation is that it can express a wide variety of complex model types, regardless of how the model was originally fit. Whether you fit a gradient boosting model using scikit-learn or xgboost, or fit an LSTM using PyTorch or TensorFlow, you can serialize your model to an ONNX representation that isn't beholden to the original modeling framework.

These models can be run with ONNX Runtime, a cross-platform model accelerator that supports a wide variety of operating systems, architectures, and hardware accelerators.

This gives Data Scientists and ML Engineers a lot of flexibility to tune their respective ecosystems to their needs. Data Scientists can develop in the language and framework of their choice. They can share the models with colleagues who may prefer another framework. These colleagues can test out the model without needing to know much about the environment where it was developed: just the appropriate format for the input data, and the appropriate version of ONNX.

ML Engineers can deploy these models to the best environment for their inferencing use case, with minimal or no dependence on the model’s development framework. 

For example, our company, Wallaroo.ai, uses ONNX as the primary model framework for our ML production platform. Data Scientists can develop models in their preferred Python framework, convert them to ONNX, and upload them to the Wallaroo high-performance compute engine, which is implemented in Rust. Wallaroo then efficiently runs the model in the production environment.

Other production environments might run the model in C, or on special hardware accelerators, or deploy the models to the edge (a scenario Wallaroo also supports).

Let’s See It in Action

Let’s see an example of training a model, converting it to ONNX, and doing inferences in a Python ONNX runtime. For this example, we will train a simple Keras model to predict positive and negative movie reviews from IMDB. Since the focus of this article is on model conversion, rather than training, we’ll use the already tokenized version of the data set that is included in Keras.

This code snippet trains the model and saves it in the TensorFlow SavedModel format. It also saves a small sample of data (five rows) for testing the fidelity of the ONNX conversion later on.

Note that for this example, the model input is a vector of 100 integer tokens (max_len = 100).
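A minimal training sketch, assuming TensorFlow 2.x with its bundled Keras (the model architecture and hyperparameters here are illustrative, not the article's originals):

```python
import numpy as np
from tensorflow import keras

max_len = 100       # model input: 100 integer tokens
vocab_size = 20000  # keep only the most frequent tokens

# Load the pre-tokenized IMDB reviews that ship with Keras
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=vocab_size)

# Pad/truncate every review to exactly max_len tokens
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_len)

# A small embedding + pooling classifier
model = keras.Sequential([
    keras.layers.Embedding(vocab_size, 32, input_length=max_len),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(1, activation="sigmoid", name="dense"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=128, validation_split=0.1)

# Save in TensorFlow SavedModel format
model.save("imdb_model")

# Save five rows of test data for checking the ONNX conversion later
np.save("test_rows.npy", x_test[:5])
```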

Converting the Model

To convert our model to ONNX, we will use the onnxmltools package. The conversion function takes as input the trained Keras model, and a description of the model’s input. This description is a list of tuples, where each tuple is the name of the input, and the input type.

Our model has one input, of type Int32TensorType([None, 100]) — that is, the model accepts as input an arbitrary number of integer vectors of length 100. We'll call that input "input_text."

Finally, we convert and save the model.

Inferring with the ONNX Model

After the model is converted, it can be shared with other data scientists, who can run it using ONNX Runtime. We’ll show an example of that in Python, using the onnxruntime package. The first thing a new user might want to do is interrogate the model to determine its inputs and outputs.

Interrogating the model tells us that it takes a single input named "input_text", consisting of integer vectors of length 100, and returns one float named "dense" per input vector (the probability that the text is a positive review). In this example, we aren't really using the output names.

Finally, let's predict on our example input data, with the call sess.run(). The arguments to the run method are the list of output names (we'll use None here, which returns all outputs) and a dictionary keyed by the input name(s).

And now we’ve successfully inferred with the model, without needing the Keras environment.

Tips and Resources for ONNX Conversion

ONNX provides a lot of advantages in terms of sharing, running, and deploying models, but model conversion can be a challenge. Fortunately, both PyTorch and Hugging Face have fairly well documented and straightforward procedures for converting models from those respective frameworks.

For other ONNX-supported frameworks, the documentation is a bit diffuse, and there have been several conversion packages that have come and gone. I’ve found that onnxmltools is the most reliable and up-to-date; the package supplies some useful examples for converting models from a variety of frameworks.

For deployment, the ideal situation would be for data scientists to submit their original models to a deployment registry and have that registry automatically convert them to ONNX or another appropriate representation to run in production. Wallaroo is currently working on making this a reality. But in the meantime, knowing how to convert models to ONNX for maximum interoperability is a valuable tool in the Data Scientist's arsenal.