Challenges and Best Practices for Scaling Computer Vision to Drive ROI

September 11, 2023

Computer vision (CV) plays a pivotal role across many industries, with applications spanning cloud and edge computing. The goal is to teach machines to accurately process and understand visual data (still images or video) and to make decisions based on it. CV can both automate risk detection and create business opportunities at massive scale, which is why we are seeing its rapid growth. Here are just a few examples:

  • Manufacturing – product defect detection
  • Retail – checkout, foot traffic
  • Smart Cities – safety (e.g., busy intersections)
  • Healthcare – medical imaging
  • Marketing – audience segmentation and demographics

According to a recent report, the AI in computer vision market is currently estimated at $17B and is projected to grow to $48B by 2028. Our customer conversations at Wallaroo.AI bear this out: CV is a component of the overall AI strategy for over 30% of enterprises.

However, these firms face challenges in scaling CV. Computer vision applications encounter not only the production challenges common to other ML applications but also obstacles specific to the domain. Managing massive amounts of data, using available hardware efficiently, monitoring, and scaling are just a few of the hurdles AI teams must overcome to deliver measurable, real-world results.

Furthermore, half of these CV applications would benefit greatly from deployment to edge locations such as manufacturing floors and retail outlets, which raises even larger barriers to adoption.

In this blog post we cover the key roadblocks enterprises face in deploying and scaling customized CV solutions, along with many of the best practices and considerations we’ve seen lead to successful implementations. We also describe two typical case studies in which Wallaroo.AI worked with a customer to deliver ROI quickly and at scale.

Challenges in Computer Vision

Computer vision presents many challenges that must be tackled to harness its potential. This section explores the obstacles to efficiently processing high-resolution visual data, the privacy and cost concerns associated with cloud computing, and the complexities of securing specialized hardware, optimizing model deployments, and monitoring production models to keep them effective.

Massive Data Management and Processing

Computer vision’s inherent complexity demands significant computational resources. High-resolution images and videos in diverse formats contain vast amounts of data, which creates storage and processing challenges. Businesses need robust infrastructure for storage and substantial computational power for analysis.

The potential of CV for businesses is immense, but to realize it, companies must first manage these vast datasets effectively.

Cloud Privacy and Cost Challenges

Cloud computing offers scalable data management and processing, but it can be costly and carries privacy risks. Transferring sensitive visual data to the cloud and storing it there can expose it to unauthorized access or breaches, threatening individual privacy and business IP.

The cost of running CV on cloud platforms can grow considerably, especially when workloads require advanced computational resources such as GPU instances.

The challenge lies in balancing the benefits of the cloud’s scalability against its financial and privacy risks.

Hardware Challenges

The computational needs of CV model training are driving demand for specialized hardware such as graphics processing units (GPUs), which can be hard to procure and expensive. Many organizations reflexively deploy trained CV models to GPUs for inference as well, which can drive up production costs and delay projects while teams wait for GPU availability.

Even when the deployment target (especially in edge environments) is an x86 or ARM CPU, teams face the additional engineering overhead of converting models packaged and tuned for GPU inference to run on that hardware.

Model Efficacy and Maintenance

So you’ve trained a model and deployed it to a production environment, where it runs against production data. You’re not done yet. Consider retail checkout, where CV can identify and ring up products as consumers check out. If the camera lens gets smudged, the model may stop recognizing products, items go unidentified, and the customer has a poor checkout experience. Detecting when models stop behaving as expected and addressing such issues quickly is key to keeping CV effective day after day. However, the tools available for monitoring CV models in production often lack essential functionality and are geared only to cloud deployments.
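One widely used heuristic for catching a smudged or out-of-focus camera is the variance of the Laplacian: sharp frames have strong edges and therefore a high variance of the Laplacian response, while blurred frames have a low one. The sketch below computes it in pure Python on a grayscale frame stored as a list of rows. The function names and the threshold value are hypothetical placeholders; a real threshold would have to be tuned per camera and scene.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels of a grayscale frame.

    `gray` is a list of equal-length rows of pixel intensities (e.g. 0-255).
    """
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("frame must be at least 3x3")
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Discrete Laplacian kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
            lap = (gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)


# Hypothetical threshold; in practice it is tuned per camera and scene.
BLUR_THRESHOLD = 100.0


def looks_blurred(gray, threshold=BLUR_THRESHOLD):
    """Flag a frame whose edge energy is too low, e.g. because the lens is smudged."""
    return laplacian_variance(gray) < threshold


sharp = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]  # high-contrast checkerboard
flat = [[128] * 8 for _ in range(8)]  # featureless frame, similar to a badly smudged lens
print(looks_blurred(sharp), looks_blurred(flat))
```

A check like this costs very little per frame, so it can run on the edge device alongside the model and raise an alert long before anyone notices the drop in product recognition.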

Best Practices for Optimizing Computer Vision Models in Production

Ensure you can easily deploy, update, and scale your ML pipelines

Building ML pipelines that can be deployed, updated, and scaled quickly is pivotal for adapting to changing business requirements and for retraining and redeploying models. This goes beyond making the model efficient: the pipeline must also work seamlessly in real-world operations. Your CV solution should be able to adapt without significant disruption as new data arrives or business requirements change.

Integrating CV models into existing infrastructure poses its own set of challenges. The model must interface with existing systems, databases, and other applications. It’s vital to consider the entire ecosystem in which the CV model will operate, so that it not only performs its specific task but also fits smoothly into the bigger picture.

Consider running as much of the analysis near the edge where the images originate

Analyzing visual data at the edge, where it originates, offers significant advantages. Edge computing can alleviate latency issues, making real-time or near-real-time analysis feasible. It also reduces data transfer, since large volumes of visual data don’t have to be transmitted across networks, which lowers costs and improves performance.

Running the workload at the edge rather than centrally can drastically reduce cloud-related expenses. Additionally, eliminating the need to transport vast amounts of data leads to quicker, more responsive systems.
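A back-of-the-envelope estimate shows why this matters. The sketch below compares streaming every camera’s video to a central site with sending only compact detection events from edge devices. The camera count echoes the scale of the smart-city case study described later, but the bitrate, event rate, and event size are hypothetical placeholders, not measurements from any deployment.

```python
def daily_gigabytes_streamed(num_cameras, bitrate_mbps, hours=24):
    """GB per day if every camera streams continuously at `bitrate_mbps` megabits per second."""
    bits = num_cameras * bitrate_mbps * 1e6 * hours * 3600
    return bits / 8 / 1e9


def daily_gigabytes_events(num_cameras, events_per_camera_per_day, bytes_per_event):
    """GB per day if edge devices run inference locally and send only detection events upstream."""
    return num_cameras * events_per_camera_per_day * bytes_per_event / 1e9


# Hypothetical inputs for illustration only.
stream = daily_gigabytes_streamed(num_cameras=1000, bitrate_mbps=4)
events = daily_gigabytes_events(num_cameras=1000, events_per_camera_per_day=10_000, bytes_per_event=1_000)
print(f"stream everything: {stream:,.0f} GB/day; events only: {events:,.0f} GB/day")
```

Under these assumptions the gap is more than three orders of magnitude; the exact ratio depends entirely on the real bitrate and event volume, so it is worth plugging in your own numbers.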

Analyze your production ML pipeline to see what fraction is accelerated by GPUs and study hardware alternatives 

While GPUs have become the default choice for image analysis because of their parallel processing capabilities, they may be necessary only for some use cases. If latency requirements are moderate, running inference on CPUs can be a more cost-effective choice without significantly compromising performance.

A comprehensive analysis of your ML pipeline will reveal which components benefit most from GPU acceleration. Look beyond the core inference step: peripheral tasks such as decoding images and feeding them into the model can dominate processing time, and optimizing them can yield significant performance gains.
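A simple way to start this analysis is to time each pipeline stage separately and see what share of the total each one takes. The sketch below uses `time.perf_counter` inside a small context manager. The `decode`, `preprocess`, and `infer` functions are hypothetical stand-ins for real pipeline steps, not any particular framework’s API.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def shares(self):
        """Fraction of total measured time spent in each stage."""
        total = sum(self.totals.values())
        return {name: t / total for name, t in self.totals.items()} if total else {}


# Hypothetical stand-ins for real pipeline steps.
def decode(frame_bytes):
    return list(frame_bytes)


def preprocess(pixels):
    return [p / 255.0 for p in pixels]


def infer(tensor):
    return sum(tensor) / len(tensor)


timer = StageTimer()
raw = bytes(range(256)) * 100  # fake encoded frame
for _ in range(20):
    with timer.stage("decode"):
        pixels = decode(raw)
    with timer.stage("preprocess"):
        tensor = preprocess(pixels)
    with timer.stage("inference"):
        score = infer(tensor)

for name, share in sorted(timer.shares().items(), key=lambda kv: -kv[1]):
    print(f"{name:>10}: {share:.1%}")
```

The inference share is what bounds the benefit of a faster accelerator: if inference accounts for a fraction f of per-frame time, even an infinitely fast GPU can speed up the whole pipeline by at most 1/(1 - f) (Amdahl’s law). When preprocessing dominates, optimizing it usually pays off more than adding GPUs.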

Consider ARM hardware for inferencing to reduce energy footprint

ARM hardware, known for its energy efficiency, can be a strong fit for inference in computer vision applications. Its lower energy consumption can reduce operational costs and make the entire system more sustainable in the long run.

Embracing energy-efficient hardware doesn’t only translate to monetary savings; it’s also an environmental responsibility. As businesses strive for sustainability, reducing the energy footprint of CV applications can make a meaningful impact.
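The savings scale linearly with average power draw, fleet size, and running time, so they are easy to estimate. The sketch below computes annual energy and cost for a fleet of always-on inference devices. The wattages, device count, and electricity price are hypothetical placeholders, not measurements of any specific CPU, GPU, or ARM board.

```python
def annual_energy(avg_watts, num_devices, price_per_kwh, hours_per_day=24.0):
    """Return (kWh per year, cost per year) for devices drawing a constant average power."""
    kwh = avg_watts * num_devices * hours_per_day * 365 / 1000
    return kwh, kwh * price_per_kwh


# Hypothetical placeholder figures: a higher-power and a lower-power inference device.
high_kwh, high_cost = annual_energy(avg_watts=60, num_devices=100, price_per_kwh=0.15)
low_kwh, low_cost = annual_energy(avg_watts=15, num_devices=100, price_per_kwh=0.15)
print(f"savings: {high_kwh - low_kwh:,.0f} kWh/year, ${high_cost - low_cost:,.0f}/year")
```

Average draw under a realistic inference load, not the rated peak, is the number to measure before comparing hardware options this way.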

Ensure you have a way to understand data and model drift in real-time

Input data can change over time and the operating environment can evolve (a smudged checkout camera is one example), creating a risk that the model drifts from its intended performance. Real-time monitoring allows these shifts to be detected immediately, helping keep the system accurate and effective.

Many tools and techniques can help detect data and model drift early, for example tracking the distribution of input image statistics or model confidence scores against a baseline captured at deployment. Leveraging them can save time and prevent downstream complications, keeping the CV system robust and reliable.
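As a minimal sketch of real-time drift monitoring, assuming the model emits a confidence score with each prediction, one can compare the mean of a rolling window of recent scores against a baseline and raise a flag when it moves more than k standard errors away. The class name, window size, and k are hypothetical choices; production monitoring typically uses richer statistical tests (such as the population stability index or a Kolmogorov-Smirnov test) and tracks several signals at once.

```python
import math
from collections import deque
from statistics import mean, stdev


class ConfidenceDriftMonitor:
    """Flags drift when the rolling mean of a per-prediction score leaves its baseline band."""

    def __init__(self, baseline, window=200, k=4.0):
        if len(baseline) < 2:
            raise ValueError("baseline needs at least two values")
        self.mu = mean(baseline)
        self.sigma = stdev(baseline)
        self.window = deque(maxlen=window)
        self.k = k

    def update(self, value):
        """Add one observation; return True if the current window indicates drift."""
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window is full
        # Standard error of the window mean under the baseline distribution.
        se = self.sigma / math.sqrt(len(self.window))
        return abs(mean(self.window) - self.mu) > self.k * se


monitor = ConfidenceDriftMonitor(baseline=[0.85, 0.95] * 500)
healthy = [monitor.update(v) for v in [0.85, 0.95] * 100]
smudged = [monitor.update(0.6) for _ in range(200)]  # e.g. confidence drops after a lens smudge
print(any(healthy), smudged[-1])  # False True
```

Because the monitor keeps only a fixed-size window, it runs in constant memory and is cheap enough to run next to the model at the edge, with only the alerts sent to a central dashboard.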

Case Studies of Best Practices in Action  

Here, we briefly touch on two case studies in which Wallaroo.AI worked with customers to implement many of these best practices, along with our purpose-built platform for production ML.

Video Analytics – Major Telco for a Large US City

The goal of this project is to analyze large amounts of video data from 1,000 or more 5G-connected cameras at busy street intersections and detect unsafe conditions, which can then be used to manage traffic lights or alert emergency responders. This requires highly efficient inference at scale in the 5G edge locations the cameras connect to, as well as centralized, real-time monitoring of the models to make sure they perform well. These requirements posed challenges of scale, cost, speed, and observability that we were able to solve.

Marketing – Major Advertising Agency

This project involves frequent, ad hoc analysis of tens of millions of still images and videos to segment the people in them by basic categories such as gender and age range. The challenges were the manual time the AI teams spent setting up and scheduling the analysis for each dataset, the speed and cost of inference, and the difficulty of monitoring how well the models performed on any particular dataset.


Computer vision presents critical challenges to AI teams deploying at scale, mainly cost, energy efficiency, and accuracy. Understanding these challenges and the best practices that address them will set you on the path to building and operating CV applications that drive business value.

For those looking for help deploying and scaling CV, we invite you to try out Wallaroo. We’ve spent a lot of time thinking about these challenges and talking to enterprise AI teams, and we purposefully designed the Wallaroo Enterprise Platform for production ML to help remove these roadblocks while freeing up data scientists’ and engineers’ time, reducing infrastructure costs, and helping maintain model efficacy.

To learn more about how to achieve ROI when scaling CV, request a demo of the Wallaroo.AI Enterprise Edition ML production platform.