Why Data Scientists Should be Excited about Wallaroo Community Edition

May 3, 2022

As a data scientist, here are the parts of my job that I enjoy the most (in no particular order):

  • Learning about new domains where I can apply my problem-solving skills, and gaining knowledge and inspiration from the experts in these fields who work with me.
  • Learning about exciting new techniques and approaches in machine learning and statistics, and applying them to the real-world problems that I’m working on.
  • Getting my hands dirty in the data, exploring how it relates to the information that I need and the knowledge that I want to extract.
  • Creating and executing elegant and effective solutions to the problems that I’m tackling. 
  • Seeing my models and solutions bring real value to the business, solving problems and making decisions.

What’s the part of the job I do not enjoy?

To be honest (and I’m not proud of it): the act of making my models bring value to the business. 

I think it boils down to this. To me, data science (and data analysis in general) is a process of abstraction: finding the patterns, the “truth” that lies within the data, and turning those patterns into automated and actionable decision processes (models) that perform a valuable function within the business. That means my job entails tasks like exploration, experimentation, and communication of my findings to interested parties.

But putting these models into production feels – to me – like the opposite. To operationalize a model, the focus is on tasks like automation, optimization of resources, hardening, security, logging and developing fallback strategies in case things go wrong. This requires explicitly worrying about all the complexities and details that must be taken into account to make sure that the production pipeline is robust, reliable, and failsafe. And I’m not a details person; at least not those kinds of details.

Not all data scientists feel the way I do, of course; some of them are quite comfortable with, and even actively interested in, the process of shepherding their models into the real world. But it seems that the population of data scientists who feel similarly to the way I do is large enough that a whole new profession has arisen to fill the gap: Machine Learning Operations (MLOps) engineers.

Bridging the gap with Wallaroo

MLOps engineers have precisely the skill set to make the models that data scientists create active and actionable in the real world, and to monitor model performance, operationally speaking, while that model is live. But while many data scientists may be happy to hand their models over to the MLOps team for a production rollout, this process, as currently done, may not be terribly efficient.  Because data scientists and MLOps engineers don’t speak the same language, and don’t work or think the same way, there can often be time-consuming bottlenecks as one group tries to articulate a requirement, and the other team tries to satisfy it. 

In addition, if the model does start misbehaving in production, it can often be a team effort to diagnose the issue – is it an error in the production stack, or is something wrong with the model? Diagnosis can run into the same communication and coordination bottlenecks as deployment, as data scientists struggle to gain visibility into their models within the production stack.

Can we make the interaction between the data scientist’s world and the MLOps world easier?

When I am learning how to operate within an unfamiliar system, it’s helpful to do that in an environment I’m comfortable in, preferably an interactive one. As a data scientist, I find a notebook-style environment to be helpful, because I can simultaneously figure out the process and document it for future reference. Some people prefer graphical UIs and dashboards, and those are certainly useful for information that’s best absorbed visually, like summaries of model health, or quick glances into what might need my attention the most.

I want it to be easy to specify important information about the model to deployment teams: for instance, data validation constraints that the model expects to be met, or any preprocessing of incoming data that must be done before it’s fed to the model. 
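To make that concrete, here is a minimal sketch in plain Python of what handing over a model's input requirements might look like. It is not Wallaroo's API: the `RangeConstraint` and `ModelContract` classes, the `log_income` step, and the `age` and `income` features are all hypothetical names invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]


@dataclass(frozen=True)
class RangeConstraint:
    """Hypothetical constraint: the named feature must lie in [low, high]."""
    feature: str
    low: float
    high: float

    def violations(self, record: Record) -> List[str]:
        value = record.get(self.feature)
        if value is None:
            return [f"{self.feature}: missing"]
        if not (self.low <= value <= self.high):
            return [f"{self.feature}: {value} outside [{self.low}, {self.high}]"]
        return []


@dataclass(frozen=True)
class ModelContract:
    """Hypothetical bundle of what the model expects of incoming data."""
    constraints: Tuple[RangeConstraint, ...]
    preprocess: Callable[[Record], Record]

    def prepare(self, record: Record) -> Record:
        # Validate first, so bad inputs never reach the model.
        problems = [msg for c in self.constraints for msg in c.violations(record)]
        if problems:
            raise ValueError("; ".join(problems))
        return self.preprocess(record)


def log_income(record: Record) -> Record:
    # Illustrative preprocessing step: log-transform a skewed feature.
    out = dict(record)
    out["income"] = math.log1p(record["income"])
    return out


# Assumed example features; a real model would declare its own.
contract = ModelContract(
    constraints=(RangeConstraint("age", 18, 120), RangeConstraint("income", 0, 1e7)),
    preprocess=log_income,
)
```

Calling `contract.prepare(record)` either returns the preprocessed record or raises a `ValueError` listing every violated constraint. The idea is that the data scientist writes the expectations down once, in code, and the deployment team does not have to reconstruct them from conversation.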

If and when something goes awry with one of my models in production, I want to know about it sooner rather than later. I want to easily diagnose the problem; automated diagnosis is ideal, but for hairy problems I want to quickly pull down the information I need for more in-depth analysis. This includes inference logs, input history, and model health information.

Wallaroo Community Edition helps me do all that.

Meet Wallaroo Community Edition

It’s designed to be a platform where I can interact with production teams, about production concerns, in a way that fits my natural (data scientist) working style. That’s definitely a benefit for me, and I hope it’s a benefit for the ML engineers who have to work with me, too! 

Wallaroo’s Enterprise offering has been around for a while, but now there’s a free Community Edition to make Wallaroo’s simplicity and power available to smaller teams, too. The early access beta has launched already, with general availability of the Community Edition to follow two months later. If you are a member of a small data science team that’s been struggling to deploy models in a timely manner, if your team members have been seeking a better way for data scientists and MLOps engineers to collaborate, or if you’re a data scientist who wants more hands-on experience with the MLOps process, Wallaroo Community Edition is for you.

Need more info?

Learn more about the details of Wallaroo Community Edition at our launch announcement, and with Wallaroo 101. Then make sure to sign up for the beta waiting list! Once you get access, we’ll help you with any questions or issues that arise.

About Nina Zumel

Nina Zumel has a Ph.D. in Robotics from Carnegie Mellon University and over 20 years of experience practicing and teaching analytics, machine learning and data science. She was a scientist at SRI, led the design of an early online pricing system for a small Palo Alto startup, and has worked on applications for emergency management training and intelligent search.  In her roles as VP of Data Science at Wallaroo and Co-Founder & Principal Consultant for Win Vector LLC, she has led or been involved in engagements pertaining to adword revenue attribution, customer transaction models, product recommendation systems, and loan risk modeling. She is also heavily involved with data science related training and teaching, including the design of EMC Corporation’s Data Science and Big Data Analytics course, and bespoke Data Science training courses for a number of large corporations. Dr. Zumel is the co-author of the popular text, Practical Data Science with R (Manning Publications, 2019), now in its second edition.
