Cloudera Issues Call to Define Open Standards for Machine Learning

Enterprise cloud data firm Cloudera has issued a call for industry participation to help define universal open standards for machine learning operations (MLOps) and machine learning model governance.

MLOps at scale

MLOps revolves around implementing machine-learning in production, notably around the infrastructure and tooling needed to deploy machine-learning algorithms and data pipelines reliably and at scale.

By leveraging the community, Cloudera hopes to help companies make the most of their machine learning platforms and pave the way forward for the future of MLOps.

The challenge of deploying and governing machine learning models at scale needs to be addressed at the industry level, says Doug Cutting, the chief architect at Cloudera. He pointed to Apache Atlas as being the best-positioned framework to integrate data management for explainable, interoperable and reproducible MLOps workflows.

“The Apache Atlas (project) fits all the needs for defining ML metadata objects and governance standards. It is open-source, extensible, and has pre-built governance features,” said Cutting.

Joining the call is the CEO of data science platform Anaconda, Peter Wang, who noted that open standards for MLOps can reduce the “clutter” of proprietary technologies and offer businesses the ability to focus on innovation.

Wang said: “Open source and open APIs have powered the growth of data science in business. But deploying and managing models in production is often difficult because of technology sprawl and [the creation of silos].”

“Machine learning models are already part of almost every aspect of our lives from automating internal processes to optimizing the design, creation, and marketing behind virtually every product consumed,” said Nick Patience, the founder and research vice president for software at 451 Research.

“As ML proliferates, the management of those models becomes challenging, as they have to deal with issues such as model drift and repeatability that affect productivity, security and governance. The solution is to create a set of universal, open standards so that machine learning metadata definitions, monitoring, and operations become normalized, the way metadata, and data governance are standardized for data pipelines.”