Getting Started With DataOps

If data were water, most organizations have data of such poor quality that they are essentially subsisting on dirty water, observed Andy Palmer, co-founder and chief executive officer of Tamr. “Give them clean water to drink and they will find that they can’t go back.”

This visceral analogy was shared at the recent virtual roundtable “Getting started with DataOps to drive better business outcomes, and why you need to do it now”, led by CDOTrends with enterprise data mastering company Tamr.

Palmer and Lorida Bejleri, a data management and governance leader at a global logistics company, took to the virtual stage to share their insights and experiences with data. This was followed by an engaging roundtable discussion attended by dozens of data practitioners and leaders.

A process for clean data

To illustrate his point, Palmer shared an anecdote about a publicly listed firm that erroneously believed it had around 15 customers who spent more than a million dollars a year, based on data from its multi-million-dollar ERP system and the findings of external consultants. The real figure turned out to be more than 180, which only came to light after Tamr helped the firm master its data.

Were the consultants negligent? Not likely. Palmer conceded that they likely delivered on the project they were hired to do. However, they were probably privy to only a subset of the data sources, namely those made available to them, which led to the wrong conclusion.
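To see how a partial view of the data can undercount high-value customers, consider the minimal Python sketch below. Everything in it is hypothetical: the records, the source system names, and the crude `normalize` matching key are illustrative assumptions, not Tamr's method or the firm's actual data. It compares a count taken from a single system with a count taken after records from all systems are consolidated per customer.

```python
from collections import defaultdict

# Hypothetical records: (source_system, customer_name, annual_spend_usd).
RECORDS = [
    ("erp", "Acme Corp.", 1_200_000),
    ("erp", "Globex Inc", 600_000),
    ("crm", "GLOBEX, INC.", 550_000),
    ("billing", "Initech LLC", 400_000),
    ("crm", "Initech", 450_000),
    ("billing", "initech llc", 300_000),
    ("erp", "Umbrella Co", 200_000),
]

THRESHOLD = 1_000_000
# Assumed list of legal suffixes to ignore when matching names.
SUFFIXES = {"inc", "corp", "llc", "co", "ltd"}


def normalize(name: str) -> str:
    """Crude matching key: lowercase, drop punctuation and legal suffixes."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    tokens = [t for t in cleaned.split() if t not in SUFFIXES]
    return " ".join(tokens)


def count_high_value(records, key_fn, sources=None):
    """Count customers whose total spend exceeds THRESHOLD.

    key_fn decides which records belong to the same customer;
    sources optionally restricts the view to some systems.
    """
    totals = defaultdict(int)
    for source, name, spend in records:
        if sources is None or source in sources:
            totals[key_fn(name)] += spend
    return sum(1 for total in totals.values() if total > THRESHOLD)


# View 1: a single system, names taken at face value.
erp_only = count_high_value(RECORDS, lambda name: name, sources={"erp"})

# View 2: all systems, records consolidated per customer.
mastered = count_high_value(RECORDS, normalize)

print(f"High-value customers (ERP only): {erp_only}")
print(f"High-value customers (consolidated): {mastered}")
```

With this toy data, the single-system view finds one customer above the threshold, while the consolidated view finds three. Real data mastering involves far more than name normalization, but the gap between the two counts arises the same way.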

“A lot of organizations are analytically empowered by their new data scientists and all the fantastic algorithms and the infrastructure and the capabilities that they have. Data is the fuel that makes it all possible. [Unfortunately], many have sort of neglected the core need to clean up, organize, and prepare their data on a continuous basis,” explained Palmer.

“Data comes out in this raw form and we must go back and refine and clean the data. This is what DataOps is about. Our preferred way to help people understand this is to give them clean water to drink and then they realize how important it is.”

“No algorithm is useful without enough great data. And it is always about the human and the machine working together. People quickly realize that getting their data right is the first step in any AI program. It’s tempting, though, to try and put the cart before the data horse. [To succeed], you really must start with great data,” he summed up.

Treating data as an asset

However, organizations must first start treating their data as an asset before they can progress to a state of maturity on this front. Palmer noted: “Companies don’t allow their cash to be shoved in drawers and hidden on flash drives. The company’s data is no different. Your data is an asset. It’s not owned by any individual or any business unit, and you have to manage it as an asset.”

While people are often afraid to share data because they perceive it as imperfect or inherently flawed, Palmer argues that the only way to get better data is to share it first and get feedback. Businesses also need the right tools and skill sets to take full advantage of their data.

“You go home at night and the way we manage our personal finances and our schedules using these state-of-the-art tools [and mobile apps]. [But] when we come to work in the morning, we feel like we have just stepped back a decade or two in time – in terms of what we are expected to do, managing data, and using the information to make better decisions.”

“A huge part of the role of the chief data officer in the next-gen data organization is to meet the expectations of the average consumer, the average employee, and the average company that expects to have clean data available to make the best decisions they can,” noted Palmer.

Of balance and communication

Getting this message across the organization will take time and effort, says Bejleri, and will involve explaining the big picture to employees. “People need to understand that we are not doing this for the sake of doing this, but because it contributes to the key business drivers of the company. It’s also important to articulate what [data practitioners] need, how this is going to impact them, how this is going to change their day-to-day job,” she said.

But how can businesses strike a balance between leveraging data to drive new business opportunities and making sure that security and data privacy are not compromised? Andrew Milroy, founder of digital risk advisory firm Veqtor8, posed this question during the roundtable discussion, based on the idea of data being a crucial asset.

“I think it depends on the type of industry that you’re working in. The banking institutions predominantly focus on compliance because they are highly regulated, so having that compliance in regards to those regulations is a must-have,” said Bejleri.

But while the industry sector can significantly influence how organizations work with data, this should not be an excuse to let opportunities slip by. “We try to [align] as much as possible to business generating revenues, type of use cases, and balance the number of data elements or assets that need to be [accessed] versus for compliance and legal requirements,” she said.

The time to act is now

In a cost-conscious environment with multiple partners and solutions, how can organizations leverage data effectively? Ken Chia, the head of business assurance for an insurance company, posed this question.

“Avoid boiling the ocean. Many organizations have tried to build new data infrastructure, spend two or three hundred million dollars, and never actually make a difference in terms of getting data out into the hands of consumers,” advised Palmer.

“You build these projects very quickly, deliver cradle-to-grave solutions, and engineer and design for the fact that data changes all the time. It’s essential to deliver a pipeline that adapts as the data changes. We advocate for using open, best-of-breed technologies, instead of single-vendor, single-platform kinds of solutions. [Also] we believe it’s easier and faster to use cloud-based infrastructure.”

Paul Mah is a senior editor at CDOTrends. He likes (almost) all things tech and enjoys writing about data science, AI, and digital transformation.

Image credit: iStockphoto/miya227