Replication Creates New Data Hoarding Headache
- By Lachlan Colquhoun
- April 24, 2023
There are a lot of metaphors and analogies for data. But perhaps one which can resonate in the current market is a comparison with compulsive hoarding behavior.
Just as a hoarder can’t bear to let items go because they might be helpful one day, many organizations are sitting on exponentially increasing quantities of data.
Perhaps one day, they might clean it all and run a new algorithm over it, and there might be some new revelations.
In the meantime, the data accumulates — ungoverned, underanalyzed and stagnant — to the point where many organizations don’t know precisely what they are sitting on.
Most large organizations have around 400 different data sources, and financial institutions such as banks are running up to 4,000 different applications, all creating fresh data.
The consequences were seen this week in Australia, as the fallout from the September 2022 data breach at Australian telco Optus reached a new phase.
Law firm Slater and Gordon filed a class action in the Australian Federal Court on behalf of 100,000 registered participants, who will claim that the company failed in its duty of care to protect them from harm.
The elephant in the room
According to Ben Henshall, the vice president in Australia and New Zealand for data virtualization company Denodo, many organizations are not just beset with issues around the sheer quantity of data. They also compound the problem by replicating what they have in different forms.
“In data management world, there is this really big elephant in the room called data replication because data gets moved around and stored in different formats and different areas,” he says.
“And the cloud is yet another repository, that’s yet another place where data is replicated and stored for the purpose of being potentially analyzed or used.”
Replication, says Henshall, bloats the organization’s data profile and creates a “major surface attack area,” which increases vulnerability to breaches and heightens risk.
“And that doesn’t even include what we call ‘last mile’ data copying in Excel spreadsheets,” he says.
“Because when people want to analyze data, Excel is the common ‘lingua franca’ for that, and spreadsheets get copied and moved around, increasing replication and the surface attack area and creating long-term operational and engineering issues,” Henshall adds.
“Every time you copy data and move it around, you’ve then got to try and protect it, store it and manage it.”
Hoarding environment
Henshall says that 60% to 90% of data collected is never used or analyzed for business purposes, and around 90% of data is no longer unique after one year — meaning the older data is, the more likely it is to have been replicated and to sit unused.
The result for many organizations is akin to living in a hoarding environment. They have so much data it becomes unmanageable, and instead of being an asset — the modern goal of so many organizations — it becomes a liability and a risk.
It also complicates the understanding of the provenance or lineage of data. It introduces the risk that the data may be changed in replication, creating issues around single sources of truth.
Henshall says the vendor community has often been complicit in some of this because, for many, it has been in their interest for organizations to replicate data so they can then sell tools to manage it.
The last mile
So, what does Henshall suggest as a solution? The first step is don’t copy data; keep it where it is.
“The value comes from how to unify it and present it in formats and structures that the business can self-service and consume and digest through the tool of their choice,” he says.
“And if you do that, you don’t have all the overheads of copying and replicating and moving data if you don’t have to. And that is what we call ‘last mile data delivery’ in a format the business can understand.”
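The idea of unifying sources through metadata rather than copying rows can be sketched in miniature. The example below is purely illustrative — the source names, tables, and figures are hypothetical, and SQLite stands in for what a data virtualization layer does across heterogeneous systems at scale: two separate stores are attached to one session, and a view (metadata only, no replicated rows) presents a unified, business-friendly result.

```python
import sqlite3

# Hypothetical illustration: two separate stores stand in for siloed
# systems (a "CRM" and an "ERP"). Attaching both to one session lets us
# query them in place rather than copying data into yet another repository.
conn = sqlite3.connect(":memory:")               # plays the role of the CRM
conn.execute("ATTACH DATABASE ':memory:' AS erp")  # a second, separate store

# Populate the two sources independently.
conn.execute("CREATE TABLE main.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE erp.orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO main.customers VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO erp.orders VALUES (?, ?)",
                 [(1, 120.0), (1, 80.0), (2, 45.5)])

# A view is metadata only: no rows are duplicated into a new store.
# (A TEMP view is used because it may reference any attached database.)
conn.execute("""
    CREATE TEMP VIEW customer_spend AS
    SELECT c.name, SUM(o.amount) AS total
    FROM main.customers c
    JOIN erp.orders o ON o.customer_id = c.id
    GROUP BY c.name
""")

# The "last mile": consumers query one unified shape, sources stay put.
rows = conn.execute(
    "SELECT name, total FROM customer_spend ORDER BY name").fetchall()
print(rows)  # → [('Ada', 200.0), ('Grace', 45.5)]
```

If a source changes, the view reflects it on the next query — there is no stale copy to re-secure, re-store, or re-manage, which is the overhead Henshall describes.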
Henshall says the aim for organizations is to “reduce the temperature” — they are currently expending energy, and money, that they don’t need to spend managing the vast volumes of replicated data they are holding on to.
To return to the hoarding analogy, organizations need a garage sale or a council clean-up to clear the house of clutter and make it more manageable to live in.
Success, says Henshall, is where data sources are connected without replication “to make it all unified, like one big database through metadata.”
His other callout is to chief financial officers and C-suite executives outside the technology area.
Without their focus and attention, he says, data management within organizations will flatline in an environment where it’s not a question of “if” there will be a data breach but “when.”
“There’s a huge opportunity cost that will divert larger amounts of resources at the same time as the goodwill on balance sheets will be impaired,” Henshall says.
“So organizations will be diverting resources on fixing recovery that could have been otherwise spent on reducing attrition, improving customer experience, growing and expanding,” he continues.
“And that is the issue that I’m not sure has been comprehensively understood and acted upon because data is still put down to some kind of plumbing issue. I really think this is a major elephant in the room which is just getting bigger and bigger.”
Lachlan Colquhoun is the Australia and New Zealand correspondent for CDOTrends and the NextGenConnectivity editor. He remains fascinated with how businesses reinvent themselves through digital technology to solve existing issues and change their entire business models. You can reach him at [email protected].
Image credit: iStockphoto/Aleutie