Your Data Journey Needs an Exit Strategy
- By Lachlan Colquhoun
- April 14, 2024
When they initially conceive data projects as part of their digital transformations, many organizations see their future solely in the cloud. However, many are now repatriating data to on-premises infrastructure as part of a truly hybrid approach.
According to Keir Garrett, regional vice president for Cloudera in the Australian and New Zealand market, there are multiple reasons why organizations should take a strategic approach to their infrastructure configurations, with on-premises always in the mix.
“For some large enterprises, whether that is in the financial services and insurance space or government, they have extremely large databases,” Garrett said in a recent interview on the sidelines of the AWS Summit in Sydney.
“What they are finding is that the cost of running their data in the cloud has potentially been far greater than they perceived when they started their journey, so some of them are thinking about repatriation as a result,” Garrett added.
“So what we talk to them about is the cost of doing business today, without any degradation in outcomes from moving the data from the cloud to on-prem.”
She said there were "multiple rationales" for keeping some data on-premises, ranging from cost to speed of processing to security and issues of "content and sensitivity."
“We always ask customers about the principles which guide where their data is going to sit, and what outcomes they want from the data and the capabilities they need to have in place to achieve that outcome,” said Garrett.
Regardless of those decisions, she said, it was desirable to apply the same security and governance to data in all environments and to allow workloads to move across multiple clouds.
The key to performance, she said, was "observability tools," which deliver visibility into workloads and performance and point to optimizations that "reduce runtimes and also build in financial governance or chargeback models."
This information, said Garrett, is critical to making the right choices around where data should be deployed.
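This kind of telemetry lends itself to straightforward cost attribution. As a rough sketch of the chargeback idea (this is not Cloudera's actual tooling; the workload records, rates, and field names below are hypothetical), an observability export might be rolled up per team like this:

```python
from collections import defaultdict

# Hypothetical per-unit rates; real figures would come from the cloud
# bill or internal pricing. These numbers are illustrative only.
RATES = {"cpu_core_hours": 0.05, "gb_scanned": 0.002}

# Example records an observability tool might export, each tagged
# with the team that owns the workload.
workload_runs = [
    {"team": "risk", "cpu_core_hours": 120.0, "gb_scanned": 4000},
    {"team": "risk", "cpu_core_hours": 35.0, "gb_scanned": 900},
    {"team": "marketing", "cpu_core_hours": 60.0, "gb_scanned": 12500},
]

def chargeback(runs, rates):
    """Attribute workload costs back to the owning teams."""
    totals = defaultdict(float)
    for run in runs:
        totals[run["team"]] += sum(run[m] * rate for m, rate in rates.items())
    return dict(totals)

for team, cost in chargeback(workload_runs, RATES).items():
    print(f"{team}: ${cost:,.2f}")  # e.g. risk: $17.55, marketing: $28.00
```

Per-team totals like these can then feed a financial-governance dashboard or inform the cloud-versus-on-prem placement decisions Garrett describes.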
Implementation challenges
Vini Cardoso, head of solutions engineering and Cloudera’s ANZ chief technology officer, said that while Generative AI was one of the big themes at the AWS conference, organizations were also discussing implementation challenges.
"There is a lot of questioning on how to start using these advanced capabilities like Generative AI, and while there are the capabilities, there is also a skills shortage in the market," Cardoso said.
Cost control was another recurring theme across the conference sessions.
“Many organizations are embarking on a cloud strategy and using multiple approaches to build an architecture, and sometimes that is not the most efficient, but they need to achieve quick outcomes,” Cardoso added.
“And when that happens, the costs keep blowing up.”
Ethical issues
Garrett also observed that ethics were at the top of many organizations' minds as they began their data strategies.
She said that many organizations were at an early point in their data and AI journeys and were grappling with ethics, governance and privacy in the first stage of transformation projects.
"I think the first step is to consider the intent and usage of the technology and the data," said Garrett.
"So we start with intent and then use that to examine and design frameworks that ensure that data is used effectively," she added.
“There are questions around segregating and separating things which shouldn’t come into consideration when making sure the approach is unbiased and suits the social values that the organization has around fairness and transparency.”
Garrett agreed that this could create early "bottlenecks" as organizations transform, but said it was critical to get these frameworks right before committing budgets and making key infrastructure decisions.
Data provenance
In a world of multiple data sources, from third parties and legacy systems to data flowing across borders, understanding data quality and provenance was critical to overcoming these early implementation challenges.
Cardoso endorsed the idea of a "data catalog" solution that enables users to understand, manage, secure and govern data from multiple clusters on a single platform.
He said this would give organizations the confidence they need to move forward.
“This makes sure that every single touch point of the data is captured from the metadata point of view, and that enriches the understanding of the lineage of the outcome,” Cardoso said.
“Our vision for the catalog is that it will evolve to a platform that allows collaboration between companies so they can have confidence in sharing data assets and products that can be consumed across the board,” he added.
“That will make an ecosystem that people can trust and use with confidence.”
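To make the metadata-capture idea concrete, here is a minimal sketch of what a catalog entry with lineage and touchpoint tracking might look like. The structure and field names are assumptions for illustration, not Cloudera's catalog schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CatalogEntry:
    """One dataset in a hypothetical catalog, with its lineage."""
    name: str
    owner: str
    sources: list[str] = field(default_factory=list)        # upstream datasets
    touchpoints: list[dict] = field(default_factory=list)   # every access or transform

    def record_touchpoint(self, actor: str, action: str) -> None:
        # Capture who touched the data and how, as metadata.
        self.touchpoints.append({
            "actor": actor,
            "action": action,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def lineage(self) -> str:
        return f"{' + '.join(self.sources) or '(origin)'} -> {self.name}"

# Usage: a derived dataset built from two upstream sources.
claims = CatalogEntry("claims_enriched", owner="risk-team",
                      sources=["claims_raw", "customer_master"])
claims.record_touchpoint("etl_job_42", "join+deduplicate")
claims.record_touchpoint("analyst_a", "read")
print(claims.lineage())  # claims_raw + customer_master -> claims_enriched
```

Recording every touchpoint as metadata is what lets downstream consumers trace an outcome back through its lineage, which is the trust-building property Cardoso describes.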
Image credit: iStockphoto/Gudella
Lachlan Colquhoun
Lachlan Colquhoun is the Australia and New Zealand correspondent for CDOTrends and the NextGenConnectivity editor. He remains fascinated with how businesses reinvent themselves through digital technology to solve existing issues and change their business models.