AI Ambitions Keep Derailing? Check Your Data Quality.
- By CDOTrends editors
- January 15, 2025

Nearly one-third of all GenAI proof-of-concepts will crash and burn. The culprit? It's not the AI models, the computing power, or even the use cases. It's your data.
That's the chilling prediction from Gartner, and it throws a bucket of ice water on the AI fever currently gripping enterprises worldwide. Companies are scrambling to implement generative AI and machine learning, only to discover a harsh reality: their data foundations are built on quicksand.
At a recent CDO Trends roundtable in Singapore, organized with Informatica, tech leaders from various sectors — banking, pharmaceuticals, energy, and agriculture — painted a grim picture of the data chaos hindering AI adoption. Their collective experience? Data quality gaps are setting their AI projects up for failure.
Data inequality nightmare
“You'll never have uniform standards across the bank for data models, governance, or technology,” confessed a senior data scientist from a major bank. “Different markets, different systems, different regulations — it's a recipe for disaster.”
This challenge explodes in companies built on mergers and acquisitions. One participant from an agricultural manufacturer described their situation as “a complete spaghetti mess” with “no consistent architecture, no consistent data sharing.” The result? Teams build overlapping use cases while the organization has visibility into a mere 10-15% of its AI initiatives.
Sometimes, the consequences of these data discrepancies are painfully obvious. In one case, a beverage company's attempt to consolidate sales data from New Zealand and Australia went sideways: the New Zealand data recorded sales by delivery date, while the Australian data used the sales posting date. In another case, a rushed consolidation left an expensive New Zealand wine sharing the same SKU number as a cheap Australian variant, a tenfold price difference that went unnoticed for eight months.
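A few automated consistency checks during consolidation can catch exactly this kind of mismatch before it reaches a report. The sketch below is purely illustrative: the column names, the two-market feed, and the threefold price threshold are assumptions made for the example, not details shared at the roundtable.

```python
import pandas as pd

# Hypothetical consolidated sales feed; column names and values are illustrative only.
sales = pd.DataFrame({
    "sku":        ["W-1001", "W-1001", "B-2002", "B-2002"],
    "market":     ["NZ", "AU", "NZ", "AU"],
    "unit_price": [120.00, 12.00, 4.50, 4.60],
    "date_basis": ["delivery_date", "posting_date", "delivery_date", "posting_date"],
})

# Check 1: the same SKU should not carry wildly different prices across markets.
spread = sales.groupby("sku")["unit_price"].agg(["min", "max"])
suspect = spread[spread["max"] > 3 * spread["min"]]
print("SKUs with a >3x price spread across markets:\n", suspect)

# Check 2: every market feed should report sales on the same date basis.
if sales["date_basis"].nunique() > 1:
    print("Inconsistent date basis by market:\n",
          sales.groupby("market")["date_basis"].unique())
```

Either check would have flagged the wine SKU collision and the delivery-versus-posting-date mismatch on day one rather than month eight.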
“The point is that at the end of the day, quality of the data is something that will need to be harnessed. A lot of customers are realizing that they did not put in a rigorous process around it,” said Steven Seah, vice president for ASEAN, India and Korea at Informatica.
A senior banking executive chimed in, saying the challenge was that the impact of low-quality data had never been pronounced enough to make senior management take notice. AI changed that, making the quality gaps glaring.
The quality conundrum
But what does “good” data even mean? The answer, it turns out, depends entirely on who you ask.
Data quality is not a one-size-fits-all issue. “Finance guys demand 100% accuracy because they don't trust anything less,” explained Anand Ramamoorthy, APJ communities practice leader at Informatica. “But for a data science team, they don't care about perfect accuracy. They need the data now to do the analysis, so timeliness becomes the key focus.”
This creates a fundamental tension in organizations trying to serve multiple stakeholders with the same data. One international bank participant noted, “The same data can be used for three different purposes, and its accuracy, relevance, and manipulation will be different each time.”
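One way to make that tension explicit is to define quality thresholds per consumer rather than per dataset. The sketch below is a minimal illustration of the idea; the consumer names and thresholds are assumptions for the example, not figures anyone at the roundtable prescribed.

```python
from dataclasses import dataclass

# Hypothetical fit-for-purpose thresholds; the figures are illustrative assumptions.
@dataclass
class QualityProfile:
    consumer: str
    min_accuracy: float        # share of records passing validation rules
    max_staleness_hours: float # how old the data may be and still be useful

profiles = [
    QualityProfile("finance_reporting", min_accuracy=1.00, max_staleness_hours=72),
    QualityProfile("data_science",      min_accuracy=0.95, max_staleness_hours=4),
]

def fit_for_purpose(accuracy: float, staleness_hours: float, p: QualityProfile) -> bool:
    """The same dataset can pass for one consumer and fail for another."""
    return accuracy >= p.min_accuracy and staleness_hours <= p.max_staleness_hours

# Example: a dataset that is 97% accurate and was refreshed two hours ago.
for p in profiles:
    print(p.consumer, "->", fit_for_purpose(accuracy=0.97, staleness_hours=2, p=p))
```

The same 97%-accurate, two-hour-old dataset passes for the data science team and fails for finance, which is exactly the point: quality is defined by the use, not by the data alone.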
For highly regulated industries like banking, the challenges multiply once regulators enter the picture. One participant described how his Singapore bank's legal and compliance team shifted from rule-based systems to machine learning for transaction surveillance and fraud detection, only to face skeptical regulators.
“How do you deal with the uncertainty?” the bank participant asked. “Rules are clear, crafted by humans. But machine learning and AI are statistical at their core.”
Ramamoorthy proposed that companies first find out what data they already have internally. He noted that when talking about data transparency, visibility, or understanding, there is a tendency for companies to “seek more data because we do not know what we have.”
There are two major reasons to look inward first. “You get more value out of it, and… the more data you keep, the higher your risks are,” added Ramamoorthy.
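Finding out what you already have usually starts with basic profiling: listing each source's columns, how complete they are, and how varied their values are. The sketch below assumes a pandas DataFrame standing in for a single internal table; a real inventory would loop over warehouse tables or lean on a cataloging tool.

```python
import pandas as pd

def profile(df: pd.DataFrame, source_name: str) -> pd.DataFrame:
    """Minimal column-level profile: what exists, how complete it is, how varied."""
    return pd.DataFrame({
        "source":       source_name,
        "column":       df.columns,
        "dtype":        [str(t) for t in df.dtypes],
        "completeness": df.notna().mean().values,
        "distinct":     df.nunique().values,
    })

# Hypothetical internal table; names and values are illustrative.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "country":     ["SG", "SG", None, "MY"],
    "segment":     ["retail", "retail", "corporate", None],
})
print(profile(customers, "crm.customers"))
```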
He cited Informatica’s work with Amgen as an example, where the two companies worked together to overcome issues with site identification and parameters. “So what they used was prompt engineering and AI capabilities to drive efficiencies and reduce their pipelines. Every small thing matters in the outcome you are looking at with AI.”
Companies also need to understand that working with external LLMs carries copyright risks. “Organizations get into GenAI not just through industry-built LLMs but also RAG pipelines. Use of external models for AI can potentially lead to some copyright issues. This is where we play a key role,” said Ramamoorthy.
The path forward: No more excuses
So how do organizations navigate this data minefield? The roundtable participants offered some hard-hitting advice:
- Start with data governance before AI: “Forget the AI; look at your data first,” a senior banking participant advised. “AI can bring good and bad out of the data, but the data has to be correct in the first place.”
- Define fit-for-purpose quality metrics: Rather than pursuing a one-size-fits-all approach to data quality, organizations must understand how different stakeholders will use the data and define appropriate quality metrics for each use case.
- Establish clear data ownership and SLAs: When data flows between different business units or systems, proper documentation of expectations and responsibilities becomes crucial, even if it's an internal arrangement (a simple check along these lines is sketched after this list).
- Implement proper data discovery and cataloging: Organizations need tools and processes to understand what data they have, where it resides, and how it's being used.
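For the ownership-and-SLA point above, even a lightweight internal data contract can be checked automatically. The sketch below assumes a hypothetical contract for a consolidated sales dataset; the dataset name, owner, required columns, and 24-hour freshness SLA are all illustrative assumptions.

```python
from datetime import datetime, timezone

# Hypothetical internal data contract between a producing and a consuming team.
CONTRACT = {
    "dataset": "sales_consolidated",
    "owner": "regional_sales_ops",
    "max_staleness_hours": 24,
    "required_columns": ["sku", "market", "unit_price", "sales_date"],
}

def check_contract(columns: list[str], last_refreshed: datetime) -> list[str]:
    """Return a list of SLA breaches; an empty list means the handover is healthy."""
    breaches = []
    missing = set(CONTRACT["required_columns"]) - set(columns)
    if missing:
        breaches.append(f"missing columns: {sorted(missing)}")
    age_hours = (datetime.now(timezone.utc) - last_refreshed).total_seconds() / 3600
    if age_hours > CONTRACT["max_staleness_hours"]:
        breaches.append(f"data is {age_hours:.1f}h old, SLA is {CONTRACT['max_staleness_hours']}h")
    return breaches

# Example handover: a feed missing a column and last refreshed days ago.
print(check_contract(
    columns=["sku", "market", "unit_price"],
    last_refreshed=datetime(2025, 1, 10, tzinfo=timezone.utc),
))
```

Running a check like this at every handover turns “whose data is this, and is it fresh?” from an email thread into a test that either passes or fails.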
The wake-up call
The gap between ambition and reality becomes increasingly apparent as AI continues its march into enterprise operations. The technology leaders at the roundtable painted a picture of organizations scrambling to implement AI while struggling with fundamental data management challenges that have existed for decades.
The message is clear: before you can realize the transformative potential of AI, you need to get your data house in order. This means investing in proper data governance, quality management, and discovery tools. It also means understanding that data quality isn't a binary good/bad proposition but rather a spectrum that depends on the use case.
But you do not need to do it alone. Partnership matters, especially with a partner invested in your long-term success. “In the industry, Informatica is known as the ‘Switzerland of data.’ We have the broadest capabilities geared toward a platform approach, whether it is data quality, integration, or master data management. But the point is where do we start with you, and it is by understanding your challenges today and where you want to go, and then map out the journey together,” said Seah.
Most importantly, we must acknowledge that there are no shortcuts to addressing data quality. The alternative? Watching 30% of your AI projects crash and burn while wondering why nobody sounded the alarm sooner.
Image credit: iStockphoto/vchal