An End to Garbage In, Garbage Out: How To Derive Value From AI Projects
- By David Jayatillake, Cube
- August 25, 2024
As businesses invest heavily in generative artificial intelligence (GenAI), many are disappointed by unmet expectations, and few are moving successfully from prototypes to production. According to an October 2023 Gartner, Inc. survey, 45% of organizations were piloting GenAI at the time, while only 10% had fully deployed it.
The growing disillusionment and delays often stem from one fundamental issue: poor data quality. The success of large language model (LLM) projects depends on accurate, reliable data. Yet many organizations are attempting to build AI solutions on top of messy data warehouses without doing the necessary data engineering, such as implementing a universal semantic layer. The result? Garbage in, garbage out.
Why AI needs a semantic layer
Many in the data world have heard of semantic layers, but only some AI professionals are familiar with them. A universal semantic layer is an abstraction layer residing between data sources and consumers. It provides a consistent, standardized, trusted view of data. The semantic layer empowers data analysis — whether by humans or AI — by providing a unified data source.
Just like humans, LLMs require context and consistency to deliver accurate results. Proper data cleaning, curation, and modeling are essential for improving AI accuracy. A universal semantic layer establishes metrics and metadata that give the LLM consistent, accurate definitions to work from. Providing a query interface can also limit the model to answering from the data that interface exposes, rather than from the entire body of knowledge it absorbed during training.
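As a rough illustration of how a query interface can limit what a model touches, the sketch below assumes the LLM emits a structured request (measures plus dimensions) instead of raw SQL, and that the request is checked against the vocabulary the semantic layer exposes before anything runs. The member names, the `SemanticQuery` shape, and `validate` are hypothetical, not any product's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical members exposed by a semantic layer (the "cube.member" naming is illustrative).
ALLOWED_MEASURES = {"orders.count", "orders.average_order_value"}
ALLOWED_DIMENSIONS = {"orders.status", "orders.created_month"}


@dataclass
class SemanticQuery:
    """Structured request an LLM emits instead of free-form SQL (hypothetical shape)."""
    measures: list[str]
    dimensions: list[str] = field(default_factory=list)


def validate(query: SemanticQuery) -> list[str]:
    """Return a list of problems; an empty list means only governed members are used."""
    errors: list[str] = []
    if not query.measures:
        errors.append("at least one measure is required")
    for measure in query.measures:
        if measure not in ALLOWED_MEASURES:
            errors.append(f"unknown measure: {measure}")
    for dimension in query.dimensions:
        if dimension not in ALLOWED_DIMENSIONS:
            errors.append(f"unknown dimension: {dimension}")
    return errors


if __name__ == "__main__":
    ok = SemanticQuery(measures=["orders.count"], dimensions=["orders.status"])
    bad = SemanticQuery(measures=["orders.average_cart_size"])
    print(validate(ok))   # []
    print(validate(bad))  # ['unknown measure: orders.average_cart_size']
```

Rejecting unknown names, rather than guessing at the nearest column, keeps the model inside the governed definitions; the error list could be fed back to the model so it can retry with valid members.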
Getting the data house in order
Implementing a universal semantic layer takes some work, but it is worth the effort. Developers and data engineers can define a data model once and consume it anywhere, including in an AI application. The first step is to identify the issues to be resolved and gather the relevant data, then code the connections between the two as business logic. Metadata describing that business logic is then used to build the abstraction (semantic) layer.
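A minimal sketch of "define once, consume anywhere", assuming a metric is stored as data (name, SQL expression, description) and compiled to SQL on request. The `analytics.orders` table, its columns, and `compile_sql` are hypothetical; real semantic layers also handle joins, time zones, caching, and SQL dialects, which this sketch ignores.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    name: str
    sql: str  # aggregation expression, written once and reused by every consumer
    description: str


@dataclass
class Model:
    table: str
    measures: dict[str, Measure]
    dimensions: dict[str, str]  # dimension name -> column expression


# Hypothetical model of an orders table.
ORDERS = Model(
    table="analytics.orders",
    measures={
        # NULLIF avoids division by zero when there are no orders.
        "average_order_value": Measure(
            "average_order_value",
            "SUM(amount) / NULLIF(COUNT(DISTINCT order_id), 0)",
            "Revenue per distinct order",
        ),
        "order_count": Measure("order_count", "COUNT(DISTINCT order_id)", "Distinct orders"),
    },
    dimensions={"status": "status", "order_month": "DATE_TRUNC('month', created_at)"},
)


def compile_sql(model: Model, measures: list[str], dimensions: list[str]) -> str:
    """Turn named measures and dimensions into SQL; unknown names raise KeyError."""
    select = [f"{model.dimensions[d]} AS {d}" for d in dimensions]
    select += [f"{model.measures[m].sql} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {model.table}"
    if dimensions:
        # Group by the dimension columns, which come first in the SELECT list.
        sql += " GROUP BY " + ", ".join(str(i) for i in range(1, len(dimensions) + 1))
    return sql


if __name__ == "__main__":
    print(compile_sql(ORDERS, ["average_order_value"], ["status"]))
```

Because a dashboard, a notebook, and an AI application would all call the same `compile_sql`, they all get the same definition of average order value. `GROUP BY 1` relies on column ordinals, which many warehouses accept but not every SQL dialect does.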
A universal semantic layer overcomes the garbage in, garbage out phenomenon, which has become common now that many companies have adopted large-scale cloud data platforms like BigQuery, Databricks, and Snowflake. Though these platforms are highly beneficial, storing logs, events, telemetry, customer behavior data, and so on in them also adds complexity: an ever-growing lattice of permissions, caches, and metric definitions (for example, is "average_cart_size" or "average_order_value" the appropriate column for an e-commerce dashboard?).
A universal semantic layer removes this semantic complexity, helping people and LLMs navigate the inconsistent metrics, overlapping schemas, and conflicting permissions that arise in modern data architectures.
Other ways AI projects benefit
Integrating GenAI with consistent enterprise data enhances reliability, transparency, and security while improving data quality and scalability. By sitting between data platforms and consumers, a universal semantic layer enforces security via authentication and role-based access control.
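To make the role-based access control point concrete, here is a sketch assuming each role maps to the measures it may query plus an optional row-level filter. The roles, the `Policy` type, and `authorize` are hypothetical; in a real deployment the role would come from an authenticated identity rather than a plain string argument.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """What one role may see (hypothetical structure)."""
    allowed_measures: frozenset[str]
    row_filter: str | None = None  # SQL predicate to append to every query for this role


# Hypothetical role-to-policy mapping maintained in the semantic layer.
POLICIES = {
    "analyst": Policy(frozenset({"order_count", "average_order_value"})),
    "regional_manager": Policy(frozenset({"order_count"}), row_filter="region = 'EMEA'"),
}


def authorize(role: str, measures: list[str]) -> str | None:
    """Raise PermissionError for a disallowed request; otherwise return the role's row filter."""
    policy = POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"unknown role: {role}")
    denied = sorted(m for m in measures if m not in policy.allowed_measures)
    if denied:
        raise PermissionError(f"role {role!r} may not query: {', '.join(denied)}")
    return policy.row_filter


if __name__ == "__main__":
    print(authorize("regional_manager", ["order_count"]))  # region = 'EMEA'
    try:
        authorize("regional_manager", ["average_order_value"])
    except PermissionError as exc:
        print(exc)  # role 'regional_manager' may not query: average_order_value
```

Enforcing the check in the semantic layer means it applies equally to a BI tool and to an LLM agent, since both reach the data through the same interface.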
A universal semantic layer helps AI applications in other ways as well. Using private, semantically tagged data, GenAI gains deeper insight into an organization's unique context. This integration ensures the AI system accesses real-time, up-to-date data, improving the overall quality of generated answers. And as production AI models generate new data (predictions, answers, features), that data must be exposed to users. A universal semantic layer can automatically publish model-generated insights based on existing analytics and outputs.
As data analytics and AI projects expand, dependence on a single platform for AI or data analysis becomes less practical. A semantic layer connects diverse tools and data platforms by decoupling data sources from consumption, making analytics and AI accessible to more users.
The semantic layer can also enable explainable AI by organizing and disseminating information about why an AI model provides a particular answer. Delivering better insight into the reasoning behind a model's suggestions enhances trust in its outputs.
Putting an end to garbage in, garbage out
While revolutionary, LLMs have limitations, particularly in producing accurate outputs when the data they work from is poor: garbage in, garbage out. In short, LLMs hallucinate. Simply feeding them database schemas isn't sufficient for generating correct SQL. The prerequisite for successful AI projects is data organized into meaningful business definitions, plus a query interface, so that LLMs can understand data in context.
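The difference between handing an LLM a bare schema and handing it business definitions can be sketched as prompt construction from a metric catalog. The catalog entries and `build_context` are hypothetical; a production system would pull descriptions from the semantic layer's metadata instead of a hard-coded dict.

```python
from __future__ import annotations

# Hypothetical metric catalog; a real system would read these descriptions
# from the semantic layer's metadata rather than hard-coding them.
METRICS = {
    "order_count": "Number of distinct orders, excluding test orders.",
    "average_order_value": "Total revenue divided by the number of distinct orders.",
}


def build_context(metrics: dict[str, str]) -> str:
    """Render governed metric definitions as plain text for an LLM prompt."""
    lines = ["Answer using only these governed metrics:"]
    # Sort by name so the prompt text is identical from run to run.
    for name, description in sorted(metrics.items()):
        lines.append(f"- {name}: {description}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_context(METRICS))
```

A column list tells the model that `amount` exists; a definition like the ones above tells it what the business means by a metric, which is the context the paragraph argues schemas alone do not supply.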
Once the data engineering is done, organizations can start deriving the full value of AI and paving the way for innovative AI applications. In addressing the complexity of modern enterprise data and AI management, the universal semantic layer has emerged as the foundation that promises to enhance efficiency and power more informed decision-making by humans or AI.
So, start with data engineering and a universal semantic layer. Only then can organizations optimize AI investments and drive meaningful value.
The views and opinions expressed in this article are those of the author and do not necessarily reflect those of CDOTrends.
David Jayatillake is the vice president of AI at Cube.