Building an artificial intelligence model is no easy feat. Although the noise surrounding AI promotion is loud, the art and application of this technology do require a specific understanding of mathematics and technical know-how to develop meaningful AI models and algorithms.
With growing volumes of raw data about people, places and things, plus increasing computing power and real-time processing speeds, immediate AI applicability and business benefits are more viable. But before IT leaders attempt to deploy an enterprisewide AI strategy, they must be able to bring large datasets together from disparate data sources into a secure, centralized, scalable and governed data repository.
AI is only as intelligent as the data behind it. When it comes down to it, machine learning is carried out by feeding a system information in an organized and structured manner and training it on that information. The effectiveness of a machine learning platform relies on the quality of the initial set of data used to train the system. If that data is flawed, the machine learning system will produce incorrect outputs and prove to be ineffective.
Take, for example, Microsoft’s Tay, a Twitter AI chatbot that was supposed to engage in casual and playful conversation with its followers. Instead, it tweeted inappropriate and racist comments. Why? The chatbot was given negative sentiments from Twitter trolls, and preventive filtering was thrown out the window once it was launched. It is a clear example of how AI failed because of a bad dataset.
We need to put a higher emphasis on the importance of data quality and governance. Your data is not perfect. If talented data scientists struggle with disparate data, why would machines be any different? The solution: allow data governance and quality to pave the way to AI democratization.
Why Your “Crown Jewels” Need Data Quality, Governance
As more companies try to democratize AI on their own, they’re discovering it’s not the easiest thing to do. But companies are eager to speed up the process. In fact, according to a Cowen and Company study, 81 percent of IT leaders are currently investing in or plan to spend on AI, as CIOs have mandated that AI needs to be integrated into their technology stacks.
But before businesses can get to an AI proof-of-concept or invest in operational AI applications, they need to have a data quality and governance strategy in place. Both of these frameworks work in tandem, and to put into perspective how data quality and data governance are truly symbiotic, it helps to think of your data as your crown jewels.
With data quality, you ensure that your jewels are cleansed and in perfect condition. Quality is not a one-off process, because data can come from everywhere. Continuous data management makes sure that the quality and integrity of data remain intact so that you can make smarter business decisions. Businesses can gain a competitive edge if data quality issues are addressed within the organization. For data to count as high quality, though, it needs to be accurate, complete and consistent.
On the other hand, data governance requires a team armed with the responsibility and the right tools to manage the system that protects those sacred jewels. A well-planned data governance framework covers strategic, tactical, and operational roles and responsibilities. It defines who can take action upon which data in what kinds of situations with which methods. For businesses, data governance initiatives seek to build a strong foundation for business intelligence and can be a pillar for strategic planning.
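To make the "who can take action upon which data with which methods" idea concrete, here is a minimal sketch of a governance policy expressed as data. The role names, dataset names, actions and methods are all hypothetical, and the "in what kinds of situations" dimension is left out for brevity; a real framework would be considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One permitted combination of role, dataset, action and method."""
    role: str
    dataset: str
    action: str
    method: str


# Hypothetical policy: every entry is an illustrative assumption.
POLICY = {
    Rule("data_steward", "customer_records", "correct", "stewardship_console"),
    Rule("marketing_analyst", "clickstream", "read", "reporting_tool"),
    Rule("finance_analyst", "vendor_billing", "read", "reporting_tool"),
}


def is_allowed(role, dataset, action, method):
    """Return True only if the exact combination appears in the policy."""
    return Rule(role, dataset, action, method) in POLICY


if __name__ == "__main__":
    print(is_allowed("marketing_analyst", "clickstream", "read", "reporting_tool"))
    print(is_allowed("marketing_analyst", "vendor_billing", "read", "reporting_tool"))
```

Keeping the policy as explicit data rather than scattered checks is one way to make it reviewable by the team responsible for governance.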
Now that the EU’s General Data Protection Regulation is in effect, there is a huge opportunity to put a data quality and governance plan in place. A sound data governance approach can and should involve more than one platform or project, and it should contain a set of rules and standards for data-related matters. Keep in mind that a data governance program can stretch across several areas of focus and drill down to the enterprise or project level. One of those areas is (you guessed it) data quality, where finding, correcting and monitoring data quality issues in the enterprise is a top priority.
Data Democratization, Dirty Data and the Data Champions
With data governance comes great responsibility. It’s no surprise that companies are in a mad rush to become data-driven, and rightfully so. But that rush leads to “dirty data”: data that is incomplete, inaccurate, or riddled with errors and missing values. A Kaggle study shows that dirty data is the most common problem for workers in the data science field.
So, you need to get a sense of how dirty the data is. Whether you need to update date formats, capitalization or punctuation, it’s important to get a quick understanding of what you’re dealing with. Systems infused with AI capabilities are smart, but they are still computer programs. As noted with Microsoft’s Tay chatbot, you can’t feed the system dirty data and expect to train a model or build a foolproof platform. As the saying goes, “garbage in, garbage out.”
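As a rough illustration of getting that quick understanding, the sketch below counts three common symptoms in a small set of records: missing values, dates that are not in a single expected format, and values that differ only by capitalization. The sample records, the field names and the choice of YYYY-MM-DD as the expected date format are assumptions made for the example, not part of any particular tool.

```python
import re
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
    {"name": "Alice Smith", "signup": "2018-05-25", "city": "Paris"},
    {"name": "bob jones", "signup": "25/05/2018", "city": "paris"},
    {"name": "", "signup": "2018-06-01", "city": "London"},
    {"name": "Carol White", "signup": None, "city": "LONDON"},
]

# Assumed target format for dates: YYYY-MM-DD.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def profile(records, date_field, case_field):
    """Return simple counts that hint at how dirty the records are."""
    report = Counter()
    spellings = {}
    for row in records:
        # Count empty or missing values across all fields.
        report["missing_values"] += sum(1 for v in row.values() if v in (None, ""))
        # Count dates that are present but not in the assumed format.
        date = row.get(date_field)
        if date and not ISO_DATE.match(date):
            report["non_iso_dates"] += 1
        # Group spellings that differ only by capitalization.
        value = row.get(case_field)
        if value:
            spellings.setdefault(value.lower(), set()).add(value)
    # A value counts as inconsistent if it appears with more than one casing.
    report["inconsistent_case"] = sum(1 for s in spellings.values() if len(s) > 1)
    return dict(report)


if __name__ == "__main__":
    print(profile(RECORDS, "signup", "city"))
```

A report like this does not clean anything; it only sizes the problem so that data owners can decide where cleanup effort should go first.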
While data-literate professionals and scientists typically own the keys to the data kingdom, the proliferation of new data streams coming from sensors, social media, the cloud, the internet of things and so on is uncontrollable. That’s why we’re seeing new data-focused roles emerge within enterprises, whether they are data analysts, data scientists or data stewards. These new roles are blurring the lines between enterprise data and consumers, which presents a challenge to corporate data quality, reliability and trust that IT organizations must address.
But there is a solution. While we need data experts to maintain data integrity, it's critical to democratize data to distribute information across all teams. Instead of having business units go through IT teams to get the data they need, we can empower all business units (marketing, business analysts, IT and sales) to act on business insights.
For example, the marketing department can analyze click streams from the website, and finance teams can get vendor billing details.
An AI Strategy Is a Data Strategy
When dealing with machines, the quality of the analysis and the outcomes depends on the quality of the data you feed into the algorithm. Businesses can't and shouldn't even begin to think about creating and applying their own AI models or algorithms without the power of secure, clean, democratized data that is integrated into mission-critical systems. Without it, the results can be disastrous and cost millions.
Before thinking about how you can teach a machine to learn, have a vision for data governance in your company that evolves and can provide value to your business. The right data strategies are crucial to implementing the right AI strategies, and we need those who understand data best to maintain the data quality and integrity necessary to fuel the types of automated, intelligent insights that AI can provide. Then, and only then, can organizations fulfill their AI dreams.
Laurent Bride, chief technology officer at Talend, contributed this article.
The views and opinions expressed in this article are those of the author and do not necessarily reflect those of CDOTrends.