Tech Architecture: Hybrid by Design Key to AI Success
- By CDOTrends editors
- November 29, 2024
Technology architecture is emerging as a strategic cornerstone for CEOs seeking a sustained competitive edge, and data governance initiatives are a must to ensure that an organization’s data delivers actionable results. “CEOs are right to be concerned about data quality and security,” says Kitman Cheung, chief technology officer and director of pre-sales engineering at IBM ASEAN. “That’s why data governance over data usage, quality and protection must be an integral part of their business strategy.”
Cheung says that while many organizations have already started their data governance initiatives, “now these must be extended to cover AI applications.”
“With the right governance strategy in place, AI solutions can be rolled out in enterprise use cases with minimal risk,” says Cheung.
Another strategy for success is a hybrid-by-design approach, which leverages pre-existing tech investments and helps leaders optimize and integrate multicloud and on-prem environments — accelerating AI and business outcomes.
Onboarding data successfully
Onboarding new data sets and data sources into a traditional data warehouse often presents significant challenges. What can organizations do to ensure that the entire spectrum of data is visible to stakeholders?
“Leading organizations seek to become data-driven in their operations,” says Cheung. “Business leaders agree that having trusted data to support decision making and leveraging trusted data for AI/ML initiatives provides their organizations with a significant competitive advantage.”
A data lake is a system or repository of data stored in its natural/raw format. Cheung points out that the data lakes in place today “have not delivered on their promise to democratize data.”
“Many organizations are stuck with data pipelines that cannot keep up with the speed of business,” he says. “Data are first copied into the data lake only to be copied out to another database with long delays between data requests and data availability.”
Data usability is another problematic area, points out the IBM engineer. “It is hard to find high-quality data within the data lake, and there are no ‘contracts’ between data provider and consumers to ensure SLAs,” says Cheung. “Data is often copied into different formats before it can be consumed by downstream applications. And finally, access control is limited in most cases.”
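The missing “contracts” Cheung describes can be made concrete: a data contract is an agreement on schema and service levels that the pipeline checks automatically on ingest. The sketch below is a minimal illustration of that idea in Python; the `DataContract` class, field names, and SLA parameter are all hypothetical, not part of any IBM product.

```python
from dataclasses import dataclass

# Hypothetical data contract: required fields, expected types, and a
# freshness SLA agreed between the data provider and its consumers.
@dataclass
class DataContract:
    required_fields: dict          # field name -> expected Python type
    max_staleness_hours: float     # freshness SLA (illustrative only)

def validate_record(record: dict, contract: DataContract) -> list:
    """Return a list of contract violations for one record."""
    violations = []
    for field, expected_type in contract.required_fields.items():
        if field not in record:
            violations.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            violations.append(
                f"bad type for {field}: {type(record[field]).__name__}")
    return violations

contract = DataContract(
    required_fields={"customer_id": str, "amount": float},
    max_staleness_hours=24.0,
)

good = {"customer_id": "C-001", "amount": 19.99}
bad = {"customer_id": "C-002"}

print(validate_record(good, contract))  # []
print(validate_record(bad, contract))   # ['missing field: amount']
```

Rejecting non-conforming records at the pipeline boundary, rather than discovering them downstream, is one way to give consumers the SLA guarantees Cheung says today’s data lakes lack.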
Security and regulation
Regulations like the GDPR in Europe and the CCPA in California, among others, complicate data security. And there’s always the possibility of additional regulations to be applied in the future.
CDOs want to future-proof their operations as much as possible. But how do we accomplish this task?
“Organizations should leverage AI, automation and hybrid cloud infrastructure by design to keep up with the changing regulatory requirements,” says Cheung. “Given the global nature of many of these regulations, it can be difficult for businesses to keep up with the changes.”
Cheung suggests GenAI, which he says can help identify compliance gaps. “AI-powered automation and hybrid cloud infrastructure can be used to discover implementation gaps as well as provide remediation,” he says. “As all data resides in storage environments, organizations’ storage platforms need to be capable of providing data services, including data security and resiliency. Nowadays, organizations cannot rely on audits to ensure compliance. Instead, they must leverage technology to bring necessary control points into IT operations.”
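A “control point” in IT operations can be as simple as an automated scan that blocks data from leaving a governed zone when it appears to contain personal information. The sketch below is an illustrative, pattern-based example only — real compliance tooling uses far more sophisticated detection, and these regexes are assumptions, not an exhaustive PII check.

```python
import re

# Hypothetical control point: scan outbound text for common PII patterns
# before release. Patterns are illustrative, not exhaustive.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_for_pii(text: str) -> list:
    """Return the PII categories detected in a piece of text."""
    return [name for name, pattern in PII_PATTERNS.items()
            if pattern.search(text)]

print(scan_for_pii("Contact alice@example.com for the invoice"))  # ['email']
print(scan_for_pii("Quarterly revenue grew 4%"))                  # []
```

Embedding checks like this into the pipeline itself, rather than waiting for periodic audits, is the operational shift Cheung describes.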
Leveraging GenAI
“Currently, generative AI like LLMs and traditional machine learning are complementary technologies,” says Cheung. “Think of them as the left (analytical) and right (creative) brains. For now, generative AI is mostly for generating unstructured content such as text and images, while traditional machine learning is better suited for predictions on numerical data, like linear regression.” In addition to image and text generation, Cheung highlights document summarization and customer-facing Q&A chatbots as productive uses of GenAI.
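The “analytical brain” Cheung describes can be as plain as the linear regression he names: fitting a line to numerical data by least squares. The toy figures below are invented for illustration.

```python
# Simple linear regression y = a*x + b by ordinary least squares --
# the kind of numerical prediction Cheung assigns to traditional ML.
def fit_linear(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data: monthly ad spend vs revenue (illustrative numbers only).
spend = [1.0, 2.0, 3.0, 4.0]
revenue = [3.1, 5.0, 6.9, 9.0]
a, b = fit_linear(spend, revenue)
print(f"revenue ≈ {a:.2f} * spend + {b:.2f}")  # revenue ≈ 1.96 * spend + 1.10
```

A generative model, by contrast, would be asked to produce new text or images rather than a numerical estimate — which is why the two approaches complement rather than replace each other.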
Cheung also shared some best practices on how GenAI can help improve organizational productivity. “New-generation chatbots are able to provide accurate answers directly to customers without human intervention, which frees up customer care agents to focus on more complex issues,” he says. “Internally, retrieval-augmented generation (RAG) pattern-based digital assistants provide employees with accurate information on demand.
“We are also seeing new agentic AI working alongside knowledge workers to complete tasks independently while humans monitor higher-level objectives,” says Cheung. Agentic AI refers to a system or program capable of autonomously performing tasks on behalf of a user or another system by designing its own workflow and using available tools.
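The control flow of the RAG pattern Cheung mentions can be sketched in a few lines: retrieve the passages most relevant to a question, then ground the model’s prompt in them. In this minimal sketch, simple word-overlap retrieval stands in for vector embeddings and no LLM is called — the document snippets and function names are illustrative assumptions.

```python
# Minimal sketch of the retrieval-augmented generation (RAG) pattern.
# Word-overlap scoring stands in for embedding similarity here.
DOCS = [
    "Expense reports must be filed within 30 days of travel.",
    "Remote employees may claim a home-office stipend once per year.",
    "Security incidents should be reported to the SOC hotline immediately.",
]

def retrieve(question: str, docs, k=1):
    """Return the k documents sharing the most words with the question."""
    q_words = set(question.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question: str, docs):
    """Ground the answer prompt in the retrieved context."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("When must expense reports be filed?", DOCS))
```

Grounding the prompt in retrieved company documents is what lets such assistants give employees accurate, current answers instead of relying on what the model memorized during training.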
Tools and solutions
“The good news is that automated, and often AI-powered, tools are now available to make risk assessment and remediation much simpler,” says Cheung.
“Tools like IBM Cloud Pak for Data and IBM Hybrid Cloud Infrastructure provide end-to-end governance of how data is protected and how data is used,” he says. “On the AI side, watsonx.governance provides AI life-cycle management capability to monitor AI performance at scale.”
Managing data lakes
Data usability within data lakes remains a challenge. Cheung says technical leaders should evaluate watsonx.data with IBM Storage, a platform that “promotes the use of open source and open standards to broaden data usage.”
The platform stores data “in one of the open data formats such as Iceberg, ORC or Parquet to future-proof data access,” he says. “A single copy of data is stored on S3 compatible cloud object storage.”
“Watsonx.data with IBM Fusion and Ceph can connect data from anywhere as long as they are stored on S3 compatible cloud object store,” says Cheung. “The data processing engine leverages in-memory open source technologies like Presto and Spark. With these capabilities, watsonx.data can reduce the need for data movement and duplication. IBM watsonx.data separates data storage from compute, allowing multiple engines to access shared copies of data across a hybrid cloud. A unified metadata and governance layer also ensures access control and compliance.”
“In short,” says Cheung, “watsonx.data with IBM Fusion and Ceph embedded is a data lakehouse that can scale storage and compute independently to match business needs. It provides data engines to support all popular use cases, like AI/ML, ETL and analytics. It supports third-party engines for additional flexibility. And a governance layer ensures access control.”
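The architectural idea described above — one stored copy of data, a shared metadata and governance layer, and multiple compute engines reading through it — can be sketched conceptually. This is not IBM’s API: the `Catalog` class, engine names, and access rules below are illustrative assumptions meant only to show why separating storage from compute avoids duplication.

```python
# Conceptual lakehouse sketch: a single stored copy of a table, plus a
# unified metadata layer that enforces access control for every engine.
class Catalog:
    """Unified metadata layer: tracks tables and who may read them."""
    def __init__(self):
        self.tables = {}   # table name -> rows (the single stored copy)
        self.acl = {}      # table name -> set of permitted engines

    def register(self, name, rows, readers):
        self.tables[name] = rows
        self.acl[name] = set(readers)

    def read(self, name, engine):
        if engine not in self.acl.get(name, set()):
            raise PermissionError(f"{engine} may not read {name}")
        return self.tables[name]   # every engine sees the same copy

catalog = Catalog()
catalog.register("sales", [{"region": "APAC", "amount": 120}],
                 readers={"spark", "presto"})

# Two engines read the identical copy -- no data movement or duplication.
print(catalog.read("sales", "spark") is catalog.read("sales", "presto"))  # True
```

Open table formats such as Iceberg, ORC and Parquet play the role of the shared copy in practice, while the catalog’s access check stands in for the governance layer Cheung describes.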
Image credit: iStockphoto/Chayada Jeeratheepatanont