Data's New Sheriff: Snowflake's Quest to Bring Order to the AI Frontier
- By Winston Thomas
- May 20, 2024
Data has always been important, but let's be honest: It wasn't exactly the life of the party.
Before LLMs sprang into the public consciousness, many companies were sleepwalking through data management.
Sure, companies were swimming in it, with storage infrastructures bursting at their seams. They had mountains of data, but it was often fragmented and underutilized, thanks to lingering data silos and departmental politics that made it hard to wrangle.
But many companies, especially larger enterprises, knew the real value lay in a 360-degree view of their data and aimed for it.
With Large Language Models (LLMs) like ChatGPT storming onto the scene, data management is in the spotlight, transformed from a back-office chore into the lifeblood of cutting-edge innovation.
The LLM wakeup call
Now, with LLMs streaming into public and corporate consciousness, everyone suddenly wants a piece of the AI pie.
Sanjay Deshmukh, senior regional vice president for ASEAN and India at Snowflake, sees this shift firsthand.
"With what has happened with Gen AI in the last 12-14 months, it has completely changed the data landscape or the data economy," he explains. "Enterprises quickly realized that they have data from the customers and the business, and basically, the customers trust them with the data. It is their responsibility to protect their assets, protect the data."
A shift in mindset
But LLMs aren't a magic bullet. The hype suggests anyone can create apps with natural language, but the reality is more complex. Data science teams know the gap between theory and practice and how easily models can drift off course.
Moreover, not every company is Google or OpenAI; most can't afford to burn through R&D budgets running queries in massive data centers. The smarter approach? Bring the model to the data. That solves data sovereignty and privacy concerns while keeping costs in check.
And then there's governance. When ChatGPT gets facts wrong, it's a meme. When a healthcare LLM does the same, it's a disaster. Established rules matter, too. In places like Singapore or Hong Kong, banks can't give specific financial advice, but an LLM might if prodded with misleading prompts.
The Snowflake power play
This is where Snowflake comes in. The company has evolved from a data warehousing platform into a full-fledged data cloud, helping businesses centralize, govern, and share their data securely.
Snowflake's secret sauce? A unique architecture that separates storage and compute, letting businesses scale on demand. It's perfect for the LLM era, where compute needs fluctuate wildly.
"We allow customers to host their chosen LLM next to the data," explains Deshmukh. This keeps sensitive info safe and eliminates the risk of sending it to external providers.
Snowflake is also helping businesses integrate LLMs into their workflows, letting them unlock insights without compromising security: the sensitive information never leaves the company's walls.
LLMs under the microscope
With LLMs becoming more deeply embedded in business, the need for robust data governance has never been greater. Regulators are watching, and transparency is critical.
Snowflake has made governance a core principle, offering features like masking, encryption, and role-based access to protect sensitive data. "The importance of governance has never been higher," stresses Deshmukh.
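To make the idea concrete, here is a minimal sketch of role-based masking in plain Python. It is an illustration of the general governance pattern the article describes, not Snowflake's masking-policy syntax; the role names and fields are made up.

```python
# Illustrative sketch of role-based data masking: privileged roles see
# raw values, everyone else sees masked ones. Roles and fields are
# hypothetical, chosen only for the example.

def mask_email(email):
    """Mask the local part of an email, keeping the domain visible."""
    local, _, domain = email.partition("@")
    return "***@" + domain

def apply_policy(row, role):
    """Return a copy of row, masking sensitive fields for non-privileged roles."""
    if role in ("admin", "compliance"):
        return dict(row)  # privileged roles see raw values
    masked = dict(row)
    masked["email"] = mask_email(row["email"])
    return masked

row = {"name": "Ada", "email": "ada@example.com"}
print(apply_policy(row, "analyst"))  # {'name': 'Ada', 'email': '***@example.com'}
print(apply_policy(row, "admin"))    # raw row, unmasked
```

In a real warehouse, this logic lives in the database as a policy attached to the column, so it is enforced no matter which application runs the query.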
Snowflake's approach to LLM training and deployment leverages Snowpark Container Services for containerized data apps. Snowflake Cortex is a fully managed service for LLM app development, providing access to top AI models and vector search functionality. Cortex includes specialized sentiment analysis, summarization, and translation functions, as well as general-purpose functions with Meta's Llama 2 and Snowflake's high-performance LLMs.
Additionally, Snowflake Arctic introduces a top-tier enterprise-focused LLM that combines cost-effective training with openness, offering efficient intelligence and open access to its weights and code.
Snowflake's ecosystem enables users to build, fine-tune, and deploy LLMs within Snowflake's secure and scalable infrastructure. Arctic uses a Mixture-of-Experts (MoE) architecture to deliver strong intelligence efficiently. Snowflake is releasing Arctic's weights under an Apache 2.0 license, along with details of its training, setting a new openness standard for enterprise AI.
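The efficiency of an MoE model comes from routing each input to only a few "expert" subnetworks instead of running the whole model. The following is a toy sketch of that routing idea in plain Python; it is not Arctic's implementation, and the experts and gate weights are invented for illustration.

```python
import math

# Toy sketch of Mixture-of-Experts (MoE) routing: a gate scores every
# expert, but only the top_k experts actually run, and their outputs are
# combined with renormalized gate probabilities.

def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs."""
    # Gate: one score per expert (here, a simple dot product with x).
    scores = [sum(w * xi for w, xi in zip(gw, x)) for gw in gate_weights]
    probs = softmax(scores)
    # Keep only the top_k experts; renormalize their probabilities.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    # Weighted sum of the selected experts' outputs.
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)
        out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
    return out

# Three toy "experts": each is a fixed elementwise transformation.
experts = [
    lambda x: [2 * v for v in x],   # expert 0: doubles
    lambda x: [v + 1 for v in x],   # expert 1: increments
    lambda x: [-v for v in x],      # expert 2: negates
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

print(moe_forward([3.0, 1.0], experts, gate_weights, top_k=2))
```

The payoff is that compute per token scales with `top_k`, not with the total number of experts, which is why MoE models can be large yet cheap to run.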
Snowflake built three LLM-powered experiences leveraging Cortex, including Snowflake Copilot (private preview) and Universal Search (private preview), to enhance user productivity. Meanwhile, a public preview of Streamlit in Snowflake can help accelerate the creation of custom LLM-powered apps, enabling users to quickly turn data, AI models, and functions into interactive apps written in Python.
Vector support is going to be huge
The biggest news is Snowflake adding vector as a native data type within its data cloud to support vector embeddings and vector search capabilities in Cortex. This matters because vector search is essential to LLM applications for several reasons:
- Efficient storage and retrieval of embeddings: Embedding models (classic examples include Word2Vec, GloVe, and FastText) represent words and documents as high-dimensional vectors. Vector databases store these embeddings and fetch them quickly during real-time operations.
- Enabling semantic understanding: Vector embeddings capture semantic relationships and meaning between words. By indexing these vector representations, LLMs can leverage vector databases to understand the context and meaning of input text, enabling them to generate coherent and relevant responses.
- Facilitating efficient similarity search: Vector databases allow for fast similarity search by indexing vector representations of textual data. LLMs can quickly retrieve similar or related information, which is crucial for tasks like information retrieval, recommendation systems, and content generation.
- Supporting multimodal applications: Vector databases can store embeddings of multimodal data like images and audio, allowing LLMs to integrate and reason across different modalities for tasks like image captioning and speech recognition.
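The similarity search described above boils down to comparing a query vector against stored embeddings and returning the closest matches. Here is a minimal sketch using cosine similarity over a toy in-memory index; the document ids and three-dimensional "embeddings" are made up for illustration, where real embeddings would come from a model and have hundreds of dimensions.

```python
import math

# Minimal sketch of vector search: store embeddings in an index, then
# retrieve the nearest documents to a query by cosine similarity.

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=2):
    """Return the top_k document ids most similar to query_vec."""
    scored = sorted(index.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]

# A toy "vector index": document id -> embedding.
index = {
    "refund_policy":  [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.1],
    "account_help":   [0.0, 0.2, 0.9],
}

print(search([0.85, 0.15, 0.05], index))  # refund_policy ranks first
```

A native vector type lets the database do this comparison at scale with proper indexing, instead of an exhaustive scan like the sketch above.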
As Deshmukh points out, these announcements demonstrate Snowflake's focus on making enterprise AI easy, efficient, and accessible by providing a data foundation and cutting-edge AI building blocks to create powerful AI and machine learning applications.
Snowflake knows it can't do it alone. The company is teaming up with AI giants like NVIDIA and regional players like Accenture to build the infrastructure and expertise needed for the LLM revolution.
Snowflake has just started
The rise of LLMs has catapulted data management into the spotlight. It's a new era where governance, security, and collaboration are king. With its evolving platform and strategic partnerships, Snowflake is leading the charge.
As Deshmukh puts it, "The differentiation will come from how you leverage the model of your choice and build these AI applications to serve your business and your business users."
In this new data-centric world, Snowflake is betting big that they can help companies do just that.
Image credit: iStockphoto/Kolidzei
Winston Thomas
Winston Thomas is the editor-in-chief of CDOTrends. He likes to piece together the weird and wonderful tech puzzle for readers and identify groundbreaking business models led by tech while waiting for the singularity.