Data Scientists More Indispensable Than Ever in Age of LLMs
- By Paul Mah
- August 02, 2023

Far from threatening the jobs of data scientists, generative AI and large language models (LLMs) will help data scientists improve their productivity and unlock more of their time as they start to interact with data systems conversationally, says Torsten Grabs.
Grabs, senior director of product management at Snowflake, was in town this week for the Singapore leg of the data cloud giant’s Data Cloud World Tour.
He spoke with CDOTrends about the implications of generative AI for data professionals, the vital role of data scientists, and how Snowflake is harnessing the technology.
Bring on the LLMs
For a start, Grabs doesn't see LLMs replacing data scientists: “I don't think generative AI and large language model technology has advanced far enough that we can fully automate and trust generative AI results.”
“[Enterprises should] make sure they have a domain expert in the loop that they can lean on, who can provide oversight for the results produced by generative AI and then can intervene,” he said.
Grabs shared how organizations with more mature machine learning practices are leaning on their data scientists to build AI models for specific use cases.
This means data scientists are expected to be conversant with generative AI systems: “How do you inject proprietary information from your organization into the pre-trained model to tune or optimize the model for your specific organization?”
He ticked off training a new AI model, fine-tuning a pre-trained model, prompt engineering, and context injection as some of the skill sets that data scientists will need to be “very familiar” with to support their organizations.
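Of those techniques, context injection is the lightest-weight: instead of retraining anything, proprietary information is supplied to the pre-trained model at inference time, inside the prompt itself. A minimal sketch in Python (the function and the example data are illustrative, not from any specific library or product):

```python
def build_prompt(question: str, context_snippets: list[str]) -> str:
    """Assemble a prompt that injects proprietary context ahead of the question.

    The pre-trained model itself never changes; the organization's
    knowledge arrives at inference time as part of the prompt.
    """
    context = "\n".join(f"- {snippet}" for snippet in context_snippets)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical internal figures the base model was never trained on
prompt = build_prompt(
    "What was Q2 revenue for the APAC region?",
    ["APAC Q2 revenue: SGD 4.2M", "EMEA Q2 revenue: EUR 3.1M"],
)
```

Fine-tuning, by contrast, bakes that same knowledge into the model weights, which is more expensive but avoids having to ship the context with every request.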
The case for smaller models
Though LLMs such as OpenAI’s GPT-4 and Claude 2 from Anthropic have captured much of the buzz, Grabs observed that these are essentially trained on all publicly available data on the Internet. Sometimes, smaller might be better.
“Large models are resource hungry: They are very expensive to host, they are expensive to run, and part of the expense is essentially just for those parts of the model that you don't need.”
He noted that organizations might be better off with a smaller, narrower model that is specifically trained and optimized for the use cases that the organization cares about.
“[With smaller models, you] get better performance in terms of accuracy and you are more resource efficient, so you can run it at lower cost. I think that's going to be very compelling for organizations to do that because they're getting better results at a lower cost.”
Snowflake and AI
So what generative AI features can users expect from Snowflake? Grabs shared that Snowflake has already demonstrated the ability to take a natural language query and translate it into a SQL statement. The feature is expected to arrive in a preview release this year.
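Snowflake has not detailed how its translation layer works, but the general pattern pairs an LLM with verification: the model proposes SQL, and the system (or a human in the loop, as Grabs advocates) executes and checks it. A toy sketch, with the model call stubbed out and SQLite standing in for a real warehouse:

```python
import sqlite3


def mock_llm_to_sql(question: str) -> str:
    """Stand-in for the model call. A real system would prompt an LLM
    with the table schema alongside the user's question."""
    return "SELECT region, SUM(amount) FROM sales GROUP BY region"


# In-memory database standing in for the warehouse
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("APAC", 100.0), ("APAC", 50.0), ("EMEA", 75.0)],
)

# Generated SQL is executed so the result can be inspected before use
sql = mock_llm_to_sql("What are total sales by region?")
rows = dict(conn.execute(sql).fetchall())
```

Executing the generated query against real data is what makes oversight practical: a domain expert can compare the result against expectations before anything downstream depends on it.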
The team is also working on the automatic generation of Python code and even an entire Streamlit app, in effect creating a data-centric “Copilot” to help data scientists and data professionals with coding-related tasks.
“We are now starting to engage in a more conversational search experience around our documentation. Can we help Snowflake users be more effective in finding relevant documentation for them through a natural language conversation?” Grabs said.
Document AI, currently in private preview, is a first-party multimodal LLM natively integrated into the Snowflake platform to help customers understand documents and other unstructured data.
“Think about it like loading PDF documents into Snowflake, and then asking natural language questions about these documents and then automatically finding the corresponding values from these documents,” he explained.
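Snowflake has not published Document AI’s internals, but document question answering of this kind conceptually pairs retrieval (find the passage likely to contain the answer) with an LLM (extract the value). A deliberately naive sketch of the retrieval step, with keyword overlap standing in for a real embedding search:

```python
def top_chunk(question: str, chunks: list[str]) -> str:
    """Return the chunk sharing the most words with the question.

    Naive keyword-overlap scoring; production systems would use
    vector embeddings and semantic similarity instead.
    """
    q_words = set(question.lower().split())
    return max(chunks, key=lambda c: len(q_words & set(c.lower().split())))


# Hypothetical text chunks extracted from an uploaded PDF
chunks = [
    "Invoice 1042: total amount due SGD 1,250, payable within 30 days.",
    "Shipping terms: goods delivered FOB Singapore port.",
]
best = top_chunk("What is the total amount due on the invoice?", chunks)
```

The retrieved chunk would then be passed to the LLM, which pulls out the specific value the user asked for.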
Finally, there is already a full spectrum of third-party LLMs that are accessible from a Snowflake account without data ever leaving the platform.
Get started today
Does Grabs have any words of advice for the data scientists out there? His response is simple: Get started today.
“I would recommend that data scientists start exposing themselves to the technology. Embrace generative AI and LLMs wholeheartedly and start interacting with it.”
“Start by getting some hands-on experience. This will probably lead to really meaningful experimentation and the discovery of the most valuable use cases that data scientists can then unlock for their organizations.”
Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].
Image credit: iStockphoto/NicoElNino