Half Human, Half Algorithm: The Mutant Future of Data Science
- By Winston Thomas
- December 08, 2024
Companies are starting to wake up to a stark truth: GenAI isn’t just another tech tool — it fundamentally reimagines how we approach tasks, especially those involving complex analytical challenges. And nowhere is this more evident than in data science, the very field that gave rise to GenAI.
“We will retain control of the whole designing process,” says Rosaria Silipo, head of data science evangelism at KNIME, “but we’ll rely more and more on AI for practical implementation.”
How the data science workflow changes
The most immediate impact of GenAI will be on the day-to-day mechanics of data science.
For example, coding, traditionally a time-intensive process, is becoming conversational and instantaneous. Silipo illustrates this with a striking example: A colleague recently created a complex Minard chart visualization entirely through AI interaction.
“He didn’t touch a single line of code,” she recalls. “It was just conversational code building.”
KNIME is at the forefront of this workflow shift, developing specialized tools to help data science teams navigate the generative AI landscape.
The platform introduces innovative nodes that approach AI integration from two critical perspectives: First, GenAI assists in building applications through both coding and low-code approaches. Second, users can directly implement tasks by sending prompts to GenAI models and receiving responses.
Task-specific transformation
Not all tasks are created equal in the GenAI landscape. Text processing emerges as a clear strength, with AI demonstrating remarkable capabilities in sentiment analysis, topic detection, and multilingual content generation.
Silipo points to her own experience with fraud detection. She updated an earlier workflow, built on the Isolation Forest algorithm, that flags suspected fraud and warns the victim. The update required adding disclaimers in different languages and amending them as regulations changed. Here, GenAI excelled.
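For readers unfamiliar with the technique, here is a minimal sketch of the Isolation Forest idea in plain Python. It is not Silipo's KNIME workflow: it handles a single numeric column only, and the transaction amounts and the 0.7 threshold are made-up values for illustration. Points that random splits isolate quickly get a short average path and therefore a score close to 1.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Recursively split one-dimensional values at a random threshold."""
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return {"size": len(values)}  # leaf: nothing left to separate
    split = rng.uniform(min(values), max(values))
    return {
        "split": split,
        "left": build_tree([v for v in values if v < split], depth + 1, max_depth, rng),
        "right": build_tree([v for v in values if v >= split], depth + 1, max_depth, rng),
    }


def path_length(tree, value, depth=0):
    """Depth at which a value lands, adjusted for the unsplit leaf size."""
    if "split" not in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if value < tree["split"] else "right"
    return path_length(tree[branch], value, depth + 1)


def anomaly_scores(values, n_trees=100, sample_size=64, seed=0):
    """Return one score per value; scores near 1 suggest an outlier."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(values))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(values, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    scores = []
    for v in values:
        mean_path = sum(path_length(t, v) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_path / norm))
    return scores


# Hypothetical transaction amounts: 40 ordinary payments and one extreme value.
amounts = [20.0 + (i % 7) for i in range(40)] + [950.0]
scores = anomaly_scores(amounts)
THRESHOLD = 0.7  # assumed cut-off, chosen for this toy data only
flagged = [amount for amount, s in zip(amounts, scores) if s > THRESHOLD]
```

In practice a maintained implementation, such as the one in scikit-learn, would be the usual choice. The sketch only shows why an isolated point is cheap to separate from the rest.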
However, GenAI’s success with numerical predictions and outlier detection remained mixed. “I changed a prompt slightly, and suddenly the outlier detection results transformed completely,” Silipo notes. “There’s no transparency in how these changes occur.” This limitation underscores the continued importance of human expertise and critical evaluation.
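One way to make this kind of instability visible is to run the same data through several prompt variants and measure how much the flagged sets overlap. The sketch below assumes the model's answers have already been parsed into sets of record IDs. The prompt names, transaction IDs, and the 0.8 agreement threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two sets: 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stability_report(flagged_by_prompt):
    """Pairwise agreement between the outlier sets returned for each prompt."""
    pairs = combinations(sorted(flagged_by_prompt), 2)
    return {
        (p, q): jaccard(flagged_by_prompt[p], flagged_by_prompt[q])
        for p, q in pairs
    }


# Made-up results: IDs flagged as outliers under three prompt wordings.
runs = {
    "prompt_v1": {"tx_03", "tx_17", "tx_42"},
    "prompt_v2": {"tx_03", "tx_17", "tx_42"},
    "prompt_v3": {"tx_08", "tx_42"},
}

report = stability_report(runs)
MIN_AGREEMENT = 0.8  # assumed tolerance; tune per use case
unstable = [pair for pair, score in report.items() if score < MIN_AGREEMENT]
```

Here the third wording disagrees sharply with the other two, which is a signal to review the results by hand or fall back to a classic detector.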
KNIME addresses these challenges by creating specific nodes for cloud-based large language models, including authentication, connection, and prompting capabilities. For more advanced implementations, KNIME offers additional nodes to support the retrieval augmented generation (RAG) process, making AI integration more accessible and controlled.
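The retrieval step at the heart of RAG can be sketched without any particular platform. The example below is not KNIME's implementation: it ranks a few made-up policy snippets against a question using simple word-count cosine similarity, where a production system would typically use embeddings and a vector store, and then assembles the prompt that would be sent to a language model. The model call itself is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Ground the question in the retrieved context before prompting a model."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical knowledge base and question, for illustration only.
docs = [
    "Refunds for disputed card payments are processed within ten business days.",
    "The cafeteria is open from eight to five on weekdays.",
    "Customers must be notified when a card payment is flagged as suspicious.",
]
question = "When must customers be notified about a suspicious card payment?"
top = retrieve(question, docs, k=1)
prompt = build_prompt(question, docs, k=2)
```

Restricting the model to retrieved context is what gives the designer more control: the answer can be traced back to the specific documents placed in the prompt.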
The coming change: Data scientists as orchestrators
The plain truth is that data scientists won’t become obsolete — but their roles will evolve. They’ll become orchestrators, conductors of a symphony of AI and human intelligence.
“We will get a lot of assistance from GenAI,” Silipo predicts. “But we’ll retain the control of the overall designing process.”
The emerging professional must master prompt engineering, carefully select AI models, provide strategic oversight, and critically validate AI outputs.
Silipo also sees promising developments in small language models (SLMs) that could shape the future of data science. “The biggest problem now is huge networks that know everything poorly,” she explains. “Dedicated, smaller models could offer higher accuracy for specific tasks.”
KNIME is actively exploring these developments, continuously adding nodes that address privacy, model compliance, and sensitive data anonymization.
Ultimately, the most successful approach won’t be wholesale replacement, but strategic integration. As Silipo describes it: “What AI can do better, I put it together with classic data science techniques. What traditional techniques do better, I use those.”
Vendors are seeing the same trend. KNIME’s workflow tools, for example, support this kind of integration, allowing data scientists to combine AI-generated insights with traditional analytical methods.
Challenges and considerations
Still, transparency remains a significant hurdle for the adoption of GenAI.
“Large language models are neural networks,” Silipo notes. “Neural networks are black boxes. I don’t see how they can provide full transparency.”
Evaluation will continue to focus on correctness, transparency, controllability, and cost-effectiveness.
Despite this major blind spot, Silipo believes data science teams will likely have their own AI assistants in five years. Humans will provide high-level supervision and strategic direction.
“We will ask much more of AI,” Silipo predicts. “Basically, we’ll have our own AI to do our stuff, and we’ll just retain the supervision of creation.”
The future of data science isn’t about humans versus AI. It’s about humans and AI working in unprecedented harmony, with platforms like KNIME bridging the technological divide.
Image credit: iStockphoto/Yana Lobenko
Winston Thomas
Winston Thomas is the editor-in-chief of CDOTrends. He likes to piece together the weird and wondering tech puzzle for readers and identify groundbreaking business models led by tech while waiting for the singularity.