Solving AI’s Data Pangs: Inside the Secret Storage Wars
- By Winston Thomas
- March 26, 2024
AI is a data glutton, and its appetite is only growing. As AI models evolve and go mainstream, the storage industry faces a make-or-break moment. Companies like Pure Storage are betting big on a new generation of hardware to keep these hungry algorithms fed.
Right now, most AI projects are still lab experiments. But the writing's on the wall—the data demand is about to explode. "A lot of the advancements in storage technology are going to come in the form of improved data management," observes Matthew Oostveen, chief technology officer at Pure Storage (Asia Pacific & Japan). He called for radical shifts in how we manage and store data.
The pressure's on. Companies want storage that's lightning-fast, planet-friendly, and won't bankrupt them. The stakes are high—the winners in the AI race will likely be those who figure out the storage puzzle.
Flash memory at the speed of thought
Pure Storage knows about AI's hunger pangs all too well. Its flash-based solutions, like FlashBlade, were built from the ground up for AI workloads. Oostveen maintains that much of the core innovation is done; now, it's a race to squeeze out every last drop of performance at a reduced cost as AI requirements escalate.
Picture it: a machine learning model sifts through datasets over and over, teasing out patterns and relationships. "Then you've got these random workloads, and it poses a challenge for that underlying hardware and the optimization of it, particularly the storage system because it needs to handle these unpredictable access patterns," Oostveen explains.
For a storage system, that kind of random, repeated access is a nightmare. The dream scenario for AI engineers? Data should just be there.
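To see why repeated passes over a dataset look random to storage, consider a minimal Python sketch. It assumes a training loop that reshuffles sample order every epoch, which is common practice but is not something Oostveen describes in detail; the function names `epoch_read_order` and `sequential_fraction` are hypothetical and exist only for illustration.

```python
import random


def epoch_read_order(num_samples: int, seed: int) -> list[int]:
    """Return the order in which sample indices are read during one epoch.

    Assumption: the training loop reshuffles the dataset every epoch.
    """
    order = list(range(num_samples))
    random.Random(seed).shuffle(order)
    return order


def sequential_fraction(order: list[int]) -> float:
    """Fraction of reads that land directly after the previously read sample."""
    hits = sum(1 for prev, cur in zip(order, order[1:]) if cur == prev + 1)
    return hits / max(len(order) - 1, 1)


if __name__ == "__main__":
    for epoch in range(3):
        order = epoch_read_order(10_000, seed=epoch)
        # Each epoch visits every sample once, but in a different order.
        print(epoch, order[:5], f"{sequential_fraction(order):.4f}")
```

In this sketch almost no read follows its neighbor on disk, so read-ahead and caching tuned for sequential scans offer little help. That is one plausible reading of the "unpredictable access patterns" described above.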
“It's on us as storage providers to build the platform that mirrors how they think and work with their data instead of making them hack around constraints,” says Oostveen.
The mission: a storage solution that scales quickly across a jumble of workloads. Pure Storage says FlashBlade fits the bill by removing copy bottlenecks.
“Fast shared storage is important here because it eliminates those copy steps…this is what FlashBlade is designed to do,” Oostveen adds.
Metadata mayhem: AI's hidden complexity
AI's data demands go deeper than sheer volume. “This is where we get into metadata and objects. So, if you want to effectively manage metadata and the objects to create an effective AI model, you need to bring many components together. You also need to ensure that the model is going to make accurate predictions,” says Oostveen.
Customers grapple with organizing, annotating, and manipulating that metadata. Think of bounding boxes, those outlines you see in AI-powered videos. “And a bounding box is a really crucial piece of metadata,” Oostveen notes.
“So, it's got spatial coordinates of the object so the machine knows where it is in the 3D realm of the image,” Oostveen adds.
It’s not a single file. You need to store the metadata, including annotations, separately from the raw video file, and you must do it efficiently.
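As a rough illustration of annotations living apart from the raw video, the Python sketch below stores bounding boxes as a small JSON "sidecar" record keyed to a video object. Everything here is an assumption made for illustration: the `BoundingBox` fields, the 2D pixel coordinates, the JSON layout, and the example key `videos/clip-001.mp4` are hypothetical and do not describe any Pure Storage format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BoundingBox:
    """One annotation: where a labeled object sits in a given frame."""
    frame: int      # frame number within the video
    label: str      # object class, e.g. "car"
    x: float        # left edge, in pixels (assumed 2D coordinates)
    y: float        # top edge, in pixels
    width: float
    height: float


def to_sidecar_record(video_key: str, boxes: list[BoundingBox]) -> str:
    """Serialize annotations as a small object stored apart from the video."""
    payload = {"video": video_key, "boxes": [asdict(b) for b in boxes]}
    return json.dumps(payload, sort_keys=True)


def from_sidecar_record(record: str) -> tuple[str, list[BoundingBox]]:
    """Rebuild the video key and annotations from a sidecar record."""
    data = json.loads(record)
    return data["video"], [BoundingBox(**b) for b in data["boxes"]]


if __name__ == "__main__":
    boxes = [BoundingBox(frame=12, label="car", x=40.0, y=60.0, width=120.0, height=80.0)]
    record = to_sidecar_record("videos/clip-001.mp4", boxes)
    print(record)
    print(from_sidecar_record(record))
```

A long video annotated frame by frame can produce a very large number of records like this one, each only a few hundred bytes, which hints at why small-object handling matters as much as raw capacity.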
This type of application screams for a platform built to handle billions of tiny objects—a FlashBlade forte.
The scaling nightmare (and how to beat it)
The real headache in AI isn't just its scope; it's ensuring models grow correctly. "We call it 'checkpoints at scale.' With each advancement you make in the model, you create a new checkpoint. And as the model evolves, the amount of variables and parameters increases exponentially," Oostveen says.
He warns of the choking effect on legacy storage systems as data floods in. When the storage infrastructure cannot keep up, AI teams end up compromising on how they design and build their models.
“Quite often, it leads to terrible problems around projects taking too long and costing too much,” says Oostveen.
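A back-of-the-envelope Python sketch shows how checkpoint storage adds up as a model grows. It assumes each checkpoint holds only the model weights at a fixed precision (4 bytes per parameter, as in 32-bit floats) and ignores optimizer state and compression; the parameter counts are made-up inputs, not figures from Pure Storage.

```python
def checkpoint_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Approximate size of one checkpoint, assuming weights only."""
    return num_params * bytes_per_param


def total_checkpoint_bytes(param_counts: list[int], bytes_per_param: int = 4) -> int:
    """Total storage if every checkpoint in the series is retained."""
    return sum(checkpoint_bytes(n, bytes_per_param) for n in param_counts)


if __name__ == "__main__":
    # Hypothetical series: the model doubles in size between checkpoints.
    series = [1_000_000_000, 2_000_000_000, 4_000_000_000]
    for n in series:
        print(f"{n:>13,} params -> {checkpoint_bytes(n) / 1e9:.0f} GB")
    print(f"total retained: {total_checkpoint_bytes(series) / 1e9:.0f} GB")
```

Under these assumptions, three retained checkpoints already need 28 GB, and each doubling of the model adds more than all earlier checkpoints combined. Real training runs typically save optimizer state as well, so actual footprints tend to be larger.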
Enter FlashBlade, which is designed to enable model scaling at breakneck speeds. Pure Storage also touts Fusion, a way to bring cloud-like simplicity on-premises for cost-conscious operations.
The next AI frontier could well be Edge AI: as models become more sophisticated, AI teams will want the processing done closer to the data sources.
Here, Oostveen pointed to Pure Storage’s development of smaller edge solutions that can be managed centrally alongside larger infrastructures.
Wide partnerships
Oostveen is refreshingly candid about the breadth of the AI challenge. That's where partnerships rule. Pure Storage leans heavily on heavyweights like NVIDIA and Cisco, offering integrated solutions geared for AI.
One such development is AIRI. Short for AI-Ready Infrastructure, it is a certified NVIDIA DGX BasePOD full-stack solution that Pure Storage says simplifies AI deployment and scales quickly, so data teams can focus on delivering insights rather than managing IT.
“It is a turnkey converged infrastructure system that is the fastest and easiest way to [operate] an on-premises AI platform,” explains Oostveen.
“We don't do these things in isolation because customers don't do machine learning in isolation. Besides, there's an expectation that if you're a leading vendor, you'll have great partnerships,” he adds.
The road ahead—buckle up
The AI revolution? It's just left the starting line. Storage is where the rubber meets the road—it's the fuel that will push AI from concept to world-changer. Companies like Pure Storage aren't just innovating; they're building the high-octane infrastructure that will take AI to the next level.
Oostveen's upfront about it: cutting-edge tech and partnerships are just the beginning. Getting AI engineers and infrastructure teams in sync is where things get tricky. A data scientist doesn't necessarily speak the same language as someone who lives and breathes storage.
But here's the thing: bad storage doesn't just annoy data scientists. It hemorrhages money, kills innovation, and makes those green targets a pipe dream. That's a wake-up call no company can afford to ignore.
The bottom line? AI isn't a solo act. A model is only as good as the storage feeding it, and winning in this space means getting that storage right. It’s why Pure Storage is looking to lead the charge.
Image credit: iStockphoto/Ihor Lukianenko
Winston Thomas
Winston Thomas is the editor-in-chief of CDOTrends. He likes to piece together the weird and wondering tech puzzle for readers and identify groundbreaking business models led by tech while waiting for the singularity.