Boiling Point: The Race to Cool Down AI Heats Up
- By Winston Thomas
- July 08, 2024
Remember the hum of traditional air-cooled data centers? Chee Hoe Ling, vice president of product management at Vertiv Asia, believes it will soon be a relic of the past.
AI's insatiable thirst for computational power generates heat that air cooling simply can't handle. As AI workloads surge, pushing power densities to unprecedented levels, data centers are turning to a new paradigm: liquid cooling.
Chee puts it bluntly: liquid, he says, has become "the most effective way of rejecting the heat" from data center infrastructure. With thermal conductivity and heat capacity far exceeding air's, it offers a lifeline, enabling data centers to support the AI revolution without, quite literally, melting down.
While the choice to go liquid may seem like a no-brainer, the implications can be seismic to the world of data center engineering.
AI's thirst for power: Challenge and opportunity
The rise of AI, particularly chips such as NVIDIA’s H100 GPUs, is creating shockwaves across the data center industry.
Chee notes that traditional power densities of 5-6 kilowatts per rack are dwarfed by AI workloads demanding 30-50 kilowatts or more. This order-of-magnitude jump in power density generates heat loads that traditional air cooling systems struggle to contain.
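To put those rack densities in perspective, a rough back-of-envelope sketch helps. Nearly all electrical power drawn by IT gear ends up as heat, and the airflow needed to remove it follows from Q = ρ · V̇ · cₚ · ΔT. The air properties and the 12 K temperature rise below are illustrative assumptions, not figures from Vertiv; only the kilowatt numbers come from the article.

```python
# Rough heat-rejection arithmetic for a single rack (illustrative figures only).
# Assumption: essentially all electrical power becomes heat to be removed.

AIR_DENSITY = 1.2   # kg/m^3, air at roughly 20 C
AIR_CP = 1005.0     # J/(kg*K), specific heat of air
DELTA_T = 12.0      # K, assumed inlet-to-outlet air temperature rise

def airflow_m3_per_s(rack_kw: float) -> float:
    """Volumetric airflow needed to carry away rack_kw of heat:
    Q = rho * V_dot * cp * dT  =>  V_dot = Q / (rho * cp * dT)."""
    watts = rack_kw * 1000.0
    return watts / (AIR_DENSITY * AIR_CP * DELTA_T)

for kw in (6, 30, 50):  # traditional rack vs. the AI racks cited above
    flow = airflow_m3_per_s(kw)
    print(f"{kw:>2} kW rack -> {flow:.2f} m^3/s of air ({flow * 2118.88:.0f} CFM)")
```

A 50 kW rack needs over eight times the airflow of a 6 kW rack under these assumptions, which is why fans and raised-floor air delivery run out of headroom well before AI densities are reached, while water, with a volumetric heat capacity thousands of times that of air, does not.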
"With the acceleration of deployment and power requirement and the heat load coming from the GPU, liquid has become the most effective way of rejecting the heat from the data center infrastructure point of view,” Chee explains.
Two accepted approaches to liquid cooling are direct-to-chip (or direct-on-chip) and immersion cooling. The former pipes liquid coolant through cold plates mounted directly on the chip to absorb heat at the source, while the latter submerges entire servers in a dielectric fluid. Choosing the right approach is creating a hot debate among data center engineers.
Each approach has strengths and weaknesses, and the optimal choice often depends on the specific application and environment. Direct-to-chip cooling offers unparalleled efficiency for high-density components like GPUs, while immersion cooling provides a more holistic solution for entire servers.
Data center engineers also need to work out whether their aging hardware is ready, whether the economics are in their favor, whether there is enough space (immersion tanks are often laid out horizontally, so they can take up more floor area than vertical racks), whether servicing support is available, and whether the data center floor can handle the extra weight of the liquid and tanks (a major issue in older buildings).
Chee sees data centers using a mix of these methods, but it's not a simple calculation. Climate plays a huge role. “For example, what we can deploy in Sweden may not be the best environment for Singapore,” he explains.
That’s why Vertiv is working with industry associations and local academia to test and optimize solutions for each environment, as with its support for Singapore’s Sustainable Tropical Data Centre Testbed (STDCT), a first-of-its-kind testing facility led by the National University of Singapore and Nanyang Technological University.
Retrofitting the past for the future
While Vertiv sees the need to rethink data center design for AI cooling needs, not everyone can fork out cash for a swanky new, liquid-cooled data center. Yet, retrofitting traditional data centers can also be a considerable challenge, putting many data center operators in a bind.
Data center engineers understand that their existing facilities, designed for a pre-AI era, are ill-equipped to handle the demands of modern workloads. However, scrapping these investments and building anew is not viable for most enterprises, which have poured money into them over the past decade and now face uncertain market dynamics.
It’s a problem that Vertiv is looking to fix. “An existing data center is always the most complex one [to fix],” Chee acknowledges. "But we can maximize or utilize as much as possible of the existing infrastructure.”
Vertiv offers a range of liquid cooling solutions designed to retrofit existing facilities, maximizing efficiency and minimizing disruption. Chee explains that Vertiv’s approach involves carefully assessing the existing infrastructure, identifying bottlenecks, and implementing targeted liquid cooling solutions to alleviate them.
Beyond heat dissipation, Vertiv is also helping enterprises reduce the environmental impact of data centers, a significant concern for governments weighing energy moratoriums and for companies fretting over Scope 3 emissions.
But liquid cooling is only one part of the energy puzzle. "We are focusing on driving the energy efficiency of all our products. For example, on the UPS side, we are rolling out the most efficient UPS in the marketplace,” says Chee.
It’s part of the company’s effort to optimize power conversion and push the extra capacity into the grid or look into introducing renewables into the data center design. This is important as AI workloads see power requirements fluctuate wildly.
“The data center is the most energy-intensive in a lot of countries…AI just makes it more intensive. A lot of countries are now thinking how we can support this.”
The future is liquid
The need for better cooling isn't going away. "AI power requirements are only going up," Chee predicts.
He believes liquid cooling offers a more straightforward path to sustainability by drastically cutting energy use and carbon footprints. He expects it to become the norm in data centers worldwide.
If Chee is correct, the days of noisy, energy-guzzling air conditioners could be over, replaced by the quiet hum of liquid-cooled servers.
Image credit: iStockphoto/Marcus Millo
Winston Thomas
Winston Thomas is the editor-in-chief of CDOTrends. He likes to piece together the weird and wonderful tech puzzle for readers and identify groundbreaking tech-led business models while waiting for the singularity.