AI Training at Scale Might Be More Efficient Than We Think

From the use of AI to aid the design of complex microprocessors to the AI-powered GitHub Copilot that can help budding programmers (and data scientists) write code, AI has gained a tremendous amount of attention in recent years.

The energy impact of AI

However, the exponential rise of interest in AI has left some disconcerted. They argue that not enough mainstream attention has been given to the massive amount of energy that modern AI models consume. And as the computational resources needed to create best-in-class AI models keep doubling every few months, it is vital that we urgently reassess the environmental impact of our obsession with AI, they say.

In an interview with The Next Platform, Google distinguished engineer David Patterson cited findings from a recent white paper by researchers from Google and Berkeley, which found AI training at scale to be less wasteful than it might appear at first blush.

In the paper titled “Carbon Emissions and Large Neural Network Training”, researchers calculated the carbon footprint of several recent large AI models and identified the factors that influenced energy consumption.

Calculating the energy cost of AI

The researchers used the Evolved Transformer neural architecture search (NAS), widely touted as a case of computationally expensive AI training, as an example. They found that the actual cost based on their calculations was nearly two orders of magnitude smaller than previously estimated.

This was attributed in part to improvements in computing efficiency, though the bulk of the disparity was pegged to confusion over whether a NAS must be run every time a model is trained. Once found by NAS, for instance, the Evolved Transformer can be used for language modeling without a new search, the authors explained.

However, they acknowledged that the challenges of calculating the energy cost of AI were real. Suggestions for more accurate calculations were offered, and they promised to collaborate with developers of the industry standard MLPerf AI benchmark to include energy usage during training and inference as part of the benchmark.

“To avoid miscalculations, we believe ML papers requiring large computational resources should make energy consumption and CO2e [carbon dioxide equivalent] explicit when practical. We are working to be more transparent about energy use and CO2e in our future research,” they wrote.

Less energy than you think

Specifically, the study looked at the impact of recent large and popular ML models, including T5, Meena, GShard, Switch Transformer, and GPT-3, tallying up the energy consumption and net CO2 produced for training each of them.

They used the emissions of a round-trip flight of a passenger jet between San Francisco and New York as a benchmark. Based on the estimated 180 tons of CO2 that such a flight would produce, T5 training emissions would add up to 26% of that, Meena 53%, GShard-600B 2%, and GPT-3 305%, say the researchers.
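As a rough illustration of the arithmetic behind that comparison, the percentages can be converted back into tons of CO2. The sketch below uses only the figures quoted in this article; the variable and function names are made up for the example. Because the percentages are rounded, the results are approximations and may not match the values reported in the paper itself.

```python
# Figures quoted in the article: a round-trip San Francisco-New York flight
# is estimated at about 180 tons of CO2, and each model's training
# emissions are given as a percentage of that flight.
FLIGHT_TONS_CO2 = 180

# Illustrative names; the percentages are the rounded values from the article.
SHARE_OF_FLIGHT_PCT = {
    "T5": 26,
    "Meena": 53,
    "GShard-600B": 2,
    "GPT-3": 305,
}


def implied_tons(pct, flight_tons=FLIGHT_TONS_CO2):
    """Convert a percentage of the flight's emissions into tons of CO2."""
    return flight_tons * pct / 100


implied = {model: implied_tons(pct) for model, pct in SHARE_OF_FLIGHT_PCT.items()}

for model, tons in implied.items():
    print(f"{model}: about {tons:.1f} tons CO2")
```

Read this way, only GPT-3 exceeds the single round-trip flight used as the yardstick, at roughly three times its emissions.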

The researchers also highlighted the energy consumption of Bitcoin. Every transfer of Bitcoin currently costs 700 kWh and produces CO2 equivalent to that of around 750,000 credit card swipes, they say. In comparison, Google consumes just one-tenth of the energy used by Bitcoin annually – and only a “relatively small” amount of that is used to train AI workloads.

Unsurprisingly, cloud data centers are up to twice as energy efficient as a typical data center, with specialized ML-oriented accelerators up to five times more efficient than off-the-shelf systems. Interestingly, even different data centers within an organization such as Google have slightly different carbon footprints, according to the researchers.

AI researchers can do more

“More broadly, since cloud data centers are much more energy-efficient, the long-feared explosion of data center energy usage has not materialized. A recent paper in Science found that global data center energy consumption increased by only 6% compared with 2010, despite computing capacity increasing by 550% over the same time period.”
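To see what those two growth figures imply together, one can estimate the change in energy used per unit of computing capacity. This is a back-of-the-envelope sketch, not a calculation from the paper. It assumes that "increased by 6%" and "increasing by 550%" mean 1.06 times and 6.5 times the 2010 levels respectively, and that computing capacity is a fair proxy for work done; the variable names are illustrative.

```python
# Growth figures quoted above, both relative to 2010.
energy_growth_pct = 6      # global data center energy consumption
compute_growth_pct = 550   # global data center computing capacity

# Assumption: "increased by X%" means the new level is (1 + X/100) times
# the 2010 level.
energy_factor = 1 + energy_growth_pct / 100
compute_factor = 1 + compute_growth_pct / 100

# Energy per unit of computing capacity, relative to 2010.
energy_per_compute = energy_factor / compute_factor
reduction_pct = (1 - energy_per_compute) * 100

print(f"Energy per unit of compute vs 2010: {energy_per_compute:.2f}x")
print(f"Implied reduction: about {reduction_pct:.0f}%")
```

Under these assumptions, each unit of computing capacity consumes roughly one-sixth of the energy it did in 2010.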

To be clear, the researchers are unequivocal that global climate change is a threat to the environment, economies, and health. The ML community should hence do its part to limit carbon emissions.

The easiest way to achieve this is by broadening the focus beyond model accuracy alone to include training quality and carbon footprint as metrics. “If paired with publication incentives to improve emission metrics in addition to accuracy, we can imagine a virtuous cycle that slows the growth of the carbon footprint of ML by accelerating innovations in the efficiency and cost of algorithms, systems, hardware, data centers, and carbon-free energy.”

“Reducing CO2e is not only a moral obligation but ultimately sound business. To decrease the footprint of training, an ML researcher should pick the DNN [deep neural network] model, the processor, and the datacenter carefully. Cutting energy saves money and CO2e and improving the energy mix reduces CO2e,” they summed up.

Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose.

Image credit: iStockphoto/filmfoto