Nvidia Shows Off Instant 2D to 3D AI Conversion

Researchers at Nvidia have turned a collection of 2D images into a digital 3D scene within seconds, a feat the company demonstrated late last month at its GTC AI conference for developers.

3D images from photos

Reconstructing a 3D scene from a series of 2D images is itself hardly new; the approach was first shared publicly in 2020.

The original technique, known as NeRF (Neural Radiance Fields), took up to 12 hours to generate a scene and minutes to render a single frame.

The improved technique uses neural networks to represent and render realistic 3D scenes in “tens of milliseconds” after just seconds of training. The process uses AI to approximate how light behaves in the real world, enabling researchers to reconstruct a 3D scene from a handful of 2D images taken at different angles.
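At its core, a NeRF is a learned function that maps a 3D position and viewing direction to a color and a density, and an image is produced by accumulating those values along each camera ray (volume rendering). The following is a minimal sketch of that rendering step in plain Python; the `toy_field` function is a hypothetical stand-in for the trained network, not Nvidia's implementation:

```python
import math

def toy_field(point, view_dir):
    """Stand-in for the trained network: returns (rgb, density).
    A real NeRF learns this mapping from the input photos; here the
    'scene' is just a red unit sphere at the origin."""
    x, y, z = point
    inside = 1.0 if math.sqrt(x * x + y * y + z * z) < 1.0 else 0.0
    rgb = (0.8, 0.2, 0.2)           # constant red for the toy object
    return rgb, 5.0 * inside        # density > 0 only inside the sphere

def render_ray(origin, direction, field, near=0.0, far=4.0, n_samples=64):
    """Alpha-composite (rgb, density) samples along one camera ray."""
    dt = (far - near) / n_samples
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0             # fraction of light not yet absorbed
    for i in range(n_samples):
        t = near + (i + 0.5) * dt
        p = tuple(o + t * d for o, d in zip(origin, direction))
        rgb, sigma = field(p, direction)
        alpha = 1.0 - math.exp(-sigma * dt)   # opacity of this segment
        weight = transmittance * alpha
        color = [c + weight * ch for c, ch in zip(color, rgb)]
        transmittance *= 1.0 - alpha
    return color

# A ray through the sphere picks up its color; a ray that misses stays black.
hit = render_ray((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), toy_field)
miss = render_ray((5.0, 0.0, -3.0), (0.0, 0.0, 1.0), toy_field)
```

Training a NeRF amounts to adjusting the field function so that rays rendered this way reproduce the captured photos from their known camera positions.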

Nvidia says its “Instant NeRF” technique is the fastest to date, achieving speedups of more than 1,000x over standard NeRF in some cases.

“Instant NeRF… cuts rendering time by several orders of magnitude. It relies on a technique developed by NVIDIA called multi-resolution hash grid encoding, which is optimized to run efficiently on Nvidia GPUs. Using a new input encoding method, researchers can achieve high-quality results using a tiny neural network that runs rapidly.”
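The hash grid encoding described above replaces a large network with small lookup tables of learned features at several grid resolutions. The sketch below illustrates the idea with toy parameters (the spatial hash primes follow the published Instant-NGP paper, but the table size, level count, and nearest-voxel lookup are simplifications; the real method trilinearly interpolates the eight surrounding grid corners):

```python
# Spatial hash from the Instant-NGP paper (Mueller et al., 2022); the prime
# multipliers decorrelate the three axes. Sizes here are toy values.
PRIMES = (1, 2654435761, 805459861)
TABLE_SIZE = 2 ** 14               # entries per level (real models use more)

def hash_voxel(ix, iy, iz):
    """Map integer voxel coordinates to a slot in the feature table."""
    h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
    return h % TABLE_SIZE

def encode(point, tables, base_res=16, n_levels=4, growth=2.0):
    """Look up one learned feature per resolution level for a point in
    [0, 1)^3 and concatenate them. This nearest-voxel version is a
    simplification of the interpolated lookup Instant NeRF uses."""
    features = []
    for level in range(n_levels):
        res = int(base_res * growth ** level)   # finer grid at each level
        ix, iy, iz = (int(c * res) for c in point)
        features.append(tables[level][hash_voxel(ix, iy, iz)])
    return features    # fed to a tiny MLP in the real pipeline

# Toy tables: one scalar "feature" per hash entry (real models store small
# learned vectors, trained jointly with the network).
tables = [[0.0] * TABLE_SIZE for _ in range(4)]
feats = encode((0.25, 0.5, 0.75), tables)
```

Because most of the scene representation lives in these tables rather than in network weights, the network itself can be tiny, which is what makes both training and rendering so fast on GPUs.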

To be clear, NeRF requires dozens of images taken from multiple positions around the scene, along with the precise camera position of each shot. And if there is too much motion during the 2D capture process, the AI-generated 3D scene will end up blurry.

But because the model predicts the color of light radiating in any direction, the technique can also work around occlusions such as pillars or other objects in the way. And because multiple frames can be generated each second, Instant NeRF should in theory allow the creation of 3D worlds that can be navigated in real time.

“The technology could be used to train robots and self-driving cars to understand the size and shape of real-world objects by capturing 2D images or video footage of them. It could also be used in architecture and entertainment to rapidly generate digital representations of real environments that creators can modify and build on,” said Nvidia in a blog post.

Researchers are also exploring how the technique could accelerate other AI challenges, such as reinforcement learning, language translation, and general-purpose deep learning.

Image credit: iStockphoto/Joel Papalini