Google’s Incredible AI Photo Upscaling Tech

Researchers from the Google AI team have unveiled a new AI photo upscaling technology that can transform low-resolution images into detailed high-resolution photographs.

Google says the technology can help restore old family photos or improve medical imaging systems. Indeed, the resulting images were so compelling that photography and camera news site PetaPixel called the results “jaw-dropping”.

Diffusion models

The researchers adopted an approach known as diffusion models, first proposed in 2015. Until recently, however, the approach took a back seat to other deep generative models such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and autoregressive models.

In a blog post titled “High Fidelity Image Generation Using Diffusion Models” on the Google AI Blog, research scientist Jonathan Ho and software engineer Chitwan Saharia of Google Research observed that while image synthesis is typically performed by deep generative models today, each family of models has its own downsides.

GANs often suffer from unstable training and mode collapse, while autoregressive models typically suffer from slow synthesis speed, they said, noting that diffusion models offer potentially favorable trade-offs compared with other types of deep generative models.

“Diffusion models work by corrupting the training data by progressively adding Gaussian noise, slowly wiping out details in the data until it becomes pure noise, and then training a neural network to reverse this corruption process.”

“Running this reversed corruption process synthesizes data from pure noise by gradually denoising it until a clean sample is produced. This synthesis procedure can be interpreted as an optimization algorithm that follows the gradient of the data density to produce likely samples,” they explained.
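The corruption half of that description can be sketched in a few lines of Python. This is only an illustration and not Google's code: the function names, the linear noise schedule and its values, and the toy eight-value "image" are all assumptions chosen for readability.

```python
import math
import random


def forward_diffuse(pixels, betas, rng):
    """Progressively add Gaussian noise; return the sample after each step."""
    trajectory = [list(pixels)]
    current = list(pixels)
    for beta in betas:
        # Shrink the signal slightly and mix in fresh Gaussian noise, so
        # details fade while the overall variance stays bounded.
        current = [
            math.sqrt(1.0 - beta) * x + math.sqrt(beta) * rng.gauss(0.0, 1.0)
            for x in current
        ]
        trajectory.append(current)
    return trajectory


def signal_fraction(betas):
    """Fraction of the original signal amplitude left after all steps."""
    return math.sqrt(math.prod(1.0 - b for b in betas))


rng = random.Random(0)
# Hypothetical linear schedule: 1000 steps rising from 1e-4 to 2e-2.
betas = [1e-4 + (2e-2 - 1e-4) * t / 999 for t in range(1000)]
image = [1.0, -1.0, 0.5, -0.5, 0.0, 0.25, -0.75, 1.0]  # toy "image"
trajectory = forward_diffuse(image, betas, rng)
print(len(trajectory), signal_fraction(betas))
```

With this assumed schedule, less than one percent of the original signal amplitude survives the final step, which is the "pure noise" end state the authors describe. The learned part, a neural network that reverses each step, is not shown here.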


An image processed through Google's SR3 technique

Understanding SR3

The results rely on a combination of two techniques: Super-Resolution via Repeated Refinements (SR3) and a model for class-conditioned synthesis known as Cascaded Diffusion Models (CDM).

The team trained the SR3 model using an image corruption process in which noise is progressively added to a high-resolution image until only pure noise remains. The model then learns to reverse this process, starting from pure noise and progressively removing it, guided by the input low-resolution image.
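To make that training setup concrete, here is a hedged sketch of how a single training example might be assembled. It is not the SR3 implementation: the average-pooling downsampler, the `signal_level` parameter, the function names, and the 8x8 toy image are all assumptions, and the denoising network itself is left out.

```python
import math
import random


def downsample(image, factor):
    """Average-pool a square 2D image (a list of rows) by `factor`."""
    size = len(image) // factor
    return [
        [
            sum(
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ) / factor ** 2
            for c in range(size)
        ]
        for r in range(size)
    ]


def make_training_example(high_res, factor, signal_level, rng):
    """Return (low_res, noisy_high_res, noise) for one denoising step.

    `signal_level` in (0, 1] is the share of signal variance that is kept;
    values near 0 leave almost pure noise.
    """
    noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in high_res]
    noisy = [
        [
            math.sqrt(signal_level) * x + math.sqrt(1.0 - signal_level) * n
            for x, n in zip(row, noise_row)
        ]
        for row, noise_row in zip(high_res, noise)
    ]
    return downsample(high_res, factor), noisy, noise


rng = random.Random(0)
# Toy 8x8 gradient standing in for a high-resolution photo.
high_res = [[(r + c) / 14.0 for c in range(8)] for r in range(8)]
low_res, noisy, noise = make_training_example(high_res, 4, 0.5, rng)
# A denoising network (not shown) would receive (low_res, noisy) and be
# trained to predict `noise`, so it learns to undo the corruption.
```

The low-resolution image is passed in alongside the noisy one, which is what lets the model reconstruct detail that is consistent with the input rather than inventing an unrelated picture.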

According to Google, SR3 achieved strong benchmark results on face and natural images when scaling to resolutions 4x to 8x that of the input low-resolution image. Notably, the models can be cascaded to increase the effective scale factor. For example, a 64x64 image can be scaled to 1024x1024 by first upscaling it to 256x256 with one model and then to 1024x1024 with a second.
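The arithmetic of such a cascade is easy to check in code. In the sketch below, `upscale_stub` is a hypothetical nearest-neighbour stand-in for a trained super-resolution model, and `run_cascade` is an assumed helper name; only the chaining of two 4x stages reflects the article.

```python
def upscale_stub(image, factor):
    """Stand-in for a trained model: nearest-neighbour upscaling."""
    return [
        [row[c // factor] for c in range(len(row) * factor)]
        for row in image
        for _ in range(factor)
    ]


def run_cascade(image, factors):
    """Apply the upscaling stages in order and record each resolution."""
    sizes = [len(image)]
    for factor in factors:
        image = upscale_stub(image, factor)
        sizes.append(len(image))
    return image, sizes


low_res = [[0.0] * 64 for _ in range(64)]
result, sizes = run_cascade(low_res, [4, 4])
print(sizes)  # [64, 256, 1024]
```

Two 4x stages multiply to an effective 16x scale factor, so each model only has to solve a more modest upscaling problem.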

For class-conditional image generation, the Google team added SR3 to the cascading pipeline and introduced a new data augmentation technique that it says further improves the sample quality of CDM. Known as conditioning augmentation, it applies augmentations such as Gaussian noise and Gaussian blur to the low-resolution conditioning input of each super-resolution model, which results in better sample quality at higher resolutions.
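A rough sketch of what conditioning augmentation could look like follows. It is an illustration under stated assumptions, not the CDM code: the function names, the three-tap blur kernel, the edge clamping, the choice to apply blur and noise together, and the strength values are all hypothetical.

```python
import math
import random


def gaussian_kernel(sigma, radius=1):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [
        math.exp(-(i * i) / (2.0 * sigma * sigma))
        for i in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Blur each row with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [
            sum(
                k * row[min(max(c + offset, 0), width - 1)]
                for k, offset in zip(kernel, range(-radius, radius + 1))
            )
            for c in range(width)
        ]
        for row in image
    ]


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def augment_conditioning(low_res, sigma_blur, sigma_noise, rng):
    """Blur the low-resolution conditioning image, then add Gaussian noise."""
    blurred = gaussian_blur(low_res, sigma_blur)
    return [[x + rng.gauss(0.0, sigma_noise) for x in row] for row in blurred]


rng = random.Random(0)
low_res = [[(r * 4 + c) / 15.0 for c in range(4)] for r in range(4)]
augmented = augment_conditioning(low_res, sigma_blur=1.0, sigma_noise=0.05, rng=rng)
```

The augmented image, rather than the clean one, would be fed to the super-resolution model during training, so the model cannot rely on the exact pixel values of its low-resolution input.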

Better than existing methods

The resulting images aren’t perfect by any means. Based on sample images published by Google, one notable weakness is eyeglass frames that come out with gaps or go missing entirely. However, in a forced-choice experiment with human volunteers, SR3 outperformed existing state-of-the-art face super-resolution methods such as PULSE and FSRGAN.

“With SR3 and CDM, we have pushed the performance of diffusion models to state-of-the-art on super-resolution and class-conditional ImageNet generation benchmarks. We are excited to further test the limits of diffusion models for a wide variety of generative modeling problems,” wrote the authors.

Additional information about the team’s work along with multiple sample images can be accessed here and here.

Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose.

Image credit: iStockphoto/JensHN