Generative AI Art Gets Text Upgrade

By Paul Mah
May 10, 2023

Generative AI has made remarkable strides: it has won an art competition and helped a head of design revamp a hundred articles on his organization’s blog over a single weekend.

But if you have tried these tools for yourself, you may have noticed that even the best text-to-image AI models suffer from a persistent limitation: they cannot reliably render legible text within the images they produce.

This is set to change with a new model from DeepFloyd Lab, a research group backed by Stability AI.

DeepFloyd IF

As reported by TechCrunch, DeepFloyd IF is trained on a dataset of more than a billion images and text to “smartly” integrate text into images. With it, users can create an image from a prompt like “a teddy bear wearing a shirt that reads ‘Deep Floyd’.”

Unlike models such as DALL-E 2 and Stable Diffusion, DeepFloyd IF uses multiple different processes stacked together in a modular architecture to generate images, according to Angus Russell, the CEO of generative art platform NightCafe. It is understood that NightCafe was given early access to DeepFloyd IF.

So how is DeepFloyd IF different? For a start, it works directly with pixels, generating an image at a lower resolution and then upscaling it in stages. This approach is combined with a large language model that interprets complex prompts. As a result, the model can generate detailed images efficiently while accurately representing the spatial relationships described in a prompt.
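The staged approach can be illustrated with a toy sketch. This is not DeepFloyd IF's code: every function name here is hypothetical, the text encoder and base stage are trivial placeholders, and nearest-neighbour upscaling stands in for the model's learned upscalers. The 64x64 starting size is an assumption drawn from the project's public materials; this article only names 256x256 and 1024x1024 outputs.

```python
from typing import List

Image = List[List[int]]  # grayscale pixel grid, a stand-in for a real image tensor


def encode_prompt(prompt: str) -> List[float]:
    """Placeholder for the language-model text encoder (hypothetical)."""
    return [ord(ch) / 255.0 for ch in prompt]


def base_stage(embedding: List[float], size: int = 64) -> Image:
    """Placeholder for the base model: builds a small size x size pixel grid
    whose values depend on the prompt embedding. Assumed 64x64 base."""
    seed = int(sum(embedding) * 1000) % 256
    return [[(seed + x + y) % 256 for x in range(size)] for y in range(size)]


def upscale_stage(image: Image, factor: int = 4) -> Image:
    """Nearest-neighbour upscaling, standing in for a learned upscaler.
    Each pixel is repeated `factor` times horizontally and vertically."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


def generate(prompt: str) -> List[Image]:
    """Run the cascade and return the image produced at every stage."""
    embedding = encode_prompt(prompt)
    low = base_stage(embedding)   # 64x64
    mid = upscale_stage(low)      # 256x256
    high = upscale_stage(mid)     # 1024x1024
    return [low, mid, high]


stages = generate("a teddy bear wearing a shirt that reads 'Deep Floyd'")
print([f"{len(img)}x{len(img[0])}" for img in stages])
```

The point of the sketch is the structure: the expensive, prompt-dependent step runs at a small size, and separate stages handle the jump to higher resolutions.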

Speaking to TechCrunch, Russell said he expects DeepFloyd IF to unlock a new wave of art such as logo designs, web designs, posters, billboards – and even memes.

“It’s also very good at generating legible and correctly spelled text in images, and can even understand prompts in multiple languages,” said Russell. “Of these capabilities, the ability to generate legible text in images is perhaps the biggest breakthrough to make DeepFloyd IF stand out from other algorithms.”

DeepFloyd IF is open source, though it is not licensed for commercial use at the moment. Running it requires a GPU with at least 16GB of memory to produce a 256x256 image, or 24GB for a 1024x1024 image.
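As a quick way to read those hardware figures, the sketch below maps available GPU memory to the largest output size quoted in this article. The function name and table are illustrative only, and actual memory use may vary with settings the article does not cover.

```python
from typing import Optional

# (minimum GPU memory in GB, output side length in pixels), using only the
# two figures quoted in the article, ordered from largest to smallest.
REQUIREMENTS = [(24, 1024), (16, 256)]


def max_output_size(gpu_memory_gb: float) -> Optional[int]:
    """Return the largest quoted output side length the GPU can handle,
    or None if it falls below the 16GB minimum."""
    for min_gb, side in REQUIREMENTS:
        if gpu_memory_gb >= min_gb:
            return side
    return None


for memory in (12, 16, 24):
    print(memory, "GB ->", max_output_size(memory))
```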

DeepFloyd IF is available to download from GitHub.

Image credit: DeepFloyd Lab

Paul Mah

Paul Mah is the editor of DSAITrends, where he reports on the latest developments in data science and AI. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose.
