Generative AI Art Gets Text Upgrade
- By Paul Mah
- May 10, 2023
Generative AI has made remarkable strides: it has won an art competition and helped a head of design revamp a hundred articles on his organization’s blog over a single weekend.
But if you have tried these tools yourself, you may have noticed that even the best text-to-image AI models suffer from a persistent limitation: they cannot reliably generate legible text within the images they produce.
This is set to change with a new model from DeepFloyd Lab, a research group backed by Stability AI.
DeepFloyd IF
As reported on TechCrunch, DeepFloyd IF is trained on a dataset of more than a billion images and text to “smartly” integrate text into images. With it, users can create an image from a prompt like “a teddy bear wearing a shirt that reads ‘Deep Floyd’.”
Unlike models such as DALL-E 2 and Stable Diffusion, DeepFloyd IF uses multiple different processes stacked together in a modular architecture to generate images, according to Angus Russell, the CEO of generative art platform NightCafe. It is understood that NightCafe was given early access to DeepFloyd IF.
So how is DeepFloyd IF different? For a start, it works directly with pixels, generating an image at low resolution and then upscaling it in stages. This approach is combined with a large language model that interprets and represents complex prompts. As a result, the model can generate detailed images efficiently while accurately representing the spatial relationships described in prompts.
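The staged flow described above can be pictured with a toy sketch. This is not DeepFloyd IF's code: every function below is a hypothetical placeholder, the 64-pixel base size is an assumption (the 256 and 1024 sizes are the ones quoted in this article), and nearest-neighbour resizing stands in for the real upscaling models.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[float]]  # grayscale grid, a stand-in for real pixel data


@dataclass
class Stage:
    name: str
    out_size: int  # side length in pixels


# Assumed stage sizes: 256 and 1024 are quoted in the article, 64 is an assumption.
STAGES = [Stage("base", 64), Stage("upscale_1", 256), Stage("upscale_2", 1024)]


def encode_prompt(prompt: str) -> List[float]:
    """Placeholder for the language-model text encoder: one number per word."""
    return [float(len(word)) for word in prompt.split()]


def generate_base(embedding: List[float], size: int) -> Image:
    """Placeholder for pixel-space generation: a flat size x size grid."""
    value = (sum(embedding) % 255) / 255.0
    return [[value] * size for _ in range(size)]


def upscale(image: Image, size: int) -> Image:
    """Placeholder for an upscaling stage: nearest-neighbour resize."""
    src = len(image)
    return [
        [image[y * src // size][x * src // size] for x in range(size)]
        for y in range(size)
    ]


def run_cascade(prompt: str) -> Image:
    """Encode the prompt, generate a small image, then upscale stage by stage."""
    embedding = encode_prompt(prompt)
    image = generate_base(embedding, STAGES[0].out_size)
    for stage in STAGES[1:]:
        image = upscale(image, stage.out_size)
    return image
```

Calling `run_cascade("a teddy bear wearing a shirt that reads 'Deep Floyd'")` returns a 1024x1024 grid. The point of the sketch is only the shape of the pipeline: the text is encoded once, and each stage hands a larger image to the next.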
Speaking to TechCrunch, Russell said he expects DeepFloyd IF to unlock a new wave of art such as logo designs, web designs, posters, billboards – and even memes.
“It’s also very good at generating legible and correctly spelled text in images, and can even understand prompts in multiple languages,” said Russell. “Of these capabilities, the ability to generate legible text in images is perhaps the biggest breakthrough to make DeepFloyd IF stand out from other algorithms.”
DeepFloyd IF is open source, though not licensed for commercial use at the moment. Running it requires a GPU with at least 16GB of memory to produce a 256x256 image, or 24GB for a 1024x1024 image.
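As a rough illustration, the two memory figures above could be turned into a quick pre-flight check. The helper name and the lookup table are hypothetical; only the 16GB and 24GB thresholds come from this article.

```python
from typing import Optional

# Thresholds quoted in the article: output side in pixels -> GPU memory needed in GB.
MEMORY_REQUIREMENTS_GB = {256: 16, 1024: 24}


def max_resolution(gpu_memory_gb: float) -> Optional[int]:
    """Return the largest supported image side, or None if below the minimum."""
    supported = [
        side
        for side, needed in MEMORY_REQUIREMENTS_GB.items()
        if gpu_memory_gb >= needed
    ]
    return max(supported) if supported else None
```

Under these assumptions, a 24GB card qualifies for 1024x1024 output, a 16GB or 20GB card is limited to 256x256, and anything smaller returns `None`.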
Learn more about DeepFloyd IF or download it from GitHub.
Image credit: DeepFloyd Lab
Paul Mah
Paul Mah is the editor of DSAITrends, where he reports on the latest developments in data science and AI. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose.