OpenAI Pushes Envelope Again With Sora Video Model
- By Paul Mah
- February 21, 2024
OpenAI last week unveiled Sora, a text-to-video AI model with the potential to upend the advertising and video industries with its ability to generate photorealistic, high-resolution videos up to a minute long.
Earth-shattering capabilities
Building on past research from the DALL-E and GPT models, Sora is a diffusion model that uses a transformer architecture. Videos and images are represented as collections of smaller units of data called patches, each of which is akin to a token in GPT.
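The patch idea can be sketched in code. The snippet below is purely illustrative: OpenAI has not published Sora's actual patch dimensions or pipeline, so the patch size and tensor layout here are assumptions chosen to show how a video becomes a sequence of "tokens" a transformer can consume.

```python
import numpy as np

def patchify(video, patch_size=(4, 16, 16)):
    """Split a video tensor (frames, height, width, channels) into
    spacetime patches, the video analogue of GPT's text tokens.
    Assumes each dimension divides evenly by the patch size."""
    t, h, w, c = video.shape
    pt, ph, pw = patch_size
    patches = (
        video.reshape(t // pt, pt, h // ph, ph, w // pw, pw, c)
             .transpose(0, 2, 4, 1, 3, 5, 6)   # group the patch grid first
             .reshape(-1, pt * ph * pw * c)    # one flat row per patch
    )
    return patches  # a sequence of patch "tokens" for a transformer

# A 16-frame, 64x64 RGB clip becomes 64 patch tokens of 3,072 values each.
video = np.zeros((16, 64, 64, 3))
tokens = patchify(video)
print(tokens.shape)  # (64, 3072)
```

Because the video is flattened into a uniform sequence of patches, the same transformer machinery that predicts text tokens can, in principle, operate on visual data of varying resolutions and durations.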
Under the hood, Sora offers a range of jaw-dropping capabilities. Beyond the ability to generate a video solely from text instructions, it can take an existing still image and generate a video from it, or animate the contents of a still image.
Moreover, the model can take an existing video and extend it, fill in missing frames, or create multiple shots within a single generated video and accurately reflect the same characters and visual style.
“Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world,” OpenAI researchers wrote.
Room for improvement
There is room for improvement, however. According to OpenAI, Sora may struggle to simulate the physics of a complex scene and may not understand specific instances of cause and effect. For instance, a video of a man taking a bite out of a cookie might later show the cookie without a bite mark.
OpenAI says it is building tools to help detect misleading content, including a detection classifier that can tell when a video was generated by Sora.
OpenAI also plans to include Coalition for Content Provenance and Authenticity (C2PA) metadata should Sora be deployed in a product. C2PA is an open technical standard for certifying the source and provenance of media content.
C2PA metadata increases the size of the resulting media file slightly and is already implemented in OpenAI’s DALL-E 3 text-to-image model. Users can visit sites like Content Credentials Verify to check the origins of an image or video.
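The core idea behind such provenance metadata can be illustrated with a toy example. Real C2PA embeds signed manifests, backed by X.509 certificates, into the media file itself; the snippet below is a heavily simplified stand-in that only shows the principle of binding a signed claim about an asset's origin to a hash of its content. The key, claim fields, and HMAC scheme are all illustrative assumptions, not the C2PA format.

```python
import hashlib
import hmac
import json

# Illustrative only: a real provenance system uses certificate-backed
# signing keys, not a shared secret baked into the code.
SIGNING_KEY = b"demo-key"

def make_claim(media_bytes, generator):
    """Create a provenance claim tied to the media's content hash,
    plus a signature over the claim."""
    claim = {
        "generator": generator,
        "content_sha256": hashlib.sha256(media_bytes).hexdigest(),
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return claim, signature

def verify_claim(media_bytes, claim, signature):
    """Recompute the signature and content hash; any edit to the media
    or the claim breaks verification."""
    payload = json.dumps(claim, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(signature, expected)
            and claim["content_sha256"] == hashlib.sha256(media_bytes).hexdigest())

media = b"...video bytes..."
claim, sig = make_claim(media, "Sora")
print(verify_claim(media, claim, sig))         # True
print(verify_claim(media + b"x", claim, sig))  # False: media was altered
```

This also shows why the metadata adds a little to the file size, as the article notes: the claim and signature travel with the asset.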
In closed testing
Sora can also generate still images, and OpenAI believes it exhibits capabilities suited to simulating digital worlds, too.
“[The capabilities] suggest that continued scaling of video models is a promising path towards the development of highly-capable simulators of the physical and digital world, and the objects, animals and people that live within them,” wrote the OpenAI team in its technical report.
While OpenAI posted dozens of high-resolution videos it says were generated by Sora, the text-to-video AI model is currently available only to researchers assessing “critical areas for harms or risks” and a select group from the video industry.
“We are also granting access to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals,” the company added.
You can read more about Sora on OpenAI’s website.
Image credit: OpenAI (Still shot of video)
Paul Mah
Paul Mah is the editor of DSAITrends, where he reports on the latest developments in data science and AI. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose.