OpenAI's Sora joins text-to-video AI content generation race

OpenAI today announced Sora, a new text-to-video model that can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.

Text-to-video is arguably the next big thing in artificial intelligence, and OpenAI isn't the first to the party. Meta Platforms Inc., Google LLC and Runway AI Inc., among others, offer similar services. The challenge for all of them has been quality: Though the videos some existing services make are highly impressive, the Holy Grail is realistic video, and not all get that close.

Sora is a diffusion model, a generative machine learning model that creates data such as images or videos by gradually refining random noise into structured patterns based on learned data distributions. Sora can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model also understands not only what the user has asked for in the prompt but also how those things exist in the physical world.
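To make the "refining random noise into structured patterns" idea concrete, here is a minimal, hypothetical sketch of the diffusion mechanics on a single scalar "data point." It is not Sora's actual implementation: a real video model learns a noise-prediction network from data, whereas this toy uses an oracle `predict_noise` that knows the answer, purely to show the forward-noising and step-by-step denoising loop.

```python
import math
import random

random.seed(0)

# Noise schedule: betas grow linearly, alpha_bars shrink toward zero,
# so by step T the sample is almost pure Gaussian noise.
T = 50
betas = [1e-4 + (0.2 - 1e-4) * t / (T - 1) for t in range(T)]
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bars.append(prod)

x0 = 1.5  # the clean "data" (a stand-in for an image or video frame)

# Forward process: mix the data with Gaussian noise in one closed-form step.
eps = random.gauss(0.0, 1.0)
x = math.sqrt(alpha_bars[-1]) * x0 + math.sqrt(1 - alpha_bars[-1]) * eps

def predict_noise(x_t, t):
    """Oracle noise predictor; in a real model this is a trained network."""
    return (x_t - math.sqrt(alpha_bars[t]) * x0) / math.sqrt(1 - alpha_bars[t])

# Reverse process: gradually refine the noise back into structure,
# estimating the clean sample and re-noising to the previous step.
for t in reversed(range(T)):
    eps_hat = predict_noise(x, t)
    x0_hat = (x - math.sqrt(1 - alpha_bars[t]) * eps_hat) / math.sqrt(alpha_bars[t])
    if t > 0:  # deterministic (DDIM-style) step toward t - 1
        x = math.sqrt(alpha_bars[t - 1]) * x0_hat \
            + math.sqrt(1 - alpha_bars[t - 1]) * eps_hat
    else:
        x = x0_hat

print(abs(x - x0) < 1e-9)  # True: the oracle recovers the clean sample
```

With a learned (imperfect) noise predictor instead of the oracle, the same loop produces novel samples rather than an exact reconstruction; that learned refinement is what diffusion models like Sora scale up to video frames.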

According to OpenAI, the model has a deep understanding of language, enabling it to interpret prompts accurately and generate “compelling characters that express vibrant emotions.” The service can also create multiple shots within a single generated video that accurately portray characters and visual style.

To its credit, OpenAI has been open about the model's flaws as well. Sora, at least as it stands in testing, has weaknesses, including difficulty accurately simulating the physics of complex scenes and understanding specific instances of cause and effect. The model may also confuse spatial details of a prompt, for example mixing up left and right, and may struggle with precise descriptions of events that take place over time, such as following a specific camera trajectory.

Those flaws are an issue, but the model is young and some of the first demonstrations are stunning.

One demonstration video was generated from the prompt: "A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about."

Although Sora looks great, ChatGPT users will have to wait to get their hands on it. For now, Sora is being released only to "red teamers," who will assess critical areas for harms and risks. OpenAI is also granting access to a number of visual artists, designers and filmmakers to gather feedback on how to advance the model to be most helpful for creative professionals.

“We’re sharing our research progress early to start working with and getting feedback from people outside of OpenAI and to give the public a sense of what AI capabilities are on the horizon,” OpenAI said.

Image: OpenAI


