Google follows Meta in introducing text-to-video AI

Researchers at Google LLC’s AI lab, Google Brain, unveiled Imagen Video today, a program that can create high-quality videos from text, similar to what Meta Platforms Inc. introduced around a week ago.

“We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models,” said the researchers. “Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models.”

The generator produces 1280×768 HD video at 24 frames per second. It is still in development, but it is already a clear step up from Imagen, Google’s text-to-image generation model, which debuted earlier this year. With Imagen, you could ask for a still image of a spaceman riding a horse; now, it seems, you can have your astronaut and horse galloping through space.
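To make the "cascade" idea concrete, here is a minimal Python sketch that only tracks how a clip's size could grow as it passes through interleaved spatial and temporal super-resolution stages. It does not run any diffusion model. The base clip size, the number of stages, and the per-stage factors are illustrative assumptions chosen so that the final frame size matches the 1280×768 figure quoted above; the names `VideoShape` and `run_cascade` are hypothetical and not part of any Google release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoShape:
    """Size of a video clip: frame count and per-frame height/width in pixels."""
    frames: int
    height: int
    width: int


def temporal_super_resolution(shape: VideoShape, factor: int) -> VideoShape:
    """Stand-in for a temporal stage: multiplies the number of frames."""
    return VideoShape(shape.frames * factor, shape.height, shape.width)


def spatial_super_resolution(shape: VideoShape, factor: int) -> VideoShape:
    """Stand-in for a spatial stage: multiplies the height and width of each frame."""
    return VideoShape(shape.frames, shape.height * factor, shape.width * factor)


def run_cascade(base: VideoShape, stages: list[tuple[str, int]]) -> list[VideoShape]:
    """Apply the stages in order and return the clip size after every step."""
    shapes = [base]
    for kind, factor in stages:
        if kind == "temporal":
            shapes.append(temporal_super_resolution(shapes[-1], factor))
        elif kind == "spatial":
            shapes.append(spatial_super_resolution(shapes[-1], factor))
        else:
            raise ValueError(f"unknown stage kind: {kind!r}")
    return shapes


# Assumed (hypothetical) base output: a short, low-resolution clip.
BASE = VideoShape(frames=16, height=48 // 2, width=80 // 2)

# Assumed (hypothetical) interleaving of spatial and temporal stages.
STAGES = [
    ("spatial", 2),
    ("temporal", 2),
    ("spatial", 4),
    ("temporal", 2),
    ("temporal", 2),
    ("spatial", 4),
]

if __name__ == "__main__":
    for step, shape in enumerate(run_cascade(BASE, STAGES)):
        print(f"step {step}: {shape.frames} frames at {shape.width}x{shape.height}")
```

Under these assumed factors, the last step yields frames of 1280×768. The point of the design is that each stage has a narrow job, either adding frames or adding pixels, rather than one model generating full-size video in a single pass.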

To train the video generator, Google showed it a vast range of videos and still images, each labeled with text. When a text prompt is later entered, the generator synthesizes video based on what it learned from that data. The training data comprised 14 million videos and 60 million still images, as well as the 400 million images in the open LAION-400M dataset. Google showed some examples, such as a panda eating and a teddy bear doing various things.

Google acknowledged that video manipulation technology carries risks, such as the creation of what have come to be known as deepfakes. Such content is already a problem, and it may become a larger one for society as these systems advance.

“Video generative models can be used to positively impact society, for example, by amplifying and augmenting human creativity,” the company said. “However, these generative models may also be misused, for example, to generate fake, hateful, explicit or harmful content. We have taken multiple steps to minimize these concerns, for example, in internal trials, we apply input text prompt filtering, and output video content filtering.”

Image: Google
