Google VideoPoet: An AI Tool That Crafts Videos from Text Input

By Mukund Kapoor | 2 Min Read

VideoPoet excels at tasks such as text-to-video, image-to-video, and video-to-audio conversion.

In Short
  • VideoPoet, Google's new AI tool, creates videos from text inputs.
  • In evaluations, VideoPoet was preferred over competing models for text fidelity and for producing interesting motion.

December 23, 2023: Google software engineers Dan Kondratyuk and David Ross have introduced VideoPoet, a new tool for AI video generation.


This new tool, based on a large language model (LLM), can perform a range of video generation tasks, including text-to-video, image-to-video, video stylization, and even video-to-audio conversions.

VideoPoet stands out by integrating multiple video generation capabilities into a single LLM, whereas many other models rely on separate components for each task.

This integration is meant to produce more seamless and coherent videos, especially those involving large motions, which current models have struggled to generate.

One of the key features of VideoPoet is its ability to animate still images and edit videos for tasks like inpainting, outpainting, and stylization.

For example, it can take a static image of a ship at sea and animate it to show the ship navigating through a thunderstorm. Text prompts can further guide the motion and style of the generated videos.

[Image: VideoPoet example videos]

How the model handles inputs and outputs across different tasks, during both training and inference, is particularly interesting.

VideoPoet uses multiple tokenizers (MAGVIT V2 for video and images, SoundStream for audio) to convert each modality into discrete tokens and back again.

The language model then generates new tokens conditioned on this context, and the tokenizers convert those tokens back into video or audio that people can watch or hear.
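To make the token idea more concrete, here is a toy Python sketch of one common way a single language model can work with several modalities: each tokenizer's codes are given their own range in one shared vocabulary, so text, visual, and audio tokens can sit in a single sequence and later be routed back to the right decoder. This is an illustration under stated assumptions, not VideoPoet's code. The vocabulary sizes, the offset scheme, and all function names are hypothetical, and no real MAGVIT V2 or SoundStream tokenizer is used.

```python
from typing import Dict, List, Tuple

# Hypothetical per-modality vocabulary sizes (illustrative only; these are
# NOT the real MAGVIT V2 or SoundStream codebook sizes).
VOCAB_SIZES: Dict[str, int] = {"text": 100, "visual": 256, "audio": 64}


def build_offsets(sizes: Dict[str, int]) -> Dict[str, int]:
    """Give each modality its own contiguous range of IDs in one shared vocabulary."""
    offsets: Dict[str, int] = {}
    start = 0
    for name, size in sizes.items():
        offsets[name] = start
        start += size
    return offsets


OFFSETS = build_offsets(VOCAB_SIZES)


def to_shared(modality: str, codes: List[int]) -> List[int]:
    """Map a (hypothetical) tokenizer's local codes into the shared vocabulary."""
    size = VOCAB_SIZES[modality]
    for code in codes:
        if not 0 <= code < size:
            raise ValueError(f"{modality} code {code} out of range")
    return [OFFSETS[modality] + code for code in codes]


def from_shared(token: int) -> Tuple[str, int]:
    """Recover (modality, local code) from a shared-vocabulary token ID."""
    for name, size in VOCAB_SIZES.items():
        start = OFFSETS[name]
        if start <= token < start + size:
            return name, token - start
    raise ValueError(f"token {token} outside shared vocabulary")


def split_by_modality(tokens: List[int]) -> Dict[str, List[int]]:
    """Route a mixed token sequence back to per-modality code lists for decoding."""
    out: Dict[str, List[int]] = {name: [] for name in VOCAB_SIZES}
    for token in tokens:
        name, code = from_shared(token)
        out[name].append(code)
    return out


if __name__ == "__main__":
    # A conditioning context (text prompt + visual tokens) followed by one
    # "generated" audio token; a real model would predict that token.
    sequence = (
        to_shared("text", [5, 17])
        + to_shared("visual", [3, 200])
        + to_shared("audio", [9])
    )
    print(sequence)
    print(split_by_modality(sequence))
```

Keeping each modality in its own ID range means any token the model emits can be sent to the matching decoder without extra bookkeeping; whether VideoPoet lays out its vocabulary exactly this way is an assumption here.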

VideoPoet has also shown promise in generating longer videos, extending a clip over several iterations while keeping the appearance of objects consistent. Additionally, the model can interactively edit existing video clips, allowing users to change the motion of objects within a video.

VideoPoet also performed well in evaluations. For both text fidelity and motion interestingness, it was preferred over competing models, indicating that it follows prompts accurately and produces interesting motion.

More examples of VideoPoet’s capabilities are available on the project’s demo website.


About the author: Mukund Kapoor
Mukund Kapoor, the enthusiastic author and creator of GreatAIPrompts, is driven by his passion for all things AI. With a special knack for simplifying complex AI concepts, he is committed to helping readers of all levels, from beginners to experts, navigate the world of artificial intelligence. Through GreatAIPrompts, Mukund aims to give readers access to the most recent and relevant AI news, tools, and insights. His dedication to quality, accuracy, and clarity is what sets his blog apart, making it a reliable source for anyone interested in unlocking the potential of AI.