At the Google I/O 2024 developer conference, Google introduced Veo, its generative text-to-video competitor to OpenAI’s Sora. Let’s find out more about it!
Highlights
- Veo is Google DeepMind’s text-to-video generative model that can create high-quality, 1080p videos over 60 seconds long.
- It can seamlessly blend text prompts with reference images to generate videos that follow both inputs.
- It supports video editing by incorporating text instructions, including masked editing for specific areas.
Introducing Veo by Google
Veo is Google DeepMind’s text-to-video generative model, setting a new benchmark in video generation. It can produce high-quality, 1080p videos lasting over a minute, across a diverse range of cinematic and visual styles.
Along with video creation, it can also edit existing videos using text-based instructions, modifying footage to match the user’s needs.
Veo’s versatility extends to generating videos from both images and text prompts. Given a reference image alongside a text prompt, it blends the visual style of the image with the instructions in the prompt, producing a video grounded in both inputs.
To enhance Veo’s ability to comprehend and follow prompts precisely, Google DeepMind enriched the training data with more detailed video captions. Additionally, the model operates on high-quality, compressed video representations known as latents, which boost efficiency. Together, these measures improve overall video quality and reduce generation time.
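To get an intuition for why working on compressed latents matters, the back-of-the-envelope sketch below compares the number of values in a raw 1080p pixel tensor with a hypothetical latent tensor. The downsampling factors and channel count are illustrative assumptions; Google has not published Veo’s actual autoencoder configuration.

```python
# Back-of-the-envelope comparison of raw pixels vs. a compressed latent.
# The 4x temporal / 8x spatial downsampling and 16 latent channels are
# illustrative assumptions, not Veo's published architecture.

frames, height, width, channels = 150, 1080, 1920, 3   # ~5 s of 30 fps 1080p video
pixel_values = frames * height * width * channels

latent_values = (frames // 4) * (height // 8) * (width // 8) * 16

print(f"pixel tensor:  {pixel_values:,} values")
print(f"latent tensor: {latent_values:,} values")
print(f"~{pixel_values / latent_values:.0f}x fewer values for the model to process")
```

Under these assumed factors, the model works with roughly 49 times fewer values than it would on raw pixels, which is where the quality-per-compute and generation-time gains come from.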
Versatile Features of Veo
Veo uses advanced natural language processing and visual semantics to accurately capture the details and tones specified in text prompts, rendering intricate details within complex scenes. It offers creative control, comprehending prompts for various cinematic effects, such as time-lapses, close-ups, or aerial shots of landscapes.
Time-lapse:
✍️ Prompt: “Timelapse of a water lily opening, dark background.” pic.twitter.com/t5uLQ89E1Y
— Google DeepMind (@GoogleDeepMind) May 14, 2024
Close-up:
✍️ Prompt: “Extreme close-up of chicken and green pepper kebabs grilling on a barbeque with flames. Shallow focus and light smoke. vivid colours.” pic.twitter.com/LDHC8XGyJA
— Google DeepMind (@GoogleDeepMind) May 14, 2024
Veo’s cutting-edge technology extends beyond generating videos from scratch. It can seamlessly edit existing videos by incorporating text-based instructions, including adding or modifying specific elements within a scene.
Additionally, it supports masked editing, enabling targeted changes within designated areas of a video. The examples below show how a video can be edited to match specific requirements; a conceptual sketch of the workflow follows the examples.
Initial:
Prompt: “Drone shot along the Hawaii jungle coastline, sunny day” pic.twitter.com/2yU7h8BSSL
— Jv Shah (@JvShah124) May 15, 2024
New:
Prompt: “Drone shot along the Hawaii jungle coastline, sunny day. Kayaks in the water” pic.twitter.com/xyymAC0aXI
— Allen T (@Mr_AllenT) May 14, 2024
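The workflow behind the examples above can be pictured as a small masked-edit routine: a boolean mask selects the region to regenerate, the prompt drives the new content, and everything outside the mask is left untouched. The NumPy sketch below is conceptual only; the masked_edit and generate_region names are hypothetical stand-ins, since Veo exposes no public API.

```python
import numpy as np

# Conceptual sketch of masked editing: only the region selected by a boolean
# mask is regenerated from the text prompt, while everything else is kept.
# generate_region is a placeholder for the model, not a real Veo call.

def generate_region(shape, prompt: str) -> np.ndarray:
    """Placeholder for model output (random values here)."""
    rng = np.random.default_rng(0)
    return rng.random(shape).astype(np.float32)

def masked_edit(video: np.ndarray, mask: np.ndarray, prompt: str) -> np.ndarray:
    """Replace only the masked pixels with newly generated content."""
    edited = video.copy()
    new_content = generate_region(video.shape, prompt)
    edited[mask] = new_content[mask]          # untouched pixels stay identical
    return edited

# Tiny example: a 4-frame, 64x64 RGB clip; edit only the lower-right quadrant.
video = np.zeros((4, 64, 64, 3), dtype=np.float32)
mask = np.zeros(video.shape, dtype=bool)
mask[:, 32:, 32:, :] = True

result = masked_edit(video, mask, "Kayaks in the water")
assert np.array_equal(result[~mask], video[~mask])   # unmasked area unchanged
```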
Veo’s advanced latent diffusion transformers address the problem of visual consistency and fluidity throughout the generated videos, preventing flickering, jumping, or morphing of characters, objects, and styles between frames, thereby enhancing the overall viewing experience.
It can generate video clips exceeding 60 seconds, either from a single prompt or by stitching together a sequence of prompts that collectively narrate a story.
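One plausible way to picture multi-prompt stitching is the loop below: each prompt produces a clip, the final frame of one clip is carried forward as a reference for the next, and the clips are concatenated. The generate_clip function and the carry-forward mechanism are hypothetical placeholders; Google has not documented how Veo chains prompts internally.

```python
from typing import List, Optional
import numpy as np

def generate_clip(prompt: str, seconds: int, fps: int = 24,
                  reference_frame: Optional[np.ndarray] = None) -> np.ndarray:
    """Placeholder for the model: returns a (frames, H, W, 3) array of random values."""
    rng = np.random.default_rng(abs(hash(prompt)) % 2**32)
    return rng.random((seconds * fps, 64, 64, 3)).astype(np.float32)

def generate_story(prompts: List[str], seconds_per_clip: int = 20) -> np.ndarray:
    """Chain prompts into one longer clip, carrying the last frame forward as context."""
    clips, last_frame = [], None
    for prompt in prompts:
        clip = generate_clip(prompt, seconds_per_clip, reference_frame=last_frame)
        last_frame = clip[-1]                 # context for the next segment
        clips.append(clip)
    return np.concatenate(clips, axis=0)      # one continuous sequence of frames

story = generate_story([
    "Drone shot along the Hawaii jungle coastline, sunny day",
    "The drone descends toward kayaks gliding through turquoise water",
    "Close-up of paddles splashing as the sun sets over the ocean",
])
print(story.shape)   # 3 clips x 20 s x 24 fps = 1440 frames
```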
With these capabilities, Veo aims to democratize video production, empowering seasoned filmmakers, aspiring creators, and educators alike to tell their stories and share knowledge through captivating visuals.
The tweet below shows how filmmakers can use Veo to bring to life ideas that would otherwise be impractical to produce.
We put our cutting-edge video generation model Veo in the hands of filmmaker @DonaldGlover and his creative studio, Gilga. Let’s take a look. ↓ #GoogleIO pic.twitter.com/oNLDq1YlHC
— Google DeepMind (@GoogleDeepMind) May 14, 2024
How to access Veo?
Like OpenAI’s Sora, Google’s Veo is not yet available to the public. Currently, it is being shared with a select group of creators in a private preview inside VideoFX, Google’s new experimental tool. Interested users can join a waitlist to try out Veo’s capabilities. Click here to apply for access to Veo.
Once you click the sign-up button, you will be redirected to Google Labs, where you can join the waitlist to try some of Veo’s features in VideoFX.
Once you click Sign in with Google, you will be redirected to log in to Google Labs.
After this, you can join the waitlist by filling out the Google Form. For now, Veo is only available in a few countries; you can search for your country in the dropdown provided to check whether access is offered there.
Conclusion
With its impressive capabilities, Veo is shaping up to be a strong contender to OpenAI’s groundbreaking text-to-video model Sora. It aims to empower creators and educators, and its potential to democratize video production is eagerly anticipated.