Everyone is talking about Sora’s ability to generate photorealistic videos from text prompts and how it will change the future. However, one of its features isn’t getting much attention: beyond text, you can also use videos and images as prompts to create AI videos. Yes, you read that right. So, let’s learn how it works!
Sora allows Images and Videos as Prompts
When Sora was first unveiled, it was clear it could have a ChatGPT-like effect on various industries. Just describe the video you want and get a clip showcasing your idea almost instantly. That alone is impressive, but there is more.
Along with text prompts, Sora can also generate videos from images and videos as input. Users will be able to use pre-existing images and videos on their local system to generate AI-based videos.
The official technical report says Sora can also perform various image and video editing tasks, such as creating perfectly looping videos, animating static images, and extending videos forward or backward in time. Here’s what else this cutting-edge technology can do.
Static Image Animation
As mentioned before, Sora can take an image as input to generate a video. For example, you can first generate an image with OpenAI’s DALL·E and then add a text prompt describing how you want the image to be animated.
Here is an example (via OpenAI):
Extending Generated Videos
Did you think Sora is limited to generating videos from scratch? Well, you are wrong. Sora can also extend videos that have already been generated.
Users can have their videos extended backward or forward in time. In OpenAI’s demos, several clips extended backward all begin differently yet arrive at the same ending. Applying the same technique in both directions can also produce a seamless, infinitely looping video.
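Conceptually, backward extension amounts to generating new frames conditioned on a clip’s opening frames and prepending them. Here is a minimal, hypothetical sketch of that idea; the names `extend_backward` and `generate_frames` are illustrative stand-ins, since Sora’s actual sampler is not public:

```python
import numpy as np

def extend_backward(clip, generate_frames, n_new=8):
    """Prepend n_new generated frames to a clip of shape (T, H, W, C).

    generate_frames is a stand-in for a video diffusion sampler that
    conditions on existing frames (hypothetical; not a real Sora API).
    """
    # Condition on the opening frames so the new segment flows into them.
    context = clip[:4]
    new_frames = generate_frames(n_new, context)
    # The extended video ends exactly where the original clip ends.
    return np.concatenate([new_frames, clip], axis=0)

# Toy "sampler" (returns blank frames) purely for shape illustration.
toy_sampler = lambda n, ctx: np.zeros((n,) + ctx.shape[1:])

clip = np.ones((16, 8, 8, 3))          # a 16-frame stand-in clip
longer = extend_backward(clip, toy_sampler)
print(longer.shape)                     # (24, 8, 8, 3)
```

Because every extended version shares the original ending, running this on the same clip with different samples yields videos that begin differently but converge on the same final frames.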
Editing and Connecting Input Videos
OpenAI has developed Sora as a diffusion model, which allows it to edit images and videos based on input prompts. For this, it uses SDEdit, a technique for guided image synthesis and editing with stochastic differential equations.
With this approach, you can transform the environments and styles of input videos zero-shot, meaning Sora can perform editing tasks it was never shown during training, without any task-specific examples.
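The core SDEdit idea can be sketched in a few lines: instead of denoising from pure noise, start from a partially noised version of the input, so the output keeps the input’s coarse structure while the model re-renders style and detail. This is a simplified, hypothetical illustration; `denoise_step` stands in for a trained diffusion denoiser, which is not publicly available:

```python
import numpy as np

def sdedit_sketch(x, denoise_step, t_start=0.5, n_steps=10, rng=None):
    """Simplified SDEdit loop: partially noise x, then denoise.

    x: input image/video as a float array.
    denoise_step: stand-in for a learned denoiser (hypothetical).
    t_start: how much noise to add; higher = more creative freedom.
    """
    rng = rng or np.random.default_rng(0)
    # Partially noise the input up to an intermediate time t_start,
    # preserving its coarse structure under the noise.
    noised = np.sqrt(1 - t_start) * x + np.sqrt(t_start) * rng.standard_normal(x.shape)
    # Run the reverse (denoising) process from t_start back to 0.
    out = noised
    for t in np.linspace(t_start, 0.0, n_steps):
        out = denoise_step(out, t)
    return out

# Toy denoiser (simple shrinkage) purely so the sketch runs end to end;
# a real model would be a trained neural network.
toy_denoiser = lambda z, t: z * 0.9

frame = np.random.default_rng(1).standard_normal((64, 64, 3))
edited = sdedit_sketch(frame, toy_denoiser)
print(edited.shape)  # (64, 64, 3)
```

The choice of `t_start` controls the trade-off: a small value keeps the edit close to the original video, while a large value gives the model more freedom to change style and environment.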
Here is an example (via OpenAI):
One of Sora’s most striking features is the interpolation of multiple videos: it can seamlessly connect two entirely different clips, smoothly transitioning between subjects and scene compositions.
Image Generation
Last but not least, Sora can also generate still images. This capability came into the limelight shortly after Google’s Gemini received its image-generation update, and OpenAI keeps the competition alive by bringing the feature to Sora.
Sora can create images by arranging patches of Gaussian noise in a spatial grid with a temporal extent of one frame, which the diffusion model then denoises. The output images can be up to 2048×2048 resolution.
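As a rough illustration of that layout, the sketch below builds a spatial grid of Gaussian-noise patches with a temporal extent of one frame and reassembles it into a single 2048×2048 noise canvas. The patch size of 16 is an assumption for illustration; the real model would denoise this canvas into an image:

```python
import numpy as np

patch = 16                       # assumed patch edge length (illustrative)
grid_h = grid_w = 2048 // patch  # 128×128 patches cover a 2048×2048 image

# Spacetime-patch layout: (time, grid_h, grid_w, patch, patch, channels).
# A temporal extent of 1 frame makes this effectively an image, not a video.
noise_patches = np.random.default_rng(0).standard_normal(
    (1, grid_h, grid_w, patch, patch, 3))

# Reassemble the patch grid into one contiguous noise canvas:
# interleave grid rows with patch rows, grid cols with patch cols.
canvas = noise_patches[0].transpose(0, 2, 1, 3, 4).reshape(
    grid_h * patch, grid_w * patch, 3)
print(canvas.shape)  # (2048, 2048, 3)
```

Generating a video instead of an image would simply mean using a temporal extent greater than one in the first dimension.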
This revolutionary tool was initially made available to a limited number of people, but Sora should eventually be accessible to everyone as part of a subscription plan.
Conclusion
With Sora’s technical innovations, the world of AI just keeps getting better. This is the first time we are witnessing generative AI in this format, so it deserves time to mature. Of course, there are ethical concerns associated with Sora. For now, what we know is that we can generate videos from almost any multimedia source. What’s next? Stay tuned for more updates!