On 15 February 2024, OpenAI made quite an announcement when it revealed Sora, its brand-new text-to-video AI model. Let’s explore this breakthrough in AI technology and ask: is this the next disruption after ChatGPT?
Highlights:
- OpenAI announces Sora, its new text-to-video AI model.
- It can generate complex video scenes with multiple characters, emulate real-world physics, and capture fine details of both the subject and the background.
- Sora is still in the testing phase and is being rolled out to a select number of experts.
Sora’s Cutting-Edge Text-to-Video Technology
OpenAI has launched its first text-to-video tool, Sora. It allows users to create realistic videos of up to 60 seconds from text prompts.
With the release of this new tool, Sam Altman’s OpenAI is laying the groundwork for creative artificial intelligence. Just as with ChatGPT, users provide written prompts (and, optionally, images) to produce remarkably lifelike AI videos.
Here is their official announcement on X, along with a sample prompt and the resulting video from the Sora video generator:
Introducing Sora, our text-to-video model.
— OpenAI (@OpenAI) February 15, 2024
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W
Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf
Sora takes its name from the Japanese word “sora,” meaning sky. Its ability to generate realistic videos of up to 60 seconds from user input specifying the topic and style of the video remains unmatched. Social media is abuzz over this product, which may have the same industry impact that ChatGPT had in 2022.
How Does It Work Behind-the-Scenes?
Sora is built on a transformer architecture and draws on techniques from the GPT and DALL·E 3 models. It can generate complex scenes involving multiple characters, their motion and behavior, and fine details of both the subject and the background. It creates a video by starting from what looks like static noise and progressively transforming it, removing the noise over many steps.
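This step-by-step denoising is the core idea behind diffusion models. The toy sketch below illustrates the loop in a few lines of NumPy; it is an illustration based on OpenAI’s description, not Sora’s actual code, and the `predict_noise` function is a hypothetical stand-in for the learned transformer denoiser.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(frames, step):
    """Hypothetical stand-in for the learned denoiser.

    A real model would predict the noise conditioned on the text prompt;
    here we simply return a scaled copy of the current frames so the
    loop is runnable end to end."""
    return frames * 0.1

def denoise(shape=(8, 16, 16), steps=50):
    """Generate a tiny 'video' (frames x height x width) by iterative denoising."""
    frames = rng.standard_normal(shape)  # start from pure static noise
    for step in range(steps):
        noise_estimate = predict_noise(frames, step)
        frames = frames - noise_estimate  # strip away a little noise each step
    return frames

video = denoise()
print(video.shape)  # (8, 16, 16)
```

In a real diffusion model the denoiser is a neural network trained to reverse a known noising process, and the prompt steers each denoising step toward scenes that match the text.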
According to the official announcement, it can produce multiple shots within a single generated video that faithfully maintain the visual style and characters. OpenAI also notes that Sora’s deep language understanding lets it interpret prompts reliably and produce compelling characters that vividly express emotion.
Beyond text prompts, Sora can take an existing still image and animate its contents with fine-grained accuracy. The model can also extend an existing video or fill in missing frames.
How to Access Sora AI?
Sora is not yet available to the general public. For now, it is being rolled out only to red teamers, a community of experts, along with select visual artists, filmmakers, and designers. By putting the model in these testers’ hands first, OpenAI aims to gather feedback on Sora’s abilities before releasing it to a much wider audience.
How Safe is the Model? Is it Ethical?
Sora does raise some ethical concerns. Fans and critics of artificial intelligence alike were quick to discuss the possibilities of this latest technology, though some worry that its accessibility could threaten human jobs and accelerate the spread of false information online.
With the rise of AI-generated fraudulent and explicit content, which can seriously harm public figures and companies around the globe, OpenAI has taken several important steps to put safety at the center of Sora’s rollout.
The red teamers given access to the video generator are experts in areas such as hate speech and disinformation. They will adversarially test the model before it is made available to all users.
Along with that, OpenAI is reusing the safety methods developed for its DALL·E models. It is also building tools to help identify misleading content, such as a detection classifier that can tell whether a video was generated by Sora, and plans to include C2PA provenance metadata if the model is deployed in an OpenAI product in the future.
Most importantly, OpenAI plans to engage policymakers, educators, and artists worldwide to understand their concerns and identify constructive applications for this new technology. This will also help the company keep pace with ever-evolving safety norms.
Are there any Limitations?
OpenAI states that Sora is still in an early phase of development and needs extensive testing before it reaches a wider audience.
The company admits that the current model has several weaknesses, one being inaccuracy in simulating the physics of a complicated scene. It may also fail to grasp specific instances of cause and effect.
The model can also mishandle fine-grained details of a prompt, such as spatial directions and precise descriptions of events that unfold over time.
“The current model has weaknesses. It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark. The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.” – OpenAI states some limitations to the model in their announcement post.
Conclusion
Sora is without a doubt a big step forward for AI technology. Text-to-video generation lays a new foundation for upcoming AI models and for the simulation of real-world physics. As the model is still in its testing phase, we will keep you updated on our blog about how it fares in the long run.