While the whole world waits for the launch of SORA AI, another AI model has taken the world by storm with its impressive capabilities. It is reportedly even open for access, and many developers say it produces far better videos than SORA.
Highlights:
- Chinese video platform company Kuaishou announces Kling, a powerful text-to-video generative AI model.
- Built upon the diffusion architecture with powerful 3D VAE technology.
- Can generate videos up to 2 minutes long, with realistic body movements and support for different aspect ratios.
Kling AI: SORA’s Chinese Competitor
On 6th June 2024, the Chinese video platform company Kuaishou announced its latest text-to-video model, Kling AI.
Kling was built by the Kuaishou Big Model Team. Its strong video creation features let users produce artistic videos quickly and simply, and it can generate videos of up to 2 minutes!
Kling distinguishes itself by accurately reproducing real-world physics while producing two-minute videos in 1080p resolution at 30 frames per second. Many text-to-video models already exist, but it is the physical simulation that catches the eye.
Simulating dynamic real-world physics is not an easy task for today’s generative models. SORA AI showed how well a model can be trained to replicate these mechanics, and now Kling AI is doing the same.
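For a sense of scale, the figures quoted above (two minutes, 1080p, 30 frames per second) can be turned into frame and pixel counts. This is only back-of-the-envelope arithmetic; the 1920×1080 frame size is an assumption (1080p in a 16:9 frame), and the function name is made up for illustration.

```python
# Back-of-the-envelope figures for a clip at the quoted maximum specs.
DURATION_S = 2 * 60          # two minutes, as quoted in the article
FPS = 30                     # frames per second, as quoted in the article
WIDTH, HEIGHT = 1920, 1080   # assumption: 1080p in a 16:9 frame


def clip_stats(duration_s, fps, width, height):
    """Return (frame count, pixels per frame, total pixels) for a clip."""
    frames = duration_s * fps
    pixels_per_frame = width * height
    return frames, pixels_per_frame, frames * pixels_per_frame


frames, per_frame, total = clip_stats(DURATION_S, FPS, WIDTH, HEIGHT)
print(f"{frames} frames, {per_frame:,} pixels each, {total:,} pixels in total")
```

That is 3,600 frames that all have to stay consistent with each other, which is why long clips are hard for a generative model.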
Here’s a video generated by Kling where you can see a Chinese man sitting at a table and eating noodles with chopsticks:
Sora by OpenAI is insane.
But KWAI just dropped a Sora-like model called KLING, and people are going crazy over it.
Here are 10 wild examples you don't want to miss:
1. A Chinese man sits at a table and eats noodles with chopsticks pic.twitter.com/MIV5IP3fyQ
— Angry Tom (@AngryTomtweets) June 6, 2024
This doesn’t feel generated at all to me! Look at the movements and expressions; it’s as if the man was recorded! Kling AI is doing wonders.
Notably, this is not China’s first attempt at a video generation model. Vidu AI, the nation’s first Sora-like model, created a stir earlier this year when it produced 16-second clips in crystal-clear 1080p.
China’s AI revolution is gaining momentum, with Kling at the forefront, and rivals are finding it difficult to keep up with this quickly changing environment. It will be interesting to see how the competition between Kling and SORA AI unfolds.
How To Access Kling AI?
Despite rumours of the model being up for open access, there is no public information as to how you can access Kling’s Video Generating Model. It’s reportedly available for invited beta testers via the Kwaiying (KwaiCut) app as a demo, with possible free access to the model coming in the near future.
Here’s what you can do instead:
- Download the Kwaiying (KwaiCut) mobile app on the Play Store or App Store.
- The app’s interface is in Chinese, so be ready to use a translator.
- Look for the Kling AI video creation tool in the app. If you can already use it, you are all set. If not, select “Beta Testing Access” from your profile options.
This requests access to Kling through the mobile app. You can also request access by sending an email to this ID: [email protected]. You must include your profile information and a brief explanation of why you want to test the model as a beta tester.
Kling’s Model Architecture
Kling’s model architecture is intricate, but the core idea is simple. Kling uses a Diffusion Transformer architecture to turn rich textual prompts into vivid scenes and immersive visual experiences.
So here we go again: this is another generative AI model built upon the diffusion architecture. SORA AI was also built upon a diffusion model, and Stable Video 3D was built upon Stable Video Diffusion, which is also a diffusion model.
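To make the term concrete, here is a minimal sketch of the noising step that diffusion models are trained around. It is a generic textbook formulation, not Kling’s or SORA’s actual code: the linear noise schedule, the toy tensor shape and all function names are assumptions for illustration, and the true noise stands in for what a trained network would have to predict.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear noise schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0, t, alpha_bar, rng):
    """Forward process: blend the clean sample with Gaussian noise at step t."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


def recover_x0(xt, eps_pred, t, alpha_bar):
    """Invert the forward step, given a prediction of the noise that was added."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_bar[t])


rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.standard_normal((4, 8, 8))          # toy "video": 4 frames of 8x8 values
xt, eps = add_noise(x0, t=500, alpha_bar=alpha_bar, rng=rng)
# A real model would predict eps from xt and the prompt; here the true noise is used.
x0_hat = recover_x0(xt, eps, t=500, alpha_bar=alpha_bar)
print(np.allclose(x0, x0_hat))
```

In a real system the noise prediction comes from a large network (a transformer, in the case of a Diffusion Transformer) conditioned on the text prompt, and generation runs this denoising over many steps starting from pure noise.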
Starting from a single full-body shot, KLING’s 3D face and body reconstruction technology can drive full facial expressions and limb movement. The model also uses a proprietary 3D VAE and variable-resolution training to support different aspect ratios.
This means the model can adapt to different aspect ratios and video qualities. This flexibility in resolution allows videos to be generated in different styles and environments, including grand scenes.
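Kuaishou has not published how its variable-resolution training works. One common way to handle several aspect ratios under a fixed compute budget is to pick, for each ratio, a frame size with roughly the same number of pixels. The sketch below illustrates that idea only; the function name, the pixel budget and the multiple-of-8 constraint are all assumptions.

```python
import math


def bucket_resolution(aspect_w, aspect_h, pixel_budget=1920 * 1080, multiple=8):
    """Pick a (width, height) close to the pixel budget for a given aspect ratio.

    Both sides are rounded to the nearest multiple of `multiple`, a typical
    constraint for models that downsample by fixed factors (assumed here).
    """
    ratio = aspect_w / aspect_h
    height = math.sqrt(pixel_budget / ratio)
    width = height * ratio

    def snap(value):
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


for name, (aw, ah) in {"16:9": (16, 9), "1:1": (1, 1), "9:16": (9, 16)}.items():
    print(name, bucket_resolution(aw, ah))
```

With these assumed settings, landscape, square and portrait outputs all cost about the same number of pixels per frame.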
Kling’s Mind-Blowing Video Outputs
Here are some of the impressive features of Kling’s Video Generating AI Model. Let’s look into them:
1) Lifelike Expressions
Kling’s technology accurately mimics lifelike expressions and body movements. This makes subjects look more realistic, as if they were captured from real life.
This is all thanks to the 3D VAE and variable-resolution methodology, which let the model bring almost any type of object to life in any environment.
Look at this video from Kling, where a boy closes his eyes to savour his hamburger. The moment is surreal, and what’s even more impressive is how well Kling captures the facial movements and conveys the emotion.
7. A Chinese boy wearing glasses enjoys a delicious cheeseburger with his eyes closed in a fast food restaurant pic.twitter.com/2x8SirLpFY
— Rowan Cheung (@rowancheung) June 6, 2024
2) Bodily Movements
Kling drives both facial expressions and limbs using self-developed 3D face and body reconstruction technology, along with background stability and redirection modules.
All it takes to try the lively “singing and dancing” feature is a full-body shot. Kling attaches template actions to your input image and brings its subject to life for a particular scene.
Take a look at this video created by Kling, in which a panda calmly plays the guitar. It looks almost human as it plays. Can you tell the difference? I can’t.
7. Panda playing the guitar pic.twitter.com/6KwWrUdpwI
— Angry Tom (@AngryTomtweets) June 6, 2024
So you can upload a full-body image of your favourite subject, back it up with a prompt, and watch it sing and dance in style!
3) Strong Concept Combination Ability
Based on a deep understanding of text-video semantics and the powerful capabilities of the Diffusion Transformer architecture, Kling is able to transform users’ rich imaginations into attractive videos.
You can think of unrealistic situations and have Kling carve them out for you, out of nowhere!
See this video below generated by Kling’s AI model, a very unrealistic situation of a cat driving down the streets of a busy city. This video looks so real!
5. A white cat driving in a car through a busy downtown street with tall buildings and pedestrians in the background pic.twitter.com/HvRgJ2PYWK
— Rowan Cheung (@rowancheung) June 6, 2024
4) Large-Scale Realistic Motion
Kling adopts a 3D spatiotemporal joint attention mechanism, which can better model complex spatiotemporal motion and generate video content with larger movements while still conforming to the physical laws of the real world.
You can ask for whatever kind of motion you want, and Kling will provide it with ease while making it look real and natural. The body movements can be of any range and size, depending on your needs.
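Kling’s attention implementation is not public, but the general idea behind joint spatiotemporal attention can be sketched: instead of attending within each frame and then across frames in separate passes, all positions of all frames are flattened into one sequence, so a token in one frame can attend directly to any position in any other frame. The single-head NumPy sketch below is a generic illustration under that assumption; the shapes, weight matrices and function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_spacetime_attention(video_tokens, w_q, w_k, w_v):
    """Single-head self-attention over all (frame, row, column) positions at once.

    video_tokens has shape (T, H, W, C). Returns the attended tokens with the
    same shape, plus the attention weights of shape (T*H*W, T*H*W).
    """
    t, h, w, c = video_tokens.shape
    seq = video_tokens.reshape(t * h * w, c)   # flatten time and space into one sequence
    q, k, v = seq @ w_q, seq @ w_k, seq @ w_v
    weights = softmax(q @ k.T / np.sqrt(c))    # every token attends to every other token
    out = weights @ v
    return out.reshape(t, h, w, c), weights


rng = np.random.default_rng(0)
T, H, W, C = 4, 3, 3, 8                        # toy sizes, chosen only for illustration
tokens = rng.standard_normal((T, H, W, C))
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
out, weights = joint_spacetime_attention(tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

The cost of this design is that the attention matrix grows with the square of T×H×W, which is one reason video models typically run attention on a compressed latent rather than on raw pixels.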
Here’s a video of a man riding his horse into the sunset. This video was generated by Kling, and the motion looks remarkably good. It’s almost as if it came from a movie shot on location.
5. A man riding a horse through the Gobi Desert with a beautiful sunset behind him, movie quality. pic.twitter.com/PAerK5ShCT
— Angry Tom (@AngryTomtweets) June 6, 2024
5) 2 Minute Videos
Lastly, here comes perhaps the most impressive feature: you can generate videos of up to 2 minutes. That is very long for a generative AI model. Even SORA’s videos run only up to a minute.
Take a look at this video of a boy cycling generated by Kling. We have to give Kling credit for producing different scenes and environments for the whole duration of the boy cycling. It almost feels like the AI model won’t run out of ideas as to how it can extend the video scenes.
6. Little boy riding his bike in the garden through the changing seasons of fall, winter, spring and summer. pic.twitter.com/LY8Wfvs3Po
— Angry Tom (@AngryTomtweets) June 6, 2024
Conclusion
Kling’s 3D VAE and aspect ratio capabilities put it in high demand. Although SORA may be launched by the end of this year, developers are starting to feel that OpenAI is falling behind following the release of Vidu and now Kling. Kling looks set to take the video generation industry to the next level!