OpenAI just announced ChatGPT’s new real-time conversational abilities! It can now understand both audio and video. With these advancements, it can tell how you are feeling from your facial expressions and adjust its tone of voice to match your current emotional state. Let’s walk through these new features!
6 New Audiovisual Advancements in ChatGPT
In its official Spring Update, streamed live on X, OpenAI demonstrated ChatGPT’s new voice and vision capabilities. Let’s take a look at a few examples!
1) Provide Real-Time Advice
ChatGPT can provide users with advice in real time and help them prepare for different situations, guiding them through each step as if it were having a natural conversation.
Mark Chen, a research lead at OpenAI, let ChatGPT know that he was doing a live demo and was feeling nervous. He asked it how he could calm his nerves. It told him to take a deep breath.
Live demo of GPT-4o realtime conversational speech pic.twitter.com/FON78LxAPL
— OpenAI (@OpenAI) May 13, 2024
Mark then took a quick, exaggerated breath, which ChatGPT picked up on, quipping that he was not a vacuum cleaner. This showed that ChatGPT can recognize when a user is doing something incorrectly.
It then explained how to take a proper deep breath and asked him if he felt better. Mark followed the instructions, felt much calmer, and thanked it.
This example shows how ChatGPT can help users in stressful situations by coaching them on how to approach things better, guiding them through tasks step by step.
2) Understand Emotions
ChatGPT is now capable of understanding emotions as well. Mark let it know that his fellow research lead Barret Zoph was having a hard time sleeping and asked it to tell him a bedtime story about robots and love.
Live demo of GPT-4o voice variation pic.twitter.com/b7lLJkhBt1
— OpenAI (@OpenAI) May 13, 2024
ChatGPT initially narrated the story in a flat, unexpressive manner, so they asked it to add more emotion and drama.
Barret then pushed it further, asking for the maximum level of expressiveness, and it delivered the story with heavy dramatic flair. CTO Mira Murati then asked it to narrate in a robotic voice, which it did successfully, and it was even able to finish the story in a singing voice.
This demonstrates ChatGPT’s new ability to modulate its emotional tone to fit the situation. For example, it could adopt a playful tone when speaking to a child or a more serious one when reading out a news article.
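While the live demo used spoken audio, a developer can approximate this kind of tone control in text through the API. Below is a minimal, illustrative sketch (not OpenAI’s demo code) that asks GPT-4o for the same bedtime story in different tones via the Chat Completions API; it assumes the official `openai` Python package and an `OPENAI_API_KEY` in the environment:

```python
# Minimal sketch: steering GPT-4o's narration tone via the system prompt.
# Text-only approximation of the voice demo; the prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

for tone in ["neutral", "maximally dramatic", "robotic"]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Narrate in a {tone} tone."},
            {"role": "user",
             "content": "Tell a very short bedtime story about robots and love."},
        ],
    )
    print(f"--- {tone} ---")
    print(response.choices[0].message.content)
```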
3) Prompt with Live Videos
You can also interact with ChatGPT using videos now. Barret asked ChatGPT to help him with a linear equation that he wrote down on paper. He asked it only for hints and not the final solution.
Live demo of GPT-4o vision pic.twitter.com/m7iyixdTLY
— OpenAI (@OpenAI) May 13, 2024
When Barret showed the handwritten equation to the camera, ChatGPT read it correctly. This highlighted the upgraded vision capabilities: the model could grasp handwritten content through the user’s real-time video feed.
It then guided Barret through the solution hint by hint. Barret even pretended to be confused to test its mathematical skills, but it kept steering him correctly.
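For a sense of what this hint-by-hint guidance looks like, here is an illustrative walkthrough of a simple linear equation of the kind used in the demo (the exact equation on Barret’s paper may have differed):

```latex
% Illustrative hint-by-hint solve; the equation is assumed, not
% necessarily the one written on paper in the demo.
\begin{align*}
3x + 1 &= 4          && \text{the equation as written} \\
3x     &= 4 - 1 = 3  && \text{hint 1: subtract 1 from both sides} \\
x      &= 3/3 = 1    && \text{hint 2: divide both sides by 3}
\end{align*}
```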
Mark then told ChatGPT that he was weak at linear equations and asked whether they are actually used in the real world. It responded with several real-world scenarios where linear equations come up, and both were happy with the accuracy of the answers.
This showed that ChatGPT could follow the “hints only” instruction, help the user work through the math problem, explain its real-world relevance, and stay on track even when the user tried to throw it off.
ChatGPT’s new vision capability should prove extremely useful for anyone who wants to chat with it over real-time video.
4) Real-Time Help for Developers
ChatGPT can help developers with coding problems in real time through the desktop app. It can hear the user, but it cannot see any code unless that code is highlighted; highlighting a snippet sends it to ChatGPT. Barret shared some code this way and asked for a description of it.
Live demo of coding assistance and desktop app pic.twitter.com/GlSPDLJYsZ
— OpenAI (@OpenAI) May 13, 2024
It then provided an accurate, concise description of the code and answered follow-up questions appropriately.
ChatGPT can also see what is on the screen in real time using its vision capabilities. Once the user clicks the button to share their desktop, it can view the current screen contents. Barret presented it with a plot, and it described the plot accurately and in simple terms.
This shows how ChatGPT can now help users with coding problems in real time. Code can be shared simply by highlighting it, or the whole screen can be shared with the click of a button, making it quicker and easier to get answers while coding.
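The demo relied on the desktop app, but the underlying idea, sending a snippet of code to GPT-4o and asking for a description, can be sketched against the Chat Completions API. The snippet and prompt below are made-up examples, not the code from the demo:

```python
# Minimal sketch: asking GPT-4o to describe a "highlighted" code snippet.
# Assumes the `openai` package and an OPENAI_API_KEY in the environment;
# the snippet itself is an invented example.
from openai import OpenAI

client = OpenAI()

highlighted_code = '''
def moving_average(xs, window):
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window + 1)]
'''

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user",
         "content": f"Briefly describe what this code does:\n{highlighted_code}"},
    ],
)
print(response.choices[0].message.content)
```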
5) A Very Good Translator
ChatGPT can now act as a real-time translator. Mark and Mira held a conversation in English and Italian, asking ChatGPT to translate English into Italian and vice versa.
Live audience request for GPT-4o realtime translation pic.twitter.com/VSj5phFKM6
— OpenAI (@OpenAI) May 13, 2024
It excelled at the task, handling translations in both directions smoothly.
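For developers, similar interpreter behavior can be approximated in text with a small wrapper around the Chat Completions API. This is an illustrative sketch, not how the live voice translation works under the hood:

```python
# Minimal sketch: GPT-4o as an English<->Italian interpreter (text only).
# Assumes the `openai` package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def translate(text: str) -> str:
    """Translate English input to Italian, and Italian input to English."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": ("You are an interpreter. If the message is English, "
                         "translate it to Italian; if it is Italian, translate "
                         "it to English. Reply with the translation only.")},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

print(translate("Hi, how has your week been?"))
```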
6) Detect How You Are Feeling
ChatGPT can also read a user’s sentiment by looking at their face. Barret tried it out with a cheerful expression, and ChatGPT correctly guessed that he was happy and in a good mood.
Live audience request for GPT-4o vision capabilities pic.twitter.com/FPRXpZ2I9N
— OpenAI (@OpenAI) May 13, 2024
This shows that ChatGPT can identify how a user is feeling from their facial expression. That, in turn, could help improve a user’s mood, since ChatGPT can offer advice suited to how the user is feeling.
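GPT-4o’s image understanding is also exposed through the API, so a rough, single-image version of this demo can be sketched as follows (the image URL is a placeholder and the prompt is illustrative):

```python
# Minimal sketch: asking GPT-4o to read mood from a photo via image input
# in the Chat Completions API. The URL below is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Based on my expression, how am I feeling?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/selfie.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```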
OpenAI also showcased several more use cases of these capabilities on its official blog, including singing, interview preparation, math, teaching, games, real-time translation, jokes, customer service, and general knowledge. All of these capabilities are part of OpenAI’s newly announced GPT-4o model.
Conclusion
The new voice and vision capabilities of ChatGPT aim to deliver a more personalized user experience than ever before! The GPT-4o model will be available to all users, including those on the free plan, so everyone will be able to try these features.