AI technology is developing rapidly, and there are long-standing concerns that it may one day outpace us or slip out of our control. That moment may be closer than we think: GPT-4o recently gave its makers a small demo of exactly that. During a pre-release safety check, the model started imitating the tester's voice without being prompted to do so.
Highlights:
- During safety testing, GPT-4o unexpectedly began cloning a tester's voice in the middle of a conversation.
- The incident shows how AI systems can behave in ways their developers never intended or anticipated.
- The episode has reignited debate over the serious ethical and privacy concerns surrounding AI.
The Incident: When ChatGPT Voice Went Too Far
OpenAI's GPT-4o includes an ‘Advanced Voice Mode’ that lets users hold real-time spoken conversations with the AI. Just as the model was trained on millions of articles and other text materials to generate text, OpenAI trained GPT-4o on large volumes of audio samples so it can generate audio.
Users marveled at how the model pauses mid-sentence as if catching its breath, a sign of how faithfully the technology imitates its training data.
The company said that more than 100 external testers worked on the Advanced Voice Mode release, speaking 45 different languages and representing 29 geographical areas.
To prevent impersonation of real people, the model was restricted to four preset voices.
Every company must run extensive checks before making its products available to the public. OpenAI recently published the “GPT-4o System Card”, a report detailing the safety work and evaluations performed before the model's release.
Under the “Observed safety challenges, evaluations & mitigations” section, we found something interesting. During one of the tests, while responding to a tester's query, the model abruptly exclaimed “No!” and then continued speaking in a voice that mimicked the red teamer's own.
The incident is eerie because it suggests that just a few seconds of your voice are enough for the model to replicate you.
This unexpected voice replication caught both the tester and the developers off guard. It clearly was not a feature that had been intentionally programmed or tested, and it immediately raised red flags about the AI's capabilities. OpenAI has also described how it addressed the problem:
“We addressed voice generation related risks by allowing only the preset voices we created in collaboration with voice actors to be used. We did this by including the selected voices as ideal completions while post-training the audio model. Additionally, we built a standalone output classifier to detect if the GPT-4o output is using a voice that’s different from our approved list. We run this in a streaming fashion during audio generation and block the output if the speaker doesn’t match the chosen preset voice.”
The company says the risk of such unauthorized voice generation is minimal after this mitigation, though it notes that this remains a known weakness.
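OpenAI has not published implementation details of this classifier, but conceptually it resembles streaming speaker verification: embed each chunk of generated audio, compare it against the approved preset voices, and cut the stream on a mismatch. The sketch below illustrates that idea only; `embed_speaker`, the chunk size, the embedding dimension, and the similarity threshold are all hypothetical stand-ins, not OpenAI's actual method.

```python
import numpy as np

EMBED_DIM = 64
CHUNK_SAMPLES = 1600  # e.g. 100 ms of 16 kHz audio (assumption for this sketch)

def embed_speaker(chunk: np.ndarray) -> np.ndarray:
    """Toy stand-in for a speaker-verification embedding model.

    A real system would run a trained speaker encoder here; this fixed
    random projection merely yields deterministic unit vectors so the
    sketch is runnable end to end.
    """
    rng = np.random.default_rng(0)  # fixed seed -> same projection every call
    projection = rng.standard_normal((CHUNK_SAMPLES, EMBED_DIM))
    vec = chunk @ projection
    return vec / np.linalg.norm(vec)

class StreamingVoiceGuard:
    """Blocks generated audio whose speaker embedding does not match
    any approved preset voice (threshold value is made up)."""

    def __init__(self, preset_embeddings: list[np.ndarray], threshold: float = 0.75):
        self.presets = preset_embeddings
        self.threshold = threshold

    def allow(self, chunk: np.ndarray) -> bool:
        emb = embed_speaker(chunk)
        # Cosine similarity reduces to a dot product for unit vectors.
        best = max(float(emb @ p) for p in self.presets)
        return best >= self.threshold  # False -> block this chunk

# Enroll four preset voices, then screen a stream of generated chunks.
rng = np.random.default_rng(42)
preset_audio = [rng.standard_normal(CHUNK_SAMPLES) for _ in range(4)]
guard = StreamingVoiceGuard([embed_speaker(a) for a in preset_audio])

for chunk in [preset_audio[0], rng.standard_normal(CHUNK_SAMPLES)]:
    print("allowed" if guard.allow(chunk) else "blocked: unapproved voice")
```

In a production system this check would run on every chunk during generation, likely with smoothing across chunks to avoid false blocks on short noisy segments; the sketch shows only the core match-or-block decision.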
Ethical and Privacy Concerns: The Dark Side of AI
Why is this so serious? Imagine someone obtains a short audio clip of you and uses AI to generate convincing voice snippets. They could run scams in your name and walk away unsuspected, leaving you to take the blame.
This is just one example. Cloned audio can serve many malicious purposes, such as deepfake scams, impersonation, or unauthorized access to personal information.
One of the main concerns in cases like this is transparency. The public generally does not read these lengthy documents and has no idea that AI has this capability. Companies must be more forthcoming and inform users of all the risks involved in using their software.
Privacy matters too. Users have little idea how much of what they type into the prompt window is being stored by the AI. Given what we have just seen these models are capable of, they may be collecting more private information about us than we know.
Developers need to fully understand what they are building, bake strong privacy protections into their models, and enforce strict safeguards against the misuse of AI-generated content.
Conclusion
The unexpected voice cloning by ChatGPT is a reminder of both the incredible potential and the significant risks associated with AI technology. We need to maintain a healthy balance between innovation and caution. On the same note, a recent study showed that ChatGPT is susceptible to jailbreak prompts on harmful topics.