In recent years, speech-to-text applications have become vital to our lives, enabling communication through personal voice assistants and customer support systems. Tools like Whisper, developed by OpenAI, power these applications and are generally excellent at transcribing speech to text accurately.
Highlights:
- A study shows that OpenAI’s speech-to-text tool Whisper generates hallucinated content in roughly 1% of transcriptions.
- These hallucinations disproportionately impact individuals with speech impairments like aphasia.
- The researchers discovered that 38% of the hallucinations included explicit harms, such as perpetuating violence, making up inaccurate associations, or implying false authority.
New Research Raises Concerns About Whisper
A group of researchers from four prestigious universities (Cornell University, the University of Washington, New York University, and the University of Virginia) conducted a study focused on OpenAI’s Whisper. Their work highlights a concerning problem: speech-to-text systems generating “hallucinations.”
Hallucinations occur when a model assumes or manufactures information that may be factually incorrect. In speech-to-text, they produce inaccurate transcriptions containing words, phrases, or sentences that were never spoken in the original audio.
These hallucinations can have significant implications, particularly in critical scenarios where exact speech-to-text transcription is required. For example, in a court of law, where transcription is currently done manually, adopting a system with this problem could render plaintiff statements unreliable or even inadmissible.
The study focuses on OpenAI’s Whisper, a state-of-the-art automated speech recognition tool released in September 2022. The Whisper API is used across a wide range of applications to generate transcriptions of audio.
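For context, producing a transcription with the open-source whisper Python package takes only a few lines. The sketch below is a minimal illustration; the model size and file name are illustrative choices, not details from the study:

```python
# Minimal sketch of transcribing audio with the open-source whisper package.
# The model size ("base") and file name ("interview.wav") are illustrative.
import whisper

model = whisper.load_model("base")          # load a pretrained checkpoint
result = model.transcribe("interview.wav")  # run speech-to-text on the file
print(result["text"])                       # the transcription, which may include hallucinated content
```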
Experiments show that Whisper outperforms many speech-to-text competitors in the industry. Yet while Whisper’s transcriptions were generally accurate, the researchers found that approximately 1% of the audio transcriptions generated in mid-2023 contained hallucinated content.
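Conceptually, hallucinated content is material present in the system output but absent from what was actually spoken. The paper’s exact methodology is not reproduced here, but a rough sketch of the idea is to align the system transcript against a human reference transcript and collect inserted words, for example with Python’s difflib (the example strings below are hypothetical):

```python
# Illustrative sketch (not the researchers' exact method): flag words that appear
# in the system transcript but not in the human reference, via sequence alignment.
from difflib import SequenceMatcher

def inserted_phrases(reference: str, hypothesis: str) -> list[str]:
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    matcher = SequenceMatcher(None, ref_words, hyp_words)
    extras = []
    for op, _, _, j1, j2 in matcher.get_opcodes():
        if op in ("insert", "replace"):       # content present only in the system output
            extras.append(" ".join(hyp_words[j1:j2]))
    return extras

# Hypothetical example: the trailing phrase was never spoken in the audio.
reference = "I went to the store yesterday"
hypothesis = "I went to the store yesterday and he grabbed the knife"
print(inserted_phrases(reference, hypothesis))  # ['and he grabbed the knife']
```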
The implications of these hallucinations are far-reaching and have the potential to cause real-world harm. The researchers conducted a thematic analysis of the hallucinated content and discovered that nearly 40% of the hallucinations included explicit harms, such as perpetuating violence, making up inaccurate associations, or implying false authority.
These harms can be especially severe for individuals with speech impairments such as aphasia.
What Is Causing These Hallucinations?
Aphasia is a language disorder that affects an individual’s ability to express themselves through speech. It often results from a stroke or brain injury. The study found that hallucinations disproportionately occurred for individuals with aphasia, likely because of their longer pauses and non-vocal durations during speech, a common symptom of the condition.
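Because the link is to longer pauses, one relevant measurement is simply how much of a recording is non-vocal. The researchers’ measurement pipeline is not described here; the sketch below uses the pydub library’s silence detector, with illustrative thresholds rather than values from the study:

```python
# Rough sketch: estimate total non-vocal (silent) duration in an audio clip with pydub.
# The silence threshold and minimum pause length are illustrative, not from the study.
from pydub import AudioSegment
from pydub.silence import detect_silence

audio = AudioSegment.from_file("speech_sample.wav")
# Treat stretches at least 500 ms long and 16 dB quieter than the clip's average loudness as pauses.
pauses = detect_silence(audio, min_silence_len=500, silence_thresh=audio.dBFS - 16)

total_pause_ms = sum(end - start for start, end in pauses)
print(f"Non-vocal time: {total_pause_ms / 1000:.1f}s of {len(audio) / 1000:.1f}s total")
```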
The researchers highlighted that these hallucinations can lead to allocative and representational harms, potentially denying individuals with aphasia access to opportunities and resources while also reinforcing subordination along the lines of identity and disability.
Ethical, Societal, and Legal Implications
The study’s findings have serious consequences in high-stakes scenarios where speech-to-text technology is essential.
Consider a job candidate with aphasia whose interview transcription incorrectly attributes violent or inappropriate language to them, leading to an unfair rejection. Or consider a medical setting where a patient’s speech is misinterpreted, potentially resulting in incorrect treatment decisions.
Beyond the immediate harms, the study also points out the ethical and legal issues raised by hallucinations that disproportionately affect speakers with aphasia. For instance, the use of biased speech-to-text systems in hiring processes could potentially violate the Americans with Disabilities Act (ADA), which protects individuals from unfair evaluation based on their disabilities, including speech patterns.
These findings should alert developers across the AI industry and highlight the need for greater transparency and inclusivity in developing these technologies. Developers should act promptly to address the hallucination problem and to draw attention to the biases it can introduce into applications built on speech-to-text models.
The study also highlights the importance of involving individuals with speech impairments in the design and testing processes of these systems.
AI companies should be open about these issues, take responsibility for fixing them, and actively involve the communities affected, like those with speech disorders. This will lead to fairer and less biased AI systems that benefit everyone.
While the study focuses on OpenAI’s Whisper, its implications extend to other generative AI-based systems. As generative AI technologies continue to advance and spread into new domains, it is crucial to address potential biases and the harmful consequences they can cause.
Conclusion
This study brings to light a major issue in speech-to-text transcription that could further perpetuate bias against people with speech impairments. It is essential for the industry to focus on these problems now and work toward solutions that ensure fairness and accessibility for all sections of society.