OpenAI’s Speech-to-Text Tool Whisper Generates Hallucinations

by Dhruv Kudalkar
May 13, 2024

In recent years, speech-to-text applications have become vital in our lives, powering personal voice assistants and customer support systems. Tools like Whisper, developed by OpenAI, are widely used in these applications and are generally excellent at transcribing speech to text accurately.

Highlights:

  • A study shows that OpenAI’s speech-to-text tool Whisper generates hallucinations in roughly 1% of transcriptions.
  • These hallucinations disproportionately impact individuals with speech impairments like aphasia.
  • The researchers discovered that 38% of the hallucinations included explicit harms, such as perpetuating violence, making up inaccurate associations, or implying false authority.

New research raises concerns about Whisper

A group of researchers from four prestigious universities (Cornell University, the University of Washington, New York University, and the University of Virginia) conducted a study focused on OpenAI’s Whisper. The study highlights a concerning problem: speech-to-text systems generating “hallucinations.”

Hallucinations occur when a model assumes or manufactures information, which may be factually incorrect. In transcription, hallucinations produce inaccurate output containing phrases, sentences, or words that were never spoken in the original audio.
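
When a human reference transcript is available, one simple way to surface such inserted content is to align the model output against the reference and flag words that appear only in the output. Below is a minimal sketch using Python’s standard difflib; the example strings are hypothetical and not drawn from the study.

```python
# Flag words in a model transcript that never appear in the reference.
# Both strings here are illustrative placeholders, not study data.
from difflib import SequenceMatcher

reference = "thank you for calling please hold".split()
hypothesis = "thank you for calling the police please hold".split()

matcher = SequenceMatcher(None, reference, hypothesis)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag == "insert":  # words present in the output but never spoken
        print("possible hallucination:", " ".join(hypothesis[j1:j2]))
# -> possible hallucination: the police
```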

These hallucinations can have significant implications, particularly in critical scenarios where exact speech-to-text transcription is required. For example, in a court of law, where transcription is currently done manually, a system with this problem could misquote plaintiffs and render their statements inadmissible.

The study focuses on OpenAI’s Whisper, a state-of-the-art automated speech recognition tool released in September 2022. The Whisper API has been used for a wide range of purposes to generate transcriptions of audio.
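
As a rough illustration of how Whisper is typically invoked, the sketch below uses the open-source openai-whisper Python package; the model size and file name are placeholders. The segment-level timestamps it returns alongside the text are one practical way to check whether output actually lines up with spoken audio.

```python
# A minimal transcription sketch with the open-source whisper package
# (pip install openai-whisper); "interview.wav" is a placeholder file.
import whisper

model = whisper.load_model("base")          # smaller checkpoints trade accuracy for speed
result = model.transcribe("interview.wav")  # returns full text plus timed segments

print(result["text"])

# Segment timestamps make it possible to spot text with little or no
# corresponding audio, a hint that content may have been hallucinated.
for seg in result["segments"]:
    print(f'{seg["start"]:.1f}s-{seg["end"]:.1f}s: {seg["text"]}')
```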

Experiments show that Whisper outperforms various industry competitors at speech-to-text. Yet while Whisper’s transcriptions were generally accurate, the researchers found that approximately 1% of the audio transcriptions generated in mid-2023 contained hallucinated content.

The implications of these hallucinations are far-reaching and have the potential to cause real-world harm. The researchers conducted a thematic analysis of the hallucinated content and discovered that nearly 40% of the hallucinations included explicit harms, such as perpetuating violence, making up inaccurate associations, or implying false authority.

These harms can have severe consequences, particularly for individuals with speech impairments like aphasia.

What is causing these hallucinations?

Aphasia is a language disorder that affects an individual’s ability to express themselves through speech, often following a stroke or brain injury. The study found that hallucinations occurred disproportionately for individuals with aphasia, likely because of their longer pauses and non-vocal durations during speech, a common symptom of the condition.
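
Since the study links hallucinations to longer pauses, a simple first check on any audio clip is to measure how much of it is non-vocal. The sketch below uses librosa’s energy-based splitter; the decibel threshold and file name are illustrative assumptions, not the study’s methodology.

```python
# Estimate total pause (non-vocal) time in a clip with an
# energy-based splitter; threshold and file name are illustrative.
import librosa

y, sr = librosa.load("speech_sample.wav", sr=16000)
voiced = librosa.effects.split(y, top_db=30)  # (start, end) sample indices of non-silent spans

total_s = len(y) / sr
speech_s = sum(end - start for start, end in voiced) / sr
print(f"pause time: {total_s - speech_s:.1f}s of {total_s:.1f}s")
```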

The researchers highlighted that these hallucinations can lead to allocative and representational harms, potentially denying individuals with aphasia access to opportunities and resources while reinforcing subordination along the lines of identity and disability.

Ethical, Societal, and Legal Implications

The study’s findings matter most in high-stakes scenarios where speech-to-text technology is essential.

One example is a job interview in which the AI-generated transcription incorrectly attributes violent or inappropriate language to a candidate with aphasia, leading to unfair rejection. Another is a medical setting where a patient’s speech is misinterpreted, potentially resulting in incorrect treatment decisions.

Beyond the immediate harms, the study also points out the ethical and legal issues of these aphasia-based hallucinations. For instance, the use of biased speech-to-text systems in hiring processes could potentially violate the Americans with Disabilities Act (ADA), which protects individuals from unfair evaluation based on their disabilities, including speech patterns.

These findings should alert developers across the AI industry and highlight the need for greater transparency and inclusivity in building these technologies. Developers should act promptly to address the hallucination problem and to surface the biases it can introduce in applications of speech-to-text models.

The study also highlights the importance of involving individuals with speech impairments in the design and testing processes of these systems.

AI companies should be open about these issues, take responsibility for fixing them, and actively involve the affected communities, such as people with speech disorders. This will lead to fairer, less biased AI systems that benefit everyone.

While the study focuses on OpenAI’s Whisper, its implications extend to other generative AI-based systems. As generative AI technologies continue to advance across domains, it is crucial to address potential biases and the harmful consequences that follow from them.

Conclusion

This study brings to light a major issue in speech-to-text transcription that could further entrench bias against people with speech impairments. It is essential for the industry to focus on these problems and work toward solutions that ensure fairness and accessibility for all sections of society.
