{"id":3077,"date":"2024-03-31T07:58:05","date_gmt":"2024-03-31T07:58:05","guid":{"rendered":"https:\/\/favtutor.com\/articles\/?p=3077"},"modified":"2024-03-31T17:24:09","modified_gmt":"2024-03-31T17:24:09","slug":"openai-voice-engine","status":"publish","type":"post","link":"https:\/\/favtutor.com\/articles\/openai-voice-engine\/","title":{"rendered":"Meet OpenAI&#8217;s &#8220;Voice Engine&#8221;: An AI That Can Clone Your Voice"},"content":{"rendered":"\n<p><strong>On March 29<sup>th<\/sup>, 2024, OpenAI stepped up its generative AI game by unveiling Voice Engine, its brand-new voice cloning tool. The tool can clone a voice from a single 15-second audio sample.<\/strong><\/p>\n\n\n\n<p><strong>Highlights:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OpenAI unveils Voice Engine, an AI that can clone any user&#8217;s voice.<\/li>\n\n\n\n<li>Supports several use cases, such as translation and reading assistance.<\/li>\n\n\n\n<li>Currently in preview and available only to a few companies, with safety guidelines in mind.<\/li>\n<\/ul>\n\n\n\n<div align=center><blockquote class=\"twitter-tweet\"><p lang=\"en\" dir=\"ltr\">We&#39;re sharing our learnings from a small-scale preview of Voice Engine, a model which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. <a href=\"https:\/\/t.co\/yLsfGaVtrZ\" target=\"_blank\">https:\/\/t.co\/yLsfGaVtrZ<\/a><\/p>&mdash; OpenAI (@OpenAI) <a href=\"https:\/\/twitter.com\/OpenAI\/status\/1773760852153299024?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">March 29, 2024<\/a><\/blockquote> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/div>\n\n\n\n<p>OpenAI has been moving quickly to reshape the generative AI industry. 
After Sora, its state-of-the-art video generation model, Voice Engine is another major release from OpenAI, and one that AI enthusiasts and developers will be watching closely.<\/p>\n\n\n\n<p>What is OpenAI\u2019s Voice Engine, and how can developers make the most of it? What features does it come with? Let\u2019s take an in-depth look!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What is Voice Engine from OpenAI?<\/strong><\/h2>\n\n\n\n<p><strong>The well-known artificial intelligence firm OpenAI has entered the voice cloning space with Voice Engine, its latest model. With just 15 seconds of recorded speech from a speaker, it can closely mimic that person&#8217;s voice.<\/strong><\/p>\n\n\n\n<p>Development of Voice Engine began in late 2022, and OpenAI has used it to power ChatGPT Voice and Read Aloud, as well as the preset voices available in its text-to-speech API.<\/p>\n\n\n\n<p>All Voice Engine needs is a short recording of your voice and some text to read; it then generates speech that sounds like you. The generated voices are highly realistic and emotionally expressive.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How was Voice Engine trained?<\/strong><\/h2>\n\n\n\n<p><strong>OpenAI&#8217;s Voice Engine model was trained on a combination of licensed and publicly available data. Models like the one that powers Voice Engine learn from a very large number of examples, in this case speech recordings, typically drawn from data sets and publicly accessible websites.<\/strong><\/p>\n\n\n\n<p>Jeff Harris, a member of the product staff at OpenAI, told TechCrunch in an interview that the generative AI model behind Voice Engine has been quietly in use for some time. 
Many generative AI vendors treat training data and related details as a competitive asset, so they tend to keep them confidential.<\/p>\n\n\n\n<p>Another reason vendors share little about training data is that it could become the subject of IP-related disputes. This is likely one of the main reasons so little has been disclosed about how Voice Engine\u2019s model was trained. We hope OpenAI will eventually publish a detailed technical report covering the model\u2019s build, dataset, and architecture.<\/p>\n\n\n\n<p><strong>What\u2019s interesting is that Voice Engine hasn&#8217;t been trained or fine-tuned on user data.<\/strong> This is partly due to the ephemeral way the model, which combines a transformer and a diffusion process, generates speech. It analyzes the speech sample it draws from and the text to be read aloud at the same time, producing a matching voice without having to build a custom model for each speaker.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>We take a small audio sample and text and generate realistic speech that matches the original speaker. The audio that\u2019s used is dropped after the request is complete. <\/p>\n<cite>Harris told TechCrunch in the interview about Voice Engine.<\/cite><\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Looking Into Voice Engine\u2019s Features<\/strong><\/h2>\n\n\n\n<p>OpenAI\u2019s Voice Engine supports several use cases, all built around realistic voice cloning. Let\u2019s look at them in detail:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>1. 
Assisting With Reading<\/strong><\/h3>\n\n\n\n<p><strong>Voice Engine\u2019s voice cloning can benefit children and students, because its realistic, expressive voices cover a wider range of speech than preset voices allow. The tool has strong potential to make interactive learning and reading sessions more realistic, which could improve the quality of education.<\/strong><\/p>\n\n\n\n<p>Age of Learning, an education technology company, has been using GPT-4 and Voice Engine to bring its learning and reading content to a much wider audience.<\/p>\n\n\n\n<p>In the tweet below, you can hear Voice Engine clone the reference audio to teach subjects such as Biology, Reading, Chemistry, Math, and Physics.<\/p>\n\n\n\n<div align=center><blockquote class=\"twitter-tweet\" data-media-max-width=\"560\"><p lang=\"tr\" dir=\"ltr\">OpenAI, ses klonlama arac\u0131 Voice Engine&#39;i tan\u0131tt\u0131.<br><br>15 saniyelik k\u0131sa bir sesle, insan seslerini ger\u00e7ek\u00e7i bir \u015fekilde kopyalayabiliyor ve yaz\u0131lan metinleri sese \u00e7evirebiliyor.<a href=\"https:\/\/t.co\/6yNhhEGvxe\" target=\"_blank\">pic.twitter.com\/6yNhhEGvxe<\/a><\/p>&mdash; BPT (@bpthaber) <a href=\"https:\/\/twitter.com\/bpthaber\/status\/1773964120745714075?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">March 30, 2024<\/a><\/blockquote> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2. 
Translating Audio<\/strong><\/h3>\n\n\n\n<p><strong>Voice Engine can take a speaker\u2019s voice and render it in multiple languages, helping content reach a wider range of audiences and communities.<\/strong><\/p>\n\n\n\n<p>Voice Engine preserves the original speaker&#8217;s native accent when translating; for instance, generating English from an audio sample of a Spanish speaker would produce Spanish-accented speech.<\/p>\n\n\n\n<p>HeyGen, an AI visual storytelling company, is currently using OpenAI\u2019s Voice Engine to translate audio into multiple languages for a variety of content and demos.<\/p>\n\n\n\n<p>In the tweet below, you can hear an English reference voice translated into Spanish, Mandarin, and more.<\/p>\n\n\n\n<div align=center><blockquote class=\"twitter-tweet\" data-media-max-width=\"560\"><p lang=\"zh\" dir=\"ltr\">OpenAI\u516c\u5e03\u5176\u8bed\u97f3\u751f\u6210\u6a21\u578b\uff1aVoice Engine <br><br>\u6839\u636e\u6587\u672c\u8f93\u5165\u548c\u4e00\u4e2a15\u79d2\u7684\u97f3\u9891\u6837\u672c\uff0c\u5c31\u80fd\u751f\u6210\u63a5\u8fd1\u539f\u59cb\u8bf4\u8bdd\u8005\u58f0\u97f3\u7684\u81ea\u7136\u542c\u8d77\u6765\u7684\u8bed\u97f3\u3002<br><br>Voice Engine\u6700\u521d\u4e8e2022\u5e74\u5e95\u5f00\u53d1\uff0c\u5e76\u5df2\u7ecf\u63d0\u4f9b\u7ed9\u5305\u62ecHeygen\u5728\u5185\u7684\u5c11\u6570\u516c\u53f8\u8fdb\u884c\u6d4b\u8bd5\u6027\u4f7f\u7528\u3002<br><br>\u4e3b\u8981\u529f\u80fd<br><br>1\u3001\u81ea\u7136\u542c\u8d77\u6765\u7684\u8bed\u97f3\u751f\u6210\uff1a\u5229\u7528\u5355\u4e2a15\u79d2\u7684\u97f3\u9891\u6837\u672c\uff0cVoice\u2026 <a href=\"https:\/\/t.co\/AjP2wAYr4N\" target=\"_blank\">pic.twitter.com\/AjP2wAYr4N<\/a><\/p>&mdash; \u5c0f\u4e92 (@imxiaohu) <a href=\"https:\/\/twitter.com\/imxiaohu\/status\/1773896583006101720?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">March 30, 2024<\/a><\/blockquote> <script async 
src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>3. Connecting with Communities throughout the World<\/strong><\/h3>\n\n\n\n<p>With Voice Engine and GPT-4, Dimagi, a company that builds tools for community health workers, can give those workers interactive feedback in their primary language, such as Swahili, or in more informal languages like Sheng, a code-mixed language widely used in Kenya. This can improve the delivery of essential services in remote settings.<\/p>\n\n\n\n<div align=center><blockquote class=\"twitter-tweet\" data-media-max-width=\"560\"><p lang=\"en\" dir=\"ltr\"><a href=\"https:\/\/twitter.com\/OpenAI?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">@OpenAI<\/a> text to voice Engine \ud83d\udd25\ud83e\udee8 <a href=\"https:\/\/t.co\/5rgQMbW7wR\" target=\"_blank\">https:\/\/t.co\/5rgQMbW7wR<\/a> <a href=\"https:\/\/t.co\/XnWyIDj8Oj\" target=\"_blank\">pic.twitter.com\/XnWyIDj8Oj<\/a><\/p>&mdash; Patrick Assal\u00e9 (@patrickassale) <a href=\"https:\/\/twitter.com\/patrickassale\/status\/1773765256331858259?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">March 29, 2024<\/a><\/blockquote> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/div>\n\n\n\n<p>Voice Engine makes it possible to improve quality of life and services in remote regions that have long lacked access to the latest generative AI models.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>4. Helping Non-Verbal People<\/strong><\/h3>\n\n\n\n<p><strong>People who are non-verbal can use Voice Engine to communicate in their day-to-day lives.<\/strong> Livox, an AI alternative communication app, powers AAC (Augmentative &amp; Alternative Communication) devices, which enable people with disabilities to communicate. 
They can provide nonverbal persons with distinct, human voices in a variety of languages by utilizing Voice Engine.<\/p>\n\n\n\n<p>Users who speak more than one language can select the speech that most accurately reflects them, and they can keep their voice consistent in all spoken languages.<\/p>\n\n\n\n<div align=center><blockquote class=\"twitter-tweet\" data-media-max-width=\"560\"><p lang=\"ar\" dir=\"rtl\">Voice Engine<br>\u062b\u0648\u0631\u0629 OpenAI \u0641\u064a \u062a\u0643\u0646\u0648\u0644\u0648\u062c\u064a\u0627 \u0627\u0644\u0635\u0648\u062a \u0627\u0644\u0630\u0643\u064a<br><br>OpenAI \u0623\u0639\u0644\u0646\u062a \u0639\u0646 \u0625\u0637\u0644\u0627\u0642 \u0646\u0645\u0648\u0630\u062c \u0635\u0648\u062a\u064a \u062c\u062f\u064a\u062f \u064a\u0633\u0645\u0649 \u201cVoice Engine\u201d\u060c \u0627\u0644\u0630\u064a \u064a\u0645\u0643\u0646\u0647 \u062a\u0648\u0644\u064a\u062f \u0623\u0635\u0648\u0627\u062a \u0637\u0628\u064a\u0639\u064a\u0629 \u062a\u0634\u0628\u0647 \u0635\u0648\u062a \u0627\u0644\u0634\u062e\u0635 \u0645\u0646 \u062e\u0644\u0627\u0644 \u0645\u062c\u0631\u062f 15 \u062b\u0627\u0646\u064a\u0629 \u0645\u0646 \u0639\u064a\u0646\u0629 \u0635\u0648\u062a\u064a\u0629. 
\u0647\u0630\u0627 \u0627\u0644\u0646\u0645\u0648\u0630\u062c \u0642\u062f \u062a\u0645 \u0627\u0633\u062a\u062e\u062f\u0627\u0645\u0647 \u0628\u0627\u0644\u0641\u0639\u0644 \u0645\u0646 \u0642\u0628\u0644 \u0634\u0631\u0643\u0627\u0621 \u0643\u0628\u0627\u0631 \u0645\u062b\u0644 HeyGen.<br><br>\u25aa\ufe0f\u0623\u0628\u0631\u0632 \u0627\u0644\u0646\u0642\u0627\u0637 \u062d\u0648\u0644 Voice\u2026 <a href=\"https:\/\/t.co\/TxrVPQPYw4\" target=\"_blank\">pic.twitter.com\/TxrVPQPYw4<\/a><\/p>&mdash; \u0633\u0639\u064a\u062f \u0627\u0644\u0643\u0644\u0628\u0627\u0646\u064a (@smalkalbani) <a href=\"https:\/\/twitter.com\/smalkalbani\/status\/1773772301206753758?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">March 29, 2024<\/a><\/blockquote> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>5. Assisting Patients in Regaining Voice<\/strong><\/h3>\n\n\n\n<p><strong>Voice Engine can help people who suffer from sudden or degenerative speech conditions. 
<\/strong>The Norman Prince Neurosciences Institute at Lifespan, a not-for-profit health system that is the main teaching affiliate of Brown University&#8217;s medical school, is offering the AI model as part of a trial program for patients whose speech impairment has neurologic or oncologic causes.<\/p>\n\n\n\n<p>Because Voice Engine requires only a brief audio sample, doctors Fatima Mirza, Rohaid Ali, and Konstantina Svokos were able to restore the voice of a young patient who had lost her fluent speech to a vascular brain tumor, using audio from a video recorded for a school project.<\/p>\n\n\n\n<div align=center><blockquote class=\"twitter-tweet\" data-media-max-width=\"560\"><p lang=\"en\" dir=\"ltr\">My favorite from <a href=\"https:\/\/twitter.com\/OpenAI?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">@OpenAI<\/a> new voice engine :<a href=\"https:\/\/twitter.com\/hashtag\/voiceEngine?src=hash&amp;ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">#voiceEngine<\/a> <a href=\"https:\/\/twitter.com\/hashtag\/openAi?src=hash&amp;ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">#openAi<\/a> <a href=\"https:\/\/twitter.com\/hashtag\/aiforgood?src=hash&amp;ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">#aiforgood<\/a> <br><br>OpenAI&#39;s Voice Engine is changing lives by restoring voices to those who&#39;ve lost them! Check out this video of its impact on a patient who lost her speech to a brain tumor. 
<a href=\"https:\/\/t.co\/Qed1Z2ezgj\" target=\"_blank\">pic.twitter.com\/Qed1Z2ezgj<\/a><\/p>&mdash; Qaisar Roonjha (@QRoonjha) <a href=\"https:\/\/twitter.com\/QRoonjha\/status\/1773796898681360630?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">March 29, 2024<\/a><\/blockquote> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/div>\n\n\n\n<p><strong>Overall, Voice Engine\u2019s cloning capabilities extend well beyond simple audio generation, covering a wide range of use cases that benefit students, diverse communities, non-verbal people, and patients with speech impairments. <\/strong>With these \u201cvoice\u201d features, OpenAI has made a bold move in building a tool that could be useful to people worldwide.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Is Voice Engine Accessible?<\/strong><\/h2>\n\n\n\n<p>OpenAI\u2019s announcement of Voice Engine follows its filing of a trademark application for the name, which hints at its intention to keep advancing voice-related technology. <strong>For now, citing concerns over potential misuse and the accompanying risks, the company has chosen to restrict Voice Engine&#8217;s availability to a small number of early testers, despite the technology&#8217;s revolutionary potential.<\/strong><\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>In line with our&nbsp;<a href=\"https:\/\/openai.com\/blog\/our-approach-to-ai-safety\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">approach to AI safety<\/a>&nbsp;and our&nbsp;voluntary commitments, we are choosing to preview but not widely release this technology at this time. 
We hope this preview of Voice Engine both underscores its potential and also motivates the need to bolster societal resilience against the challenges brought by ever more convincing generative models.<\/p>\n<cite>OpenAI explained its decision to limit the use of Voice Engine in its <a href=\"https:\/\/openai.com\/blog\/navigating-the-challenges-and-opportunities-of-synthetic-voices\" target=\"_blank\" data-type=\"link\" data-id=\"https:\/\/openai.com\/blog\/navigating-the-challenges-and-opportunities-of-synthetic-voices\" rel=\"noreferrer noopener nofollow\">latest blog post<\/a>.<\/cite><\/blockquote>\n\n\n\n<p>Only a small group of companies has access to Voice Engine so far, and they are using it to help several groups of people, some of whom we discussed above. OpenAI has not committed to a public release, so it remains to be seen when, or whether, the tool will be rolled out more widely.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How is OpenAI tackling \u201cDeepfake\u201d misuse of Voice Engine?<\/strong><\/h2>\n\n\n\n<p>Recognizing the serious risks of voice cloning, especially at sensitive times such as elections, OpenAI stresses that this technology must be used responsibly. Recent incidents such as robocalls that imitate political figures with AI-generated voices show why vigilance is critical.<\/p>\n\n\n\n<p>Given the serious consequences of generating speech that closely resembles real people, especially during an election season, the company described the preventive measures it is taking to mitigate these dangers.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>We recognize that generating speech that resembles people&#8217;s voices has serious risks, which are especially top of mind in an election year. We are engaging with U.S. 
and international partners from across government, media, entertainment, education, civil society, and beyond to ensure we are incorporating their feedback as we build.<\/p>\n<cite>OpenAI <\/cite><\/blockquote>\n\n\n\n<p>The company also announced a set of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine and monitoring of how the audio is being used. Companies currently using Voice Engine must also adhere to OpenAI\u2019s policies and community guidelines, which require consent from the person whose voice is being cloned and disclosure to the audience that Voice Engine\u2019s audio is AI-generated.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>Voice Engine from OpenAI has the potential to change the landscape of audio generation. As OpenAI continues to advance in artificial intelligence, technologies like Voice Engine, which bring both unprecedented opportunities and new challenges, are expected to shape the direction of human-computer interaction. Only time will tell how the public will receive the tool.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI introduces Voice Engine, a game-changing voice cloning tool that replicates your voice from a 15-second audio sample. 
Currently in preview, it&#8217;s available to select companies, focusing on safety.<\/p>\n","protected":false},"author":15,"featured_media":3100,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jnews-multi-image_gallery":[],"jnews_single_post":null,"jnews_primary_category":{"id":"","hide":""},"footnotes":""},"categories":[57],"tags":[59,60],"class_list":["post-3077","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-generative-ai","tag-openai"],"_links":{"self":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/3077","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/users\/15"}],"replies":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/comments?post=3077"}],"version-history":[{"count":8,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/3077\/revisions"}],"predecessor-version":[{"id":3107,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/3077\/revisions\/3107"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media\/3100"}],"wp:attachment":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media?parent=3077"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/categories?post=3077"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/tags?post=3077"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}