OpenAI has shared preliminary results from a test of a feature that can read words aloud in a convincing human voice, previewing a new frontier for artificial intelligence and raising the specter of deepfake risks.
According to a spokeswoman, the company has shared early demos and use cases from a small-scale preview of the text-to-speech model, known as Voice Engine, with roughly ten developers.
OpenAI opted against a broader launch of the feature, which it briefed journalists about earlier this month.
According to the spokeswoman, the company decided to scale back the release in response to feedback from stakeholders, including legislators, industry experts, educators, and creatives. It had previously planned to offer the tool to up to 100 developers through an application process, according to the earlier press briefing.
“We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” the company wrote in a blog post Friday. “We are engaging with US and international partners from across government, media, entertainment, education, civil society, and beyond to ensure we are incorporating their feedback as we build.”
AI voice impersonation stirs election fears
Other AI technology has already been used to impersonate voices. In January, a fake but realistic-sounding phone call purporting to be from President Joe Biden urged people in New Hampshire not to vote in the primaries, heightening anxieties about AI ahead of major elections worldwide.
Unlike OpenAI’s past efforts to generate audio content, Voice Engine can produce speech that sounds like a specific individual, complete with their cadence and intonation. To replicate a person’s voice, the software requires only 15 seconds of recorded audio of that person speaking.
During a demonstration of the tool, Bloomberg listened to a clip of OpenAI CEO Sam Altman briefly presenting the technology in a voice that sounded identical to his real speech but was entirely AI-generated.
“If you have the right audio setup, it’s a human-caliber voice,” said Jeff Harris, a product lead at OpenAI. “It’s a pretty impressive technical quality.” However, Harris said, “There’s obviously a lot of safety delicacy around the ability to really accurately mimic human speech.”
The Norman Prince Neurosciences Institute at Lifespan, a non-profit health organization, is one of OpenAI’s current developer partners and is using the tool to help patients recover their voices. According to the company’s blog post, the tool was used to restore the voice of a young girl who had lost her ability to speak clearly due to a brain tumor, by reproducing her speech from an earlier recording made for a school project.
OpenAI’s speech model powers multilingual podcasts and education
OpenAI’s speech model can also translate the speech it produces into a variety of languages, making it useful for audio companies such as Spotify Technology SA. Spotify has already used the technology in a pilot program to translate the podcasts of well-known hosts such as Lex Fridman. OpenAI also highlighted other beneficial applications of the technology, such as creating a wider range of voices for children’s educational content.
OpenAI’s testing program requires its partners to adhere to its usage policies, obtain consent from the original speaker before using their voice, and inform listeners that the voices they hear are AI-generated. The company is also implementing an inaudible audio watermark to determine whether a piece of audio was made using its tool.
Before deciding whether to release the feature more broadly, OpenAI said it’s soliciting feedback from outside experts. “It’s important that people around the world understand where this technology is headed, whether we ultimately deploy it widely ourselves or not,” the company said in the blog post.
OpenAI also stated that it believes the preview of its software “motivates the need to bolster societal resilience” to the challenges posed by increasingly advanced AI technology. For example, the company suggested that banks phase out voice authentication as a security safeguard for accessing accounts and sensitive information. It also called for public education about deceptive AI content, as well as more development