Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio


Goldhedge

deep fakes getting deeper

Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio

Text-to-speech model can preserve speaker's emotional tone and acoustic environment.

BENJ EDWARDS

On Thursday, Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person's voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything—and do it in a way that attempts to preserve the speaker's emotional tone.

Its creators speculate that VALL-E could be used for high-quality text-to-speech applications; for speech editing, where a recording of a person could be altered from a text transcript (making them say something they originally didn't); and for audio content creation when combined with other generative AI models like GPT-3.

Microsoft calls VALL-E a "neural codec language model," and it builds off of a technology called EnCodec, which Meta announced in October 2022. Unlike other text-to-speech methods that typically synthesize speech by manipulating waveforms, VALL-E generates discrete audio codec codes from text and acoustic prompts. It basically analyzes how a person sounds, breaks that information into discrete components (called "tokens") thanks to EnCodec, and uses training data to match what it "knows" about how that voice would sound if it spoke other phrases outside of the three-second sample. Or, as Microsoft puts it in the VALL-E paper:

"To synthesize personalized speech (e.g., zero-shot TTS), VALL-E generates the corresponding acoustic tokens conditioned on the acoustic tokens of the 3-second enrolled recording and the phoneme prompt, which constrain the speaker and content information respectively. Finally, the generated acoustic tokens are used to synthesize the final waveform with the corresponding neural codec decoder."
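The two-stage flow the paper describes (codec encoder → acoustic language model conditioned on prompt tokens and phonemes → codec decoder) can be sketched roughly as below. This is a toy illustration only, not Microsoft's or Meta's actual code; every function here is a hypothetical stand-in for the real neural components.

```python
# Hypothetical sketch of the VALL-E pipeline described above (NOT the real
# model): a codec (EnCodec in the paper) turns audio into discrete tokens,
# a language model predicts new acoustic tokens conditioned on the enrolled
# prompt's tokens plus the target text's phonemes, and the codec decoder
# turns predicted tokens back into a waveform. All functions are toy stand-ins.
from typing import List

def codec_encode(audio_samples: List[float]) -> List[int]:
    """Stand-in codec encoder: quantize samples in [-1, 1] to token IDs 0..1023."""
    return [int((s + 1.0) * 511.5) for s in audio_samples]

def phonemize(text: str) -> List[str]:
    """Stand-in phonemizer: one 'phoneme' per letter."""
    return [c for c in text.lower() if c.isalpha()]

def acoustic_lm(prompt_tokens: List[int], phonemes: List[str]) -> List[int]:
    """Stand-in language model: one acoustic token per phoneme, 'conditioned'
    here by reusing the enrolled prompt's average token value."""
    base = sum(prompt_tokens) // max(len(prompt_tokens), 1)
    return [(base + i) % 1024 for i, _ in enumerate(phonemes)]

def codec_decode(tokens: List[int]) -> List[float]:
    """Stand-in codec decoder: token IDs back to samples in [-1, 1]."""
    return [t / 511.5 - 1.0 for t in tokens]

# The 3-second enrolled clip constrains the speaker; the text supplies the
# content — mirroring the conditioning split the quoted paper describes.
enrolled_clip = [0.1, -0.2, 0.3, 0.0]          # pretend 3-second sample
prompt_tokens = codec_encode(enrolled_clip)
phonemes = phonemize("hello world")
new_tokens = acoustic_lm(prompt_tokens, phonemes)
waveform = codec_decode(new_tokens)
```

The key design point the paper emphasizes is that the model never predicts raw waveforms: it works entirely in the codec's discrete token space, which is what makes zero-shot voice cloning from a short prompt tractable.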
more
 
How long will biometric security systems hold up for?
We've got your iris scan, your fingerprints and now your voice pattern.

A solar flare may be our saviour, not our demise.
 
The following is the April 17, 2023, Congressional Research Service report, Deep Fakes and National Security.

From the report

“Deep fakes”—a term that first emerged in 2017 to describe realistic photo, audio, video, and other forgeries generated with artificial intelligence (AI) technologies—could present a variety of national security challenges in the years to come. As these technologies continue to mature, they could hold significant implications for congressional oversight, U.S. defense authorizations and appropriations, and the regulation of social media platforms.

 
After reading the article: I and most other pmbug members would pass the "not AI-generated" messages test, given our numerous errors in spelling, grammar, punctuation, etc. It's great to be human! We can determine the truth but can't spell for peanuts!
 
Related:

Behind a Secretive Global Network of Non-Consensual Deepfake Pornography

Warning: This article discusses explicit adult content and child sexual abuse material (CSAM)

One of the world’s largest online video game marketplaces says it has referred user accounts to legal authorities after a Bellingcat investigation found tokens to create nonconsensual pornographic deepfakes were being surreptitiously sold on the site.

Accounts on G2A were being used to collect payments for Clothoff, one of the most popular and controversial nonconsensual pornographic deepfake sites on the internet. Clothoff disguised the sales as if they were for downloadable gaming content.

“Security is one of our top priorities that we never compromise on, hence we have taken immediate action and suspended the sellers in question until we have investigated it fully,” G2A said, in a statement. “We also decided to report the case to the appropriate authorities.” (G2A said it was reporting the accounts and the companies affiliated with them to authorities in the “companies’ countries of origin” which, as this story outlines below, varies but includes the US and New Zealand.)

Clothoff is part of a loosely affiliated network of similar platforms uncovered in Bellingcat’s investigation.

The network, which also includes the sites Nudify, Undress, and DrawNudes, has variously manipulated financial and online service providers that ban adult content and non-consensual deep fakes by disguising their activities to evade crackdowns. Other services they have tried to exploit include Coinbase, Patreon, Paypal, Shopify, Steam and Stripe.

Behind one G2A account that was selling tokens for Clothoff is an IT solutions firm called Aurora Borealis Limited Liability Company, listed on the account’s contact page. On its website, Aurora Borealis claims G2A is one of its partners, which the gaming marketplace said is false.

Aurora’s CEO, a Valencia-based entrepreneur named Vitalii Ionov, did not reply to a request for comment, nor did his company. Ionov, as this investigation details, is affiliated with multiple entities that overlap with Clothoff and the network of deepfake porn sites. Another company he is listed as the CEO of has also falsely claimed to be partners with other companies, including Microsoft.

More:

 