VALL-E, a new Microsoft AI, can "clone" your voice using just 3 seconds of recording time

The last few years have been dominated by new AI technologies and by the question of how they will change the way we use computers and work in many fields. Microsoft continues to develop such solutions with VALL-E, a new artificial intelligence model that can reproduce almost any human voice speaking English from a sample of just three seconds of speech.

VALL-E AI currently works only in English


In practice, you give the software a sample of your voice, either from an existing recording or from a new one made on the spot in a few seconds. You then enter a text, and VALL-E reads it back in your own voice. This could speed up work in video and audio production, especially at TV and radio stations or in online content creation.


Writing a text, recording it as audio and then editing it for inclusion in a finished piece is a lengthy process. With a solution like this, all you need is the finished text, and the voice is generated in seconds. What’s more, by using AI tools such as GPT-3 to edit the text, you can cut production time even further. Of course, these technologies are still in their infancy, and relying on them in real-life production is not yet advisable, as the results are not perfect.

Microsoft calls this AI a “neural codec language model”, and it is built on EnCodec, an audio codec developed by Meta. The software analyses the sounds a person makes in speech, encodes them as discrete codec tokens, and uses the result to reproduce the voice as accurately as possible. VALL-E was trained on 60,000 hours of English speech from more than 7,000 speakers, drawn from LibriVox, a library of free audiobooks. As a result, the output is better the more closely the source voice resembles the voices in that library.
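To give a sense of scale, here is a rough back-of-the-envelope sketch in Python of how much data a three-second sample amounts to once a codec like EnCodec has turned it into tokens. The frame rate, number of codebooks and codebook size below are assumptions taken from the publicly documented 24 kHz EnCodec configuration, not figures given in this article, and the function names are purely illustrative.

```python
# Assumed values (not stated in the article): the public 24 kHz EnCodec
# setup at roughly 6 kbps.
FRAME_RATE_HZ = 75      # codec frames produced per second of audio
NUM_CODEBOOKS = 8       # quantizer stages, one token per stage per frame
CODEBOOK_SIZE = 1024    # possible values for each token


def prompt_token_count(seconds: float) -> int:
    """Number of discrete codec tokens that describe a clip of this length."""
    frames = round(seconds * FRAME_RATE_HZ)
    return frames * NUM_CODEBOOKS


def bitrate_bps() -> int:
    """Bits per second implied by the assumed codec settings."""
    bits_per_token = (CODEBOOK_SIZE - 1).bit_length()  # 10 bits for 1024 values
    return FRAME_RATE_HZ * NUM_CODEBOOKS * bits_per_token


if __name__ == "__main__":
    print(prompt_token_count(3.0))  # 1800
    print(bitrate_bps())            # 6000
```

Under these assumptions, the three-second voice sample boils down to 1,800 tokens, which is the kind of compact input a language model can work with.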


The AI can preserve the speaker’s vocal timbre and emotional tone, but it can also alter them, depending on the parameters it is given.

Microsoft says it can detect whether a recording is made with its AI

Of course, this technology raises the question of misuse. For this reason, Microsoft is keeping VALL-E closed source, so it can only be used in ways the company allows. On top of that, it is already possible to identify AI-generated recordings:

“Since VALL-E can synthesize speech that preserves the speaker’s identity, it could come with risks in using it in undesirable ways, such as tricking voice-based security systems or mimicking a particular speaker. To avoid such risks, it is also possible to create a detection model to check whether or not an audio clip was made with VALL-E. We will also put the Microsoft AI Principles into practice when developing new models in the future,” say the researchers behind the project.
