This “speech-to-speech” technology is more complicated than it sounds. Turning speech in one language into speech in another is a demanding task for a computer. The system does not translate directly from the audio; several steps are involved. Automated dubbing essentially consists of three steps. First, the original speech is converted to text. Second, that text is translated into the target language. Finally, new speech is generated from the translated text.
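The three steps can be pictured as a chain of functions. The sketch below is only an illustration of that chaining, not the researchers' system: the `DubbingPipeline` class, the toy dictionary, and the `fake_*` stand-ins are all hypothetical names invented here, and each stage would be a trained model in practice.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DubbingPipeline:
    """Chains the three stages; each stage is a pluggable function (hypothetical design)."""
    transcribe: Callable[[bytes], str]   # step 1: speech -> text
    translate: Callable[[str], str]      # step 2: text -> translated text
    synthesize: Callable[[str], bytes]   # step 3: translated text -> speech

    def dub(self, source_audio: bytes) -> bytes:
        text = self.transcribe(source_audio)
        translated = self.translate(text)
        return self.synthesize(translated)


# Toy stand-ins so the sketch runs; real stages would be trained models.
TOY_DICTIONARY = {"hello": "hola", "world": "mundo"}


def fake_transcribe(audio: bytes) -> str:
    # Pretends the "audio" bytes already spell out the spoken words.
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.split())


def fake_synthesize(text: str) -> bytes:
    # Pretends that encoding the text produces audio.
    return text.encode("utf-8")


pipeline = DubbingPipeline(fake_transcribe, fake_translate, fake_synthesize)
dubbed = pipeline.dub(b"hello world")
```

Keeping the stages separate means any one of them can be swapped or improved without touching the others, which is also why an error in an early stage carries through to the final audio.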

A model in the text-to-speech phase was trained on 47 hours of speech recordings. This model generates a context sequence from the text, which is fed into a pre-trained vocoder that converts the sequence into a speech waveform.
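To make the two-stage split concrete, here is a deliberately simplified sketch. It assumes a toy "context sequence" of one pitch value per character and a toy "vocoder" that renders each value as a short sine tone. The function names, the sample rate, and the frame length are assumptions made for illustration; the actual model and vocoder are neural networks and work nothing like this internally.

```python
import math
from typing import List

SAMPLE_RATE = 8000        # samples per second (assumed value)
SAMPLES_PER_FRAME = 400   # 50 ms per frame at the assumed sample rate


def text_to_context_sequence(text: str) -> List[float]:
    """Toy stand-in for the text-to-speech model: one pitch (Hz) per non-space character."""
    return [200.0 + (ord(ch) % 32) * 10.0 for ch in text if not ch.isspace()]


def toy_vocoder(frames: List[float]) -> List[float]:
    """Toy stand-in for the vocoder: renders each frame as a sine tone."""
    waveform: List[float] = []
    for pitch in frames:
        for n in range(SAMPLES_PER_FRAME):
            waveform.append(math.sin(2 * math.pi * pitch * n / SAMPLE_RATE))
    return waveform


frames = text_to_context_sequence("hola mundo")
waveform = toy_vocoder(frames)
```

The point of the split is that the first stage decides what should be said and how, while the second stage only has to turn that intermediate sequence into audio samples.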

The process is certainly complicated, and the researchers wrote that their future work will be devoted to improving automatic dubbing. The technology could eliminate the need for voice actors to dub a show or film into another language. Dubbing content into a desired language would become less time-consuming and much cheaper. It would also help production houses deliver more shows and films to viewers, making the list of available titles much more diverse.