Spotify: AI translates podcasts with original voice

Overcoming language barriers, connecting people – Spotify has set itself these noble goals with the introduction of its latest feature. But Voice Translation can do more than just translate. TECHBOOK editor Natalie Wetzel listened in to find out how good the new AI feature actually is.

With around 70,000 German-language podcasts on Spotify, the local podcast scene is in solid shape. The offering ranges from professional news podcasts and commercial entertainment formats to casual hobby recordings, with every nuance in between. The variety of topics is also large and growing more diverse. And yet some users look curiously at the remaining 3.5 million podcasts that Spotify has available, if only it weren’t for the language barrier. Spotify has developed a solution for exactly this problem: AI-based voice translation.

AI as a bridge over the language barrier

The result is as simple as it is ingenious: The AI translates the podcast into the target language and “clones” the voice and speaking habits of the speaker. “Across cultures, countries and communities, the stories we share bring us together. And most of the time it is the voices of the speakers that give the stories as much weight as the stories themselves.” Spotify opens with this somewhat flowery statement in its announcement of the pilot phase of Voice Translation. Listening to your favorite foreign-language podcast in your own native language – that is supposed to be the goal one day.

Voice Translation – this is how it works

As early as May 2023, Bill Simmons, the founder of the Spotify sports podcast network “The Ringer,” revealed that the Swedish music streaming provider was working on an AI feature that imitates voices. Simmons discussed using AI-generated voices of well-known podcast hosts for advertising purposes, and of course only with their consent. Spotify itself was still keeping a low profile at the time: “Advertising is an interesting field for future investigations, but we have nothing to announce at this point.” That has changed with the newly introduced Voice Translation, although not in the area of advertising.

Spotify’s Voice Translation is powered by Whisper, an AI transcription tool from OpenAI. The speech recognition system was trained on hundreds of thousands of hours of predominantly English-language audio and can now translate speech back and forth between selected languages. At the same time, Whisper can carry the podcast hosts’ characteristic speech patterns – such as tone of voice, pauses and filler words – over into the target language.

Would you like a sample?

If you want to judge Whisper’s performance for yourself, you can listen to some pilot episodes in the specially created hub. Since September 25th, three episodes that Whisper has translated from English into Spanish can be found there:

  • DOAC: E256 – Dr. Mindy Pelz (June 15, 2023)
  • Lex Fridman Podcast: #390 – Yuval Noah Harari (July 17, 2023)
  • KB: Armchair Expert with Dax Shepard (August 14, 2023)

Further translations into German and French will follow in the coming days and weeks. Spotify also says it would like to include more podcasts in the pilot series. These include Dax Shepard’s “eff won with DRS”, The Ringer’s “The Rewatchables” and the new podcast from Trevor Noah, the former host of the “Daily Show”. A certain focus on sports and social issues is already apparent here.

Who can use voice translation?

Whether and how Spotify will further expand Voice Translation and make it available to a wider range of podcast producers depends largely on the feedback during the pilot phase. The fact that the episodes translated so far are at least a month and a half old suggests that Whisper is not yet working completely smoothly. It is also unclear what exactly the collaboration with the hosts will look like and what conditions Spotify has in mind for the future. Will producers have to pay to use Voice Translation, and if so, how much?

Currently, podcast episodes translated with Voice Translation are available to all users. It remains to be seen whether the feature will stay accessible to everyone in the future, including those with a free account. Spotify itself is likely hoping to increase its number of users by removing the language barrier. In any case, the reach of some podcasters is likely to expand massively once Voice Translation has become established. The fact that revenues from commercial podcasts are skyrocketing certainly doesn’t make the model any less attractive. However, it will probably be some time before Voice Translation is available to all podcasters and users can actually listen to their personal favorite podcasts in their own native language.

TECHBOOK’s take

“Personally, I am torn between enthusiasm and skepticism. The concept itself is great and a game changer for anyone who speaks few or no foreign languages. I don’t want to have to learn Japanese first to immerse myself in Japanese cooking podcasts, or Polish to get information firsthand. And even with an excellent command of English, it can sometimes be difficult to understand the technical terms that NFL commentators shout into the microphone in cracking voices while you’re sitting on the subway. Spotify’s Voice Translation could bring a lot of relief to individual users and help communities connect. If only the feature were ready.

But for now, Spotify still determines which podcasts should be internationally accessible. And these are, unsurprisingly, big podcasts from established hosts. The language selection is initially limited to the usual suspects: English, Spanish, French and German. One can only hope that Spotify will soon also provide translations into and from other languages, including non-European ones.

What’s also interesting is how authentic the translated podcasts can ultimately be, both on a technical and a human level. “By matching the voice of the creator, Voice Translation enables listeners around the world to discover and be inspired by new podcasters in a more authentic way than ever before,” says Ziad Sultan, Vice President of Personalization at Spotify.

I have listened to the existing translations, but since they are only available in Spanish so far, I can only give a limited opinion. Voice Translation cannot yet imitate the hosts’ voices perfectly, especially when it comes to the exact tone. However, pauses, shifts in emphasis, speed and vocal melody seem to be surprisingly close to the original. Above all, the artificial origin of the voices can hardly be heard, if at all. In this respect, Voice Translation absolutely has potential. Now the tool just needs to be made available to as many podcasters as possible, as easily as possible.” – Natalie Wetzel
