DeepL, the AI company best known for its text translation tools, today released a portfolio of speech-to-speech translation products, marking its entry into the real-time speech translation market. The products cover online meetings, mobile and web conversations, and group communication involving front-line employees through customized applications. DeepL also launched an API that lets developers and enterprises build customized voice translation solutions on its technology, for call centers and other businesses.

DeepL CEO Jarek Kutylowski said in an interview that after years of focusing on text translation, speech was the company's "natural next step." He emphasized that DeepL has come a long way in text and document translation, but in real-time speech translation "there is still a lack of a truly outstanding product," which is why the company decided to enter the market.
Kutylowski pointed out that the core difficulty in building a real-time translation product is balancing low latency against accuracy. Latency here means the time between when the user speaks and when the translated speech is played. In meetings and conversations, the shorter that gap, the closer the experience comes to a simultaneous conversation.
In this release, DeepL is launching plug-ins for Zoom and Microsoft Teams that let participants in remote meetings speak in their native languages while listeners hear translated speech in real time or read real-time translated subtitles on screen. The program is still in early testing, and DeepL is inviting businesses to join a waitlist to be among the first to try the feature. The company also offers conversation products for mobile and the web, which let users communicate across languages in person or remotely.
For group settings such as training sessions and seminars, whether in person or online, DeepL lets participants join the same session by scanning a QR code, and everyone receives translated content in their own language on their own device. DeepL said its speech-to-speech technology can also learn custom vocabulary, such as industry-specific terms, company names, and personal names, to make it more useful in professional settings.
Kutylowski believes that AI will reshape the customer service industry over the next few years. A high-quality translation layer can help companies keep providing multilingual support in markets where people with local language skills are scarce and expensive to hire. Under this vision, DeepL hopes that its voice technology will serve not only meetings but also become part of the basic language infrastructure for customer service centers and global enterprises.
On the technical roadmap, DeepL said that its current products are driven by a complete speech-to-speech technology stack developed in-house, but that at this stage the stack still uses a three-step process: speech to text, text translation, then text to speech. The company believes that its long-term focus on text translation gives it an advantage in overall translation quality. Going forward, DeepL plans to develop an end-to-end speech translation model that omits the intermediate text steps in order to further improve latency and naturalness.
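As a rough illustration of the three-step process described above, the cascade can be sketched in Python with placeholder stages. This is not DeepL's implementation or API: every function name here is hypothetical, the "audio" is just UTF-8 text, and the translator is a toy word lookup. The sketch only shows that each stage feeds the next and that end-to-end latency is the sum of the per-stage latencies.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for the three stages; none of these are real DeepL calls.

def speech_to_text(audio: bytes) -> str:
    """Placeholder recognizer: pretends the audio bytes are already a transcript."""
    return audio.decode("utf-8")

def translate_text(text: str) -> str:
    """Placeholder translator: a toy word-for-word English-to-German lookup."""
    toy_dictionary = {"good": "guten", "morning": "Morgen"}
    return " ".join(toy_dictionary.get(word.lower(), word) for word in text.split())

def text_to_speech(text: str) -> bytes:
    """Placeholder synthesizer: returns the text as bytes instead of real audio."""
    return text.encode("utf-8")

# The cascade, in the order the article describes.
STAGES: List[Tuple[str, Callable]] = [
    ("speech_to_text", speech_to_text),
    ("translate_text", translate_text),
    ("text_to_speech", text_to_speech),
]

def run_cascade(audio: bytes, stages: List[Tuple[str, Callable]]) -> Tuple[bytes, Dict[str, float]]:
    """Run the stages in order and record how long each one takes, in seconds."""
    timings: Dict[str, float] = {}
    data = audio
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        timings[name] = time.perf_counter() - start
    return data, timings

if __name__ == "__main__":
    translated_audio, timings = run_cascade(b"good morning", STAGES)
    print(translated_audio)
    print(f"end-to-end latency: {sum(timings.values()):.6f} s")
```

Because the stages run one after another, the user waits for all three. An end-to-end model of the kind DeepL says it plans to build would replace the three entries in `STAGES` with a single step, which is where the hoped-for latency gain would come from.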
In speech and translation, DeepL faces competition from several startups. Sanas, which raised US$65 million from Quadrille Capital and Teleperformance last year, focuses on technology that modifies speakers' accents in real time, mainly for call center agents. Camb.AI, headquartered in Dubai, provides speech synthesis and translation services for media and entertainment companies, helping customers dub and localize content at scale. Palabra, backed by Reddit co-founder Alexis Ohanian’s fund Seven Seven Six, is building a real-time speech translation engine that aims to retain the speaker’s original voice characteristics during translation, which puts it in more direct competition with what DeepL is building.
Having established a foothold in the text translation market, DeepL is now trying to expand through voice products, extending its technology to meeting collaboration, customer service, and front-line operations. As more companies look to AI to reduce the cost of cross-language communication, real-time speech translation is expected to become the focus of a new round of competition, and DeepL is accelerating its push into this area.