On April 9, ByteDance released Seeduplex, a large-scale native full-duplex voice model, which is now fully rolled out in the Doubao App. The model is built on a new "listen while speaking" framework. Compared with the previous-generation half-duplex end-to-end speech model, it listens and speaks simultaneously in real time, improving conversational rhythm, naturalness, and resistance to interference.

According to the official introduction, Seeduplex overcomes engineering challenges such as latency and stability under high concurrency through innovations in model architecture and training optimization. On anti-interference, the model "listens" continuously, understands the user's acoustic environment, and accurately ignores background noise and irrelevant conversations. In complex scenarios, its false reply rate and false interruption rate are 50% lower than those of the half-duplex model. On dynamic decision-making, the model combines speech and semantic features to judge the user's intent: it waits patiently when the user hesitates and responds quickly once the user has finished speaking. The rate at which it cuts in before the user has finished is 40% lower than with the half-duplex model, and decision-making performance improves by 8%.
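
To make the idea of combining speech and semantic cues concrete, here is a minimal Python sketch of a turn-taking decision. It is an illustration only, not Seeduplex's method, whose internals ByteDance has not described here. The names `Frame`, `decide`, the three input scores, and all thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical per-frame features; none of these come from Seeduplex."""
    silence_ms: int            # trailing silence since the last detected speech
    directed_at_device: float  # 0..1 score that the speech is addressed to the assistant
    semantic_complete: float   # 0..1 score that the utterance is a finished thought


def decide(frame: Frame,
           base_wait_ms: float = 300,
           max_wait_ms: float = 1500,
           directed_threshold: float = 0.5) -> str:
    """Return 'ignore', 'wait', or 'respond' for the current frame."""
    # Background noise or side conversations: do not reply at all.
    if frame.directed_at_device < directed_threshold:
        return "ignore"
    # The more complete the utterance looks semantically, the less
    # silence is required before replying (assumed linear interpolation).
    required_ms = max_wait_ms - (max_wait_ms - base_wait_ms) * frame.semantic_complete
    return "respond" if frame.silence_ms >= required_ms else "wait"


if __name__ == "__main__":
    print(decide(Frame(silence_ms=400, directed_at_device=0.9, semantic_complete=1.0)))
    print(decide(Frame(silence_ms=400, directed_at_device=0.9, semantic_complete=0.2)))
    print(decide(Frame(silence_ms=2000, directed_at_device=0.1, semantic_complete=1.0)))
```

In this toy version, a hesitant speaker (low semantic completeness) is given a longer pause before the assistant answers, a finished sentence gets a fast reply, and speech not aimed at the device is ignored regardless of silence. A production full-duplex model would presumably learn such behavior end to end rather than apply fixed thresholds.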

Multi-dimensional evaluation shows that Seeduplex significantly outperforms both traditional half-duplex solutions and the voice call features of mainstream apps in conversational fluency and rhythm. The model is the first in the industry to be deployed at scale and can provide a continuous, high-quality real-time voice interaction experience for hundreds of millions of users.