Amazon has launched NovaSonic, a new-generation generative AI voice model that marks a major advance in AI voice technology. The model natively processes speech input and generates natural, fluent speech output. On core metrics such as speed, speech recognition accuracy, and conversation quality, it is comparable to the frontier speech models from technology giants such as OpenAI and Google.

NovaSonic is available through the Amazon Bedrock developer platform via a new bidirectional streaming API, which is aimed at enterprise AI application development. Amazon specifically emphasized the model's cost-effectiveness: it is priced about 80% lower than OpenAI’s GPT-4o, making it, by Amazon's account, the most cost-effective AI voice solution on the market.

Compared with competing AI speech models, NovaSonic excels at routing user requests to different APIs. This lets it recognize when it needs to fetch real-time information from the internet, query proprietary data sources, or take action in external applications, and then use the appropriate tool to complete the task.
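
To make the routing idea concrete, here is a minimal Python sketch of how an application might dispatch a tool request emitted by a voice model. It is an illustration under stated assumptions, not NovaSonic's or Bedrock's actual API: the tool names, the `TOOLS` registry, the `dispatch` function, and the `{"name": ..., "argument": ...}` request shape are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical tool handlers (assumptions for illustration only).
# Real handlers would call a search service, a private database,
# or an external application.
def web_search(query: str) -> str:
    return f"[search results for: {query}]"

def lookup_order(order_id: str) -> str:
    return f"[status of order {order_id}]"

def book_table(restaurant: str) -> str:
    return f"[booked a table at {restaurant}]"

# Registry mapping the tool name chosen by the model to its handler.
TOOLS: Dict[str, Callable[[str], str]] = {
    "web_search": web_search,      # real-time information from the internet
    "lookup_order": lookup_order,  # proprietary data source
    "book_table": book_table,      # action in an external application
}

def dispatch(tool_call: dict) -> str:
    """Run the tool named in the request, or report that it is unknown."""
    name = tool_call.get("name")
    handler = TOOLS.get(name)
    if handler is None:
        return f"[no tool named {name!r}]"
    return handler(tool_call.get("argument", ""))

if __name__ == "__main__":
    print(dispatch({"name": "lookup_order", "argument": "A-1001"}))
```

In this sketch the model only decides which tool to use and with what argument; the application keeps control over what each tool is actually allowed to do.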

In a two-way conversation, NovaSonic waits for the "right moment" to speak, taking the speaker's pauses and interruptions into account. It can also generate text transcripts of users' speech, which developers can use in a variety of applications.

Rohit Prasad, chief scientist of Amazon's AGI department, revealed that some of NovaSonic's technology is already used in Alexa+, the upgraded digital assistant. The launch of this model is an important step in Amazon's strategy to build artificial general intelligence (AGI). Amazon also plans to launch AI models that support multimodal understanding, covering images, video, and other sensory data from the physical world.