On Monday evening Beijing time, OpenAI, a well-known artificial intelligence startup, released a report titled "
(Source: OpenAI)
ChatGPT previously launched a "code interpreter" feature that accepts image uploads and has some preliminary ability to process images and photographed text. But today’s “take a photo and ask a question” feature is undoubtedly closer to how most users expect to use an AI assistant.
Following the order in the title, today’s update has two main features:
Let’s start with the image chat feature, which has attracted the most attention. According to OpenAI, users can now
In the official example, ChatGPT is given a
The demonstrator then pretended not to understand and took a photo of the bolt.
Next, the demonstrator photographed the toolbox and asked ChatGPT which wrench to use. ChatGPT correctly identified the wrench and told the user exactly which size to pick.
In addition, OpenAI also packages speech recognition, transcription and audio generation functions and launches
According to OpenAI, the feature uses Whisper, its open-source speech recognition system, to transcribe the user's speech into text. It also uses a new text-to-speech model, developed with professional voice actors, that offers users five voices to choose from.
OpenAI says its new speech technology can create realistic synthetic voices from just a few seconds of real speech. This capability opens the door to creative uses but also creates new risks, such as criminals impersonating public figures to commit fraud. OpenAI has therefore decided to release the technology only through specific use cases such as “voice chat”.
At the same time, OpenAI is also cooperating with more institutions. For example
Image input also brings new challenges, such as hallucinations and users relying on the model's interpretation of images in high-stakes domains. Before launch, OpenAI therefore ran risk tests in areas such as extremism and scientific capabilities.
For Chinese readers of this article, image chat is probably worth looking forward to, but expectations for voice chat may need to be tempered. OpenAI said,