If you ask what the hottest topic in the large-model world has been these past two days, it is definitely DeepSeekV3. However, as netizens test it one after another, one bug has become the focus of heated discussion: leave out just a single question mark, and DeepSeekV3 actually calls itself ChatGPT.
Ask it to tell a joke, and it even gives the same answer as ChatGPT:
In addition, one of the reasons DeepSeekV3 has drawn so much attention is that its training cost only US$5.576 million.
This led some people to wonder: was it trained on ChatGPT's output?
Coincidentally, Altman also posted something that looked like a veiled dig...
But DeepSeekV3 is not the first large model to get its own identity wrong.
Gemini, for example, once claimed to be Baidu's Wenxin Yiyan (ERNIE Bot).
So what is going on?
Why does DeepSeekV3 misidentify itself?
The first thing to emphasize is that, judging from the overall discussion among netizens so far, it is considered unlikely that DeepSeekV3 was trained on ChatGPT's output.
The reason, as netizen Riley Goodside summarized it, is that ChatGPT's shadow is everywhere.
For example, ShareGPT, a dataset of ChatGPT conversations, is nothing new, and many people have tried fine-tuning on it and other ChatGPT data sources. Even so, none of those efforts produced a model at DeepSeekV3's level.
Riley Goodside then pointed to some evidence from the DeepSeekV3 technical report:
For example, on the Pile test (which measures how well the base model compresses the Pile dataset), DeepSeekV3 scores almost the same as Llama 3.1 405B, a result that has nothing to do with whether it was exposed to ChatGPT data.
Moreover, the report states that 95% of GPU-hours went into pre-training the base model. Even if ChatGPT data were involved, it would only enter at the post-training stage (the remaining 5%).
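For readers unfamiliar with the Pile test mentioned above: base-model results on the Pile are commonly reported as bits per byte (BPB), a compression measure. Assuming that is the metric behind the comparison here, a standard definition is sketched below, where x_1, ..., x_T are the tokens of the test text, N_bytes is that text's length in bytes, and p_θ is the base model's next-token probability. The symbol names are illustrative, not taken from the report.

```latex
\mathrm{BPB} = -\frac{1}{N_{\text{bytes}}} \sum_{t=1}^{T} \log_2 p_\theta\left(x_t \mid x_{<t}\right)
```

Lower is better, and dividing by bytes rather than tokens lets models with different tokenizers be compared directly. Because the score only reflects how well the pre-trained model predicts raw text, it says little about which chat data, if any, the model saw during post-training.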
Rather than fixating on whether ChatGPT data was used, perhaps we should ask why large models so often get their own identity wrong.
TechCrunch offered a pointed take on this:
After all, an EU report predicted that by 2026, 90% of online content may be generated by AI.
This kind of "AI pollution" makes it very hard to filter AI-generated output out of training data completely.
Heidy Khlaaf, chief scientist at the AI Now Institute, said:
Models accidentally trained on ChatGPT or GPT-4 output would not necessarily exhibit output reminiscent of OpenAI's customized messages.
As for the issue netizens are debating so hotly, QbitAI ran a round of its own tests: DeepSeekV3 has not fixed this bug yet.
Leave out a single question mark, and the answer is still different:
More ways to use DeepSeekV3
That said, most netizens have strongly affirmed DeepSeekV3's capabilities.
This is borne out by AI heavyweights from all walks of life collectively calling it "elegant."
Over the past two days, netizens have posted more hands-on uses powered by DeepSeekV3.
For example, one netizen pitted DeepSeekV3 against Claude 3.5 Sonnet, using each of them to create a website in ScrollHub.
Video address: https://mp.weixin.qq.com/s/ieCfWqC5gsJ-Oc7-_L3uDQ?token=904287848&lang=zh_CN
After testing, the blogger concluded that DeepSeekV3 wins hands down!
Another netizen shared his experience using DeepSeekV3 in an AI video editor.
He said there is no longer any need to waste time on FFmpeg commands: DeepSeekV3 is not only free, but can also change your workflow:
The AI coding tool Cursor can also be paired with DeepSeekV3. Here is an example of building a Snake game:
Well, DeepSeekV3 does seem genuinely handy in practice.
One More Thing
Regarding the 53-page paper released earlier, some netizens also noticed a non-technical detail:
The contributor list includes not only technical staff, but also data annotation and business staff:
Netizens feel this is very much in keeping with DeepSeek's style:
[1]https://techcrunch.com/2024/12/27/why-deepseeks-new-ai-model-thinks-its-chatgpt/
[2]https://x.com/victormustar/status/1872647314231398524
[3]https://x.com/breckyunits/status/1872422078592516295
[4]https://x.com/op7418/status/1872689338242482203
[5]https://x.com/goodside/status/1872911457857208596
[6]https://x.com/kevinsxu/status/1873146905846530472