D God is on the air again? Real or fake? Deepseek has been holding it back for so long, and recently he finally started to let it out. Last week they quietly launched the V4, followed by two major price cuts. . Liang Shen, have you come to save all sentient beings again? As a result, today, it suddenly came and pushed me a wave of gray tests: DeepSeek, which has multi-modal capabilities. To be precise, it is image recognition.
After checking the card, it is true.

Those who want to try something new can open your DeepSeek now and take a look.
If there is a "image recognition mode" in the interface, congratulations, you are the lucky one who was tested internally, and you can directly buy the real multi-modal version of V4 for free.
DeepSeek’s own researcher Chen Xiaokang couldn’t help but post a post. We whales finally have eyes and are no longer the blind monks of the national server!

Why are everyone so excited? In fact, DeepSeek has been criticized for a long time because it does not have multi-modality. The three foreign giants ChatGPT, Gemini, and Claude have long had multi-modal capabilities. Domestic models such as Doubao and Qianwen have also done very well.
As for this domestically produced light that has high hopes, it has been unable to recognize even a picture for so many years. It can only rely on OCR, that is, to recognize the text in the picture. The use experience is really poor.
Now, this shortcoming has finally been filled.
Without further ado, let’s go straight to the test.
First of all, it does get rid of the traditional OCR and can really see the whole picture. You can rest assured about this.
For example, if we give it a piece of text that says "This is a line of red text" written in blue, if we only use traditional OCR, it can only recognize that the text is "This is a line of red text" and it will never recognize that it is blue. (It may even be unrecognizable)

After turning on the visual mode, it can accurately identify that this is a line of blue and red letters, and even sensed my humor.

Not only that, it also has visual reasoning capabilities.
Have you all seen this meme? I believe that with my intelligence, I can definitely understand what is written in the picture.

So I sent it to DeepSeek and asked it to help me analyze the funny points.
After thinking about it, not only did it figure it out, but it also made a localized translation of "Golden Dalia", "Silver Dalia" and "Copper Dalia". It made me laugh.

Then I sent it a random picture taken by a colleague while driving. It was actually quite blurry, and only some information about appearance and lighting effects could be analyzed.

As a result, it guessed that the car was indeed a Subaru, and it took 13 seconds to think about it and came to the conclusion.

Considering that Teacher D is a math expert, we sent him another math-related meme. To be honest, Shichao almost didn’t understand it. It was his brother’s father-in-law.

Teacher D’s explanation is still perfect.
Not only did it understand simple operations, it even saw several homophones in it: taking the real part means removing the imaginary number "i", which means removing the "Eye", which means removing the eyes. The inverted triangle is the gradient, which is "Grad", which is almost the same as "Graduate", so I put a bachelor's hat on my little face.
Those who have forgotten their mathematical knowledge can review it word for word.

By the way, I also tested a few problems in life, such as where to insert this 3.5mm plug.

Where should I plug this square USB port?

Although it is very simple, it can understand my random shots when I am not in focus, and it can be considered competent for daily tasks.
But in fact, according to Shichao’s actual testing, Teacher D’s current version is not invincible.
For example, we gave it a picture, a very beautiful night view of the earth.

DeepSeek also saw it quite clearly and said that this photo came from the International Space Station.

But actually, if you turn over the photo and look at it, you will find This photo is a picture of the city under the sunset. This is an upside-down perspective...
Then I threw it to Gemini, a recognized multi-modal expert... and it really saw it. No, are you so strong even if you've lost your wits?

Still not able to make the king of multimodality try his best, Haji Whale.
Includes the recognition of some faces, and occasionally has troubles. For example, I threw a picture of a bean bag to it, and what it recognized for me was, well, Luo Xiang, the UP leader of station B.

There is also this classic optical illusion problem. The two balls are obviously not the same size, right? As a result, Teacher D thought about it and told me that the two balls were the same size.

But I also took a look at its thinking process. In fact, it had already seen that the ball on the right was bigger, but because it read the question carefully, it felt that this was an illusion given to it, so it chose to deceive itself and said that they were the same size. . Maybe the reinforcement learning is too strong.

The comprehensive evaluation can give you a duality of ghost and god. When you tamp, you tamp, and when you pull it, it is finished. .
But then again, DeepSeek has just grown eyes, so we still have to give it some time to adapt to this world.
Finally, the current AI giants battle has long passed the novice village stage where it only looked at running scores and text output capabilities.
Coding level, multi-modal capabilities, smoothness of calling tools, etc., are basically indispensable.
But the absence of the previous Big D teacher in multi-modal capabilities always made me feel a pity. It seems that everyone is humming and working, but DeepSeeK's Agent capabilities are greatly reduced because of the lack of arms and eyes.
After all, most current models and APIs are multi-modal, or at least have image input capabilities.

We also hope that DeepSeek can update the multi-modal capabilities of image recognition to the API of the new V4 model as soon as possible.
You know, before I was blindfolded, I had already fought many opponents back and forth. . Now take off the blindfold, the performance of tools such as Claude Code, Lobster, Cowork, etc. is expected to be greatly improved.
In addition, judging from the frequency of DeepSeek blowing bubbles to increase presence during this period, it is estimated that there are still a lot of combos waiting to be executed.
No more talk, let’s watch Teacher D’s performance.