Naver has officially announced that it will remove the visual encoder from Chinese company Alibaba's Qwen 2.5 that is used in its AI model and replace it entirely with one developed in-house. Naver Cloud completed development of its own visual encoder early last month and has begun integrating it internally, with plans to eventually apply it to all of its multimodal models.
Naver said that the new encoder is a substantially improved version of its proprietary "VUClip" technology, and that its performance is comparable to that of the world-leading Qwen model.
A visual encoder is the module in a multimodal AI model that converts image and video information into a data format the model can process, and is often called the model's "optic nerve."
At the beginning of the year, while participating in the Korean government-led independent AI foundation model project, Naver drew controversy for partially using the visual encoder from Alibaba's Qwen 2.5 in its HyperCLOVA X SEED 32B Sync model.
On January 15, South Korea's Ministry of Science and ICT announced the results of the first round of evaluation. Naver Cloud was eliminated outright for insufficient model originality and technical independence, and NC AI was eliminated as well.
At the time, Naver argued that "the visual encoder can be replaced at any time and is not an irreplaceable core component."
Four months later, Naver has launched its new encoder. Its main selling point is that it was designed around Korean from the training stage onward, linking images directly to Korean without passing through an intermediate translation layer.
A Naver Cloud official emphasized that, when processing visual data involving Korean geography, culture, or proper nouns, the new encoder avoids the distortion of information that can occur during translation.
However, no plan has yet been decided for replacing the encoder in the HyperCLOVA X SEED 32B Sync model, which has already been released as open source.
