Today, Alibaba officially released Qwen-Image-2.0, a new-generation image generation and editing model. As the image generation base model for Qianwen's large model, Qwen-Image-2.0 integrates image generation and editing. It scored 1029 points in the AI Arena image generation evaluation, surpassing models such as Seedream4.5 and Flux2-Max and trailing only Google Nano Banana Pro and GPT Image1.5.

Qwen-Image-2.0 supports ultra-long text input of up to 1K tokens and 2K high resolution. It can accurately render complex instructions and easily generate professional PPTs and infographics, with image quality comparable to the work of professional photographers. Qwen-Image-2.0 also has extremely strong Chinese character rendering capabilities: the full text of a classical work running to hundreds of characters can be rendered almost completely in a single image.

Qwen-Image-2.0 is a new upgrade built on the two major models Qwen-Image and Qwen-Image-Edit. For the first time, image generation and editing are unified in a single model. With a lighter model architecture, it greatly improves performance in both image generation and image editing.

The images generated by Qwen-Image-2.0 have a particularly delicate texture, from the wrinkles on an old man's face to the vastness of the universe. Common subjects such as people, nature, and buildings are rendered in an extremely lifelike way.

In the authoritative AI Arena evaluation, Qianwen's new model scored 1029 in image generation, ranking third, and 1034 in image editing, second only to Nano Banana Pro.

Qwen-Image-2.0 performs extremely well at Chinese character rendering. It not only renders Chinese characters accurately in a variety of fonts, but also renders large amounts of text accurately, with results better than those of Nano Banana Pro.

Qianwen's new model expands the input prompt to 1K tokens, so tasks can be described in detail. This enables more professional text rendering and makes it easy to handle complex images such as professional PPTs, high-end posters, and multi-panel comics. For example, it can render an illustrated full text of "The Preface to the Lanting Collection", several hundred characters long, almost completely in small regular script, and it can generate complex PPTs with paper-style illustrations from natural language.

With the Qwen-Image-2.0 model, users can collaborate with AI to create richer and more practical images. From a single sentence, it can generate, for example, a flowchart for making Kung Pao Chicken, a two-day travel guide to Hangzhou, a 4x6 multi-panel comic, a children's picture book illustration, a realistic-style movie poster, or an extremely lifelike green jungle.

Users can also upload several images for editing to generate selfies with multiple gestures, stickers featuring real people, realistic AI photos of two people, illustrated poems, and more.