Alibaba’s new open source image generation model, Qwen-Image-Layered, brings Photoshop-level layer understanding and image generation into the model itself for the first time. Built on an architecture developed in-house by the Qianwen team, the model can "decompose" a picture into multiple layers. Much as a professional designer renders and retouches layer by layer in Photoshop, it achieves almost "zero drift" precision in AI image editing, solving the consistency problem of AI-generated images and accelerating the practical adoption of large models in professional design.
Qwen-Image-Layered breaks with the "flat thinking" of mainstream large vision models. Through "layering" and "completion", the model builds a more accurate "physical understanding" of the real world, moving AI from flat "describing what it sees in a picture" to genuine "spatial reconstruction."

In the field of large vision models, consistent image editing has long been a core challenge. AI-generated pictures are creative but hard to edit, mainly because large models treat a picture as a flat surface, a mass of tightly coupled pixels, and cannot perceive physical relationships such as the distance between objects or which object occludes which, as humans do.
As a result, generating and editing images with a large model is like opening a blind box. Suppose you want to move the cat in a painting 10 centimeters to the left. The AI has no idea what the background behind the cat looks like, so it can only regenerate the whole image, and both the cat and the background change.
Because a small change to one element alters the whole picture, AI image generation can serve only as a reference in professional fields that demand extreme precision, such as commercial advertising design, UI design, and film and television post-production, and cannot truly replace professional tools.
The arrival of Qwen-Image-Layered marks a shift for large vision models from "pixel prediction" to "structural reorganization". The Qianwen team developed a new RGBA-VAE encoding that adds an alpha channel, which represents transparency, to the traditional RGB image, giving the model the concept of a layer.
The new model also adopts the VLD-MMDiT architecture, combined with "layer-level 3D position coding", which lets the AI automatically fill in the background texture of occluded regions and achieve a deeper understanding and generation of layers and space.
To train this ability, the Qianwen team reportedly extracted real layer logic from a large number of professional Photoshop (PSD) files, so that the AI has the "layered thinking" of a professional designer from the outset.
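To make the role of the alpha channel concrete, here is a minimal sketch of how a stack of RGBA layers can be flattened back into a single image with the standard "over" compositing operator. It is not the model's code or API: the names `over` and `flatten` and the tiny pixel-grid representation are assumptions chosen for illustration, and channels are assumed to be straight (non-premultiplied) floats in [0, 1].

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float, float]  # (R, G, B, A), each in [0, 1]
Layer = List[List[Pixel]]                  # rows of pixels


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Composite one straight-alpha pixel over another (Porter-Duff "over")."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels are fully transparent, so the color is undefined.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def flatten(layers: List[Layer]) -> Layer:
    """Flatten same-sized layers, ordered back to front, into one image."""
    result = [row[:] for row in layers[0]]
    for layer in layers[1:]:
        result = [
            [over(t, b) for t, b in zip(top_row, bottom_row)]
            for top_row, bottom_row in zip(layer, result)
        ]
    return result


# Illustrative 1x2 image: an opaque blue background and a foreground layer
# that is opaque red on the left and fully transparent on the right.
background: Layer = [[(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]]
foreground: Layer = [[(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.0, 0.0)]]
flat = flatten([background, foreground])
```

Where the foreground is transparent, the background shows through unchanged, which is why a layered representation needs the background to be complete even behind the foreground object.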

Qwen-Image-Layered model architecture diagram
Industry insiders say Qianwen’s new model will bring substantial change to the creative industry. An AI-generated image is no longer a single rigid picture but a living, endlessly adjustable library of material.
Image editing no longer requires painstaking manual cutouts. Instead, the AI natively provides "inherent editability." Designers, animators, and film and television producers can move, scale, or redraw the components on a specific layer while keeping the background or subject completely unchanged, significantly improving the efficiency of digital content creation.
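The kind of edit described above can be sketched in a few lines once an image already exists as separate layers. The example below is an illustration under stated assumptions, not the model's interface: `shift_layer` and `flatten_binary` are hypothetical helpers, pixels are 8-bit RGBA tuples, and alpha is simplified to a binary mask (0 or 255).

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int, int]  # (R, G, B, A) with 8-bit channels
Layer = List[List[Pixel]]          # rows of pixels
TRANSPARENT: Pixel = (0, 0, 0, 0)


def shift_layer(layer: Layer, dx: int, dy: int) -> Layer:
    """Return a copy of the layer moved by (dx, dy); vacated pixels become transparent."""
    height, width = len(layer), len(layer[0])
    shifted = [[TRANSPARENT] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                shifted[ny][nx] = layer[y][x]
    return shifted


def flatten_binary(layers: List[Layer]) -> Layer:
    """Flatten layers ordered back to front, treating alpha as a binary mask."""
    result = [row[:] for row in layers[0]]
    for layer in layers[1:]:
        for y, row in enumerate(layer):
            for x, pixel in enumerate(row):
                if pixel[3] > 0:
                    result[y][x] = pixel
    return result


GRASS: Pixel = (0, 160, 0, 255)
CAT: Pixel = (200, 120, 40, 255)

# The background layer is complete, including the region the cat used to hide.
background: Layer = [[GRASS, GRASS, GRASS]]
cat_layer: Layer = [[TRANSPARENT, TRANSPARENT, CAT]]

before = flatten_binary([background, cat_layer])
moved_cat = shift_layer(cat_layer, dx=-1, dy=0)  # move the cat one pixel left
after = flatten_binary([background, moved_cat])
```

Only the cat layer is touched. The background layer is reused as-is, so nothing has to be regenerated, and the area the cat vacated is already filled in.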
Qwen-Image-Layered has reportedly been open sourced on the ModelScope community and Hugging Face, where developers and enterprises can download it free of charge for commercial use.
To date, Alibaba has open sourced nearly 400 Qianwen models, with more than 700 million downloads worldwide and more than 180,000 derivative models, making Qianwen the world's number one open source model family. The Tongyi large model has served more than 1 million customers. Tongyi ranks first in China's enterprise-level market for large model API calls and is the large model most often chosen by Chinese enterprises.