Image generation has become a key driver of AI app growth

Recent app data shows that in 2026, the main force behind mobile AI app download growth has shifted away from ever "smarter" large models and toward visual features built around image generation. According to Appfigures, version updates that market a new image model bring in about 6.5 times as many new downloads as "regular updates" focused on language or reasoning upgrades.

This marks a clear shift in the focus of the AI wave. Early on, what drew users to try AI apps was mainly the iteration of conversational models and improvements to interaction modes such as voice. These features still matter, but they no longer spark the sharp, short-term spikes in user interest they once did. By contrast, features that directly produce shareable visual content are far more likely to draw attention on social media and in app stores.

Recent launches from several leading platforms bear this out. After Google's Gemini app launched its image model Nano Banana (Gemini 2.5 Flash Image), installs jumped sharply: within 28 days of the launch, the app added more than 22 million new downloads, a pace roughly four times its previous average for a comparable period. This suggests that even when the underlying model changes are modest, a new and visible way to play with images can be enough to move the download curve in the short term.

OpenAI's ChatGPT saw similar growth after integrating GPT‑4o image generation. In the first 28 days after the feature launched, the app recorded more than 12 million new installs. Appfigures' comparison puts this download peak at about 4.5 times the lift from model launches such as GPT‑4o, GPT‑4.5, and GPT‑5, further evidence that for most new users, "visible" image features are more compelling than text-performance gains that are hard to perceive directly.

This pattern of visually driven growth is not limited to static images. Meta's Vibes, a feed of AI-generated short-form videos, brought the app about 2.6 million additional downloads in its first month after launching in September 2025. Although it centers on video, it is at heart still a visual AI tool built for fast production and easy sharing. Together with image generation, it points in the same direction: using more immediate visual feedback to shorten the user's path from curiosity to sharing.

However, a surge in downloads does not automatically translate into revenue. The data also reveal a clear gap between growth and monetization. Take Gemini: although Nano Banana delivered strong install numbers in the 28 days after its release, it contributed only about US$181,000 in estimated consumer spending over the same period. Meta's Vibes likewise drove impressive install numbers but shows little sign of driving corresponding revenue growth. This suggests that for most products, image features currently act more as a customer-acquisition tool than as a direct monetization engine.

On this front, ChatGPT is one of the few exceptions that "break the curse." Its GPT‑4o image model not only brought in a large number of new users but also significantly increased paid conversions: within 28 days of the feature going live, the app's estimated user spending was about $70 million above baseline. This shows that image features can take on both jobs at once, attracting new users and generating revenue, but only if their role and pricing within the product are clear enough that users are willing to pay for them rather than treating them as a free "toy filter."

Not every breakout AI product relies on image capabilities for growth. DeepSeek's R1 model, released in January 2025 without prominent image or video features, still drove about 28 million downloads in a short period. That surge, however, came mainly from industry attention and buzz, especially the widespread discussion in tech circles of its low-cost training approach and related technical choices, rather than from any generative visual feature.

Even so, the overall data point to a clear trend: on mobile, visual AI features are becoming the first point of contact for a large number of users trying an AI app. For ordinary users, images and short videos that can be generated quickly and shared immediately are often more appealing than abstract "reasoning enhancements" and "model upgrades." Advances in underlying model capabilities still matter, but they are increasingly hidden in the background. What ultimately decides whether users download, try, or even recommend an app is often a set of visible, easily shared image and video features.