On March 20, the AI programming tool Cursor released its in-house model Composer 2, describing it as the company's first result of "continued pre-training combined with reinforcement learning" on a base model. The release blog did not say where the base model came from, and the wording implied that Cursor had built it itself.
Less than two hours later, a developer named Fynn intercepted Composer 2's real model ID, kimi-k2p5-rl-0317-s515-fast, while debugging the Cursor API. Broken down, kimi-k2p5 points to Kimi K2.5 and rl stands for reinforcement learning, followed by a date and a version number.
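The reading of the model ID above can be sketched in a few lines of Python. This is only an illustration of how the string decomposes: the function name `parse_model_id` and the field names are hypothetical, the date is assumed to be in MMDD form, and the meaning of the trailing `fast` segment is not stated in the reporting.

```python
def parse_model_id(model_id: str) -> dict:
    """Split a hyphen-separated model ID into labelled parts (illustrative only)."""
    family, variant, stage, date, build, *rest = model_id.split("-")
    # "k2p5" reads as "K2.5": the "p" stands in for the decimal point.
    version = variant.upper().replace("P", ".")
    return {
        "base_model": f"{family.capitalize()} {version}",
        "training_stage": "reinforcement learning" if stage == "rl" else stage,
        "date_mmdd": f"{date[:2]}-{date[2:]}",  # assumption: MMDD
        "build": build,                          # assumption: a version tag
        "suffix": rest,                          # meaning not given in the article
    }


info = parse_model_id("kimi-k2p5-rl-0317-s515-fast")
print(info["base_model"], "|", info["training_stage"], "|", info["date_mmdd"])
```

Nothing here proves provenance on its own; it only shows why the ID was read as pointing to Kimi K2.5.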

Du Yulun, head of pre-training at Moonshot AI (Dark Side of the Moon), the company behind Kimi, tweeted immediately that the team had tested Composer 2's tokenizer and found it "completely consistent" with Kimi's, which all but confirmed that "this is a further fine-tuning of our model." He directly tagged Cursor co-founder Michael Truell, asking why the license had not been followed and no fees had been paid. The tweet was later deleted.

But the fire was already burning. Musk replied "Yeah, it's Kimi 2.5" to Fynn's tweet, which pushed the matter straight onto the trending lists.

From "reskinning" to "cooperation": the reversal took only a few hours
Kimi K2.5 is released under a modified MIT license, which explicitly requires commercial products with monthly revenue above US$20 million or more than 100 million monthly active users to display "Kimi K2.5" prominently in the user interface. Cursor's annual revenue is approximately US$2 billion, or roughly US$167 million a month, more than 8 times the revenue threshold.
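The threshold comparison is simple arithmetic, sketched below. The two thresholds and the US$2 billion annual figure come from the text; the function name `requires_attribution` is hypothetical, and Cursor's monthly active users are not given in the article, so the sketch passes a placeholder of zero and relies on the revenue test alone.

```python
# Thresholds as described in the article's summary of the modified MIT license.
MONTHLY_REVENUE_THRESHOLD_USD = 20_000_000
MAU_THRESHOLD = 100_000_000


def requires_attribution(monthly_revenue_usd: float, monthly_active_users: int) -> bool:
    """True if either threshold is exceeded (illustrative reading of the clause)."""
    return (monthly_revenue_usd > MONTHLY_REVENUE_THRESHOLD_USD
            or monthly_active_users > MAU_THRESHOLD)


cursor_annual_revenue_usd = 2_000_000_000          # approximate, from the article
cursor_monthly_revenue_usd = cursor_annual_revenue_usd / 12
ratio = cursor_monthly_revenue_usd / MONTHLY_REVENUE_THRESHOLD_USD

# MAU is unknown here, so 0 is a placeholder, not a reported figure.
must_attribute = requires_attribution(cursor_monthly_revenue_usd, 0)
print(f"monthly revenue is about {ratio:.2f}x the threshold; attribution required: {must_attribute}")
```

Averaging annual revenue evenly over twelve months is itself a simplification, but the gap is wide enough that the conclusion does not depend on it.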
But on the same day the controversy was building, the plot reversed. Moonshot AI's official account, @Kimi_Moonshot, posted a message that shifted its tone from questioning to congratulation, saying "We are proud to see Kimi K2.5 providing the basis for Composer 2" and clarifying that Cursor had been authorized to use the model through the inference service provider Fireworks AI.

Cursor co-founder Aman Sanger later explained that the team had evaluated multiple base models and found Kimi K2.5 to be the "strongest", and had then carried out continued pre-training and reinforcement learning at 4 times the scale on top of it. He admitted that not mentioning Kimi K2.5 in the launch blog was a mistake.

From an open source license dispute to an officially announced cooperation, the entire process took less than 24 hours.
Why Cursor made the "mistake"
This is not the first time Cursor has been found to be building on a "base from China". When Composer 1 was released in November 2025, the community speculated from tokenizer analysis that it was highly consistent with DeepSeek, and it would occasionally output Chinese during inference. Cursor did not respond at that time either.
From DeepSeek to Kimi, the base of Cursor's in-house models has changed once, but both choices point to the same fact: the strongest openly available base models for programming come from China's open source community.
Behind Cursor's reluctance to disclose the source of its base lies a deeper structural problem. Cursor has always relied on models from Anthropic and OpenAI to power its product, but both companies are now building programming tools themselves. Claude Code and Codex are spreading rapidly, and many developers have begun to migrate. The paradox Cursor faces is that it must rely on top models to meet user needs, yet the model makers are also its direct competitors. Without a model base it controls, Cursor will always be at the mercy of others.
From this perspective, fine-tuning a Chinese open source model is an almost inevitable choice: the model is strong enough, and its maker will not become a competitor. But this is also why Cursor is unwilling to talk about it publicly. In 2025 it was the hottest star in the AI programming track, with a valuation of US$29.3 billion. On March 12, Bloomberg reported that the target valuation for its new financing round is about US$50 billion. Admitting at this juncture that the core model comes from the Chinese open source community does not help the valuation narrative.
Composer 2 scored 61.3 points on CursorBench, a benchmark designed by Cursor, surpassing Claude Opus 4.6's 58.2 points, though this is a report card on an exam Cursor both set and graded. On the other hand, if a fine-tuned product built on an open source model can compete with the giants on programming tasks, that fact may be more interesting than Cursor's disclosure mistake. Clément Delangue, co-founder of Hugging Face, commented, "China's open source is now the biggest force shaping the global AI technology stack."
For Moonshot AI, the controversy turned out to be an almost perfect brand event. It went from "infringed party" to "partner" and gained visibility across the global developer community. In the end, Cursor itself confirmed that "Kimi K2.5 was chosen because it is the strongest."
Kimi's "Golden Week"
Going back a few days, Kimi had just been through an unusually dense stretch of exposure.
On March 16, Moonshot AI released a purely architecture-level technical paper, "Attention Residuals", which tries to replace the residual connection, a basic component of the Transformer architecture that has gone almost untouched since ResNet in 2015. In a standard residual connection, each layer's output is simply added to its input and passed on without distinction. The Kimi team instead lets each layer "look back" and dynamically select which previous layers to draw information from. Experiments show that training efficiency improves by about 25%, while inference latency increases by less than 2%. One of the paper's co-authors is a 17-year-old high school student from Shenzhen, listed alongside Kimi's key researchers Su Jianlin and Zhang Yu.
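The contrast between the two schemes can be sketched in plain Python. This is a toy illustration of the idea as described above, not the paper's formulation: the softmax-over-dot-products weighting, the `query` vector (imagined as a learned per-layer parameter), and the function names are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def standard_residual(history):
    """Classic residual stream: every earlier output is summed with weight 1."""
    dim = len(history[0])
    return [sum(h[d] for h in history) for d in range(dim)]


def attention_residual(history, query):
    """Mix earlier outputs with softmax(query . output) weights (assumed form)."""
    scores = [sum(q * x for q, x in zip(query, h)) for h in history]
    weights = softmax(scores)
    dim = len(history[0])
    mixed = [sum(w * h[d] for w, h in zip(weights, history)) for d in range(dim)]
    return mixed, weights


# Toy outputs of three earlier layers, each a 2-dimensional vector.
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
summed = standard_residual(history)
mixed, weights = attention_residual(history, query=[0.0, 2.0])
print("uniform sum:", summed)
print("attention weights:", [round(w, 3) for w in weights])
```

With this query, the layers whose outputs align with it receive most of the weight, whereas the standard residual treats all three identically. A zero query reduces the attention version to a plain average of the earlier outputs.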

On the night the paper was released, Musk commented on X, "Impressive work from Kimi", and Kimi's official account replied, "Your rocket is not bad either." Andrej Karpathy said, "It seems that we haven't understood the phrase 'Attention is All You Need' literally." Jerry Tworek, former VP of reinforcement learning at OpenAI, called it the beginning of "deep learning 2.0".
The next day, March 17, Jensen Huang mentioned Chinese open source models several times in his GTC 2026 keynote. Kimi K2.5 replaced last year's DeepSeek as the benchmark model Huang used to demonstrate the importance of inference to the world.
On March 18, Yang Zhilin gave a talk in person at a GTC sub-forum. He was the only representative of an independent large-model startup on the guest list, appearing alongside Tesla's AI director and a core DeepMind architect, and the room was packed. He systematically laid out the technical route behind Kimi K2.5 and summarized model evolution along three dimensions: token efficiency, long context, and agent clusters.
Previously, the Chinese open source model team that featured most at GTC had been DeepSeek.
The paper, GTC, and Cursor: three events landed one after another within a week, and each of these highlights carries the sense of a changing of the guard from DeepSeek. In the past, it was every DeepSeek paper that the global technology community and its KOLs sought out and reposted. In the past, GTC was almost an "unofficial" DeepSeek launch event. Even the model Cursor "quietly hid" used to be DeepSeek. In an instant, all of it became Moonshot AI's Kimi.
Standing in DeepSeek’s shoes
This has made many people realize that Kimi is taking over the position DeepSeek held in the global AI community.
The breakout of DeepSeek R1 in early 2025 reshaped the entire industry's perception, turning "Chinese AI" from a vague concept into specific, usable model weights. Since then, however, DeepSeek has been relatively quiet. The long-awaited V4/R2 has not been released. V3.1, V3.2 and other versions have kept arriving, but the "rewrites the rules the moment it launches" impact has not been reproduced so far.
Kimi stepped into exactly this window.
After the Spring Festival in 2025, Kimi's daily active users came under pressure. Moonshot AI cut a large part of its marketing budget and went heads-down on models. In July, Kimi K2 was released with a trillion-parameter MoE architecture. On its first day, K2's downloads on Hugging Face exceeded those of every other model on the platform. Anthropic co-founder Jack Clark described it as "the best open source weight model in the world."
At the end of January 2026, K2.5 was released with native multimodality and an agent cluster architecture, and it posted the best results among open source models worldwide in multiple agent evaluations. When the OpenClaw craze arrived, Kimi Claw quickly went online. According to reports, less than a month after K2.5's release, Kimi's cumulative revenue over a 20-day stretch exceeded its revenue for all of 2025. Stripe data shows that paid orders from individual Kimi subscribers rose 8,280% month-on-month in January.
The pace on the capital side is also accelerating: a US$500 million Series C at the end of 2025 at a post-money valuation of US$4.3 billion; a round of more than US$700 million in February 2026 that lifted the valuation to US$10 billion; and, in mid-March, a new US$1 billion round under way at a valuation of US$18 billion. Zhipu and MiniMax, which listed in Hong Kong during the same period, reached market values of around HK$330 billion and HK$380 billion respectively in mid-March. Moonshot AI has not yet entered the secondary market. Judging from the current premium of the AI sector, there is not much room for imagination after listing.

In this way, Kimi used DeepSeek's own playbook to take DeepSeek's halo.
The architecture of Kimi K2 is directly derived from DeepSeek V3. The MLA attention mechanism and the mixture-of-experts (MoE) framework were both pioneered or validated at scale by DeepSeek first, so Kimi's rise is itself a continuation of DeepSeek's technical influence. DeepSeek's open source strategy is also more thorough: it uses a pure MIT license with no revenue threshold, which has given it very high penetration in the global developer ecosystem. Kimi's modified MIT license adds a layer of restrictions on commercial use, and the Cursor incident is an example of that.
During DeepSeek's relatively quiet period, Kimi took over the microphone as the "representative of Chinese open source AI". Whether on Jensen Huang's stage, as Cursor's model base, or in academic papers and developer communities, Kimi is filling a narrative space that demands a constant supply of fresh content.
Moreover, Kimi is doing more than shipping models. The Attention Residuals paper touches an underlying structure of deep learning that has not changed substantially in ten years. This follows the same approach as DeepSeek's MLA before it: both attempt to redefine the industry's infrastructure.
The story of Chinese open source AI is changing from "the story of one DeepSeek" into one where new players keep emerging to take the halo. This increasingly resembles the rhythm of Silicon Valley: OpenAI is followed by Google, Google by Anthropic, and then the cycle continues.
China's open source models are taking turns occupying the timelines of developers around the world. As model capabilities spiral upward, the spotlight keeps moving too. When DeepSeek's new model appears, will it take the attention back from Kimi? Will new releases from MiniMax, Qwen, Zhipu, StepFun, or the equally aggressive newcomer Xiaomi suddenly displace the current leaders? All of this keeps the spiral turning, and that is a good thing for every participant in Chinese AI.