In just four months, Manus went from global fame and a successful funding round to reports of deleted social media posts, layoffs, and a move to Singapore, a compressed lesson in what building a startup on an emerging track looks like. Critics argue that Manus set a bad precedent: build the product with Chinese engineering talent, raise money quickly, then lay off staff and leave. Amid the controversy, co-founder Ji Yichao made a rare public statement in the early hours of this morning, publishing a blog post of several thousand words in an attempt to steer the discussion back to the product and the technology. It is also the first time he has publicly shared the key lessons behind these ups and downs.

From explosion to controversy in four months

First, a brief recap. In March this year, Manus shot to fame with its pitch as "the world's first general-purpose agent". At the time, some called it China's "second DeepSeek moment."

In May, Manus quickly closed a $75 million Series B round led by Benchmark, a top Silicon Valley venture capital firm, pushing its valuation to $500 million. Expectations from the outside world were extremely high.

However, at the end of June, the media reported a series of controversies at Manus: some employees said they had been laid off without warning, the founding team deleted large numbers of their posts on social platforms, and the company relocated its corporate entity to Singapore, prompting a public outcry.

For a while, "deleted posts, layoffs, and running away" became the main labels attached to this star agent startup.

The co-founder publishes a long post in the early hours

Faced with outside doubts, Ji Yichao chose to respond with a long technical article, systematically laying out for the first time the team's core understanding of agent products and technology:

1. Choose context engineering over training an end-to-end model in-house. At his previous company, the Manus co-founder trained NLP models from scratch, only to see them made obsolete by large models such as GPT-3. After that experience, the team chose not to develop the underlying model themselves, but to focus on "context engineering" on top of open-source or commercial large models to get the most out of their existing capabilities.

2. KV cache hit rate is the core metric of an agent system. Multi-turn agents differ from single-turn chat: the input-to-output ratio can be as high as 100:1, and long inputs significantly affect latency and inference cost. The goal of context design is to maximize the KV cache hit rate, which requires a stable prompt, a context that is appended to but never modified, and a reusable prefix.

3. Manage tools by masking, not by dynamically adding and removing them. As an agent gains functions, its action space expands rapidly, making it easier for the model to choose the wrong action. Dynamically adding or removing tools also invalidates the cache. Manus instead uses a context-aware state machine to manage tool availability: by masking token probabilities rather than removing tools from the context, it keeps both flexibility and the cache.

4. Treat the file system as unlimited context. However large a model's context window is, it is finite, and very long contexts slow down inference and increase cost. Manus's approach is to treat the file system as the agent's external memory. Information can be accessed at any time, so historical state can be inspected, read, written, and restored.

5. Use an explicit "recitation" mechanism to steer the model's attention. In long tasks, Manus automatically generates a todo.md file, breaks the task down into an executable checklist, and keeps updating it. This repeatedly writes the goal to the end of the context, which amounts to "constantly reminding the model" so that the task does not go off track midway.

6. Do not erase errors; keep failure information so the model can self-correct. An agent is bound to make mistakes. Instead of hiding errors and starting over, it is better to leave the failure in the context and let the model "see" the failed path as a negative example, which reduces similar errors.

7. To sum up in one sentence: context engineering is an emerging experimental science. Manus wants to use context to shape the behavior and capabilities of agents: the contest is not over whose model is smartest, but over who can make the model most useful.

Beyond the retrospective, the controversy has not subsided

The blog post shows that Manus is not merely a "PPT project". The team has done a great deal of low-level exploration for agent scenarios and has worked through many pitfalls along the way.

But the long article does not address the questions the outside world cares about most: Why did the company move to Singapore? What happens to the employees laid off in China?

Ji Yichao neither answered these questions nor mentioned them in his blog post.

Ji Yichao wrote at the end: "The agentic future will be built one context at a time. Engineer each one well."

The practical question now is whether Manus still has a chance to bring these "contexts" out of technical documents and back to real users.

Nothing is settled yet.

Blog post link:

https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus


The following is the original blog post by Manus co-founder Ji Yichao (translated by GPT):

Context engineering for AI agents: Lessons learned from building Manus

July 18, 2025 Ji Yichao

At the very beginning of the Manus project, my team and I faced a key decision: should we train an end-to-end agent model on top of an open-source foundation model, or build an agent on the in-context learning abilities of frontier models?

Thinking back to my first ten years in natural language processing, we didn't have that choice. In the distant days of BERT (yes, that was seven years ago), models had to be fine-tuned and evaluated before they could transfer to a new task. Even though models then were tiny compared with today's LLMs, each iteration often took weeks. For fast-moving applications, especially before product-market fit, such a slow feedback loop is fatal. I learned this the hard way at my last startup, where I trained models from scratch for open information extraction and semantic search. Then GPT-3 and Flan-T5 arrived, and my in-house models became irrelevant overnight. Ironically, those same models marked the beginning of in-context learning, and opened up a whole new path forward.

This hard-won lesson made the choice clear: Manus would bet on context engineering. This lets us ship improvements in hours rather than weeks, and keeps our product orthogonal to the underlying models: if model progress is the rising tide, we want Manus to be the boat, not a pillar fixed to the seabed.

However, context engineering is far from simple. It is an experimental science: we have rebuilt our agent framework four times, each time after discovering a better way to shape context. We affectionately call this manual process of architecture search, prompt tuning, and empirical guesswork "stochastic gradient descent." It's not elegant, but it works.

This post shares the local optima we reached through our own "SGD". If you are building your own AI agent, we hope these principles help you converge faster.

Design around the KV cache

If I had to choose just one metric, I would argue that the KV cache hit rate is the most important one for a production-stage AI agent. It directly affects both latency and cost. To understand why, let's first look at how a typical agent works:

After receiving user input, the agent completes the task through a series of tool calls. In each iteration, the model selects an action from a predefined action space based on the current context. The action is then executed in an environment (such as Manus' virtual machine sandbox) to produce observations. Actions and observations are appended to the context, forming the input for the next iteration. This cycle continues until the task is completed.
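The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Manus's implementation: run_agent, choose_action, and the tools dictionary are hypothetical names, and a scripted function stands in for the model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type for illustration: an action is (tool_name, argument).
Action = Tuple[str, str]


def run_agent(
    user_input: str,
    choose_action: Callable[[List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> List[str]:
    """Run the select-execute-append loop until the policy emits 'finish'."""
    context = [f"user: {user_input}"]
    for _ in range(max_steps):
        name, arg = choose_action(context)   # the model picks from the action space
        context.append(f"action: {name}({arg})")
        if name == "finish":
            break
        observation = tools[name](arg)       # execute the action in the environment
        context.append(f"observation: {observation}")
    return context


def scripted_policy(context: List[str]) -> Action:
    """Toy stand-in for the model: call one tool, then finish."""
    return ("echo", "hello") if len(context) == 1 else ("finish", "done")


trace = run_agent("say hello", scripted_policy, {"echo": lambda s: s.upper()})
for line in trace:
    print(line)
```

Note that the context only ever grows: every action and observation is appended, which is what makes the input so much longer than the output.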

As you can imagine, the context grows with every step, while the output, usually a structured function call, stays relatively short. This makes the ratio of prefill to decoding far more skewed in agents than in chatbots. In Manus, for example, the average input-to-output token ratio is approximately 100:1.

Fortunately, contexts that share an identical prefix can take advantage of the KV cache, which greatly reduces time to first token (TTFT) and inference cost, whether you use a self-hosted model or call an inference API. The savings are not small: with Claude Sonnet, for example, cached input tokens cost $0.30 per million tokens, compared with $3 per million for uncached ones, a 10x difference.
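To make the effect of the hit rate concrete, here is a small back-of-the-envelope calculation. It uses only the 10x price gap quoted above and deliberately ignores output tokens and any charge a provider may add for writing to the cache; relative_input_cost is a hypothetical helper name.

```python
def relative_input_cost(hit_rate: float, cached_discount: float = 0.1) -> float:
    """Input cost relative to a fully uncached request (1.0 means no savings).

    cached_discount is the price of a cached token divided by the price of an
    uncached one; 0.1 corresponds to the 10x gap mentioned in the text.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) + hit_rate * cached_discount


for rate in (0.0, 0.5, 0.9):
    print(f"hit rate {rate:.0%}: {relative_input_cost(rate):.2f}x the uncached cost")
```

Under these assumptions, a 90% hit rate brings the input bill down to roughly a fifth of the uncached cost, which matters a great deal when inputs outnumber outputs 100 to 1.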


From a context engineering perspective, improving KV cache hit rate involves several key practices:

Keep your prompt prefix stable. Because LLMs are autoregressive, a difference of even a single token invalidates the cache from that token onward. A common mistake is to include a timestamp at the beginning of the system prompt, especially one that is precise to the second. It lets the model tell you the current time, but it also significantly reduces the cache hit rate.

Make your context append-only. Avoid modifying previous actions or observations. Make sure your serialization is deterministic. Many programming languages and libraries do not guarantee a stable key order when serializing JSON objects, which can silently break the cache.

Mark cache breakpoints explicitly when needed. Some model providers and inference frameworks do not support automatic incremental prefix caching and instead require cache breakpoints to be inserted manually in the context. When placing these breakpoints, account for possible cache expiration, and at a minimum make sure the breakpoint includes the end of the system prompt.

Additionally, if you self-host models with a framework such as vLLM, make sure prefix/prompt caching is enabled, and use techniques such as session IDs to route requests consistently across distributed workers.
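The first two practices can be illustrated with the Python standard library alone. The sketch below is an assumption-laden toy: AppendOnlyContext, serialize_event, and the event fields are hypothetical, and a real agent would hand the rendered text to a tokenizer and model. The point is that sorted keys plus fixed separators give byte-identical output regardless of dictionary insertion order, and that each new render extends the previous one as a strict prefix.

```python
import json
from typing import Any, Dict, List

# No timestamp in the system prompt, so the prefix stays byte-identical.
SYSTEM_PROMPT = "You are a helpful agent."


def serialize_event(event: Dict[str, Any]) -> str:
    """Serialize deterministically: sorted keys and fixed separators."""
    return json.dumps(event, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


class AppendOnlyContext:
    """A context that can only grow; earlier entries are never rewritten."""

    def __init__(self, system_prompt: str) -> None:
        self._parts: List[str] = [system_prompt]

    def append(self, event: Dict[str, Any]) -> None:
        self._parts.append(serialize_event(event))

    def render(self) -> str:
        return "\n".join(self._parts)


ctx = AppendOnlyContext(SYSTEM_PROMPT)
ctx.append({"tool": "browser_open", "args": {"url": "https://example.com"}})
before = ctx.render()
ctx.append({"args": {"q": "kv cache"}, "tool": "browser_search"})
after = ctx.render()

# The earlier render is a strict prefix of the later one, so a prefix cache can be reused.
print(after.startswith(before))
```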

Mask, don't remove

As your agent gains capabilities, its action space naturally grows more complex; put simply, the number of tools explodes. The recent popularity of MCP only adds fuel to the fire. If you allow user-configurable tools, trust me: someone will inevitably plug hundreds of obscure tools into your carefully curated action space. As a result, the model is more likely to select the wrong action or take an inefficient path. In short, your heavily armed agent gets dumber.

A natural reaction is to design a dynamic action space, perhaps loading tools on demand with something RAG-like. We tried this in Manus too. But our experiments point to a clear rule: unless absolutely necessary, avoid dynamically adding or removing tools mid-iteration. There are two main reasons:

In most LLMs, tool definitions sit near the front of the context after serialization, typically just before or after the system prompt. Any change there therefore invalidates the KV cache for all subsequent actions and observations.

The model gets confused when previous actions and observations still refer to tools that are no longer defined in the current context. Without constrained decoding, this often results in schema violations or hallucinated actions.

To solve this while still improving action selection, Manus uses a context-aware state machine to manage tool availability. Rather than removing tools, it masks token logits during decoding to prevent (or enforce) the selection of certain actions based on the current context.


In practice, most model providers and inference frameworks support some form of response prefill, which lets you constrain the action space without modifying the tool definitions. Function calling generally has three modes (we use NousResearch's Hermes format as an example):

Auto - The model may choose whether or not to call a function. Implemented by prefilling only the reply prefix: <|im_start|>assistant

Required - The model must call a function, but the choice is unconstrained. Implemented by prefilling up to the tool call tag: <|im_start|>assistant<tool_call>

Specified - The model must call a function from a specific subset. Implemented by prefilling up to the beginning of the function name: <|im_start|>assistant<tool_call>{"name": "browser_

With this approach, we constrain action selection by masking token logits directly. For example, when the user provides new input, Manus must reply immediately rather than take an action. We also deliberately gave action names consistent prefixes: all browser-related tools start with browser_, and command-line tools start with shell_. This lets us easily ensure that the agent chooses only from a given group of tools in a particular state, without needing a stateful logits processor.
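The masking idea can be sketched as follows. This is a simplification under stated assumptions: real decoders mask logits token by token, whereas this toy scores whole action names, and STATE_PREFIXES, mask_logits, and pick_action are hypothetical names rather than anything from Manus or an inference library. What it does show is that the tool list itself never changes; only the scores of disallowed actions are pushed to negative infinity.

```python
import math
from typing import Dict, Tuple

# Hypothetical state machine: each state lists the tool-name prefixes it allows.
STATE_PREFIXES: Dict[str, Tuple[str, ...]] = {
    "browsing": ("browser_",),
    "terminal": ("shell_",),
    "any_tool": ("browser_", "shell_"),
}


def mask_logits(logits: Dict[str, float], state: str) -> Dict[str, float]:
    """Set disallowed actions to -inf; the tool definitions stay untouched."""
    allowed = STATE_PREFIXES[state]
    return {
        name: (score if name.startswith(allowed) else -math.inf)
        for name, score in logits.items()
    }


def pick_action(logits: Dict[str, float]) -> str:
    """Greedy selection over the (possibly masked) scores."""
    return max(logits, key=lambda name: logits[name])


raw = {"browser_open": 1.2, "browser_click": 0.4, "shell_exec": 2.5}
chosen_unmasked = pick_action(raw)
chosen_browsing = pick_action(mask_logits(raw, "browsing"))
print(chosen_unmasked, chosen_browsing)
```

Because the names share prefixes, a single prefix check per state is enough to carve out a group of tools, which mirrors the reason given above for the browser_ and shell_ naming convention.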

These designs help ensure that the Manus agent loop remains stable—even under model-driven architectures.

Use file system as context

Modern frontier LLMs now offer context windows of 128K tokens or more. But in real-world agentic scenarios that is often not enough, and sometimes it is even a liability. There are three common pain points:

Observations can be huge, especially when the agent interacts with unstructured data such as web pages or PDFs, so it is easy to blow past the context limit.

Model performance tends to degrade beyond a certain context length, even if the window technically supports it.

Long inputs are expensive, even with prefix caching. You still pay to transmit and prefill every token.

To solve this problem, many agent systems implement context truncation or compression strategies. But excessive compression inevitably leads to information loss. The problem is fundamental: the agent essentially has to predict its next move based on all previous states—and you can't reliably predict which observation might become critical ten steps later. From a logical perspective, any irreversible compression is risky.

This is why we treat the file system as the ultimate context in Manus: unlimited in size, persistent by nature, and directly operable by the agent itself. The model learns to write to and read from files on demand, using the file system not just as storage but as structured, externalized memory.


Our compression strategies are always designed to be recoverable. For example, web page content can be removed from context as long as the URL is preserved, and document content can be omitted as long as the document path remains in the sandbox. This allows Manus to shorten the context length without permanently losing information.
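A recoverable compression step might look like the sketch below. It is only an illustration: the observation fields (id, url, content), the character limit, and both function names are assumptions, and a temporary directory stands in for the sandbox. Large content is written to a file and replaced in the context by a short pointer, and the original can be read back at any time.

```python
import tempfile
from pathlib import Path
from typing import Dict


def compress_observation(obs: Dict[str, str], sandbox: Path, limit: int = 200) -> Dict[str, str]:
    """Move large content to a file and keep only a pointer in the context."""
    content = obs["content"]
    if len(content) <= limit:
        return obs
    path = sandbox / f"{obs['id']}.txt"
    path.write_text(content, encoding="utf-8")
    return {
        "id": obs["id"],
        "url": obs["url"],    # the URL is preserved, so the page can be fetched again
        "path": str(path),    # the file path is preserved, so the content can be re-read
        "content": f"[omitted, {len(content)} chars]",
    }


def restore_observation(obs: Dict[str, str]) -> Dict[str, str]:
    """Reload the full content from the file if it was compressed."""
    if "path" not in obs:
        return obs
    return {**obs, "content": Path(obs["path"]).read_text(encoding="utf-8")}


with tempfile.TemporaryDirectory() as tmp:
    page = {"id": "obs-1", "url": "https://example.com", "content": "x" * 5000}
    small = compress_observation(page, Path(tmp))
    restored = restore_observation(small)
    round_trip_ok = restored["content"] == page["content"]

print(small["content"], round_trip_ok)
```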

While developing this feature, I found myself imagining what it would take for a state space model (SSM) to work effectively in an agentic setting. Unlike Transformers, SSMs lack full attention and struggle with long-range backward dependencies. But if they could master file-based memory, externalizing long-term state instead of holding it in context, their speed and efficiency might unlock a new class of agents. Agentic SSMs could be the true successors to Neural Turing Machines.

Manipulate attention through recitation

If you have used Manus, you may have noticed an interesting phenomenon: when dealing with complex tasks, it tends to create a todo.md file, and gradually updates it as the task progresses, checking the completed items.

This isn’t just cute behavior—it’s a mechanism for intentionally manipulating attention.


A typical task in Manus requires about 50 tool calls on average. That is a long loop, and because Manus relies on LLMs for decision-making, it is prone to drifting off topic or forgetting earlier goals, especially in long contexts or complex tasks.

By constantly rewriting the to-do list, Manus recites its objectives at the end of the context. This pushes the global plan into the model's recent attention span, avoiding "lost in the middle" issues and reducing goal drift. In effect, it uses natural language to bias its own focus toward the task objective, with no special architectural changes required.
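A minimal sketch of recitation, assuming a to-do list represented as (text, done) pairs; render_todo and recite are hypothetical helpers, not Manus's code. The updated checklist is appended as a new entry at the end of the context rather than edited in place, which keeps the approach compatible with an append-only context.

```python
from typing import List, Tuple

TodoItem = Tuple[str, bool]  # (description, done?)


def render_todo(items: List[TodoItem]) -> str:
    """Render the plan as a markdown checklist, in the spirit of todo.md."""
    lines = ["# todo.md"]
    for text, done in items:
        lines.append(f"- [{'x' if done else ' '}] {text}")
    return "\n".join(lines)


def recite(context: List[str], items: List[TodoItem]) -> List[str]:
    """Return a new context with the current plan appended at the very end."""
    return context + [render_todo(items)]


plan = [("Collect sources", True), ("Draft summary", False), ("Review citations", False)]
context = ["user: write a literature summary", "action: browser_search(kv cache)"]
context = recite(context, plan)
print(context[-1])
```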

Keep errors in context

Agents make mistakes. That's not a bug; it's reality. Language models hallucinate, environments return errors, external tools misbehave, and unexpected edge cases show up all the time. In a multi-step task, failure is not the exception; it is part of the loop.

Yet a common impulse is to hide these errors: clean up the trace, retry the action, or reset the model's state and hope that a magical "temperature" setting will fix things. That feels safer and more controlled. But it comes at a cost: erasing failure removes the evidence, and without evidence the model cannot adapt.


In our experience, one of the most effective ways to improve agent behavior is deceptively simple: keep the failed paths in the context. When the model sees a failed action, along with the resulting observation or stack trace, it implicitly updates its internal beliefs. This shifts its prior away from similar actions and reduces the chance of repeating the same mistake.

In fact, we believe error recovery is one of the clearest indicators of truly agentic behavior. Yet it remains underrepresented in most academic work and public benchmarks, which typically focus on task success under ideal conditions.
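In code, keeping the evidence can be as simple as recording the exception as an observation instead of swallowing it. The sketch below is illustrative only; execute_and_record and read_missing_file are hypothetical names, and a real agent would record richer detail such as the full stack trace.

```python
import traceback
from typing import Callable, List


def execute_and_record(context: List[str], name: str, tool: Callable[[], str]) -> List[str]:
    """Run a tool and append the outcome, including failures, to the context."""
    context.append(f"action: {name}")
    try:
        context.append(f"observation: {tool()}")
    except Exception as exc:
        # Keep the failure visible to the model instead of silently retrying.
        summary = traceback.format_exception_only(type(exc), exc)[-1].strip()
        context.append(f"observation: ERROR {summary}")
    return context


def read_missing_file() -> str:
    raise FileNotFoundError("missing.csv")


history: List[str] = []
execute_and_record(history, "shell_cat", read_missing_file)
execute_and_record(history, "shell_ls", lambda: "data.csv")
for line in history:
    print(line)
```

The failed shell_cat call and its error stay in the history, so the next decision is made with the failure in view.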

Don't get trapped by few-shot examples

Few-shot prompting is a common technique for improving LLM outputs. But in agent systems, it can backfire in subtle ways.

Language models are excellent mimics; they imitate the patterns of behavior in the context. If your context is full of similar past action-observation pairs, the model will tend to follow that pattern, even when it is no longer optimal.

This can be dangerous in tasks that involve repetitive decisions or actions. For example, when using Manus to help review a batch of 20 resumes, the agent often falls into a rhythm, repeating similar actions simply because that is what it sees in the context. This leads to drift, overgeneralization, and sometimes hallucination.


The solution is to add variety. Manus introduces small amounts of structured variation in actions and observations—different serialization templates, alternative expressions, subtle noise in sequence or format. This controlled randomness helps break patterns and adjust the model's focus.
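One way to picture this kind of structured variation is to rotate between a few equivalent serialization templates. The sketch below is an assumption, not Manus's actual mechanism: TEMPLATES and serialize_with_variation are hypothetical, and a seeded random.Random is used so that runs are reproducible. Each entry is written once and never rewritten afterwards, so the variation does not conflict with keeping the context append-only.

```python
import json
import random
from typing import Callable, Dict, List

# Three equivalent ways to present the same observation (hypothetical templates).
TEMPLATES: List[Callable[[Dict[str, str]], str]] = [
    lambda o: json.dumps(o, sort_keys=True),
    lambda o: "; ".join(f"{k}={v}" for k, v in sorted(o.items())),
    lambda o: "\n".join(f"- {k}: {v}" for k, v in sorted(o.items())),
]


def serialize_with_variation(obs: Dict[str, str], rng: random.Random) -> str:
    """Pick one template at random; the content is the same, only the form varies."""
    return rng.choice(TEMPLATES)(obs)


rng = random.Random(0)  # seeded for reproducibility
resumes = [{"candidate": f"resume_{i}", "status": "reviewed"} for i in range(5)]
rendered = [serialize_with_variation(r, rng) for r in resumes]
for text in rendered:
    print(text)
    print("---")
```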

In other words, don't few-shot yourself into a rut. The more uniform your context, the more brittle your agent becomes.

Conclusion

Context engineering is still an emerging science, but for agent systems it is already essential. Models may become more powerful, faster, and cheaper, but no amount of raw capability replaces the need for memory, environment, and feedback. How you shape the context ultimately determines how your agent behaves: how fast it runs, how well it recovers, and how far it scales.

At Manus, we learned these lessons through repeated rewrites, dead ends, and real-world testing with millions of users. None of what we share here is universal truth, but these are the patterns that have worked for us. If they help you avoid even one painful iteration, then this article has served its purpose.

The agentic future will be built one context at a time. Engineer each one well.