ByteDance releases Doubao 2.0: inference cost cut by an order of magnitude, taking on GPT-5 and Gemini 3

ByteDance's Doubao large model has officially entered its 2.0 stage with a systematically upgraded release built for the Agent era. The new version cuts inference costs by about an order of magnitude while delivering performance comparable to GPT-5.2 and Gemini 3 Pro, making it a more competitive option for executing complex tasks in large-scale production environments.

On February 14, ByteDance announced that the Doubao 2.0 series comprises three general-purpose Agent models (Pro, Lite, and Mini) and a specialized Code model. The flagship, Doubao 2.0 Pro, is positioned directly against GPT-5.2 and Gemini 3 Pro. It reaches the industry's highest level on most visual understanding benchmarks and achieved gold-medal results on the IMO and CMO mathematics olympiads and the ICPC programming contest.

The full series is now available. Doubao 2.0 Pro powers the "expert" mode of the Doubao app, desktop client, and web version; the Code model has been integrated into the AI programming product TRAE; and Volcano Engine has simultaneously launched API services for enterprises and developers.

Analysts believe that because large-scale inference and long-chain generation consume large numbers of tokens in complex real-world tasks, Doubao 2.0's cost advantage will become a key competitive strength. The release marks an important step for ByteDance in the commercial application of large models.

Multimodal capabilities reach the world's top level

Doubao 2.0 comprehensively upgrades its multimodal capabilities and performs strongly on tasks such as visual reasoning, perception, spatial reasoning, and long-context understanding.

In dynamic scene understanding, the model leads on key evaluations such as TVBench and even exceeds human scores on the EgoTempo benchmark, showing that it captures changes, motion, and rhythm more reliably.

In long-video scenarios, Doubao 2.0 outperforms other top models on most evaluations and performs well on multiple streaming, real-time video Q&A benchmarks.

As a result, it can serve as an AI assistant for real-time video stream analysis, environment perception, proactive error correction, and emotional companionship. This shifts the interaction from passive question answering to active guidance, with applications in companion scenarios such as fitness coaching and outfit advice.

Reasoning on par with top models, with a significant cost advantage

By strengthening its knowledge of long-tail domains, Doubao 2.0 Pro outscored GPT-5.2 on SuperGPQA and took first place on HealthBench. Its overall score in the science category is on par with Gemini 3 Pro and GPT-5.2.

In evaluations of reasoning and agent capability, the model achieved gold-medal results on the IMO and CMO mathematics olympiads and the ICPC programming contest, and it also outperformed Gemini 3 Pro on Putnam Bench.

On HLE-text (Humanity's Last Exam), Doubao 2.0 Pro achieved the highest score, 54.2 points, and it also performed well on tool-calling and instruction-following tests.

More importantly, ByteDance said the model matches the performance of the industry's top large models while cutting token pricing by about an order of magnitude. This cost advantage becomes even more critical in large-scale inference and long-chain generation scenarios.
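To see why per-token pricing matters more as tasks get longer, the arithmetic can be sketched in a few lines of Python. Every number below is a hypothetical placeholder chosen for illustration, not the published pricing of Doubao 2.0 or any other model; the `Pricing` class, the `task_cost` function, and the step and token counts are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price per one million tokens, in arbitrary currency units (hypothetical)."""
    input_per_million: float
    output_per_million: float


def task_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one task given its total input and output tokens."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# Hypothetical prices: the second model is 10x cheaper per token.
baseline = Pricing(input_per_million=10.0, output_per_million=30.0)
cheaper = Pricing(input_per_million=1.0, output_per_million=3.0)

# Hypothetical long agent run: 40 steps, each re-reading 20,000 tokens
# of context and generating 1,500 tokens.
steps = 40
input_tokens = 20_000 * steps
output_tokens = 1_500 * steps

baseline_cost = task_cost(baseline, input_tokens, output_tokens)
cheaper_cost = task_cost(cheaper, input_tokens, output_tokens)

print(f"baseline: {baseline_cost:.2f} per task")
print(f"cheaper:  {cheaper_cost:.2f} per task")
print(f"saved per 1,000 tasks: {(baseline_cost - cheaper_cost) * 1000:.2f}")
```

Because cost scales linearly with token count, the ratio between the two models stays fixed at the price ratio, but the absolute saving grows with every additional step and every additional task.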

Based on the OpenClaw framework and Doubao 2.0 Pro model, ByteDance built an intelligent customer service agent on Feishu.

The agent handles customer conversations by calling different skills. When it runs into a problem, it proactively brings in human colleagues for help. It can also book on-site repair appointments for customers and, after the repair, follow up with a return call and product recommendations.
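The skill-calling pattern described here, with a human colleague as the fallback, can be sketched roughly as follows. This is an illustrative outline only: it does not use the OpenClaw or Feishu APIs, the function and skill names are hypothetical, and simple keyword matching stands in for the model's own decision about which skill to call.

```python
from typing import Callable, Dict, Optional, Tuple

# A skill takes the customer's message and returns a reply,
# or None if it cannot handle the request. (Hypothetical interface.)
Skill = Callable[[str], Optional[str]]


def book_repair(message: str) -> Optional[str]:
    """Hypothetical skill: schedule an on-site repair visit."""
    if "repair" in message.lower():
        return "A technician visit has been scheduled."
    return None


def recommend_product(message: str) -> Optional[str]:
    """Hypothetical skill: suggest a product."""
    if "recommend" in message.lower():
        return "Here is a product that may suit you."
    return None


def escalate_to_human(message: str) -> str:
    """Fallback: hand the conversation to a human colleague."""
    return f"Escalated to a human colleague: {message!r}"


SKILLS: Dict[str, Skill] = {
    "book_repair": book_repair,
    "recommend_product": recommend_product,
}


def handle(message: str, skills: Dict[str, Skill] = SKILLS) -> Tuple[str, str]:
    """Try each skill in order; escalate if none can handle the message."""
    for name, skill in skills.items():
        reply = skill(message)
        if reply is not None:
            return name, reply
    return "human", escalate_to_human(message)


print(handle("My washing machine needs a repair"))
print(handle("Why was I charged twice?"))
```

In a real deployment the routing decision would come from the model rather than keyword checks; the point of the sketch is only the shape of the loop, where escalation to a person is the default when no skill applies.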

Code model improves development efficiency

Doubao 2.0 Code builds on the 2.0 base model and is optimized for programming scenarios. It strengthens codebase comprehension and application generation, and it improves the model's ability to correct errors within Agent workflows. The model is now available as a built-in model in the China version of TRAE and supports image understanding and reasoning.

In practice, developers using TRAE with Doubao 2.0 Code built the basic structure and scenes of the interactive project "TRAE Spring Festival Town·Year of the Horse Temple Fair" with a single round of prompting, and completed the entire work in five rounds.

The project includes 11 NPCs driven by a large language model. They chat naturally, greet customers, and haggle on the spot according to their personalities. AI tourists also decide on their own which stall to visit, what to buy, and what to say. The prompts and assets have been open-sourced on GitHub for developers to try.

For consumer users, Doubao 2.0 Pro is now available through the "expert" mode of the Doubao app, desktop client, and web version. For enterprises and developers, Volcano Engine has simultaneously launched API services for the Doubao 2.0 model series.

ByteDance said it will continue to iterate on its models for real-world scenarios and to explore the upper limits of intelligence.