Google's large model "got up early but arrived late to the market." Is it really "far ahead" this time?

Google, which introduced the Transformer architecture, once lagged behind in the large-model race. As Gemini has continued to evolve, however, Google has been working its way back into the top tier. On March 26, Gemini 2.5 Pro was launched. The model topped the major leaderboards as soon as it was released, and on Chatbot Arena it scored a full 39 points higher than the second-place model.

Gemini 2.5 Pro is a reasoning model. According to Google, reasoning means more than classification and prediction: it refers to a system's ability to analyze information, draw logical conclusions, incorporate context and nuance, and make informed decisions.

Gemini 2.5 Pro reportedly supports a context window of 1 million tokens, with a 2-million-token window to follow soon. It builds on the established strengths of the Gemini family: native multimodality and very long context.

This allows it to understand massive data sets and handle complex problems from multiple sources of information, including text, audio, images, videos, and even complete code repositories.

On Chatbot Arena (a platform developed by researchers from SkyLab and LMSYS at the University of California, Berkeley, which evaluates large language models based on human preferences), Gemini 2.5 Pro ranked first across all categories by a significant margin, finishing a full 39 points ahead of the runner-up, Grok-3.
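To give a rough sense of what a 39-point lead means, the sketch below converts a rating gap into an expected head-to-head preference rate. It assumes Arena scores sit on a standard Elo-style logistic scale (base 10, divisor 400); that scale, and the function name `expected_win_rate`, are assumptions made for illustration and are not taken from the article.

```python
def expected_win_rate(rating_gap: float, scale: float = 400.0) -> float:
    """Expected share of pairwise votes won by the higher-rated model.

    Assumes an Elo-style logistic curve: a gap of `scale` points
    corresponds to 10:1 odds in favor of the higher-rated model.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


if __name__ == "__main__":
    gap = 39  # lead over the second-place model, as reported in the article
    print(f"Expected win rate at a {gap}-point gap: {expected_win_rate(gap):.1%}")
```

Under these assumptions, a 39-point gap corresponds to the leading model being preferred in roughly 56% of direct comparisons, which is a clear but not overwhelming edge.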

Gemini 2.5 Pro also took sole first place in three major categories: creative writing, instruction following, and longer queries.

In addition, Gemini 2.5 Pro topped the Vision Arena leaderboard.

In web development, Gemini 2.5 Pro took second place in WebDev Arena, making it the first model to rival Claude 3.7 Sonnet there.

Gemini 2.5 Pro also performs well on mathematics and science benchmarks such as Humanity’s Last Exam (no tools), GPQA, and AIME 2025.

In Humanity’s Last Exam (HLE), “no tools” means that external tools such as search engines and databases may not be used during the exam. Earlier experiments showed that state-of-the-art LLMs generally scored below 10% accuracy on HLE and suffered from problems such as confidence that did not match their actual ability and inefficient reasoning, which points to a gap between current LLMs and human experts at the frontier on closed-ended academic questions. Against that background, Gemini 2.5 Pro’s score of 18.8% stands out.

Gemini 2.5 Pro is reportedly now available in Google AI Studio and, for Gemini Advanced users, in the Gemini app, and it will be launched on Vertex AI.

Google will announce pricing in the next few weeks, after which users will be able to run the model in large-scale production environments with higher usage quotas.