Today, Ant Bailing officially launched Ling-2.6-flash, an Instruct model with 104B total parameters and 7.4B active parameters. The model focuses on token efficiency: while maintaining a competitive level of intelligence, it is faster, cheaper, and better suited to large-scale real-world applications.
According to data from the authoritative third-party evaluator Artificial Analysis, Ling-2.6-flash shows a clear token-efficiency advantage: it achieves an Intelligence Index score of 26 with 15M output tokens, maintaining a strong level of intelligence while keeping output consumption relatively low.

Ling-2.6-flash reportedly continues the hybrid linear architecture design of Ling 2.5. This highly sparse MoE architecture offers clear advantages in hardware performance.
On four H20 GPUs, inference speed reaches up to 340 tokens/s, and prefill throughput is 2.2 times that of Nemotron-3-Super.
In the Output Speed evaluation, Ling-2.6-flash ranked first among models in the same parameter class, with a stable output speed of 215 tokens/s.
In terms of token consumption, Ling-2.6-flash delivers a significantly better input-output ratio, that is, more results for the tokens spent.
In the full Artificial Analysis evaluation, Ling-2.6-flash consumed 15M tokens in total, while models such as Nemotron-3-Super consumed 110M tokens or more. In other words, Ling-2.6-flash completes similar evaluation tasks with only about 1/10 of the tokens.
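
The token comparison above can be checked with a few lines of arithmetic. The sketch below uses only the two figures quoted in the article; the helper name `consumption_ratio` is illustrative. Because 110M is stated as a lower bound for the comparison models, the ratio against exactly 110M comes out closer to 1/7, and the "about 1/10" figure would correspond to a baseline of roughly 150M tokens (an inference from the arithmetic, not a number given in the source).

```python
# Figures quoted in the article: total tokens consumed in the full evaluation.
LING_TOKENS = 15_000_000
BASELINE_TOKENS = 110_000_000  # lower bound: "110M tokens or more"


def consumption_ratio(model_tokens: int, baseline_tokens: int) -> float:
    """Return model_tokens expressed as a fraction of baseline_tokens."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return model_tokens / baseline_tokens


ratio = consumption_ratio(LING_TOKENS, BASELINE_TOKENS)
print(f"Ratio against a 110M baseline: {ratio:.3f}")

# Baseline size at which the ratio would be exactly 1/10.
baseline_for_one_tenth = LING_TOKENS * 10
print(f"Baseline implied by a 1/10 ratio: {baseline_for_one_tenth:,} tokens")
```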


Ling-2.6-flash has been specifically enhanced for Agent scenarios, maintaining strong task-execution capability while keeping token consumption under control. The model reaches SOTA among models of the same size on agent-related benchmarks such as BFCL-V4, TAU2-bench, SWE-bench Verified, Claw-Eval, and PinchBench.
At the same time, Ling-2.6-flash performs at an excellent level in general knowledge, mathematical reasoning, instruction following, and long-text parsing.

In terms of API pricing, Ling-2.6-flash costs US$0.1 per million input tokens and US$0.3 per million output tokens. The Ling-2.6-flash API is now officially open to users, with a one-week limited-time free trial.
Users can access the service through OpenRouter and Bailing large model tbox. It is understood that a commercial version of the model, LingDT, will later be released through Ant Digital to serve global developers and small and medium-sized enterprises.
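
To make the quoted API pricing concrete, here is a minimal cost-estimation sketch. The two prices come from the article; the function name `estimate_cost_usd` and the example workload (10M input tokens, 2M output tokens) are hypothetical, and the sketch ignores the free-trial period and any platform-specific fees.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_M = 0.1
OUTPUT_PRICE_PER_M = 0.3


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the API cost in US dollars for a given token volume."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: 10M input tokens and 2M output tokens.
cost = estimate_cost_usd(10_000_000, 2_000_000)
print(f"Estimated cost: ${cost:.2f}")
```

At these rates, output tokens cost three times as much as input tokens, so a model that produces shorter outputs for the same task lowers the bill directly.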