New Model Reduces AI Costs and Latency
Ant Group has officially announced the release of Ling-2.6-Flash, a new large language model designed to improve AI efficiency by significantly reducing cost and latency. The announcement was made in Hangzhou, China, on 22 April 2026.
Ling-2.6-Flash leverages a sparse Mixture-of-Experts (MoE) architecture with 104 billion total parameters, of which only 7.4 billion are active. This design delivers high intelligence while minimising resource usage compared with other models on the market.
Artificial Analysis reports that the model achieves an Intelligence Index score of 26 while generating only 15 million output tokens. By contrast, models such as Nemotron-3-Super require over 110 million tokens to reach a comparable score, which translates into an 86 per cent reduction in inference cost for developers.
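Taken at face value, the 86 per cent figure follows directly from the reported token counts, since inference cost scales with tokens generated. A minimal sketch of that arithmetic (the per-token price is irrelevant here; only the ratio matters):

```python
# Cost reduction implied by the reported output-token counts.
ling_tokens = 15_000_000       # Ling-2.6-Flash output tokens to reach the index
nemotron_tokens = 110_000_000  # Nemotron-3-Super (reported "over 110 million")

# If cost is proportional to tokens, the saving is the token ratio.
reduction = 1 - ling_tokens / nemotron_tokens
print(f"Implied cost reduction: {reduction:.0%}")
```

With Nemotron's count reported as "over 110 million", the true reduction is at least this large.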
The model prioritises token efficiency rather than generating excessive tokens to inflate benchmark scores, striking a practical balance between intelligence and output cost.
Performance and Cost Efficiency
On a four-card NVIDIA H20 configuration, Ling-2.6-Flash achieves inference speeds of up to 340 tokens per second, ranking in the top tier of its size class for output speed. Its prefill throughput is 2.2 times that of Nemotron-3-Super, a significant speed advantage.
The model is tailored for AI agent applications, excelling in benchmarks such as BFCL-V4, TAU2-bench, SWE-bench Verified, Claw-Eval, and PinchBench. It maintains strong capabilities in general knowledge, mathematical reasoning, and long-text analysis while effectively controlling token consumption.
Previously tested on OpenRouter under the codename ‘Elephant Alpha’, Ling-2.6-Flash quickly gained popularity, topping the ‘Trending’ charts with daily token calls reaching the 100-billion level.
The API for Ling-2.6-Flash is now available, priced at USD 0.10 per million input tokens and USD 0.30 per million output tokens. A one-week free trial is offered through OpenRouter and the Alipay Tbox, and a commercial version, LingDT, is available via Ant Digital Technologies to support global developers and SMEs.
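To illustrate the published pricing, a workload's cost can be estimated from its token volumes. A small sketch; the token counts below are hypothetical example values, not figures from the announcement:

```python
# Estimate a bill from the published per-million-token prices.
INPUT_PRICE = 0.10   # USD per million input tokens (published rate)
OUTPUT_PRICE = 0.30  # USD per million output tokens (published rate)

def estimated_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for the given token volumes."""
    return (input_tokens / 1_000_000) * INPUT_PRICE \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE

# Hypothetical monthly workload: 500M input tokens, 100M output tokens.
print(f"Estimated cost: USD {estimated_cost(500_000_000, 100_000_000):.2f}")
```

At these rates, even a workload of hundreds of millions of tokens per month costs on the order of tens of dollars.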
Ant Group, a global digital technology provider and the operator of Alipay, aims to support innovation and efficiency in AI applications with this release. The launch confirms Ling-2.6-Flash as the official name of the previously anonymous model. The company continues to advance digital technologies, offering solutions for developers and enterprises worldwide.
According to Ant Group, Ling-2.6-Flash is designed to deliver high performance at a fraction of the cost of its competitors, making it an attractive option for businesses looking to leverage AI technology without incurring high expenses.
Last updated: 23 April 2026, 12:19 am

