Open Source · Friday, 08 May 2026 · 3 min read

Xiaomi Releases MiMo-V2.5-Pro: MIT-Licensed 1T-Param Agent Model

Xiaomi published MiMo-V2.5-Pro under the MIT License on May 7 — a 1.02-trillion-parameter MoE model with 42B active parameters, 1M-token context, and efficiency benchmarks that challenge closed frontier models.

Figure: Xiaomi MiMo model branding graphic showing a model architecture overview. Source: github.com/XiaomiMiMo

Xiaomi published the full model weights for MiMo-V2.5-Pro on Hugging Face on May 7 under the MIT License, one of the most permissive open-source licenses available. The release puts a 1.02-trillion-parameter mixture-of-experts model, along with its tokenizer, model card, and architecture documentation, within immediate public reach.

Architecture and Specs

MiMo-V2.5-Pro is a sparse mixture-of-experts transformer with 1.02 trillion total parameters, 42 billion of which are active at inference time. The design uses 384 routed experts, 8 of which are activated for each token, across 70 layers: one dense layer followed by 69 MoE layers. A hybrid attention scheme interleaves sliding-window attention (window size 128) with full global attention at a 6:1 ratio, 60 sliding-window layers to 10 global layers, so that very long sequences can be handled without paying the full quadratic cost of standard self-attention in every layer.
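The layer layout and expert routing described above can be sketched in a few lines of Python. This is an illustration only, not Xiaomi's implementation: the exact position of the global-attention layers (here, every seventh layer), the softmax gate normalization, and the names `build_layer_plan` and `route_top_k` are assumptions made for the example. Only the counts (70 layers, 384 experts, 8 active, window 128) come from the article.

```python
import math
import random

NUM_LAYERS = 70      # from the article: 1 dense layer + 69 MoE layers
NUM_EXPERTS = 384    # routed experts per MoE layer
TOP_K = 8            # experts activated per token
WINDOW = 128         # sliding-window size
GLOBAL_EVERY = 7     # ASSUMPTION: every 7th layer is global, which yields 6:1


def build_layer_plan():
    """Return one dict per layer describing its attention and FFN type."""
    plan = []
    for i in range(NUM_LAYERS):
        attention = "global" if (i + 1) % GLOBAL_EVERY == 0 else "sliding"
        ffn = "dense" if i == 0 else "moe"  # first layer dense, the rest MoE
        plan.append({"layer": i, "attention": attention, "ffn": ffn})
    return plan


def route_top_k(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their gates.

    ASSUMPTION: gates are a softmax over the selected logits only.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda e: router_logits[e], reverse=True)
    top = ranked[:k]
    peak = max(router_logits[e] for e in top)
    exps = [math.exp(router_logits[e] - peak) for e in top]
    total = sum(exps)
    return [(e, w / total) for e, w in zip(top, exps)]


plan = build_layer_plan()
sliding_layers = sum(1 for p in plan if p["attention"] == "sliding")
global_layers = sum(1 for p in plan if p["attention"] == "global")
moe_layers = sum(1 for p in plan if p["ffn"] == "moe")

# Random logits stand in for a learned router's output for one token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_top_k(logits)

print("sliding / global / MoE layers:", sliding_layers, global_layers, moe_layers)
print("experts used for this token:", [e for e, _ in selected])
```

Each token touches only 8 of the 384 expert feed-forward blocks in a given MoE layer, which is why the active parameter count is a small fraction of the total.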

The context window is 1 million tokens. Training consumed 27 trillion tokens at a native sequence length of 32,768 tokens, with post-training using supervised fine-tuning, agentic reinforcement learning, and a technique called Multi-teacher On-Policy Distillation (MOPD) that combines signal from multiple teacher models during on-policy rollouts.
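To see why the hybrid attention layout matters at this context length, the rough count below compares how many token positions the key-value cache must hold at 1 million tokens. It is a back-of-the-envelope sketch that counts positions per layer only; head counts, head dimensions, and numeric precision are not given in the article and are deliberately left out.

```python
CONTEXT_TOKENS = 1_000_000   # context window from the article
NATIVE_SEQ_LEN = 32_768      # native training sequence length
WINDOW = 128                 # sliding-window size
SLIDING_LAYERS = 60
GLOBAL_LAYERS = 10


def cached_positions(context_tokens):
    """Token positions held in the KV cache, summed over all layers."""
    sliding = SLIDING_LAYERS * min(WINDOW, context_tokens)  # capped at window
    full = GLOBAL_LAYERS * context_tokens                   # grows with context
    return sliding + full


hybrid = cached_positions(CONTEXT_TOKENS)
all_global = (SLIDING_LAYERS + GLOBAL_LAYERS) * CONTEXT_TOKENS
reduction = all_global / hybrid
extension = CONTEXT_TOKENS / NATIVE_SEQ_LEN

print(f"hybrid cache positions:     {hybrid:,}")
print(f"all-global cache positions: {all_global:,}")
print(f"reduction factor:           {reduction:.2f}x")
print(f"context vs native length:   {extension:.1f}x")
```

Under these simplifications the cache is dominated by the 10 global layers, and the sliding-window layers contribute almost nothing at full context.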

Alongside the Pro variant, Xiaomi released MiMo-V2.5 Omni — a 310-billion-parameter sparse MoE with native visual and audio encoders for multimodal input — and smaller dense variants suited for edge and on-device deployment.

Benchmark Performance

On standard academic evaluations the model posts a 99.6% pass rate on GSM8K, 86.2% on the MATH benchmark (4-shot), 75.6% on HumanEval+ for code synthesis, 89.4% on MMLU, 88.4% on Big-Bench Hard, and 57.2% on SWE-Bench Pro — the last of which measures real software engineering task completion. On Chinese-language evaluation, C-Eval sits at 91.5%.

The more revealing numbers come from agentic long-horizon tasks. On ClawEval, a benchmark measuring agent efficiency across multi-step tool-use trajectories, MiMo-V2.5-Pro uses 40 to 60 percent fewer tokens per completed workflow than Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 while achieving comparable task success rates. In practical terms, the model built a SysY compiler that passed 100 percent of its tests in 4.3 hours using 672 tool calls, and generated approximately 8,000 lines of code for a video editing application over 11.5 hours with 1,868 tool calls. On analog circuit optimization it achieved a 10- to 20-fold improvement across key design metrics.
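The long-horizon figures are easier to compare as rates. The short calculation below derives tool calls per hour and average time per call from the reported totals. It uses only the numbers quoted above and assumes the reported hours are wall-clock time for the whole run.

```python
# Reported totals for the two long-horizon runs.
runs = {
    "SysY compiler": {"hours": 4.3, "tool_calls": 672},
    "video editor": {"hours": 11.5, "tool_calls": 1868},
}

for name, run in runs.items():
    run["calls_per_hour"] = run["tool_calls"] / run["hours"]
    run["seconds_per_call"] = run["hours"] * 3600 / run["tool_calls"]
    print(f"{name}: {run['calls_per_hour']:.1f} calls/hour, "
          f"{run['seconds_per_call']:.1f} s per call")

# Approximate lines of code produced per tool call in the video editor run.
lines_per_call = 8000 / runs["video editor"]["tool_calls"]
print(f"video editor: {lines_per_call:.2f} lines per tool call")
```

Both runs work out to a similar pace of roughly one tool call every 20 to 25 seconds, which suggests the two tasks were run under comparable conditions.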

These numbers position MiMo-V2.5-Pro not as a challenger on raw capability peaks but as a strongly competitive option on cost-per-useful-output — the metric that actually drives infrastructure decisions for teams running production agent workloads.
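One way to make "cost per useful output" concrete is to divide the cost of an attempt by the probability that it succeeds. The sketch below does that with placeholder inputs: the baseline token count, the price per million tokens, and the success rate are hypothetical values chosen for illustration, not figures from Xiaomi or any vendor. Only the 40 to 60 percent token reduction comes from the article.

```python
def cost_per_success(tokens_per_workflow, price_per_million_tokens, success_rate):
    """Expected spend per successfully completed workflow."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = tokens_per_workflow / 1_000_000 * price_per_million_tokens
    return cost_per_attempt / success_rate


# HYPOTHETICAL inputs, identical for every model being compared.
BASELINE_TOKENS = 1_000_000   # tokens a baseline model spends per workflow
PRICE = 10.0                  # currency units per million tokens
SUCCESS = 0.8                 # "comparable task success rates"

baseline = cost_per_success(BASELINE_TOKENS, PRICE, SUCCESS)
with_40_pct_fewer = cost_per_success(600_000, PRICE, SUCCESS)
with_60_pct_fewer = cost_per_success(400_000, PRICE, SUCCESS)

print(f"baseline:         {baseline:.2f}")
print(f"40% fewer tokens: {with_40_pct_fewer:.2f}")
print(f"60% fewer tokens: {with_60_pct_fewer:.2f}")
```

With price and success rate held equal, the saving in cost per success tracks the token saving one for one. In practice per-token prices differ between providers and self-hosted deployments, so the real gap could be larger or smaller.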

Why the MIT License Matters

The choice of MIT over more restrictive terms, such as Apache 2.0 with additional use restrictions or licenses that allow research but prohibit commercial deployment, removes the legal friction that often delays enterprise adoption of open-weight releases. MIT permits use, modification, and redistribution in commercial products without royalties; its only condition is that the copyright and permission notice be retained in copies of the work. For a model at this scale, that is a deliberately aggressive move: Xiaomi is optimizing for adoption breadth over any licensing revenue.

This follows a pattern visible across recent open-weight releases, including Kimi K2.6 from China's Moonshot and Mistral Medium 3.5 from France's Mistral AI: labs are converging on maximally permissive terms to accelerate ecosystem growth and API-layer competition with OpenAI and Anthropic.

Competitive Positioning

Xiaomi is not primarily an AI lab. It is a consumer electronics manufacturer with a smartphone market share large enough to place it among the global top five by volume, and an ambition to build AI into devices and operating systems directly. Because MiMo-V2.5-Pro activates only 42 billion parameters per token out of a pool of roughly 1 trillion, its compute cost per token is closer to that of a 42B-scale dense model than a 1T monolith, which is compatible with Xiaomi's device integration roadmap as mobile chip capabilities advance.
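The gap between total and active parameters can be quantified with simple arithmetic. The sketch below uses the common rule of thumb of about 2 FLOPs per active parameter per generated token for a forward pass, which ignores attention cost, and it assumes illustrative weight precisions of 16 and 4 bits. The article specifies neither.

```python
TOTAL_PARAMS = 1.02e12   # total parameters from the article
ACTIVE_PARAMS = 42e9     # parameters active per token


def weight_terabytes(params, bits_per_param):
    """Storage needed for the weights alone, in decimal terabytes."""
    return params * bits_per_param / 8 / 1e12


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Rule of thumb (an approximation): ~2 FLOPs per active parameter per token.
sparse_flops_per_token = 2 * ACTIVE_PARAMS
dense_flops_per_token = 2 * TOTAL_PARAMS
compute_ratio = dense_flops_per_token / sparse_flops_per_token

# ASSUMED precisions for illustration only.
tb_at_16_bit = weight_terabytes(TOTAL_PARAMS, 16)
tb_at_4_bit = weight_terabytes(TOTAL_PARAMS, 4)

print(f"active fraction:          {active_fraction:.2%}")
print(f"compute vs 1T dense:      {compute_ratio:.1f}x cheaper per token")
print(f"weights at 16-bit:        {tb_at_16_bit:.2f} TB")
print(f"weights at 4-bit:         {tb_at_4_bit:.2f} TB")
```

The compute saving applies per token, but every expert still has to be stored somewhere, so the memory footprint follows the total parameter count, not the active one.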

The Omni variant, with visual and audio encoders, points toward the same destination: a model family designed to handle the multimodal inputs that consumer devices generate — camera, microphone, screen — without routing everything through an external API.

What to Watch

MiMo-V2.5-Pro weights are live on Hugging Face and ModelScope. Early community benchmarking will clarify whether the ClawEval efficiency figures hold up on diverse real-world agentic pipelines. The more consequential question for the open-weight ecosystem is whether a hardware manufacturer's model — optimized for practical deployment over research prestige — can displace larger foundation-model labs in the developer toolchains that are coalescing around long-context agent frameworks.

Tags: Xiaomi, MiMo, open source, MIT license, MoE, agentic AI, multimodal