youtubeJanuary 27, 2026
Kimi K2.5 (Fully Tested): An Open Weights Model beats OPUS 4.5?
by aicodeking
Kimi K2.5 challenges proprietary models at a fraction of the cost: trillion-parameter MoE with vision, agent swarm parallelism, and 5th place on AICodeKing's benchmark—beating Claude Sonnet 4.5 and DeepSeek V3.2.
Key Takeaways
- Same architecture, new capabilities: K2.5 retains the trillion-parameter mixture of experts (32B active) from K2 but adds native multimodal training on 15 trillion mixed visual and text tokens
- Vision-first coding: "Coding with vision" lets you show it a website design and get code, or feed it a video workflow and have it understand and implement it
- Agent swarm paradigm: Spins up to 100 sub-agents executing tasks in parallel, handling up to 1,500 tool calls per session—4.5x faster than single-agent setups
- Competitive benchmarks: Ranks 5th on AICodeKing's leaderboard at 64%, beating Claude Sonnet 4.5 (62%) and DeepSeek V3.2
- Cost efficiency: $27 for full benchmark run vs $114 for Claude Opus 4.5 Max and $48 for GPT 5.2x
Benchmark Performance
| Model | Score | Cost |
|---|---|---|
| Gemini 3 Pro | 100% | - |
| Claude Opus 4.5 Max | 74% | $114 |
| GLM 4.7 | 65% | - |
| GPT 5.2x High | 65% | $48 |
| Kimi K2.5 | 64% | $27 |
| Claude Sonnet 4.5 | 62% | - |
| DeepSeek V3.2 | - | - |
Official benchmarks: 96.1 on AIM 2025, 87.6 on GPQA Diamond, 85 on Live Codebench v6, 76.8 on SWEBench verified.
Model Variants
- K2.5 Instant: Fast non-thinking mode (temperature 0.6)
- K2.5 Thinking: Reasoning mode (temperature 1.0)
- K2.5 Agent: Standard agent mode
- K2.5 Agent Swarm: Parallel agent execution (beta)
Context window: 256K tokens. OpenAI-compatible API. Weights available on Hugging Face with native int4 quantization.
Notable Quotes
"This is the only model that is a straight-up competitor to something like Opus because vision is always lacking."
"For the price and the fact that it's open weights, this is pretty unbeatable."
Connections
- 2025-the-year-in-llms - Simon Willison's analysis predicted Chinese labs would dominate open-weight benchmarks; K2.5 proves the pattern continues with Moonshot AI's entry