Kimi K2.5 (Fully Tested): An Open Weights Model beats OPUS 4.5?

Key Takeaways

Same architecture, new capabilities: K2.5 retains the trillion-parameter mixture of experts (32B active) from K2 but adds native multimodal training on 15 trillion mixed visual and text tokens
Vision-first coding: "Coding with vision" lets you show it a website design and get code, or feed it a video workflow and have it understand and implement it
Agent swarm paradigm: Spins up to 100 sub-agents executing tasks in parallel, handling up to 1,500 tool calls per session—4.5x faster than single-agent setups
Competitive benchmarks: Ranks 5th on AICodeKing's leaderboard at 64%, beating Claude Sonnet 4.5 (62%) and DeepSeek V3.2
Cost efficiency: $27 for full benchmark run vs $114 for Claude Opus 4.5 Max and $48 for GPT 5.2x

Official benchmarks: 96.1 on AIM 2025, 87.6 on GPQA Diamond, 85 on Live Codebench v6, 76.8 on SWEBench verified.

Context window: 256K tokens. OpenAI-compatible API. Weights available on Hugging Face with native int4 quantization.

"This is the only model that is a straight-up competitor to something like Opus because vision is always lacking."

"For the price and the fact that it's open weights, this is pretty unbeatable."

2025-the-year-in-llms - Simon Willison's analysis predicted Chinese labs would dominate open-weight benchmarks; K2.5 proves the pattern continues with Moonshot AI's entry