Skip to main content
youtubeJanuary 27, 2026

Kimi K2.5 (Fully Tested): An Open Weights Model beats OPUS 4.5?

Kimi K2.5 challenges proprietary models at a fraction of the cost: trillion-parameter MoE with vision, agent swarm parallelism, and 5th place on AICodeKing's benchmark—beating Claude Sonnet 4.5 and DeepSeek V3.2.

Key Takeaways

  • Same architecture, new capabilities: K2.5 retains the trillion-parameter mixture of experts (32B active) from K2 but adds native multimodal training on 15 trillion mixed visual and text tokens
  • Vision-first coding: "Coding with vision" lets you show it a website design and get code, or feed it a video workflow and have it understand and implement it
  • Agent swarm paradigm: Spins up to 100 sub-agents executing tasks in parallel, handling up to 1,500 tool calls per session—4.5x faster than single-agent setups
  • Competitive benchmarks: Ranks 5th on AICodeKing's leaderboard at 64%, beating Claude Sonnet 4.5 (62%) and DeepSeek V3.2
  • Cost efficiency: $27 for full benchmark run vs $114 for Claude Opus 4.5 Max and $48 for GPT 5.2x

Benchmark Performance

ModelScoreCost
Gemini 3 Pro100%-
Claude Opus 4.5 Max74%$114
GLM 4.765%-
GPT 5.2x High65%$48
Kimi K2.564%$27
Claude Sonnet 4.562%-
DeepSeek V3.2--

Official benchmarks: 96.1 on AIM 2025, 87.6 on GPQA Diamond, 85 on Live Codebench v6, 76.8 on SWEBench verified.

Model Variants

  • K2.5 Instant: Fast non-thinking mode (temperature 0.6)
  • K2.5 Thinking: Reasoning mode (temperature 1.0)
  • K2.5 Agent: Standard agent mode
  • K2.5 Agent Swarm: Parallel agent execution (beta)

Context window: 256K tokens. OpenAI-compatible API. Weights available on Hugging Face with native int4 quantization.

Notable Quotes

"This is the only model that is a straight-up competitor to something like Opus because vision is always lacking."

"For the price and the fact that it's open weights, this is pretty unbeatable."

Connections

  • 2025-the-year-in-llms - Simon Willison's analysis predicted Chinese labs would dominate open-weight benchmarks; K2.5 proves the pattern continues with Moonshot AI's entry