article · January 1, 2026

2025: The Year in LLMs

Simon Willison's comprehensive year-in-review analyzing how reasoning models, autonomous agents, and Chinese AI competition fundamentally reshaped the landscape in 2025.

Reasoning Models Dominated - OpenAI started the trend with its o1 models in late 2024, followed with o3, o3-mini, and o4-mini in early 2025, and the approach spread across the industry. These models excel at breaking complex problems down into intermediate steps.

Agents Became Real - After years of hype, LLM agents finally delivered practical value, especially in coding and search applications.

Claude Code as Game-Changer - Released quietly in February, Claude Code emerged as the year's most impactful development. Asynchronous versions let tasks run to completion in the background.

Chinese Labs Rising - DeepSeek's V3 release on Christmas Day 2024, followed by R1 in January 2025, triggered a sell-off in AI-related stocks. Models from Qwen, Moonshot AI, and Z.ai (including GLM-4.7) led open-weight benchmarks, many of them under OSI-approved licenses.

Notable Developments

  • $200/month subscriptions justified by token-intensive agent usage
  • Image editing revolution - ChatGPT's image editing launch drove 100M signups in one week
  • Academic victories - gold-medal IMO and ICPC performances
  • Browser integration raised prompt injection security concerns
  • MCP adoption, then a pivot - bash access proved superior for coding agents
  • Local model progress - GPT-4-class models running on consumer hardware

Warnings

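Normalization of Deviance - Each time an agent running in YOLO mode (with approval prompts disabled) succeeds without incident, the risk feels more acceptable. This mirrors the pattern of gradual risk acceptance that preceded the Challenger disaster.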

The Lethal Trifecta - Prompt injection becomes dangerous when an agent combines three capabilities: access to private data, exposure to untrusted content, and the ability to communicate externally.
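
The lethal trifecta can be read as a simple check on an agent's configuration. The sketch below is a minimal illustration, not code from the article: `AgentCapabilities`, `trifecta_legs`, and `has_lethal_trifecta` are hypothetical names, and it assumes each of the three capabilities can be reduced to a single boolean flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical summary of what one agent deployment is allowed to do."""

    private_data_access: bool      # e.g. reads email, files, or internal databases
    untrusted_content: bool        # e.g. browses the web or reads inbound messages
    external_communication: bool   # e.g. makes HTTP requests or sends email


def trifecta_legs(caps: AgentCapabilities) -> list[str]:
    """Return the names of the risk factors this deployment has enabled."""
    legs = {
        "private data access": caps.private_data_access,
        "untrusted content exposure": caps.untrusted_content,
        "external communication": caps.external_communication,
    }
    return [name for name, enabled in legs.items() if enabled]


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three factors are present at the same time."""
    return len(trifecta_legs(caps)) == 3


if __name__ == "__main__":
    # Two illustrative (made-up) deployments: one with every leg, one without
    # the ability to communicate externally.
    examples = {
        "browser_agent": AgentCapabilities(True, True, True),
        "sandboxed_agent": AgentCapabilities(True, True, False),
    }
    for name, caps in examples.items():
        verdict = "all three legs present" if has_lethal_trifecta(caps) else "at least one leg missing"
        print(f"{name}: {verdict} ({', '.join(trifecta_legs(caps))})")
```

Under this framing, the risk depends on the combination rather than on any single capability, so an audit of this kind would flag only deployments where all three flags are set.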

See 12-factor-agents for principles on building production-grade agents, and writing-a-good-claude-md for Claude Code configuration best practices.
