2025: The Year in LLMs
Simon Willison's comprehensive year-in-review analyzing how reasoning models, autonomous agents, and Chinese AI competition fundamentally reshaped the landscape in 2025.
Key Trends
Reasoning Models Dominated - OpenAI started the trend with o1 and o1-mini in late 2024, followed with o3, o3-mini, and o4-mini in early 2025, and the approach spread across the industry. These models excel at breaking complex problems down into intermediate steps.
Agents Became Real - After years of hype, LLM agents finally delivered practical value, especially in coding and search applications.
Claude Code as Game-Changer - Released quietly in February, emerged as the year's most impactful development. Async versions enabled background task completion.
Chinese Labs Rising - DeepSeek V3 arrived on Christmas Day 2024, and the DeepSeek R1 release in January 2025 triggered a sell-off in AI-related stocks. Models from Qwen, Moonshot AI, and Z.ai (including GLM-4.7) led open-weight benchmarks, many of them under OSI-approved licenses.
Notable Developments
- $200/month subscriptions justified by token-intensive agent usage
- Image editing revolution - ChatGPT's prompt-driven image editing drove 100M signups in one week
- Academic victories - gold-medal IMO and ICPC performances
- Browser integration raised prompt injection security concerns
- MCP adoption, then pivot - after rapid uptake of the Model Context Protocol, plain bash access proved superior for coding agents
- Local model progress - GPT-4-class models now run on consumer hardware
Warnings
Normalization of Deviance - Each time a YOLO-mode agent run succeeds without incident, the risk feels more acceptable, mirroring the pattern behind the Challenger disaster.
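The Lethal Trifecta - The combination of access to private data, exposure to untrusted content, and the ability to communicate externally. An agent with all three can be steered by a prompt injection into exfiltrating data.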
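The trifecta can be treated as a simple configuration check: a session is only in the danger zone when all three capabilities are enabled at once. The sketch below is illustrative only. The `AgentCapabilities` class, its field names, and both helper functions are hypothetical and do not come from the original post or any real agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    # Hypothetical flags describing one agent session; names are illustrative.
    private_data_access: bool
    untrusted_content_exposure: bool
    external_communication: bool


def trifecta_legs_present(caps: AgentCapabilities) -> list[str]:
    """Return the names of the trifecta legs this session has enabled."""
    legs = {
        "private_data_access": caps.private_data_access,
        "untrusted_content_exposure": caps.untrusted_content_exposure,
        "external_communication": caps.external_communication,
    }
    return [name for name, enabled in legs.items() if enabled]


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three legs are enabled in the same session."""
    return len(trifecta_legs_present(caps)) == 3


# An agent that reads private data, sees untrusted content, and can send data out.
browsing_agent = AgentCapabilities(
    private_data_access=True,
    untrusted_content_exposure=True,
    external_communication=True,
)

# The same agent with private data access removed: one leg is missing.
sandboxed_agent = AgentCapabilities(
    private_data_access=False,
    untrusted_content_exposure=True,
    external_communication=True,
)

print(has_lethal_trifecta(browsing_agent))   # True
print(has_lethal_trifecta(sandboxed_agent))  # False
```

Removing any single leg breaks the trifecta, which is why the check counts enabled legs rather than scoring each one separately.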
Related
See 12-factor-agents for principles on building production-grade agents, and writing-a-good-claude-md for Claude Code configuration best practices.