February 4, 2026

Harness Engineering: Leveraging Codex in an Agent-First World

When AI agents write all the code, engineering shifts from authoring to harness design—crafting navigable documentation, mechanical enforcement, and feedback loops that keep autonomous agents productive across thousands of pull requests.

Summary

Ryan Lopopolo describes a five-month experiment at OpenAI: three engineers used Codex to build an internal product from an empty repository to roughly one million lines of code across 1,500 pull requests. No human wrote any code directly. The article redefines what engineering means when code authorship moves entirely to agents—a discipline Lopopolo calls "harness engineering."

Key Concepts

  • Harness engineering — The practice of designing environments, specifying intent, and building feedback loops that let AI agents produce reliable work. Engineers become architects of agent capability rather than code producers.
  • Map, not manual — Documentation should be a navigable map for agents, not a 1,000-page instruction manual. Short, structured, repository-local context outperforms exhaustive guides.
  • Repository as system of record — All knowledge must live inside the repo to maximize agent legibility. External docs, wikis, and Notion pages sit outside the agent's reach, so every lookup there becomes friction.
  • Mechanical enforcement — Architectural invariants and design patterns should be enforced through CI, linters, and automated checks rather than code review. Agents won't learn from review comments the way humans do.
  • Continuous automated refactoring — With agents generating massive code volumes, automated refactoring prevents technical debt from compounding. Manual cleanup can't keep pace with agent throughput.
  • Merge philosophy transformation — A throughput of 3.5 PRs per engineer per day demands different merge strategies than traditional human workflows. The review bottleneck shifts from "is this code correct?" to "does this match intent?"
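Mechanical enforcement can be as simple as a CI script that walks the tree and rejects imports crossing a layer boundary. A minimal sketch, assuming a hypothetical invariant that `core/` must not import the `ui` layer (the layer names and the rule itself are illustrative, not from the article):

```python
import ast
import pathlib

# Hypothetical layering rule: code under core/ may not import the ui
# layer. The layer names are illustrative, not from the article.
FORBIDDEN = "ui"

def violations(core_dir: str) -> list[str]:
    """Scan every .py file under core_dir for imports of the forbidden layer."""
    found = []
    for path in pathlib.Path(core_dir).rglob("*.py"):
        for node in ast.walk(ast.parse(path.read_text())):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            else:
                continue
            for name in names:
                if name == FORBIDDEN or name.startswith(FORBIDDEN + "."):
                    found.append(f"{path.name}: imports {name}")
    return found

# In CI, a wrapper would call violations("core") and fail the job with a
# nonzero exit when the list is nonempty, so the agent sees a broken
# invariant as a red check rather than as a review comment it may ignore.
```

The point of the sketch is the delivery mechanism: the invariant lives in an automated check the agent hits on every run, not in prose a reviewer must remember to enforce.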

The Workflow

Engineers interact with the system almost entirely through prompts: describe a task, run the agent, let it open a pull request. The initial scaffold—repo structure, CI configuration, formatting rules, package manager setup—was itself generated by Codex CLI using GPT-5. Even the first AGENTS.md file was written by Codex, guided by a small set of existing templates.
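In the "map, not manual" spirit, a repository-local AGENTS.md might look like the following. The contents are invented for illustration; the article does not reproduce the team's actual file.

```markdown
# AGENTS.md (illustrative sketch, not the team's actual file)

## Layout
- `core/`: domain logic; must not import from `ui/`
- `ui/`: presentation layer

## Commands
- Test: `make test`
- Lint: `make lint` (CI runs the same checks)

## Conventions
- One feature per pull request; keep diffs reviewable
- Architectural rules live in CI, not in prose: if a check fails,
  fix the code, not the check
```

Short, structured, and local: the file points the agent at the right places and commands instead of trying to enumerate every rule.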

What Broke

The article is honest about failure modes. Without mechanical enforcement, agents drifted into inconsistent patterns. Without repository-local context, agents made assumptions that compounded into architectural problems. The lesson: agent-scale development amplifies both good patterns and bad ones.

Connections

  • unrolling-the-codex-agent-loop — The companion article from the same OpenAI Codex team, detailing the internal agent loop architecture that powers the system described here
  • the-importance-of-agent-harness-in-2026 — Philipp Schmid's argument that agent harnesses are the competitive differentiator as models converge, providing the theoretical framing for the practical experiment Lopopolo describes
  • feedback-loopable — Making problems feedback-loopable is the prerequisite for the harness engineering approach—agents need text-based validation loops to iterate autonomously
  • pi-coding-agent-minimal-agent-harness — Mario Zechner's minimal harness philosophy offers a counterpoint: where OpenAI scales with tooling, pi succeeds by stripping the harness to essentials