Advanced Context Engineering for Coding Agents
Systematic context management—through frequent intentional compaction and a Research-Plan-Implement workflow—enables productive AI-assisted development in complex production codebases.
Overview
Contemporary AI models struggle with large, complex codebases. Stanford research found that AI tools often generate redundant code requiring rework—performing worse than traditional development in established codebases. This guide argues the problem isn't model capability but context management.
The core thesis: strategic context engineering, termed "frequent intentional compaction," enables productive AI-assisted development without sacrificing code quality. The only lever that affects output quality is input quality, so keep context window utilization in the 40-60% range.
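That threshold can be enforced mechanically. Below is a minimal sketch of a compaction trigger, assuming a `count_tokens` helper, a `summarize` call, and a 200k-token window; every name here is illustrative, not part of any specific agent framework.

```python
# Illustrative sketch: keep an agent's context window in the 40-60% utilization band.
# `count_tokens` and `summarize` are assumed helpers, not any specific library's API.

CONTEXT_LIMIT = 200_000  # model context window in tokens (an assumption)
HIGH_WATER = 0.60        # compact once utilization rises above this
KEEP_RECENT = 10         # most recent messages kept verbatim


def utilization(messages, count_tokens) -> float:
    """Fraction of the context window currently in use."""
    return sum(count_tokens(m) for m in messages) / CONTEXT_LIMIT


def maybe_compact(messages, count_tokens, summarize):
    """Distill older messages into a summary once usage passes the high-water mark."""
    if utilization(messages, count_tokens) <= HIGH_WATER:
        return messages  # still in the effective zone; leave the context untouched

    # Keep the newest exchanges verbatim; everything older is replaced by a
    # single distilled summary, pulling utilization back toward the 40% floor.
    recent = messages[-KEEP_RECENT:]
    older = messages[:-KEEP_RECENT]
    summary = {"role": "user", "content": summarize(older)}
    return [summary, *recent]
```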
Key Features
- Research-Plan-Implement workflow for complex codebases
- Context compaction techniques for maintaining model performance
- Subagent delegation patterns for isolated context-intensive tasks
- Real-world case studies with measurable outcomes
The Research-Plan-Implement Workflow
Three phases keep the agent in the "smart zone"; a sketch of how they chain together follows the list:
- Research: Understand the codebase architecture and the scope of the problem. Output: a compressed document of what matters.
- Plan: Outline precise steps with file names, line snippets, and verification procedures. A good plan makes even simple models succeed.
- Implement: Execute the plan in phases with continuous status compaction.
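A minimal sketch of how the three phases could be chained, assuming a generic `run_agent` callable; the prompts and the `research.md`/`plan.md`/`progress.md` artifact names are illustrative assumptions, not a prescribed API.

```python
# Illustrative sketch of the three-phase loop. `run_agent` stands in for any LLM
# agent call; the prompts and artifact file names are assumptions, not a fixed API.
from pathlib import Path


def research(run_agent, task: str) -> str:
    """Phase 1: explore the codebase and emit a compressed research document."""
    doc = run_agent(
        f"Research the codebase for: {task}. "
        "Return only the relevant files, data flow, and constraints."
    )
    Path("research.md").write_text(doc)  # durable artifact the human can review
    return doc


def plan(run_agent, task: str, research_doc: str) -> str:
    """Phase 2: turn research into precise steps with files, snippets, and checks."""
    steps = run_agent(
        f"Using this research:\n{research_doc}\n"
        f"Write an implementation plan for: {task}. "
        "List each file to touch, the exact change, and how to verify it."
    )
    Path("plan.md").write_text(steps)  # the plan is the artifact peers review
    return steps


def implement(run_agent, plan_doc: str) -> None:
    """Phase 3: execute the plan phase by phase, compacting status as you go."""
    for i, phase in enumerate(plan_doc.split("\n\n"), start=1):
        status = run_agent(f"Execute this phase and report a short status:\n{phase}")
        with Path("progress.md").open("a") as log:
            log.write(f"## Phase {i}\n{status}\n")
```

Writing each phase's output to disk keeps the research and plan reviewable by humans before any code is generated, and gives later phases a compact artifact to load instead of the full transcript.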
Context Management Principles
The optimization hierarchy:
- Correctness — Eliminate false information
- Completeness — Include necessary details
- Size — Minimize token consumption
Compaction techniques include intentional distillation of search results, subagent delegation for focused tasks, and frequent markdown documentation of progress.
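Subagent delegation is the most mechanical of these. The sketch below assumes a hypothetical `spawn_agent` helper; the point is that the subagent's noisy exploration stays in its own context and only the distilled summary returns to the parent.

```python
# Illustrative sketch of subagent delegation: the subagent spends its own context
# on a noisy investigation, and only a distilled summary re-enters the parent
# context. `spawn_agent` is a hypothetical helper, not a specific framework's API.


def delegate_search(spawn_agent, question: str, max_summary_tokens: int = 500) -> str:
    """Run a context-heavy investigation inside an isolated subagent context."""
    subagent = spawn_agent(system="You answer one focused question about the codebase.")

    # The subagent may read dozens of files; none of that raw output is kept
    # in the parent agent's context window.
    raw_findings = subagent.run(f"Investigate: {question}")

    # Distill in hierarchy order: drop anything unverified (correctness),
    # keep file paths and line references (completeness), trim to budget (size).
    return subagent.run(
        "Summarize only your verified findings, with file paths and line "
        f"references, in under {max_summary_tokens} tokens:\n{raw_findings}"
    )
```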
Case Studies
BAML Bug Fix (300k LOC Rust): A single developer unfamiliar with the codebase achieved one-shot PR approval within 24 hours, demonstrating feasibility in unfamiliar domains.
Cancellation & WASM Support (35k LOC): Two engineers, 7 hours of total effort, on work estimated at 3-5 days for senior engineers. Shows the workflow scales to complex features.
The Failure Case: An attempted Parquet-Java dependency removal exposed the workflow's limits. Insufficient codebase expertise and incomplete dependency analysis defeated it; single-domain specialists remain necessary for some problems.
Human Leverage
"A flawed specification cascades through hundreds of lines; flawed research generates thousands of bad lines."
Code review focus shifts from implementation inspection to specification and plan validation, catching errors at the highest-leverage points. When AI ships 2-3x more code, the plan becomes the artifact for peer review.
Connections
- no-vibes-allowed-solving-hard-problems-in-complex-codebases - The talk version of this guide by Dex Horthy, with the same Research-Plan-Implement framework
- 12-factor-agents - Same author's principles for building context-aware production agents
- context-engineering-guide - The broader map connecting context engineering techniques across AI tools