
Harness Engineering: A Comprehensive Guide Based on Claude Code

A Comprehensive Textbook on AI Agent Infrastructure Design

"The model is the agent. The code is the harness. Build great harnesses. The agent will do the rest."

๐Ÿ“ฃ As featured in Chinese AI media โ€” This Harness Engineering analysis has been republished by Chinese AI publications including QingkeAI (้’็จžAI) and others, with 20,000+ reads and 2,000+ shares across WeChat. Original article.

This tutorial is based on reverse engineering and systematic analysis of the Claude Code source code (~512,664 lines of TypeScript). Written in the style of an academic textbook, it provides an in-depth examination of every design decision, engineering trade-off, and implementation detail of an AI Agent Harness. The text adheres to academic writing conventions, constructing a theoretical framework atop the code analysis and distilling scattered implementation details into reusable design principles.

Intended Audience: AI engineers, Agent system architects, and researchers interested in LLM application infrastructure.

Prerequisites: Familiarity with TypeScript, experience with LLM API calls, and a working knowledge of basic system design concepts.

Figure 0-1: Harness Engineering Architecture Overview. The LLM is surrounded by six layers of Harness infrastructure: the Tool System (43+), the Permission Model (5 modes), the Hooks System (26 events x 4 types), the Sandbox (file + network + process isolation), Context Engineering (CLAUDE.md + memory + 4-level compaction), and Settings & Configuration (7-level hierarchy).




Chapter 1: What Is Harness Engineering?

Imagine you are about to train a wild horse. You would not simply mount it; you would first build fences, prepare reins, and lay out a track. This "infrastructure" is not the horse itself, but without it, even the finest horse remains untamed.

AI Agents are no different. The model (LLM) is the horse: powerful yet unbroken. Harness Engineering is the discipline of building fences, crafting reins, and laying tracks.

1.1 Definition

Harness Engineering is the engineering discipline of designing the environment, constraints, feedback loops, and infrastructure that enable AI Agents to operate reliably at scale.

The term was formally introduced by the OpenAI engineering team in early 2026. They described internal systems comprising "over one million lines of code, none of which were written by humans": engineers no longer wrote code directly but instead "designed systems that enabled AI Agents to write code reliably."

A simple analogy helps illustrate the concept:

┌─────────────────────────────────────────────────┐
│                                                 │
│   Agent = Model (LLM)                           │
│   Harness = Everything Else                     │
│                                                 │
│   ┌───────────┐     ┌────────────────────────┐  │
│   │  Claude   │ ←── │  Tools, Permissions,   │  │
│   │  Opus/    │ ──→ │  Hooks, Sandbox,       │  │
│   │  Sonnet   │     │  Memory, Settings,     │  │
│   │           │     │  MCP, Skills, Agents   │  │
│   └───────────┘     └────────────────────────┘  │
│       Model                  Harness            │
│                                                 │
└─────────────────────────────────────────────────┘

1.2 Three Pillars

To understand Harness Engineering, it is most instructive to decompose it into three pillars. Consider the analogy of constructing a building: Context Engineering is the foundation (ensuring that the right information is in place), Architectural Constraints are the load-bearing walls (ensuring structural integrity), and Entropy Management is the building maintenance (preventing degradation over time).

mindmap
  root((Harness Engineering))
    Context Engineering
      Static context
        CLAUDE.md
        AGENTS.md
        Design documents
      Dynamic context
        Logs and metrics
        Git status
        CI/CD status
      Context compaction
        Four-level pipeline
        On-demand loading
        Memory system
    Architectural Constraints
      Permission model
        5 modes
        7-level rule hierarchy
        AI classifier
      Tool constraints
        Schema validation
        Concurrency-safety flags
        Lazy loading
      Security boundaries
        Sandbox isolation
        Hard-coded denials
        Defense in depth
    Entropy Management
      Periodic cleanup
        Dead-code detection
        Documentation consistency
      Constraint validation
        Dependency audits
        Pattern enforcement
      Performance monitoring
        Coverage guards
        Regression detection

Figure 1-2: Engineering Time Allocation Across the Three Pillars. Context Engineering commands the largest share (45%), because "information the Agent cannot see might as well not exist." Architectural Constraints come second (35%). Entropy Management accounts for 20% but is critical for long-term stability.

Pillar One: Context Engineering

Manages the accessibility, structure, and timing of information. Key techniques include:

Pillar Two: Architectural Constraints

Establishes boundaries through mechanical enforcement rather than suggestions:

A counterintuitive benefit: Constraining the solution space makes the Agent more efficient, not less, because it prevents fruitless exploration.

Pillar Three: Entropy Management

Periodic cleanup Agents address code degradation:

| Discipline | Relationship |
| --- | --- |
| Prompt Engineering | A subset of Context Engineering (single interaction vs. system-level) |
| ML Engineering | An independent discipline; assumes the model is already deployed |
| Agent Engineering | Complementary; Harness engineers build infrastructure for Agents |
| DevOps | Overlapping infrastructure skills applied to the AI context |

1.4 Pause and Reflect

Before continuing, consider the following question:

If you were building an AI coding assistant today, where would you spend 80% of your engineering time: improving the model, or improving the systems surrounding the model?

If your answer is "the model," Harness Engineering challenges that intuition. The LangChain case study demonstrated that modifying only the Harness, without changing the model, yielded a 14-percentage-point improvement on benchmarks. The model is a given; the Harness is what you can control.

1.5 Quantitative Evidence: ROI of Harness Investment

Before delving into "why now," let the data speak for itself:

Figure 1-1: ROI Comparison of Harness Optimization vs. Model Optimization. In both Terminal Bench scores and development cycle reduction, Harness optimization yields returns far exceeding model optimization, while requiring only one-tenth the engineering effort.

| Metric | Model Optimization Only | Harness Optimization Only | Combined |
| --- | --- | --- | --- |
| Terminal Bench 2.0 Score | +3-5% (model upgrade) | +14% (LangChain case) | +18-20% |
| Development Cycle Reduction | Negligible | 10x (OpenAI million-line case) | >10x |
| Engineer Time Investment | Months (training/fine-tuning) | 1-2 hours (Level 1 Harness) | Months |
| Transferability | Model-specific | Reusable across models | Partially reusable |

Key Insight: The return on investment (ROI) of Harness optimization far exceeds that of model optimization. A carefully crafted CLAUDE.md file takes only 30 minutes to write but can boost Agent performance on a given project by 20-40%. By contrast, model fine-tuning requires weeks of effort and substantial computational resources, yet is effective only for specific tasks.

1.6 Why Now?

Three converging factors have given rise to this need:

  1. Model Commoditization: Competitive advantage is shifting from models to systems
  2. Production Deployment: Agents are moving from demos to customer-facing reliability requirements
  3. Benchmark Limitations: Standard metrics cannot capture multi-hour, multi-step Agent stability

Real-world impact: By modifying only the Harness architecture (without switching models), LangChain improved its Terminal Bench 2.0 score from 52.8% to 66.5%, vaulting from the top 30 to the top 5.

1.7 Implementation Tiers

| Tier | Scope | Investment | Contents |
| --- | --- | --- | --- |
| Level 1 | Individual | 1-2 hours | CLAUDE.md + pre-commit hooks + test suite |
| Level 2 | Small team | 1-2 days | AGENTS.md specification + CI constraints + shared templates |
| Level 3 | Organization | 1-2 weeks | Custom middleware + observability + scheduled Agents |

Chapter 2: Claude Code Architecture Overview

In the previous chapter we established the theoretical framework. Beginning with this chapter, we validate that theory against a real, production-grade system. That system is Claude Code, Anthropic's official AI coding assistant CLI, comprising over 500,000 lines of TypeScript. It represents the most complete production-grade Agent Harness reference implementation available today.

Why Claude Code? Because it is not an educational project; it is a real product used daily by tens of thousands of developers. Every design decision is backed by real user pain points and genuine engineering trade-offs. By reverse-engineering its architecture, we can learn practical wisdom that "textbooks never cover."

2.1 Technology Stack

| Category | Technology |
| --- | --- |
| Runtime | Bun (native TypeScript, high performance) |
| Language | TypeScript (strict mode) |
| UI Framework | React + Ink (terminal components) |
| CLI Parser | Commander.js (@commander-js/extra-typings) |
| Schema Validation | Zod v4 |
| Search Engine | ripgrep (invoked via BashTool) |
| API Client | @anthropic-ai/sdk |
| Protocols | MCP SDK, LSP |
| State Management | Custom Zustand-like Store + React Context |
| Telemetry | OpenTelemetry + gRPC |
| Feature Flags | GrowthBook + Bun bun:bundle |
| Auth | OAuth 2.0, JWT, macOS Keychain |

Figure 2-1: Lines of Code Distribution by Directory in Claude Code. tools/ and utils/ are the two largest directories, together accounting for approximately 32% of the codebase. This reflects the centrality of the tool system and infrastructure utilities to the Harness.

Figure 2-2: Module Counts by Category. components (144) and commands (101) are the most numerous, reflecting Claude Code's nature as a terminal UI application.

2.2 Scale

2.3 Directory Structure

src/
├── main.tsx                    # Entry point, CLI bootstrap (803 KB)
├── query.ts                    # Core Agent loop (68 KB)
├── QueryEngine.ts              # LLM query engine (46 KB)
├── Tool.ts                     # Tool base interface (29 KB)
├── tools.ts                    # Tool registry (25 KB)
├── Task.ts                     # Task type definitions
├── commands.ts                 # Command registration
│
├── tools/                      # 43 tool directories
│   ├── BashTool/              # Shell command execution
│   ├── FileReadTool/          # File reading
│   ├── FileWriteTool/         # File creation
│   ├── FileEditTool/          # Partial file modification
│   ├── GlobTool/              # File pattern matching
│   ├── GrepTool/              # ripgrep content search
│   ├── AgentTool/             # Sub-agent spawning
│   ├── SkillTool/             # Skill execution
│   ├── MCPTool/               # MCP server invocation
│   ├── WebFetchTool/          # URL content fetching
│   ├── WebSearchTool/         # Web search
│   └── ...                    # More tools
│
├── commands/                   # ~101 command directories
│   ├── commit/                # Git commit
│   ├── review/                # Code review
│   ├── mcp/                   # MCP management
│   ├── skills/                # Skill management
│   └── ...
│
├── components/                 # 144+ React/Ink terminal components
├── hooks/                      # 80+ custom React Hooks
├── services/                   # 22 service subdirectories
│   ├── api/                   # Anthropic API client
│   ├── mcp/                   # MCP protocol connections
│   ├── oauth/                 # OAuth authentication
│   ├── lsp/                   # Language Server Protocol
│   ├── compact/               # Conversation compaction
│   ├── plugins/               # Plugin loading
│   └── ...
│
├── utils/                      # 33+ subdirectories, 100+ files
│   ├── permissions/           # Permission logic
│   ├── hooks.ts               # Hook execution engine
│   ├── hooks/                 # Hook configuration management
│   ├── sandbox/               # Sandbox adapter
│   ├── settings/              # Settings management
│   ├── bash/                  # Shell utilities
│   ├── memdir/                # Persistent memory directory
│   └── ...
│
├── state/                      # Application state management
├── entrypoints/                # CLI/MCP/SDK entry points
├── bridge/                     # Bidirectional IDE communication
├── coordinator/                # Multi-agent orchestration
├── skills/                     # Skill system
├── plugins/                    # Plugin system
├── memdir/                     # Memory directory system
├── schemas/                    # Zod validation schemas
├── types/                      # Type definitions
└── constants/                  # Application constants

2.4 Entry Point Flow

main.tsx → parallel prefetch (MDM settings + Keychain + API preconnect)
    ↓
Commander.js CLI parser initialization
    ↓
preAction Hook: init() → telemetry → plugins → migrations → remote settings
    ↓
React/Ink renderer startup
    ↓
Interactive REPL / conversation loop

Design Philosophy: The Claude Code entry point main.tsx (803 KB) employs a lazy loading strategy. Heavy modules (OpenTelemetry, gRPC, analytics) are loaded only when needed, while critical-path resources (MDM settings, Keychain) are prefetched in parallel to ensure fast startup times.
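The two halves of this startup strategy can be illustrated with a minimal sketch. Every name below (`lazy`, `getTelemetry`, `prefetchCriticalPath`, `loadLog`) is hypothetical and chosen for illustration; this is not Claude Code's actual startup code, only one plausible shape for "defer the heavy modules, start the critical-path fetches together."

```typescript
// Hypothetical sketch: none of these names come from the Claude Code source.

/** Wraps an expensive loader so that it runs at most once, on first use. */
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= load());
}

// Records which resources have started loading, in order.
const loadLog: string[] = [];

// Heavy module (stand-in for telemetry): deferred until a caller needs it.
const getTelemetry = lazy(async () => {
  loadLog.push("telemetry");
  return { record: (event: string) => `recorded:${event}` };
});

// Critical-path resources: both loads start immediately and are awaited together.
async function prefetchCriticalPath(): Promise<{ settings: string; credential: string }> {
  const [settings, credential] = await Promise.all([
    (async () => { loadLog.push("settings"); return "managed-settings"; })(),
    (async () => { loadLog.push("credential"); return "keychain-token"; })(),
  ]);
  return { settings, credential };
}
```

Calling `getTelemetry()` repeatedly returns the same promise, so the cost of the heavy load is paid once and only if some code path asks for it, while `prefetchCriticalPath()` overlaps the waits of its two loads instead of serializing them.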

2.5 Core Data Flow Overview

┌──────────────────────────────────────────────────────────────────────┐
│                    Claude Code Data Flow Overview                    │
│                                                                      │
│  User input ──→ UserPromptSubmit Hook ──→ Slash Command parsing      │
│     │                                                                │
│     v                                                                │
│  QueryEngine.submitMessage()                                         │
│     │                                                                │
│     ├─→ System prompt: base + tools + CLAUDE.md + MCP + memory       │
│     ├─→ Message normalization: normalizeMessagesForAPI()             │
│     │   ├─ Reorder attachment messages                               │
│     │   ├─ Merge consecutive user/assistant messages                 │
│     │   ├─ Strip duplicate content from PDF/image errors             │
│     │   ├─ Normalize tool names (alias → canonical)                  │
│     │   └─ Handle tool-search reference blocks                       │
│     │                                                                │
│     v                                                                │
│  queryLoop() [while(true)]                                           │
│     │                                                                │
│     ├─→ Compaction pipeline: snip → micro → collapse → auto          │
│     ├─→ API call: deps.sample() [streaming]                          │
│     │                                                                │
│     ├─→ Tool execution: StreamingToolExecutor (concurrent)           │
│     │                   / runTools (sequential)                      │
│     │   │                                                            │
│     │   ├─→ Tool partitioning: partitionToolCalls()                  │
│     │   │   ├─ isConcurrencySafe=true → concurrent execution         │
│     │   │   └─ isConcurrencySafe=false → serial execution            │
│     │   │                                                            │
│     │   └─→ Per tool:                                                │
│     │       ├─ Zod schema validation                                 │
│     │       ├─ tool.validateInput()                                  │
│     │       ├─ PreToolUse Hook                                       │
│     │       ├─ Permission check (rules → mode → classifier)          │
│     │       ├─ Sandbox wrapping (BashTool)                           │
│     │       ├─ tool.call() [actual execution]                        │
│     │       └─ PostToolUse Hook                                      │
│     │                                                                │
│     ├─→ Error recovery: 7 continue sites                             │
│     └─→ Stop Hook → terminate or continue                            │
│                                                                      │
│  Termination → SessionEnd Hook → transcript saved → exit             │
└──────────────────────────────────────────────────────────────────────┘
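The tool-partitioning step in the diagram can be sketched as follows. Only the names `partitionToolCalls` and `isConcurrencySafe` appear in the source; the `ToolCall` and `Batch` shapes, the batching rule, and the tool names in the example are assumptions made for illustration, and the real function may behave differently.

```typescript
// Hypothetical sketch: the real partitionToolCalls() may differ in shape and behavior.
interface ToolCall {
  name: string;
  isConcurrencySafe: boolean;
}

type Batch = { mode: "concurrent" | "serial"; calls: ToolCall[] };

/**
 * Groups adjacent concurrency-safe calls into one concurrent batch.
 * Each unsafe call becomes its own serial batch, and overall order is preserved.
 */
function partitionToolCalls(calls: ToolCall[]): Batch[] {
  const result: Batch[] = [];
  for (const call of calls) {
    const last = result[result.length - 1];
    if (call.isConcurrencySafe && last?.mode === "concurrent") {
      last.calls.push(call);
    } else {
      result.push({ mode: call.isConcurrencySafe ? "concurrent" : "serial", calls: [call] });
    }
  }
  return result;
}

// Illustrative input: two reads, one shell command, one more read.
const plan = partitionToolCalls([
  { name: "Read", isConcurrencySafe: true },
  { name: "Grep", isConcurrencySafe: true },
  { name: "Bash", isConcurrencySafe: false },
  { name: "Glob", isConcurrencySafe: true },
]);
```

Under these assumptions the plan has three batches: Read and Grep run together, Bash runs alone, and Glob starts a new concurrent batch. Preserving order across the unsafe call keeps a later read from racing with a command that may have changed the file system.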

2.6 Message Type System

Claude Code defines a rich message type system, with each type following a distinct processing path within the Agent Loop:

// src/types/message.ts: message type hierarchy
type Message =
  | UserMessage           // Human input (or tool results)
  | AssistantMessage      // Model response (text + tool calls)
  | AttachmentMessage     // Memory/resource attachments
  | SystemMessage         // System messages
  | SystemLocalCommandMessage  // Local tool results (bash, read, etc.)
  | ToolUseSummaryMessage // Compacted tool history
  | TombstoneMessage      // Marker for a deleted message
  | ProgressMessage       // Streaming progress updates

Message normalization (normalizeMessagesForAPI) is a complex pipeline that handles:
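Among the normalization steps named in the data-flow overview, merging consecutive user/assistant messages is the easiest to illustrate. The sketch below uses a deliberately simplified `SimpleMessage` shape and a hypothetical `mergeConsecutive` helper; neither comes from the Claude Code source, and the real pipeline operates on the richer `Message` union.

```typescript
// Hypothetical sketch: a simplified message shape, not the actual Message union.
type Role = "user" | "assistant";

interface SimpleMessage {
  role: Role;
  content: string[];
}

/** Merges each run of adjacent same-role messages into a single message. */
function mergeConsecutive(messages: SimpleMessage[]): SimpleMessage[] {
  const merged: SimpleMessage[] = [];
  for (const message of messages) {
    const previous = merged[merged.length - 1];
    if (previous !== undefined && previous.role === message.role) {
      previous.content.push(...message.content);
    } else {
      // Copy so that the caller's messages and content arrays stay untouched.
      merged.push({ role: message.role, content: [...message.content] });
    }
  }
  return merged;
}

const normalized = mergeConsecutive([
  { role: "user", content: ["fix the bug"] },
  { role: "user", content: ["tool_result: ok"] },
  { role: "assistant", content: ["done"] },
]);
```

Here the two adjacent user messages collapse into one message with two content entries, which yields the strictly alternating role sequence that chat-style APIs commonly expect.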


Chapter 3: Agent Loop โ€” The Heart of the Harness

If the Harness is an automobile, the Agent Loop is its engine. No matter how luxurious the seats or how advanced the airbags, without an engine the car cannot move.

This is the most important chapter in the entire book. We will dissect Claude Code's core loop, queryLoop(), line by line, examining how a single while(true) drives the entire AI coding assistant. By the end of this chapter, you will have a thorough understanding of "how an Agent works," from the lowest level to the highest.

The Agent Loop is the most critical component of the entire Harness. In Claude Code, it is implemented as the queryLoop() function in src/query.ts.

3.1 Basic Architecture: Infinite Loop + Async Generator

Below is the actual signature and initialization of queryLoop from Claude Code's real source code (src/query.ts):

// src/query.ts: the actual function signature
async function* queryLoop(
  params: QueryParams,
  consumedCommandUuids: string[],
): AsyncGenerator<
  | StreamEvent
  | RequestStartEvent
  | Message
  | TombstoneMessage
  | ToolUseSummaryMessage,
  Terminal
> {
  // ===== Immutable params: never reassigned during the loop =====
  const {
    systemPrompt, userContext, systemContext,
    canUseTool, fallbackModel, querySource,
    maxTurns, skipCacheWrite,
  } = params
  const deps = params.deps ?? productionDeps()

  // ===== Mutable cross-iteration state =====
  // The loop body destructures this object at the start of each iteration
  // so that reads can use bare names.
  // Continue sites write `state = { ... }` instead of 9 separate assignments.
  let state: State = {
    messages: params.messages,
    toolUseContext: params.toolUseContext,
    maxOutputTokensOverride: params.maxOutputTokensOverride,
    autoCompactTracking: undefined,
    stopHookActive: undefined,
    maxOutputTokensRecoveryCount: 0,
    hasAttemptedReactiveCompact: false,
    turnCount: 1,
    pendingToolUseSummary: undefined,
    transition: undefined,  // why the previous iteration continued
  }

  // Budget tracking across compaction boundaries (loop-local, not on State)
  let taskBudgetRemaining: number | undefined = undefined

  // Query config snapshot (captures env/statsig/session state once)
  const config = buildQueryConfig()

  // Memory prefetch (`using` ensures cleanup when the generator exits)
  using pendingMemoryPrefetch = startRelevantMemoryPrefetch(
    state.messages, state.toolUseContext,
  )

  while (true) {
    // ... loop body (explained below)
  }
}

State Type Definition (this is the "skeleton" of the loop):

type State = {
  messages: Message[]
  toolUseContext: ToolUseContext
  autoCompactTracking: AutoCompactTrackingState | undefined
  maxOutputTokensRecoveryCount: number
  hasAttemptedReactiveCompact: boolean
  maxOutputTokensOverride: number | undefined
  pendingToolUseSummary: Promise<ToolUseSummaryMessage | null> | undefined
  stopHookActive: boolean | undefined
  turnCount: number
  transition: Continue | undefined  // why the previous iteration continued
}

The state diagram below illustrates the complete lifecycle of queryLoop: each state corresponds to a phase within the loop, and each edge corresponds to a transition reason.

stateDiagram-v2
    [*] --> Compaction: enter loop
    Compaction --> APICall: compaction complete
    APICall --> ToolExecution: tool_use blocks present
    APICall --> StopHooks: no tool_use blocks
    APICall --> CollapseRetry: 413 error
    APICall --> ReactiveCompact: collapse failed
    APICall --> EscalateTokens: max_output_tokens
    APICall --> MultiTurnRetry: still truncated after escalation
    APICall --> FallbackModel: FallbackTriggeredError

    CollapseRetry --> Compaction: continue site 1
    ReactiveCompact --> Compaction: continue site 2
    EscalateTokens --> Compaction: continue site 3
    MultiTurnRetry --> Compaction: continue site 4
    FallbackModel --> Compaction: continue site 6

    ToolExecution --> Compaction: continue site 7\n(normal next turn)

    StopHooks --> [*]: normal completion
    StopHooks --> Compaction: blocking error\ncontinue site 5
    StopHooks --> [*]: hook prevents continuation

    ReactiveCompact --> [*]: recovery failed
    MultiTurnRetry --> [*]: exhausted after 3 retries

Simplified Logic Flow (for comprehension):

while (true) {
    // 1. Destructure state
    const { messages, toolUseContext /* ... */ } = state;

    // 2. Compaction pipeline
    // 3. Build system prompt + normalize messages
    // 4. Call the LLM API (streaming)
    // 5. Collect tool_use blocks
    // 6. Error recovery (7 continue sites)
    // 7. Tool execution
    // 8. Stop Hook → terminate or continue
    // 9. Update state → continue
}
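To make the "infinite loop + async generator" pattern concrete, here is a minimal runnable sketch. It is not the Claude Code implementation: `LoopState`, `ModelReply`, `nextStep`, `agentLoop`, and the `callModel`/`runTool` callbacks are all hypothetical, and the `maxTurns` handling is an assumed policy. What it does borrow from the source is the idea that each continue site replaces the whole state object in one assignment and records why the loop continued.

```typescript
// Hypothetical sketch: LoopState, ModelReply, and the callbacks are illustrative stand-ins.
type ModelReply = { text: string; toolUses: string[] };
type LoopState = { messages: string[]; turnCount: number; transition?: string };
type Step = { kind: "continue"; state: LoopState } | { kind: "terminal"; reason: string };

/** Decides what follows one model reply; a continue builds a whole new state object. */
function nextStep(state: LoopState, reply: ModelReply, toolResults: string[], maxTurns: number): Step {
  if (reply.toolUses.length === 0) return { kind: "terminal", reason: "completed" };
  if (state.turnCount >= maxTurns) return { kind: "terminal", reason: "max_turns" };
  return {
    kind: "continue",
    state: {
      messages: [...state.messages, reply.text, ...toolResults],
      turnCount: state.turnCount + 1,
      transition: "tool_results", // records why the loop continued
    },
  };
}

async function* agentLoop(
  initial: LoopState,
  callModel: (messages: string[]) => Promise<ModelReply>,
  runTool: (name: string) => Promise<string>,
  maxTurns = 10,
): AsyncGenerator<string, string> {
  let state = initial;
  while (true) {
    const reply = await callModel(state.messages);
    yield reply.text; // stream progress to the consumer
    const results = await Promise.all(reply.toolUses.map((name) => runTool(name)));
    const step = nextStep(state, reply, results, maxTurns);
    if (step.kind === "terminal") return step.reason;
    state = step.state; // one assignment instead of many field updates
  }
}
```

Keeping the decision in a pure `nextStep` function is a choice made for this sketch: it lets the transition logic be tested without a model, while the generator stays a thin shell that awaits, yields, and reassigns `state`.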

Design Philosophy:

3.2 The Seven Continue Sites

Claude Code's queryLoop has 7+ continue sites, each corresponding to a distinct recovery scenario:

┌──────────────────────────────────────────────────┐
│                 queryLoop()                      │
│                                                  │
│  ┌───────────────────────────────────────────┐   │
│  │ Continue Site 1: Proactive Compaction     │   │
│  │ Trigger: tokens exceed threshold          │   │
│  │ Action: autocompact → new messages →      │   │
│  │         continue                          │   │
│  └───────────────────────────────────────────┘   │
│                                                  │
│  ┌───────────────────────────────────────────┐   │
│  │ Continue Site 2: Prompt Too Long          │   │
│  │ Trigger: API returns prompt-too-long      │   │
│  │ Action: context-collapse → reactive       │   │
│  │         compact                           │   │
│  └───────────────────────────────────────────┘   │
│                                                  │
│  ┌───────────────────────────────────────────┐   │
│  │ Continue Site 3: Max Output Tokens        │   │
│  │ Trigger: model output truncated           │   │
│  │ Action: escalate 8k→64k → multi-turn      │   │
│  │         retry (up to 3 times)             │   │
│  └───────────────────────────────────────────┘   │
│                                                  │
│  ┌───────────────────────────────────────────┐   │
│  │ Continue Site 4: Fallback Model           │   │
│  │ Trigger: FallbackTriggeredError           │   │
│  │ Action: switch model → retry request      │   │
│  └───────────────────────────────────────────┘   │
│                                                  │
│  ┌───────────────────────────────────────────┐   │
│  │ Continue Site 5: Stop Hook Blocking       │   │
│  │ Trigger: user Hook requests additional    │   │
│  │          turns                            │   │
│  │ Action: inject Hook message → continue    │   │
│  └───────────────────────────────────────────┘   │
│                                                  │
│  ┌───────────────────────────────────────────┐   │
│  │ Continue Site 6: Image/Media Errors       │   │
│  │ Trigger: ImageSizeError /                 │   │
│  │          ImageResizeError                 │   │
│  │ Action: reactive compact (remove images)  │   │
│  │         → continue                        │   │
│  └───────────────────────────────────────────┘   │
│                                                  │
│  ┌───────────────────────────────────────────┐   │
│  │ Continue Site 7: Tool Execution           │   │
│  │ Trigger: normal tool execution complete   │   │
│  │ Action: collect results → update state →  │   │
│  │         continue                          │   │
│  └───────────────────────────────────────────┘   │
│                                                  │
│  ┌───────────────────────────────────────────┐   │
│  │ return Terminal: the only exit point      │   │
│  │ Condition: no tool calls + Stop Hook does │   │
│  │ not block                                 │   │
│  └───────────────────────────────────────────┘   │
└──────────────────────────────────────────────────┘
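The Max Output Tokens site is a two-stage recovery: raise the output limit once, then fall back to multi-turn retries with a cap. A minimal sketch of that decision follows. The field names `maxOutputTokensOverride` and `maxOutputTokensRecoveryCount` come from the `State` type shown earlier; the constants, the `Recovery` type, and the `recoverFromTruncation` function are assumptions based on the diagram (64k escalation, up to 3 retries), not the actual source.

```typescript
// Hypothetical sketch: constants and the decision shape are assumptions drawn from the diagram.
const ESCALATED_MAX_OUTPUT_TOKENS = 64_000;
const MAX_RECOVERY_ATTEMPTS = 3;

interface RecoveryState {
  maxOutputTokensOverride: number | undefined;
  maxOutputTokensRecoveryCount: number;
}

type Recovery =
  | { action: "escalate"; next: RecoveryState }
  | { action: "multi_turn_retry"; next: RecoveryState }
  | { action: "give_up" };

/** Chooses the next recovery step after the model's output was truncated. */
function recoverFromTruncation(state: RecoveryState): Recovery {
  if (state.maxOutputTokensOverride === undefined) {
    // First truncation: raise the output limit once and retry the same request.
    return {
      action: "escalate",
      next: { ...state, maxOutputTokensOverride: ESCALATED_MAX_OUTPUT_TOKENS },
    };
  }
  if (state.maxOutputTokensRecoveryCount < MAX_RECOVERY_ATTEMPTS) {
    // Still truncated at the higher limit: ask the model to continue in a new turn.
    return {
      action: "multi_turn_retry",
      next: { ...state, maxOutputTokensRecoveryCount: state.maxOutputTokensRecoveryCount + 1 },
    };
  }
  // Retry budget exhausted: stop recovering and let the loop terminate.
  return { action: "give_up" };
}
```

Because the counter lives in the loop state rather than in a local variable, it survives each `continue`, which is what bounds the retries.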

3.3 The Compaction Pipeline

The context window is finite; even a 1M-token window can be filled during long conversations. Claude Code implements a four-level compaction pipeline, one of its most sophisticated subsystems.

flowchart TD
    subgraph Pipeline["Compaction pipeline (runs every iteration)"]
        direction TB
        S["Level 1: Snip\nHistory truncation\nCost: very low | Latency: ~0ms"]
        MC["Level 2: Microcompact\nShrinks aged tool results\nCost: low | Latency: ~1ms"]
        CC["Level 3: Context-Collapse\nRead-time projection (array not modified)\nCost: medium | Latency: ~5ms"]
        AC["Level 4: Autocompact\nLLM summary of the full conversation\nCost: high | Latency: ~2s"]
    end

    S -->|"frees a few tokens"| MC
    MC -->|"boundary messages deferred"| CC
    CC -->|"if still above threshold"| AC
    CC -->|"if below threshold"| Skip["Skip Autocompact\nKeep granular context"]

    classDef light fill:#dcfce7,stroke:#16a34a,color:#14532d
    classDef medium fill:#fef9c3,stroke:#ca8a04,color:#713f12
    classDef heavy fill:#fee2e2,stroke:#dc2626,color:#7f1d1d
    classDef skip fill:#f3f4f6,stroke:#6b7280,color:#374151

    class S,MC light
    class CC medium
    class AC heavy
    class Skip skip

Source Code Annotation Analysis: Regarding the execution order of Microcompact and Snip, the source code comments: "Apply snip before microcompact (both may run - they are not mutually exclusive)... snipTokensFreed is plumbed to autocompact: snip's threshold check must reflect what snip removed." This reveals a subtle data-flow dependency: the number of tokens freed by Snip must be propagated to Autocompact's threshold check; otherwise Autocompact would underestimate the freed space, triggering unnecessary full-conversation summarization.
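The dependency is easy to see with numbers. The sketch below assumes a hypothetical `shouldAutocompact` function and illustrative token counts; only the name `snipTokensFreed` comes from the quoted comment, and the 50,000 threshold echoes the ">50k tokens" trigger mentioned for Autocompact.

```typescript
// Hypothetical sketch: the function name, parameters, and numbers are illustrative.

/**
 * Decides whether autocompact should run. `lastMeasuredTokens` was measured
 * before snip ran, so the tokens snip freed are subtracted before comparing.
 */
function shouldAutocompact(lastMeasuredTokens: number, snipTokensFreed: number, threshold: number): boolean {
  const effectiveTokens = Math.max(0, lastMeasuredTokens - snipTokensFreed);
  return effectiveTokens > threshold;
}

const withoutPlumbing = shouldAutocompact(52_000, 0, 50_000);  // snip's savings ignored
const withPlumbing = shouldAutocompact(52_000, 5_000, 50_000); // snip's savings counted
```

With the savings ignored, 52,000 exceeds the threshold and an expensive summarization would fire; with the 5,000 freed tokens counted, the effective size is 47,000 and Autocompact is skipped.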

Regarding Context-Collapse, the comments state: "Nothing is yielded - the collapsed view is a read-time projection... summary messages live in the collapse store, not the REPL array." This means Level 3 leaves the message array untouched; it only changes how the data is "read." This design allows collapse to persist across turns and remain fully reversible.
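A read-time projection can be sketched in a few lines. The `Collapse` record, the `projectView` function, and the index-range layout of the store are all assumptions for illustration; the source only tells us that summaries live in a separate collapse store and that the message array is not modified.

```typescript
// Hypothetical sketch: the store layout and function names are illustrative.
interface Collapse {
  start: number;   // index of the first message covered by the summary
  end: number;     // index one past the last covered message
  summary: string;
}

/** Builds the view sent to the model without mutating `messages`. */
function projectView(messages: readonly string[], collapseStore: readonly Collapse[]): string[] {
  const view: string[] = [];
  let index = 0;
  // Assumes collapses do not overlap; they are applied in positional order.
  for (const collapse of [...collapseStore].sort((a, b) => a.start - b.start)) {
    view.push(...messages.slice(index, collapse.start), collapse.summary);
    index = collapse.end;
  }
  view.push(...messages.slice(index));
  return view;
}

const history = ["m0", "m1", "m2", "m3", "m4"];
const collapsed = projectView(history, [{ start: 1, end: 4, summary: "[summary of m1..m3]" }]);
const restored = projectView(history, []); // dropping the collapse restores the full view
```

Since `history` is never written to, removing an entry from the store is enough to undo a collapse, which is the reversibility property described above.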

Each level operates independently, but a strict execution order constraint applies:

┌─────────────────────────────────────────────────────────┐
│                  Compaction Pipeline                    │
│                                                         │
│  Level 1: Snip Compact (every turn)                     │
│  ├─ Feature-gated history truncation                    │
│  ├─ Tracks freed token count                            │
│  └─ Lightest weight, near-zero latency                  │
│                                                         │
│  Level 2: Microcompact (every turn)                     │
│  ├─ Replaces tool results 3+ turns old with             │
│  │  "[Previous: used {tool}]"                           │
│  ├─ Caches compacted results                            │
│  └─ Defers boundary messages until API response         │
│     (when cache_deleted_input is known)                 │
│                                                         │
│  Level 3: Context-Collapse (read-time projection)       │
│  ├─ Does not modify the message array; projects at      │
│  │  read time instead                                   │
│  ├─ Progressively drains collapsible context by         │
│  │  granularity                                         │
│  └─ Low cost, incremental                               │
│                                                         │
│  Level 4: Autocompact (triggered at >50k tokens)        │
│  ├─ Saves full transcript to disk                       │
│  ├─ LLM summarizes all messages                         │
│  ├─ Replaces all messages with summary                  │
│  └─ Heaviest weight, but frees the most space           │
│                                                         │
│  Execution order: snip → micro → context-collapse →     │
│                   auto                                  │
│  Levels are not mutually exclusive and can run in       │
│  combination                                            │
└─────────────────────────────────────────────────────────┘
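The ordering constraint and the snipTokensFreed plumbing can be sketched as follows. This is a minimal illustration, not Claude Code's implementation: the Message shape, the functions snip, microcompact, shouldAutocompact, and compactTurn, the 4-characters-per-token estimate, the 50-message window, and the 50,000-token threshold are all assumptions made for the example.

```typescript
// Hypothetical message shape; real messages carry far more structure.
interface Message {
  role: "user" | "assistant" | "tool";
  text: string;
  turn: number;
}

// Crude estimate (assumption: about 4 characters per token).
function estimateTokens(messages: Message[]): number {
  let chars = 0;
  for (const m of messages) chars += m.text.length;
  return Math.ceil(chars / 4);
}

// Level 1: keep only the most recent messages and report how much was freed.
function snip(messages: Message[], keepLast: number): { messages: Message[]; tokensFreed: number } {
  const kept = messages.slice(Math.max(0, messages.length - keepLast));
  return { messages: kept, tokensFreed: estimateTokens(messages) - estimateTokens(kept) };
}

// Level 2: replace tool results that are 3+ turns old with a placeholder.
function microcompact(messages: Message[], currentTurn: number): Message[] {
  return messages.map((m) =>
    m.role === "tool" && currentTurn - m.turn >= 3
      ? { role: m.role, text: "[Previous: used tool]", turn: m.turn }
      : m,
  );
}

// Level 4 gate: the last reported token count predates snip, so subtract what snip freed.
function shouldAutocompact(lastReportedTokens: number, snipTokensFreed: number, threshold: number): boolean {
  return lastReportedTokens - snipTokensFreed > threshold;
}

// Runs the cheap levels in order, then decides whether the expensive level is needed.
function compactTurn(messages: Message[], currentTurn: number, lastReportedTokens: number) {
  const snipped = snip(messages, 50);
  const compacted = microcompact(snipped.messages, currentTurn);
  const needsAutocompact = shouldAutocompact(lastReportedTokens, snipped.tokensFreed, 50000);
  return { messages: compacted, needsAutocompact };
}
```

If shouldAutocompact ignored snipTokensFreed, a conversation that snip had already brought back under the threshold would still be sent through full summarization.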

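Design Philosophy: The levels are ordered from cheapest to most expensive. The near-zero-latency levels run every turn, and full LLM summarization is reserved for the case where they do not free enough space.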

3.4 Tool Execution Orchestration

Claude Code features two tool execution modes, both of which coexist in production:

Mode 1: StreamingToolExecutor (Default: Execute While Streaming)

// src/services/tools/toolOrchestration.ts (real code)
export class StreamingToolExecutor {
  private tools: TrackedTool[] = []
  private toolUseContext: ToolUseContext
  private hasErrored = false
  // Child AbortController: when one Bash tool errors,
  // sibling subprocesses die immediately, but the parent query is not aborted
  private siblingAbortController: AbortController
  private discarded = false

  addTool(block: ToolUseBlock, assistantMessage: AssistantMessage): void {
    const toolDefinition = findToolByName(this.toolDefinitions, block.name)
    if (!toolDefinition) {
      // Tool does not exist → immediately create an error result
      this.tools.push({
        id: block.id, block, assistantMessage,
        status: 'completed',
        isConcurrencySafe: true,
        pendingProgress: [],
        results: [createUserMessage({
          content: [{
            type: 'tool_result',
            content: `<tool_use_error>Error: No such tool: ${block.name}</tool_use_error>`,
            is_error: true,
            tool_use_id: block.id,
          }],
        })],
      })
      return
    }

    // Parse the input and decide whether the call is concurrency-safe
    const parsedInput = toolDefinition.inputSchema.safeParse(block.input)
    const isConcurrencySafe = parsedInput?.success
      ? Boolean(toolDefinition.isConcurrencySafe(parsedInput.data))
      : false

    this.tools.push({
      id: block.id, block, assistantMessage,
      status: 'queued', isConcurrencySafe,
      pendingProgress: [],
    })

    void this.processQueue()  // start processing immediately
  }

  // On streaming fallback, discard all pending and in-flight tools
  discard(): void { this.discarded = true }
}

Key design: addTool() is called during the model's streaming generation. Whenever a complete tool_use JSON block is identified, the tool is immediately queued for execution. While the model is still generating the second tool call, the first is already running.

Mode 2: runTools() (Fallback: Execute After Partitioning)

// src/services/tools/toolOrchestration.ts (real code)
export async function* runTools(
  toolUseMessages: ToolUseBlock[],
  assistantMessages: AssistantMessage[],
  canUseTool: CanUseToolFn,
  toolUseContext: ToolUseContext,
): AsyncGenerator<MessageUpdate, void> {
  let currentContext = toolUseContext

  // Core design: tool partitioning
  for (const { isConcurrencySafe, blocks } of partitionToolCalls(
    toolUseMessages, currentContext,
  )) {
    if (isConcurrencySafe) {
      // ===== Read-only batch: run concurrently =====
      const queuedContextModifiers: Record<
        string, ((ctx: ToolUseContext) => ToolUseContext)[]
      > = {}
      for await (const update of runToolsConcurrently(blocks, ...)) {
        if (update.contextModifier) {
          // Collect context modifiers and apply them after the batch.
          // Create the per-tool queue on first use so the push is never skipped.
          const { toolUseID, modifyContext } = update.contextModifier
          const queue = (queuedContextModifiers[toolUseID] ??= [])
          queue.push(modifyContext)
        }
        yield { message: update.message, newContext: currentContext }
      }
      // After the batch completes, apply all context modifications in order
      for (const block of blocks) {
        for (const modifier of queuedContextModifiers[block.id] ?? []) {
          currentContext = modifier(currentContext)
        }
      }
    } else {
      // ===== Write batch: run serially =====
      for await (const update of runToolsSerially(blocks, ...)) {
        if (update.newContext) currentContext = update.newContext
        yield { message: update.message, newContext: currentContext }
      }
    }
  }
}

Tool Partitioning Algorithm (partitionToolCalls):

Input: [Read("a.ts"), Read("b.ts"), Write("c.ts"), Read("d.ts")]

Partition result:
  Batch 1: { isConcurrencySafe: true,  blocks: [Read("a.ts"), Read("b.ts")] }
  Batch 2: { isConcurrencySafe: false, blocks: [Write("c.ts")] }
  Batch 3: { isConcurrencySafe: true,  blocks: [Read("d.ts")] }

Execution order: Batch 1 concurrent → Batch 2 serial → Batch 3 concurrent

Design Philosophy: Read-only tools (Read, Glob, Grep) are inherently safe for concurrent execution because they do not modify state. Write tools (Write, Edit, Bash) must execute serially because they may depend on the side effects of preceding tools. The partitioning algorithm groups consecutive tools of the same type, balancing safety against performance. Context modifiers (contextModifier) are collected during a concurrent batch and applied in order once the batch finishes, which keeps the context consistent during concurrent execution.
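The grouping rule can be sketched in a few lines. This is an illustrative version, not the real partitionToolCalls: ToolCall and Batch are simplified stand-ins, and deciding safety from a fixed list of tool names is an assumption (the listing above asks each tool via isConcurrencySafe on the parsed input).

```typescript
// Hypothetical, simplified tool call; the real ToolUseBlock carries ids and parsed input.
interface ToolCall {
  name: string;
  target: string;
}

interface Batch {
  isConcurrencySafe: boolean;
  blocks: ToolCall[];
}

// Assumption for this sketch: safety is decided by tool name alone.
const READ_ONLY_TOOLS = ["Read", "Glob", "Grep"];

function partitionToolCalls(calls: ToolCall[]): Batch[] {
  const batches: Batch[] = [];
  for (const call of calls) {
    const safe = READ_ONLY_TOOLS.indexOf(call.name) !== -1;
    const last = batches.length > 0 ? batches[batches.length - 1] : undefined;
    if (last !== undefined && last.isConcurrencySafe === safe) {
      last.blocks.push(call); // extend the current run of same-kind calls
    } else {
      batches.push({ isConcurrencySafe: safe, blocks: [call] }); // start a new batch
    }
  }
  return batches;
}

const batches = partitionToolCalls([
  { name: "Read", target: "a.ts" },
  { name: "Read", target: "b.ts" },
  { name: "Write", target: "c.ts" },
  { name: "Read", target: "d.ts" },
]);
// Three batches with 2, 1, and 1 calls, matching the worked example above.
```

Only adjacent calls are merged, so the final Read stays behind the Write and still observes its side effects.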

Why does this matter? Suppose the Agent needs to read 10 files to answer an architecture question. Without tool partitioning, these 10 Read operations would execute sequentially, each potentially taking tens of milliseconds. With partitioning, they execute concurrently, and total time approximates that of the slowest individual read. For this example, that cuts "codebase reading" from hundreds of milliseconds to tens of milliseconds. The user experiences this as fluency: you need not understand the mechanism, but you notice the speed.

3.5 Error Recovery Cascade (with Real Source Code)

Claude Code implements a cascading recovery strategy for recoverable errors. The following is real code from src/query.ts:

Prompt Too Long (413) Recovery

// src/query.ts (real 413 recovery code)
if (isWithheld413) {
  // Step 1: drain the context-collapse queue
  // Only attempted when the previous transition was not collapse_drain_retry
  // (if we already drained and still hit 413, skip straight to reactive compact)
  if (feature('CONTEXT_COLLAPSE') && contextCollapse
      && state.transition?.reason !== 'collapse_drain_retry') {
    const drained = contextCollapse.recoverFromOverflow(messagesForQuery, querySource)
    if (drained.committed > 0) {
      state = { ...state,
        messages: drained.messages,
        transition: { reason: 'collapse_drain_retry', committed: drained.committed },
      }
      continue  // ← Continue Site 1
    }
  }
}

// Step 2: Reactive Compact (full summary)
if ((isWithheld413 || isWithheldMedia) && reactiveCompact) {
  const compacted = await reactiveCompact.tryReactiveCompact({
    hasAttempted: hasAttemptedReactiveCompact,  // prevents an infinite loop
    querySource,
    aborted: toolUseContext.abortController.signal.aborted,
    messages: messagesForQuery,
    cacheSafeParams: { systemPrompt, userContext, systemContext, ... },
  })

  if (compacted) {
    // Budget tracking: capture the final context window before compaction
    if (params.taskBudget) {
      const preCompactContext = finalContextTokensFromLastResponse(messagesForQuery)
      taskBudgetRemaining = Math.max(0,
        (taskBudgetRemaining ?? params.taskBudget.total) - preCompactContext)
    }

    const postCompactMessages = buildPostCompactMessages(compacted)
    for (const msg of postCompactMessages) { yield msg }

    state = { ...state,
      messages: postCompactMessages,
      hasAttemptedReactiveCompact: true,
      transition: { reason: 'reactive_compact_retry' },
    }
    continue  // ← Continue Site 2
  }

  // Step 3: all recovery failed → report to the user
  // Key: do not enter Stop Hooks! The model never produced a valid response,
  // so Stop Hooks have nothing meaningful to evaluate. Running them would create a death spiral:
  // error → hook blocking → retry → error → ...
  yield lastMessage
  void executeStopFailureHooks(lastMessage, toolUseContext)
  return { reason: 'prompt_too_long' }
}

Max Output Tokens Recovery

// src/query.ts (real max_output_tokens recovery code)
if (isWithheldMaxOutputTokens(lastMessage)) {

  // Escalated retry: if the default 8k cap was used, escalate to 64k and retry the **same request**
  // No meta message, no multi-turn exchange
  const capEnabled = getFeatureValue_CACHED_MAY_BE_STALE('tengu_otk_slot_v1', false)
  if (capEnabled && maxOutputTokensOverride === undefined
      && !process.env.CLAUDE_CODE_MAX_OUTPUT_TOKENS) {
    logEvent('tengu_max_tokens_escalate', { escalatedTo: ESCALATED_MAX_TOKENS })
    state = { ...state,
      maxOutputTokensOverride: ESCALATED_MAX_TOKENS,  // 64k
      transition: { reason: 'max_output_tokens_escalate' },
    }
    continue  // ← Continue Site 3
  }

  // Multi-turn recovery: inject a recovery message asking the model to resume where it stopped
  if (maxOutputTokensRecoveryCount < MAX_OUTPUT_TOKENS_RECOVERY_LIMIT) {  // limit: 3 attempts
    const recoveryMessage = createUserMessage({
      content: `Output token limit hit. Resume directly - no apology, no recap. ` +
        `Pick up mid-thought if that is where the cut happened. ` +
        `Break remaining work into smaller pieces.`,
      isMeta: true,
    })

    state = { ...state,
      messages: [...messagesForQuery, ...assistantMessages, recoveryMessage],
      maxOutputTokensRecoveryCount: maxOutputTokensRecoveryCount + 1,
      transition: {
        reason: 'max_output_tokens_recovery',
        attempt: maxOutputTokensRecoveryCount + 1,
      },
    }
    continue  // ← Continue Site 4
  }

  // Recovery exhausted → surface the truncation error
  yield lastMessage
}

Stop Hook Recovery

// src/query.ts (recovery after a Stop Hook blocks)
const stopHookResult = yield* handleStopHooks(
  messagesForQuery, assistantMessages, systemPrompt,
  userContext, systemContext, toolUseContext, querySource, stopHookActive,
)

if (stopHookResult.preventContinuation) {
  return { reason: 'stop_hook_prevented' }
}

if (stopHookResult.blockingErrors.length > 0) {
  state = { ...state,
    messages: [...messagesForQuery, ...assistantMessages, ...stopHookResult.blockingErrors],
    maxOutputTokensRecoveryCount: 0,
    // Key: preserve the hasAttemptedReactiveCompact flag!
    // If compact already ran but could not recover from prompt-too-long,
    // resetting this flag would cause an infinite loop:
    // compact → still too long → error → stop hook → compact → ...
    hasAttemptedReactiveCompact,
    stopHookActive: true,
    transition: { reason: 'stop_hook_blocking' },
  }
  continue  // ← Continue Site 5
}

All Termination Reasons

// The 10 termination reasons in query.ts
return { reason: 'completed' }           // Normal completion (no tool calls, Stop Hook does not block)
return { reason: 'blocking_limit' }      // Hard token limit
return { reason: 'stop_hook_prevented' } // Stop Hook prevented continuation
return { reason: 'aborted_streaming' }   // User interrupt (during model response)
return { reason: 'aborted_tools' }       // User interrupt (during tool execution)
return { reason: 'hook_stopped' }        // Hook attachment stopped continuation
return { reason: 'max_turns', turnCount }// Maximum turn limit reached
return { reason: 'prompt_too_long' }     // 413 recovery exhausted
return { reason: 'image_error' }         // Image/PDF too large
return { reason: 'model_error', error }  // Unexpected exception

In-Depth Source Code Annotation Analysis:

The following insights are drawn from internal developer comments within the Claude Code source code, revealing real-world engineering challenges encountered in production:

1. Error Withholding Strategy

The source code comments: "yielding early would leak intermediate errors to consumers like cowork/desktop that terminate on any error field, even though recovery is still running."

This means that queryLoop does not yield immediately upon discovering an error. Instead, it withholds the error, waiting until the recovery flow completes before deciding whether to expose it. This prevents downstream consumers (IDE extensions, desktop applications) from seeing intermediate errors and terminating prematurely.
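The withholding behavior can be illustrated with a small synchronous model. This is a sketch under simplifying assumptions: LoopEvent and emitWithWithholding are hypothetical names, and each attempt is represented as a pre-computed array of events, whereas the real loop streams them from an async generator.

```typescript
type LoopEvent =
  | { type: "text"; text: string }
  | { type: "error"; message: string };

// Each inner array is what one attempt would produce (hypothetical; real attempts are streamed).
function emitWithWithholding(attempts: LoopEvent[][]): LoopEvent[] {
  const emitted: LoopEvent[] = [];
  let withheld: LoopEvent | undefined;
  for (const attempt of attempts) {
    withheld = undefined;
    for (const event of attempt) {
      if (event.type === "error") {
        withheld = event; // hold the error back while recovery is still possible
        break;
      }
      emitted.push(event);
    }
    if (withheld === undefined) return emitted; // attempt succeeded: the error is never exposed
  }
  if (withheld !== undefined) emitted.push(withheld); // recovery exhausted: surface the last error
  return emitted;
}

// First attempt fails, the retry succeeds: consumers see only the answer.
const recovered = emitWithWithholding([
  [{ type: "error", message: "prompt too long" }],
  [{ type: "text", text: "answer after compaction" }],
]);

// Every attempt fails: consumers see exactly one error, at the end.
const failed = emitWithWithholding([
  [{ type: "error", message: "prompt too long" }],
  [{ type: "error", message: "still too long" }],
]);
```

A consumer that terminates on the first error field would have aborted the recovered run if the intermediate error had been emitted eagerly.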

2. Budget Tracking Across Compaction Boundaries

The source code comments: "remaining is undefined until first compact fires - before compact the server sees full history and counts down from {total} itself (see api/api/sampling/prompt/renderer.py:292); after compact, server only sees summary and would under-count spend."

This reveals an elegant server-client coordination design: before compaction, the server can see the full history and compute budget consumption itself; after compaction, the server can only see the summary, so the client must inform the server "how much the part you can no longer see has consumed."
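The client-side bookkeeping mirrors the Math.max(0, ...) expression in the 413 recovery listing. The sketch below isolates it; TaskBudget and remainingAfterCompact are hypothetical names, and the token figures are invented for the example.

```typescript
interface TaskBudget {
  total: number;
}

// `remaining` stays undefined until the first compaction, mirroring the quoted comment.
function remainingAfterCompact(
  budget: TaskBudget,
  remaining: number | undefined,
  preCompactContextTokens: number,
): number {
  // Before the first compaction, count down from the full budget.
  const base = remaining !== undefined ? remaining : budget.total;
  // Charge the context the server will no longer see; never go below zero.
  return Math.max(0, base - preCompactContextTokens);
}

const budget: TaskBudget = { total: 200000 };
const afterFirst = remainingAfterCompact(budget, undefined, 120000); // 80000
const afterSecond = remainingAfterCompact(budget, afterFirst, 90000); // 0 (clamped)
```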

3. Why hasAttemptedReactiveCompact Is Not Reset

Note that in the Stop Hook recovery path, the hasAttemptedReactiveCompact flag is preserved rather than reset. The source code explains: if compact has already run but failed to recover from prompt-too-long, resetting this flag would cause an infinite loop: compact -> still too long -> error -> stop hook -> compact -> ... This is a fix for a real production bug.
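The effect of the flag can be shown with a toy simulation. It assumes the worst case, in which the prompt remains too long regardless of compaction; countCompactions and its parameters are hypothetical, and the iteration cap exists only so that the buggy variant terminates.

```typescript
// Worst case for this sketch: the prompt stays too long no matter how often we compact.
function countCompactions(resetFlagOnRetry: boolean, maxIterations: number): number {
  let hasAttemptedReactiveCompact = false;
  let compactions = 0;
  for (let i = 0; i < maxIterations; i++) {
    if (!hasAttemptedReactiveCompact) {
      compactions++; // each compaction is a paid summarization call
      hasAttemptedReactiveCompact = true;
      continue; // retry the request after compacting
    }
    // Compaction was already tried and the prompt is still too long.
    if (!resetFlagOnRetry) return compactions; // flag preserved: give up cleanly
    hasAttemptedReactiveCompact = false; // buggy variant: the retry path clears the flag
  }
  return compactions;
}

const withFlagPreserved = countCompactions(false, 10); // 1
const withFlagReset = countCompactions(true, 10); // 5, and it grows with the iteration cap
```

With the flag preserved, the loop pays for one summarization and then reports the failure. With the flag reset, the number of summarization calls is bounded only by whatever stops the outer loop.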

4. The using Semantics of Memory Prefetch

using pendingMemoryPrefetch = startRelevantMemoryPrefetch(...) employs the using keyword from the TC39 Explicit Resource Management proposal. The source code comments: "Fired once per user turn - the prompt is invariant across loop iterations, so per-iteration firing would ask sideQuery the same question N times." using ensures that prefetch resources are automatically cleaned up when the generator exits (whether normally or abnormally).

Pedagogical Takeaway: These details reveal a core principle: the complexity of a production-grade Agent Loop lies not in "the loop itself" but in "how to recover gracefully when the loop fails." A 30-line while(true) suffices for a basic Agent Loop, but handling all edge cases properly requires 1,800+ lines. The gap between the two represents the entire value of Harness Engineering.
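For contrast, a basic loop of the kind described above might look like the following. Everything in it is a stand-in: Model, ToolTable, and minimalAgentLoop are hypothetical, the model is a synchronous function rather than a streaming API call, and there is no compaction, permission check, or error recovery.

```typescript
// A real model call is asynchronous and returns structured content blocks.
interface ModelTurn {
  text: string;
  toolCall?: { name: string; input: string };
}
type Model = (messages: string[]) => ModelTurn;
type ToolTable = { [name: string]: (input: string) => string };

function minimalAgentLoop(model: Model, tools: ToolTable, prompt: string): string[] {
  const messages: string[] = [prompt];
  while (true) {
    const turn = model(messages);
    messages.push(turn.text);
    if (turn.toolCall === undefined) return messages; // the only termination reason: completed
    const tool = tools[turn.toolCall.name];
    messages.push(
      tool !== undefined
        ? tool(turn.toolCall.input)
        : "Error: No such tool: " + turn.toolCall.name,
    );
  }
}

// Scripted model: asks for one tool call, then answers.
const scriptedModel: Model = (messages) =>
  messages.length === 1
    ? { text: "Let me read the file.", toolCall: { name: "Read", input: "a.ts" } }
    : { text: "Done." };

const transcript = minimalAgentLoop(
  scriptedModel,
  { Read: (path) => "contents of " + path },
  "Summarize a.ts",
);
// transcript holds 4 entries: prompt, assistant text, tool result, final answer
```

The loop has one continue path and one termination reason. Every recovery path discussed in this chapter is something this version silently lacks.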

3.6 Pause and Reflect

Having studied the Agent Loop, attempt to answer the following:

  1. Why does queryLoop use while(true) instead of recursion? (Hint: consider memory and stack depth)
  2. Why does the compaction pipeline have 4 levels instead of 1? (Hint: consider the trade-off between cost and latency)
  3. If you were to add a new recovery path (e.g., "API key expired"), which parts of the code would you need to modify?

These questions have no canonical answers, but reflecting on them will deepen your understanding of the "why" rather than merely the "how."

Figure 3-2: Agent Loop Complexity Analysis. (Left) Complexity comparison between the minimal implementation and the production implementation (logarithmic scale): a 60x increase in lines of code, where each dimension of growth corresponds to a real production requirement. (Right) Token release efficiency of the four-level compaction pipeline, progressively reducing 180K to 45K.

Figure 3-3: Continue Site Frequency. Estimated trigger frequency of the 7 continue sites: "Next Turn" (normal progression) accounts for 95%. Error recovery sites collectively account for approximately 5% of triggers, yet it is precisely the code behind these rarely triggered sites (approximately 500 lines) that prevents session interruptions and cost overruns.

Figure 3-1: Agent Loop State Machine. The queryLoop() state machine, showing the complete flow of 7 continue sites, the 4-level compaction pipeline, StreamingToolExecutor parallel execution, and 10 termination reasons.

3.7 Quantitative Analysis: Complexity Metrics of the Agent Loop

To quantify the gap between a "minimal Agent Loop" and a "production Agent Loop," we conducted a code metrics analysis of src/query.ts:

Metric                           | Minimal Implementation (s01) | Claude Code Production Implementation
Lines of Code                    | 30 lines                     | 1,800+ lines
Continue Sites                   | 1                            | 7
Termination Reasons              | 1 (completed)                | 10
Error Recovery Paths             | 0                            | 5 cascading recoveries
Compaction Strategies            | 0 levels                     | 4-level pipeline
Concurrency Modes                | Serial                       | 2 (Streaming + Sequential)
State Fields                     | 1 (messages)                 | 10 (State type)
Analytics Instrumentation Points | 0                            | 15+
Feature Gates                    | 0                            | 8+

Complexity Growth Analysis: The growth from 30 lines to 1,800 lines represents a 60x increase. But this is not "over-engineering": every line corresponds to a real production problem. For example:

  - The hasAttemptedReactiveCompact flag is only 1 line, yet it prevents an infinite-loop bug that could consume thousands of dollars in API costs.
  - The taskBudgetRemaining tracking logic is approximately 20 lines, yet it is the only mechanism capable of correctly computing token consumption across compaction boundaries.
  - The StreamingToolExecutor is approximately 200 lines, yet for concurrency-safe tools it reduces multi-tool execution latency from the sum of all tool latencies, O(n), to roughly the latency of the slowest tool.

3.8 Summary of Agent Loop Design Philosophy

  1. Resilience over rigidity: 7+ continue sites enable recovery from nearly any error
  2. Progressive degradation: Each error type first attempts the lightest recovery, escalating gradually
  3. Streaming-first: The Async Generator makes every intermediate state observable
  4. Explicit state: A single State object with no implicit global state
  5. Built-in observability: Every recovery point includes analytics and profiling

Chapter 4: Tool System - The Agent's Hands

The Agent Loop is the engine, while the tool system is the steering wheel and throttle. No matter how powerful the engine, the vehicle cannot reach its destination without the ability to steer and modulate speed.

In Claude Code, the model (LLM) itself cannot read files, run commands, or search code. Its sole capability is generating text. Through the tool system, however, these text outputs are translated into real operations: reading a file, editing a line of code, running a test.

In this chapter we examine how Claude Code designs a system of 43+ tools, each independently self-contained yet uniformly managed. These design patterns can be directly reused in your own Agent projects.

The tool system is the sole channel through which the Agent interacts with the external world in the Harness, and each of its tools is a self-contained module.

4.1 Tool Interface Definition

Located in src/Tool.ts, this is the base type for all tools:

type Tool<
  Input extends AnyObject = AnyObject,
  Output = unknown,
  P extends ToolProgressData = ToolProgressData,
> = {
  // ===== Core identity =====
  name: string;                    // Tool name (primary identifier)
  aliases?: string[];              // Aliases (backward compatibility)
  userFacingName(): string;        // Display name

  // ===== Schema & validation =====
  inputSchema: ZodType<Input>;     // Zod input validation
  inputJSONSchema?: JSONSchema;    // Optional JSON Schema (MCP tools)
  outputSchema?: ZodType<Output>;  // Optional output type
  validateInput(input): Promise<ValidationResult>;

  // ===== Execution =====
  call(
    args: Input,
    context: ToolUseContext,
    canUseTool: CanUseTool,
    parentMessage: AssistantMessage,
    progressCallback?: ProgressCallback<P>,
  ): Promise<ToolResult<Output>>;

  // ===== Permissions & safety =====
  checkPermissions(args, context): Promise<PermissionDecision>;
  isConcurrencySafe(args): boolean;      // Can it run in parallel?
  isDestructive(args): boolean;          // Irreversible operation?
  isReadOnly(): boolean;                 // Read-only operation?
  preparePermissionMatcher(args): string; // Hook pattern matching

  // ===== Behavior =====
  isEnabled(): boolean;                  // Feature-gate check
  interruptBehavior(): 'cancel' | 'block';
  requiresUserInteraction(): boolean;

  // ===== Rendering =====
  renderToolUseMessage(args): ReactElement;
  renderToolResultMessage(result): ReactElement;
  renderToolUseProgressMessage(progress): ReactElement;

  // ===== Search & collapsing =====
  searchHint: string;                    // 3-10 word keyword hint for ToolSearch
  shouldDefer: boolean;                  // Deferred loading
  alwaysLoad: boolean;                   // Never deferred

  // ===== Description (dynamically generated) =====
  description(isNonInteractive?: boolean): string;
  prompt(context): string;               // System prompt fragment
};

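Design Philosophy: The interface bundles everything the Harness needs to know about a tool (identity, validation, execution, permissions, rendering, and discovery) into a single type, so each tool stays self-contained while the Harness manages all of them uniformly.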

Common Beginner Misconception: Many people focus exclusively on the call() method when designing tool systems, asking only "what can the tool do." In production, however, permission checking, input validation, and progress rendering account for 80% of tool code. A well-designed tool interface is not merely about "execution" but about "executing safely, observably, and interruptibly."

4.2 Tool Registry

Located in src/tools.ts:

// Single source of truth for tools
function getAllBaseTools(): Tool[] {
  return [
    // === Always loaded ===
    AgentTool,
    TaskOutputTool,
    BashTool,
    FileReadTool,
    FileEditTool,
    FileWriteTool,
    WebFetchTool,
    WebSearchTool,
    AskUserQuestionTool,
    SkillTool,
    // ...more always-available tools

    // === Feature-gated ===
    ...(feature('PROACTIVE') ? [SleepTool] : []),
    ...(feature('AGENT_TRIGGERS') ? [ScheduleCronTool] : []),
    ...(feature('COORDINATOR_MODE') ? [TeamCreateTool, TeamDeleteTool] : []),
    ...(isReplModeEnabled() ? [REPLTool] : []),
    // ...more conditional tools
  ];
}

Dead Code Elimination:

// Bun's bun:bundle evaluates feature() calls at compile time
// If feature('PROACTIVE') compiles to false:
...(false ? [SleepTool] : [])
// → all of SleepTool's code is removed by tree-shaking,
// including every string and dependency it references

This is a key design pattern in Claude Code: compile-time feature gating. External distributions can remove entire subsystems by setting feature flags, without the need to manually delete code.

4.3 Tool Pool Assembly

function assembleToolPool(builtInTools: Tool[], mcpTools: Tool[]): Tool[] {
  // 1. Filter out MCP tools blocked by deny rules
  const filteredMcp = mcpTools.filter(t => !getDenyRuleForTool(t));

  // 2. Sort each group separately (keeps the prompt cache stable)
  const sortedBuiltIn = sortBy(builtInTools, t => t.name);
  const sortedMcp = sortBy(filteredMcp, t => t.name);

  // 3. Concatenate: built-in tools first (they form the cache prefix)
  const combined = [...sortedBuiltIn, ...sortedMcp];

  // 4. Deduplicate (built-ins take precedence)
  return uniqBy(combined, t => t.name);
}

Cache Stability Design:

Built-in tools, sorted by name, form a stable cache prefix. When MCP tools are added or removed, the prefix remains unchanged, and the Anthropic API's prompt cache is not invalidated. This is a subtle but important performance optimization.
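The prefix-stability property can be checked with a dependency-free sketch. NamedTool, sortedNames, and assemblePoolNames are hypothetical helpers that stand in for the lodash-based listing above; only tool names are tracked, and deny-rule filtering is omitted.

```typescript
interface NamedTool {
  name: string;
}

function sortedNames(tools: NamedTool[]): string[] {
  return tools.map((t) => t.name).sort(); // default sort compares UTF-16 code units
}

// Built-ins first (stable prefix), then MCP tools; on a name clash the built-in wins.
function assemblePoolNames(builtIn: NamedTool[], mcp: NamedTool[]): string[] {
  const names: string[] = [];
  for (const name of sortedNames(builtIn).concat(sortedNames(mcp))) {
    if (names.indexOf(name) === -1) names.push(name);
  }
  return names;
}

const builtIn = [{ name: "Read" }, { name: "Bash" }, { name: "Edit" }];
const withoutMcp = assemblePoolNames(builtIn, []);
const withMcp = assemblePoolNames(builtIn, [
  { name: "mcp__db__query" },
  { name: "Alpha" },
  { name: "Bash" }, // clashes with a built-in and is dropped
]);
// withoutMcp: ["Bash", "Edit", "Read"]
// withMcp:    ["Bash", "Edit", "Read", "Alpha", "mcp__db__query"]
```

A single global sort would instead place "Alpha" first and shift every built-in tool, which is exactly the prefix change that the two-group sort avoids.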

4.4 Tool Execution Lifecycle

Figure 4-1: Tool Execution Pipeline. The 7-step tool execution pipeline, from Zod Schema validation to the PostToolUse Hook; each step can alter the tool's behavior or block its execution.

The sequence diagram below illustrates the complete path of a tool from request to execution. Note how Hooks intervene at critical junctures:

sequenceDiagram
    participant M as Model
    participant V as Validator
    participant PH as PreToolUse Hook
    participant P as Permission Engine
    participant S as Sandbox
    participant T as Tool.call()
    participant AH as PostToolUse Hook

    M->>V: tool_use block
    V->>V: Zod schema validation
    alt Validation fails
        V-->>M: Format error message
    end
    V->>V: tool.validateInput()
    alt Validation fails
        V-->>M: Business-logic error
    end
    V->>PH: Input JSON
    PH->>PH: Condition matching (if field)
    alt Hook blocks
        PH-->>M: blocking error
    else Hook modifies input
        PH->>P: updatedInput
    else Hook approves
        PH->>P: allow (does not bypass deny rules)
    end
    P->>P: deny rules → ask rules → mode check
    alt Permission denied
        P-->>M: Denial message + suggestions
    end
    P->>S: Command wrapping (BashTool only)
    S->>S: wrapWithSandbox()
    S->>T: Sandboxed command
    T->>T: Actual execution
    T->>AH: Execution result
    AH->>AH: Audit logging / output modification
    AH-->>M: Final tool_result

The same process illustrated as a traditional flowchart:

User/model requests a tool call
    │
    v
┌──────────────────────────┐
│ 1. validateInput()       │  Structural validation (required fields, ranges)
│    Uses Zod Schema       │
└────────┬─────────────────┘
         │ passed
         v
┌──────────────────────────┐
│ 2. checkPermissions()    │  Tool-specific permission logic
│    Returns allow/ask/deny│
└────────┬─────────────────┘
         │ not denied
         v
┌──────────────────────────┐
│ 3. Rule-Based Perms      │  allow/deny/ask rules from settings
│    checkRuleBasedPerms() │
└────────┬─────────────────┘
         │ not denied
         v
┌──────────────────────────┐
│ 4. PreToolUse Hooks      │  User-defined hooks
│    Can approve or block  │
└────────┬─────────────────┘
         │ not blocked
         v
┌──────────────────────────┐
│ 5. User Prompt/Classifier│  Final approval (or automatic classifier)
│    auto: YOLO classifier │
└────────┬─────────────────┘
         │ approved
         v
┌──────────────────────────┐
│ 6. call()                │  Actually executes the tool
│    Returns ToolResult    │
└────────┬─────────────────┘
         │
         v
┌──────────────────────────┐
│ 7. PostToolUse Hooks     │  Post-execution callbacks
│    Audit logs, notices   │
└──────────────────────────┘

4.5 Tool Execution Pipeline (with Real Code Including Hook Integration)

Tool execution involves far more than simply calling tool.call(). It is a multi-step pipeline in which each step can alter behavior:

// src/services/tools/toolExecution.ts (real execution pipeline)
async function checkPermissionsAndCallTool(
  tool: Tool,
  toolUseID: string,
  input: Record<string, unknown>,
  toolUseContext: ToolUseContext,
  canUseTool: CanUseToolFn,
  assistantMessage: AssistantMessage,
  onToolProgress: (progress: ToolProgress) => void,
): Promise<MessageUpdate[]> {

  // ===== Step 1: Zod schema validation =====
  const parsedInput = tool.inputSchema.safeParse(input)
  if (!parsedInput.success) {
    return [{ message: createUserMessage({
      content: formatZodValidationError(tool.name, parsedInput.error),
    }) }]
  }

  // ===== Step 2: tool-specific validation =====
  const isValidCall = await tool.validateInput?.(parsedInput.data, toolUseContext)
  if (isValidCall?.result === false) {
    return [{ message: createUserMessage({
      content: isValidCall.message,
    }) }]
  }

  // ===== Step 3: PreToolUse Hook =====
  // Hooks can approve, block, modify the input, or inject context
  let processedInput = parsedInput.data
  let hookPermissionResult: PermissionResult | undefined
  for await (const result of runPreToolUseHooks(...)) {
    switch (result.type) {
      case 'hookPermissionResult':
        hookPermissionResult = result.hookPermissionResult
        break
      case 'hookUpdatedInput':
        processedInput = result.updatedInput  // the hook modified the input!
        break
    }
  }

  // ===== Step 4: permission resolution (hook + rules interaction) =====
  // Key design: a hook's 'allow' does not bypass deny/ask rules in settings.json
  const { decision, input: callInput } = await resolveHookPermissionDecision(
    hookPermissionResult, tool, processedInput,
    toolUseContext, canUseTool, assistantMessage, toolUseID,
  )
  if (decision.behavior !== 'allow') {
    return [/* permission-denied message */]
  }

  // ===== Step 5: actually execute the tool =====
  let toolResult = await tool.call(
    callInput, toolUseContext, canUseTool,
    assistantMessage, onToolProgress,
  )

  // ===== Step 6: PostToolUse Hook =====
  // Hooks can modify MCP tool output or inject extra context
  for await (const result of runPostToolUseHooks(...)) {
    if (result.updatedMCPToolOutput) {
      toolResult = { ...toolResult, data: result.updatedMCPToolOutput }
    }
  }

  // ===== Step 7: convert to API format and return =====
  return resultingMessages
}

Hook Permission Decision Resolution (this is the most nuanced part):

// src/services/tools/toolHooks.ts (real code)
export async function resolveHookPermissionDecision(
  hookPermissionResult, tool, input, toolUseContext,
  canUseTool, assistantMessage, toolUseID,
) {
  if (hookPermissionResult?.behavior === 'allow') {
    // The hook says "allow", but that is not the final verdict!
    // deny/ask rules still apply (security invariant)

    // If the tool requires user interaction and the hook supplied updatedInput,
    // then the hook counts as the "user interaction" (e.g. a headless wrapper)
    const interactionSatisfied =
      tool.requiresUserInteraction?.() &&
      hookPermissionResult.updatedInput !== undefined

    // Even when the hook allows, still check the rules
    const ruleCheck = await checkRuleBasedPermissions(tool, input, toolUseContext)
    if (ruleCheck?.behavior === 'deny') {
      // A deny rule overrides the hook's allow!
      return { decision: ruleCheck, input }
    }
    if (ruleCheck?.behavior === 'ask') {
      // An ask rule still requires the dialog
      return { decision: await canUseTool(...), input }
    }

    // No rule blocks it → the hook's allow takes effect
    return { decision: hookPermissionResult, input }
  }
  // ... deny and ask handling
}

Core Security Invariant: deny rules > ask rules > hook allow. Even if a Hook approves an operation, deny rules in settings.json still block it, and ask rules still prompt the user. This prevents malicious Hooks from circumventing security policies.

Why can't a Hook allow override deny rules? This is a real security consideration. Imagine you install a third-party MCP server that provides a PreToolUse Hook returning allow for all operations. If Hook allows could override deny rules, this third-party code would gain permissions exceeding your security policy: it could enable the Agent to perform operations you have explicitly forbidden. Claude Code's design guarantees that deny rules you write in settings.json constitute an inviolable baseline, regardless of any Hook intervention.
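The precedence can be condensed into a small decision function. This is a sketch, not the real resolveHookPermissionDecision: Behavior and resolveDecision are hypothetical, askUser stands in for the interactive dialog, and the handling of a hook deny and of the no-decision case is an assumption, since the listing above elides those branches.

```typescript
type Behavior = "allow" | "ask" | "deny";

// hook: result of a PreToolUse hook, if any. rule: result of settings rules, if any.
// askUser: stands in for the interactive permission dialog.
function resolveDecision(
  hook: Behavior | undefined,
  rule: Behavior | undefined,
  askUser: () => boolean,
): "allow" | "deny" {
  if (rule === "deny") return "deny"; // deny rules win over everything
  if (hook === "deny") return "deny"; // assumption: a hook may block but never widen access
  if (rule === "ask") return askUser() ? "allow" : "deny"; // a hook allow cannot skip the dialog
  if (hook === "allow" || rule === "allow") return "allow";
  return askUser() ? "allow" : "deny"; // assumption: no decision anywhere falls back to asking
}

const userDeclines = () => false;
const hookAllowRuleDeny = resolveDecision("allow", "deny", userDeclines); // "deny"
const hookAllowRuleAsk = resolveDecision("allow", "ask", userDeclines); // "deny" (user declined)
const hookAllowNoRule = resolveDecision("allow", undefined, userDeclines); // "allow"
```

Checking the deny rule before consulting the hook result is what makes the invariant hold regardless of what any hook returns.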

4.6 FileEditTool String Replacement Algorithm

The FileEditTool's core algorithm merits separate analysis because it handles smart quote matching, a genuine engineering challenge:

// src/tools/FileEditTool/utils.ts (real code)

// Problem: the model sometimes emits curly ("smart") quotes
// while the file contains straight quotes, or vice versa
const LEFT_DOUBLE_CURLY_QUOTE = '\u201C'   // “
const RIGHT_DOUBLE_CURLY_QUOTE = '\u201D'  // ”

function normalizeQuotes(str: string): string {
  return str
    .replaceAll('\u2018', "'")   // ‘ → '
    .replaceAll('\u2019', "'")   // ’ → '
    .replaceAll('\u201C', '"')   // “ → "
    .replaceAll('\u201D', '"')   // ” → "
}

// Three-phase lookup algorithm (two phases appear in this excerpt)
function findActualString(fileContent: string, searchString: string): string | null {
  // Phase 1: exact match
  if (fileContent.includes(searchString)) return searchString

  // Phase 2: quote-normalized match
  const normalizedSearch = normalizeQuotes(searchString)
  const normalizedFile = normalizeQuotes(fileContent)
  const searchIndex = normalizedFile.indexOf(normalizedSearch)
  if (searchIndex !== -1) {
    // Return the **original** string from the file (preserving its quote style).
    // Normalization maps one character to one character, so indices line up.
    return fileContent.substring(searchIndex, searchIndex + searchString.length)
  }

  return null
}

// Replacement algorithm
function applyEditToFile(
  originalContent: string,
  oldString: string,
  newString: string,
  replaceAll: boolean = false,
): string {
  const f = replaceAll
    ? (content: string, search: string, replace: string) => content.replaceAll(search, () => replace)
    : (content: string, search: string, replace: string) => content.replace(search, () => replace)

  if (newString !== '') return f(originalContent, oldString, newString)

  // Edge case: deletion
  // If oldString does not end with a newline but is followed by one in the file,
  // delete that newline too (avoids leaving a blank line)
  const stripTrailingNewline =
    !oldString.endsWith('\n') && originalContent.includes(oldString + '\n')

  return stripTrailingNewline
    ? f(originalContent, oldString + '\n', newString)
    : f(originalContent, oldString, newString)
}

Design Wisdom: Using () => replace rather than passing replace directly prevents special patterns such as $1 and $& in the replacement string from being expanded. JavaScript's String.prototype.replace and replaceAll interpret these patterns in a replacement string even when the search value is a plain string, not a regex. A subtle but critical safeguard.
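The effect is easy to reproduce in isolation. The following is a minimal sketch, not Claude Code's implementation: `replaceNaive` and `replaceSafe` are hypothetical helper names introduced here only to contrast the two ways of calling the standard `String.prototype.replace`.

```typescript
// Hypothetical helpers for illustration only; they are not Claude Code APIs.

// Replacement passed as a string: "$&" is expanded to the matched text.
function replaceNaive(content: string, search: string, replacement: string): string {
  return content.replace(search, replacement)
}

// Replacement passed as a function: its return value is inserted verbatim.
function replaceSafe(content: string, search: string, replacement: string): string {
  return content.replace(search, () => replacement)
}

// The new text legitimately contains "$&" (for example, a shell or regex snippet).
const naiveResult = replaceNaive('cost: X', 'X', '$&100') // "$&" becomes "X"
const safeResult = replaceSafe('cost: X', 'X', '$&100')   // "$&" is kept literally
```

With a string replacement the edit silently writes different text than the model asked for, which is why the function form is the safer default for an edit tool.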

4.7 Quantitative Analysis: Tool Execution Performance Characteristics

| Execution Mode | Scenario | Latency Model | Actual Performance |
|---|---|---|---|
| StreamingToolExecutor concurrent | 10 Read tools | O(max(t_i)) | ~50ms (slowest file read time) |
| StreamingToolExecutor serial | 1 Write then 1 Read | O(sum of t_i) | ~80ms (write+read sequential) |
| runTools concurrent batch | 5 Read + 1 Write + 3 Read | O(max(5)) + O(1) + O(max(3)) | ~130ms |
| Internal callback Hook fast path | PostToolUse (all internal Hooks) | O(n) but very fast | ~1.8µs (after optimization) |
| External Hook execution | PreToolUse command Hook | O(hook_timeout) | 5-30s (depends on script) |
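The latency models in the table can be made concrete with a small calculation. The sketch below is an assumption-laden model, not the actual `runTools` code: `ToolCall`, `partitionIntoBatches`, and `estimateLatencyMs` are hypothetical names, and the per-call latencies are invented so that the result mirrors the "5 Read + 1 Write + 3 Read" row.

```typescript
// Hypothetical model for illustration; names and numbers are assumptions.
interface ToolCall {
  name: string
  concurrencySafe: boolean // e.g., read-only tools
  latencyMs: number
}

// Group consecutive concurrency-safe calls into one batch;
// every other call forms a batch of its own.
function partitionIntoBatches(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = []
  for (const call of calls) {
    const last = batches[batches.length - 1]
    if (call.concurrencySafe && last !== undefined && last.every(c => c.concurrencySafe)) {
      last.push(call)
    } else {
      batches.push([call])
    }
  }
  return batches
}

// Total latency = sum over batches of the slowest call in each batch.
function estimateLatencyMs(calls: ToolCall[]): number {
  return partitionIntoBatches(calls)
    .map(batch => Math.max(...batch.map(c => c.latencyMs)))
    .reduce((sum, t) => sum + t, 0)
}

const read = (latencyMs: number): ToolCall => ({ name: 'Read', concurrencySafe: true, latencyMs })
const write = (latencyMs: number): ToolCall => ({ name: 'Write', concurrencySafe: false, latencyMs })

// Assumed latencies: max(5 reads) + write + max(3 reads) = 50 + 30 + 50.
const calls: ToolCall[] = [
  read(10), read(20), read(30), read(40), read(50),
  write(30),
  read(20), read(35), read(50),
]
const totalMs = estimateLatencyMs(calls)
```

Run serially, the same nine calls would cost the sum of all latencies (285ms under these assumed numbers), which is the gap the concurrent executor closes.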

Source Code Annotation: Regarding the internal Hook fast path, the source code comments: "Fast-path: all hooks are internal callbacks (sessionFileAccessHooks, attributionHooks). These return {} and don't use the abort signal... Measured: 6.01us -> ~1.8us per PostToolUse hit (-70%)." This 70% performance improvement comes from skipping span/progress/abortSignal/JSON parsing. Because PostToolUse Hooks fire on every tool call, such micro-optimizations add up to a significant cumulative effect.

Tool Categories Figure 4-2: Category distribution of 43+ tools. Core I/O (6 tools) has the highest usage frequency, while Advanced (6 tools) are loaded on demand via feature gates.

Tool Capability Radar Figure 4-3: Core tool capability radar chart, showing dimensions such as concurrency safety, read-only status, and destructiveness. FileReadTool and GrepTool are the "safest" tools (concurrency safe + read-only), while BashTool is the "most dangerous" (potentially destructive + non-read-only + not concurrency safe).

Tool Latency Figure 4-4: Tool execution latency distribution (logarithmic scale). Ranging from 2µs for internal Hooks to 15 seconds for Agent Explore, latencies span 7 orders of magnitude. This explains why StreamingToolExecutor's concurrency optimization is so important: in multi-tool scenarios, it reduces total latency from the sum of all tools to the time of the slowest tool.

4.8 Tool Classification

| Category | Tools | Characteristics |
|---|---|---|
| Core I/O | BashTool, FileReadTool, FileWriteTool, FileEditTool, GlobTool, GrepTool | Always loaded |
| Agent | AgentTool, SendMessageTool, TeamCreateTool, TeamDeleteTool | Sub-Agent creation and management |
| Workflow | WebFetchTool, WebSearchTool, NotebookEditTool | External resource access |
| Task | TaskCreateTool, TaskUpdateTool, TaskListTool, TaskOutputTool, TaskStopTool | Task management |
| Planning | EnterPlanModeTool, ExitPlanModeTool, TodoWriteTool | Plan mode |
| Advanced | ScheduleCronTool, SleepTool, MonitorTool, REPLTool | Feature-gated |
| Worktree | EnterWorktreeTool, ExitWorktreeTool | Git worktree isolation |
| MCP | MCPTool, ListMcpResourcesTool, ReadMcpResourceTool | MCP protocol |
| Search | ToolSearchTool | Deferred tool discovery |

4.9 Tool Deferral (Lazy Loading)

// Not every tool is loaded on the first turn.
// Tools with shouldDefer = true are not sent to the model
// until ToolSearchTool is invoked and discovers them.

// Example: NotebookEditTool
{
  name: 'NotebookEdit',
  shouldDefer: true,        // not loaded on the first turn
  searchHint: 'jupyter notebook cell edit insert',
  alwaysLoad: false,
}

// Example: BashTool
{
  name: 'Bash',
  shouldDefer: false,       // always loaded
  alwaysLoad: true,         // never deferred
}

Design Philosophy:

Tool definitions sent to the model consume part of the token budget. Loading all 43+ tools simultaneously would consume a substantial portion of the context window. Through lazy loading, only core tools (~15) are loaded on the first turn, with the remainder discovered on demand via ToolSearch. This is a canonical application of context engineering.
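The deferral idea can be sketched in a few lines. This is an illustrative model only: `ToolSpec`, `toolsForFirstTurn`, and `searchDeferredTools` are hypothetical names, and the keyword match is a deliberately naive stand-in for whatever ranking ToolSearchTool actually performs.

```typescript
// Hypothetical sketch; only shouldDefer and searchHint come from the text above.
interface ToolSpec {
  name: string
  shouldDefer: boolean
  searchHint?: string
}

// Tools whose definitions are sent to the model on the first turn.
function toolsForFirstTurn(tools: ToolSpec[]): ToolSpec[] {
  return tools.filter(t => !t.shouldDefer)
}

// Naive discovery: a deferred tool matches if any query term appears
// in its name or search hint (case-insensitive).
function searchDeferredTools(tools: ToolSpec[], query: string): ToolSpec[] {
  const terms = query.toLowerCase().split(/\s+/).filter(term => term.length > 0)
  return tools.filter(t => {
    if (!t.shouldDefer) return false
    const haystack = `${t.name} ${t.searchHint ?? ''}`.toLowerCase()
    return terms.some(term => haystack.includes(term))
  })
}

const registry: ToolSpec[] = [
  { name: 'Bash', shouldDefer: false },
  { name: 'NotebookEdit', shouldDefer: true, searchHint: 'jupyter notebook cell edit insert' },
]

const firstTurnNames = toolsForFirstTurn(registry).map(t => t.name)
const discoveredNames = searchDeferredTools(registry, 'edit a jupyter cell').map(t => t.name)
```

The trade-off is one extra tool call when a deferred tool is needed, in exchange for a smaller prompt on every turn where it is not.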


Chapter 5: Permission Model - Constraint Architecture

Recall the "taming a horse" metaphor from Chapter 1. By now, our horse (Agent) has an engine (Loop) and a control system (Tools). But what if it can gallop freely and trample the crops? We need fences, and that is the permission model.

This represents the most essential "constraint" pillar of Harness Engineering. A well-designed permission model does not "restrict" the Agent; it "reduces the Agent's probability of making mistakes." Claude Code's permission system is among the most mature implementations in the industry, and it reveals a counterintuitive truth: the more precise the constraints, the freer the Agent becomes, because when you can precisely control risk, you dare to let the Agent do more.

The permission model is the Harness's "safety valve." It determines what the Agent can do, cannot do, and must ask about.

5.1 Five Permission Modes

type PermissionMode =
  | 'default'            // always ask before sensitive operations
  | 'acceptEdits'        // auto-approve file edits, ask for everything else
  | 'bypassPermissions'  // auto-approve everything (dangerous)
  | 'dontAsk'           // auto-deny operations that would require asking
  | 'plan'              // plan-mode restrictions (read-only + plan file)
  | 'auto'              // AI classifier approves automatically (experimental)
  | 'bubble';           // bubble up to the parent Agent (used by sub-Agents)

Design Philosophy:

Modes are not binary (allow/deny) but form a spectrum. default is the safest, suitable for new users; bypassPermissions is appropriate for trusted automation environments; auto is the most interesting: it uses a two-stage AI classifier to determine whether an operation is safe.

Analogy: Permission modes are akin to driving assistance systems. default is like novice mode: every lane change requires confirmation. acceptEdits is like adaptive cruise control: driving straight is automatic, turns are manual. bypassPermissions is like full self-driving: you place complete trust in the system. In auto mode, an AI driver is behind the wheel, but it has its own "safety supervisor" (the YOLO classifier) monitoring it.

5.2 Three-Level Rule System

type PermissionRule = {
  source: PermissionRuleSource;  // where the rule comes from
  ruleBehavior: 'allow' | 'deny' | 'ask';
  ruleValue: {
    toolName: string;       // e.g., "Bash", "Write", "mcp__server"
    ruleContent?: string;   // e.g., "git *", "*.ts", "prefix:npm *"
  };
};

Rule syntax examples:

# Allow all git commands
Bash(git *)

# Allow writing TypeScript files
Write(*.ts)

# Deny all MCP server tools
mcp__*

# Allow reading any file
Read

# Deny rm -rf commands
Bash(rm -rf *)

# Allow all tools of a specific MCP server
mcp__my-server(*)

5.3 Defense-in-Depth Model

Defense in Depth Figure 5-1: The six-layer defense-in-depth security model, from soft constraints (CLAUDE.md, ~95% compliance rate) to hard constraints (hardcoded denials, 100% unbypassable). The cumulative stacking of layers drives the overall bypass probability toward zero.

Before delving into specific permission rules, it is important to first understand Claude Code's six-layer security architecture from a macro perspective. This is one of the most important design patterns in the entire Harness:

flowchart TB
    subgraph Layer1["Layer 1: CLAUDE.md (advisory constraints)"]
        direction LR
        L1["Tell the Agent 'do not modify the migrations/ directory'"]
    end
    subgraph Layer2["Layer 2: Permission Rules (declarative constraints)"]
        direction LR
        L2["allow/deny/ask rules in settings.json"]
    end
    subgraph Layer3["Layer 3: Hooks (programmable constraints)"]
        direction LR
        L3["PreToolUse scripts check whether the operation is legitimate"]
    end
    subgraph Layer4["Layer 4: YOLO Classifier (AI constraints)"]
        direction LR
        L4["An independent AI model reviews operation safety"]
    end
    subgraph Layer5["Layer 5: Sandbox (system-level constraints)"]
        direction LR
        L5["OS-level file/network isolation"]
    end
    subgraph Layer6["Layer 6: Hardcoded Denials (non-overridable constraints)"]
        direction LR
        L6["settings.json is never writable; cannot be disabled via configuration"]
    end

    Layer1 --> Layer2 --> Layer3 --> Layer4 --> Layer5 --> Layer6

    classDef soft fill:#dbeafe,stroke:#2563eb,color:#1e3a5f
    classDef medium fill:#fef9c3,stroke:#ca8a04,color:#713f12
    classDef hard fill:#fee2e2,stroke:#dc2626,color:#7f1d1d

    class Layer1 soft
    class Layer2,Layer3 medium
    class Layer4 medium
    class Layer5,Layer6 hard

Analysis: Note the color gradient, from blue (soft constraints, can be ignored) to yellow (medium constraints, can be overridden via configuration) to red (hard constraints, unbypassable). In engineering practice, Layer 1 (CLAUDE.md) has a compliance rate of approximately 95%, because the model occasionally "forgets." Layer 6's compliance rate is 100%, since it is hardcoded. This gradient design means you do not need every layer to be perfect; you only need the cumulative bypass probability to be sufficiently low. As a thought experiment, if each of six layers had an independent 5% bypass rate, the stacked bypass probability would be 0.05^6 ≈ 1.6 × 10^-8, or about 0.0000016%.
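The stacking argument is a plain product of per-layer probabilities. The sketch below assumes the layers fail independently, which is an idealization (in practice a single clever prompt may defeat several soft layers at once); `cumulativeBypassProbability` is a hypothetical helper name.

```typescript
// Illustrative arithmetic only; assumes layers are bypassed independently.
function cumulativeBypassProbability(perLayerBypassRates: number[]): number {
  return perLayerBypassRates.reduce((product, rate) => product * rate, 1)
}

// Six layers, each with a hypothetical 5% bypass rate: 0.05^6 = 1.5625e-8.
const sixSoftLayers = cumulativeBypassProbability(new Array<number>(6).fill(0.05))

// If any single layer is truly unbypassable (rate 0), the product is exactly 0.
const withHardLayer = cumulativeBypassProbability([0.05, 0.05, 0.05, 0.05, 0.05, 0])
```

The second case shows why a single hardcoded layer matters more than tuning the soft layers: it zeroes the product regardless of the other terms.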

5.4 Settings Hierarchy (7 Priority Levels)

Rules originate from multiple sources, ordered from highest to lowest priority:

Highest priority
    ↓
1. CLI arguments (cliArg)              - Command-line overrides
2. Session commands (command)           - /permissions command
3. Flag settings (flagSettings)         - CLAUDE_CODE_FLAG_SETTINGS
4. Policy settings (policySettings)     - Organization policy
5. Local settings (localSettings)       - .claude/settings.local.json
6. Project settings (projectSettings)   - .claude/settings.json
7. User settings (userSettings)         - ~/.claude/settings.json
    ↓
Lowest priority

With enterprise managed settings:

Managed Settings (MDM/Enterprise):
├── /managed/managed-settings.json        - Base managed settings
├── /managed/managed-settings.d/*.json    - Drop-in overrides
└── macOS plutil / Windows Registry       - OS-level MDM

Design Philosophy:

The hierarchical settings system allows organization-level policy enforcement without modifying user settings. Enterprise administrators can lock down certain permissions via MDM (Mobile Device Management), project maintainers can define sensible defaults in project settings, and individual users can fine-tune on top of these.

flowchart BT
    U["User settings\n~/.claude/settings.json\nLowest priority"] --> P["Project settings\n.claude/settings.json"]
    P --> L["Local settings\n.claude/settings.local.json\nNot committed to git"]
    L --> Po["Policy settings\nOrganization policy"]
    Po --> M["Managed settings\nMDM/Enterprise\nLockable"]
    M --> F["Flag settings\nEnvironment variables"]
    F --> C["CLI arguments\nHighest priority"]

    classDef low fill:#dcfce7,stroke:#16a34a,color:#14532d
    classDef mid fill:#fef9c3,stroke:#ca8a04,color:#713f12
    classDef high fill:#fee2e2,stroke:#dc2626,color:#7f1d1d

    class U,P low
    class L,Po mid
    class M,F,C high

Analysis: Trust Hierarchy and Override Direction

Note that the arrows point from bottom to top: lowest priority at the bottom, highest priority at the top. This is not accidental: the closer a setting is to "runtime," the higher its priority. User settings are the "most distant" (edited once, used long-term), while CLI arguments are the "most immediate" (can differ on each run). This design lets you temporarily override any setting via CLI arguments without modifying files.
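One way to picture the override direction is a first-match lookup over the sources in priority order. This is a simplified sketch under stated assumptions: the source names come from the list above, but `resolveSetting` and the flat key/value `Settings` shape are hypothetical, and real settings merging (for example, combining permission rule arrays) is likely more involved.

```typescript
// Source names follow the priority list in the text; everything else is assumed.
type SettingSource =
  | 'cliArg' | 'command' | 'flagSettings' | 'policySettings'
  | 'localSettings' | 'projectSettings' | 'userSettings'

// Highest priority first.
const PRIORITY: SettingSource[] = [
  'cliArg', 'command', 'flagSettings', 'policySettings',
  'localSettings', 'projectSettings', 'userSettings',
]

type Settings = Record<string, unknown>

// Return the value from the highest-priority source that defines the key.
function resolveSetting(
  key: string,
  bySource: Partial<Record<SettingSource, Settings>>,
): unknown {
  for (const source of PRIORITY) {
    const settings = bySource[source]
    if (settings !== undefined && key in settings) return settings[key]
  }
  return undefined
}

const configured: Partial<Record<SettingSource, Settings>> = {
  userSettings: { model: 'user-default', theme: 'dark' },
  projectSettings: { model: 'project-default' },
  cliArg: { model: 'one-off-override' },
}

const model = resolveSetting('model', configured) // the CLI argument wins
const theme = resolveSetting('theme', configured) // falls through to user settings
```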

Another key design: lockedByPolicy: true allows administrators to lock sandbox settings so that users cannot disable them. The source code comments note this was "Added to unblock NVIDIA enterprise rollout", a feature driven by a real enterprise customer requirement.

5.4 Permission Decision Pipeline (Real Source Code)

The following is the real permission pipeline from src/utils/permissions/permissions.ts, with comments revealing the rationale behind each decision:

// src/utils/permissions/permissions.ts (real code)
async function hasPermissionsToUseToolInner(
  tool: Tool, input: Record<string, unknown>, context: ToolUseContext,
): Promise<PermissionDecision> {

  if (context.abortController.signal.aborted) throw new AbortError()

  let appState = context.getAppState()

  // ===== 1a. The entire tool is denied =====
  const denyRule = getDenyRuleForTool(appState.toolPermissionContext, tool)
  if (denyRule) {
    return { behavior: 'deny', decisionReason: { type: 'rule', rule: denyRule },
      message: `Permission to use ${tool.name} has been denied.` }
  }

  // ===== 1b. The entire tool is set to ask =====
  const askRule = getAskRuleForTool(appState.toolPermissionContext, tool)
  if (askRule) {
    // Special case: sandbox auto-allow.
    // When autoAllowBashIfSandboxed is on, sandboxed commands skip the ask rule.
    // Commands that will not be sandboxed (excluded commands,
    // dangerouslyDisableSandbox) still honor the ask rule.
    const canSandboxAutoAllow =
      tool.name === BASH_TOOL_NAME &&
      SandboxManager.isSandboxingEnabled() &&
      SandboxManager.isAutoAllowBashIfSandboxedEnabled() &&
      shouldUseSandbox(input)
    if (!canSandboxAutoAllow) {
      return { behavior: 'ask', decisionReason: { type: 'rule', rule: askRule } }
    }
    // Fall through so Bash's checkPermissions can handle command-level rules
  }

  // ===== 1c. Tool-specific permission check =====
  let toolPermissionResult: PermissionResult = { behavior: 'passthrough' }
  try {
    const parsedInput = tool.inputSchema.parse(input)
    toolPermissionResult = await tool.checkPermissions(parsedInput, context)
  } catch (e) {
    if (e instanceof AbortError) throw e
    logError(e)
  }

  // ===== 1d. The tool implementation denies =====
  if (toolPermissionResult?.behavior === 'deny') return toolPermissionResult

  // ===== 1e. Tools that require user interaction =====
  if (tool.requiresUserInteraction?.() && toolPermissionResult?.behavior === 'ask') {
    return toolPermissionResult
  }

  // ===== 1f. Content-level ask rules (important!) =====
  // When the user has configured a content-level ask rule such as Bash(npm publish:*),
  // tool.checkPermissions returns {behavior:'ask', decisionReason:{type:'rule', ruleBehavior:'ask'}}.
  // This must be honored, even in bypassPermissions mode!
  if (toolPermissionResult?.behavior === 'ask' &&
      toolPermissionResult.decisionReason?.type === 'rule' &&
      toolPermissionResult.decisionReason.rule.ruleBehavior === 'ask') {
    return toolPermissionResult
  }

  // ===== 1g. Safety checks (cannot be bypassed) =====
  // Paths such as .git/, .claude/, .vscode/, and shell configuration files
  // must prompt even in bypassPermissions mode.
  if (toolPermissionResult?.behavior === 'ask' &&
      toolPermissionResult.decisionReason?.type === 'safetyCheck') {
    return toolPermissionResult
  }

  // ===== 2a. Mode check =====
  appState = context.getAppState()  // re-fetch the latest state
  const shouldBypassPermissions =
    appState.toolPermissionContext.mode === 'bypassPermissions' ||
    (appState.toolPermissionContext.mode === 'plan' &&
     appState.toolPermissionContext.isBypassPermissionsModeAvailable)
  if (shouldBypassPermissions) {
    return { behavior: 'allow', decisionReason: { type: 'mode', mode: '...' } }
  }

  // ===== 2b. The entire tool is allowed =====
  const allowRule = toolAlwaysAllowedRule(appState.toolPermissionContext, tool)
  if (allowRule) {
    return { behavior: 'allow', decisionReason: { type: 'rule', rule: allowRule } }
  }

  // ===== 3. passthrough → ask =====
  return toolPermissionResult.behavior === 'passthrough'
    ? { ...toolPermissionResult, behavior: 'ask' }
    : toolPermissionResult
}

// ===== Outer wrapper: mode conversion =====
export const hasPermissionsToUseTool: CanUseToolFn = async (...) => {
  const result = await hasPermissionsToUseToolInner(...)

  // allow → reset the consecutive-denial counter
  if (result.behavior === 'allow') {
    if (feature('TRANSCRIPT_CLASSIFIER') && context.mode === 'auto') {
      persistDenialState(context, recordSuccess(currentDenialState))
    }
    return result
  }

  // ask → mode conversion
  if (result.behavior === 'ask') {
    if (appState.toolPermissionContext.mode === 'dontAsk') {
      return { behavior: 'deny', decisionReason: { type: 'mode', mode: 'dontAsk' } }
    }
    // auto mode → AI classifier (see the YOLO Classifier section)
  }
  return result
}
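The ordering in the pipeline can be condensed into a pure function, which makes the key invariant easy to test: deny rules, ask rules, and safety checks are all evaluated before the bypass mode gets a say. This is a deliberately reduced sketch, not the real implementation: `PermissionRequest`, `decide`, and `resolveAsk` are hypothetical names, rule matching is collapsed into booleans, and the sandbox auto-allow and auto-mode classifier branches are omitted.

```typescript
// Reduced model of the decision order; all names here are hypothetical.
type Behavior = 'allow' | 'deny' | 'ask'
type Mode = 'default' | 'bypassPermissions' | 'dontAsk'

interface PermissionRequest {
  hasDenyRule: boolean
  hasAskRule: boolean
  hasAllowRule: boolean
  safetyCheckTriggered: boolean
  mode: Mode
}

// Outer wrapper behavior: in dontAsk mode an "ask" becomes a "deny".
function resolveAsk(mode: Mode): Behavior {
  return mode === 'dontAsk' ? 'deny' : 'ask'
}

function decide(req: PermissionRequest): Behavior {
  if (req.hasDenyRule) return 'deny'                     // step 1a
  if (req.hasAskRule || req.safetyCheckTriggered) {
    return resolveAsk(req.mode)                          // steps 1b/1f/1g: immune to bypass
  }
  if (req.mode === 'bypassPermissions') return 'allow'   // step 2a
  if (req.hasAllowRule) return 'allow'                   // step 2b
  return resolveAsk(req.mode)                            // step 3: passthrough becomes ask
}

const base: PermissionRequest = {
  hasDenyRule: false, hasAskRule: false, hasAllowRule: false,
  safetyCheckTriggered: false, mode: 'default',
}

const denyBeatsBypass = decide({ ...base, hasDenyRule: true, mode: 'bypassPermissions' })
const safetyBeatsBypass = decide({ ...base, safetyCheckTriggered: true, mode: 'bypassPermissions' })
const plainBypass = decide({ ...base, mode: 'bypassPermissions' })
```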

Permission Decision Order Diagram:

                    Tool call request
                          │
            ┌─────────────┼─────────────┐
            │             │             │
       Deny rule?     Ask rule?     Allow rule?
          │ yes          │ yes         │ yes
          v              │             v
        Deny        Can sandbox      Allow
                    auto-allow?
                      │ no
                      v
                tool.checkPermissions()
                      │
             ┌────────┼────────┐
             │        │        │
           deny      ask     allow/
             │        │     passthrough
             v        │        │
           Deny   Content-level│
                    rule?      │
                    │ yes      │
                    v          │
                 Deny/Ask      │
                               │
                 Safety check?─┤
                    │ yes      │
                    v          │
                   Ask         │
                               │
                 bypassPermissions?
                    │ yes      │ no
                    v          v
                  Allow    Allow rule?
                            │ yes │ no
                            v     v
                         Allow   ask → mode conversion
                                 ├─ dontAsk → deny
                                 ├─ auto → AI classifier
                                 └─ default → user prompt

5.5 Permission Rule Parser (Real Source Code)

Parsing rule strings is more complex than it appears, because escaped parentheses must be handled:

// src/utils/permissions/permissionRuleParser.ts (real code)

// Input:  "Bash(python -c \"print\\(1\\)\")"
// Output: { toolName: "Bash", ruleContent: "python -c \"print(1)\"" }
export function permissionRuleValueFromString(ruleString: string): PermissionRuleValue {
  // Find the first *unescaped* opening parenthesis
  const openParenIndex = findFirstUnescapedChar(ruleString, '(')
  if (openParenIndex === -1) {
    return { toolName: normalizeLegacyToolName(ruleString) }
  }

  // Find the last *unescaped* closing parenthesis
  const closeParenIndex = findLastUnescapedChar(ruleString, ')')
  if (closeParenIndex === -1 || closeParenIndex <= openParenIndex) {
    return { toolName: normalizeLegacyToolName(ruleString) }
  }

  // The closing parenthesis must be the final character
  if (closeParenIndex !== ruleString.length - 1) {
    return { toolName: normalizeLegacyToolName(ruleString) }
  }

  const toolName = ruleString.substring(0, openParenIndex)
  const rawContent = ruleString.substring(openParenIndex + 1, closeParenIndex)

  // Empty content "Bash()" or wildcard "Bash(*)" → tool-level rule
  if (rawContent === '' || rawContent === '*') {
    return { toolName: normalizeLegacyToolName(toolName) }
  }

  // Unescape: \\( → (, \\) → ), \\\\ → \\
  const ruleContent = unescapeRuleContent(rawContent)
  return { toolName: normalizeLegacyToolName(toolName), ruleContent }
}

// A character is escaped when it is preceded by an odd number of backslashes
function findFirstUnescapedChar(str: string, char: string): number {
  for (let i = 0; i < str.length; i++) {
    if (str[i] === char) {
      let backslashCount = 0
      let j = i - 1
      while (j >= 0 && str[j] === '\\') { backslashCount++; j-- }
      if (backslashCount % 2 === 0) return i  // even number of backslashes = unescaped
    }
  }
  return -1
}
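The excerpt calls `findLastUnescapedChar` and `unescapeRuleContent` without showing them. The versions below are hypothetical reconstructions based only on how they are used and on the unescape comment in the parser; the actual source may differ. `isEscaped` is an additional helper name introduced for this sketch.

```typescript
// Hypothetical reconstructions; not copied from the Claude Code source.

// A character is escaped when preceded by an odd number of backslashes.
function isEscaped(str: string, index: number): boolean {
  let backslashes = 0
  for (let j = index - 1; j >= 0 && str[j] === '\\'; j--) backslashes++
  return backslashes % 2 === 1
}

// Scan from the right for the last occurrence of char that is not escaped.
function findLastUnescapedChar(str: string, char: string): number {
  for (let i = str.length - 1; i >= 0; i--) {
    if (str[i] === char && !isEscaped(str, i)) return i
  }
  return -1
}

// Reverse the escaping: "\(" becomes "(", "\)" becomes ")", "\\" becomes "\".
function unescapeRuleContent(content: string): string {
  return content.replace(/\\([()\\])/g, '$1')
}

// The rule string from the parser's own example comment.
const rule = 'Bash(python -c "print\\(1\\)")'
const closeIndex = findLastUnescapedChar(rule, ')')
const content = unescapeRuleContent(rule.substring(5, closeIndex))
```

Here the `$1` replacement pattern is used on purpose, since the replacement is a fixed template rather than untrusted text.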

MCP Server-Level Rule Matching:

// Rule "mcp__server1" matches tool "mcp__server1__tool1"
// Rule "mcp__server1__*" matches all tools of server1
function toolMatchesRule(tool, rule): boolean {
  if (rule.ruleValue.ruleContent !== undefined) return false  // content rules never match an entire tool

  const nameForRuleMatch = getToolNameForPermissionCheck(tool)
  if (rule.ruleValue.toolName === nameForRuleMatch) return true

  // MCP server-level matching
  const ruleInfo = mcpInfoFromString(rule.ruleValue.toolName)
  const toolInfo = mcpInfoFromString(nameForRuleMatch)
  return ruleInfo !== null && toolInfo !== null &&
    (ruleInfo.toolName === undefined || ruleInfo.toolName === '*') &&
    ruleInfo.serverName === toolInfo.serverName
}
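To see the server-level matching run end to end, the sketch below supplies a hypothetical `mcpInfoFromString`. It assumes tool names have the shape `mcp__<server>` or `mcp__<server>__<tool>` and that server names contain no double underscore; the real parser may handle more cases. `mcpRuleMatchesTool` is likewise an illustrative name that mirrors only the name-matching part of `toolMatchesRule`.

```typescript
// Hypothetical parser and matcher; the name format is an assumption.
interface McpInfo {
  serverName: string
  toolName?: string
}

function mcpInfoFromString(name: string): McpInfo | null {
  const [prefix, serverName, ...rest] = name.split('__')
  if (prefix !== 'mcp' || !serverName) return null
  return rest.length > 0
    ? { serverName, toolName: rest.join('__') }
    : { serverName }
}

function mcpRuleMatchesTool(ruleToolName: string, toolName: string): boolean {
  if (ruleToolName === toolName) return true // exact match, MCP or not
  const ruleInfo = mcpInfoFromString(ruleToolName)
  const toolInfo = mcpInfoFromString(toolName)
  return ruleInfo !== null && toolInfo !== null &&
    (ruleInfo.toolName === undefined || ruleInfo.toolName === '*') &&
    ruleInfo.serverName === toolInfo.serverName
}

const serverRule = mcpRuleMatchesTool('mcp__server1', 'mcp__server1__tool1')
const wildcardRule = mcpRuleMatchesTool('mcp__server1__*', 'mcp__server1__tool1')
const otherServer = mcpRuleMatchesTool('mcp__server1', 'mcp__server2__tool1')
```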

Permission Funnel Figure 5-2: The permission decision funnel. 100% of tool calls pass through successive filter layers. Ultimately, only approximately 11% require an expensive classifier or user confirmation. The first 4 steps filter out 89% of calls, all via zero-cost rule matching.

Defense Probability Figure 5-3: Layer-by-layer bypass probability in the defense-in-depth model (blue bars: per-layer probability; red line: cumulative probability, logarithmic scale). After stacking 6 layers, the cumulative bypass probability approaches zero. Note that Layer 6 (hardcoded denials) has a per-layer probability of 0, reducing the cumulative probability to zero.

Decision Matrix Figure 5-4: Permission decision matrix heatmap, 5 permission modes x 6 tool types. Green=ALLOW, orange=ASK, red=DENY. Note that under auto mode, Bash(danger) remains ASK: the classifier adopts a conservative strategy for high-risk operations.

5.6 Quantitative Analysis: Distribution of Permission Decisions

Based on analytics instrumentation points and comments in the Claude Code source code, we can infer the typical distribution of permission decisions:

| Decision Path | Estimated Share | Latency | Cost |
|---|---|---|---|
| Rule-based Allow (entire tool) | ~40% | <1ms | 0 |
| Mode-based Allow (bypassPermissions) | ~20% | <1ms | 0 |
| Safe-tool Allowlist (auto mode) | ~15% | <1ms | 0 |
| Tool checkPermissions Allow | ~10% | 1-5ms | 0 |
| YOLO Classifier Allow (fast stage) | ~8% | 50-200ms | ~$0.001 |
| YOLO Classifier Allow (thinking stage) | ~3% | 500ms-2s | ~$0.01 |
| User Prompt (interactive) | ~3% | 1-30s | 0 (waiting for user) |
| Deny (rule or classifier) | ~1% | varies | varies |

Source Code Annotation: Regarding classifier optimization, the source code comments: "Before running the auto mode classifier, check if acceptEdits mode would allow this action. This avoids expensive classifier API calls for safe operations like file edits." This fast-path check is estimated to skip approximately 35% of classifier calls. Another comment notes: "Allowlisted tools are safe and don't need YOLO classification." This skips an additional ~15%. Combined, only about 11% of tool calls actually require a classifier API call.

Performance Implications: The average latency for permission checking is approximately 5-10ms (weighted average), but variance is extremely high. Rule-based paths incur virtually no latency, while classifier paths may require 2 seconds. This is why Claude Code implements speculative prefetching in toolExecution.ts: it starts the Bash classifier check in parallel with PreToolUse Hook execution. The source code comments: "Speculatively start the bash allow classifier check early so it runs in parallel with pre-tool hooks."

5.7 YOLO Classifier (Auto Mode)

The auto mode uses a two-stage AI classifier to automatically approve tool calls:

Stage 1: Fast Classifier
├─ Uses a smaller/faster model
├─ Checks: Is this operation safe?
├─ Returns confidence: high/medium/low
├─ If high confidence + shouldBlock=false → approve directly
└─ If uncertain → proceed to Stage 2

Stage 2: Thinking Classifier
├─ Uses a larger, more deliberative model
├─ Analyzes full context (conversation history + tool input)
├─ Returns final judgment
└─ If still uncertain → fall back to user prompt

Safe Tool Fast Path:
├─ Read, Glob, Grep and other read-only tools
├─ Skips API call (saving latency and cost)
└─ Returns allow directly

Consecutive Denial Tracking:
├─ If the classifier denies multiple times consecutively
├─ Falls back to user prompt
└─ Prevents the classifier from getting stuck when being overly conservative

Design Philosophy:

The YOLO classifier embodies a core tenet of Harness engineering: do not trust the model to judge whether its own operations are safe. Even if the primary model considers an operation correct, an independent "gatekeeper" model performs a secondary review. The two-stage design strikes a balance between speed and safety.
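The control flow of the two stages, the safe-tool fast path, and the consecutive-denial fallback can be sketched as a small state machine. Everything here is an assumption made for illustration: the classifiers are injected as plain synchronous functions (the real ones are model API calls), `createAutoModeGate` is a hypothetical name, the threshold of 3 denials is invented, and treating a high-confidence block from Stage 1 as "escalate to Stage 2" is one possible reading of the description above.

```typescript
// Hypothetical sketch of the auto-mode control flow; not the real classifier.
type Confidence = 'high' | 'medium' | 'low'
interface Verdict { shouldBlock: boolean; confidence: Confidence }
type Classifier = (toolName: string, input: string) => Verdict
type Outcome = 'allow' | 'deny' | 'askUser'

const SAFE_TOOLS = new Set<string>(['Read', 'Glob', 'Grep'])
const MAX_CONSECUTIVE_DENIALS = 3 // assumed threshold

function createAutoModeGate(fast: Classifier, thinking: Classifier) {
  let consecutiveDenials = 0

  // Track denials; after too many in a row, hand control back to the user.
  const record = (blocked: boolean): Outcome => {
    if (!blocked) { consecutiveDenials = 0; return 'allow' }
    consecutiveDenials += 1
    return consecutiveDenials >= MAX_CONSECUTIVE_DENIALS ? 'askUser' : 'deny'
  }

  return (toolName: string, input: string): Outcome => {
    if (SAFE_TOOLS.has(toolName)) return 'allow'     // fast path: no classifier call
    const first = fast(toolName, input)              // Stage 1
    if (first.confidence === 'high' && !first.shouldBlock) return record(false)
    const second = thinking(toolName, input)         // Stage 2
    if (second.confidence === 'low') return 'askUser' // still uncertain
    return record(second.shouldBlock)
  }
}

const alwaysBlock: Classifier = () => ({ shouldBlock: true, confidence: 'high' })
const alwaysPass: Classifier = () => ({ shouldBlock: false, confidence: 'high' })

const strictGate = createAutoModeGate(alwaysBlock, alwaysBlock)
const strictOutcomes = [1, 2, 3].map(() => strictGate('Bash', 'rm -rf build'))
const permissiveGate = createAutoModeGate(alwaysPass, alwaysPass)
```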

5.8 Permission Decision Reason Tracking

Every permission decision includes a detailed reason, used for auditing and debugging:

type PermissionDecisionReason =
  | { type: 'rule'; rule: PermissionRule }
  | { type: 'mode'; mode: PermissionMode }
  | { type: 'classifier'; classifier: string; reason: string }
  | { type: 'hook'; hookName: string; reason?: string }
  | { type: 'safetyCheck'; reason: string; classifierApprovable: boolean }
  // ... more variants
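A discriminated union like this pays off when the reason is rendered for an audit log, because the compiler can check that every variant is handled. The sketch below redeclares a simplified subset of the types so that it stands alone; `describeReason` and the string formats are hypothetical, and `PermissionMode` is reduced to `string` for brevity.

```typescript
// Simplified stand-ins so the sketch is self-contained; formats are invented.
type PermissionMode = string
interface PermissionRule {
  source: string
  ruleBehavior: 'allow' | 'deny' | 'ask'
  ruleValue: { toolName: string; ruleContent?: string }
}

type PermissionDecisionReason =
  | { type: 'rule'; rule: PermissionRule }
  | { type: 'mode'; mode: PermissionMode }
  | { type: 'classifier'; classifier: string; reason: string }
  | { type: 'hook'; hookName: string; reason?: string }
  | { type: 'safetyCheck'; reason: string; classifierApprovable: boolean }

// Exhaustive switch: adding a new variant without a case is a compile error.
function describeReason(r: PermissionDecisionReason): string {
  switch (r.type) {
    case 'rule': {
      const { toolName, ruleContent } = r.rule.ruleValue
      const rule = ruleContent === undefined ? toolName : `${toolName}(${ruleContent})`
      return `${r.rule.ruleBehavior} rule ${rule} from ${r.rule.source}`
    }
    case 'mode':
      return `mode ${r.mode}`
    case 'classifier':
      return `classifier ${r.classifier}: ${r.reason}`
    case 'hook':
      return r.reason === undefined ? `hook ${r.hookName}` : `hook ${r.hookName}: ${r.reason}`
    case 'safetyCheck':
      return `safety check: ${r.reason}`
  }
}

const auditLine = describeReason({
  type: 'rule',
  rule: {
    source: 'userSettings',
    ruleBehavior: 'deny',
    ruleValue: { toolName: 'Bash', ruleContent: 'rm -rf *' },
  },
})
```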

Chapter 6: Hooks System - Lifecycle Extensibility

The permission model tells the Agent "whether it can act," while Hooks let you inject your own logic "before" and "after" the Agent acts.

Think of Hooks as airport security checkpoints. Passengers (tool calls) pass through security screening (PreToolUse Hook), may be tagged after passing (PostToolUse Hook), and are stopped if their luggage is problematic (blocking error). Security personnel can be human (command Hook), AI-powered (agent Hook), or even remote (http Hook).

Claude Code defines 26 Hook events and 4 Hook types, the most complete Agent lifecycle extension system visible in open source to date. Understanding it gives you the key to making a Harness "customizable."

Hooks are the Harness's extension points. They allow users to inject custom logic at critical moments in the Agent lifecycle.

6.1 The 26 Hook Events

// src/types/hooks.ts: complete list of Hook events
type HookEvent =
  // Tool-related
  | 'PreToolUse'        // before a tool executes
  | 'PostToolUse'       // after a tool executes
  | 'PostToolUseFailure'// after a tool execution fails

  // Permissions
  | 'PermissionRequest' // permission requested
  | 'PermissionDenied'  // permission denied

  // Session
  | 'SessionStart'      // session starts
  | 'SessionEnd'        // session ends
  | 'Stop'              // model stops
  | 'StopFailure'       // stop failed

  // User input
  | 'UserPromptSubmit'  // user submits a prompt

  // Agent
  | 'SubagentStart'     // sub-Agent starts
  | 'SubagentStop'      // sub-Agent stops
  | 'TeammateIdle'      // teammate is idle

  // Tasks
  | 'TaskCreated'       // task created
  | 'TaskCompleted'     // task completed

  // Compaction
  | 'PreCompact'        // before compaction
  | 'PostCompact'       // after compaction

  // Other
  | 'Setup'             // initial setup
  | 'Notification'      // notification
  | 'Elicitation'       // information request
  | 'ElicitationResult' // information request result
  | 'ConfigChange'      // configuration changed
  | 'CwdChanged'        // working directory changed
  | 'FileChanged'       // file changed
  | 'WorktreeCreate'    // worktree created
  | 'WorktreeRemove'    // worktree removed
  | 'InstructionsLoaded'; // instructions finished loading

Hook Frequency Figure 6-1: Estimated trigger frequency of the 26 Hook events. PreToolUse and PostToolUse are the most frequent events (triggered on every tool call), while SessionStart/End fire only at session boundaries. This explains why PostToolUse's internal Hooks have a dedicated fast-path optimization (-70% latency).

Hook Cost vs Intelligence Figure 6-2: Cost-intelligence scatter plot of Hook types, with Command Hook in the lower left (cheap but simple) and Agent Hook in the upper right (expensive but intelligent). The dashed line divides four quadrants: upper left is the "ideal zone" (intelligent and cheap), lower right is the "avoid zone" (unintelligent and expensive).

6.2 Hook Lifecycle and Type Selection

Choosing the correct Hook type is a critical decision in Harness customization. The following chart aids selection based on the scenario:

quadrantChart
    title Hook Type Selection Matrix
    x-axis "Low cost" --> "High cost"
    y-axis "Low intelligence" --> "High intelligence"
    quadrant-1 "Agent Hook"
    quadrant-2 "Prompt Hook"
    quadrant-3 "Command Hook"
    quadrant-4 "HTTP Hook"
    "Rule checks": [0.15, 0.2]
    "Lint runs": [0.25, 0.15]
    "Security review": [0.6, 0.85]
    "Test verification": [0.75, 0.9]
    "Slack notification": [0.7, 0.1]
    "Audit log": [0.65, 0.15]
    "Code quality scoring": [0.45, 0.7]
    "Compliance check": [0.5, 0.6]

Interpretation: The lower-left corner is Command Hook territory: simple, cheap, and deterministic (e.g., lint, grep checks). The upper-right corner is Agent Hook territory: the smartest but most expensive (requiring a full Claude call to understand code semantics). HTTP Hooks occupy the lower right: moderate cost (network latency) but low intelligence (merely POSTing data). Prompt Hooks sit in the upper middle: a single LLM judgment, cheaper than an Agent but smarter than a script.

Four Hook Types

Type 1: Command Hook (Shell Command)

{
  "type": "command",
  "command": "npm test -- --bail",
  "if": "Bash(npm *)",
  "shell": "bash",
  "timeout": 30,
  "statusMessage": "Running tests...",
  "once": false,
  "async": false,
  "asyncRewake": false
}

Execution flow:
1. Path conversion (Windows: C:\Users\foo -> /c/Users/foo)
2. Variable substitution (${CLAUDE_PROJECT_DIR}, ${CLAUDE_PLUGIN_ROOT})
3. Shell selection (Bash or PowerShell)
4. JSON input written to stdin
5. Line-by-line stdout parsing (detecting async signals and prompt requests)
6. Exit code determines result

Exit code semantics:
- 0: Success; stdout content optionally displayed
- 2: Blocking error; stderr shown to both the model and the user
- Other: Non-blocking error; stderr shown only to the user
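The exit code semantics map naturally onto a small tagged result type. This is a sketch of the mapping described above, not the harness's actual code: `HookOutcome` and `interpretExitCode` are hypothetical names.

```typescript
// Hypothetical mapping of the three exit-code cases described in the text.
type HookOutcome =
  | { kind: 'success'; stdout: string }
  | { kind: 'blockingError'; shownToModel: true; stderr: string }
  | { kind: 'nonBlockingError'; shownToModel: false; stderr: string }

function interpretExitCode(code: number, stdout: string, stderr: string): HookOutcome {
  if (code === 0) return { kind: 'success', stdout }
  if (code === 2) return { kind: 'blockingError', shownToModel: true, stderr }
  // Any other code: the tool call proceeds, and only the user sees stderr.
  return { kind: 'nonBlockingError', shownToModel: false, stderr }
}

const blocked = interpretExitCode(2, '', 'sudo is not allowed')
const warned = interpretExitCode(1, '', 'lint script crashed')
```

A practical consequence for hook authors: a script that should veto a tool call has to exit with 2 specifically, since a generic failure code such as 1 does not block.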

Type 2: Prompt Hook (LLM Evaluation)

{
  "type": "prompt",
  "prompt": "Review this code change for security vulnerabilities: $ARGUMENTS",
  "model": "claude-sonnet-4-6",
  "timeout": 60
}

Uses an independent LLM call to evaluate Hook input. Suitable for checks requiring semantic understanding (e.g., code security review).

Type 3: HTTP Hook (External Service)

{
  "type": "http",
  "url": "https://hooks.slack.com/triggers/...",
  "headers": {
    "Authorization": "Bearer $SLACK_TOKEN"
  },
  "allowedEnvVars": ["SLACK_TOKEN"],
  "timeout": 10
}

POSTs JSON to an external URL. The $VAR_NAME syntax in headers interpolates from whitelisted environment variables. allowedEnvVars restricts accessible environment variables, preventing accidental leakage.
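A minimal sketch of whitelisted header interpolation follows. The function name `interpolateHeaders`, the variable-name regex, and the choice to expand non-whitelisted variables to an empty string are all assumptions made for illustration; the source only states that `$VAR_NAME` is interpolated from variables listed in `allowedEnvVars`.

```typescript
// Sketch only: the helper name, regex, and empty-string fallback are assumptions.
function interpolateHeaders(
  headers: Record<string, string>,
  allowedEnvVars: string[],
  env: Record<string, string | undefined>,
): Record<string, string> {
  const allowed = new Set(allowedEnvVars);
  const result: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    result[name] = value.replace(
      /\$([A-Za-z_][A-Za-z0-9_]*)/g,
      (_match: string, varName: string) => {
        // Variables outside the whitelist never reach the outgoing request.
        if (!allowed.has(varName)) return "";
        return env[varName] ?? "";
      },
    );
  }
  return result;
}
```

Passing the environment in as a parameter, rather than reading `process.env` directly, keeps the sketch testable and makes the trust boundary visible at the call site.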

Type 4: Agent Hook (Agent Validator)

{
  "type": "agent",
  "prompt": "Verify that the test suite passes and no regressions were introduced",
  "model": "claude-sonnet-4-6",
  "timeout": 120
}

Uses a full Claude Agent (with tool access) to validate operations. This is the most expensive but most powerful type: the Agent can read files, run tests, and check results.

6.3 Hook Configuration Structure

// ~/.claude/settings.json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write",
        "hooks": [
          {
            "type": "command",
            "command": "eslint --stdin --stdin-filename=$TOOL_INPUT_FILE_PATH",
            "if": "Write(*.ts)"
          }
        ]
      },
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "echo '{\"decision\": \"block\", \"reason\": \"sudo is not allowed\"}' ",
            "if": "Bash(sudo *)"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "hooks": [
          {
            "type": "http",
            "url": "https://audit.company.com/log",
            "headers": { "Authorization": "Bearer $AUDIT_TOKEN" },
            "allowedEnvVars": ["AUDIT_TOKEN"]
          }
        ]
      }
    ],
    "SessionEnd": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "git stash",
            "timeout": 10
          }
        ]
      }
    ]
  }
}

6.4 Hook Input/Output Protocol

Input (JSON passed via stdin):

// Base fields for every Hook
{
  session_id: string,
  transcript_path: string,
  cwd: string,
  permission_mode?: string,
  agent_id?: string,
  agent_type?: string,
}

// PreToolUse-specific fields
{
  tool_name: "Write",
  tool_input: { file_path: "/src/index.ts", content: "..." },
  tool_use_id: "toolu_xxx",
}

// PostToolUse-specific fields
{
  tool_name: "Bash",
  tool_input: { command: "npm test" },
  tool_response: { stdout: "...", exit_code: 0 },
  tool_use_id: "toolu_xxx",
}

// UserPromptSubmit
{
  prompt_text: "Fix the bug in login.ts",
}

// SessionStart
{
  source: "startup" | "resume" | "clear" | "compact",
}

Output (JSON returned via stdout):

type HookJSONOutput = {
  // Global control
  continue?: boolean;           // false = stop the conversation
  stopReason?: string;
  decision?: 'approve' | 'block';
  reason?: string;
  systemMessage?: string;       // inject a system message
  suppressOutput?: boolean;

  // Permission-related
  permissionDecision?: 'allow' | 'deny' | 'ask';

  // Event-specific
  hookSpecificOutput?: {
    updatedInput?: object;           // modify the tool input (PreToolUse)
    additionalContext?: string;       // inject additional context
    watchPaths?: string[];           // register file watchers
    updatedMCPToolOutput?: unknown;  // modify MCP tool output
    action?: 'accept' | 'decline' | 'cancel';
  }
};
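To make the protocol concrete, here is a hedged sketch of the decision logic a PreToolUse command hook might contain. The field names (`tool_name`, `tool_input`, `decision`, `reason`) follow the protocol above; the function name `decide`, the trimmed-down input type, and the sudo rule are illustrative assumptions.

```typescript
// Field names follow the protocol above; everything else is illustrative.
interface PreToolUseInput {
  session_id: string;
  cwd: string;
  tool_name: string;
  tool_input: Record<string, unknown>;
}

interface HookDecision {
  decision?: "approve" | "block";
  reason?: string;
}

function decide(input: PreToolUseInput): HookDecision {
  if (input.tool_name !== "Bash") return {};
  const raw = input.tool_input["command"];
  const command = typeof raw === "string" ? raw : "";
  // Match sudo at the start of the command or after a shell separator.
  if (/(^|[;&|]\s*)sudo\b/.test(command)) {
    return { decision: "block", reason: "sudo is not allowed" };
  }
  return {}; // no opinion: let the normal permission flow decide
}

// A command hook would read one JSON document from stdin and print the result:
//   const input = JSON.parse(stdinText) as PreToolUseInput;
//   process.stdout.write(JSON.stringify(decide(input)) + "\n");
```

Returning an empty object rather than an explicit approval leaves the final call to the permission model, which is usually the safer default for a hook that only exists to veto.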

6.5 In-Depth Analysis of the Hook Execution Engine

The following is a simplified excerpt of the Hook execution engine in src/utils/hooks.ts, with the key logic retained:

// src/utils/hooks.ts: real code (simplified, key logic retained)
async function* executeHooks({
  hookInput, toolUseID, matchQuery, signal, timeoutMs,
  toolUseContext, messages, forceSyncExecution, requestPrompt,
}) {
  // Safety check 1: global disable
  if (shouldDisableAllHooksIncludingManaged()) return
  if (isEnvTruthy(process.env.CLAUDE_CODE_SIMPLE)) return

  // Safety check 2: workspace trust
  // Every Hook requires workspace trust (prevents RCE vulnerabilities)
  if (shouldSkipHookDueToTrust()) return

  // Find the matching Hooks
  const matchingHooks = await getMatchingHooks(
    appState, sessionId, hookEvent, hookInput, tools,
  )
  if (matchingHooks.length === 0) return

  // ===== Fast-path optimization =====
  // If every Hook is an internal callback (e.g. sessionFileAccessHooks, attributionHooks),
  // skip span/progress/abortSignal/JSON handling → about 70% faster
  const userHooks = matchingHooks.filter(h => !isInternalHook(h))
  if (userHooks.length === 0) {
    // 6.01µs → ~1.8µs per PostToolUse hit (-70%)
    for (const { hook } of matchingHooks) {
      if (hook.type === 'callback') {
        await hook.callback(hookInput, toolUseID, signal, context)
      }
    }
    return
  }

  // ===== Run all Hooks in parallel =====
  // Each Hook has its own timeout
  // ... aggregate results ...
}

Shell Execution of Command Hooks

// src/utils/hooks.ts: key details of execCommandHook
async function execCommandHook(hook, hookEvent, hookName, jsonInput, signal, ...) {
  // 1. Shell selection
  // PowerShell: pwsh -NoProfile -NonInteractive -Command
  // Bash: spawn(command, [], { shell: gitBashPath | true })

  // 2. Variable substitution
  // ${CLAUDE_PROJECT_DIR} → project directory
  // ${CLAUDE_PLUGIN_ROOT} → plugin directory
  // ${CLAUDE_PLUGIN_DATA} → plugin data directory
  // ${user_config.X} → plugin configuration value

  // 3. Write to stdin (UTF-8 encoded)
  child.stdin.write(jsonInput + '\n', 'utf8')

  // 4. Parse stdout line by line
  child.stdout.on('data', data => {
    stdout += data

    // ===== Prompt Request protocol =====
    // A Hook can request user input.
    // Output line format: {"prompt": {"type": "text", "message": "Enter value:"}}
    if (requestPrompt) {
      for (const line of lines) {
        const parsed = jsonParse(line.trim())
        const validation = promptRequestSchema().safeParse(parsed)
        if (validation.success) {
          // Serialize asynchronous prompt handling
          promptChain = promptChain.then(async () => {
            const response = await requestPrompt(validation.data)
            child.stdin.write(jsonStringify(response) + '\n', 'utf8')
          })
          continue
        }
      }
    }

    // ===== Async detection =====
    // If the first output line is {"async": true, ...}
    // → move the process to the background; the main thread continues
    if (!initialResponseChecked) {
      const firstLine = firstLineOf(stdout).trim()
      const parsed = jsonParse(firstLine)
      if (isAsyncHookJSONOutput(parsed) && !forceSyncExecution) {
        executeInBackground({ processId, hookId, ... })
        shellCommandTransferred = true
        resolve({ stdout, stderr, output, status: 0 })
      }
    }
  })

  // 5. Wait for completion
  // Key point: wait for both the stdout and stderr streams to end before treating the output as complete
  // This prevents a race in which 'close' fires before every 'data' event has been handled
  await Promise.all([stdoutEndPromise, stderrEndPromise])

  // 6. Strip prompt-request lines that have already been handled
  // Match by content rather than by index, so index drift cannot leak prompt JSON
  const finalStdout = processedPromptLines.size === 0 ? stdout :
    stdout.split('\n').filter(line => !processedPromptLines.has(line.trim())).join('\n')

  return { stdout: finalStdout, stderr, output, status: exitCode }
}

Hook JSON Output Processing:

// Hook output is parsed as JSON; decisions and side effects are extracted
function processHookJSONOutput({ json, ... }) {
  const result = {}

  // Global control
  if (json.continue === false) result.preventContinuation = true

  // Decision
  if (json.decision === 'approve') result.permissionBehavior = 'allow'
  if (json.decision === 'block') {
    result.permissionBehavior = 'deny'
    result.blockingError = { blockingError: json.reason || 'Blocked by hook' }
  }

  // Event-specific fields
  if (json.hookSpecificOutput?.hookEventName === 'PreToolUse') {
    if (json.hookSpecificOutput.updatedInput) {
      result.updatedInput = json.hookSpecificOutput.updatedInput  // modify the tool input
    }
    result.additionalContext = json.hookSpecificOutput.additionalContext
  }

  if (json.hookSpecificOutput?.hookEventName === 'PostToolUse') {
    if (json.hookSpecificOutput.updatedMCPToolOutput) {
      result.updatedMCPToolOutput = json.hookSpecificOutput.updatedMCPToolOutput
    }
  }

  return result
}

6.6 Async Hook Protocol

// A Hook can switch to asynchronous execution by emitting a JSON signal on its first line:
// first line of stdout: {"async": true, "asyncTimeout": 60000}
// After that, the Hook runs in the background

// Or through configuration:
{ type: 'command', command: '...', async: true, asyncRewake: true }

// asyncRewake: if the exit code is 2, a notification is queued to wake the model
// Use case: run the test suite in the background and notify the Agent on failure
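The first-line signal can be detected with a small parser. The sketch below is an assumption-laden illustration: `parseAsyncSignal` and `AsyncSignal` are hypothetical names, and only the `{"async": true, "asyncTimeout": ...}` shape is taken from the protocol above.

```typescript
// Illustrative helper; only the JSON shape comes from the protocol description.
interface AsyncSignal {
  async: true;
  asyncTimeout?: number;
}

function parseAsyncSignal(stdout: string): AsyncSignal | null {
  const firstLine = (stdout.split("\n", 1)[0] ?? "").trim();
  if (!firstLine.startsWith("{")) return null;
  try {
    const parsed: unknown = JSON.parse(firstLine);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as { async?: unknown }).async === true
    ) {
      const timeout = (parsed as { asyncTimeout?: unknown }).asyncTimeout;
      return typeof timeout === "number"
        ? { async: true, asyncTimeout: timeout }
        : { async: true };
    }
  } catch {
    // Not JSON: treat the output as ordinary synchronous hook output.
  }
  return null;
}
```

Only the first line is inspected, so a hook that prints ordinary output first is always treated as synchronous.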

6.7 Conditional Execution (the if Field)

The if field of Hooks uses the same syntax as permission rules:

// Run only for git commands
{ "if": "Bash(git *)" }

// Run only for writes to TypeScript files
{ "if": "Write(*.ts)" }

// Run only for npm commands
{ "if": "Bash(prefix:npm)" }

// Match every Bash call
{ "if": "Bash" }
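One way to read these four forms is as a tiny matcher over a tool name and a single argument string (the command for Bash, the file path for Write). The sketch below is hypothetical: `matchesIfRule` is not a Claude Code function, and the real rule grammar may treat wildcards and prefixes differently.

```typescript
// Hypothetical matcher; the real rule grammar may differ. Supported here:
//   "Tool"              matches every call to Tool
//   "Tool(glob)"        "*" matches any run of characters
//   "Tool(prefix:text)" the argument starts with text
function matchesIfRule(rule: string, toolName: string, argument: string): boolean {
  const m = /^([A-Za-z_]+)(?:\((.*)\))?$/.exec(rule.trim());
  if (!m) return false;
  const ruleTool = m[1];
  const pattern = m[2];
  if (ruleTool !== toolName) return false;
  if (pattern === undefined) return true;
  if (pattern.startsWith("prefix:")) {
    return argument.startsWith(pattern.slice("prefix:".length));
  }
  // Escape regex metacharacters (except "*"), then turn "*" into ".*".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`).test(argument);
}
```

For example, `matchesIfRule("Write(*.ts)", "Write", "/src/index.ts")` is true, while the same rule does not match a Bash call.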

Design Philosophy:

The Hook system follows the principle of "data-driven extensibility." Rather than requiring users to modify source code, all customization is accomplished through declarative configuration in settings.json. The four Hook types cover the full spectrum from simple shell scripts to full Agent validation, with unified and straightforward exit code semantics.

Practical Advice: If you are using Hooks for the first time, start with the simplest approach: a PreToolUse command hook that prints the tool name and input. Once you are comfortable with the input/output protocol, try conditional execution (the if field) and HTTP callbacks. Agent hooks are the most powerful but also the most expensive (each invocation requires a complete Agent execution); reserve them for scenarios where you genuinely need to "understand code semantics."

Common Pitfall: Avoid heavy operations in Stop hooks. If your Stop hook injects a large number of tokens (such as an entire test report), it may trigger a prompt-too-long error. The prompt-too-long recovery skips Stop hooks (to prevent a death spiral), causing your logic to be silently bypassed. Keep Stop hooks lightweight: if you need to convey large amounts of information, write it to a file and let the Agent read it.


Chapter 7: Sandbox & Security - The Safety Net

If the permission model is the fence and Hooks are the security checkpoints, then the sandbox is the physical isolation barrier. The first two operate at the logical level: they depend on software executing correctly. But software can have bugs, and logic can be circumvented. The sandbox enforces isolation at the operating system level: even if the Agent's code has a vulnerability, it cannot access files or networks it should not touch.

This is the classic "Defense in Depth" principle of security engineering: never stake security on a single mechanism; instead, layer defenses. Claude Code has six defensive layers, with the sandbox being the second-to-last (the final layer is hardcoded denials).

The sandbox is one of the Harness's last lines of defense. Even if the permission model and Hooks are both bypassed, the sandbox still limits what the Agent can do.

7.1 Sandbox Architecture

┌──────────────────────────────────────────────────┐
│                 Claude Code                      │
│                                                  │
│  BashTool.call()                                 │
│      │                                           │
│      v                                           │
│  shouldUseSandbox()                              │
│      │                                           │
│      ├─ Is the sandbox enabled?                  │
│      ├─ Is dangerouslyDisableSandbox allowed?    │
│      ├─ Is the command on the exclusion list?    │
│      │                                           │
│      v                                           │
│  SandboxManager.wrapWithSandbox(command)         │
│      │                                           │
│      v                                           │
│  @anthropic-ai/sandbox-runtime                   │
│      │                                           │
│      ├─ Filesystem restrictions                  │
│      ├─ Network restrictions                     │
│      └─ Process restrictions                     │
│                                                  │
└──────────────────────────────────────────────────┘

Figure 7-1: Sandbox coverage. Settings Files, Skills Dir, and Git Config are 100% restricted (hardcoded DENY) and cannot be relaxed via configuration. Filesystem Write restriction is at 85% (most paths are restricted; only the project directory is permitted). Process Spawn has the lowest restriction rate (50%), since certain commands (such as Docker) need to bypass the sandbox.

7.2 Three Restriction Dimensions

Filesystem Restrictions

interface FsReadRestrictionConfig {
  allowRead: string[];   // path patterns that may be read
  denyRead: string[];    // path patterns that may not be read
}

interface FsWriteRestrictionConfig {
  allowWrite: string[];  // path patterns that may be written
  denyWrite: string[];   // path patterns that may not be written
}

// Path syntax:
// "/path"   - relative to the settings root
// "//path"  - absolute path (from the filesystem root)
// "~/path"  - user home directory
// "./path"  - current working directory
// "*.ext"   - wildcard pattern

Security Hardcoding:

- .claude/settings*.json is always deny-write (preventing sandbox escape)
- .claude/skills and .claude/commands are always deny-write (preventing malicious code injection)
- Bare-repository files (HEAD, objects, refs) are detected and cleaned up after command execution

Network Restrictions

interface NetworkRestrictionConfig {
  allowedDomains: string[];     // allowed domain patterns
  deniedDomains: string[];      // denied domain patterns
  allowUnixSockets?: boolean;   // Unix socket access
  allowLocalBinding?: boolean;  // local port binding
}

Command Exclusions

// settings.sandbox.excludedCommands
// Matching patterns:
//   "cmd:"    - prefix match
//   "cmd"     - exact match
//   "cmd*"    - wildcard

// Note: this is **not** a security boundary
// It is a convenience feature only (skipping the sandbox to avoid compatibility issues)

7.3 Path Resolution (Claude Code-Specific Conventions)

Claude Code uses special path prefix conventions distinct from sandbox-runtime's standard paths:

// src/utils/sandbox/sandbox-adapter.ts: real code
export function resolvePathPatternForSandbox(
  pattern: string, source: SettingSource,
): string {
  // "//" prefix → absolute path from the filesystem root
  // "//.aws/**" → "/.aws/**"
  if (pattern.startsWith('//')) {
    return pattern.slice(1)
  }

  // "/" prefix → relative to the settings file directory
  // In permission rules, "/foo/**" → "${settings_root}/foo/**"
  if (pattern.startsWith('/') && !pattern.startsWith('//')) {
    const root = getSettingsRootPathForSource(source)
    return resolve(root, pattern.slice(1))
  }

  // Other patterns (~/path, ./path, path) are passed through unchanged;
  // sandbox-runtime's normalizePathForSandbox handles them
  return pattern
}
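To see the three branches in action, here is a standalone restatement that takes the settings root as a plain parameter. The name `resolvePattern`, the explicit `settingsRoot` argument, and the example root `/home/dev/project` are assumptions made so the snippet runs on its own; POSIX path semantics are assumed as well.

```typescript
import { posix } from "node:path";

// Mirrors the excerpt above, but takes the settings root as a parameter
// (the real code looks it up via getSettingsRootPathForSource).
function resolvePattern(pattern: string, settingsRoot: string): string {
  if (pattern.startsWith("//")) {
    return pattern.slice(1); // absolute path from the filesystem root
  }
  if (pattern.startsWith("/")) {
    return posix.resolve(settingsRoot, pattern.slice(1)); // settings-relative
  }
  return pattern; // ~/path, ./path, and bare patterns pass through
}

const root = "/home/dev/project"; // hypothetical settings root
console.log(resolvePattern("//.aws/**", root)); // "/.aws/**"
console.log(resolvePattern("/foo/**", root));   // "/home/dev/project/foo/**"
console.log(resolvePattern("~/.ssh/**", root)); // "~/.ssh/**"
```

The practical consequence is that a single leading slash in a rule does not mean "filesystem root"; a rule author who wants an absolute path has to write the double slash.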

Sandbox Initialization Flow:

// Sandbox initialization is asynchronous, but must complete before the first command runs
async function initialize(sandboxAskCallback?) {
  if (initializationPromise) return initializationPromise
  if (!isSandboxingEnabled()) return

  // Wrap the callback to enforce the allowManagedDomainsOnly policy
  const wrappedCallback = sandboxAskCallback ? async (hostPattern) => {
    if (shouldAllowManagedSandboxDomainsOnly()) {
      logForDebugging(`Blocked: ${hostPattern.host} (allowManagedDomainsOnly)`)
      return false
    }
    return sandboxAskCallback(hostPattern)
  } : undefined

  // Create the Promise synchronously, before any await, to prevent race conditions
  initializationPromise = (async () => {
    // Detect the worktree main repository path (constant for the session, so cached)
    if (worktreeMainRepoPath === undefined) {
      worktreeMainRepoPath = await detectWorktreeMainRepoPath(getCwdState())
    }

    const settings = getSettings_DEPRECATED()
    const runtimeConfig = convertToSandboxRuntimeConfig(settings)
    await BaseSandboxManager.initialize(runtimeConfig, wrappedCallback)

    // Subscribe to settings changes and update the sandbox configuration dynamically
    settingsSubscriptionCleanup = settingsChangeDetector.subscribe(() => {
      const newConfig = convertToSandboxRuntimeConfig(getSettings_DEPRECATED())
      BaseSandboxManager.updateConfig(newConfig)
    })
  })()

  return initializationPromise
}
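The comment about creating the Promise before any await describes a general memoization pattern. The sketch below isolates it under assumed names (`initOnce`, `doInit`, `initCount`); it is not the sandbox code itself.

```typescript
// Generic sketch of the "store the promise before the first await" pattern.
// All names here are hypothetical.
let initPromise: Promise<void> | undefined;
let initCount = 0;

async function doInit(): Promise<void> {
  initCount += 1; // stands in for the expensive one-time setup
  await Promise.resolve();
}

function initOnce(): Promise<void> {
  if (initPromise) return initPromise;
  // Assigned synchronously: a second caller that arrives while doInit() is
  // still pending sees this promise instead of starting a second init.
  initPromise = doInit();
  return initPromise;
}
```

If the guard were followed by an `await` before the assignment, two concurrent callers could both pass the check and run the setup twice, which is the race the original comment warns about.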

7.4 Converting Permission Rules to Sandbox Configuration

Claude Code converts its permission rule syntax into the sandbox-runtime configuration format:

// Permission rule → sandbox path conversion
// Edit(/path/to/dir/*) → sandbox.filesystem.allowWrite
// Read(/patterns)      → sandbox.filesystem.allowRead
// WebFetch(domain:example.com) → sandbox.network.allowedDomains

function convertToSandboxConfig(
  permissionRules: PermissionRule[],
  sandboxSettings: SandboxSettings,
): SandboxRuntimeConfig {
  // Merge permission rules with sandbox settings:
  // allow rules from the permission rules → the sandbox's allow lists
  // sandbox settings map across directly
  // deny rules always include the hardcoded security entries
}
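Since the function body above is only outlined in comments, the following is one possible reading of the three mappings, written as a sketch. The config shape `SketchSandboxConfig`, the function `convertAllowRules`, and the rule-parsing regex are assumptions; the hardcoded deny entries are the ones named in Section 7.2.

```typescript
// Hypothetical shapes: the real PermissionRule and SandboxRuntimeConfig types are not shown.
interface SketchSandboxConfig {
  filesystem: { allowRead: string[]; allowWrite: string[]; denyWrite: string[] };
  network: { allowedDomains: string[] };
}

// Hardcoded denials named in Section 7.2; always present regardless of input.
const HARDCODED_DENY_WRITE = [".claude/settings*.json", ".claude/skills", ".claude/commands"];

function convertAllowRules(allowRules: string[]): SketchSandboxConfig {
  const config: SketchSandboxConfig = {
    filesystem: { allowRead: [], allowWrite: [], denyWrite: [...HARDCODED_DENY_WRITE] },
    network: { allowedDomains: [] },
  };
  for (const rule of allowRules) {
    const m = /^(\w+)\((.+)\)$/.exec(rule);
    if (!m) continue; // bare tool names carry no path or domain
    const tool = m[1];
    const arg = m[2] ?? "";
    if (tool === "Edit") {
      config.filesystem.allowWrite.push(arg);
    } else if (tool === "Read") {
      config.filesystem.allowRead.push(arg);
    } else if (tool === "WebFetch" && arg.startsWith("domain:")) {
      config.network.allowedDomains.push(arg.slice("domain:".length));
    }
  }
  return config;
}
```

Seeding `denyWrite` before any rule is processed reflects the design point that the hardcoded entries do not depend on configuration.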

7.4 Sandbox Enablement Check

function isSandboxingEnabled(): boolean {
  return isSupportedPlatform()           // macOS, Linux, WSL2+
    && checkDependencies().errors.length === 0  // bubblewrap, socat
    && isPlatformInEnabledList()         // sandbox.enabledPlatforms
    && getSandboxEnabledSetting();       // user setting
}

// Dependency check (cached)
function checkDependencies(): { errors: string[], warnings: string[] } {
  // Checks: bubblewrap executable
  // Checks: socat executable
  // Checks: cap_setfcap capability
}

7.5 dangerouslyDisableSandbox

// Input parameters for BashTool
{
  command: "docker build .",
  dangerouslyDisableSandbox: true  // skip the sandbox
}

// Takes effect only when settings allow it:
// settings.sandbox.allowUnsandboxedCommands = true

// Design intent: some commands (such as Docker) are incompatible with the sandbox,
// but skipping it must be requested explicitly and is recorded
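Putting the three checks from the Section 7.1 diagram together, a decision function might look like the sketch below. The settings field names follow the text; the function name, the type names, and the choice to match exclusions on the first word of the command are assumptions (prefix and wildcard exclusions are omitted).

```typescript
// Sketch of the three checks from the 7.1 diagram; function shape is assumed.
interface SandboxSettingsSketch {
  enabled: boolean;
  allowUnsandboxedCommands: boolean;
  excludedCommands: string[];
}

interface BashInputSketch {
  command: string;
  dangerouslyDisableSandbox?: boolean;
}

function shouldUseSandboxSketch(input: BashInputSketch, settings: SandboxSettingsSketch): boolean {
  if (!settings.enabled) return false;
  // The opt-out is honored only when settings explicitly permit it.
  if (input.dangerouslyDisableSandbox && settings.allowUnsandboxedCommands) return false;
  // Assumption: exclusions match the command name (first word) exactly.
  const firstWord = input.command.trim().split(/\s+/)[0] ?? "";
  if (settings.excludedCommands.includes(firstWord)) return false;
  return true;
}
```

Note that with `allowUnsandboxedCommands` set to false, a request carrying `dangerouslyDisableSandbox: true` is still sandboxed, so the flag is a request rather than a switch.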

Design Philosophy:

The sandbox design follows the Defense in Depth principle. Even if the permission model allows an operation, the sandbox still limits that operation's "blast radius." Critical security files (settings, skills) are hardcoded as non-writable, a security guarantee that does not depend on configuration.

Why are settings files hardcoded as non-writable? Imagine this scenario: the Agent discovers that the sandbox is restricting its operations, so it "cleverly" modifies settings.json to disable the sandbox, then proceeds. This constitutes a sandbox escape attack. Claude Code prevents this by hardcoding deny write for .claude/settings*.json, so this rule remains in effect even if all other security layers are bypassed. This embodies the security engineering principle that "untrusted code must not be able to modify trust boundaries."


Chapter 8: Context Engineering - The Art of Information Management

The preceding five chapters addressed "what the Agent does" and "what the Agent cannot do." This chapter shifts perspective to "what the Agent knows."

Imagine you are a new employee on your first day. If nobody tells you the company's coding standards, architectural decisions, and known bugs, the code you write will most likely fail to meet expectations. Agents face the same challenge: their performance is directly determined by the information they can "see."

Context Engineering is the discipline of managing the Agent's "field of vision." It answers three questions:

1. What should the Agent know? (CLAUDE.md, memory system)
2. When should it be told? (On-demand loading, prefetching)
3. What happens when there is too much information? (Four-level compaction pipeline)

Claude Code's design in this area is particularly sophisticated: a 200-line memory index, a four-level compaction pipeline, and a parallel prefetch mechanism together form an unusually refined Agent context management system.

Context Engineering is the first pillar of Harness Engineering. It manages what information enters the model's context window, when, and in what form.

Figure 8-1: The context engineering pipeline. Information flows from multiple sources (CLAUDE.md, memory files, MCP instructions, environmental context) into the model's context window, passing through parallel prefetching, relevance filtering, and token budget allocation. The right side shows the composition of the context window and the four compaction zones.

8.1 Quantitative Analysis: Context Window Budget Allocation

The typical budget allocation for Claude Code's context window (assuming 200K tokens):

| Component | Estimated Token Share | Size | Compacted? |
|---|---|---|---|
| System prompt (base) | ~5-8% | 10-16K | No (cache prefix) |
| Tool definitions (15 core + N MCP) | ~8-15% | 16-30K | No (cache prefix) |
| CLAUDE.md content | ~2-5% | 4-10K | No |
| MCP server instructions | ~1-3% | 2-6K | No |
| Memory attachments | ~1-2% | 2-4K | No (attached on demand) |
| Conversation history | ~60-80% | 120-160K | Yes (four-level pipeline) |
| Reserved space (model output) | ~5-10% | 10-20K | N/A |

Key Insight: Conversation history occupies 60-80% of the context space, which is precisely why the compaction pipeline is so important. System prompt and tool definitions occupy 13-23%, which is also why tool lazy loading (ToolSearch) is valuable: loading 15 out of 43 tools saves approximately 8% of context space, equivalent to an additional 16K tokens of conversation history in long conversations.

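Cache Economics: The system prompt and tool definitions serve as the cache prefix (~26-46K tokens, the sum of the first two rows above). The Anthropic API's prompt cache bills the matched prefix at a reduced cache-read rate (listed at roughly 10% of the base input price) rather than the full input price. At $3/M input tokens, sending the prefix uncached would cost about $0.08-0.14 per call, so each cache hit saves roughly $0.07-0.12. A typical session involves 20-50 API calls, which puts the savings at roughly $1.40-6.20 per session before accounting for the cache-write surcharge and cache expiry. At scale (millions of daily active users), this becomes a significant cost optimization.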

Figure 8-2: Space allocation in the 200K context window. Conversation history occupies 70% (140K tokens) and is the primary target of the compaction pipeline. System prompt + tool definitions total approximately 18.5%, which is the primary beneficiary area of the prompt cache.

Figure 8-3: Token growth curve of the four-level compaction pipeline (over 50 turns). The red dashed line shows that without compaction, the 200K limit is breached at turn 45. The purple line (full pipeline) shows that tokens drop to 45K after Autocompact triggers at turn 15, then grow slowly. This enables Claude Code to handle 100+ turn conversations without interruption.

8.2 CLAUDE.md - Project-Level Persistent Context

CLAUDE.md is Claude Code's core context mechanism. It is a Markdown file that provides project-level persistent context:

# CLAUDE.md

## Project Overview
This is a Next.js 14 application using TypeScript + Tailwind CSS.

## Architecture Constraints
- Components go in src/components/
- API routes go in src/app/api/
- Do not use class components
- All API calls must use fetch, not axios

## Naming Conventions
- Components: PascalCase
- Utility functions: camelCase
- Constants: UPPER_SNAKE_CASE

## Testing
- Run tests: npm test
- Test framework: Jest + React Testing Library
- Coverage requirement: >80%

## Known Issues
- #123: The login page has a layout problem in Safari
- Do not modify files under legacy/

Loading hierarchy:

~/.claude/CLAUDE.md              # Global (all projects)
.claude/CLAUDE.md                # Project-level
CLAUDE.local.md                  # Local override (not committed to git)
subdirectory/CLAUDE.md           # Directory-level (loaded when entering)

8.2 System Prompt Construction Pipeline

// src/QueryEngine.ts: system prompt construction
function buildSystemPrompt(): SystemPrompt {
  const parts = [];

  // 1. Base system prompt (tool descriptions, behavioral guidelines)
  parts.push(getBaseSystemPrompt());

  // 2. Tool definitions
  for (const tool of tools) {
    parts.push(tool.prompt(context));
  }

  // 3. CLAUDE.md content
  parts.push(loadClaudeMd());

  // 4. MCP server instructions
  for (const mcp of mcpClients) {
    parts.push(mcp.instructions);
  }

  // 5. Custom system prompt (user override)
  if (customPrompt) parts.push(customPrompt);

  // 6. Memory mechanics prompt (when the memory system is enabled)
  if (memoryEnabled) parts.push(memoryMechanicsPrompt);

  return asSystemPrompt(parts.join('\n'));
}

8.3 Memory System

Claude Code implements a file-based persistent memory system, located in src/memdir/.

Four Memory Types

type MemoryType =
  | 'user'       // user role, preferences, knowledge level
  | 'feedback'   // guidance on working methods (what to do / what to avoid)
  | 'project'    // ongoing work, goals, deadlines
  | 'reference'; // pointers to external systems

Memory File Format

---
name: user-prefers-terse-responses
description: The user dislikes lengthy summaries and wants concise, direct replies
type: feedback
---

Do not end every reply with a summary of what was just done; the user can read the diff themselves.

**Why:** The user explicitly said they dislike trailing summaries.
**How to apply:** Keep every reply concise, with no trailing summary paragraph.
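A file in this format can be split into metadata and body with very little code. The sketch below handles only flat `key: value` pairs and is not a YAML parser; `parseMemoryFile` and `ParsedMemory` are hypothetical names, distinct from the `parseFrontmatter` helper that Claude Code itself uses.

```typescript
// Minimal parser for flat `key: value` frontmatter; not a YAML implementation.
interface ParsedMemory {
  frontmatter: Record<string, string>;
  body: string;
}

function parseMemoryFile(content: string): ParsedMemory {
  const lines = content.split("\n");
  if ((lines[0] ?? "").trim() !== "---") return { frontmatter: {}, body: content };
  // Find the closing delimiter after the opening one.
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) return { frontmatter: {}, body: content };
  const frontmatter: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon > 0) {
      frontmatter[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
    }
  }
  return { frontmatter, body: lines.slice(end + 1).join("\n").trim() };
}
```

A file without a well-formed frontmatter block is returned with empty metadata instead of throwing, in keeping with the tolerant scanning behavior the chapter describes.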

Memory Index (MEMORY.md)

- [User Role](user_role.md): senior Go engineer, new to React
- [Terse Responses](feedback_terse.md): no trailing summaries
- [Auth Rewrite](project_auth.md): compliance-driven rewrite of the auth middleware
- [Bug Tracker](reference_linear.md): pipeline bugs are tracked in the Linear INGEST project

Memory Scanning and Attachment

// src/memdir/memoryScan.ts
function scanMemories(memoryDir: string): MemoryHeader[] {
  // Scan the ~/.claude/memory/ directory
  // Read the frontmatter of each .md file
  // Sort by modification time (newest first)
  // Cap: MAX_MEMORY_FILES = 200
  return headers;
}

// Integration with the query loop
// 1. Start the memory scan during the streaming response (prefetch)
startRelevantMemoryPrefetch();

// 2. Filter relevant memories and create attachment messages
const attachments = getAttachmentMessages(memories, userMessage);

// 3. Attach to the user message
messages.push(...attachments);

Real Implementation of Memory Scanning

// src/memdir/memoryScan.ts: real code
// Single pass: stat and read are combined (fewer system calls)
// In the common case (N ≤ 200), this halves the system calls compared with stat, sort, then read
export async function scanMemoryFiles(
  memoryDir: string, signal: AbortSignal,
): Promise<MemoryHeader[]> {
  try {
    const entries = await readdir(memoryDir, { recursive: true })
    const mdFiles = entries.filter(
      f => f.endsWith('.md') && basename(f) !== 'MEMORY.md',
    )

    const headerResults = await Promise.allSettled(
      mdFiles.map(async (relativePath): Promise<MemoryHeader> => {
        const filePath = join(memoryDir, relativePath)
        // Read only the first FRONTMATTER_MAX_LINES lines (optimization for large files)
        const { content, mtimeMs } = await readFileInRange(
          filePath, 0, FRONTMATTER_MAX_LINES, undefined, signal,
        )
        const { frontmatter } = parseFrontmatter(content, filePath)
        return {
          filename: relativePath,
          filePath,
          mtimeMs,
          description: frontmatter.description || null,
          type: parseMemoryType(frontmatter.type),
        }
      }),
    )

    return headerResults
      .filter((r): r is PromiseFulfilledResult<MemoryHeader> =>
        r.status === 'fulfilled')
      .map(r => r.value)
      .sort((a, b) => b.mtimeMs - a.mtimeMs)  // newest first
      .slice(0, MAX_MEMORY_FILES)  // cap of 200
  } catch {
    return []  // degrade gracefully when the directory does not exist
  }
}

Memory Manifest Formatting:

// Used for the memory-selection prompt and the extraction Agent prompt
export function formatMemoryManifest(memories: MemoryHeader[]): string {
  return memories.map(m => {
    const tag = m.type ? `[${m.type}] ` : ''
    const ts = new Date(m.mtimeMs).toISOString()
    return m.description
      ? `- ${tag}${m.filename} (${ts}): ${m.description}`
      : `- ${tag}${m.filename} (${ts})`
  }).join('\n')
}
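A short usage example shows what the manifest looks like. The formatter is restated here so the snippet runs standalone; `MemoryHeaderSketch`, `formatManifest`, and the sample entries are illustrative, with only the formatting logic taken from the function above.

```typescript
// Reduced header type for illustration; the real MemoryHeader has more fields.
interface MemoryHeaderSketch {
  filename: string;
  mtimeMs: number;
  description: string | null;
  type?: string;
}

// Same logic as formatMemoryManifest above, restated for a standalone run.
function formatManifest(memories: MemoryHeaderSketch[]): string {
  return memories.map(m => {
    const tag = m.type ? `[${m.type}] ` : "";
    const ts = new Date(m.mtimeMs).toISOString();
    return m.description
      ? `- ${tag}${m.filename} (${ts}): ${m.description}`
      : `- ${tag}${m.filename} (${ts})`;
  }).join("\n");
}

const manifest = formatManifest([
  { filename: "feedback_terse.md", mtimeMs: 0, description: "No trailing summaries", type: "feedback" },
  { filename: "notes.md", mtimeMs: 0, description: null },
]);
console.log(manifest);
// - [feedback] feedback_terse.md (1970-01-01T00:00:00.000Z): No trailing summaries
// - notes.md (1970-01-01T00:00:00.000Z)
```

Each entry costs one line of context, so the model can choose which memory files to open without loading any of their bodies.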

Design Philosophy:

The memory system follows the explicit over implicit principle. Memories are structured Markdown files (not a database) with explicit types and metadata. The MEMORY.md index is capped at 200 entries to prevent memory bloat. Memory scanning runs in parallel with API calls (prefetching), adding no latency.

Promise.allSettled (rather than Promise.all) ensures that a single corrupted memory file does not cause the entire scan to fail. This is defensive programming as applied within the Harness.

8.4 Context Compaction Strategies

(See Section 3.3 for the four-level compaction pipeline)

Key additions:

Full Transcript Save Strategy:
├─ Saves the complete transcript to disk before every autocompact
├─ Path: ~/.claude/history/<session_id>/
├─ Purpose: --resume recovery, auditing, debugging
└─ Does not participate in compaction decisions (backup only)

Budget Tracking Across Compaction:
├─ taskBudgetRemaining is captured before compaction
├─ Accumulated across multiple compaction events
└─ Ensures total spend does not exceed budget

8.5 Dynamic Context

// src/context.ts: environment context collection
function collectContext(): UserContext {
  return {
    currentDate: new Date(),
    platform: process.platform,
    shell: process.env.SHELL,
    osVersion: getOSVersion(),
    modelInfo: getModelInfo(),
    cwd: process.cwd(),
    gitState: getGitState(),        // branch, status, remotes
    terminalSize: getTerminalSize(),
    // ...more environment information
  };
}

Chapter 9: Settings & Configuration - Harness Tunability

So far we have seen the loop, tools, permissions, Hooks, sandbox, and context. All of these components have tunable parameters. But where are those parameters stored? Whose settings take priority? Can enterprise administrators lock down certain settings?

The settings system is the Harness's "control panel." A well-designed control panel must let newcomers work out of the box, allow advanced users to fine-tune precisely, and enable enterprise administrators to enforce policy. Claude Code addresses all three needs with a 7-level hierarchical settings system.

The settings system determines how the Harness's behavior is adjusted and customized.

9.1 settings.json Structure

{
  // ===== Permissions =====
  "permissions": {
    "allow": ["Read", "Glob", "Grep", "Bash(git *)"],
    "deny": ["Bash(sudo *)", "Bash(rm -rf *)"],
    "ask": ["Write(*.env)", "Bash(npm publish)"],
    "defaultMode": "default",
    "additionalDirectories": ["/shared/libs"],
    "disableBypassPermissionsMode": "disable",
    "disableAutoMode": "disable"
  },

  // ===== Hooks =====
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "check-safety.sh",
            "if": "Bash(rm *)"
          }
        ]
      }
    ]
  },

  // ===== Sandbox =====
  "sandbox": {
    "enabled": true,
    "autoAllowBashIfSandboxed": true,
    "allowUnsandboxedCommands": false,
    "fsRead": { "allow": ["**"], "deny": ["/etc/shadow"] },
    "fsWrite": { "allow": ["./**"], "deny": [".env", "*.key"] },
    "network": {
      "allowedDomains": ["*.github.com", "registry.npmjs.org"],
      "deniedDomains": ["*.malware.com"]
    },
    "excludedCommands": ["docker", "podman"],
    "lockedByPolicy": false
  },

  // ===== Other =====
  "model": "claude-sonnet-4-6",
  "env": {
    "NODE_ENV": "development"
  },
  "attribution": "Co-Authored-By: Claude",
  "cleanupPeriodDays": 30,
  "defaultShell": "bash",
  "allowedMcpServers": ["@anthropic/mcp-*"],
  "deniedMcpServers": ["*-untrusted"]
}

9.2 Hierarchical Loading

// src/utils/settings/settings.ts
function loadSettings(): MergedSettings {
  // Merge in order of priority, from lowest to highest
  const layers = [
    loadUserSettings(),          // ~/.claude/settings.json
    loadProjectSettings(),       // .claude/settings.json
    loadLocalSettings(),         // .claude/settings.local.json
    loadPolicySettings(),        // Organization policy
    loadManagedSettings(),       // MDM / enterprise
    loadFlagSettings(),          // Environment variables
    loadCliArgSettings(),        // Command-line arguments
  ];

  return deepMerge(layers);  // Later layers override earlier ones
}

Enterprise Managed Settings:

/managed/managed-settings.json
├─ Base managed settings
├─ Distributed by MDM (Jamf, Intune, etc.)
└─ Can lock sandbox: { "sandbox": { "lockedByPolicy": true } }

/managed/managed-settings.d/
├─ security-policy.json      # Security policy drop-in
├─ compliance-rules.json     # Compliance rules drop-in
└─ (loaded in alphabetical order; later files override earlier ones)

9.3 Schema Validation

// src/schemas/ - Zod validation
import { z } from 'zod';

const SettingsSchema = z.object({
  permissions: PermissionsSchema.optional(),
  hooks: HooksSchema.optional(),
  sandbox: SandboxSchema.optional(),
  model: z.string().optional(),
  env: z.record(z.string()).optional(),
  // ...
}).passthrough();  // Preserve unknown fields (backward compatibility)

.passthrough() is a critical design decision: older settings files may contain fields unrecognized by newer versions. passthrough() preserves these fields without raising errors, ensuring backward compatibility.

9.4 In-Depth Analysis: Settings Merge Algorithm

Settings merging appears simple but involves carefully crafted custom logic:

// src/utils/settings/settings.ts - actual code
// Array merge strategy: concatenate + deduplicate (rather than replace)
function settingsMergeCustomizer(objValue, srcValue) {
  if (Array.isArray(objValue) && Array.isArray(srcValue)) {
    return mergeArrays(objValue, srcValue)  // concat + uniq
  }
  // Non-arrays: scalars are overridden, objects are merged recursively
  // (returning undefined lets lodash's mergeWith() apply its default behavior)
  return undefined
}

Why concatenate arrays rather than replace? Consider permission rules: an enterprise policy defines deny: ["Bash(sudo *)"], and the project settings define deny: ["Bash(rm -rf *)"]. With replacement, whichever layer is merged later would discard the other layer's deny rules. Through concatenation and deduplication, the final deny list contains both rules, which is the correct security behavior.
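The concatenate-and-deduplicate strategy can be sketched in a few lines. This is a minimal illustration that assumes settings are plain JSON objects; `Json`, `Settings`, and `mergeSettings` are illustrative names rather than identifiers from the Claude Code source, and deduplication here only handles primitive array items such as rule strings.

```typescript
// Illustrative JSON types; not taken from the Claude Code source.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type Settings = { [key: string]: Json };

function isPlainObject(v: Json | undefined): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Merge `src` over `base` without mutating either input.
function mergeSettings(base: Settings, src: Settings): Settings {
  const out: Settings = { ...base };
  for (const key of Object.keys(src)) {
    const baseValue = out[key];
    const srcValue = src[key];
    if (Array.isArray(baseValue) && Array.isArray(srcValue)) {
      // Arrays: concatenate, then keep the first occurrence of each item.
      const joined = baseValue.concat(srcValue);
      out[key] = joined.filter((v, i) => joined.indexOf(v) === i);
    } else if (isPlainObject(baseValue) && isPlainObject(srcValue)) {
      // Objects: merge recursively.
      out[key] = mergeSettings(baseValue, srcValue);
    } else {
      // Scalars (and mismatched types): the later layer wins.
      out[key] = srcValue;
    }
  }
  return out;
}

const policy: Settings = { permissions: { deny: ["Bash(sudo *)"] } };
const project: Settings = { permissions: { deny: ["Bash(rm -rf *)", "Bash(sudo *)"] } };
console.log(JSON.stringify(mergeSettings(policy, project)));
// {"permissions":{"deny":["Bash(sudo *)","Bash(rm -rf *)"]}}
```

Both deny rules survive the merge, and the duplicate `Bash(sudo *)` entry appears only once.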

9.5 Managed Settings Drop-in Pattern

// Similar to systemd's drop-in directory pattern:
// managed-settings.json        ← base (lowest priority)
// managed-settings.d/
//   10-otel.json               ← observability team's configuration
//   20-security.json           ← security team's configuration
//   30-compliance.json         ← compliance team's configuration
// Merged in alphabetical order; later files override earlier ones

Why drop-in rather than a single file? In large enterprises, different teams manage different configuration dimensions. The security team is responsible for deny rules, the platform team for MCP whitelists, and the compliance team for data retention policies. The drop-in pattern lets each team independently manage their own configuration fragments without coordinating edits on a single file, consistent with Linux system administration best practices.
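The ordering rule can be illustrated with a small sketch. It assumes fragments are already parsed into objects keyed by file name, sorts names by plain string comparison, and uses a shallow override for brevity; `orderDropIns`, `applyDropIns`, and the `otel` key are hypothetical, and the real loader may sort and merge differently.

```typescript
type Fragment = Record<string, unknown>;

// Keep only *.json names and sort them by plain string comparison.
function orderDropIns(fileNames: string[]): string[] {
  return fileNames.filter((name) => /\.json$/.test(name)).sort();
}

// Apply fragments over the base in sorted order; later fragments win.
// Shallow override is used for brevity; a real loader would deep-merge.
function applyDropIns(base: Fragment, dropIns: Record<string, Fragment>): Fragment {
  let result: Fragment = { ...base };
  for (const name of orderDropIns(Object.keys(dropIns))) {
    result = { ...result, ...dropIns[name] };
  }
  return result;
}

const effective = applyDropIns(
  { cleanupPeriodDays: 30 },
  {
    "30-compliance.json": { cleanupPeriodDays: 7 },
    "10-otel.json": { cleanupPeriodDays: 90, otel: true },
  },
);
console.log(effective); // cleanupPeriodDays is 7: the 30- fragment is applied last
```

Numeric prefixes such as `10-`, `20-`, `30-` make the override order explicit, which is why the convention is common in drop-in directories.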

9.6 Defensive Cache Cloning

// Actual code: return a cloned copy when reading from the cache
// Reason: lodash's mergeWith() mutates its first argument
// If the cached object were returned directly, a caller's merge would pollute the cache
// The next reader would then see modified data, a bug that is extremely hard to debug
const cached = getCachedParsedFile(path)
return cached ? structuredClone(cached) : loadAndCache(path)
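The failure mode is easy to reproduce. The sketch below uses a hypothetical in-memory cache and two hypothetical readers, `readShared` and `readCloned`, to contrast the two behaviors; it assumes a runtime where the global `structuredClone` is available (for example Node.js 17 or later).

```typescript
type Parsed = { permissions: { deny: string[] } };

// Hypothetical in-memory cache keyed by file path.
const cache: Record<string, Parsed> = {
  "/tmp/settings.json": { permissions: { deny: ["Bash(sudo *)"] } },
};

function readShared(path: string): Parsed {
  return cache[path]; // hands out the cached object itself
}

function readCloned(path: string): Parsed {
  return structuredClone(cache[path]); // hands out an independent deep copy
}

// A caller that mutates what it receives, as an in-place merge would.
readCloned("/tmp/settings.json").permissions.deny.push("Bash(rm -rf *)");
console.log(cache["/tmp/settings.json"].permissions.deny.length); // 1: cache untouched

readShared("/tmp/settings.json").permissions.deny.push("Bash(rm -rf *)");
console.log(cache["/tmp/settings.json"].permissions.deny.length); // 2: cache polluted
```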

Design Pattern Analysis: Postel's Law (Be Liberal in What You Accept)

This exemplifies the classic internet engineering principle: "be conservative in what you send, be liberal in what you accept." Settings files are edited by users and may contain typos, outdated fields, or experimental configuration. passthrough() ensures Claude Code does not reject such "imperfect" input: unrecognized fields are kept rather than treated as validation errors. This is particularly important in Harness Engineering because settings files persist across versions, so a file written for one version should still load after the user upgrades Claude Code.


Chapter 10: MCP Integration - Extending the Harness Boundary

Claude Code ships with 43+ built-in tools, but real-world needs are open-ended: one user needs to query a database, another needs to operate Kubernetes, and a third needs to send Slack messages. Anthropic cannot anticipate every need.

MCP (Model Context Protocol) is the solution: a standard protocol that allows anyone to write a "tool server," which Claude Code automatically discovers and uses, vastly extending the Harness's capabilities. This is analogous to the USB protocol: you do not need to redesign the computer for each peripheral; you need only a unified interface.

In this chapter we examine how Claude Code connects to MCP servers via 6 transport protocols and how it integrates external tools into its permission and Hook systems.

10.1 Six Transport Protocols

flowchart LR
    CC[Claude Code] --> stdio["Stdio\nSubprocess stdin/stdout"]
    CC --> sse["SSE\nServer-Sent Events"]
    CC --> http["Streamable HTTP\nHTTP streaming"]
    CC --> ws["WebSocket\nTLS/proxy support"]
    CC --> inproc["InProcess\nIn-memory TS module"]
    CC --> sdk["SdkControl\nSDK daemon"]

    stdio --> local["Local MCP servers\n(filesystem, git)"]
    sse --> remote["Remote MCP servers\n(Slack, Linear)"]
    http --> remote
    ws --> remote
    inproc --> chrome["Chrome/Computer Use\nAvoids 325MB subprocess"]
    sdk --> daemon["SDK Daemon\nControl plane"]

    classDef transport fill:#dbeafe,stroke:#2563eb,color:#1e3a5f
    classDef server fill:#dcfce7,stroke:#16a34a,color:#14532d

    class stdio,sse,http,ws,inproc,sdk transport
    class local,remote,chrome,daemon server

Source Code Annotation Analysis: A key comment in the MCP client implementation reads: "Run Chrome MCP server in-process to avoid spawning ~325MB subprocess." This reveals the true reason InProcess transport exists: certain MCP servers (such as Chrome browser control) incur excessive memory overhead when started as independent processes. Through InProcess transport, they run within the Claude Code process, sharing memory space.

// src/services/mcp/client.ts - six transport types
type MCPTransport =
  | StdioClientTransport       // Subprocess (stdin/stdout)
  | SSEClientTransport         // Server-Sent Events
  | StreamableHTTPTransport    // HTTP streaming
  | WebSocketTransport         // WebSocket (TLS/proxy support)
  | InProcessTransport         // In-memory (TypeScript module)
  | SdkControlTransport;       // SDK daemon control

10.2 MCP Configuration

// ~/.claude/mcp.json (global)
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/dir"],
      "env": {}
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_TOKEN": "ghp_..." }
    }
  }
}

// .claude/mcp.json (project-level, merged with the global mcp.json)
{
  "mcpServers": {
    "database": {
      "command": "python",
      "args": ["mcp_db_server.py"],
      "env": { "DB_URL": "postgresql://..." }
    }
  }
}

10.3 MCP Tool Execution

// MCP tool execution flow (simplified sketch; ToolProgress is an illustrative type name)
async function* callMcpTool(
  server: MCPClient,
  toolName: string,
  input: object,
  retried = false,
): AsyncGenerator<ToolProgress, ToolResult> {
  // 1. Call the MCP server
  const result = await server.callTool(toolName, input);

  // 2. Refresh the OAuth token (if needed) and retry once
  if (result.error?.type === 'auth_error' && !retried) {
    await refreshOAuthToken(server);
    return yield* callMcpTool(server, toolName, input, true);
  }

  // 3. Forward streaming progress
  if (result.isStreaming) {
    for await (const progress of result.stream) {
      yield progress;
    }
  }

  // 4. Persist binary content (large outputs)
  if (isBinaryContent(result)) {
    await persistBinaryContent(result);
  }

  // 5. Truncate and validate the result
  return truncateIfNeeded(result);
}

10.4 In-Depth Analysis: MCP Connection Lifecycle

// src/services/mcp/client.ts - actual connection orchestration
// Core logic of getMcpToolsCommandsAndResources():

// 1. Partition by transport type, with different concurrency levels
//    Local servers (stdio): low concurrency (batch ~3), avoids process-spawn contention
//    Remote servers (sse/http/ws): high concurrency (batch ~20), network connections only

// 2. Three-level filtering
//    Level 1: disabled check (marked as disabled in settings)
//    Level 2: auth cache check (failure cache with a 15-minute TTL)
//    Level 3: actual connection attempt (memoized by name + config hash)

// 3. Streaming result callbacks (does not wait for all servers to finish connecting)
//    onConnectionAttempt() is called as soon as each server finishes connecting
//    The UI can incrementally render servers that are already connected

Race condition guard for auth cache: The source code comments: "Serialize cache writes through a promise chain to prevent concurrent read-modify-write races when multiple servers return 401 in the same batch." This is a classic concurrency problem: if 10 MCP servers return 401 simultaneously, each handler reads the cache, modifies its copy, and writes it back, so unserialized writes would overwrite one another and corrupt the cache file. The solution is to serialize all write operations through a Promise chain, with read operations sharing the same memoized Promise.
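A minimal sketch of the write-serialization half of this idea follows. `SerializedQueue`, `recordAuthFailure`, and the in-memory `cacheFile` are hypothetical stand-ins for the real cache code, and the memoized read path is not shown.

```typescript
// Hypothetical helper: runs async tasks strictly one after another.
class SerializedQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue(task: () => Promise<void>): Promise<void> {
    const run = this.tail.then(task);
    // Swallow rejections on the tail so one failed task does not block later ones.
    this.tail = run.then(() => undefined, () => undefined);
    return run;
  }
}

// Simulated cache file: a read-modify-write with an async gap in the middle.
let cacheFile: string[] = [];
async function recordAuthFailure(server: string): Promise<void> {
  const snapshot = cacheFile.slice();                            // read
  await new Promise<void>((resolve) => setTimeout(resolve, 1));  // async gap
  cacheFile = snapshot.concat(server);                           // write
}

const queue = new SerializedQueue();

// Three servers fail "in the same batch"; each write is queued behind the previous one.
async function demo(): Promise<string[]> {
  await Promise.all(
    ["a", "b", "c"].map((server) => queue.enqueue(() => recordAuthFailure(server))),
  );
  return cacheFile;
}
```

Calling `recordAuthFailure` directly from all three handlers would let each one read the same empty snapshot, so only the last write would survive. Routing the calls through the queue makes every read observe the previous write.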

10.5 MCP Configuration Deduplication Strategy

Configuration source priority:
  Manual config > Plugin auto-discovery > Claude.ai browser connector

Deduplication rules ("same server" determination):
  stdio servers: command array exactly identical
  remote servers: URL exactly identical
  Name conflicts: later overrides earlier

Enterprise policies:
  allowlist + denylist with three matching modes:
    Name match: exact server name
    Command match: stdio server's command array
    URL match: wildcard patterns (https://*/api/* → regex)
  denylist merged from all sources (always in effect)
  allowlist controlled by shouldAllowManagedMcpServersOnly() policy
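The wildcard-to-regex conversion for URL matching could look roughly like this. It is a sketch under two assumptions that the source does not confirm: `*` matches any run of characters (including `/`), and the pattern must match the whole URL. `wildcardToRegExp` is a hypothetical name.

```typescript
// Convert a wildcard pattern into an anchored regular expression.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    // Escape regex metacharacters in the literal parts between wildcards.
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    // Each "*" becomes "match anything".
    .join(".*");
  return new RegExp("^" + escaped + "$");
}

const apiPattern = wildcardToRegExp("https://*/api/*");
console.log(apiPattern.test("https://mcp.example.com/api/tools")); // true
console.log(apiPattern.test("http://mcp.example.com/api/tools"));  // false (scheme differs)
console.log(apiPattern.test("https://mcp.example.com/other"));     // false (no /api/ segment)
```

Escaping the literal parts matters for policy matching: without it, the `.` in a host name would match any character and the rule would be broader than written.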

10.6 MCP Tool Integration with Built-in Tools

// MCP tools and built-in tools share the same tool pool
// assembleToolPool() merges the two

// Deduplication rule: built-in tools take priority
// If an MCP server provides a tool with the same name, the built-in version is kept

// Deny rules apply to MCP tools as well
// For example, in settings.json:
{
  "permissions": {
    "deny": ["mcp__untrusted-server"]  // Blocks an entire MCP server
  },
  },
  "allowedMcpServers": ["@official/*"],
  "deniedMcpServers": ["*-untrusted"]
}
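The two rules above (built-in tools win name collisions, and deny rules also cover MCP tools) can be sketched as follows. This is not the real `assembleToolPool()`: `ToolEntry` and `assembleToolPoolSketch` are hypothetical, and the sketch assumes MCP tool names take the form `mcp__<server>__<tool>`, so that a rule like `mcp__untrusted-server` blocks every tool from that server.

```typescript
interface ToolEntry {
  name: string;
  source: "builtin" | "mcp";
}

function assembleToolPoolSketch(
  builtin: ToolEntry[],
  mcp: ToolEntry[],
  deny: string[],
): ToolEntry[] {
  const pool: ToolEntry[] = [];
  // Built-ins are visited first, so they win any name collision.
  for (const tool of builtin.concat(mcp)) {
    const duplicate = pool.some((t) => t.name === tool.name);
    // A rule denies an exact tool name or a whole "mcp__<server>" prefix.
    const denied = deny.some(
      (rule) => tool.name === rule || tool.name.indexOf(rule + "__") === 0,
    );
    if (!duplicate && !denied) pool.push(tool);
  }
  return pool;
}

const pool = assembleToolPoolSketch(
  [{ name: "Read", source: "builtin" }],
  [
    { name: "Read", source: "mcp" },
    { name: "mcp__untrusted-server__exec", source: "mcp" },
    { name: "mcp__github__search", source: "mcp" },
  ],
  ["mcp__untrusted-server"],
);
console.log(pool.map((t) => t.source + ":" + t.name));
// [ 'builtin:Read', 'mcp:mcp__github__search' ]
```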

10.7 MCP Skills Discovery

// Optional feature: MCP_SKILLS
// Discovers and registers Skills from MCP servers
// skills/mcpSkills.ts

// MCP servers can expose Skills (not just Tools)
// Skills are a higher-level workflow abstraction
// Registered via the skills builder pattern

Chapter 11: Sub-Agent System - Multi-Agent Orchestration

Up to this point, our Harness has only one Agent. But for complex tasks, a single Agent is often insufficient: it may need to search different parts of the codebase simultaneously, or have a dedicated "reviewer" check the code it wrote.

This is analogous to a company: the CEO cannot do everything personally and needs to delegate tasks to team members. But delegation is not simply "go do this." It requires clear permission scopes (which resources can you access), information isolation (do not let noise from one subtask pollute another), and result reporting (give me only the summary, not the raw data).

Claude Code's sub-Agent system follows this approach. Each sub-Agent has its own message history, tool set, permission mode, and token budget. It runs fully isolated and returns only a summary upon completion.

Figure 11-1: Multi-Agent orchestration architecture. The parent Agent spawns isolated sub-Agents (Explore/Plan/General/Custom/Fork) via AgentTool, each with independent message history, token budget, and permission mode. The bottom shows the Coordinator/Swarm system and Worktree isolation.

11.1 Agent Tool

Agent Tool is Claude Code's sub-Agent spawning mechanism, located in src/tools/AgentTool/.

// Agent definition structure
interface AgentDefinition {
  agentType: string;           // e.g. "Explore", "Plan", "general-purpose"
  description: string;         // Purpose description
  whenToUse: string;           // When to use it
  tools: string[] | '*';       // Available tools ('*' = all)
  maxTurns?: number;           // Maximum number of turns
  model?: string | 'inherit';  // Model selection
  permissionMode?: PermissionMode;  // Permission mode
  getSystemPrompt(): string;   // System prompt generation
}

11.2 Agent Types

general-purpose (General-Purpose Agent)
├─ Tools: all (*)
├─ Use case: complex multi-step tasks
└─ Model: inherits from parent

Explore (Exploration Agent)
├─ Tools: read-only (Read, Glob, Grep, WebFetch, ...)
├─ Use case: codebase exploration, search
├─ Cannot: edit, write, run commands
└─ Three depth levels: quick, medium, very thorough

Plan (Planning Agent)
├─ Tools: read-only + Plan file writing
├─ Use case: designing implementation plans
└─ Cannot: execute actual code changes

custom (Custom Agent)
├─ Definition: ~/.claude/agents/<name>.md
├─ Frontmatter: tools, model, maxTurns
└─ System prompt: Markdown body

Fork (Implicit Fork Agent)
├─ Experimental feature
├─ Automatically forks from parent context
└─ Inherits parent's tools and permissions

11.3 Sub-Agent Spawning Flow

sequenceDiagram
    participant P as Parent Agent
    participant AT as AgentTool
    participant MCP as MCP Servers
    participant C as Sub-Agent
    participant T as Sub-Agent Tool Set

    P->>AT: Agent(type:"Explore", prompt:"...")
    AT->>AT: Load Agent definition
    AT->>MCP: Check MCP servers are ready
    alt MCP not ready
        loop Poll every 500ms (up to 30s)
            AT->>MCP: Connection status?
        end
    end
    AT->>C: Create isolated context
    Note over C: messages = []<br/>Independent token budget<br/>Independent compaction pipeline
    AT->>T: Assemble tool set (from definition or inherited)
    C->>C: Independent queryLoop()
    loop Agent execution
        C->>T: Tool call
        T-->>C: Tool result
    end
    C-->>AT: Return summary
    AT-->>P: Summary (without the sub-Agent's full history)

Source Code Annotation Analysis: A key comment in the AgentTool source code reads: "Fork children keep the Agent tool in their pool for cache-identical tool defs, so reject fork attempts at call time." This means Fork sub-Agents retain the Agent tool in their tool pool (to maintain cache-identical tool definitions with the parent) but reject recursive Fork attempts at runtime. The primary checking mechanism is querySource (unaffected by compaction), with message scanning as a fallback. This prevents the runaway scenario of "Agents infinitely spawning Agents."

Parent Agent request: Agent(type: "Explore", prompt: "...")
    │
    v
AgentTool.call()
    │
    ├─ 1. Load the Agent definition (built-in or custom)
    ├─ 2. Build an isolated message array (messages = [])
    ├─ 3. Select the tool set (from the definition or inherited)
    ├─ 4. Set the permission mode (bubble / default / inherited)
    ├─ 5. Start an independent query loop
    │     └─ The sub-Agent has its own:
    │        ├─ Message history
    │        ├─ Tool context
    │        ├─ Compaction pipeline
    │        └─ Token budget
    ├─ 6. Collect output
    └─ 7. Return a summary to the parent Agent

Design Philosophy:

The core design of sub-Agents is context isolation. Each sub-Agent begins with a blank message list and returns only a summary upon completion. This prevents:

- Subtask noise from polluting the parent context
- Token budgets from being exhausted by exploratory queries
- Permission leakage (the sub-Agent's tool set can be more restricted)

11.4 Custom Agents

<!-- ~/.claude/agents/code-reviewer.md -->
---
name: code-reviewer
description: Specialized agent for code review
tools: [Read, Grep, Glob]
model: claude-sonnet-4-6
maxTurns: 50
---

You are a code reviewer. Your job is to:
1. Read the changed files
2. Check for bugs, security issues, and style violations
3. Provide actionable feedback

You are READ-ONLY. You cannot modify any files.

Focus on:
- Security vulnerabilities (injection, XSS, etc.)
- Performance issues (N+1 queries, memory leaks)
- Code quality (naming, SRP, test coverage)

11.5 Coordinator / Swarm System

Coordinator System

src/coordinator/
├─ Multi-Agent orchestration
├─ Team creation/deletion
├─ Task assignment
└─ State synchronization

utils/swarm/
├─ Coordination logic
├─ Teammate tools
└─ Communication protocol

Tools:
├─ TeamCreateTool:  Create team Agents
├─ TeamDeleteTool:  Delete team Agents
├─ SendMessageTool: Inter-Agent messaging
└─ TaskStopTool:    Stop tasks

Services:
├─ teamMemorySync/: Multi-Agent memory sync
├─ AgentSummary/:   Agent state summaries
└─ swarm/:          Swarm permission polling

11.6 Quantitative Analysis: Sub-Agent Isolation Overhead and Benefits

| Metric | Without Sub-Agents (Single Agent) | With Sub-Agents (Isolated) |
|---|---|---|
| Context pollution risk | High (all exploration noise remains in history) | Low (sub-Agent history discarded) |
| Token consumption | O(exploration tokens + work tokens) | O(summary tokens + work tokens) |
| Exploring 10 files scenario | ~50K tokens (all retained in context) | ~2K tokens (only summary returned) |
| Token savings rate | n/a | ~96% (in the above example) |
| Startup overhead | 0 | ~5-10K tokens (system prompt duplication) |
| MCP readiness wait | 0 | 0-30s (initial connection) |

Source Code Annotation: Regarding MCP readiness waiting, the source code comments: "Avoids a race condition where the agent is invoked before MCP servers finish connecting. Early exit if any required server has already failed; no point waiting if the check will fail anyway." The 30-second timeout and 500ms polling interval are empirical values balancing wait cost and connection reliability.
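The readiness check reduces to a small decision function evaluated on every poll. The sketch below models only that decision and a simulated clock; the type names, `decide`, and `simulate` are hypothetical, while the 500ms interval and 30s limit come from the text above.

```typescript
type ServerStatus = "pending" | "connected" | "failed";
type WaitDecision = "ready" | "failed" | "keep-polling" | "timed-out";

const POLL_INTERVAL_MS = 500; // polling interval, from the text
const MAX_WAIT_MS = 30000;    // overall timeout, from the text

// Decide what to do given the status of every required server.
function decide(required: ServerStatus[], elapsedMs: number): WaitDecision {
  if (required.some((s) => s === "failed")) return "failed";    // early exit
  if (required.every((s) => s === "connected")) return "ready";
  return elapsedMs >= MAX_WAIT_MS ? "timed-out" : "keep-polling";
}

// Simulated polling loop: advances a virtual clock instead of sleeping.
function simulate(
  statusAt: (elapsedMs: number) => ServerStatus[],
): { decision: WaitDecision; elapsedMs: number } {
  let elapsedMs = 0;
  let decision = decide(statusAt(elapsedMs), elapsedMs);
  while (decision === "keep-polling") {
    elapsedMs += POLL_INTERVAL_MS;
    decision = decide(statusAt(elapsedMs), elapsedMs);
  }
  return { decision, elapsedMs };
}

// A server that connects after 1.2s is first seen as ready on the 1.5s poll.
console.log(simulate((t) => [t >= 1200 ? "connected" : "pending"]));
// { decision: 'ready', elapsedMs: 1500 }
```

A real implementation would sleep for `POLL_INTERVAL_MS` between checks. Checking for failures first is what implements the "early exit" described in the comment.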

Regarding Fork Agent cache optimization, the comments note: "Fork path: child inherits the PARENT's system prompt (not FORK_AGENT's) for CACHE-IDENTICAL API request prefixes." This means Fork sub-Agents reuse the parent's prompt cache, and the first API call does not need to rebuild the cache, saving approximately 30-46K tokens of cache creation cost.

Figure 11-2: Token savings from sub-Agent context isolation. In the "explore 10 files" scenario, operating without sub-Agents requires 50K tokens (all retained in context), while using sub-Agents requires only 2K tokens (summary only), a 96% saving. These savings compound significantly over long conversations.
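The 96% figure follows directly from the two numbers in the scenario. As a quick check (`savingsRate` is an illustrative helper, and the calculation covers only tokens retained in the parent context, not the sub-Agent's own startup overhead):

```typescript
// Fraction of parent-context tokens saved by returning only a summary.
function savingsRate(withoutSubAgent: number, withSubAgent: number): number {
  return (withoutSubAgent - withSubAgent) / withoutSubAgent;
}

const rate = savingsRate(50000, 2000); // (50000 - 2000) / 50000
console.log(Math.round(rate * 100) + "%"); // 96%
```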

11.7 Task System

// src/Task.ts - task types
type TaskType =
  | 'local_bash'          // Bash command execution
  | 'local_agent'         // Local Agent
  | 'remote_agent'        // Remote Agent (CCR)
  | 'in_process_teammate' // In-process teammate (assistant mode)
  | 'local_workflow'      // Workflow script
  | 'monitor_mcp'         // MCP monitoring
  | 'dream';              // Dream mode task

interface TaskState {
  id: string;             // Prefix + 8 random characters (base36)
  type: TaskType;
  status: 'pending' | 'running' | 'completed' | 'failed' | 'killed';
  description: string;
  toolUseId?: string;
  startTime: number;
  endTime?: number;
  totalPausedMs: number;
  outputFile: string;     // Output file on disk
  outputOffset: number;
  notified: boolean;
}
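The id format noted in the comment (a prefix followed by 8 random base36 characters) could be generated along these lines. This is a sketch only: `makeTaskId` and the example prefix are hypothetical, and the source does not state which random source the real code uses.

```typescript
const BASE36 = "0123456789abcdefghijklmnopqrstuvwxyz";

// Build "<prefix>" + 8 random base36 characters.
// Math.random() is used for simplicity; it is not cryptographically secure.
function makeTaskId(prefix: string): string {
  let suffix = "";
  for (let i = 0; i < 8; i++) {
    suffix += BASE36.charAt(Math.floor(Math.random() * BASE36.length));
  }
  return prefix + suffix;
}

console.log(makeTaskId("b")); // prints "b" followed by 8 random base36 characters
```

Eight base36 characters give 36^8 (about 2.8 trillion) possible suffixes, which makes collisions unlikely within a single session.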

11.8 Worktree Isolation

Git Worktree Isolation Mode:
├─ Each sub-Agent works in an independent git worktree
├─ Prevents file conflicts (multiple Agents editing the same file simultaneously)
├─ Tools: EnterWorktreeTool / ExitWorktreeTool
├─ Upon completion:
│   ├─ If changes exist: retain worktree, return path and branch
│   └─ If no changes: automatically clean up worktree
└─ Sandbox integration: worktree path added to sandbox allowWrite

Chapter 12: Skills & Plugins - The Extension Ecosystem

Tools are atomic operations (read a file, run a command), while Skills are workflows ("do a code review for me," "deploy to staging"). If Tools are LEGO bricks, Skills are pre-built models: you can use them as-is or disassemble and recombine them.

Claude Code's Skills system is what transforms it from a "coding tool" into a "workflow platform." A single .md file can define a new workflow, with no TypeScript and no recompilation required.

12.1 Skills System

Skills are reusable workflow definitions, analogous to "advanced macros."

Built-in Skills

// src/skills/bundled/index.ts
function initBundledSkills(): void {
  registerUpdateConfigSkill();   // /update-config
  registerKeybindingsSkill();    // /keybindings-help
  registerDebugSkill();          // /debug
  registerSimplifySkill();       // /simplify
  registerBatchSkill();          // /batch

  // Feature-gated Skills
  if (feature('AGENT_TRIGGERS_REMOTE')) {
    registerScheduleSkill();     // /schedule
  }
  if (feature('AGENT_TRIGGERS')) {
    registerLoopSkill();         // /loop
  }
  if (feature('BUILDING_CLAUDE_APPS')) {
    registerClaudeApiSkill();    // /claude-api
  }
}

Custom Skills

<!-- ~/.claude/skills/my-deploy.md -->
---
name: deploy
description: Deploy the application to staging
args: environment
---

# Deploy Skill

When invoked with /deploy <environment>:

1. Run the test suite: `npm test`
2. Build the application: `npm run build`
3. Deploy to the specified environment:
   - staging: `aws deploy --env staging`
   - production: `aws deploy --env production` (requires confirmation)
4. Verify deployment health check
5. Report results

Skill Loading

// src/skills/loadSkillsDir.ts
function loadSkillsDir(dir: string): SkillDefinition[] {
  // 1. Scan the ~/.claude/skills/ directory
  // 2. Read each .md file
  // 3. Parse the frontmatter (name, description, args)
  // 4. Create a SkillDefinition
  // 5. Register it as an available command
}
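Step 3, frontmatter parsing, is the only non-obvious part of this pipeline. The sketch below handles just the flat `key: value` form used in the example skill above; `ParsedSkill` and `parseSkillFile` are hypothetical names, and a real loader would more likely use a full YAML parser.

```typescript
interface ParsedSkill {
  meta: Record<string, string>;
  body: string;
}

// Split a skill file into its frontmatter fields and its Markdown body.
function parseSkillFile(text: string): ParsedSkill {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(text);
  if (!match) return { meta: {}, body: text }; // no frontmatter block
  const meta: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const colon = line.indexOf(":");
    // Split on the first colon only, so values may themselves contain colons.
    if (colon > 0) meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { meta, body: match[2].trim() };
}

const skill = parseSkillFile(
  "---\nname: deploy\ndescription: Deploy the application to staging\nargs: environment\n---\n\n# Deploy Skill\n",
);
console.log(skill.meta.name, "|", skill.body); // deploy | # Deploy Skill
```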

12.2 In-Depth Analysis: Skill Loading Pipeline

// src/skills/loadSkillsDir.ts - Skill frontmatter parsing
// 25+ frontmatter fields:

// Identity: name, description, version, when_to_use
// Execution: model, disable-model-invocation, user-invocable, context('fork')
// Tools: allowed-tools, disallowed-tools
// Arguments: arguments (array or comma-separated), argument-hint
// Hooks: validated via HooksSchema()
// Paths: parseSkillPaths() (same format as CLAUDE.md rules)

// Two directory formats:
// New format: /skills/skill-name/SKILL.md (one directory per skill)
// Legacy format: /commands/namespace/file.md (flat markdown)
// Legacy-format namespacing: /commands/auth/login/SKILL.md → "auth:login"

// Security restrictions:
// MCP skills (remote/untrusted) may not execute inline Shell commands
// Local skills can execute Shell via the `!command` syntax
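The namespace rule in the legacy format (directory segments joined with `:`) can be sketched as a pure function. `commandName` is a hypothetical helper, the input is assumed to be a path relative to the commands root, and the handling of flat `file.md` names is an assumption beyond the single example given in the comment.

```typescript
// Derive a namespaced command name from a path relative to the commands root.
function commandName(relativePath: string): string {
  const parts = relativePath.split("/").filter((p) => p.length > 0);
  if (parts.length === 0) return "";
  const last = parts[parts.length - 1];
  if (last === "SKILL.md") {
    parts.pop(); // the directory name is the command name
  } else {
    parts[parts.length - 1] = last.replace(/\.md$/, ""); // flat file: drop the extension
  }
  return parts.join(":");
}

console.log(commandName("auth/login/SKILL.md")); // auth:login
console.log(commandName("auth/login.md"));       // auth:login
```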

Skill's context: 'fork' mode: When a skill specifies context: 'fork', it executes in a forked sub-Agent with its own independent message history and context window. This means the Skill's execution does not pollute the main conversation's context: the skill returns only result text upon completion. This is Context Engineering as applied within the Skill system.

12.3 Built-in Skills Registry

Always loaded (11):
  updateConfig, keybindings, verify, debug, loremIpsum,
  skillify, remember, simplify, batch, stuck

Feature-gated (7):
  dream          ← KAIROS/KAIROS_DREAM (background memory consolidation)
  hunter         ← REVIEW_ARTIFACT
  loop           ← AGENT_TRIGGERS (loop execution)
  schedule       ← AGENT_TRIGGERS_REMOTE (remote scheduling)
  claudeApi      ← BUILDING_CLAUDE_APPS
  claudeInChrome ← auto-detected
  skillGenerator ← RUN_SKILL_GENERATOR

12.4 Plugin System

src/plugins/          - Plugin system core
src/services/plugins/ - Plugin loading, version management, marketplace
src/utils/plugins/    - Plugin utilities, caching

Plugin features:
├─ Version management (SemVer)
├─ Cache system (reduces redundant loading)
├─ Marketplace integration
├─ Plugin-level Hook injection
├─ Independent data directory (${CLAUDE_PLUGIN_DATA})
└─ Configuration isolation (${user_config.X})

Chapter 13: Building Your Own Harness - A Practical Guide

The preceding 12 chapters have dissected Claude Code as a "reference implementation." Now it is time to get hands-on.

Do not attempt to build a 500,000-line Harness in one go; that is the result of dozens of Anthropic engineers working over several years. A good Harness grows organically; it is not designed top-down. Start with the minimum viable configuration, adding constraints and capabilities incrementally as you encounter real problems.

The three levels below are not "pick one"; they form a progressive path. Spend 1 hour building a Level 1 and use it for a few weeks. Then, based on real pain points, upgrade to Level 2 and use it for several months. Only consider Level 3 when your organization requires it.

Figure 13-1: Three-tier Harness maturity radar chart. Level 1 (green) has basic capabilities in Context Management and Permission Control but lacks Multi-Agent, MCP, and enterprise MDM. Level 3 (purple) approaches full marks across all 8 dimensions. Note that Level 2 (yellow) is the "sweet spot" for most teams: moderate investment with broad coverage.

13.1 Level 1: Personal Harness (1-2 Hours)

Step 1: Create CLAUDE.md

# CLAUDE.md

## Project Architecture
- src/: source code
- tests/: test files
- docs/: documentation

## Development Conventions
- Language: TypeScript strict mode
- Testing: vitest
- Formatting: prettier
- Commits: conventional commits

## Important Constraints
- Do not modify the migrations/ directory (deployed migrations are immutable)
- All API endpoints must have authentication middleware
- Do not use the any type

Step 2: Configure Basic Permissions

// .claude/settings.json
{
  "permissions": {
    "allow": [
      "Read",
      "Glob",
      "Grep",
      "Bash(npm test *)",
      "Bash(npm run *)",
      "Bash(git status)",
      "Bash(git diff *)",
      "Bash(git log *)"
    ],
    "deny": [
      "Bash(sudo *)",
      "Bash(rm -rf *)",
      "Bash(git push --force *)",
      "Bash(git reset --hard *)"
    ]
  }
}

Step 3: Add Basic Hooks

// .claude/settings.json (hooks section)
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write",
        "hooks": [
          {
            "type": "command",
            "command": "echo 'Writing file: '$(echo $HOOK_INPUT | jq -r '.tool_input.file_path')",
            "if": "Write(*.ts)"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "npm run lint -- --quiet 2>/dev/null || true",
            "if": "Bash(npm *)",
            "async": true
          }
        ]
      }
    ]
  }
}

13.2 Level 2: Team Harness (1-2 Days)

Step 1: Shared Settings

// .claude/settings.json (committed to git)
{
  "permissions": {
    "allow": [
      "Bash(npm test *)",
      "Bash(npm run lint *)",
      "Bash(git *)"
    ],
    "deny": [
      "Bash(sudo *)",
      "Bash(rm -rf /)",
      "Bash(npm publish *)",
      "Write(*.env*)",
      "Write(*.key)",
      "Write(*.pem)"
    ],
    "defaultMode": "default"
  },
  "sandbox": {
    "enabled": true,
    "fsWrite": {
      "deny": [".env", ".env.*", "*.key", "*.pem", "secrets/"]
    },
    "network": {
      "allowedDomains": [
        "*.github.com",
        "registry.npmjs.org",
        "api.anthropic.com"
      ]
    }
  }
}

Step 2: Define Team Agents

<!-- .claude/agents/architect.md -->
---
name: architect
description: Reviews architecture decisions and suggests improvements
tools: [Read, Grep, Glob]
model: claude-opus-4-6
maxTurns: 30
---

You are an architecture reviewer. Analyze the codebase and provide:
1. Dependency analysis
2. Coupling/cohesion assessment
3. SOLID principle compliance
4. Suggestions for improvement

Focus on high-level design, not line-by-line code review.

<!-- .claude/agents/test-writer.md -->
---
name: test-writer
description: Writes comprehensive tests for existing code
tools: [Read, Write, Edit, Bash, Glob, Grep]
model: claude-sonnet-4-6
maxTurns: 100
---

You are a test engineer. For any given code:
1. Analyze the code and identify test cases
2. Write comprehensive tests (unit + integration)
3. Run the tests to verify they pass
4. Ensure >80% coverage for touched files

Step 3: Configure MCP Servers

// .claude/mcp.json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_TOKEN": "${GITHUB_TOKEN}" }
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres"],
      "env": { "DATABASE_URL": "${DATABASE_URL}" }
    }
  }
}

Step 4: Create Team Skills

<!-- .claude/skills/review-pr.md -->
---
name: review-pr
description: Comprehensive PR review workflow
args: pr_number
---

# PR Review Workflow

1. Fetch the PR diff: `gh pr diff $ARGUMENTS`
2. Read all changed files
3. For each file:
   - Check for security issues
   - Check for performance concerns
   - Check for test coverage
   - Check for style consistency
4. Generate a structured review comment
5. Post the review: `gh pr review $ARGUMENTS --comment --body "..."`

13.3 Level 3: Organization-Level Harness (1-2 Weeks)

Step 1: Enterprise MDM Settings

// /managed/managed-settings.json
{
  "permissions": {
    "deny": [
      "Bash(sudo *)",
      "Bash(curl * | bash)",
      "Bash(wget * | bash)",
      "Write(*.env*)",
      "Write(*.pem)",
      "Write(*.key)"
    ],
    "disableBypassPermissionsMode": "disable"
  },
  "sandbox": {
    "enabled": true,
    "lockedByPolicy": true,
    "network": {
      "deniedDomains": ["*.malware.com", "*.phishing.net"]
    }
  },
  "deniedMcpServers": ["*-untrusted", "*-experimental"]
}

Step 2: Audit Hooks

// /managed/managed-settings.d/audit.json
{
  "hooks": {
    "PostToolUse": [
      {
        "hooks": [
          {
            "type": "http",
            "url": "https://audit.company.com/api/agent-actions",
            "headers": {
              "Authorization": "Bearer $AUDIT_API_KEY",
              "Content-Type": "application/json"
            },
            "allowedEnvVars": ["AUDIT_API_KEY"],
            "async": true
          }
        ]
      }
    ],
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "echo '{\"session_started\": true}' | curl -s -X POST -d @- https://audit.company.com/api/sessions",
            "async": true
          }
        ]
      }
    ]
  }
}

Step 3: Implement the Plan-Work-Review Cycle

Reference the 13 guard rules from the claude-code-harness project:

R01: Block sudo commands
R02: Forbid writing to .git/, .env, SSH keys
R03: Forbid Shell writing to protected files
R04: Writing outside project root requires confirmation
R05: rm -rf requires confirmation
R06: Forbid git push --force
R07-R09: Mode-specific guards (work/codex/breezing)
R10: Forbid --no-verify, --no-gpg-sign
R11: Forbid git reset --hard main/master
R12: Warn on direct push to main/master
R13: Warn on editing protected files

Implemented as a PreToolUse Hook:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "node guard-rules.js",
            "timeout": 5
          }
        ]
      }
    ]
  }
}
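
The guard script itself can be sketched in Python. This is a minimal sketch, not the claude-code-harness implementation: it assumes the hook receives one JSON object on stdin carrying `tool_name` and `tool_input`, and that exit code 2 blocks the call with stderr passed back to the model. The rule subset (R01, R06, R10) and the names `DENY_RULES`, `evaluate`, and `run_hook` are illustrative.

```python
import json
import re
import sys

# Hypothetical subset of the guard rules; the patterns are illustrative.
DENY_RULES = [
    ("R01", re.compile(r"(^|[;&|]\s*)sudo\s"), "sudo commands are not allowed"),
    ("R06", re.compile(r"git\s+push\b.*--force"), "force push is not allowed"),
    ("R10", re.compile(r"--no-verify|--no-gpg-sign"), "skipping safety hooks is not allowed"),
]


def evaluate(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message): 0 lets the call proceed, 2 blocks it."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    for rule_id, pattern, reason in DENY_RULES:
        if pattern.search(command):
            return 2, f"{rule_id}: {reason}"
    return 0, ""


def run_hook(stream) -> int:
    """Assumed contract: read one JSON payload, report the reason on stderr."""
    code, message = evaluate(json.load(stream))
    if message:
        print(message, file=sys.stderr)
    return code
```

A script entry point would end with `sys.exit(run_hook(sys.stdin))`. Keeping `evaluate` free of I/O makes the rules unit-testable without spawning the hook.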

Chapter 14: Advanced Patterns and Design Philosophy

This is the final content chapter and arguably the most important. The preceding chapters covered "how Claude Code does things." This chapter addresses "why it does things this way."

Tools and frameworks become obsolete, but design philosophies do not. If you retain only one chapter from this book, it should be this one: these principles apply to any Agent Harness, not just Claude Code.

14.1 Ten Design Philosophies of Claude Code

1. Async Generator Streaming Architecture

Not:     function query() → Promise<FinalResult>
Instead: async function* query() → AsyncGenerator<StreamEvent>

Why: The client can start rendering while the model is still thinking
     The user can interrupt at any time
     Progress is visible to the user
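
The same idea can be sketched with a Python async generator. This is a minimal sketch under stated assumptions: `StreamEvent`, its `kind` values, and the `render` consumer are hypothetical, and `asyncio.sleep(0)` stands in for waiting on the model.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class StreamEvent:
    kind: str  # "text" or "done"; hypothetical event kinds
    data: str


async def query(prompt: str) -> AsyncIterator[StreamEvent]:
    """Yield events as they are produced instead of returning one final result."""
    for word in prompt.split():
        await asyncio.sleep(0)  # stand-in for waiting on the model
        yield StreamEvent("text", word)
    yield StreamEvent("done", "")


async def render(prompt: str, max_events: int) -> list[str]:
    """Consume events incrementally; stopping early models a user interrupt."""
    seen: list[str] = []
    stream = query(prompt)
    async for event in stream:
        if event.kind == "done" or len(seen) >= max_events:
            break
        seen.append(event.data)
    await stream.aclose()  # release the generator after an early exit
    return seen
```

Calling `asyncio.run(render("read the file now", 2))` stops after two events, which is the property a Promise-returning `query()` cannot offer: the consumer decides when to stop, not the producer.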

2. State Machine via Continue Sites

Not:     explicit state enum + switch/case
Instead: while(true) + 7 continue sites

Why: Recovery paths are natural (continue = retry)
     State transitions are local (only the State object needs updating)
     New recovery paths can be added without refactoring
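
A reduced version of the pattern, with two continue sites rather than seven, might look like the following. It is a sketch only: `State`, the two exception types, and the scripted `call_model` are hypothetical stand-ins, not Claude Code's actual recovery paths.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    messages: list[str] = field(default_factory=list)
    retries: int = 0
    compacted: bool = False


class ContextTooLong(Exception):
    pass


class Overloaded(Exception):
    pass


def call_model(state: State, script: list) -> str:
    """Hypothetical model call: pops the next scripted outcome."""
    outcome = script.pop(0)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


def query_loop(state: State, script: list, max_retries: int = 3) -> str:
    while True:
        try:
            reply = call_model(state, script)
        except ContextTooLong:
            # Continue site 1: shrink the history, then retry the same turn.
            state.messages = state.messages[-2:]
            state.compacted = True
            continue
        except Overloaded:
            # Continue site 2: bounded retry.
            state.retries += 1
            if state.retries > max_retries:
                raise
            continue
        state.messages.append(reply)
        return reply
```

Each recovery path only mutates `State` and jumps back to the top of the loop, so adding a third path means adding one more `except` branch rather than a new enum value and transition table.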

3. Compile-Time Feature Gating

Not:     if (config.feature_x) { ... }
Instead: if (feature('FEATURE_X')) { ... }  // evaluated at compile time by bun:bundle

Why: External builds contain no internal feature code (not even the strings)
     No runtime branching overhead
     Code size is minimized

4. Cache Prefix Stability

Not:     sorting built-in tools and MCP tools together
Instead: sorted built-in tools form a stable prefix; sorted MCP tools are appended

Why: The Anthropic API's prompt cache is based on prefix matching
     When MCP tools change, the built-in prefix stays the same → the cache is not invalidated
     API cost drops substantially
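
The effect on the cacheable prefix can be checked with a small sketch. The tool names are hypothetical and deliberately lowercase so that mixed sorting visibly interleaves them; `assemble_tool_pool` and `shared_prefix_len` are illustrative helpers, not functions from the Claude Code source.

```python
def assemble_tool_pool(builtin: list[str], mcp: list[str]) -> list[str]:
    """Sorted built-ins form the stable prefix; sorted MCP tools are appended."""
    return sorted(builtin) + sorted(mcp)


def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading entries two tool lists have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


BUILTIN = ["read", "bash", "edit", "grep"]  # hypothetical tool names

# Stable-prefix ordering, before and after an MCP server adds a tool.
before = assemble_tool_pool(BUILTIN, ["github_search"])
after = assemble_tool_pool(BUILTIN, ["github_search", "asana_list"])

# Mixed sorting, for contrast: the new tool can land at the front of the list.
mixed_before = sorted(BUILTIN + ["github_search"])
mixed_after = sorted(BUILTIN + ["github_search", "asana_list"])
```

With the stable-prefix ordering, all four built-in entries survive the change as a common prefix; with mixed sorting, `asana_list` sorts first and no leading entry is shared.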

5. Defense in Depth

Layer 1: CLAUDE.md (advisory constraints)
Layer 2: Permission Rules (declarative constraints)
Layer 3: Hooks (programmable constraints)
Layer 4: YOLO Classifier (AI constraints)
Layer 5: Sandbox (system-level constraints)
Layer 6: Hardcoded Denials (non-overridable constraints)

Why: Any single layer can be bypassed
     Stacking layers makes the probability of a full bypass fall exponentially
     The innermost layer (hardcoded) cannot be disabled through configuration
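
The layering and the exponential claim can both be made concrete in a short sketch. The two layer functions and their checks are hypothetical, and the probability helper assumes the layers fail independently, which is an idealization: correlated failures would make the real figure worse.

```python
from math import prod
from typing import Callable, Optional

# A layer returns a denial reason, or None to let the command through.
Layer = Callable[[str], Optional[str]]


def permission_rules(cmd: str) -> Optional[str]:
    return "permission rule: sudo is denied" if cmd.startswith("sudo ") else None


def sandbox(cmd: str) -> Optional[str]:
    return "sandbox: write outside the workspace" if "> /etc/" in cmd else None


def run_layers(cmd: str, layers: list[Layer]) -> Optional[str]:
    """Return the first denial reason, or None if every layer passes."""
    for layer in layers:
        reason = layer(cmd)
        if reason is not None:
            return reason
    return None


def bypass_probability(miss_rates: list[float]) -> float:
    """Chance that every layer misses, assuming layers fail independently."""
    return prod(miss_rates)
```

Under that independence assumption, six layers that each miss 10% of bad actions let through roughly one in a million (0.1 to the sixth power), whereas a single such layer lets through one in ten.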

6. Data-Driven Extensibility

Not:     modifying source code to add new features
Instead: settings.json + agents/*.md + skills/*.md + hooks

Why: Non-engineers can customize the Harness too
     Customization is decoupled from the core code
     Customizations survive upgrades

7. Context as a Scarce Resource

Design: lazy tool loading, on-demand memory attachment, four-level compaction pipeline
        ToolSearch discovery mechanism, Microcompact aging strategy

Why: The context window is finite (even at 1M tokens)
     Irrelevant information degrades model performance
     Cost is proportional to token usage
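
One way to picture an aging strategy is a pass that stubs out old tool results while leaving the newest ones intact. The sketch below is an illustration of that general idea only; the message shape, the `"tool"` role, the `keep_recent` parameter, and the stub text are all assumptions and do not describe how Microcompact is actually implemented.

```python
STUB = "[tool result cleared to save context]"


def microcompact(messages: list[dict], keep_recent: int = 2) -> list[dict]:
    """Return a copy in which all but the newest tool results are stubbed out."""
    tool_positions = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    cutoff = max(len(tool_positions) - keep_recent, 0)
    stale = set(tool_positions[:cutoff])
    return [
        {**m, "content": STUB} if i in stale else m
        for i, m in enumerate(messages)
    ]


history = [
    {"role": "user", "content": "find the bug"},
    {"role": "tool", "content": "x" * 5000},
    {"role": "tool", "content": "y" * 5000},
    {"role": "tool", "content": "z" * 40},
]
compacted = microcompact(history, keep_recent=2)
```

The oldest tool result shrinks from 5,000 characters to a short stub, while the conversation structure (who said what, in which order) is preserved and the original list is left unmodified.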

8. Hierarchical Configuration Overrides

7 settings levels: CLI > Flag > Policy > Managed > Local > Project > User

Why: Different levels carry different degrees of trust
     Enterprise administrators can enforce policy
     Project maintainers can set sensible defaults
     Users can make personal adjustments
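
A precedence chain like this reduces to merging dictionaries from lowest to highest priority. The sketch below assumes a shallow merge and hypothetical setting keys; a real settings system may merge some fields differently (for example, concatenating permission rule lists), which this sketch does not attempt.

```python
from typing import Optional

# Lowest to highest precedence, mirroring the order listed above.
PRECEDENCE = ["user", "project", "local", "managed", "policy", "flag", "cli"]


def resolve(layers: dict[str, dict]) -> dict:
    """Merge settings so that higher-precedence levels overwrite lower ones."""
    merged: dict = {}
    for level in PRECEDENCE:
        merged.update(layers.get(level, {}))
    return merged


def winning_level(layers: dict[str, dict], key: str) -> Optional[str]:
    """Report which level supplied the effective value for a key."""
    for level in reversed(PRECEDENCE):
        if key in layers.get(level, {}):
            return level
    return None


layers = {
    "user": {"model": "sonnet", "theme": "dark"},   # hypothetical keys
    "project": {"model": "opus"},
    "policy": {"sandbox": True},
}
effective = resolve(layers)
```

Here the project-level `model` overrides the user's choice, the user's `theme` survives because nothing above it sets one, and `winning_level` gives an audit trail for why a value is in effect.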

9. Isolated Sub-Agent Context

Design: A sub-Agent starts from an empty message list
        It returns only a summary when it finishes
        The parent context is not polluted

Why: Exploratory tasks can generate a large amount of noise
     A sub-Agent's failure should not affect the parent
     Token budgets are isolated
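
The isolation boundary can be sketched as a function that owns its own message list and hands back a single string. Everything here is hypothetical scaffolding: the `Model` callable stands in for an LLM call, the `SUMMARY:` prefix is an invented completion convention, and `delegate` is an illustrative name.

```python
from typing import Callable

Model = Callable[[list[dict]], str]  # hypothetical stand-in for an LLM call


def run_subagent(task: str, model: Model, max_turns: int = 5) -> str:
    """Run a task on a fresh history and return only the final summary."""
    messages = [{"role": "user", "content": task}]  # nothing inherited from the parent
    for _ in range(max_turns):
        reply = model(messages)
        messages.append({"role": "assistant", "content": reply})
        if reply.startswith("SUMMARY:"):
            return reply[len("SUMMARY:"):].strip()
        messages.append({"role": "user", "content": "continue"})
    return "sub-agent stopped: turn limit reached"


def delegate(parent_messages: list[dict], task: str, model: Model) -> None:
    """Append only the sub-agent's summary to the parent conversation."""
    summary = run_subagent(task, model)
    parent_messages.append({"role": "user", "content": f"[sub-agent result] {summary}"})


def noisy_model(messages: list[dict]) -> str:
    """Scripted model: two noisy turns, then a summary."""
    if len(messages) >= 5:
        return "SUMMARY: found 3 call sites"
    return "grep output: hundreds of lines..."


parent = [{"role": "user", "content": "refactor the parser"}]
delegate(parent, "find every call site of parse()", noisy_model)
```

The parent history grows by exactly one message regardless of how many noisy turns the sub-agent needed, and a turn limit keeps a stuck sub-agent from consuming the parent's budget.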

10. Reversibility-First

Design: File edits go through Edit (string replacement), not Write (overwrite)
        Every tool call can be undone
        Automatic snapshots (file history)

Why: Agents make mistakes
     Users need an easy way to roll back
     "Act first, review later" is more efficient than "review first, act later"

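A string-replacement edit with snapshot-based undo can be sketched in a few lines. The `FileHistory` class and its methods are hypothetical, and the "must occur exactly once" check is an assumption of this sketch rather than a statement about how the Edit tool validates its input.

```python
import tempfile
from pathlib import Path


class FileHistory:
    """Snapshot a file before each edit so the edit can be rolled back."""

    def __init__(self) -> None:
        self._snapshots: dict[Path, list[str]] = {}

    def edit(self, path: Path, old: str, new: str) -> None:
        text = path.read_text()
        if text.count(old) != 1:
            raise ValueError("old string must occur exactly once")
        self._snapshots.setdefault(path, []).append(text)  # snapshot first
        path.write_text(text.replace(old, new))

    def undo(self, path: Path) -> bool:
        stack = self._snapshots.get(path)
        if not stack:
            return False
        path.write_text(stack.pop())
        return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "app.py"
    target.write_text("x = 1\n")
    history = FileHistory()
    history.edit(target, "x = 1", "x = 2")
    after_edit = target.read_text()
    history.undo(target)
    after_undo = target.read_text()
    nothing_left = history.undo(target)
```

Because an edit names the exact text it replaces, a stale or ambiguous edit fails loudly instead of silently overwriting the file, and the snapshot stack makes every successful edit reversible.
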
14.2 Entropy Management Patterns

┌───────────────────────────────────────────────┐
│         Entropy Management Patterns           │
│                                               │
│  1. Periodic Cleanup Agent                    │
│     /loop 1h "Check for dead code, unused     │
│               imports, expired TODOs,         │
│               inconsistent naming"            │
│                                               │
│  2. Documentation Validation Agent            │
│     SessionEnd Hook → Check whether CLAUDE.md │
│     is consistent with actual code state      │
│                                               │
│  3. Dependency Audit Agent                    │
│     /schedule "Every Monday 0:00" →           │
│     "Check for outdated dependencies,         │
│      security vulnerabilities, license        │
│      compliance"                              │
│                                               │
│  4. Test Coverage Guard                       │
│     PostToolUse Hook (Write *.ts) →           │
│     "Run coverage check; warn if decreased"   │
│                                               │
└───────────────────────────────────────────────┘

14.3 Bridge System: IDE Integration

src/bridge/
├── bridgeMain.ts                - Main loop
├── bridgeMessaging.ts           - Messaging protocol
├── bridgePermissionCallbacks.ts - Permission handling
├── replBridge.ts                - REPL bridge
├── jwtUtils.ts                  - JWT authentication
└── sessionRunner.ts             - Session execution

Supports:
├── VS Code extension
├── JetBrains extension
├── Bidirectional communication (IDE ↔ Claude Code)
├── Permission synchronization
└── Session state synchronization

14.4 Plan Mode V2

5-Phase Workflow:

Phase 1: Interview
├─ Gather requirements
├─ Understand context
└─ Launch up to 3 Explore Agents (in parallel)

Phase 2: Design
├─ Launch Plan Agent
├─ Generate implementation plan
└─ May launch multiple to explore different directions

Phase 3: Review
├─ Read key files to verify the plan
├─ Ensure alignment with user intent
└─ Clarify via AskUserQuestion

Phase 4: Final Plan
├─ Write to ~/.claude/plans/<name>.md
├─ Include context, steps, file paths
└─ Verification section describes how to test

Phase 5: Exit Plan Mode
├─ Invoke ExitPlanMode
├─ Wait for user approval
└─ Begin execution upon approval

14.5 Remote Triggers and Cron Scheduling

// Feature gates: AGENT_TRIGGERS + AGENT_TRIGGERS_REMOTE

// ScheduleCronTool: creates a scheduled Agent
{
  schedule: "0 9 * * MON",   // every Monday at 09:00
  prompt: "Review open PRs and post summary to Slack",
  model: "claude-sonnet-4-6",
}

// RemoteTriggerTool: remote triggering
{
  trigger: "deploy-check",
  prompt: "Verify deployment health",
}

14.6 Voice Integration

src/voice/                     - Voice input
src/services/voice.ts          - Voice/STT integration
src/services/voiceStreamSTT.ts - Streaming speech-to-text

Feature gate: VOICE_MODE
Supports:
├── Voice input transcription
├── Streaming processing (real-time recognition)
└── Integration into the main query loop

14.7 The "Rippable Harness" Principle

Core idea: The Harness should simplify as model capabilities improve

Examples:
├── Today: Need 13 guard rules to prevent dangerous operations
│   └── Future, better models may not need half of them
│
├── Today: Four-level compaction pipeline for limited context
│   └── Future, larger context windows may require only one level
│
├── Today: YOLO classifier needs two-stage checking
│   └── Future, primary models may have better built-in safety awareness
│
└── Design implications:
    ├── Each Harness component should be independently disableable
    ├── Settings overrides preferred over hardcoding
    ├── Simplify rather than stack as models improve
    └── "The best Harness is the one you eventually don't need"

Appendix C: Complete ToolUseContext Type Definition

ToolUseContext is the "request context" that threads through the entire execution pipeline; understanding it means understanding the Harness's information flow:

// src/Tool.ts: actual code (abridged, with annotations)
export type ToolUseContext = {
  // ===== Configuration (read-only) =====
  options: {
    commands: Command[]           // Available slash commands
    debug: boolean                // Debug mode
    mainLoopModel: string         // Model used by the main loop
    tools: Tools                  // Available tools
    verbose: boolean              // Verbose output
    thinkingConfig: ThinkingConfig// Thinking-mode configuration
    mcpClients: MCPServerConnection[]  // MCP server connections
    mcpResources: Record<string, ServerResource[]>  // MCP resources
    isNonInteractiveSession: boolean   // Non-interactive session
    agentDefinitions: AgentDefinitionsResult  // Agent definitions
    maxBudgetUsd?: number         // Maximum budget (USD)
    customSystemPrompt?: string   // Custom system prompt
    appendSystemPrompt?: string   // Appended system prompt
    querySource?: QuerySource     // Query source identifier
    refreshTools?: () => Tools    // Refreshes the tool list
  }

  // ===== Control =====
  abortController: AbortController  // Cancellation signal
  messages: Message[]               // Current message history

  // ===== State (read/write) =====
  readFileState: FileStateCache     // File-state cache
  getAppState(): AppState           // Get application state
  setAppState(f: (prev) => AppState): void  // Update application state

  // ===== Callbacks =====
  setToolJSX?: SetToolJSXFn         // Set the tool UI
  addNotification?: (n: Notification) => void  // Add a notification
  sendOSNotification?: (opts) => void          // Send an OS notification
  setInProgressToolUseIDs: (f) => void         // Track in-progress tools
  setResponseLength: (f) => void               // Track response length
  updateFileHistoryState: (updater) => void    // Update file history
  updateAttributionState: (updater) => void    // Update attribution state

  // ===== Agent-related =====
  agentId?: AgentId                 // Agent identifier
  agentType?: string                // Agent type
  requireCanUseTool?: boolean       // Whether a permission check is required

  // ===== Dynamic Skill/Memory =====
  nestedMemoryAttachmentTriggers?: Set<string>  // Nested memory triggers
  loadedNestedMemoryPaths?: Set<string>         // Memory paths already loaded
  dynamicSkillDirTriggers?: Set<string>         // Dynamic Skill directory triggers
  discoveredSkillNames?: Set<string>            // Skill names already discovered

  // ===== Permissions =====
  localDenialTracking?: DenialTrackingState     // Local denial tracking
  toolDecisions?: Map<string, {...}>            // Tool decision records
  contentReplacementState?: ContentReplacementState  // Content replacement state

  // ===== Advanced =====
  requestPrompt?: (sourceName, summary?) =>     // Hook prompt request
    (request: PromptRequest) => Promise<PromptResponse>
  handleElicitation?: (serverName, params, signal) =>  // MCP elicitation
    Promise<ElicitResult>
  onCompactProgress?: (event) => void           // Compaction progress callback
  criticalSystemReminder_EXPERIMENTAL?: string  // Critical system reminder
}

Design Analysis: ToolUseContext is a large context object, analogous to a Request object in web frameworks. It carries everything needed for tool execution: configuration, state, callbacks, and permissions. Although the 50+ fields may appear complex, this design avoids three problems:
  1. Global state (each query has its own context)
  2. Parameter explosion (no need to pass arguments individually)
  3. Tight coupling (tools use only the fields they need)


Appendix D: 13 Guard Rules Reference Model

The claude-code-harness project defines 13 declarative guard rules, implemented as PreToolUse Hooks:

// From claude-code-harness/core/src/guardrails/

// R01: Block sudo; privilege escalation is never allowed
{ rule: 'R01', test: (input) => /^sudo\s/.test(input.command),
  action: 'deny', reason: 'sudo commands are not allowed' }

// R02: Forbid writing to sensitive paths
{ rule: 'R02', test: (input) => SENSITIVE_PATHS.some(p => input.file_path?.startsWith(p)),
  action: 'deny', paths: ['.git/', '.env', '~/.ssh/'] }

// R03: Forbid Shell writes to protected files
{ rule: 'R03', test: (input) => isShellWriteToProtectedFile(input),
  action: 'deny' }

// R04: Writing outside the project root requires confirmation
{ rule: 'R04', test: (input) => !isWithinProjectRoot(input.file_path),
  action: 'ask', reason: 'Writing outside project root' }

// R05: rm -rf requires confirmation
{ rule: 'R05', test: (input) => /rm\s+(-rf|-r\s+-f|-f\s+-r)/.test(input.command),
  action: 'ask', reason: 'Recursive delete detected' }

// R06: Forbid force push
{ rule: 'R06', test: (input) => /git\s+push\s+.*--force/.test(input.command),
  action: 'deny', reason: 'Force push is not allowed' }

// R07-R09: Mode-specific guards
{ rule: 'R07', mode: 'work', ... }   // Work mode restrictions
{ rule: 'R08', mode: 'codex', ... }  // Codex integration restrictions
{ rule: 'R09', mode: 'breezing', ... } // Breezing mode restrictions

// R10: Forbid skipping safety hooks
{ rule: 'R10', test: (input) => /--no-verify|--no-gpg-sign/.test(input.command),
  action: 'deny', reason: 'Skipping safety hooks is not allowed' }

// R11: Forbid hard reset to main
{ rule: 'R11', test: (input) => /git\s+reset\s+--hard\s+(main|master)/.test(input.command),
  action: 'deny' }

// R12: Warn on direct push to main
{ rule: 'R12', test: (input) => isDirectPushToMain(input),
  action: 'warn' }

// R13: Warn on editing protected files
{ rule: 'R13', test: (input) => PROTECTED_FILES.includes(input.file_path),
  action: 'warn', files: ['package-lock.json', 'yarn.lock', 'Cargo.lock'] }

These rules can be fully implemented using Claude Code's native Hook system:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{
          "type": "command",
          "command": "node guard-rules.js",
          "if": "Bash(sudo *)"
        }]
      }
    ]
  }
}
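
When several rules match one command, the script has to collapse them into a single decision. The sketch below ports a few of the rules to Python and assumes a severity order of deny > ask > warn; that ordering, the `decide` helper, and the R12 pattern (a stand-in for `isDirectPushToMain`) are assumptions of this sketch, not behavior confirmed from the claude-code-harness source.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    rule_id: str
    pattern: str
    action: str  # "deny", "ask", or "warn"


RULES = [
    Rule("R01", r"^sudo\s", "deny"),
    Rule("R05", r"rm\s+(-rf|-r\s+-f|-f\s+-r)", "ask"),
    Rule("R06", r"git\s+push\s+.*--force", "deny"),
    Rule("R10", r"--no-verify|--no-gpg-sign", "deny"),
    # Hypothetical stand-in for isDirectPushToMain.
    Rule("R12", r"git\s+push\s+\S+\s+(main|master)\b", "warn"),
]

# Assumed precedence: the most severe matching action wins.
SEVERITY = {"allow": 0, "warn": 1, "ask": 2, "deny": 3}


def decide(command: str) -> tuple[str, list[str]]:
    """Return the winning action and the ids of every rule that matched."""
    matched = [r for r in RULES if re.search(r.pattern, command)]
    if not matched:
        return "allow", []
    action = max((r.action for r in matched), key=SEVERITY.__getitem__)
    return action, [r.rule_id for r in matched]
```

Returning every matched rule id alongside the final action keeps the audit trail intact: a force push to main is denied by R06, but the log still records that R12 fired as well.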

Chapter 15: Building a Mini Harness from Scratch (Hands-on Lab)

The preceding 14 chapters dissected Claude Code's 512K lines of source code. Now it is time to build: we will construct a 200-line Mini Harness in Python from scratch, implementing Claude Code's core design patterns.

15.1 Goal: A Minimal Harness Running in 30 Minutes

#!/usr/bin/env python3
"""mini_harness.py: a 200-line Claude Code-style Agent Harness"""
import anthropic, json, os, subprocess, re
from pathlib import Path

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"

# ====================================================
# Layer 1: Tool System (corresponds to Claude Code's src/Tool.ts)
# ====================================================
TOOLS = [
    {
        "name": "bash",
        "description": "Run a shell command. Returns stdout and exit code.",
        "input_schema": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "The command to run"}
            },
            "required": ["command"]
        }
    },
    {
        "name": "read_file",
        "description": "Read a file and return its contents.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File path to read"}
            },
            "required": ["path"]
        }
    },
    {
        "name": "write_file",
        "description": "Write content to a file.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File path to write"},
                "content": {"type": "string", "description": "Content to write"}
            },
            "required": ["path", "content"]
        }
    }
]

# ====================================================
# Layer 2: Permission Rules (corresponds to Claude Code's permissions.ts)
# ====================================================
DENY_PATTERNS = [
    r"^sudo\s",           # R01: block sudo
    r"rm\s+-rf\s+/",      # R05: block rm -rf on absolute paths (including /)
    r"git\s+push\s+.*--force", # R06: block force push, wherever the flag appears
]

ALLOW_TOOLS = {"read_file"}  # read-only tools are always allowed

def check_permission(tool_name: str, tool_input: dict) -> tuple[bool, str]:
    """Layer 2: permission check (simplified hasPermissionsToUseTool)"""
    if tool_name in ALLOW_TOOLS:
        return True, "allowed by rule"

    if tool_name == "bash":
        cmd = tool_input.get("command", "")
        for pattern in DENY_PATTERNS:
            if re.search(pattern, cmd):
                return False, f"denied: matches {pattern}"

    # Default: ask the user (corresponds to Claude Code's 'ask' mode)
    answer = input(f"  Allow {tool_name}({json.dumps(tool_input)[:80]})? [y/n] ")
    return answer.strip().lower() == 'y', "user decision"

# ====================================================
# Layer 3: Tool Execution (corresponds to Claude Code's toolExecution.ts)
# ====================================================
def execute_tool(tool_name: str, tool_input: dict) -> str:
    """Layer 3: execute the tool"""
    if tool_name == "bash":
        result = subprocess.run(
            tool_input["command"], shell=True,
            capture_output=True, text=True, timeout=30
        )
        return f"exit_code={result.returncode}\n{result.stdout}{result.stderr}"
    elif tool_name == "read_file":
        return Path(tool_input["path"]).read_text()
    elif tool_name == "write_file":
        Path(tool_input["path"]).write_text(tool_input["content"])
        return f"Written to {tool_input['path']}"
    return "Unknown tool"

# ====================================================
# Layer 4: Context Engineering (corresponds to Claude Code's CLAUDE.md)
# ====================================================
def load_context() -> str:
    """Layer 4: load the CLAUDE.md context"""
    claude_md = Path("CLAUDE.md")
    if claude_md.exists():
        return f"\n<project_context>\n{claude_md.read_text()}\n</project_context>\n"
    return ""

SYSTEM_PROMPT = f"""You are a coding assistant. Use tools to help the user.
Be concise. Don't explain what you're about to do; just do it.
{load_context()}"""

# ====================================================
# Layer 5: Agent Loop (corresponds to Claude Code's query.ts)
# ====================================================
def agent_loop():
    """Layer 5: the core Agent loop"""
    messages = []
    print("Mini Harness (type 'exit' to quit)")

    while True:
        user_input = input("\n> ")
        if user_input.lower() == 'exit':
            break

        messages.append({"role": "user", "content": user_input})

        # Inner loop: tool-call chain (corresponds to queryLoop's while(true))
        while True:
            response = client.messages.create(
                model=MODEL, max_tokens=4096,
                system=SYSTEM_PROMPT,
                messages=messages, tools=TOOLS,
            )

            # Record the assistant message
            messages.append({"role": "assistant", "content": response.content})

            # No tool calls: print the text and leave the inner loop
            if response.stop_reason != "tool_use":
                for block in response.content:
                    if hasattr(block, 'text'):
                        print(f"\n{block.text}")
                break

            # Execute the tool calls
            tool_results = []
            for block in response.content:
                if block.type != "tool_use":
                    continue

                # Permission check (Layer 2)
                allowed, reason = check_permission(block.name, block.input)
                if not allowed:
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": f"Permission denied: {reason}",
                        "is_error": True,
                    })
                    print(f"  ✗ {block.name} denied: {reason}")
                    continue

                # Execute the tool (Layer 3)
                try:
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
                    print(f"  ✓ {block.name} completed")
                except Exception as e:
                    # Report the failure to the model and to the user
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": f"Error: {e}",
                        "is_error": True,
                    })
                    print(f"  ✗ {block.name} failed: {e}")

            messages.append({"role": "user", "content": tool_results})

if __name__ == "__main__":
    agent_loop()

15.2 Running Your Mini Harness

# 1. Set the API key
export ANTHROPIC_API_KEY="sk-ant-..."

# 2. Create a CLAUDE.md
echo "# Project Rules
- Use TypeScript for all new files
- Run tests after making changes: npm test" > CLAUDE.md

# 3. Run
python mini_harness.py

# 4. Test the permission system
> run sudo rm -rf /
  ✗ bash denied: matches ^sudo\s

> read the file package.json
  ✓ read_file completed  (auto-allowed, no confirmation needed)

> run npm test
  Allow bash({"command": "npm test"})? [y/n] y
  ✓ bash completed

15.3 Comparison with Claude Code's Design Patterns

Mini Harness (200 lines) | Claude Code (512K lines) | Where the Gap Lies
3 tools | 43+ tools | Tool count
Regex permission matching | 7-level hierarchy + AI classifier | Permission precision
User y/n confirmation | 5 modes + Hook system | Flexibility
No compaction | Four-level compaction pipeline | Long conversation support
No sub-Agents | 5 Agent types | Complex task decomposition
Static CLAUDE.md loading | Memory system + dynamic injection | Context management
No retry | 7 continue sites | Robustness

Exercises: Try adding the following features to the Mini Harness (approximately 20 lines each):
  1. Simple Hook system: Run shell scripts before and after tool execution
  2. Token counter: Track token consumption per API call and cumulative cost
  3. Session save/restore: Serialize messages to a JSON file with --resume support
  4. Sub-Agent: Add an agent tool that forks a new agent_loop() with independent messages
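
As a starting point for the token counter exercise, a small accumulator is enough. This is a sketch, not a reference solution: it assumes each API response exposes `usage.input_tokens` and `usage.output_tokens`, and the per-million-token prices are placeholders supplied by the caller, not actual pricing.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class TokenCounter:
    input_price_per_mtok: float   # USD per million input tokens (placeholder)
    output_price_per_mtok: float  # USD per million output tokens (placeholder)
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def record(self, usage) -> None:
        """Add one response's usage; expects input_tokens and output_tokens."""
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens
        self.calls += 1

    @property
    def cost_usd(self) -> float:
        return (
            self.input_tokens * self.input_price_per_mtok
            + self.output_tokens * self.output_price_per_mtok
        ) / 1_000_000


# Simulated usage objects standing in for response.usage.
counter = TokenCounter(input_price_per_mtok=3.0, output_price_per_mtok=15.0)
counter.record(SimpleNamespace(input_tokens=1000, output_tokens=500))
counter.record(SimpleNamespace(input_tokens=1000, output_tokens=500))
```

In the Mini Harness, a call to `counter.record(response.usage)` right after `client.messages.create(...)` would be the natural hook point, with the running total printed when the inner loop exits.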


Chapter 16: Competitive Analysis

Claude Code is not the only AI coding Agent. Understanding competitor Harness designs helps distinguish which patterns are universal and which are unique innovations of Claude Code.

16.1 Harness Architecture Comparison of Three Major Agents

Dimension | Claude Code | Cursor | GitHub Copilot
Runtime | Terminal CLI | VS Code fork | VS Code extension
Interaction Mode | Autonomous Agent | Collaborative editor | Reactive autocomplete + Agent Mode
Agent Loop | while(true) + 7 continue sites | Not public | Not public
Tool System | 43+ built-in + MCP extension | Built-in editing + terminal | Built-in editing + terminal
Permission Model | 5 modes + 7-level rules + AI classifier | Editor-level sandbox | GitHub permissions
Hook System | 26 events x 4 types | Not public | Not public
Context Management | CLAUDE.md + memory + four-level compaction | .cursorrules + codebase index | .github/copilot-instructions.md
Multi-Agent | 5 Agent types + Swarm orchestration | 8 parallel Agents (worktree) | Single Agent
MCP Support | 6 transport protocols | MCP support | Limited
Open Source Visibility | Source analyzable (512K LOC) | Closed source | Closed source
Evaluation Integration | SWE-bench + Headless Profiler | Not public | Not public
Market Share (2026) | 41% | ~15% | 38%

Data sources: dev.to/raxxostudios, faros.ai, tech-insider.org

16.2 Key Design Difference Analysis

Unique Innovations of Claude Code:
  1. Compile-time feature gating (bun:bundle dead code elimination): no publicly documented equivalent among the competitors compared here
  2. Six-layer defense-in-depth security model: the deepest security layering of the three
  3. Two-stage YOLO classifier: the only one of the three known to use an independent AI to review permissions
  4. Reversible tool design (Edit uses string replacement, not file overwrite): reduces the risk of destructive changes
  5. Prompt cache stability sorting: tool ordering optimizes cache hit rates

Unique Innovations of Cursor:
  1. Codebase indexing (global semantic understanding), whereas Claude Code relies on Grep/Glob
  2. 8 parallel Agents + worktree isolation: a more aggressive parallelism strategy
  3. Editor-native integration: superior diff preview and inline editing experience

Universal Patterns (shared by all competitors):
  1. Project-level configuration files (CLAUDE.md / .cursorrules / copilot-instructions.md)
  2. Tool call loops (ReAct pattern)
  3. Permission confirmation mechanisms
  4. MCP protocol support (gradually converging)

16.3 Architectural Insights from the OpenDev Paper

arXiv:2603.05344v1 "Building AI Coding Agents for the Terminal" proposes several important architectural patterns:

  1. Scaffolding vs. Harness Separation:
     - Scaffolding = assembly before the first prompt (system prompt compilation, tool schema generation)
     - Harness = everything after assembly (ReAct loop, tool dispatch, secure execution)
     - Claude Code's main.tsx (parallel prefetch) corresponds to Scaffolding; query.ts (queryLoop) corresponds to Harness

  2. Dual-Mode via Subagent Rather Than State Machine:
     - The paper finds that state machine patterns (Plan Mode <-> Execute Mode switching) tend to get stuck
     - Recommended approach: Plan Mode as a tool-restricted sub-Agent rather than a state switch
     - Claude Code's Plan Mode follows this design: EnterPlanModeTool changes the available tool set

  3. Defense-in-Depth: Five Layers vs. Claude Code's Six:
     - The paper proposes five layers (Prompt -> Schema -> Runtime Approval -> Tool Validation -> Lifecycle Hooks)
     - Claude Code adds a sixth: hardcoded denials (non-configurable, unbypassable)

  4. Context Pressure as the Central Design Constraint:
     - The paper argues that token budget drives all downstream architectural decisions
     - Claude Code's four-level compaction pipeline, tool lazy loading, and memory prefetching all serve this end

Conclusion: From Reader to Builder

Congratulations on reading this far. If you have carefully digested the content of the preceding chapters, you now possess more Harness Engineering knowledge than most AI engineers.

Three sentences to summarize the entire book:

  1. The Agent is not the product; the Harness is. Models will be replaced, but your carefully designed permission rules, Hook pipelines, and sandbox configurations are your moat.

  2. Constraints are power, not limitations. Every precise permission rule you add enables you to confidently let the Agent do more. The six layers of defense in depth are what allow Claude Code to run dangerous commands like rm and git push in production, because every layer stands guard.

  3. Build incrementally; do not over-engineer. Spend 1 hour building a Level 1 Harness with CLAUDE.md and basic permissions. Add Hooks when you hit real pain points. Configure MCP and custom Agents when team collaboration demands it. The best Harness is the one that is "just enough."

One final thought exercise: Claude Code comprises 512,664 lines of code, yet its Agent Loop core is nothing more than a single while(true) and a State object. The remaining 99.9% of the code all answers the same question: "How do we make this loop run reliably in the real world?" That is the entire meaning of Harness Engineering.

Now go build your own Harness.


Appendix A: Claude Code Source File Index

Core Engine

File | Size | Purpose
src/main.tsx | 803 KB | Entry point, CLI bootstrap, parallel prefetch
src/query.ts | 68 KB | Core Agent loop (queryLoop)
src/QueryEngine.ts | 46 KB | LLM query engine, system prompt construction
src/Tool.ts | 29 KB | Tool base interface, 60+ methods
src/tools.ts | 25 KB | Tool registry, pool assembly, cache stability
src/Task.ts | 3.2 KB | Task type definitions
src/commands.ts | 25 KB | Command registration, conditional imports

Permissions and Security

File | Purpose
src/utils/permissions/permissions.ts | Main permission logic, decision pipeline
src/types/permissions.ts | Permission type definitions (13 KB)
src/utils/permissions/permissionsLoader.ts | Load rules from settings
src/utils/permissions/denialTracking.ts | Denial tracking
src/utils/permissions/yoloClassifier.ts | Auto mode classifier
src/utils/sandbox/sandbox-adapter.ts | Sandbox adapter (985 lines)
src/tools/BashTool/shouldUseSandbox.ts | Sandbox decision logic

Hooks

File | Purpose
src/utils/hooks.ts | Hook execution engine
src/utils/hooks/hooksConfigManager.ts | Hook configuration management
src/utils/hooks/hooksSettings.ts | Hook settings loading
src/schemas/hooks.ts | Zod Schema definitions
src/types/hooks.ts | Hook type definitions
src/services/tools/toolHooks.ts | Tool-specific Hook execution

Settings and Configuration

File | Purpose
src/utils/settings/settings.ts | Core settings management
src/utils/settings/types.ts | Settings Schema definitions (600+ lines)
src/utils/settings/settingsCache.ts | In-memory cache
src/utils/settings/permissionValidation.ts | Permission rule validation

MCP

File | Purpose
src/services/mcp/client.ts | MCP client, 6 transport types
src/services/mcp/config.ts | MCP configuration loading
src/skills/mcpSkills.ts | MCP Skills discovery

Agents and Tasks

File | Purpose
src/tools/AgentTool/ | Sub-Agent spawning
src/tools/AgentTool/forkSubagent.ts | Implicit forking
src/tools/AgentTool/agentMemory.ts | Agent memory
src/coordinator/ | Multi-Agent orchestration
src/utils/swarm/ | Swarm coordination logic

Memory

File | Purpose
src/memdir/memoryScan.ts | Memory scanning
src/memdir/memoryTypes.ts | Memory type definitions
src/services/extractMemories/ | Automatic memory extraction
src/services/teamMemorySync/ | Team memory synchronization

State and UI

File | Purpose
src/state/AppState.tsx | Main React context (23 KB)
src/state/AppStateStore.ts | Zustand-like Store (21 KB)
src/cli/print.ts | Rich text printing (212 KB)
src/components/App.tsx | Root component

Appendix B: Reference Resources

Open Source Projects

Project Description URL
learn-claude-code 12-lesson Harness Engineering course github.com/shareAI-lab/learn-claude-code
claude-code-harness Plan-Work-Review cycle implementation github.com/Chachamaru127/claude-code-harness
your-claude-engineer Agent Harness demo (Slack+GitHub+Linear) github.com/coleam00/your-claude-engineer

Learning Path

Week 1: Foundations
├── Read Chapters 1-3 of this tutorial
├── Build a Level 1 personal Harness (CLAUDE.md + basic permissions)
├── Hands-on experiment: use Claude Code in your own project
└── Understand: Agent Loop, Tool Dispatch, Plan Mode

Week 2: Constraints and Security
├── Read Chapters 4-7 of this tutorial
├── Configure Permission Rules
├── Write a PreToolUse Hook
└── Understand: Permission Model, Hooks, Sandbox

Week 3: Context and Memory
├── Read Chapters 8-9 of this tutorial
├── Optimize CLAUDE.md
├── Configure Settings hierarchy
├── Hands-on experiment: write a PreToolUse Hook
└── Understand: Context Engineering, Memory, Compaction

Week 4: Extension and Collaboration
├── Read Chapters 10-12 of this tutorial
├── Configure MCP servers (start with GitHub MCP)
├── Create custom Agents and Skills
└── Understand: MCP, Sub-Agents, Skills

Week 5: Multi-Agent and Production
├── Read Chapters 13-14 of this tutorial
├── Build a team Harness
├── Implement the Plan-Work-Review cycle
└── Understand: Team Protocols, Worktree Isolation, Entropy Management

Week 6+: Advanced
├── Deep dive into Claude Code source code (start with query.ts)
├── Build an organization-level Harness
└── Explore: Bridge, Voice, Remote Triggers

Key Concept Quick Reference

Concept | One-Line Definition
Harness | All infrastructure beyond the Agent itself
Agent Loop | The core cycle of message -> LLM -> tool -> loop
Tool | A standardized interface for Agent interaction with the outside world
Permission Mode | A mode determining the Agent's degree of autonomy (5 types)
Permission Rule | A declarative allow/deny/ask rule
Hook | A programmable callback for lifecycle events (4 types)
Sandbox | System-level file/network/process restrictions
CLAUDE.md | A project-level persistent context file
Memory | Cross-session structured memory (4 types)
Compaction | Context window compression strategy (4 levels)
MCP | Model Context Protocol, extending Harness capabilities
Sub-Agent | A context-isolated child Agent
Skill | A reusable workflow definition
Plan Mode | A 5-phase planning workflow
Worktree | Git worktree isolation preventing parallel conflicts
Entropy Management | Combating system degradation through periodic maintenance
Feature Gate | Compile-time feature toggle (bun:bundle)
Prompt Cache | API caching strategy optimized by tool ordering
YOLO Classifier | Two-stage AI automatic permission approval
Defense in Depth | Six stacked layers of security protection

"Build rippable harnesses: the best harness is the one you eventually don't need."

This tutorial is based on reverse engineering of the Claude Code source code (2026-03-31 snapshot). All referenced file paths and code patterns are derived from actual source code analysis.



References

[1] Martin Fowler. "Harness Engineering." martinfowler.com/articles/exploring-gen-ai/harness-engineering.html, 2026.

[2] OpenAI. "Harness Engineering: Leveraging Codex in an Agent-First World." openai.com/index/harness-engineering/, 2026.

[3] "Building AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned." arXiv:2603.05344v1, 2026.

[4] "Evaluation and Benchmarking of LLM Agents: A Survey." arXiv:2507.21504v1, 2025.

[5] Jimenez et al. "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" ICLR, 2024.

[6] NxCode. "Harness Engineering: The Complete Guide to Building Systems That Make AI Agents Actually Work." nxcode.io, 2026.

[7] Phil Schmid. "The Importance of Agent Harness in 2026." philschmid.de/agent-harness-2026, 2026.

[8] Epsilla. "Harness Engineering: Why the Focus is Shifting from Models to Agent Control Systems." epsilla.com/blogs, 2026.

[9] EleutherAI. "lm-evaluation-harness: A Framework for Few-Shot Evaluation of Language Models." github.com/EleutherAI/lm-evaluation-harness, 2024.

[10] Harness Engineering Academy. "What is Harness Engineering? A Complete Introduction." harnessengineering.academy/blog, 2026.


Last updated: 2026-04-01


Two reverse-engineered findings discovered while writing the textbook above. Both cover infrastructure not previously documented in any public analysis.

🔬 Grove: The Hidden Training-Data Pipeline in Claude Code's Source

From a user's keyboard to BigQuery's training warehouse: a complete, source-verifiable data pipeline.

While reverse-engineering Claude Code's 512K LOC TypeScript source, I found previously unreported infrastructure for training-data collection. This is neither speculation nor analogy: every step has a precise file path and line number. This section traces the full pipeline.

One-sentence summary

Anthropic ships a system named "Grove" in Claude Code that explicitly tells users their data will be used to "train and improve" models, extends data retention from 30 days to 5 years, ships 796 telemetry events through a Protobuf-typed dual pipeline, and lands them in privileged BigQuery columns. The same pipeline carries SWE-bench evaluation IDs, meaning training-data collection and agent-capability evaluation share the same infrastructure.

Ring 1: Grove ("Help Improve Claude")

In Claude Code's source, a component named Grove (src/components/grove/Grove.tsx) presents users with a toggle:

// src/components/grove/Grove.tsx

// Line 47:
<Text bold={true}>You can help improve Claude </Text>

// Line 56 (the key copy):
"Allow the use of your chats and coding sessions to train and improve
 Anthropic AI models. Change anytime in your Privacy Settings."

// Line 63 (data-retention notice):
"Updates to data retention - To help us improve our AI models and safety
 protections, we're extending data retention to 5 years."

// Line 116 (what the retention actually means):
"Turning ON the improve Claude setting extends data retention from
 30 days to 5 years. Turning it OFF keeps the default 30-day data
 retention. Delete data anytime."

This is not a hidden backdoor. It's Anthropic's official, user-facing training-data collection mechanism. What hasn't been publicly analyzed is the engineering implementation behind it.

Grove's type definitions

// src/services/api/grove.ts Lines 25-35:
export type AccountSettings = {
  grove_enabled: boolean | null       // user opted into "Help improve Claude"
  grove_notice_viewed_at: string | null
}

export type GroveConfig = {
  grove_enabled: boolean
  domain_excluded: boolean            // some orgs/domains are excluded
  notice_is_grace_period: boolean     // within the grace window
  notice_reminder_frequency: number | null
}

// API endpoints:
// GET   /api/oauth/account/settings
// POST  /api/oauth/account/grove_notice_viewed
// PATCH /api/oauth/account/settings
// GET   /api/claude_code_grove

Worth noting: the domain_excluded field suggests that certain organizations or domains are excluded from training use automatically. grove_enabled is the toggle: turn it on, and retention extends from 30 days to 5 years, as the notice text quoted above states.
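To make the interplay of these fields concrete, here is a minimal sketch of how a client could derive the effective retention window from them. The helper name effectiveRetentionDays, the rule that domain_excluded overrides the user toggle, and the treatment of null as "not opted in" are assumptions for illustration, not code from the Claude Code source. Only the field names and the 30-day and 5-year figures come from the excerpts above.

```typescript
// Types restated from the excerpt above (src/services/api/grove.ts).
type AccountSettings = {
  grove_enabled: boolean | null;
  grove_notice_viewed_at: string | null;
};

type GroveConfig = {
  grove_enabled: boolean;
  domain_excluded: boolean;
  notice_is_grace_period: boolean;
  notice_reminder_frequency: number | null;
};

const DEFAULT_RETENTION_DAYS = 30;
const EXTENDED_RETENTION_DAYS = 5 * 365; // "5 years", ignoring leap days

// Hypothetical helper, not from the Claude Code source.
// Assumption: an excluded domain or a disabled Grove config always wins
// over the per-account toggle.
function effectiveRetentionDays(
  account: AccountSettings,
  config: GroveConfig,
): number {
  if (config.domain_excluded || !config.grove_enabled) {
    return DEFAULT_RETENTION_DAYS;
  }
  // Assumption: null means the user has not made a choice yet.
  return account.grove_enabled === true
    ? EXTENDED_RETENTION_DAYS
    : DEFAULT_RETENTION_DAYS;
}
```

Under these assumptions, only an opted-in account in a non-excluded domain gets the extended window; every other combination falls back to 30 days.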

Why 5 years? Product analytics typically retains data for 30 to 90 days. The most plausible explanation for 5-year retention is that the data feeds training datasets, which have a much longer lifecycle than ops logs.

Ring 2: 796 telemetry events × dual pipeline

Claude Code's telemetry runs regardless of the Grove switch: Grove controls retention duration (30 days vs 5 years), not whether collection happens. The source contains 796 unique analytics event names prefixed with tengu_. Every API call, every tool execution, every permission decision, and every Bash command is logged.

These events flow through a dual pipeline:

Channel | Endpoint | Payload | Purpose
Datadog | logs.us5.datadoghq.com | Redacted (_PROTO_* stripped) | Ops monitoring
1P (first-party) | /api/event_logging/batch | Full payload incl. privileged fields | Lands in BigQuery

// src/services/analytics/sink.ts Lines 48-71:
function logEventImpl(eventName: string, metadata: LogEventMetadata): void {
  const sampleResult = shouldSampleEvent(eventName)
  if (sampleResult === 0) return

  if (shouldTrackDatadog()) {
    // Datadog only sees redacted data
    void trackDatadogEvent(eventName, stripProtoFields(metadataWithSampleRate))
  }

  // 1P gets the full payload, including _PROTO_* privileged fields
  logEventTo1P(eventName, metadataWithSampleRate)
}

// src/services/analytics/firstPartyEventLoggingExporter.ts Lines 714-750:
// _PROTO_* keys are PII-tagged values meant only for privileged BQ
// columns. Hoist known keys to proto fields, then defensively strip any
// remaining _PROTO_* so an unrecognized future key can't silently land
// in the general-access additional_metadata blob.

That comment spells out the architecture: _PROTO_* fields are routed to privileged columns in BigQuery and kept out of the general-access additional_metadata blob. This isn't a casual product log; it's an intentionally tiered, access-segregated data warehouse.
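The redaction and hoisting behavior described in that comment can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the source's stripProtoFields: the names stripProtoFieldsSketch and splitForFirstParty, the EventMetadata type, and the idea of passing the known privileged keys as a list are all hypothetical.

```typescript
type EventMetadata = Record<string, string | number | boolean>;

const PROTO_PREFIX = "_PROTO_";

// Hypothetical redaction step for the Datadog branch:
// drop every key that carries the privileged prefix.
function stripProtoFieldsSketch(metadata: EventMetadata): EventMetadata {
  const redacted: EventMetadata = {};
  for (const [key, value] of Object.entries(metadata)) {
    if (!key.startsWith(PROTO_PREFIX)) {
      redacted[key] = value;
    }
  }
  return redacted;
}

// Hypothetical split for the first-party branch: hoist the known
// privileged keys, then defensively strip any remaining _PROTO_* key so an
// unrecognized one cannot end up in the general-access blob.
function splitForFirstParty(
  metadata: EventMetadata,
  knownProtoKeys: readonly string[],
): { privileged: EventMetadata; general: EventMetadata } {
  const privileged: EventMetadata = {};
  for (const key of knownProtoKeys) {
    const value = metadata[key];
    if (value !== undefined) {
      privileged[key] = value;
    }
  }
  return { privileged, general: stripProtoFieldsSketch(metadata) };
}
```

The design point worth noticing is the defensive second pass: an unknown _PROTO_* key is dropped rather than passed through, which matches the intent stated in the source comment.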

Ring 3: Protobuf schema, a production-grade data pipeline

These events aren't ad-hoc JSON blobs. They're strictly defined by a Protobuf schema, compiled from an external monorepo:

// src/services/analytics/firstPartyEventLoggingExporter.ts:
// Adding a field? Update the monorepo proto first (go/cc-logging):
//   event_schemas/.../claude_code/v1/claude_code_internal_event.proto
// then run `bun run generate:proto` here.
//
// claude_code_internal_event.ts: 865 lines of generated proto code

The generated schema defines 29 fields (excerpt):

message ClaudeCodeInternalEvent {
  string event_name = 1;
  Date   client_timestamp = 2;
  string model = 3;
  string session_id = 4;
  string user_type = 5;                // "ant" vs "external"
  EnvironmentMetadata env = 7;         // platform, version, CI flags, 30+ subfields
  string additional_metadata = 13;     // event-specific data (BASE64 JSON)
  string device_id = 17;

  // ⬇ These three are the surprising part
  string swe_bench_run_id = 18;
  string swe_bench_instance_id = 19;
  string swe_bench_task_id = 20;

  // Swarm/team agent attribution
  string agent_id = 22;
  string parent_session_id = 23;
  string agent_type = 24;

  // PII privileged columns
  string skill_name = 27;
  string plugin_name = 28;
  string marketplace_name = 29;
}

Three things this tells us:

  1. Not a temporary analytics hack: Protobuf + external monorepo (go/cc-logging) + compile-time enforcement = production-grade data-pipeline infrastructure.
  2. SWE-bench fields are part of every event's schema: there is no separate eval system, and evals share the same pipeline as user data.
  3. Dedicated PII privileged columns: skill_name and plugin_name are isolated, indicating tiered data management.

Ring 4: SWE-bench (evals and training data share the pipeline)

This is the most surprising finding. The three SWE-bench IDs aren't off in some standalone eval script; they're fields in the Proto schema used for every analytics event, populated from environment variables.

// src/services/analytics/metadata.ts Lines 722-724:
sweBenchRunId:      process.env.SWE_BENCH_RUN_ID || '',
sweBenchInstanceId: process.env.SWE_BENCH_INSTANCE_ID || '',
sweBenchTaskId:     process.env.SWE_BENCH_TASK_ID || '',

// Lines 912-920 (written into the Proto event):
if (coreFields.sweBenchRunId) {
  core.swe_bench_run_id = coreFields.sweBenchRunId
}

What this means: when a run sets these environment variables (as an internal SWE-bench run presumably would), every tengu_* event it emits is tagged with swe_bench_run_id, swe_bench_instance_id, and swe_bench_task_id. In BigQuery, that allows full agent-trajectory analysis per SWE-bench problem: which tools fired, how many tokens each API call cost, what the permission classifier decided, and whether each command succeeded.
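The tagging step can be sketched in a few lines. The function name readSweBenchTags and the Env type are hypothetical; the environment variable names and proto field names come from the excerpts above. The excerpt only shows the truthiness check for the run ID, so applying the same pattern to the instance and task IDs is an assumption. The environment is passed in as a parameter, rather than read from process.env, purely to keep the sketch testable.

```typescript
type Env = Record<string, string | undefined>;

interface SweBenchTags {
  swe_bench_run_id?: string;
  swe_bench_instance_id?: string;
  swe_bench_task_id?: string;
}

// Hypothetical helper: copy each SWE-bench ID into the event only when it
// is set, mirroring the `|| ''` default and the truthiness check above.
function readSweBenchTags(env: Env): SweBenchTags {
  const runId = env["SWE_BENCH_RUN_ID"] || "";
  const instanceId = env["SWE_BENCH_INSTANCE_ID"] || "";
  const taskId = env["SWE_BENCH_TASK_ID"] || "";

  const tags: SweBenchTags = {};
  if (runId) tags.swe_bench_run_id = runId;
  if (instanceId) tags.swe_bench_instance_id = instanceId;
  if (taskId) tags.swe_bench_task_id = taskId;
  return tags;
}

// An ordinary session has none of the variables set, so no tags are added.
console.log(readSweBenchTags({})); // {}
```

In other words, the fields exist in every event's schema, but they only carry values when the process was launched with the corresponding variables.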

Implication: Claude Code is not just a product. It's also Anthropic's agent-capability evaluation platform. Eval data and user-interaction data share the same Protobuf schema, the same pipeline, and the same BigQuery warehouse.

Combined with the full evaluation harness CLI surface:

// src/main.tsx (eval-related CLI flags):
'-p, --print'               // non-interactive mode, exit after output
'--output-format <format>'  // "text" | "json" | "stream-json"
'--max-turns <turns>'       // max agent turns
'--max-budget-usd <amount>' // budget limit
'--json-schema <schema>'    // structured output validation

This is a complete evaluation loop: env vars tag the task → --print mode runs non-interactively → streaming NDJSON output → exit code signals success/failure → all telemetry lands in BigQuery.

Ring 5: Developer comments mention "training data"

Two internal developer comments are the most direct evidence:

// src/utils/messages.ts Line 245:
"content is fake, which poisons training data if submitted"

// src/utils/sessionStorage.ts Line 4388:
"Ant transcripts keep the wrapper so /share training data sees REPL usage"

The first says certain content "poisons training data", which implies the developer expects submitted message content to reach a training pipeline.

The second describes /share data as "training data", the most direct textual evidence. Note that it refers specifically to Ant (Anthropic-internal) transcripts.

Ring 6: Five-layer performance telemetry

Beyond functional data, Anthropic has embedded a five-layer performance-measurement system:

Layer | Implementation | Sampling | Output sink
Headless Latency Profiler | headlessProfiler.ts | 100% internal / 5% external | tengu_headless_latency event
Frame Timing | interactiveHelpers.tsx | On demand | Local JSONL files
FPS Tracker | fpsTracker.ts | Continuous | avgFps + P99 frame time
Perfetto Chrome Trace | perfettoTracing.ts | Internal only | ~/.claude/traces/
OpenTelemetry | instrumentation.ts | Continuous | OTLP / Prometheus / BigQuery

Note the last row: OpenTelemetry's exporter list includes BigQuery. Performance data flows into the same warehouse.

Ring 7: Competitor detection

A small but telling detail:

// src/utils/codeIndexing.ts Lines 52, 82:
// Detect aider CLI and MCP server
// CLI: 'aider'
// MCP: /^aider$/i
// Use: track Claude Code coexisting with competitors in analytics

Anthropic tracks whether users run competitor tools (Aider) alongside Claude Code. That data also flows into BigQuery.

Full pipeline diagram

User interaction (every tool call, API request, Bash command)
  │
  ├─ grove_enabled = true → retention = 5 years
  │
  ▼
796 tengu_* events × 40+ metadata fields
  │
  ├─→ Datadog (redacted, _PROTO_* stripped) → ops monitoring
  │
  └─→ 1P API /api/event_logging/batch (full payload)
       │
       ├─ Protobuf schema, strictly typed (865 lines generated)
       ├─ _PROTO_* fields → BigQuery privileged columns (isolated)
       ├─ SWE-bench IDs embedded → eval shares the pipeline
       │
       ▼
     BigQuery warehouse
       │
       ├─ Feedback text "approved for BQ" (after PII redaction)
       ├─ Transcript shares → POST /api/claude_code_shared_session_transcripts
       ├─ OpenTelemetry performance data
       ├─ Competitor-tool detection
       │
       ▼
     Developer comments confirm: "training data"
       │
       ▼
     ???? (server-side training process, not visible to client)

Why this pipeline matters

  1. It's complete: every step from UI toggle to BigQuery privileged columns has a source line number.
  2. It's tiered: Datadog gets redacted data, 1P gets the full payload, and BigQuery has privileged-column isolation.
  3. It's reused: SWE-bench evals and user data share the same pipeline and schema.
  4. It's not speculation: the Grove UI literally says "train and improve"; developer comments literally say "training data".
  5. It hasn't been reported: as of publication, no public analysis mentions "Grove", BigQuery privileged columns, or the SWE-bench-shares-telemetry finding.

What I cannot verify from the client source

To be honest about the limits of this analysis:

None of those gaps weaken the core finding: the complete data pipeline from a user's keyboard to BigQuery's training warehouse is verifiable in the client source code.


🛡️ How Claude Code Detects You Distilling Its Model: A 5-Layer Defense, Reverse-Engineered

Reverse-engineered from 512K lines of source: the complete engineering implementation Anthropic ships to prevent model theft. Each layer has a precise file path and line number.

Why distillation is the AI industry's #1 threat

Model distillation (recording API traffic from a frontier model and using its outputs to train a smaller model) is one of the largest commercial threats in AI: it lets a competitor replicate capabilities at a fraction of the cost.

This isn't theoretical. It's actively happening:

Against this backdrop, Anthropic shipped a 5-layer defense system in Claude Code. Unlike competitors' systems, the client-side implementation is exposed via source, making this the first complete, production-level analysis of a commercial AI system's anti-distillation engineering.

Overview: 5 layers

Figure 1: Anti-distillation 5-layer defense overview. The five layers are client attestation, fingerprint attribution, fake-tool injection, signature-bearing blocks, and the distillation-resistant streamlined output mode.

Layer 1: Native Client Attestation ("Are you really Claude Code?")

How it works

Before each API request, Bun's native HTTP stack injects an attestation token into the serialized request body. The server verifies the token to confirm the request came from a real Claude Code client, not a third-party proxy or recording tool.

Source

// src/constants/system.ts Lines 64-68:
// When NATIVE_CLIENT_ATTESTATION is enabled, includes a `cch=00000` placeholder.
// Before the request is sent, Bun's native HTTP stack finds this placeholder
// in the request body and overwrites the zeros with a computed hash. The
// server verifies this token to confirm the request came from a real Claude
// Code client. See bun-anthropic/src/http/Attestation.zig for implementation.

// Lines 81-82:
const cch = feature('NATIVE_CLIENT_ATTESTATION') ? ' cch=00000;' : ''

Engineering details

Figure 2: Native client attestation flow. Attestation.zig at the Bun layer rewrites the cch=00000 placeholder to a real hash before the request hits the wire. The server validates the token; an invalid or missing token causes the request to be rejected.

Why a placeholder instead of direct injection? The source comment explains: "same-length replacement avoids Content-Length changes and buffer reallocation". A 5-character placeholder gets swapped with a 5-character hash, so the body length stays the same and no memory is reallocated.

Implementation language: Zig (bun-anthropic/src/http/Attestation.zig), running at Bun's native HTTP layer. The auth logic isn't in JavaScript or TypeScript, so it can't be bypassed simply by patching the JS layer.
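The same-length replacement trick is easy to illustrate. The sketch below is not the Zig implementation, which is not part of the analyzed TypeScript source; it is a hypothetical TypeScript stand-in that overwrites the five zeros of cch=00000 inside an already-serialized body. The function names, the assumption that the token is five lowercase hex characters, and the sample body are all illustrative.

```typescript
const PLACEHOLDER = "cch=00000";
const PREFIX_LENGTH = "cch=".length;

// Convert an ASCII string to bytes without runtime-specific APIs.
function asciiBytes(text: string): Uint8Array {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    bytes[i] = text.charCodeAt(i);
  }
  return bytes;
}

// Return the index of the first occurrence of needle, or -1.
function indexOfBytes(haystack: Uint8Array, needle: Uint8Array): number {
  outer: for (let i = 0; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// Hypothetical stand-in for the native step: overwrite the zeros in place.
// The buffer is mutated, never resized, so Content-Length stays valid.
function patchAttestation(body: Uint8Array, token: string): boolean {
  if (!/^[0-9a-f]{5}$/.test(token)) {
    throw new Error("token must be exactly 5 hex characters (assumed format)");
  }
  const start = indexOfBytes(body, asciiBytes(PLACEHOLDER));
  if (start === -1) return false; // placeholder absent: nothing to patch
  body.set(asciiBytes(token), start + PREFIX_LENGTH);
  return true;
}
```

Because the write happens in place, a Content-Length header computed before patching remains correct afterwards, which is the property the source comment highlights.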

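Layer 2: Fingerprint Attribution ("Which client generated this?")

How it works

Each API request carries a 3-character fingerprint: the first 3 hex characters of a SHA256 hash over a salt, three specific characters from the user's first message, and the version string. With only 16^3 = 4,096 possible values, the fingerprint cannot identify an individual client on its own; it is better read as a lightweight consistency check that, together with the version and other request metadata, plausibly helps Anthropic attribute traffic to a genuine client build.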

Source

// src/utils/fingerprint.ts:
import { createHash } from 'crypto' // import added so the excerpt is self-contained

// Hardcoded salt (must stay in sync with backend verification)
export const FINGERPRINT_SALT = '59cf53e54c78'

export function computeFingerprint(
  messageText: string,
  version: string,
): string {
  // Pull characters at positions 4, 7, 20 from the user's first message
  const indices = [4, 7, 20]
  const chars = indices.map(i => messageText[i] || '0').join('')

  const fingerprintInput = `${FINGERPRINT_SALT}${chars}${version}`

  // SHA256, take first 3 hex chars
  const hash = createHash('sha256').update(fingerprintInput).digest('hex')
  return hash.slice(0, 3)
}

// src/constants/system.ts Lines 78-91:
// Fingerprint is embedded in the version string sent in headers
const version = `${MACRO.VERSION}.${fingerprint}`
const header = `x-anthropic-billing-header: cc_version=${version}; cc_entrypoint=${entrypoint};${cch}${workloadPair}`

Compute flow

Figure 3: Fingerprint computation flow. Salt + chars-at-positions-[4,7,20] + version → SHA256 → first 3 hex chars → embedded in the billing header.

The source comment explicitly notes: "IMPORTANT: Do not change this method without careful coordination with 1P and 3P (Bedrock, Vertex, Azure) APIs." The fingerprint algorithm has to stay consistent across every API provider.
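The computation is small enough to exercise in isolation. The following self-contained sketch restates the function from the excerpt and shows two properties that follow directly from it. The version string "1.0.0" and the sample messages are hypothetical and chosen only for illustration.

```typescript
import { createHash } from "crypto";

const FINGERPRINT_SALT = "59cf53e54c78"; // value quoted in the excerpt above

function computeFingerprint(messageText: string, version: string): string {
  const indices = [4, 7, 20];
  // Positions past the end of the message fall back to "0".
  const chars = indices.map((i) => messageText[i] || "0").join("");
  const input = `${FINGERPRINT_SALT}${chars}${version}`;
  return createHash("sha256").update(input).digest("hex").slice(0, 3);
}

const demoVersion = "1.0.0"; // hypothetical version string

// A message shorter than 5 characters contributes "000", like an empty one.
const shortMessage = computeFingerprint("hi", demoVersion);
const emptyMessage = computeFingerprint("", demoVersion);
console.log(shortMessage === emptyMessage); // true

// Three hex characters allow 16 ** 3 = 4096 distinct fingerprints.
console.log(16 ** 3); // 4096
```

Since only three characters of the message and the version feed the hash, many different messages share a fingerprint. The value works as a coarse check, not as a unique identifier.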

Layer 3: Fake Tools Injection ("Honeypot tools")

How it works

This is the cleverest layer. Claude Code sends anti_distillation: ['fake_tools'] to the API; the server then presumably mixes fake tool definitions into the normal tool list. If anyone records this API traffic and trains a model on it, the distilled model will "know" the fake tools, and that knowledge is evidence of distillation.

Source

// src/services/api/claude.ts Lines 301-313:
// Anti-distillation: send fake_tools opt-in for 1P CLI only
if (
  feature('ANTI_DISTILLATION_CC')
    ? process.env.CLAUDE_CODE_ENTRYPOINT === 'cli' &&
      shouldIncludeFirstPartyOnlyBetas() &&
      getFeatureValue_CACHED_MAY_BE_STALE(
        'tengu_anti_distill_fake_tool_injection',
        false,
      )
    : false
) {
  result.anti_distillation = ['fake_tools']
}

Trigger conditions

Figure 4: Fake tools injection conditions. All four gates (feature flag, CLI entrypoint, first-party API, GrowthBook gate) must pass before fake tools are opted in.

Critical limitation: this only fires on Anthropic's first-party API. Requests via Bedrock, Vertex, or other third-party proxies do not get fake tools, because shouldIncludeFirstPartyOnlyBetas() excludes those providers.

Honeypot mechanics: the actual fake-tool definitions live server-side; the client only sends an opt-in signal. This means the fake tools can be rotated server-side without shipping a new client.
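How such a honeypot could be checked after the fact can be sketched as follows. Everything here is hypothetical: the real fake-tool definitions are server-side and not visible in the client source, so the canary names, the ToolCall shape, and the findCanaryHits helper are invented purely to illustrate the detection idea.

```typescript
// Hypothetical canary names. The real fake-tool definitions live
// server-side and are not visible in the client source.
const CANARY_TOOL_NAMES: ReadonlySet<string> = new Set([
  "example_canary_tool_a",
  "example_canary_tool_b",
]);

interface ToolCall {
  name: string;
}

// Hypothetical detector: given tool calls emitted by a suspect model,
// return the distinct canary names it tried to use, sorted for stability.
function findCanaryHits(calls: readonly ToolCall[]): string[] {
  const hits = new Set<string>();
  for (const call of calls) {
    if (CANARY_TOOL_NAMES.has(call.name)) {
      hits.add(call.name);
    }
  }
  return Array.from(hits).sort();
}
```

A model that was never exposed to the canary definitions has no reason to call them, so a non-empty result would be suggestive, though not conclusive, evidence that recorded traffic ended up in its training data.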

Layer 4: Signature-Bearing Blocks ("Signatures as locks")

How it works

The API's thinking blocks and connector_text blocks both carry a cryptographic signature bound to the API key that generated them. Switch keys, and those signatures instantly become invalid: the API rejects them with a 400.

Source

// src/utils/messages.ts Lines 5060-5064:
// Strip signature-bearing blocks (thinking, redacted_thinking, connector_text)
// from all assistant messages. Their signatures are bound to the API key that
// generated them; after a credential change (e.g. /login) they're invalid and
// the API rejects them with a 400.

export function stripSignatureBlocks(messages: Message[]): Message[] {
  const result = messages.map(msg => {
    if (msg.type !== 'assistant') return msg
    const content = msg.message.content
    const filtered = content.filter(block => {
      if (isThinkingBlock(block)) return false        // drop thinking
      if (feature('CONNECTOR_TEXT')) {
        if (isConnectorTextBlock(block)) return false  // drop connector_text
      }
      return true
    })
    return { ...msg, message: { ...msg.message, content: filtered } }
  })
  return result
}

Connector Text: the anti-distillation mechanism

// src/utils/betas.ts Lines 279-284:
// POC: server-side connector-text summarization (anti-distillation). The
// API buffers assistant text between tool calls, summarizes it, and returns
// the summary with a signature so the original can be restored on subsequent
// turns (same mechanism as thinking blocks).

Figure 5: Connector text signature flow. The API buffers assistant text between tool calls, summarizes it, and returns a summary plus a signature_delta. On the next turn, the signed summary lets the API restore the original text, but anyone recording the traffic only sees summaries.

Current limitation: Connector Text is only enabled for Anthropic-internal users (USER_TYPE === 'ant') while its TTFT/TTLT/capacity impact is measured. Feature gate: tengu_slate_prism.

Layer 5: Streamlined Mode ("Distillation-resistant output format")

How it works

The source comment names this layer literally: "distillation-resistant output format". It strips information specifically to reduce the value of the output as distillation training data.

Source

// src/utils/streamlinedTransform.ts Lines 1-9:
// Streamlined mode is a "distillation-resistant" output format that:
// - Keeps text messages intact
// - Summarizes tool calls with cumulative counts (resets when text appears)
// - Omits thinking content
// - Strips tool list and model info from init messages

What gets filtered

Original output | Streamlined output
Text content | ✅ Preserved
Thinking content | ❌ Dropped
Tool call details | 📊 Cumulative counts only
Tool definition list | ❌ Dropped
Model info | ❌ Dropped

What does the would-be distiller lose?

Defense matrix

Layer | Defends against | Detect / Block | Bypass difficulty
Attestation | Spoofed clients | Block | High: requires reversing native Zig code
Fingerprint | Anonymous distillation | Post-hoc trace | Medium: algorithm is public
Fake Tools | API traffic recording | Post-hoc detect | Low: don't request fake_tools
Signature | Cross-key data reuse | Block | High: server-side validation
Streamlined | Output distillation | Reduce value | Low: only applies to SDK mode

Where this defense system is weak

From a pure security-research perspective (understanding the defense is how you improve the defense):

1. Fake Tools only fires on first-party. Bedrock/Vertex requests don't get fake tools, because shouldIncludeFirstPartyOnlyBetas() excludes third-party providers. If the distiller uses AWS Bedrock's Claude endpoint, this layer is wholly inactive.

2. Fingerprint algorithm is fully public via the source. Salt 59cf53e54c78, character positions [4, 7, 20], and SHA256 are all visible. A distiller can forge fingerprints (though the server may have other validation).

3. Connector Text only on internal users. process.env.USER_TYPE === 'ant' means external users' conversation text gets no Connector Text protection.

4. Streamlined Mode only applies to SDK output. Normal CLI interactions don't go through Streamlined Transform; full tool-call details and thinking content are visible in standard output.

5. Attestation is the strongest layer, but its scope is narrow. Native client auth runs in Zig, outside the JavaScript-controllable surface. But if a distiller calls the API directly (not via Claude Code), this layer doesn't apply: it protects against "impersonating Claude Code", not "using the raw API".

Key insight

Anthropic's anti-distillation strategy isn't a single point of defense; it's a layered, complementary system. None of the layers is perfect, but stacked together they cover different attack surfaces:

Figure 6: Attacks vs defenses matrix. Each attack vector is covered by a different defense layer.

The deepest defense isn't on the client at all: the actual reasoning capability (the weights) never leaves Anthropic's servers. Client-side defenses protect training-data quality and API usage attribution: even if a distiller records output, Anthropic may be able to trace its origin and detect fake-tool traces in the resulting distilled model.

Comparison with prior analyses

Alex Kim's blog [8] first reported the existence of fake tools and connector text, but only described the code snippets without analyzing the full defense system. This article's new contributions:

  1. Systematic 5-layer framework: organizes scattered findings into a layered defense system.
  2. Native Client Attestation Zig analysis: not previously mentioned in any public analysis.
  3. Full fingerprint algorithm reconstruction: including salt, character positions, and hash method.
  4. Streamlined Mode as a distillation-resistant format: the source comment literally says "distillation-resistant", which was not previously reported.
  5. Defense matrix and weakness analysis: assessing each layer's effectiveness and limitations from a security-research stance.

Mapping to the academic literature

Anthropic's anti-distillation implementation maps interestingly onto recent academic work:

Anthropic implementation | Academic concept | Reference
Fake Tools Injection | Radioactive watermarking / data poisoning | [6] Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? (ACL 2025)
Fingerprint Attribution | LLM fingerprinting | [5] LLM Fingerprinting via Semantically Conditioned Watermarks (arXiv 2505.16723)
Connector Text Summarization | Trace rewriting / output perturbation | [7] Protecting Language Models Against Unauthorized Distillation through Trace Rewriting (arXiv 2602.15143)
Streamlined Mode | Distillation-resistant decoding | [4] Towards Distillation-Resistant LLMs: An Information-Theoretic Perspective (arXiv 2602.03396)
5-layer stack | Defense-in-depth for ML systems | [2] IAPS: AI Distillation Attacks: The Case for Targeted Government Intervention

Worth noting: academic papers are largely theoretical proposals or controlled experiments, whereas Claude Code is a production deployment serving millions of users. That gives this analysis distinct empirical value: we're seeing the engineering trade-offs Anthropic actually made under real adversarial pressure, not a paper algorithm.

References

[1] Berkeley Law. "The Innovation Dilemma: AI Distillation in OpenAI v. DeepSeek." The Network, 2025.

[2] IAPS. "AI Distillation Attacks: The Case for Targeted Government Intervention." Institute for AI Policy and Strategy, 2026.

[3] Google Cloud Blog. "GTIG AI Threat Tracker: Distillation, Experimentation, and Integration of AI for Adversarial Use." 2026.

[4] arXiv:2602.03396. "Towards Distillation-Resistant Large Language Models: An Information-Theoretic Perspective." 2026.

[5] arXiv:2505.16723. "LLM Fingerprinting via Semantically Conditioned Watermarks." 2025.

[6] arXiv:2502.11598. "Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?" ACL 2025 Main.

[7] arXiv:2602.15143. "Protecting Language Models Against Unauthorized Distillation through Trace Rewriting." 2026.

[8] Alex Kim. "The Claude Code Source Leak: fake tools, frustration regexes, undercover mode, and more." 2026.

All code references taken from the Claude Code source (2026-03-31 snapshot). File paths and line numbers reflect actual source locations.