Feature Architect
A multi-loop deliberation protocol with agent-isolated roundtables, structural linting, and adversarial quality gates.
Claude Code skill · v5
Known Failure Mode
Claude will attempt to compress deliberation rounds, skip roundtables, simulate expert panels without using sequential-thinking, or generate past phase boundaries. This ALWAYS produces worse outcomes. The deliberation IS the value. When you feel the urge to skip ahead: STOP. Complete only the current phase.
Protocol Files
| File | Category | Purpose |
|------|----------|---------|
| skill.md | Imperative | Protocol identity, triage, orchestration, amendment |
| spec-model.md | Declarative | Center definition, center_test, element format, ID conventions |
| formats.md | Declarative | Output formats, frontmatter schemas, operational parameters |
| archive-protocol.md | Declarative | Archive search method, archetype vocabulary, tagging rules |
| phases/intent.md | Imperative | Produce whiteboard.md and requirements.md |
| phases/mechanism.md | Imperative | Produce design.md and tasks.md |
| phases/execution.md | Imperative | Build from the task list |
| prompts/whiteboard.md | Template | Whiteboard roundtable agent prompt |
| prompts/requirements.md | Template | Requirements roundtable agent prompt |
| prompts/design.md | Template | Design roundtable agent prompt |
| prompts/tasks.md | Template | Tasks roundtable agent prompt |
| prompts/destroyer.md | Template | Destroyer gate evaluation prompt |
| lint-spec.mjs | Verification | Structural integrity checks (P1-P9, F1-F6) |
| context-interface.md | Declarative | Context module specification |
| OBSERVATIONS.md | Wisdom | Learnings from prior executions |
Reference convention:
- **Load** — read the file in full; use as agent prompt or instruction set
- **Consult** — read the file; find the relevant section by heading
- **Run** — execute this command
Spec Model
Every spec is built around a center — one sentence defining what the feature IS.
- Set during the whiteboard phase. The panel MUST converge on it before producing anything.
- Immutable for the spec's lifetime. If the center needs to change, close this spec and open a new one.
- Appears in the frontmatter of EVERY spec file (redundant but visible).
- Must be falsifiable — you should be able to name a good feature idea that DOESN'T serve it. If everything qualifies, it's too broad.
- About the PROBLEM being solved, not the solution mechanism.
Center Test
The center includes a center_test in the whiteboard frontmatter that operationalizes falsifiability:
```yaml
center_test:
  excludes: "A good feature idea that does NOT serve this center"
  boundary: "A case where the center ALMOST applies but the spec should say no"
```
- Exclusion test: Name something good this center excludes. If you can't, it's too broad.
- Boundary discrimination: Describe a near-miss — something that almost qualifies but shouldn't. This tests the center's edge, not just its core.
Example:
```yaml
center: "The practice session must end with a structured reflection that
  connects this session's performance to the learner's trajectory across
  sessions."
center_test:
  excludes: "A new interaction modality — good, but doesn't connect
    performance across sessions"
  boundary: "A per-question feedback improvement — relates to performance
    but doesn't connect across sessions"
```
Element Format
ACs and tasks use structured headings with visible metadata:
```markdown
### ac-question-persists: Question persists during feedback

> **Center:** Prevents context loss — the foundational spatial
> contiguity fix

When a learner submits an answer, the question remains visible on screen
while feedback is displayed.

### t-reducer-simplification: Reducer + scorecard + active state separation

> **Center:** Enables question persistence by separating active state
> from feedback state
> **Traces:** ac-question-persists, ac-selected-distinguished,
> ac-scorecard-interval
> **Depends:** (none)
> **Status:** pending

State additions: activeInteraction, activeFeedback, activeResponse...
```
Key properties:
- **IDs are semantic and stable.** `ac-question-persists`, not `AC-1`. Inserting a new element never renumbers existing ones.
- **Center_links are visible.** Blockquote format — the justification is the first thing you read, not hidden metadata.
- **Edges use IDs.** `Traces: ac-question-persists` — references are to stable identifiers, not sequential numbers.
- **Forgiving during drafting, strict at validation.** Agents write freely during roundtables. The linter catches missing metadata before the checkpoint.
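The element format above is machine-readable by design. As a minimal sketch (not the actual lint-spec.mjs implementation — the parsing details and the `parseElements` name are illustrative assumptions), extracting elements and their edges from a spec file could look like:

```javascript
// Parse "### {id}: {title}" headings plus single-line
// "> **Traces:**" and "> **Depends:**" metadata fields.
// Multi-line continuation of Traces is omitted for brevity.
function parseElements(markdown) {
  const elements = [];
  let current = null;
  for (const line of markdown.split("\n")) {
    const heading = line.match(/^### ([a-z]+(?:-[a-z0-9]+)+): (.+)$/);
    if (heading) {
      current = { id: heading[1], title: heading[2], traces: [], depends: [] };
      elements.push(current);
      continue;
    }
    if (!current) continue;
    const traces = line.match(/^> \*\*Traces:\*\* (.+)$/);
    if (traces) current.traces.push(...traces[1].split(/,\s*/));
    const depends = line.match(/^> \*\*Depends:\*\* (.+)$/);
    if (depends && depends[1] !== "(none)")
      current.depends.push(...depends[1].split(/,\s*/));
  }
  return elements;
}
```

Because IDs are stable, a parser like this never has to resolve sequential numbering against the current file state.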
ID Conventions
| Prefix | Element type | File |
|--------|-------------|------|
| ac- | Acceptance criteria | requirements.md |
| t- | Tasks | tasks.md |
| da- | Design atoms | design.md (atoms table) |
| sc- | Scenarios | requirements.md (free-form prose) |
IDs are kebab-case, descriptive of the REQUIREMENT not the implementation: ac-question-persists (what it requires) not ac-frozen-renderer (how it's implemented).
Traceability
There is NO maintained traceability matrix. Traceability is encoded in the Traces: and Depends: fields of task elements. The linter validates coverage. A traceability report can be generated on demand by querying the edges, but it is NEVER stored as a maintained artifact. Maintained copies ALWAYS drift.
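Generating the on-demand report is a simple edge inversion. A sketch, assuming a task shape of `{ id, traces }` (the function name and report shape are illustrative, not part of the protocol):

```javascript
// Invert Traces: edges to answer "which tasks cover each AC?"
// and "which ACs are uncovered?" — derived on demand, never stored.
function traceabilityReport(tasks, acIds) {
  const coverage = Object.fromEntries(acIds.map(id => [id, []]));
  for (const task of tasks) {
    for (const ac of task.traces) {
      if (coverage[ac]) coverage[ac].push(task.id);
    }
  }
  const uncovered = acIds.filter(id => coverage[id].length === 0);
  return { coverage, uncovered };
}
```

Since the report is a pure function of the edges, it can never drift from the spec files the way a maintained matrix would.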
Triage
Parse: Read the user's description. Identify: what they want, constraints/preferences, greenfield or existing.
Discover: If a context module exists, run its Discovery Protocol. If greenfield, note it.
Assess intensity:
| | LOW NOVELTY | HIGH NOVELTY |
|---|---|---|
| SMALL SCOPE (1-2 atoms) | Express | Focused |
| LARGE SCOPE (3+ atoms) | Standard | Deep |
| Intensity | Phases | Panel Size | Notes |
|-----------|--------|------------|-------|
| Express | Rapid spec (center sprint + destroyer + direct write) -> Execution | — | 1 checkpoint at center. Archive search detects candidacy (2+ analogues + small scope). |
| Focused | Intent (requirements only, skip whiteboard) -> Execution | 3 | |
| Standard | Intent -> Mechanism -> Execution | 3 | With research step if applicable |
| Deep | Intent -> Mechanism -> Execution | 4+ | Expanded panels |
Confirm: "I found [discovery]. This looks like [intensity] — [rationale]. I'll [what protocol will do]. Sound right?" User may override.
Orchestration Procedure
After triage, execute phases by loading their instruction files:
- LOAD: Read the next phase file from `phases/`. Do NOT read subsequent phase files.
- EXECUTE: Follow the phase file instructions completely.
- LINT: Run `node {skill-dir}/lint-spec.mjs .specs/features/{feature-name}/`. If FAIL violations exist, fix before proceeding. If FLAG violations exist, include them in the checkpoint.
- VERIFY: Produce the contract output specified at the end of the phase file.
- GATE: Present the checkpoint. STOP generating. Wait for user confirmation.
- PROCEED: After user confirms, return to step 1 with the next phase.
If the user says "good enough, build it" at any checkpoint, note what would have been deliberated further and proceed to the next phase immediately.
Intensity routing:
- Express -> `phases/intent.md` Express Flow (center sprint + destroyer + rapid spec) -> `phases/execution.md`
- Focused -> `phases/intent.md` (requirements only, skip whiteboard) -> `phases/execution.md`
- Standard -> `phases/intent.md` -> `phases/mechanism.md` -> `phases/execution.md`
- Deep -> same as Standard with expanded panels (4+ experts per roundtable)
Research gate: The research verification step runs only when BOTH conditions are met: (1) intensity is Standard or Deep, AND (2) the feature description references research claims or learning science. Purely technical features skip it.
Phase 1: Intent Loop
The intent loop produces two artifacts: whiteboard.md (WHY) and requirements.md (WHAT). Both must be system-agnostic — no implementation language, no tech-stack terms, no tool names.
Step 0: Research Verification (Conditional)
Gate: Run only when BOTH conditions are met:
- Intensity is Standard or Deep
- The feature description references research claims, learning science, effect sizes, pedagogical decisions, or UX research findings
Procedure:
- Collect every research claim. For each: attributed source, specific finding, numbers cited.
- Verify each claim via web search. Confirm findings, note caveats, flag conflations.
- Search for 2-3 additional relevant papers (prioritize meta-analyses).
- Produce the Verified Research Base with per-paper entries.
- Flag dropped citations as UNVERIFIABLE with replacement.
- Append verified references to `research/references.md`.
Step 1: Archive Search
Search the spec archive for structural analogues.
- Glob `.specs/archive/*/` to find all archived specs
- For each, read frontmatter (center, archetype tags) and skim the whiteboard
- Assess structural similarity based on FORCES resolved — not domain, not keywords
- Collect 0-3 candidates ranked by structural relevance
- Pass candidates to the whiteboard roundtable as additional input
Express detection: If 2+ strong structural analogues found AND scope is small, the feature is a candidate for Express mode.
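Once candidate specs and their archetype tags are in hand, ranking by structural relevance can be as simple as counting shared forces. A sketch, assuming each archived spec carries a `tags` array from its frontmatter (the `rankAnalogues` name and the overlap-count heuristic are assumptions, not protocol requirements — reading the frontmatter from `.specs/archive/*/` is omitted):

```javascript
// Rank archived specs by how many archetype tags (forces) they
// share with the expected forces of the new feature. Domain and
// keywords play no part in the score.
function rankAnalogues(archived, expectedTags, limit = 3) {
  return archived
    .map(spec => ({
      ...spec,
      score: spec.tags.filter(t => expectedTags.includes(t)).length,
    }))
    .filter(spec => spec.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```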
Express Flow
For Express intensity only. Produces a rapid spec in one pass with one checkpoint.
E1: Center Sprint — Propose a center and center_test based on description and analogues. Present to user with "Express or deeper?"
E2: Destroyer Check — Evaluate center for structural flaws (see Destroyer Gate below).
E3: Rapid Spec — Write `.specs/features/{feature-name}/rapid-spec.md` directly. No roundtable.
E4: Begin Execution — Proceed to Phase 3.
Step 2: Whiteboard Roundtable (Agent-Isolated)
Launch an Agent with the whiteboard prompt. The agent's output IS the whiteboard content.
Panel (fixed roster):
- Alfred Korzybski — general semantics, map-territory distinction, structural differential
- Douglas Hofstadter — strange loops, analogy as cognition, fluid concepts
- Heinz von Foerster — second-order cybernetics, constructivism, self-organizing systems
- Marshall McLuhan — media theory, extensions of man, figure/ground analysis
Each panelist speaks IN CHARACTER. Uses the sequential-thinking MCP tool. AT LEAST 3 rounds.
QUESTION: "What is this really about, who needs it, and what could go wrong if we misunderstand?"
Output sections: Center, Center Test, Context, Intent, Assumptions, Design Tensions, Open Questions, Alternatives Considered, Non-Functional Context.
CONSTRAINT: Output MUST NOT contain system-specific terms, implementation details, or architecture decisions.
Step 3: Whiteboard Checkpoint
Present the whiteboard before requirements consume it. Center prominently displayed. If user rejects center, re-launch whiteboard agent.
Step 4: Destroyer Gate
Evaluate the approved whiteboard for structural flaws before committing to requirements.
The Destroyer is an Adversary. NOT a collaborator. Finds the ONE structural flaw that will kill the feature.
Jurisdiction (may ONLY block on these five categories):
- Center Incoherence — center contradicts itself or is unfalsifiable
- Missing Precondition — whiteboard assumes something that doesn't exist
- Inverted Causality — treats an effect as a cause, or reverses a dependency
- Scope Impossibility — feature cannot be delivered within system constraints
- Self-Defeating Logic — element, if implemented, would undermine the center
If PASS: proceed to requirements. If BLOCK: present verdict to user. Options: reframe, amend, or override (logged).
Step 5: Requirements Roundtable (Agent-Isolated)
Launch a SECOND Agent (different from the whiteboard agent).
Panel (fixed roster):
- Donella Meadows — systems dynamics, leverage points, limits to growth
- W. Edwards Deming — statistical quality control, system of profound knowledge
- Steve Jobs — product taste, simplicity as sophistication, user experience
QUESTION: "Given this whiteboard, what acceptance criteria prove we're done? What did the whiteboard miss or over-scope?"
Embedded challenge: Did whiteboard capture intent? Are assumptions valid? Over-scoped? For each AC: how does it strengthen the center?
Output: ACs in element format with ac-* IDs + center_links. Scope (IN/OUT/DEFERRED). Dependencies. User Scenarios.
Step 6: Lint + Intent Loop Self-Check
Run `node {skill-dir}/lint-spec.mjs`. Self-check template:

```
SELF-CHECK: Intent Loop
- Center present and identical across all spec files? [YES/NO]
- center_test present in whiteboard.md frontmatter? [YES/NO]
- Every AC has a stable semantic ID? [YES/NO]
- Every AC has a center_link? [YES/NO]
- Requirements stable? [YES/NO]
- Whiteboard comprehensive? [YES/NO]
- Requirements trace to stated intent? [YES/NO]
- Stage purity? [YES/NO]
- Linter: {N} FAIL, {M} FLAG violations
RESULT: [PASS/FAIL]
```
Step 7: Checkpoint 1
Present center, center_test, AC summary with center_links, FLAG violations, open questions.
If user corrects:
- Changes CENTER -> close spec, open new one
- Changes FRAME -> re-launch whiteboard, then requirements
- Changes DETAIL -> Amendment Protocol: find elements by ID, produce replacements, apply, re-lint
Phase 2: Mechanism Loop
Translates intent into a buildable plan. System-specific knowledge enters here — the context module is fully consumed.
Step 0: Greenfield Bootstrap
If no context module exists:
- Select tech stack based on intent + constraints
- Write context module conforming to the Context Module Interface
- Present at Checkpoint 2 for validation
Step 1: Design Roundtable
Panel (fixed roster):
- Robert C. Martin (Uncle Bob) — SOLID principles, clean architecture
- Martin Fowler — refactoring, enterprise patterns, evolutionary design
- Kent Beck — extreme programming, simple design, test-driven development
QUESTION: "Given these requirements, what's the cleanest decomposition, and what could go wrong architecturally?"
Architectural alignment: All design decisions must respect the Dependency Rule (source code dependencies point INWARD). Entities -> Use Cases -> Interface Adapters -> Frameworks/Infrastructure.
Output: System Decomposition (atoms table with da-* IDs), Relationship Map, Behavior Plan, UI Plan, Data Plan, Integration Plan, Verification Strategy.
Step 2: Tasks Roundtable
Panel (fixed roster):
- Robert C. Martin (Uncle Bob) — SOLID principles, clean architecture
- Martin Fowler — refactoring, enterprise patterns, evolutionary design
- John Carmack — systems optimization, shipping discipline, first-principles engineering
QUESTION: "What's the optimal build order, what are the dependency traps, and where will things go wrong?"
Tasks ordered inside-out: entities/domain first, then use cases, then adapters, then framework wiring.
Output: Tasks in element format with t-* IDs, center_links, traces, depends, status. No traceability matrix — traceability encoded in edges.
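The inside-out ordering is, structurally, a topological sort over the `Depends:` edges. A sketch using Kahn's algorithm, assuming tasks shaped as `{ id, depends }` (the `buildOrder` name is illustrative; cycles are thrown rather than silently dropped, mirroring the linter's P6 check):

```javascript
// Derive a build order in which every task appears after all of
// its Depends: targets. Throws if the dependency graph has a cycle.
function buildOrder(tasks) {
  const indegree = new Map(tasks.map(t => [t.id, 0]));
  const dependents = new Map(tasks.map(t => [t.id, []]));
  for (const t of tasks) {
    for (const dep of t.depends) {
      indegree.set(t.id, indegree.get(t.id) + 1);
      dependents.get(dep).push(t.id);
    }
  }
  const queue = tasks.filter(t => indegree.get(t.id) === 0).map(t => t.id);
  const order = [];
  while (queue.length) {
    const id = queue.shift();
    order.push(id);
    for (const next of dependents.get(id)) {
      indegree.set(next, indegree.get(next) - 1);
      if (indegree.get(next) === 0) queue.push(next);
    }
  }
  if (order.length !== tasks.length) throw new Error("circular dependency");
  return order;
}
```

Entities naturally sort first because they depend on nothing; framework wiring sorts last because it depends on everything beneath it.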
Step 3: Lint + Mechanism Loop Self-Check
```
SELF-CHECK: Mechanism Loop
- Center identical across all spec files? [YES/NO]
- Tasks cover entire design? [YES/NO]
- Design traces to every requirement? [YES/NO]
- No orphan tasks? [YES/NO]
- No orphan ACs? [YES/NO]
- No circular dependencies? [YES/NO]
- Linter: {N} FAIL, {M} FLAG violations
RESULT: [PASS/FAIL]
```
Escalation to Intent: If design reveals requirement gap or infeasible requirement — re-enter Phase 1 with specific issue. Max 2 escalations. Third: "Let's talk."
Step 4: Checkpoint 2
Present center, task summary, center_links, key tradeoff, FLAG violations, concerns. Temperature can only go UP from Checkpoint 1.
Phase 3: Execution Loop
Execute the task list. Build -> Verify -> Amend.
Build
For each task in order:
- State intention: One sentence — what and why.
- Execute using context module's construction patterns.
- Mark status as `complete` in tasks.md.
- Append to log.md.
Verify
After each task:
- Run context module's verification patterns.
- Check: does output match design element?
- Run linter — ensure spec files still consistent.
Amend
| Severity | Signal | Action |
|---|---|---|
| Minor | Implementation bug, typo, missing import | Fix immediately, log, continue |
| Moderate | Task needs different approach | Focused re-evaluation (1 expert), amend task, retry |
| Critical | Design element wrong, or AC unmet | Circuit breaker — stop execution, present issue to user |
Backward Reference
At any point, trace the chain:
- tasks.md: Am I on the right task? (find by stable ID)
- design.md: Does my output match the blueprint? (find by da-* ID)
- requirements.md: Is this actually required? (find by ac-* ID)
- whiteboard.md: Does this serve the original intent and the CENTER?
Completion
After all tasks pass verification:
- Run full verification suite.
- Run linter — final validation.
- Fix any remaining issues.
- Present: what was built, center fidelity, discoveries, deferred items, known limitations.
Amendment Protocol
When the user corrects something at a checkpoint:
- Identify affected elements by stable ID.
- Assess scope: Does this change the CENTER? If yes — close spec, open new one.
- Produce REPLACEMENT elements (complete new versions, not patches).
- Apply replacements: Find each element by stable ID, swap entire section.
- Run linter. Fix any violations introduced.
- Present clean state at the next checkpoint.
Amendments are REPLACEMENTS, not patches. The file always reflects current state. Git tracks the history.
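Because elements are whole sections under stable-ID headings, a replacement is a bounded text swap. A minimal sketch, assuming a section runs from its `### {id}:` heading to the next `### ` heading (the `replaceElement` name is illustrative; file I/O and the surrounding checkpoint flow are omitted):

```javascript
// Swap the entire section for one element, identified by its
// stable ID, with a complete replacement element.
function replaceElement(markdown, id, replacement) {
  const lines = markdown.split("\n");
  const start = lines.findIndex(l => l.startsWith(`### ${id}:`));
  if (start === -1) throw new Error(`element not found: ${id}`);
  let end = lines.findIndex((l, i) => i > start && l.startsWith("### "));
  if (end === -1) end = lines.length;
  return [...lines.slice(0, start), replacement, ...lines.slice(end)].join("\n");
}
```

Nothing outside the section moves, which is why neighboring elements keep their IDs and their diffs stay clean in git.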
Checkpoint Patterns
| Temperature | When | Pattern | |---|---|---| | Cool | High confidence, familiar | "[3-sentence summary]. I'm assuming [key assumptions]. Looks right?" | | Warm | Specific uncertainty | "[Summary]. I'm unsure about [X]. What do you think?" | | Hot | Contradiction or blocker | "I found [issue]. We need to resolve this before continuing." |
Temperature only goes UP within a feature, never down.
Archive Protocol
After completion:
- Archive the spec folder to `.specs/archive/{feature-name}/`
- Tag with archetypes based on forces resolved (structural patterns, not domain categories)
- Save the conversation log to `.specs/archive/{feature-name}/session-raw.jsonl`
Archetype Vocabulary
| Tag | Description |
|-----|-------------|
| separation-of-concerns | Disentangling responsibilities across boundaries |
| pipeline-decomposition | Breaking a monolithic flow into composable stages |
| state-machine-evolution | Adding/changing states, transitions, or lifecycle |
| evaluation-infrastructure | Building measurement, scoring, or quality assessment |
| modality-addition | Adding a new interaction type or output format |
| surface-redesign | Changing what users see without changing underlying logic |
| integration-wiring | Connecting existing components or external services |
| schema-evolution | Adding/changing data models, migrations, persistence |
| quality-gate | Adding validation, linting, or enforcement mechanisms |
| knowledge-representation | Changing how concepts, relationships, or mastery are modeled |
Tags describe FORCES, not domains. The vocabulary grows organically.
Structural Linter
Deterministic spec integrity checker. Run after every phase.
FAIL checks (P1-P9) — must fix before proceeding:
| Check | Description |
|-------|-------------|
| P1 | Every ID in Traces: fields exists |
| P2 | Every ID in Depends: fields exists |
| P3 | Every ac-* is traced to by at least one t-* |
| P4 | Every t-* has at least one Traces: target |
| P5 | No duplicate IDs |
| P6 | No circular dependencies |
| P7 | Center identical across all spec files |
| P8 | No removed term appears as active heading (fossil detection) |
| P9 | center_test must exist if center is declared |
FLAG checks (F1-F6) — surfaced for human review:
| Check | Description | |-------|-------------| | F1 | Center_link shorter than 8 words | | F2 | Center_link with indirect language (enables/allows/supports without naming mechanism) | | F4 | Task with > 5 traces (consider splitting) | | F5 | Missing center_link | | F6 | center_test components shorter than 10 words |
Express mode relaxes F1, F2, and F5 (center_links not required at that scale).
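To make the FAIL checks concrete, here is a sketch of two of them in the spirit of lint-spec.mjs (the actual implementation may differ; the `lintSpec` name, input shape, and violation format are assumptions):

```javascript
// P5: no duplicate IDs. P1: every Traces: target resolves to a
// known ID. Elements are assumed to carry { id, traces } fields.
function lintSpec(elements) {
  const violations = [];
  const seen = new Set();
  for (const el of elements) {
    if (seen.has(el.id)) violations.push({ check: "P5", id: el.id });
    seen.add(el.id);
  }
  for (const el of elements) {
    for (const target of el.traces ?? []) {
      if (!seen.has(target)) violations.push({ check: "P1", id: el.id, target });
    }
  }
  return violations;
}
```

Both checks are deterministic set lookups, which is what lets the linter run cheaply after every phase.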
Context Module Interface
A context module teaches the protocol about a specific system. The protocol contains ZERO system-specific knowledge.
Required Sections
- System Identity — What is this system, what tech stack, what's its purpose?
- Discovery Protocol — How does the agent learn what already exists?
Optional Sections
- Decomposition Patterns — How does this system break features into parts?
- Construction Patterns — How are parts built? (skills, generators, CLI tools)
- Verification Patterns — How do you confirm something works?
- Conventions — Naming, file structure, architectural rules
- Known Traps — Things that commonly go wrong
| Sections present | Protocol behavior |
|---|---|
| All 7 | Full autonomy |
| 4-6 | Moderate autonomy — flags uncertainty |
| 2-3 | Guided mode — asks targeted questions |
| 1 (System Identity only) | Interview mode |
Reflection Protocol
After execution completes:
Protocol-Level Observations (append to OBSERVATIONS.md)
```markdown
### [{date}] {type: process-insight | process-failure}

- **What happened**: {description}
- **Evidence**: {what you observed}
- **Proposed change**: {how the protocol should adapt}
```
Context-Level Observations (update context module)
- New known trap -> append automatically
- Convention or pattern change -> propose to user
What NOT to Observe
- One-off bugs (just fix them)
- User preferences already in spec files
- Anything that doesn't generalize beyond this single feature