Build the durable kernel of the agent reflection loop. Passive end-of-run capture of the doer's end-state as structured `reflection.v1` data, plus a deterministic diff review risk-floor. The closed calibration/skill-synthesis loop (design §7–§8) stays gated behind Phase-0 experiments P1/P2/P3.

- packages/macp: evaluateRiskFloor (pure, deterministic surface classifier) + reflection.v1 JSON Schema; 15 unit tests.
- packages/types: reflection.v1 zod schemas + self-report DTO; 10 unit tests.
- framework: fail-closed Stop hook (reflect-stop-hook.sh) writing the sidecar, registered as hooks.Stop in runtime/claude/settings.json. Strict no-op unless REFLECTION_MODE=solo|orchestrated; never blocks or fails a session.
- scripts/analysis: P1/P2/P3 experiment harnesses with pre-registered kill conditions and structured output.

Mechanical fields (risk, files_changed, ids, provenance) are written by the hook; self-report fields (confidence, most_likely_wrong, known_not_in_diff) are merged from an optional $REFLECTION_INPUT, else null + provenance.degraded=true.

Independent review remediations: empty/all-.mosaic diff still writes a sidecar (grep no-match no longer aborts); session_id sanitized before path use.

Refs #544
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PRD — Agent Reflection Loop (durable kernel)
Issue: #544
Source design: jarvis-brain docs/planning/AGENT-REFLECTION-LOOP.md (commit df6576fc, debate-hardened v2)
Status: in-progress
Scope rule: Build the durable kernel only. The closed calibration/skill-synthesis loop
(design §7–§8) is gated behind Phase-0 experiments P1/P2/P3 and is explicitly out of scope here.
1. Problem
At end-of-run an agent holds context that never reaches the diff or the "done" message —
assumptions, shortcuts, untested paths, the single most-likely way the work is wrong. That context
is what a lead/human needs to judge trust, and it evaporates when the session ends. Capture it
mechanically as structured data (reflection.v1), and derive a review risk-floor from the
change surface so risky diffs are flagged for independent review.
2. Non-goals (gated on Phase-0)
- No closed calibration loop (predicted-vs-actual scoring as a routing input).
- No skill synthesis.
- No automated reviewer routing/dispatch. The kernel writes the sidecar; pickup is future work.
3. Components & exact placement (main-branch truth)
| # | Component | Path | Mirror |
|---|---|---|---|
| a | Stop hook (capture) | `packages/mosaic/framework/tools/qa/reflect-stop-hook.sh` | `tools/qa/prevent-memory-write.sh` |
| a | Hook registration | `packages/mosaic/framework/runtime/claude/settings.json` (`hooks.Stop`) | existing `PreToolUse`/`PostToolUse` |
| b | JSON Schema | `packages/macp/src/schemas/reflection.v1.schema.json` | `schemas/task.schema.json` |
| b | TS types (zod) + DTO | `packages/types/src/reflection/{index.ts,reflection.dto.ts}` + re-export from `src/index.ts` | `packages/types/src/federation/*` |
| c | Diff risk-floor | `packages/macp/src/risk-floor.ts` (+ `__tests__/risk-floor.test.ts`, export from `src/index.ts`) | `packages/macp/src/gate-runner.ts` |
| d | Phase-0 scripts | `scripts/analysis/reflect-{git-history,board-history,calibration}.sh` | `scripts/publish-npmjs.sh` |
Activation note (deliberate deviation): the settings-overlays/ directory has no merge
mechanism (referenced only in docs), so a hooks overlay there would be inert. The Stop hook is
registered in the canonical runtime/claude/settings.json — the same file the mosaic launcher
reflects into ~/.claude/settings.json (verified byte-identical hooks live there). Still fully
vendored in-repo.
4. reflection.v1 schema (authoritative field list)
{
"schema": "reflection.v1", // literal
"task_ref": "string", // canonical task ref; kernel derives from REFLECTION_TASK_REF or repo+branch
"agent": "string", // persona/runtime id (REFLECTION_AGENT or "unknown")
"session_id": "string", // from Stop payload session_id, else "unknown"
"timestamp": "string", // ISO-8601 UTC
"repo": "string", // repo root basename
"confidence": 0.0, // FLOAT [0,1] — SELF-REPORTED (optional; null if not supplied)
"most_likely_wrong": {
// SELF-REPORTED (optional)
"surface": "auth|data|infra|ui|build|test|docs|none",
"description": "string",
},
"known_not_in_diff": "string|null", // SELF-REPORTED: "what I know that isn't visible in the diff"
"risk": {
// MECHANICAL — from risk-floor
"needs_review": true,
"score": 0.0, // [0,1]
"surface": "auth|data|infra|ui|build|test|docs|none",
"reason": "string",
},
"files_changed": ["string"], // MECHANICAL — git diff name-only
"provenance": {
"source": "stop-hook",
"reflection_attempt": 1,
"degraded": false, // true if self-report inputs missing/unreadable
"reflection_mode": "off|solo|orchestrated",
},
}
Mechanical vs self-reported. A bash Stop hook cannot author the agent's self-assessment. The
hook populates the mechanical fields deterministically (risk, files_changed, provenance, ids).
The self-reported fields are read from an optional agent-supplied input file
($REFLECTION_INPUT, default <repo>/.mosaic/reflection-input.json) and merged if present;
absent/unreadable → those fields null and provenance.degraded=true. This realizes the design's
"hook is a pre-seed, not the asker" (§4).
5. Stop hook behavior (fail-closed, non-blocking)
- Read Stop payload JSON from stdin.
- Fail-closed: if `REFLECTION_MODE` is unset or `off` → `exit 0` immediately (strict no-op). This is the global-registration safety guarantee.
- Sentinel guard: if `<sidecar>.lock` exists → `exit 0` (prevents re-fire loops). Create it, `trap` cleanup.
- Determine output dir: `$REFLECTION_DIR` else `<repo>/.mosaic/reflections/`. `mkdir -p`.
- Compute mechanical fields: `git diff --name-only` (HEAD + staged + worktree, best-effort), call risk-floor logic (inline bash port OR `node -e` into `@mosaicstack/macp`; see §6), session ids from payload + env.
- Merge optional `$REFLECTION_INPUT` self-report if readable JSON.
- Write `reflection.v1` to a temp file, `mv` (atomic) to `<dir>/<session>-<ts>.reflection.json`.
- Always `exit 0`. Never emit a `decision` field (Stop hooks are observational).
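The mode guard and sidecar naming steps above can be sketched as pure functions. This is a minimal TypeScript illustration only (the hook itself is bash). `resolveMode`, `sanitizeSessionId`, and `sidecarPath` are hypothetical names, and the allowed-character set, the 64-character cap, and the compact timestamp format are assumptions that this PRD does not fix.

```typescript
// Hypothetical helper names; the real hook is a bash script.
type ReflectionMode = 'off' | 'solo' | 'orchestrated';

// Anything other than the two active modes is treated as "off" (strict no-op).
function resolveMode(raw: string | undefined): ReflectionMode {
  return raw === 'solo' || raw === 'orchestrated' ? raw : 'off';
}

// Assumption: keep only [A-Za-z0-9_-] so the id cannot escape the output dir.
function sanitizeSessionId(raw: unknown): string {
  if (typeof raw !== 'string') return 'unknown';
  const cleaned = raw.replace(/[^A-Za-z0-9_-]/g, '_').slice(0, 64);
  return cleaned.length > 0 ? cleaned : 'unknown';
}

// Builds <dir>/<session>-<ts>.reflection.json with a compact UTC timestamp.
function sidecarPath(dir: string, sessionId: unknown, now: Date): string {
  const ts = now.toISOString().replace(/[-:]/g, '').replace(/\.\d{3}Z$/, 'Z');
  const base = dir.replace(/\/+$/, '');
  return `${base}/${sanitizeSessionId(sessionId)}-${ts}.reflection.json`;
}
```

Keeping these steps free of IO makes the no-op and path-safety guarantees easy to unit test separately from the shell script.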
Hook must never fail the session: wrap risky steps, default to degraded:true on any error, exit 0.
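The self-report merge and the degraded default can be illustrated with a short TypeScript sketch. `SelfReport`, `Provenance`, `parseSelfReport`, and `mergeSelfReport` are hypothetical names chosen for this example. It assumes that only a missing or unparseable input marks the record degraded, and that individual invalid fields simply fall back to null.

```typescript
// Illustrative sketch only: type and function names here are hypothetical.
type ReviewSurface = 'auth' | 'data' | 'infra' | 'ui' | 'build' | 'test' | 'docs' | 'none';

interface SelfReport {
  confidence: number | null;
  most_likely_wrong: { surface: ReviewSurface; description: string } | null;
  known_not_in_diff: string | null;
}

interface Provenance {
  source: 'stop-hook';
  reflection_attempt: number;
  degraded: boolean;
  reflection_mode: 'off' | 'solo' | 'orchestrated';
}

const SURFACES: readonly string[] = ['auth', 'data', 'infra', 'ui', 'build', 'test', 'docs', 'none'];
const EMPTY_REPORT: SelfReport = { confidence: null, most_likely_wrong: null, known_not_in_diff: null };

// Returns null when the input is absent or not a JSON object.
function parseSelfReport(raw: string | null): SelfReport | null {
  if (raw === null) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) return null;
  const obj = parsed as Record<string, unknown>;

  const c = obj['confidence'];
  const confidence = typeof c === 'number' && c >= 0 && c <= 1 ? c : null;

  const m = obj['most_likely_wrong'];
  let mostLikelyWrong: SelfReport['most_likely_wrong'] = null;
  if (typeof m === 'object' && m !== null) {
    const { surface, description } = m as Record<string, unknown>;
    if (typeof surface === 'string' && SURFACES.includes(surface) && typeof description === 'string') {
      mostLikelyWrong = { surface: surface as ReviewSurface, description };
    }
  }

  const k = obj['known_not_in_diff'];
  return { confidence, most_likely_wrong: mostLikelyWrong, known_not_in_diff: typeof k === 'string' ? k : null };
}

// Mechanical fields always survive; self-report fields are null when input is unusable.
function mergeSelfReport<M extends { provenance: Provenance }>(mechanical: M, rawInput: string | null): M & SelfReport {
  const report = parseSelfReport(rawInput);
  return {
    ...mechanical,
    ...(report ?? EMPTY_REPORT),
    provenance: { ...mechanical.provenance, degraded: mechanical.provenance.degraded || report === null },
  };
}
```

An already-degraded record (for example, one produced by the inline classifier fallback) stays degraded even when the self-report parses cleanly.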
6. Risk-floor (packages/macp/src/risk-floor.ts)
Pure, deterministic, no IO. Single source of truth for the verdict; the hook calls it via
node --input-type=module -e (importing the built package) or, to avoid a node dependency in the
hook path, the hook ports the same surface table. Decision: implement the canonical logic in TS
(tested), and have the hook shell out to node when available, else fall back to a minimal inline
classifier flagged degraded:true. (Keep the TS the authority; the inline path is a safety net.)
export type ReviewSurface = 'auth' | 'data' | 'infra' | 'ui' | 'build' | 'test' | 'docs' | 'none';
export interface RiskFloorInput {
filesChanged: string[];
insertions?: number;
deletions?: number;
}
export interface RiskFloorVerdict {
needs_review: boolean;
score: number;
surface: ReviewSurface;
reason: string;
}
export function evaluateRiskFloor(input: RiskFloorInput): RiskFloorVerdict;
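Surface classification by path regex. Patterns are checked in the order below (descending weight), so the first match for a file is also its highest-risk surface; across files, the highest-risk surface dominates:
- `auth` (weight 1.0): `auth`, `login`, `session`, `token`, `permission`, `rbac`, `credential`, `secret`
- `data` (0.9): `migration`, `prisma`, `schema`, `\.sql`, `entity`, `repository`, `seed`
- `infra` (0.85): `docker`, `\.woodpecker`, `compose`, `traefik`, `deploy`, `helm`, `k8s`, `terraform`
- `build` (0.6): `package.json`, `tsconfig`, `turbo.json`, `pnpm-`, `\.config\.`, `eslint`, `vite`
- `ui` (0.4): `\.tsx`, `\.css`, `components/`, `apps/web/`
- `test` (0.2): `\.spec\.`, `\.test\.`, `__tests__/`
- `docs` (0.1): `\.md`, `docs/`
- `none` (0.0): anything else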
needs_review = score >= THRESHOLD (default 0.5, overridable). reason names the files+surface
that tripped it. Subordinate to CI: this is a floor (minimum review requirement) only;
consumers MUST treat CI/tests as authoritative above the floor (precedence: CI/tests > human merge >
reviewer verdict > self-reflection). Documented in the module header.
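A minimal sketch of how the surface table and threshold could fit together follows. It is not the shipped `risk-floor.ts`: the `SURFACE_RULES` and `classify` names, the case-insensitive matching, the optional `threshold` parameter, and the exact `reason` wording are assumptions made for illustration, and `insertions`/`deletions` are accepted but unused here.

```typescript
type ReviewSurface = 'auth' | 'data' | 'infra' | 'ui' | 'build' | 'test' | 'docs' | 'none';

interface RiskFloorInput {
  filesChanged: string[];
  insertions?: number;
  deletions?: number;
}

interface RiskFloorVerdict {
  needs_review: boolean;
  score: number;
  surface: ReviewSurface;
  reason: string;
}

interface SurfaceRule {
  surface: ReviewSurface;
  weight: number;
  pattern: RegExp;
}

// Ordered by descending weight, so the first matching rule is also the riskiest.
const SURFACE_RULES: readonly SurfaceRule[] = [
  { surface: 'auth', weight: 1.0, pattern: /auth|login|session|token|permission|rbac|credential|secret/i },
  { surface: 'data', weight: 0.9, pattern: /migration|prisma|schema|\.sql|entity|repository|seed/i },
  { surface: 'infra', weight: 0.85, pattern: /docker|\.woodpecker|compose|traefik|deploy|helm|k8s|terraform/i },
  { surface: 'build', weight: 0.6, pattern: /package\.json|tsconfig|turbo\.json|pnpm-|\.config\.|eslint|vite/i },
  { surface: 'ui', weight: 0.4, pattern: /\.tsx|\.css|components\/|apps\/web\//i },
  { surface: 'test', weight: 0.2, pattern: /\.spec\.|\.test\.|__tests__\//i },
  { surface: 'docs', weight: 0.1, pattern: /\.md|docs\//i },
];

// First match wins for a single path; undefined means surface "none".
function classify(path: string): SurfaceRule | undefined {
  return SURFACE_RULES.find((rule) => rule.pattern.test(path));
}

// Pure and deterministic: the verdict depends only on the input and threshold.
function evaluateRiskFloor(input: RiskFloorInput, threshold = 0.5): RiskFloorVerdict {
  let top: SurfaceRule | undefined;
  const hits: string[] = [];
  for (const file of input.filesChanged) {
    const rule = classify(file);
    if (!rule) continue;
    if (!top || rule.weight > top.weight) {
      top = rule;
      hits.length = 0; // a riskier surface replaces the earlier hit list
    }
    if (rule.surface === top.surface) hits.push(file);
  }
  if (!top) {
    return { needs_review: false, score: 0, surface: 'none', reason: 'no classified surface in diff' };
  }
  return {
    needs_review: top.weight >= threshold,
    score: top.weight,
    surface: top.surface,
    reason: `${top.surface} surface touched by ${hits.join(', ')}`,
  };
}
```

With the default threshold of 0.5, the `auth`, `data`, `infra`, and `build` surfaces trip the floor under this table, while `ui`, `test`, and `docs` do not.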
7. Phase-0 experiment scripts (scripts/analysis/)
Offline, no-infra bash. Each script: #!/usr/bin/env bash, set -euo pipefail, header Usage: +
Requirements:, flag parsing, prints its pre-registered kill condition, emits structured
(JSON/markdown) output. They are harnesses + rubrics — real corpora are wired later.
- `reflect-git-history.sh` (P2, only-self-reflection bucket): scan `git log` for failure signals (reverts, `fix:`/`hotfix` shortly after a feature merge) over a window; classify each by which gate would catch it (CI / human-review / only-self-reflection) via a pre-registered heuristic; tally. Kill: bucket-3 near-empty → no §7/§8.
- `reflect-board-history.sh` (P3, outcome detectability): given a task/board export (or the git history of `data/` task files), measure the fraction of completed tasks with a machine-detectable correct/wrong signal within 30 days. Kill: base-rate < 20% → caveat-notes only.
- `reflect-calibration.sh` (P1, confidence signal): consume a labeled corpus (JSONL of `{confidence, correct}`), compute discrimination (AUC/lift) on the self-rated-high subset, print the metric vs the pre-registered chance threshold. Kill: AUC ≈ chance on the high subset → no §7/§8.
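The P1 discrimination metric can be made concrete with a small worked sketch. The harness itself is bash; this TypeScript version only illustrates the arithmetic. `parseJsonl`, `auc`, and `highSubsetAuc` are hypothetical names, the 0.7 cutoff for "self-rated-high" is an assumption, and the pre-registered chance threshold is left to the caller.

```typescript
interface Sample {
  confidence: number;
  correct: boolean;
}

// Parses JSONL of {confidence, correct}; malformed lines are skipped.
function parseJsonl(text: string): Sample[] {
  const out: Sample[] = [];
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '') continue;
    let row: unknown;
    try {
      row = JSON.parse(trimmed);
    } catch {
      continue;
    }
    if (typeof row !== 'object' || row === null) continue;
    const { confidence, correct } = row as Record<string, unknown>;
    if (typeof confidence === 'number' && typeof correct === 'boolean') {
      out.push({ confidence, correct });
    }
  }
  return out;
}

// AUC as the probability that a correct sample outranks an incorrect one
// (pairwise Mann-Whitney form, ties count 0.5). Null if either class is empty.
function auc(samples: Sample[]): number | null {
  const pos = samples.filter((s) => s.correct).map((s) => s.confidence);
  const neg = samples.filter((s) => !s.correct).map((s) => s.confidence);
  if (pos.length === 0 || neg.length === 0) return null;
  let wins = 0;
  for (const p of pos) {
    for (const n of neg) {
      wins += p > n ? 1 : p === n ? 0.5 : 0;
    }
  }
  return wins / (pos.length * neg.length);
}

// Restricts the metric to the self-rated-high subset (cutoff is an assumption).
function highSubsetAuc(samples: Sample[], highCutoff = 0.7): number | null {
  return auc(samples.filter((s) => s.confidence >= highCutoff));
}
```

An AUC of 0.5 means confidence carries no ranking information about correctness, which is the "chance" level the kill condition refers to.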
8. CI / quality gates
- TS packages: `pnpm typecheck` (tsc --noEmit), `pnpm lint` (eslint), `pnpm format:check` (prettier), `pnpm test` (vitest). ESM, NodeNext, `.js` import specifiers, `*.dto.ts` at boundaries.
- New files in existing packages need no CI config change; add ≥1 vitest spec per new TS module.
- Bash scripts/hook are dev/runtime tooling, not CI-built; keep them `shellcheck`-clean.
9. Acceptance criteria
- `REFLECTION_MODE` unset → hook is a strict no-op (exit 0, no file written). (test)
- With `REFLECTION_MODE=solo`, hook writes a schema-valid `reflection.v1` with correct mechanical fields; self-report merged when `$REFLECTION_INPUT` present, `degraded:true` when absent.
- `evaluateRiskFloor` deterministic across all surfaces; unit-tested incl. auth/data/infra → review, docs/test → no review, empty → `none`/no review.
- `reflection.v1` zod type + JSON Schema agree; sidecar validates against the schema.
- Phase-0 scripts run offline, print kill conditions, emit structured output, shellcheck-clean.
- `pnpm typecheck && pnpm lint && pnpm format:check && pnpm test` green; independent review passed.