544: Agent Reflection Loop — durable kernel
Issue: #544
PRD: docs/plans/agent-reflection-loop-PRD.md
Branch: feat/agent-reflection-loop
Context
Build the durable kernel of the agent reflection loop: passive end-of-run
capture of the doer's end-state as structured reflection.v1 data, plus a
deterministic diff review risk-floor. The closed calibration / skill-synthesis
loop (design §7–§8) stays gated behind Phase-0 experiments P1/P2/P3 and is
explicitly out of scope here. Source design: jarvis-brain
docs/planning/AGENT-REFLECTION-LOOP.md (debate-hardened v2).
Scope rule, non-goals, the full reflection.v1 field list, and acceptance
criteria live in the PRD. This file is the task breakdown + status.
Work items
| # | Item | Path | Status |
|---|---|---|---|
| 1 | Diff risk-floor (pure, deterministic) + unit tests | packages/macp/src/risk-floor.ts, risk-floor.spec.ts | done |
| 2 | reflection.v1 JSON Schema (documented contract) | packages/macp/src/schemas/reflection.v1.schema.json | done |
| 3 | reflection.v1 zod schemas + self-report DTO + tests | packages/types/src/reflection/* | done |
| 4 | Stop hook (fail-closed capture) | packages/mosaic/framework/tools/qa/reflect-stop-hook.sh | done |
| 5 | Hook registration (hooks.Stop) | packages/mosaic/framework/runtime/claude/settings.json | done |
| 6 | Phase-0 experiment harnesses (P1/P2/P3) | scripts/analysis/reflect-*.sh | done |
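As a rough picture of what "pure, deterministic" means for work item 1, the sketch below maps a list of changed paths to a single verdict using a small lookup table. It is not the repo's evaluateRiskFloor: the function name sketchRiskFloor, the three levels, the patterns in the table, and the "medium" default for unmatched files are all assumptions made for illustration. The only rule taken from this file is that paths under .mosaic/ are left out of the change surface.

```typescript
// Hypothetical sketch only. The real surface table lives in
// packages/macp/src/risk-floor.ts; levels and patterns here are invented.
type RiskLevel = "low" | "medium" | "high";

interface SurfaceRule {
  pattern: RegExp;
  level: RiskLevel;
}

// Illustrative surface table: first matching rule wins.
const SURFACE_TABLE: readonly SurfaceRule[] = [
  { pattern: /(^|\/)migrations\//, level: "high" },
  { pattern: /\.(sh|bash)$/, level: "medium" },
  { pattern: /\.md$/, level: "low" },
];

const ORDER: Record<RiskLevel, number> = { low: 0, medium: 1, high: 2 };

// Agent scratch is not part of the diff under review.
const isScratch = (path: string): boolean => path.startsWith(".mosaic/");

// Pure function: same input list always yields the same verdict,
// with no filesystem, clock, or network access.
function sketchRiskFloor(filesChanged: readonly string[]): RiskLevel {
  let floor: RiskLevel = "low";
  for (const file of filesChanged) {
    if (isScratch(file)) continue;
    const rule = SURFACE_TABLE.find((r) => r.pattern.test(file));
    // Assumption: files matching no rule are treated as "medium".
    const level: RiskLevel = rule ? rule.level : "medium";
    if (ORDER[level] > ORDER[floor]) floor = level;
  }
  return floor;
}
```

Taking the highest level across all changed files means a single sensitive path sets the floor for the whole diff, which is one plausible reading of "risk-floor"; the PRD remains the authority on the actual semantics.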
Design decisions (this implementation)
- Mechanical vs self-reported split. A bash Stop hook cannot author the agent's self-assessment, so it writes the mechanical fields (risk-floor verdict, files_changed, ids, provenance) and merges an optional agent-supplied $REFLECTION_INPUT self-report; absent/unreadable ⇒ those fields null and provenance.degraded = true.
- Risk-floor authority. evaluateRiskFloor (TS, tested) is the source of truth. The hook ports the same surface table inline to avoid a node/build dependency on the hook path; the two are documented as kept in sync.
- Hook registration deviation. settings-overlays/ has no merge mechanism (docs-only), so a hooks overlay there would be inert. The Stop hook is registered in the canonical runtime/claude/settings.json, the same file the mosaic launcher reflects into ~/.claude/settings.json. Still vendored in-repo.
- DTO without class-transformer. reflection.dto.ts uses class-validator only (no @Type), matching chat.dto.ts, so the module imports without a reflect-metadata shim in the types-package test env. Deep nested validation is owned by the zod ReflectionSelfReportSchema (the runtime authority the hook uses).
- .mosaic/ excluded from the change surface: it is agent scratch (reflections, locks, self-report input), not part of the diff under review.
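The mechanical vs self-reported split can be sketched as a small merge function. This is a dependency-free illustration, not the hook's code: the field names run_id, risk_floor, summary, and confidence, the provenance.source value, and both function names are hypothetical (the full reflection.v1 field list lives in the PRD). It also assumes the caller has already read the $REFLECTION_INPUT contents and passes undefined when the input is absent or unreadable, and it uses a hand-written shape check where the real implementation relies on the zod ReflectionSelfReportSchema.

```typescript
// Hypothetical shapes: only files_changed and provenance.degraded are
// names taken from this document; everything else is illustrative.
interface MechanicalFields {
  run_id: string;
  risk_floor: string;
  files_changed: string[];
}

interface SelfReport {
  summary: string;
  confidence: number;
}

interface ReflectionRecord extends MechanicalFields {
  summary: string | null;
  confidence: number | null;
  provenance: { source: string; degraded: boolean };
}

// Returns a validated self-report, or null if the input is missing,
// not valid JSON, or not of the expected shape.
function parseSelfReport(raw: string | undefined): SelfReport | null {
  if (raw === undefined) return null;
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data !== "object" || data === null) return null;
    const { summary, confidence } = data as Record<string, unknown>;
    if (typeof summary !== "string" || typeof confidence !== "number") {
      return null;
    }
    return { summary, confidence };
  } catch {
    return null;
  }
}

// Mechanical fields are always written; self-reported fields fall back
// to null and the record is flagged as degraded.
function buildReflection(
  mechanical: MechanicalFields,
  rawSelfReport: string | undefined,
): ReflectionRecord {
  const report = parseSelfReport(rawSelfReport);
  return {
    ...mechanical,
    summary: report ? report.summary : null,
    confidence: report ? report.confidence : null,
    provenance: { source: "stop-hook", degraded: report === null },
  };
}
```

With this shape, a solo run without a self-report still produces a complete record, and downstream consumers can filter on provenance.degraded instead of guessing whether a null was intentional.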
Verification
- pnpm --filter @mosaicstack/macp test → 88 passed (15 new risk-floor).
- pnpm --filter @mosaicstack/types test → 64 passed (10 new reflection).
- Root pnpm typecheck, pnpm lint, pnpm format:check, pnpm build → green.
- Stop hook smoke: fail-closed no-op (mode unset), solo capture (degraded), self-report merge (degraded=false), re-fire lock guard; all pass.
- All bash (hook + 3 Phase-0 scripts) shellcheck-clean; Phase-0 scripts emit structured JSON/markdown and print their pre-registered kill conditions.
Activation (post-merge, deployment concern — not a blocker)
The Stop hook only activates when a launcher/profile sets
REFLECTION_MODE=solo|orchestrated; unset/off is a strict no-op, so global
registration is safe. framework/install.sh rsyncs the hook into
~/.config/mosaic/tools/qa/, and the mosaic launcher reflects the updated
settings.json (hooks.Stop) into ~/.claude/settings.json.
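The activation gate described above amounts to a strict allow-list on REFLECTION_MODE. The actual gate lives in the bash hook; the sketch below restates the same rule in TypeScript for clarity. The function name resolveReflectionMode is hypothetical, and treating any unrecognised value (including different casing) as a no-op is an assumption consistent with the fail-closed behaviour, not something this file states.

```typescript
type ReflectionMode = "solo" | "orchestrated";

// Returns the active mode, or null when the hook should be a strict no-op.
// The environment is passed in so the function stays pure and testable.
function resolveReflectionMode(
  env: Record<string, string | undefined>,
): ReflectionMode | null {
  const raw = env["REFLECTION_MODE"];
  if (raw === "solo" || raw === "orchestrated") return raw;
  // Unset, "off", or (by assumption) any other value: do nothing.
  return null;
}
```

Because only two exact values enable capture, registering the hook globally changes nothing for sessions whose launcher or profile never sets the variable.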