Build the durable kernel of the agent reflection loop: passive end-of-run
capture of the doer's end state as structured `reflection.v1` data, plus a
deterministic diff-review risk floor. The closed calibration/skill-synthesis
loop (design §7–§8) stays gated behind Phase-0 experiments P1/P2/P3.
- packages/macp: evaluateRiskFloor (pure, deterministic surface classifier)
  and the reflection.v1 JSON Schema; 15 unit tests.
- packages/types: reflection.v1 zod schemas + self-report DTO; 10 unit tests.
- framework: fail-closed Stop hook (reflect-stop-hook.sh) writing the sidecar,
  registered as hooks.Stop in runtime/claude/settings.json. Strict no-op unless
  REFLECTION_MODE=solo|orchestrated; never blocks or fails a session.
- scripts/analysis: P1/P2/P3 experiment harnesses with pre-registered kill
  conditions and structured output.
Mechanical fields (risk, files_changed, ids, provenance) are written by the
hook; self-report fields (confidence, most_likely_wrong, known_not_in_diff) are
merged from an optional $REFLECTION_INPUT. When that input is absent, they are
written as null and provenance.degraded is set to true.
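
The merge rule above can be sketched roughly as follows. This is an
illustration, not the hook's actual code: the type shapes, the field types,
the `mergeReflection` name, and the choice to treat unparseable input the same
as missing input are all assumptions. Only the field names come from this
message.

```typescript
// Hypothetical shapes: only the field names are taken from the commit message.
type SelfReport = {
  confidence: number | null;
  most_likely_wrong: string | null;
  known_not_in_diff: string | null;
};

type Mechanical = {
  risk: string;
  files_changed: string[];
  session_id: string;
};

type Reflection = Mechanical & SelfReport & { provenance: { degraded: boolean } };

const asNumber = (v: unknown): number | null => (typeof v === "number" ? v : null);
const asString = (v: unknown): string | null => (typeof v === "string" ? v : null);

// rawInput stands in for the contents of $REFLECTION_INPUT (undefined if unset).
function mergeReflection(mechanical: Mechanical, rawInput?: string): Reflection {
  let report: SelfReport = {
    confidence: null,
    most_likely_wrong: null,
    known_not_in_diff: null,
  };
  let degraded = true;

  if (rawInput !== undefined) {
    try {
      const parsed: unknown = JSON.parse(rawInput);
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        const p = parsed as Record<string, unknown>;
        report = {
          confidence: asNumber(p["confidence"]),
          most_likely_wrong: asString(p["most_likely_wrong"]),
          known_not_in_diff: asString(p["known_not_in_diff"]),
        };
        degraded = false;
      }
    } catch {
      // Assumption: malformed input is handled like missing input (null fields,
      // degraded stays true) instead of failing the session.
    }
  }

  return { ...mechanical, ...report, provenance: { degraded } };
}
```

Keeping the mechanical fields in a separate argument means the self-report
can never overwrite them, whatever the input contains.
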
Remediations from independent review: an empty or all-.mosaic diff still writes
a sidecar (a grep no-match no longer aborts the hook); session_id is sanitized
before it is used in a path.
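
One way to sanitize a session id before using it in a path is an allow-list of
safe characters. The sketch below is illustrative only: the
`sanitizeSessionId` name, the allowed character set, the 128-character cap,
and the `"unknown"` fallback are assumptions, and the hook itself is a shell
script whose exact rule is not described here.

```typescript
// Hypothetical helper: keeps only [A-Za-z0-9_-], so path separators and dots
// (and therefore "../" traversal) cannot survive into a file name.
function sanitizeSessionId(raw: string): string {
  const cleaned = raw.replace(/[^A-Za-z0-9_-]/g, "_").slice(0, 128);
  // Assumption: an empty id falls back to a fixed placeholder.
  return cleaned.length > 0 ? cleaned : "unknown";
}
```

An allow-list is used here instead of stripping known-bad sequences, so
characters nobody anticipated are rejected by default.
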
Refs #544
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Break PRD into 8 milestones (Phase 0–7) with 59 issues on Gitea.
Populate TASKS.md, update mission manifest, initialize scratchpad.
Repo created at git.mosaicstack.dev/mosaic/mosaic-stack.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>