feat(agent-reflection): durable kernel — reflection.v1 capture + risk-floor + Phase-0 (#544)

Build the durable kernel of the agent reflection loop. Passive end-of-run capture of the doer's end-state as structured `reflection.v1` data, plus a deterministic diff review risk-floor. The closed calibration/skill-synthesis loop (design §7–§8) stays gated behind Phase-0 experiments P1/P2/P3.

- packages/macp: evaluateRiskFloor (pure, deterministic surface classifier) + reflection.v1 JSON Schema; 15 unit tests.
- packages/types: reflection.v1 zod schemas + self-report DTO; 10 unit tests.
- framework: fail-closed Stop hook (reflect-stop-hook.sh) writing the sidecar, registered as hooks.Stop in runtime/claude/settings.json. Strict no-op unless REFLECTION_MODE=solo|orchestrated; never blocks or fails a session.
- scripts/analysis: P1/P2/P3 experiment harnesses with pre-registered kill conditions and structured output.

Mechanical fields (risk, files_changed, ids, provenance) are written by the hook; self-report fields (confidence, most_likely_wrong, known_not_in_diff) are merged from an optional $REFLECTION_INPUT, else null + provenance.degraded=true.

Independent review remediations: empty/all-.mosaic diff still writes a sidecar (grep no-match no longer aborts); session_id sanitized before path use.

Refs #544

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 15:55:15 -05:00
parent c461380a4a
commit b76666166e
17 changed files with 1498 additions and 0 deletions
--- a/docs/plans/agent-reflection-loop-PRD.md
+++ b/docs/plans/agent-reflection-loop-PRD.md
@@ -0,0 +1,173 @@
+# PRD — Agent Reflection Loop (durable kernel)
+
+**Issue:** [#544](http://git.mosaicstack.dev/mosaicstack/stack/issues/544)
+**Source design:** jarvis-brain `docs/planning/AGENT-REFLECTION-LOOP.md` (commit df6576fc, debate-hardened v2)
+**Status:** in-progress
+**Scope rule:** Build the **durable kernel** only. The closed calibration/skill-synthesis loop
+(design §7–§8) is **gated** behind Phase-0 experiments P1/P2/P3 and is explicitly out of scope here.
+
+---
+
+## 1. Problem
+
+At end-of-run an agent holds context that never reaches the diff or the "done" message —
+assumptions, shortcuts, untested paths, the single most-likely way the work is wrong. That context
+is what a lead/human needs to judge trust, and it evaporates when the session ends. Capture it
+mechanically as **structured data** (`reflection.v1`), and derive a **review risk-floor** from the
+change surface so risky diffs are flagged for independent review.
+
+## 2. Non-goals (gated on Phase-0)
+
+- No closed calibration loop (predicted-vs-actual scoring as a routing input).
+- No skill synthesis.
+- No automated reviewer routing/dispatch. The kernel **writes** the sidecar; pickup is future work.
+
+## 3. Components & exact placement (main-branch truth)
+
+| #   | Component            | Path                                                                                             | Mirror                              |
+| --- | -------------------- | ------------------------------------------------------------------------------------------------ | ----------------------------------- |
+| a   | Stop hook (capture)  | `packages/mosaic/framework/tools/qa/reflect-stop-hook.sh`                                        | `tools/qa/prevent-memory-write.sh`  |
+| a   | Hook registration    | `packages/mosaic/framework/runtime/claude/settings.json` (`hooks.Stop`)                          | existing `PreToolUse`/`PostToolUse` |
+| b   | JSON Schema          | `packages/macp/src/schemas/reflection.v1.schema.json`                                            | `schemas/task.schema.json`          |
+| b   | TS types (zod) + DTO | `packages/types/src/reflection/{index.ts,reflection.dto.ts}` + re-export from `src/index.ts`     | `packages/types/src/federation/*`   |
+| c   | Diff risk-floor      | `packages/macp/src/risk-floor.ts` (+ `__tests__/risk-floor.test.ts`, export from `src/index.ts`) | `packages/macp/src/gate-runner.ts`  |
+| d   | Phase-0 scripts      | `scripts/analysis/reflect-{git-history,board-history,calibration}.sh`                            | `scripts/publish-npmjs.sh`          |
+
+**Activation note (deliberate deviation):** the `settings-overlays/` directory has **no merge
+mechanism** (referenced only in docs), so a hooks overlay there would be inert. The Stop hook is
+registered in the canonical `runtime/claude/settings.json` — the same file the `mosaic` launcher
+reflects into `~/.claude/settings.json` (verified byte-identical hooks live there). Still fully
+vendored in-repo.
+
+## 4. `reflection.v1` schema (authoritative field list)
+
+```jsonc
+{
+  "schema": "reflection.v1", // literal
+  "task_ref": "string", // canonical task ref; kernel derives from REFLECTION_TASK_REF or repo+branch
+  "agent": "string", // persona/runtime id (REFLECTION_AGENT or "unknown")
+  "session_id": "string", // from Stop payload session_id, else "unknown"
+  "timestamp": "string", // ISO-8601 UTC
+  "repo": "string", // repo root basename
+  "confidence": 0.0, // FLOAT [0,1] — SELF-REPORTED (optional; null if not supplied)
+  "most_likely_wrong": {
+    // SELF-REPORTED (optional)
+    "surface": "auth|data|infra|ui|build|test|docs|none",
+    "description": "string",
+  },
+  "known_not_in_diff": "string|null", // SELF-REPORTED: "what I know that isn't visible in the diff"
+  "risk": {
+    // MECHANICAL — from risk-floor
+    "needs_review": true,
+    "score": 0.0, // [0,1]
+    "surface": "auth|data|infra|ui|build|test|docs|none",
+    "reason": "string",
+  },
+  "files_changed": ["string"], // MECHANICAL — git diff name-only
+  "provenance": {
+    "source": "stop-hook",
+    "reflection_attempt": 1,
+    "degraded": false, // true if self-report inputs missing/unreadable
+    "reflection_mode": "off|solo|orchestrated",
+  },
+}
+```
+
+**Mechanical vs self-reported.** A bash Stop hook cannot author the agent's self-assessment. The
+hook populates the **mechanical** fields deterministically (risk, files_changed, provenance, ids).
+The **self-reported** fields are read from an optional agent-supplied input file
+(`$REFLECTION_INPUT`, default `<repo>/.mosaic/reflection-input.json`) and merged if present;
+absent/unreadable → those fields null and `provenance.degraded=true`. This realizes the design's
+"hook is a pre-seed, not the asker" (§4).
+
+## 5. Stop hook behavior (fail-closed, non-blocking)
+
+1. Read Stop payload JSON from stdin.
+2. **Fail-closed:** if `REFLECTION_MODE` is unset or `off` → `exit 0` immediately (strict no-op). This
+   is the global-registration safety guarantee.
+3. **Sentinel guard:** if `<sidecar>.lock` exists → `exit 0` (prevents re-fire loops). Otherwise
+   create it and register a `trap` to remove it on exit.
+4. Determine output dir: `$REFLECTION_DIR` else `<repo>/.mosaic/reflections/`. `mkdir -p`.
+5. Compute mechanical fields: `git diff --name-only` (HEAD + staged + worktree, best-effort),
+   call risk-floor logic (inline bash port OR `node -e` into `@mosaicstack/macp` — see §6), session
+   ids from payload + env.
+6. Merge optional `$REFLECTION_INPUT` self-report if readable JSON.
+7. Write `reflection.v1` to a temp file, `mv` (atomic) to `<dir>/<session>-<ts>.reflection.json`.
+8. Always `exit 0`. **Never** emit a `decision` field (Stop hooks are observational).
+
+Hook must never fail the session: wrap risky steps, default to `degraded:true` on any error, exit 0.
+
+## 6. Risk-floor (`packages/macp/src/risk-floor.ts`)
+
+Pure, deterministic, no IO. Single source of truth for the verdict; the hook calls it via
+`node --input-type=module -e` (importing the built package) **or**, to avoid a node dependency in the
+hook path, the hook ports the same surface table. **Decision:** implement the canonical logic in TS
+(tested), and have the hook shell out to node when available, else fall back to a minimal inline
+classifier flagged `degraded:true`. (Keep the TS the authority; the inline path is a safety net.)
+
+```ts
+export type ReviewSurface = 'auth' | 'data' | 'infra' | 'ui' | 'build' | 'test' | 'docs' | 'none';
+export interface RiskFloorInput {
+  filesChanged: string[];
+  insertions?: number;
+  deletions?: number;
+}
+export interface RiskFloorVerdict {
+  needs_review: boolean;
+  score: number;
+  surface: ReviewSurface;
+  reason: string;
+}
+export function evaluateRiskFloor(input: RiskFloorInput): RiskFloorVerdict;
+```
+
+Surface classification by path regex (per file: first match wins; overall: highest weight wins):
+
+- `auth` (weight 1.0): `auth`, `login`, `session`, `token`, `permission`, `rbac`, `credential`, `secret`
+- `data` (0.9): `migration`, `prisma`, `schema`, `\.sql`, `entity`, `repository`, `seed`
+- `infra` (0.85): `docker`, `\.woodpecker`, `compose`, `traefik`, `deploy`, `helm`, `k8s`, `terraform`
+- `build` (0.6): `package.json`, `tsconfig`, `turbo.json`, `pnpm-`, `\.config\.`, `eslint`, `vite`
+- `ui` (0.4): `\.tsx`, `\.css`, `components/`, `apps/web/`
+- `test` (0.2): `\.spec\.`, `\.test\.`, `__tests__/`
+- `docs` (0.1): `\.md`, `docs/`
+- `none` (0.0): anything else
+
+`needs_review = score >= THRESHOLD` (default `0.5`, overridable). `reason` names the files+surface
+that tripped it. **Subordinate to CI:** this is a _floor_ (minimum review requirement) only;
+consumers MUST treat CI/tests as authoritative above the floor (precedence: CI/tests > human merge >
+reviewer verdict > self-reflection). Documented in the module header.
+
+## 7. Phase-0 experiment scripts (`scripts/analysis/`)
+
+Offline, no-infra bash. Each script: `#!/usr/bin/env bash`, `set -euo pipefail`, header `Usage:` +
+`Requirements:`, flag parsing, **prints its pre-registered kill condition**, emits structured
+(JSON/markdown) output. They are harnesses + rubrics — real corpora are wired later.
+
+- `reflect-git-history.sh` (**P2** — only-self-reflection bucket): scan `git log` for failure signals
+  (reverts, `fix:`/`hotfix` shortly after a feature merge) over a window; classify each by which gate
+  would catch it (CI / human-review / only-self-reflection) via a pre-registered heuristic; tally.
+  Kill: bucket 3 (only-self-reflection) near-empty → no §7/§8.
+- `reflect-board-history.sh` (**P3** — outcome detectability): given a task/board export (or the
+  git history of `data/` task files), measure the fraction of completed tasks with a
+  machine-detectable correct/wrong signal within 30 days. Kill: base-rate < 20% → caveat-notes only.
+- `reflect-calibration.sh` (**P1** — confidence signal): consume a labeled corpus (JSONL of
+  `{confidence, correct}`), compute discrimination (AUC/lift) on the self-rated-high subset, print
+  the metric vs the pre-registered chance threshold. Kill: AUC ≈ chance on the high subset → no §7/§8.
+
+## 8. CI / quality gates
+
+- TS packages: `pnpm typecheck` (tsc --noEmit), `pnpm lint` (eslint), `pnpm format:check`
+  (prettier), `pnpm test` (vitest). ESM, NodeNext, `.js` import specifiers, `*.dto.ts` at boundaries.
+- New files in existing packages need no CI config change; add ≥1 vitest spec per new TS module.
+- Bash scripts/hook are dev/runtime tooling, not CI-built; keep them `shellcheck`-clean.
+
+## 9. Acceptance criteria
+
+1. `REFLECTION_MODE` unset → hook is a strict no-op (`exit 0`, no file written). **(test)**
+2. With `REFLECTION_MODE=solo`, hook writes a schema-valid `reflection.v1` with correct mechanical
+   fields; self-report merged when `$REFLECTION_INPUT` present, `degraded:true` when absent.
+3. `evaluateRiskFloor` deterministic across all surfaces; unit-tested incl. auth/data/infra → review,
+   docs/test → no review, empty → `none`/no review.
+4. `reflection.v1` zod type + JSON Schema agree; sidecar validates against the schema.
+5. Phase-0 scripts run offline, print kill conditions, emit structured output, shellcheck-clean.
+6. `pnpm typecheck && pnpm lint && pnpm format:check && pnpm test` green; independent review passed.
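
Reviewer note on PRD §6: the surface table and threshold rule can be sketched in a few lines of TypeScript. This is an illustrative reading of the PRD, not the contents of the committed `packages/macp/src/risk-floor.ts`. Assumptions not stated in the PRD: the regexes are case-insensitive for the first four surfaces, the threshold override is a second positional parameter, `insertions`/`deletions` are ignored, and the exact `reason` wording is invented here.

```typescript
// Sketch only: one possible reading of the PRD §6 surface table.
export type ReviewSurface = 'auth' | 'data' | 'infra' | 'ui' | 'build' | 'test' | 'docs' | 'none';

export interface RiskFloorInput {
  filesChanged: string[];
  insertions?: number; // accepted but unused in this sketch
  deletions?: number; // accepted but unused in this sketch
}

export interface RiskFloorVerdict {
  needs_review: boolean;
  score: number;
  surface: ReviewSurface;
  reason: string;
}

interface SurfaceRule {
  surface: ReviewSurface;
  weight: number;
  pattern: RegExp;
}

// Listed in the PRD's order (descending weight), so the first match for a
// file is also its highest-risk surface. The `i` flag is an assumption.
const SURFACE_RULES: SurfaceRule[] = [
  { surface: 'auth', weight: 1.0, pattern: /auth|login|session|token|permission|rbac|credential|secret/i },
  { surface: 'data', weight: 0.9, pattern: /migration|prisma|schema|\.sql|entity|repository|seed/i },
  { surface: 'infra', weight: 0.85, pattern: /docker|\.woodpecker|compose|traefik|deploy|helm|k8s|terraform/i },
  { surface: 'build', weight: 0.6, pattern: /package\.json|tsconfig|turbo\.json|pnpm-|\.config\.|eslint|vite/i },
  { surface: 'ui', weight: 0.4, pattern: /\.tsx|\.css|components\/|apps\/web\// },
  { surface: 'test', weight: 0.2, pattern: /\.spec\.|\.test\.|__tests__\// },
  { surface: 'docs', weight: 0.1, pattern: /\.md|docs\// },
];

// Per file: the first rule whose pattern matches the path, if any.
function classify(path: string): SurfaceRule | undefined {
  return SURFACE_RULES.find((rule) => rule.pattern.test(path));
}

// Pure and deterministic: no IO, output depends only on the arguments.
export function evaluateRiskFloor(input: RiskFloorInput, threshold = 0.5): RiskFloorVerdict {
  const classified = input.filesChanged
    .map((file) => ({ file, rule: classify(file) }))
    .filter((c): c is { file: string; rule: SurfaceRule } => c.rule !== undefined);

  if (classified.length === 0) {
    return { needs_review: false, score: 0, surface: 'none', reason: 'no classified surface in diff' };
  }

  // Across files: the highest-weight surface dominates the verdict.
  const top = classified.reduce((best, c) => (c.rule.weight > best.rule.weight ? c : best)).rule;
  const hits = classified.filter((c) => c.rule.surface === top.surface).map((c) => c.file);

  return {
    needs_review: top.weight >= threshold,
    score: top.weight,
    surface: top.surface,
    reason: `${top.surface} surface touched by: ${hits.join(', ')}`,
  };
}

const verdict = evaluateRiskFloor({ filesChanged: ['docs/README.md', 'apps/api/src/auth/guard.ts'] });
console.log(verdict.surface, verdict.needs_review); // prints "auth true"
```

Because the rules are substring regexes, a path such as `src/auth.test.ts` classifies as `auth` rather than `test` under this reading. That errs toward more review, which seems consistent with the verdict being a floor.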