feat(agent-reflection): durable kernel — reflection.v1 capture + risk-floor + Phase-0 (#544)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful

Build the durable kernel of the agent reflection loop. Passive end-of-run
capture of the doer's end-state as structured `reflection.v1` data, plus a
deterministic diff review risk-floor. The closed calibration/skill-synthesis
loop (design §7–§8) stays gated behind Phase-0 experiments P1/P2/P3.

- packages/macp: evaluateRiskFloor (pure, deterministic surface classifier)
  + reflection.v1 JSON Schema; 15 unit tests.
- packages/types: reflection.v1 zod schemas + self-report DTO; 10 unit tests.
- framework: fail-closed Stop hook (reflect-stop-hook.sh) writing the sidecar,
  registered as hooks.Stop in runtime/claude/settings.json. Strict no-op unless
  REFLECTION_MODE=solo|orchestrated; never blocks or fails a session.
- scripts/analysis: P1/P2/P3 experiment harnesses with pre-registered kill
  conditions and structured output.

Mechanical fields (risk, files_changed, ids, provenance) are written by the
hook; self-report fields (confidence, most_likely_wrong, known_not_in_diff) are
merged from an optional $REFLECTION_INPUT, else null + provenance.degraded=true.

Independent review remediations: empty/all-.mosaic diff still writes a sidecar
(grep no-match no longer aborts); session_id sanitized before path use.

Refs #544

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Hermes Agent
2026-06-16 15:55:15 -05:00
parent c461380a4a
commit b76666166e
17 changed files with 1498 additions and 0 deletions

View File

@@ -0,0 +1,55 @@
# Scratchpad — #544 Agent Reflection Loop (durable kernel)
**Started:** 2026-06-16 · **Branch:** `feat/agent-reflection-loop` · **Base:** `main` @ c461380
## Goal
Bake the durable kernel of the agent reflection loop into the Mosaic Stack
monorepo through full delivery gates. Kernel only; closed loop (§7§8) gated on
Phase-0. Authoritative spec: `docs/plans/agent-reflection-loop-PRD.md`. Task
breakdown: `docs/tasks/544-agent-reflection-loop.md`.
## Timeline / decisions
- Mapped house style against `main` truth (the earlier recon had mapped a dirty
feature branch and returned non-existent paths; re-cloned `main` clean).
- macp uses co-located `*.spec.ts`; types uses `src/<mod>/{*.ts, *.dto.ts, __tests__/*.spec.ts}`.
- zod v4 + class-validator/class-transformer present in `@mosaicstack/types`;
`packages/types/tsconfig.json` enables `experimentalDecorators`/`emitDecoratorMetadata`.
- **Gotcha (fixed):** `class-transformer`'s `@Type` calls `Reflect.getMetadata`
at module-load time; the types vitest env has no `reflect-metadata`, so any test
importing the reflection barrel crashed on import. `chat.dto.ts` avoids this by
using class-validator only. Fix: dropped `@Type`/`@ValidateNested` from the DTO;
zod owns deep nested validation.
- **Gotcha (fixed):** Stop hook `EXIT` trap referenced a `main`-local `lock`
`unbound variable` under `set -u` at exit. Promoted to a global `LOCKFILE`.
- **Gotcha (fixed):** the hook's own lock + `.mosaic/` scratch leaked into
`files_changed`. Excluded `^\.mosaic/` from the change-surface scan.
## Verification evidence
- macp: typecheck OK, lint OK, **88 tests pass** (15 new risk-floor).
- types: typecheck OK, lint OK, **64 tests pass** (10 new reflection).
- Root: `pnpm typecheck` (41 tasks), `pnpm lint` (23), `pnpm format:check`, `pnpm build` (23) — all green.
- Stop hook smoke (throwaway git repo): TEST1 no-op (mode unset, 0 files);
TEST2 solo degraded, `.mosaic/` excluded, auth→needs_review; TEST3 self-report
merged, degraded=false; TEST4 lock suppresses re-fire. All pass, always exit 0.
- shellcheck clean: hook + `reflect-{git-history,board-history,calibration}.sh`.
- Phase-0 smoke: P2 on this repo (142 failures classified), P1 AUC=0.875 on a
synthetic fixture, P3 base-rate on a synthetic board — all emit structured output
- kill conditions.
## Open risks / follow-ups
- Full `pnpm test` (DB-bound packages) validated via CI's postgres service, not
locally; affected packages (macp, types) are DB-independent and green here.
- sequential-thinking MCP was registered mid-session (effective next session);
this session compensated with the written PRD as the planning artifact.
- Phase-0 corpora are not yet wired — scripts are harnesses + pre-registered
rubrics (P1/P2/P3 tasks tracked in jarvis-brain `agent-reflection-loop` project).
## Gate status
- [x] PRD authored · [x] issue #544 created + linked · [x] code + tests
- [x] local gates green · [ ] independent code review · [ ] PR opened
- [ ] CI terminal green · [ ] merged to main · [ ] issue closed