feat(agent-reflection): durable kernel — reflection.v1 capture + risk-floor + Phase-0 (#544)
Build the durable kernel of the agent reflection loop: passive end-of-run capture of the doer's end state as structured `reflection.v1` data, plus a deterministic diff-review risk floor. The closed calibration/skill-synthesis loop (design §7–§8) stays gated behind Phase-0 experiments P1/P2/P3.

- packages/macp: `evaluateRiskFloor` (pure, deterministic surface classifier) + `reflection.v1` JSON Schema; 15 unit tests.
- packages/types: `reflection.v1` zod schemas + self-report DTO; 10 unit tests.
- framework: fail-closed Stop hook (`reflect-stop-hook.sh`) that writes the sidecar, registered as `hooks.Stop` in `runtime/claude/settings.json`. Strict no-op unless `REFLECTION_MODE=solo|orchestrated`; never blocks or fails a session.
- scripts/analysis: P1/P2/P3 experiment harnesses with pre-registered kill conditions and structured output.

Mechanical fields (`risk`, `files_changed`, ids, provenance) are written by the hook. Self-report fields (`confidence`, `most_likely_wrong`, `known_not_in_diff`) are merged from an optional `$REFLECTION_INPUT`; otherwise they are set to null with `provenance.degraded=true`.

Independent review remediations: an empty or all-`.mosaic/` diff still writes a sidecar (a grep no-match no longer aborts the hook), and `session_id` is sanitized before it is used in a path.

Refs #544

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
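To make "pure, deterministic surface classifier" concrete, here is a minimal TypeScript sketch of a risk floor: changed paths in, a minimum review level out, with no I/O, clock, or randomness involved. It is not the signature of `evaluateRiskFloor`. The function name `riskFloorSketch`, the `"low"` level, and the surface patterns (auth, migrations, CI workflows) are hypothetical; only the auth → `needs_review` outcome is taken from the smoke tests.

```typescript
// Hypothetical sketch of a deterministic risk floor. The surface patterns and
// the "low" level are illustrative assumptions, not the real macp rules.
type RiskLevel = "low" | "needs_review";

interface RiskFloorResult {
  risk: RiskLevel;
  // Surfaces that matched, in declaration order, so output is reproducible.
  matchedSurfaces: string[];
}

// Assumed sensitive surfaces: surface name -> path pattern (no `g` flag, so
// RegExp.test carries no lastIndex state between calls).
const SENSITIVE_SURFACES: ReadonlyArray<readonly [string, RegExp]> = [
  ["auth", /(^|\/)auth(\/|\.|$)/i],
  ["migrations", /(^|\/)migrations\//],
  ["ci", /^\.github\/workflows\//],
];

function riskFloorSketch(filesChanged: readonly string[]): RiskFloorResult {
  const matched = new Set<string>();
  for (const file of filesChanged) {
    for (const [surface, pattern] of SENSITIVE_SURFACES) {
      if (pattern.test(file)) matched.add(surface);
    }
  }
  // Report matches in the fixed order of SENSITIVE_SURFACES, not input order.
  const matchedSurfaces = SENSITIVE_SURFACES.map(([name]) => name).filter((name) =>
    matched.has(name),
  );
  return {
    risk: matchedSurfaces.length > 0 ? "needs_review" : "low",
    matchedSurfaces,
  };
}
```

Because the verdict depends only on the path list, the same diff always yields the same floor, which is what lets a classifier like this be covered by plain unit tests.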
New file `docs/scratchpads/544-agent-reflection-loop.md` (55 lines):
# Scratchpad — #544 Agent Reflection Loop (durable kernel)
**Started:** 2026-06-16 · **Branch:** `feat/agent-reflection-loop` · **Base:** `main` @ c461380

## Goal

Bake the durable kernel of the agent reflection loop into the Mosaic Stack
monorepo through the full delivery gates. Kernel only: the closed loop (§7–§8)
stays gated on the Phase-0 experiments. Authoritative spec:
`docs/plans/agent-reflection-loop-PRD.md`. Task breakdown:
`docs/tasks/544-agent-reflection-loop.md`.
## Timeline / decisions
- Mapped house style against a clean `main`: the earlier recon had mapped a dirty
  feature branch and returned non-existent paths, so `main` was re-cloned clean.
- macp uses co-located `*.spec.ts` tests; types uses
  `src/<mod>/{*.ts, *.dto.ts, __tests__/*.spec.ts}`.
- zod v4, class-validator, and class-transformer are already present in
  `@mosaicstack/types`; `packages/types/tsconfig.json` enables
  `experimentalDecorators` and `emitDecoratorMetadata`.
- **Gotcha (fixed):** `class-transformer`'s `@Type` calls `Reflect.getMetadata`
  at decoration (module-load) time, and the types vitest environment does not load
  `reflect-metadata`, so any test importing the reflection barrel crashed on
  import. `chat.dto.ts` avoids this by using class-validator only. Fix: dropped
  `@Type`/`@ValidateNested` from the DTO and let zod own deep nested validation.
- **Gotcha (fixed):** the Stop hook's `EXIT` trap referenced `lock`, a local
  variable of the hook's `main` function, so under `set -u` the trap failed with
  `unbound variable` at exit. Fix: promoted it to a global `LOCKFILE`.
- **Gotcha (fixed):** the hook's own lock file and `.mosaic/` scratch files leaked
  into `files_changed`. Fix: the change-surface scan now excludes `^\.mosaic/`.
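The `.mosaic/` exclusion and the empty-diff remediation reduce to one rule: filter the hook's own scratch out of the change list, and treat an empty result as a valid, empty `files_changed` rather than a failure. The real hook is a shell script, where a `grep` that selects no lines exits with status 1 and can therefore abort a script running under `set -e`/`pipefail`. The TypeScript sketch below expresses the same rule; the input format (one path per line, as `git diff --name-only` prints) and the names `filesChangedFrom` and `buildSidecarDraft` are assumptions.

```typescript
// Hypothetical sketch: derive files_changed from name-only diff output while
// dropping the hook's own top-level `.mosaic/` scratch (including its lock).
const MOSAIC_SCRATCH = /^\.mosaic\//;

function filesChangedFrom(nameOnlyOutput: string): string[] {
  return nameOnlyOutput
    .split("\n")
    .map((line) => line.trim())
    // Skip blank lines and anything under the top-level .mosaic/ directory.
    .filter((line) => line.length > 0 && !MOSAIC_SCRATCH.test(line));
}

// Hypothetical sidecar draft holding only the mechanical change list.
interface SidecarDraft {
  files_changed: string[];
}

// An empty diff, or one touching only .mosaic/, yields [] instead of an error,
// so the caller can still write a sidecar.
function buildSidecarDraft(nameOnlyOutput: string): SidecarDraft {
  return { files_changed: filesChangedFrom(nameOnlyOutput) };
}
```

Anchoring the pattern at `^` keeps a nested directory such as `docs/.mosaic/` in the change surface; only the repository-root scratch directory is dropped.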
## Verification evidence
- macp: typecheck OK, lint OK, **88 tests pass** (15 new risk-floor tests).
- types: typecheck OK, lint OK, **64 tests pass** (10 new reflection tests).
- Root: `pnpm typecheck` (41 tasks), `pnpm lint` (23), `pnpm format:check`,
  `pnpm build` (23): all green.
- Stop hook smoke tests (throwaway git repo), all passing and always exiting 0:
  - TEST1: no-op (mode unset, 0 files).
  - TEST2: solo mode, degraded; `.mosaic/` excluded; auth surface → `needs_review`.
  - TEST3: self-report merged, `degraded=false`.
  - TEST4: lock suppresses re-fire.
- shellcheck clean: the hook plus `reflect-{git-history,board-history,calibration}.sh`.
- Phase-0 smoke: P2 on this repo (142 failures classified), P1 AUC=0.875 on a
  synthetic fixture, P3 base rate on a synthetic board; all emit structured
  output plus kill conditions.
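TEST2 and TEST3 exercise the merge rule stated in the commit message: mechanical fields always come from the hook, while self-report fields come from an optional `$REFLECTION_INPUT` or are set to null with `provenance.degraded=true`. A minimal TypeScript sketch of that rule follows. The field names are the ones listed for `reflection.v1`, but the value types, the `mergeReflection` name, and the choice to fill a partially supplied self-report with nulls while still marking it non-degraded are assumptions; ids are omitted for brevity.

```typescript
// Hypothetical shapes: field names follow the reflection.v1 description,
// value types are assumed.
interface SelfReport {
  confidence: number | null;
  most_likely_wrong: string | null;
  known_not_in_diff: string[] | null;
}

interface ReflectionRecord extends SelfReport {
  risk: string;
  files_changed: string[];
  provenance: { degraded: boolean };
}

// Fields the hook computes itself (ids omitted in this sketch).
type Mechanical = Omit<ReflectionRecord, keyof SelfReport | "provenance">;

const NULL_SELF_REPORT: SelfReport = {
  confidence: null,
  most_likely_wrong: null,
  known_not_in_diff: null,
};

// Merge an optional self-report (e.g. parsed from $REFLECTION_INPUT) into the
// mechanical fields. No input: nulls plus degraded=true. Input present: any
// missing field becomes null (assumed policy) and degraded=false.
function mergeReflection(mechanical: Mechanical, selfReport?: Partial<SelfReport>): ReflectionRecord {
  if (selfReport === undefined) {
    return { ...mechanical, ...NULL_SELF_REPORT, provenance: { degraded: true } };
  }
  return {
    ...mechanical,
    confidence: selfReport.confidence ?? null,
    most_likely_wrong: selfReport.most_likely_wrong ?? null,
    known_not_in_diff: selfReport.known_not_in_diff ?? null,
    provenance: { degraded: false },
  };
}
```

Keeping the degraded flag in provenance, rather than dropping the record, means a run without self-report still contributes its mechanical fields and can be filtered later.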
## Open risks / follow-ups
- Full `pnpm test` (including DB-bound packages) is left to CI's postgres service
  rather than run locally; the affected packages (macp, types) are DB-independent
  and green here.
- The sequential-thinking MCP server was registered mid-session and takes effect
  next session; this session used the written PRD as its planning artifact instead.
- Phase-0 corpora are not wired yet: the scripts are harnesses plus pre-registered
  rubrics (P1/P2/P3 tasks are tracked in the jarvis-brain `agent-reflection-loop`
  project).
## Gate status
- [x] PRD authored
- [x] issue #544 created + linked
- [x] code + tests
- [x] local gates green
- [ ] independent code review
- [ ] PR opened
- [ ] CI terminal green
- [ ] merged to main
- [ ] issue closed