Build the durable kernel of the agent reflection loop.

Passive end-of-run capture of the doer's end-state as structured `reflection.v1` data, plus a deterministic diff review risk-floor. The closed calibration/skill-synthesis loop (design §7–§8) stays gated behind Phase-0 experiments P1/P2/P3.

- packages/macp: evaluateRiskFloor (pure, deterministic surface classifier) + reflection.v1 JSON Schema; 15 unit tests.
- packages/types: reflection.v1 zod schemas + self-report DTO; 10 unit tests.
- framework: fail-closed Stop hook (reflect-stop-hook.sh) writing the sidecar, registered as hooks.Stop in runtime/claude/settings.json. Strict no-op unless REFLECTION_MODE=solo|orchestrated; never blocks or fails a session.
- scripts/analysis: P1/P2/P3 experiment harnesses with pre-registered kill conditions and structured output.

Mechanical fields (risk, files_changed, ids, provenance) are written by the hook; self-report fields (confidence, most_likely_wrong, known_not_in_diff) are merged from an optional $REFLECTION_INPUT, else null + provenance.degraded=true.

Independent review remediations: empty/all-.mosaic diff still writes a sidecar (grep no-match no longer aborts); session_id sanitized before path use.

Refs #544

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
56 lines
3.0 KiB
Markdown
# Scratchpad — #544 Agent Reflection Loop (durable kernel)

**Started:** 2026-06-16 · **Branch:** `feat/agent-reflection-loop` · **Base:** `main` @ c461380

## Goal

Bake the durable kernel of the agent reflection loop into the Mosaic Stack
monorepo through full delivery gates. Kernel only; closed loop (§7–§8) gated on
Phase-0. Authoritative spec: `docs/plans/agent-reflection-loop-PRD.md`. Task
breakdown: `docs/tasks/544-agent-reflection-loop.md`.

## Timeline / decisions

- Mapped house style against `main` truth (the earlier recon had mapped a dirty
  feature branch and returned non-existent paths; re-cloned `main` clean).
- macp uses co-located `*.spec.ts`; types uses `src/<mod>/{*.ts, *.dto.ts, __tests__/*.spec.ts}`.
- zod v4 + class-validator/class-transformer present in `@mosaicstack/types`;
  `packages/types/tsconfig.json` enables `experimentalDecorators`/`emitDecoratorMetadata`.
- **Gotcha (fixed):** `class-transformer`'s `@Type` calls `Reflect.getMetadata`
  at module-load time; the types vitest env has no `reflect-metadata`, so any test
  importing the reflection barrel crashed on import. `chat.dto.ts` avoids this by
  using class-validator only. Fix: dropped `@Type`/`@ValidateNested` from the DTO;
  zod owns deep nested validation.
- **Gotcha (fixed):** Stop hook `EXIT` trap referenced a `main`-local `lock` →
  `unbound variable` under `set -u` at exit. Promoted to a global `LOCKFILE`.
- **Gotcha (fixed):** the hook's own lock + `.mosaic/` scratch leaked into
  `files_changed`. Excluded `^\.mosaic/` from the change-surface scan.
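The `.mosaic/` exclusion and the risk floor it feeds can be illustrated with a small pure function. This is a minimal sketch under stated assumptions, not the `evaluateRiskFloor` implementation in `packages/macp`: the name `sketchRiskFloor`, the `routine` level, and the `auth` path pattern are hypothetical. Only the `^\.mosaic/` exclusion and the `needs_review` outcome for auth paths come from these notes.

```typescript
// Illustrative sketch only. sketchRiskFloor, the "routine" level, and the
// auth pattern are hypothetical; they are not the packages/macp API.

type RiskLevel = "routine" | "needs_review";

interface RiskFloorSketch {
  risk: RiskLevel;
  files_changed: string[];
}

// The hook's own lock and scratch files live under .mosaic/ and must not
// count as part of the change surface.
const SCRATCH_PATH = /^\.mosaic\//;

// Assumed sensitive-surface pattern: a path segment or file stem named "auth".
const SENSITIVE_PATHS: RegExp[] = [/(^|\/)auth(\/|\.|$)/i];

function sketchRiskFloor(changedPaths: string[]): RiskFloorSketch {
  // Drop scratch paths, de-duplicate, and sort so that the same set of
  // changed paths always produces the same output.
  const files_changed = changedPaths
    .filter((p) => !SCRATCH_PATH.test(p))
    .filter((p, i, all) => all.indexOf(p) === i)
    .sort();

  const sensitive = files_changed.some((p) =>
    SENSITIVE_PATHS.some((re) => re.test(p)),
  );

  // An empty surface is still a valid result rather than an error.
  return { risk: sensitive ? "needs_review" : "routine", files_changed };
}
```

In this sketch, an input of `.mosaic/reflect.lock` plus `src/auth/login.ts` reports only the auth file and classifies it as `needs_review`, while an input containing only `.mosaic/` paths yields an empty `files_changed` list instead of failing.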
## Verification evidence
- macp: typecheck OK, lint OK, **88 tests pass** (15 new risk-floor).
- types: typecheck OK, lint OK, **64 tests pass** (10 new reflection).
- Root: `pnpm typecheck` (41 tasks), `pnpm lint` (23), `pnpm format:check`, `pnpm build` (23): all green.
- Stop hook smoke (throwaway git repo): TEST1 no-op (mode unset, 0 files);
  TEST2 solo degraded, `.mosaic/` excluded, auth→needs_review; TEST3 self-report
  merged, degraded=false; TEST4 lock suppresses re-fire. All pass, always exit 0.
- shellcheck clean: hook + `reflect-{git-history,board-history,calibration}.sh`.
- Phase-0 smoke: P2 on this repo (142 failures classified), P1 AUC=0.875 on a
  synthetic fixture, P3 base-rate on a synthetic board; all emit structured output
  and kill conditions.
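The behaviour exercised by TEST1 through TEST3 (no-op when the mode is unset, degraded output without a self-report, merged output with one) can be sketched as a pure function. This is a hedged illustration rather than the hook's logic, which lives in a shell script: `buildReflectionSketch`, the type names, and the field types are assumptions, and treating malformed input as absent is also an assumption. The field names and the `solo`/`orchestrated` modes follow the commit message for #544.

```typescript
// Illustrative sketch only. Type names, field types, and buildReflectionSketch
// are hypothetical; field names follow the commit message for #544.

interface MechanicalFields {
  risk: string;
  files_changed: string[];
  session_id: string;
}

interface SelfReportFields {
  confidence: number | null;
  most_likely_wrong: string | null;
  known_not_in_diff: string | null;
}

type ReflectionSketch = MechanicalFields &
  SelfReportFields & { provenance: { degraded: boolean } };

const ACTIVE_MODES = ["solo", "orchestrated"];

function stringOrNull(value: unknown): string | null {
  return typeof value === "string" ? value : null;
}

// Parse the optional self-report. Assumption: anything unreadable is treated
// the same as a missing report.
function parseSelfReport(raw: string | undefined): SelfReportFields | null {
  if (!raw) return null;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return null;
    }
    const record = parsed as Record<string, unknown>;
    const confidence = record["confidence"];
    return {
      confidence: typeof confidence === "number" ? confidence : null,
      most_likely_wrong: stringOrNull(record["most_likely_wrong"]),
      known_not_in_diff: stringOrNull(record["known_not_in_diff"]),
    };
  } catch {
    return null;
  }
}

function buildReflectionSketch(
  mode: string | undefined,
  mechanical: MechanicalFields,
  rawSelfReport?: string,
): ReflectionSketch | null {
  // Strict no-op unless the mode is one of the two active values.
  if (mode === undefined || ACTIVE_MODES.indexOf(mode) === -1) return null;

  const report = parseSelfReport(rawSelfReport);
  const selfReport: SelfReportFields = report ?? {
    confidence: null,
    most_likely_wrong: null,
    known_not_in_diff: null,
  };

  // Without a usable self-report the record is still produced, but flagged.
  return { ...mechanical, ...selfReport, provenance: { degraded: report === null } };
}
```

Returning `null` for an inactive mode keeps the no-op case distinguishable from a degraded record, which mirrors the difference between TEST1 (nothing written) and TEST2 (sidecar written with `degraded=true`).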
## Open risks / follow-ups
- Full `pnpm test` (DB-bound packages) validated via CI's postgres service, not
  locally; affected packages (macp, types) are DB-independent and green here.
- sequential-thinking MCP was registered mid-session (effective next session);
  this session compensated with the written PRD as the planning artifact.
- Phase-0 corpora are not yet wired — scripts are harnesses + pre-registered
  rubrics (P1/P2/P3 tasks tracked in jarvis-brain `agent-reflection-loop` project).

## Gate status

- [x] PRD authored · [x] issue #544 created + linked · [x] code + tests
- [x] local gates green · [ ] independent code review · [ ] PR opened
- [ ] CI terminal green · [ ] merged to main · [ ] issue closed