mosaicstack/stack

Hermes Agent b76666166e

feat(agent-reflection): durable kernel — reflection.v1 capture + risk-floor + Phase-0 (#544)

Build the durable kernel of the agent reflection loop. Passive end-of-run
capture of the doer's end-state as structured `reflection.v1` data, plus a
deterministic diff review risk-floor. The closed calibration/skill-synthesis
loop (design §7–§8) stays gated behind Phase-0 experiments P1/P2/P3.

- packages/macp: evaluateRiskFloor (pure, deterministic surface classifier)
  + reflection.v1 JSON Schema; 15 unit tests.
- packages/types: reflection.v1 zod schemas + self-report DTO; 10 unit tests.
- framework: fail-closed Stop hook (reflect-stop-hook.sh) writing the sidecar,
  registered as hooks.Stop in runtime/claude/settings.json. Strict no-op unless
  REFLECTION_MODE=solo|orchestrated; never blocks or fails a session.
- scripts/analysis: P1/P2/P3 experiment harnesses with pre-registered kill
  conditions and structured output.

Mechanical fields (risk, files_changed, ids, provenance) are written by the
hook; self-report fields (confidence, most_likely_wrong, known_not_in_diff) are
merged from an optional $REFLECTION_INPUT, else null + provenance.degraded=true.

Independent review remediations: empty/all-.mosaic diff still writes a sidecar
(grep no-match no longer aborts); session_id sanitized before path use.

Refs #544

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-16 15:55:15 -05:00

PRD — Agent Reflection Loop (durable kernel)

Issue: #544
Source design: jarvis-brain docs/planning/AGENT-REFLECTION-LOOP.md (commit df6576fc, debate-hardened v2)
Status: in-progress
Scope rule: Build the durable kernel only. The closed calibration/skill-synthesis loop (design §7–§8) is gated behind Phase-0 experiments P1/P2/P3 and is explicitly out of scope here.

1. Problem

At end-of-run an agent holds context that never reaches the diff or the "done" message — assumptions, shortcuts, untested paths, the single most-likely way the work is wrong. That context is what a lead/human needs to judge trust, and it evaporates when the session ends. Capture it mechanically as structured data (reflection.v1), and derive a review risk-floor from the change surface so risky diffs are flagged for independent review.

2. Non-goals (gated on Phase-0)

No closed calibration loop (predicted-vs-actual scoring as a routing input).
No skill synthesis.
No automated reviewer routing/dispatch. The kernel writes the sidecar; pickup is future work.

3. Components & exact placement (main-branch truth)

| # | Component | Path | Mirror |
|---|---|---|---|
| a | Stop hook (capture) | `packages/mosaic/framework/tools/qa/reflect-stop-hook.sh` | `tools/qa/prevent-memory-write.sh` |
| a | Hook registration | `packages/mosaic/framework/runtime/claude/settings.json` (`hooks.Stop`) | existing `PreToolUse`/`PostToolUse` |
| b | JSON Schema | `packages/macp/src/schemas/reflection.v1.schema.json` | `schemas/task.schema.json` |
| b | TS types (zod) + DTO | `packages/types/src/reflection/{index.ts,reflection.dto.ts}` + re-export from `src/index.ts` | `packages/types/src/federation/*` |
| c | Diff risk-floor | `packages/macp/src/risk-floor.ts` (+ `__tests__/risk-floor.test.ts`, export from `src/index.ts`) | `packages/macp/src/gate-runner.ts` |
| d | Phase-0 scripts | `scripts/analysis/reflect-{git-history,board-history,calibration}.sh` | `scripts/publish-npmjs.sh` |

Activation note (deliberate deviation): the settings-overlays/ directory has no merge mechanism (it is referenced only in docs), so a hooks overlay placed there would be inert. The Stop hook is therefore registered in the canonical runtime/claude/settings.json, the same file the mosaic launcher reflects into ~/.claude/settings.json (the hooks in both files were verified byte-identical). The hook is still fully vendored in-repo.

4. `reflection.v1` schema (authoritative field list)

{
  "schema": "reflection.v1", // literal
  "task_ref": "string", // canonical task ref; kernel derives from REFLECTION_TASK_REF or repo+branch
  "agent": "string", // persona/runtime id (REFLECTION_AGENT or "unknown")
  "session_id": "string", // from Stop payload session_id, else "unknown"
  "timestamp": "string", // ISO-8601 UTC
  "repo": "string", // repo root basename
  "confidence": 0.0, // FLOAT [0,1] — SELF-REPORTED (optional; null if not supplied)
  "most_likely_wrong": {
    // SELF-REPORTED (optional)
    "surface": "auth|data|infra|ui|build|test|docs|none",
    "description": "string",
  },
  "known_not_in_diff": "string|null", // SELF-REPORTED: "what I know that isn't visible in the diff"
  "risk": {
    // MECHANICAL — from risk-floor
    "needs_review": true,
    "score": 0.0, // [0,1]
    "surface": "auth|data|infra|ui|build|test|docs|none",
    "reason": "string",
  },
  "files_changed": ["string"], // MECHANICAL — git diff name-only
  "provenance": {
    "source": "stop-hook",
    "reflection_attempt": 1,
    "degraded": false, // true if self-report inputs missing/unreadable
    "reflection_mode": "off|solo|orchestrated",
  },
}
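
For readers who prefer a type to an annotated literal, the field list above can be restated as a plain TypeScript interface. This is a minimal sketch, not the package's zod schema: the names `ReflectionV1` and `checkRanges` are hypothetical, and treating `most_likely_wrong` as nullable is an assumption based on it being marked optional.

```typescript
// Hypothetical plain-TS mirror of the reflection.v1 field list.
// The real package defines zod schemas, which are not reproduced here.
type ReviewSurface = 'auth' | 'data' | 'infra' | 'ui' | 'build' | 'test' | 'docs' | 'none';

interface ReflectionV1 {
  schema: 'reflection.v1';
  task_ref: string;
  agent: string;
  session_id: string;
  timestamp: string; // ISO-8601 UTC
  repo: string;
  confidence: number | null; // self-reported
  most_likely_wrong: { surface: ReviewSurface; description: string } | null; // self-reported (nullability assumed)
  known_not_in_diff: string | null; // self-reported
  risk: { needs_review: boolean; score: number; surface: ReviewSurface; reason: string };
  files_changed: string[];
  provenance: {
    source: 'stop-hook';
    reflection_attempt: number;
    degraded: boolean;
    reflection_mode: 'off' | 'solo' | 'orchestrated';
  };
}

// True only for finite numbers in [0, 1]; NaN fails both comparisons.
const inUnit = (n: number): boolean => n >= 0 && n <= 1;

// Checks the value constraints a static type cannot express.
// Returns a list of problems; an empty list means the ranges are fine.
function checkRanges(r: ReflectionV1): string[] {
  const problems: string[] = [];
  if (r.confidence !== null && !inUnit(r.confidence)) problems.push('confidence outside [0,1]');
  if (!inUnit(r.risk.score)) problems.push('risk.score outside [0,1]');
  if (Number.isNaN(Date.parse(r.timestamp))) problems.push('timestamp is not a parseable date');
  return problems;
}
```

The interface covers shape only; the range checks are kept separate because `[0,1]` bounds and timestamp validity need runtime validation.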

Mechanical vs self-reported. A bash Stop hook cannot author the agent's self-assessment. The hook populates the mechanical fields deterministically (risk, files_changed, provenance, ids). The self-reported fields are read from an optional agent-supplied input file ($REFLECTION_INPUT, default <repo>/.mosaic/reflection-input.json) and merged if present; absent/unreadable → those fields null and provenance.degraded=true. This realizes the design's "hook is a pre-seed, not the asker" (§4).
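
The merge rule in the paragraph above can be sketched as a small pure function. The actual hook is a bash script, so this TypeScript version only illustrates the decision logic; `mergeSelfReport`, `SelfReport`, and `MergeResult` are hypothetical names, and nulling an individual field that has the wrong type (rather than marking the whole record degraded) is an assumption the PRD does not specify.

```typescript
// Illustrative only: the real hook is bash. All names here are hypothetical.
interface SelfReport {
  confidence: number | null;
  most_likely_wrong: { surface: string; description: string } | null;
  known_not_in_diff: string | null;
}

interface MergeResult {
  selfReport: SelfReport;
  degraded: boolean; // maps to provenance.degraded
}

const emptyReport = (): SelfReport => ({
  confidence: null,
  most_likely_wrong: null,
  known_not_in_diff: null,
});

// rawInput is the text of $REFLECTION_INPUT, or null when the file is absent or unreadable.
function mergeSelfReport(rawInput: string | null): MergeResult {
  if (rawInput === null) return { selfReport: emptyReport(), degraded: true };

  let parsed: unknown;
  try {
    parsed = JSON.parse(rawInput);
  } catch {
    return { selfReport: emptyReport(), degraded: true }; // not readable JSON
  }
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    return { selfReport: emptyReport(), degraded: true };
  }

  const p = parsed as Record<string, unknown>;
  const c = p['confidence'];
  const mlw = p['most_likely_wrong'];
  const known = p['known_not_in_diff'];

  let mostLikelyWrong: SelfReport['most_likely_wrong'] = null;
  if (typeof mlw === 'object' && mlw !== null) {
    const m = mlw as Record<string, unknown>;
    const surface = m['surface'];
    const description = m['description'];
    if (typeof surface === 'string' && typeof description === 'string') {
      mostLikelyWrong = { surface, description };
    }
  }

  return {
    selfReport: {
      // Assumption: an out-of-range or non-numeric confidence becomes null.
      confidence: typeof c === 'number' && c >= 0 && c <= 1 ? c : null,
      most_likely_wrong: mostLikelyWrong,
      known_not_in_diff: typeof known === 'string' ? known : null,
    },
    degraded: false,
  };
}
```

For example, `mergeSelfReport(null)` yields all-null self-report fields with `degraded: true`, while `mergeSelfReport('{"confidence":0.8}')` keeps the confidence and leaves the other two fields null.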

5. Stop hook behavior (fail-closed, non-blocking)

Read Stop payload JSON from stdin.
Fail-closed: if REFLECTION_MODE is unset, off, or anything other than solo|orchestrated → exit 0 immediately (strict no-op). This is the global-registration safety guarantee.
Sentinel guard: if <sidecar>.lock exists → exit 0 (prevents re-fire loops). Otherwise create the lock and remove it via a cleanup trap on exit.
Determine output dir: $REFLECTION_DIR else <repo>/.mosaic/reflections/. mkdir -p.
Compute mechanical fields: git diff --name-only (HEAD + staged + worktree, best-effort), call risk-floor logic (inline bash port OR node -e into @mosaicstack/macp — see §6), session ids from payload + env.
Merge optional $REFLECTION_INPUT self-report if readable JSON.
Write reflection.v1 to a temp file, mv (atomic) to <dir>/<session>-<ts>.reflection.json.
Always exit 0. Never emit a decision field (Stop hooks are observational).

Hook must never fail the session: wrap risky steps, default to degraded:true on any error, exit 0.
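
Two of the decisions above are easy to get subtly wrong: the mode gate and the sidecar path. The sketch below shows them as pure TypeScript functions for illustration only, since the hook itself is bash. The function names, the allowed-character set, the 128-character cap, and the compact timestamp format are all assumptions; the PRD fixes only the `<dir>/<session>-<ts>.reflection.json` shape and the requirement that `session_id` be sanitized before path use.

```typescript
type ReflectionMode = 'off' | 'solo' | 'orchestrated';

// Anything other than the two active modes resolves to 'off', so the hook
// stays a strict no-op for unset, empty, or misspelled values.
function resolveMode(raw: string | undefined): ReflectionMode {
  return raw === 'solo' || raw === 'orchestrated' ? raw : 'off';
}

// Hypothetical sanitizer: keep only filename-safe characters so a hostile
// session_id cannot introduce path separators or '..' segments.
function sanitizeSessionId(id: string | undefined): string {
  const cleaned = (id ?? '').replace(/[^A-Za-z0-9_-]/g, '_').slice(0, 128);
  return cleaned.length > 0 ? cleaned : 'unknown';
}

// Builds <dir>/<session>-<ts>.reflection.json. The compact UTC timestamp
// (ISO-8601 with '-', ':' and '.' removed) is an assumed format.
function sidecarPath(dir: string, sessionId: string | undefined, now: Date): string {
  const ts = now.toISOString().replace(/[-:.]/g, '');
  const trimmedDir = dir.replace(/\/+$/, '');
  return `${trimmedDir}/${sanitizeSessionId(sessionId)}-${ts}.reflection.json`;
}
```

Resolving the mode once, before any other work, keeps the no-op path free of side effects such as creating the output directory or the lock file.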

6. Risk-floor (`packages/macp/src/risk-floor.ts`)

Pure, deterministic, no IO. Single source of truth for the verdict; the hook calls it via node --input-type=module -e (importing the built package) or, to avoid a node dependency in the hook path, the hook ports the same surface table. Decision: implement the canonical logic in TS (tested), and have the hook shell out to node when available, else fall back to a minimal inline classifier flagged degraded:true. (Keep the TS the authority; the inline path is a safety net.)

export type ReviewSurface = 'auth' | 'data' | 'infra' | 'ui' | 'build' | 'test' | 'docs' | 'none';
export interface RiskFloorInput {
  filesChanged: string[];
  insertions?: number;
  deletions?: number;
}
export interface RiskFloorVerdict {
  needs_review: boolean;
  score: number;
  surface: ReviewSurface;
  reason: string;
}
export function evaluateRiskFloor(input: RiskFloorInput): RiskFloorVerdict;

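Surface classification by path regex. Each path is tested against the surfaces in the order listed below and the first match wins; across all changed files, the highest-risk surface dominates the verdict: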

auth (weight 1.0): auth, login, session, token, permission, rbac, credential, secret
data (0.9): migration, prisma, schema, \.sql, entity, repository, seed
infra (0.85): docker, \.woodpecker, compose, traefik, deploy, helm, k8s, terraform
build (0.6): package.json, tsconfig, turbo.json, pnpm-, \.config\., eslint, vite
ui (0.4): \.tsx, \.css, components/, apps/web/
test (0.2): \.spec\., \.test\., __tests__/
docs (0.1): \.md, docs/
none (0.0): anything else

needs_review = score >= THRESHOLD (default 0.5, overridable). reason names the files+surface that tripped it. Subordinate to CI: this is a floor (minimum review requirement) only; consumers MUST treat CI/tests as authoritative above the floor (precedence: CI/tests > human merge > reviewer verdict > self-reflection). Documented in the module header.
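
The table and threshold rule above can be sketched in a few lines of TypeScript. This is not the tested implementation in `packages/macp/src/risk-floor.ts`; it is a minimal illustration under stated assumptions: the score is taken to be the weight of the dominant surface, matching is case-insensitive for the first four surfaces (so `Dockerfile` counts as infra), literal dots in names like `package.json` are escaped, and `insertions`/`deletions` are ignored. The name `evaluateRiskFloorSketch` is hypothetical.

```typescript
type ReviewSurface = 'auth' | 'data' | 'infra' | 'ui' | 'build' | 'test' | 'docs' | 'none';

interface Verdict {
  needs_review: boolean;
  score: number;
  surface: ReviewSurface;
  reason: string;
}

interface SurfaceRow {
  surface: ReviewSurface;
  weight: number;
  pattern: RegExp;
}

// Rows are in table order; classifyPath returns the first row that matches.
const SURFACE_TABLE: readonly SurfaceRow[] = [
  { surface: 'auth', weight: 1.0, pattern: /auth|login|session|token|permission|rbac|credential|secret/i },
  { surface: 'data', weight: 0.9, pattern: /migration|prisma|schema|\.sql|entity|repository|seed/i },
  { surface: 'infra', weight: 0.85, pattern: /docker|\.woodpecker|compose|traefik|deploy|helm|k8s|terraform/i },
  { surface: 'build', weight: 0.6, pattern: /package\.json|tsconfig|turbo\.json|pnpm-|\.config\.|eslint|vite/i },
  { surface: 'ui', weight: 0.4, pattern: /\.tsx|\.css|components\/|apps\/web\// },
  { surface: 'test', weight: 0.2, pattern: /\.spec\.|\.test\.|__tests__\// },
  { surface: 'docs', weight: 0.1, pattern: /\.md|docs\// },
];

function classifyPath(path: string): { surface: ReviewSurface; weight: number } {
  for (const row of SURFACE_TABLE) {
    if (row.pattern.test(path)) return { surface: row.surface, weight: row.weight };
  }
  return { surface: 'none', weight: 0 };
}

// Pure and deterministic: same input list and threshold, same verdict.
function evaluateRiskFloorSketch(filesChanged: string[], threshold = 0.5): Verdict {
  const hits = filesChanged.map((file) => ({ file, ...classifyPath(file) }));
  const none = { file: '', surface: 'none' as ReviewSurface, weight: 0 };
  // Highest weight wins; on ties the earliest file is kept.
  const top = hits.reduce((best, h) => (h.weight > best.weight ? h : best), none);
  const files = hits.filter((h) => h.weight > 0 && h.surface === top.surface).map((h) => h.file);
  const reason =
    files.length > 0
      ? `${top.surface} surface touched by: ${files.join(', ')}`
      : 'no changed file matched a risk surface';
  return { needs_review: top.weight >= threshold, score: top.weight, surface: top.surface, reason };
}
```

With the default threshold, a diff touching `src/auth/login.ts` and `README.md` is flagged for review on the auth surface, while a diff touching only `README.md` scores 0.1 and is not. Because matching is by substring, a path such as `docs/auth.md` would classify as auth rather than docs under this sketch.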

7. Phase-0 experiment scripts (`scripts/analysis/`)

Offline, no-infra bash. Each script: #!/usr/bin/env bash, set -euo pipefail, header Usage: + Requirements:, flag parsing, prints its pre-registered kill condition, emits structured (JSON/markdown) output. They are harnesses + rubrics — real corpora are wired later.

reflect-git-history.sh (P2, only-self-reflection bucket): scan git log for failure signals (reverts, fix:/hotfix shortly after a feature merge) over a window; classify each by which gate would catch it (bucket 1: CI, bucket 2: human-review, bucket 3: only-self-reflection) via a pre-registered heuristic; tally. Kill: bucket 3 (only-self-reflection) near-empty → no §7/§8.
reflect-board-history.sh (P3 — outcome detectability): given a task/board export (or the git history of data/ task files), measure the fraction of completed tasks with a machine-detectable correct/wrong signal within 30 days. Kill: base-rate < 20% → caveat-notes only.
reflect-calibration.sh (P1 — confidence signal): consume a labeled corpus (JSONL of {confidence, correct}), compute discrimination (AUC/lift) on the self-rated-high subset, print the metric vs the pre-registered chance threshold. Kill: AUC ≈ chance on the high subset → no §7/§8.
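
The P1 discrimination metric can be made concrete with a short example. The harness itself is a bash script, so the TypeScript below only illustrates the arithmetic: it computes AUC by pairwise comparison (the Mann-Whitney formulation), where 0.5 means confidence carries no signal. The names `parseJsonl`, `highSubset`, and `auc` are hypothetical, and the cutoff that defines the self-rated-high subset is left as a required parameter because the PRD does not state one.

```typescript
interface Labeled {
  confidence: number;
  correct: boolean;
}

// Parses a JSONL corpus of {confidence, correct} rows. Blank lines are skipped,
// rows with the wrong shape are dropped, and malformed JSON throws.
function parseJsonl(text: string): Labeled[] {
  const out: Labeled[] = [];
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue;
    const row: unknown = JSON.parse(line);
    if (typeof row === 'object' && row !== null) {
      const { confidence, correct } = row as Record<string, unknown>;
      if (typeof confidence === 'number' && typeof correct === 'boolean') {
        out.push({ confidence, correct });
      }
    }
  }
  return out;
}

// Restricts the corpus to self-rated-high records; the cutoff is an assumption
// supplied by the caller.
function highSubset(records: Labeled[], minConfidence: number): Labeled[] {
  return records.filter((r) => r.confidence >= minConfidence);
}

// AUC = P(confidence of a correct item > confidence of a wrong item),
// counting ties as half a win. Returns null when either class is empty,
// because the metric is undefined without both outcomes.
function auc(records: Labeled[]): number | null {
  const pos = records.filter((r) => r.correct).map((r) => r.confidence);
  const neg = records.filter((r) => !r.correct).map((r) => r.confidence);
  if (pos.length === 0 || neg.length === 0) return null;
  let wins = 0;
  for (const p of pos) {
    for (const n of neg) {
      wins += p > n ? 1 : p === n ? 0.5 : 0;
    }
  }
  return wins / (pos.length * neg.length);
}
```

As a worked case, correct items at confidence 0.9 and 0.4 against wrong items at 0.6 and 0.2 give three winning pairs out of four, so the AUC is 0.75. The pairwise loop is quadratic in corpus size, which is acceptable for an offline harness and keeps the definition visible.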

8. CI / quality gates

TS packages: pnpm typecheck (tsc --noEmit), pnpm lint (eslint), pnpm format:check (prettier), pnpm test (vitest). ESM, NodeNext, .js import specifiers, *.dto.ts at boundaries.
New files in existing packages need no CI config change; add ≥1 vitest spec per new TS module.
Bash scripts/hook are dev/runtime tooling, not CI-built; keep them shellcheck-clean.

9. Acceptance criteria

REFLECTION_MODE unset → hook is a strict no-op (exit 0, no file written). (test)
With REFLECTION_MODE=solo, hook writes a schema-valid reflection.v1 with correct mechanical fields; self-report merged when $REFLECTION_INPUT present, degraded:true when absent.
evaluateRiskFloor is deterministic across all surfaces and unit-tested, incl. auth/data/infra → review, docs/test → no review, empty input → surface none, no review.
reflection.v1 zod type + JSON Schema agree; sidecar validates against the schema.
Phase-0 scripts run offline, print kill conditions, emit structured output, shellcheck-clean.
pnpm typecheck && pnpm lint && pnpm format:check && pnpm test green; independent review passed.
