refactor(framework): thin-core prompt diet — cut injected contract ~53%

Thin defaults/AGENTS.md + defaults/TOOLS.md + runtime/claude/RUNTIME.md to a core that references guides on demand; preserve the full tool catalog in new guides/TOOLS-REFERENCE.md. All 12 hard gates verbatim; validated via a deterministic gate-checklist + a 9-probe headless A/B (thin 7/9 vs monolith 5/9). Composed AGENTS+TOOLS+RUNTIME 8,827->4,122 tok. Diet-only against repo content; no behavioral change, nothing imported from drifted deployments. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 16:39:56 -05:00
parent dde95a59b3
commit 52aa5d7982
6 changed files with 461 additions and 527 deletions
--- a/scratchpads/contract-thin-core.md
+++ b/scratchpads/contract-thin-core.md
@@ -0,0 +1,29 @@
+# Scratchpad — Thin-core prompt diet (#528)
+
+**Branch:** `feat/contract-thin-core` · **Issue:** #528 · **Mode:** Delivery
+
+## Objective
+Cut the always-injected contract (`defaults/AGENTS.md` + `defaults/TOOLS.md` + `runtime/claude/RUNTIME.md`, inlined every turn by the launcher) without losing any hard gate. Restore the original "thin core + on-demand guides" intent.
+
+## Change
+- `defaults/AGENTS.md` → thin core: 12 hard gates verbatim, 37 operating rules condensed to ~15 bullets (detail already in `guides/E2E-DELIVERY.md`), Superpowers condensed, load order made on-demand (no halt-on-missing for STANDARDS), conditional guide-loading index retained.
+- `defaults/TOOLS.md` → index; full catalog moved to new `guides/TOOLS-REFERENCE.md` (read on demand).
+- `runtime/claude/RUNTIME.md` → slimmed (dedup tier table, terser pointers).
+
+## Method (autoresearch-style validation)
+1. Built a 9-probe role battery (backend/deploy/review/orchestrate/secrets/docs/simple-trap/no-stop-at-PR/agent-work) + a deterministic 18-signature gate-checklist.
+2. Headless interactive runs (Claude Max **subscription**, tmux — no API), scored by per-probe rubric.
+3. Keep-or-discard hill-climb (token cost gated by per-probe fidelity) proved the method; final design re-derived against THIS repo's content (diet-only, no drifted-deployment content imported).
+
+## Validation evidence
+- Gate-checklist: ALL gates + critical rules + mode lines + sequential-thinking + OpenBrain + Superpowers present.
+- A/B on real repo content: **thin 7/9 vs monolith 5/9** probes; strictly better on deploy/review/simple-task; composed **8,827 → 4,122 tok (−53%)**.
+- p11 (don't-stop-at-PR): 3→2/3 on one rubric line — verified a scorer/phrasing artifact (answer correctly cites gates §5/§9 + close-issue; gate verbatim-present). Variance: thin 2/2/3, v0 3/3/3.
+
+## Decisions / risks
+- **Diet-only** vs repo content (user decision). Did NOT import web1's Gate 13-15 / federated memory / OpenViking — canonical repo is behind those deployments; flagged for separate reconciliation.
+- AGENTS/TOOLS are shared across runtimes → diet benefits codex/pi/opencode too; RUNTIME change is claude-only.
+- p11 accepted as-is (user decision) — not gaming the rubric.
+
+## Status
+PR open, paused for maintainer merge ratification (fleet-governing change). `mosaic upgrade` will propagate on merge.