Files

ci/woodpecker/push/ci Pipeline failed

Details

docs(design): mosaic framework constitution — expert conference output

Conference of 7 experts (architect/moonshot/contrarian/coder/aiml/devex/steward)
debated layering, sanitization, upgrade-safety, cross-harness robustness.
Artifacts: BRIEF, 7 positions, 7 rebuttals, synthesis-v1, 3 red-team passes,
canonical DESIGN.md, OPEN-QUESTIONS.md, MISSION.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-15 23:47:49 -05:00

16 KiB

Raw Blame History

Rebuttal — AI/ML Prompt-Systems Lens

Author lens: AI/ML Prompt-Systems Expert (how LLMs actually consume system prompts/context; what placement, length, and structure help vs. hurt instruction-following across models and harnesses).

What this rebuttal does: keeps the best cross-persona ideas, attacks the proposals that will degrade real agent behavior (my lens's job), and sharpens the one disagreement that is genuinely a prompt-systems question — what actually makes a gate get followed — with a concrete resolution. All claims grounded in files under packages/mosaic/framework/, verified this pass.

1. The strongest ideas from other personas worth keeping

1.1 Contrarian: "subtraction before structure" is the load-bearing insight of the whole conference

position-contrarian.md (§"The one thing I'd die on") is the only paper that correctly identifies the primary disease as duplication-and-contradiction, not missing layers — and that adding a CONSTITUTION.md without deleting the four existing restatements yields "five disagreeing law files instead of four, plus a prettier diagram." This is exactly right from the prompt-systems view, and the repo proves it: I verified templates/agent/AGENTS.md.template:12-13 emits ~/.config/mosaic/rails/git/... while defaults/AGENTS.md:30 uses ~/.config/mosaic/tools/git/.... That is not a hypothetical drift risk — it is a live contradiction shipping to the model today, and an agent that follows the template's queue-guard command runs a path the installer deletes. Every persona that proposed a new layer should be forced to pass through the Contrarian's gate first: a layer is worth exactly the deletions and the CI grep that accompany it. I endorse this without reservation; it is the same conclusion my position paper reached from the attention-budget angle, arrived at independently.

1.2 DevEx + Contrarian: "hooks are the real enforcement; prose is advisory" should be promoted to doctrine

position-devex.md (DQ4 §4) and position-contrarian.md (DQ5 §4) both land on the single most important empirical fact in the repo, and it is already documented in the code: runtime/claude/RUNTIME.md (Memory Policy) says verbatim that MEMORY.md is write-blocked by prevent-memory-write.sh because "the rule alone proved insufficient — the hook is the hard gate." That is a prompt-systems result the maintainer learned the hard way: a prose MUST is a probability, a hook is a wall. From my lens this is the correct response to instruction-density decay — every checkable gate you move from prose to mechanism is a gate that no longer competes for attention in the resident context and no longer degrades when the window fills. Keep this; make it Constitution doctrine: "a hard gate that can be enforced by a hook/CI MUST be, on harnesses that support it; the prose is the spec, the hook is the enforcement."

1.3 DevEx: capability-verbs in the Constitution, tool-names in the adapter

position-devex.md (DQ4 §2, the capabilities.json manifest) is the best concrete cross-harness mechanism proposed. The grounding is real: adapters/pi.md states "Native thinking levels replace sequential-thinking MCP," while defaults/AGENTS.md:143 says "Sequential-thinking MCP is REQUIRED. If unavailable... stop." Those two sentences are a live contradiction across harnesses — the global gate is already false for Pi, and Pi's runtime had to carve out an exception in prose. DevEx's fix — the Constitution speaks in capability verbs ("use structured reasoning before planning"), the adapter binds the verb to a concrete tool and declares whether absence is a hard stop — is the correct prompt-systems shape: it removes a contradiction from the resident context instead of asking the model to reconcile it under task pressure. This directly serves my non-negotiable (one fact, one place, zero contradictions in-window).

2. The weakest / riskiest proposals — with concrete failure modes

2.1 Moonshot's YAML front-matter precedence headers — actively harmful to instruction-following

position-moonshot.md (DQ1) proposes adding machine-readable front matter to each resident file:

---
mosaic-layer: 0
mosaic-owner: framework
mosaic-override: forbidden
---

Failure mode (prompt-systems): this is the worst possible thing to put at the primacy position of a resident document. The top of the injected blob is the highest-attention real estate in the entire context — primacy is where the model anchors hardest. Moonshot wants to spend it on mosaic-owner: framework, which is metadata for a launcher that the model is not. The model does not parse YAML as inert config; it reads every token as potential instruction. Putting mosaic-override: forbidden at the top of CONSTITUTION.md teaches the model nothing about which behavior is forbidden — it burns the single most valuable placement slot on a key-value pair whose audience is a bash script. Worse, it normalizes a pattern where every file grows a 4-line metadata header; across L0+L2+L3+L4 that is ~16 lines of zero-behavioral-value tokens injected before the agent reads a single gate. The launcher's content-hash check Moonshot wants is a fine idea — but it belongs in install.sh/mosaic doctor reading the file on disk, never in the bytes injected into the model's context. Resolution: keep the hash-check; move the metadata to a sidecar (constitution/.manifest.json) that the model never sees.

2.2 Moonshot's "exactly 500 words" hard budget — a precise number defended with a vague mechanism

position-moonshot.md (DQ5, and its Single Strongest Recommendation) demands CONSTITUTION.md be "exactly 500 words or fewer." I share the instinct — I argued for a budget myself — but the specific number is asserted, not derived, and the failure mode is real: 500 words cannot hold the 13 hard gates verbatim. I counted them in defaults/AGENTS.md:23-37; gate #13 (the merge-authority carve-out) alone is ~110 words. Force the whole gate set under 500 words and you must compress the gates, which means paraphrasing law — and paraphrased law is exactly the drift vector every persona (including Moonshot) says to kill. A word-count budget that forces lossy compression of the normative text is self-defeating. Resolution: budget the resident core as a whole (gates + escalation + block/done + precedence + routing pointer), enforce it by line count in CI as several papers propose, but let the gates keep their full unambiguous wording and push procedure (the ci-queue-wait.sh --purpose invocation at AGENTS.md:30, the wrapper paths) out to the on-demand E2E-DELIVERY.md. Budget the container, not the constitution's clarity.

2.3 Architect's per-file/per-layer version stamps + 3-way merge (also DevEx, Moonshot) — over-engineering that the Contrarian correctly flagged

position-architect.md (DQ3) and position-devex.md (DQ3) propose per-file template versions plus a git merge-file-style 3-way merge of user files on upgrade. position-contrarian.md (DQ3) explicitly warns against exactly this: per-file pins create "a combinatorial matrix of (framework vN, user pinned vM) states that no one will test." The Contrarian is right, and there is a prompt-systems-specific aggravation the merge advocates missed: a 3-way merge can emit conflict markers (<<<<<<<, =======, >>>>>>>) into SOUL.md/USER.md — which are resident files. If a merge half-resolves and leaves a conflict marker in a persona file, the model reads <<<<<<< theirs as content and behaves erratically — this is the same failure class as my own paper's "half-rendered {{TEMPLATE}} token" warning, and it is worse because conflict markers look like structure. A reconciliation engine that can inject conflict markers into the agent's identity file is a net-negative for behavior. Resolution: for the alpha, use the simpler include-overlay pattern the Contrarian and Coder converge on (STANDARDS.local.md, SOUL.local.md — framework file pristine and overwritten, user delta in a never-touched sibling, loaded last-within-layer). Defer 3-way merge to post-alpha, and if it ever ships, it MUST write conflicts to a .mosaic-merge sidecar the model never loads, never into the resident file in place.

3. The key disagreement most relevant to my lens — and how to resolve it

The disagreement: what actually makes a gate get followed across harnesses?

There are two camps, and they are arguing past each other:

Camp "read-the-file is fine." position-coder.md (Biggest Risk + Single Strongest Recommendation) and position-steward.md (DQ4) want the Constitution to be self-bootstrapping by file-read: AGENTS.md says "if CONSTITUTION.md is not already in context, read it now," so enforcement does not depend on the launcher. Coder calls this the one change that "makes the Constitution harness-agnostic by construction."
Camp "injection-by-value or it's advisory." position-devex.md (DQ4, Biggest Risk) and position-contrarian.md (DQ4 §1) say a "please read this file" pointer is a fundamentally weaker enforcement tier than system-prompt injection, and that defaults/AGENTS.md:11 ("The core contract is ALREADY in your context... Do not re-read it") is literally false on a direct claude launch, where only the thin ~/.claude/CLAUDE.md pointer exists. An agent that believes a false "already loaded" claim skips loading the gates.

I verified the ground truth and both camps are half-right, which is why this needs a prompt-systems ruling rather than a vote. adapters/pi.md confirms Pi injects the full contract via --append-system-prompt (Tier 1, strong). runtime/claude/RUNTIME.md confirms Claude's runtime instructs a load order ("Follow the Session Start load order in ~/.config/mosaic/AGENTS.md") — a Tier 3 file-read nudge. So the law reaches Pi as a system prompt and reaches Claude as an instruction-to-go-read. These are not equivalent for instruction-following, and no amount of "self-bootstrapping" prose closes the gap, because the self-bootstrap instruction itself lives in the weakly-injected file — it is turtles all the way down.

Why the file-read camp is wrong as the primary mechanism (my lens's verdict)

A deferred read ("go load the law before you act") competes with task salience. Under a concrete task — "fix this failing test and push" — the model's attention is pulled to the task tokens, and a meta-instruction to first go read a file is exactly the kind of procedural preamble models shed under load. This is the same mechanism that produced defaults/AGENTS.md:36 (gate #12), the "COMPLEXITY TRAP" warning that exists because agents keep skipping intake. The framework's own history is the evidence: when prose said "do the intake," agents skipped it, and the response was a louder prose rule. A louder "go read the law" pointer will fail the same way. Coder's self-bootstrap is a good fallback but a bad primary.

Why the injection camp is wrong if it stops there

System-prompt injection is necessary but not sufficient. A 155-line resident blob injected strongly is still subject to lost-in-the-middle: I confirmed defaults/AGENTS.md carries the 13 gates plus 15 "Non-Negotiable Operating Rules" plus mode protocol plus escalation plus subagent tiers plus superpowers — ~33 imperatives across four "importance" framings (CRITICAL HARD GATES, Non-Negotiable Operating Rules, mode Hard Rule, Other Hard Rules). Inject all of that strongly and you have strongly placed mush: the model cannot weight gate #5 (real-completion-definition) over rule #28 (milestone versioning) when both are tagged "hard." Strong injection of a bloated core just guarantees the model reliably receives content it cannot prioritize.

Resolution — a three-part enforcement contract, ordered by what the physics rewards

The disagreement resolves cleanly once you stop treating "injection vs. read" as binary and rank enforcement by what actually moves adherence:

Mechanical first (highest reliability). Every gate that is checkable becomes a hook/CI check — adopting the DevEx/Contrarian doctrine (§1.2) and the repo's own prevent-memory-write.sh precedent. no-force-merge, green-CI-before-done, no-hardcoded-secrets, and the rails/-path-drift bug are all mechanically checkable. A hook does not care about attention budget or injection tier. Move as many gates here as possible; this shrinks the prose that must be resident.
System-prompt-resident second, byte-identical across harnesses, and TINY. The irreducible non-checkable gates (the ones that govern when the agent stops — block-vs-done, escalation triggers, real-completion-definition) must be injected by value into the system prompt on every harness, identically. This is the injection camp's correct half — but it only works because step 1 drained the checkable gates out, keeping the resident core small enough to survive lost-in-the-middle. Place it at primacy, restate the ~5-bullet gate summary at the recency anchor (bottom). For direct/un-launched harnesses where injection is impossible, the pointer carries the 5-bullet summary inline, never a bare "go read the law" and never the false AGENTS.md:11 claim that it's "already in your context" — fix that line; it is actively teaching the model to skip the gates.
File-read third, as fallback only. Coder's self-bootstrap read is the safety net for the case where injection silently failed — valuable, but explicitly the weakest tier, never the thing the design relies on.

The single sentence that resolves the conference's central tension, from my lens: enforcement strength is mechanical > resident-by-value > file-read, and you earn the right to a strongly-injected Constitution only by first making it small enough to survive attention — which means moving every checkable gate to a hook and every procedure to an on-demand guide. Injection-by-value and minimalism are not competing proposals; minimalism is the precondition that makes injection-by-value actually work.

16 KiB Raw Blame History