docs(framework): add agency & persistence patterns to config + guides

Seven additive behavioral rules distilled from the Claude Code system prompt, competitor autonomous-agent prompts (Devin/Cline/Cursor/Windsurf/ Droid/Manus/Replit), and Fable 5 consumer-prompt deltas: - SOUL.md: own-mistakes stance, USER.md formatting override, reversibility heuristic (hard-gate-reconciled), injected-content caution - AGENTS.md: Block vs. Done semantics - E2E-DELIVERY.md: failure-handling retry budget, pre-done self-interrogation - ORCHESTRATOR.md: worker-prompt-quality standard, trust-but-verify - QA-TESTING.md: integrity guardrails Additive only (+37/-0). Independent review passed (one remediation applied). Refs #542 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 23:09:47 -05:00
parent c461380a4a
commit d481a74a86
5 changed files with 37 additions and 0 deletions
--- a/packages/mosaic/framework/guides/E2E-DELIVERY.md
+++ b/packages/mosaic/framework/guides/E2E-DELIVERY.md
@@ -114,6 +114,13 @@ For implementation work, you MUST run this cycle in order:
 If any step fails, you MUST remediate and re-run from the relevant step before proceeding.
 If push-queue/merge-queue/PR merge/CI/issue closure fails, status is `blocked` (not complete) and you MUST report the exact failed wrapper command.

+### Failure Handling & Retry Budget (Hard Rule)
+
+1. On any step failure, diagnose before switching tactics: read the error, check assumptions, attempt one focused fix. Do not retry blindly; do not abandon the approach after a single failure.
+2. Cap remediation at 3 attempts per distinct failure (same test, same gate, same error class). Vary the approach each attempt; never repeat an identical fix.
+3. For transient network failures (push/pull/API), retry up to 4 times with exponential backoff (2s, 4s, 8s, 16s). Do not apply backoff retries to logic errors.
+4. After the attempt budget is exhausted, stop and escalate per the Steered Autonomy Escalation Triggers — record the failure, attempts made, and exact failing command in the scratchpad.
+
 ## 5. Testing Priority Model

 Use this order of priority:
@@ -178,6 +185,8 @@ For code/API/auth/infra changes, documentation updates are REQUIRED before compl

 You MUST satisfy all items before completion:

+Before running this checklist, pause and self-interrogate: did I fulfill the user's *full* intent (not a reframed subset), did I actually run every verification I'm about to claim, and did I catch every edit site? Treat any "I think so" as not-yet-done.
+
 1. Acceptance criteria met.
 2. Baseline tests passed.
 3. Situational tests passed (primary gate), including required greenfield situational validation.
--- a/packages/mosaic/framework/guides/ORCHESTRATOR.md
+++ b/packages/mosaic/framework/guides/ORCHESTRATOR.md
@@ -595,6 +595,15 @@ Review: needs-qa (1 blocker, 2 high) → QA task {task_id}-QA created

 ---

+## Worker Prompt Quality (Hard Rule)
+
+Brief each worker as if it just walked in with zero prior context — terse prompts produce shallow, generic work.
+
+1. State the goal, the constraints, and what has already been ruled out.
+2. Include concrete `file:line` references and the exact expected output/return form.
+3. Never delegate understanding: the orchestrator owns synthesis. Do not pass "based on your findings, decide what to do" — give the worker a bounded, well-specified task.
+4. When tasks are independent, dispatch workers in parallel; reserve sequential dispatch for genuine dependencies.
+
 ## Worker Prompt Template

 Construct this from the task row and pass to worker via Task tool:
@@ -653,6 +662,8 @@ End your response with this JSON block:
 `status=success` means "code pushed and ready for orchestrator integration gates";
 it does NOT mean PR merged/CI green/issue closed.

+**Trust but verify (Hard Rule):** A worker's reported `status` describes what it intended, not necessarily what landed. Before accepting `status=success`, the orchestrator MUST confirm the outcome independently — verify the commit SHA exists on the branch, the expected files changed, and quality gates/tests actually ran green. Never relay a worker self-report as completion evidence.
+
 ## Post-Coding Review

 After you complete and push your changes, the orchestrator will independently
--- a/packages/mosaic/framework/guides/QA-TESTING.md
+++ b/packages/mosaic/framework/guides/QA-TESTING.md
@@ -102,6 +102,10 @@ If a project's `playwright.config.ts` does not explicitly set `headless: true`,
 1. Do NOT stop at "tests pass" if acceptance criteria are not verified.
 2. Do NOT write narrow tests that only satisfy assertions while missing real workflow behavior.
 3. Do NOT claim completion without situational evidence for impacted surfaces.
+4. Do NOT edit tests to make them pass; assume the root cause is in the code under test unless the task is explicitly to fix the test.
+5. Do NOT fabricate sample data, stub responses, or mock around a real failure to produce a green result.
+6. Do NOT simplify, comment out, or narrow the feature/logic to dodge an error — debug the actual root cause.
+7. Do NOT reason about or claim behavior of code you have not opened and read.

 ## Reporting