stack/docs/scratchpads/cli-unification-20260404.md
Jarvis df6f7306d3
docs: mission cli-unification-20260404 complete
Close out the mission book-keeping:
- MISSION-MANIFEST: status=completed, 8/8 milestones, all 8 AC checked,
  release link added
- TASKS: mark CU-06-01..05, CU-07-01..04, CU-08-01..04 done
- scratchpad: append Wave 4/5 outcomes, mission summary

Release: mosaic-v0.1.0 (@mosaicstack/mosaic@0.1.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 02:48:52 -05:00

Mission Scratchpad — CLI Unification & E2E First-Run

Append-only log. NEVER delete entries. NEVER overwrite sections. This is the orchestrator's working memory across sessions.

Mission ID: cli-unification-20260404
Started: 2026-04-04
Related PRDs: docs/PRD.md (v0.1.0 long-term target)

Original Mission Prompt

Original user framing (2026-04-04):

We are off the reservation right now. Working on getting the system to work via cli first, then working on the webUI. The missions are likely all wrong. The PRDs might have valid info.

E2E install to functional, with Mosaic Forge working. mosaic gateway config is broken — no token is created. Unable to configure. Installation doesn't really configure, it just installs and launches the gateway. Multiple mosaic commands are missing that should be included. Unified installer experience is not ready. UX is bad.

The various mosaic packages will need to be available within the mosaic cli: mosaic auth, mosaic brain, mosaic forge, mosaic log, mosaic macp, mosaic memory, mosaic queue, mosaic storage.

The list of commands in mosaic --help also need to be alphabetized for readability.

mosaic telemetry should also exist. Local OTEL for wide-event logging / post-mortems. Remote upload opt-in via @mosaicstack/telemetry-client-js (https://git.mosaicstack.dev/mosaicstack/telemetry-client-js) — the telemetry server will be part of the main mosaicstack.dev website. Python counterpart at https://git.mosaicstack.dev/mosaicstack/telemetry-client-py.

Planning Decisions

2026-04-04 — State discovery + prep PR

Critical finding: Two CLI packages both owned bin.mosaic: @mosaicstack/mosaic (0.0.21) and @mosaicstack/cli (0.0.17). Their src/cli.ts files were near-verbatim duplicates (424 vs 422 lines) and their src/commands/ directories overlapped, with some files silently diverging (notably gateway/install.ts, the version responsible for the broken install UX). Whichever package was linked last won the mosaic symlink.

Decision: @mosaicstack/cli dies. @mosaicstack/mosaic is the single CLI + TUI package. This was confirmed with user ("The @mosaicstack/cli package is no longer a package. Its features were moved to @mosaicstack/mosaic instead."). Prep PR #398 executed the removal.

Decision: CLI registration pattern = register<Name>Command(parent: Command) exported by each sub-package, co-located with the library code. Proven by @mosaicstack/quality-rails: registerQualityRails(program). Avoids cross-package commander version mismatches.
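
A minimal sketch of the registration pattern described above. To keep it runnable without installing commander, `MiniCommand` is a hypothetical stand-in that mimics only the `command()` and `description()` calls; `registerBrainCommand` and its `status` subcommand are illustrative names, not the real package surface.

```typescript
// Hypothetical stand-in for commander's Command, reduced to the calls used here.
class MiniCommand {
  readonly name: string;
  readonly commands: MiniCommand[] = [];
  private text = "";

  constructor(name: string) {
    this.name = name;
  }

  // Like commander: creates a subcommand, attaches it, and returns the child.
  command(name: string): MiniCommand {
    const child = new MiniCommand(name);
    this.commands.push(child);
    return child;
  }

  // Setter-style description that returns `this` for chaining.
  description(text: string): this {
    this.text = text;
    return this;
  }

  summary(): string {
    return this.text;
  }
}

// Would live in the sub-package, next to the library code (illustrative name).
function registerBrainCommand(parent: MiniCommand): MiniCommand {
  const brain = parent.command("brain").description("Brain operations");
  brain.command("status").description("Show brain status");
  return brain;
}

// Would live in the CLI entry point: one register call per sub-package.
const program = new MiniCommand("mosaic");
const brain = registerBrainCommand(program);
console.log(program.commands.map((c) => c.name));
```

Because the sub-package only receives a parent command and never constructs its own program, the CLI package stays the single owner of the commander instance.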

Decision: Stale mission state (harness-20260321 manifest, storage-abstraction TASKS.md, PRD-Harness_Foundation.md) gets archived under docs/archive/missions/. Scratchpads for completed sub-missions are left in docs/scratchpads/ as historical record — they're append-only by design and valuable as breadcrumbs.

2026-04-04 — Gateway bootstrap token bug root cause

apps/gateway/src/admin/bootstrap.controller.ts:

  • GET /api/bootstrap/status returns needsSetup: true only when users table count is zero
  • POST /api/bootstrap/setup throws ForbiddenException if any user exists

packages/mosaic/src/commands/gateway/install.ts, runInstall() "explicit reinstall" branch (lines ~87-98):

  1. Clears meta.adminToken from meta.json (line 175 — preserveToken = false when regeneratedConfig = true)
  2. Calls bootstrapFirstUser()
  3. Status endpoint returns needsSetup: false because users row still exists
  4. bootstrapFirstUser prints "Admin user already exists — skipping setup. (No admin token on file — sign in via the web UI to manage tokens.)" and returns
  5. Install "succeeds" with NO token, NO CLI path to generate one, and chicken-and-egg on /api/admin/tokens which requires auth
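
The dead-end in the steps above can be modeled in a few lines. This is a simplified sketch, not the installer code: `GatewayState` and `reinstall` are hypothetical names, and the token value is a placeholder. Only the `needsSetup` rule and the `preserveToken = !regeneratedConfig` relationship come from the analysis above.

```typescript
interface GatewayState {
  userCount: number; // rows in the users table
  adminToken: string | null; // meta.adminToken in meta.json
}

// Mirrors GET /api/bootstrap/status: setup is needed only when no users exist.
function needsSetup(state: GatewayState): boolean {
  return state.userCount === 0;
}

// Simplified model of the reinstall path before the fix.
function reinstall(state: GatewayState, regeneratedConfig: boolean): GatewayState {
  const preserveToken = !regeneratedConfig;
  const next: GatewayState = {
    userCount: state.userCount,
    adminToken: preserveToken ? state.adminToken : null,
  };
  if (needsSetup(next)) {
    // First-run path: create the first user and mint a token.
    next.userCount = 1;
    next.adminToken = "token-from-setup"; // placeholder value
  }
  // Otherwise setup is skipped and the cleared token is never replaced.
  return next;
}

const fresh = reinstall({ userCount: 0, adminToken: null }, true);
const stuck = reinstall({ userCount: 1, adminToken: "old-token" }, true);
console.log(fresh.adminToken); // "token-from-setup"
console.log(stuck.adminToken); // null: the install "succeeds" with no token
```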

Recovery design options (to decide in CU-03-01):

  • Filesystem-signed nonce file written by the installer; recovery endpoint checks it
  • Accept a valid BetterAuth admin session cookie → mint new admin token via authenticated API call (leans on existing auth; mosaic gateway login becomes the recovery entry point)
  • Gateway daemon accepts --rescue flag that mints a one-shot recovery token, prints it, then exits

Current lean: option 2 (BetterAuth cookie) because it reuses existing auth and gives us mosaic gateway login as a useful command regardless. But the design spike in CU-03-01 should evaluate all three against: security, complexity, headless-environment friendliness, and disaster-recovery scenarios.
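
A rough sketch of what option 2 could look like from the CLI side, assuming the session cookie obtained at login is simply replayed on the mint call. `buildMintTokenRequest`, the cookie string, and the `{ label }` body are assumptions for illustration; only the POST /api/admin/tokens path comes from the notes above.

```typescript
interface MintTokenRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the request only; the caller decides how to send it.
function buildMintTokenRequest(
  gatewayUrl: string,
  sessionCookie: string,
  label: string,
): MintTokenRequest {
  const base = gatewayUrl.replace(/\/+$/, ""); // tolerate trailing slashes
  return {
    url: `${base}/api/admin/tokens`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        cookie: sessionCookie, // session from a prior interactive login
      },
      body: JSON.stringify({ label }), // hypothetical body shape
    },
  };
}

const request = buildMintTokenRequest(
  "https://gateway.example/",
  "session=placeholder",
  "cli-recovery",
);
console.log(request.url);
// Sending it would be roughly: await fetch(request.url, request.init)
```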

2026-04-04 — Telemetry architecture

  • @mosaicstack/telemetry-client-js + @mosaicstack/telemetry-client-py are separate repos on Gitea — not currently consumed anywhere in this monorepo (verified via grep)
  • Telemetry server will be combined with the main mosaicstack.dev website (not built yet)
  • Local OTEL stays — apps/gateway/src/tracing.ts already wires it up for wide-event logging and post-mortem traces
  • mosaic telemetry is a thin wrapper that:
    • mosaic telemetry local {status,tail,jaeger} → local OTEL state, Jaeger links
    • mosaic telemetry {status,opt-in,opt-out,test,upload} → remote upload path via telemetry-client-js
    • Remote disabled by default; opt-in requires explicit consent
    • test/upload ship with dry-run mode until the server endpoint is live
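
The consent and dry-run rules above can be sketched as a small pure function. All names here (`ConsentState`, `optIn`, `upload`) and the payload shape are hypothetical; the real client is @mosaicstack/telemetry-client-js, whose API is not shown in these notes.

```typescript
interface ConsentState {
  remoteEnabled: boolean;
}

// Remote upload is disabled unless the user has opted in.
const DEFAULT_CONSENT: ConsentState = { remoteEnabled: false };

// Opt-in takes effect only on explicit confirmation.
function optIn(state: ConsentState, userConfirmed: boolean): ConsentState {
  return userConfirmed ? { remoteEnabled: true } : state;
}

interface UploadResult {
  sent: boolean;
  reason: string;
  payload?: string;
}

// Dry-run is the default: build the payload and return it instead of POSTing.
function upload(state: ConsentState, events: object[], dryRun = true): UploadResult {
  if (!state.remoteEnabled) {
    return { sent: false, reason: "remote upload not opted in" };
  }
  const payload = JSON.stringify({ events }); // assumed payload shape
  if (dryRun) {
    return { sent: false, reason: "dry-run", payload };
  }
  // The real POST would go here once the server endpoint is live.
  return { sent: false, reason: "server endpoint not live", payload };
}

const refused = upload(DEFAULT_CONSENT, [{ name: "install" }]);
const consented = optIn(DEFAULT_CONSENT, true);
const dry = upload(consented, [{ name: "install" }]);
console.log(refused.reason, "|", dry.reason);
```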

2026-04-04 — Open-question decisions (session 1)

Jason answered the four planning questions:

  1. Recovery endpoint design (CU-03-01): BetterAuth cookie. mosaic gateway login becomes the recovery entry point. The spike in CU-03-01 can be compressed — design is locked; task becomes implementation planning rather than evaluation.
  2. Sub-package command surface (M5): The current CU-05-01..08 scope is acceptable for this mission. Deeper command surfaces can be follow-up work.
  3. Telemetry server: Ship mosaic telemetry upload and mosaic telemetry test in dry-run-only mode until the mosaicstack.dev server endpoint is live. Capture intended payload shape and print/log instead of POSTing. Real upload path gets wired in as follow-up once the server is ready.
  4. Top-level mosaic config: Required. Add to M4 (CLI structure milestone) since it lives alongside help-shape work and uses the existing packages/mosaic/src/config/config-service.ts machinery. Separate concern from mosaic gateway config (which manages gateway .env + meta.json).

Session Log

| Session | Date | Milestone | Tasks Done | Outcome |
| --- | --- | --- | --- | --- |
| 1 | 2026-04-04 | cu-m01 Kill legacy CLI | CU-01-01 | PR #398 merged to main as c39433c3. 48 files deleted, 6685 LOC removed. CI green (pipeline 702). |
| 1 | 2026-04-04 | cu-m02 Archive + scaffold | CU-02-01, CU-02-02, CU-02-03 | PR #399 merged to main as 6f15a84c. Mission manifest + TASKS.md + scratchpad live. |
| 1 | 2026-04-04 | Planning | 4 open questions resolved | See decisions block above. Ready to start M3/M4/M5. |

Corrections / Course Changes

(append here as they happen)

Handoff — end of Session 1 (2026-04-04)

Session 1 agent: claude-opus-4-6[1m]
Reason for handoff: context budget (~80% used after bootstrap + two PRs + decision capture). Main is clean, no in-flight branches, no dirty state.

What Session 2 should read first

  1. docs/MISSION-MANIFEST.md — phase, progress, milestone table
  2. docs/TASKS.md — task state, dependencies, agent assignments
  3. This scratchpad — decisions, bug analysis, open risks, gotchas
  4. git log --oneline -5 — confirm #398 and #399 are on main

State of the world

  • Main branch HEAD: 6f15a84c docs: archive stale mission, scaffold CLI unification mission (#399)
  • Working tree: clean (no uncommitted changes after this handoff PR merges)
  • Open PRs: none (both M1 and M2 PRs merged)
  • Deleted branches: chore/remove-cli-package-duplicate, docs/mission-cli-unification (both local + remote)
  • Milestones done: cu-m01, cu-m02 (2 / 8)
  • Milestones unblocked for parallel start: cu-m03, cu-m04, cu-m05 (everything except M5.CU-05-06 which waits on M3.CU-03-03 for gateway login)

Decisions locked (do not re-debate)

  1. @mosaicstack/cli is dead; @mosaicstack/mosaic is the sole CLI package
  2. Sub-package CLI pattern: each package exports register<Name>Command(parent: Command), wired into packages/mosaic/src/cli.ts (copy the registerQualityRails pattern)
  3. Gateway recovery uses BetterAuth cookie: mosaic gateway login + mosaic gateway config rotate-token via authenticated POST /api/admin/tokens
  4. Telemetry: mosaic telemetry wraps @mosaicstack/telemetry-client-js; remote upload is dry-run only until the mosaicstack.dev server endpoint is live
  5. Top-level mosaic config command is required (separate from mosaic gateway config) — wraps packages/mosaic/src/config/config-service.ts; added as CU-04-04

Known gotchas for Session 2

  • pr-create.sh eval bug: ~/.config/mosaic/tools/git/pr-create.sh line 158 uses eval "$CMD". Backticks and $() in PR bodies get shell-evaluated. Workaround: strip backticks from PR bodies OR use tea pr create --repo mosaicstack/mosaic-stack --login mosaicstack --title ... --description ... --head <branch> directly. Captured in openbrain.
  • ci-queue-wait.sh unknown state: The wrapper reports state=unknown and returns immediately instead of waiting. Poll the PR pipeline manually with ~/.config/mosaic/tools/woodpecker/pipeline-list.sh and grep for the PR branch.
  • pr-merge.sh branch delete: -d flag is accepted but warns "branch deletion may need to be done separately". Delete via the Gitea API: curl -X DELETE -H "Authorization: token $TOKEN" "https://git.mosaicstack.dev/api/v1/repos/mosaicstack/mosaic-stack/branches/<url-encoded-branch>".
  • Tea login not default: tea login list shows mosaicstack with DEFAULT=false. Pass --login mosaicstack explicitly on every tea call.
  • .mosaic/orchestrator/session.lock: auto-rewritten on every session launch. Shows up as dirty working tree on branch switch. Safe to git checkout the file before branching.
  • Dual install.ts files no longer exist: M1 removed packages/cli/src/commands/gateway/install.ts. The canonical (and only) one is packages/mosaic/src/commands/gateway/install.ts. The "user exists, no token" bug (CU-03-06) is in this file around lines 388-394 (bootstrapFirstUser). The server-side gate is in apps/gateway/src/admin/bootstrap.controller.ts lines 28 and 35.
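
Two of the workarounds above can be sketched as small helpers. The function names are hypothetical; the tea flags, repo, login, and Gitea URL are the ones quoted in the gotchas. Passing an argv array to something like child_process.execFile avoids a shell entirely, so backticks and $() in a PR body stay literal.

```typescript
// argv for `tea pr create`, intended for a no-shell exec call.
function buildTeaPrCreateArgs(title: string, description: string, head: string): string[] {
  return [
    "pr", "create",
    "--repo", "mosaicstack/mosaic-stack",
    "--login", "mosaicstack",
    "--title", title,
    "--description", description,
    "--head", head,
  ];
}

// URL for the branch DELETE call; slashes in the branch name must be encoded.
function branchDeleteUrl(branch: string): string {
  const base = "https://git.mosaicstack.dev/api/v1/repos/mosaicstack/mosaic-stack/branches/";
  return base + encodeURIComponent(branch);
}

const args = buildTeaPrCreateArgs(
  "docs: handoff",
  "Touches `install.ts` and $(nothing to evaluate)",
  "docs/handoff",
);
const deleteUrl = branchDeleteUrl("chore/remove-cli-package-duplicate");
console.log(deleteUrl);
// ends with /branches/chore%2Fremove-cli-package-duplicate
```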

Suggested starting task for Session 2

Pick based on what the user wants shipped first:

  • Highest user-impact: M3 — fixes the install bug that made the user "off the reservation" in the first place. Start with CU-03-01 (implementation plan, opus-tier, 4K) → CU-03-02 (server endpoint, sonnet).
  • Quickest win: M4.CU-04-01 — one-line configureHelp({ sortSubcommands: true }). 3K estimate. Good warm-up.
  • User priority stated in session 1: M5.CU-05-01 — mosaic forge. Larger scope (18K), but user flagged Forge specifically as part of "E2E install to functional, with Mosaic Forge working".

Session 2 orchestrator should pick one, update TASKS.md status to in-progress, and follow the standard cycle: plan → code → test → review → remediate → commit → push → PR → queue guard → merge. Mosaic hard gates apply.

Files added / modified in Session 1

Session 1 touched only these files across PRs #398 and #399 plus this handoff PR:

  • Deleted: packages/cli/ (entire directory, 48 files)
  • Archived: docs/archive/missions/harness-20260321/MISSION-MANIFEST.md, docs/archive/missions/harness-20260321/PRD.md, docs/archive/missions/storage-abstraction/TASKS.md
  • Modified: pnpm-workspace.yaml, tools/install.sh, AGENTS.md, CLAUDE.md, README.md, docs/guides/user-guide.md, packages/mosaic/framework/defaults/README.md
  • Created: docs/MISSION-MANIFEST.md, docs/TASKS.md, docs/scratchpads/cli-unification-20260404.md (this file)

No code changes to apps/, packages/mosaic/, or any other runtime package. Session 2 starts fresh on the runtime code.

Open Risks

  • Telemetry server not live: CU-06-03 (mosaic telemetry upload) may need a dry-run stub until the server endpoint exists on mosaicstack.dev. Not blocking for this mission, but ships with reduced validation until then.
  • mosaic auth depends on gateway login: CU-05-06 is gated by CU-03-03 (mosaic gateway login). Sequencing matters — do not start CU-05-06 until M3 is done or significantly underway.
  • pr-create.sh wrapper bug: Discovered during M1 — ~/.config/mosaic/tools/git/pr-create.sh line 158 uses eval "$CMD", which shell-evaluates any backticks / $(…) / ${…} in PR bodies. Workaround: strip backticks from PR bodies (use bold / italic / plain text instead), or use tea pr create directly. Captured in openbrain as gotcha. Should be fixed upstream in Mosaic tools repo at some point, but out of scope for this mission.
  • Mosaic coord / orchestrator session lock drift: .mosaic/orchestrator/session.lock gets re-written every session launch and shows up as a dirty working tree on branch switch. Not blocking — just noise to ignore.

Session 2 Log (2026-04-05)

Session 2 agent: claude-opus-4-6[1m]
Mode: parallel orchestration across worktrees

Wave 1 — M3 (gateway token recovery)

  • CU-03-01 plan landed as PR #401 → docs/plans/gateway-token-recovery.md. Confirmed no server changes needed — AdminGuard already accepts BetterAuth cookies, POST /api/admin/tokens is the existing mint endpoint.
  • CU-03-02..07 implemented as PR #411: mosaic gateway login (interactive BetterAuth sign-in, session persisted), mosaic gateway config rotate-token, mosaic gateway config recover-token, fix for bootstrapFirstUser "user exists, no token" dead-end, 22 new unit tests. New files: commands/gateway/login.ts, commands/gateway/token-ops.ts.
  • CU-03-08 independent code review surfaced 2 BLOCKER findings (session.json world-readable, password echoed during prompt) + 3 important findings (trimmed password, cross-gateway token persistence, unsafe --password flag). Remediated in PR #414: saveSession writes mode 0o600, new promptSecret() uses TTY raw mode, persistence target now matches --gateway host, --password marked UNSAFE with warning.
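
A minimal sketch of the owner-only file write behind the session.json remediation, assuming a POSIX filesystem. `saveSessionSketch` and the session shape are hypothetical and are not the actual saveSession implementation.

```typescript
import { chmodSync, mkdtempSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Writes the session so that only the owning user can read or write it.
function saveSessionSketch(path: string, session: object): void {
  writeFileSync(path, JSON.stringify(session), { mode: 0o600 });
  // `mode` only applies when the file is created, so tighten existing files too.
  chmodSync(path, 0o600);
}

const dir = mkdtempSync(join(tmpdir(), "mosaic-session-"));
const sessionPath = join(dir, "session.json");
saveSessionSketch(sessionPath, { cookie: "placeholder" });

// Permission bits only; file-type bits are masked off.
const mode = statSync(sessionPath).mode & 0o777;
console.log(mode.toString(8));
```

The explicit chmod matters for upgrades: a session.json written world-readable by an older build would otherwise keep its old permissions.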

Wave 2 — M4 (help ergonomics + mosaic config)

  • CU-04-01..03 landed as PR #402: configureHelp({ sortSubcommands: true }) on root + gateway subgroup, plus an addHelpText('after', …) grouped-reference section (Commander 13 has no native command-group API).
  • CU-04-04/05 landed as PR #408: top-level mosaic config with show|get|set|edit|path, extends config/config-service.ts with readAll, getValue, setValue, getConfigPath, isInitialized + ConfigSection/ResolvedConfig types. Additive only.
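
A sketch of how the grouped-reference string for the help footer might be built. `buildGroupedReference` and the group titles are hypothetical; the commander calls appear only in a comment so the block runs without the dependency.

```typescript
type CommandGroups = Record<string, string[]>;

// Renders groups and their commands alphabetically as a help footer.
function buildGroupedReference(groups: CommandGroups): string {
  const lines: string[] = ["", "Command groups:"];
  for (const title of Object.keys(groups).sort()) {
    const names = [...(groups[title] ?? [])].sort();
    lines.push(`  ${title}: ${names.join(", ")}`);
  }
  return lines.join("\n");
}

const footer = buildGroupedReference({
  Gateway: ["gateway", "auth"],
  Data: ["storage", "memory", "brain"],
});
console.log(footer);

// With commander, the wiring would be roughly:
//   program.configureHelp({ sortSubcommands: true });
//   program.addHelpText("after", footer);
```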

Wave 3 — M5 (sub-package CLI surface, 8 commands + integration)

Parallel-dispatched in isolated worktrees. All merged:

  • PR #403 mosaic brain, PR #404 mosaic queue, PR #405 mosaic storage, PR #406 mosaic memory, PR #407 mosaic log, PR #410 mosaic macp, PR #412 mosaic forge, PR #413 mosaic auth.
  • Every package exports register<Name>Command(parent: Command) co-located with library code, following @mosaicstack/quality-rails pattern. Each wired into packages/mosaic/src/cli.ts with alphabetized register…Command(program) calls.
  • PR #415 landed CU-05-10 integration smoke test (packages/mosaic/src/cli-smoke.spec.ts, 19 tests covering all 9 registrars) PLUS a fix for a pre-existing exports bug in packages/macp/package.json (default pointed at ./src/index.ts instead of ./dist/index.js, which caused ERR_MODULE_NOT_FOUND when the compiled mosaic CLI tried to load macp at runtime). Caught by an empirical node packages/mosaic/dist/cli.js --help test before merge.
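
The macp exports bug is the kind of thing a small manifest check could catch early. The sketch below is an assumption-laden illustration: `findSourceExports` is a hypothetical helper, and the sample manifest is invented apart from the `default` target pointing at ./src/index.ts.

```typescript
interface PackageManifest {
  name: string;
  exports?: Record<string, string | Record<string, string>>;
}

// Lists export targets that point into ./src/ rather than compiled output.
function findSourceExports(pkg: PackageManifest): string[] {
  const offenders: string[] = [];
  for (const [subpath, target] of Object.entries(pkg.exports ?? {})) {
    // Normalize the string shorthand to a conditions object.
    const conditions = typeof target === "string" ? { default: target } : target;
    for (const [condition, file] of Object.entries(conditions)) {
      if (file.startsWith("./src/")) {
        offenders.push(`${subpath} (${condition}) -> ${file}`);
      }
    }
  }
  return offenders;
}

const broken: PackageManifest = {
  name: "example-macp", // illustrative name
  exports: { ".": { types: "./dist/index.d.ts", default: "./src/index.ts" } },
};
const fixed: PackageManifest = {
  name: "example-macp",
  exports: { ".": { types: "./dist/index.d.ts", default: "./dist/index.js" } },
};
console.log(findSourceExports(broken)); // [ '. (default) -> ./src/index.ts' ]
console.log(findSourceExports(fixed)); // []
```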

New gotchas captured in Session 2

  • pr-create.sh "Remote repository required" failure: wrapper can't detect origin in multi-remote contexts. Fallback used throughout: direct Gitea API curl -X POST …/api/v1/repos/mosaicstack/mosaic-stack/pulls with body JSON.
  • publish workflow killed on post-merge pushes: pipelines 735, 742, 747, 750, 758, 767 all show the Docker build step killed after ci workflow succeeded. Pre-existing infrastructure issue (observed on #714/#715 pre-mission). The ci workflow is the authoritative gate; publish killing is noise.
  • macp exports.default misaligned: latent bug from original monorepo consolidation — every other package already pointed at dist/. Only exposed when compiled CLI started loading macp at runtime.
  • Commander 13 grouping: no native command-group API; workaround is addHelpText('after', groupedReferenceString) + alphabetized flat list via sortSubcommands: true.

Wave 4 — M6 + M7 (parallel)

  • M6 mosaic telemetry landed as PR #417 (merge a531029c). Full scope CU-06-01..05: @mosaicstack/telemetry-client-js shim, telemetry local {status,tail,jaeger}, top-level telemetry {status,opt-in,opt-out,test,upload} with dry-run default, persistent consent state. New files: packages/mosaic/src/commands/telemetry.ts, src/telemetry/client-shim.ts, src/telemetry/consent-store.ts, plus telemetry.spec.ts.
  • M7 unified first-run UX landed as PR #418 (merge 872c1245). Full scope CU-07-01..04: install.sh --yes/--no-auto-launch flags + auto-handoff to wizard + gateway install, wizard/gateway-install coordination via transient state file, mosaic gateway verify post-install healthcheck, Docker-based tools/e2e-install-test.sh.

Wave 5 — M8 (release)

  • PR #419 (merge b9d464de) — CLI unification release v0.1.0. Single cohesive docs + release PR:
    • README.md: unified command tree, new install UX, mosaic gateway and mosaic config sections, removed stale @mosaicstack/cli refs.
    • docs/guides/user-guide.md: new "Sub-package Commands" + "Telemetry" sections covering all 11 top-level commands.
    • packages/mosaic/package.json: bumped 0.0.21 → 0.1.0 (CI publishes on merge).
  • Git tag: mosaic-v0.1.0 (scoped to avoid collision with existing v0.1.0 repo tag) — pushed to origin on merge sha.
  • Gitea release: https://git.mosaicstack.dev/mosaicstack/mosaic-stack/releases/tag/mosaic-v0.1.0 — "@mosaicstack/mosaic v0.1.0 — CLI Unification".

Mission outcome

All 8 milestones, all 8 success criteria, merged to main, green CI on every PR, released. Two sessions total (~10h combined). No rollbacks, no blocked milestones, no escalations required.

Verification Evidence

CU-01-01 (PR #398)

  • Branch: chore/remove-cli-package-duplicate
  • Commit: 7206b9411d96
  • Merge commit on main: c39433c3
  • CI pipeline: #702 (pull_request event, all 6 steps green: postgres, install, typecheck, lint, format, test)
  • Quality gates (pre-push): typecheck 38/38, lint 21/21, format clean, test 38/38