# Mission Scratchpad — CLI Unification & E2E First-Run

> Append-only log. NEVER delete entries. NEVER overwrite sections.
> This is the orchestrator's working memory across sessions.

**Mission ID:** cli-unification-20260404
**Started:** 2026-04-04
**Related PRDs:** `docs/PRD.md` (v0.1.0 long-term target)

## Original Mission Prompt

Original user framing (2026-04-04):

> We are off the reservation right now. Working on getting the system to work via cli first, then working on the webUI. The missions are likely all wrong. The PRDs might have valid info.
>
> E2E install to functional, with Mosaic Forge working. `mosaic gateway` config is broken — no token is created. Unable to configure. Installation doesn't really configure, it just installs and launches the gateway. Multiple `mosaic` commands are missing that should be included. Unified installer experience is not ready. UX is bad.
>
> The various mosaic packages will need to be available within the mosaic cli: `mosaic auth`, `mosaic brain`, `mosaic forge`, `mosaic log`, `mosaic macp`, `mosaic memory`, `mosaic queue`, `mosaic storage`.
>
> The list of commands in `mosaic --help` also need to be alphabetized for readability.
>
> `mosaic telemetry` should also exist. Local OTEL for wide-event logging / post-mortems. Remote upload opt-in via `@mosaicstack/telemetry-client-js` (https://git.mosaicstack.dev/mosaicstack/telemetry-client-js) — the telemetry server will be part of the main mosaicstack.dev website. Python counterpart at https://git.mosaicstack.dev/mosaicstack/telemetry-client-py.

## Planning Decisions

### 2026-04-04 — State discovery + prep PR

**Critical finding:** Two CLI packages both owned `bin.mosaic` — `@mosaicstack/mosaic` (0.0.21) and `@mosaicstack/cli` (0.0.17). Their `src/cli.ts` files were near-verbatim duplicates (424 vs 422 lines) and their `src/commands/` directories overlapped, with some files silently diverging (notably `gateway/install.ts`, the version responsible for the broken install UX). Whichever package was linked last won the `mosaic` symlink.

**Decision:** `@mosaicstack/cli` dies. `@mosaicstack/mosaic` is the single CLI + TUI package. This was confirmed with user ("The @mosaicstack/cli package is no longer a package. Its features were moved to @mosaicstack/mosaic instead."). Prep PR #398 executed the removal.

**Decision:** CLI registration pattern = `register<Name>Command(parent: Command)` exported by each sub-package, co-located with the library code. Proven by `@mosaicstack/quality-rails` → `registerQualityRails(program)`. Avoids cross-package commander version mismatches.

**Decision:** Stale mission state (harness-20260321 manifest, storage-abstraction TASKS.md, PRD-Harness_Foundation.md) gets archived under `docs/archive/missions/`. Scratchpads for completed sub-missions are left in `docs/scratchpads/` as historical record — they're append-only by design and valuable as breadcrumbs.

### 2026-04-04 — Gateway bootstrap token bug root cause

`apps/gateway/src/admin/bootstrap.controller.ts`:

- `GET /api/bootstrap/status` returns `needsSetup: true` **only** when `users` table count is zero
- `POST /api/bootstrap/setup` throws `ForbiddenException` if any user exists

`packages/mosaic/src/commands/gateway/install.ts` — `runInstall()` "explicit reinstall" branch (lines ~87–98):

1. Clears `meta.adminToken` from meta.json (line 175 — `preserveToken = false` when `regeneratedConfig = true`)
2. Calls `bootstrapFirstUser()`
3. Status endpoint returns `needsSetup: false` because users row still exists
4. `bootstrapFirstUser` prints _"Admin user already exists — skipping setup. (No admin token on file — sign in via the web UI to manage tokens.)"_ and returns
5. Install "succeeds" with NO token, NO CLI path to generate one, and a chicken-and-egg problem on `/api/admin/tokens`, which requires auth

**Recovery design options (to decide in CU-03-01):**

1. Filesystem-signed nonce file written by the installer; recovery endpoint checks it
2. Accept a valid BetterAuth admin session cookie → mint new admin token via authenticated API call (leans on existing auth; `mosaic gateway login` becomes the recovery entry point)
3. Gateway daemon accepts `--rescue` flag that mints a one-shot recovery token, prints it, then exits

Current lean: option 2 (BetterAuth cookie) because it reuses existing auth and gives us `mosaic gateway login` as a useful command regardless. But the design spike in CU-03-01 should evaluate all three against: security, complexity, headless-environment friendliness, and disaster-recovery scenarios.

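
The dead-end traced in the five steps above can be modelled in a few lines. This is a minimal sketch, not the gateway's code: `GatewayState`, `needsSetup`, and `reinstall` are hypothetical names, and the token string is a placeholder. Only the rule that setup is offered when the `users` table is empty, and the order "clear token, then attempt bootstrap", come from the analysis above.

```typescript
// Hypothetical in-memory model; these names do not exist in the codebase.
interface GatewayState {
  userCount: number; // rows in the `users` table
  adminToken: string | null; // stands in for meta.adminToken in meta.json
}

// Documented rule: setup is only offered when no users exist.
function needsSetup(state: GatewayState): boolean {
  return state.userCount === 0;
}

// Models the "explicit reinstall" branch: the token is cleared first,
// then first-user bootstrap is attempted.
function reinstall(state: GatewayState): { state: GatewayState; message: string } {
  const cleared: GatewayState = { ...state, adminToken: null };
  if (needsSetup(cleared)) {
    return {
      state: { userCount: 1, adminToken: "placeholder-token" },
      message: "Admin user created",
    };
  }
  // Dead-end: a user already exists, so setup is skipped and nothing mints a token.
  return { state: cleared, message: "Admin user already exists, skipping setup" };
}

const fresh = reinstall({ userCount: 0, adminToken: null });
const stuck = reinstall({ userCount: 1, adminToken: "old-token" });

console.log(fresh.message, "| token present:", fresh.state.adminToken !== null);
console.log(stuck.message, "| token present:", stuck.state.adminToken !== null);
```

In this model the reinstall on a populated database always ends with `adminToken === null`, which is the state any recovery option has to get out of.
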
### 2026-04-04 — Telemetry architecture

- `@mosaicstack/telemetry-client-js` + `@mosaicstack/telemetry-client-py` are separate repos on Gitea — **not** currently consumed anywhere in this monorepo (verified via grep)
- Telemetry server will be combined with the main mosaicstack.dev website (not built yet)
- Local OTEL stays — `apps/gateway/src/tracing.ts` already wires it up for wide-event logging and post-mortem traces
- `mosaic telemetry` is a thin wrapper with two surfaces:
  - `mosaic telemetry local {status,tail,jaeger}` → local OTEL state, Jaeger links
  - `mosaic telemetry {status,opt-in,opt-out,test,upload}` → remote upload path via telemetry-client-js
- Remote disabled by default; opt-in requires explicit consent
- `test`/`upload` ship with dry-run mode until the server endpoint is live

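
The consent gate and dry-run behaviour described above could look roughly like the following. This is a sketch under stated assumptions: `TelemetryEvent`, `UploadOptions`, and `uploadEvents` are hypothetical names, the payload shape is invented for illustration, and nothing here calls `@mosaicstack/telemetry-client-js`.

```typescript
// Hypothetical event and option shapes; the real payload is not specified here.
interface TelemetryEvent {
  name: string;
  timestamp: string;
  attributes: Record<string, string | number | boolean>;
}

interface UploadOptions {
  consent: boolean; // remote upload is off unless the user opted in
  dryRun: boolean; // print/log the payload instead of POSTing
}

interface UploadResult {
  sent: boolean;
  preview: string; // JSON that would have been sent
}

function uploadEvents(events: TelemetryEvent[], options: UploadOptions): UploadResult {
  if (!options.consent) {
    throw new Error("Remote telemetry is disabled; opt in first.");
  }
  const preview = JSON.stringify({ events }, null, 2);
  if (options.dryRun) {
    // Dry-run: capture the intended payload, send nothing.
    return { sent: false, preview };
  }
  // The real upload path is deliberately absent from this sketch.
  throw new Error("Remote upload is not wired up in this sketch.");
}

const sample: TelemetryEvent = {
  name: "cli.command",
  timestamp: "2026-04-04T00:00:00Z",
  attributes: { command: "gateway verify", ok: true },
};

const dryRunResult = uploadEvents([sample], { consent: true, dryRun: true });
console.log(dryRunResult.preview);
```

Keeping the non-dry-run branch as an explicit error makes it hard to ship a build that silently POSTs before the server endpoint exists.
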
### 2026-04-04 — Open-question decisions (session 1)

Jason answered the four planning questions:

1. **Recovery endpoint design (CU-03-01):** BetterAuth cookie. `mosaic gateway login` becomes the recovery entry point. The spike in CU-03-01 can be compressed — design is locked; task becomes implementation planning rather than evaluation.
2. **Sub-package command surface (M5):** The current CU-05-01..08 scope is acceptable for this mission. Deeper command surfaces can be follow-up work.
3. **Telemetry server:** Ship `mosaic telemetry upload` and `mosaic telemetry test` in dry-run-only mode until the mosaicstack.dev server endpoint is live. Capture intended payload shape and print/log instead of POSTing. Real upload path gets wired in as follow-up once the server is ready.
4. **Top-level `mosaic config`:** Required. Add to M4 (CLI structure milestone) since it lives alongside help-shape work and uses the existing `packages/mosaic/src/config/config-service.ts` machinery. Separate concern from `mosaic gateway config` (which manages gateway .env + meta.json).

## Session Log

| Session | Date       | Milestone                 | Tasks Done                   | Outcome                                                                                            |
| ------- | ---------- | ------------------------- | ---------------------------- | -------------------------------------------------------------------------------------------------- |
| 1       | 2026-04-04 | cu-m01 Kill legacy CLI    | CU-01-01                     | PR #398 merged to main as `c39433c3`. 48 files deleted, 6685 LOC removed. CI green (pipeline 702). |
| 1       | 2026-04-04 | cu-m02 Archive + scaffold | CU-02-01, CU-02-02, CU-02-03 | PR #399 merged to main as `6f15a84c`. Mission manifest + TASKS.md + scratchpad live.               |
| 1       | 2026-04-04 | Planning                  | 4 open questions resolved    | See decisions block above. Ready to start M3/M4/M5.                                                |

## Corrections / Course Changes

_(append here as they happen)_

## Handoff — end of Session 1 (2026-04-04)

**Session 1 agent:** claude-opus-4-6[1m]
**Reason for handoff:** context budget (~80% used after bootstrap + two PRs + decision capture). Main is clean, no in-flight branches, no dirty state.

### What Session 2 should read first

1. `docs/MISSION-MANIFEST.md` — phase, progress, milestone table
2. `docs/TASKS.md` — task state, dependencies, agent assignments
3. This scratchpad — decisions, bug analysis, open risks, gotchas
4. `git log --oneline -5` — confirm #398 and #399 are on main

### State of the world

- **Main branch HEAD:** `6f15a84c docs: archive stale mission, scaffold CLI unification mission (#399)`
- **Working tree:** clean (no uncommitted changes after this handoff PR merges)
- **Open PRs:** none (both M1 and M2 PRs merged)
- **Deleted branches:** `chore/remove-cli-package-duplicate`, `docs/mission-cli-unification` (both local + remote)
- **Milestones done:** cu-m01, cu-m02 (2 / 8)
- **Milestones unblocked for parallel start:** cu-m03, cu-m04, cu-m05 (everything except M5.CU-05-06 which waits on M3.CU-03-03 for gateway login)

### Decisions locked (do not re-debate)

1. `@mosaicstack/cli` is dead; `@mosaicstack/mosaic` is the sole CLI package
2. Sub-package CLI pattern: each package exports `register<Name>Command(parent: Command)`, wired into `packages/mosaic/src/cli.ts` (copy the `registerQualityRails` pattern)
3. Gateway recovery uses **BetterAuth cookie** — `mosaic gateway login` + `mosaic gateway config rotate-token` via authenticated `POST /api/admin/tokens`
4. Telemetry: `mosaic telemetry` wraps `@mosaicstack/telemetry-client-js`; remote upload is dry-run only until the mosaicstack.dev server endpoint is live
5. Top-level `mosaic config` command is required (separate from `mosaic gateway config`) — wraps `packages/mosaic/src/config/config-service.ts`; added as CU-04-04

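
For decision 3, the authenticated call behind `rotate-token` might be assembled along these lines. Only the `POST /api/admin/tokens` route comes from the notes above. The function name, the `label` body field, the cookie string, and the example host are all assumptions made for illustration, and the sketch stops short of sending the request.

```typescript
// Hypothetical request description; kept as plain data so it can be inspected.
interface RotateTokenRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildRotateTokenRequest(
  gatewayUrl: string,
  sessionCookie: string, // value persisted by a prior `mosaic gateway login` (assumed format)
  label: string, // assumed body field; the real payload shape is not documented here
): RotateTokenRequest {
  return {
    // Resolve the documented route against whatever gateway host was given.
    url: new URL("/api/admin/tokens", gatewayUrl).toString(),
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Cookie: sessionCookie,
    },
    body: JSON.stringify({ label }),
  };
}

const rotateRequest = buildRotateTokenRequest(
  "http://gateway.example:8080",
  "session=example-cookie-value",
  "cli-recovery",
);
console.log(rotateRequest.method, rotateRequest.url);
```

Building the request as data first keeps the cookie handling testable without a running gateway; passing the result to `fetch` would be the remaining step.
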
### Known gotchas for Session 2

- **pr-create.sh eval bug:** `~/.config/mosaic/tools/git/pr-create.sh` line 158 uses `eval "$CMD"`. Backticks and `$()` in PR bodies get shell-evaluated. **Workaround:** strip backticks from PR bodies OR use `tea pr create --repo mosaicstack/mosaic-stack --login mosaicstack --title ... --description ... --head <branch>` directly. Captured in openbrain.
- **ci-queue-wait.sh unknown state:** The wrapper reports `state=unknown` and returns immediately instead of waiting. Poll the PR pipeline manually with `~/.config/mosaic/tools/woodpecker/pipeline-list.sh` and grep for the PR branch.
- **pr-merge.sh branch delete:** `-d` flag is accepted but warns "branch deletion may need to be done separately". Delete via the Gitea API: `curl -X DELETE -H "Authorization: token $TOKEN" "https://git.mosaicstack.dev/api/v1/repos/mosaicstack/mosaic-stack/branches/<url-encoded-branch>"`.
- **Tea login not default:** `tea login list` shows `mosaicstack` with DEFAULT=false. Pass `--login mosaicstack` explicitly on every `tea` call.
- **`.mosaic/orchestrator/session.lock`:** auto-rewritten on every session launch. Shows up as dirty working tree on branch switch. Safe to `git checkout` the file before branching.
- **Dual install.ts files no longer exist:** M1 removed `packages/cli/src/commands/gateway/install.ts`. The canonical (and only) one is `packages/mosaic/src/commands/gateway/install.ts`. The "user exists, no token" bug (CU-03-06) is in this file around lines 388-394 (`bootstrapFirstUser`). The server-side gate is in `apps/gateway/src/admin/bootstrap.controller.ts` lines 28 and 35.

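
The "strip backticks" workaround for the `eval "$CMD"` gotcha can be applied mechanically before a PR body reaches the wrapper. A minimal sketch, assuming a hypothetical helper named `sanitizePrBody`: it removes backticks and drops the `$` in front of `(` and `{`. Whether that is sufficient depends on how the wrapper quotes `$CMD`, which this sketch does not know, and a bare `$NAME` is left untouched.

```typescript
// Hypothetical helper; not part of the Mosaic tools.
function sanitizePrBody(body: string): string {
  return body
    .replace(/`/g, "") // backticks would run as command substitution under eval
    .replace(/\$(?=[({])/g, ""); // turn $( ... ) and ${ ... } into inert ( ... ) and { ... }
}

const rawBody = "Run `mosaic --help` and $(whoami) ${HOME}";
const safeBody = sanitizePrBody(rawBody);
console.log(safeBody);
```

Calling `tea pr create` directly, as noted above, sidesteps the problem entirely and keeps code formatting in the PR body.
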
### Suggested starting task for Session 2

Pick based on what the user wants shipped first:

- **Highest user-impact:** M3 — fixes the install bug that made the user "off the reservation" in the first place. Start with CU-03-01 (implementation plan, opus-tier, 4K) → CU-03-02 (server endpoint, sonnet).
- **Quickest win:** M4.CU-04-01 — one-line `configureHelp({ sortSubcommands: true })`. 3K estimate. Good warm-up.
- **User priority stated in session 1:** M5.CU-05-01 — `mosaic forge`. Larger scope (18K), but user flagged Forge specifically as part of "E2E install to functional, with Mosaic Forge working".

Session 2 orchestrator should pick one, update its TASKS.md status to `in-progress`, then follow the standard cycle: plan → code → test → review → remediate → commit → push → PR → queue guard → merge. Mosaic hard gates apply.

### Files added / modified in Session 1

Session 1 touched only these files across PRs #398 and #399 plus this handoff PR:

- Deleted: `packages/cli/` (entire directory, 48 files)
- Archived: `docs/archive/missions/harness-20260321/MISSION-MANIFEST.md`, `docs/archive/missions/harness-20260321/PRD.md`, `docs/archive/missions/storage-abstraction/TASKS.md`
- Modified: `pnpm-workspace.yaml`, `tools/install.sh`, `AGENTS.md`, `CLAUDE.md`, `README.md`, `docs/guides/user-guide.md`, `packages/mosaic/framework/defaults/README.md`
- Created: `docs/MISSION-MANIFEST.md`, `docs/TASKS.md`, `docs/scratchpads/cli-unification-20260404.md` (this file)

No code changes to `apps/`, `packages/mosaic/`, or any other runtime package. Session 2 starts fresh on the runtime code.

## Open Risks

- **Telemetry server not live:** CU-06-03 (`mosaic telemetry upload`) may need a dry-run stub until the server endpoint exists on mosaicstack.dev. Not blocking for this mission, but ships with reduced validation until then.
- **`mosaic auth` depends on gateway login:** CU-05-06 is gated by CU-03-03 (`mosaic gateway login`). Sequencing matters — do not start CU-05-06 until M3 is done or significantly underway.
- **pr-create.sh wrapper bug:** Discovered during M1 — `~/.config/mosaic/tools/git/pr-create.sh` line 158 uses `eval "$CMD"`, which shell-evaluates any backticks / `$(…)` / `${…}` in PR bodies. Workaround: strip backticks from PR bodies (use bold / italic / plain text instead), or use `tea pr create` directly. Captured in openbrain as gotcha. Should be fixed upstream in Mosaic tools repo at some point, but out of scope for this mission.
- **Mosaic coord / orchestrator session lock drift:** `.mosaic/orchestrator/session.lock` gets re-written every session launch and shows up as a dirty working tree on branch switch. Not blocking — just noise to ignore.

## Session 2 Log (2026-04-05)

**Session 2 agent:** claude-opus-4-6[1m]
**Mode:** parallel orchestration across worktrees

### Wave 1 — M3 (gateway token recovery)

- CU-03-01 plan landed as PR #401 → `docs/plans/gateway-token-recovery.md`. Confirmed no server changes needed — AdminGuard already accepts BetterAuth cookies, `POST /api/admin/tokens` is the existing mint endpoint.
- CU-03-02..07 implemented as PR #411: `mosaic gateway login` (interactive BetterAuth sign-in, session persisted), `mosaic gateway config rotate-token`, `mosaic gateway config recover-token`, fix for `bootstrapFirstUser` "user exists, no token" dead-end, 22 new unit tests. New files: `commands/gateway/login.ts`, `commands/gateway/token-ops.ts`.
- CU-03-08 independent code review surfaced 2 BLOCKER findings (session.json world-readable, password echoed during prompt) + 3 important findings (trimmed password, cross-gateway token persistence, unsafe `--password` flag). Remediated in PR #414: `saveSession` writes mode 0o600, new `promptSecret()` uses TTY raw mode, persistence target now matches `--gateway` host, `--password` marked UNSAFE with warning.

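
The world-readable `session.json` finding comes down to file mode at write time. The sketch below shows one way to write a file as owner-only with Node's `fs` module. It is not the `saveSession` from PR #414: the `SessionData` fields, the temp-directory path, and the function body are assumptions for illustration.

```typescript
import { chmodSync, mkdtempSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical session shape; the real fields are not documented here.
interface SessionData {
  gateway: string;
  cookie: string;
}

function saveSessionOwnerOnly(path: string, session: SessionData): void {
  // `mode` only applies when the file is created...
  writeFileSync(path, JSON.stringify(session, null, 2), { mode: 0o600 });
  // ...so chmod as well, to tighten a file that already existed with looser bits.
  chmodSync(path, 0o600);
}

const sessionDir = mkdtempSync(join(tmpdir(), "mosaic-session-"));
const sessionPath = join(sessionDir, "session.json");
saveSessionOwnerOnly(sessionPath, { gateway: "http://gateway.example", cookie: "example" });

// Permission bits only; expected to be 600 on POSIX filesystems.
const sessionMode = statSync(sessionPath).mode & 0o777;
console.log(sessionMode.toString(8));
```

The explicit `chmodSync` matters on re-login: without it, a session file created by an older build would keep its original, wider permissions.
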
### Wave 2 — M4 (help ergonomics + mosaic config)

- CU-04-01..03 landed as PR #402: `configureHelp({ sortSubcommands: true })` on root + gateway subgroup, plus an `addHelpText('after', …)` grouped-reference section (Commander 13 has no native command-group API).
- CU-04-04/05 landed as PR #408: top-level `mosaic config` with `show|get|set|edit|path`, extends `config/config-service.ts` with `readAll`, `getValue`, `setValue`, `getConfigPath`, `isInitialized` + `ConfigSection`/`ResolvedConfig` types. Additive only.

### Wave 3 — M5 (sub-package CLI surface, 8 commands + integration)

Parallel-dispatched in isolated worktrees. All merged:

- PR #403 `mosaic brain`, PR #404 `mosaic queue`, PR #405 `mosaic storage`, PR #406 `mosaic memory`, PR #407 `mosaic log`, PR #410 `mosaic macp`, PR #412 `mosaic forge`, PR #413 `mosaic auth`.
- Every package exports `register<Name>Command(parent: Command)` co-located with library code, following `@mosaicstack/quality-rails` pattern. Each wired into `packages/mosaic/src/cli.ts` with alphabetized `register…Command(program)` calls.
- PR #415 landed CU-05-10 integration smoke test (`packages/mosaic/src/cli-smoke.spec.ts`, 19 tests covering all 9 registrars) PLUS a fix for a pre-existing exports bug in `packages/macp/package.json` (`default` pointed at `./src/index.ts` instead of `./dist/index.js`, causing `ERR_MODULE_NOT_FOUND` when the compiled mosaic CLI tried to load macp at runtime). Caught by empirical `node packages/mosaic/dist/cli.js --help` test before merge.

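
The registrar pattern used throughout this wave can be illustrated without pulling in commander. In the sketch below, `CommandLike` is a minimal structural stand-in for the two commander `Command` methods it uses (`command` and `description`), and `RecordingCommand` is a hypothetical test double. The registrar names follow the `register<Name>Command` convention, but their bodies and descriptions are invented.

```typescript
// Stand-in for the subset of commander's Command API used by the registrars.
interface CommandLike {
  command(name: string): CommandLike;
  description(text: string): CommandLike;
}

// Hypothetical test double that just records what was registered.
class RecordingCommand implements CommandLike {
  readonly name: string;
  readonly children: RecordingCommand[] = [];
  summary = "";

  constructor(name: string) {
    this.name = name;
  }

  command(name: string): RecordingCommand {
    const child = new RecordingCommand(name);
    this.children.push(child);
    return child;
  }

  description(text: string): this {
    this.summary = text;
    return this;
  }
}

// Each sub-package would export one registrar next to its library code.
function registerAuthCommand(parent: CommandLike): void {
  parent.command("auth").description("Illustrative description only");
}
function registerBrainCommand(parent: CommandLike): void {
  parent.command("brain").description("Illustrative description only");
}
function registerQueueCommand(parent: CommandLike): void {
  parent.command("queue").description("Illustrative description only");
}

// The root CLI wires registrars in alphabetical order.
const program = new RecordingCommand("mosaic");
registerAuthCommand(program);
registerBrainCommand(program);
registerQueueCommand(program);

const registeredNames = program.children.map((child) => child.name);
console.log(registeredNames.join(", "));
```

Because each registrar receives the parent it attaches to, the sub-package never constructs its own root program, which is what the decision log credits with avoiding cross-package commander version mismatches.
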
### New gotchas captured in Session 2

- **`pr-create.sh` "Remote repository required" failure:** wrapper can't detect origin in multi-remote contexts. Fallback used throughout: direct Gitea API `curl -X POST …/api/v1/repos/mosaicstack/mosaic-stack/pulls` with body JSON.
- **`publish` workflow killed on post-merge pushes:** pipelines 735, 742, 747, 750, 758, 767 all show the Docker build step killed after `ci` workflow succeeded. Pre-existing infrastructure issue (observed on #714/#715 pre-mission). The `ci` workflow is the authoritative gate; `publish` killing is noise.
- **macp exports.default misaligned:** latent bug from original monorepo consolidation — every other package already pointed at `dist/`. Only exposed when compiled CLI started loading macp at runtime.
- **Commander 13 grouping:** no native command-group API; workaround is `addHelpText('after', groupedReferenceString)` + alphabetized flat list via `sortSubcommands: true`.

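
The macp `exports.default` gotcha is the kind of drift a small check can catch before publish. The following is a sketch of such a check, not an existing tool: `findSourceExports` is a hypothetical helper, and the `types` entries in the sample data are assumed (only the `default` values come from the notes above).

```typescript
// A package.json "exports" value: either a path or nested conditions.
type ExportTarget = string | { [condition: string]: ExportTarget };

// Returns every export path that still points into ./src/ instead of built output.
function findSourceExports(target: ExportTarget, trail: string[] = []): string[] {
  if (typeof target === "string") {
    return target.startsWith("./src/") ? [trail.join(" > ") + " -> " + target] : [];
  }
  const findings: string[] = [];
  for (const [key, value] of Object.entries(target)) {
    findings.push(...findSourceExports(value, [...trail, key]));
  }
  return findings;
}

// Shape before the fix in PR #415 (the `types` entry is an assumption).
const exportsBefore: ExportTarget = {
  ".": { types: "./dist/index.d.ts", default: "./src/index.ts" },
};
// Shape after the fix.
const exportsAfter: ExportTarget = {
  ".": { types: "./dist/index.d.ts", default: "./dist/index.js" },
};

const findingsBefore = findSourceExports(exportsBefore);
const findingsAfter = findSourceExports(exportsAfter);
console.log(findingsBefore, findingsAfter);
```

Run across every workspace `package.json`, a check like this would have flagged macp long before the compiled CLI first tried to load it.
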
### Wave 4 — M6 + M7 (parallel)

- M6 `mosaic telemetry` landed as PR #417 (merge `a531029c`). Full scope CU-06-01..05: `@mosaicstack/telemetry-client-js` shim, `telemetry local {status,tail,jaeger}`, top-level `telemetry {status,opt-in,opt-out,test,upload}` with dry-run default, persistent consent state. New files: `packages/mosaic/src/commands/telemetry.ts`, `src/telemetry/client-shim.ts`, `src/telemetry/consent-store.ts`, plus `telemetry.spec.ts`.
- M7 unified first-run UX landed as PR #418 (merge `872c1245`). Full scope CU-07-01..04: `install.sh` `--yes`/`--no-auto-launch` flags + auto-handoff to wizard + gateway install, wizard/gateway-install coordination via transient state file, `mosaic gateway verify` post-install healthcheck, Docker-based `tools/e2e-install-test.sh`.

### Wave 5 — M8 (release)

- PR #419 (merge `b9d464de`) — CLI unification release v0.1.0. Single cohesive docs + release PR:
  - README.md: unified command tree, new install UX, `mosaic gateway` and `mosaic config` sections, removed stale `@mosaicstack/cli` refs.
  - docs/guides/user-guide.md: new "Sub-package Commands" + "Telemetry" sections covering all 11 top-level commands.
  - `packages/mosaic/package.json`: bumped 0.0.21 → 0.1.0 (CI publishes on merge).
- Git tag: `mosaic-v0.1.0` (scoped to avoid collision with existing `v0.1.0` repo tag) — pushed to origin on merge sha.
- Gitea release: https://git.mosaicstack.dev/mosaicstack/mosaic-stack/releases/tag/mosaic-v0.1.0 — "@mosaicstack/mosaic v0.1.0 — CLI Unification".

### Wave 6 — M8 correction (version regression)

PR #419 bumped `@mosaicstack/mosaic` 0.0.21 → 0.1.0 and released as `mosaic-v0.1.0`. This was wrong on two counts:

1. **Versioning policy violation.** The project stays in `0.0.x` alpha until GA. Minor bump to `0.1.0` jumped out of alpha without authorization.
2. **macp exports fix never reached the registry.** PR #415 fixed `packages/macp/package.json` `exports.default` pointing at `./src/index.ts`, but did NOT bump macp's version. When the post-merge publish workflow ran on #419, it published `@mosaicstack/mosaic@0.1.0` but `@mosaicstack/macp@0.0.2` was "already published" so the fix was silently skipped. Result: users running `mosaic update` got mosaic 0.1.0 which depends on macp and resolves to the still-broken registry copy of macp@0.0.2, failing with `ERR_MODULE_NOT_FOUND` on `./src/index.ts` at CLI startup.

Correction PR:

- `@mosaicstack/mosaic` 0.1.0 → `0.0.22` (stay in alpha)
- `@mosaicstack/macp` 0.0.2 → `0.0.3` (force republish with the exports fix)
- Delete Gitea tag `mosaic-v0.1.0` + release
- Delete `@mosaicstack/mosaic@0.1.0` from the Gitea npm registry so `latest` reverts to the highest remaining version
- Create tag `mosaic-v0.0.22` + Gitea release

**Lesson captured:** every package whose _source_ changes must also have its _version_ bumped, because the publish workflow silently skips "already published" versions. `@mosaicstack/macp@0.0.2` had the bad exports in the registry from day one; the in-repo fix in #415 was invisible to installed-from-registry consumers until the version bumped.

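
The lesson above lends itself to a pre-publish guard. The sketch below is hypothetical: `PackageInfo`, `findSkippedPublishes`, and the way "source changed" and "already published" are supplied are assumptions, and the sample data simply replays the #419 situation as described in this log.

```typescript
// Hypothetical description of a workspace package at publish time.
interface PackageInfo {
  name: string;
  version: string; // version in the repo's package.json
  sourceChanged: boolean; // source differs from what the registry copy was built from
}

// Flags packages whose source changed but whose version is already in the registry,
// which is the case the publish workflow silently skips.
function findSkippedPublishes(
  packages: PackageInfo[],
  published: Map<string, Set<string>>,
): string[] {
  return packages
    .filter((pkg) => pkg.sourceChanged && (published.get(pkg.name)?.has(pkg.version) ?? false))
    .map((pkg) => pkg.name + "@" + pkg.version);
}

// State at the time of PR #419, per the notes above.
const workspace: PackageInfo[] = [
  { name: "@mosaicstack/mosaic", version: "0.1.0", sourceChanged: true },
  { name: "@mosaicstack/macp", version: "0.0.2", sourceChanged: true },
];
const registry = new Map<string, Set<string>>([
  ["@mosaicstack/mosaic", new Set(["0.0.21"])],
  ["@mosaicstack/macp", new Set(["0.0.2"])],
]);

const skipped = findSkippedPublishes(workspace, registry);
console.log(skipped);
```

Failing CI when this list is non-empty would turn the silent skip into a visible error at PR time.
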
### Mission outcome

All 8 milestones, all 8 success criteria met in-repo. Released as `mosaic-v0.0.22` (alpha) after correcting an incorrect 0.1.0 version bump + missed macp republish. Two sessions total (~10h combined) plus a follow-up correction PR.

## Verification Evidence

### CU-01-01 (PR #398)

- Branch: `chore/remove-cli-package-duplicate`
- Commit: `7206b9411d96`
- Merge commit on main: `c39433c3`
- CI pipeline: #702 (`pull_request` event, all 6 steps green: postgres, install, typecheck, lint, format, test)
- Quality gates (pre-push): typecheck 38/38, lint 21/21, format clean, test 38/38