# Fleet Backlog Conventions

The **backlog** is Mosaic's native backlog-of-record for fleet work. It is built
end-to-end on Mosaic's own storage layer (`@mosaicstack/db`, drizzle/Postgres)
and surfaced as `mosaic fleet backlog <sub> --json`.

> **Mosaic-native, no Hermes.** This backlog REPLACES the former Hermes adapter.
> There is **no** runtime dependency on Hermes, `hermes kanban`, or `~/.hermes`
> anywhere in this feature. Anything previously delegated to Hermes is recreated
> here on Mosaic's own Postgres storage layer.

## Storage tier — PGlite by default, Postgres by config

The backlog uses the existing Mosaic storage layer; there is **no** new database
engine (no SQLite, no raw client).

| Condition                      | Tier                 | Data location                    |
| ------------------------------ | -------------------- | -------------------------------- |
| `DATABASE_URL` set             | Full server Postgres | the configured database          |
| `PGLITE_DATA_DIR` set (no URL) | Embedded PGlite      | that directory                   |
| neither (default)              | Embedded PGlite      | `~/.config/mosaic/fleet/backlog` |

PGlite provides real Postgres semantics in-process — including the row locks the
atomic claim relies on — so the **same code** runs on a laptop (embedded,
single-host default) and on a full Postgres deployment. Switching tiers is
config-only.

The schema (`backlog` table) is created automatically on first CLI use:
`runMigrations()` for Postgres, `runPgliteMigrations()` for embedded PGlite.

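The tier table can be read as a small decision function. The sketch below is illustrative only: `resolveBacklogStorage` and the `BacklogStorage` type are hypothetical names rather than Mosaic's API, treating an empty variable as unset is an assumption, and `~` expansion is left to the caller. Only the two environment variables, the default path, and the migration runner names come from the text above.

```typescript
// Hypothetical helper; these names are illustrative, not Mosaic's API.
type BacklogTier = "postgres" | "pglite";

interface BacklogStorage {
  tier: BacklogTier;
  location: string; // connection URL for Postgres, data directory for PGlite
  migrate: "runMigrations" | "runPgliteMigrations"; // runner named in the text
}

const DEFAULT_PGLITE_DIR = "~/.config/mosaic/fleet/backlog";

function resolveBacklogStorage(
  env: Record<string, string | undefined>,
): BacklogStorage {
  const url = env["DATABASE_URL"];
  const dir = env["PGLITE_DATA_DIR"];
  // Assumption: an empty string counts as "not set".
  if (url) {
    // DATABASE_URL wins even when PGLITE_DATA_DIR is also present.
    return { tier: "postgres", location: url, migrate: "runMigrations" };
  }
  return {
    tier: "pglite",
    location: dir || DEFAULT_PGLITE_DIR,
    migrate: "runPgliteMigrations",
  };
}
```

Passing the environment in as a plain record keeps the function easy to test and makes the precedence (URL first, then data directory, then default) explicit.
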
### Update safety

The embedded PGlite store lives under `~/.config/mosaic/fleet/backlog`, which is
listed in `PRESERVE_PATHS` in `packages/mosaic/framework/install.sh`. This means
`mosaic update` (which runs the framework sync with `rsync --delete`) will **not**
wipe the operator's backlog — same protection as the roster, per-agent env, and
heartbeat run dir.

## Card schema

A card is one row in the `backlog` table:

| Column              | Type                    | Notes                                                         |
| ------------------- | ----------------------- | ------------------------------------------------------------- |
| `id`                | text (PK)               | Stable, caller-supplied id (e.g. `A4`, `fleet-001`).          |
| `title`             | text                    | Required.                                                     |
| `body`              | text (nullable)         | Free-form description.                                        |
| `phase`             | text (nullable)         | Board/phase grouping (see below).                             |
| `priority`          | int (default 0)         | **Higher = sooner.** Claim picks the max-priority ready card. |
| `status`            | enum                    | `ready` \| `claimed` \| `blocked` \| `done`.                  |
| `depends_on`        | jsonb `string[]`        | DAG edges — ids of cards this one depends on.                 |
| `claim_owner`       | text (nullable)         | Owner token of the active claim.                              |
| `claim_ttl_seconds` | int (nullable)          | TTL of the active claim.                                      |
| `claimed_at`        | timestamptz (nullable)  | When the claim was taken. `claimed_at + ttl` = expiry.        |
| `attempts`          | int (default 0)         | Incremented each time the card is claimed.                    |
| `idempotency_key`   | text (unique, nullable) | Deduplicates `create`; NULLs are distinct in Postgres.        |
| `acceptance`        | jsonb (nullable)        | Acceptance criteria (array of strings or object).             |
| `created_at`        | timestamptz             |                                                               |
| `updated_at`        | timestamptz             |                                                               |

`depends_on` is modeled as a `jsonb` array column rather than a separate edge
table. Justification: it matches the repo's existing style (e.g. `tasks.tags`,
`agents.skills`, `routing_rules.conditions` are all jsonb arrays), it keeps a
card self-contained, and the DAG is small (per-card dependency lists), so a join
table would add ceremony without benefit.

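For readers who think in types, one row can be pictured as the TypeScript shape below. This is a sketch, not the actual drizzle model: the camelCase property names, the `BacklogCard` and `newCard` names, and the empty `dependsOn` default are assumptions, while the fields, nullability, and numeric defaults follow the column table.

```typescript
// Illustrative view of one `backlog` row; names here are assumptions.
type CardStatus = "ready" | "claimed" | "blocked" | "done";

interface BacklogCard {
  id: string; // caller-supplied primary key
  title: string;
  body: string | null;
  phase: string | null;
  priority: number; // higher = sooner
  status: CardStatus;
  dependsOn: string[]; // ids of cards this one depends on
  claimOwner: string | null;
  claimTtlSeconds: number | null;
  claimedAt: Date | null;
  attempts: number;
  idempotencyKey: string | null;
  acceptance: string[] | Record<string, unknown> | null;
  createdAt: Date;
  updatedAt: Date;
}

// Hypothetical constructor: fills the defaults a new card would start with.
function newCard(
  id: string,
  title: string,
  overrides: Partial<BacklogCard> = {},
  now: Date = new Date(),
): BacklogCard {
  return {
    id,
    title,
    body: null,
    phase: null,
    priority: 0,
    status: "ready", // the lifecycle starts at `ready`
    dependsOn: [],
    claimOwner: null,
    claimTtlSeconds: null,
    claimedAt: null,
    attempts: 0,
    idempotencyKey: null,
    acceptance: null,
    createdAt: now,
    updatedAt: now,
    ...overrides,
  };
}
```

For example, `newCard("A2", "service", { dependsOn: ["A1"], priority: 9 })` yields a `ready` card with no claim fields set and a single dependency edge stored on the card itself.
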
### Board / phase convention

`phase` is a free-form grouping string used as the board column / milestone label
(e.g. `M1`, `fleet`, `infra`). `list --phase <phase>` filters to one board lane.
`priority` orders cards **within** the ready pool regardless of phase.

## Status lifecycle

```
         create
           │
           ▼
┌──────► ready ───── claim ─────► claimed ───── complete ─────► done
│          │                         │
│        block          reclaim (TTL expiry or --id)
│          ▼                         │
│       blocked   └──────────────────┘ (back to ready)
└──────────┘ (reclaim / re-create can return a card to ready)
```

- **ready** — eligible to be claimed once every `depends_on` card is `done`.
- **claimed** — a worker holds it; `claim_owner` + `claimed_at` set.
- **blocked** — explicitly parked; never auto-claimed.
- **done** — completed; satisfies dependents.

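The lifecycle above can be written down as a transition table. The sketch below encodes only the edges drawn in the diagram; the `TRANSITIONS` and `nextStatus` names are hypothetical, and the real CLI may accept more transitions than the diagram shows (for instance, `complete --id` is described only as setting a card to `done`).

```typescript
type Status = "ready" | "claimed" | "blocked" | "done";
type Action = "claim" | "block" | "complete" | "reclaim";

// Edges taken from the diagram; anything absent is treated as not allowed.
const TRANSITIONS: Record<Status, Partial<Record<Action, Status>>> = {
  ready: { claim: "claimed", block: "blocked" },
  claimed: { complete: "done", reclaim: "ready" },
  blocked: { reclaim: "ready" },
  done: {},
};

// Returns the resulting status, or null when the diagram has no such edge.
function nextStatus(from: Status, action: Action): Status | null {
  return TRANSITIONS[from][action] ?? null;
}
```

A table like this makes it easy to see that `done` is terminal in the diagram and that `blocked` cards only leave that state by an explicit action.
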
## Atomic claim (`FOR UPDATE SKIP LOCKED`) + TTL

`claim` is atomic. Inside a single transaction it locks candidate `ready` rows
with `SELECT ... FOR UPDATE SKIP LOCKED` (via the drizzle `sql` operator), picks
the highest-priority deps-satisfied card, and flips it to `claimed`. Because a row
already locked by a concurrent claimer is **skipped**, two claimers can **never**
both win the same card — the loser falls through to the next candidate or gets
`null`. (Proven by the concurrency tests in `packages/db/src/backlog.spec.ts`.)

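To make the locking pattern concrete, here is one way such a claim could be phrased in Postgres SQL, held in a TypeScript constant. It is a sketch under stated assumptions, not the query Mosaic ships: folding the select and update into one statement, the `created_at` tie-break, treating a missing dependency as unsatisfied, and the `CLAIM_NEXT_SQL` and `claimParams` names are all illustrative. A plain string is used instead of drizzle's `sql` operator to keep the example self-contained.

```typescript
// Sketch only: column names come from the card schema, the rest is assumed.
const CLAIM_NEXT_SQL = `
UPDATE backlog
SET status = 'claimed',
    claim_owner = $1,
    claim_ttl_seconds = $2,
    claimed_at = now(),
    attempts = attempts + 1,
    updated_at = now()
WHERE id = (
  SELECT b.id
  FROM backlog AS b
  WHERE b.status = 'ready'
    -- deps gate: no dependency may be missing or not yet done
    AND NOT EXISTS (
      SELECT 1
      FROM jsonb_array_elements_text(b.depends_on) AS dep(id)
      LEFT JOIN backlog AS d ON d.id = dep.id
      WHERE d.status IS DISTINCT FROM 'done'
    )
  ORDER BY b.priority DESC, b.created_at ASC
  LIMIT 1
  FOR UPDATE OF b SKIP LOCKED -- rows locked by another claimer are skipped
)
RETURNING *;
`;

// Positional parameters for the statement above: $1 = owner, $2 = TTL seconds.
function claimParams(owner: string, ttlSeconds = 900): [string, number] {
  return [owner, ttlSeconds];
}
```

When no row qualifies, the inner select is empty, the update touches nothing, and `RETURNING` yields zero rows, which corresponds to the `null` result described above.
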
- **Deps gate:** a card is only claimable when every id in `depends_on` is `done`.
- **TTL:** `claim --ttl <sec>` (default **900s**) records `claim_ttl_seconds`.
- **reclaim:** releases claims whose `claimed_at + ttl` is in the past (expired)
  back to `ready`, clearing the claim fields. `reclaim --id <id>` force-releases a
  specific card regardless of expiry. This is how a crashed worker's card returns
  to the pool.

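The three rules above can be exercised with a small in-memory model. It illustrates only the selection and expiry logic and does not reproduce the row locking that Postgres provides. The `Card` shape and function names are hypothetical, timestamps are epoch milliseconds for simplicity, a dependency id that does not exist is assumed to be unsatisfied, and forced reclaim is modeled for `claimed` cards only.

```typescript
type Status = "ready" | "claimed" | "blocked" | "done";

interface Card {
  id: string;
  priority: number;
  status: Status;
  dependsOn: string[];
  claimOwner: string | null;
  claimTtlSeconds: number | null;
  claimedAt: number | null; // epoch milliseconds in this sketch
  attempts: number;
}

const DEFAULT_TTL_SECONDS = 900; // default stated for `claim --ttl`

// Deps gate: every id in dependsOn must exist and be `done`.
function depsSatisfied(card: Card, byId: Map<string, Card>): boolean {
  return card.dependsOn.every((id) => byId.get(id)?.status === "done");
}

// Picks the highest-priority ready, deps-satisfied card and marks it claimed.
function claimNext(
  cards: Card[],
  owner: string,
  nowMs: number,
  ttlSeconds: number = DEFAULT_TTL_SECONDS,
): Card | null {
  const byId = new Map(cards.map((c): [string, Card] => [c.id, c]));
  const candidates = cards
    .filter((c) => c.status === "ready" && depsSatisfied(c, byId))
    .sort((a, b) => b.priority - a.priority);
  const winner = candidates[0];
  if (winner === undefined) return null;
  winner.status = "claimed";
  winner.claimOwner = owner;
  winner.claimTtlSeconds = ttlSeconds;
  winner.claimedAt = nowMs;
  winner.attempts += 1;
  return winner;
}

// A claim is expired once claimedAt + ttl lies strictly in the past.
function isExpired(card: Card, nowMs: number): boolean {
  if (card.status !== "claimed") return false;
  if (card.claimedAt === null || card.claimTtlSeconds === null) return false;
  return card.claimedAt + card.claimTtlSeconds * 1000 < nowMs;
}

// Releases expired claims, or one specific claimed card when forceId is given.
function reclaim(cards: Card[], nowMs: number, forceId?: string): Card[] {
  const released = cards.filter((c) =>
    forceId !== undefined
      ? c.id === forceId && c.status === "claimed"
      : isExpired(c, nowMs),
  );
  for (const c of released) {
    c.status = "ready";
    c.claimOwner = null;
    c.claimTtlSeconds = null;
    c.claimedAt = null;
  }
  return released;
}
```

Note that `attempts` is left untouched by `reclaim`, so a card that keeps timing out accumulates a visible retry count.
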
## CLI — `mosaic fleet backlog <sub> --json`

All subcommands support `--json`.

| Subcommand                                                                                    | Purpose                                                                                   |
| --------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------- |
| `create --id --title [--body --phase --priority --depends-on --acceptance --idempotency-key]` | Create a card; `idempotency_key` dedups (repeat returns the existing card).               |
| `list [--status --phase --ready-only]`                                                        | List cards. `--ready-only` = status `ready` AND all deps `done`.                          |
| `claim --owner [--ttl <sec> --id <id>]`                                                       | Atomically claim the highest-priority ready card (or `--id`). Returns the card or `null`. |
| `reclaim [--id <id>]`                                                                         | Release expired claims (or a specific card) back to `ready`.                              |
| `link --from --to`                                                                            | Add a `depends_on` edge (`--from` depends on `--to`).                                     |
| `stats`                                                                                       | Counts by status, oldest-ready age, expired-claim count.                                  |
| `block --id`                                                                                  | Set a card to `blocked`.                                                                  |
| `complete --id`                                                                               | Set a card to `done` (releases any claim).                                                |

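
The idempotency rule for `create` is worth seeing in miniature. The in-memory sketch below is hypothetical (`InMemoryBacklog` and its types are not part of Mosaic); it shows a repeated key returning the existing card, and cards without a key never deduplicating against each other, mirroring how NULLs stay distinct under a unique constraint. Rejecting a duplicate `id` with an error is an assumption about primary-key behavior.

```typescript
interface CreateInput {
  id: string;
  title: string;
  idempotencyKey?: string;
}

interface StoredCard {
  id: string;
  title: string;
  idempotencyKey: string | null;
}

class InMemoryBacklog {
  private readonly byId = new Map<string, StoredCard>();
  private readonly byKey = new Map<string, StoredCard>();

  create(input: CreateInput): { card: StoredCard; created: boolean } {
    const key = input.idempotencyKey ?? null;
    if (key !== null) {
      // A repeated key returns the card created the first time, unchanged.
      const existing = this.byKey.get(key);
      if (existing !== undefined) return { card: existing, created: false };
    }
    // Assumption: a duplicate id without a matching key is a key violation.
    if (this.byId.has(input.id)) {
      throw new Error(`card ${input.id} already exists`);
    }
    const card: StoredCard = { id: input.id, title: input.title, idempotencyKey: key };
    this.byId.set(card.id, card);
    if (key !== null) this.byKey.set(key, card);
    return { card, created: true };
  }

  size(): number {
    return this.byId.size;
  }
}
```

This is what makes `create` safe to retry from a script: replaying the same request with the same key cannot produce a second card.
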
### Example

```sh
# Seed two cards, the second depends on the first.
mosaic fleet backlog create --id A1 --title "schema" --priority 5
mosaic fleet backlog create --id A2 --title "service" --depends-on A1 --priority 9

# A2 is gated on A1, so claim returns A1 first.
mosaic fleet backlog claim --owner worker-1 --ttl 600 --json

# Finish A1; now A2 is ready.
mosaic fleet backlog complete --id A1
mosaic fleet backlog list --ready-only --json

# Recover stalled work.
mosaic fleet backlog reclaim --json
```

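
A worker built on these commands might follow a claim, work, complete cycle. The sketch below is hypothetical throughout: `RunCli` is an injected function standing in for whatever actually runs the `mosaic` binary, and the assumption that `claim --json` prints either a JSON object with an `id` field or `null` is mine, since the output format is not specified in this document. Only the subcommands and flags are taken from the table above.

```typescript
// Injected runner: takes CLI arguments, returns stdout. Hypothetical.
type RunCli = (args: string[]) => string;

interface ClaimedCard {
  id: string;
}

// Claims one card, runs the handler, then completes it.
// Returns the completed card id, or null when nothing was claimable.
function workOnce(
  runCli: RunCli,
  owner: string,
  handle: (card: ClaimedCard) => void,
): string | null {
  const out = runCli([
    "fleet", "backlog", "claim", "--owner", owner, "--ttl", "600", "--json",
  ]);
  // Assumed output shape: a card object or the literal `null`.
  const card = JSON.parse(out) as ClaimedCard | null;
  if (card === null) return null;
  // If the handler throws, the card stays claimed until its TTL expires
  // and a later `reclaim` returns it to the pool.
  handle(card);
  runCli(["fleet", "backlog", "complete", "--id", card.id]);
  return card.id;
}
```

Injecting the runner keeps the loop testable without a database, and it leaves the choice of process-spawning API to the caller.
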