stack/docs/fleet/backlog-conventions.md
jason.woltje f852250419
feat(fleet): native Mosaic backlog on @mosaicstack/db (atomic claim + TTL) (#657)
2026-06-24 14:55:10 +00:00


Fleet Backlog Conventions

The backlog is Mosaic's native backlog-of-record for fleet work. It is built end-to-end on Mosaic's own storage layer (@mosaicstack/db, drizzle/Postgres) and surfaced as mosaic fleet backlog <sub> --json.

Mosaic-native, no Hermes. This backlog REPLACES the former Hermes adapter. There is no runtime dependency on Hermes, hermes kanban, or ~/.hermes anywhere in this feature. Anything previously delegated to Hermes is recreated here on Mosaic's own Postgres storage layer.

Storage tier — PGlite by default, Postgres by config

The backlog uses the existing Mosaic storage layer; there is no new database engine (no SQLite, no raw client).

| Condition | Tier | Data location |
|---|---|---|
| DATABASE_URL set | Full server Postgres | the configured database |
| PGLITE_DATA_DIR set (no URL) | Embedded PGlite | that directory |
| neither (default) | Embedded PGlite | ~/.config/mosaic/fleet/backlog |
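
The table above is a simple precedence check: URL first, then an explicit PGlite directory, then the default location. The sketch below illustrates that rule only. `resolveBacklogStore` and `BacklogStore` are hypothetical names, not exports of @mosaicstack/db. It also assumes that an empty variable counts as unset.

```typescript
// Hypothetical sketch of the tier table above; not the @mosaicstack/db code.
// BacklogStore and resolveBacklogStore are illustrative names.
type BacklogStore =
  | { tier: "postgres"; url: string }
  | { tier: "pglite"; dataDir: string };

function resolveBacklogStore(
  env: Record<string, string | undefined>,
  homeDir: string,
): BacklogStore {
  // Assumption: an empty string is treated the same as an unset variable.
  const url = env.DATABASE_URL;
  if (url) return { tier: "postgres", url }; // DATABASE_URL wins over everything

  const dataDir = env.PGLITE_DATA_DIR;
  if (dataDir) return { tier: "pglite", dataDir }; // embedded, explicit directory

  // Neither set: embedded PGlite in the default, update-safe location.
  return { tier: "pglite", dataDir: `${homeDir}/.config/mosaic/fleet/backlog` };
}
```

A real caller would pass `process.env` and `os.homedir()`.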

PGlite is real Postgres semantics in-process — including the row locks the atomic claim relies on — so the same code runs on a laptop (embedded, single-host default) and on a full Postgres deployment. Switching tiers is config-only.

The schema (backlog table) is created automatically on first CLI use: runMigrations() for Postgres, runPgliteMigrations() for embedded PGlite.

Update safety

The embedded PGlite store lives under ~/.config/mosaic/fleet/backlog, which is listed in PRESERVE_PATHS in packages/mosaic/framework/install.sh. This means mosaic update (which runs the framework sync with rsync --delete) will not wipe the operator's backlog — same protection as the roster, per-agent env, and heartbeat run dir.

Card schema

A card is one row in the backlog table:

| Column | Type | Notes |
|---|---|---|
| id | text (PK) | Stable, caller-supplied id (e.g. A4, fleet-001). |
| title | text | Required. |
| body | text (nullable) | Free-form description. |
| phase | text (nullable) | Board/phase grouping (see below). |
| priority | int (default 0) | Higher = sooner. Claim picks the max-priority ready card. |
| status | enum | One of ready, claimed, blocked, done. |
| depends_on | jsonb (string[]) | DAG edges: ids of cards this one depends on. |
| claim_owner | text (nullable) | Owner token of the active claim. |
| claim_ttl_seconds | int (nullable) | TTL of the active claim, in seconds. |
| claimed_at | timestamptz (nullable) | When the claim was taken. Expiry = claimed_at + claim_ttl_seconds. |
| attempts | int (default 0) | Incremented each time the card is claimed. |
| idempotency_key | text (unique, nullable) | Dedups create. Postgres treats NULLs as distinct in a unique index, so any number of cards may omit a key. |
| acceptance | jsonb (nullable) | Acceptance criteria (array of strings or object). |
| created_at | timestamptz | |
| updated_at | timestamptz | |
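
One row can be pictured as the TypeScript shape below. This is a hypothetical mirror of the column table, not the drizzle-inferred type. It keeps the snake_case column names and uses `Date` for timestamptz. It also shows that a claim's expiry is derived from two columns rather than stored; `claimExpiry` is an illustrative helper.

```typescript
// Hypothetical TypeScript view of one backlog row, mirroring the column table.
// The real drizzle-inferred types may differ.
type CardStatus = "ready" | "claimed" | "blocked" | "done";

interface BacklogCard {
  id: string;
  title: string;
  body: string | null;
  phase: string | null;
  priority: number; // higher = sooner
  status: CardStatus;
  depends_on: string[]; // jsonb array of card ids
  claim_owner: string | null;
  claim_ttl_seconds: number | null;
  claimed_at: Date | null;
  attempts: number;
  idempotency_key: string | null;
  acceptance: string[] | Record<string, unknown> | null;
  created_at: Date;
  updated_at: Date;
}

// Expiry is not a column: it is claimed_at + claim_ttl_seconds.
function claimExpiry(card: BacklogCard): Date | null {
  if (card.claimed_at === null || card.claim_ttl_seconds === null) return null;
  return new Date(card.claimed_at.getTime() + card.claim_ttl_seconds * 1000);
}
```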

depends_on is modeled as a jsonb array column rather than a separate edge table. Justification: it matches the repo's existing style (e.g. tasks.tags, agents.skills, routing_rules.conditions are all jsonb arrays), keeps a card self-contained, and the DAG is small (per-card dependency lists), so a join table would add ceremony without benefit.
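
Because each card carries its own depends_on array, the whole graph is simply the union of those arrays, and walking it in memory is cheap at this size. The sketch below shows `link` against that representation. It is hypothetical: the `Deps` map stands in for the table. The cycle check is an assumption, since this document calls the graph a DAG but does not say whether `link` rejects cycles.

```typescript
// Hypothetical in-memory sketch of `link` when edges live in per-card arrays.
// Assumption: link refuses edges that would create a cycle.
type Deps = Map<string, string[]>; // card id -> depends_on

// True if `target` is reachable from `start` by following depends_on edges.
function reaches(deps: Deps, start: string, target: string): boolean {
  const stack = [start];
  const seen = new Set<string>();
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (id === target) return true;
    if (seen.has(id)) continue;
    seen.add(id);
    stack.push(...(deps.get(id) ?? []));
  }
  return false;
}

// link --from F --to T means F depends on T. Returns F's updated array.
function link(deps: Deps, from: string, to: string): string[] {
  const current = deps.get(from) ?? [];
  if (current.includes(to)) return current; // edge already present: no-op
  // If `to` already (transitively) depends on `from`, the new edge closes a loop.
  if (reaches(deps, to, from)) {
    throw new Error(`linking ${from} -> ${to} would create a cycle`);
  }
  const next = [...current, to];
  deps.set(from, next);
  return next;
}
```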

Board / phase convention

phase is a free-form grouping string used as the board column / milestone label (e.g. M1, fleet, infra). list --phase <phase> filters to one board lane. priority orders cards within the ready pool regardless of phase.
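
The two fields play independent roles: phase only filters, and priority only orders. Below is a minimal sketch, assuming a hypothetical `CardRow` shape. The `created_at` tie-break between equal priorities is an assumption, since this document does not specify one.

```typescript
// Hypothetical sketch: `list --phase` filters; the ready pool is ordered by
// priority (higher first). The created_at tie-break is an assumption.
interface CardRow {
  id: string;
  phase: string | null;
  priority: number;
  status: "ready" | "claimed" | "blocked" | "done";
  created_at: Date;
}

// One board lane: every card in the phase, whatever its status.
function listPhase(cards: CardRow[], phase: string): CardRow[] {
  return cards.filter((c) => c.phase === phase);
}

// Ready pool ordering ignores phase entirely; only priority (then age) matters.
function readyPoolOrder(cards: CardRow[]): CardRow[] {
  return cards
    .filter((c) => c.status === "ready")
    .sort(
      (a, b) =>
        b.priority - a.priority ||
        a.created_at.getTime() - b.created_at.getTime(),
    );
}
```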

Status lifecycle

```text
            create
              │
              ▼
   ┌──────► ready ───── claim ─────► claimed ───── complete ─────► done
   │          │                         │
   │       block                  reclaim (TTL expiry or --id)
   │          ▼                         │
   │       blocked                      └──────────────────────────┘ (back to ready)
   └──────────┘  (reclaim / re-create can return a card to ready)
```

  • ready — eligible to be claimed once every depends_on card is done.
  • claimed — a worker holds it; claim_owner + claimed_at set.
  • blocked — explicitly parked; never auto-claimed.
  • done — completed; satisfies dependents.
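
The arrows in the diagram can be written down as a transition table. This is a hypothetical encoding of the diagram only; `transition` and `TRANSITIONS` are illustrative names, and re-create is not modeled. The CLI may accept moves the diagram does not draw, such as `complete` on a card that is not currently claimed.

```typescript
// Hypothetical encoding of the arrows in the lifecycle diagram, nothing more.
type Status = "ready" | "claimed" | "blocked" | "done";
type Action = "claim" | "complete" | "block" | "reclaim";

const TRANSITIONS: Record<Status, Partial<Record<Action, Status>>> = {
  ready: { claim: "claimed", block: "blocked" },
  claimed: { complete: "done", reclaim: "ready" },
  blocked: { reclaim: "ready" },
  done: {}, // no outgoing arrows in the diagram
};

function transition(status: Status, action: Action): Status {
  const to = TRANSITIONS[status][action];
  if (to === undefined) {
    throw new Error(`cannot ${action} a card that is ${status}`);
  }
  return to;
}
```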

Atomic claim (FOR UPDATE SKIP LOCKED) + TTL

claim is atomic. Inside a single transaction it locks candidate ready rows with SELECT ... FOR UPDATE SKIP LOCKED (via drizzle's sql template tag), picks the highest-priority deps-satisfied card, and flips it to claimed. Because a row already locked by a concurrent claimer is skipped rather than waited on, two claimers can never both win the same card: the loser falls through to the next candidate or gets null. (Covered by the concurrency tests in packages/db/src/backlog.spec.ts.)
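
The effect of SKIP LOCKED on overlapping claims can be modeled without a database. In the hypothetical sketch below, `locked` stands in for rows that another open transaction already holds FOR UPDATE. Claimers run one after another while every earlier claimer's lock is still held. This models the selection semantics only; the real claim is one SQL transaction. Keeping the earlier row on a priority tie is an assumption.

```typescript
// Hypothetical model of claim's selection under FOR UPDATE SKIP LOCKED.
// `locked` stands in for rows another open transaction already holds; the
// real claim is a single SQL transaction, not this loop.
interface Row {
  id: string;
  priority: number;
  status: "ready" | "claimed" | "blocked" | "done";
  depends_on: string[];
}

function pickClaimable(rows: Row[], locked: Set<string>): Row | null {
  const done = new Set(rows.filter((r) => r.status === "done").map((r) => r.id));
  let best: Row | null = null;
  for (const r of rows) {
    if (r.status !== "ready") continue;
    if (locked.has(r.id)) continue; // SKIP LOCKED: pass over, do not wait
    if (!r.depends_on.every((d) => done.has(d))) continue; // deps gate
    // Highest priority wins; on a tie the earlier row is kept (an assumption).
    if (best === null || r.priority > best.priority) best = r;
  }
  return best;
}

// Claimers whose transactions overlap: each one's lock is still held when
// the next one selects, so no card can be won twice.
function claimOverlapping(rows: Row[], claimers: number): (string | null)[] {
  const locked = new Set<string>();
  const winners: (string | null)[] = [];
  for (let i = 0; i < claimers; i++) {
    const row = pickClaimable(rows, locked);
    if (row !== null) locked.add(row.id);
    winners.push(row === null ? null : row.id);
  }
  return winners;
}
```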

  • Deps gate: a card is only claimable when every id in depends_on is done.
  • TTL: claim --ttl <sec> (default 900s) records claim_ttl_seconds.
  • reclaim: releases claims whose claimed_at + ttl is in the past (expired) back to ready, clearing the claim fields. reclaim --id <id> force-releases a specific card regardless of expiry. This is how a crashed worker's card returns to the pool.
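
The reclaim rule is an expiry comparison plus a reset of the claim fields. Below is a hypothetical in-memory sketch; the real command is an UPDATE against Postgres, and `reclaim` and `isExpired` are illustrative names. On the `--id` path it leaves done cards untouched, which is an assumption.

```typescript
// Hypothetical sketch of reclaim on in-memory rows; field names follow the
// card schema. The real command runs as an UPDATE in Postgres.
interface ClaimRow {
  id: string;
  status: "ready" | "claimed" | "blocked" | "done";
  claim_owner: string | null;
  claim_ttl_seconds: number | null;
  claimed_at: Date | null;
}

// Expired means claimed_at + claim_ttl_seconds is strictly in the past.
function isExpired(r: ClaimRow, now: Date): boolean {
  if (r.status !== "claimed" || r.claimed_at === null || r.claim_ttl_seconds === null) {
    return false;
  }
  return r.claimed_at.getTime() + r.claim_ttl_seconds * 1000 < now.getTime();
}

// Returns the released ids. With `id`, that card is released regardless of
// expiry (assumption: a done card is left alone).
function reclaim(rows: ClaimRow[], now: Date, id?: string): string[] {
  const released: string[] = [];
  for (const r of rows) {
    const hit = id !== undefined ? r.id === id && r.status !== "done" : isExpired(r, now);
    if (!hit) continue;
    r.status = "ready"; // back to the pool
    r.claim_owner = null; // clear the claim fields
    r.claim_ttl_seconds = null;
    r.claimed_at = null;
    released.push(r.id);
  }
  return released;
}
```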

CLI — mosaic fleet backlog <sub> --json

All subcommands support --json.

| Subcommand | Purpose |
|---|---|
| `create --id --title [--body --phase --priority --depends-on --acceptance --idempotency-key]` | Create a card; idempotency_key dedups (repeat returns the existing card). |
| `list [--status --phase --ready-only]` | List cards. --ready-only = status ready AND all deps done. |
| `claim --owner [--ttl <sec> --id <id>]` | Atomically claim the highest-priority ready card (or --id). Returns the card or null. |
| `reclaim [--id <id>]` | Release expired claims (or a specific card) back to ready. |
| `link --from --to` | Add a depends_on edge (--from depends on --to). |
| `stats` | Counts by status, oldest-ready age, expired-claim count. |
| `block --id` | Set a card to blocked. |
| `complete --id` | Set a card to done (releases any claim). |
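
An idempotent `create` amounts to a lookup before an insert. Below is a hypothetical in-memory sketch, with `BacklogMemory` as an illustrative name. Two maps stand in for the primary key on id and the unique index on idempotency_key. Treating a reused id with no matching key as an error is an assumption, based on what the primary key would do at the database level.

```typescript
// Hypothetical in-memory sketch of `create` with an idempotency key.
// byId plays the primary key; byKey plays the unique index on idempotency_key.
interface NewCard {
  id: string;
  title: string;
  priority?: number;
  idempotency_key?: string;
}

interface StoredCard {
  id: string;
  title: string;
  priority: number;
  idempotency_key: string | null;
}

class BacklogMemory {
  private byId = new Map<string, StoredCard>();
  private byKey = new Map<string, StoredCard>();

  create(input: NewCard): StoredCard {
    // A repeat with the same key returns the existing card unchanged.
    if (input.idempotency_key !== undefined) {
      const existing = this.byKey.get(input.idempotency_key);
      if (existing) return existing;
    }
    // Assumption: a reused id without a matching key is a primary-key conflict.
    if (this.byId.has(input.id)) throw new Error(`card ${input.id} already exists`);
    const card: StoredCard = {
      id: input.id,
      title: input.title,
      priority: input.priority ?? 0, // column default
      idempotency_key: input.idempotency_key ?? null,
    };
    this.byId.set(card.id, card);
    // Cards without a key are never indexed, so any number may omit one.
    if (card.idempotency_key !== null) this.byKey.set(card.idempotency_key, card);
    return card;
  }
}
```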

Example

```bash
# Seed two cards; the second depends on the first.
mosaic fleet backlog create --id A1 --title "schema" --priority 5
mosaic fleet backlog create --id A2 --title "service" --depends-on A1 --priority 9

# A2 has the higher priority but is gated on A1, so claim returns A1.
mosaic fleet backlog claim --owner worker-1 --ttl 600 --json

# Finish A1; A2's dependency is now satisfied, so it appears in --ready-only.
mosaic fleet backlog complete --id A1
mosaic fleet backlog list --ready-only --json

# Recover stalled work: release any claims whose TTL has expired.
mosaic fleet backlog reclaim --json
```
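
A worker can drive the same commands from code. The sketch below assumes the `mosaic` binary is on PATH. It also assumes that `claim --json` prints either a JSON card with an `id` field or the literal `null`; this document does not specify the exact JSON shape. `workOnce` and the injectable `Run` function are hypothetical. The injection lets the loop be exercised without a live backlog.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical worker step around the CLI. Assumption: `claim --json` prints a
// JSON card object with an `id` field, or the literal `null`.
type Run = (args: string[]) => string;

// Real runner: shells out to `mosaic fleet backlog <args>` and returns stdout.
const runMosaic: Run = (args) =>
  execFileSync("mosaic", ["fleet", "backlog", ...args], { encoding: "utf8" });

function workOnce(run: Run, owner: string, doWork: (id: string) => void): string | null {
  const out = run(["claim", "--owner", owner, "--ttl", "600", "--json"]);
  const card = JSON.parse(out) as { id: string } | null;
  if (card === null) return null; // nothing claimable right now
  doWork(card.id); // if this throws, complete is never called
  run(["complete", "--id", card.id, "--json"]);
  return card.id;
}
```

If `doWork` throws, the claim is simply left to expire, and a later `reclaim` returns the card to ready. This is the crash-recovery path described under the atomic claim. In production, `workOnce(runMosaic, "worker-1", handler)` would be called in a loop.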