Add an optional triage class to inter-agent messages so a comms daemon (M8)
can decide deliver vs. log-and-drop from an exact field instead of re-deriving
intent from the message body. ~35% of Mos's queued fan-in is agent-send
traffic; this makes that slice self-declaring on the tmux transport today,
with zero dependency on the M7/Matrix cutover.
Producer:
-C CLASS / --class CLASS / --class=CLASS, where CLASS is one of
{terminal-log, actionable, human, reaction}.
When SET, the preamble carries a ` class=<c>` token INSIDE the bracket:
[src -> dst class=terminal-log] msg
When OMITTED, NO token is emitted — the preamble is byte-for-byte identical
to the classic format (regression bar). Consumers treat an absent class as
'actionable' (fail-safe: the agent still sees it). Invalid/empty class => exit 3.
Consumer grammar (daemon mirrors this exactly):
^\[(\S+) -> (\S+?)(?: class=(terminal-log|actionable|human|reaction))?\] (.*)$
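A minimal TypeScript sketch of both sides of the grammar, for illustration only. `formatPreamble` and `parsePreamble` are hypothetical names, not functions from agent-send or the M8 daemon. The regex is the documented one, used verbatim. Because it only admits the four known classes, a line carrying an unknown class fails to match at all. The producer already refuses those with exit 3.

```typescript
// Illustrative sketch; formatPreamble/parsePreamble are hypothetical names.
const CLASSES = ["terminal-log", "actionable", "human", "reaction"] as const;
type MsgClass = (typeof CLASSES)[number];

// The consumer grammar exactly as documented.
const PREAMBLE_RE =
  /^\[(\S+) -> (\S+?)(?: class=(terminal-log|actionable|human|reaction))?\] (.*)$/;

interface ParsedMessage {
  src: string;
  dst: string;
  cls: MsgClass;      // an absent token is treated as "actionable" (fail-safe)
  explicit: boolean;  // true only when a class= token was on the wire
  body: string;
}

// Producer side: no class means no token, so the line is the classic format.
function formatPreamble(src: string, dst: string, msg: string, cls?: MsgClass): string {
  const token = cls === undefined ? "" : ` class=${cls}`;
  return `[${src} -> ${dst}${token}] ${msg}`;
}

// Consumer side: returns null for lines that do not match the grammar.
function parsePreamble(line: string): ParsedMessage | null {
  const m = PREAMBLE_RE.exec(line);
  if (m === null) return null;
  const raw: string | undefined = m[3]; // undefined when the optional group did not match
  return {
    src: m[1],
    dst: m[2],
    cls: raw === undefined ? "actionable" : (raw as MsgClass),
    explicit: raw !== undefined,
    body: m[4],
  };
}
```

Keeping `explicit` separate from `cls` lets a daemon log whether the sender declared intent or the fail-safe default applied.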
Tests (agent-send.test.sh, 11 assertions, all green; shellcheck clean):
- REGRESSION: output without --class is byte-identical to origin/main (proven
by diffing od -tx1 dumps of the on-wire payload, not just an expected string).
- space / equals / -C short forms all parse identically.
- invalid class and valueless --class both exit 3 with nothing sent.
- the documented consumer regex round-trips every class + the classic line.
SENDER is now env-overridable (AGENT_SEND_SENDER) purely for test injection;
production callers never set it, so behavior is unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kt2D8TsnDwhtzEAPijsNmR
Fresh `mosaic gateway install` (npm) left the gateway DB schema empty —
sign-in 500'd with `relation "users" does not exist`, and every entry
point (auth, bootstrap setup) failed because they all query the users
table first. Five stacked bugs on the local (PGlite) tier:
1. `packages/db/package.json` `files: ["dist"]` excluded the `drizzle/`
SQL migrations from the published tarball.
2. `runMigrations()` only supports postgres-js — unusable for embedded
PGlite.
3. `apps/gateway/src/database/database.module.ts` never invoked
migrations at startup.
4. `createPgliteDb` didn't load pgvector, so migration 0001's
`CREATE EXTENSION vector` failed.
5. Drizzle's PG migrator runs all pending migrations inside one outer
transaction, which trips Postgres' `check_safe_enum_use` on
migration 0009 (`ALTER TYPE ADD VALUE 'pending'` → `SET DEFAULT
'pending'` in the same tx).
Changes:
- Ship `drizzle/` in the published tarball.
- `createPgliteDb` loads `@electric-sql/pglite/vector`.
- New `runPgliteMigrations(handle)` walks the Drizzle journal and
runs each statement-breakpoint chunk through PGlite's `client.exec()`
(autocommit per statement). Records into `drizzle.__drizzle_migrations`
for interop with the postgres-js path. Per-statement try/catch
surfaces which statement of which migration failed.
- `DatabaseModule` runs migrations in `OnModuleInit` before
`app.listen()`. Local tier: explicit `runPgliteMigrations` then
`storageAdapter.migrate()`. Postgres tier: just `storageAdapter.migrate()`,
which already calls `runMigrations(url)` internally — no double-call.
- Removed `packages/storage/src/test-utils/pglite-with-vector.ts`. The
"intentionally not exported" rationale is moot now that migration
0001 forces pgvector load anyway. The integration test uses
`createPgliteDb` + `runPgliteMigrations` from `@mosaicstack/db`.
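Here is a minimal sketch of the per-tier startup ordering described for `DatabaseModule`. It is written as a plain function so it runs without Nest. `migrateOnInit`, `Tier`, and `StartupDeps` are hypothetical names. Only `runPgliteMigrations` and `storageAdapter.migrate()` come from the change. In the real module this body would live in `onModuleInit()`, which Nest runs during application init, before `app.listen()` starts accepting connections.

```typescript
// Hypothetical sketch of the per-tier migration ordering at startup.
type Tier = "local" | "postgres";

interface StartupDeps {
  runPgliteMigrations(): Promise<void>;
  storageAdapter: { migrate(): Promise<void> };
}

async function migrateOnInit(tier: Tier, deps: StartupDeps): Promise<void> {
  if (tier === "local") {
    // Local (PGlite) tier: apply the Drizzle SQL migrations explicitly first.
    await deps.runPgliteMigrations();
  }
  // Both tiers then run the adapter's own migrate(). On the postgres tier this
  // already calls runMigrations(url) internally, so nothing else runs before it.
  await deps.storageAdapter.migrate();
}
```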
Tests: BetterAuth tables exist after migrate; idempotent (re-runs 0009);
partial-failure surfaces statement-level context and leaves no ledger row.
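The journal walk can be sketched as follows. The sketch assumes a minimal `exec(sql)` executor in place of PGlite's `client.exec()` and a caller-supplied ledger. `runPgliteMigrationsSketch`, `JournalEntry`, `Executor`, and the `record` callback are hypothetical. `--> statement-breakpoint` is the separator Drizzle Kit writes between statements. Keying ledger rows by a SHA-256 of the file is an assumption about what interop with the postgres-js path requires. Each chunk commits on its own, so a failure part-way through leaves earlier statements applied but writes no ledger row.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes: an entry from drizzle/meta/_journal.json, and a minimal
// executor standing in for PGlite's client.exec().
interface JournalEntry { idx: number; when: number; tag: string; }
interface Executor { exec(sql: string): Promise<unknown>; }

const BREAKPOINT = "--> statement-breakpoint";

async function runPgliteMigrationsSketch(
  db: Executor,
  journal: JournalEntry[],
  readSql: (tag: string) => string,                       // e.g. reads drizzle/<tag>.sql
  applied: Set<string>,                                   // hashes already in the ledger
  record: (hash: string, when: number) => Promise<void>,  // inserts a ledger row
): Promise<void> {
  for (const entry of journal) {
    const sql = readSql(entry.tag);
    const hash = createHash("sha256").update(sql).digest("hex");
    if (applied.has(hash)) continue; // already applied: re-running is a no-op

    const statements = sql
      .split(BREAKPOINT)
      .map((s) => s.trim())
      .filter((s) => s.length > 0);

    for (let i = 0; i < statements.length; i++) {
      try {
        // No outer transaction: each chunk commits before the next runs, so an
        // ALTER TYPE ... ADD VALUE in one chunk is committed before a later
        // chunk uses the new value.
        await db.exec(statements[i]);
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        throw new Error(
          `migration ${entry.tag} failed at statement ${i + 1}/${statements.length}: ${reason}`,
        );
      }
    }
    // Only reached when every statement succeeded.
    await record(hash, entry.when);
    applied.add(hash);
  }
}
```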
QA on a fresh PGlite install:
- `Applying PGlite schema migrations...` then `Initializing storage
adapter (pglite)...` in startup log.
- `GET /api/bootstrap/status` → `{"needsSetup":true}` HTTP 200 (was 500).
- `POST /api/bootstrap/setup` reaches Zod validator (was 500).
Scope: this PR fixes the local (PGlite) tier. Postgres-tier first
install still has the outer-transaction problem and a journal ordering
bug (0009's `when` < 0008's). Documented inline as TODO and in the
scratchpad — needs a separate change with real-Postgres validation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- CRIT-1: Validate cert subjectUserId against grant.subjectUserId from DB;
use authoritative DB value in FederationContext
- CRIT-2: Add @Inject(GrantsService) decorator (required under tsx/esbuild,
which do not emit decorator metadata)
- HIGH-1: Validate UTF8String TLV tag, length, and bounds in OID parser
- HIGH-2: Collapse all 403 wire messages to a generic string to prevent
grant enumeration; keep internal logger detail
- HIGH-3: Assert federation wire envelope shape in all guard tests
- HIGH-4: Regression test for subjectUserId cert/DB mismatch
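HIGH-1 amounts to checking every DER field before trusting it. A minimal sketch follows. It assumes the extension value has already been unwrapped to the bare UTF8String TLV (tag `0x0C`). `decodeUtf8StringTlv` is a hypothetical name, not the function in `oid.util.ts`. It rejects five kinds of bad input:

- a wrong tag
- indefinite or oversized length forms
- non-minimal DER lengths
- a length that disagrees with the buffer
- invalid UTF-8

```typescript
const UTF8STRING_TAG = 0x0c;

// Hypothetical sketch: decode one DER UTF8String TLV, validating tag, length, and bounds.
function decodeUtf8StringTlv(der: Uint8Array): string {
  if (der.length < 2) throw new Error("TLV too short");
  if (der[0] !== UTF8STRING_TAG) throw new Error(`unexpected tag 0x${der[0].toString(16)}`);

  let len: number;
  let offset: number;
  const first = der[1];
  if (first < 0x80) {
    // Short form: the byte itself is the length.
    len = first;
    offset = 2;
  } else {
    // Long form: low 7 bits give the number of length bytes that follow.
    const n = first & 0x7f;
    // n === 0 is the indefinite form (not allowed in DER); cap at 4 bytes.
    if (n === 0 || n > 4) throw new Error("unsupported length encoding");
    if (der.length < 2 + n) throw new Error("truncated length");
    len = 0;
    for (let i = 0; i < n; i++) len = len * 256 + der[2 + i];
    offset = 2 + n;
    // DER requires the shortest form: no leading zero byte, no long form under 0x80.
    if (len < 0x80 || der[2] === 0) throw new Error("non-minimal length encoding");
  }

  // The value must end exactly at the end of the buffer.
  if (offset + len !== der.length) throw new Error("length does not match buffer");
  // fatal: true makes invalid UTF-8 throw instead of decoding to U+FFFD.
  return new TextDecoder("utf-8", { fatal: true }).decode(der.subarray(offset, offset + len));
}
```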
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds FederationAuthGuard that validates inbound mTLS client certs on
federation API routes. Extracts custom OIDs (grantId, subjectUserId),
loads the grant+peer from DB in one query, asserts active status, and
validates cert serial as defense-in-depth. Attaches FederationContext
to requests on success and uses federation wire-format error envelopes
(not raw NestJS exceptions) for 401/403 responses.
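The guard's checks can be condensed into a pure decision function. This sketch rests on several assumptions. The `CertClaims`/`GrantWithPeer` shapes and the `"active"` status string are illustrative, not taken from `federation-auth.guard.ts`. So is the choice of 401 for a cert missing the custom OIDs versus 403 for grant-level failures. The `reason` string is meant for the internal logger only. The context is built from the DB row rather than the cert.

```typescript
// Illustrative shapes; the real types live in the gateway's federation module.
interface CertClaims {
  grantId?: string;        // custom OID, if present in the client cert
  subjectUserId?: string;  // custom OID, if present in the client cert
  serial: string;          // cert serial as hex
}

interface GrantWithPeer {
  grantId: string;
  subjectUserId: string;
  status: string;                               // assumed "active" when usable
  peer: { status: string; certSerial: string };
}

type GuardDecision =
  | { ok: true; context: { grantId: string; subjectUserId: string } }
  | { ok: false; httpStatus: 401 | 403; reason: string }; // reason: internal log only

function decideFederationAccess(
  claims: CertClaims,
  grant: GrantWithPeer | null, // result of the single grant+peer lookup
): GuardDecision {
  if (!claims.grantId || !claims.subjectUserId) {
    return { ok: false, httpStatus: 401, reason: "client cert lacks federation OIDs" };
  }
  if (grant === null) {
    return { ok: false, httpStatus: 403, reason: `grant ${claims.grantId} not found` };
  }
  if (grant.status !== "active" || grant.peer.status !== "active") {
    return { ok: false, httpStatus: 403, reason: `grant ${grant.grantId} or its peer is inactive` };
  }
  // Defense in depth: the presented cert must be the one recorded for the peer.
  if (claims.serial.toUpperCase() !== grant.peer.certSerial.toUpperCase()) {
    return { ok: false, httpStatus: 403, reason: "cert serial does not match peer record" };
  }
  // The cert's subjectUserId must agree with the grant's.
  if (claims.subjectUserId !== grant.subjectUserId) {
    return { ok: false, httpStatus: 403, reason: "cert subjectUserId does not match grant" };
  }
  // Build the context from the authoritative DB row, not from the cert.
  return { ok: true, context: { grantId: grant.grantId, subjectUserId: grant.subjectUserId } };
}
```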
New files:
- apps/gateway/src/federation/oid.util.ts — shared OID extraction (no dupe ASN.1 logic)
- apps/gateway/src/federation/server/federation-auth.guard.ts — guard impl
- apps/gateway/src/federation/server/federation-context.ts — FederationContext type + module augment
- apps/gateway/src/federation/server/index.ts — barrel export
- apps/gateway/src/federation/server/__tests__/federation-auth.guard.spec.ts — 11 unit tests
Modified:
- apps/gateway/src/federation/grants.service.ts — adds getGrantWithPeer() with join
- apps/gateway/src/federation/federation.module.ts — registers FederationAuthGuard as provider
Closes #462
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- HIGH-A: resolveEntry now uses a promise-cache pattern so concurrent
callers share a single in-flight build, eliminating duplicate
key material in the heap and duplicate DB round-trips
- HIGH-B: flushPeer destroys the evicted undici Agent so stale TLS
connections close on cert rotation
- MED-C: add regression test for PEER_MISCONFIGURED when
STEP_CA_ROOT_CERT_PATH is unset
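HIGH-A and HIGH-B together describe a promise cache with destroy-on-evict. Here is a minimal sketch. It assumes a generic `Closable` in place of undici's `Agent`, whose `destroy()` returns a promise, and a caller-supplied `build` function. `PeerEntryCache` is a hypothetical name. Dropping a rejected build so the next call retries is this sketch's own choice; the change does not say how failed builds are handled.

```typescript
// Stand-in for undici's Agent: anything with an async destroy().
interface Closable { destroy(): Promise<void>; }

class PeerEntryCache<E extends Closable> {
  private readonly entries = new Map<string, Promise<E>>();

  constructor(private readonly build: (peerId: string) => Promise<E>) {}

  resolveEntry(peerId: string): Promise<E> {
    const cached = this.entries.get(peerId);
    if (cached) return cached;
    // Cache the promise itself before it settles, so callers arriving while
    // the build is in flight reuse it instead of starting a second build.
    const p = this.build(peerId);
    this.entries.set(peerId, p);
    // Evict a failed build (unless already replaced) so the next call retries.
    p.catch(() => {
      if (this.entries.get(peerId) === p) this.entries.delete(peerId);
    });
    return p;
  }

  async flushPeer(peerId: string): Promise<void> {
    const p = this.entries.get(peerId);
    if (!p) return;
    this.entries.delete(peerId);
    // Destroy the evicted entry so its pooled TLS connections close too.
    const entry = await p.catch(() => undefined);
    await entry?.destroy();
  }
}
```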
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CRIT-1: regenerate pnpm-lock.yaml so apps/gateway resolves undici@7.24.6
(prior PR pushed package.json without lockfile update; CI failed with
ERR_PNPM_OUTDATED_LOCKFILE). Incidentally cleans 57 lines of stale
peer-dep entries.
CRIT-2: cache-hit test no longer swallows resolveEntry errors. Calls the
private method directly twice and asserts identity equality plus a
single DB select, removing the silent-failure path the prior assertion
allowed.
HIGH-1: mTLS Agent now pins the Step-CA root via STEP_CA_ROOT_CERT_PATH.
Without the env var, resolveEntry throws PEER_MISCONFIGURED, refusing to
dial peers against the public trust store. The PEM is read once and cached
on the service instance.
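Here is a sketch of the pinning step. It assumes the PEM is later handed to the mTLS Agent as the `ca` TLS option, which for undici means `new Agent({ connect: { ca, cert, key } })`. `CaRootLoader` and `PeerMisconfiguredError` are hypothetical names. The file reader is injectable only so the example can run without a real CA file.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical error type carrying the PEER_MISCONFIGURED code.
class PeerMisconfiguredError extends Error {
  readonly code = "PEER_MISCONFIGURED";
}

class CaRootLoader {
  private cachedPem: string | undefined;

  constructor(
    private readonly env: Record<string, string | undefined> = process.env,
    private readonly readPem: (path: string) => string = (p) => readFileSync(p, "utf8"),
  ) {}

  loadRootPem(): string {
    if (this.cachedPem !== undefined) return this.cachedPem; // read once per instance
    const path = this.env.STEP_CA_ROOT_CERT_PATH;
    if (!path) {
      // Refuse outright rather than fall back to the public trust store.
      throw new PeerMisconfiguredError("STEP_CA_ROOT_CERT_PATH is not set");
    }
    this.cachedPem = this.readPem(path);
    return this.cachedPem;
  }
}
```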
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implements FederationClientService — a NestJS injectable that dials peer
gateways over mTLS (undici Agent with cert+sealed-key from federation_peers),
invokes list/get/capabilities verbs, validates responses via Zod, and surfaces
all failure modes as typed FederationClientError with a coherent error code
taxonomy (PEER_NOT_FOUND, PEER_INACTIVE, PEER_MISCONFIGURED, NETWORK,
FORBIDDEN, HTTP_{status}, INVALID_RESPONSE).
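The error taxonomy maps naturally onto a TypeScript union with a template-literal member for `HTTP_{status}`. This is a sketch only. The class shape is an assumption, inferred from the code names rather than confirmed by the change. So is the rule that 403 becomes `FORBIDDEN` while any other non-2xx status becomes `HTTP_<status>`. `errorForStatus` is a hypothetical helper.

```typescript
type FederationClientErrorCode =
  | "PEER_NOT_FOUND"
  | "PEER_INACTIVE"
  | "PEER_MISCONFIGURED"
  | "NETWORK"
  | "FORBIDDEN"
  | `HTTP_${number}`
  | "INVALID_RESPONSE";

class FederationClientError extends Error {
  constructor(
    readonly code: FederationClientErrorCode,
    message: string,
    readonly peerId?: string,
  ) {
    super(message);
    this.name = "FederationClientError";
  }
}

// Hypothetical helper: turn a response status into a typed error, or null on 2xx.
function errorForStatus(status: number, peerId: string): FederationClientError | null {
  if (status >= 200 && status < 300) return null;
  if (status === 403) {
    return new FederationClientError("FORBIDDEN", `peer ${peerId} refused the request`, peerId);
  }
  const code = `HTTP_${status}` as FederationClientErrorCode;
  return new FederationClientError(code, `peer ${peerId} returned HTTP ${status}`, peerId);
}
```

A single class with a discriminating `code` lets callers such as QuerySourceService switch on the code instead of juggling several exception types.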
Per-peer Agent instances are cached in a Map for the service lifetime;
flushPeer(peerId) invalidates the cache for M5/M6 cert rotation and
revocation events.
Wired into FederationModule providers + exports so QuerySourceService
(M3-09) can inject it.
13 unit tests covering all required scenarios via undici MockAgent +
real sealClientKey/unsealClientKey round-trip.
Closes #462
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>