Commit Graph

466 Commits

Author SHA1 Message Date
Jarvis
dd10f0046b fix(fleet): bounded-poll send --verify to eliminate false unverifiable on slow TUIs
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Replace the single fixed 300ms capture-pane delay in `agent send --verify` with a
bounded polling loop. After sending, the loop polls `capture-pane` every 400ms
(VERIFY_POLL_INTERVAL_MS) up to a configurable total timeout (default 6000ms,
VERIFY_DEFAULT_TIMEOUT_MS). classifySendResult is called on each poll: accepted/draft
return immediately; unverifiable keeps polling until timeout, then fails closed with
the existing "no pane change after send" message.

New `--verify-timeout <ms>` option on `agent send` (default 6000ms documented).
Injectable SleepFn added to FleetCommandDeps for test isolation — no real sleeps in
tests. Exports VERIFY_POLL_INTERVAL_MS and VERIFY_DEFAULT_TIMEOUT_MS as constants.
classifySendResult and all other pure functions remain unchanged.

Tests: multi-poll acceptance on 2nd/3rd poll => exit 0; pane unchanged until timeout
=> exit 1; draft detected on first poll => exit 1. All 386 tests pass.

docs/fleet/PRD.md Known-limitations updated: verify now polls up to bounded timeout
(default ~6s, --verify-timeout); definitive acceptance still deferred to Phase-3
heartbeat-ack.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RMoEx7hfdFGjUiCHuN1RRi
2026-06-20 23:07:08 -05:00
Jarvis
8466ca2d81 fix(fleet): verify via pane-change diff + non-resizing watch
Some checks failed
ci/woodpecker/push/ci Pipeline was canceled
ci/woodpecker/pr/ci Pipeline was canceled
Blocker fix: send --verify now captures a BEFORE snapshot immediately
before the send and an AFTER snapshot after the delay, then uses
classifySendResult(before, after) to classify. A wedged pane showing
stale non-empty content is no longer falsely reported as 'accepted' —
BEFORE==AFTER maps to 'unverifiable' (exit 1, "no pane change after
send"). Blank AFTER still fails closed as 'unverifiable'. Only
AFTER != BEFORE without a draft suffix counts as 'accepted' (exit 0).

Should-fix: agent watch now uses a GROUPED VIEWER SESSION instead of a
bare 'tmux attach -r' against the agent session. A bare attach lets the
viewer terminal shrink the agent's window; a grouped session has
independent sizing so the agent's window is never affected.
Sequence: new-session -d -t '=<agent>' -s '<agent>-watch-<pid>' (runner),
attach -r to viewer session (interactiveRunner), kill-session on detach
(runner). New builder functions exported: buildAgentWatchCreateViewerCommand,
buildAgentWatchAttachCommand, buildAgentWatchKillViewerCommand,
buildViewerSessionName. buildAgentWatchCommand kept but deprecated.

New exports: classifySendResult(before, after) — the testable classifier.

Tests added:
- classifySendResult unit suite (6 cases): accepted/draft/unverifiable/
  stale-pane/both-blank/before-blank-after-response
- send --verify regression: stale (before==after non-empty) => exit 1
- send --verify regression: blank AFTER => exit 1
- send --verify regression: draft after pane change => exit 1
- send --verify regression: changed non-draft => exit 0
- send --verify: 3-call sequence assertion (before-capture, send, after-capture)
- watch dispatch: grouped viewer session created/attached/killed; no bare
  attach against agent session; viewer name matches <agent>-watch-<pid>

PRD Known-limitations updated: pane-change check rationale, Phase-3
heartbeat-ack requirement, grouped-session watch design.

All gates pass: pnpm typecheck, pnpm lint, pnpm --filter @mosaicstack/mosaic test
(382 tests, 74 fleet), prettier --check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RMoEx7hfdFGjUiCHuN1RRi
2026-06-20 22:57:00 -05:00
Jarvis
aec560162b fix(fleet): verify fails-closed on unverifiable + interactive watch
Some checks failed
ci/woodpecker/push/ci Pipeline was canceled
ci/woodpecker/pr/ci Pipeline was canceled
- isSendAccepted now returns 'accepted' | 'draft' | 'unverifiable' (was bool)
- Blank/empty capture => 'unverifiable' => process.exitCode=1 with distinct
  "could not verify delivery (blank/no response captured)" message; previously
  blank was treated as success, violating FR-5 fail-closed semantics
- Draft line ('^> ') => process.exitCode=1 with "left as unsubmitted draft"
  message; distinct wording from unverifiable case
- agent watch now dispatched through injectable InteractiveRunner (stdio:inherit)
  instead of the capturing CommandRunner; tmux attach requires TTY passthrough
- Default spawnInteractive implementation uses node:child_process spawn with
  stdio:'inherit'; injectable via FleetCommandDeps.interactiveRunner for tests
- Removed buildSystemdIsActiveCommand (dead code — exported but unused)
- Tests: blank=>exitCode=1, draft=>exitCode=1, real response=>exitCode=0,
  watch dispatched through interactiveRunner not capturing runner
- PRD: added "Known limitations" section (heuristic verify, blank fails closed,
  non-pi/claude draft detection is best-effort, watch requires TTY passthrough)
- Code comment on isSendAccepted notes pi/claude-specific draft heuristic

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RMoEx7hfdFGjUiCHuN1RRi
2026-06-20 22:45:22 -05:00
Jarvis
ddeb200fdf style(fleet): prettier-format workstream docs
Some checks failed
ci/woodpecker/push/ci Pipeline was canceled
ci/woodpecker/pr/ci Pipeline was canceled
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RMoEx7hfdFGjUiCHuN1RRi
2026-06-20 22:32:14 -05:00
Jarvis
11c4dbe6f3 docs(fleet): session-2 log — heartbeat live + launch-path/env findings
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RMoEx7hfdFGjUiCHuN1RRi
2026-06-20 22:30:34 -05:00
Jarvis
c154ced6e5 docs(fleet): mark Phase-2 CLI tasks done (fleet ps/watch/send --verify live-verified)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RMoEx7hfdFGjUiCHuN1RRi
2026-06-20 22:30:34 -05:00
Jarvis
cf304eebc3 feat(fleet): phase-2 observability — fleet ps + watch + send --verify
FR-1 fleet ps: joins systemd show (ActiveState/SubState/UnitFileState),
tmux list-panes (pid/command/dead/activity), and file-based heartbeat
(~/.config/mosaic/fleet/run/<name>.hb) into one table per roster agent.
Flags DRIFT (roster runtime ≠ actual pane command) and BOOT-ENABLE
(active but UnitFileState=disabled). --json output includes tenant_id
and host on every record (FR-6 zero-foreclosure for multi-tenant/host).

FR-3 agent watch: read-only tmux attach (-r flag) so the operator can
observe any session without injecting keystrokes or resizing the window.
Registered as a new verb alongside tail/send/reset in registerFleetAgentCommands.

FR-5 agent send --verify: after keystroke injection, captures the last 5
pane lines and checks for draft heuristic (last non-empty line starts
with '> '). Exits non-zero and writes to stderr if the message appears
unsubmitted. Default send behavior is unchanged when --verify is omitted.

New pure exported helpers (all unit-testable without real tmux/systemd):
buildSystemdShowCommand, buildTmuxListPanesCommand, buildAgentWatchCommand,
buildAgentVerifyAcceptedCommand, parseHeartbeat, parseSystemdShow,
parseTmuxListPanes, detectDrift, getDefaultTenantAndHost, isSendAccepted,
heartbeatPath. Added 31 new spec cases (62 total) covering exact command
construction, JSON shape, heartbeat parsing, drift detection, and verify flow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RMoEx7hfdFGjUiCHuN1RRi
2026-06-20 22:30:34 -05:00
Jarvis
c740c59359 docs(fleet): record dual-engine review + worktree discipline decisions
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RMoEx7hfdFGjUiCHuN1RRi
2026-06-20 22:30:34 -05:00
Jarvis
b2071dc898 docs(fleet): north star + Phase-2 observability PRD/tasks (W-FLEET)
Establish Fleet workstream doctrine under mvp-20260312: north star (incl.
fleet-as-means-of-production), Phase-2 observability PRD, workstream tasks,
and scratchpad. Collision-safe: scoped to docs/fleet/, touches none of the
MVP single-writer control-plane files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RMoEx7hfdFGjUiCHuN1RRi
2026-06-20 22:30:34 -05:00
5118be74cb feat(framework): P3 — extract Constitution (L0) + gut AGENTS dispatcher (#575)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-21 03:20:32 +00:00
bf24066a49 feat(framework): P1+P2 — public sanitization + blocking CI gate (#572)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-21 02:40:11 +00:00
92316ab41e feat(framework): P0 — MIT license + executable-leak sanitization (#570)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-21 01:43:49 +00:00
b354bc8fae docs(framework): add agency & persistence patterns to config + guides (#543)
Some checks failed
ci/woodpecker/push/ci Pipeline was canceled
ci/woodpecker/push/publish Pipeline was canceled
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-21 01:43:36 +00:00
e834bbb83c fix(fleet): install executable tmux helpers (#568)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-20 22:27:46 +00:00
7498fcb20d fix(fleet): preserve agent env overrides on install (#567)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-20 21:50:46 +00:00
42d081613f chore(release): bump mosaic cli to 0.0.32 (#566)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-20 21:15:25 +00:00
b5c1381e45 fix(fleet): harden operator sends for release (#565)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-20 20:41:11 +00:00
6dfd78f643 feat(fleet): add local canary CLI (#563)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-20 17:49:01 +00:00
45e2c2aad8 docs: plan durable tmux fleet install (#557)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-20 16:19:19 +00:00
57919c38d8 fix(framework/tools): wrapper hardening — TLS validation, cred-path fallback, no-CI fast-exit (#551)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-20 10:16:38 +00:00
87f561c1f8 fix(launch): include Pi native skill roots in 'all' mode; dedup 'discover' force-loads (#556)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-19 19:58:09 +00:00
8c45857859 feat(launch): force-load fleet-critical Pi skills + reconcile skill docs (#555)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-19 18:31:02 +00:00
605221d42f docs(framework/tools): lead TOOLS.md with high-salience fleet-tools cheatsheet (#554)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was canceled
2026-06-19 18:03:03 +00:00
ee584ab48c fix(framework/tools): prettier-format woodpecker README — restore main format gate (#553)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-18 22:39:35 +00:00
ab4e138003 feat(framework/tools): orchestration helpers — lane-brief.sh + ci-wait.sh (#547)
Some checks failed
ci/woodpecker/push/ci Pipeline failed
ci/woodpecker/push/publish Pipeline was canceled
2026-06-18 22:08:40 +00:00
719c6ac3db fix(framework/tools): eval injection, broken JSON, tmpfile leak (#549)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was canceled
2026-06-18 21:35:32 +00:00
b8807e60df feat(agent-reflection): durable kernel — reflection.v1 capture + risk-floor + Phase-0 (#545)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-16 21:35:40 +00:00
c461380a4a feat(mosaic-as): agent registration + scoped/revocable tokens (US-007) (#541)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-16 01:10:44 +00:00
98a771c8f8 Fix Gitea wrapper login resolution (#538)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-12 02:34:18 +00:00
bd9527c033 docs(framework): canonize merge-authority policy (hard gate 13 + E2E gate note) (#537)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-11 23:56:20 +00:00
aa221bf92e release(mosaic): bump @mosaicstack/mosaic 0.0.30 -> 0.0.31 (#534)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/tag/publish Pipeline was successful
mosaic-v0.0.31
2026-06-11 19:55:43 +00:00
799df40f4e feat(appservice): room provisioning (M4c) (#535)
Some checks failed
ci/woodpecker/push/publish Pipeline was canceled
ci/woodpecker/push/ci Pipeline was canceled
2026-06-11 19:50:55 +00:00
b79e9f32c6 chore(framework): canonize Vault-as-SSOT + ESO-default secrets policy (#519)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-11 19:07:00 +00:00
89d69eb23b docs: add mission control and coordination resilience docs (#511)
Some checks failed
ci/woodpecker/push/ci Pipeline was canceled
ci/woodpecker/push/publish Pipeline was canceled
2026-06-11 19:06:35 +00:00
59b611ba8a refactor(framework): thin-core prompt diet — cut injected contract ~53% (#529)
Some checks failed
ci/woodpecker/push/ci Pipeline was canceled
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-06-11 18:10:42 +00:00
dfa0be42f6 feat(framework/tools): inter-agent tmux comms — agent-send.sh + addressing standard (#533)
Some checks failed
ci/woodpecker/push/ci Pipeline was canceled
ci/woodpecker/push/publish Pipeline was canceled
2026-06-11 18:01:44 +00:00
bb96a3f23e ci: publish mosaic-as appservice image (#532)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-10 23:00:38 +00:00
48b2f28e45 feat(appservice): mosaic-as daemon host + container (M4a) (#531)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-10 22:16:28 +00:00
8f09c910a9 feat(appservice): Matrix Application Service core library (M4a) (#530)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-10 21:23:25 +00:00
dde95a59b3 fix(pi): reduce startup skill-token overhead (#527)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-05 18:36:42 +00:00
821e19dcbb fix(mosaic-tools): roll up Gitea and Woodpecker wrapper fixes (#524)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-05-26 20:56:09 +00:00
755df9079e Merge pull request 'fix(db): bootstrap migrations on local-tier gateway startup' (#510) from fix/db-bootstrap-migrations into main 2026-05-04 22:13:14 +00:00
ac5650d9f9 fix(db): bootstrap migrations on local-tier gateway startup
Fresh `mosaic gateway install` (npm) left the gateway DB schema empty —
sign-in 500'd with `relation "users" does not exist`, and every entry
point (auth, bootstrap setup) failed because they all query the users
table first. Five stacked bugs on the local (PGlite) tier:

1. `packages/db/package.json` `files: ["dist"]` excluded the `drizzle/`
   SQL migrations from the published tarball.
2. `runMigrations()` only supports postgres-js — unusable for embedded
   PGlite.
3. `apps/gateway/src/database/database.module.ts` never invoked
   migrations at startup.
4. `createPgliteDb` didn't load pgvector, so migration 0001's
   `CREATE EXTENSION vector` failed.
5. Drizzle's PG migrator wraps every migration in one outer
   transaction, which trips Postgres' `check_safe_enum_use` on
   migration 0009 (`ALTER TYPE ADD VALUE 'pending'` → `SET DEFAULT
   'pending'` in the same tx).

Changes:
- Ship `drizzle/` in the published tarball.
- `createPgliteDb` loads `@electric-sql/pglite/vector`.
- New `runPgliteMigrations(handle)` walks the Drizzle journal and
  runs each statement-breakpoint chunk through PGlite's `client.exec()`
  (autocommit per statement). Records into `drizzle.__drizzle_migrations`
  for interop with the postgres-js path. Per-statement try/catch
  surfaces which statement of which migration failed.
- `DatabaseModule` runs migrations in `OnModuleInit` before
  `app.listen()`. Local tier: explicit `runPgliteMigrations` then
  `storageAdapter.migrate()`. Postgres tier: just `storageAdapter.migrate()`,
  which already calls `runMigrations(url)` internally — no double-call.
- Removed `packages/storage/src/test-utils/pglite-with-vector.ts`. The
  "intentionally not exported" rationale is moot now that migration
  0001 forces pgvector load anyway. The integration test uses
  `createPgliteDb` + `runPgliteMigrations` from `@mosaicstack/db`.

Tests: BetterAuth tables exist after migrate; idempotent (re-runs 0009);
partial-failure surfaces statement-level context and leaves no ledger row.

QA on a fresh PGlite install:
- `Applying PGlite schema migrations...` then `Initializing storage
  adapter (pglite)...` in startup log.
- `GET /api/bootstrap/status` → `{"needsSetup":true}` HTTP 200 (was 500).
- `POST /api/bootstrap/setup` reaches Zod validator (was 500).

Scope: this PR fixes the local (PGlite) tier. Postgres-tier first
install still has the outer-transaction problem and a journal ordering
bug (0009's `when` < 0008's). Documented inline as TODO and in the
scratchpad — needs a separate change with real-Postgres validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:06:50 -05:00
bd83f86740 Merge pull request 'feat(federation): mTLS AuthGuard with OID-based grant resolution (FED-M3-03)' (#509) from feat/federation-m3-auth-guard into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-25 13:27:20 +00:00
Jarvis
0af3e218a1 fix(federation/auth-guard): remediate CRIT-1/CRIT-2 + HIGH-1..4 review findings
All checks were successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
- CRIT-1: Validate cert subjectUserId against grant.subjectUserId from DB;
  use authoritative DB value in FederationContext
- CRIT-2: Add @Inject(GrantsService) decorator (tsx/esbuild requirement)
- HIGH-1: Validate UTF8String TLV tag, length, and bounds in OID parser
- HIGH-2: Collapse all 403 wire messages to a generic string to prevent
  grant enumeration; keep internal logger detail
- HIGH-3: Assert federation wire envelope shape in all guard tests
- HIGH-4: Regression test for subjectUserId cert/DB mismatch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 06:33:37 -05:00
Jarvis
b01c9b3bb0 feat(federation): mTLS AuthGuard with OID-based grant resolution (FED-M3-03)
Adds FederationAuthGuard that validates inbound mTLS client certs on
federation API routes. Extracts custom OIDs (grantId, subjectUserId),
loads the grant+peer from DB in one query, asserts active status, and
validates cert serial as defense-in-depth. Attaches FederationContext
to requests on success and uses federation wire-format error envelopes
(not raw NestJS exceptions) for 401/403 responses.

New files:
- apps/gateway/src/federation/oid.util.ts — shared OID extraction (no dupe ASN.1 logic)
- apps/gateway/src/federation/server/federation-auth.guard.ts — guard impl
- apps/gateway/src/federation/server/federation-context.ts — FederationContext type + module augment
- apps/gateway/src/federation/server/index.ts — barrel export
- apps/gateway/src/federation/server/__tests__/federation-auth.guard.spec.ts — 11 unit tests

Modified:
- apps/gateway/src/federation/grants.service.ts — adds getGrantWithPeer() with join
- apps/gateway/src/federation/federation.module.ts — registers FederationAuthGuard as provider

Closes #462

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 06:33:37 -05:00
b67f2c9f08 Merge pull request 'feat(federation): outbound mTLS FederationClient (FED-M3-08)' (#508) from feat/federation-m3-client into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-24 04:30:29 +00:00
Jarvis
37675ae3f2 fix(federation/client): serialize cache fills, destroy evicted Agent, cover env-var guard
All checks were successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
- HIGH-A: resolveEntry now uses promise-cache pattern so concurrent
  callers serialize on a single in-flight build, eliminating duplicate
  key material in heap and duplicate DB round-trips
- HIGH-B: flushPeer destroys the evicted undici Agent so stale TLS
  connections close on cert rotation
- MED-C: add regression test for PEER_MISCONFIGURED when
  STEP_CA_ROOT_CERT_PATH is unset

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 22:56:57 -05:00
Jarvis
a4a6769a6d fix(federation/client): pin Step-CA root, fix lockfile, harden cache test
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
CRIT-1: regenerate pnpm-lock.yaml so apps/gateway resolves undici@7.24.6
(prior PR pushed package.json without lockfile update; CI failed with
ERR_PNPM_OUTDATED_LOCKFILE). Incidentally cleans 57 lines of stale
peer-dep entries.

CRIT-2: cache-hit test no longer swallows resolveEntry errors. Calls the
private method directly twice and asserts identity equality plus a
single DB select, removing the silent-failure path the prior assertion
allowed.

HIGH-1: mTLS Agent now pins Step-CA root via STEP_CA_ROOT_CERT_PATH.
Without the env var resolveEntry throws PEER_MISCONFIGURED, refusing to
dial peers against the public trust store. PEM is read once and cached
on the service instance.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-23 22:30:09 -05:00
Jarvis
21650fb194 feat(federation): outbound mTLS FederationClient (FED-M3-08)
Some checks failed
ci/woodpecker/push/ci Pipeline failed
ci/woodpecker/pr/ci Pipeline failed
Implements FederationClientService — a NestJS injectable that dials peer
gateways over mTLS (undici Agent with cert+sealed-key from federation_peers),
invokes list/get/capabilities verbs, validates responses via Zod, and surfaces
all failure modes as typed FederationClientError with a coherent error code
taxonomy (PEER_NOT_FOUND, PEER_INACTIVE, PEER_MISCONFIGURED, NETWORK,
FORBIDDEN, HTTP_{status}, INVALID_RESPONSE).

Per-peer Agent instances are cached in a Map for the service lifetime;
flushPeer(peerId) invalidates the cache for M5/M6 cert rotation and
revocation events.

Wired into FederationModule providers + exports so QuerySourceService
(M3-09) can inject it.

13 unit tests covering all required scenarios via undici MockAgent +
real sealClientKey/unsealClientKey round-trip.

Closes #462

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 22:16:52 -05:00