FED-M5: cache + offline degradation + OTEL #464

Open
opened 2026-04-19 22:02:10 +00:00 by jason.woltje · 0 comments
Owner

Epic: Federation v1 — see docs/federation/PRD.md and docs/federation/MILESTONES.md.

Goal

Sessions feel fast and stay useful when the peer is slow or down.

Scope

  • In-memory response cache keyed by (grant_id, verb, resource, query_hash), TTL 30s default
  • Cache NOT used for search; only list + get
  • Cache flushed on cert rotation and grant revocation
  • Circuit breaker per peer: after N consecutive failures, fast-fail for the cooldown window
  • _source tagging extended with _cached: true when served from cache
  • Agent-visible "federation offline for <peer>" signal emitted once per session per peer
  • OTEL spans: federation.request with attrs grant_id, peer, verb, resource, outcome, latency_ms, cached
  • W3C traceparent propagated across mTLS boundary (both directions)
  • mosaic federation status CLI subcommand
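
The cache rules above could be sketched roughly as follows. This is a minimal sketch, not the planned `response-cache.service.ts`: the class name, method names, and the injectable clock are assumptions made for illustration. Only the key fields, the 30s default TTL, the `list`/`get` restriction, the flush triggers, and the `_cached` flag come from this issue.

```typescript
// Hypothetical sketch; names and shapes are illustrative assumptions.
type Verb = "list" | "get" | "search";

interface CacheKey {
  grantId: string;
  verb: Verb;
  resource: string;
  queryHash: string;
}

interface CacheEntry {
  grantId: string;
  body: unknown;
  expiresAt: number;
}

class ResponseCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so TTL expiry can be tested without sleeping.
  constructor(ttlMs: number = 30_000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  private static keyOf(k: CacheKey): string {
    return JSON.stringify([k.grantId, k.verb, k.resource, k.queryHash]);
  }

  // Returns the cached body tagged with _cached: true, or undefined on a miss.
  get(k: CacheKey): { body: unknown; _cached: true } | undefined {
    if (k.verb === "search") return undefined; // search is never served from cache
    const id = ResponseCache.keyOf(k);
    const hit = this.entries.get(id);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(id); // expired: evict and report a miss
      return undefined;
    }
    return { body: hit.body, _cached: true };
  }

  set(k: CacheKey, body: unknown): void {
    if (k.verb === "search") return; // search responses are never stored
    this.entries.set(ResponseCache.keyOf(k), {
      grantId: k.grantId,
      body,
      expiresAt: this.now() + this.ttlMs,
    });
  }

  // Intended for grant revocation: drop every entry stored under that grant.
  flushGrant(grantId: string): void {
    this.entries.forEach((entry, id) => {
      if (entry.grantId === grantId) this.entries.delete(id);
    });
  }

  // Intended for cert rotation: drop everything.
  flushAll(): void {
    this.entries.clear();
  }
}
```

Keying on `grant_id` means a revocation only needs to evict one grant's entries, while a cert rotation clears the whole map.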

Deliverables

  • apps/gateway/src/federation/client/response-cache.service.ts
  • apps/gateway/src/federation/client/circuit-breaker.service.ts
  • apps/gateway/src/federation/observability/ (span helpers)
  • packages/mosaic/src/commands/federation/status.ts
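
For the span helpers, propagating `traceparent` comes down to parsing the incoming header and emitting a child header with a fresh span id. The sketch below handles only the version `00` layout from the W3C Trace Context specification (`version-traceid-parentid-flags`). The function names are hypothetical, and a real implementation would more likely delegate to an OpenTelemetry propagator than hand-roll this.

```typescript
// Hypothetical helpers; only the header layout comes from W3C Trace Context.
interface TraceParent {
  version: string;
  traceId: string; // 32 lowercase hex chars
  parentId: string; // 16 lowercase hex chars
  flags: string; // 2 lowercase hex chars
}

const TRACEPARENT_RE = /^(00)-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Parses a version-00 traceparent header; returns undefined if it is malformed.
function parseTraceparent(header: string): TraceParent | undefined {
  const m = TRACEPARENT_RE.exec(header.trim());
  if (!m) return undefined;
  const traceId = m[2];
  const parentId = m[3];
  // All-zero trace ids and parent ids are invalid per the specification.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return undefined;
  return { version: m[1], traceId, parentId, flags: m[4] };
}

// Builds the header for an outgoing request: same trace, new span id.
function childTraceparent(parent: TraceParent, newSpanId: string): string {
  return `${parent.version}-${parent.traceId}-${newSpanId}-${parent.flags}`;
}
```

Both gateways would apply the same pair: parse on receipt, emit a child header on the outbound call, so spans on either side share one trace id.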

Acceptance Tests

  • Two identical list calls within 30s: second served from cache, flagged _cached
  • search never cached: two identical searches both hit peer
  • After grant revocation, peer's cache flushed immediately
  • After N consecutive failures, circuit opens; subsequent requests fail-fast without network call
  • After the cooldown elapses, the next successful request closes the circuit
  • With peer offline, session completes using local data, one "federation offline" signal surfaced
  • OTEL traces show spans on both gateways correlated by traceparent
  • mosaic federation status prints peer state, cert expiry, last success/failure, circuit state
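
The two circuit tests above imply three states: closed, open (fail-fast), and a trial state once the cooldown has elapsed. A minimal sketch of that state machine, assuming hypothetical names and leaving `N` and the cooldown as constructor parameters since this issue does not fix their values:

```typescript
// Hypothetical sketch; threshold (N) and cooldown values are not specified here.
type CircuitState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  state(): CircuitState {
    if (this.openedAt === null) return "closed";
    // Once the cooldown has elapsed, allow a trial request through.
    return this.now() - this.openedAt >= this.cooldownMs ? "half-open" : "open";
  }

  // Callers check this before touching the network; false means fail fast.
  allowRequest(): boolean {
    return this.state() !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null; // a success after cooldown closes the circuit
  }

  recordFailure(): void {
    this.failures += 1;
    // Open on the Nth consecutive failure, or re-open if a trial request fails.
    if (this.openedAt !== null || this.failures >= this.threshold) {
      this.openedAt = this.now();
    }
  }
}
```

A client would keep one instance per peer, call `allowRequest()` before each request, and report the outcome with `recordSuccess()` or `recordFailure()`.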

Dependencies

Blocked by FED-M4. Can run in parallel with FED-M6.

Estimated budget

~20K tokens

Risk notes

Caching correctness under revocation must be provable — write tests that intentionally race revocation against cached hits.
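
One way to make the race testable, offered as an assumption rather than the planned design, is a per-grant epoch counter. A request records the epoch when it starts, and its response is cached only if the epoch is unchanged when it returns. Revocation bumps the epoch and evicts existing entries, so a response that was in flight during revocation cannot repopulate the cache. All names below are hypothetical, and TTL handling is omitted to keep the sketch short.

```typescript
// Hypothetical sketch of an epoch guard; not taken from the codebase.
interface Ticket {
  grantId: string;
  epoch: number;
}

class EpochGuardedCache {
  private readonly epochs = new Map<string, number>();
  private readonly entries = new Map<string, { grantId: string; body: unknown }>();

  private epochOf(grantId: string): number {
    return this.epochs.get(grantId) ?? 0;
  }

  // Call before sending the request to the peer.
  begin(grantId: string): Ticket {
    return { grantId, epoch: this.epochOf(grantId) };
  }

  // Call when the response arrives; returns false if the grant was revoked meanwhile.
  commit(ticket: Ticket, key: string, body: unknown): boolean {
    if (ticket.epoch !== this.epochOf(ticket.grantId)) return false;
    this.entries.set(key, { grantId: ticket.grantId, body });
    return true;
  }

  get(key: string): unknown {
    return this.entries.get(key)?.body;
  }

  // Bump the epoch first so in-flight commits are rejected, then evict.
  revoke(grantId: string): void {
    this.epochs.set(grantId, this.epochOf(grantId) + 1);
    this.entries.forEach((entry, key) => {
      if (entry.grantId === grantId) this.entries.delete(key);
    });
  }
}
```

A race test can then be written deterministically: call `begin`, then `revoke`, then `commit`, and assert that `commit` returns `false` and `get` returns `undefined`.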

jason.woltje added this to the Federation v1 milestone 2026-04-19 22:02:10 +00:00

Reference: mosaicstack/stack#464