Compare commits

..

4 Commits

Author SHA1 Message Date
Jarvis
4cbb5eff8e docs(federation): S21 tracking — DEPLOY-01/02 done, IMG-FIX in flight, M2-01 in remediation
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
- TASKS.md: mark DEPLOY-01 (image verify) + DEPLOY-02 (stack template, PR #485) done
- TASKS.md: add DEPLOY-IMG-FIX row (gateway Dockerfile pnpm-deploy fix in flight)
- TASKS.md: DEPLOY-03/04 → blocked on IMG-FIX; M2-01 → needs-qa (PR #486 in remediation)
- scratchpads/mvp-20260312.md: Session 20 entry (M2 kickoff, workstream split, Portainer access)
- scratchpads/mvp-20260312.md: Session 21 entry (DEPLOY-02 merged, gateway image runtime bug
  discovered via stripped-container test, M2-01 review verdict + remediation in flight,
  process notes on branch races + worktree isolation rule)
2026-04-21 20:52:20 -05:00
4dbd429203 feat(deploy): portainer stack template for federation test instances [DEPLOY-02] (#485)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-22 01:34:44 +00:00
b985d7bfe2 docs(federation): M2 mission planning — TASKS decomposition + manifest update (#483)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline failed
2026-04-22 01:24:00 +00:00
45e8f02c91 feat(mosaic-portainer): PORTAINER_INSECURE flag for self-signed TLS (#484)
Some checks failed
ci/woodpecker/push/publish Pipeline failed
ci/woodpecker/push/ci Pipeline failed
2026-04-22 01:21:54 +00:00
20 changed files with 429 additions and 4111 deletions

View File

@@ -0,0 +1,70 @@
# deploy/portainer/
Portainer stack templates for Mosaic Stack deployments.
## Files
| File | Purpose |
| -------------------------- | -------------------------------------------------------------------------------------------------------------- |
| `federated-test.stack.yml` | Docker Swarm stack for federation end-to-end test instances (`mos-test-1.woltje.com`, `mos-test-2.woltje.com`) |
---
## federated-test.stack.yml
A self-contained Swarm stack that boots a federated-tier Mosaic gateway with co-located Postgres 17 (pgvector) and Valkey 8. This is a **test template** — production deployments will use a separate template with stricter resource limits and Docker secrets.
### Deploy via Portainer UI
1. Log into Portainer.
2. Navigate to **Stacks → Add stack**.
3. Set a stack name matching `STACK_NAME` below (e.g. `mos-test-1`).
4. Choose **Web editor** and paste the contents of `federated-test.stack.yml`.
5. Scroll to **Environment variables** and add each variable listed below.
6. Click **Deploy the stack**.
### Required environment variables
| Variable | Example | Notes |
| -------------------- | --------------------------------------- | -------------------------------------------------------- |
| `STACK_NAME` | `mos-test-1` | Unique per stack — used in Traefik router/service names. |
| `HOST_FQDN` | `mos-test-1.woltje.com` | Fully-qualified hostname served by this stack. |
| `POSTGRES_PASSWORD` | _(generate randomly)_ | Database password. Do **not** reuse between stacks. |
| `BETTER_AUTH_SECRET` | _(generate: `openssl rand -base64 32`)_ | BetterAuth session signing key. |
| `BETTER_AUTH_URL` | `https://mos-test-1.woltje.com` | Public base URL of the gateway. |
Optional variables (uncomment in the YAML or set in Portainer):
| Variable | Notes |
| ----------------------------- | ---------------------------------------------------------- |
| `ANTHROPIC_API_KEY` | Enable Claude models. |
| `OPENAI_API_KEY` | Enable OpenAI models. |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Forward traces to a collector (e.g. `http://jaeger:4318`). |
### Required external resources
Before deploying, ensure the following exist on the Swarm:
1. **`traefik-public` overlay network** — shared network Traefik uses to route traffic to stacks.
```bash
docker network create --driver overlay --attachable traefik-public
```
2. **`letsencrypt` cert resolver** — configured in the Traefik Swarm stack. The stack template references `tls.certresolver=letsencrypt`; the name must match your Traefik config.
3. **DNS A record** — `${HOST_FQDN}` must resolve to the Swarm ingress IP (or a Cloudflare-proxied address pointing there).
### Deployed instances
| Stack name | HOST_FQDN | Purpose |
| ------------ | ----------------------- | ---------------------------------- |
| `mos-test-1` | `mos-test-1.woltje.com` | DEPLOY-03 — first federation peer |
| `mos-test-2` | `mos-test-2.woltje.com` | DEPLOY-04 — second federation peer |
### Image
The gateway image is pinned by digest to `fed-v0.1.0-m1` (verified in DEPLOY-01). Update the digest in the YAML when promoting a new build — never use `:latest` or a mutable tag in Swarm.
### Notes
- This template boots a **vanilla M1-baseline gateway** in federated tier. Federation grants (Step-CA, mTLS) are M2+ scope and not included here.
- Each stack gets its own Postgres volume (`postgres-data`) and Valkey volume (`valkey-data`) scoped to the stack name by Swarm.
- `depends_on` is honoured by Compose but ignored by Swarm — healthchecks on Postgres and Valkey ensure the gateway retries until they are ready.

View File

@@ -0,0 +1,147 @@
# deploy/portainer/federated-test.stack.yml
#
# Portainer / Docker Swarm stack template — federated-tier test instance
#
# PURPOSE
# Deploys a single federated-tier Mosaic gateway with co-located Postgres
# (pgvector) and Valkey for end-to-end federation testing. Intended for
# mos-test-1.woltje.com and mos-test-2.woltje.com (DEPLOY-03/04).
#
# REQUIRED ENV VARS (set per-stack in Portainer → Stacks → Environment variables)
# STACK_NAME Unique name for Traefik router/service labels.
# Examples: mos-test-1, mos-test-2
# HOST_FQDN Fully-qualified domain name served by this stack.
# Examples: mos-test-1.woltje.com, mos-test-2.woltje.com
# POSTGRES_PASSWORD Database password — set per stack; do NOT commit a default.
# BETTER_AUTH_SECRET Random 32-char string for BetterAuth session signing.
# Generate: openssl rand -base64 32
# BETTER_AUTH_URL Public gateway base URL, e.g. https://mos-test-1.woltje.com
#
# OPTIONAL ENV VARS (uncomment and set in Portainer to enable features)
# ANTHROPIC_API_KEY sk-ant-...
# OPENAI_API_KEY sk-...
# OTEL_EXPORTER_OTLP_ENDPOINT http://<collector>:4318
# OTEL_SERVICE_NAME (default: mosaic-gateway)
#
# REQUIRED EXTERNAL RESOURCES
# traefik-public Docker overlay network — must exist before deploying.
# Create: docker network create --driver overlay --attachable traefik-public
# letsencrypt Traefik cert resolver configured on the Swarm manager.
# DNS A record ${HOST_FQDN} → Swarm ingress IP (or Cloudflare proxy).
#
# IMAGE
# Pinned to digest fed-v0.1.0-m1 (DEPLOY-01 verified).
# Update digest here when promoting a new build.
#
# NOTE: This is a TEST template — production deployments use a separate
# parameterised template with stricter resource limits and secrets.
version: '3.9'
services:
gateway:
image: git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:9b72e202a9eecc27d31920b87b475b9e96e483c0323acc57856be4b1355db1ec
# Tag for human reference: fed-v0.1.0-m1
environment:
# ── Tier ───────────────────────────────────────────────────────────────
MOSAIC_TIER: federated
# ── Database ───────────────────────────────────────────────────────────
DATABASE_URL: postgres://gateway:${POSTGRES_PASSWORD}@postgres:5432/mosaic
# ── Queue ──────────────────────────────────────────────────────────────
VALKEY_URL: redis://valkey:6379
# ── Gateway ────────────────────────────────────────────────────────────
GATEWAY_PORT: '3000'
GATEWAY_CORS_ORIGIN: https://${HOST_FQDN}
# ── Auth ───────────────────────────────────────────────────────────────
BETTER_AUTH_SECRET: ${BETTER_AUTH_SECRET}
BETTER_AUTH_URL: https://${HOST_FQDN}
# ── Observability ──────────────────────────────────────────────────────
OTEL_SERVICE_NAME: ${STACK_NAME:-mosaic-gateway}
# OTEL_EXPORTER_OTLP_ENDPOINT: http://<collector>:4318
# ── AI Providers (uncomment to enable) ─────────────────────────────────
# ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
# OPENAI_API_KEY: ${OPENAI_API_KEY}
networks:
- federated-test
- traefik-public
deploy:
replicas: 1
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
labels:
- 'traefik.enable=true'
- 'traefik.docker.network=traefik-public'
- 'traefik.http.routers.${STACK_NAME}.rule=Host(`${HOST_FQDN}`)'
- 'traefik.http.routers.${STACK_NAME}.entrypoints=websecure'
- 'traefik.http.routers.${STACK_NAME}.tls=true'
- 'traefik.http.routers.${STACK_NAME}.tls.certresolver=letsencrypt'
- 'traefik.http.services.${STACK_NAME}.loadbalancer.server.port=3000'
healthcheck:
test: ['CMD', 'wget', '-qO-', 'http://localhost:3000/health']
interval: 30s
timeout: 5s
retries: 3
start_period: 20s
depends_on:
- postgres
- valkey
postgres:
image: pgvector/pgvector:pg17
environment:
POSTGRES_USER: gateway
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
POSTGRES_DB: mosaic
volumes:
- postgres-data:/var/lib/postgresql/data
networks:
- federated-test
deploy:
replicas: 1
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
healthcheck:
test: ['CMD-SHELL', 'pg_isready -U gateway']
interval: 10s
timeout: 5s
retries: 5
start_period: 10s
valkey:
image: valkey/valkey:8-alpine
volumes:
- valkey-data:/data
networks:
- federated-test
deploy:
replicas: 1
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
healthcheck:
test: ['CMD', 'valkey-cli', 'ping']
interval: 10s
timeout: 3s
retries: 5
start_period: 5s
volumes:
postgres-data:
valkey-data:
networks:
federated-test:
driver: overlay
traefik-public:
external: true

View File

@@ -7,13 +7,22 @@
**ID:** federation-v1-20260419
**Statement:** Jarvis operates across 34 workstations in two physical locations (home, USC). The user currently reaches back to a single jarvis-brain checkout from every session; a prior OpenBrain attempt caused cache, latency, and opacity pain. This mission builds asymmetric federation between Mosaic Stack gateways so that a session on a user's home gateway can query their work gateway in real time without data ever persisting across the boundary, with full multi-tenant isolation and standard-PKI (X.509 / Step-CA) trust management.
**Phase:** M1 complete — federated tier infrastructure ready for testing
**Current Milestone:** FED-M2 (next; deferred to mission planning)
**Phase:** M2 active — Step-CA + grant schema + admin CLI; parallel test-deploy workstream stood up
**Current Milestone:** FED-M2
**Progress:** 1 / 7 milestones
**Status:** active
**Last Updated:** 2026-04-19 (M1 complete; tag `fed-v0.1.0-m1`)
**Last Updated:** 2026-04-21 (M2 decomposed; mos-test-1/-2 designated as federation E2E test hosts)
**Parent Mission:** None — new mission
## Test Infrastructure
| Host | Role | Image | Tier |
| ----------------------- | ----------------------------------- | ------------------------------------- | --------- |
| `mos-test-1.woltje.com` | Federation Server A (querying side) | `gateway:fed-v0.1.0-m1` (M1 baseline) | federated |
| `mos-test-2.woltje.com` | Federation Server B (serving side) | `gateway:fed-v0.1.0-m1` (M1 baseline) | federated |
These are TEST hosts for federation E2E (M3+). Distinct from PRD AC-12 production targets (`woltje.com``uscllc.com`). Deployment workstream tracked in `docs/federation/TASKS.md` under FED-M2-DEPLOY-\*.
## Context
Federation is the solution to what originally drove OpenBrain. The prior attempt coupled every agent session to a remote service, introduced cache/latency/opacity pain, and created a hard dependency that punished offline use. This redesign:
@@ -54,7 +63,7 @@ Key design references:
| # | ID | Name | Status | Branch | Issue | Started | Completed |
| --- | ------ | --------------------------------------------- | ----------- | ------------------ | ----- | ---------- | ---------- |
| 1 | FED-M1 | Federated tier infrastructure | done | (12 PRs #470-#481) | #460 | 2026-04-19 | 2026-04-19 |
| 2 | FED-M2 | Step-CA + grant schema + admin CLI | not-started | — | #461 | | — |
| 2 | FED-M2 | Step-CA + grant schema + admin CLI | in-progress | (decomposition) | #461 | 2026-04-21 | — |
| 3 | FED-M3 | mTLS handshake + list/get + scope enforcement | not-started | — | #462 | — | — |
| 4 | FED-M4 | search verb + audit log + rate limit | not-started | — | #463 | — | — |
| 5 | FED-M5 | Cache + offline degradation + OTEL | not-started | — | #464 | — | — |
@@ -83,6 +92,10 @@ Key design references:
## Next Step
FED-M1 complete (12 PRs #470-#481, tag `fed-v0.1.0-m1`). Federated tier infrastructure is testable end-to-end: see `docs/federation/SETUP.md` and `docs/guides/migrate-tier.md`.
FED-M2 active. Decomposition landed in `docs/federation/TASKS.md` (M2-01..M2-13 code workstream + DEPLOY-01..DEPLOY-05 parallel test-deploy workstream, ~88K total). Tracking issue #482.
Begin FED-M2 (Step-CA + grant schema + admin CLI) when planning is greenlit. Issue #461 tracks scope; orchestrator decomposes M2 into per-task rows in `docs/federation/TASKS.md` at the start of M2.
Parallel execution plan:
- **CODE workstream**: M2-01 (DB migration) starts immediately — sonnet subagent on `feat/federation-m2-schema`. Then M2-02 → M2-09 sequentially with M2-04/M2-05/M2-06/M2-07 having interleaved CA/storage/grant dependencies.
- **DEPLOY workstream**: DEPLOY-01 (image verify) → DEPLOY-02 (stack template) → DEPLOY-03/04 (mos-test-1/-2 deploy) → DEPLOY-05 (TEST-INFRA.md). Gated on Portainer wrapper PR (`PORTAINER_INSECURE` flag) merging first.
- **Re-converge** at M2-10 (E2E test) once both workstreams ready.

View File

@@ -36,9 +36,52 @@ Goal: Gateway runs in `federated` tier with containerized PG+pgvector+Valkey. No
---
## Pre-M2 — Test deployment infrastructure (FED-M2-DEPLOY)
Goal: Two federated-tier gateways stood up on Portainer at `mos-test-1.woltje.com` and `mos-test-2.woltje.com` running the M1 release (`gateway:fed-v0.1.0-m1`). This is the test bed for M2 enrollment work and the M3 federation E2E harness. No federation logic exercised yet — pure infrastructure validation.
> **Why now:** M2 enrollment requires a real second gateway to test peer-add flows; standing the test hosts up before M2 code lands gives both code and deployment streams a fast feedback loop.
> **Parallelizable:** This workstream runs in parallel with the M2 code workstream (M2-01 → M2-13). They re-converge at M2-10 (E2E test).
> **Tracking issue:** #482.
| id | status | description | issue | agent | branch | depends_on | estimate | notes |
| --------------------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- | ------ | ------------------------------------- | ------------ | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| FED-M2-DEPLOY-01 | done | Verify `gateway:fed-v0.1.0-m1` image was published by `.woodpecker/publish.yml` on tag push; if not, investigate and remediate. Document image URI in deployment artifact. | #482 | sonnet | (verified inline, no PR) | — | 2K | Tag exists; digest `sha256:9b72e202a9eecc27d31920b87b475b9e96e483c0323acc57856be4b1355db1ec` captured for digest-pinned deploys. |
| FED-M2-DEPLOY-02 | done | Author Portainer git-stack compose file `deploy/portainer/federated-test.stack.yml` (gateway + PG-pgvector + Valkey, env-driven). Use immutable tag, not `latest`. | #482 | sonnet | feat/federation-deploy-stack-template | DEPLOY-01 | 5K | Shipped in PR #485. Digest-pinned. Env: STACK_NAME, HOST_FQDN, POSTGRES_PASSWORD, BETTER_AUTH_SECRET, BETTER_AUTH_URL. |
| FED-M2-DEPLOY-IMG-FIX | in-progress | Gateway image runtime broken (ERR_MODULE_NOT_FOUND for `dotenv`); Dockerfile copies `.pnpm/` store but not `apps/gateway/node_modules` symlinks. Switch to `pnpm deploy` for self-contained runtime. | #482 | sonnet | (subagent in flight) | DEPLOY-02 | 4K | Subagent `a78a9ab0ddae91fbc` in flight. Triggers Kaniko rebuild on merge; capture new digest; bump stack template in follow-up PR before redeploy. |
| FED-M2-DEPLOY-03 | blocked | Deploy stack to mos-test-1.woltje.com via `~/.config/mosaic/tools/portainer/`. Verify M1 acceptance: federated-tier boot succeeds; `mosaic gateway doctor --json` returns green; pgvector `vector(3)` round-trip works. | #482 | sonnet | feat/federation-deploy-test-1 | IMG-FIX | 3K | Stack created on Portainer endpoint 3 (Swarm `local`), but blocked on image fix. Container fails on boot until IMG-FIX merges + redeploy. |
| FED-M2-DEPLOY-04 | blocked | Deploy stack to mos-test-2.woltje.com via Portainer wrapper. Same M1 acceptance probes as DEPLOY-03. | #482 | sonnet | feat/federation-deploy-test-2 | IMG-FIX | 3K | Same status as DEPLOY-03. Stack created; blocked on image fix. |
| FED-M2-DEPLOY-05 | not-started | Document deployment in `docs/federation/TEST-INFRA.md`: hosts, image tags, secrets sourcing, redeploy procedure, teardown. Update MISSION-MANIFEST with deployment status. | #482 | haiku | feat/federation-deploy-docs | DEPLOY-03,04 | 3K | Operator-facing doc; mentions but does not duplicate `tools/portainer/README.md`. |
**Deploy workstream estimate:** ~16K tokens
---
## Milestone 2 — Step-CA + grant schema + admin CLI (FED-M2)
_Deferred to mission planning when M1 is complete. Issue #461 tracks scope._
Goal: An admin can create a federation grant; counterparty enrolls; cert is signed by Step-CA with SAN OIDs for `grantId` + `subjectUserId`. No runtime federation traffic flows yet (that's M3).
| id | status | description | issue | agent | branch | depends_on | estimate | notes |
| --------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- | ------ | ---------------------------------- | ---------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| FED-M2-01 | needs-qa | DB migration: `federation_grants`, `federation_peers`, `federation_audit_log` tables + enum types (`grant_status`, `peer_state`). Drizzle schema + migration generation; migration tests. | #461 | sonnet | feat/federation-m2-schema | — | 5K | PR #486 open. First review NEEDS CHANGES (missing DESC indexes + reserved cols). Remediation subagent `a673dd9355dc26f82` in flight in worktree `agent-a4404ac1`. |
| FED-M2-02 | not-started | Add Step-CA sidecar to `docker-compose.federated.yml`: official `smallstep/step-ca` image, persistent CA volume, JWK provisioner config baked into init script. | #461 | sonnet | feat/federation-m2-stepca | DEPLOY-02 | 4K | Profile-gated under `federated`. CA password from secret; dev compose uses dev-only password file. |
| FED-M2-03 | not-started | Scope JSON schema + validator: `resources` allowlist, `excluded_resources`, `include_teams`, `include_personal`, `max_rows_per_query`. Vitest unit tests for valid + invalid scopes. | #461 | sonnet | feat/federation-m2-scope-schema | — | 4K | Validator independent of CA — reusable from grant CRUD + (later) M3 scope enforcement. |
| FED-M2-04 | not-started | `apps/gateway/src/federation/ca.service.ts`: Step-CA client (CSR submission, OID-bearing cert retrieval). Mocked + integration tests against real Step-CA container. | #461 | sonnet | feat/federation-m2-ca-service | M2-02 | 6K | SAN OIDs: `grantId` (custom OID 1.3.6.1.4.1.99999.1) + `subjectUserId` (1.3.6.1.4.1.99999.2). Document OID assignments in PRD/SETUP. |
| FED-M2-05 | not-started | Sealed storage for `client_key_pem` reusing existing `provider_credentials` sealing key. Tests prove DB-at-rest is ciphertext, not PEM. Key rotation path documented (deferred impl). | #461 | sonnet | feat/federation-m2-key-sealing | M2-01 | 5K | Separate from M2-06 to keep crypto seam isolated; reviewer focus is sealing only. |
| FED-M2-06 | not-started | `grants.service.ts`: CRUD + status transitions (`pending``active``revoked`); integrates M2-03 (scope) + M2-05 (sealing). Unit tests cover all transitions including invalid ones. | #461 | sonnet | feat/federation-m2-grants-service | M2-03, M2-05 | 6K | Business logic only — CSR + cert work delegated to M2-04. Revocation handler is M6. |
| FED-M2-07 | not-started | `enrollment.controller.ts`: short-lived single-use token endpoint; CSR signing; updates grant `pending``active`; emits enrollment audit (table-only write, M4 tightens). | #461 | sonnet | feat/federation-m2-enrollment | M2-04, M2-06 | 6K | Tokens single-use with 410 on replay; tokens TTL'd at 15min; rate-limited at request layer (M4 introduces guard, M2 uses simple lock). |
| FED-M2-08 | not-started | Admin CLI: `mosaic federation grant create/list/show` + `peer add/list`. Integration with grants.service (no API duplication). Help output + machine-readable JSON option. | #461 | sonnet | feat/federation-m2-cli | M2-06, M2-07 | 7K | `peer add <enrollment-url>` is the client-side flow; resolves enrollment URL → CSR → store sealed key + cert. |
| FED-M2-09 | not-started | Integration tests covering MILESTONES.md M2 acceptance tests #1, #2, #3, #5, #7, #8 (single-gateway suite). Real Step-CA container; vitest profile gated by `FEDERATED_INTEGRATION=1`. | #461 | sonnet | feat/federation-m2-integration | M2-08 | 8K | Tests #4 (cert OID match) + #6 (two-gateway peer-add) handled separately by M2-10 (E2E). |
| FED-M2-10 | not-started | E2E test against deployed mos-test-1 + mos-test-2 (or local two-gateway docker-compose if Portainer not ready): MILESTONES test #6 `peer add` yields `active` peer record with valid cert + key. | #461 | sonnet | feat/federation-m2-e2e | M2-08, DEPLOY-04 | 6K | Falls back to local docker-compose-two-gateways if remote test hosts not yet available. Documents both paths. |
| FED-M2-11 | not-started | Independent security review (sonnet, not author of M2-04/05/06/07): focus on single-use token replay, sealing leak surfaces, OID match enforcement, scope schema bypass paths. | #461 | sonnet | feat/federation-m2-security-review | M2-10 | 8K | Apply M1 two-round pattern. Reviewer should explicitly attempt enrollment-token replay, OID-spoofing CSR, and key leak in error messages. |
| FED-M2-12 | not-started | Docs update: `docs/federation/SETUP.md` Step-CA section; new `docs/federation/ADMIN-CLI.md` with grant/peer commands; scope schema reference; OID registration note. Runbook still M7-deferred. | #461 | haiku | feat/federation-m2-docs | M2-11 | 4K | Adds CA bootstrap section to SETUP.md with `docker compose --profile federated up step-ca` example. |
| FED-M2-13 | not-started | PR aggregate close, CI green, merge to main, close #461. Release tag `fed-v0.2.0-m2`. Mark deploy stream complete. Update mission manifest M2 row. | #461 | sonnet | feat/federation-m2-close | M2-12 | 3K | Same close pattern as M1-12; queue-guard before merge; tea release-create with notes including deploy-stream PRs. |
**M2 code workstream estimate:** ~72K tokens (vs MILESTONES.md 30K — same over-budget pattern as M1, where per-task breakdown including tests/review/docs catches the real cost).
**Deploy + code combined:** ~88K tokens.
## Milestone 3 — mTLS handshake + list/get + scope enforcement (FED-M3)

View File

@@ -523,3 +523,92 @@ Independent security review surfaced three high-impact and four medium findings;
- #8: confirm `packages/config/dist` not git-tracked
**Next mission step:** FED-M2 (Step-CA + grant schema + admin CLI). Per TASKS.md scope rule, M2 will be decomposed when it enters active planning. Issue #461 tracks scope.
## Session 20 — 2026-04-21 — FED-M2 kickoff
### Decisions
- **Workstream split**: parallel CODE (M2-01..M2-13, ~72K) + DEPLOY (DEPLOY-01..DEPLOY-05, ~16K) tracks; re-converge at M2-10 E2E.
- **Test hosts**: `mos-test-1.woltje.com` (querying side / Server A), `mos-test-2.woltje.com` (serving side / Server B). Wildcard `*.woltje.com` A→174.137.97.162 already exists; Traefik wildcard cert covers both subdomains. No DNS or cert work needed pre-deploy.
- **Portainer access**: requires `PORTAINER_INSECURE=1` flag added to mosaic wrappers (self-signed cert at `https://10.1.1.43:9443`). PR pending on `feat/mosaic-portainer-tls-flag`.
- **Image policy**: deploy by digest (immutable) per Mosaic policy. `gateway:fed-v0.1.0-m1` digest = `sha256:9b72e202a9eecc27d31920b87b475b9e96e483c0323acc57856be4b1355db1ec`.
### DEPLOY-01 — image manifest verified
- Tag `fed-v0.1.0-m1` exists at `git.mosaicstack.dev/mosaicstack/stack/gateway`
- Digest: `sha256:9b72e202a9eecc27d31920b87b475b9e96e483c0323acc57856be4b1355db1ec`
- 9 layers, ~530MB total
- Use this digest in DEPLOY-02 stack template (do NOT reference `:fed-v0.1.0-m1` tag in stack — pin to digest)
### Registry auth note
- Gitea container registry uses Bearer token flow (`/v2/token?service=container_registry&scope=repository:<repo>:pull`)
- Username: `jarvis` (NOT `mosaicstack`); password: `gitea.mosaicstack.token` from credentials.json
- Direct `Authorization: Bearer <pat>` does NOT work — must exchange PAT for registry token first
### Active PRs
- #483 — docs: M2 mission planning (TASKS decomposition + manifest update) — CI running
- (pending) `feat/mosaic-portainer-tls-flag` — wrapper PORTAINER_INSECURE flag (sonnet subagent in progress)
- (pending) `feat/federation-m2-schema` — FED-M2-01 DB schema migration (sonnet subagent in progress)
### MISSION-MANIFEST layout fix
- Initial M2 commit had Test Infrastructure block inserted by lint-staged prettier between "Last Updated" and "Parent Mission" — split mission frontmatter
- Fixed in 3d001fdb: moved Parent Mission back to frontmatter, kept Test Infrastructure as standalone H2 between Mission and Context
## Session 21 — 2026-04-21/22 — DEPLOY-02 merged, gateway image bug discovered, M2-01 in remediation
### PRs merged
- **#483** — docs(federation): M2 mission planning (TASKS decomposition + manifest update)
- **#484** — feat(mosaic-portainer): PORTAINER_INSECURE flag for self-signed TLS (wrapper sync to `~/.config/mosaic/tools/portainer/` done manually due to broken `mosaic upgrade` `set -o pipefail` on dash)
- **#485** — feat(deploy): portainer stack template `deploy/portainer/federated-test.stack.yml` for federation test instances [DEPLOY-02]
### Stack deployed (mos-test-1, mos-test-2)
- Both stacks created on Portainer endpoint 3 (`local` Swarm @ 10.1.1.43, the only endpoint with traefik-public + woltje.com wildcard cert)
- Swarm ID `l7z67tfpd4bvj4979ufpkyi50`
- Image pinned to digest `sha256:9b72e202a9eecc27d31920b87b475b9e96e483c0323acc57856be4b1355db1ec`
- Traefik labels target `${HOST_FQDN}` per env
### CRITICAL FINDING — gateway image runtime-broken
- `docker run` against `gateway:fed-v0.1.0-m1` fails immediately:
`Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'dotenv' imported from /app/dist/main.js`
- Root cause: `docker/gateway.Dockerfile` copies `/app/node_modules` from builder — but pnpm puts deps in the content-addressed `.pnpm/` store with symlinks at `apps/gateway/node_modules/*`. The runner stage misses the symlinks → Node can't resolve workspace deps.
- M1 release was never runtime-tested as a stripped container; CI passed because tests run in dev tree where pnpm symlinks are intact.
- **Fix in flight** (subagent `a78a9ab0ddae91fbc`): switch builder to `pnpm --filter @mosaic/gateway --prod deploy /deploy`, then runner copies `/deploy/node_modules` + `/deploy/dist` + `/deploy/package.json`.
### M2-01 schema review verdict — NEEDS CHANGES
- PR #486 (`feat/federation-m2-schema`) — independent reviewer (sonnet) found 2 real issues:
1. `federation_audit_log` time-range indexes missing `.desc()` on `created_at` (3 places)
2. Reserved columns missing per TASKS.md M2-01 spec: `query_hash`, `outcome`, `bytes_out` (M4 will write; spec said reserve now)
- Also notes (advisory): subject_user_id correctly `text` (matches BetterAuth users.id; spec defect, not code defect); peer→grant cascade test not present (would be trivial to add)
- **Remediation in flight** (subagent `a673dd9355dc26f82` in worktree `agent-a4404ac1`): apply DESC + reserved cols, regenerate migration in place (preferred) or stack 0009 (fallback), force-push, post PR comment.
### Process notes
- Branch race incident: schema subagent + wrapper subagent both ran in main checkout → schema files appeared on wrapper branch. Recovered by TaskStop, `git checkout --` to clean, respawned schema subagent with `isolation: "worktree"`. **Rule going forward:** any subagent doing code edits gets `isolation: "worktree"` unless work is single-file and the orchestrator confirms no other branch will touch overlapping files.
- `pr-create.sh` shell-quotes backticks badly → use `tea pr create --repo mosaicstack/stack` directly (matches CLI-skill behavior). Will leave a followup to harden pr-create.sh.
- Gitea registry auth: bearer-token exchange flow (`/v2/token?service=container_registry&scope=repository:<repo>:pull`) — direct `Authorization: Bearer <pat>` returns 401.
- Portainer Swarm stack create endpoint: `POST /api/stacks/create/swarm/string?endpointId=<id>` (NOT `/api/stacks?type=1` — deprecated and rejected with 400).
### In-flight at compaction boundary
- Subagent `a78a9ab0ddae91fbc` — Dockerfile pnpm-deploy fix → PR (not yet opened at handoff)
- Subagent `a673dd9355dc26f82` — M2-01 schema remediation (DESC + reserved cols) → force-push to PR #486
- Both will trigger CI; orchestrator must independently re-review fixes (especially the security-adjacent schema work) per "always verify subagent claims" rule.
### Next after subagents return
1. Independent re-review of schema remediation (different subagent, fresh context)
2. Merge #486 if green
3. Merge Dockerfile fix PR if green → triggers Kaniko CI rebuild → capture new digest
4. Update `deploy/portainer/federated-test.stack.yml` to new digest in a small PR
5. Redeploy mos-test-1 + mos-test-2 (Portainer stack update via API)
6. Verify HTTPS reachability + `/health` endpoint at both hosts
7. DEPLOY-03/04 acceptance probes (`mosaic gateway doctor --json`, pgvector `vector(3)` round-trip)
8. DEPLOY-05: author `docs/federation/TEST-INFRA.md`
9. M2-02 (Step-CA sidecar) kicks off after image health is green

View File

@@ -1,75 +0,0 @@
CREATE TYPE "public"."grant_status" AS ENUM('active', 'revoked', 'expired');--> statement-breakpoint
CREATE TYPE "public"."peer_state" AS ENUM('pending', 'active', 'suspended', 'revoked');--> statement-breakpoint
CREATE TABLE "admin_tokens" (
"id" text PRIMARY KEY NOT NULL,
"user_id" text NOT NULL,
"token_hash" text NOT NULL,
"label" text NOT NULL,
"scope" text DEFAULT 'admin' NOT NULL,
"expires_at" timestamp with time zone,
"last_used_at" timestamp with time zone,
"created_at" timestamp with time zone DEFAULT now() NOT NULL
);
--> statement-breakpoint
CREATE TABLE "federation_audit_log" (
"id" uuid PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL,
"request_id" text NOT NULL,
"peer_id" uuid,
"subject_user_id" text,
"grant_id" uuid,
"verb" text NOT NULL,
"resource" text NOT NULL,
"status_code" integer NOT NULL,
"result_count" integer,
"denied_reason" text,
"latency_ms" integer,
"created_at" timestamp with time zone DEFAULT now() NOT NULL,
"query_hash" text,
"outcome" text,
"bytes_out" integer
);
--> statement-breakpoint
CREATE TABLE "federation_grants" (
"id" uuid PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL,
"subject_user_id" text NOT NULL,
"peer_id" uuid NOT NULL,
"scope" jsonb NOT NULL,
"status" "grant_status" DEFAULT 'active' NOT NULL,
"expires_at" timestamp with time zone,
"created_at" timestamp with time zone DEFAULT now() NOT NULL,
"revoked_at" timestamp with time zone,
"revoked_reason" text
);
--> statement-breakpoint
CREATE TABLE "federation_peers" (
"id" uuid PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL,
"common_name" text NOT NULL,
"display_name" text NOT NULL,
"cert_pem" text NOT NULL,
"cert_serial" text NOT NULL,
"cert_not_after" timestamp with time zone NOT NULL,
"client_key_pem" text,
"state" "peer_state" DEFAULT 'pending' NOT NULL,
"endpoint_url" text,
"last_seen_at" timestamp with time zone,
"created_at" timestamp with time zone DEFAULT now() NOT NULL,
"revoked_at" timestamp with time zone,
CONSTRAINT "federation_peers_common_name_unique" UNIQUE("common_name"),
CONSTRAINT "federation_peers_cert_serial_unique" UNIQUE("cert_serial")
);
--> statement-breakpoint
ALTER TABLE "admin_tokens" ADD CONSTRAINT "admin_tokens_user_id_users_id_fk" FOREIGN KEY ("user_id") REFERENCES "public"."users"("id") ON DELETE cascade ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "federation_audit_log" ADD CONSTRAINT "federation_audit_log_peer_id_federation_peers_id_fk" FOREIGN KEY ("peer_id") REFERENCES "public"."federation_peers"("id") ON DELETE set null ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "federation_audit_log" ADD CONSTRAINT "federation_audit_log_subject_user_id_users_id_fk" FOREIGN KEY ("subject_user_id") REFERENCES "public"."users"("id") ON DELETE set null ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "federation_audit_log" ADD CONSTRAINT "federation_audit_log_grant_id_federation_grants_id_fk" FOREIGN KEY ("grant_id") REFERENCES "public"."federation_grants"("id") ON DELETE set null ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "federation_grants" ADD CONSTRAINT "federation_grants_subject_user_id_users_id_fk" FOREIGN KEY ("subject_user_id") REFERENCES "public"."users"("id") ON DELETE cascade ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "federation_grants" ADD CONSTRAINT "federation_grants_peer_id_federation_peers_id_fk" FOREIGN KEY ("peer_id") REFERENCES "public"."federation_peers"("id") ON DELETE cascade ON UPDATE no action;--> statement-breakpoint
CREATE INDEX "admin_tokens_user_id_idx" ON "admin_tokens" USING btree ("user_id");--> statement-breakpoint
CREATE UNIQUE INDEX "admin_tokens_hash_idx" ON "admin_tokens" USING btree ("token_hash");--> statement-breakpoint
CREATE INDEX "federation_audit_log_peer_created_at_idx" ON "federation_audit_log" USING btree ("peer_id","created_at" DESC NULLS LAST);--> statement-breakpoint
CREATE INDEX "federation_audit_log_subject_created_at_idx" ON "federation_audit_log" USING btree ("subject_user_id","created_at" DESC NULLS LAST);--> statement-breakpoint
CREATE INDEX "federation_audit_log_created_at_idx" ON "federation_audit_log" USING btree ("created_at" DESC NULLS LAST);--> statement-breakpoint
CREATE INDEX "federation_grants_subject_status_idx" ON "federation_grants" USING btree ("subject_user_id","status");--> statement-breakpoint
CREATE INDEX "federation_grants_peer_status_idx" ON "federation_grants" USING btree ("peer_id","status");--> statement-breakpoint
CREATE INDEX "federation_peers_cert_serial_idx" ON "federation_peers" USING btree ("cert_serial");--> statement-breakpoint
CREATE INDEX "federation_peers_state_idx" ON "federation_peers" USING btree ("state");

File diff suppressed because it is too large Load Diff

View File

@@ -57,13 +57,6 @@
"when": 1774227064500,
"tag": "0006_swift_shen",
"breakpoints": true
},
{
"idx": 8,
"version": "7",
"when": 1776822435828,
"tag": "0008_smart_lyja",
"breakpoints": true
}
]
}

View File

@@ -1,424 +0,0 @@
/**
* FED-M2-01 — Integration test: federation DB schema (peers / grants / audit_log).
*
* Prereq: docker compose -f docker-compose.federated.yml --profile federated up -d
* (or any postgres with the mosaic schema already applied)
* Run: FEDERATED_INTEGRATION=1 pnpm --filter @mosaicstack/db test src/federation.integration.test.ts
*
* Skipped when FEDERATED_INTEGRATION !== '1'.
*
* Strategy:
* - Applies the federation migration SQL directly (idempotent: CREATE TYPE/TABLE
* with IF NOT EXISTS guards applied via inline SQL before the migration DDL).
* - Assumes the base schema (users table etc.) already exists in the target DB.
* - All test rows use the `fed-m2-01-` prefix; cleanup in afterAll.
*
* Coverage:
* 1. Federation tables + enums apply cleanly against the existing schema.
* 2. Insert a sample user + peer + grant + audit row; verify round-trip.
* 3. FK cascade: deleting the user cascades to federation_grants.
* 4. FK set-null: deleting the peer sets federation_audit_log.peer_id to NULL.
* 5. Enum constraint: inserting an invalid status/state value throws a DB error.
* 6. Unique constraint: duplicate cert_serial throws a DB error.
*/
import postgres from 'postgres';
import { afterAll, beforeAll, describe, expect, it } from 'vitest';
const run = process.env['FEDERATED_INTEGRATION'] === '1';
const PG_URL = process.env['DATABASE_URL'] ?? 'postgresql://mosaic:mosaic@localhost:5433/mosaic';
/** Recognisable test-row prefix for safe cleanup without full-table truncation. */
const T = 'fed-m2-01';
// Deterministic IDs (UUID format required for uuid PK columns: 8-4-4-4-12 hex digits).
const PEER1_ID = `f2000001-0000-4000-8000-000000000001`;
const PEER2_ID = `f2000002-0000-4000-8000-000000000002`;
const USER1_ID = `${T}-user-1`;
let sql: ReturnType<typeof postgres> | undefined;
beforeAll(async () => {
if (!run) return;
sql = postgres(PG_URL, { max: 1, connect_timeout: 10, idle_timeout: 10 });
// Apply the federation enums and tables idempotently.
// This mirrors the migration file but uses IF NOT EXISTS guards so it can run
// against a DB that may not have had drizzle migrations tracked.
await sql`
DO $$ BEGIN
CREATE TYPE peer_state AS ENUM ('pending', 'active', 'suspended', 'revoked');
EXCEPTION WHEN duplicate_object THEN NULL;
END $$
`;
await sql`
DO $$ BEGIN
CREATE TYPE grant_status AS ENUM ('active', 'revoked', 'expired');
EXCEPTION WHEN duplicate_object THEN NULL;
END $$
`;
await sql`
CREATE TABLE IF NOT EXISTS federation_peers (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
common_name text NOT NULL,
display_name text NOT NULL,
cert_pem text NOT NULL,
cert_serial text NOT NULL,
cert_not_after timestamp with time zone NOT NULL,
client_key_pem text,
state peer_state NOT NULL DEFAULT 'pending',
endpoint_url text,
last_seen_at timestamp with time zone,
created_at timestamp with time zone NOT NULL DEFAULT now(),
revoked_at timestamp with time zone,
CONSTRAINT federation_peers_common_name_unique UNIQUE (common_name),
CONSTRAINT federation_peers_cert_serial_unique UNIQUE (cert_serial)
)
`;
await sql`
CREATE INDEX IF NOT EXISTS federation_peers_cert_serial_idx ON federation_peers (cert_serial)
`;
await sql`
CREATE INDEX IF NOT EXISTS federation_peers_state_idx ON federation_peers (state)
`;
await sql`
CREATE TABLE IF NOT EXISTS federation_grants (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
subject_user_id text NOT NULL REFERENCES users(id) ON DELETE CASCADE,
peer_id uuid NOT NULL REFERENCES federation_peers(id) ON DELETE CASCADE,
scope jsonb NOT NULL,
status grant_status NOT NULL DEFAULT 'active',
expires_at timestamp with time zone,
created_at timestamp with time zone NOT NULL DEFAULT now(),
revoked_at timestamp with time zone,
revoked_reason text
)
`;
await sql`
CREATE INDEX IF NOT EXISTS federation_grants_subject_status_idx ON federation_grants (subject_user_id, status)
`;
await sql`
CREATE INDEX IF NOT EXISTS federation_grants_peer_status_idx ON federation_grants (peer_id, status)
`;
await sql`
CREATE TABLE IF NOT EXISTS federation_audit_log (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
request_id text NOT NULL,
peer_id uuid REFERENCES federation_peers(id) ON DELETE SET NULL,
subject_user_id text REFERENCES users(id) ON DELETE SET NULL,
grant_id uuid REFERENCES federation_grants(id) ON DELETE SET NULL,
verb text NOT NULL,
resource text NOT NULL,
status_code integer NOT NULL,
result_count integer,
denied_reason text,
latency_ms integer,
created_at timestamp with time zone NOT NULL DEFAULT now(),
query_hash text,
outcome text,
bytes_out integer
)
`;
await sql`
CREATE INDEX IF NOT EXISTS federation_audit_log_peer_created_at_idx
ON federation_audit_log (peer_id, created_at DESC NULLS LAST)
`;
await sql`
CREATE INDEX IF NOT EXISTS federation_audit_log_subject_created_at_idx
ON federation_audit_log (subject_user_id, created_at DESC NULLS LAST)
`;
await sql`
CREATE INDEX IF NOT EXISTS federation_audit_log_created_at_idx
ON federation_audit_log (created_at DESC NULLS LAST)
`;
});
afterAll(async () => {
if (!sql) return;
// Cleanup in FK-safe order (children before parents).
await sql`DELETE FROM federation_audit_log WHERE request_id LIKE ${T + '%'}`.catch(() => {});
await sql`
DELETE FROM federation_grants
WHERE subject_user_id LIKE ${T + '%'}
OR revoked_reason LIKE ${T + '%'}
`.catch(() => {});
await sql`DELETE FROM federation_peers WHERE common_name LIKE ${T + '%'}`.catch(() => {});
await sql`DELETE FROM users WHERE id LIKE ${T + '%'}`.catch(() => {});
await sql.end({ timeout: 3 }).catch(() => {});
});
describe.skipIf(!run)('federation schema — integration', () => {
// ── 1. Insert sample rows ──────────────────────────────────────────────────
it('inserts a user, peer, grant, and audit row without constraint violation', async () => {
const certPem = '-----BEGIN CERTIFICATE-----\nMIItest\n-----END CERTIFICATE-----';
// User — BetterAuth users.id is text (any string, not uuid).
await sql!`
INSERT INTO users (id, name, email, email_verified, created_at, updated_at)
VALUES (${USER1_ID}, ${'M2-01 Test User'}, ${USER1_ID + '@example.com'}, false, now(), now())
ON CONFLICT (id) DO NOTHING
`;
// Peer
await sql!`
INSERT INTO federation_peers
(id, common_name, display_name, cert_pem, cert_serial, cert_not_after, state, created_at)
VALUES (
${PEER1_ID},
${T + '-gateway-example-com'},
${'Test Peer'},
${certPem},
${T + '-serial-001'},
now() + interval '1 year',
${'active'},
now()
)
ON CONFLICT (id) DO NOTHING
`;
// Grant — scope is jsonb; pass as JSON string and cast server-side.
const scopeJson = JSON.stringify({
resources: ['tasks', 'notes'],
operations: ['list', 'get'],
});
const grants = await sql!`
INSERT INTO federation_grants
(subject_user_id, peer_id, scope, status, created_at)
VALUES (
${USER1_ID},
${PEER1_ID},
${scopeJson}::jsonb,
${'active'},
now()
)
RETURNING id
`;
expect(grants).toHaveLength(1);
const grantId = grants[0]!['id'] as string;
// Audit log row
await sql!`
INSERT INTO federation_audit_log
(request_id, peer_id, subject_user_id, grant_id, verb, resource, status_code, created_at)
VALUES (
${T + '-req-001'},
${PEER1_ID},
${USER1_ID},
${grantId},
${'list'},
${'tasks'},
${200},
now()
)
`;
// Verify the audit row is present and has correct data.
const auditRows = await sql!`
SELECT * FROM federation_audit_log WHERE request_id = ${T + '-req-001'}
`;
expect(auditRows).toHaveLength(1);
expect(auditRows[0]!['status_code']).toBe(200);
expect(auditRows[0]!['verb']).toBe('list');
expect(auditRows[0]!['resource']).toBe('tasks');
}, 30_000);
// ── 2. FK cascade: user delete cascades grants ─────────────────────────────
it('cascade-deletes federation_grants when the subject user is deleted', async () => {
const cascadeUserId = `${T}-cascade-user`;
await sql!`
INSERT INTO users (id, name, email, email_verified, created_at, updated_at)
VALUES (${cascadeUserId}, ${'Cascade User'}, ${cascadeUserId + '@example.com'}, false, now(), now())
ON CONFLICT (id) DO NOTHING
`;
const scopeJson = JSON.stringify({ resources: ['tasks'] });
await sql!`
INSERT INTO federation_grants
(subject_user_id, peer_id, scope, status, revoked_reason, created_at)
VALUES (
${cascadeUserId},
${PEER1_ID},
${scopeJson}::jsonb,
${'active'},
${T + '-cascade-test'},
now()
)
`;
const before = await sql!`
SELECT count(*)::int AS cnt FROM federation_grants WHERE subject_user_id = ${cascadeUserId}
`;
expect(before[0]!['cnt']).toBe(1);
// Delete user → grants should cascade-delete.
await sql!`DELETE FROM users WHERE id = ${cascadeUserId}`;
const after = await sql!`
SELECT count(*)::int AS cnt FROM federation_grants WHERE subject_user_id = ${cascadeUserId}
`;
expect(after[0]!['cnt']).toBe(0);
}, 15_000);
// ── 3. FK set-null: peer delete sets audit_log.peer_id to NULL ────────────
it('sets federation_audit_log.peer_id to NULL when the peer is deleted', async () => {
// Insert a throwaway peer for this specific cascade test.
await sql!`
INSERT INTO federation_peers
(id, common_name, display_name, cert_pem, cert_serial, cert_not_after, state, created_at)
VALUES (
${PEER2_ID},
${T + '-gateway-throwaway-com'},
${'Throwaway Peer'},
${'cert-pem-placeholder'},
${T + '-serial-002'},
now() + interval '1 year',
${'active'},
now()
)
ON CONFLICT (id) DO NOTHING
`;
const reqId = `${T}-req-setnull`;
await sql!`
INSERT INTO federation_audit_log
(request_id, peer_id, subject_user_id, verb, resource, status_code, created_at)
VALUES (
${reqId},
${PEER2_ID},
${USER1_ID},
${'get'},
${'tasks'},
${200},
now()
)
`;
await sql!`DELETE FROM federation_peers WHERE id = ${PEER2_ID}`;
const rows = await sql!`
SELECT peer_id FROM federation_audit_log WHERE request_id = ${reqId}
`;
expect(rows).toHaveLength(1);
expect(rows[0]!['peer_id']).toBeNull();
}, 15_000);
// ── 4. Enum constraint: invalid grant_status rejected ─────────────────────
it('rejects an invalid grant_status value with a DB error', async () => {
const scopeJson = JSON.stringify({ resources: ['tasks'] });
await expect(
sql!`
INSERT INTO federation_grants
(subject_user_id, peer_id, scope, status, created_at)
VALUES (
${USER1_ID},
${PEER1_ID},
${scopeJson}::jsonb,
${'invalid_status'},
now()
)
`,
).rejects.toThrow();
}, 10_000);
// ── 5. Enum constraint: invalid peer_state rejected ───────────────────────
it('rejects an invalid peer_state value with a DB error', async () => {
await expect(
sql!`
INSERT INTO federation_peers
(common_name, display_name, cert_pem, cert_serial, cert_not_after, state, created_at)
VALUES (
${'bad-state-peer'},
${'Bad State'},
${'pem'},
${'bad-serial-999'},
now() + interval '1 year',
${'invalid_state'},
now()
)
`,
).rejects.toThrow();
}, 10_000);
// ── 6. Unique constraint: duplicate cert_serial rejected ──────────────────
it('rejects a duplicate cert_serial with a unique constraint violation', async () => {
await expect(
sql!`
INSERT INTO federation_peers
(common_name, display_name, cert_pem, cert_serial, cert_not_after, state, created_at)
VALUES (
${T + '-dup-cn'},
${'Dup Peer'},
${'pem'},
${T + '-serial-001'},
now() + interval '1 year',
${'pending'},
now()
)
`,
).rejects.toThrow();
}, 10_000);
// ── 7. FK cascade: peer delete cascades to federation_grants ─────────────
it('cascade-deletes federation_grants when the owning peer is deleted', async () => {
const PEER3_ID = `f2000003-0000-4000-8000-000000000003`;
const cascadeGrantUserId = `${T}-cascade-grant-user`;
// Insert a dedicated user and peer for this test.
await sql!`
INSERT INTO users (id, name, email, email_verified, created_at, updated_at)
VALUES (${cascadeGrantUserId}, ${'Cascade Grant User'}, ${cascadeGrantUserId + '@example.com'}, false, now(), now())
ON CONFLICT (id) DO NOTHING
`;
await sql!`
INSERT INTO federation_peers
(id, common_name, display_name, cert_pem, cert_serial, cert_not_after, state, created_at)
VALUES (
${PEER3_ID},
${T + '-gateway-cascade-peer'},
${'Cascade Peer'},
${'cert-pem-cascade'},
${T + '-serial-003'},
now() + interval '1 year',
${'active'},
now()
)
ON CONFLICT (id) DO NOTHING
`;
const scopeJson = JSON.stringify({ resources: ['tasks'] });
await sql!`
INSERT INTO federation_grants
(subject_user_id, peer_id, scope, status, created_at)
VALUES (
${cascadeGrantUserId},
${PEER3_ID},
${scopeJson}::jsonb,
${'active'},
now()
)
`;
const before = await sql!`
SELECT count(*)::int AS cnt FROM federation_grants WHERE peer_id = ${PEER3_ID}
`;
expect(before[0]!['cnt']).toBe(1);
// Delete peer → grants should cascade-delete.
await sql!`DELETE FROM federation_peers WHERE id = ${PEER3_ID}`;
const after = await sql!`
SELECT count(*)::int AS cnt FROM federation_grants WHERE peer_id = ${PEER3_ID}
`;
expect(after[0]!['cnt']).toBe(0);
// Cleanup
await sql!`DELETE FROM users WHERE id = ${cascadeGrantUserId}`.catch(() => {});
}, 15_000);
});

View File

@@ -1,20 +0,0 @@
/**
* Federation schema re-exports.
*
* The actual table and enum definitions live in schema.ts (alongside all other
* Drizzle tables) to avoid CJS/ESM cross-import issues when drizzle-kit loads
* schema files via esbuild-register. Application code that wants named imports
* for federation symbols should import from this file.
*
* M2-01: DB tables and enums only. No business logic.
* M2-03 will add JSON schema validation for the `scope` column.
* M4 will write rows to federation_audit_log.
*/
export {
peerStateEnum,
grantStatusEnum,
federationPeers,
federationGrants,
federationAuditLog,
} from './schema.js';

View File

@@ -2,7 +2,6 @@ export { createDb, type Db, type DbHandle } from './client.js';
export { createPgliteDb } from './client-pglite.js';
export { runMigrations } from './migrate.js';
export * from './schema.js';
export * from './federation.js';
export {
eq,
and,

View File

@@ -5,7 +5,6 @@
import {
pgTable,
pgEnum,
text,
timestamp,
boolean,
@@ -586,194 +585,3 @@ export const summarizationJobs = pgTable(
},
(t) => [index('summarization_jobs_status_idx').on(t.status)],
);
// ─── Federation ──────────────────────────────────────────────────────────────
// Enums declared before tables that reference them.
// All federation definitions live in this file (avoids CJS/ESM cross-import
// issues when drizzle-kit loads schema files via esbuild-register).
// Application code imports from `federation.ts` which re-exports from here.
/**
* Lifecycle state of a federation peer.
* - pending: registered but not yet approved / TLS handshake not confirmed
* - active: fully operational; mTLS verified
* - suspended: temporarily blocked; cert still valid
* - revoked: cert revoked; no traffic allowed
*/
export const peerStateEnum = pgEnum('peer_state', ['pending', 'active', 'suspended', 'revoked']);
/**
* Lifecycle state of a federation grant.
* - active: grant is in effect
* - revoked: manually revoked before expiry
* - expired: natural expiry (expires_at passed)
*/
export const grantStatusEnum = pgEnum('grant_status', ['active', 'revoked', 'expired']);
/**
* A registered peer gateway identified by its Step-CA certificate CN.
* Represents both inbound peers (other gateways querying us) and outbound
* peers (gateways we query — identified by client_key_pem being set).
*/
export const federationPeers = pgTable(
'federation_peers',
{
id: uuid('id').primaryKey().defaultRandom(),
/** Certificate CN, e.g. "gateway-uscllc-com". Unique — one row per peer identity. */
commonName: text('common_name').notNull().unique(),
/** Human-friendly label shown in admin UI. */
displayName: text('display_name').notNull(),
/** Pinned PEM certificate used for mTLS verification. */
certPem: text('cert_pem').notNull(),
/** Certificate serial number — used for CRL / revocation lookup. */
certSerial: text('cert_serial').notNull().unique(),
/** Certificate expiry — used by the renewal scheduler (FED-M6). */
certNotAfter: timestamp('cert_not_after', { withTimezone: true }).notNull(),
/**
* Sealed (encrypted) private key for outbound connections TO this peer.
* NULL for inbound-only peer rows (we serve them; we don't call them).
*/
clientKeyPem: text('client_key_pem'),
/** Current peer lifecycle state. */
state: peerStateEnum('state').notNull().default('pending'),
/** Base URL for outbound queries, e.g. "https://woltje.com:443". NULL for inbound-only peers. */
endpointUrl: text('endpoint_url'),
/** Timestamp of the most recent successful inbound or outbound request. */
lastSeenAt: timestamp('last_seen_at', { withTimezone: true }),
createdAt: timestamp('created_at', { withTimezone: true }).notNull().defaultNow(),
/** Populated when the cert is revoked; NULL while the peer is active. */
revokedAt: timestamp('revoked_at', { withTimezone: true }),
},
(t) => [
// CRL / revocation lookups by serial.
index('federation_peers_cert_serial_idx').on(t.certSerial),
// Filter peers by state (e.g. find all active peers for outbound routing).
index('federation_peers_state_idx').on(t.state),
],
);
/**
* A grant lets a specific peer cert query a specific local user's data within
* a defined scope. Scopes are validated by JSON Schema in M2-03; this table
* stores them as raw jsonb.
*/
export const federationGrants = pgTable(
'federation_grants',
{
id: uuid('id').primaryKey().defaultRandom(),
/**
* The local user whose data this grant exposes.
* Cascade delete: if the user account is deleted, revoke all their grants.
*/
subjectUserId: text('subject_user_id')
.notNull()
.references(() => users.id, { onDelete: 'cascade' }),
/**
* The peer gateway holding the grant.
* Cascade delete: if the peer record is removed, the grant is moot.
*/
peerId: uuid('peer_id')
.notNull()
.references(() => federationPeers.id, { onDelete: 'cascade' }),
/**
* Scope object — validated by JSON Schema (M2-03).
* Example: { "resources": ["tasks", "notes"], "operations": ["list", "get"] }
*/
scope: jsonb('scope').notNull(),
/** Current grant lifecycle state. */
status: grantStatusEnum('status').notNull().default('active'),
/** Optional hard expiry. NULL means the grant does not expire automatically. */
expiresAt: timestamp('expires_at', { withTimezone: true }),
createdAt: timestamp('created_at', { withTimezone: true }).notNull().defaultNow(),
/** Populated when the grant is explicitly revoked. */
revokedAt: timestamp('revoked_at', { withTimezone: true }),
/** Human-readable reason for revocation (audit trail). */
revokedReason: text('revoked_reason'),
},
(t) => [
// Hot path: look up active grants for a subject user (auth middleware).
index('federation_grants_subject_status_idx').on(t.subjectUserId, t.status),
// Hot path: look up active grants held by a peer (inbound request check).
index('federation_grants_peer_status_idx').on(t.peerId, t.status),
],
);
/**
* Append-only audit log of all federation requests.
* M4 writes rows here. M2 only creates the table.
*
* All FKs use SET NULL so audit rows survive peer/user/grant deletion.
*/
export const federationAuditLog = pgTable(
'federation_audit_log',
{
id: uuid('id').primaryKey().defaultRandom(),
/** UUIDv7 from the X-Request-ID header — correlates with OTEL traces. */
requestId: text('request_id').notNull(),
/** Peer that made the request. SET NULL if the peer is later deleted. */
peerId: uuid('peer_id').references(() => federationPeers.id, { onDelete: 'set null' }),
/** Subject user whose data was queried. SET NULL if the user is deleted. */
subjectUserId: text('subject_user_id').references(() => users.id, { onDelete: 'set null' }),
/** Grant under which the request was authorised. SET NULL if the grant is deleted. */
grantId: uuid('grant_id').references(() => federationGrants.id, { onDelete: 'set null' }),
/** Request verb: "list" | "get" | "search". */
verb: text('verb').notNull(),
/** Resource type: "tasks" | "notes" | "memory" | etc. */
resource: text('resource').notNull(),
/** HTTP status code returned to the peer. */
statusCode: integer('status_code').notNull(),
/** Number of items returned (NULL for non-list requests or errors). */
resultCount: integer('result_count'),
/** Why the request was denied (NULL when allowed). */
deniedReason: text('denied_reason'),
/** End-to-end latency in milliseconds. */
latencyMs: integer('latency_ms'),
createdAt: timestamp('created_at', { withTimezone: true }).notNull().defaultNow(),
// Reserved for M4 — see PRD 7.3
/** SHA-256 of the normalised GraphQL/REST query string; written by M4 search. */
queryHash: text('query_hash'),
/** Request outcome: "allowed" | "denied" | "partial"; written by M4. */
outcome: text('outcome'),
/** Response payload size in bytes; written by M4. */
bytesOut: integer('bytes_out'),
},
(t) => [
// Per-peer request history in reverse chronological order.
index('federation_audit_log_peer_created_at_idx').on(t.peerId, t.createdAt.desc()),
// Per-user access log in reverse chronological order.
index('federation_audit_log_subject_created_at_idx').on(t.subjectUserId, t.createdAt.desc()),
// Global time-range scans (dashboards, rate-limit windows).
index('federation_audit_log_created_at_idx').on(t.createdAt.desc()),
],
);

View File

@@ -13,6 +13,14 @@ export PORTAINER_URL="https://portainer.example.com:9443"
export PORTAINER_API_KEY="your-api-key-here"
```
If your Portainer instance uses a self-signed TLS certificate (e.g. internal LAN), set:
```bash
export PORTAINER_INSECURE=1
```
This passes `-k` to all curl calls, bypassing certificate verification. Do not set this against public/production instances.
You can add these to your shell profile (`~/.bashrc`, `~/.zshrc`) or use a `.env` file.
### Creating an API Key

View File

@@ -46,8 +46,14 @@ fi
# Remove trailing slash from URL
PORTAINER_URL="${PORTAINER_URL%/}"
# TLS options
CURL_OPTS=()
if [ "${PORTAINER_INSECURE:-0}" = "1" ]; then
CURL_OPTS+=(-k)
fi
# Fetch endpoints
response=$(curl -s -w "\n%{http_code}" \
response=$(curl -s "${CURL_OPTS[@]}" -w "\n%{http_code}" \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}/api/endpoints")

View File

@@ -52,8 +52,14 @@ fi
# Remove trailing slash from URL
PORTAINER_URL="${PORTAINER_URL%/}"
# TLS options
CURL_OPTS=()
if [ "${PORTAINER_INSECURE:-0}" = "1" ]; then
CURL_OPTS+=(-k)
fi
# Fetch stacks
response=$(curl -s -w "\n%{http_code}" \
response=$(curl -s "${CURL_OPTS[@]}" -w "\n%{http_code}" \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}/api/stacks")

View File

@@ -64,12 +64,18 @@ fi
# Remove trailing slash from URL
PORTAINER_URL="${PORTAINER_URL%/}"
# TLS options
CURL_OPTS=()
if [ "${PORTAINER_INSECURE:-0}" = "1" ]; then
CURL_OPTS+=(-k)
fi
# Function to make API requests
api_request() {
local method="$1"
local endpoint="$2"
curl -s -w "\n%{http_code}" -X "$method" \
curl -s "${CURL_OPTS[@]}" -w "\n%{http_code}" -X "$method" \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}${endpoint}"
}
@@ -165,7 +171,7 @@ fi
# Note: Docker API returns raw log stream, not JSON
if [[ "$FOLLOW" == "true" ]]; then
# Stream logs
curl -s -N \
curl -s "${CURL_OPTS[@]}" -N \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}/api/endpoints/${ENDPOINT_ID}/docker/containers/${CONTAINER_ID}/logs?${params}" | \
# Docker log format has 8-byte header per line, strip it
@@ -175,7 +181,7 @@ if [[ "$FOLLOW" == "true" ]]; then
done
else
# Get logs (non-streaming)
curl -s \
curl -s "${CURL_OPTS[@]}" \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}/api/endpoints/${ENDPOINT_ID}/docker/containers/${CONTAINER_ID}/logs?${params}" | \
# Docker log format has 8-byte header per line, attempt to strip it

View File

@@ -63,13 +63,19 @@ fi
# Remove trailing slash from URL
PORTAINER_URL="${PORTAINER_URL%/}"
# TLS options
CURL_OPTS=()
if [ "${PORTAINER_INSECURE:-0}" = "1" ]; then
CURL_OPTS+=(-k)
fi
# Function to make API requests
api_request() {
local method="$1"
local endpoint="$2"
local data="${3:-}"
local args=(-s -w "\n%{http_code}" -X "$method" -H "X-API-Key: ${PORTAINER_API_KEY}")
local args=(-s "${CURL_OPTS[@]}" -w "\n%{http_code}" -X "$method" -H "X-API-Key: ${PORTAINER_API_KEY}")
if [[ -n "$data" ]]; then
args+=(-H "Content-Type: application/json" -d "$data")

View File

@@ -54,12 +54,18 @@ fi
# Remove trailing slash from URL
PORTAINER_URL="${PORTAINER_URL%/}"
# TLS options
CURL_OPTS=()
if [ "${PORTAINER_INSECURE:-0}" = "1" ]; then
CURL_OPTS+=(-k)
fi
# Function to make API requests
api_request() {
local method="$1"
local endpoint="$2"
curl -s -w "\n%{http_code}" -X "$method" \
curl -s "${CURL_OPTS[@]}" -w "\n%{http_code}" -X "$method" \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}${endpoint}"
}

View File

@@ -57,12 +57,18 @@ fi
# Remove trailing slash from URL
PORTAINER_URL="${PORTAINER_URL%/}"
# TLS options
CURL_OPTS=()
if [ "${PORTAINER_INSECURE:-0}" = "1" ]; then
CURL_OPTS+=(-k)
fi
# Function to make API requests
api_request() {
local method="$1"
local endpoint="$2"
curl -s -w "\n%{http_code}" -X "$method" \
curl -s "${CURL_OPTS[@]}" -w "\n%{http_code}" -X "$method" \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}${endpoint}"
}

View File

@@ -54,12 +54,18 @@ fi
# Remove trailing slash from URL
PORTAINER_URL="${PORTAINER_URL%/}"
# TLS options
CURL_OPTS=()
if [ "${PORTAINER_INSECURE:-0}" = "1" ]; then
CURL_OPTS+=(-k)
fi
# Function to make API requests
api_request() {
local method="$1"
local endpoint="$2"
curl -s -w "\n%{http_code}" -X "$method" \
curl -s "${CURL_OPTS[@]}" -w "\n%{http_code}" -X "$method" \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}${endpoint}"
}