Compare commits

...

49 Commits

Author SHA1 Message Date
Jarvis
08bea8fba0 test(federation): fix BETTER_AUTH_SECRET requirement in M2 integration tests
Some checks failed
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/push/ci Pipeline failed
Ensure BETTER_AUTH_SECRET is set before seal() is called in test #5, add
it to the DB-only run command in the header comment, and surface cleanup
errors to stderr instead of silently swallowing them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 00:06:56 -05:00
Jarvis
17f1423318 test(federation): M2 integration tests (grant CRUD, enrollment, replay, at-rest encryption)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 23:52:32 -05:00
74fe60d8d6 feat(federation): admin controller + CLI federation commands (FED-M2-08) (#498)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-22 04:39:46 +00:00
0bfaa56e9e feat(federation): enrollment controller + single-use token flow (FED-M2-07) (#497)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-22 04:23:19 +00:00
01dd6b9fa1 feat(federation): grants service CRUD + status transitions (FED-M2-06) (#496)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-22 03:57:12 +00:00
1038ae76e1 feat(federation): Step-CA client service for grant certs (FED-M2-04) (#494)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-22 03:34:37 +00:00
bf082d95a0 feat(federation): seal federation peer client keys at rest (FED-M2-05) (#495)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-22 03:10:20 +00:00
bb24292cf7 fix(federation): healthcheck + restart policy for federated-test stacks (#492)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-22 02:56:40 +00:00
f2cda52e1a fix(deploy): bump gateway image digest to sha-9f1a081 [DEPLOY-IMG-FIX] (#491)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-22 02:35:19 +00:00
7d7cf012f0 feat(federation): scope schema validator [FED-M2-03] (#489)
Some checks failed
ci/woodpecker/push/ci Pipeline failed
ci/woodpecker/push/publish Pipeline failed
2026-04-22 02:31:13 +00:00
c56dda74aa feat(federation): Step-CA sidecar in federated compose [FED-M2-02] (#490)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline failed
2026-04-22 02:21:49 +00:00
9f1a08185e docs(federation): S21 tracking — DEPLOY-01/02 done, IMG-FIX in flight, M2-01 in remediation (#487)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-22 02:02:36 +00:00
d2e408656b fix(docker): pnpm deploy for self-contained gateway runtime image (#488)
Some checks failed
ci/woodpecker/push/publish Pipeline failed
ci/woodpecker/push/ci Pipeline failed
2026-04-22 02:02:29 +00:00
54c278b871 feat(db): federation schema — grants/peers/audit_log [FED-M2-01] (#486)
Some checks failed
ci/woodpecker/push/publish Pipeline failed
ci/woodpecker/push/ci Pipeline failed
2026-04-22 02:02:21 +00:00
4dbd429203 feat(deploy): portainer stack template for federation test instances [DEPLOY-02] (#485)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-22 01:34:44 +00:00
b985d7bfe2 docs(federation): M2 mission planning — TASKS decomposition + manifest update (#483)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline failed
2026-04-22 01:24:00 +00:00
45e8f02c91 feat(mosaic-portainer): PORTAINER_INSECURE flag for self-signed TLS (#484)
Some checks failed
ci/woodpecker/push/publish Pipeline failed
ci/woodpecker/push/ci Pipeline failed
2026-04-22 01:21:54 +00:00
54c422ab06 Merge pull request 'docs(federation): close FED-M1 milestone' (#481) from feat/federation-m1-close into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/tag/publish Pipeline was successful
2026-04-20 02:20:43 +00:00
Jarvis
b9fb8aab57 docs(federation): close FED-M1 milestone
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
- TASKS.md: mark FED-M1-12 done with PR/issue/tag references
- MISSION-MANIFEST.md: phase=M1 complete, progress 1/7, M1 row done with PR range #470-#481, session log appended
- scratchpad: Session 19 entry covering M1-09 → M1-12 with PR ledger and M1 retrospective learnings

Refs #460
2026-04-19 21:12:52 -05:00
78841f228a docs(federation): operator setup + migration guides (FED-M1-11) (#480)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-20 02:07:15 +00:00
dc4afee848 fix(storage): redact credentials in driver errors + advisory lock (FED-M1-10) (#479)
Some checks failed
ci/woodpecker/push/ci Pipeline failed
ci/woodpecker/push/publish Pipeline failed
2026-04-20 02:02:57 +00:00
1e2b8ac8de test(federation): standalone regression canary — no breakage from M1 (FED-M1-09) (#478)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-20 01:46:35 +00:00
15d849c166 test(storage): integration test for migrate-tier (FED-M1-08) + camelCase column fix (#477)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline failed
2026-04-20 01:40:02 +00:00
78251d4af8 test(federation): integration tests for federated tier gateway boot (FED-M1-07) (#476)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-20 01:13:10 +00:00
1a4b1ebbf1 feat(gateway,storage): mosaic gateway doctor with tier health JSON (FED-M1-06) (#475)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-20 01:00:39 +00:00
ccad30dd27 feat(storage): mosaic storage migrate-tier with dry-run + idempotency (FED-M1-05) (#474)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-20 00:35:08 +00:00
4c2b177eab feat(gateway): tier-detector with fail-fast PG/Valkey/pgvector probes (FED-M1-04) (#473)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-20 00:07:07 +00:00
58169f9979 feat(storage): pgvector adapter support gated on tier=federated (FED-M1-03) (#472)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-19 23:42:18 +00:00
51402bdb6d feat(infra): docker-compose.federated.yml overlay (FED-M1-02) (#471)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-19 23:21:31 +00:00
9c89c32684 feat(config): add federated tier + rename team→standalone (FED-M1-01) (#470)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline failed
2026-04-19 23:11:11 +00:00
8aabb8c5b2 docs(mission): author MVP rollup manifest, archive install-ux-v2 (#469)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-19 22:51:11 +00:00
66512550df docs(federation): PRD, milestones, mission manifest, and M1 task breakdown (#468)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-19 22:27:09 +00:00
46dd799548 docs(federation): PRD, milestones, mission manifest, and M1 task breakdown (#467)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-19 22:09:20 +00:00
5f03c05523 chore(release): @mosaicstack/mosaic 0.0.30 (#459)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-04-12 02:18:17 +00:00
c3f810bbd1 fix(mosaic): seed TOOLS.md from defaults on install (#458)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-04-12 02:02:21 +00:00
b2cbf898d7 docs(scratchpad): finalize yolo runtime hotfix evidence (#456)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
Follow-up to mosaicstack/stack#455.

Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-04-11 17:14:00 +00:00
b2cec8c6ba fix(mosaic): stop yolo runtime from leaking runtime name as first user message (#455)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
Fixes mosaicstack/stack#454

Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-04-11 16:57:43 +00:00
81c1775a03 chore(release): @mosaicstack/mosaic 0.0.29 (#453)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/tag/publish Pipeline failed
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-04-08 00:42:54 +00:00
f64ec12f39 fix(installer): preserve credentials dir and seed STANDARDS.md (#452)
Some checks failed
ci/woodpecker/push/publish Pipeline failed
ci/woodpecker/push/ci Pipeline failed
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-04-08 00:40:49 +00:00
026382325c feat(framework): superpowers enforcement, typecheck hook, file-ownership rules (#451)
All checks were successful
ci/woodpecker/manual/ci Pipeline was successful
ci/woodpecker/manual/publish Pipeline was successful
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-04-07 00:44:22 +00:00
1bfd8570d6 chore(release): @mosaicstack/mosaic 0.0.28 (#450) 2026-04-06 00:46:31 +00:00
312acd8bad chore: sweep mosaicstack/mosaic-stack → mosaicstack/stack + add short install URL (#448) 2026-04-06 00:39:56 +00:00
d08b969918 fix(mosaic): mask password input in TUI login prompt (#449) 2026-04-06 00:33:54 +00:00
051de0d8a9 docs: update README for mosaicstack/stack repo rename (#447) 2026-04-06 00:22:20 +00:00
bd76df1a50 feat(mosaic): drill-down main menu + provider-first flow + quick start (#446) 2026-04-06 00:15:23 +00:00
62b2ce2da1 docs: orchestrator close-out IUV-M02 (#445)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-05 23:50:55 +00:00
172bacb30f feat(mosaic): IUV-M02 — CORS/FQDN UX polish + skill installer rework (#444)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline failed
2026-04-05 23:44:07 +00:00
43667d7349 docs: orchestrator close-out IUV-M01 — mark tasks done, append session 2 (#443)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-05 22:40:08 +00:00
783884376c docs: mark IUV-M01 complete — mosaic-v0.0.26 released (#436) (#442)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline failed
2026-04-05 22:31:37 +00:00
149 changed files with 24872 additions and 358 deletions

3
.gitignore vendored
View File

@@ -9,3 +9,6 @@ coverage
*.tsbuildinfo
.pnpm-store
docs/reports/
# Step-CA dev password — real file is gitignored; commit only the .example
infra/step-ca/dev-password

View File

@@ -103,12 +103,12 @@ steps:
- mkdir -p /kaniko/.docker
- echo "{\"auths\":{\"git.mosaicstack.dev\":{\"username\":\"$REGISTRY_USER\",\"password\":\"$REGISTRY_PASS\"}}}" > /kaniko/.docker/config.json
- |
DESTINATIONS="--destination git.mosaicstack.dev/mosaicstack/mosaic-stack/gateway:sha-${CI_COMMIT_SHA:0:7}"
DESTINATIONS="--destination git.mosaicstack.dev/mosaicstack/stack/gateway:sha-${CI_COMMIT_SHA:0:7}"
if [ "$CI_COMMIT_BRANCH" = "main" ]; then
DESTINATIONS="$DESTINATIONS --destination git.mosaicstack.dev/mosaicstack/mosaic-stack/gateway:latest"
DESTINATIONS="$DESTINATIONS --destination git.mosaicstack.dev/mosaicstack/stack/gateway:latest"
fi
if [ -n "$CI_COMMIT_TAG" ]; then
DESTINATIONS="$DESTINATIONS --destination git.mosaicstack.dev/mosaicstack/mosaic-stack/gateway:$CI_COMMIT_TAG"
DESTINATIONS="$DESTINATIONS --destination git.mosaicstack.dev/mosaicstack/stack/gateway:$CI_COMMIT_TAG"
fi
/kaniko/executor --context . --dockerfile docker/gateway.Dockerfile $DESTINATIONS
depends_on:
@@ -128,12 +128,12 @@ steps:
- mkdir -p /kaniko/.docker
- echo "{\"auths\":{\"git.mosaicstack.dev\":{\"username\":\"$REGISTRY_USER\",\"password\":\"$REGISTRY_PASS\"}}}" > /kaniko/.docker/config.json
- |
DESTINATIONS="--destination git.mosaicstack.dev/mosaicstack/mosaic-stack/web:sha-${CI_COMMIT_SHA:0:7}"
DESTINATIONS="--destination git.mosaicstack.dev/mosaicstack/stack/web:sha-${CI_COMMIT_SHA:0:7}"
if [ "$CI_COMMIT_BRANCH" = "main" ]; then
DESTINATIONS="$DESTINATIONS --destination git.mosaicstack.dev/mosaicstack/mosaic-stack/web:latest"
DESTINATIONS="$DESTINATIONS --destination git.mosaicstack.dev/mosaicstack/stack/web:latest"
fi
if [ -n "$CI_COMMIT_TAG" ]; then
DESTINATIONS="$DESTINATIONS --destination git.mosaicstack.dev/mosaicstack/mosaic-stack/web:$CI_COMMIT_TAG"
DESTINATIONS="$DESTINATIONS --destination git.mosaicstack.dev/mosaicstack/stack/web:$CI_COMMIT_TAG"
fi
/kaniko/executor --context . --dockerfile docker/web.Dockerfile $DESTINATIONS
depends_on:

View File

@@ -58,14 +58,14 @@ pnpm typecheck && pnpm lint && pnpm format:check # Quality gates
The `agent` column specifies the required model for each task. **This is set at task creation by the orchestrator and must not be changed by workers.**
| Value | When to use | Budget |
| -------- | ----------------------------------------------------------- | -------------------------- |
| `codex` | All coding tasks (default for implementation) | OpenAI credits — preferred |
| `glm-5` | Cost-sensitive coding where Codex is unavailable | Z.ai credits |
| `haiku` | Review gates, verify tasks, status checks, docs-only | Cheapest Claude tier |
| `sonnet` | Complex planning, multi-file reasoning, architecture review | Claude quota |
| `opus` | Major cross-cutting architecture decisions ONLY | Most expensive — minimize |
| `—` | No preference / auto-select cheapest capable | Pipeline decides |
| Value | When to use | Budget |
| --------- | ----------------------------------------------------------- | -------------------------- |
| `codex` | All coding tasks (default for implementation) | OpenAI credits — preferred |
| `glm-5.1` | Cost-sensitive coding where Codex is unavailable | Z.ai credits |
| `haiku` | Review gates, verify tasks, status checks, docs-only | Cheapest Claude tier |
| `sonnet` | Complex planning, multi-file reasoning, architecture review | Claude quota |
| `opus` | Major cross-cutting architecture decisions ONLY | Most expensive — minimize |
| `—` | No preference / auto-select cheapest capable | Pipeline decides |
Pipeline crons read this column and spawn accordingly. Workers never modify `docs/TASKS.md` — only the orchestrator writes it.

View File

@@ -7,7 +7,13 @@ Mosaic gives you a unified launcher for Claude Code, Codex, OpenCode, and Pi —
## Quick Install
```bash
bash <(curl -fsSL https://git.mosaicstack.dev/mosaicstack/mosaic-stack/raw/branch/main/tools/install.sh)
curl -fsSL https://mosaicstack.dev/install.sh | bash
```
Or use the direct URL:
```bash
bash <(curl -fsSL https://git.mosaicstack.dev/mosaicstack/stack/raw/branch/main/tools/install.sh)
```
The installer auto-launches the setup wizard, which walks you through gateway install and verification. Flags for non-interactive use:
@@ -74,6 +80,8 @@ If you already have a gateway account but no token, use `mosaic gateway config r
### Configuration
Mosaic supports three storage tiers: `local` (PGlite, single-host), `standalone` (PostgreSQL, single-host), and `federated` (PostgreSQL + pgvector + Valkey, multi-host). See [Federated Tier Setup](docs/federation/SETUP.md) for multi-user and production deployments, or [Migrating to Federated](docs/guides/migrate-tier.md) to upgrade from existing tiers.
```bash
mosaic config show # Print full config as JSON
mosaic config get <key> # Read a specific key
@@ -179,8 +187,8 @@ Consent state is persisted in config. Remote upload is a no-op until you run `mo
### Setup
```bash
git clone git@git.mosaicstack.dev:mosaicstack/mosaic-stack.git
cd mosaic-stack
git clone git@git.mosaicstack.dev:mosaicstack/stack.git
cd stack
# Start infrastructure (Postgres, Valkey, Jaeger)
docker compose up -d
@@ -229,7 +237,7 @@ npm packages are published to the Gitea package registry on main merges.
## Architecture
```
mosaic-stack/
stack/
├── apps/
│ ├── gateway/ NestJS API + WebSocket hub (Fastify, Socket.IO, OTEL)
│ └── web/ Next.js dashboard (React 19, Tailwind)
@@ -302,7 +310,13 @@ Each stage has a dispatch mode (`exec` for research/review, `yolo` for coding),
Run the installer again — it handles upgrades automatically:
```bash
bash <(curl -fsSL https://git.mosaicstack.dev/mosaicstack/mosaic-stack/raw/branch/main/tools/install.sh)
curl -fsSL https://mosaicstack.dev/install.sh | bash
```
Or use the direct URL:
```bash
bash <(curl -fsSL https://git.mosaicstack.dev/mosaicstack/stack/raw/branch/main/tools/install.sh)
```
Or use the CLI:

View File

@@ -3,7 +3,7 @@
"version": "0.0.6",
"repository": {
"type": "git",
"url": "https://git.mosaicstack.dev/mosaicstack/mosaic-stack.git",
"url": "https://git.mosaicstack.dev/mosaicstack/stack.git",
"directory": "apps/gateway"
},
"type": "module",
@@ -56,6 +56,7 @@
"@opentelemetry/sdk-metrics": "^2.6.0",
"@opentelemetry/sdk-node": "^0.213.0",
"@opentelemetry/semantic-conventions": "^1.40.0",
"@peculiar/x509": "^2.0.0",
"@sinclair/typebox": "^0.34.48",
"better-auth": "^1.5.5",
"bullmq": "^5.71.0",
@@ -63,8 +64,11 @@
"class-validator": "^0.15.1",
"dotenv": "^17.3.1",
"fastify": "^5.0.0",
"ioredis": "^5.10.0",
"jose": "^6.2.2",
"node-cron": "^4.2.1",
"openai": "^6.32.0",
"postgres": "^3.4.8",
"reflect-metadata": "^0.2.0",
"rxjs": "^7.8.0",
"socket.io": "^4.8.0",

View File

@@ -0,0 +1,64 @@
/**
* Test B — Gateway boot refuses (fail-fast) when PG is unreachable.
*
* Prereq: docker compose -f docker-compose.federated.yml --profile federated up -d
* (Valkey must be running; only PG is intentionally misconfigured.)
* Run: FEDERATED_INTEGRATION=1 pnpm --filter @mosaicstack/gateway test src/__tests__/integration/federated-boot.pg-unreachable.integration.test.ts
*
* Skipped when FEDERATED_INTEGRATION !== '1'.
*/
import net from 'node:net';
import { beforeAll, describe, expect, it } from 'vitest';
import { TierDetectionError, detectAndAssertTier } from '@mosaicstack/storage';
const run = process.env['FEDERATED_INTEGRATION'] === '1';
const VALKEY_URL = 'redis://localhost:6380';
/**
* Reserves a guaranteed-closed port at runtime by binding to an ephemeral OS
* port (port 0) and immediately releasing it. The OS will not reassign the
* port during the TIME_WAIT window, so it remains closed for the duration of
* this test.
*/
async function reserveClosedPort(): Promise<number> {
return new Promise((resolve, reject) => {
const server = net.createServer();
server.listen(0, '127.0.0.1', () => {
const addr = server.address();
if (typeof addr !== 'object' || !addr) return reject(new Error('no addr'));
const port = addr.port;
server.close(() => resolve(port));
});
server.on('error', reject);
});
}
describe.skipIf(!run)('federated boot — PG unreachable', () => {
let badPgUrl: string;
beforeAll(async () => {
const closedPort = await reserveClosedPort();
badPgUrl = `postgresql://mosaic:mosaic@localhost:${closedPort}/mosaic`;
});
it('detectAndAssertTier throws TierDetectionError with service: postgres when PG is down', async () => {
const brokenConfig = {
tier: 'federated' as const,
storage: {
type: 'postgres' as const,
url: badPgUrl,
enableVector: true,
},
queue: {
type: 'bullmq',
url: VALKEY_URL,
},
};
await expect(detectAndAssertTier(brokenConfig)).rejects.toSatisfy(
(err: unknown) => err instanceof TierDetectionError && err.service === 'postgres',
);
}, 10_000);
});

View File

@@ -0,0 +1,50 @@
/**
* Test A — Gateway boot succeeds when federated services are up.
*
* Prereq: docker compose -f docker-compose.federated.yml --profile federated up -d
* Run: FEDERATED_INTEGRATION=1 pnpm --filter @mosaicstack/gateway test src/__tests__/integration/federated-boot.success.integration.test.ts
*
* Skipped when FEDERATED_INTEGRATION !== '1'.
*/
import postgres from 'postgres';
import { afterAll, describe, expect, it } from 'vitest';
import { detectAndAssertTier } from '@mosaicstack/storage';
const run = process.env['FEDERATED_INTEGRATION'] === '1';
const PG_URL = 'postgresql://mosaic:mosaic@localhost:5433/mosaic';
const VALKEY_URL = 'redis://localhost:6380';
const federatedConfig = {
tier: 'federated' as const,
storage: {
type: 'postgres' as const,
url: PG_URL,
enableVector: true,
},
queue: {
type: 'bullmq',
url: VALKEY_URL,
},
};
describe.skipIf(!run)('federated boot — success path', () => {
let sql: ReturnType<typeof postgres> | undefined;
afterAll(async () => {
if (sql) {
await sql.end({ timeout: 2 }).catch(() => {});
}
});
it('detectAndAssertTier resolves without throwing when federated services are up', async () => {
await expect(detectAndAssertTier(federatedConfig)).resolves.toBeUndefined();
}, 10_000);
it('pgvector extension is registered (pg_extension row exists)', async () => {
sql = postgres(PG_URL, { max: 1, connect_timeout: 5, idle_timeout: 5 });
const rows = await sql`SELECT * FROM pg_extension WHERE extname = 'vector'`;
expect(rows).toHaveLength(1);
}, 10_000);
});

View File

@@ -0,0 +1,43 @@
/**
* Test C — pgvector extension is functional end-to-end.
*
* Creates a temp table with a vector(3) column, inserts a row, and queries it
* back — confirming the extension is not just registered but operational.
*
* Prereq: docker compose -f docker-compose.federated.yml --profile federated up -d
* Run: FEDERATED_INTEGRATION=1 pnpm --filter @mosaicstack/gateway test src/__tests__/integration/federated-pgvector.integration.test.ts
*
* Skipped when FEDERATED_INTEGRATION !== '1'.
*/
import postgres from 'postgres';
import { afterAll, describe, expect, it } from 'vitest';
const run = process.env['FEDERATED_INTEGRATION'] === '1';
const PG_URL = 'postgresql://mosaic:mosaic@localhost:5433/mosaic';
let sql: ReturnType<typeof postgres> | undefined;
afterAll(async () => {
if (sql) {
await sql.end({ timeout: 2 }).catch(() => {});
}
});
describe.skipIf(!run)('federated pgvector — functional end-to-end', () => {
it('vector ops round-trip: INSERT [1,2,3] and SELECT returns [1,2,3]', async () => {
sql = postgres(PG_URL, { max: 1, connect_timeout: 5, idle_timeout: 5 });
await sql`CREATE TEMP TABLE t (id int, embedding vector(3))`;
await sql`INSERT INTO t VALUES (1, '[1,2,3]')`;
const rows = await sql`SELECT embedding FROM t`;
expect(rows).toHaveLength(1);
// The postgres driver returns vector columns as strings like '[1,2,3]'.
// Normalise by parsing the string representation.
const raw = rows[0]?.['embedding'] as string;
const parsed = JSON.parse(raw) as number[];
expect(parsed).toEqual([1, 2, 3]);
}, 10_000);
});

View File

@@ -0,0 +1,483 @@
/**
* Federation M2 integration tests (FED-M2-09).
*
* Covers MILESTONES.md acceptance tests #1, #2, #3, #5, #7, #8.
*
* Prerequisites:
* docker compose -f docker-compose.federated.yml --profile federated up -d
*
* Run DB-only tests (no Step-CA):
* FEDERATED_INTEGRATION=1 BETTER_AUTH_SECRET=test-secret pnpm --filter @mosaicstack/gateway test \
* src/__tests__/integration/federation-m2.integration.test.ts
*
* Run all tests including Step-CA-dependent ones:
* FEDERATED_INTEGRATION=1 STEP_CA_AVAILABLE=1 \
* STEP_CA_URL=https://localhost:9000 \
* STEP_CA_PROVISIONER_KEY_JSON="$(docker exec $(docker ps -qf name=step-ca) cat /home/step/secrets/mosaic-fed.json)" \
* STEP_CA_ROOT_CERT_PATH=/tmp/step-ca-root.crt \
* pnpm --filter @mosaicstack/gateway test \
* src/__tests__/integration/federation-m2.integration.test.ts
*
* Obtaining Step-CA credentials:
* # Extract provisioner key from running container:
* # docker exec $(docker ps -qf name=step-ca) cat /home/step/secrets/mosaic-fed.json
* # Copy root cert from container:
* # docker cp $(docker ps -qf name=step-ca):/home/step/certs/root_ca.crt /tmp/step-ca-root.crt
* # Then: export STEP_CA_ROOT_CERT_PATH=/tmp/step-ca-root.crt
*/
import * as crypto from 'node:crypto';
import { afterAll, beforeAll, describe, expect, it } from 'vitest';
import { Test } from '@nestjs/testing';
import { GoneException } from '@nestjs/common';
import { Pkcs10CertificateRequestGenerator, X509Certificate as PeculiarX509 } from '@peculiar/x509';
import {
createDb,
type Db,
type DbHandle,
federationPeers,
federationGrants,
federationEnrollmentTokens,
inArray,
eq,
} from '@mosaicstack/db';
import * as schema from '@mosaicstack/db';
import { seal } from '@mosaicstack/auth';
import { DB } from '../../database/database.module.js';
import { GrantsService } from '../../federation/grants.service.js';
import { EnrollmentService } from '../../federation/enrollment.service.js';
import { CaService } from '../../federation/ca.service.js';
import { FederationScopeError } from '../../federation/scope-schema.js';
const run = process.env['FEDERATED_INTEGRATION'] === '1';
const stepCaRun = run && process.env['STEP_CA_AVAILABLE'] === '1';
const PG_URL = 'postgresql://mosaic:mosaic@localhost:5433/mosaic';
// ---------------------------------------------------------------------------
// Helpers for test data isolation
// ---------------------------------------------------------------------------
/** Unique run prefix to identify rows created by this test run. */
const RUN_ID = crypto.randomUUID();
/** Insert a minimal user row to satisfy the FK on federation_grants.subject_user_id. */
async function insertTestUser(db: Db, id: string): Promise<void> {
await db
.insert(schema.users)
.values({
id,
name: `test-user-${id}`,
email: `test-${id}@federation-test.invalid`,
emailVerified: false,
})
.onConflictDoNothing();
}
/** Insert a minimal peer row to satisfy the FK on federation_grants.peer_id. */
async function insertTestPeer(db: Db, id: string, suffix: string = ''): Promise<void> {
await db
.insert(federationPeers)
.values({
id,
commonName: `test-peer-${RUN_ID}-${suffix}`,
displayName: `Test Peer ${suffix}`,
certPem: '-----BEGIN CERTIFICATE-----\nMOCK\n-----END CERTIFICATE-----\n',
certSerial: `test-serial-${id}`,
certNotAfter: new Date(Date.now() + 365 * 24 * 60 * 60 * 1000),
state: 'pending',
})
.onConflictDoNothing();
}
// ---------------------------------------------------------------------------
// DB-only test module (CaService mocked so env vars not required)
// ---------------------------------------------------------------------------
function buildDbModule(db: Db) {
return Test.createTestingModule({
providers: [
{ provide: DB, useValue: db },
GrantsService,
{
provide: CaService,
useValue: {
issueCert: async () => {
throw new Error('CaService.issueCert should not be called in DB-only tests');
},
},
},
EnrollmentService,
],
}).compile();
}
// ---------------------------------------------------------------------------
// Test suite — DB-only (no Step-CA)
// ---------------------------------------------------------------------------
describe.skipIf(!run)('federation M2 — DB-only tests', () => {
let handle: DbHandle;
let db: Db;
let grantsService: GrantsService;
/** IDs created during this run — cleaned up in afterAll. */
const createdGrantIds: string[] = [];
const createdPeerIds: string[] = [];
const createdUserIds: string[] = [];
beforeAll(async () => {
process.env['BETTER_AUTH_SECRET'] ??= 'test-integration-sealing-key-not-for-prod';
handle = createDb(PG_URL);
db = handle.db;
const moduleRef = await buildDbModule(db);
grantsService = moduleRef.get(GrantsService);
});
afterAll(async () => {
// Clean up in FK-safe order: tokens → grants → peers → users
if (db && createdGrantIds.length > 0) {
await db
.delete(federationEnrollmentTokens)
.where(inArray(federationEnrollmentTokens.grantId, createdGrantIds))
.catch((e: unknown) => console.error('[federation-m2-test cleanup]', e));
await db
.delete(federationGrants)
.where(inArray(federationGrants.id, createdGrantIds))
.catch((e: unknown) => console.error('[federation-m2-test cleanup]', e));
}
if (db && createdPeerIds.length > 0) {
await db
.delete(federationPeers)
.where(inArray(federationPeers.id, createdPeerIds))
.catch((e: unknown) => console.error('[federation-m2-test cleanup]', e));
}
if (db && createdUserIds.length > 0) {
await db
.delete(schema.users)
.where(inArray(schema.users.id, createdUserIds))
.catch((e: unknown) => console.error('[federation-m2-test cleanup]', e));
}
if (handle)
await handle.close().catch((e: unknown) => console.error('[federation-m2-test cleanup]', e));
});
// -------------------------------------------------------------------------
// #1 — grant create writes a pending row
// -------------------------------------------------------------------------
it('#1 — createGrant writes a pending row to DB', async () => {
const userId = crypto.randomUUID();
const peerId = crypto.randomUUID();
const validScope = {
resources: ['tasks'],
excluded_resources: [],
max_rows_per_query: 100,
};
await insertTestUser(db, userId);
await insertTestPeer(db, peerId, 'test1');
createdUserIds.push(userId);
createdPeerIds.push(peerId);
const grant = await grantsService.createGrant({
subjectUserId: userId,
scope: validScope,
peerId,
});
createdGrantIds.push(grant.id);
// Verify the row exists in DB with correct shape
const [row] = await db
.select()
.from(federationGrants)
.where(eq(federationGrants.id, grant.id))
.limit(1);
expect(row).toBeDefined();
expect(row?.status).toBe('pending');
expect(row?.peerId).toBe(peerId);
expect(row?.subjectUserId).toBe(userId);
const storedScope = row?.scope as Record<string, unknown>;
expect(storedScope['resources']).toEqual(['tasks']);
expect(storedScope['max_rows_per_query']).toBe(100);
}, 15_000);
// -------------------------------------------------------------------------
// #7 — scope with unknown resource type rejected
// -------------------------------------------------------------------------
it('#7 — createGrant rejects scope with unknown resource type', async () => {
const userId = crypto.randomUUID();
const peerId = crypto.randomUUID();
const invalidScope = {
resources: ['totally_unknown_resource'],
excluded_resources: [],
max_rows_per_query: 100,
};
await insertTestUser(db, userId);
await insertTestPeer(db, peerId, 'test7');
createdUserIds.push(userId);
createdPeerIds.push(peerId);
await expect(
grantsService.createGrant({
subjectUserId: userId,
scope: invalidScope,
peerId,
}),
).rejects.toThrow(FederationScopeError);
}, 15_000);
// -------------------------------------------------------------------------
// #8 — listGrants returns accurate status for grants in various states
// -------------------------------------------------------------------------
it('#8 — listGrants returns accurate status for grants in various states', async () => {
const userId = crypto.randomUUID();
const peerId = crypto.randomUUID();
const validScope = {
resources: ['notes'],
excluded_resources: [],
max_rows_per_query: 50,
};
await insertTestUser(db, userId);
await insertTestPeer(db, peerId, 'test8');
createdUserIds.push(userId);
createdPeerIds.push(peerId);
// Create two pending grants via GrantsService
const grantA = await grantsService.createGrant({
subjectUserId: userId,
scope: validScope,
peerId,
});
const grantB = await grantsService.createGrant({
subjectUserId: userId,
scope: { resources: ['tasks'], excluded_resources: [], max_rows_per_query: 50 },
peerId,
});
createdGrantIds.push(grantA.id, grantB.id);
// Insert a third grant directly in 'revoked' state to test status variety
const [grantC] = await db
.insert(federationGrants)
.values({
id: crypto.randomUUID(),
subjectUserId: userId,
peerId,
scope: validScope,
status: 'revoked',
revokedAt: new Date(),
})
.returning();
createdGrantIds.push(grantC!.id);
// List all grants for this peer
const allForPeer = await grantsService.listGrants({ peerId });
const ourGrantIds = new Set([grantA.id, grantB.id, grantC!.id]);
const ourGrants = allForPeer.filter((g) => ourGrantIds.has(g.id));
expect(ourGrants).toHaveLength(3);
const pendingGrants = ourGrants.filter((g) => g.status === 'pending');
const revokedGrants = ourGrants.filter((g) => g.status === 'revoked');
expect(pendingGrants).toHaveLength(2);
expect(revokedGrants).toHaveLength(1);
// Status-filtered query
const pendingOnly = await grantsService.listGrants({ peerId, status: 'pending' });
const ourPending = pendingOnly.filter((g) => ourGrantIds.has(g.id));
expect(ourPending.every((g) => g.status === 'pending')).toBe(true);
// Verify peer list from DB also shows the peer rows with correct state
const peers = await db.select().from(federationPeers).where(eq(federationPeers.id, peerId));
expect(peers).toHaveLength(1);
expect(peers[0]?.state).toBe('pending');
}, 15_000);
// -------------------------------------------------------------------------
// #5 — client_key_pem encrypted at rest
// -------------------------------------------------------------------------
it('#5 — clientKeyPem stored in DB is a sealed ciphertext (not a valid PEM)', async () => {
const peerId = crypto.randomUUID();
const rawPem = '-----BEGIN PRIVATE KEY-----\nMOCK\n-----END PRIVATE KEY-----\n';
const sealed = seal(rawPem);
await db.insert(federationPeers).values({
id: peerId,
commonName: `test-peer-${RUN_ID}-sealed`,
displayName: 'Sealed Key Test Peer',
certPem: '-----BEGIN CERTIFICATE-----\nMOCK\n-----END CERTIFICATE-----\n',
certSerial: `test-serial-sealed-${peerId}`,
certNotAfter: new Date(Date.now() + 365 * 24 * 60 * 60 * 1000),
state: 'pending',
clientKeyPem: sealed,
});
createdPeerIds.push(peerId);
const [row] = await db
.select()
.from(federationPeers)
.where(eq(federationPeers.id, peerId))
.limit(1);
expect(row).toBeDefined();
// The stored value must NOT be a valid PEM — it's a sealed ciphertext blob
expect(row?.clientKeyPem).toBeDefined();
expect(row?.clientKeyPem?.startsWith('-----BEGIN')).toBe(false);
// The sealed value should be non-trivial (at least 20 chars)
expect((row?.clientKeyPem ?? '').length).toBeGreaterThan(20);
}, 15_000);
});
// ---------------------------------------------------------------------------
// Test suite — Step-CA gated
// ---------------------------------------------------------------------------
describe.skipIf(!stepCaRun)('federation M2 — Step-CA tests', () => {
let handle: DbHandle;
let db: Db;
let grantsService: GrantsService;
let enrollmentService: EnrollmentService;
const createdGrantIds: string[] = [];
const createdPeerIds: string[] = [];
const createdUserIds: string[] = [];
beforeAll(async () => {
handle = createDb(PG_URL);
db = handle.db;
// Use real CaService — env vars (STEP_CA_URL, STEP_CA_PROVISIONER_KEY_JSON,
// STEP_CA_ROOT_CERT_PATH) must be set when STEP_CA_AVAILABLE=1
const moduleRef = await Test.createTestingModule({
providers: [{ provide: DB, useValue: db }, CaService, GrantsService, EnrollmentService],
}).compile();
grantsService = moduleRef.get(GrantsService);
enrollmentService = moduleRef.get(EnrollmentService);
});
afterAll(async () => {
if (db && createdGrantIds.length > 0) {
await db
.delete(federationEnrollmentTokens)
.where(inArray(federationEnrollmentTokens.grantId, createdGrantIds))
.catch((e: unknown) => console.error('[federation-m2-test cleanup]', e));
await db
.delete(federationGrants)
.where(inArray(federationGrants.id, createdGrantIds))
.catch((e: unknown) => console.error('[federation-m2-test cleanup]', e));
}
if (db && createdPeerIds.length > 0) {
await db
.delete(federationPeers)
.where(inArray(federationPeers.id, createdPeerIds))
.catch((e: unknown) => console.error('[federation-m2-test cleanup]', e));
}
if (db && createdUserIds.length > 0) {
await db
.delete(schema.users)
.where(inArray(schema.users.id, createdUserIds))
.catch((e: unknown) => console.error('[federation-m2-test cleanup]', e));
}
if (handle)
await handle.close().catch((e: unknown) => console.error('[federation-m2-test cleanup]', e));
});
/** Generate a P-256 key pair and PKCS#10 CSR, returning the CSR as PEM. */
async function generateCsrPem(cn: string): Promise<string> {
const alg = { name: 'ECDSA', namedCurve: 'P-256', hash: 'SHA-256' };
const keyPair = await crypto.subtle.generateKey(alg, true, ['sign', 'verify']);
const csr = await Pkcs10CertificateRequestGenerator.create({
name: `CN=${cn}`,
keys: keyPair,
signingAlgorithm: alg,
});
return csr.toString('pem');
}
// -------------------------------------------------------------------------
// #2 — enrollment signs CSR and returns cert
// -------------------------------------------------------------------------
it('#2 — redeem returns a certPem containing a valid PEM certificate', async () => {
const userId = crypto.randomUUID();
const peerId = crypto.randomUUID();
const validScope = {
resources: ['tasks'],
excluded_resources: [],
max_rows_per_query: 100,
};
await insertTestUser(db, userId);
await insertTestPeer(db, peerId, 'ca-test2');
createdUserIds.push(userId);
createdPeerIds.push(peerId);
const grant = await grantsService.createGrant({
subjectUserId: userId,
scope: validScope,
peerId,
});
createdGrantIds.push(grant.id);
const { token } = await enrollmentService.createToken({
grantId: grant.id,
peerId,
ttlSeconds: 900,
});
const csrPem = await generateCsrPem(`gateway-test-${RUN_ID.slice(0, 8)}`);
const result = await enrollmentService.redeem(token, csrPem);
expect(result.certPem).toContain('-----BEGIN CERTIFICATE-----');
expect(result.certChainPem).toContain('-----BEGIN CERTIFICATE-----');
// Verify the issued cert parses cleanly
const cert = new PeculiarX509(result.certPem);
expect(cert.serialNumber).toBeTruthy();
}, 30_000);
// -------------------------------------------------------------------------
// #3 — token single-use; second attempt returns GoneException
// -------------------------------------------------------------------------
it('#3 — second redeem of the same token throws GoneException', async () => {
const userId = crypto.randomUUID();
const peerId = crypto.randomUUID();
const validScope = {
resources: ['notes'],
excluded_resources: [],
max_rows_per_query: 50,
};
await insertTestUser(db, userId);
await insertTestPeer(db, peerId, 'ca-test3');
createdUserIds.push(userId);
createdPeerIds.push(peerId);
const grant = await grantsService.createGrant({
subjectUserId: userId,
scope: validScope,
peerId,
});
createdGrantIds.push(grant.id);
const { token } = await enrollmentService.createToken({
grantId: grant.id,
peerId,
ttlSeconds: 900,
});
const csrPem = await generateCsrPem(`gateway-test-replay-${RUN_ID.slice(0, 8)}`);
// First redeem must succeed
const result = await enrollmentService.redeem(token, csrPem);
expect(result.certPem).toContain('-----BEGIN CERTIFICATE-----');
// Second redeem with the same token must be rejected
await expect(enrollmentService.redeem(token, csrPem)).rejects.toThrow(GoneException);
}, 30_000);
});

View File

@@ -1,62 +1,10 @@
import { Inject, Injectable, Logger } from '@nestjs/common';
import { createCipheriv, createDecipheriv, createHash, randomBytes } from 'node:crypto';
import { seal, unseal } from '@mosaicstack/auth';
import type { Db } from '@mosaicstack/db';
import { providerCredentials, eq, and } from '@mosaicstack/db';
import { DB } from '../database/database.module.js';
import type { ProviderCredentialSummaryDto } from './provider-credentials.dto.js';
const ALGORITHM = 'aes-256-gcm';
const IV_LENGTH = 12; // 96-bit IV for GCM
const TAG_LENGTH = 16; // 128-bit auth tag
/**
* Derive a 32-byte AES-256 key from BETTER_AUTH_SECRET using SHA-256.
* The secret is assumed to be set in the environment.
*/
function deriveEncryptionKey(): Buffer {
const secret = process.env['BETTER_AUTH_SECRET'];
if (!secret) {
throw new Error('BETTER_AUTH_SECRET is not set — cannot derive encryption key');
}
return createHash('sha256').update(secret).digest();
}
/**
* Encrypt a plain-text value using AES-256-GCM.
* Output format: base64(iv + authTag + ciphertext)
*/
function encrypt(plaintext: string): string {
const key = deriveEncryptionKey();
const iv = randomBytes(IV_LENGTH);
const cipher = createCipheriv(ALGORITHM, key, iv);
const encrypted = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
const authTag = cipher.getAuthTag();
// Combine iv (12) + authTag (16) + ciphertext and base64-encode
const combined = Buffer.concat([iv, authTag, encrypted]);
return combined.toString('base64');
}
/**
* Decrypt a value encrypted by `encrypt()`.
* Throws on authentication failure (tampered data).
*/
function decrypt(encoded: string): string {
const key = deriveEncryptionKey();
const combined = Buffer.from(encoded, 'base64');
const iv = combined.subarray(0, IV_LENGTH);
const authTag = combined.subarray(IV_LENGTH, IV_LENGTH + TAG_LENGTH);
const ciphertext = combined.subarray(IV_LENGTH + TAG_LENGTH);
const decipher = createDecipheriv(ALGORITHM, key, iv);
decipher.setAuthTag(authTag);
const decrypted = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
return decrypted.toString('utf8');
}
@Injectable()
export class ProviderCredentialsService {
private readonly logger = new Logger(ProviderCredentialsService.name);
@@ -74,7 +22,7 @@ export class ProviderCredentialsService {
value: string,
metadata?: Record<string, unknown>,
): Promise<void> {
const encryptedValue = encrypt(value);
const encryptedValue = seal(value);
await this.db
.insert(providerCredentials)
@@ -122,7 +70,7 @@ export class ProviderCredentialsService {
}
try {
return decrypt(row.encryptedValue);
return unseal(row.encryptedValue);
} catch (err) {
this.logger.error(
`Failed to decrypt credential for user=${userId} provider=${provider}`,

View File

@@ -24,6 +24,7 @@ import { GCModule } from './gc/gc.module.js';
import { ReloadModule } from './reload/reload.module.js';
import { WorkspaceModule } from './workspace/workspace.module.js';
import { QueueModule } from './queue/queue.module.js';
import { FederationModule } from './federation/federation.module.js';
import { ThrottlerGuard, ThrottlerModule } from '@nestjs/throttler';
@Module({
@@ -52,6 +53,7 @@ import { ThrottlerGuard, ThrottlerModule } from '@nestjs/throttler';
QueueModule,
ReloadModule,
WorkspaceModule,
FederationModule,
],
controllers: [HealthController],
providers: [

View File

@@ -0,0 +1,373 @@
/**
* Unit tests for EnrollmentService — federation enrollment token flow (FED-M2-07).
*
* Coverage:
* createToken:
* - inserts token row with correct grantId, peerId, and future expiresAt
* - returns { token, expiresAt } with a 64-char hex token
* - clamps ttlSeconds to 900
*
* redeem — error paths:
* - NotFoundException when token row not found
* - GoneException when token already used (usedAt set)
* - GoneException when token expired (expiresAt < now)
* - GoneException when grant status is not pending
*
* redeem — success path:
* - atomically claims token BEFORE cert issuance (claim → issueCert → tx)
* - calls CaService.issueCert with correct args
* - activates grant + updates peer + writes audit log inside a transaction
* - returns { certPem, certChainPem }
*
* redeem — replay protection:
* - GoneException when claim UPDATE returns empty array (concurrent request won)
*/
import 'reflect-metadata';
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { GoneException, NotFoundException } from '@nestjs/common';
import type { Db } from '@mosaicstack/db';
import { EnrollmentService } from '../enrollment.service.js';
// ---------------------------------------------------------------------------
// Test constants
// ---------------------------------------------------------------------------
const GRANT_ID = 'g1111111-1111-1111-1111-111111111111';
const PEER_ID = 'p2222222-2222-2222-2222-222222222222';
const USER_ID = 'u3333333-3333-3333-3333-333333333333';
const TOKEN = 'a'.repeat(64); // 64-char hex
const MOCK_CERT_PEM = '-----BEGIN CERTIFICATE-----\nMOCK\n-----END CERTIFICATE-----\n';
const MOCK_CHAIN_PEM = MOCK_CERT_PEM + MOCK_CERT_PEM;
const MOCK_SERIAL = 'ABCD1234';
// ---------------------------------------------------------------------------
// Factory helpers
// ---------------------------------------------------------------------------
function makeTokenRow(overrides: Partial<Record<string, unknown>> = {}) {
return {
token: TOKEN,
grantId: GRANT_ID,
peerId: PEER_ID,
expiresAt: new Date(Date.now() + 60_000), // 1 min from now
usedAt: null,
createdAt: new Date(),
...overrides,
};
}
function makeGrant(overrides: Partial<Record<string, unknown>> = {}) {
return {
id: GRANT_ID,
peerId: PEER_ID,
subjectUserId: USER_ID,
scope: { resources: ['tasks'], excluded_resources: [], max_rows_per_query: 100 },
status: 'pending',
expiresAt: null,
createdAt: new Date(),
revokedAt: null,
revokedReason: null,
...overrides,
};
}
// ---------------------------------------------------------------------------
// Mock DB builder
// ---------------------------------------------------------------------------
function makeDb({
tokenRows = [makeTokenRow()],
// claimedRows is returned by the .returning() on the token-claim UPDATE.
// Empty array = concurrent request won the race (GoneException).
claimedRows = [{ token: TOKEN }],
}: {
tokenRows?: unknown[];
claimedRows?: unknown[];
} = {}) {
// insert().values() — for createToken (outer db, not tx)
const insertValues = vi.fn().mockResolvedValue(undefined);
const insertMock = vi.fn().mockReturnValue({ values: insertValues });
// select().from().where().limit() — for fetching the token row
const limitSelect = vi.fn().mockResolvedValue(tokenRows);
const whereSelect = vi.fn().mockReturnValue({ limit: limitSelect });
const fromSelect = vi.fn().mockReturnValue({ where: whereSelect });
const selectMock = vi.fn().mockReturnValue({ from: fromSelect });
// update().set().where().returning() — for the atomic token claim (outer db)
const returningMock = vi.fn().mockResolvedValue(claimedRows);
const whereClaimUpdate = vi.fn().mockReturnValue({ returning: returningMock });
const setClaimMock = vi.fn().mockReturnValue({ where: whereClaimUpdate });
const claimUpdateMock = vi.fn().mockReturnValue({ set: setClaimMock });
// transaction(cb) — cb receives txMock; txMock has update + insert
const txInsertValues = vi.fn().mockResolvedValue(undefined);
const txInsertMock = vi.fn().mockReturnValue({ values: txInsertValues });
const txWhereUpdate = vi.fn().mockResolvedValue(undefined);
const txSetMock = vi.fn().mockReturnValue({ where: txWhereUpdate });
const txUpdateMock = vi.fn().mockReturnValue({ set: txSetMock });
const txMock = { update: txUpdateMock, insert: txInsertMock };
const transactionMock = vi
.fn()
.mockImplementation(async (cb: (tx: typeof txMock) => Promise<void>) => cb(txMock));
return {
insert: insertMock,
select: selectMock,
update: claimUpdateMock,
transaction: transactionMock,
_mocks: {
insertValues,
insertMock,
limitSelect,
whereSelect,
fromSelect,
selectMock,
returningMock,
whereClaimUpdate,
setClaimMock,
claimUpdateMock,
txInsertValues,
txInsertMock,
txWhereUpdate,
txSetMock,
txUpdateMock,
txMock,
transactionMock,
},
};
}
// ---------------------------------------------------------------------------
// Mock CaService
// ---------------------------------------------------------------------------
function makeCaService() {
return {
issueCert: vi.fn().mockResolvedValue({
certPem: MOCK_CERT_PEM,
certChainPem: MOCK_CHAIN_PEM,
serialNumber: MOCK_SERIAL,
}),
};
}
// ---------------------------------------------------------------------------
// Mock GrantsService
// ---------------------------------------------------------------------------
function makeGrantsService(grantOverrides: Partial<Record<string, unknown>> = {}) {
return {
getGrant: vi.fn().mockResolvedValue(makeGrant(grantOverrides)),
activateGrant: vi.fn().mockResolvedValue(makeGrant({ status: 'active' })),
};
}
// ---------------------------------------------------------------------------
// Helper: build service under test
// ---------------------------------------------------------------------------
function buildService({
db = makeDb(),
caService = makeCaService(),
grantsService = makeGrantsService(),
}: {
db?: ReturnType<typeof makeDb>;
caService?: ReturnType<typeof makeCaService>;
grantsService?: ReturnType<typeof makeGrantsService>;
} = {}) {
return new EnrollmentService(db as unknown as Db, caService as never, grantsService as never);
}
// ---------------------------------------------------------------------------
// Tests: createToken
// ---------------------------------------------------------------------------
describe('EnrollmentService.createToken', () => {
it('inserts a token row and returns { token, expiresAt }', async () => {
const db = makeDb();
const service = buildService({ db });
const result = await service.createToken({
grantId: GRANT_ID,
peerId: PEER_ID,
ttlSeconds: 900,
});
expect(result.token).toHaveLength(64); // 32 bytes hex
expect(result.expiresAt).toBeDefined();
expect(new Date(result.expiresAt).getTime()).toBeGreaterThan(Date.now());
expect(db._mocks.insertValues).toHaveBeenCalledWith(
expect.objectContaining({ grantId: GRANT_ID, peerId: PEER_ID }),
);
});
it('clamps ttlSeconds to 900', async () => {
const db = makeDb();
const service = buildService({ db });
const before = Date.now();
const result = await service.createToken({
grantId: GRANT_ID,
peerId: PEER_ID,
ttlSeconds: 9999,
});
const after = Date.now();
const expiresMs = new Date(result.expiresAt).getTime();
// Should be at most 900s from now
expect(expiresMs - before).toBeLessThanOrEqual(900_000 + 100);
expect(expiresMs - after).toBeGreaterThanOrEqual(0);
});
});
// ---------------------------------------------------------------------------
// Tests: redeem — error paths
// ---------------------------------------------------------------------------
describe('EnrollmentService.redeem — error paths', () => {
it('throws NotFoundException when token row not found', async () => {
const db = makeDb({ tokenRows: [] });
const service = buildService({ db });
await expect(service.redeem(TOKEN, '---CSR---')).rejects.toBeInstanceOf(NotFoundException);
});
it('throws GoneException when usedAt is set (already redeemed)', async () => {
const db = makeDb({ tokenRows: [makeTokenRow({ usedAt: new Date(Date.now() - 1000) })] });
const service = buildService({ db });
await expect(service.redeem(TOKEN, '---CSR---')).rejects.toBeInstanceOf(GoneException);
});
it('throws GoneException when token has expired', async () => {
const db = makeDb({ tokenRows: [makeTokenRow({ expiresAt: new Date(Date.now() - 1000) })] });
const service = buildService({ db });
await expect(service.redeem(TOKEN, '---CSR---')).rejects.toBeInstanceOf(GoneException);
});
it('throws GoneException when grant status is not pending', async () => {
const db = makeDb();
const grantsService = makeGrantsService({ status: 'active' });
const service = buildService({ db, grantsService });
await expect(service.redeem(TOKEN, '---CSR---')).rejects.toBeInstanceOf(GoneException);
});
it('throws GoneException when token claim UPDATE returns empty array (concurrent replay)', async () => {
const db = makeDb({ claimedRows: [] });
const caService = makeCaService();
const grantsService = makeGrantsService();
const service = buildService({ db, caService, grantsService });
await expect(service.redeem(TOKEN, '---CSR---')).rejects.toBeInstanceOf(GoneException);
});
it('does NOT call issueCert when token claim fails (no double minting)', async () => {
const db = makeDb({ claimedRows: [] });
const caService = makeCaService();
const service = buildService({ db, caService });
await expect(service.redeem(TOKEN, '---CSR---')).rejects.toBeInstanceOf(GoneException);
expect(caService.issueCert).not.toHaveBeenCalled();
});
});
// ---------------------------------------------------------------------------
// Tests: redeem — success path
// ---------------------------------------------------------------------------
describe('EnrollmentService.redeem — success path', () => {
let db: ReturnType<typeof makeDb>;
let caService: ReturnType<typeof makeCaService>;
let grantsService: ReturnType<typeof makeGrantsService>;
let service: EnrollmentService;
beforeEach(() => {
db = makeDb();
caService = makeCaService();
grantsService = makeGrantsService();
service = buildService({ db, caService, grantsService });
});
it('claims token BEFORE calling issueCert (prevents double minting)', async () => {
const callOrder: string[] = [];
db._mocks.returningMock.mockImplementation(async () => {
callOrder.push('claim');
return [{ token: TOKEN }];
});
caService.issueCert.mockImplementation(async () => {
callOrder.push('issueCert');
return { certPem: MOCK_CERT_PEM, certChainPem: MOCK_CHAIN_PEM, serialNumber: MOCK_SERIAL };
});
await service.redeem(TOKEN, MOCK_CERT_PEM);
expect(callOrder).toEqual(['claim', 'issueCert']);
});
it('calls CaService.issueCert with grantId, subjectUserId, csrPem, ttlSeconds=300', async () => {
await service.redeem(TOKEN, MOCK_CERT_PEM);
expect(caService.issueCert).toHaveBeenCalledWith(
expect.objectContaining({
grantId: GRANT_ID,
subjectUserId: USER_ID,
csrPem: MOCK_CERT_PEM,
ttlSeconds: 300,
}),
);
});
it('runs activate grant + peer update + audit inside a transaction', async () => {
await service.redeem(TOKEN, MOCK_CERT_PEM);
expect(db._mocks.transactionMock).toHaveBeenCalledOnce();
// tx.update called twice: activate grant + update peer
expect(db._mocks.txUpdateMock).toHaveBeenCalledTimes(2);
// tx.insert called once: audit log
expect(db._mocks.txInsertMock).toHaveBeenCalledOnce();
});
it('activates grant (sets status=active) inside the transaction', async () => {
await service.redeem(TOKEN, MOCK_CERT_PEM);
expect(db._mocks.txSetMock).toHaveBeenCalledWith(expect.objectContaining({ status: 'active' }));
});
it('updates the federationPeers row with certPem, certSerial, state=active inside the transaction', async () => {
await service.redeem(TOKEN, MOCK_CERT_PEM);
expect(db._mocks.txSetMock).toHaveBeenCalledWith(
expect.objectContaining({
certPem: MOCK_CERT_PEM,
certSerial: MOCK_SERIAL,
state: 'active',
}),
);
});
it('inserts an audit log row inside the transaction', async () => {
await service.redeem(TOKEN, MOCK_CERT_PEM);
expect(db._mocks.txInsertValues).toHaveBeenCalledWith(
expect.objectContaining({
peerId: PEER_ID,
grantId: GRANT_ID,
verb: 'enrollment',
}),
);
});
it('returns { certPem, certChainPem } from CaService', async () => {
const result = await service.redeem(TOKEN, MOCK_CERT_PEM);
expect(result).toEqual({
certPem: MOCK_CERT_PEM,
certChainPem: MOCK_CHAIN_PEM,
});
});
});

View File

@@ -0,0 +1,212 @@
/**
* Unit tests for FederationController (FED-M2-08).
*
* Coverage:
* - listGrants: delegates to GrantsService with query params
* - createGrant: delegates to GrantsService, validates body
* - generateToken: returns enrollmentUrl containing the token
* - listPeers: returns DB rows
*/
import 'reflect-metadata';
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { NotFoundException } from '@nestjs/common';
import type { Db } from '@mosaicstack/db';
import { FederationController } from '../federation.controller.js';
import type { GrantsService } from '../grants.service.js';
import type { EnrollmentService } from '../enrollment.service.js';
// ---------------------------------------------------------------------------
// Constants
// ---------------------------------------------------------------------------
const GRANT_ID = 'g1111111-1111-1111-1111-111111111111';
const PEER_ID = 'p2222222-2222-2222-2222-222222222222';
const USER_ID = 'u3333333-3333-3333-3333-333333333333';
const MOCK_GRANT = {
id: GRANT_ID,
peerId: PEER_ID,
subjectUserId: USER_ID,
scope: { resources: ['tasks'], operations: ['list'] },
status: 'pending' as const,
expiresAt: null,
createdAt: new Date('2026-01-01T00:00:00Z'),
revokedAt: null,
revokedReason: null,
};
const MOCK_PEER = {
id: PEER_ID,
commonName: 'test-peer',
displayName: 'Test Peer',
certPem: '',
certSerial: 'pending',
certNotAfter: new Date(0),
clientKeyPem: null,
state: 'pending' as const,
endpointUrl: null,
createdAt: new Date('2026-01-01T00:00:00Z'),
updatedAt: new Date('2026-01-01T00:00:00Z'),
};
// ---------------------------------------------------------------------------
// DB mock builder
// ---------------------------------------------------------------------------
function makeDbMock(rows: unknown[] = []) {
const orderBy = vi.fn().mockResolvedValue(rows);
const where = vi.fn().mockReturnValue({ orderBy });
const from = vi.fn().mockReturnValue({ where, orderBy });
const select = vi.fn().mockReturnValue({ from });
return {
select,
from,
where,
orderBy,
insert: vi.fn(),
update: vi.fn(),
delete: vi.fn(),
} as unknown as Db;
}
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
describe('FederationController', () => {
let db: Db;
let grantsService: GrantsService;
let enrollmentService: EnrollmentService;
let controller: FederationController;
beforeEach(() => {
db = makeDbMock([MOCK_PEER]);
grantsService = {
createGrant: vi.fn().mockResolvedValue(MOCK_GRANT),
getGrant: vi.fn().mockResolvedValue(MOCK_GRANT),
listGrants: vi.fn().mockResolvedValue([MOCK_GRANT]),
revokeGrant: vi.fn().mockResolvedValue({ ...MOCK_GRANT, status: 'revoked' }),
activateGrant: vi.fn(),
expireGrant: vi.fn(),
} as unknown as GrantsService;
enrollmentService = {
createToken: vi.fn().mockResolvedValue({
token: 'abc123def456abc123def456abc123def456abc123def456abc123def456ab12',
expiresAt: '2026-01-01T00:15:00.000Z',
}),
redeem: vi.fn(),
} as unknown as EnrollmentService;
controller = new FederationController(db, grantsService, enrollmentService);
});
// ─── Grant management ──────────────────────────────────────────────────
describe('listGrants', () => {
it('delegates to GrantsService with provided query params', async () => {
const query = { peerId: PEER_ID, status: 'pending' as const };
const result = await controller.listGrants(query);
expect(grantsService.listGrants).toHaveBeenCalledWith(query);
expect(result).toEqual([MOCK_GRANT]);
});
it('delegates to GrantsService with empty filters', async () => {
const result = await controller.listGrants({});
expect(grantsService.listGrants).toHaveBeenCalledWith({});
expect(result).toEqual([MOCK_GRANT]);
});
});
describe('createGrant', () => {
it('delegates to GrantsService and returns created grant', async () => {
const body = {
peerId: PEER_ID,
subjectUserId: USER_ID,
scope: { resources: ['tasks'], operations: ['list'] },
};
const result = await controller.createGrant(body);
expect(grantsService.createGrant).toHaveBeenCalledWith(body);
expect(result).toEqual(MOCK_GRANT);
});
});
describe('getGrant', () => {
it('delegates to GrantsService with provided ID', async () => {
const result = await controller.getGrant(GRANT_ID);
expect(grantsService.getGrant).toHaveBeenCalledWith(GRANT_ID);
expect(result).toEqual(MOCK_GRANT);
});
});
describe('revokeGrant', () => {
it('delegates to GrantsService with id and reason', async () => {
const result = await controller.revokeGrant(GRANT_ID, { reason: 'test reason' });
expect(grantsService.revokeGrant).toHaveBeenCalledWith(GRANT_ID, 'test reason');
expect(result).toMatchObject({ status: 'revoked' });
});
it('delegates without reason when omitted', async () => {
await controller.revokeGrant(GRANT_ID, {});
expect(grantsService.revokeGrant).toHaveBeenCalledWith(GRANT_ID, undefined);
});
});
describe('generateToken', () => {
it('returns enrollmentUrl containing the token', async () => {
const token = 'abc123def456abc123def456abc123def456abc123def456abc123def456ab12';
vi.mocked(enrollmentService.createToken).mockResolvedValueOnce({
token,
expiresAt: '2026-01-01T00:15:00.000Z',
});
const result = await controller.generateToken(GRANT_ID, { ttlSeconds: 900 });
expect(result.token).toBe(token);
expect(result.enrollmentUrl).toContain(token);
expect(result.enrollmentUrl).toContain('/api/federation/enrollment/');
});
it('creates token via EnrollmentService with correct grantId and peerId', async () => {
await controller.generateToken(GRANT_ID, { ttlSeconds: 300 });
expect(enrollmentService.createToken).toHaveBeenCalledWith({
grantId: GRANT_ID,
peerId: PEER_ID,
ttlSeconds: 300,
});
});
it('throws NotFoundException when grant does not exist', async () => {
vi.mocked(grantsService.getGrant).mockRejectedValueOnce(
new NotFoundException(`Grant ${GRANT_ID} not found`),
);
await expect(controller.generateToken(GRANT_ID, { ttlSeconds: 900 })).rejects.toThrow(
NotFoundException,
);
});
});
// ─── Peer management ───────────────────────────────────────────────────
describe('listPeers', () => {
it('returns DB rows ordered by commonName', async () => {
const result = await controller.listPeers();
expect(db.select).toHaveBeenCalled();
// The DB mock resolves with [MOCK_PEER]
expect(result).toEqual([MOCK_PEER]);
});
});
});

View File

@@ -0,0 +1,351 @@
/**
* Unit tests for GrantsService — federation grants CRUD + status transitions (FED-M2-06).
*
* Coverage:
* - createGrant: validates scope via parseFederationScope
* - createGrant: inserts with status 'pending'
* - getGrant: returns grant when found
* - getGrant: throws NotFoundException when not found
* - listGrants: no filters returns all grants
* - listGrants: filters by peerId
* - listGrants: filters by subjectUserId
* - listGrants: filters by status
* - listGrants: multiple filters combined
* - activateGrant: pending → active works
* - activateGrant: non-pending throws ConflictException
* - revokeGrant: active → revoked works, sets revokedAt
* - revokeGrant: non-active throws ConflictException
* - expireGrant: active → expired works
* - expireGrant: non-active throws ConflictException
*/
import 'reflect-metadata';
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { ConflictException, NotFoundException } from '@nestjs/common';
import type { Db } from '@mosaicstack/db';
import { GrantsService } from '../grants.service.js';
import { FederationScopeError } from '../scope-schema.js';
// ---------------------------------------------------------------------------
// Minimal valid federation scope for testing
// ---------------------------------------------------------------------------
const VALID_SCOPE = {
resources: ['tasks'] as const,
excluded_resources: [],
max_rows_per_query: 100,
};
const PEER_ID = 'a1111111-1111-1111-1111-111111111111';
const USER_ID = 'u2222222-2222-2222-2222-222222222222';
const GRANT_ID = 'g3333333-3333-3333-3333-333333333333';
// ---------------------------------------------------------------------------
// Build a mock DB that mimics chained Drizzle query builder calls
// ---------------------------------------------------------------------------
function makeMockGrant(overrides: Partial<Record<string, unknown>> = {}) {
return {
id: GRANT_ID,
peerId: PEER_ID,
subjectUserId: USER_ID,
scope: VALID_SCOPE,
status: 'pending',
expiresAt: null,
createdAt: new Date('2026-01-01T00:00:00Z'),
revokedAt: null,
revokedReason: null,
...overrides,
};
}
function makeDb(
overrides: {
insertReturning?: unknown[];
selectRows?: unknown[];
updateReturning?: unknown[];
} = {},
) {
const insertReturning = overrides.insertReturning ?? [makeMockGrant()];
const selectRows = overrides.selectRows ?? [makeMockGrant()];
const updateReturning = overrides.updateReturning ?? [makeMockGrant({ status: 'active' })];
// Drizzle returns a chainable builder; we need to mock the full chain.
const returningInsert = vi.fn().mockResolvedValue(insertReturning);
const valuesInsert = vi.fn().mockReturnValue({ returning: returningInsert });
const insertMock = vi.fn().mockReturnValue({ values: valuesInsert });
// select().from().where().limit()
const limitSelect = vi.fn().mockResolvedValue(selectRows);
const whereSelect = vi.fn().mockReturnValue({ limit: limitSelect });
// from returns something that is both thenable (for full-table select) and has .where()
const fromSelect = vi.fn().mockReturnValue({
where: whereSelect,
limit: limitSelect,
// Make it thenable for listGrants with no filters (await db.select().from(federationGrants))
then: (resolve: (v: unknown) => unknown) => resolve(selectRows),
});
const selectMock = vi.fn().mockReturnValue({ from: fromSelect });
const returningUpdate = vi.fn().mockResolvedValue(updateReturning);
const whereUpdate = vi.fn().mockReturnValue({ returning: returningUpdate });
const setMock = vi.fn().mockReturnValue({ where: whereUpdate });
const updateMock = vi.fn().mockReturnValue({ set: setMock });
return {
insert: insertMock,
select: selectMock,
update: updateMock,
// Expose internals for assertions
_mocks: {
insertReturning,
valuesInsert,
insertMock,
limitSelect,
whereSelect,
fromSelect,
selectMock,
returningUpdate,
whereUpdate,
setMock,
updateMock,
},
};
}
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
describe('GrantsService', () => {
let db: ReturnType<typeof makeDb>;
let service: GrantsService;
beforeEach(() => {
db = makeDb();
service = new GrantsService(db as unknown as Db);
});
// ─── createGrant ──────────────────────────────────────────────────────────
describe('createGrant', () => {
it('calls parseFederationScope — rejects an invalid scope', async () => {
const invalidScope = { resources: [], max_rows_per_query: 0 };
await expect(
service.createGrant({ peerId: PEER_ID, subjectUserId: USER_ID, scope: invalidScope }),
).rejects.toBeInstanceOf(FederationScopeError);
});
it('inserts a grant with status pending and returns it', async () => {
const result = await service.createGrant({
peerId: PEER_ID,
subjectUserId: USER_ID,
scope: VALID_SCOPE,
});
expect(db._mocks.valuesInsert).toHaveBeenCalledWith(
expect.objectContaining({ status: 'pending', peerId: PEER_ID, subjectUserId: USER_ID }),
);
expect(result.status).toBe('pending');
});
it('passes expiresAt as a Date when provided', async () => {
await service.createGrant({
peerId: PEER_ID,
subjectUserId: USER_ID,
scope: VALID_SCOPE,
expiresAt: '2027-01-01T00:00:00Z',
});
expect(db._mocks.valuesInsert).toHaveBeenCalledWith(
expect.objectContaining({ expiresAt: expect.any(Date) }),
);
});
it('sets expiresAt to null when not provided', async () => {
await service.createGrant({ peerId: PEER_ID, subjectUserId: USER_ID, scope: VALID_SCOPE });
expect(db._mocks.valuesInsert).toHaveBeenCalledWith(
expect.objectContaining({ expiresAt: null }),
);
});
});
// ─── getGrant ─────────────────────────────────────────────────────────────
describe('getGrant', () => {
it('returns the grant when found', async () => {
const result = await service.getGrant(GRANT_ID);
expect(result.id).toBe(GRANT_ID);
});
it('throws NotFoundException when no rows returned', async () => {
db = makeDb({ selectRows: [] });
service = new GrantsService(db as unknown as Db);
await expect(service.getGrant(GRANT_ID)).rejects.toBeInstanceOf(NotFoundException);
});
});
// ─── listGrants ───────────────────────────────────────────────────────────
describe('listGrants', () => {
it('queries without where clause when no filters provided', async () => {
const result = await service.listGrants({});
expect(Array.isArray(result)).toBe(true);
});
it('applies peerId filter', async () => {
await service.listGrants({ peerId: PEER_ID });
expect(db._mocks.whereSelect).toHaveBeenCalled();
});
it('applies subjectUserId filter', async () => {
await service.listGrants({ subjectUserId: USER_ID });
expect(db._mocks.whereSelect).toHaveBeenCalled();
});
it('applies status filter', async () => {
await service.listGrants({ status: 'active' });
expect(db._mocks.whereSelect).toHaveBeenCalled();
});
it('applies multiple filters combined', async () => {
await service.listGrants({ peerId: PEER_ID, status: 'pending' });
expect(db._mocks.whereSelect).toHaveBeenCalled();
});
});
// ─── activateGrant ────────────────────────────────────────────────────────
describe('activateGrant', () => {
it('transitions pending → active and returns updated grant', async () => {
db = makeDb({
selectRows: [makeMockGrant({ status: 'pending' })],
updateReturning: [makeMockGrant({ status: 'active' })],
});
service = new GrantsService(db as unknown as Db);
const result = await service.activateGrant(GRANT_ID);
expect(db._mocks.setMock).toHaveBeenCalledWith({ status: 'active' });
expect(result.status).toBe('active');
});
it('throws ConflictException when grant is already active', async () => {
db = makeDb({ selectRows: [makeMockGrant({ status: 'active' })] });
service = new GrantsService(db as unknown as Db);
await expect(service.activateGrant(GRANT_ID)).rejects.toBeInstanceOf(ConflictException);
});
it('throws ConflictException when grant is revoked', async () => {
db = makeDb({ selectRows: [makeMockGrant({ status: 'revoked' })] });
service = new GrantsService(db as unknown as Db);
await expect(service.activateGrant(GRANT_ID)).rejects.toBeInstanceOf(ConflictException);
});
it('throws ConflictException when grant is expired', async () => {
db = makeDb({ selectRows: [makeMockGrant({ status: 'expired' })] });
service = new GrantsService(db as unknown as Db);
await expect(service.activateGrant(GRANT_ID)).rejects.toBeInstanceOf(ConflictException);
});
});
// ─── revokeGrant ──────────────────────────────────────────────────────────
describe('revokeGrant', () => {
it('transitions active → revoked and sets revokedAt', async () => {
const revokedAt = new Date();
db = makeDb({
selectRows: [makeMockGrant({ status: 'active' })],
updateReturning: [makeMockGrant({ status: 'revoked', revokedAt })],
});
service = new GrantsService(db as unknown as Db);
const result = await service.revokeGrant(GRANT_ID, 'test reason');
expect(db._mocks.setMock).toHaveBeenCalledWith(
expect.objectContaining({
status: 'revoked',
revokedAt: expect.any(Date),
revokedReason: 'test reason',
}),
);
expect(result.status).toBe('revoked');
});
it('sets revokedReason to null when not provided', async () => {
db = makeDb({
selectRows: [makeMockGrant({ status: 'active' })],
updateReturning: [makeMockGrant({ status: 'revoked', revokedAt: new Date() })],
});
service = new GrantsService(db as unknown as Db);
await service.revokeGrant(GRANT_ID);
expect(db._mocks.setMock).toHaveBeenCalledWith(
expect.objectContaining({ revokedReason: null }),
);
});
it('throws ConflictException when grant is pending', async () => {
db = makeDb({ selectRows: [makeMockGrant({ status: 'pending' })] });
service = new GrantsService(db as unknown as Db);
await expect(service.revokeGrant(GRANT_ID)).rejects.toBeInstanceOf(ConflictException);
});
it('throws ConflictException when grant is already revoked', async () => {
db = makeDb({ selectRows: [makeMockGrant({ status: 'revoked' })] });
service = new GrantsService(db as unknown as Db);
await expect(service.revokeGrant(GRANT_ID)).rejects.toBeInstanceOf(ConflictException);
});
it('throws ConflictException when grant is expired', async () => {
db = makeDb({ selectRows: [makeMockGrant({ status: 'expired' })] });
service = new GrantsService(db as unknown as Db);
await expect(service.revokeGrant(GRANT_ID)).rejects.toBeInstanceOf(ConflictException);
});
});
// ─── expireGrant ──────────────────────────────────────────────────────────
describe('expireGrant', () => {
it('transitions active → expired and returns updated grant', async () => {
db = makeDb({
selectRows: [makeMockGrant({ status: 'active' })],
updateReturning: [makeMockGrant({ status: 'expired' })],
});
service = new GrantsService(db as unknown as Db);
const result = await service.expireGrant(GRANT_ID);
expect(db._mocks.setMock).toHaveBeenCalledWith({ status: 'expired' });
expect(result.status).toBe('expired');
});
it('throws ConflictException when grant is pending', async () => {
db = makeDb({ selectRows: [makeMockGrant({ status: 'pending' })] });
service = new GrantsService(db as unknown as Db);
await expect(service.expireGrant(GRANT_ID)).rejects.toBeInstanceOf(ConflictException);
});
it('throws ConflictException when grant is already expired', async () => {
db = makeDb({ selectRows: [makeMockGrant({ status: 'expired' })] });
service = new GrantsService(db as unknown as Db);
await expect(service.expireGrant(GRANT_ID)).rejects.toBeInstanceOf(ConflictException);
});
it('throws ConflictException when grant is revoked', async () => {
db = makeDb({ selectRows: [makeMockGrant({ status: 'revoked' })] });
service = new GrantsService(db as unknown as Db);
await expect(service.expireGrant(GRANT_ID)).rejects.toBeInstanceOf(ConflictException);
});
});
});

View File

@@ -0,0 +1,63 @@
import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import { sealClientKey, unsealClientKey } from '../peer-key.util.js';
const TEST_SECRET = 'test-secret-for-peer-key-unit-tests-only';
const TEST_PEM = `-----BEGIN PRIVATE KEY-----
MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC7o4qne60TB3wo
pCOW8QqstpxEBpnFo37JxLYEJbpE3gUlJajsHv9UWRQ7m5B7n+MBXwTCQqMEY8Wl
kHv9tGgz1YGwzBjNKxPJXE6pPTXQ1Oa0VB9l3qHdqF5HtZoJzE0c6dO8HJ5YUVL
-----END PRIVATE KEY-----`;
let savedSecret: string | undefined;
beforeEach(() => {
savedSecret = process.env['BETTER_AUTH_SECRET'];
process.env['BETTER_AUTH_SECRET'] = TEST_SECRET;
});
afterEach(() => {
if (savedSecret === undefined) {
delete process.env['BETTER_AUTH_SECRET'];
} else {
process.env['BETTER_AUTH_SECRET'] = savedSecret;
}
});
describe('peer-key seal/unseal', () => {
it('round-trip: unsealClientKey(sealClientKey(pem)) returns original pem', () => {
const sealed = sealClientKey(TEST_PEM);
const roundTripped = unsealClientKey(sealed);
expect(roundTripped).toBe(TEST_PEM);
});
it('non-determinism: sealClientKey produces different ciphertext each call', () => {
const sealed1 = sealClientKey(TEST_PEM);
const sealed2 = sealClientKey(TEST_PEM);
expect(sealed1).not.toBe(sealed2);
});
it('at-rest: sealed output does not contain plaintext PEM content', () => {
const sealed = sealClientKey(TEST_PEM);
expect(sealed).not.toContain('PRIVATE KEY');
expect(sealed).not.toContain(
'MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC7o4qne60TB3wo',
);
});
it('tamper: flipping a byte in the sealed payload causes unseal to throw', () => {
const sealed = sealClientKey(TEST_PEM);
const buf = Buffer.from(sealed, 'base64');
// Flip a byte in the middle of the buffer (past IV and authTag)
const midpoint = Math.floor(buf.length / 2);
buf[midpoint] = buf[midpoint]! ^ 0xff;
const tampered = buf.toString('base64');
expect(() => unsealClientKey(tampered)).toThrow();
});
it('missing secret: unsealClientKey throws when BETTER_AUTH_SECRET is unset', () => {
const sealed = sealClientKey(TEST_PEM);
delete process.env['BETTER_AUTH_SECRET'];
expect(() => unsealClientKey(sealed)).toThrow('BETTER_AUTH_SECRET is not set');
});
});

View File

@@ -0,0 +1,57 @@
/**
* DTOs for the Step-CA client service (FED-M2-04).
*
* IssueCertRequestDto — input to CaService.issueCert()
* IssuedCertDto — output from CaService.issueCert()
*/
import { IsInt, IsNotEmpty, IsOptional, IsString, IsUUID, Max, Min } from 'class-validator';
export class IssueCertRequestDto {
/**
* PEM-encoded PKCS#10 Certificate Signing Request.
* The CSR must already include the desired SANs.
*/
@IsString()
@IsNotEmpty()
csrPem!: string;
/**
* UUID of the federation_grants row this certificate is being issued for.
* Embedded as the `mosaic_grant_id` custom OID extension.
*/
@IsUUID()
grantId!: string;
/**
* UUID of the local user on whose behalf the cert is being issued.
* Embedded as the `mosaic_subject_user_id` custom OID extension.
*/
@IsUUID()
subjectUserId!: string;
/**
* Requested certificate validity in seconds.
* Hard cap: 900 s (15 minutes). Default: 300 s (5 minutes).
* The service will always clamp to 900 s regardless of this value.
*/
@IsOptional()
@IsInt()
@Min(60)
@Max(15 * 60)
ttlSeconds: number = 300;
}
export class IssuedCertDto {
/** PEM-encoded leaf certificate returned by step-ca. */
certPem!: string;
/**
* PEM-encoded full certificate chain (leaf + intermediates + root).
* Falls back to `certPem` when step-ca returns no `certChain` field.
*/
certChainPem!: string;
/** Decimal serial number string of the issued certificate. */
serialNumber!: string;
}

View File

@@ -0,0 +1,577 @@
/**
* Unit tests for CaService — Step-CA client (FED-M2-04).
*
* Coverage:
* - Happy path: returns IssuedCertDto with certPem, certChainPem, serialNumber
* - certChainPem fallback: falls back to certPem when certChain absent
* - certChainPem from ca field: uses crt+ca when certChain absent but ca present
* - HTTP 401: throws CaServiceError with cause + remediation
* - HTTP non-401 error: throws CaServiceError
* - Malformed CSR: throws before HTTP call (INVALID_CSR)
* - Non-JSON response: throws CaServiceError
* - HTTPS connection error: throws CaServiceError
* - JWT custom claims: mosaic_grant_id and mosaic_subject_user_id present in OTT payload
* verified with jose.jwtVerify (real signature check)
* - CaServiceError: has cause + remediation properties
* - Missing crt in response: throws CaServiceError
* - Real CSR validation: valid P-256 CSR passes; malformed CSR fails with INVALID_CSR
* - provisionerPassword never appears in CaServiceError messages
* - HTTPS-only enforcement: http:// URL throws in constructor
*/
import 'reflect-metadata';
import { describe, it, expect, vi, beforeEach, type Mock } from 'vitest';
import { jwtVerify, exportJWK, generateKeyPair } from 'jose';
import { Pkcs10CertificateRequestGenerator } from '@peculiar/x509';
// ---------------------------------------------------------------------------
// Mock node:https BEFORE importing CaService so the mock is in place when
// the module is loaded. Vitest/ESM require vi.mock at the top level.
// ---------------------------------------------------------------------------
vi.mock('node:https', () => {
const mockRequest = vi.fn();
const mockAgent = vi.fn().mockImplementation(() => ({}));
return {
default: { request: mockRequest, Agent: mockAgent },
request: mockRequest,
Agent: mockAgent,
};
});
vi.mock('node:fs', () => {
const mockReadFileSync = vi
.fn()
.mockReturnValue('-----BEGIN CERTIFICATE-----\nFAKEROOT\n-----END CERTIFICATE-----\n');
return {
default: { readFileSync: mockReadFileSync },
readFileSync: mockReadFileSync,
};
});
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
// Real self-signed EC P-256 certificate generated with openssl for testing.
// openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:P-256 -nodes -keyout /dev/null \
// -out /dev/stdout -subj "/CN=test" -days 1
const FAKE_CERT_PEM = `-----BEGIN CERTIFICATE-----
MIIBdDCCARmgAwIBAgIUM+iUJSayN+PwXkyVN6qwSY7sr6gwCgYIKoZIzj0EAwIw
DzENMAsGA1UEAwwEdGVzdDAeFw0yNjA0MjIwMzE5MTlaFw0yNjA0MjMwMzE5MTla
MA8xDTALBgNVBAMMBHRlc3QwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAAR21kHL
n1GmFQ4TEBw3EA53pD+2McIBf5WcoHE+x0eMz5DpRKJe0ksHwOVN5Yev5d57kb+4
MvG1LhbHCB/uQo8So1MwUTAdBgNVHQ4EFgQUPq0pdIGiQ7pLBRXICS8GTliCrLsw
HwYDVR0jBBgwFoAUPq0pdIGiQ7pLBRXICS8GTliCrLswDwYDVR0TAQH/BAUwAwEB
/zAKBggqhkjOPQQDAgNJADBGAiEAypJqyC6S77aQ3eEXokM6sgAsD7Oa3tJbCbVm
zG3uJb0CIQC1w+GE+Ad0OTR5Quja46R1RjOo8ydpzZ7Fh4rouAiwEw==
-----END CERTIFICATE-----
`;
// Use a second copy of the same cert for the CA field in tests.
const FAKE_CA_PEM = FAKE_CERT_PEM;
const GRANT_ID = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11';
const SUBJECT_USER_ID = 'b1ffcd00-0d1c-5f09-cc7e-7cc0ce491b22';
// ---------------------------------------------------------------------------
// Generate a real EC P-256 key pair and CSR for integration-style tests
// ---------------------------------------------------------------------------
// We generate this once at module level so it's available to all tests.
// The key pair and CSR PEM are populated asynchronously in the test that needs them.
let realCsrPem: string;
async function generateRealCsr(): Promise<string> {
const { privateKey, publicKey } = await generateKeyPair('ES256');
// Export public key JWK for potential verification (not used here but confirms key is exportable)
await exportJWK(publicKey);
// Use @peculiar/x509 to build a proper CSR
const csr = await Pkcs10CertificateRequestGenerator.create({
name: 'CN=test.federation.local',
signingAlgorithm: { name: 'ECDSA', hash: 'SHA-256' },
keys: { privateKey, publicKey },
});
return csr.toString('pem');
}
// ---------------------------------------------------------------------------
// Setup env before importing service
// We use an EC P-256 key pair here so the JWK-based signing works.
// The key pair is generated once and stored in module-level vars.
// ---------------------------------------------------------------------------
// Real EC P-256 test JWK (test-only, never used in production).
// Generated with node webcrypto for use in unit tests.
const TEST_EC_PRIVATE_JWK = {
key_ops: ['sign'],
ext: true,
kty: 'EC',
x: 'Xq2RjZctcPcUMU14qfjs3MtZTmFk8z1lFGQyypgXZOU',
y: 't8w9Cbt4RVmR47Wnb_i5cLwefEnMcvwse049zu9Rl_E',
crv: 'P-256',
d: 'TM6N79w1HE-PiML5Td4mbXfJaLHEaZrVyVrrwlJv7q8',
kid: 'test-ec-kid',
};
const TEST_EC_PUBLIC_JWK = {
key_ops: ['verify'],
ext: true,
kty: 'EC',
x: 'Xq2RjZctcPcUMU14qfjs3MtZTmFk8z1lFGQyypgXZOU',
y: 't8w9Cbt4RVmR47Wnb_i5cLwefEnMcvwse049zu9Rl_E',
crv: 'P-256',
kid: 'test-ec-kid',
};
process.env['STEP_CA_URL'] = 'https://step-ca:9000';
process.env['STEP_CA_PROVISIONER_KEY_JSON'] = JSON.stringify(TEST_EC_PRIVATE_JWK);
process.env['STEP_CA_ROOT_CERT_PATH'] = '/fake/root.pem';
// Import AFTER env is set and mocks are registered
import * as httpsModule from 'node:https';
import { CaService, CaServiceError } from './ca.service.js';
import type { IssueCertRequestDto } from './ca.dto.js';
// ---------------------------------------------------------------------------
// Helper to build a mock https.request that simulates step-ca
// ---------------------------------------------------------------------------
function makeHttpsMock(statusCode: number, body: unknown, errorMsg?: string): void {
const mockReq = {
write: vi.fn(),
end: vi.fn(),
on: vi.fn(),
setTimeout: vi.fn(),
};
(httpsModule.request as unknown as Mock).mockImplementation(
(
_options: unknown,
callback: (res: {
statusCode: number;
on: (event: string, cb: (chunk?: Buffer) => void) => void;
}) => void,
) => {
const mockRes = {
statusCode,
on: (event: string, cb: (chunk?: Buffer) => void) => {
if (event === 'data') {
if (body !== undefined) {
cb(Buffer.from(typeof body === 'string' ? body : JSON.stringify(body)));
}
}
if (event === 'end') {
cb();
}
},
};
if (errorMsg) {
// Simulate a connection error via the req.on('error') handler
mockReq.on.mockImplementation((event: string, cb: (err: Error) => void) => {
if (event === 'error') {
setImmediate(() => cb(new Error(errorMsg)));
}
});
} else {
// Normal flow: call the response callback
setImmediate(() => callback(mockRes));
}
return mockReq;
},
);
}
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
describe('CaService', () => {
let service: CaService;
beforeEach(() => {
vi.clearAllMocks();
service = new CaService();
});
function makeReq(overrides: Partial<IssueCertRequestDto> = {}): IssueCertRequestDto {
// Use a real CSR if available; fall back to a minimal placeholder
const defaultCsr = realCsrPem ?? makeFakeCsr();
return {
csrPem: defaultCsr,
grantId: GRANT_ID,
subjectUserId: SUBJECT_USER_ID,
ttlSeconds: 300,
...overrides,
};
}
function makeFakeCsr(): string {
// A structurally valid-looking CSR header/footer (body will fail crypto verify)
return `-----BEGIN CERTIFICATE REQUEST-----\nMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA0000000000000000AAAA\n-----END CERTIFICATE REQUEST-----\n`;
}
// -------------------------------------------------------------------------
// Real CSR generation — runs once and populates realCsrPem
// -------------------------------------------------------------------------
it('generates a real P-256 CSR that passes validateCsr', async () => {
realCsrPem = await generateRealCsr();
expect(realCsrPem).toMatch(/BEGIN CERTIFICATE REQUEST/);
// Now test that the service's validateCsr accepts it.
// We call it indirectly via issueCert with a successful mock.
makeHttpsMock(200, { crt: FAKE_CERT_PEM, certChain: [FAKE_CERT_PEM, FAKE_CA_PEM] });
const result = await service.issueCert(makeReq({ csrPem: realCsrPem }));
expect(result.certPem).toBe(FAKE_CERT_PEM);
});
it('throws INVALID_CSR for a malformed PEM-shaped CSR', async () => {
const malformedCsr =
'-----BEGIN CERTIFICATE REQUEST-----\nTm90QVJlYWxDU1I=\n-----END CERTIFICATE REQUEST-----\n';
await expect(service.issueCert(makeReq({ csrPem: malformedCsr }))).rejects.toSatisfy(
(err: unknown) => {
if (!(err instanceof CaServiceError)) return false;
expect(err.code).toBe('INVALID_CSR');
return true;
},
);
});
// -------------------------------------------------------------------------
// Happy path
// -------------------------------------------------------------------------
it('returns IssuedCertDto on success (certChain present)', async () => {
if (!realCsrPem) realCsrPem = await generateRealCsr();
makeHttpsMock(200, {
crt: FAKE_CERT_PEM,
certChain: [FAKE_CERT_PEM, FAKE_CA_PEM],
});
const result = await service.issueCert(makeReq());
expect(result.certPem).toBe(FAKE_CERT_PEM);
expect(result.certChainPem).toContain(FAKE_CERT_PEM);
expect(result.certChainPem).toContain(FAKE_CA_PEM);
expect(typeof result.serialNumber).toBe('string');
});
// -------------------------------------------------------------------------
// certChainPem fallback — certChain absent, ca field present
// -------------------------------------------------------------------------
it('builds certChainPem from crt+ca when certChain is absent', async () => {
if (!realCsrPem) realCsrPem = await generateRealCsr();
makeHttpsMock(200, {
crt: FAKE_CERT_PEM,
ca: FAKE_CA_PEM,
});
const result = await service.issueCert(makeReq());
expect(result.certPem).toBe(FAKE_CERT_PEM);
expect(result.certChainPem).toContain(FAKE_CERT_PEM);
expect(result.certChainPem).toContain(FAKE_CA_PEM);
});
// -------------------------------------------------------------------------
// certChainPem fallback — no certChain, no ca field
// -------------------------------------------------------------------------
it('falls back to certPem alone when certChain and ca are absent', async () => {
if (!realCsrPem) realCsrPem = await generateRealCsr();
makeHttpsMock(200, { crt: FAKE_CERT_PEM });
const result = await service.issueCert(makeReq());
expect(result.certPem).toBe(FAKE_CERT_PEM);
expect(result.certChainPem).toBe(FAKE_CERT_PEM);
});
// -------------------------------------------------------------------------
// HTTP 401
// -------------------------------------------------------------------------
it('throws CaServiceError on HTTP 401', async () => {
if (!realCsrPem) realCsrPem = await generateRealCsr();
makeHttpsMock(401, { message: 'Unauthorized' });
await expect(service.issueCert(makeReq())).rejects.toSatisfy((err: unknown) => {
if (!(err instanceof CaServiceError)) return false;
expect(err.message).toMatch(/401/);
expect(err.remediation).toBeTruthy();
return true;
});
});
// -------------------------------------------------------------------------
// HTTP non-401 error (e.g. 422)
// -------------------------------------------------------------------------
it('throws CaServiceError on HTTP 422', async () => {
if (!realCsrPem) realCsrPem = await generateRealCsr();
makeHttpsMock(422, { message: 'Unprocessable Entity' });
await expect(service.issueCert(makeReq())).rejects.toBeInstanceOf(CaServiceError);
});
// -------------------------------------------------------------------------
// Malformed CSR — throws before HTTP call
// -------------------------------------------------------------------------
it('throws CaServiceError for malformed CSR without making HTTP call', async () => {
const requestSpy = vi.spyOn(httpsModule, 'request');
await expect(service.issueCert(makeReq({ csrPem: 'not-a-valid-csr' }))).rejects.toBeInstanceOf(
CaServiceError,
);
expect(requestSpy).not.toHaveBeenCalled();
});
// -------------------------------------------------------------------------
// Non-JSON response
// -------------------------------------------------------------------------
it('throws CaServiceError when step-ca returns non-JSON', async () => {
if (!realCsrPem) realCsrPem = await generateRealCsr();
makeHttpsMock(200, 'this is not json');
await expect(service.issueCert(makeReq())).rejects.toSatisfy((err: unknown) => {
if (!(err instanceof CaServiceError)) return false;
expect(err.message).toMatch(/non-JSON/);
return true;
});
});
// -------------------------------------------------------------------------
// HTTPS connection error
// -------------------------------------------------------------------------
it('throws CaServiceError on HTTPS connection error', async () => {
if (!realCsrPem) realCsrPem = await generateRealCsr();
makeHttpsMock(0, undefined, 'connect ECONNREFUSED 127.0.0.1:9000');
await expect(service.issueCert(makeReq())).rejects.toSatisfy((err: unknown) => {
if (!(err instanceof CaServiceError)) return false;
expect(err.message).toMatch(/HTTPS connection/);
expect(err.cause).toBeInstanceOf(Error);
return true;
});
});
// -------------------------------------------------------------------------
// JWT custom claims: mosaic_grant_id and mosaic_subject_user_id
// Verified with jose.jwtVerify for real signature verification (M6)
// -------------------------------------------------------------------------
it('OTT contains mosaic_grant_id, mosaic_subject_user_id, and jti; signature verifies with jose', async () => {
if (!realCsrPem) realCsrPem = await generateRealCsr();
let capturedBody: Record<string, unknown> | undefined;
const mockReq = {
write: vi.fn((data: string) => {
capturedBody = JSON.parse(data) as Record<string, unknown>;
}),
end: vi.fn(),
on: vi.fn(),
setTimeout: vi.fn(),
};
(httpsModule.request as unknown as Mock).mockImplementation(
(
_options: unknown,
callback: (res: {
statusCode: number;
on: (event: string, cb: (chunk?: Buffer) => void) => void;
}) => void,
) => {
const mockRes = {
statusCode: 200,
on: (event: string, cb: (chunk?: Buffer) => void) => {
if (event === 'data') {
cb(Buffer.from(JSON.stringify({ crt: FAKE_CERT_PEM })));
}
if (event === 'end') {
cb();
}
},
};
setImmediate(() => callback(mockRes));
return mockReq;
},
);
await service.issueCert(makeReq({ csrPem: realCsrPem }));
expect(capturedBody).toBeDefined();
const ott = capturedBody!['ott'] as string;
expect(typeof ott).toBe('string');
// Verify JWT structure
const parts = ott.split('.');
expect(parts).toHaveLength(3);
// Decode payload without signature check first
const payloadJson = Buffer.from(parts[1]!, 'base64url').toString('utf8');
const payload = JSON.parse(payloadJson) as Record<string, unknown>;
expect(payload['mosaic_grant_id']).toBe(GRANT_ID);
expect(payload['mosaic_subject_user_id']).toBe(SUBJECT_USER_ID);
expect(typeof payload['jti']).toBe('string'); // M2: jti present
expect(payload['jti']).toMatch(/^[0-9a-f-]{36}$/); // UUID format
// M3: top-level sha should NOT be present; step.sha should be present
expect(payload['sha']).toBeUndefined();
const step = payload['step'] as Record<string, unknown> | undefined;
expect(step?.['sha']).toBeDefined();
// M6: Verify signature with jose.jwtVerify using the public key
const { importJWK: importJose } = await import('jose');
const publicKey = await importJose(TEST_EC_PUBLIC_JWK, 'ES256');
const verified = await jwtVerify(ott, publicKey);
expect(verified.payload['mosaic_grant_id']).toBe(GRANT_ID);
});
// -------------------------------------------------------------------------
// CaServiceError has cause + remediation
// -------------------------------------------------------------------------
it('CaServiceError carries cause and remediation', () => {
const cause = new Error('original error');
const err = new CaServiceError('something went wrong', 'fix it like this', cause);
expect(err).toBeInstanceOf(Error);
expect(err).toBeInstanceOf(CaServiceError);
expect(err.message).toBe('something went wrong');
expect(err.remediation).toBe('fix it like this');
expect(err.cause).toBe(cause);
expect(err.name).toBe('CaServiceError');
});
// -------------------------------------------------------------------------
// Missing crt in response
// -------------------------------------------------------------------------
it('throws CaServiceError when response is missing the crt field', async () => {
if (!realCsrPem) realCsrPem = await generateRealCsr();
makeHttpsMock(200, { ca: FAKE_CA_PEM });
await expect(service.issueCert(makeReq())).rejects.toSatisfy((err: unknown) => {
if (!(err instanceof CaServiceError)) return false;
expect(err.message).toMatch(/missing the "crt" field/);
return true;
});
});
// -------------------------------------------------------------------------
// M6: provisionerPassword must never appear in CaServiceError messages
// -------------------------------------------------------------------------
it('provisionerPassword does not appear in any CaServiceError message', async () => {
// Temporarily set a recognizable password to test against
const originalPassword = process.env['STEP_CA_PROVISIONER_PASSWORD'];
process.env['STEP_CA_PROVISIONER_PASSWORD'] = 'super-secret-password-12345';
// Generate a bad CSR to trigger an error path
const caughtErrors: CaServiceError[] = [];
try {
await service.issueCert(makeReq({ csrPem: 'not-a-csr' }));
} catch (err) {
if (err instanceof CaServiceError) {
caughtErrors.push(err);
}
}
// Also try HTTP 401 path
if (!realCsrPem) realCsrPem = await generateRealCsr();
makeHttpsMock(401, { message: 'Unauthorized' });
try {
await service.issueCert(makeReq({ csrPem: realCsrPem }));
} catch (err) {
if (err instanceof CaServiceError) {
caughtErrors.push(err);
}
}
for (const err of caughtErrors) {
expect(err.message).not.toContain('super-secret-password-12345');
if (err.remediation) {
expect(err.remediation).not.toContain('super-secret-password-12345');
}
}
process.env['STEP_CA_PROVISIONER_PASSWORD'] = originalPassword;
});
// -------------------------------------------------------------------------
// M7: HTTPS-only enforcement in constructor
// -------------------------------------------------------------------------
it('throws in constructor if STEP_CA_URL uses http://', () => {
const originalUrl = process.env['STEP_CA_URL'];
process.env['STEP_CA_URL'] = 'http://step-ca:9000';
expect(() => new CaService()).toThrow(CaServiceError);
process.env['STEP_CA_URL'] = originalUrl;
});
// -------------------------------------------------------------------------
// TTL clamp: ttlSeconds is clamped to 900 s (15 min) maximum
// -------------------------------------------------------------------------
it('clamps ttlSeconds to 900 s regardless of input', async () => {
if (!realCsrPem) realCsrPem = await generateRealCsr();
let capturedBody: Record<string, unknown> | undefined;
const mockReq = {
write: vi.fn((data: string) => {
capturedBody = JSON.parse(data) as Record<string, unknown>;
}),
end: vi.fn(),
on: vi.fn(),
setTimeout: vi.fn(),
};
(httpsModule.request as unknown as Mock).mockImplementation(
(
_options: unknown,
callback: (res: {
statusCode: number;
on: (event: string, cb: (chunk?: Buffer) => void) => void;
}) => void,
) => {
const mockRes = {
statusCode: 200,
on: (event: string, cb: (chunk?: Buffer) => void) => {
if (event === 'data') {
cb(Buffer.from(JSON.stringify({ crt: FAKE_CERT_PEM })));
}
if (event === 'end') {
cb();
}
},
};
setImmediate(() => callback(mockRes));
return mockReq;
},
);
// Request 86400 s — should be clamped to 900
await service.issueCert(makeReq({ ttlSeconds: 86400 }));
expect(capturedBody).toBeDefined();
const validity = capturedBody!['validity'] as Record<string, unknown>;
expect(validity['duration']).toBe('900s');
});
});

View File

@@ -0,0 +1,635 @@
/**
* CaService — Step-CA client for federation grant certificate issuance.
*
* Responsibilities:
* 1. Build a JWK-provisioner One-Time Token (OTT) signed with the provisioner
* private key (ES256/ES384/RS256 per JWK kty/crv) carrying Mosaic-specific
* claims (`mosaic_grant_id`, `mosaic_subject_user_id`, `step.sha`) per the
* step-ca JWK provisioner protocol.
* 2. POST the CSR + OTT to the step-ca `/1.0/sign` endpoint over HTTPS,
* pinning the trust to the CA root cert supplied via env.
* 3. Return an IssuedCertDto containing the leaf cert, full chain, and
* serial number.
*
* Environment variables (all required at runtime — validated in constructor):
* STEP_CA_URL https://step-ca:9000
* STEP_CA_PROVISIONER_KEY_JSON JWK provisioner private key (JSON)
* STEP_CA_ROOT_CERT_PATH Absolute path to the CA root PEM
*
* Optional (only used for JWK PBES2 decrypt at startup if key is encrypted):
* STEP_CA_PROVISIONER_PASSWORD JWK provisioner password (raw string)
*
* Custom OID registry (PRD §6, docs/federation/SETUP.md):
* 1.3.6.1.4.1.99999.1 — mosaic_grant_id
* 1.3.6.1.4.1.99999.2 — mosaic_subject_user_id
*
* Fail-loud contract:
* Every error path throws CaServiceError with a human-readable `remediation`
* field. Silent OID-stripping is NEVER allowed — if the sign response does
* not include the cert, we throw rather than return a cert that may be
* missing the custom extensions.
*/
import { Injectable, Logger } from '@nestjs/common';
import * as crypto from 'node:crypto';
import * as fs from 'node:fs';
import * as https from 'node:https';
import { SignJWT, importJWK } from 'jose';
import { Pkcs10CertificateRequest } from '@peculiar/x509';
import type { IssueCertRequestDto } from './ca.dto.js';
import { IssuedCertDto } from './ca.dto.js';
// ---------------------------------------------------------------------------
// Custom error class
// ---------------------------------------------------------------------------
export class CaServiceError extends Error {
readonly cause: unknown;
readonly remediation: string;
readonly code?: string;
constructor(message: string, remediation: string, cause?: unknown, code?: string) {
super(message);
this.name = 'CaServiceError';
this.cause = cause;
this.remediation = remediation;
this.code = code;
}
}
// ---------------------------------------------------------------------------
// Internal types
// ---------------------------------------------------------------------------
interface StepSignResponse {
crt: string;
ca?: string;
certChain?: string[];
}
interface JwkKey {
kty: string;
kid?: string;
use?: string;
alg?: string;
k?: string; // symmetric
n?: string; // RSA
e?: string;
d?: string;
x?: string; // EC
y?: string;
crv?: string;
[key: string]: unknown;
}
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
/** UUID regex for validation */
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
/**
* Derive the JWT algorithm string from a JWK's kty/crv fields.
* EC P-256 → ES256, EC P-384 → ES384, RSA → RS256.
*/
function algFromJwk(jwk: JwkKey): string {
if (jwk.alg) return jwk.alg;
if (jwk.kty === 'EC') {
if (jwk.crv === 'P-384') return 'ES384';
return 'ES256'; // default for P-256 and Ed25519-style EC keys
}
if (jwk.kty === 'RSA') return 'RS256';
throw new CaServiceError(
`Unsupported JWK kty: ${jwk.kty}`,
'STEP_CA_PROVISIONER_KEY_JSON must be an EC (P-256/P-384) or RSA JWK private key.',
);
}
/**
* Compute SHA-256 fingerprint of the DER-encoded CSR body.
* step-ca uses this as the `step.sha` claim to bind the OTT to a specific CSR.
*/
function csrFingerprint(csrPem: string): string {
// Strip PEM headers and decode base64 body
const b64 = csrPem
.replace(/-----BEGIN CERTIFICATE REQUEST-----/, '')
.replace(/-----END CERTIFICATE REQUEST-----/, '')
.replace(/\s+/g, '');
let derBuf: Buffer;
try {
derBuf = Buffer.from(b64, 'base64');
} catch (err) {
throw new CaServiceError(
'Failed to base64-decode the CSR PEM body',
'Verify that csrPem is a valid PKCS#10 PEM-encoded certificate request.',
err,
);
}
if (derBuf.length === 0) {
throw new CaServiceError(
'CSR PEM decoded to empty buffer — malformed input',
'Provide a valid non-empty PKCS#10 PEM-encoded certificate request.',
);
}
return crypto.createHash('sha256').update(derBuf).digest('hex');
}
/**
* Send a JSON POST to the step-ca sign endpoint.
* Returns the parsed response body or throws CaServiceError.
*/
function httpsPost(url: string, body: unknown, agent: https.Agent): Promise<StepSignResponse> {
return new Promise((resolve, reject) => {
const bodyStr = JSON.stringify(body);
const parsed = new URL(url);
const options: https.RequestOptions = {
hostname: parsed.hostname,
port: parsed.port ? parseInt(parsed.port, 10) : 443,
path: parsed.pathname,
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Content-Length': Buffer.byteLength(bodyStr),
},
agent,
timeout: 5000,
};
const req = https.request(options, (res) => {
const chunks: Buffer[] = [];
res.on('data', (chunk: Buffer) => chunks.push(chunk));
res.on('end', () => {
const raw = Buffer.concat(chunks).toString('utf8');
if (res.statusCode === 401) {
reject(
new CaServiceError(
`step-ca returned HTTP 401 — invalid or expired OTT`,
'Check STEP_CA_PROVISIONER_KEY_JSON. Ensure the mosaic-fed provisioner is configured in the CA.',
),
);
return;
}
if (res.statusCode && res.statusCode >= 400) {
reject(
new CaServiceError(
`step-ca returned HTTP ${res.statusCode}: ${raw.slice(0, 256)}`,
`Review the step-ca logs. Status ${res.statusCode} may indicate a CSR policy violation or misconfigured provisioner.`,
),
);
return;
}
let parsed: unknown;
try {
parsed = JSON.parse(raw) as unknown;
} catch (err) {
reject(
new CaServiceError(
'step-ca returned a non-JSON response',
'Verify STEP_CA_URL points to a running step-ca instance and that TLS is properly configured.',
err,
),
);
return;
}
resolve(parsed as StepSignResponse);
});
});
req.setTimeout(5000, () => {
req.destroy(new Error('Request timed out after 5000ms'));
});
req.on('error', (err: Error) => {
reject(
new CaServiceError(
`HTTPS connection to step-ca failed: ${err.message}`,
'Ensure STEP_CA_URL is reachable and STEP_CA_ROOT_CERT_PATH points to the correct CA root certificate.',
err,
),
);
});
req.write(bodyStr);
req.end();
});
}
/**
* Extract a decimal serial number from a PEM certificate.
* Throws CaServiceError on failure — never silently returns 'unknown'.
*/
function extractSerial(certPem: string): string {
let cert: crypto.X509Certificate;
try {
cert = new crypto.X509Certificate(certPem);
} catch (err) {
throw new CaServiceError(
'Failed to parse the issued certificate PEM',
'The certificate returned by step-ca could not be parsed. Check that step-ca is returning a valid PEM certificate.',
err,
'CERT_PARSE',
);
}
return cert.serialNumber;
}
// ---------------------------------------------------------------------------
// Service
// ---------------------------------------------------------------------------
@Injectable()
export class CaService {
private readonly logger = new Logger(CaService.name);
private readonly caUrl: string;
private readonly rootCertPath: string;
private readonly httpsAgent: https.Agent;
private readonly jwk: JwkKey;
private cachedPrivateKey: crypto.KeyObject | null = null;
private readonly jwtAlg: string;
private readonly kid: string;
constructor() {
const caUrl = process.env['STEP_CA_URL'];
const provisionerKeyJson = process.env['STEP_CA_PROVISIONER_KEY_JSON'];
const rootCertPath = process.env['STEP_CA_ROOT_CERT_PATH'];
if (!caUrl) {
throw new CaServiceError(
'STEP_CA_URL is not set',
'Set STEP_CA_URL to the base URL of the step-ca instance, e.g. https://step-ca:9000',
);
}
// Enforce HTTPS-only URL
let parsedUrl: URL;
try {
parsedUrl = new URL(caUrl);
} catch (err) {
throw new CaServiceError(
`STEP_CA_URL is not a valid URL: ${caUrl}`,
'Set STEP_CA_URL to a valid HTTPS URL, e.g. https://step-ca:9000',
err,
);
}
if (parsedUrl.protocol !== 'https:') {
throw new CaServiceError(
`STEP_CA_URL must use HTTPS — got: ${parsedUrl.protocol}`,
'Set STEP_CA_URL to an https:// URL. Unencrypted connections to the CA are not permitted.',
);
}
if (!provisionerKeyJson) {
throw new CaServiceError(
'STEP_CA_PROVISIONER_KEY_JSON is not set',
'Set STEP_CA_PROVISIONER_KEY_JSON to the JSON-encoded JWK for the mosaic-fed provisioner.',
);
}
if (!rootCertPath) {
throw new CaServiceError(
'STEP_CA_ROOT_CERT_PATH is not set',
'Set STEP_CA_ROOT_CERT_PATH to the absolute path of the step-ca root CA certificate PEM file.',
);
}
// Parse JWK once — do NOT store the raw JSON string as a class field
let jwk: JwkKey;
try {
jwk = JSON.parse(provisionerKeyJson) as JwkKey;
} catch (err) {
throw new CaServiceError(
'STEP_CA_PROVISIONER_KEY_JSON is not valid JSON',
'Set STEP_CA_PROVISIONER_KEY_JSON to the JSON-serialised JWK object for the mosaic-fed provisioner.',
err,
);
}
// Derive algorithm from JWK metadata
const jwtAlg = algFromJwk(jwk);
const kid = jwk.kid ?? 'mosaic-fed';
// Import the JWK into a native KeyObject — fail loudly if it cannot be loaded.
// We do this synchronously here by calling the async importJWK via a blocking workaround.
// Actually importJWK is async, so we store it for use during token building.
// We keep the raw jwk object for later async import inside buildOtt.
// NOTE: We do NOT store provisionerKeyJson string as a class field.
this.jwk = jwk;
this.jwtAlg = jwtAlg;
this.kid = kid;
this.caUrl = caUrl;
this.rootCertPath = rootCertPath;
// Read the root cert and pin it for all HTTPS connections.
let rootCert: string;
try {
rootCert = fs.readFileSync(this.rootCertPath, 'utf8');
} catch (err) {
throw new CaServiceError(
`Cannot read STEP_CA_ROOT_CERT_PATH: ${rootCertPath}`,
'Ensure the file exists and is readable by the gateway process.',
err,
);
}
this.httpsAgent = new https.Agent({
ca: rootCert,
rejectUnauthorized: true,
});
this.logger.log(`CaService initialised — CA URL: ${this.caUrl}`);
}
/**
* Lazily import the private key from JWK on first use.
* The key is cached in cachedPrivateKey after first import.
*/
private async getPrivateKey(): Promise<crypto.KeyObject> {
if (this.cachedPrivateKey !== null) return this.cachedPrivateKey;
try {
const key = await importJWK(this.jwk, this.jwtAlg);
// importJWK returns KeyLike (crypto.KeyObject | Uint8Array) — in Node.js it's KeyObject
this.cachedPrivateKey = key as unknown as crypto.KeyObject;
return this.cachedPrivateKey;
} catch (err) {
throw new CaServiceError(
'Failed to import STEP_CA_PROVISIONER_KEY_JSON as a cryptographic key',
'Ensure STEP_CA_PROVISIONER_KEY_JSON contains a valid JWK private key (EC P-256/P-384 or RSA).',
err,
);
}
}
/**
* Build the JWK-provisioner OTT signed with the provisioner private key.
* Algorithm is derived from the JWK kty/crv fields.
*/
private async buildOtt(params: {
csrPem: string;
grantId: string;
subjectUserId: string;
ttlSeconds: number;
csrCn: string;
}): Promise<string> {
const { csrPem, grantId, subjectUserId, ttlSeconds, csrCn } = params;
// Validate UUID shape for grant id and subject user id
if (!UUID_RE.test(grantId)) {
throw new CaServiceError(
`grantId is not a valid UUID: ${grantId}`,
'Provide a valid UUID (RFC 4122) for grantId.',
undefined,
'INVALID_GRANT_ID',
);
}
if (!UUID_RE.test(subjectUserId)) {
throw new CaServiceError(
`subjectUserId is not a valid UUID: ${subjectUserId}`,
'Provide a valid UUID (RFC 4122) for subjectUserId.',
undefined,
'INVALID_GRANT_ID',
);
}
const sha = csrFingerprint(csrPem);
const now = Math.floor(Date.now() / 1000);
const privateKey = await this.getPrivateKey();
const ott = await new SignJWT({
iss: this.kid,
sub: csrCn, // M1: set sub to identity from CSR CN
aud: [`${this.caUrl}/1.0/sign`],
iat: now,
nbf: now - 30, // 30 s clock-skew tolerance
exp: now + Math.min(ttlSeconds, 3600), // OTT validity ≤ 1 h
jti: crypto.randomUUID(), // M2: unique token ID
// step.sha is the canonical field name used in the template — M3: keep only step.sha
step: { sha },
// Mosaic custom claims consumed by federation.tpl
mosaic_grant_id: grantId,
mosaic_subject_user_id: subjectUserId,
})
.setProtectedHeader({ alg: this.jwtAlg, typ: 'JWT', kid: this.kid })
.sign(privateKey);
return ott;
}
/**
* Validate a PEM-encoded CSR using @peculiar/x509.
* Verifies the self-signature, key type/size, and signature algorithm.
* Optionally verifies that the CSR's SANs match the expected set.
*
* Throws CaServiceError with code 'INVALID_CSR' on failure.
*/
private async validateCsr(pem: string, expectedSans?: string[]): Promise<string> {
let csr: Pkcs10CertificateRequest;
try {
csr = new Pkcs10CertificateRequest(pem);
} catch (err) {
throw new CaServiceError(
'Failed to parse CSR PEM as a valid PKCS#10 certificate request',
'Provide a valid PEM-encoded PKCS#10 CSR.',
err,
'INVALID_CSR',
);
}
// Verify self-signature
let valid: boolean;
try {
valid = await csr.verify();
} catch (err) {
throw new CaServiceError(
'CSR signature verification threw an error',
'The CSR self-signature could not be verified. Ensure the CSR is properly formed.',
err,
'INVALID_CSR',
);
}
if (!valid) {
throw new CaServiceError(
'CSR self-signature is invalid',
'The CSR must be self-signed with the corresponding private key.',
undefined,
'INVALID_CSR',
);
}
// Validate signature algorithm — reject MD5 and SHA-1
// signatureAlgorithm is HashedAlgorithm which extends Algorithm.
// Cast through unknown to access .name and .hash.name without DOM lib globals.
const sigAlgAny = csr.signatureAlgorithm as unknown as {
name?: string;
hash?: { name?: string };
};
const sigAlgName = (sigAlgAny.name ?? '').toLowerCase();
const hashName = (sigAlgAny.hash?.name ?? '').toLowerCase();
if (
sigAlgName.includes('md5') ||
sigAlgName.includes('sha1') ||
hashName === 'sha-1' ||
hashName === 'sha1'
) {
throw new CaServiceError(
`CSR uses a forbidden signature algorithm: ${sigAlgAny.name ?? 'unknown'}`,
'Use SHA-256 or stronger. MD5 and SHA-1 are not permitted.',
undefined,
'INVALID_CSR',
);
}
// Validate public key algorithm and strength via the algorithm descriptor on the key.
// csr.publicKey.algorithm is type Algorithm (WebCrypto) — use name-based checks.
// We cast to an extended interface to access curve/modulus info without DOM globals.
const pubKeyAlgo = csr.publicKey.algorithm as {
name: string;
namedCurve?: string;
modulusLength?: number;
};
const keyAlgoName = pubKeyAlgo.name;
if (keyAlgoName === 'RSASSA-PKCS1-v1_5' || keyAlgoName === 'RSA-PSS') {
const modulusLength = pubKeyAlgo.modulusLength ?? 0;
if (modulusLength < 2048) {
throw new CaServiceError(
`CSR RSA key is too short: ${modulusLength} bits (minimum 2048)`,
'Use an RSA key of at least 2048 bits.',
undefined,
'INVALID_CSR',
);
}
} else if (keyAlgoName === 'ECDSA') {
const namedCurve = pubKeyAlgo.namedCurve ?? '';
const allowedCurves = new Set(['P-256', 'P-384']);
if (!allowedCurves.has(namedCurve)) {
throw new CaServiceError(
`CSR EC key uses disallowed curve: ${namedCurve}`,
'Use EC P-256 or P-384. Other curves are not permitted.',
undefined,
'INVALID_CSR',
);
}
} else if (keyAlgoName === 'Ed25519') {
// Ed25519 is explicitly allowed
} else {
throw new CaServiceError(
`CSR uses unsupported key algorithm: ${keyAlgoName}`,
'Use EC (P-256/P-384), Ed25519, or RSA (≥2048 bit) keys.',
undefined,
'INVALID_CSR',
);
}
// Extract SANs if expectedSans provided
if (expectedSans && expectedSans.length > 0) {
// Get SANs from CSR extensions
const sanExtension = csr.extensions?.find(
(ext) => ext.type === '2.5.29.17', // Subject Alternative Name OID
);
const csrSans: string[] = [];
if (sanExtension) {
// Parse the raw SAN extension — store as stringified for comparison
// @peculiar/x509 exposes SANs through the parsed extension
const sanExt = sanExtension as { names?: Array<{ type: string; value: string }> };
if (sanExt.names) {
for (const name of sanExt.names) {
csrSans.push(name.value);
}
}
}
const csrSanSet = new Set(csrSans);
const expectedSanSet = new Set(expectedSans);
const missing = expectedSans.filter((s) => !csrSanSet.has(s));
const extra = csrSans.filter((s) => !expectedSanSet.has(s));
if (missing.length > 0 || extra.length > 0) {
throw new CaServiceError(
`CSR SANs do not match expected set. Missing: [${missing.join(', ')}], Extra: [${extra.join(', ')}]`,
'The CSR must include exactly the SANs specified in the issuance request.',
undefined,
'INVALID_CSR',
);
}
}
// Return the CN from the CSR subject for use as JWT sub
const cn = csr.subjectName.getField('CN')?.[0] ?? '';
return cn;
}
/**
* Submit a CSR to step-ca and return the issued certificate.
*
* Throws `CaServiceError` on any failure (network, auth, malformed input).
* Never silently swallows errors — fail-loud is a hard contract per M2-02 review.
*/
async issueCert(req: IssueCertRequestDto): Promise<IssuedCertDto> {
// Clamp TTL to 15-minute maximum (H2)
const ttl = Math.min(req.ttlSeconds ?? 300, 900);
this.logger.debug(
`issueCert — grantId=${req.grantId} subjectUserId=${req.subjectUserId} ttl=${ttl}s`,
);
// Validate CSR — real cryptographic validation (H3)
const csrCn = await this.validateCsr(req.csrPem);
const ott = await this.buildOtt({
csrPem: req.csrPem,
grantId: req.grantId,
subjectUserId: req.subjectUserId,
ttlSeconds: ttl,
csrCn,
});
const signUrl = `${this.caUrl}/1.0/sign`;
const requestBody = {
csr: req.csrPem,
ott,
validity: {
duration: `${ttl}s`,
},
};
this.logger.debug(`Posting CSR to ${signUrl}`);
const response = await httpsPost(signUrl, requestBody, this.httpsAgent);
if (!response.crt) {
throw new CaServiceError(
'step-ca sign response missing the "crt" field',
'This is unexpected — the step-ca instance may be misconfigured or running an incompatible version.',
);
}
// Build certChainPem: prefer certChain array, fall back to ca field, fall back to crt alone.
let certChainPem: string;
if (response.certChain && response.certChain.length > 0) {
certChainPem = response.certChain.join('\n');
} else if (response.ca) {
certChainPem = response.crt + '\n' + response.ca;
} else {
certChainPem = response.crt;
}
const serialNumber = extractSerial(response.crt);
this.logger.log(`Certificate issued — serial=${serialNumber} grantId=${req.grantId}`);
const result = new IssuedCertDto();
result.certPem = response.crt;
result.certChainPem = certChainPem;
result.serialNumber = serialNumber;
return result;
}
}

View File

@@ -0,0 +1,54 @@
/**
* EnrollmentController — federation enrollment HTTP layer (FED-M2-07).
*
* Routes:
* POST /api/federation/enrollment/tokens — admin creates a single-use token
* POST /api/federation/enrollment/:token — unauthenticated; token IS the auth
*/
import {
Body,
Controller,
HttpCode,
HttpStatus,
Inject,
Param,
Post,
UseGuards,
} from '@nestjs/common';
import { AdminGuard } from '../admin/admin.guard.js';
import { EnrollmentService } from './enrollment.service.js';
import { CreateEnrollmentTokenDto, RedeemEnrollmentTokenDto } from './enrollment.dto.js';
@Controller('api/federation/enrollment')
export class EnrollmentController {
constructor(@Inject(EnrollmentService) private readonly enrollmentService: EnrollmentService) {}
/**
* Admin-only: generate a single-use enrollment token for a pending grant.
* The token should be distributed out-of-band to the remote peer operator.
*
* POST /api/federation/enrollment/tokens
*/
@Post('tokens')
@UseGuards(AdminGuard)
@HttpCode(HttpStatus.CREATED)
async createToken(@Body() dto: CreateEnrollmentTokenDto) {
return this.enrollmentService.createToken(dto);
}
/**
* Unauthenticated: remote peer redeems a token by submitting its CSR.
* The token itself is the credential — no session or bearer token required.
*
* POST /api/federation/enrollment/:token
*
* Returns the signed leaf cert and full chain PEM on success.
* Returns 410 Gone if the token was already used or has expired.
*/
@Post(':token')
@HttpCode(HttpStatus.OK)
async redeem(@Param('token') token: string, @Body() dto: RedeemEnrollmentTokenDto) {
return this.enrollmentService.redeem(token, dto.csrPem);
}
}

View File

@@ -0,0 +1,35 @@
/**
* DTOs for the federation enrollment flow (FED-M2-07).
*
* CreateEnrollmentTokenDto — admin generates a single-use enrollment token
* RedeemEnrollmentTokenDto — remote peer submits CSR to redeem the token
*/
import { IsInt, IsNotEmpty, IsOptional, IsString, IsUUID, Max, Min } from 'class-validator';
export class CreateEnrollmentTokenDto {
/** UUID of the federation grant this token will activate on redemption. */
@IsUUID()
grantId!: string;
/** UUID of the peer record that will receive the issued cert on redemption. */
@IsUUID()
peerId!: string;
/**
* Token lifetime in seconds. Default 900 (15 min). Min 60. Max 900.
* After this time the token is rejected even if unused.
*/
@IsOptional()
@IsInt()
@Min(60)
@Max(900)
ttlSeconds: number = 900;
}
export class RedeemEnrollmentTokenDto {
/** PEM-encoded PKCS#10 Certificate Signing Request from the remote peer. */
@IsString()
@IsNotEmpty()
csrPem!: string;
}

View File

@@ -0,0 +1,230 @@
/**
* EnrollmentService — single-use enrollment token lifecycle (FED-M2-07).
*
* Responsibilities:
* 1. Generate time-limited single-use enrollment tokens (admin action).
* 2. Redeem a token: validate → atomically claim token → issue cert via
* CaService → transactionally activate grant + update peer + write audit.
*
* Replay protection: the token is claimed (UPDATE WHERE used_at IS NULL) BEFORE
* cert issuance. This prevents double cert minting on concurrent requests.
* If cert issuance fails after claim, the token is consumed and the grant
* stays pending — admin must create a new grant.
*/
import {
BadRequestException,
GoneException,
Inject,
Injectable,
Logger,
NotFoundException,
} from '@nestjs/common';
import * as crypto from 'node:crypto';
// X509Certificate is available as a named export in Node.js ≥ 15.6
const { X509Certificate } = crypto;
import {
type Db,
and,
eq,
isNull,
sql,
federationEnrollmentTokens,
federationGrants,
federationPeers,
federationAuditLog,
} from '@mosaicstack/db';
import { DB } from '../database/database.module.js';
import { CaService } from './ca.service.js';
import { GrantsService } from './grants.service.js';
import { FederationScopeError } from './scope-schema.js';
import type { CreateEnrollmentTokenDto } from './enrollment.dto.js';
export interface EnrollmentTokenResult {
token: string;
expiresAt: string;
}
export interface RedeemResult {
certPem: string;
certChainPem: string;
}
@Injectable()
export class EnrollmentService {
private readonly logger = new Logger(EnrollmentService.name);
constructor(
@Inject(DB) private readonly db: Db,
private readonly caService: CaService,
private readonly grantsService: GrantsService,
) {}
/**
* Generate a single-use enrollment token for an admin to distribute
* out-of-band to the remote peer operator.
*/
async createToken(dto: CreateEnrollmentTokenDto): Promise<EnrollmentTokenResult> {
const ttl = Math.min(dto.ttlSeconds, 900);
const token = crypto.randomBytes(32).toString('hex');
const expiresAt = new Date(Date.now() + ttl * 1000);
await this.db.insert(federationEnrollmentTokens).values({
token,
grantId: dto.grantId,
peerId: dto.peerId,
expiresAt,
});
this.logger.log(
`Enrollment token created — grantId=${dto.grantId} peerId=${dto.peerId} expiresAt=${expiresAt.toISOString()}`,
);
return { token, expiresAt: expiresAt.toISOString() };
}
/**
* Redeem an enrollment token.
*
* Full flow:
* 1. Fetch token row — NotFoundException if not found
* 2. usedAt set → GoneException (already used)
* 3. expiresAt < now → GoneException (expired)
* 4. Load grant — verify status is 'pending'
* 5. Atomically claim token (UPDATE WHERE used_at IS NULL RETURNING token)
* — if no rows returned, concurrent request won → GoneException
* 6. Issue cert via CaService (network call, outside transaction)
* — if this fails, token is consumed; grant stays pending; admin must recreate
* 7. Transaction: activate grant + update peer record + write audit log
* 8. Return { certPem, certChainPem }
*/
async redeem(token: string, csrPem: string): Promise<RedeemResult> {
// 1. Fetch token row
const [row] = await this.db
.select()
.from(federationEnrollmentTokens)
.where(eq(federationEnrollmentTokens.token, token))
.limit(1);
if (!row) {
throw new NotFoundException('Enrollment token not found');
}
// 2. Already used?
if (row.usedAt !== null) {
throw new GoneException('Enrollment token has already been used');
}
// 3. Expired?
if (row.expiresAt < new Date()) {
throw new GoneException('Enrollment token has expired');
}
// 4. Load grant and verify it is still pending
let grant;
try {
grant = await this.grantsService.getGrant(row.grantId);
} catch (err) {
if (err instanceof FederationScopeError) {
throw new BadRequestException(err.message);
}
throw err;
}
if (grant.status !== 'pending') {
throw new GoneException(
`Grant ${row.grantId} is no longer pending (status: ${grant.status})`,
);
}
// 5. Atomically claim the token BEFORE cert issuance to prevent double-minting.
// WHERE used_at IS NULL ensures only one concurrent request wins.
// Using .returning() works on both node-postgres and PGlite without rowCount inspection.
const claimed = await this.db
.update(federationEnrollmentTokens)
.set({ usedAt: sql`NOW()` })
.where(
and(eq(federationEnrollmentTokens.token, token), isNull(federationEnrollmentTokens.usedAt)),
)
.returning({ token: federationEnrollmentTokens.token });
if (claimed.length === 0) {
throw new GoneException('Enrollment token has already been used (concurrent request)');
}
// 6. Issue certificate via CaService (network call — outside any transaction).
// If this throws, the token is already consumed. The grant stays pending.
// Admin must revoke the grant and create a new one.
let issued;
try {
issued = await this.caService.issueCert({
csrPem,
grantId: row.grantId,
subjectUserId: grant.subjectUserId,
ttlSeconds: 300,
});
} catch (err) {
this.logger.error(
`issueCert failed after token ${token} was claimed — grant ${row.grantId} is stranded pending`,
err instanceof Error ? err.stack : String(err),
);
if (err instanceof FederationScopeError) {
throw new BadRequestException((err as Error).message);
}
throw err;
}
// 7. Atomically activate grant, update peer record, and write audit log.
const certNotAfter = this.extractCertNotAfter(issued.certPem);
await this.db.transaction(async (tx) => {
await tx
.update(federationGrants)
.set({ status: 'active' })
.where(eq(federationGrants.id, row.grantId));
await tx
.update(federationPeers)
.set({
certPem: issued.certPem,
certSerial: issued.serialNumber,
certNotAfter,
state: 'active',
})
.where(eq(federationPeers.id, row.peerId));
await tx.insert(federationAuditLog).values({
requestId: crypto.randomUUID(),
peerId: row.peerId,
grantId: row.grantId,
verb: 'enrollment',
resource: 'federation_grant',
statusCode: 200,
outcome: 'allowed',
});
});
this.logger.log(
`Enrollment complete — peerId=${row.peerId} grantId=${row.grantId} serial=${issued.serialNumber}`,
);
// 8. Return cert material
return {
certPem: issued.certPem,
certChainPem: issued.certChainPem,
};
}
/**
* Extract the notAfter date from a PEM certificate.
* Falls back to 90 days from now if parsing fails.
*/
private extractCertNotAfter(certPem: string): Date {
try {
const cert = new X509Certificate(certPem);
return new Date(cert.validTo);
} catch {
// Fallback: 90 days from now
return new Date(Date.now() + 90 * 24 * 60 * 60 * 1000);
}
}
}

View File

@@ -0,0 +1,39 @@
/**
* DTOs for the federation admin controller (FED-M2-08).
*/
import { IsInt, IsNotEmpty, IsOptional, IsString, IsUrl, Max, Min } from 'class-validator';
export class CreatePeerKeypairDto {
@IsString()
@IsNotEmpty()
commonName!: string;
@IsString()
@IsNotEmpty()
displayName!: string;
@IsOptional()
@IsUrl()
endpointUrl?: string;
}
export class StorePeerCertDto {
@IsString()
@IsNotEmpty()
certPem!: string;
}
export class GenerateEnrollmentTokenDto {
@IsOptional()
@IsInt()
@Min(60)
@Max(900)
ttlSeconds: number = 900;
}
export class RevokeGrantBodyDto {
@IsOptional()
@IsString()
reason?: string;
}

View File

@@ -0,0 +1,266 @@
/**
* FederationController — admin REST API for federation management (FED-M2-08).
*
* Routes (all under /api/admin/federation, all require AdminGuard):
*
* Grant management:
* POST /api/admin/federation/grants
* GET /api/admin/federation/grants
* GET /api/admin/federation/grants/:id
* PATCH /api/admin/federation/grants/:id/revoke
* POST /api/admin/federation/grants/:id/tokens
*
* Peer management:
* GET /api/admin/federation/peers
* POST /api/admin/federation/peers/keypair
* PATCH /api/admin/federation/peers/:id/cert
*
* NOTE: The enrollment REDEMPTION endpoint (POST /api/federation/enrollment/:token)
* is handled by EnrollmentController — not duplicated here.
*/
import {
Body,
Controller,
Get,
HttpCode,
HttpStatus,
Inject,
NotFoundException,
Param,
Patch,
Post,
Query,
UseGuards,
} from '@nestjs/common';
import { webcrypto } from 'node:crypto';
import { X509Certificate } from 'node:crypto';
import { Pkcs10CertificateRequestGenerator } from '@peculiar/x509';
import { type Db, eq, federationPeers } from '@mosaicstack/db';
import { DB } from '../database/database.module.js';
import { AdminGuard } from '../admin/admin.guard.js';
import { GrantsService } from './grants.service.js';
import { EnrollmentService } from './enrollment.service.js';
import { sealClientKey } from './peer-key.util.js';
import { CreateGrantDto, ListGrantsDto } from './grants.dto.js';
import {
CreatePeerKeypairDto,
GenerateEnrollmentTokenDto,
RevokeGrantBodyDto,
StorePeerCertDto,
} from './federation-admin.dto.js';
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
/**
* Convert an ArrayBuffer to a Base64 string (for PEM encoding).
*/
function arrayBufferToBase64(buf: ArrayBuffer): string {
const bytes = new Uint8Array(buf);
let binary = '';
for (const b of bytes) {
binary += String.fromCharCode(b);
}
return Buffer.from(binary, 'binary').toString('base64');
}
/**
* Wrap a Base64 string in PEM armour.
*/
function toPem(label: string, b64: string): string {
const lines = b64.match(/.{1,64}/g) ?? [];
return `-----BEGIN ${label}-----\n${lines.join('\n')}\n-----END ${label}-----\n`;
}
// ---------------------------------------------------------------------------
// Controller
// ---------------------------------------------------------------------------
@Controller('api/admin/federation')
@UseGuards(AdminGuard)
export class FederationController {
constructor(
@Inject(DB) private readonly db: Db,
@Inject(GrantsService) private readonly grantsService: GrantsService,
@Inject(EnrollmentService) private readonly enrollmentService: EnrollmentService,
) {}
// ─── Grant management ────────────────────────────────────────────────────
/**
* POST /api/admin/federation/grants
* Create a new grant in pending state.
*/
@Post('grants')
@HttpCode(HttpStatus.CREATED)
async createGrant(@Body() body: CreateGrantDto) {
return this.grantsService.createGrant(body);
}
/**
* GET /api/admin/federation/grants
* List grants with optional filters.
*/
@Get('grants')
async listGrants(@Query() query: ListGrantsDto) {
return this.grantsService.listGrants(query);
}
/**
* GET /api/admin/federation/grants/:id
* Get a single grant by ID.
*/
@Get('grants/:id')
async getGrant(@Param('id') id: string) {
return this.grantsService.getGrant(id);
}
/**
* PATCH /api/admin/federation/grants/:id/revoke
* Revoke an active grant.
*/
@Patch('grants/:id/revoke')
async revokeGrant(@Param('id') id: string, @Body() body: RevokeGrantBodyDto) {
return this.grantsService.revokeGrant(id, body.reason);
}
/**
* POST /api/admin/federation/grants/:id/tokens
* Generate a single-use enrollment token for a pending grant.
* Returns the token plus an enrollmentUrl the operator shares out-of-band.
*/
@Post('grants/:id/tokens')
@HttpCode(HttpStatus.CREATED)
async generateToken(@Param('id') id: string, @Body() body: GenerateEnrollmentTokenDto) {
const grant = await this.grantsService.getGrant(id);
const result = await this.enrollmentService.createToken({
grantId: id,
peerId: grant.peerId,
ttlSeconds: body.ttlSeconds ?? 900,
});
const baseUrl = process.env['BETTER_AUTH_URL'] ?? 'http://localhost:14242';
const enrollmentUrl = `${baseUrl}/api/federation/enrollment/${result.token}`;
return {
token: result.token,
expiresAt: result.expiresAt,
enrollmentUrl,
};
}
// ─── Peer management ─────────────────────────────────────────────────────
/**
* GET /api/admin/federation/peers
* List all federation peer rows.
*/
@Get('peers')
async listPeers() {
return this.db.select().from(federationPeers).orderBy(federationPeers.commonName);
}
/**
* POST /api/admin/federation/peers/keypair
* Generate a new peer entry with EC P-256 key pair and a PKCS#10 CSR.
*
* Flow:
* 1. Generate EC P-256 key pair via webcrypto
* 2. Generate a self-signed CSR via @peculiar/x509
* 3. Export private key as PEM
* 4. sealClientKey(privatePem) → sealed blob
* 5. Insert pending peer row
* 6. Return { peerId, csrPem }
*/
@Post('peers/keypair')
@HttpCode(HttpStatus.CREATED)
async createPeerKeypair(@Body() body: CreatePeerKeypairDto) {
// 1. Generate EC P-256 key pair via Web Crypto
const keyPair = await webcrypto.subtle.generateKey(
{ name: 'ECDSA', namedCurve: 'P-256' },
true, // extractable
['sign', 'verify'],
);
// 2. Generate PKCS#10 CSR
const csr = await Pkcs10CertificateRequestGenerator.create({
name: `CN=${body.commonName}`,
keys: keyPair,
signingAlgorithm: { name: 'ECDSA', hash: 'SHA-256' },
});
const csrPem = csr.toString('pem');
// 3. Export private key as PKCS#8 PEM
const pkcs8Der = await webcrypto.subtle.exportKey('pkcs8', keyPair.privateKey);
const privatePem = toPem('PRIVATE KEY', arrayBufferToBase64(pkcs8Der));
// 4. Seal the private key
const sealed = sealClientKey(privatePem);
// 5. Insert pending peer row
const [peer] = await this.db
.insert(federationPeers)
.values({
commonName: body.commonName,
displayName: body.displayName,
certPem: '',
certSerial: 'pending',
certNotAfter: new Date(0),
clientKeyPem: sealed,
state: 'pending',
endpointUrl: body.endpointUrl,
})
.returning();
return {
peerId: peer!.id,
csrPem,
};
}
/**
* PATCH /api/admin/federation/peers/:id/cert
* Store a signed certificate after enrollment completes.
*
* Flow:
* 1. Parse the cert to extract serial and notAfter
* 2. Update the peer row with cert data + state='active'
* 3. Return the updated peer row
*/
@Patch('peers/:id/cert')
async storePeerCert(@Param('id') id: string, @Body() body: StorePeerCertDto) {
// Ensure peer exists
const [existing] = await this.db
.select({ id: federationPeers.id })
.from(federationPeers)
.where(eq(federationPeers.id, id))
.limit(1);
if (!existing) {
throw new NotFoundException(`Peer ${id} not found`);
}
// 1. Parse cert
const x509 = new X509Certificate(body.certPem);
const certSerial = x509.serialNumber;
const certNotAfter = new Date(x509.validTo);
// 2. Update peer
const [updated] = await this.db
.update(federationPeers)
.set({
certPem: body.certPem,
certSerial,
certNotAfter,
state: 'active',
})
.where(eq(federationPeers.id, id))
.returning();
return updated;
}
}

View File

@@ -0,0 +1,14 @@
import { Module } from '@nestjs/common';
import { AdminGuard } from '../admin/admin.guard.js';
import { CaService } from './ca.service.js';
import { EnrollmentController } from './enrollment.controller.js';
import { EnrollmentService } from './enrollment.service.js';
import { FederationController } from './federation.controller.js';
import { GrantsService } from './grants.service.js';
@Module({
controllers: [EnrollmentController, FederationController],
providers: [AdminGuard, CaService, EnrollmentService, GrantsService],
exports: [CaService, EnrollmentService, GrantsService],
})
export class FederationModule {}

View File

@@ -0,0 +1,36 @@
import { IsDateString, IsIn, IsObject, IsOptional, IsString, IsUUID } from 'class-validator';
export class CreateGrantDto {
@IsUUID()
peerId!: string;
@IsUUID()
subjectUserId!: string;
@IsObject()
scope!: Record<string, unknown>;
@IsOptional()
@IsDateString()
expiresAt?: string;
}
export class ListGrantsDto {
@IsOptional()
@IsUUID()
peerId?: string;
@IsOptional()
@IsUUID()
subjectUserId?: string;
@IsOptional()
@IsIn(['pending', 'active', 'revoked', 'expired'])
status?: 'pending' | 'active' | 'revoked' | 'expired';
}
export class RevokeGrantDto {
@IsOptional()
@IsString()
reason?: string;
}

View File

@@ -0,0 +1,161 @@
/**
* Federation grants service — CRUD + status transitions (FED-M2-06).
*
* Business logic only. CSR/cert work is handled by M2-07.
*
* Status lifecycle:
* pending → active (activateGrant, called by M2-07 enrollment controller after cert signed)
* active → revoked (revokeGrant)
* active → expired (expireGrant, called by M6 scheduler)
*/
import { ConflictException, Inject, Injectable, NotFoundException } from '@nestjs/common';
import { type Db, and, eq, federationGrants } from '@mosaicstack/db';
import { DB } from '../database/database.module.js';
import { parseFederationScope } from './scope-schema.js';
import type { CreateGrantDto, ListGrantsDto } from './grants.dto.js';
export type Grant = typeof federationGrants.$inferSelect;
@Injectable()
export class GrantsService {
constructor(@Inject(DB) private readonly db: Db) {}
/**
* Create a new grant in `pending` state.
* Validates the scope against the federation scope JSON schema before inserting.
*/
async createGrant(dto: CreateGrantDto): Promise<Grant> {
// Throws FederationScopeError (a plain Error subclass) on invalid scope.
parseFederationScope(dto.scope);
const [grant] = await this.db
.insert(federationGrants)
.values({
peerId: dto.peerId,
subjectUserId: dto.subjectUserId,
scope: dto.scope,
status: 'pending',
expiresAt: dto.expiresAt != null ? new Date(dto.expiresAt) : null,
})
.returning();
return grant!;
}
/**
* Fetch a single grant by ID. Throws NotFoundException if not found.
*/
async getGrant(id: string): Promise<Grant> {
const [grant] = await this.db
.select()
.from(federationGrants)
.where(eq(federationGrants.id, id))
.limit(1);
if (!grant) {
throw new NotFoundException(`Grant ${id} not found`);
}
return grant;
}
/**
* List grants with optional filters for peerId, subjectUserId, and status.
*/
async listGrants(filters: ListGrantsDto): Promise<Grant[]> {
const conditions = [];
if (filters.peerId != null) {
conditions.push(eq(federationGrants.peerId, filters.peerId));
}
if (filters.subjectUserId != null) {
conditions.push(eq(federationGrants.subjectUserId, filters.subjectUserId));
}
if (filters.status != null) {
conditions.push(eq(federationGrants.status, filters.status));
}
if (conditions.length === 0) {
return this.db.select().from(federationGrants);
}
return this.db
.select()
.from(federationGrants)
.where(and(...conditions));
}
/**
* Transition a grant from `pending` → `active`.
* Called by M2-07 enrollment controller after cert is signed.
* Throws ConflictException if the grant is not in `pending` state.
*/
async activateGrant(id: string): Promise<Grant> {
const grant = await this.getGrant(id);
if (grant.status !== 'pending') {
throw new ConflictException(
`Grant ${id} cannot be activated: expected status 'pending', got '${grant.status}'`,
);
}
const [updated] = await this.db
.update(federationGrants)
.set({ status: 'active' })
.where(eq(federationGrants.id, id))
.returning();
return updated!;
}
/**
* Transition a grant from `active` → `revoked`.
* Sets revokedAt and optionally revokedReason.
* Throws ConflictException if the grant is not in `active` state.
*/
async revokeGrant(id: string, reason?: string): Promise<Grant> {
const grant = await this.getGrant(id);
if (grant.status !== 'active') {
throw new ConflictException(
`Grant ${id} cannot be revoked: expected status 'active', got '${grant.status}'`,
);
}
const [updated] = await this.db
.update(federationGrants)
.set({
status: 'revoked',
revokedAt: new Date(),
revokedReason: reason ?? null,
})
.where(eq(federationGrants.id, id))
.returning();
return updated!;
}
/**
* Transition a grant from `active` → `expired`.
* Intended for use by the M6 scheduler.
* Throws ConflictException if the grant is not in `active` state.
*/
async expireGrant(id: string): Promise<Grant> {
const grant = await this.getGrant(id);
if (grant.status !== 'active') {
throw new ConflictException(
`Grant ${id} cannot be expired: expected status 'active', got '${grant.status}'`,
);
}
const [updated] = await this.db
.update(federationGrants)
.set({ status: 'expired' })
.where(eq(federationGrants.id, id))
.returning();
return updated!;
}
}

View File

@@ -0,0 +1,9 @@
import { seal, unseal } from '@mosaicstack/auth';
export function sealClientKey(privateKeyPem: string): string {
return seal(privateKeyPem);
}
export function unsealClientKey(sealedKey: string): string {
return unseal(sealedKey);
}

View File

@@ -0,0 +1,187 @@
/**
* Unit tests for FederationScopeSchema and parseFederationScope.
*
* Coverage:
* - Valid: minimal scope
* - Valid: full PRD §8.1 example
* - Valid: resources + excluded_resources (no overlap)
* - Invalid: empty resources
* - Invalid: unknown resource value
* - Invalid: resources / excluded_resources intersection
* - Invalid: filter key not in resources
* - Invalid: max_rows_per_query = 0
* - Invalid: max_rows_per_query = 10001
* - Invalid: not an object / null
* - Defaults: include_personal defaults to true; excluded_resources defaults to []
* - Sentinel: console.warn fires for sensitive resources
*/
import { describe, it, expect, vi, afterEach } from 'vitest';
import {
parseFederationScope,
FederationScopeError,
FederationScopeSchema,
} from './scope-schema.js';
afterEach(() => {
vi.restoreAllMocks();
});
describe('parseFederationScope — valid inputs', () => {
it('accepts a minimal scope (resources + max_rows_per_query only)', () => {
const scope = parseFederationScope({
resources: ['tasks'],
max_rows_per_query: 100,
});
expect(scope.resources).toEqual(['tasks']);
expect(scope.max_rows_per_query).toBe(100);
expect(scope.excluded_resources).toEqual([]);
expect(scope.filters).toBeUndefined();
});
it('accepts the full PRD §8.1 example', () => {
const scope = parseFederationScope({
resources: ['tasks', 'notes', 'memory'],
filters: {
tasks: { include_teams: ['team_uuid_1', 'team_uuid_2'], include_personal: true },
notes: { include_personal: true, include_teams: [] },
memory: { include_personal: true },
},
excluded_resources: ['credentials', 'api_keys'],
max_rows_per_query: 500,
});
expect(scope.resources).toEqual(['tasks', 'notes', 'memory']);
expect(scope.excluded_resources).toEqual(['credentials', 'api_keys']);
expect(scope.filters?.tasks?.include_teams).toEqual(['team_uuid_1', 'team_uuid_2']);
expect(scope.max_rows_per_query).toBe(500);
});
it('accepts a scope with excluded_resources and no filter overlap', () => {
const scope = parseFederationScope({
resources: ['tasks', 'notes'],
excluded_resources: ['memory'],
max_rows_per_query: 250,
});
expect(scope.resources).toEqual(['tasks', 'notes']);
expect(scope.excluded_resources).toEqual(['memory']);
});
});
describe('parseFederationScope — defaults', () => {
it('defaults excluded_resources to []', () => {
const scope = parseFederationScope({ resources: ['tasks'], max_rows_per_query: 1 });
expect(scope.excluded_resources).toEqual([]);
});
it('defaults include_personal to true when filter is provided without it', () => {
const scope = parseFederationScope({
resources: ['tasks'],
filters: { tasks: { include_teams: ['t1'] } },
max_rows_per_query: 10,
});
expect(scope.filters?.tasks?.include_personal).toBe(true);
});
});
describe('parseFederationScope — invalid inputs', () => {
it('throws FederationScopeError for empty resources array', () => {
expect(() => parseFederationScope({ resources: [], max_rows_per_query: 100 })).toThrow(
FederationScopeError,
);
});
it('throws for unknown resource value in resources', () => {
expect(() =>
parseFederationScope({ resources: ['unknown_resource'], max_rows_per_query: 100 }),
).toThrow(FederationScopeError);
});
it('throws when resources and excluded_resources intersect', () => {
expect(() =>
parseFederationScope({
resources: ['tasks', 'memory'],
excluded_resources: ['memory'],
max_rows_per_query: 100,
}),
).toThrow(FederationScopeError);
});
it('throws when filters references a resource not in resources', () => {
expect(() =>
parseFederationScope({
resources: ['tasks'],
filters: { notes: { include_personal: true } },
max_rows_per_query: 100,
}),
).toThrow(FederationScopeError);
});
it('throws for max_rows_per_query = 0', () => {
expect(() => parseFederationScope({ resources: ['tasks'], max_rows_per_query: 0 })).toThrow(
FederationScopeError,
);
});
it('throws for max_rows_per_query = 10001', () => {
expect(() => parseFederationScope({ resources: ['tasks'], max_rows_per_query: 10001 })).toThrow(
FederationScopeError,
);
});
it('throws for null input', () => {
expect(() => parseFederationScope(null)).toThrow(FederationScopeError);
});
it('throws for non-object input (string)', () => {
expect(() => parseFederationScope('not-an-object')).toThrow(FederationScopeError);
});
});
describe('parseFederationScope — sentinel warning', () => {
it('emits console.warn when resources includes "credentials"', () => {
const warnSpy = vi.spyOn(console, 'warn').mockImplementation(() => {});
parseFederationScope({
resources: ['tasks', 'credentials'],
max_rows_per_query: 100,
});
expect(warnSpy).toHaveBeenCalledWith(
expect.stringContaining(
'[FederationScope] WARNING: scope grants sensitive resource "credentials"',
),
);
});
it('emits console.warn when resources includes "api_keys"', () => {
const warnSpy = vi.spyOn(console, 'warn').mockImplementation(() => {});
parseFederationScope({
resources: ['tasks', 'api_keys'],
max_rows_per_query: 100,
});
expect(warnSpy).toHaveBeenCalledWith(
expect.stringContaining(
'[FederationScope] WARNING: scope grants sensitive resource "api_keys"',
),
);
});
it('does NOT emit console.warn for non-sensitive resources', () => {
const warnSpy = vi.spyOn(console, 'warn').mockImplementation(() => {});
parseFederationScope({ resources: ['tasks', 'notes', 'memory'], max_rows_per_query: 100 });
expect(warnSpy).not.toHaveBeenCalled();
});
});
describe('FederationScopeSchema — boundary values', () => {
it('accepts max_rows_per_query = 1 (lower bound)', () => {
const result = FederationScopeSchema.safeParse({ resources: ['tasks'], max_rows_per_query: 1 });
expect(result.success).toBe(true);
});
it('accepts max_rows_per_query = 10000 (upper bound)', () => {
const result = FederationScopeSchema.safeParse({
resources: ['tasks'],
max_rows_per_query: 10000,
});
expect(result.success).toBe(true);
});
});

View File

@@ -0,0 +1,147 @@
/**
* Federation grant scope schema and validator.
*
* Source of truth: docs/federation/PRD.md §8.1
*
* This module is intentionally pure — no DB, no NestJS, no CA wiring.
* It is reusable from grant CRUD (M2-06) and scope enforcement (M3+).
*/
import { z } from 'zod';
// ---------------------------------------------------------------------------
// Allowlist of federation resources (canonical — M3+ will extend this list)
// ---------------------------------------------------------------------------
export const FEDERATION_RESOURCE_VALUES = [
'tasks',
'notes',
'memory',
'credentials',
'api_keys',
] as const;
export type FederationResource = (typeof FEDERATION_RESOURCE_VALUES)[number];
/**
* Sensitive resources require explicit admin approval (PRD §8.4).
* The parser warns when these appear in `resources`; M2-06 grant CRUD
* will add a hard gate on top of this warning.
*/
const SENSITIVE_RESOURCES: ReadonlySet<FederationResource> = new Set(['credentials', 'api_keys']);
// ---------------------------------------------------------------------------
// Sub-schemas
// ---------------------------------------------------------------------------
const ResourceArraySchema = z
.array(z.enum(FEDERATION_RESOURCE_VALUES))
.nonempty({ message: 'resources must contain at least one value' })
.refine((arr) => new Set(arr).size === arr.length, {
message: 'resources must not contain duplicate values',
});
const ResourceFilterSchema = z.object({
include_teams: z.array(z.string()).optional(),
include_personal: z.boolean().default(true),
});
// ---------------------------------------------------------------------------
// Top-level schema
// ---------------------------------------------------------------------------
export const FederationScopeSchema = z
.object({
resources: ResourceArraySchema,
excluded_resources: z
.array(z.enum(FEDERATION_RESOURCE_VALUES))
.default([])
.refine((arr) => new Set(arr).size === arr.length, {
message: 'excluded_resources must not contain duplicate values',
}),
filters: z.record(z.string(), ResourceFilterSchema).optional(),
max_rows_per_query: z
.number()
.int({ message: 'max_rows_per_query must be an integer' })
.min(1, { message: 'max_rows_per_query must be at least 1' })
.max(10000, { message: 'max_rows_per_query must be at most 10000' }),
})
.superRefine((data, ctx) => {
const resourceSet = new Set(data.resources);
// Intersection guard: a resource cannot be both granted and excluded
for (const r of data.excluded_resources) {
if (resourceSet.has(r)) {
ctx.addIssue({
code: z.ZodIssueCode.custom,
message: `Resource "${r}" appears in both resources and excluded_resources`,
path: ['excluded_resources'],
});
}
}
// Filter keys must be a subset of resources
if (data.filters) {
for (const key of Object.keys(data.filters)) {
if (!resourceSet.has(key as FederationResource)) {
ctx.addIssue({
code: z.ZodIssueCode.custom,
message: `filters key "${key}" references a resource not present in resources`,
path: ['filters', key],
});
}
}
}
});
export type FederationScope = z.infer<typeof FederationScopeSchema>;
// ---------------------------------------------------------------------------
// Error class
// ---------------------------------------------------------------------------
export class FederationScopeError extends Error {
constructor(message: string) {
super(message);
this.name = 'FederationScopeError';
}
}
// ---------------------------------------------------------------------------
// Typed parser
// ---------------------------------------------------------------------------
/**
* Parse and validate an unknown value as a FederationScope.
*
* Throws `FederationScopeError` with aggregated Zod issues on failure.
*
* Emits `console.warn` when sensitive resources (`credentials`, `api_keys`)
* are present in `resources` — per PRD §8.4, these require explicit admin
* approval. M2-06 grant CRUD will add a hard gate on top of this warning.
*/
export function parseFederationScope(input: unknown): FederationScope {
const result = FederationScopeSchema.safeParse(input);
if (!result.success) {
const issues = result.error.issues
.map((e) => ` - [${e.path.join('.') || 'root'}] ${e.message}`)
.join('\n');
throw new FederationScopeError(`Invalid federation scope:\n${issues}`);
}
const scope = result.data;
// Sentinel warning for sensitive resources (PRD §8.4)
for (const resource of scope.resources) {
if (SENSITIVE_RESOURCES.has(resource)) {
console.warn(
`[FederationScope] WARNING: scope grants sensitive resource "${resource}". Per PRD §8.4 this requires explicit admin approval and is logged.`,
);
}
}
return scope;
}

View File

@@ -20,10 +20,12 @@ import { Logger, ValidationPipe } from '@nestjs/common';
import { FastifyAdapter, type NestFastifyApplication } from '@nestjs/platform-fastify';
import helmet from '@fastify/helmet';
import { listSsoStartupWarnings } from '@mosaicstack/auth';
import { loadConfig } from '@mosaicstack/config';
import { AppModule } from './app.module.js';
import { mountAuthHandler } from './auth/auth.controller.js';
import { mountMcpHandler } from './mcp/mcp.controller.js';
import { McpService } from './mcp/mcp.service.js';
import { detectAndAssertTier, TierDetectionError } from '@mosaicstack/storage';
async function bootstrap(): Promise<void> {
const logger = new Logger('Bootstrap');
@@ -32,6 +34,20 @@ async function bootstrap(): Promise<void> {
throw new Error('BETTER_AUTH_SECRET is required');
}
// Pre-flight: assert all external services required by the configured tier
// are reachable. Runs before NestFactory.create() so failures are visible
// immediately with actionable remediation hints.
const mosaicConfig = loadConfig();
try {
await detectAndAssertTier(mosaicConfig);
} catch (err) {
if (err instanceof TierDetectionError) {
logger.error(`Tier detection failed: ${err.message}`);
logger.error(`Remediation: ${err.remediation}`);
}
throw err;
}
for (const warning of listSsoStartupWarnings()) {
logger.warn(warning);
}

View File

@@ -0,0 +1,70 @@
# deploy/portainer/
Portainer stack templates for Mosaic Stack deployments.
## Files
| File | Purpose |
| -------------------------- | -------------------------------------------------------------------------------------------------------------- |
| `federated-test.stack.yml` | Docker Swarm stack for federation end-to-end test instances (`mos-test-1.woltje.com`, `mos-test-2.woltje.com`) |
---
## federated-test.stack.yml
A self-contained Swarm stack that boots a federated-tier Mosaic gateway with co-located Postgres 17 (pgvector) and Valkey 8. This is a **test template** — production deployments will use a separate template with stricter resource limits and Docker secrets.
### Deploy via Portainer UI
1. Log into Portainer.
2. Navigate to **Stacks → Add stack**.
3. Set a stack name matching `STACK_NAME` below (e.g. `mos-test-1`).
4. Choose **Web editor** and paste the contents of `federated-test.stack.yml`.
5. Scroll to **Environment variables** and add each variable listed below.
6. Click **Deploy the stack**.
### Required environment variables
| Variable | Example | Notes |
| -------------------- | --------------------------------------- | -------------------------------------------------------- |
| `STACK_NAME` | `mos-test-1` | Unique per stack — used in Traefik router/service names. |
| `HOST_FQDN` | `mos-test-1.woltje.com` | Fully-qualified hostname served by this stack. |
| `POSTGRES_PASSWORD` | _(generate randomly)_ | Database password. Do **not** reuse between stacks. |
| `BETTER_AUTH_SECRET` | _(generate: `openssl rand -base64 32`)_ | BetterAuth session signing key. |
| `BETTER_AUTH_URL` | `https://mos-test-1.woltje.com` | Public base URL of the gateway. |
Optional variables (uncomment in the YAML or set in Portainer):
| Variable | Notes |
| ----------------------------- | ---------------------------------------------------------- |
| `ANTHROPIC_API_KEY` | Enable Claude models. |
| `OPENAI_API_KEY` | Enable OpenAI models. |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Forward traces to a collector (e.g. `http://jaeger:4318`). |
### Required external resources
Before deploying, ensure the following exist on the Swarm:
1. **`traefik-public` overlay network** — shared network Traefik uses to route traffic to stacks.
```bash
docker network create --driver overlay --attachable traefik-public
```
2. **`letsencrypt` cert resolver** — configured in the Traefik Swarm stack. The stack template references `tls.certresolver=letsencrypt`; the name must match your Traefik config.
3. **DNS A record** — `${HOST_FQDN}` must resolve to the Swarm ingress IP (or a Cloudflare-proxied address pointing there).
### Deployed instances
| Stack name | HOST_FQDN | Purpose |
| ------------ | ----------------------- | ---------------------------------- |
| `mos-test-1` | `mos-test-1.woltje.com` | DEPLOY-03 — first federation peer |
| `mos-test-2` | `mos-test-2.woltje.com` | DEPLOY-04 — second federation peer |
### Image
The gateway image is pinned by digest to `fed-v0.1.0-m1` (verified in DEPLOY-01). Update the digest in the YAML when promoting a new build — never use `:latest` or a mutable tag in Swarm.
### Notes
- This template boots a **vanilla M1-baseline gateway** in federated tier. Federation grants (Step-CA, mTLS) are M2+ scope and not included here.
- Each stack gets its own Postgres volume (`postgres-data`) and Valkey volume (`valkey-data`) scoped to the stack name by Swarm.
- `depends_on` is honoured by Compose but ignored by Swarm — healthchecks on Postgres and Valkey ensure the gateway retries until they are ready.

View File

@@ -0,0 +1,160 @@
# deploy/portainer/federated-test.stack.yml
#
# Portainer / Docker Swarm stack template — federated-tier test instance
#
# PURPOSE
# Deploys a single federated-tier Mosaic gateway with co-located Postgres
# (pgvector) and Valkey for end-to-end federation testing. Intended for
# mos-test-1.woltje.com and mos-test-2.woltje.com (DEPLOY-03/04).
#
# REQUIRED ENV VARS (set per-stack in Portainer → Stacks → Environment variables)
# STACK_NAME Unique name for Traefik router/service labels.
# Examples: mos-test-1, mos-test-2
# HOST_FQDN Fully-qualified domain name served by this stack.
# Examples: mos-test-1.woltje.com, mos-test-2.woltje.com
# POSTGRES_PASSWORD Database password — set per stack; do NOT commit a default.
# BETTER_AUTH_SECRET Random 32-char string for BetterAuth session signing.
# Generate: openssl rand -base64 32
# BETTER_AUTH_URL Public gateway base URL, e.g. https://mos-test-1.woltje.com
#
# OPTIONAL ENV VARS (uncomment and set in Portainer to enable features)
# ANTHROPIC_API_KEY sk-ant-...
# OPENAI_API_KEY sk-...
# OTEL_EXPORTER_OTLP_ENDPOINT http://<collector>:4318
# OTEL_SERVICE_NAME (default: mosaic-gateway)
#
# REQUIRED EXTERNAL RESOURCES
# traefik-public Docker overlay network — must exist before deploying.
# Create: docker network create --driver overlay --attachable traefik-public
# letsencrypt Traefik cert resolver configured on the Swarm manager.
# DNS A record ${HOST_FQDN} → Swarm ingress IP (or Cloudflare proxy).
#
# IMAGE
# Pinned to sha-9f1a081 (main HEAD post-#488 Dockerfile fix). The previous
# pin (fed-v0.1.0-m1, sha256:9b72e2...) had a broken pnpm copy and could
# not resolve @mosaicstack/storage at runtime. The new digest was smoke-
# tested locally — gateway boots, imports resolve, tier-detector runs.
# Update digest here when promoting a new build.
#
# HEALTHCHECK NOTE (2026-04-21)
# Switched from busybox wget to node http.get on 127.0.0.1 (not localhost) to
# avoid IPv6 resolution issues on Alpine. Retries increased to 5 and
# start_period to 60s to cover the NestJS/GC cold-start window (~40-50s).
# restart_policy set to `any` so SIGTERM/clean-exit also triggers restart.
#
# NOTE: This is a TEST template — production deployments use a separate
# parameterised template with stricter resource limits and secrets.
version: '3.9'
services:
gateway:
image: git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02
# Tag for human reference: sha-9f1a081 (post-#488 Dockerfile fix; smoke-tested locally)
environment:
# ── Tier ───────────────────────────────────────────────────────────────
MOSAIC_TIER: federated
# ── Database ───────────────────────────────────────────────────────────
DATABASE_URL: postgres://gateway:${POSTGRES_PASSWORD}@postgres:5432/mosaic
# ── Queue ──────────────────────────────────────────────────────────────
VALKEY_URL: redis://valkey:6379
# ── Gateway ────────────────────────────────────────────────────────────
GATEWAY_PORT: '3000'
GATEWAY_CORS_ORIGIN: https://${HOST_FQDN}
# ── Auth ───────────────────────────────────────────────────────────────
BETTER_AUTH_SECRET: ${BETTER_AUTH_SECRET}
BETTER_AUTH_URL: https://${HOST_FQDN}
# ── Observability ──────────────────────────────────────────────────────
OTEL_SERVICE_NAME: ${STACK_NAME:-mosaic-gateway}
# OTEL_EXPORTER_OTLP_ENDPOINT: http://<collector>:4318
# ── AI Providers (uncomment to enable) ─────────────────────────────────
# ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
# OPENAI_API_KEY: ${OPENAI_API_KEY}
networks:
- federated-test
- traefik-public
deploy:
replicas: 1
restart_policy:
condition: any
delay: 5s
max_attempts: 3
labels:
- 'traefik.enable=true'
- 'traefik.docker.network=traefik-public'
- 'traefik.http.routers.${STACK_NAME}.rule=Host(`${HOST_FQDN}`)'
- 'traefik.http.routers.${STACK_NAME}.entrypoints=websecure'
- 'traefik.http.routers.${STACK_NAME}.tls=true'
- 'traefik.http.routers.${STACK_NAME}.tls.certresolver=letsencrypt'
- 'traefik.http.services.${STACK_NAME}.loadbalancer.server.port=3000'
healthcheck:
test:
- 'CMD'
- 'node'
- '-e'
- "require('http').get('http://127.0.0.1:3000/health',r=>process.exit(r.statusCode===200?0:1)).on('error',()=>process.exit(1))"
interval: 30s
timeout: 5s
retries: 5
start_period: 60s
depends_on:
- postgres
- valkey
postgres:
image: pgvector/pgvector:pg17
environment:
POSTGRES_USER: gateway
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
POSTGRES_DB: mosaic
volumes:
- postgres-data:/var/lib/postgresql/data
networks:
- federated-test
deploy:
replicas: 1
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
healthcheck:
test: ['CMD-SHELL', 'pg_isready -U gateway']
interval: 10s
timeout: 5s
retries: 5
start_period: 10s
valkey:
image: valkey/valkey:8-alpine
volumes:
- valkey-data:/data
networks:
- federated-test
deploy:
replicas: 1
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
healthcheck:
test: ['CMD', 'valkey-cli', 'ping']
interval: 10s
timeout: 3s
retries: 5
start_period: 5s
volumes:
postgres-data:
valkey-data:
networks:
federated-test:
driver: overlay
traefik-public:
external: true

View File

@@ -0,0 +1,120 @@
# docker-compose.federated.yml — Federated tier overlay
#
# USAGE:
# docker compose -f docker-compose.federated.yml --profile federated up -d
#
# This file is a standalone overlay for the Mosaic federated tier.
# It is NOT an extension of docker-compose.yml — it defines its own services
# and named volumes so it can run independently of the base dev stack.
#
# IMPORTANT — HOST PORT CONFLICTS:
# The federated services bind the same host ports as the base dev stack
# (5433 for Postgres, 6380 for Valkey). You must stop the base dev stack
# before starting the federated stack on the same machine:
# docker compose down
# docker compose -f docker-compose.federated.yml --profile federated up -d
#
# pgvector extension:
# The vector extension is created automatically at first boot via
# ./infra/pg-init/01-extensions.sql (CREATE EXTENSION IF NOT EXISTS vector).
#
# Tier configuration:
# Used by `mosaic` instances configured with `tier: federated`.
# DEFAULT_FEDERATED_CONFIG points at:
# postgresql://mosaic:mosaic@localhost:5433/mosaic
services:
postgres-federated:
image: pgvector/pgvector:pg17
profiles: [federated]
restart: unless-stopped
ports:
- '${PG_FEDERATED_HOST_PORT:-5433}:5432'
environment:
POSTGRES_USER: mosaic
POSTGRES_PASSWORD: mosaic
POSTGRES_DB: mosaic
volumes:
- pg_federated_data:/var/lib/postgresql/data
- ./infra/pg-init:/docker-entrypoint-initdb.d:ro
healthcheck:
test: ['CMD-SHELL', 'pg_isready -U mosaic']
interval: 5s
timeout: 3s
retries: 5
valkey-federated:
image: valkey/valkey:8-alpine
profiles: [federated]
restart: unless-stopped
ports:
- '${VALKEY_FEDERATED_HOST_PORT:-6380}:6379'
volumes:
- valkey_federated_data:/data
healthcheck:
test: ['CMD', 'valkey-cli', 'ping']
interval: 5s
timeout: 3s
retries: 5
# ---------------------------------------------------------------------------
# Step-CA — Mosaic Federation internal certificate authority
#
# Image: pinned to 0.27.4 (latest stable as of late 2025).
# `latest` is forbidden per Mosaic image policy (immutable tag required for
# reproducible deployments and digest-first promotion in CI).
#
# Profile: `federated` — this service must not start in non-federated dev.
#
# Password:
# Dev: bind-mount ./infra/step-ca/dev-password (gitignored; copy from
# ./infra/step-ca/dev-password.example and customise locally).
# Prod: replace the bind-mount with a Docker secret:
# secrets:
# ca_password:
# external: true
# and reference it as `/run/secrets/ca_password` (same path the
# init script already uses).
#
# Provisioner: "mosaic-fed" (consumed by apps/gateway/src/federation/ca.service.ts)
# ---------------------------------------------------------------------------
step-ca:
image: smallstep/step-ca:0.27.4
profiles: [federated]
restart: unless-stopped
ports:
- '${STEP_CA_HOST_PORT:-9000}:9000'
volumes:
- step_ca_data:/home/step
# init script — executed as the container entrypoint
- ./infra/step-ca/init.sh:/usr/local/bin/mosaic-step-ca-init.sh:ro
# X.509 template skeleton (wired in M2-04)
- ./infra/step-ca/templates:/etc/step-ca-templates:ro
# Dev password file — GITIGNORED; copy from dev-password.example
# In production, replace this with a Docker secret (see comment above).
- ./infra/step-ca/dev-password:/run/secrets/ca_password:ro
entrypoint: ['/bin/sh', '/usr/local/bin/mosaic-step-ca-init.sh']
healthcheck:
# The healthcheck requires the root cert to exist, which is only true
# after init.sh has completed on first boot. start_period gives init
# time to finish before Docker starts counting retries.
test:
[
'CMD',
'step',
'ca',
'health',
'--ca-url',
'https://localhost:9000',
'--root',
'/home/step/certs/root_ca.crt',
]
interval: 10s
timeout: 5s
retries: 5
start_period: 30s
volumes:
pg_federated_data:
valkey_federated_data:
step_ca_data:

View File

@@ -5,18 +5,27 @@ RUN corepack enable
FROM base AS builder
WORKDIR /app
# Copy workspace manifests first for layer-cached install
COPY pnpm-workspace.yaml pnpm-lock.yaml package.json ./
COPY apps/gateway/package.json ./apps/gateway/
COPY packages/ ./packages/
COPY plugins/ ./plugins/
RUN pnpm install --frozen-lockfile
COPY . .
RUN pnpm --filter @mosaic/gateway build
# Build gateway and all of its workspace dependencies via turbo dependency graph
RUN pnpm turbo run build --filter @mosaicstack/gateway...
# Produce a self-contained deploy artifact: flat node_modules, no pnpm symlinks
# --legacy is required for pnpm v10 when inject-workspace-packages is not set
RUN pnpm --filter @mosaicstack/gateway --prod deploy --legacy /deploy
FROM base AS runner
WORKDIR /app
ENV NODE_ENV=production
# Use the pnpm deploy output — resolves all deps into a flat, self-contained node_modules
COPY --from=builder /deploy/node_modules ./node_modules
COPY --from=builder /deploy/package.json ./package.json
# dist is declared in package.json "files" so pnpm deploy copies it into /deploy;
# copy from builder explicitly as belt-and-suspenders
COPY --from=builder /app/apps/gateway/dist ./dist
COPY --from=builder /app/apps/gateway/package.json ./package.json
COPY --from=builder /app/node_modules ./node_modules
EXPOSE 4000
CMD ["node", "dist/main.js"]

View File

@@ -1,73 +1,116 @@
# Mission Manifest — Install UX v2
# Mission Manifest — MVP
> Persistent document tracking full mission scope, status, and session history.
> Updated by the orchestrator at each phase transition and milestone completion.
> Top-level rollup tracking Mosaic Stack MVP execution.
> Workstreams have their own manifests; this document is the source of truth for MVP scope, status, and history.
> Owner: Orchestrator (sole writer).
## Mission
**ID:** install-ux-v2-20260405
**Statement:** The install-ux-hardening mission shipped the plumbing (uninstall, masked password, hooks consent, unified flow, headless path), but the first real end-to-end run surfaced a critical regression and a collection of UX failings that make the wizard feel neither quick nor intelligent. This mission closes the bootstrap regression as a hotfix, then rethinks the first-run experience around a provider-first, intent-driven flow with a drill-down main menu and a genuinely fast quick-start.
**Phase:** Planning
**Current Milestone:** IUV-M01
**Progress:** 0 / 3 milestones
**Status:** active
**Last Updated:** 2026-04-05
**Parent Mission:** [install-ux-hardening-20260405](./archive/missions/install-ux-hardening-20260405/MISSION-MANIFEST.md) (complete — `mosaic-v0.0.25`)
**ID:** mvp-20260312
**Statement:** Ship a self-hosted, multi-user AI agent platform that consolidates the user's disparate jarvis-brain usage across home and USC workstations into a single coherent system reachable via three first-class surfaces — webUI, TUI, and CLI — with federation as the data-layer mechanism that makes cross-host agent sessions work in real time without copying user data across the boundary.
**Phase:** Execution (workstream W1 in planning-complete state)
**Current Workstream:** W1 — Federation v1
**Progress:** 0 / 1 declared workstreams complete (more workstreams will be declared as scope is refined)
**Status:** active (continuous since 2026-03-13)
**Last Updated:** 2026-04-19 (manifest authored at the rollup level; install-ux-v2 archived; W1 federation planning landed via PR #468)
**Source PRD:** [docs/PRD.md](./PRD.md) — Mosaic Stack v0.1.0
**Scratchpad:** [docs/scratchpads/mvp-20260312.md](./scratchpads/mvp-20260312.md) (active since 2026-03-13; 14 prior sessions of phase-based execution)
## Context
Real-run testing of `@mosaicstack/mosaic@0.0.25` uncovered:
Jarvis (v0.2.0) was a single-host Python/Next.js assistant. The user runs sessions across 34 workstations split between home and USC. Today every session reaches back to a single jarvis-brain checkout, which is brittle (offline-hostile, no consolidation, no shared state beyond a single repo). A prior OpenBrain attempt punished offline use, introduced cache/latency/opacity pain, and tightly coupled every session to a remote service.
1. **Critical:** admin bootstrap fails with HTTP 400 `property email should not exist``bootstrap.controller.ts` uses `import type { BootstrapSetupDto }`, erasing the class at runtime. Nest's `@Body()` falls back to plain `Object` metatype, and ValidationPipe with `forbidNonWhitelisted` rejects every property. One-character fix (drop the `type` keyword), but it blocks the happy path of the release that just shipped.
2. The wizard reports `✔ Wizard complete` and `✔ Done` _after_ the bootstrap 400 — failure only propagates in headless mode (`wizard.ts:147`).
3. The gateway port prompt does not prefill `14242` in the input buffer.
4. `"What is Mosaic?"` intro copy does not mention Pi SDK (the actual agent runtime behind Claude/Codex/OpenCode).
5. CORS origin prompt is confusing — the user should be able to supply an FQDN/hostname and have the system derive the CORS value.
6. Skill / additional feature install section is unusable in practice.
7. Quick-start asks far too many questions to be meaningfully "quick".
8. No drill-down main menu — everything is a linear interrogation.
9. Provider setup happens late and without intelligence. An OpenClaw-style provider-first flow would let the user describe what they want in natural language, have the agent expound on it, and have the agent choose its own name based on that intent.
The MVP solution: keep each user's home gateway as the source of truth, connect gateways gateway-to-gateway over mTLS with scoped read-only data exposure, and expose the unified experience through three coherent surfaces:
- **webUI** — the primary visual control plane (Next.js + React 19, `apps/web`)
- **TUI** — the terminal-native interface for agent work (`packages/mosaic` wizard + Pi TUI)
- **CLI** — `mosaic` command for scripted/headless workflows
Federation is required NOW because it unblocks cross-host consolidation; it is necessary but not sufficient for MVP. Additional workstreams will be declared as their scope solidifies.
## Prior Execution (March 13 → April 5)
This manifest was authored on 2026-04-19 to rollup work that began 2026-03-13. Before this date, MVP work was tracked via phase-based Gitea milestones and the scratchpad — there was no rollup manifest at the `docs/MISSION-MANIFEST.md` path (the slot was occupied by sub-mission manifests for `install-ux-hardening` and then `install-ux-v2`).
Prior execution outline (full detail in [scratchpads/mvp-20260312.md](./scratchpads/mvp-20260312.md)):
- **Phases 0 → 7** (Gitea milestones `ms-157``ms-164`, issues #1#59): foundation, core API, agent layer, web dashboard, memory, remote control, CLI/tools, polish/beta. Substantially shipped by Session 13.
- **Phase 8** (Gitea milestone `ms-165`, issues #160#172): platform architecture extension — teams, workspaces, `/provider` OAuth, preferences, etc. Wave-based execution plan defined at Session 14.
- **Sub-missions** during the gap: `install-ux-hardening` (complete, `mosaic-v0.0.25`), `install-ux-v2` (complete on 2026-04-19, `0.0.27``0.0.29`). Both archived under `docs/archive/missions/`.
Going forward, MVP execution is tracked through the **Workstreams** table below. Phase-based issue numbering is preserved on Gitea but is no longer the primary control plane.
## Cross-Cutting MVP Requirements
These apply to every workstream and every milestone. A workstream cannot ship if it breaks any of them.
| # | Requirement |
| ------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| MVP-X1 | Three-surface parity: every user-facing capability is reachable via webUI **and** TUI **and** CLI (read paths at minimum; mutating paths where applicable to the surface). |
| MVP-X2 | Multi-tenant isolation is enforced at every boundary; no cross-user leakage under any circumstance. |
| MVP-X3 | Auth via BetterAuth (existing); SSO adapters per PRD; admin bootstrap remains a one-shot. |
| MVP-X4 | Three quality gates green before push: `pnpm typecheck`, `pnpm lint`, `pnpm format:check`. |
| MVP-X5 | Federated tier (PG + pgvector + Valkey) is the canonical MVP deployment topology; local/standalone tiers continue to work for non-federated installs but are not the MVP target. |
| MVP-X6 | OTEL tracing on every request path; `traceparent` propagated across the federation boundary in both directions. |
| MVP-X7 | Trunk merge strategy: branch from `main`, squash-merge via PR, never push to `main` directly. |
## Success Criteria
- [ ] AC-1: Admin bootstrap completes successfully end-to-end on a fresh install (DTO value import, no forbidNonWhitelisted regression); covered by an integration or e2e test that exercises the real DTO binding.
- [ ] AC-2: Wizard fails loudly (non-zero exit, clear error) when the bootstrap stage returns `completed: false`, in both interactive and headless modes. No more silent `✔ Wizard complete` after a 400.
- [ ] AC-3: Gateway port prompt prefills `14242` in the input field (user can press Enter to accept).
- [ ] AC-4: `"What is Mosaic?"` intro copy mentions Pi SDK as the underlying agent runtime.
- [ ] AC-5: Release `mosaic-v0.0.26` tagged and published to the Gitea npm registry, unblocking the 0.0.25 happy path.
- [ ] AC-6: CORS origin prompt replaced with FQDN/hostname input; CORS string is derived from that.
- [ ] AC-7: Skill / additional feature install section is reworked until it is actually usable end-to-end (worker defines the concrete failure modes during diagnosis).
- [ ] AC-8: First-run flow has a drill-down main menu with at least `Plugins` (Recommended / Custom), `Providers`, and the other top-level configuration groups. Linear interrogation is gone.
- [ ] AC-9: `Quick Start` path completes with a minimal, curated set of questions (target: under 90 seconds for a returning user; define the exact baseline during design).
- [ ] AC-10: Provider setup happens first, driven by a natural-language intake prompt. The agent expounds on the user's intent and chooses its own name based on that intent (OpenClaw-style). Naming is confirmable / overridable.
- [ ] AC-11: All milestones ship as merged PRs with green CI and closed issues.
The MVP is complete when ALL declared workstreams are complete AND every cross-cutting requirement is verifiable on a live two-host deployment (woltje.com ↔ uscllc.com).
## Milestones
- [ ] AC-MVP-1: All declared workstreams reach `complete` status with merged PRs and green CI
- [ ] AC-MVP-2: A user session on the home gateway can transparently query work-gateway data subject to scope, with no data persisted across the boundary
- [ ] AC-MVP-3: The same user-facing capability is reachable from webUI, TUI, and CLI (per MVP-X1)
- [ ] AC-MVP-4: Two-gateway production deployment (woltje.com ↔ uscllc.com) operational ≥7 days without incident
- [ ] AC-MVP-5: All cross-cutting requirements (MVP-X1 → MVP-X7) verified with evidence
- [ ] AC-MVP-6: PRD `docs/PRD.md` "In Scope (v0.1.0 Beta)" list mapped to evidence (each item: shipped / explicitly deferred with rationale)
| # | ID | Name | Status | Branch | Issue | Started | Completed |
| --- | ------- | ------------------------------------------------------------ | ----------- | ---------------------- | ----- | ---------- | --------- |
| 1 | IUV-M01 | Hotfix: bootstrap DTO + wizard failure + port prefill + copy | in-progress | fix/bootstrap-hotfix | #436 | 2026-04-05 | — |
| 2 | IUV-M02 | UX polish: CORS/FQDN, skill installer rework | not-started | feat/install-ux-polish | #437 | — | — |
| 3 | IUV-M03 | Provider-first intelligent flow + drill-down main menu | not-started | feat/install-ux-intent | #438 | — | — |
## Workstreams
## Subagent Delegation Plan
| # | ID | Name | Status | Manifest | Notes |
| --- | --- | ------------------------------------------- | ----------------- | ----------------------------------------------------------------------- | --------------------------------------------------- |
| W1 | FED | Federation v1 | planning-complete | [docs/federation/MISSION-MANIFEST.md](./federation/MISSION-MANIFEST.md) | 7 milestones, ~175K tokens, issues #460#466 filed |
| W2+ | TBD | (additional workstreams declared as scoped) | — | — | Scope creep is expected and explicitly accommodated |
| Milestone | Recommended Tier | Rationale |
| --------- | ---------------- | --------------------------------------------------------------------- |
| IUV-M01 | sonnet | Tight bug cluster with known fix sites + small release cycle |
| IUV-M02 | sonnet | UX rework, moderate surface, diagnostic-heavy for the skill installer |
| IUV-M03 | opus | Architectural redesign of first-run flow, state machine + LLM intake |
### Likely Additional Workstreams (Not Yet Declared)
These are anticipated based on the PRD `In Scope` list but are NOT counted toward MVP completion until they have their own manifest, milestones, and tracking issues. Listed here so the orchestrator knows what's likely coming.
- Web dashboard parity with PRD scope (chat, tasks, projects, missions, agent status surfaces)
- Pi TUI integration for terminal-native agent work
- CLI completeness for headless / scripted workflows that mirror webUI capability
- Remote control plugins (Discord priority, then Telegram)
- Multi-user / SSO finishing (BetterAuth + Authentik/WorkOS/Keycloak adapters per PRD)
- LLM provider expansion (Anthropic, Codex, Z.ai, Ollama, LM Studio, llama.cpp) + routing matrix
- MCP server/client capability + skill import interface
- Brain (`@mosaicstack/brain`) as the structured data layer on PG + vector
When any of these solidify into a real workstream, add a row to the Workstreams table, create a workstream-level manifest under `docs/{workstream}/MISSION-MANIFEST.md`, and file tracking issues.
## Risks
- **Hotfix regression surface** — the `import type``import` fix on the DTO class is one character but needs an integration test that binds the real DTO, not just a controller unit test, to prevent the same class-erasure regression from sneaking back in.
- **LLM-driven intake latency / offline** — M03's provider-first intent flow assumes an available LLM call to expound on user input and choose a name. Offline installs need a deterministic fallback.
- **Menu vs. linear back-compat** — M03 changes the top-level flow shape; existing `tools/install.sh --yes` + env-var headless path must continue to work.
- **Scope creep in M03** — "redesign the wizard" can absorb arbitrary work. Keep it bounded with explicit non-goals.
- **Scope creep is the named risk.** Workstreams will be added; the rule is that each must have its own manifest + milestones + acceptance criteria before it consumes execution capacity.
- **Federation urgency vs. surface parity** — federation is being built first because it unblocks the user, but webUI/TUI/CLI parity (MVP-X1) cannot slip indefinitely. Track surface coverage explicitly when each workstream lands.
- **Three-surface fan-out** — the same capability exposed three ways multiplies test surface and design effort. Default to a shared API/contract layer, then thin surface adapters; resist surface-specific business logic.
- **Federated-tier dependency** — MVP requires PG + pgvector + Valkey; users on local/standalone tier cannot federate. This is intentional but must be communicated clearly in the wizard.
## Out of Scope
## Out of Scope (MVP)
- Migrating the wizard to a GUI / web UI (still terminal-first)
- Replacing the Gitea registry or the Woodpecker publish pipeline
- Multi-tenant / multi-user onboarding (still single-admin bootstrap)
- Reworking `mosaic uninstall` (M01 of the parent mission — stable)
- SaaS / multi-tenant revenue model — personal/family/team tool only
- Mobile native apps — responsive web only
- Public npm registry publishing — Gitea registry only
- Voice / video agent interaction
- Full OpenClaw feature parity — inspiration only
- Calendar / GLPI / Woodpecker tooling integrations (deferred to post-MVP)
## Session History
For sessions 114 (phase-based execution, 2026-03-13 → 2026-03-15), see [scratchpads/mvp-20260312.md](./scratchpads/mvp-20260312.md). Sessions below are tracked at the rollup level.
| Session | Date | Runtime | Outcome |
| ------- | ---------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| S15 | 2026-04-19 | claude | MVP rollup manifest authored. Install-ux-v2 archived (IUV-M03 retroactively closed — shipped via PR #446 + releases 0.0.27 → 0.0.29). Federation v1 planning landed via PR #468. W1 manifest reachable at `docs/federation/MISSION-MANIFEST.md`. Next: kickoff FED-M1. |
## Next Step
Begin W1 / FED-M1 — federated tier infrastructure. Task breakdown lives at [docs/federation/TASKS.md](./federation/TASKS.md).

View File

@@ -1,39 +1,40 @@
# Tasks — Install UX v2
# Tasks — MVP (Top-Level Rollup)
> Single-writer: orchestrator only. Workers read but never modify.
>
> **Mission:** install-ux-v2-20260405
> **Schema:** `| id | status | description | issue | agent | branch | depends_on | estimate | notes |`
> **Status values:** `not-started` | `in-progress` | `done` | `blocked` | `failed` | `needs-qa`
> **Agent values:** `codex` | `sonnet` | `haiku` | `opus` | `—` (auto)
> **Mission:** mvp-20260312
> **Manifest:** [docs/MISSION-MANIFEST.md](./MISSION-MANIFEST.md)
>
> This file is a **rollup**. Per-workstream task breakdowns live in workstream task files
> (e.g. `docs/federation/TASKS.md`). Workers operating inside a workstream should treat
> the workstream file as their primary task source; this file exists for orchestrator-level
> visibility into MVP-wide state.
>
> **Status values:** `not-started` | `in-progress` | `done` | `blocked` | `failed`
## Milestone 1 — Hotfix: bootstrap DTO + wizard failure + port prefill + copy (IUV-M01)
## Workstream Rollup
| id | status | description | issue | agent | branch | depends_on | estimate | notes |
| --------- | ----------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- | ------ | -------------------- | ---------- | -------- | ---------------------------------------------------------------------------------------------- |
| IUV-01-01 | not-started | Fix `apps/gateway/src/admin/bootstrap.controller.ts:16` — switch `import type { BootstrapSetupDto }` to a value import so Nest's `@Body()` binds the real class | #436 | sonnet | fix/bootstrap-hotfix | — | 3K | one-character fix; repro is real-run `mosaic wizard` against `0.0.25` |
| IUV-01-02 | not-started | Add integration / e2e test that POSTs `/api/bootstrap/setup` with `{name,email,password}` against a real Nest app instance and asserts 201 — NOT a mocked controller unit test | #436 | sonnet | fix/bootstrap-hotfix | IUV-01-01 | 10K | must fail before the fix and pass after; guards against the class-erasure regression recurring |
| IUV-01-03 | not-started | `packages/mosaic/src/wizard.ts:147` — propagate `!bootstrapResult.completed` as a wizard failure in **interactive** mode too (not only headless); non-zero exit + no `✔ Wizard complete` line | #436 | sonnet | fix/bootstrap-hotfix | IUV-01-02 | 5K | |
| IUV-01-04 | not-started | Gateway port prompt prefills `14242` in the input buffer — investigate why `promptPort`'s `defaultValue` isn't reaching the user-visible input | #436 | sonnet | fix/bootstrap-hotfix | IUV-01-03 | 5K | likely WizardPrompter adapter or @clack/prompts `initialValue` vs `defaultValue` mismatch |
| IUV-01-05 | not-started | `"What is Mosaic?"` intro copy updated to mention Pi SDK as the underlying agent runtime (alongside Claude Code / Codex / OpenCode) | #436 | sonnet | fix/bootstrap-hotfix | IUV-01-04 | 2K | |
| IUV-01-06 | not-started | Tests + code review + PR merge + tag `mosaic-v0.0.26` + Gitea release + npm registry republish | #436 | sonnet | fix/bootstrap-hotfix | IUV-01-05 | 10K | bump `packages/mosaic/package.json` to 0.0.25 → 0.0.26 |
| id | status | workstream | progress | tasks file | notes |
| --- | ----------------- | ------------------- | ---------------- | ------------------------------------------------- | --------------------------------------------------------------- |
| W1 | planning-complete | Federation v1 (FED) | 0 / 7 milestones | [docs/federation/TASKS.md](./federation/TASKS.md) | M1 task breakdown populated; M2M7 deferred to mission planning |
## Milestone 2 — UX polish: CORS/FQDN, skill installer rework (IUV-M02)
## Cross-Cutting Tracking
| id | status | description | issue | agent | branch | depends_on | estimate | notes |
| --------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------ | ----- | ------ | ---------------------- | ---------- | -------- | --------------------------- |
| IUV-02-01 | not-started | Replace CORS origin prompt with FQDN / hostname input; derive the CORS value internally; default to `localhost` with clear help text | #437 | sonnet | feat/install-ux-polish | IUV-01-06 | 10K | |
| IUV-02-02 | not-started | Diagnose and document the concrete failure modes of the current skill / additional feature install section end-to-end | #437 | sonnet | feat/install-ux-polish | IUV-02-01 | 8K | needs real-run reproduction |
| IUV-02-03 | not-started | Rework the skill installer so it is usable end-to-end (selection, install, verify, failure reporting) | #437 | sonnet | feat/install-ux-polish | IUV-02-02 | 20K | |
| IUV-02-04 | not-started | Tests + code review + PR merge | #437 | sonnet | feat/install-ux-polish | IUV-02-03 | 10K | |
These are MVP-level checks that don't belong to any single workstream. Updated by the orchestrator at each session.
## Milestone 3 — Provider-first intelligent flow + drill-down main menu (IUV-M03)
| id | status | description | notes |
| ------- | ----------- | -------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------- |
| MVP-T01 | done | Author MVP-level manifest at `docs/MISSION-MANIFEST.md` | This session (2026-04-19); PR pending |
| MVP-T02 | done | Archive install-ux-v2 mission state to `docs/archive/missions/install-ux-v2-20260405/` | IUV-M03 retroactively closed (shipped via PR #446 + releases 0.0.27→0.0.29) |
| MVP-T03 | done | Land federation v1 planning artifacts on `main` | PR #468 merged 2026-04-19 (commit `66512550`) |
| MVP-T04 | not-started | Sync `.mosaic/orchestrator/mission.json` MVP slot with this manifest (milestone enumeration, etc.) | Coord state file; consider whether to repopulate via `mosaic coord` or accept hand-edit |
| MVP-T05 | in-progress | Kick off W1 / FED-M1 — federated tier infrastructure | Session 16 (2026-04-19): FED-M1-01 in-progress on `feat/federation-m1-tier-config` |
| MVP-T06 | not-started | Declare additional workstreams (web dashboard, TUI/CLI parity, remote control, etc.) as scope solidifies | Track each new workstream by adding a row to the Workstream Rollup |
| id | status | description | issue | agent | branch | depends_on | estimate | notes |
| --------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- | ----- | ---------------------- | ---------- | -------- | ------------------------------------------------------------- |
| IUV-03-01 | not-started | Design doc: new first-run state machine — main menu (Plugins / Providers / …), Quick Start vs Custom paths, provider-first flow, intent intake + naming loop | #438 | opus | feat/install-ux-intent | IUV-02-04 | 15K | scratchpad + explicit non-goals |
| IUV-03-02 | not-started | Implement drill-down main menu (Plugins: Recommended / Custom, Providers, …) as the top-level entry point of `mosaic wizard` | #438 | opus | feat/install-ux-intent | IUV-03-01 | 25K | |
| IUV-03-03 | not-started | Quick Start path: curated minimum question set — define the exact baseline, delete everything else from the fast path | #438 | opus | feat/install-ux-intent | IUV-03-02 | 15K | |
| IUV-03-04 | not-started | Provider-first natural-language intake: user describes intent → agent expounds → agent proposes a name (confirmable / overridable) — OpenClaw-style | #438 | opus | feat/install-ux-intent | IUV-03-03 | 25K | offline fallback required (deterministic default name + path) |
| IUV-03-05 | not-started | Preserve backward-compat: headless path (`MOSAIC_ASSUME_YES=1` + env vars) still works end-to-end; `tools/install.sh --yes` unchanged | #438 | opus | feat/install-ux-intent | IUV-03-04 | 10K | |
| IUV-03-06 | not-started | Tests + code review + PR merge + `mosaic-v0.0.27` release | #438 | opus | feat/install-ux-intent | IUV-03-05 | 15K | |
## Pointer to Active Workstream
Active workstream is **W1 — Federation v1**. Workers should:
1. Read [docs/federation/MISSION-MANIFEST.md](./federation/MISSION-MANIFEST.md) for workstream scope
2. Read [docs/federation/TASKS.md](./federation/TASKS.md) for the next pending task
3. Follow per-task agent + tier guidance from the workstream manifest

View File

@@ -0,0 +1,74 @@
# Mission Manifest — Install UX v2
> Persistent document tracking full mission scope, status, and session history.
> Updated by the orchestrator at each phase transition and milestone completion.
## Mission
**ID:** install-ux-v2-20260405
**Statement:** The install-ux-hardening mission shipped the plumbing (uninstall, masked password, hooks consent, unified flow, headless path), but the first real end-to-end run surfaced a critical regression and a collection of UX failings that make the wizard feel neither quick nor intelligent. This mission closes the bootstrap regression as a hotfix, then rethinks the first-run experience around a provider-first, intent-driven flow with a drill-down main menu and a genuinely fast quick-start.
**Phase:** Closed
**Current Milestone:**
**Progress:** 3 / 3 milestones
**Status:** complete
**Last Updated:** 2026-04-19 (archived during MVP manifest authoring; IUV-M03 substantively shipped via PR #446 — drill-down menu + provider-first flow + quick start; releases 0.0.27 → 0.0.29)
**Archived to:** `docs/archive/missions/install-ux-v2-20260405/`
**Parent Mission:** [install-ux-hardening-20260405](./archive/missions/install-ux-hardening-20260405/MISSION-MANIFEST.md) (complete — `mosaic-v0.0.25`)
## Context
Real-run testing of `@mosaicstack/mosaic@0.0.25` uncovered:
1. **Critical:** admin bootstrap fails with HTTP 400 `property email should not exist``bootstrap.controller.ts` uses `import type { BootstrapSetupDto }`, erasing the class at runtime. Nest's `@Body()` falls back to plain `Object` metatype, and ValidationPipe with `forbidNonWhitelisted` rejects every property. One-character fix (drop the `type` keyword), but it blocks the happy path of the release that just shipped.
2. The wizard reports `✔ Wizard complete` and `✔ Done` _after_ the bootstrap 400 — failure only propagates in headless mode (`wizard.ts:147`).
3. The gateway port prompt does not prefill `14242` in the input buffer.
4. `"What is Mosaic?"` intro copy does not mention Pi SDK (the actual agent runtime behind Claude/Codex/OpenCode).
5. CORS origin prompt is confusing — the user should be able to supply an FQDN/hostname and have the system derive the CORS value.
6. Skill / additional feature install section is unusable in practice.
7. Quick-start asks far too many questions to be meaningfully "quick".
8. No drill-down main menu — everything is a linear interrogation.
9. Provider setup happens late and without intelligence. An OpenClaw-style provider-first flow would let the user describe what they want in natural language, have the agent expound on it, and have the agent choose its own name based on that intent.
## Success Criteria
- [x] AC-1: Admin bootstrap completes successfully end-to-end on a fresh install (DTO value import, no forbidNonWhitelisted regression); covered by an integration or e2e test that exercises the real DTO binding. _(PR #440)_
- [x] AC-2: Wizard fails loudly (non-zero exit, clear error) when the bootstrap stage returns `completed: false`, in both interactive and headless modes. No more silent `✔ Wizard complete` after a 400. _(PR #440)_
- [x] AC-3: Gateway port prompt prefills `14242` in the input field (user can press Enter to accept). _(PR #440)_
- [x] AC-4: `"What is Mosaic?"` intro copy mentions Pi SDK as the underlying agent runtime. _(PR #440)_
- [x] AC-5: Release `mosaic-v0.0.26` tagged and published to the Gitea npm registry, unblocking the 0.0.25 happy path. _(tag: mosaic-v0.0.26, registry: 0.0.26 live)_
- [ ] AC-6: CORS origin prompt replaced with FQDN/hostname input; CORS string is derived from that.
- [ ] AC-7: Skill / additional feature install section is reworked until it is actually usable end-to-end (worker defines the concrete failure modes during diagnosis).
- [ ] AC-8: First-run flow has a drill-down main menu with at least `Plugins` (Recommended / Custom), `Providers`, and the other top-level configuration groups. Linear interrogation is gone.
- [ ] AC-9: `Quick Start` path completes with a minimal, curated set of questions (target: under 90 seconds for a returning user; define the exact baseline during design).
- [ ] AC-10: Provider setup happens first, driven by a natural-language intake prompt. The agent expounds on the user's intent and chooses its own name based on that intent (OpenClaw-style). Naming is confirmable / overridable.
- [ ] AC-11: All milestones ship as merged PRs with green CI and closed issues.
## Milestones
| # | ID | Name | Status | Branch | Issue | Started | Completed |
| --- | ------- | ------------------------------------------------------------ | -------- | ---------------------- | ----- | ---------- | ---------- |
| 1 | IUV-M01 | Hotfix: bootstrap DTO + wizard failure + port prefill + copy | complete | fix/bootstrap-hotfix | #436 | 2026-04-05 | 2026-04-05 |
| 2 | IUV-M02 | UX polish: CORS/FQDN, skill installer rework | complete | feat/install-ux-polish | #437 | 2026-04-05 | 2026-04-05 |
| 3 | IUV-M03 | Provider-first intelligent flow + drill-down main menu | complete | feat/install-ux-intent | #438 | 2026-04-05 | 2026-04-19 |
## Subagent Delegation Plan
| Milestone | Recommended Tier | Rationale |
| --------- | ---------------- | --------------------------------------------------------------------- |
| IUV-M01 | sonnet | Tight bug cluster with known fix sites + small release cycle |
| IUV-M02 | sonnet | UX rework, moderate surface, diagnostic-heavy for the skill installer |
| IUV-M03 | opus | Architectural redesign of first-run flow, state machine + LLM intake |
## Risks
- **Hotfix regression surface** — the `import type``import` fix on the DTO class is one character but needs an integration test that binds the real DTO, not just a controller unit test, to prevent the same class-erasure regression from sneaking back in.
- **LLM-driven intake latency / offline** — M03's provider-first intent flow assumes an available LLM call to expound on user input and choose a name. Offline installs need a deterministic fallback.
- **Menu vs. linear back-compat** — M03 changes the top-level flow shape; existing `tools/install.sh --yes` + env-var headless path must continue to work.
- **Scope creep in M03** — "redesign the wizard" can absorb arbitrary work. Keep it bounded with explicit non-goals.
## Out of Scope
- Migrating the wizard to a GUI / web UI (still terminal-first)
- Replacing the Gitea registry or the Woodpecker publish pipeline
- Multi-tenant / multi-user onboarding (still single-admin bootstrap)
- Reworking `mosaic uninstall` (M01 of the parent mission — stable)

View File

@@ -0,0 +1,39 @@
# Tasks — Install UX v2
> Single-writer: orchestrator only. Workers read but never modify.
>
> **Mission:** install-ux-v2-20260405
> **Schema:** `| id | status | description | issue | agent | branch | depends_on | estimate | notes |`
> **Status values:** `not-started` | `in-progress` | `done` | `blocked` | `failed` | `needs-qa`
> **Agent values:** `codex` | `sonnet` | `haiku` | `opus` | `—` (auto)
## Milestone 1 — Hotfix: bootstrap DTO + wizard failure + port prefill + copy (IUV-M01)
| id | status | description | issue | agent | branch | depends_on | estimate | notes |
| --------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- | ------ | -------------------- | ---------- | -------- | --------------------------------------------------------------------------------------- |
| IUV-01-01 | done | Fix `apps/gateway/src/admin/bootstrap.controller.ts:16` — switch `import type { BootstrapSetupDto }` to a value import so Nest's `@Body()` binds the real class | #436 | sonnet | fix/bootstrap-hotfix | — | 3K | PR #440 merged `0ae932ab` |
| IUV-01-02 | done | Add integration / e2e test that POSTs `/api/bootstrap/setup` with `{name,email,password}` against a real Nest app instance and asserts 201 — NOT a mocked controller unit test | #436 | sonnet | fix/bootstrap-hotfix | IUV-01-01 | 10K | `apps/gateway/src/admin/bootstrap.e2e.spec.ts` — 4 tests; unplugin-swc added for vitest |
| IUV-01-03 | done | `packages/mosaic/src/wizard.ts:147` — propagate `!bootstrapResult.completed` as a wizard failure in **interactive** mode too (not only headless); non-zero exit + no `✔ Wizard complete` line | #436 | sonnet | fix/bootstrap-hotfix | IUV-01-02 | 5K | removed `&& headlessRun` guard |
| IUV-01-04 | done | Gateway port prompt prefills `14242` in the input buffer — investigate why `promptPort`'s `defaultValue` isn't reaching the user-visible input | #436 | sonnet | fix/bootstrap-hotfix | IUV-01-03 | 5K | added `initialValue` through prompter interface → clack |
| IUV-01-05 | done | `"What is Mosaic?"` intro copy updated to mention Pi SDK as the underlying agent runtime (alongside Claude Code / Codex / OpenCode) | #436 | sonnet | fix/bootstrap-hotfix | IUV-01-04 | 2K | `packages/mosaic/src/stages/welcome.ts` |
| IUV-01-06 | done | Tests + code review + PR merge + tag `mosaic-v0.0.26` + Gitea release + npm registry republish | #436 | sonnet | fix/bootstrap-hotfix | IUV-01-05 | 10K | PRs #440/#441/#442 merged; tag `mosaic-v0.0.26`; registry latest=0.0.26 ✓ |
## Milestone 2 — UX polish: CORS/FQDN, skill installer rework (IUV-M02)
| id | status | description | issue | agent | branch | depends_on | estimate | notes |
| --------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------ | ----- | ------ | ---------------------- | ---------- | -------- | ---------------------------------------------------------------------- |
| IUV-02-01 | done | Replace CORS origin prompt with FQDN / hostname input; derive the CORS value internally; default to `localhost` with clear help text | #437 | sonnet | feat/install-ux-polish | — | 10K | `deriveCorsOrigin()` pure fn; MOSAIC_HOSTNAME headless var; PR #444 |
| IUV-02-02 | done | Diagnose and document the concrete failure modes of the current skill / additional feature install section end-to-end | #437 | sonnet | feat/install-ux-polish | IUV-02-01 | 8K | selection→install gap, silent catch{}, no whitelist concept |
| IUV-02-03 | done | Rework the skill installer so it is usable end-to-end (selection, install, verify, failure reporting) | #437 | sonnet | feat/install-ux-polish | IUV-02-02 | 20K | MOSAIC_INSTALL_SKILLS env var whitelist; SyncSkillsResult typed return |
| IUV-02-04 | done | Tests + code review + PR merge | #437 | sonnet | feat/install-ux-polish | IUV-02-03 | 10K | 18 new tests (13 CORS + 5 skills); PR #444 merged `172bacb3` |
## Milestone 3 — Provider-first intelligent flow + drill-down main menu (IUV-M03)
| id | status | description | issue | agent | branch | depends_on | estimate | notes |
| --------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- | ----- | ---------------------- | ---------- | -------- | ------------------------------------------------------------- |
| IUV-03-01 | not-started | Design doc: new first-run state machine — main menu (Plugins / Providers / …), Quick Start vs Custom paths, provider-first flow, intent intake + naming loop | #438 | opus | feat/install-ux-intent | — | 15K | scratchpad + explicit non-goals |
| IUV-03-02 | not-started | Implement drill-down main menu (Plugins: Recommended / Custom, Providers, …) as the top-level entry point of `mosaic wizard` | #438 | opus | feat/install-ux-intent | IUV-03-01 | 25K | |
| IUV-03-03 | not-started | Quick Start path: curated minimum question set — define the exact baseline, delete everything else from the fast path | #438 | opus | feat/install-ux-intent | IUV-03-02 | 15K | |
| IUV-03-04 | not-started | Provider-first natural-language intake: user describes intent → agent expounds → agent proposes a name (confirmable / overridable) — OpenClaw-style | #438 | opus | feat/install-ux-intent | IUV-03-03 | 25K | offline fallback required (deterministic default name + path) |
| IUV-03-05 | not-started | Preserve backward-compat: headless path (`MOSAIC_ASSUME_YES=1` + env vars) still works end-to-end; `tools/install.sh --yes` unchanged | #438 | opus | feat/install-ux-intent | IUV-03-04 | 10K | |
| IUV-03-06 | not-started | Tests + code review + PR merge + `mosaic-v0.0.27` release | #438 | opus | feat/install-ux-intent | IUV-03-05 | 15K | |

View File

@@ -0,0 +1,227 @@
# IUV-M03 Design: Provider-first intelligent flow + drill-down main menu
**Issue:** #438
**Branch:** `feat/install-ux-intent`
**Date:** 2026-04-05
## 1. New first-run state machine
The linear 12-stage interrogation is replaced with a menu-driven architecture.
### Flow overview
```
Welcome banner
|
v
Detect existing install (auto)
|
v
Main Menu (loop)
|-- Quick Start -> provider key + admin creds -> finalize
|-- Providers -> LLM API key config
|-- Agent Identity -> intent intake + naming (deterministic)
|-- Skills -> recommended / custom selection
|-- Gateway -> port, storage tier, hostname, CORS
|-- Advanced -> SOUL.md, USER.md, TOOLS.md, runtimes, hooks
|-- Finish & Apply -> finalize + gateway bootstrap
v
Done
```
### Menu navigation
- Main menu is a `select` prompt. Each option drills into a sub-flow.
- Completing a section returns to the main menu.
- Menu items show completion state: `[done]` hint after configuration.
- `Finish & Apply` is always last and requires at minimum a provider key (or explicit skip).
- The menu tracks configured sections in `WizardState.completedSections`.
### Headless bypass
When `MOSAIC_ASSUME_YES=1` or `!process.stdin.isTTY`, the entire menu is skipped.
The wizard runs: defaults + env var overrides -> finalize -> gateway config -> bootstrap.
This preserves full backward compatibility with `tools/install.sh --yes`.
## 2. Quick Start path
Target: 3-5 questions max. Under 90 seconds for a returning user.
### Questions asked
1. **Provider API key** (Anthropic/OpenAI) - `text` prompt with paste support
2. **Admin email** - `text` prompt
3. **Admin password** - masked + confirmed
### Questions skipped (with defaults)
| Setting | Default | Rationale |
| ---------------------------- | ------------------------------- | ---------------------- |
| Agent name | "Mosaic" | Generic but branded |
| Port | 14242 | Standard default |
| Storage tier | local | No external deps |
| Hostname | localhost | Dev-first |
| CORS origin | http://localhost:3000 | Standard web UI port |
| Skills | recommended set | Curated by maintainers |
| Runtimes | auto-detected | No user input needed |
| Communication style | direct | Most popular choice |
| SOUL.md / USER.md / TOOLS.md | template defaults | Can customize later |
| Hooks | auto-install if Claude detected | Safe default |
### Flow
```
Quick Start selected
-> "Paste your LLM API key (Anthropic recommended):"
-> [auto-detect provider from key prefix: sk-ant-* = Anthropic, sk-* = OpenAI]
-> Apply all defaults
-> Run finalize (sync framework, write configs, link assets, sync skills)
-> Run gateway config (headless-style with defaults + provided key)
-> "Admin email:"
-> "Admin password:" (masked + confirm)
-> Run gateway bootstrap
-> Done
```
## 3. Provider-first flow
Provider configuration (currently buried in gateway-config stage as "ANTHROPIC_API_KEY")
moves to a dedicated top-level menu item and is the first question in Quick Start.
### Provider detection
The API key prefix determines the provider:
- `sk-ant-api03-*` -> Anthropic (Claude)
- `sk-*` -> OpenAI
- Empty/skipped -> no provider (gateway starts without LLM access)
### Storage
The provider key is stored in the gateway `.env` as `ANTHROPIC_API_KEY` or `OPENAI_API_KEY`.
For Quick Start, this replaces the old interactive prompt in `collectAndWriteConfig`.
### Menu section: "Providers"
In the drill-down menu, "Providers" lets users:
1. Enter/change their API key
2. See which provider was detected
3. Optionally configure a second provider
For v0.0.27, we support Anthropic and OpenAI keys only. The key is stored
in `WizardState` and written during finalize.
## 4. Intent intake + naming (deterministic fallback - Option B)
### Rationale
At install time, the LLM provider may not be configured yet (chicken-and-egg).
We use **Option B: deterministic advisor** for the install wizard.
### Flow (Agent Identity menu section)
```
1. "What will this agent primarily help you with?"
-> Select from presets:
- General purpose assistant
- Software development
- DevOps & infrastructure
- Research & analysis
- Content & writing
- Custom (free text description)
2. System proposes a thematic name based on selection:
- General purpose -> "Mosaic"
- Software development -> "Forge"
- DevOps & infrastructure -> "Sentinel"
- Research & analysis -> "Atlas"
- Content & writing -> "Muse"
- Custom -> "Mosaic" (default)
3. "Your agent will be named 'Forge'. Press Enter to accept or type a new name:"
-> User confirms or overrides
```
### Storage
- Agent name -> `WizardState.soul.agentName` -> written to SOUL.md
- Intent category -> `WizardState.agentIntent` (new field) -> written to `~/.config/mosaic/agent.json`
### Post-install LLM-powered intake (future)
A future `mosaic configure identity` command can use the configured LLM to:
- Accept free-text intent description
- Generate an expounded persona
- Propose a contextual name
This is explicitly out of scope for the install wizard.
## 5. Headless backward-compat
### Supported env vars (unchanged)
| Variable | Used by |
| -------------------------- | ---------------------------------------------- |
| `MOSAIC_ASSUME_YES=1` | Skip all prompts, use defaults + env overrides |
| `MOSAIC_ADMIN_NAME` | Gateway bootstrap |
| `MOSAIC_ADMIN_EMAIL` | Gateway bootstrap |
| `MOSAIC_ADMIN_PASSWORD` | Gateway bootstrap |
| `MOSAIC_GATEWAY_PORT` | Gateway config |
| `MOSAIC_HOSTNAME` | Gateway config (CORS derivation) |
| `MOSAIC_CORS_ORIGIN` | Gateway config (full override) |
| `MOSAIC_STORAGE_TIER` | Gateway config (local/team) |
| `MOSAIC_DATABASE_URL` | Gateway config (team tier) |
| `MOSAIC_VALKEY_URL` | Gateway config (team tier) |
| `MOSAIC_ANTHROPIC_API_KEY` | Provider config |
### New env vars
| Variable | Purpose |
| --------------------- | ----------------------------------------- |
| `MOSAIC_AGENT_NAME` | Override agent name in headless mode |
| `MOSAIC_AGENT_INTENT` | Override intent category in headless mode |
### `tools/install.sh --yes`
The install script sets `MOSAIC_ASSUME_YES=1` and passes through env vars.
No changes needed to the script itself. The new wizard detects headless mode
at the top of `runWizard` and runs a linear path identical to the old flow.
## 6. Explicit non-goals
- **No GUI** — this is a terminal wizard only
- **No multi-user install** — single-user, single-machine
- **No registry changes** — npm publish flow is unchanged
- **No LLM calls during install** — deterministic fallback only
- **No new dependencies** — uses existing @clack/prompts and picocolors
- **No changes to gateway API** — only the wizard orchestration changes
- **No changes to tools/install.sh** — headless compat maintained via env vars
## 7. Implementation plan
### Files to modify
1. `packages/mosaic/src/types.ts` — add `MenuSection`, `AgentIntent`, `completedSections`, `agentIntent`, `providerKey`, `providerType` to WizardState
2. `packages/mosaic/src/wizard.ts` — replace linear flow with menu loop
3. `packages/mosaic/src/stages/mode-select.ts` — becomes the main menu
4. `packages/mosaic/src/stages/provider-setup.ts` — new: provider key collection
5. `packages/mosaic/src/stages/agent-intent.ts` — new: intent intake + naming
6. `packages/mosaic/src/stages/menu-gateway.ts` — new: gateway sub-menu wrapper
7. `packages/mosaic/src/stages/quick-start.ts` — new: quick start linear path
8. `packages/mosaic/src/constants.ts` — add intent presets and name mappings
9. `packages/mosaic/package.json` — version bump 0.0.26 -> 0.0.27
### Files to add (tests)
1. `packages/mosaic/src/stages/wizard-menu.spec.ts` — menu navigation tests
2. `packages/mosaic/src/stages/quick-start.spec.ts` — quick start path tests
3. `packages/mosaic/src/stages/agent-intent.spec.ts` — intent + naming tests
4. `packages/mosaic/src/stages/provider-setup.spec.ts` — provider detection tests
### Migration strategy
The existing stage functions remain intact. The menu system wraps them —
each menu item calls the appropriate stage function(s). The linear headless
path calls them in the same order as before.

View File

@@ -107,3 +107,67 @@ Sequencing: strict. M01 ships first as a hotfix release (mosaic-v0.0.26). M02 is
1. Create Gitea issues for M01, M02, M03
2. Open the mission-scaffold docs PR (same pattern as parent mission's PR #430)
3. After merge, delegate IUV-M01 to a sonnet subagent in an isolated worktree with the concrete fix-site pointers above
## Session 2 — 2026-04-05 (IUV-M01 delivery + close-out)
### Outcome
IUV-M01 shipped. `mosaic-v0.0.26` released and registry latest confirmed `0.0.26`.
### PRs merged
| PR | Title | Merge |
| ---- | ------------------------------------------------------------------------ | -------- |
| #440 | fix: bootstrap hotfix — DTO erasure, wizard failure, port prefill, copy | 0ae932ab |
| #441 | fix: add vitest.config.ts to eslint allowDefaultProject (#440 build fix) | c08aa6fa |
| #442 | docs: mark IUV-M01 complete — mosaic-v0.0.26 released | 78388437 |
### Bugs fixed (all 4 in worker's PR #440)
1. **DTO class erasure**`apps/gateway/src/admin/bootstrap.controller.ts:16` — dropped `type` from `import { BootstrapSetupDto }`. Guarded by new e2e test `bootstrap.e2e.spec.ts` (4 cases) that binds through a real Nest app with `ValidationPipe { whitelist, forbidNonWhitelisted }`. Test suite needed `unplugin-swc` in `apps/gateway/vitest.config.ts` to emit `decoratorMetadata` (tsx/esbuild can't).
2. **Wizard silent failure**`packages/mosaic/src/wizard.ts` — removed the `&& headlessRun` guard so `!bootstrapResult.completed` now aborts in both modes.
3. **Port prefill** — root cause was clack's `defaultValue` vs `initialValue` semantics (`defaultValue` only fills on empty submit, `initialValue` prefills the buffer). Added an `initialValue` field to `WizardPrompter.text()` interface, threaded through clack and headless prompters, switched `gateway-config.ts` port/url prompts to use it.
4. **Pi SDK copy**`packages/mosaic/src/stages/welcome.ts` — intro copy now lists Pi SDK.
### Mid-delivery hiccup — tsconfig/eslint cross-contamination
Worker's initial approach added `vitest.config.ts` to `apps/gateway/tsconfig.json`'s `include` to appease the eslint parser. That broke `pnpm --filter @mosaicstack/gateway build` with TS6059 (`vitest.config.ts` outside `rootDir: "src"`). The publish pipeline on the `#440` merge commit failed.
**Correct fix** (worker's PR #441): leave `tsconfig.json` clean (`include: ["src/**/*"]`) and instead add the file to `allowDefaultProject` in the root `eslint.config.mjs`. This keeps the tsc program strict while letting eslint resolve a parser project for the standalone config file.
**Pattern to remember**: when adding root-level `.ts` config files (vitest, build scripts) to a package with `rootDir: "src"`, the eslint parser project conflict is solved with `allowDefaultProject`, NEVER by widening tsconfig include. I had independently arrived at the same fix on a branch before the worker shipped #441 — deleted the duplicate.
### Residual follow-ups carried forward
1. Headless prompter fallback order: worker set `initialValue > defaultValue` in the headless path. Correct semantic, but any future headless test that explicitly depends on `defaultValue` precedence will need review.
2. Vitest + SWC decorator metadata pattern is now the blessed approach for NestJS e2e tests in this monorepo. Any other package that adds NestJS e2e tests should mirror `apps/gateway/vitest.config.ts`.
### Next action
- Close out orchestrator doc sync (this commit): mark M01 subtasks done in `TASKS.md`, update manifest phase to Execution, commit scratchpad session 2, PR to main.
- After merge, delegate IUV-M02 (sonnet, isolated worktree). Dependencies: IUV-02-01 (CORS→FQDN) starts unblocked since M01 is released; first real task for the M02 worker is diagnosing the skill installer failure modes (IUV-02-02) against the fresh 0.0.26 install.
## Session 3 — 2026-04-05 (IUV-M02 delivery + close-out)
### Outcome
IUV-M02 shipped. PR #444 merged (`172bacb3`), issue #437 closed. 18 new tests (13 CORS derivation, 5 skill sync).
### Changes
**CORS → FQDN (IUV-02-01):**
- `packages/mosaic/src/stages/gateway-config.ts` — replaced raw "CORS origin" text prompt with "Web UI hostname" (default: `localhost`). Added HTTPS follow-up for remote hosts. Pure `deriveCorsOrigin(hostname, port, useHttps?)` function exported for testability.
- Headless: `MOSAIC_HOSTNAME` env var as friendly alternative; `MOSAIC_CORS_ORIGIN` still works as full override.
- `packages/mosaic/src/types.ts` — added `hostname?: string` to `GatewayState`.
**Skill installer rework (IUV-02-02 + IUV-02-03):**
- Root cause confirmed: `syncSkills()` in `finalize.ts` ignored `state.selectedSkills` entirely. The multiselect UI was a no-op.
- `packages/mosaic/src/stages/finalize.ts``syncSkills()` rewritten to accept `selectedSkills[]`, returns typed `SyncSkillsResult`, passes `MOSAIC_INSTALL_SKILLS` (colon-separated) as env var to the bash script.
- `packages/mosaic/framework/tools/_scripts/mosaic-sync-skills` — added bash associative array whitelist filter keyed on `MOSAIC_INSTALL_SKILLS`. When set, only whitelisted skills are linked. Empty/unset = all skills (legacy behavior preserved for `mosaic sync` outside wizard).
- Failure surfaces: silent `catch {}` replaced with typed error reporting through `p.warn()`.
### Next action
- Delegate IUV-M03 (opus, isolated worktree) — the architectural milestone: provider-first intelligent flow, drill-down main menu, Quick Start fast path, agent self-naming. This is the biggest piece of the mission.

View File

@@ -0,0 +1,368 @@
# Mosaic Stack — Federation Implementation Milestones
**Companion to:** `PRD.md`
**Approach:** Each milestone is a verifiable slice. A milestone is "done" only when its acceptance tests pass in CI against a real (not mocked) dependency stack.
---
## Milestone Dependency Graph
```
M1 (federated tier infra)
└── M2 (Step-CA + grant schema + CLI)
└── M3 (mTLS handshake + list/get + scope enforcement)
├── M4 (search + audit + rate limit)
│ └── M5 (cache + offline degradation + OTEL)
├── M6 (revocation + auto-renewal) ◄── can start after M3
└── M7 (multi-user hardening + e2e suite) ◄── depends on M4+M5+M6
```
M5 and M6 can run in parallel once M4 is merged.
---
## Test Strategy (applies to all milestones)
Three layers, all required before a milestone ships:
| Layer | Scope | Runtime |
| ------------------ | --------------------------------------------- | ------------------------------------------------------------------------ |
| **Unit** | Per-module logic, pure functions, adapters | Vitest, no I/O |
| **Integration** | Single gateway against real PG/Valkey/Step-CA | Vitest + Docker Compose test profile |
| **Federation E2E** | Two gateways on a Docker network, real mTLS | Playwright/custom harness (`tools/federation-harness/`) introduced in M3 |
Every milestone adds tests to these layers. A milestone cannot be claimed complete if the federation E2E harness fails (applies from M3 onward).
**Quality gates per milestone** (same as stack-wide):
- `pnpm typecheck` green
- `pnpm lint` green
- `pnpm test` green (unit + integration)
- `pnpm test:federation` green (M3+)
- Independent code review passed
- Docs updated (`docs/federation/`)
- Merged PR on `main`, CI terminal green, linked issue closed
---
## M1 — Federated Tier Infrastructure
**Goal:** A gateway can run in `federated` tier with containerized Postgres + Valkey + pgvector, with no federation logic active yet.
**Scope:**
- Add `"tier": "federated"` to `mosaic.config.json` schema and validators
- Docker Compose `federated` profile (`docker-compose.federated.yml`) adds: Postgres+pgvector (5433), Valkey (6380), dedicated volumes
- Tier detector in gateway bootstrap: reads config, asserts required services reachable, refuses to start otherwise
- `pgvector` extension installed + verified on startup
- Migration logic: safe upgrade path from `local`/`standalone``federated` (data export/import script, one-way)
- `mosaic doctor` reports tier + service health
- Gateway continues to serve as a normal standalone instance (no federation yet)
**Deliverables:**
- `mosaic.config.json` schema v2 (tier enum includes `federated`)
- `apps/gateway/src/bootstrap/tier-detector.ts`
- `docker-compose.federated.yml`
- `scripts/migrate-to-federated.ts`
- Updated `mosaic doctor` output
- Updated `packages/storage/src/adapters/postgres.ts` with pgvector support
**Acceptance tests:**
| # | Test | Layer |
| - | ---------------------------------------------------------------------------------------- | ----------- |
| 1 | Gateway boots in `federated` tier with all services present | Integration |
| 2 | Gateway refuses to boot in `federated` tier when Postgres unreachable (fail-fast, clear) | Integration |
| 3 | `pgvector` extension available in target DB (`SELECT * FROM pg_extension WHERE extname='vector'`) | Integration |
| 4 | Migration script moves a populated `local` (PGlite) instance to `federated` (Postgres) with no data loss | Integration |
| 5 | `mosaic doctor` reports correct tier and all services green | Unit |
| 6 | Existing standalone behavior regression: agent session works end-to-end, no federation references | E2E (single-gateway) |
**Estimated budget:** ~20K tokens (infra + config + migration script)
**Risk notes:** Pgvector install on existing PG installs is occasionally finicky; test the migration path on a realistic DB snapshot.
---
## M2 — Step-CA + Grant Schema + Admin CLI
**Goal:** An admin can create a federation grant and its counterparty can enroll. No runtime traffic flows yet.
**Scope:**
- Embed Step-CA as a Docker Compose sidecar with a persistent CA volume
- Gateway exposes a short-lived enrollment endpoint (single-use token from the grant)
- DB schema: `federation_grants`, `federation_peers`, `federation_audit_log` (table only, not yet written to)
- Sealed storage for `client_key_pem` using the existing credential sealing key
- Admin CLI:
- `mosaic federation grant create --user <id> --peer <host> --scope <file>`
- `mosaic federation grant list`
- `mosaic federation grant show <id>`
- `mosaic federation peer add <enrollment-url>`
- `mosaic federation peer list`
- Step-CA signs the cert with SAN OIDs for `grantId` + `subjectUserId`
- Grant status transitions: `pending``active` on successful enrollment
**Deliverables:**
- `packages/db` migration: three federation tables + enum types
- `apps/gateway/src/federation/ca.service.ts` (Step-CA client)
- `apps/gateway/src/federation/grants.service.ts`
- `apps/gateway/src/federation/enrollment.controller.ts`
- `packages/mosaic/src/commands/federation/` (grant + peer subcommands)
- `docker-compose.federated.yml` adds Step-CA service
- Scope JSON schema + validator
**Acceptance tests:**
| # | Test | Layer |
| - | ---------------------------------------------------------------------------------------- | ----------- |
| 1 | `grant create` writes a `pending` row with a scoped bundle | Integration |
| 2 | Enrollment endpoint signs a CSR and returns a cert with expected SAN OIDs | Integration |
| 3 | Enrollment token is single-use; second attempt returns 410 | Integration |
| 4 | Cert `subjectUserId` OID matches the grant's `subject_user_id` | Unit |
| 5 | `client_key_pem` is at-rest encrypted; raw DB read shows ciphertext, not PEM | Integration |
| 6 | `peer add <url>` on Server A yields an `active` peer record with a valid cert + key | E2E (two gateways, no traffic) |
| 7 | Scope JSON with unknown resource type rejected at `grant create` | Unit |
| 8 | `grant list` and `peer list` render active / pending / revoked accurately | Unit |
**Estimated budget:** ~30K tokens (schema + CA integration + CLI + sealing)
**Risk notes:** Step-CA's API surface is well-documented but the sealing integration with existing provider-credential encryption is a cross-module concern — walk that seam deliberately.
---
## M3 — mTLS Handshake + `list` + `get` with Scope Enforcement
**Goal:** Two federated gateways exchange real data over mTLS with scope intersecting native RBAC.
**Scope:**
- `FederationClient` (outbound): picks cert from `federation_peers`, does mTLS call
- `FederationServer` (inbound): NestJS guard validates client cert, extracts `grantId` + `subjectUserId`, loads grant
- Scope enforcement pipeline:
1. Resource allowlist / excluded-list check
2. Native RBAC evaluation as the `subjectUserId`
3. Scope filter intersection (`include_teams`, `include_personal`)
4. `max_rows_per_query` cap
- Verbs: `list`, `get`, `capabilities`
- Gateway query layer accepts `source: "local" | "federated:<host>" | "all"`; fan-out for `"all"`
- **Federation E2E harness** (`tools/federation-harness/`): docker-compose.two-gateways.yml, seed script, assertion helpers — this is its own deliverable
**Deliverables:**
- `apps/gateway/src/federation/client/federation-client.service.ts`
- `apps/gateway/src/federation/server/federation-auth.guard.ts`
- `apps/gateway/src/federation/server/scope.service.ts`
- `apps/gateway/src/federation/server/verbs/{list,get,capabilities}.controller.ts`
- `apps/gateway/src/federation/client/query-source.service.ts` (fan-out/merge)
- `tools/federation-harness/` (compose + seed + test helpers)
- `packages/types` — federation request/response DTOs in `federation.dto.ts`
**Acceptance tests:**
| # | Test | Layer |
| -- | -------------------------------------------------------------------------------------------------------- | ----- |
| 1 | A→B `list tasks` returns subjectUser's tasks intersected with scope | E2E |
| 2 | A→B `list tasks` with `include_teams: [T1]` excludes T2 tasks the user owns | E2E |
| 3 | A→B `get credential <id>` returns 403 when `credentials` is in `excluded_resources` | E2E |
| 4 | Client presenting cert for grant X cannot query subjectUser of grant Y (cross-user isolation) | E2E |
| 5 | Cert signed by untrusted CA rejected at TLS layer (no NestJS handler reached) | E2E |
| 6 | Malformed SAN OIDs → 401; cert valid but grant revoked in DB → 403 | Integration |
| 7 | `max_rows_per_query` caps response; request for more paginated | Integration |
| 8 | `source: "all"` fan-out merges local + federated results, each tagged with `_source` | Integration |
| 9 | Federation responses never persist: verify DB row count unchanged after `list` round-trip | E2E |
| 10 | Scope cannot grant more than native RBAC: user without access to team T still gets [] even if scope allows T | E2E |
**Estimated budget:** ~40K tokens (largest milestone — core federation logic + harness)
**Risk notes:** This is the critical trust boundary. Code review should focus on scope enforcement bypass and cert-SAN-spoofing paths. Every 403/401 path needs a test.
---
## M4 — `search` Verb + Audit Log + Rate Limit
**Goal:** Keyword search over allowed resources with full audit and per-grant rate limiting.
**Scope:**
- `search` verb across `resources` allowlist (intersection of scope + native RBAC)
- Keyword search (reuse existing `packages/memory/src/adapters/keyword.ts`); pgvector search stays out of v1 search verb
- Every federated request (all verbs) writes to `federation_audit_log`: `grant_id`, `verb`, `resource`, `query_hash`, `outcome`, `bytes_out`, `latency_ms`
- No request body captured; `query_hash` is SHA-256 of normalized query params
- Token-bucket rate limit per grant (default 60/min, override per grant)
- 429 response with `Retry-After` header and structured body
- 90-day hot retention for audit log; cold-tier rollover deferred to M7
**Deliverables:**
- `apps/gateway/src/federation/server/verbs/search.controller.ts`
- `apps/gateway/src/federation/server/audit.service.ts` (async write, no blocking)
- `apps/gateway/src/federation/server/rate-limit.guard.ts`
- Tests in harness
**Acceptance tests:**
| # | Test | Layer |
| - | ------------------------------------------------------------------------------------------------- | ----------- |
| 1 | `search` returns ranked hits only from allowed resources | E2E |
| 2 | `search` excluding `credentials` does not return a match even when keyword matches a credential name | E2E |
| 3 | Every successful request appears in `federation_audit_log` within 1s | Integration |
| 4 | Denied request (403) is also audited with `outcome='denied'` | Integration |
| 5 | Audit row stores query hash but NOT query body | Unit |
| 6 | 61st request in 60s window returns 429 with `Retry-After` | E2E |
| 7 | Per-grant override (e.g., 600/min) takes effect without restart | Integration |
| 8 | Audit writes are async: request latency unchanged when audit write slow (simulated) | Integration |
**Estimated budget:** ~20K tokens
**Risk notes:** Ensure audit writes can't block or error-out the request path; use a bounded queue and drop-with-counter pattern rather than in-line writes.
---
## M5 — Cache + Offline Degradation + Observability
**Goal:** Sessions feel fast and stay useful when the peer is slow or down.
**Scope:**
- In-memory response cache keyed by `(grant_id, verb, resource, query_hash)`, TTL 30s default
- Cache NOT used for `search`; only `list` and `get`
- Cache flushed on cert rotation and grant revocation
- Circuit breaker per peer: after N failures, fast-fail for cooldown window
- `_source` tagging extended with `_cached: true` when served from cache
- Agent-visible "federation offline for `<peer>`" signal emitted once per session per peer
- OTEL spans: `federation.request` with attrs `grant_id`, `peer`, `verb`, `resource`, `outcome`, `latency_ms`, `cached`
- W3C `traceparent` propagated across the mTLS boundary (both directions)
- `mosaic federation status` CLI subcommand
**Deliverables:**
- `apps/gateway/src/federation/client/response-cache.service.ts`
- `apps/gateway/src/federation/client/circuit-breaker.service.ts`
- `apps/gateway/src/federation/observability/` (span helpers)
- `packages/mosaic/src/commands/federation/status.ts`
**Acceptance tests:**
| # | Test | Layer |
| - | --------------------------------------------------------------------------------------------- | ----- |
| 1 | Two identical `list` calls within 30s: second served from cache, flagged `_cached` | Integration |
| 2 | `search` is never cached: two identical searches both hit the peer | Integration |
| 3 | After grant revocation, peer's cache is flushed immediately | Integration |
| 4 | After N consecutive failures, circuit opens; subsequent requests fail-fast without network call | E2E |
| 5 | Circuit closes after cooldown and next success | E2E |
| 6 | With peer offline, session completes using local data, one "federation offline" signal surfaced | E2E |
| 7 | OTEL traces show spans on both gateways correlated by `traceparent` | E2E |
| 8 | `mosaic federation status` prints peer state, cert expiry, last success/failure, circuit state | Unit |
**Estimated budget:** ~20K tokens
**Risk notes:** Caching correctness under revocation must be provable — write tests that intentionally race revocation against cached hits.
---
## M6 — Revocation, Auto-Renewal, CRL
**Goal:** Grant lifecycle works end-to-end: admin revoke, revoke-on-delete, automatic cert renewal, CRL distribution.
**Scope:**
- `mosaic federation grant revoke <id>` → status `revoked`, CRL updated, audit entry
- DB hook: deleting a user cascades `revoke-on-delete` on all grants where that user is subject
- Step-CA CRL endpoint exposed; serving gateway enforces CRL check on every handshake (cached CRL, refresh interval 60s)
- Client-side cert renewal job: at T-7 days, submit renewal CSR; rotate cert atomically; flush cache
- On renewal failure, peer marked `degraded` and admin-visible alert emitted
- Server A detects revocation on next request (TLS handshake fails with specific error) → peer marked `revoked`, user notified
**Deliverables:**
- `apps/gateway/src/federation/server/crl.service.ts` + endpoint
- `apps/gateway/src/federation/server/revocation.service.ts`
- DB cascade trigger or ORM hook for user deletion → grant revocation
- `apps/gateway/src/federation/client/renewal.job.ts` (scheduled)
- `packages/mosaic/src/commands/federation/grant.ts` gains `revoke` subcommand
**Acceptance tests:**
| # | Test | Layer |
| - | ----------------------------------------------------------------------------------------- | ----- |
| 1 | Admin `grant revoke` → A's next request fails with TLS-level error | E2E |
| 2 | Deleting subject user on B auto-revokes all grants where that user was the subject | Integration |
| 3 | CRL endpoint serves correct list; revoked cert present | Integration |
| 4 | Server rejects cert listed in CRL even if cert itself is still time-valid | E2E |
| 5 | Cert at T-7 days triggers renewal job; new cert issued and installed without dropped requests | E2E |
| 6 | Renewal failure marks peer `degraded` and surfaces alert | Integration |
| 7 | A marks peer `revoked` after a revocation-caused handshake failure (not on transient network errors) | E2E |
**Estimated budget:** ~20K tokens
**Risk notes:** The atomic cert swap during renewal is the sharpest edge here — any in-flight request mid-swap must either complete on old or retry on new, never fail mid-call.
---
## M7 — Multi-User RBAC Hardening + Team-Scoped Grants + Acceptance Suite
**Goal:** The full multi-tenant scenario from §4 user stories works end-to-end, with no cross-user leakage under any circumstance.
**Scope:**
- Three-user scenario on Server B (E1, E2, E3) each with their own Server A
- Team-scoped grants exercised: each employee's team-data visible on their own A, but E1's personal data never visible on E2's A
- User-facing UI surfaces on both gateways for: peer list, grant list, audit log viewer, scope editor
- Negative-path test matrix (every denial path from PRD §8)
- All PRD §15 acceptance criteria mapped to automated tests in the harness
- Security review: cert-spoofing, scope-bypass, audit-bypass paths explicitly tested
- Cold-storage rollover for audit log >90 days
- Docs: operator runbook, onboarding guide, troubleshooting guide
**Deliverables:**
- Full federation acceptance suite in `tools/federation-harness/acceptance/`
- `apps/web` surfaces for peer/grant/audit management
- `docs/federation/RUNBOOK.md`, `docs/federation/ONBOARDING.md`, `docs/federation/TROUBLESHOOTING.md`
- Audit cold-tier job (daily cron, moves rows >90d to separate table or object storage)
**Acceptance tests:**
Every PRD §15 criterion must be automated and green. Additionally:
| # | Test | Layer |
| --- | ----------------------------------------------------------------------------------------------------- | ---------------- |
| 1 | 3-employee scenario: each A sees only its user's data from B | E2E |
| 2 | Grant with team scope returns team data; same grant denied access to another employee's personal data | E2E |
| 3 | Concurrent sessions from E1's and E2's Server A to B interleave without any leakage | E2E |
| 4 | Audit log across 3-user test shows per-grant trails with no mis-attributed rows | E2E |
| 5 | Scope editor UI round-trip: edit → save → next request uses new scope | E2E |
| 6 | Attempt to use a revoked grant's cert against a different grant's endpoint: rejected | E2E |
| 7 | 90-day-old audit rows moved to cold tier; queryable via explicit historical query | Integration |
| 8 | Runbook steps validated: an operator following the runbook can onboard, rotate, and revoke | Manual checklist |
**Estimated budget:** ~25K tokens
**Risk notes:** This is the security-critical milestone. Budget review time here is non-negotiable — plan for two independent code reviews (internal + security-focused) before merge.
---
## Total Budget & Timeline Sketch
| Milestone | Tokens (est.) | Can parallelize? |
| --------- | ------------- | ---------------------- |
| M1 | 20K | No (foundation) |
| M2 | 30K | No (needs M1) |
| M3 | 40K | No (needs M2) |
| M4 | 20K | No (needs M3) |
| M5 | 20K | Yes (with M6 after M4) |
| M6 | 20K | Yes (with M5 after M3) |
| M7 | 25K | No (needs all) |
| **Total** | **~175K** | |
Parallelization of M5 and M6 after M4 saves one milestone's worth of serial time.
---
## Exit Criteria (federation feature complete)
All of the following must be green on `main`:
- Every PRD §15 acceptance criterion automated and passing
- Every milestone's acceptance table green
- Security review sign-off on M7
- Runbook walk-through completed by operator (not author)
- `mosaic doctor` recognizes federated tier and reports peer health accurately
- Two-gateway production deployment (woltje.com ↔ uscllc.com) operational for ≥7 days without incident
---
## Next Step After This Doc Is Approved
1. File tracking issues on `git.mosaicstack.dev/mosaicstack/stack` — one per milestone, labeled `epic:federation`
2. Populate `docs/TASKS.md` with M1's task breakdown (per-task agent assignment, budget, dependencies)
3. Begin M1 implementation

View File

@@ -0,0 +1,101 @@
# Mission Manifest — Federation v1
> Persistent document tracking full mission scope, status, and session history.
> Updated by the orchestrator at each phase transition and milestone completion.
## Mission
**ID:** federation-v1-20260419
**Statement:** Jarvis operates across 34 workstations in two physical locations (home, USC). The user currently reaches back to a single jarvis-brain checkout from every session; a prior OpenBrain attempt caused cache, latency, and opacity pain. This mission builds asymmetric federation between Mosaic Stack gateways so that a session on a user's home gateway can query their work gateway in real time without data ever persisting across the boundary, with full multi-tenant isolation and standard-PKI (X.509 / Step-CA) trust management.
**Phase:** M2 active — Step-CA + grant schema + admin CLI; parallel test-deploy workstream stood up
**Current Milestone:** FED-M2
**Progress:** 1 / 7 milestones
**Status:** active
**Last Updated:** 2026-04-21 (M2 decomposed; mos-test-1/-2 designated as federation E2E test hosts)
**Parent Mission:** None — new mission
## Test Infrastructure
| Host | Role | Image | Tier |
| ----------------------- | ----------------------------------- | ------------------------------------- | --------- |
| `mos-test-1.woltje.com` | Federation Server A (querying side) | `gateway:fed-v0.1.0-m1` (M1 baseline) | federated |
| `mos-test-2.woltje.com` | Federation Server B (serving side) | `gateway:fed-v0.1.0-m1` (M1 baseline) | federated |
These are TEST hosts for federation E2E (M3+). Distinct from PRD AC-12 production targets (`woltje.com``uscllc.com`). Deployment workstream tracked in `docs/federation/TASKS.md` under FED-M2-DEPLOY-\*.
## Context
Federation is the solution to what originally drove OpenBrain. The prior attempt coupled every agent session to a remote service, introduced cache/latency/opacity pain, and created a hard dependency that punished offline use. This redesign:
1. Makes federation **gateway-to-gateway**, not agent-to-service
2. Keeps each user's home instance as source of truth for their data
3. Exposes scoped, read-only data on demand without persisting across the boundary
4. Uses X.509 mTLS via Step-CA so rotation/revocation/CRL/OCSP are standard
5. Supports multi-tenant serving sides (employees on uscllc.com each federating back to their own home gateway) with no cross-user leakage
6. Requires federation-tier instances on both sides (PG + pgvector + Valkey) — local/standalone tiers cannot federate
7. Works over public HTTPS (no VPN required); Tailscale is an optional overlay
Key design references:
- `docs/federation/PRD.md` — 16-section product requirements
- `docs/federation/MILESTONES.md` — 7-milestone decomposition with per-milestone acceptance tests
- `docs/federation/TASKS.md` — per-task breakdown (M1 populated; M2-M7 deferred to mission planning)
- `docs/research/mempalace-evaluation/` (in jarvis-brain) — why we didn't adopt MemPalace
## Success Criteria
- [ ] AC-1: Two Mosaic Stack gateways on different hosts can establish a federation grant via CLI-driven onboarding
- [ ] AC-2: Server A can query Server B for `tasks`, `notes`, `memory` respecting scope filters
- [ ] AC-3: User on B with no grant cannot be queried by A, even if A has a valid grant for another user (cross-user isolation)
- [ ] AC-4: Revoking a grant on B causes A's next request to fail with a clear error within one request cycle
- [ ] AC-5: Cert rotation happens automatically at T-7 days; in-progress session survives rotation without user action
- [ ] AC-6: Rate-limit enforcement returns 429 with `Retry-After`; client backs off
- [ ] AC-7: With B unreachable, a session on A completes using local data and surfaces "federation offline for `<peer>`" once per session
- [ ] AC-8: Every federated request appears in B's `federation_audit_log` within 1 second
- [ ] AC-9: Scope excluding `credentials` means credentials are never returned — even via `search` with matching keywords
- [ ] AC-10: `mosaic federation status` shows cert expiry, grant status, last success/failure per peer
- [ ] AC-11: Full 3-employee multi-tenant scenario passes with no cross-user leakage
- [ ] AC-12: Two-gateway production deployment (woltje.com ↔ uscllc.com) operational ≥7 days without incident
- [ ] AC-13: All 7 milestones ship as merged PRs with green CI and closed issues
## Milestones
| # | ID | Name | Status | Branch | Issue | Started | Completed |
| --- | ------ | --------------------------------------------- | ----------- | ------------------ | ----- | ---------- | ---------- |
| 1 | FED-M1 | Federated tier infrastructure | done | (12 PRs #470-#481) | #460 | 2026-04-19 | 2026-04-19 |
| 2 | FED-M2 | Step-CA + grant schema + admin CLI | in-progress | (decomposition) | #461 | 2026-04-21 | — |
| 3 | FED-M3 | mTLS handshake + list/get + scope enforcement | not-started | — | #462 | — | — |
| 4 | FED-M4 | search verb + audit log + rate limit | not-started | — | #463 | — | — |
| 5 | FED-M5 | Cache + offline degradation + OTEL | not-started | — | #464 | — | — |
| 6 | FED-M6 | Revocation + auto-renewal + CRL | not-started | — | #465 | — | — |
| 7 | FED-M7 | Multi-user RBAC hardening + acceptance suite | not-started | — | #466 | — | — |
## Budget
| Milestone | Est. tokens | Parallelizable? |
| --------- | ----------- | ---------------------- |
| FED-M1 | 20K | No (foundation) |
| FED-M2 | 30K | No (needs M1) |
| FED-M3 | 40K | No (needs M2) |
| FED-M4 | 20K | No (needs M3) |
| FED-M5 | 20K | Yes (with M6 after M4) |
| FED-M6 | 20K | Yes (with M5 after M3) |
| FED-M7 | 25K | No (needs all) |
| **Total** | **~175K** | |
## Session History
| Session | Date | Runtime | Outcome |
| ------- | ---------- | ------- | --------------------------------------------------------------------- |
| S1 | 2026-04-19 | claude | PRD authored, MILESTONES decomposed, 7 issues filed |
| S2-S4 | 2026-04-19 | claude | FED-M1 complete: 12 tasks (PRs #470-#481) merged; tag `fed-v0.1.0-m1` |
## Next Step
FED-M2 active. Decomposition landed in `docs/federation/TASKS.md` (M2-01..M2-13 code workstream + DEPLOY-01..DEPLOY-05 parallel test-deploy workstream, ~88K total). Tracking issue #482.
Parallel execution plan:
- **CODE workstream**: M2-01 (DB migration) starts immediately — sonnet subagent on `feat/federation-m2-schema`. Then M2-02 → M2-09 sequentially with M2-04/M2-05/M2-06/M2-07 having interleaved CA/storage/grant dependencies.
- **DEPLOY workstream**: DEPLOY-01 (image verify) → DEPLOY-02 (stack template) → DEPLOY-03/04 (mos-test-1/-2 deploy) → DEPLOY-05 (TEST-INFRA.md). Gated on Portainer wrapper PR (`PORTAINER_INSECURE` flag) merging first.
- **Re-converge** at M2-10 (E2E test) once both workstreams ready.

330
docs/federation/PRD.md Normal file
View File

@@ -0,0 +1,330 @@
# Mosaic Stack — Federation PRD
**Status:** Draft v1 (locked for implementation)
**Owner:** Jason
**Date:** 2026-04-19
**Scope:** Enables cross-instance data federation between Mosaic Stack gateways with asymmetric trust, multi-tenant scoping, and no cross-boundary data persistence.
---
## 1. Problem Statement
Jarvis operates across 34 workstations in two physical locations (home, USC). The user currently reaches back to a single jarvis-brain checkout from every session, and has tried OpenBrain to solve cross-session state — with poor results (cache invalidation, latency, opacity, hard dependency on a remote service).
The goal is a federation model where each user's **home instance** remains the source of truth for their personal data, and **work/shared instances** expose scoped data to that user's home instance on demand — without persisting anything across the boundary.
## 2. Goals
1. A user logged into their **home gateway** (Server A) can query their **work gateway** (Server B) in real time during a session.
2. Data returned from Server B is used in-session only; never written to Server A storage.
3. Server B has multiple users, each with their own Server A. No user's data leaks to another user.
4. Federation works over public HTTPS (no VPN required). Tailscale is a supported optional overlay.
5. Sync latency target: seconds, or at the next data need of the agent.
6. Graceful degradation: if the remote instance is unreachable, the local session continues with local data and a clear "federation offline" signal.
7. Teams exist on both sides. A federation grant can share **team-owned** data without exposing other team members' personal data.
8. Auth and revocation use standard PKI (X.509) so that certificate tooling (Step-CA, rotation, OCSP, CRL) is available out of the box.
## 3. Non-Goals (v1)
- Mesh federation (N-to-N). v1 is strictly A↔B pairs.
- Cross-instance writes. All federation is **read-only** on the remote side.
- Shared agent sessions across instances. Sessions live on one instance; federation is data-plane only.
- Cross-instance SSO. Each instance owns its own BetterAuth identity store; federation is service-to-service, not user-to-user.
- Realtime push from B→A. v1 is pull-only (A pulls from B during a session).
- Global search index. Federation is query-by-query, not index replication.
## 4. User Stories
- **US-1 (Solo user at home):** As the sole user on Server A, I want my agent session on workstation-1 to see the same data it saw on workstation-2, without running OpenBrain.
- **US-2 (Cross-location):** As a user with a home server and a work server, I want a session on my home laptop to transparently pull my USC-owned tasks/notes when I ask for them.
- **US-3 (Work admin):** As the admin of mosaic.uscllc.com, I want to grant each employee's home gateway scoped read access to only their own data plus explicitly-shared team data.
- **US-4 (Privacy boundary):** As employee A on mosaic.uscllc.com, my data must never appear in a session on employee B's home gateway — even if both are federated with uscllc.com.
- **US-5 (Revocation):** As a work admin, when I delete an employee, their home gateway loses access within one request cycle.
- **US-6 (Offline):** As a user in a hotel with flaky wifi, my local session keeps working; federation calls fail fast and are reported as "offline," not hung.
## 5. Architecture Overview
```
┌─────────────────────────────────────┐ mTLS / X.509 ┌─────────────────────────────────────┐
│ Server A — mosaic.woltje.com │ ───────────────────────► │ Server B — mosaic.uscllc.com │
│ (home, master for Jason) │ ◄── JSON over HTTPS │ (work, multi-tenant) │
│ │ │ │
│ ┌──────────────┐ ┌──────────────┐ │ │ ┌──────────────┐ ┌──────────────┐ │
│ │ Gateway │ │ Postgres │ │ │ │ Gateway │ │ Postgres │ │
│ │ (NestJS) │──│ (local SSOT)│ │ │ │ (NestJS) │──│ (tenant SSOT)│ │
│ └──────┬───────┘ └──────────────┘ │ │ └──────┬───────┘ └──────────────┘ │
│ │ │ │ │ │
│ │ FederationClient │ │ │ FederationServer │
│ │ (outbound, scoped query) │ │ │ (inbound, RBAC-gated) │
│ └───────────────────────────┼──────────────────────────┼────────┘ │
│ │ │ │
│ Step-CA (issues A's client cert) │ │ Step-CA (issues B's server cert, │
│ │ │ trusts A's CA root on grant)│
└─────────────────────────────────────┘ └──────────────────────────────────────┘
```
- Federation is a **transport-layer** concern between two gateways, implemented as a new internal module on each gateway.
- Both sides run the same code. Direction (client vs. server role) is per-request.
- Nothing in the agent runtime changes — agents query the gateway; the gateway decides local vs. remote.
## 6. Transport & Authentication
**Transport:** HTTPS with mutual TLS (mTLS).
**Identity:** X.509 client certificates issued by Step-CA. Each federation grant materializes as a client cert on the requesting side and a trust-anchor entry (CA root or explicit cert) on the serving side.
**Why mTLS over HMAC bearer tokens:**
- Standard rotation/revocation semantics (renew, CRL, OCSP).
- The cert subject carries identity claims (user, grant_id) that don't need a separate DB lookup to verify authenticity.
- Client certs never transit request bodies, so they can't be logged by accident.
- Transport is pinned at the TLS layer, not re-validated per-handler.
**Cert contents (SAN + subject):**
- `CN=grant-<uuid>`
- `O=<requesting-server-hostname>` (e.g., `mosaic.woltje.com`)
- Custom OIDs embedded in SAN otherName:
- `mosaic.federation.grantId` (UUID)
- `mosaic.federation.subjectUserId` (user on the **serving** side that this grant acts-as)
- Default lifetime: **30 days**, with auto-renewal at T-7 days if the grant is still active.
**Step-CA topology (v1):** Each server runs its own Step-CA instance. During onboarding, the serving side imports the requesting side's CA root. A central/shared Step-CA is out of scope for v1.
**Handshake:**
1. Client (A) opens HTTPS to B with its grant cert.
2. B validates cert chain against trusted CA roots for that grant.
3. B extracts `grantId` and `subjectUserId` from the cert.
4. B loads the grant record, checks it is `active`, not revoked, and not expired.
5. B enforces scope and rate-limit for this grant.
6. Request proceeds; response returned.
## 7. Data Model
All tables live on **each instance's own Postgres**. Federation grants are bilateral — each side has a record of the grant.
### 7.1 `federation_grants` (on serving side, Server B)
| Field | Type | Notes |
| --------------------------- | ----------- | ------------------------------------------------- |
| `id` | uuid PK | |
| `subject_user_id` | uuid FK | Which local user this grant acts-as |
| `requesting_server` | text | Hostname of requesting gateway (e.g., woltje.com) |
| `requesting_ca_fingerprint` | text | SHA-256 of trusted CA root |
| `active_cert_fingerprint` | text | SHA-256 of currently valid client cert |
| `scope` | jsonb | See §8 |
| `rate_limit_rpm` | int | Default 60 |
| `status` | enum | `pending`, `active`, `suspended`, `revoked` |
| `created_at` | timestamptz | |
| `activated_at` | timestamptz | |
| `revoked_at` | timestamptz | |
| `last_used_at` | timestamptz | |
| `notes` | text | Admin-visible description |
### 7.2 `federation_peers` (on requesting side, Server A)
| Field | Type | Notes |
| --------------------- | ----------- | ------------------------------------------------ |
| `id` | uuid PK | |
| `peer_hostname` | text | e.g., `mosaic.uscllc.com` |
| `peer_ca_fingerprint` | text | SHA-256 of peer's CA root |
| `grant_id` | uuid | The grant ID assigned by the peer |
| `local_user_id` | uuid FK | Who on Server A this federation belongs to |
| `client_cert_pem` | text (enc) | Current client cert (PEM); rotated automatically |
| `client_key_pem` | text (enc) | Private key (encrypted at rest) |
| `cert_expires_at` | timestamptz | |
| `status` | enum | `pending`, `active`, `degraded`, `revoked` |
| `last_success_at` | timestamptz | |
| `last_failure_at` | timestamptz | |
| `notes` | text | |
### 7.3 `federation_audit_log` (on serving side, Server B)
| Field | Type | Notes |
| ------------- | ----------- | ------------------------------------------------ |
| `id` | uuid PK | |
| `grant_id` | uuid FK | |
| `occurred_at` | timestamptz | indexed |
| `verb` | text | `query`, `handshake`, `rejected`, `rate_limited` |
| `resource` | text | e.g., `tasks`, `notes`, `credentials` |
| `query_hash` | text | SHA-256 of normalized query (no payload stored) |
| `outcome` | text | `ok`, `denied`, `error` |
| `bytes_out` | int | |
| `latency_ms` | int | |
**Audit policy:** Every federation request is logged on the serving side. Read-only requests only — no body capture. Retention: 90 days hot, then roll to cold storage.
## 8. RBAC & Scope
Every federation grant has a scope object that answers three questions for every inbound request:
1. **Who is acting?**`subject_user_id` from the cert.
2. **What resources?** — an allowlist of resource types (`tasks`, `notes`, `credentials`, `memory`, `teams/:id/tasks`, …).
3. **Filter expression** — predicates applied on top of the subject's normal RBAC (see below).
### 8.1 Scope schema
```json
{
"resources": ["tasks", "notes", "memory"],
"filters": {
"tasks": { "include_teams": ["team_uuid_1", "team_uuid_2"], "include_personal": true },
"notes": { "include_personal": true, "include_teams": [] },
"memory": { "include_personal": true }
},
"excluded_resources": ["credentials", "api_keys"],
"max_rows_per_query": 500
}
```
### 8.2 Access rule (enforced on serving side)
For every inbound federated query on resource R:
1. Resolve effective identity → `subject_user_id`.
2. Check R is in `scope.resources` and NOT in `scope.excluded_resources`. Otherwise 403.
3. Evaluate the user's **normal RBAC** (what would they see if they logged into Server B directly)?
4. Intersect with the scope filter (e.g., only team X, only personal).
5. Apply `max_rows_per_query`.
6. Return; log to audit.
### 8.3 Team boundary guarantees
- Scope filters are additive, never subtractive of the native RBAC. A grant cannot grant access the user would not have had themselves.
- `include_teams` means "only these teams," not "these teams in addition to all teams."
- `include_personal: false` hides the user's personal data entirely from federation, even if they own it — useful for work-only accounts.
### 8.4 No cross-user leakage
When Server B has multiple users (employees) all federating back to their own Server A:
- Each employee has their own grant with their own `subject_user_id`.
- The cert is bound to a specific grant; there is no mechanism by which one grant's cert can be used to impersonate another.
- Audit log is per-grant.
## 9. Query Model
Federation exposes a **narrow read API**, not arbitrary SQL.
### 9.1 Supported verbs (v1)
| Verb | Purpose | Returns |
| -------------- | ------------------------------------------ | ------------------------------- |
| `list` | Paginated list of a resource type | Array of resources |
| `get` | Fetch a single resource by id | One resource or 404 |
| `search` | Keyword search within allowed resources | Ranked list of hits |
| `capabilities` | What this grant is allowed to do right now | Scope object + rate-limit state |
### 9.2 Not in v1
- Write verbs.
- Aggregations / analytics.
- Streaming / subscriptions (future: see §13).
### 9.3 Agent-facing integration
Agents never call federation directly. Instead:
- The gateway query layer accepts `source: "local" | "federated:<peer_hostname>" | "all"`.
- `"all"` fans out in parallel, merges results, tags each with `_source`.
- Federation results are in-memory only; the gateway does not persist them.
## 10. Caching
- **In-memory response cache** with short TTL (default 30s) for `list` and `get`. `search` is not cached.
- Cache is keyed by `(grant_id, verb, resource, query_hash)`.
- Cache is flushed on cert rotation and on grant revocation.
- No disk cache. No cross-session cache.
## 11. Bootstrap & Onboarding
### 11.1 Instance capability tiers
| Tier | Storage | Queue | Memory | Can federate? |
| ------------ | -------- | ------- | -------- | --------------------- |
| `local` | PGlite | in-proc | keyword | No |
| `standalone` | Postgres | Valkey | keyword | No (can be client) |
| `federated` | Postgres | Valkey | pgvector | Yes (server + client) |
Federation requires `federated` tier on **both** sides.
### 11.2 Onboarding flow (admin-driven)
1. Admin on Server B runs `mosaic federation grant create --user <user-id> --peer <peer-hostname> --scope-file scope.json`.
2. Server B generates a `grant_id`, prints a one-time enrollment URL containing the grant ID + B's CA root fingerprint.
3. Admin on Server A (or the user themselves, if allowed) runs `mosaic federation peer add <enrollment-url>`.
4. Server A's Step-CA generates a CSR for the new grant. A submits the CSR to B over a short-lived enrollment endpoint (single-use token in the enrollment URL).
5. B's Step-CA signs the cert (with grant ID embedded in SAN OIDs), returns it.
6. A stores the signed cert + private key (encrypted) in `federation_peers`.
7. Grant status flips from `pending` to `active` on both sides.
8. Cert auto-renews at T-7 days using the standard Step-CA renewal flow as long as the grant remains active.
### 11.3 Revocation
- **Admin-initiated:** `mosaic federation grant revoke <grant-id>` on B flips status to `revoked`, adds the cert to B's CRL, and writes an audit entry.
- **Revoke-on-delete:** Deleting a user on B automatically revokes all grants where that user is the subject.
- Server A learns of revocation on the next request (TLS handshake fails) and flips the peer to `revoked`.
### 11.4 Rate limit
Default `60 req/min` per grant. Configurable per grant. Enforced at the serving side. A rate-limited request returns `429` with `Retry-After`.
## 12. Operational Concerns
- **Observability:** Each federation request emits an OTEL span with `grant_id`, `peer`, `verb`, `resource`, `outcome`, `latency_ms`. Traces correlate across both servers via W3C traceparent.
- **Health check:** `mosaic federation status` on each side shows active grants, last-success times, cert expirations, and any CRL mismatches.
- **Backpressure:** If the serving side is overloaded, it returns `503` with a structured body; the client marks the peer `degraded` and falls back to local-only until the next successful handshake.
- **Secrets:** `client_key_pem` in `federation_peers` is encrypted with the gateway's key (sealed with the instance's master key — same mechanism as `provider_credentials`).
- **Credentials never cross:** The `credentials` resource type is in the default excluded list. It must be explicitly added to scope (admin action, logged) and even then is per-grant and per-user.
## 13. Future (post-v1)
- B→A push (e.g., "notify A when a task assigned to subject changes") via Socket.IO over mTLS.
- Mesh (N-to-N) federation.
- Write verbs with conflict resolution.
- Shared Step-CA (a "root of roots") so that onboarding doesn't require exchanging CA roots.
- Federated memory search over vector indexes with homomorphic filtering.
## 14. Locked Decisions (was "Open Questions")
| # | Question | Decision |
| --- | ------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| 1 | What happens to a grant when its subject user is deleted? | **Revoke-on-delete.** All grants where the user is subject are auto-revoked and CRL'd. |
| 2 | Do we audit read-only requests? | **Yes.** All federated reads are audited on the serving side. Bodies are not captured; query hash + metadata only. |
| 3 | Default rate limit? | **60 requests per minute per grant,** override-able per grant. |
| 4 | How do we verify the requesting-server's identity beyond the grant token? | **X.509 client cert tied to the user,** issued by Step-CA (per-server) or locally generated. Cert subject carries `grantId` + `subjectUserId`. |
### M1 decisions
- **Postgres deployment:** **Containerized** alongside the gateway in M1 (Docker Compose profile). Moving to a dedicated host is a M5+ operational concern, not a v1 feature.
- **Instance signing key:** **Separate** from the Step-CA key. Step-CA signs federation certs; the instance master key seals at-rest secrets (client keys, provider credentials). Different blast-radius, different rotation cadences.
## 15. Acceptance Criteria
- [ ] Two Mosaic Stack gateways on different hosts can establish a federation grant via the CLI-driven onboarding flow.
- [ ] Server A can query Server B for `tasks`, `notes`, `memory` respecting scope filters.
- [ ] A user on B with no grant cannot be queried by A, even if A has a valid grant for another user.
- [ ] Revoking a grant on B causes A's next request to fail with a clear error within one request cycle.
- [ ] Cert rotation happens automatically at T-7 days; an in-progress session survives rotation without user action.
- [ ] Rate-limit enforcement returns 429 with `Retry-After`; client backs off.
- [ ] With B unreachable, a session on A completes using local data and surfaces a "federation offline for `<peer>`" signal once.
- [ ] Every federated request appears in B's `federation_audit_log` within 1 second.
- [ ] A scope excluding `credentials` means credentials are not returnable even via `search` with matching keywords.
- [ ] `mosaic federation status` shows cert expiry, grant status, and last success/failure per peer.
## 16. Implementation Milestones (reference)
Milestones live in `docs/federation/MILESTONES.md` (to be authored next). High-level:
- **M1:** Server A runs `federated` tier standalone (Postgres + Valkey + pgvector, containerized). No peer yet.
- **M2:** Step-CA embedded; `federation_grants` / `federation_peers` schema + admin CLI.
- **M3:** Handshake + `list`/`get` verbs with scope enforcement.
- **M4:** `search` verb, audit log, rate limits.
- **M5:** Cache layer, offline-degradation UX, observability surfaces.
- **M6:** Revocation flows (admin + revoke-on-delete), cert auto-renewal.
- **M7:** Multi-user RBAC hardening on B, team-scoped grants end-to-end, acceptance suite green.
---
**Next step after PRD sign-off:** author `docs/federation/MILESTONES.md` with per-milestone acceptance tests and estimated token budget, then file tracking issues on `git.mosaicstack.dev/mosaicstack/stack`.

190
docs/federation/SETUP.md Normal file
View File

@@ -0,0 +1,190 @@
# Federated Tier Setup Guide
## What is the federated tier?
The federated tier is designed for multi-user and multi-host deployments. It consists of PostgreSQL 17 with pgvector extension (for embeddings and RAG), Valkey for distributed task queueing and caching, and a shared configuration across multiple Mosaic gateway instances. Use this tier when running Mosaic in production or when scaling beyond a single-host deployment.
## Prerequisites
- Docker and Docker Compose installed
- Ports 5433 (PostgreSQL) and 6380 (Valkey) available on your host (or adjust environment variables)
- At least 2 GB free disk space for data volumes
## Start the federated stack
Run the federated overlay:
```bash
docker compose -f docker-compose.federated.yml --profile federated up -d
```
This starts PostgreSQL 17 with pgvector and Valkey 8. The pgvector extension is created automatically on first boot.
Verify the services are running:
```bash
docker compose -f docker-compose.federated.yml ps
```
Expected output shows `postgres-federated` and `valkey-federated` both healthy.
## Configure mosaic for federated tier
Create or update your `mosaic.config.json`:
```json
{
"tier": "federated",
"database": "postgresql://mosaic:mosaic@localhost:5433/mosaic",
"queue": "redis://localhost:6380"
}
```
If you're using environment variables instead:
```bash
export DATABASE_URL="postgresql://mosaic:mosaic@localhost:5433/mosaic"
export REDIS_URL="redis://localhost:6380"
```
## Verify health
Run the health check:
```bash
mosaic gateway doctor
```
Expected output (green):
```
Tier: federated Config: mosaic.config.json
✓ postgres localhost:5433 (42ms)
✓ valkey localhost:6380 (8ms)
✓ pgvector (embedded) (15ms)
```
For JSON output (useful in CI/automation):
```bash
mosaic gateway doctor --json
```
## Troubleshooting
### Port conflicts
**Symptom:** `bind: address already in use`
**Fix:** Stop the base dev stack first:
```bash
docker compose down
docker compose -f docker-compose.federated.yml --profile federated up -d
```
Or change the host port with an environment variable:
```bash
PG_FEDERATED_HOST_PORT=5434 VALKEY_FEDERATED_HOST_PORT=6381 \
docker compose -f docker-compose.federated.yml --profile federated up -d
```
### pgvector extension error
**Symptom:** `ERROR: could not open extension control file`
**Fix:** pgvector is created at first boot. Check logs:
```bash
docker compose -f docker-compose.federated.yml logs postgres-federated | grep -i vector
```
If missing, exec into the container and create it manually:
```bash
docker exec <postgres-federated-id> psql -U mosaic -d mosaic -c "CREATE EXTENSION vector;"
```
### Valkey connection refused
**Symptom:** `Error: connect ECONNREFUSED 127.0.0.1:6380`
**Fix:** Check service health:
```bash
docker compose -f docker-compose.federated.yml logs valkey-federated
```
If Valkey is running, verify your firewall allows 6380. On macOS, Docker Desktop may require binding to `host.docker.internal` instead of `localhost`.
## Key rotation (deferred)
Federation peer private keys (`federation_peers.client_key_pem`) are sealed at rest using AES-256-GCM with a key derived from `BETTER_AUTH_SECRET` via SHA-256. If `BETTER_AUTH_SECRET` is rotated, all sealed `client_key_pem` values in the database become unreadable and must be re-sealed with the new key before rotation completes.
The full key rotation procedure (decrypt all rows with old key, re-encrypt with new key, atomically swap the secret) is out of scope for M2. Operators must not rotate `BETTER_AUTH_SECRET` without a migration plan for all sealed federation peer keys.
## OID Assignments — Mosaic Internal OID Arc
Mosaic uses the private enterprise arc `1.3.6.1.4.1.99999` for custom X.509
certificate extensions in federation grant certificates.
**IMPORTANT:** This is a development/internal OID arc. Before deploying to a
production environment accessible by external parties, register a proper IANA
Private Enterprise Number (PEN) at <https://pen.iana.org/pen/PenApplication.page>
and update these assignments accordingly.
### Assigned OIDs
| OID | Symbolic name | Description |
| --------------------- | --------------------------------- | --------------------------------------------------------- |
| `1.3.6.1.4.1.99999.1` | `mosaic.federation.grantId` | UUID of the `federation_grants` row authorising this cert |
| `1.3.6.1.4.1.99999.2` | `mosaic.federation.subjectUserId` | UUID of the local user on whose behalf the cert is issued |
### Encoding
Each extension value is DER-encoded as an ASN.1 **UTF8String**:
```
Tag 0x0C (UTF8String)
Length 0x24 (36 decimal — fixed length of a UUID string)
Value <36 ASCII bytes of the UUID>
```
The step-ca X.509 template at `infra/step-ca/templates/federation.tpl`
produces this encoding via the Go template expression:
```
{{ printf "\x0c\x24%s" .Token.mosaic_grant_id | b64enc }}
```
The resulting base64 value is passed as the `value` field of the extension
object in the template JSON.
### CA Environment Variables
The `CaService` (`apps/gateway/src/federation/ca.service.ts`) requires the
following environment variables at gateway startup:
| Variable | Required | Description |
| ------------------------------ | -------- | -------------------------------------------------------------------- |
| `STEP_CA_URL` | Yes | Base URL of the step-ca instance, e.g. `https://step-ca:9000` |
| `STEP_CA_PROVISIONER_PASSWORD` | Yes | JWK provisioner password for the `mosaic-fed` provisioner |
| `STEP_CA_PROVISIONER_KEY_JSON` | Yes | JSON-encoded JWK (public + private) for the `mosaic-fed` provisioner |
| `STEP_CA_ROOT_CERT_PATH` | Yes | Absolute path to the step-ca root CA certificate PEM file |
Set these variables in your environment or secret manager before starting
the gateway. In the federated Docker Compose stack they are expected to be
injected via Docker secrets and environment variable overrides.
### Fail-loud contract
The CA service (and the X.509 template) are designed to fail loudly if the
custom OIDs cannot be embedded:
- The template produces a malformed extension value (zero-length UTF8String
body) when the JWT claims `mosaic_grant_id` or `mosaic_subject_user_id` are
absent. step-ca rejects the CSR rather than issuing a cert without the OIDs.
- `CaService.issueCert()` throws a `CaServiceError` on every error path with
a human-readable `remediation` string. It never silently returns a cert that
may be missing the required extensions.

119
docs/federation/TASKS.md Normal file
View File

@@ -0,0 +1,119 @@
# Tasks — Federation v1
> Single-writer: orchestrator only. Workers read but never modify.
>
> **Mission:** federation-v1-20260419
> **Schema:** `| id | status | description | issue | agent | branch | depends_on | estimate | notes |`
> **Status values:** `not-started` | `in-progress` | `done` | `blocked` | `failed` | `needs-qa`
> **Agent values:** `codex` | `glm-5.1` | `haiku` | `sonnet` | `opus` | `—` (auto)
>
> **Scope of this file:** M1 is fully decomposed below. M2M7 are placeholders pending each milestone's entry into active planning — the orchestrator expands them one milestone at a time to avoid speculative decomposition of work whose shape will depend on what M1 surfaces.
---
## Milestone 1 — Federated tier infrastructure (FED-M1)
Goal: Gateway runs in `federated` tier with containerized PG+pgvector+Valkey. No federation logic yet. Existing standalone behavior does not regress.
| id | status | description | issue | agent | branch | depends_on | estimate | notes |
| --------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- | ------ | ---------------------------------- | ---------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| FED-M1-01 | done | Extend `mosaic.config.json` schema: add `"federated"` to `tier` enum in validator + TS types. Keep `local` and `standalone` working. Update schema docs/README where referenced. | #460 | sonnet | feat/federation-m1-tier-config | — | 4K | Shipped in PR #470. Renamed `team``standalone`; added `team` deprecation alias; added `DEFAULT_FEDERATED_CONFIG`. |
| FED-M1-02 | done | Author `docker-compose.federated.yml` as an overlay profile: Postgres 17 + pgvector extension (port 5433), Valkey (6380), named volumes, healthchecks. Compose-up should boot cleanly on a clean machine. | #460 | sonnet | feat/federation-m1-compose | FED-M1-01 | 5K | Shipped in PR #471. Overlay defines `postgres-federated`/`valkey-federated`, profile-gated, with pg-init for pgvector extension. |
| FED-M1-03 | done | Add pgvector support to `packages/storage/src/adapters/postgres.ts`: create extension on init (idempotent), expose vector column type in schema helpers. No adapter changes for non-federated tiers. | #460 | sonnet | feat/federation-m1-pgvector | FED-M1-02 | 8K | Shipped in PR #472. `enableVector` flag on postgres StorageConfig; idempotent CREATE EXTENSION before migrations. |
| FED-M1-04 | done | Implement `apps/gateway/src/bootstrap/tier-detector.ts`: reads config, asserts PG/Valkey/pgvector reachable for `federated`, fail-fast with actionable error message on failure. Unit tests for each failure mode. | #460 | sonnet | feat/federation-m1-detector | FED-M1-03 | 8K | Shipped in PR #473. 12 tests; 5s timeouts on probes; pgvector library/permission discrimination; rejects non-bullmq for federated. |
| FED-M1-05 | done | Write `scripts/migrate-to-federated.ts`: one-way migration from `local` (PGlite) / `standalone` (PG without pgvector) → `federated`. Dumps, transforms, loads; dry-run + confirm UX. Idempotent on re-run. | #460 | sonnet | feat/federation-m1-migrate | FED-M1-04 | 10K | Shipped in PR #474. `mosaic storage migrate-tier`; DrizzleMigrationSource (corrects P0 found in review); 32 tests; idempotent. |
| FED-M1-06 | done | Update `mosaic doctor`: report current tier, required services, actual health per service, pgvector presence, overall green/yellow/red. Machine-readable JSON output flag for CI use. | #460 | sonnet | feat/federation-m1-doctor | FED-M1-04 | 6K | Shipped in PR #475 as `mosaic gateway doctor`. Probes lifted to @mosaicstack/storage; structural TierConfig breaks dep cycle. |
| FED-M1-07 | done | Integration test: gateway boots in `federated` tier with docker-compose `federated` profile; refuses to boot when PG unreachable (asserts fail-fast); pgvector extension query succeeds. | #460 | sonnet | feat/federation-m1-integration | FED-M1-04 | 8K | Shipped in PR #476. 3 test files, 4 tests, gated by FEDERATED_INTEGRATION=1; reserved-port helper avoids host collisions. |
| FED-M1-08 | done | Integration test for migration script: seed a local PGlite with representative data (tasks, notes, users, teams), run migration, assert row counts + key samples equal on federated PG. | #460 | sonnet | feat/federation-m1-migrate-test | FED-M1-05 | 6K | Shipped in PR #477. Caught P0 in M1-05 (camelCase→snake_case) missed by mocked unit tests; fix in same PR. |
| FED-M1-09 | done | Standalone regression: full agent-session E2E on existing `standalone` tier with a gateway built from this branch. Must pass without referencing any federation module. | #460 | sonnet | feat/federation-m1-regression | FED-M1-07 | 4K | Clean canary. 351 gateway tests + 85 storage unit tests + full pnpm test all green; only FEDERATED_INTEGRATION-gated tests skip. |
| FED-M1-10 | done | Code review pass: security-focused on the migration script (data-at-rest during migration) + tier detector (error-message sensitivity leakage). Independent reviewer, not authors of tasks 01-09. | #460 | sonnet | feat/federation-m1-security-review | FED-M1-09 | 8K | 2 review rounds caught 7 issues: credential leak in pg/valkey/pgvector errors + redact-error util; missing advisory lock; SKIP_TABLES rationale. |
| FED-M1-11 | done | Docs update: `docs/federation/` operator notes for tier setup; README blurb on federated tier; `docs/guides/` entry for migration. Do NOT touch runbook yet (deferred to FED-M7). | #460 | haiku | feat/federation-m1-docs | FED-M1-10 | 4K | Shipped: `docs/federation/SETUP.md` (119 lines), `docs/guides/migrate-tier.md` (147 lines), README Configuration blurb. |
| FED-M1-12 | done | PR, CI green, merge to main, close #460. | #460 | sonnet | feat/federation-m1-close | FED-M1-11 | 3K | M1 closed. PRs #470-#480 merged across 11 tasks. Issue #460 closed; release tag `fed-v0.1.0-m1` published. |
**M1 total estimate:** ~74K tokens (over-budget vs 20K PRD estimate — explanation below)
**Why over-budget:** PRD's 20K estimate reflected implementation complexity only. The per-task breakdown includes tests, review, and docs as separate tasks per the delivery cycle, which catches the real cost. The final per-milestone budgets in MISSION-MANIFEST will be updated after M1 completes with actuals.
---
## Pre-M2 — Test deployment infrastructure (FED-M2-DEPLOY)
Goal: Two federated-tier gateways stood up on Portainer at `mos-test-1.woltje.com` and `mos-test-2.woltje.com` running the M1 release (`gateway:fed-v0.1.0-m1`). This is the test bed for M2 enrollment work and the M3 federation E2E harness. No federation logic exercised yet — pure infrastructure validation.
> **Why now:** M2 enrollment requires a real second gateway to test peer-add flows; standing the test hosts up before M2 code lands gives both code and deployment streams a fast feedback loop.
> **Parallelizable:** This workstream runs in parallel with the M2 code workstream (M2-01 → M2-13). They re-converge at M2-10 (E2E test).
> **Tracking issue:** #482.
| id | status | description | issue | agent | branch | depends_on | estimate | notes |
| --------------------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- | ------ | ------------------------------------- | ------------ | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| FED-M2-DEPLOY-01 | done | Verify `gateway:fed-v0.1.0-m1` image was published by `.woodpecker/publish.yml` on tag push; if not, investigate and remediate. Document image URI in deployment artifact. | #482 | sonnet | (verified inline, no PR) | — | 2K | Tag exists; digest `sha256:9b72e202a9eecc27d31920b87b475b9e96e483c0323acc57856be4b1355db1ec` captured for digest-pinned deploys. |
| FED-M2-DEPLOY-02 | done | Author Portainer git-stack compose file `deploy/portainer/federated-test.stack.yml` (gateway + PG-pgvector + Valkey, env-driven). Use immutable tag, not `latest`. | #482 | sonnet | feat/federation-deploy-stack-template | DEPLOY-01 | 5K | Shipped in PR #485. Digest-pinned. Env: STACK_NAME, HOST_FQDN, POSTGRES_PASSWORD, BETTER_AUTH_SECRET, BETTER_AUTH_URL. |
| FED-M2-DEPLOY-IMG-FIX | in-progress | Gateway image runtime broken (ERR_MODULE_NOT_FOUND for `dotenv`); Dockerfile copies `.pnpm/` store but not `apps/gateway/node_modules` symlinks. Switch to `pnpm deploy` for self-contained runtime. | #482 | sonnet | (subagent in flight) | DEPLOY-02 | 4K | Subagent `a78a9ab0ddae91fbc` in flight. Triggers Kaniko rebuild on merge; capture new digest; bump stack template in follow-up PR before redeploy. |
| FED-M2-DEPLOY-03 | blocked | Deploy stack to mos-test-1.woltje.com via `~/.config/mosaic/tools/portainer/`. Verify M1 acceptance: federated-tier boot succeeds; `mosaic gateway doctor --json` returns green; pgvector `vector(3)` round-trip works. | #482 | sonnet | feat/federation-deploy-test-1 | IMG-FIX | 3K | Stack created on Portainer endpoint 3 (Swarm `local`), but blocked on image fix. Container fails on boot until IMG-FIX merges + redeploy. |
| FED-M2-DEPLOY-04 | blocked | Deploy stack to mos-test-2.woltje.com via Portainer wrapper. Same M1 acceptance probes as DEPLOY-03. | #482 | sonnet | feat/federation-deploy-test-2 | IMG-FIX | 3K | Same status as DEPLOY-03. Stack created; blocked on image fix. |
| FED-M2-DEPLOY-05 | not-started | Document deployment in `docs/federation/TEST-INFRA.md`: hosts, image tags, secrets sourcing, redeploy procedure, teardown. Update MISSION-MANIFEST with deployment status. | #482 | haiku | feat/federation-deploy-docs | DEPLOY-03,04 | 3K | Operator-facing doc; mentions but does not duplicate `tools/portainer/README.md`. |
**Deploy workstream estimate:** ~16K tokens
---
## Milestone 2 — Step-CA + grant schema + admin CLI (FED-M2)
Goal: An admin can create a federation grant; counterparty enrolls; cert is signed by Step-CA with SAN OIDs for `grantId` + `subjectUserId`. No runtime federation traffic flows yet (that's M3).
| id | status | description | issue | agent | branch | depends_on | estimate | notes |
| --------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- | ------ | ---------------------------------- | ---------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| FED-M2-01 | needs-qa | DB migration: `federation_grants`, `federation_peers`, `federation_audit_log` tables + enum types (`grant_status`, `peer_state`). Drizzle schema + migration generation; migration tests. | #461 | sonnet | feat/federation-m2-schema | — | 5K | PR #486 open. First review NEEDS CHANGES (missing DESC indexes + reserved cols). Remediation subagent `a673dd9355dc26f82` in flight in worktree `agent-a4404ac1`. |
| FED-M2-02 | not-started | Add Step-CA sidecar to `docker-compose.federated.yml`: official `smallstep/step-ca` image, persistent CA volume, JWK provisioner config baked into init script. | #461 | sonnet | feat/federation-m2-stepca | DEPLOY-02 | 4K | Profile-gated under `federated`. CA password from secret; dev compose uses dev-only password file. |
| FED-M2-03 | not-started | Scope JSON schema + validator: `resources` allowlist, `excluded_resources`, `include_teams`, `include_personal`, `max_rows_per_query`. Vitest unit tests for valid + invalid scopes. | #461 | sonnet | feat/federation-m2-scope-schema | — | 4K | Validator independent of CA — reusable from grant CRUD + (later) M3 scope enforcement. |
| FED-M2-04 | not-started | `apps/gateway/src/federation/ca.service.ts`: Step-CA client (CSR submission, OID-bearing cert retrieval). Mocked + integration tests against real Step-CA container. | #461 | sonnet | feat/federation-m2-ca-service | M2-02 | 6K | SAN OIDs: `grantId` (custom OID 1.3.6.1.4.1.99999.1) + `subjectUserId` (1.3.6.1.4.1.99999.2). Document OID assignments in PRD/SETUP. **Acceptance**: must (a) wire `federation.tpl` template into `mosaic-fed` provisioner config and (b) include a unit/integration test asserting issued certs contain BOTH OIDs — fails-loud guard against silent OID stripping (carry-forward from M2-02 review). |
| FED-M2-05 | not-started | Sealed storage for `client_key_pem` reusing existing `provider_credentials` sealing key. Tests prove DB-at-rest is ciphertext, not PEM. Key rotation path documented (deferred impl). | #461 | sonnet | feat/federation-m2-key-sealing | M2-01 | 5K | Separate from M2-06 to keep crypto seam isolated; reviewer focus is sealing only. |
| FED-M2-06 | not-started | `grants.service.ts`: CRUD + status transitions (`pending``active``revoked`); integrates M2-03 (scope) + M2-05 (sealing). Unit tests cover all transitions including invalid ones. | #461 | sonnet | feat/federation-m2-grants-service | M2-03, M2-05 | 6K | Business logic only — CSR + cert work delegated to M2-04. Revocation handler is M6. |
| FED-M2-07 | not-started | `enrollment.controller.ts`: short-lived single-use token endpoint; CSR signing; updates grant `pending``active`; emits enrollment audit (table-only write, M4 tightens). | #461 | sonnet | feat/federation-m2-enrollment | M2-04, M2-06 | 6K | Tokens single-use with 410 on replay; tokens TTL'd at 15min; rate-limited at request layer (M4 introduces guard, M2 uses simple lock). |
| FED-M2-08 | not-started | Admin CLI: `mosaic federation grant create/list/show` + `peer add/list`. Integration with grants.service (no API duplication). Help output + machine-readable JSON option. | #461 | sonnet | feat/federation-m2-cli | M2-06, M2-07 | 7K | `peer add <enrollment-url>` is the client-side flow; resolves enrollment URL → CSR → store sealed key + cert. |
| FED-M2-09 | not-started | Integration tests covering MILESTONES.md M2 acceptance tests #1, #2, #3, #5, #7, #8 (single-gateway suite). Real Step-CA container; vitest profile gated by `FEDERATED_INTEGRATION=1`. | #461 | sonnet | feat/federation-m2-integration | M2-08 | 8K | Tests #4 (cert OID match) + #6 (two-gateway peer-add) handled separately by M2-10 (E2E). |
| FED-M2-10 | not-started | E2E test against deployed mos-test-1 + mos-test-2 (or local two-gateway docker-compose if Portainer not ready): MILESTONES test #6 `peer add` yields `active` peer record with valid cert + key. | #461 | sonnet | feat/federation-m2-e2e | M2-08, DEPLOY-04 | 6K | Falls back to local docker-compose-two-gateways if remote test hosts not yet available. Documents both paths. |
| FED-M2-11 | not-started | Independent security review (sonnet, not author of M2-04/05/06/07): focus on single-use token replay, sealing leak surfaces, OID match enforcement, scope schema bypass paths. | #461 | sonnet | feat/federation-m2-security-review | M2-10 | 8K | Apply M1 two-round pattern. Reviewer should explicitly attempt enrollment-token replay, OID-spoofing CSR, and key leak in error messages. |
| FED-M2-12 | not-started | Docs update: `docs/federation/SETUP.md` Step-CA section; new `docs/federation/ADMIN-CLI.md` with grant/peer commands; scope schema reference; OID registration note. Runbook still M7-deferred. | #461 | haiku | feat/federation-m2-docs | M2-11 | 4K | Adds CA bootstrap section to SETUP.md with `docker compose --profile federated up step-ca` example. |
| FED-M2-13 | not-started | PR aggregate close, CI green, merge to main, close #461. Release tag `fed-v0.2.0-m2`. Mark deploy stream complete. Update mission manifest M2 row. | #461 | sonnet | feat/federation-m2-close | M2-12 | 3K | Same close pattern as M1-12; queue-guard before merge; tea release-create with notes including deploy-stream PRs. |
**M2 code workstream estimate:** ~72K tokens (vs MILESTONES.md 30K — same over-budget pattern as M1, where per-task breakdown including tests/review/docs catches the real cost).
**Deploy + code combined:** ~88K tokens.
## Milestone 3 — mTLS handshake + list/get + scope enforcement (FED-M3)
_Deferred. Issue #462._
## Milestone 4 — search + audit + rate limit (FED-M4)
_Deferred. Issue #463._
## Milestone 5 — cache + offline + OTEL (FED-M5)
_Deferred. Issue #464._
## Milestone 6 — revocation + auto-renewal + CRL (FED-M6)
_Deferred. Issue #465._
## Milestone 7 — multi-user hardening + acceptance suite (FED-M7)
_Deferred. Issue #466._
---
## Execution Notes
**Agent assignment rationale:**
- `codex` for most implementation tasks (OpenAI credit pool preferred for feature code)
- `sonnet` for tests (pattern-based, moderate complexity), `doctor` work (cross-cutting), and independent code review
- `haiku` for docs and the standalone regression canary (cheapest tier for mechanical/verification work)
- No `opus` in M1 — save for cross-cutting architecture decisions if they surface later
**Branch strategy:** Each task gets its own feature branch off `main`. Tasks within a milestone merge in dependency order. Final aggregate PR (FED-M1-12) isn't a branch of its own — it's the merge of the last upstream task that closes the issue.
**Queue guard:** Every push and every merge in this mission must run `~/.config/mosaic/tools/git/ci-queue-wait.sh --purpose push|merge` per Mosaic hard gate #6.

147
docs/guides/migrate-tier.md Normal file
View File

@@ -0,0 +1,147 @@
# Migrating to the Federated Tier
Step-by-step guide to migrate from `local` (PGlite) or `standalone` (PostgreSQL without pgvector) to `federated` (PostgreSQL 17 + pgvector + Valkey).
## When to migrate
Migrate to federated tier when:
- Scaling from single-user to multi-user deployments
- Adding vector embeddings or RAG features
- Running Mosaic across multiple hosts
- Requires distributed task queueing and caching
- Moving to production with high availability
## Prerequisites
- Federated stack running and healthy (see [Federated Tier Setup](../federation/SETUP.md))
- Source database accessible and empty target database at the federated URL
- Backup of source database (recommended before any migration)
## Dry-run first
Always run a dry-run to validate the migration:
```bash
mosaic storage migrate-tier --to federated \
--target-url postgresql://mosaic:mosaic@localhost:5433/mosaic \
--dry-run
```
Expected output (partial example):
```
[migrate-tier] Analyzing source tier: pglite
[migrate-tier] Analyzing target tier: federated
[migrate-tier] Precondition: target is empty ✓
users: 5 rows
teams: 2 rows
conversations: 12 rows
messages: 187 rows
... (all tables listed)
[migrate-tier] NOTE: Source tier has no pgvector support. insights.embedding will be NULL on all migrated rows.
[migrate-tier] DRY-RUN COMPLETE (no data written). 206 total rows would be migrated.
```
Review the output. If it shows an error (e.g., target not empty), address it before proceeding.
## Run the migration
When ready, run without `--dry-run`:
```bash
mosaic storage migrate-tier --to federated \
--target-url postgresql://mosaic:mosaic@localhost:5433/mosaic \
--yes
```
The `--yes` flag skips the confirmation prompt (required in non-TTY environments like CI).
The command will:
1. Acquire an advisory lock (blocks concurrent invocations)
2. Copy data from source to target in dependency order
3. Report rows migrated per table
4. Display any warnings (e.g., null vector embeddings)
## What gets migrated
All persistent, user-bound data is migrated in dependency order:
- **users, teams, team_members** — user and team ownership
- **accounts** — OAuth provider tokens (durable credentials)
- **projects, agents, missions, tasks** — all project and agent definitions
- **conversations, messages** — all chat history
- **preferences, insights, agent_logs** — preferences and observability
- **provider_credentials** — stored API keys and secrets
- **tickets, events, skills, routing_rules, appreciations** — auxiliary records
Full order is defined in code (`MIGRATION_ORDER` in `packages/storage/src/migrate-tier.ts`).
## What gets skipped and why
Three tables are intentionally not migrated:
| Table | Reason |
| ----------------- | ----------------------------------------------------------------------------------------------- |
| **sessions** | TTL'd auth sessions from the old environment; they will fail JWT verification on the new target |
| **verifications** | One-time tokens (email verify, password reset) that have either expired or been consumed |
| **admin_tokens** | Hashed tokens bound to the old environment's secret keys; must be re-issued |
**Note on accounts and provider_credentials:** These durable credentials ARE migrated because they are user-bound and required for resuming agent work on the target environment. After migration to a multi-tenant federated deployment, operators may want to audit or wipe these if users are untrusted or credentials should not be shared.
## Idempotency and concurrency
The migration is **idempotent**:
- Re-running is safe (uses `ON CONFLICT DO UPDATE` internally)
- Ideal for retries on transient failures
- Concurrent invocations are blocked by a Postgres advisory lock; the second caller will wait
If a previous run is stuck, check for advisory locks:
```sql
SELECT * FROM pg_locks WHERE locktype='advisory';
```
If you need to force-unlock (dangerous):
```sql
SELECT pg_advisory_unlock(<lock_id>);
```
## Verify the migration
After migration completes, spot-check the target:
```bash
# Count rows on a few critical tables
psql postgresql://mosaic:mosaic@localhost:5433/mosaic -c \
"SELECT 'users' as table, COUNT(*) FROM users UNION ALL
SELECT 'conversations' as table, COUNT(*) FROM conversations UNION ALL
SELECT 'messages' as table, COUNT(*) FROM messages;"
```
Verify a known user or project exists by ID:
```bash
psql postgresql://mosaic:mosaic@localhost:5433/mosaic -c \
"SELECT id, email FROM users WHERE email='<your-email>';"
```
Ensure vector embeddings are NULL (if source was PGlite) or populated (if source was postgres + pgvector):
```bash
psql postgresql://mosaic:mosaic@localhost:5433/mosaic -c \
"SELECT embedding IS NOT NULL as has_vector FROM insights LIMIT 5;"
```
## Rollback
There is no in-place rollback. If the migration fails:
1. Restore the target database from a pre-migration backup
2. Investigate the failure logs
3. Rerun the migration
Always test migrations in a staging environment first.

View File

@@ -165,7 +165,13 @@ The `mosaic` CLI provides a terminal interface to the same gateway API.
Install via the Mosaic installer:
```bash
bash <(curl -fsSL https://git.mosaicstack.dev/mosaicstack/mosaic-stack/raw/branch/main/tools/install.sh)
curl -fsSL https://mosaicstack.dev/install.sh | bash
```
Or use the direct URL:
```bash
bash <(curl -fsSL https://git.mosaicstack.dev/mosaicstack/stack/raw/branch/main/tools/install.sh)
```
The installer places the `mosaic` binary at `~/.npm-global/bin/mosaic`. Flags for

View File

@@ -266,3 +266,349 @@ Issues closed: #52, #55, #57, #58, #120-#134
**P8-018 closed:** Spin-off stubs created (gatekeeper-service.md, task-queue-unification.md, chroot-sandboxing.md)
**Next:** Begin execution at Wave 1 — P8-007 (DB migrations) + P8-008 (Types) in parallel.
---
### Session 15 — 2026-04-19 — MVP Rollup Manifest Authored
| Session | Date | Milestone | Tasks Done | Outcome |
| ------- | ---------- | -------------- | ------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 15 | 2026-04-19 | (rollup-level) | MVP-T01 (manifest), MVP-T02 (archive iuv-v2), MVP-T03 (land FED planning) | Authored MVP rollup manifest at `docs/MISSION-MANIFEST.md`. Federation v1 planning merged to `main` (PR #468 / commit `66512550`). Install-ux-v2 archived as complete. |
**Gap context:** The MVP scratchpad was last updated at Session 14 (2026-03-15). In the intervening month, two sub-missions ran outside the MVP framework: `install-ux-hardening` (complete, `mosaic-v0.0.25`) and `install-ux-v2` (complete on 2026-04-19, `0.0.27``0.0.29`). Both archived under `docs/archive/missions/`. The phase-based execution from Sessions 114 (Phases 08, issues #1#172) substantially shipped during this window via those sub-missions and standalone PRs — the MVP mission was nominally active but had no rollup manifest tracking it.
**User reframe (this session):**
> There will be more in the MVP. This will inevitably become scope creep. I need a solution that works via webUI, TUI, CLI, and just works for MVP. Federation is required because I need it to work NOW, so my disparate jarvis-brain usage can be consolidated properly.
**Decisions:**
1. **MVP is the rollup mission**, not a single-purpose mission. Federation v1 is one workstream of MVP, not MVP itself. Phase 08 work is preserved as historical context but is no longer the primary control plane.
2. **Three-surface parity (webUI / TUI / CLI) is a cross-cutting MVP requirement** (MVP-X1), not a workstream. Encoded explicitly so it can't be silently dropped.
3. **Scope creep is named and accommodated.** Manifest has explicit "Likely Additional Workstreams" section listing PRD-derived candidates without committing execution capacity to them.
4. **Workstream isolation** — each workstream gets its own manifest under `docs/{workstream}/MISSION-MANIFEST.md`. MVP manifest is rollup only.
5. **Archive-don't-delete** — install-ux-v2 manifest moved to `docs/archive/missions/install-ux-v2-20260405/` with status corrected to `complete` (IUV-M03 closeout note added pointing at PR #446 + releases 0.0.27 → 0.0.29).
6. **Federation planning landed first** — PR #468 merged before MVP manifest authored, so the manifest references real on-`main` artifacts.
**Open items:**
- `.mosaic/orchestrator/mission.json` MVP slot remains empty (zero milestones). Tracked as MVP-T04. Defer until next session — does not block W1 kickoff. Open question: hand-edit vs. `mosaic coord init` reinit.
- Additional workstreams (web dashboard parity, TUI/CLI completion, remote control, multi-user/SSO, LLM provider expansion, MCP, brain) anticipated per PRD but not declared. Pre-staged in manifest's "Likely Additional Workstreams" list.
**Artifacts this session:**
| Artifact | Status |
| -------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
| PR #468 (`docs(federation): PRD, milestones, mission manifest, and M1 task breakdown`) | merged 2026-04-19 → `main` (commit `66512550`) |
| `docs/MISSION-MANIFEST.md` (MVP rollup, replaces install-ux-v2 manifest) | authored on `docs/mvp-mission-manifest` branch |
| `docs/TASKS.md` (MVP rollup, points at workstream task files) | authored |
| Install-ux-v2 manifest + tasks + scratchpad + iuv-m03-design | moved to `docs/archive/missions/install-ux-v2-20260405/` with status corrected to complete |
**Next:** PR `docs/mvp-mission-manifest` → merge to `main` → next session begins W1 / FED-M1 from clean state.
---
## Session 16 — 2026-04-19 — claude
**Mode:** Delivery (W1 / FED-M1 execution)
**Branch:** `feat/federation-m1-tier-config`
**Context budget:** 200K, currently ~45% used (compaction-aware)
**Goal:** FED-M1-01 — extend `mosaic.config.json` schema: add `"federated"` to tier enum.
**Critical reconciliation surfaced during pre-flight:**
The federation PRD (`docs/federation/PRD.md` line 247) defines three tiers: `local | standalone | federated`.
The existing code (`packages/config/src/mosaic-config.ts`, `packages/mosaic/src/types.ts`, `packages/mosaic/src/stages/gateway-config.ts`) uses `local | team`.
`team` is the same conceptual tier as PRD `standalone` (Postgres + Valkey, no pgvector). Rather than carrying a confusing alias forever, FED-M1-01 will rename `team``standalone` and add `federated` as a third value, so all downstream federation work has a coherent vocabulary.
Affected files (storage-tier semantics only — Team/workspace usages unaffected):
- `packages/config/src/mosaic-config.ts` (StorageTier type, validator enum, defaults)
- `packages/mosaic/src/types.ts` (GatewayStorageTier)
- `packages/mosaic/src/stages/gateway-config.ts` (~10 references)
- `packages/mosaic/src/stages/gateway-config.spec.ts` (test references)
- Possibly `tools/e2e-install-test.sh` (referenced grep) and headless env hint string
**Worker plan:**
1. Spawn sonnet subagent with explicit task spec + the reconciliation context above.
2. Worker delivers diff; orchestrator runs `pnpm typecheck && pnpm lint && pnpm format:check`.
3. Independent `feature-dev:code-reviewer` subagent reviews diff.
4. Second independent verification subagent (general-purpose, sonnet) verifies reviewer's claims and confirms all `'team'` storage-tier references migrated, no `Team`/workspace bleed.
5. Open PR via tea CLI; wait for CI; queue-guard; squash merge; record actuals.
**Open items:**
- `MVP-T04` (sync `.mosaic/orchestrator/mission.json`) still deferred.
- `team` tier rename touches install wizard headless env vars (`MOSAIC_STORAGE_TIER=team`); will need 0.0.x deprecation note in scratchpad if release notes are written this milestone.
---
## Session 17 — 2026-04-19 — claude
**Mode:** Delivery (W1 / FED-M1 execution; resumed after compaction)
**Branches landed this run:** `feat/federation-m1-tier-config` (PR #470), `feat/federation-m1-compose` (PR #471), `feat/federation-m1-pgvector` (PR #472)
**Branch active at end:** `feat/federation-m1-detector` (FED-M1-04, ready to push)
**Tasks closed:** FED-M1-01, FED-M1-02, FED-M1-03 (all merged to `main` via squash, CI green, issue #460 still open as milestone).
**FED-M1-04 — tier-detector:** Worker delivered `apps/gateway/src/bootstrap/tier-detector.ts` (~210 lines) + `tier-detector.spec.ts` (12 tests). Independent code review (sonnet) returned `changes-required` with 3 issues:
1. CRITICAL: `probeValkey` missing `connectTimeout: 5000` on the ioredis Redis client (defaulted to 10s, violated fail-fast spec).
2. IMPORTANT: `probePgvector` catch block did not discriminate "library not installed" (use `pgvector/pgvector:pg17`) from permission errors.
3. IMPORTANT: Federated tier silently skipped Valkey probe when `queue.type !== 'bullmq'` (computed Valkey URL conditionally).
Worker fix-up round addressed all three:
- L147: `connectTimeout: 5000` added to Redis options
- L113-117: catch block branches on `extension "vector" is not available` substring → distinct remediation per failure mode
- L206-215: federated branch fails fast with `service: 'config'` if `queue.type !== 'bullmq'`, then probes Valkey unconditionally
- 4 new tests (8 → 12 total) cover each fix specifically
Independent verifier (haiku) confirmed all 6 verification claims (line numbers, test presence, suite green: 12/12 PASS).
**Process note — review pipeline working as designed:**
Initial verifier (haiku) on the first delivery returned "OK to ship" but missed the 3 deeper issues that the sonnet code-reviewer caught. This validates the user's "always verify subagent claims independently with another subagent" rule — but specifically with the **right tier** for the task: code review needs sonnet-level reasoning, while haiku is fine for verifying surface claims (line counts, file existence) once review issues are known. Going forward: code review uses sonnet (`feature-dev:code-reviewer`), claim verification uses haiku.
**Followup tasks tracked but deferred:**
- #7: `tier=local` hardcoded in gateway-config resume branches (~262, ~317) — pre-existing bug, fix during M1-06 (doctor) or M1-09 (regression).
- #8: confirm `packages/config/dist` not git-tracked.
**Next:** PR for FED-M1-04 → CI wait → merge. Then FED-M1-05 (migration script, codex/sonnet, 10K).
---
## Session 18 — 2026-04-19 — FED-M1-07 + FED-M1-08
**Branches landed this run:** `feat/federation-m1-integration` (PR #476, FED-M1-07), `feat/federation-m1-migrate-test` (PR #477, FED-M1-08)
**Branch active at end:** none — both PRs merged to main, branches deleted
**M1 progress:** 8 of 12 tasks done. Remaining: M1-09 (regression e2e, haiku), M1-10 (security review, sonnet), M1-11 (docs, haiku), M1-12 (close + release, orchestrator).
### FED-M1-07 — Integration tests for federated tier gateway boot
Three test files under `apps/gateway/src/__tests__/integration/` gated by `FEDERATED_INTEGRATION=1`:
- `federated-boot.success.integration.test.ts``detectAndAssertTier` resolves; `pg_extension` row for `vector` exists
- `federated-boot.pg-unreachable.integration.test.ts` — throws `TierDetectionError` with `service: 'postgres'` when PG port is closed
- `federated-pgvector.integration.test.ts` — TEMP table with `vector(3)` column round-trips data
Independent code review (sonnet) returned VERDICT: B with two IMPORTANT items, both fixed in the same PR:
- Port 5499 collision risk → replaced with `net.createServer().listen(0)` reserved-port helper
- `afterAll` and `sql` scoped outside `describe` → moved both inside `describe.skipIf` block
Independent surface verifier (haiku) confirmed all claims. 4/4 tests pass live; 4/4 skip cleanly without env var.
### FED-M1-08 — Migration integration test (caught real P0 bug)
`packages/storage/src/migrate-tier.integration.test.ts` seeds temp PGlite with cross-table data (users, teams, team_members, conversations, messages), runs `runMigrateTier`, asserts row counts + spot-checks. Gated by `FEDERATED_INTEGRATION=1`.
**P0 bug surfaced and fixed in same PR:** `DrizzleMigrationSource.readTable()` returns Drizzle's camelCase keys (`emailVerified`, `userId`); `PostgresMigrationTarget.upsertBatch()` was using them verbatim as SQL identifiers, producing `column "emailVerified" does not exist` against real federated PG. The 32 unit tests in M1-05 missed this because both source and target were mocked. Fix: `normaliseSourceRow` now applies `toSnakeCase` (`/[A-Z]/g``_<lowercase>`), idempotent on already-snake_case keys.
Code review (sonnet) returned VERDICT: B with one IMPORTANT and one MINOR, both fixed:
- `createPgliteDbWithVector` and `runPgliteMigrations` were initially added to `@mosaicstack/db` public exports → moved to `packages/storage/src/test-utils/pglite-with-vector.ts` (avoids polluting prod consumers with WASM bundle)
- `afterAll` did not call `cleanTarget` → added before connection close, ensuring orphan rows cleaned even on test panic
Side change: `packages/storage/package.json` gained `"type": "module"` (codebase convention; required for `import.meta.url` in test-utils). All other workspace packages already declared this.
### Process notes for this session
- Review-then-verify pipeline now battle-tested: M1-08 reviewer caught the P0 bug + the public-API leak that the worker would have shipped. Without review, both would have gone to main.
- Integration tests are paying for themselves immediately: M1-08 caught a real P0 in M1-05 that 32 mocked unit tests missed. Going forward, **at least one real-services integration test per code-mutating PR** should become a soft norm where feasible.
- TASKS.md status updates continue to ride on the matching feature branch (avoids direct-to-main commits).
**Followup tasks tracked but still deferred (no change):**
- #7: `tier=local` hardcoded in gateway-config resume branches (~262, ~317)
- #8: confirm `packages/config/dist` not git-tracked
**Next:** FED-M1-09 — standalone regression e2e (haiku canary, ~4K). Verifies that the existing `standalone` tier behavior still works end-to-end on the federation-touched build, since M1 changes touched shared paths (storage, config, gateway boot).
---
## Session 19 — 2026-04-19 — FED-M1-09 → FED-M1-12 (M1 close)
**Branches landed this run:** `feat/federation-m1-regression` (PR #478, M1-09), `feat/federation-m1-security-review` (PR #479, M1-10), `feat/federation-m1-docs` (PR #480, M1-11), `feat/federation-m1-close` (PR #481, M1-12)
**Branch active at end:** none — M1 closed, all branches deleted, issue #460 closed, release tag `fed-v0.1.0-m1` published
**M1 progress:** 12 of 12 tasks done. **Milestone complete.**
### FED-M1-09 — Standalone regression canary
Verification-only milestone. Re-ran the existing standalone/local test suites against current `main` (with M1-01 → M1-08 merged):
- 4 target gateway test files: 148/148 pass (conversation-persistence, cross-user-isolation, resource-ownership, session-hardening)
- Full gateway suite: 351 pass, 4 skipped (FEDERATED_INTEGRATION-gated only)
- Storage unit tests: 85 pass, 1 skipped (integration-gated)
- Top-level `pnpm test`: all green; only env-gated skips
No regression in standalone or local tier. Federation M1 changes are non-disruptive.
### FED-M1-10 — Security review (two rounds, 7 findings)
Independent security review surfaced three high-impact and four medium findings; all fixed in same PR.
**Round 1 (4 findings):**
- MEDIUM: Credential leak via `postgres`/`ioredis` driver error messages (DSN strings) re-thrown by `migrate-tier.ts` → caller; `cli.ts:402` outer catch
- MEDIUM: Same leak in `tier-detection.ts` `probePostgresMeasured` / `probePgvectorMeasured` → emitted as JSON by `mosaic gateway doctor --json`
- LOW-MEDIUM: No advisory lock on `migrate-tier`; two concurrent invocations could both pass `checkTargetPreconditions` (non-atomic) and race
- ADVISORY: `SKIP_TABLES` lacked rationale comment
**Fixes:**
- New internal helper `packages/storage/src/redact-error.ts` — regex `(postgres(?:ql)?|rediss?):\/\/[^@\s]*@``<scheme>://***@`. NOT exported from package public surface. 10 unit tests covering all schemes, multi-URL, no-creds, case-insensitive.
- `redactErrMsg` applied at all 5 leak sites
- `PostgresMigrationTarget.tryAcquireAdvisoryLock()` / `releaseAdvisoryLock()` using session-scoped `pg_try_advisory_lock(hashtext('mosaic-migrate-tier'))`. Acquired before preflight, released in `finally`. Dry-run skips. Non-blocking.
- `SKIP_TABLES` comment expanded with rationale for skipped tables (TTL'd / one-time / env-bound) AND why `accounts` (OAuth) and `provider_credentials` (AI keys) are intentionally migrated (durable user-bound, not deployment-bound).
**Round 2 (3 findings missed by first round):**
- HIGH: Round 1 regex only covered `postgres` scheme, not `redis`/`rediss` — extended to `(postgres(?:ql)?|rediss?)`
- HIGH: `probeValkeyMeasured` was missed in Round 1 → applied `redactErrMsg`
- MEDIUM: `cli.ts:402` migrate-tier outer catch was missed in Round 1 → applied `redactErrMsg`
**Process validation:** the two-round review pattern proved load-bearing for security work. A single review-then-fix cycle would have shipped the Valkey credential leak.
### FED-M1-11 — Docs (haiku)
- `docs/federation/SETUP.md` (119 lines): federated tier setup — what it is, prerequisites, docker compose start, mosaic.config.json snippet, doctor health check, troubleshooting
- `docs/guides/migrate-tier.md` (147 lines): when to migrate, dry-run first, what migrates/skips with rationale, idempotency + advisory-lock semantics, no in-place rollback
- `README.md` Configuration blurb linking to both
- Runbook deferred to FED-M7 per TASKS.md scope rule
### FED-M1-12 — Aggregate close (this PR)
- Marked M1-12 done in TASKS.md
- MISSION-MANIFEST.md: phase → "M1 complete", progress 1/7, M1 row done with PR range #470-#481, session log appended
- This Session 19 entry added
- Issue #460 closed via `~/.config/mosaic/tools/git/issue-close.sh -i 460`
- Release tag `fed-v0.1.0-m1` created and pushed to gitea
### M1 PR ledger
| PR | Task | Branch |
| ---- | ----------------------------------------- | ---------------------------------- |
| #470 | M1-01 (tier config schema) | feat/federation-m1-tier-config |
| #471 | M1-02 (compose overlay) | feat/federation-m1-compose |
| #472 | M1-03 (pgvector adapter) | feat/federation-m1-pgvector |
| #473 | M1-04 (tier-detector) | feat/federation-m1-detector |
| #474 | M1-05 (migrate-tier script) | feat/federation-m1-migrate |
| #475 | M1-06 (gateway doctor) | feat/federation-m1-doctor |
| #476 | M1-07 (boot integration tests) | feat/federation-m1-integration |
| #477 | M1-08 (migrate integration test + P0 fix) | feat/federation-m1-migrate-test |
| #478 | M1-09 (standalone regression) | feat/federation-m1-regression |
| #479 | M1-10 (security review fixes) | feat/federation-m1-security-review |
| #480 | M1-11 (docs) | feat/federation-m1-docs |
| #481 | M1-12 (aggregate close) | feat/federation-m1-close |
### Process learnings (M1 retrospective)
1. **Two-round security review is non-negotiable for security work.** First round caught postgres credential leaks; second round caught equivalent valkey leaks the worker missed when extending the regex. Single-round would have shipped HIGH severity issues.
2. **Real-services integration tests catch what mocked unit tests cannot.** M1-08 caught a P0 in M1-05 (camelCase column names) that 32 mocked unit tests missed because both source and target were mocked. Going forward: at least one real-services test per code-mutating PR where feasible.
3. **Test-utils for live services co-locate with consumer, not in shared library.** M1-08 reviewer caught `createPgliteDbWithVector` initially being added to `@mosaicstack/db` public exports — would have polluted prod consumers with WASM bundle. Moved to `packages/storage/src/test-utils/`.
4. **Per-task budgets including tests/review/docs more accurate than PRD's implementation-only estimates.** M1 PRD estimated 20K; actual ~74K. Future milestones should budget the full delivery cycle.
5. **TASKS.md status updates ride feature branches, never direct-to-main.** Caught one violation early in M1; pattern held for all 12 tasks.
6. **Subagent tier matters.** Code review needs sonnet-level reasoning (haiku missed deep issues in M1-04); claim verification (line counts, file existence) is fine on haiku.
**Followup tasks still deferred (carry forward to M2):**
- #7: `tier=local` hardcoded in gateway-config resume branches (~262, ~317)
- #8: confirm `packages/config/dist` not git-tracked
**Next mission step:** FED-M2 (Step-CA + grant schema + admin CLI). Per TASKS.md scope rule, M2 will be decomposed when it enters active planning. Issue #461 tracks scope.
## Session 20 — 2026-04-21 — FED-M2 kickoff
### Decisions
- **Workstream split**: parallel CODE (M2-01..M2-13, ~72K) + DEPLOY (DEPLOY-01..DEPLOY-05, ~16K) tracks; re-converge at M2-10 E2E.
- **Test hosts**: `mos-test-1.woltje.com` (querying side / Server A), `mos-test-2.woltje.com` (serving side / Server B). Wildcard `*.woltje.com` A→174.137.97.162 already exists; Traefik wildcard cert covers both subdomains. No DNS or cert work needed pre-deploy.
- **Portainer access**: requires `PORTAINER_INSECURE=1` flag added to mosaic wrappers (self-signed cert at `https://10.1.1.43:9443`). PR pending on `feat/mosaic-portainer-tls-flag`.
- **Image policy**: deploy by digest (immutable) per Mosaic policy. `gateway:fed-v0.1.0-m1` digest = `sha256:9b72e202a9eecc27d31920b87b475b9e96e483c0323acc57856be4b1355db1ec`.
### DEPLOY-01 — image manifest verified
- Tag `fed-v0.1.0-m1` exists at `git.mosaicstack.dev/mosaicstack/stack/gateway`
- Digest: `sha256:9b72e202a9eecc27d31920b87b475b9e96e483c0323acc57856be4b1355db1ec`
- 9 layers, ~530MB total
- Use this digest in DEPLOY-02 stack template (do NOT reference `:fed-v0.1.0-m1` tag in stack — pin to digest)
### Registry auth note
- Gitea container registry uses Bearer token flow (`/v2/token?service=container_registry&scope=repository:<repo>:pull`)
- Username: `jarvis` (NOT `mosaicstack`); password: `gitea.mosaicstack.token` from credentials.json
- Direct `Authorization: Bearer <pat>` does NOT work — must exchange PAT for registry token first
### Active PRs
- #483 — docs: M2 mission planning (TASKS decomposition + manifest update) — CI running
- (pending) `feat/mosaic-portainer-tls-flag` — wrapper PORTAINER_INSECURE flag (sonnet subagent in progress)
- (pending) `feat/federation-m2-schema` — FED-M2-01 DB schema migration (sonnet subagent in progress)
### MISSION-MANIFEST layout fix
- Initial M2 commit had Test Infrastructure block inserted by lint-staged prettier between "Last Updated" and "Parent Mission" — split mission frontmatter
- Fixed in 3d001fdb: moved Parent Mission back to frontmatter, kept Test Infrastructure as standalone H2 between Mission and Context
## Session 21 — 2026-04-21/22 — DEPLOY-02 merged, gateway image bug discovered, M2-01 in remediation
### PRs merged
- **#483** — docs(federation): M2 mission planning (TASKS decomposition + manifest update)
- **#484** — feat(mosaic-portainer): PORTAINER_INSECURE flag for self-signed TLS (wrapper sync to `~/.config/mosaic/tools/portainer/` done manually due to broken `mosaic upgrade` `set -o pipefail` on dash)
- **#485** — feat(deploy): portainer stack template `deploy/portainer/federated-test.stack.yml` for federation test instances [DEPLOY-02]
### Stack deployed (mos-test-1, mos-test-2)
- Both stacks created on Portainer endpoint 3 (`local` Swarm @ 10.1.1.43, the only endpoint with traefik-public + woltje.com wildcard cert)
- Swarm ID `l7z67tfpd4bvj4979ufpkyi50`
- Image pinned to digest `sha256:9b72e202a9eecc27d31920b87b475b9e96e483c0323acc57856be4b1355db1ec`
- Traefik labels target `${HOST_FQDN}` per env
### CRITICAL FINDING — gateway image runtime-broken
- `docker run` against `gateway:fed-v0.1.0-m1` fails immediately:
`Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'dotenv' imported from /app/dist/main.js`
- Root cause: `docker/gateway.Dockerfile` copies `/app/node_modules` from builder — but pnpm puts deps in the content-addressed `.pnpm/` store with symlinks at `apps/gateway/node_modules/*`. The runner stage misses the symlinks → Node can't resolve workspace deps.
- M1 release was never runtime-tested as a stripped container; CI passed because tests run in dev tree where pnpm symlinks are intact.
- **Fix in flight** (subagent `a78a9ab0ddae91fbc`): switch builder to `pnpm --filter @mosaic/gateway --prod deploy /deploy`, then runner copies `/deploy/node_modules` + `/deploy/dist` + `/deploy/package.json`.
### M2-01 schema review verdict — NEEDS CHANGES
- PR #486 (`feat/federation-m2-schema`) — independent reviewer (sonnet) found 2 real issues:
1. `federation_audit_log` time-range indexes missing `.desc()` on `created_at` (3 places)
2. Reserved columns missing per TASKS.md M2-01 spec: `query_hash`, `outcome`, `bytes_out` (M4 will write; spec said reserve now)
- Also notes (advisory): subject_user_id correctly `text` (matches BetterAuth users.id; spec defect, not code defect); peer→grant cascade test not present (would be trivial to add)
- **Remediation in flight** (subagent `a673dd9355dc26f82` in worktree `agent-a4404ac1`): apply DESC + reserved cols, regenerate migration in place (preferred) or stack 0009 (fallback), force-push, post PR comment.
### Process notes
- Branch race incident: schema subagent + wrapper subagent both ran in main checkout → schema files appeared on wrapper branch. Recovered by TaskStop, `git checkout --` to clean, respawned schema subagent with `isolation: "worktree"`. **Rule going forward:** any subagent doing code edits gets `isolation: "worktree"` unless work is single-file and the orchestrator confirms no other branch will touch overlapping files.
- `pr-create.sh` shell-quotes backticks badly → use `tea pr create --repo mosaicstack/stack` directly (matches CLI-skill behavior). Will leave a followup to harden pr-create.sh.
- Gitea registry auth: bearer-token exchange flow (`/v2/token?service=container_registry&scope=repository:<repo>:pull`) — direct `Authorization: Bearer <pat>` returns 401.
- Portainer Swarm stack create endpoint: `POST /api/stacks/create/swarm/string?endpointId=<id>` (NOT `/api/stacks?type=1` — deprecated and rejected with 400).
### In-flight at compaction boundary
- Subagent `a78a9ab0ddae91fbc` — Dockerfile pnpm-deploy fix → PR (not yet opened at handoff)
- Subagent `a673dd9355dc26f82` — M2-01 schema remediation (DESC + reserved cols) → force-push to PR #486
- Both will trigger CI; orchestrator must independently re-review fixes (especially the security-adjacent schema work) per "always verify subagent claims" rule.
### Next after subagents return
1. Independent re-review of schema remediation (different subagent, fresh context)
2. Merge #486 if green
3. Merge Dockerfile fix PR if green → triggers Kaniko CI rebuild → capture new digest
4. Update `deploy/portainer/federated-test.stack.yml` to new digest in a small PR
5. Redeploy mos-test-1 + mos-test-2 (Portainer stack update via API)
6. Verify HTTPS reachability + `/health` endpoint at both hosts
7. DEPLOY-03/04 acceptance probes (`mosaic gateway doctor --json`, pgvector `vector(3)` round-trip)
8. DEPLOY-05: author `docs/federation/TEST-INFRA.md`
9. M2-02 (Step-CA sidecar) kicks off after image health is green

View File

@@ -0,0 +1,110 @@
# Hotfix Scratchpad — `install.sh` does not seed `TOOLS.md`
- **Issue:** mosaicstack/stack#457
- **Branch:** `fix/tools-md-seeding`
- **Type:** Out-of-mission hotfix (not part of Install UX v2 mission)
- **Started:** 2026-04-11
- **Ships in:** `@mosaicstack/mosaic` 0.0.30
## Objective
Ensure `~/.config/mosaic/TOOLS.md` is created on every supported install path so the mandatory AGENTS.md load order actually resolves. The load order lists `TOOLS.md` at position 5 but the bash installer never seeds it.
## Root cause
`packages/mosaic/framework/install.sh:228-236` — the post-sync "Seed defaults" loop explicitly lists `AGENTS.md STANDARDS.md`:
```bash
DEFAULTS_DIR="$TARGET_DIR/defaults"
if [[ -d "$DEFAULTS_DIR" ]]; then
for default_file in AGENTS.md STANDARDS.md; do # ← missing TOOLS.md
if [[ -f "$DEFAULTS_DIR/$default_file" ]] && [[ ! -f "$TARGET_DIR/$default_file" ]]; then
cp "$DEFAULTS_DIR/$default_file" "$TARGET_DIR/$default_file"
ok "Seeded $default_file from defaults"
fi
done
fi
```
`TOOLS.md` is listed in `PRESERVE_PATHS` (line 24) but never created in the first place. A fresh bootstrap install via `tools/install.sh → framework/install.sh` leaves `~/.config/mosaic/TOOLS.md` absent, and the agent load order then points at a missing file.
### Secondary: TypeScript `syncFramework` is too greedy
`packages/mosaic/src/config/file-adapter.ts:133-160``FileConfigAdapter.syncFramework` correctly seeds TOOLS.md, but it does so by iterating _every_ file in `framework/defaults/`:
```ts
for (const entry of readdirSync(defaultsDir)) {
const dest = join(this.mosaicHome, entry);
if (!existsSync(dest)) {
copyFileSync(join(defaultsDir, entry), dest);
}
}
```
`framework/defaults/` contains:
```
AGENTS.md
AUDIT-2026-02-17-framework-consistency.md
README.md
SOUL.md ← hardcoded "Jarvis"
STANDARDS.md
TOOLS.md
USER.md
```
So on a fresh install the TS wizard would silently copy the `Jarvis`-flavored `SOUL.md` + placeholder `USER.md` + internal `AUDIT-*.md` and `README.md` into the user's mosaic home before `mosaic init` ever prompts them. That's a latent identity bug as well as a root-clutter bug — the wizard's own stages are responsible for generating `SOUL.md`/`USER.md` via templates.
### Tertiary: stale `TOOLS.md.template`
`packages/mosaic/framework/templates/TOOLS.md.template` still references `~/.config/mosaic/rails/git/…` and `~/.config/mosaic/rails/codex/…`. The `rails/` tree was renamed to `tools/` in the v1→v2 migration (see `run_migrations` in `install.sh`, which removes the old `rails/` symlink). Any user who does run `mosaic init` ends up with a `TOOLS.md` that points to paths that no longer exist.
## Scope of this fix
1. **`packages/mosaic/framework/install.sh`** — extend the explicit seed list to include `TOOLS.md`.
2. **`packages/mosaic/src/config/file-adapter.ts`** — restrict `syncFramework` defaults-seeding to an explicit whitelist (`AGENTS.md`, `STANDARDS.md`, `TOOLS.md`) so the TS wizard never accidentally seeds `SOUL.md`/`USER.md`/`README.md`/`AUDIT-*.md` into the mosaic home.
3. **`packages/mosaic/framework/templates/TOOLS.md.template`** — replace `rails/` with `tools/` in the wrapper-path examples (minimal surgical fix; full template modernization is out of scope for a 0.0.30 hotfix).
4. **Regression test** — unit test around `FileConfigAdapter.syncFramework` that runs against a tmpdir fixture asserting:
- `TOOLS.md` is seeded when absent
- `AGENTS.md` / `STANDARDS.md` are still seeded when absent
- `SOUL.md` / `USER.md` are **not** seeded from `defaults/` (the wizard stages own those)
- Existing root files are not clobbered.
Out of scope (tracked separately / future work):
- Regenerating `defaults/SOUL.md` and `defaults/USER.md` so they no longer contain Jarvis-specific content.
- Fully modernizing `TOOLS.md.template` to match the rich canonical `defaults/TOOLS.md` reference.
- `issue-create.sh` / `pr-create.sh` `eval` bugs (already captured to OpenBrain from the prior hotfix).
## Plan / checklist
- [ ] Branch `fix/tools-md-seeding` from `main` (at `b2cbf89`)
- [ ] File Gitea issue (direct API; wrappers broken for bodies with backticks)
- [ ] Scratchpad created (this file)
- [ ] `install.sh` seed loop extended to `AGENTS.md STANDARDS.md TOOLS.md`
- [ ] `file-adapter.ts` seeding restricted to explicit whitelist
- [ ] `TOOLS.md.template` `rails/``tools/`
- [ ] Regression test added (`file-adapter.test.ts`) — failing first, then green
- [ ] `pnpm --filter @mosaicstack/mosaic run typecheck` green
- [ ] `pnpm --filter @mosaicstack/mosaic run lint` green
- [ ] `pnpm --filter @mosaicstack/mosaic exec vitest run` — new test green, no new failures beyond the known pre-existing `uninstall.spec.ts:138`
- [ ] Repo baselines: `pnpm typecheck` / `pnpm lint` / `pnpm format:check`
- [ ] Independent code review (`feature-dev:code-reviewer`, sonnet tier)
- [ ] Commit + push
- [ ] PR opened via Gitea API
- [ ] CI queue guard cleared (bypass local `ci-queue-wait.sh` if stale origin URL breaks it; query Gitea API directly)
- [ ] CI green on PR
- [ ] PR merged (squash)
- [ ] CI green on main
- [ ] Issue closed with link to merge commit
- [ ] `chore/release-mosaic-0.0.30` branch bumps `packages/mosaic/package.json` 0.0.29 → 0.0.30
- [ ] Release PR opened + merged
- [ ] `.woodpecker/publish.yml` auto-publishes to Gitea npm registry
- [ ] Publish verified (`npm view @mosaicstack/mosaic version` or registry check)
## Risks / blockers
- `ci-queue-wait.sh` wrapper may still crash on stale `origin` URL (captured in OpenBrain from prior hotfix). Workaround: query Gitea API directly for running/queued pipelines.
- `issue-create.sh` / `pr-create.sh` `eval` bugs. Workaround: Gitea API direct call.
- `uninstall.spec.ts:138` is a pre-existing failure on main; not this change's problem.
- Publish flow is fire-and-forget on main push — if `publish.yml` fails, rollback means republishing a follow-up patch, not reverting the version bump.

View File

@@ -0,0 +1,114 @@
# Hotfix Scratchpad — `mosaic yolo <runtime>` passes runtime name as initial user message
- **Issue:** mosaicstack/stack#454
- **Branch:** `fix/yolo-runtime-initial-arg`
- **Type:** Out-of-mission hotfix (not part of Install UX v2 mission)
- **Started:** 2026-04-11
## Objective
Stop `mosaic yolo <runtime>` from passing the runtime name (`claude`, `codex`, etc.) as the initial user message to the underlying CLI. Restore the mission-auto-prompt path for yolo launches.
## Root cause (confirmed)
`packages/mosaic/src/commands/launch.ts:779` — the `yolo <runtime>` action handler:
```ts
.action((runtime: string, _opts: unknown, cmd: Command) => {
// ... validate runtime ...
launchRuntime(runtime as RuntimeName, cmd.args, true);
});
```
Commander.js includes declared positional arguments in `cmd.args`. For `mosaic yolo claude`:
- `runtime` (destructured) = `"claude"`
- `cmd.args` = `["claude"]` — the same value
`launchRuntime` treats `["claude"]` as excess positional args, and for the `claude` case that becomes the initial user message. As a secondary consequence, `hasMissionNoArgs` evaluates false, so the mission-auto-prompt path is bypassed too.
## Live reproduction (intercepted claude binary)
```
$ PATH=/tmp/fake-claude-bin:$PATH mosaic yolo claude
[mosaic] Launching Claude Code in YOLO mode...
argv[1]: --dangerously-skip-permissions
argv[2]: --append-system-prompt
argv[3] (len=25601): # ACTIVE MISSION — HARD GATE ...
argv[4]: claude ← the bug
```
Non-yolo variant `mosaic claude` is clean:
```
argv[1]: --append-system-prompt
argv[2]: <prompt>
argv[3]: Active mission detected: MVP. Read the mission state files and report status.
```
## Plan
1. Refactor `launch.ts`: extract `registerRuntimeLaunchers(program, handler)` with an injectable handler so commander wiring is testable without spawning subprocesses. `registerLaunchCommands` delegates to it with `launchRuntime` as the handler.
2. Fix: in the `yolo <runtime>` action, pass `cmd.args.slice(1)` instead of `cmd.args`.
3. Add `packages/mosaic/src/commands/launch.spec.ts`:
- Failing-first reproducer: parse `['node','x','yolo','claude']` and assert handler receives `extraArgs=[]` and `yolo=true`.
- Regression test: parse `['node','x','claude']` asserts handler receives `extraArgs=[]` and `yolo=false`.
- Excess args: parse `['node','x','yolo','claude','--print','hi']` asserts handler receives `extraArgs=['--print','hi']` (with `--print` kept because `allowUnknownOption` is true).
- Excess args non-yolo: parse `['node','x','claude','--print','hi']` asserts `extraArgs=['--print','hi']`.
- Reject unknown runtime under yolo.
4. Run typecheck, lint, format:check, vitest for `@mosaicstack/mosaic`.
5. Independent code review (feature-dev:code-reviewer subagent, sonnet tier).
6. Commit → push → PR via wrappers → merge → CI green → close issue #454.
7. Release decision (`mosaic-v0.0.30`) deferred to Jason after merge.
## Framework compliance sub-findings (out-of-scope; to capture in OpenBrain after)
- `~/.config/mosaic/tools/git/issue-create.sh` uses `eval` on `$BODY`; arbitrary bodies with backticks, `$`, or parens break catastrophically.
- `gitea_issue_create_api` fallback uses `curl -fsS` without `-L`; after the `mosaicstack/mosaic-stack → mosaicstack/stack` rename, the API redirect is not followed and the fallback silently fails.
- Local repo `origin` remote still points at old `mosaic/mosaic-stack.git` slug. Not touched here per git-config safety rule.
- `~/.config/mosaic/TOOLS.md` referenced by the global load order but does not exist on disk.
These will be captured to OpenBrain after the hotfix merges so they don't get lost, and filed as separate tracking items.
## Progress checkpoints
- [x] Branch created (`fix/yolo-runtime-initial-arg`)
- [x] Issue #454 opened
- [x] Scratchpad scaffolded
- [x] Failing test added (red)
- [x] Refactor + fix applied
- [x] Tests green (launch.spec.ts 11/11)
- [x] Baselines green (typecheck, lint, format:check, vitest — pre-existing `uninstall.spec.ts:138` failure on branch main acknowledged, not caused by this change)
- [x] Code review pass (feature-dev:code-reviewer, sonnet — no blockers)
- [x] Commit + push (commit 1dd4f59)
- [x] PR opened (mosaicstack/stack#455)
- [x] CI queue guard cleared (no pending pipelines pre-push or pre-merge)
- [x] PR merged (squash merge commit b2cec8c6bac29336a6cdcdb4f19806f7b5fa0054)
- [x] CI green on main (`ci/woodpecker/push/ci` + `ci/woodpecker/push/publish` both success on merge commit)
- [x] Issue #454 closed
- [x] Scratchpad final evidence entry
## Tests run
- `pnpm --filter @mosaicstack/mosaic run typecheck` → green
- `pnpm --filter @mosaicstack/mosaic run lint` → green
- `pnpm --filter @mosaicstack/mosaic exec prettier --check "src/**/*.ts"` → green
- `pnpm --filter @mosaicstack/mosaic exec vitest run src/commands/launch.spec.ts` → 11/11 pass
- `pnpm --filter @mosaicstack/mosaic exec vitest run` → 270/271 pass (1 pre-existing `uninstall.spec.ts:138` EACCES failure, confirmed on the branch before this change)
- `pnpm typecheck` (repo) → green
- `pnpm lint` (repo) → green
- `pnpm format:check` (repo) → green (after prettier-writing the scratchpad)
## Risks / blockers
None expected. Refactor is small and the Commander API is stable. Test needs `exitOverride()` to prevent `process.exit` on invalid runtime.
## Final verification evidence
- PR: mosaicstack/stack#455 — state `closed`, merged.
- Merge commit: `b2cec8c6bac29336a6cdcdb4f19806f7b5fa0054` (squash to `main`).
- Post-merge CI (main @ b2cec8c6): `ci/woodpecker/push/ci` = success, `ci/woodpecker/push/publish` = success. (`ci/woodpecker/tag/publish` was last observed as a pre-existing failure on the prior release tag and is unrelated to this change.)
- Issue mosaicstack/stack#454 closed with a comment linking the merge commit.
- Launch regression suite: `launch.spec.ts` 11/11 pass on main.
- Baselines on main after merge are inherited from the PR CI run.
- Release decision (`mosaicstack/mosaic` 0.0.30) intentionally deferred to the user — the fix is now sitting on main awaiting a release cut.

View File

@@ -28,6 +28,7 @@ export default tseslint.config(
'apps/web/e2e/helpers/*.ts',
'apps/web/playwright.config.ts',
'apps/gateway/vitest.config.ts',
'packages/storage/vitest.config.ts',
'packages/mosaic/__tests__/*.ts',
],
},

View File

@@ -73,6 +73,27 @@ Spawn a worker instead. No exceptions. No "quick fixes."
- Wait for at least one worker to complete before spawning more
- This optimizes token usage and reduces context pressure
## File Ownership & Partitioning (Hard Rule for Parallel Workers)
When dispatching parallel workers, the orchestrator MUST assign **non-overlapping file scopes** to each worker. File collisions between parallel workers cause merge conflicts, lost edits, and wasted tokens.
**Rules:**
1. **Exclusive file ownership.** Each file may be assigned to at most one active worker. The orchestrator records ownership in the worker dispatch (prompt or task definition).
2. **Partition by directory or module.** Prefer assigning entire directories/modules to one worker rather than splitting files within a directory across workers.
3. **Shared files are serialized.** If two tasks must modify the same file (e.g., a shared types file, a barrel export), they MUST run sequentially — never in parallel. Mark the second task with `depends_on` pointing to the first.
4. **Test files follow source ownership.** If Worker A owns `src/auth/login.ts`, Worker A also owns `src/auth/__tests__/login.test.ts`. Do not split source and test across workers.
5. **Config files are orchestrator-reserved.** Files like `package.json`, `tsconfig.json`, and CI config are owned by the orchestrator and modified only between worker cycles, never during parallel execution.
6. **Document ownership in dispatch.** When spawning a worker, include an explicit `Files:` section listing owned paths/globs. Example:
```
Files (exclusive — do not touch files outside this scope):
- apps/web/src/components/auth/**
- apps/web/src/lib/auth.ts
```
7. **Conflict recovery.** If a worker edits a file outside its scope, the orchestrator MUST flag the violation, assess the diff, and either revert the out-of-scope change or re-run the affected worker with the corrected file.
## Delegation Mode Selection
Choose one delegation mode at session start:

View File

@@ -0,0 +1 @@
dev-only-step-ca-password-do-not-use-in-production

90
infra/step-ca/init.sh Executable file
View File

@@ -0,0 +1,90 @@
#!/bin/sh
# infra/step-ca/init.sh
#
# Idempotent first-boot initialiser for the Mosaic Federation CA.
#
# On the first run (no /home/step/config/ca.json present) this script:
# 1. Initialises Step-CA with a JWK provisioner named "mosaic-fed".
# 2. Writes the CA configuration to the persistent volume at /home/step.
# 3. Copies the federation X.509 template into the CA config directory.
# 4. Patches the mosaic-fed provisioner entry in ca.json to reference the
# template via options.x509.templateFile (using jq — must be installed
# in the container image).
#
# On subsequent runs (config already exists) this script skips init and
# starts the CA directly.
#
# The provisioner name "mosaic-fed" is consumed by:
# apps/gateway/src/federation/ca.service.ts (added in M2-04)
#
# Password source:
# Dev: mounted from ./infra/step-ca/dev-password via bind mount.
# Prod: mounted from a Docker secret at /run/secrets/ca_password.
#
# OID template:
# infra/step-ca/templates/federation.tpl emits custom OID extensions:
# 1.3.6.1.4.1.99999.1 — mosaic_grant_id
# 1.3.6.1.4.1.99999.2 — mosaic_subject_user_id
set -e
CA_CONFIG="/home/step/config/ca.json"
PASSWORD_FILE="/run/secrets/ca_password"
TEMPLATE_SRC="/etc/step-ca-templates/federation.tpl"
TEMPLATE_DEST="/home/step/templates/federation.tpl"
if [ ! -f "${CA_CONFIG}" ]; then
echo "[step-ca init] First boot detected — initialising Mosaic Federation CA..."
step ca init \
--name "Mosaic Federation CA" \
--dns "localhost" \
--dns "step-ca" \
--address ":9000" \
--provisioner "mosaic-fed" \
--password-file "${PASSWORD_FILE}" \
--provisioner-password-file "${PASSWORD_FILE}" \
--no-db
echo "[step-ca init] CA initialised."
# Copy the X.509 template into the Step-CA config directory.
if [ -f "${TEMPLATE_SRC}" ]; then
mkdir -p /home/step/templates
cp "${TEMPLATE_SRC}" "${TEMPLATE_DEST}"
echo "[step-ca init] Federation X.509 template copied to ${TEMPLATE_DEST}."
else
echo "[step-ca init] WARNING: Template source ${TEMPLATE_SRC} not found — skipping copy."
fi
# Wire the template into the mosaic-fed provisioner via jq.
# This is idempotent: the block only runs once (first boot).
#
# jq filter: find the provisioner entry with name "mosaic-fed" and set
# .options.x509.templateFile to the absolute path of the template.
# All other provisioners and config keys are left unchanged.
if [ -f "${TEMPLATE_DEST}" ] && command -v jq > /dev/null 2>&1; then
echo "[step-ca init] Patching mosaic-fed provisioner with X.509 template..."
TEMP_CONFIG="${CA_CONFIG}.tmp"
jq --arg tpl "${TEMPLATE_DEST}" '
.authority.provisioners |= map(
if .name == "mosaic-fed" then
.options.x509.templateFile = $tpl
else
.
end
)
' "${CA_CONFIG}" > "${TEMP_CONFIG}" && mv "${TEMP_CONFIG}" "${CA_CONFIG}"
echo "[step-ca init] Provisioner patched."
elif ! command -v jq > /dev/null 2>&1; then
echo "[step-ca init] WARNING: jq not found — skipping provisioner template patch."
echo "[step-ca init] Install jq in the step-ca image to enable automatic template wiring."
fi
echo "[step-ca init] Startup complete."
else
echo "[step-ca init] Config already exists — skipping init."
fi
echo "[step-ca init] Starting Step-CA on :9000..."
exec step-ca /home/step/config/ca.json --password-file "${PASSWORD_FILE}"

View File

@@ -0,0 +1,56 @@
{
"subject": {{ toJson .Subject }},
"sans": {{ toJson .SANs }},
{{- /*
Mosaic Federation X.509 Certificate Template
============================================
Provisioner: mosaic-fed (JWK)
Implemented: FED-M2-04
This template emits two custom OID extensions carrying Mosaic federation
identifiers. The OTT token (built by CaService.buildOtt) MUST include the
claims `mosaic_grant_id` and `mosaic_subject_user_id` as top-level JWT
claims. step-ca exposes them under `.Token.<claim>` in this template.
OID Registry (Mosaic Internal Arc 1.3.6.1.4.1.99999):
1.3.6.1.4.1.99999.1 mosaic_grant_id (UUID, 36 ASCII chars)
1.3.6.1.4.1.99999.2 mosaic_subject_user_id (UUID, 36 ASCII chars)
DER encoding for each extension value (ASN.1 UTF8String):
Tag = 0x0C (UTF8String)
Length = 0x24 (decimal 36 the fixed length of a UUID string)
Value = 36 ASCII bytes of the UUID
The `printf` below builds the raw TLV bytes then base64-encodes them.
step-ca expects the `value` field to be base64-encoded raw DER bytes.
Fail-loud contract:
If either claim is missing from the token the printf will produce a
zero-length UUID field, making the extension malformed. step-ca will
reject the certificate rather than issuing one without the required OIDs.
Silent OID stripping is NEVER tolerated.
Step-CA template reference:
https://smallstep.com/docs/step-ca/templates
*/ -}}
"extensions": [
{
"id": "1.3.6.1.4.1.99999.1",
"critical": false,
"value": "{{ printf "\x0c%c%s" (len .Token.mosaic_grant_id) .Token.mosaic_grant_id | b64enc }}"
},
{
"id": "1.3.6.1.4.1.99999.2",
"critical": false,
"value": "{{ printf "\x0c%c%s" (len .Token.mosaic_subject_user_id) .Token.mosaic_subject_user_id | b64enc }}"
}
],
"keyUsage": ["digitalSignature"],
"extKeyUsage": ["clientAuth"],
"basicConstraints": {
"isCA": false
}
}

View File

@@ -3,7 +3,7 @@
"version": "0.0.2",
"repository": {
"type": "git",
"url": "https://git.mosaicstack.dev/mosaicstack/mosaic-stack.git",
"url": "https://git.mosaicstack.dev/mosaicstack/stack.git",
"directory": "packages/agent"
},
"main": "dist/index.js",

View File

@@ -3,7 +3,7 @@
"version": "0.0.2",
"repository": {
"type": "git",
"url": "https://git.mosaicstack.dev/mosaicstack/mosaic-stack.git",
"url": "https://git.mosaicstack.dev/mosaicstack/stack.git",
"directory": "packages/auth"
},
"type": "module",

View File

@@ -10,3 +10,4 @@ export {
type SsoTeamSyncConfig,
type SupportedSsoProviderId,
} from './sso.js';
export { seal, unseal } from './seal.js';

52
packages/auth/src/seal.ts Normal file
View File

@@ -0,0 +1,52 @@
import { createCipheriv, createDecipheriv, createHash, randomBytes } from 'node:crypto';
const ALGORITHM = 'aes-256-gcm';
const IV_LENGTH = 12; // 96-bit IV for GCM
const TAG_LENGTH = 16; // 128-bit auth tag
/**
* Derive a 32-byte AES-256 key from BETTER_AUTH_SECRET using SHA-256.
* Throws if BETTER_AUTH_SECRET is not set.
*/
function deriveKey(): Buffer {
const secret = process.env['BETTER_AUTH_SECRET'];
if (!secret) {
throw new Error('BETTER_AUTH_SECRET is not set — cannot derive encryption key');
}
return createHash('sha256').update(secret).digest();
}
/**
* Seal a plaintext string using AES-256-GCM.
* Output format: base64(IV || authTag || ciphertext)
*/
export function seal(plaintext: string): string {
const key = deriveKey();
const iv = randomBytes(IV_LENGTH);
const cipher = createCipheriv(ALGORITHM, key, iv);
const encrypted = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
const authTag = cipher.getAuthTag();
const combined = Buffer.concat([iv, authTag, encrypted]);
return combined.toString('base64');
}
/**
* Unseal a value sealed by `seal()`.
* Throws on authentication failure (tampered data) or if BETTER_AUTH_SECRET is unset.
*/
export function unseal(encoded: string): string {
const key = deriveKey();
const combined = Buffer.from(encoded, 'base64');
const iv = combined.subarray(0, IV_LENGTH);
const authTag = combined.subarray(IV_LENGTH, IV_LENGTH + TAG_LENGTH);
const ciphertext = combined.subarray(IV_LENGTH + TAG_LENGTH);
const decipher = createDecipheriv(ALGORITHM, key, iv);
decipher.setAuthTag(authTag);
const decrypted = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
return decrypted.toString('utf8');
}

View File

@@ -3,7 +3,7 @@
"version": "0.0.3",
"repository": {
"type": "git",
"url": "https://git.mosaicstack.dev/mosaicstack/mosaic-stack.git",
"url": "https://git.mosaicstack.dev/mosaicstack/stack.git",
"directory": "packages/brain"
},
"main": "dist/index.js",

View File

@@ -3,7 +3,7 @@
"version": "0.0.2",
"repository": {
"type": "git",
"url": "https://git.mosaicstack.dev/mosaicstack/mosaic-stack.git",
"url": "https://git.mosaicstack.dev/mosaicstack/stack.git",
"directory": "packages/config"
},
"type": "module",

View File

@@ -1,7 +1,9 @@
export type { MosaicConfig, StorageTier, MemoryConfigRef } from './mosaic-config.js';
export {
DEFAULT_LOCAL_CONFIG,
DEFAULT_TEAM_CONFIG,
DEFAULT_STANDALONE_CONFIG,
DEFAULT_FEDERATED_CONFIG,
loadConfig,
validateConfig,
detectFromEnv,
} from './mosaic-config.js';

View File

@@ -0,0 +1,170 @@
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
import {
validateConfig,
detectFromEnv,
DEFAULT_LOCAL_CONFIG,
DEFAULT_STANDALONE_CONFIG,
DEFAULT_FEDERATED_CONFIG,
} from './mosaic-config.js';
describe('validateConfig — tier enum', () => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
let stderrSpy: any;
beforeEach(() => {
stderrSpy = vi.spyOn(process.stderr, 'write').mockImplementation(() => true);
});
afterEach(() => {
stderrSpy.mockRestore();
});
it('accepts tier="local"', () => {
const result = validateConfig({
tier: 'local',
storage: { type: 'pglite', dataDir: '.mosaic/storage-pglite' },
queue: { type: 'local', dataDir: '.mosaic/queue' },
memory: { type: 'keyword' },
});
expect(result.tier).toBe('local');
});
it('accepts tier="standalone"', () => {
const result = validateConfig({
tier: 'standalone',
storage: { type: 'postgres', url: 'postgresql://mosaic:mosaic@localhost:5432/mosaic' },
queue: { type: 'bullmq' },
memory: { type: 'keyword' },
});
expect(result.tier).toBe('standalone');
});
it('accepts tier="federated"', () => {
const result = validateConfig({
tier: 'federated',
storage: { type: 'postgres', url: 'postgresql://mosaic:mosaic@localhost:5433/mosaic' },
queue: { type: 'bullmq' },
memory: { type: 'pgvector' },
});
expect(result.tier).toBe('federated');
});
it('accepts deprecated tier="team" as alias for "standalone" and emits a deprecation warning', () => {
const result = validateConfig({
tier: 'team',
storage: { type: 'postgres', url: 'postgresql://mosaic:mosaic@localhost:5432/mosaic' },
queue: { type: 'bullmq' },
memory: { type: 'keyword' },
});
expect(result.tier).toBe('standalone');
expect(stderrSpy).toHaveBeenCalledWith(expect.stringContaining('DEPRECATED'));
});
it('rejects an invalid tier with an error listing all three valid values', () => {
expect(() =>
validateConfig({
tier: 'invalid',
storage: { type: 'postgres', url: 'postgresql://mosaic:mosaic@localhost:5432/mosaic' },
queue: { type: 'bullmq' },
memory: { type: 'keyword' },
}),
).toThrow(/local.*standalone.*federated|federated.*standalone.*local/);
});
it('error message for invalid tier mentions all three valid values', () => {
let message = '';
try {
validateConfig({
tier: 'invalid',
storage: { type: 'postgres', url: 'postgresql://...' },
queue: { type: 'bullmq' },
memory: { type: 'keyword' },
});
} catch (err) {
message = err instanceof Error ? err.message : String(err);
}
expect(message).toContain('"local"');
expect(message).toContain('"standalone"');
expect(message).toContain('"federated"');
});
});
describe('DEFAULT_* config constants', () => {
it('DEFAULT_LOCAL_CONFIG has tier="local"', () => {
expect(DEFAULT_LOCAL_CONFIG.tier).toBe('local');
});
it('DEFAULT_STANDALONE_CONFIG has tier="standalone"', () => {
expect(DEFAULT_STANDALONE_CONFIG.tier).toBe('standalone');
});
it('DEFAULT_FEDERATED_CONFIG has tier="federated" and pgvector memory', () => {
expect(DEFAULT_FEDERATED_CONFIG.tier).toBe('federated');
expect(DEFAULT_FEDERATED_CONFIG.memory.type).toBe('pgvector');
});
it('DEFAULT_FEDERATED_CONFIG uses port 5433 (distinct from standalone 5432)', () => {
const url = (DEFAULT_FEDERATED_CONFIG.storage as { url: string }).url;
expect(url).toContain('5433');
});
it('DEFAULT_FEDERATED_CONFIG has enableVector=true on storage', () => {
const storage = DEFAULT_FEDERATED_CONFIG.storage as {
type: string;
url: string;
enableVector?: boolean;
};
expect(storage.enableVector).toBe(true);
});
});
describe('detectFromEnv — tier env-var routing', () => {
const originalEnv = process.env;
beforeEach(() => {
// Work on a fresh copy so individual tests can set/delete keys freely.
process.env = { ...originalEnv };
delete process.env['MOSAIC_STORAGE_TIER'];
delete process.env['DATABASE_URL'];
delete process.env['VALKEY_URL'];
});
afterEach(() => {
process.env = originalEnv;
});
it('no env vars → returns local config', () => {
const config = detectFromEnv();
expect(config.tier).toBe('local');
expect(config.storage.type).toBe('pglite');
expect(config.memory.type).toBe('keyword');
});
it('MOSAIC_STORAGE_TIER=federated alone → returns federated config with enableVector=true', () => {
process.env['MOSAIC_STORAGE_TIER'] = 'federated';
const config = detectFromEnv();
expect(config.tier).toBe('federated');
expect(config.memory.type).toBe('pgvector');
const storage = config.storage as { type: string; enableVector?: boolean };
expect(storage.enableVector).toBe(true);
});
it('MOSAIC_STORAGE_TIER=federated + DATABASE_URL → uses the URL and still has enableVector=true', () => {
process.env['MOSAIC_STORAGE_TIER'] = 'federated';
process.env['DATABASE_URL'] = 'postgresql://custom:pass@db.example.com:5432/mydb';
const config = detectFromEnv();
expect(config.tier).toBe('federated');
const storage = config.storage as { type: string; url: string; enableVector?: boolean };
expect(storage.url).toBe('postgresql://custom:pass@db.example.com:5432/mydb');
expect(storage.enableVector).toBe(true);
expect(config.memory.type).toBe('pgvector');
});
it('MOSAIC_STORAGE_TIER=standalone alone → returns standalone-shaped config (not local)', () => {
process.env['MOSAIC_STORAGE_TIER'] = 'standalone';
const config = detectFromEnv();
expect(config.tier).toBe('standalone');
expect(config.storage.type).toBe('postgres');
expect(config.memory.type).toBe('keyword');
});
});

View File

@@ -7,7 +7,7 @@ import type { QueueAdapterConfig as QueueConfig } from '@mosaicstack/queue';
/* Types */
/* ------------------------------------------------------------------ */
export type StorageTier = 'local' | 'team';
export type StorageTier = 'local' | 'standalone' | 'federated';
export interface MemoryConfigRef {
type: 'pgvector' | 'sqlite-vec' | 'keyword';
@@ -31,10 +31,21 @@ export const DEFAULT_LOCAL_CONFIG: MosaicConfig = {
memory: { type: 'keyword' },
};
export const DEFAULT_TEAM_CONFIG: MosaicConfig = {
tier: 'team',
export const DEFAULT_STANDALONE_CONFIG: MosaicConfig = {
tier: 'standalone',
storage: { type: 'postgres', url: 'postgresql://mosaic:mosaic@localhost:5432/mosaic' },
queue: { type: 'bullmq' },
memory: { type: 'keyword' },
};
export const DEFAULT_FEDERATED_CONFIG: MosaicConfig = {
tier: 'federated',
storage: {
type: 'postgres',
url: 'postgresql://mosaic:mosaic@localhost:5433/mosaic',
enableVector: true,
},
queue: { type: 'bullmq' },
memory: { type: 'pgvector' },
};
@@ -42,7 +53,7 @@ export const DEFAULT_TEAM_CONFIG: MosaicConfig = {
/* Validation */
/* ------------------------------------------------------------------ */
const VALID_TIERS = new Set<string>(['local', 'team']);
const VALID_TIERS = new Set<string>(['local', 'standalone', 'federated']);
const VALID_STORAGE_TYPES = new Set<string>(['postgres', 'pglite', 'files']);
const VALID_QUEUE_TYPES = new Set<string>(['bullmq', 'local']);
const VALID_MEMORY_TYPES = new Set<string>(['pgvector', 'sqlite-vec', 'keyword']);
@@ -55,9 +66,19 @@ export function validateConfig(raw: unknown): MosaicConfig {
const obj = raw as Record<string, unknown>;
// tier
const tier = obj['tier'];
let tier = obj['tier'];
// Deprecated alias: 'team' → 'standalone' (kept for backward-compat with 0.0.x installs)
if (tier === 'team') {
process.stderr.write(
'[mosaic] DEPRECATED: tier="team" is deprecated — use "standalone" instead. ' +
'Update your mosaic.config.json.\n',
);
tier = 'standalone';
}
if (typeof tier !== 'string' || !VALID_TIERS.has(tier)) {
throw new Error(`Invalid tier "${String(tier)}" — expected "local" or "team"`);
throw new Error(
`Invalid tier "${String(tier)}" — expected "local", "standalone", or "federated"`,
);
}
// storage
@@ -102,10 +123,52 @@ export function validateConfig(raw: unknown): MosaicConfig {
/* Loader */
/* ------------------------------------------------------------------ */
function detectFromEnv(): MosaicConfig {
export function detectFromEnv(): MosaicConfig {
const tier = process.env['MOSAIC_STORAGE_TIER'];
if (tier === 'federated') {
if (process.env['DATABASE_URL']) {
return {
...DEFAULT_FEDERATED_CONFIG,
storage: {
type: 'postgres',
url: process.env['DATABASE_URL'],
enableVector: true,
},
queue: {
type: 'bullmq',
url: process.env['VALKEY_URL'],
},
};
}
// MOSAIC_STORAGE_TIER=federated without DATABASE_URL — use the default
// federated config (port 5433, enableVector: true, pgvector memory).
return DEFAULT_FEDERATED_CONFIG;
}
if (tier === 'standalone') {
if (process.env['DATABASE_URL']) {
return {
...DEFAULT_STANDALONE_CONFIG,
storage: {
type: 'postgres',
url: process.env['DATABASE_URL'],
},
queue: {
type: 'bullmq',
url: process.env['VALKEY_URL'],
},
};
}
// MOSAIC_STORAGE_TIER=standalone without DATABASE_URL — use the default
// standalone config instead of silently falling back to local.
return DEFAULT_STANDALONE_CONFIG;
}
// Legacy: DATABASE_URL set without MOSAIC_STORAGE_TIER — treat as standalone.
if (process.env['DATABASE_URL']) {
return {
...DEFAULT_TEAM_CONFIG,
...DEFAULT_STANDALONE_CONFIG,
storage: {
type: 'postgres',
url: process.env['DATABASE_URL'],
@@ -116,6 +179,7 @@ function detectFromEnv(): MosaicConfig {
},
};
}
return DEFAULT_LOCAL_CONFIG;
}

View File

@@ -3,7 +3,7 @@
"version": "0.0.2",
"repository": {
"type": "git",
"url": "https://git.mosaicstack.dev/mosaicstack/mosaic-stack.git",
"url": "https://git.mosaicstack.dev/mosaicstack/stack.git",
"directory": "packages/coord"
},
"main": "dist/index.js",

View File

@@ -0,0 +1,75 @@
CREATE TYPE "public"."grant_status" AS ENUM('active', 'revoked', 'expired');--> statement-breakpoint
CREATE TYPE "public"."peer_state" AS ENUM('pending', 'active', 'suspended', 'revoked');--> statement-breakpoint
CREATE TABLE "admin_tokens" (
"id" text PRIMARY KEY NOT NULL,
"user_id" text NOT NULL,
"token_hash" text NOT NULL,
"label" text NOT NULL,
"scope" text DEFAULT 'admin' NOT NULL,
"expires_at" timestamp with time zone,
"last_used_at" timestamp with time zone,
"created_at" timestamp with time zone DEFAULT now() NOT NULL
);
--> statement-breakpoint
CREATE TABLE "federation_audit_log" (
"id" uuid PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL,
"request_id" text NOT NULL,
"peer_id" uuid,
"subject_user_id" text,
"grant_id" uuid,
"verb" text NOT NULL,
"resource" text NOT NULL,
"status_code" integer NOT NULL,
"result_count" integer,
"denied_reason" text,
"latency_ms" integer,
"created_at" timestamp with time zone DEFAULT now() NOT NULL,
"query_hash" text,
"outcome" text,
"bytes_out" integer
);
--> statement-breakpoint
CREATE TABLE "federation_grants" (
"id" uuid PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL,
"subject_user_id" text NOT NULL,
"peer_id" uuid NOT NULL,
"scope" jsonb NOT NULL,
"status" "grant_status" DEFAULT 'active' NOT NULL,
"expires_at" timestamp with time zone,
"created_at" timestamp with time zone DEFAULT now() NOT NULL,
"revoked_at" timestamp with time zone,
"revoked_reason" text
);
--> statement-breakpoint
CREATE TABLE "federation_peers" (
"id" uuid PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL,
"common_name" text NOT NULL,
"display_name" text NOT NULL,
"cert_pem" text NOT NULL,
"cert_serial" text NOT NULL,
"cert_not_after" timestamp with time zone NOT NULL,
"client_key_pem" text,
"state" "peer_state" DEFAULT 'pending' NOT NULL,
"endpoint_url" text,
"last_seen_at" timestamp with time zone,
"created_at" timestamp with time zone DEFAULT now() NOT NULL,
"revoked_at" timestamp with time zone,
CONSTRAINT "federation_peers_common_name_unique" UNIQUE("common_name"),
CONSTRAINT "federation_peers_cert_serial_unique" UNIQUE("cert_serial")
);
--> statement-breakpoint
ALTER TABLE "admin_tokens" ADD CONSTRAINT "admin_tokens_user_id_users_id_fk" FOREIGN KEY ("user_id") REFERENCES "public"."users"("id") ON DELETE cascade ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "federation_audit_log" ADD CONSTRAINT "federation_audit_log_peer_id_federation_peers_id_fk" FOREIGN KEY ("peer_id") REFERENCES "public"."federation_peers"("id") ON DELETE set null ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "federation_audit_log" ADD CONSTRAINT "federation_audit_log_subject_user_id_users_id_fk" FOREIGN KEY ("subject_user_id") REFERENCES "public"."users"("id") ON DELETE set null ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "federation_audit_log" ADD CONSTRAINT "federation_audit_log_grant_id_federation_grants_id_fk" FOREIGN KEY ("grant_id") REFERENCES "public"."federation_grants"("id") ON DELETE set null ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "federation_grants" ADD CONSTRAINT "federation_grants_subject_user_id_users_id_fk" FOREIGN KEY ("subject_user_id") REFERENCES "public"."users"("id") ON DELETE cascade ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "federation_grants" ADD CONSTRAINT "federation_grants_peer_id_federation_peers_id_fk" FOREIGN KEY ("peer_id") REFERENCES "public"."federation_peers"("id") ON DELETE cascade ON UPDATE no action;--> statement-breakpoint
CREATE INDEX "admin_tokens_user_id_idx" ON "admin_tokens" USING btree ("user_id");--> statement-breakpoint
CREATE UNIQUE INDEX "admin_tokens_hash_idx" ON "admin_tokens" USING btree ("token_hash");--> statement-breakpoint
CREATE INDEX "federation_audit_log_peer_created_at_idx" ON "federation_audit_log" USING btree ("peer_id","created_at" DESC NULLS LAST);--> statement-breakpoint
CREATE INDEX "federation_audit_log_subject_created_at_idx" ON "federation_audit_log" USING btree ("subject_user_id","created_at" DESC NULLS LAST);--> statement-breakpoint
CREATE INDEX "federation_audit_log_created_at_idx" ON "federation_audit_log" USING btree ("created_at" DESC NULLS LAST);--> statement-breakpoint
CREATE INDEX "federation_grants_subject_status_idx" ON "federation_grants" USING btree ("subject_user_id","status");--> statement-breakpoint
CREATE INDEX "federation_grants_peer_status_idx" ON "federation_grants" USING btree ("peer_id","status");--> statement-breakpoint
CREATE INDEX "federation_peers_cert_serial_idx" ON "federation_peers" USING btree ("cert_serial");--> statement-breakpoint
CREATE INDEX "federation_peers_state_idx" ON "federation_peers" USING btree ("state");

View File

@@ -0,0 +1,2 @@
ALTER TYPE "public"."grant_status" ADD VALUE 'pending' BEFORE 'active';--> statement-breakpoint
ALTER TABLE "federation_grants" ALTER COLUMN "status" SET DEFAULT 'pending';

View File

@@ -0,0 +1,11 @@
CREATE TABLE "federation_enrollment_tokens" (
"token" text PRIMARY KEY NOT NULL,
"grant_id" uuid NOT NULL,
"peer_id" uuid NOT NULL,
"expires_at" timestamp with time zone NOT NULL,
"used_at" timestamp with time zone,
"created_at" timestamp with time zone DEFAULT now() NOT NULL
);
--> statement-breakpoint
ALTER TABLE "federation_enrollment_tokens" ADD CONSTRAINT "federation_enrollment_tokens_grant_id_federation_grants_id_fk" FOREIGN KEY ("grant_id") REFERENCES "public"."federation_grants"("id") ON DELETE cascade ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "federation_enrollment_tokens" ADD CONSTRAINT "federation_enrollment_tokens_peer_id_federation_peers_id_fk" FOREIGN KEY ("peer_id") REFERENCES "public"."federation_peers"("id") ON DELETE cascade ON UPDATE no action;

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -57,6 +57,27 @@
"when": 1774227064500,
"tag": "0006_swift_shen",
"breakpoints": true
},
{
"idx": 8,
"version": "7",
"when": 1776822435828,
"tag": "0008_smart_lyja",
"breakpoints": true
},
{
"idx": 9,
"version": "7",
"when": 1745280000000,
"tag": "0009_federation_grant_pending",
"breakpoints": true
},
{
"idx": 10,
"version": "7",
"when": 1745366400000,
"tag": "0010_federation_enrollment_tokens",
"breakpoints": true
}
]
}

View File

@@ -3,7 +3,7 @@
"version": "0.0.3",
"repository": {
"type": "git",
"url": "https://git.mosaicstack.dev/mosaicstack/mosaic-stack.git",
"url": "https://git.mosaicstack.dev/mosaicstack/stack.git",
"directory": "packages/db"
},
"type": "module",

View File

@@ -0,0 +1,424 @@
/**
* FED-M2-01 — Integration test: federation DB schema (peers / grants / audit_log).
*
* Prereq: docker compose -f docker-compose.federated.yml --profile federated up -d
* (or any postgres with the mosaic schema already applied)
* Run: FEDERATED_INTEGRATION=1 pnpm --filter @mosaicstack/db test src/federation.integration.test.ts
*
* Skipped when FEDERATED_INTEGRATION !== '1'.
*
* Strategy:
* - Applies the federation migration SQL directly (idempotent: CREATE TYPE/TABLE
* with IF NOT EXISTS guards applied via inline SQL before the migration DDL).
* - Assumes the base schema (users table etc.) already exists in the target DB.
* - All test rows use the `fed-m2-01-` prefix; cleanup in afterAll.
*
* Coverage:
* 1. Federation tables + enums apply cleanly against the existing schema.
* 2. Insert a sample user + peer + grant + audit row; verify round-trip.
* 3. FK cascade: deleting the user cascades to federation_grants.
* 4. FK set-null: deleting the peer sets federation_audit_log.peer_id to NULL.
* 5. Enum constraint: inserting an invalid status/state value throws a DB error.
* 6. Unique constraint: duplicate cert_serial throws a DB error.
*/
import postgres from 'postgres';
import { afterAll, beforeAll, describe, expect, it } from 'vitest';
const run = process.env['FEDERATED_INTEGRATION'] === '1';
const PG_URL = process.env['DATABASE_URL'] ?? 'postgresql://mosaic:mosaic@localhost:5433/mosaic';
/** Recognisable test-row prefix for safe cleanup without full-table truncation. */
const T = 'fed-m2-01';
// Deterministic IDs (UUID format required for uuid PK columns: 8-4-4-4-12 hex digits).
const PEER1_ID = `f2000001-0000-4000-8000-000000000001`;
const PEER2_ID = `f2000002-0000-4000-8000-000000000002`;
const USER1_ID = `${T}-user-1`;
let sql: ReturnType<typeof postgres> | undefined;
beforeAll(async () => {
if (!run) return;
sql = postgres(PG_URL, { max: 1, connect_timeout: 10, idle_timeout: 10 });
// Apply the federation enums and tables idempotently.
// This mirrors the migration file but uses IF NOT EXISTS guards so it can run
// against a DB that may not have had drizzle migrations tracked.
await sql`
DO $$ BEGIN
CREATE TYPE peer_state AS ENUM ('pending', 'active', 'suspended', 'revoked');
EXCEPTION WHEN duplicate_object THEN NULL;
END $$
`;
await sql`
DO $$ BEGIN
CREATE TYPE grant_status AS ENUM ('active', 'revoked', 'expired');
EXCEPTION WHEN duplicate_object THEN NULL;
END $$
`;
await sql`
CREATE TABLE IF NOT EXISTS federation_peers (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
common_name text NOT NULL,
display_name text NOT NULL,
cert_pem text NOT NULL,
cert_serial text NOT NULL,
cert_not_after timestamp with time zone NOT NULL,
client_key_pem text,
state peer_state NOT NULL DEFAULT 'pending',
endpoint_url text,
last_seen_at timestamp with time zone,
created_at timestamp with time zone NOT NULL DEFAULT now(),
revoked_at timestamp with time zone,
CONSTRAINT federation_peers_common_name_unique UNIQUE (common_name),
CONSTRAINT federation_peers_cert_serial_unique UNIQUE (cert_serial)
)
`;
await sql`
CREATE INDEX IF NOT EXISTS federation_peers_cert_serial_idx ON federation_peers (cert_serial)
`;
await sql`
CREATE INDEX IF NOT EXISTS federation_peers_state_idx ON federation_peers (state)
`;
await sql`
CREATE TABLE IF NOT EXISTS federation_grants (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
subject_user_id text NOT NULL REFERENCES users(id) ON DELETE CASCADE,
peer_id uuid NOT NULL REFERENCES federation_peers(id) ON DELETE CASCADE,
scope jsonb NOT NULL,
status grant_status NOT NULL DEFAULT 'active',
expires_at timestamp with time zone,
created_at timestamp with time zone NOT NULL DEFAULT now(),
revoked_at timestamp with time zone,
revoked_reason text
)
`;
await sql`
CREATE INDEX IF NOT EXISTS federation_grants_subject_status_idx ON federation_grants (subject_user_id, status)
`;
await sql`
CREATE INDEX IF NOT EXISTS federation_grants_peer_status_idx ON federation_grants (peer_id, status)
`;
await sql`
CREATE TABLE IF NOT EXISTS federation_audit_log (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
request_id text NOT NULL,
peer_id uuid REFERENCES federation_peers(id) ON DELETE SET NULL,
subject_user_id text REFERENCES users(id) ON DELETE SET NULL,
grant_id uuid REFERENCES federation_grants(id) ON DELETE SET NULL,
verb text NOT NULL,
resource text NOT NULL,
status_code integer NOT NULL,
result_count integer,
denied_reason text,
latency_ms integer,
created_at timestamp with time zone NOT NULL DEFAULT now(),
query_hash text,
outcome text,
bytes_out integer
)
`;
await sql`
CREATE INDEX IF NOT EXISTS federation_audit_log_peer_created_at_idx
ON federation_audit_log (peer_id, created_at DESC NULLS LAST)
`;
await sql`
CREATE INDEX IF NOT EXISTS federation_audit_log_subject_created_at_idx
ON federation_audit_log (subject_user_id, created_at DESC NULLS LAST)
`;
await sql`
CREATE INDEX IF NOT EXISTS federation_audit_log_created_at_idx
ON federation_audit_log (created_at DESC NULLS LAST)
`;
});
afterAll(async () => {
if (!sql) return;
// Cleanup in FK-safe order (children before parents).
await sql`DELETE FROM federation_audit_log WHERE request_id LIKE ${T + '%'}`.catch(() => {});
await sql`
DELETE FROM federation_grants
WHERE subject_user_id LIKE ${T + '%'}
OR revoked_reason LIKE ${T + '%'}
`.catch(() => {});
await sql`DELETE FROM federation_peers WHERE common_name LIKE ${T + '%'}`.catch(() => {});
await sql`DELETE FROM users WHERE id LIKE ${T + '%'}`.catch(() => {});
await sql.end({ timeout: 3 }).catch(() => {});
});
describe.skipIf(!run)('federation schema — integration', () => {
// ── 1. Insert sample rows ──────────────────────────────────────────────────
it('inserts a user, peer, grant, and audit row without constraint violation', async () => {
const certPem = '-----BEGIN CERTIFICATE-----\nMIItest\n-----END CERTIFICATE-----';
// User — BetterAuth users.id is text (any string, not uuid).
await sql!`
INSERT INTO users (id, name, email, email_verified, created_at, updated_at)
VALUES (${USER1_ID}, ${'M2-01 Test User'}, ${USER1_ID + '@example.com'}, false, now(), now())
ON CONFLICT (id) DO NOTHING
`;
// Peer
await sql!`
INSERT INTO federation_peers
(id, common_name, display_name, cert_pem, cert_serial, cert_not_after, state, created_at)
VALUES (
${PEER1_ID},
${T + '-gateway-example-com'},
${'Test Peer'},
${certPem},
${T + '-serial-001'},
now() + interval '1 year',
${'active'},
now()
)
ON CONFLICT (id) DO NOTHING
`;
// Grant — scope is jsonb; pass as JSON string and cast server-side.
const scopeJson = JSON.stringify({
resources: ['tasks', 'notes'],
operations: ['list', 'get'],
});
const grants = await sql!`
INSERT INTO federation_grants
(subject_user_id, peer_id, scope, status, created_at)
VALUES (
${USER1_ID},
${PEER1_ID},
${scopeJson}::jsonb,
${'active'},
now()
)
RETURNING id
`;
expect(grants).toHaveLength(1);
const grantId = grants[0]!['id'] as string;
// Audit log row
await sql!`
INSERT INTO federation_audit_log
(request_id, peer_id, subject_user_id, grant_id, verb, resource, status_code, created_at)
VALUES (
${T + '-req-001'},
${PEER1_ID},
${USER1_ID},
${grantId},
${'list'},
${'tasks'},
${200},
now()
)
`;
// Verify the audit row is present and has correct data.
const auditRows = await sql!`
SELECT * FROM federation_audit_log WHERE request_id = ${T + '-req-001'}
`;
expect(auditRows).toHaveLength(1);
expect(auditRows[0]!['status_code']).toBe(200);
expect(auditRows[0]!['verb']).toBe('list');
expect(auditRows[0]!['resource']).toBe('tasks');
}, 30_000);
// ── 2. FK cascade: user delete cascades grants ─────────────────────────────
it('cascade-deletes federation_grants when the subject user is deleted', async () => {
const cascadeUserId = `${T}-cascade-user`;
await sql!`
INSERT INTO users (id, name, email, email_verified, created_at, updated_at)
VALUES (${cascadeUserId}, ${'Cascade User'}, ${cascadeUserId + '@example.com'}, false, now(), now())
ON CONFLICT (id) DO NOTHING
`;
const scopeJson = JSON.stringify({ resources: ['tasks'] });
await sql!`
INSERT INTO federation_grants
(subject_user_id, peer_id, scope, status, revoked_reason, created_at)
VALUES (
${cascadeUserId},
${PEER1_ID},
${scopeJson}::jsonb,
${'active'},
${T + '-cascade-test'},
now()
)
`;
const before = await sql!`
SELECT count(*)::int AS cnt FROM federation_grants WHERE subject_user_id = ${cascadeUserId}
`;
expect(before[0]!['cnt']).toBe(1);
// Delete user → grants should cascade-delete.
await sql!`DELETE FROM users WHERE id = ${cascadeUserId}`;
const after = await sql!`
SELECT count(*)::int AS cnt FROM federation_grants WHERE subject_user_id = ${cascadeUserId}
`;
expect(after[0]!['cnt']).toBe(0);
}, 15_000);
// ── 3. FK set-null: peer delete sets audit_log.peer_id to NULL ────────────
it('sets federation_audit_log.peer_id to NULL when the peer is deleted', async () => {
// Insert a throwaway peer for this specific cascade test.
await sql!`
INSERT INTO federation_peers
(id, common_name, display_name, cert_pem, cert_serial, cert_not_after, state, created_at)
VALUES (
${PEER2_ID},
${T + '-gateway-throwaway-com'},
${'Throwaway Peer'},
${'cert-pem-placeholder'},
${T + '-serial-002'},
now() + interval '1 year',
${'active'},
now()
)
ON CONFLICT (id) DO NOTHING
`;
const reqId = `${T}-req-setnull`;
await sql!`
INSERT INTO federation_audit_log
(request_id, peer_id, subject_user_id, verb, resource, status_code, created_at)
VALUES (
${reqId},
${PEER2_ID},
${USER1_ID},
${'get'},
${'tasks'},
${200},
now()
)
`;
await sql!`DELETE FROM federation_peers WHERE id = ${PEER2_ID}`;
const rows = await sql!`
SELECT peer_id FROM federation_audit_log WHERE request_id = ${reqId}
`;
expect(rows).toHaveLength(1);
expect(rows[0]!['peer_id']).toBeNull();
}, 15_000);
// ── 4. Enum constraint: invalid grant_status rejected ─────────────────────
it('rejects an invalid grant_status value with a DB error', async () => {
const scopeJson = JSON.stringify({ resources: ['tasks'] });
await expect(
sql!`
INSERT INTO federation_grants
(subject_user_id, peer_id, scope, status, created_at)
VALUES (
${USER1_ID},
${PEER1_ID},
${scopeJson}::jsonb,
${'invalid_status'},
now()
)
`,
).rejects.toThrow();
}, 10_000);
// ── 5. Enum constraint: invalid peer_state rejected ───────────────────────
it('rejects an invalid peer_state value with a DB error', async () => {
await expect(
sql!`
INSERT INTO federation_peers
(common_name, display_name, cert_pem, cert_serial, cert_not_after, state, created_at)
VALUES (
${'bad-state-peer'},
${'Bad State'},
${'pem'},
${'bad-serial-999'},
now() + interval '1 year',
${'invalid_state'},
now()
)
`,
).rejects.toThrow();
}, 10_000);
// ── 6. Unique constraint: duplicate cert_serial rejected ──────────────────
it('rejects a duplicate cert_serial with a unique constraint violation', async () => {
await expect(
sql!`
INSERT INTO federation_peers
(common_name, display_name, cert_pem, cert_serial, cert_not_after, state, created_at)
VALUES (
${T + '-dup-cn'},
${'Dup Peer'},
${'pem'},
${T + '-serial-001'},
now() + interval '1 year',
${'pending'},
now()
)
`,
).rejects.toThrow();
}, 10_000);
// ── 7. FK cascade: peer delete cascades to federation_grants ─────────────
it('cascade-deletes federation_grants when the owning peer is deleted', async () => {
const PEER3_ID = `f2000003-0000-4000-8000-000000000003`;
const cascadeGrantUserId = `${T}-cascade-grant-user`;
// Insert a dedicated user and peer for this test.
await sql!`
INSERT INTO users (id, name, email, email_verified, created_at, updated_at)
VALUES (${cascadeGrantUserId}, ${'Cascade Grant User'}, ${cascadeGrantUserId + '@example.com'}, false, now(), now())
ON CONFLICT (id) DO NOTHING
`;
await sql!`
INSERT INTO federation_peers
(id, common_name, display_name, cert_pem, cert_serial, cert_not_after, state, created_at)
VALUES (
${PEER3_ID},
${T + '-gateway-cascade-peer'},
${'Cascade Peer'},
${'cert-pem-cascade'},
${T + '-serial-003'},
now() + interval '1 year',
${'active'},
now()
)
ON CONFLICT (id) DO NOTHING
`;
const scopeJson = JSON.stringify({ resources: ['tasks'] });
await sql!`
INSERT INTO federation_grants
(subject_user_id, peer_id, scope, status, created_at)
VALUES (
${cascadeGrantUserId},
${PEER3_ID},
${scopeJson}::jsonb,
${'active'},
now()
)
`;
const before = await sql!`
SELECT count(*)::int AS cnt FROM federation_grants WHERE peer_id = ${PEER3_ID}
`;
expect(before[0]!['cnt']).toBe(1);
// Delete peer → grants should cascade-delete.
await sql!`DELETE FROM federation_peers WHERE id = ${PEER3_ID}`;
const after = await sql!`
SELECT count(*)::int AS cnt FROM federation_grants WHERE peer_id = ${PEER3_ID}
`;
expect(after[0]!['cnt']).toBe(0);
// Cleanup
await sql!`DELETE FROM users WHERE id = ${cascadeGrantUserId}`.catch(() => {});
}, 15_000);
});

View File

@@ -0,0 +1,21 @@
/**
* Federation schema re-exports.
*
* The actual table and enum definitions live in schema.ts (alongside all other
* Drizzle tables) to avoid CJS/ESM cross-import issues when drizzle-kit loads
* schema files via esbuild-register. Application code that wants named imports
* for federation symbols should import from this file.
*
* M2-01: DB tables and enums only. No business logic.
* M2-03 will add JSON schema validation for the `scope` column.
* M4 will write rows to federation_audit_log.
*/
export {
peerStateEnum,
grantStatusEnum,
federationPeers,
federationGrants,
federationAuditLog,
federationEnrollmentTokens,
} from './schema.js';

View File

@@ -2,6 +2,7 @@ export { createDb, type Db, type DbHandle } from './client.js';
export { createPgliteDb } from './client-pglite.js';
export { runMigrations } from './migrate.js';
export * from './schema.js';
export * from './federation.js';
export {
eq,
and,

View File

@@ -5,6 +5,7 @@
import {
pgTable,
pgEnum,
text,
timestamp,
boolean,
@@ -372,7 +373,11 @@ export const messages = pgTable(
// ─── pgvector custom type ───────────────────────────────────────────────────
const vector = customType<{ data: number[]; driverParam: string; config: { dimensions: number } }>({
export const vector = customType<{
data: number[];
driverParam: string;
config: { dimensions: number };
}>({
dataType(config) {
return `vector(${config?.dimensions ?? 1536})`;
},
@@ -581,3 +586,226 @@ export const summarizationJobs = pgTable(
},
(t) => [index('summarization_jobs_status_idx').on(t.status)],
);
// ─── Federation ──────────────────────────────────────────────────────────────
// Enums declared before tables that reference them.
// All federation definitions live in this file (avoids CJS/ESM cross-import
// issues when drizzle-kit loads schema files via esbuild-register).
// Application code imports from `federation.ts` which re-exports from here.
/**
* Lifecycle state of a federation peer.
* - pending: registered but not yet approved / TLS handshake not confirmed
* - active: fully operational; mTLS verified
* - suspended: temporarily blocked; cert still valid
* - revoked: cert revoked; no traffic allowed
*/
export const peerStateEnum = pgEnum('peer_state', ['pending', 'active', 'suspended', 'revoked']);
/**
* Lifecycle state of a federation grant.
* - pending: created but not yet activated (awaiting cert enrollment, M2-07)
* - active: grant is in effect
* - revoked: manually revoked before expiry
* - expired: natural expiry (expires_at passed)
*/
export const grantStatusEnum = pgEnum('grant_status', ['pending', 'active', 'revoked', 'expired']);
/**
* A registered peer gateway identified by its Step-CA certificate CN.
* Represents both inbound peers (other gateways querying us) and outbound
* peers (gateways we query — identified by client_key_pem being set).
*/
export const federationPeers = pgTable(
'federation_peers',
{
id: uuid('id').primaryKey().defaultRandom(),
/** Certificate CN, e.g. "gateway-uscllc-com". Unique — one row per peer identity. */
commonName: text('common_name').notNull().unique(),
/** Human-friendly label shown in admin UI. */
displayName: text('display_name').notNull(),
/** Pinned PEM certificate used for mTLS verification. */
certPem: text('cert_pem').notNull(),
/** Certificate serial number — used for CRL / revocation lookup. */
certSerial: text('cert_serial').notNull().unique(),
/** Certificate expiry — used by the renewal scheduler (FED-M6). */
certNotAfter: timestamp('cert_not_after', { withTimezone: true }).notNull(),
/**
* Sealed (encrypted) private key for outbound connections TO this peer.
* NULL for inbound-only peer rows (we serve them; we don't call them).
*/
clientKeyPem: text('client_key_pem'),
/** Current peer lifecycle state. */
state: peerStateEnum('state').notNull().default('pending'),
/** Base URL for outbound queries, e.g. "https://woltje.com:443". NULL for inbound-only peers. */
endpointUrl: text('endpoint_url'),
/** Timestamp of the most recent successful inbound or outbound request. */
lastSeenAt: timestamp('last_seen_at', { withTimezone: true }),
createdAt: timestamp('created_at', { withTimezone: true }).notNull().defaultNow(),
/** Populated when the cert is revoked; NULL while the peer is active. */
revokedAt: timestamp('revoked_at', { withTimezone: true }),
},
(t) => [
// CRL / revocation lookups by serial.
index('federation_peers_cert_serial_idx').on(t.certSerial),
// Filter peers by state (e.g. find all active peers for outbound routing).
index('federation_peers_state_idx').on(t.state),
],
);
/**
* A grant lets a specific peer cert query a specific local user's data within
* a defined scope. Scopes are validated by JSON Schema in M2-03; this table
* stores them as raw jsonb.
*/
export const federationGrants = pgTable(
'federation_grants',
{
id: uuid('id').primaryKey().defaultRandom(),
/**
* The local user whose data this grant exposes.
* Cascade delete: if the user account is deleted, revoke all their grants.
*/
subjectUserId: text('subject_user_id')
.notNull()
.references(() => users.id, { onDelete: 'cascade' }),
/**
* The peer gateway holding the grant.
* Cascade delete: if the peer record is removed, the grant is moot.
*/
peerId: uuid('peer_id')
.notNull()
.references(() => federationPeers.id, { onDelete: 'cascade' }),
/**
* Scope object — validated by JSON Schema (M2-03).
* Example: { "resources": ["tasks", "notes"], "operations": ["list", "get"] }
*/
scope: jsonb('scope').notNull(),
/** Current grant lifecycle state. */
status: grantStatusEnum('status').notNull().default('pending'),
/** Optional hard expiry. NULL means the grant does not expire automatically. */
expiresAt: timestamp('expires_at', { withTimezone: true }),
createdAt: timestamp('created_at', { withTimezone: true }).notNull().defaultNow(),
/** Populated when the grant is explicitly revoked. */
revokedAt: timestamp('revoked_at', { withTimezone: true }),
/** Human-readable reason for revocation (audit trail). */
revokedReason: text('revoked_reason'),
},
(t) => [
// Hot path: look up active grants for a subject user (auth middleware).
index('federation_grants_subject_status_idx').on(t.subjectUserId, t.status),
// Hot path: look up active grants held by a peer (inbound request check).
index('federation_grants_peer_status_idx').on(t.peerId, t.status),
],
);
/**
* Append-only audit log of all federation requests.
* M4 writes rows here. M2 only creates the table.
*
* All FKs use SET NULL so audit rows survive peer/user/grant deletion.
*/
export const federationAuditLog = pgTable(
'federation_audit_log',
{
id: uuid('id').primaryKey().defaultRandom(),
/** UUIDv7 from the X-Request-ID header — correlates with OTEL traces. */
requestId: text('request_id').notNull(),
/** Peer that made the request. SET NULL if the peer is later deleted. */
peerId: uuid('peer_id').references(() => federationPeers.id, { onDelete: 'set null' }),
/** Subject user whose data was queried. SET NULL if the user is deleted. */
subjectUserId: text('subject_user_id').references(() => users.id, { onDelete: 'set null' }),
/** Grant under which the request was authorised. SET NULL if the grant is deleted. */
grantId: uuid('grant_id').references(() => federationGrants.id, { onDelete: 'set null' }),
/** Request verb: "list" | "get" | "search". */
verb: text('verb').notNull(),
/** Resource type: "tasks" | "notes" | "memory" | etc. */
resource: text('resource').notNull(),
/** HTTP status code returned to the peer. */
statusCode: integer('status_code').notNull(),
/** Number of items returned (NULL for non-list requests or errors). */
resultCount: integer('result_count'),
/** Why the request was denied (NULL when allowed). */
deniedReason: text('denied_reason'),
/** End-to-end latency in milliseconds. */
latencyMs: integer('latency_ms'),
createdAt: timestamp('created_at', { withTimezone: true }).notNull().defaultNow(),
// Reserved for M4 — see PRD 7.3
/** SHA-256 of the normalised GraphQL/REST query string; written by M4 search. */
queryHash: text('query_hash'),
/** Request outcome: "allowed" | "denied" | "partial"; written by M4. */
outcome: text('outcome'),
/** Response payload size in bytes; written by M4. */
bytesOut: integer('bytes_out'),
},
(t) => [
// Per-peer request history in reverse chronological order.
index('federation_audit_log_peer_created_at_idx').on(t.peerId, t.createdAt.desc()),
// Per-user access log in reverse chronological order.
index('federation_audit_log_subject_created_at_idx').on(t.subjectUserId, t.createdAt.desc()),
// Global time-range scans (dashboards, rate-limit windows).
index('federation_audit_log_created_at_idx').on(t.createdAt.desc()),
],
);
/**
* Single-use enrollment tokens — M2-07.
*
* An admin creates a token (with a TTL) and hands it out-of-band to the
* remote peer operator. The peer redeems it exactly once by posting its
* CSR to POST /api/federation/enrollment/:token. The token is atomically
* marked as used to prevent replay attacks.
*/
export const federationEnrollmentTokens = pgTable('federation_enrollment_tokens', {
/** 32-byte hex token — crypto.randomBytes(32).toString('hex') */
token: text('token').primaryKey(),
/** The federation grant this enrollment activates. */
grantId: uuid('grant_id')
.notNull()
.references(() => federationGrants.id, { onDelete: 'cascade' }),
/** The peer record that will be updated on successful enrollment. */
peerId: uuid('peer_id')
.notNull()
.references(() => federationPeers.id, { onDelete: 'cascade' }),
/** Hard expiry — token rejected after this time even if not used. */
expiresAt: timestamp('expires_at', { withTimezone: true }).notNull(),
/** NULL until the token is redeemed. Set atomically to prevent replay. */
usedAt: timestamp('used_at', { withTimezone: true }),
createdAt: timestamp('created_at', { withTimezone: true }).notNull().defaultNow(),
});

View File

@@ -3,7 +3,7 @@
"version": "0.0.2",
"repository": {
"type": "git",
"url": "https://git.mosaicstack.dev/mosaicstack/mosaic-stack.git",
"url": "https://git.mosaicstack.dev/mosaicstack/stack.git",
"directory": "packages/design-tokens"
},
"type": "module",

View File

@@ -3,7 +3,7 @@
"version": "0.0.3",
"repository": {
"type": "git",
"url": "https://git.mosaicstack.dev/mosaicstack/mosaic-stack.git",
"url": "https://git.mosaicstack.dev/mosaicstack/stack.git",
"directory": "packages/forge"
},
"type": "module",

View File

@@ -3,7 +3,7 @@
"version": "0.0.3",
"repository": {
"type": "git",
"url": "https://git.mosaicstack.dev/mosaicstack/mosaic-stack.git",
"url": "https://git.mosaicstack.dev/mosaicstack/stack.git",
"directory": "packages/log"
},
"type": "module",

View File

@@ -3,7 +3,7 @@
"version": "0.0.3",
"repository": {
"type": "git",
"url": "https://git.mosaicstack.dev/mosaicstack/mosaic-stack.git",
"url": "https://git.mosaicstack.dev/mosaicstack/stack.git",
"directory": "packages/macp"
},
"type": "module",

View File

@@ -3,7 +3,7 @@
"version": "0.0.4",
"repository": {
"type": "git",
"url": "https://git.mosaicstack.dev/mosaicstack/mosaic-stack.git",
"url": "https://git.mosaicstack.dev/mosaicstack/stack.git",
"directory": "packages/memory"
},
"type": "module",

View File

@@ -17,6 +17,7 @@ import { runWizard } from '../../src/wizard.js';
describe('Full Wizard (headless)', () => {
let tmpDir: string;
const repoRoot = join(import.meta.dirname, '..', '..');
const originalEnv = { ...process.env };
beforeEach(() => {
tmpDir = mkdtempSync(join(tmpdir(), 'mosaic-wizard-test-'));
@@ -32,12 +33,16 @@ describe('Full Wizard (headless)', () => {
afterEach(() => {
rmSync(tmpDir, { recursive: true, force: true });
process.env = { ...originalEnv };
});
it('quick start produces valid SOUL.md', async () => {
// The headless path reads agent name from MOSAIC_AGENT_NAME env var
// (via agentIntentStage) rather than prompting interactively.
process.env['MOSAIC_AGENT_NAME'] = 'TestBot';
const prompter = new HeadlessPrompter({
'Installation mode': 'quick',
'What name should agents use?': 'TestBot',
'Communication style': 'direct',
'Your name': 'Tester',
'Your pronouns': 'They/Them',
@@ -62,9 +67,10 @@ describe('Full Wizard (headless)', () => {
});
it('quick start produces valid USER.md', async () => {
process.env['MOSAIC_AGENT_NAME'] = 'TestBot';
const prompter = new HeadlessPrompter({
'Installation mode': 'quick',
'What name should agents use?': 'TestBot',
'Communication style': 'direct',
'Your name': 'Tester',
'Your pronouns': 'He/Him',

View File

@@ -151,11 +151,68 @@ When delegating work to subagents, you MUST select the cheapest model capable of
**Runtime-specific syntax**: See the runtime reference for how to specify model tier when spawning subagents (e.g., Claude Code Task tool `model` parameter).
## Superpowers Enforcement (Hard Rule)
Mosaic provides capabilities beyond basic code editing: **skills**, **hooks**, **MCP tools**, and **plugins**. These are not optional extras — they are force multipliers that agents MUST actively use when applicable. Under-utilization of superpowers is a framework violation.
### Skills
Skills are domain-specific instruction sets in `~/.config/mosaic/skills/` that encode best practices, patterns, and guardrails. They are loaded into agents via the runtime's skill mechanism (e.g., Claude Code slash commands, Pi `--skill` flag).
**Rules:**
1. Before starting implementation, scan available skills (`ls ~/.config/mosaic/skills/`) and load any that match the task domain.
2. When a skill exists for the technology being used (e.g., `nestjs-best-practices` for NestJS work), you MUST load it.
3. When spawning workers, include skill loading in the kickstart prompt.
4. If you complete a task without loading a relevant available skill, that is a quality gap.
### Hooks
Hooks provide automated quality gates (lint, format, typecheck) that fire on file edits. They are configured in the runtime settings and run automatically.
**Rules:**
1. Do NOT bypass or suppress hook output. If a hook reports errors, fix them before proceeding.
2. Hook failures are immediate feedback — treat them like failing tests.
3. If a hook is consistently failing on valid code, report it as a framework issue rather than working around it.
### MCP Tools
MCP servers extend agent capabilities with external integrations (sequential-thinking, web search, memory, browser automation, etc.). Available MCP tools are listed at session start.
**Rules:**
1. **sequential-thinking** is REQUIRED for planning, architecture, and multi-step reasoning. Use it — do not skip structured thinking for complex decisions.
2. **OpenBrain** (`capture`, `search`, `recent`) is the cross-agent memory layer. Capture discoveries and search for prior context at session start.
3. When a task involves web research, browser testing, or external data, use the available MCP tools (web-search, chrome-devtools, web-reader) rather than asking the user to look things up.
4. Check available MCP tools at session start and use them proactively throughout the session.
### Plugins (Runtime-Specific)
Runtime plugins (e.g., Claude Code's `feature-dev`, `pr-review-toolkit`, `code-review`) provide specialized agent capabilities like code review, architecture analysis, and test coverage analysis.
**Rules:**
1. After completing a significant code change, use code review plugins proactively — do not wait for the user to ask.
2. Before creating a PR, use PR review plugins to catch issues early.
3. When designing architecture, use planning/architecture plugins for structured analysis.
### Self-Evolution
The Mosaic framework should improve over time based on usage patterns:
1. When you discover a recurring pattern that should be codified, capture it to OpenBrain with `type: "framework-improvement"`.
2. When a hook, skill, or tool is missing for a common task, capture the gap to OpenBrain with `type: "tooling-gap"`.
3. When a framework rule causes friction without adding value, capture the observation to OpenBrain with `type: "framework-friction"`.
These captures feed the framework's continuous improvement cycle.
## Skills Policy
- Use only the minimum required skills for the active task.
- Load skills that match the active task domain before starting implementation.
- Do not load unrelated skills.
- Follow skill trigger rules from the active runtime instruction layer.
- Actively check `~/.config/mosaic/skills/` for applicable skills rather than passively waiting for them to be mentioned.
## Session Closure Requirement

View File

@@ -4,14 +4,20 @@ Universal agent standards layer for Claude Code, Codex, OpenCode, and Pi.
One config, every runtime, same standards.
> **This is the framework component of [mosaic-stack](https://git.mosaicstack.dev/mosaic/mosaic-stack).** No personal data, credentials, user-specific preferences, or machine-specific paths should be committed. All personalization happens at install time via `mosaic init` or by editing files in `~/.config/mosaic/` after installation.
> **This is the framework component of [mosaic-stack](https://git.mosaicstack.dev/mosaicstack/stack).** No personal data, credentials, user-specific preferences, or machine-specific paths should be committed. All personalization happens at install time via `mosaic init` or by editing files in `~/.config/mosaic/` after installation.
## Quick Install
### Mac / Linux
```bash
bash <(curl -fsSL https://git.mosaicstack.dev/mosaic/mosaic-stack/raw/branch/main/tools/install.sh)
curl -fsSL https://mosaicstack.dev/install.sh | bash
```
Or use the direct URL:
```bash
bash <(curl -fsSL https://git.mosaicstack.dev/mosaicstack/stack/raw/branch/main/tools/install.sh)
```
### Windows (PowerShell)
@@ -23,8 +29,8 @@ bash <(curl -fsSL https://git.mosaicstack.dev/mosaic/mosaic-stack/raw/branch/mai
### From Source (any platform)
```bash
git clone git@git.mosaicstack.dev:mosaic/mosaic-stack.git ~/src/mosaic-stack
cd ~/src/mosaic-stack && bash tools/install.sh
git clone git@git.mosaicstack.dev:mosaicstack/stack.git ~/src/stack
cd ~/src/stack && bash tools/install.sh
```
The installer:
@@ -145,13 +151,19 @@ mosaic upgrade check # Check upgrade status (no changes)
Run the installer again — it handles upgrades automatically:
```bash
bash <(curl -fsSL https://git.mosaicstack.dev/mosaic/mosaic-stack/raw/branch/main/tools/install.sh)
curl -fsSL https://mosaicstack.dev/install.sh | bash
```
Or use the direct URL:
```bash
bash <(curl -fsSL https://git.mosaicstack.dev/mosaicstack/stack/raw/branch/main/tools/install.sh)
```
Or from a local checkout:
```bash
cd ~/src/mosaic-stack && git pull && bash tools/install.sh
cd ~/src/stack && git pull && bash tools/install.sh
```
The installer preserves local `SOUL.md`, `USER.md`, `TOOLS.md`, and `memory/` by default.

View File

@@ -19,8 +19,9 @@ SOURCE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
TARGET_DIR="${MOSAIC_HOME:-$HOME/.config/mosaic}"
INSTALL_MODE="${MOSAIC_INSTALL_MODE:-prompt}"
# Files preserved across upgrades (never overwritten)
PRESERVE_PATHS=("SOUL.md" "USER.md" "TOOLS.md" "memory" "sources")
# Files/dirs preserved across upgrades (never overwritten).
# User-created content in these paths survives rsync --delete.
PRESERVE_PATHS=("AGENTS.md" "SOUL.md" "USER.md" "TOOLS.md" "STANDARDS.md" "memory" "sources" "credentials")
# Current framework schema version — bump this when the layout changes.
# The migration system uses this to run upgrade steps.
@@ -217,8 +218,27 @@ fi
sync_framework
# Ensure memory directory exists
# Ensure persistent directories exist
mkdir -p "$TARGET_DIR/memory"
mkdir -p "$TARGET_DIR/credentials"
# Seed defaults — copy framework contract files from defaults/ to framework
# root if not already present. These ship with sensible defaults but must
# never be overwritten once the user has customized them.
#
# This list must match the framework-contract whitelist in
# packages/mosaic/src/config/file-adapter.ts (FileConfigAdapter.syncFramework).
# SOUL.md and USER.md are intentionally NOT seeded here — they are generated
# by `mosaic init` from templates with user-supplied values.
DEFAULTS_DIR="$TARGET_DIR/defaults"
if [[ -d "$DEFAULTS_DIR" ]]; then
for default_file in AGENTS.md STANDARDS.md TOOLS.md; do
if [[ -f "$DEFAULTS_DIR/$default_file" ]] && [[ ! -f "$TARGET_DIR/$default_file" ]]; then
cp "$DEFAULTS_DIR/$default_file" "$TARGET_DIR/$default_file"
ok "Seeded $default_file from defaults"
fi
done
fi
# Ensure tool scripts are executable
find "$TARGET_DIR/tools" -name "*.sh" -exec chmod +x {} + 2>/dev/null || true

View File

@@ -102,3 +102,30 @@ claude mcp add --scope user <name> -- npx -y <package>
`--scope local` = default, local-only (not committed).
Do NOT add `mcpServers` to `~/.claude/settings.json` — that key is ignored for MCP loading.
## Required Claude Code Settings (Enforced by Launcher)
The `mosaic claude` launcher validates that `~/.claude/settings.json` contains the required Mosaic configuration. Missing or outdated settings trigger a warning at launch.
**Required hooks:**
| Event | Matcher | Script | Purpose |
| ----------- | ------------------------ | ------------------------- | ---------------------------------------------- |
| PreToolUse | `Write\|Edit\|MultiEdit` | `prevent-memory-write.sh` | Block writes to `~/.claude/projects/*/memory/` |
| PostToolUse | `Edit\|MultiEdit\|Write` | `qa-hook-stdin.sh` | QA report generation after code edits |
| PostToolUse | `Edit\|MultiEdit\|Write` | `typecheck-hook.sh` | Inline TypeScript type checking |
**Required plugins:**
| Plugin | Purpose |
| ------------------- | -------------------------------------------------------------------------------------------------------- |
| `feature-dev` | Subagent architecture: code-reviewer, code-architect, code-explorer |
| `pr-review-toolkit` | PR review: code-simplifier, comment-analyzer, test-analyzer, silent-failure-hunter, type-design-analyzer |
| `code-review` | Standalone code review capabilities |
**Required settings:**
- `enableAllMcpTools: true` — Allow all configured MCP tools without per-tool approval
- `model: "opus"` — Default to opus for orchestrator-level sessions (workers use tiered models via Task tool)
If `mosaic claude` detects missing hooks or plugins, it will print a warning with the exact settings to add. The session will still launch — enforcement is advisory, not blocking — but agents operating without these settings are running degraded.

View File

@@ -23,6 +23,16 @@
"timeout": 60
}
]
},
{
"matcher": "Edit|MultiEdit|Write",
"hooks": [
{
"type": "command",
"command": "~/.config/mosaic/tools/qa/typecheck-hook.sh",
"timeout": 30
}
]
}
]
},

View File

@@ -5,32 +5,32 @@ Project-specific tooling belongs in the project's `AGENTS.md`, not here.
## Mosaic Git Wrappers (Use First)
Mosaic wrappers at `~/.config/mosaic/rails/git/*.sh` handle platform detection and edge cases. Always use these before raw CLI commands.
Mosaic wrappers at `~/.config/mosaic/tools/git/*.sh` handle platform detection and edge cases. Always use these before raw CLI commands.
```bash
# Issues
~/.config/mosaic/rails/git/issue-create.sh
~/.config/mosaic/rails/git/issue-close.sh
~/.config/mosaic/tools/git/issue-create.sh
~/.config/mosaic/tools/git/issue-close.sh
# PRs
~/.config/mosaic/rails/git/pr-create.sh
~/.config/mosaic/rails/git/pr-merge.sh
~/.config/mosaic/tools/git/pr-create.sh
~/.config/mosaic/tools/git/pr-merge.sh
# Milestones
~/.config/mosaic/rails/git/milestone-create.sh
~/.config/mosaic/tools/git/milestone-create.sh
# CI queue guard (required before push/merge)
~/.config/mosaic/rails/git/ci-queue-wait.sh --purpose push|merge
~/.config/mosaic/tools/git/ci-queue-wait.sh --purpose push|merge
```
## Code Review (Codex)
```bash
# Code quality review
~/.config/mosaic/rails/codex/codex-code-review.sh --uncommitted
~/.config/mosaic/tools/codex/codex-code-review.sh --uncommitted
# Security review
~/.config/mosaic/rails/codex/codex-security-review.sh --uncommitted
~/.config/mosaic/tools/codex/codex-security-review.sh --uncommitted
```
## Git Providers

View File

@@ -7,6 +7,11 @@ SKILLS_REPO_DIR="${MOSAIC_SKILLS_REPO_DIR:-$MOSAIC_HOME/sources/agent-skills}"
MOSAIC_SKILLS_DIR="$MOSAIC_HOME/skills"
MOSAIC_LOCAL_SKILLS_DIR="$MOSAIC_HOME/skills-local"
# Colon-separated list of skill names to install. When set, only these skills
# are linked into runtime skill directories. Empty/unset = link all skills
# (the legacy "mosaic sync" full-catalog behavior).
MOSAIC_INSTALL_SKILLS="${MOSAIC_INSTALL_SKILLS:-}"
fetch=1
link_only=0
@@ -25,6 +30,7 @@ Env:
MOSAIC_HOME Default: ~/.config/mosaic
MOSAIC_SKILLS_REPO_URL Default: https://git.mosaicstack.dev/mosaic/agent-skills.git
MOSAIC_SKILLS_REPO_DIR Default: ~/.config/mosaic/sources/agent-skills
MOSAIC_INSTALL_SKILLS Colon-separated list of skills to link (default: all)
USAGE
}
@@ -156,6 +162,27 @@ link_targets=(
canonical_real="$(readlink -f "$MOSAIC_SKILLS_DIR")"
# Build an associative array from the colon-separated whitelist for O(1) lookup.
# When MOSAIC_INSTALL_SKILLS is empty, all skills are allowed.
declare -A _skill_whitelist=()
_whitelist_active=0
if [[ -n "$MOSAIC_INSTALL_SKILLS" ]]; then
_whitelist_active=1
IFS=':' read -ra _wl_items <<< "$MOSAIC_INSTALL_SKILLS"
for _item in "${_wl_items[@]}"; do
[[ -n "$_item" ]] && _skill_whitelist["$_item"]=1
done
fi
is_skill_selected() {
local name="$1"
if [[ $_whitelist_active -eq 0 ]]; then
return 0
fi
[[ -n "${_skill_whitelist[$name]:-}" ]] && return 0
return 1
}
link_skill_into_target() {
local skill_path="$1"
local target_dir="$2"
@@ -168,6 +195,11 @@ link_skill_into_target() {
return
fi
# Respect the install whitelist (set during first-run wizard).
if ! is_skill_selected "$name"; then
return
fi
link_path="$target_dir/$name"
if [[ -L "$link_path" ]]; then

View File

@@ -13,6 +13,14 @@ export PORTAINER_URL="https://portainer.example.com:9443"
export PORTAINER_API_KEY="your-api-key-here"
```
If your Portainer instance uses a self-signed TLS certificate (e.g. internal LAN), set:
```bash
export PORTAINER_INSECURE=1
```
This passes `-k` to all curl calls, bypassing certificate verification. Do not set this against public/production instances.
You can add these to your shell profile (`~/.bashrc`, `~/.zshrc`) or use a `.env` file.
### Creating an API Key

View File

@@ -46,8 +46,14 @@ fi
# Remove trailing slash from URL
PORTAINER_URL="${PORTAINER_URL%/}"
# TLS options
CURL_OPTS=()
if [ "${PORTAINER_INSECURE:-0}" = "1" ]; then
CURL_OPTS+=(-k)
fi
# Fetch endpoints
response=$(curl -s -w "\n%{http_code}" \
response=$(curl -s "${CURL_OPTS[@]}" -w "\n%{http_code}" \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}/api/endpoints")

View File

@@ -52,8 +52,14 @@ fi
# Remove trailing slash from URL
PORTAINER_URL="${PORTAINER_URL%/}"
# TLS options
CURL_OPTS=()
if [ "${PORTAINER_INSECURE:-0}" = "1" ]; then
CURL_OPTS+=(-k)
fi
# Fetch stacks
response=$(curl -s -w "\n%{http_code}" \
response=$(curl -s "${CURL_OPTS[@]}" -w "\n%{http_code}" \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}/api/stacks")

View File

@@ -64,12 +64,18 @@ fi
# Remove trailing slash from URL
PORTAINER_URL="${PORTAINER_URL%/}"
# TLS options
CURL_OPTS=()
if [ "${PORTAINER_INSECURE:-0}" = "1" ]; then
CURL_OPTS+=(-k)
fi
# Function to make API requests
api_request() {
local method="$1"
local endpoint="$2"
curl -s -w "\n%{http_code}" -X "$method" \
curl -s "${CURL_OPTS[@]}" -w "\n%{http_code}" -X "$method" \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}${endpoint}"
}
@@ -165,7 +171,7 @@ fi
# Note: Docker API returns raw log stream, not JSON
if [[ "$FOLLOW" == "true" ]]; then
# Stream logs
curl -s -N \
curl -s "${CURL_OPTS[@]}" -N \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}/api/endpoints/${ENDPOINT_ID}/docker/containers/${CONTAINER_ID}/logs?${params}" | \
# Docker log format has 8-byte header per line, strip it
@@ -175,7 +181,7 @@ if [[ "$FOLLOW" == "true" ]]; then
done
else
# Get logs (non-streaming)
curl -s \
curl -s "${CURL_OPTS[@]}" \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}/api/endpoints/${ENDPOINT_ID}/docker/containers/${CONTAINER_ID}/logs?${params}" | \
# Docker log format has 8-byte header per line, attempt to strip it

View File

@@ -63,13 +63,19 @@ fi
# Remove trailing slash from URL
PORTAINER_URL="${PORTAINER_URL%/}"
# TLS options
CURL_OPTS=()
if [ "${PORTAINER_INSECURE:-0}" = "1" ]; then
CURL_OPTS+=(-k)
fi
# Function to make API requests
api_request() {
local method="$1"
local endpoint="$2"
local data="${3:-}"
local args=(-s -w "\n%{http_code}" -X "$method" -H "X-API-Key: ${PORTAINER_API_KEY}")
local args=(-s "${CURL_OPTS[@]}" -w "\n%{http_code}" -X "$method" -H "X-API-Key: ${PORTAINER_API_KEY}")
if [[ -n "$data" ]]; then
args+=(-H "Content-Type: application/json" -d "$data")

View File

@@ -54,12 +54,18 @@ fi
# Remove trailing slash from URL
PORTAINER_URL="${PORTAINER_URL%/}"
# TLS options
CURL_OPTS=()
if [ "${PORTAINER_INSECURE:-0}" = "1" ]; then
CURL_OPTS+=(-k)
fi
# Function to make API requests
api_request() {
local method="$1"
local endpoint="$2"
curl -s -w "\n%{http_code}" -X "$method" \
curl -s "${CURL_OPTS[@]}" -w "\n%{http_code}" -X "$method" \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}${endpoint}"
}

View File

@@ -57,12 +57,18 @@ fi
# Remove trailing slash from URL
PORTAINER_URL="${PORTAINER_URL%/}"
# TLS options
CURL_OPTS=()
if [ "${PORTAINER_INSECURE:-0}" = "1" ]; then
CURL_OPTS+=(-k)
fi
# Function to make API requests
api_request() {
local method="$1"
local endpoint="$2"
curl -s -w "\n%{http_code}" -X "$method" \
curl -s "${CURL_OPTS[@]}" -w "\n%{http_code}" -X "$method" \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}${endpoint}"
}

View File

@@ -54,12 +54,18 @@ fi
# Remove trailing slash from URL
PORTAINER_URL="${PORTAINER_URL%/}"
# TLS options
CURL_OPTS=()
if [ "${PORTAINER_INSECURE:-0}" = "1" ]; then
CURL_OPTS+=(-k)
fi
# Function to make API requests
api_request() {
local method="$1"
local endpoint="$2"
curl -s -w "\n%{http_code}" -X "$method" \
curl -s "${CURL_OPTS[@]}" -w "\n%{http_code}" -X "$method" \
-H "X-API-Key: ${PORTAINER_API_KEY}" \
"${PORTAINER_URL}${endpoint}"
}

View File

@@ -0,0 +1,63 @@
#!/bin/bash
# Lightweight PostToolUse typecheck hook for TypeScript files.
# Runs tsc --noEmit on the nearest tsconfig after TS/TSX edits.
# Returns non-zero with diagnostic output so the agent sees type errors immediately.
# Location: ~/.config/mosaic/tools/qa/typecheck-hook.sh
set -eo pipefail
# Read JSON from stdin (Claude Code PostToolUse payload)
JSON_INPUT=$(cat)
# Extract file path
if command -v jq &>/dev/null; then
FILE_PATH=$(echo "$JSON_INPUT" | jq -r '.tool_input.file_path // .tool_response.filePath // .file_path // empty' 2>/dev/null || echo "")
else
FILE_PATH=$(echo "$JSON_INPUT" | grep -o '"file_path"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/' | head -1)
fi
# Only check TypeScript files
if ! [[ "$FILE_PATH" =~ \.(ts|tsx)$ ]]; then
exit 0
fi
# Must be a real file
if [ ! -f "$FILE_PATH" ]; then
exit 0
fi
# Find nearest tsconfig.json by walking up from the file
DIR=$(dirname "$FILE_PATH")
TSCONFIG=""
while [ "$DIR" != "/" ] && [ "$DIR" != "." ]; do
if [ -f "$DIR/tsconfig.json" ]; then
TSCONFIG="$DIR/tsconfig.json"
break
fi
DIR=$(dirname "$DIR")
done
if [ -z "$TSCONFIG" ]; then
# No tsconfig found — skip silently
exit 0
fi
# Run tsc --noEmit from the tsconfig directory
# Use --pretty for readable output, limit to 10 errors to keep output short
TSCONFIG_DIR=$(dirname "$TSCONFIG")
cd "$TSCONFIG_DIR"
# Run typecheck — capture output and exit code
OUTPUT=$(npx tsc --noEmit --pretty --maxNodeModuleJsDepth 0 2>&1) || STATUS=$?
if [ "${STATUS:-0}" -ne 0 ]; then
# Filter output to only show errors related to the edited file (if possible)
BASENAME=$(basename "$FILE_PATH")
RELEVANT=$(echo "$OUTPUT" | grep -A2 "$BASENAME" 2>/dev/null || echo "$OUTPUT" | head -20)
echo "TypeScript type errors detected after editing $FILE_PATH:"
echo "$RELEVANT"
exit 1
fi
exit 0

View File

@@ -1,9 +1,9 @@
{
"name": "@mosaicstack/mosaic",
"version": "0.0.26",
"version": "0.0.30",
"repository": {
"type": "git",
"url": "https://git.mosaicstack.dev/mosaicstack/mosaic-stack.git",
"url": "https://git.mosaicstack.dev/mosaicstack/stack.git",
"directory": "packages/mosaic"
},
"description": "Mosaic agent framework — installation wizard and meta package",

Some files were not shown because too many files have changed in this diff Show More