docs(federation): operator setup + migration guides (FED-M1-11) (#480)
This commit was merged in pull request #480.

````diff
@@ -80,6 +80,8 @@ If you already have a gateway account but no token, use `mosaic gateway config r
 ### Configuration
 
+Mosaic supports three storage tiers: `local` (PGlite, single-host), `standalone` (PostgreSQL, single-host), and `federated` (PostgreSQL + pgvector + Valkey, multi-host). See [Federated Tier Setup](docs/federation/SETUP.md) for multi-user and production deployments, or [Migrating to Federated](docs/guides/migrate-tier.md) to upgrade from existing tiers.
+
 ```bash
 mosaic config show # Print full config as JSON
 mosaic config get <key> # Read a specific key
````

docs/federation/SETUP.md

@@ -0,0 +1,119 @@

# Federated Tier Setup Guide

## What is the federated tier?

The federated tier is designed for multi-user and multi-host deployments. It consists of PostgreSQL 17 with the pgvector extension (for embeddings and RAG), Valkey for distributed task queueing and caching, and a shared configuration across multiple Mosaic gateway instances. Use this tier when running Mosaic in production or when scaling beyond a single host.

## Prerequisites

- Docker and Docker Compose installed
- Ports 5433 (PostgreSQL) and 6380 (Valkey) free on your host, or other host ports chosen with `PG_FEDERATED_HOST_PORT` and `VALKEY_FEDERATED_HOST_PORT` (see Troubleshooting below)
- At least 2 GB of free disk space for data volumes

## Start the federated stack

Run the federated overlay:

```bash
docker compose -f docker-compose.federated.yml --profile federated up -d
```

This starts PostgreSQL 17 with pgvector and Valkey 8. The pgvector extension is created automatically on first boot.

Verify the services are running:

```bash
docker compose -f docker-compose.federated.yml ps
```

Expected output shows `postgres-federated` and `valkey-federated` both healthy.
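
For scripted setups (CI, provisioning), you can poll until both containers report healthy instead of rerunning `ps` by hand. This is a minimal POSIX shell sketch, not part of Mosaic: `wait_until`, `service_healthy` and `FEDERATED_COMPOSE_FILE` are hypothetical names, and it assumes the compose file defines healthchecks for `postgres-federated` and `valkey-federated` (which the healthy status above implies).

```bash
# Hypothetical helpers, not shipped with Mosaic.
FEDERATED_COMPOSE_FILE="docker-compose.federated.yml"

# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY_SECONDS between failed
# attempts. Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
wait_until() {
  wu_attempts="$1"
  wu_delay="$2"
  shift 2
  wu_i=1
  while [ "$wu_i" -le "$wu_attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$wu_i" -lt "$wu_attempts" ]; then
      sleep "$wu_delay"
    fi
    wu_i=$((wu_i + 1))
  done
  return 1
}

# service_healthy SERVICE
# Succeeds only if SERVICE's container reports Docker health status "healthy".
# Without a healthcheck, the inspect call fails or prints something other
# than "healthy", so this returns 1.
service_healthy() {
  sh_id="$(docker compose -f "$FEDERATED_COMPOSE_FILE" ps -q "$1")" || return 1
  [ -n "$sh_id" ] || return 1
  sh_status="$(docker inspect -f '{{.State.Health.Status}}' "$sh_id" 2>/dev/null)" || return 1
  [ "$sh_status" = "healthy" ]
}
```

After sourcing the helpers, `wait_until 30 2 service_healthy postgres-federated && wait_until 30 2 service_healthy valkey-federated` gives each service roughly a minute to become healthy.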

## Configure Mosaic for the federated tier

Create or update your `mosaic.config.json`:

```json
{
  "tier": "federated",
  "database": "postgresql://mosaic:mosaic@localhost:5433/mosaic",
  "queue": "redis://localhost:6380"
}
```

If you configure Mosaic through environment variables instead:

```bash
export DATABASE_URL="postgresql://mosaic:mosaic@localhost:5433/mosaic"
export REDIS_URL="redis://localhost:6380"
```

## Verify health

Run the health check:

```bash
mosaic gateway doctor
```

Expected output, with every check green:

```
Tier: federated Config: mosaic.config.json
✓ postgres localhost:5433 (42ms)
✓ valkey localhost:6380 (8ms)
✓ pgvector (embedded) (15ms)
```

For JSON output (useful in CI and automation):

```bash
mosaic gateway doctor --json
```
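
To gate a CI job on the health check, a small wrapper can save the JSON report as an artifact and pass the command's exit status through. This is a sketch with a hypothetical helper name (`doctor_gate`), and it rests on an assumption this guide does not state: that `mosaic gateway doctor --json` exits non-zero when a check fails. Confirm that before relying on it. The JSON schema is not parsed at all.

```bash
# Hypothetical CI gate around `mosaic gateway doctor --json` (not part of Mosaic).
# ASSUMPTION: the command exits non-zero when any check fails; confirm this
# for your Mosaic version. The JSON is saved untouched; its schema is not parsed.
doctor_gate() {
  report_file="${1:-doctor-report.json}"
  if mosaic gateway doctor --json > "$report_file"; then
    echo "doctor: all checks passed (report: $report_file)"
  else
    # In the else branch, $? is still the exit status of the doctor command.
    doctor_status=$?
    echo "doctor: checks failed with exit status $doctor_status (report: $report_file)" >&2
    return "$doctor_status"
  fi
}
```

In a CI step, `doctor_gate ci-artifacts/doctor.json` fails the step with the doctor's own exit status and leaves the report behind for upload.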

## Troubleshooting

### Port conflicts

**Symptom:** `bind: address already in use`

**Fix:** Stop the base dev stack first, then start the federated stack:

```bash
docker compose down
docker compose -f docker-compose.federated.yml --profile federated up -d
```

Or change the host ports with environment variables:

```bash
PG_FEDERATED_HOST_PORT=5434 VALKEY_FEDERATED_HOST_PORT=6381 \
  docker compose -f docker-compose.federated.yml --profile federated up -d
```

If you change the host ports, update the ports in `mosaic.config.json` (or in `DATABASE_URL` and `REDIS_URL`) to match.
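
To keep the connection URLs in sync with the ports you pass to Compose, you can derive them from the same override variables. A minimal sketch, assuming the defaults used in this guide (5433 and 6380) and the example `mosaic:mosaic` credentials and `mosaic` database; `FED_PG_PORT` and `FED_VALKEY_PORT` are illustrative names.

```bash
# Derive connection URLs from the host-port overrides used with
# docker-compose.federated.yml, falling back to this guide's defaults.
# Credentials and database name are this guide's example values.
FED_PG_PORT="${PG_FEDERATED_HOST_PORT:-5433}"
FED_VALKEY_PORT="${VALKEY_FEDERATED_HOST_PORT:-6380}"

export DATABASE_URL="postgresql://mosaic:mosaic@localhost:${FED_PG_PORT}/mosaic"
export REDIS_URL="redis://localhost:${FED_VALKEY_PORT}"

echo "DATABASE_URL=${DATABASE_URL}"
echo "REDIS_URL=${REDIS_URL}"
```

Export any overrides first (for example `export PG_FEDERATED_HOST_PORT=5434 VALKEY_FEDERATED_HOST_PORT=6381`), then source the snippet so the URLs land in the current shell and match the ports Compose published.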

### pgvector extension error

**Symptom:** `ERROR: could not open extension control file`

**Fix:** pgvector is created on first boot, so check the PostgreSQL logs first:

```bash
docker compose -f docker-compose.federated.yml logs postgres-federated | grep -i vector
```

If the extension was never created, create it manually inside the container:

```bash
docker compose -f docker-compose.federated.yml exec postgres-federated \
  psql -U mosaic -d mosaic -c "CREATE EXTENSION IF NOT EXISTS vector;"
```

If `CREATE EXTENSION` itself fails with the same error, the pgvector files are not installed in the running PostgreSQL image; make sure `postgres-federated` was started from `docker-compose.federated.yml`.
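
To confirm the fix, or to check the extension from a script, you can query `pg_extension` directly. A sketch with a hypothetical helper name (`pgvector_version`); the service name, user and database are the ones used in this guide, and `exec -T` disables TTY allocation so the call also works in CI.

```bash
# Hypothetical helper (not part of Mosaic): print the installed pgvector
# version, or fail if the extension is absent from the `mosaic` database.
pgvector_version() {
  pv_query="SELECT extversion FROM pg_extension WHERE extname = 'vector'"
  # -tA: tuples only, unaligned, so the output is just the version string.
  pv_version="$(docker compose -f docker-compose.federated.yml exec -T postgres-federated \
    psql -U mosaic -d mosaic -tAc "$pv_query")" || return 1
  if [ -z "$pv_version" ]; then
    echo "pgvector: extension not installed in database 'mosaic'" >&2
    return 1
  fi
  echo "pgvector: version $pv_version"
}
```

An empty result means the extension exists in the image but was never created in the `mosaic` database, which the `CREATE EXTENSION IF NOT EXISTS` command above addresses.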

### Valkey connection refused

**Symptom:** `Error: connect ECONNREFUSED 127.0.0.1:6380`

**Fix:** Check the service logs:

```bash
docker compose -f docker-compose.federated.yml logs valkey-federated
```

If Valkey is running, verify that your firewall allows connections on port 6380 (or on your `VALKEY_FEDERATED_HOST_PORT` override). If the Mosaic gateway itself runs inside a container, `localhost` refers to that container rather than your host; with Docker Desktop (for example on macOS), point the queue URL at `host.docker.internal` instead of `localhost`.

```diff
@@ -27,7 +27,7 @@ Goal: Gateway runs in `federated` tier with containerized PG+pgvector+Valkey. No
 | FED-M1-08 | done | Integration test for migration script: seed a local PGlite with representative data (tasks, notes, users, teams), run migration, assert row counts + key samples equal on federated PG. | #460 | sonnet | feat/federation-m1-migrate-test | FED-M1-05 | 6K | Shipped in PR #477. Caught P0 in M1-05 (camelCase→snake_case) missed by mocked unit tests; fix in same PR. |
 | FED-M1-09 | done | Standalone regression: full agent-session E2E on existing `standalone` tier with a gateway built from this branch. Must pass without referencing any federation module. | #460 | sonnet | feat/federation-m1-regression | FED-M1-07 | 4K | Clean canary. 351 gateway tests + 85 storage unit tests + full pnpm test all green; only FEDERATED_INTEGRATION-gated tests skip. |
 | FED-M1-10 | done | Code review pass: security-focused on the migration script (data-at-rest during migration) + tier detector (error-message sensitivity leakage). Independent reviewer, not authors of tasks 01-09. | #460 | sonnet | feat/federation-m1-security-review | FED-M1-09 | 8K | 2 review rounds caught 7 issues: credential leak in pg/valkey/pgvector errors + redact-error util; missing advisory lock; SKIP_TABLES rationale. |
-| FED-M1-11 | not-started | Docs update: `docs/federation/` operator notes for tier setup; README blurb on federated tier; `docs/guides/` entry for migration. Do NOT touch runbook yet (deferred to FED-M7). | #460 | haiku | feat/federation-m1-docs | FED-M1-10 | 4K | Short, actionable. Link from MISSION-MANIFEST. No decisions captured here — those belong in PRD. |
+| FED-M1-11 | done | Docs update: `docs/federation/` operator notes for tier setup; README blurb on federated tier; `docs/guides/` entry for migration. Do NOT touch runbook yet (deferred to FED-M7). | #460 | haiku | feat/federation-m1-docs | FED-M1-10 | 4K | Shipped: `docs/federation/SETUP.md` (119 lines), `docs/guides/migrate-tier.md` (147 lines), README Configuration blurb. |
 | FED-M1-12 | not-started | PR, CI green, merge to main, close #460. | #460 | — | (aggregate) | FED-M1-11 | 3K | Queue-guard before push; wait for green; merge squashed; tea `issue-close` #460. |
 
 **M1 total estimate:** ~74K tokens (over-budget vs 20K PRD estimate — explanation below)
```

docs/guides/migrate-tier.md

@@ -0,0 +1,147 @@

# Migrating to the Federated Tier

Step-by-step guide to migrating from `local` (PGlite) or `standalone` (PostgreSQL without pgvector) to `federated` (PostgreSQL 17 + pgvector + Valkey).

## When to migrate

Migrate to the federated tier when you are:

- Scaling from single-user to multi-user deployments
- Adding vector embeddings or RAG features
- Running Mosaic across multiple hosts
- Needing distributed task queueing and caching
- Moving to production with high availability

## Prerequisites

- Federated stack running and healthy (see [Federated Tier Setup](../federation/SETUP.md))
- Source database accessible, and an empty target database at the federated URL
- Backup of source database (recommended before any migration)
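
For a `standalone` (PostgreSQL) source, a custom-format `pg_dump` gives you a backup you can later restore with `pg_restore --dbname=<url> <file>`. A minimal sketch with a hypothetical helper name (`backup_source`); it does not cover a `local` (PGlite) source.

```bash
# Hypothetical helper (not part of Mosaic): custom-format pg_dump of a
# standalone PostgreSQL source. Does not handle a local (PGlite) source.
# Usage: backup_source SOURCE_URL [OUTPUT_FILE]   (prints the backup file name)
backup_source() {
  bs_url="$1"
  bs_file="${2:-mosaic-pre-federated-$(date +%Y%m%d-%H%M%S).dump}"
  if [ -z "$bs_url" ]; then
    echo "usage: backup_source <source-url> [output-file]" >&2
    return 2
  fi
  # --format=custom produces an archive that pg_restore can read later.
  pg_dump --format=custom --file="$bs_file" --dbname="$bs_url" || return 1
  echo "$bs_file"
}
```

For example, `backup_source "$SOURCE_URL"` writes a timestamped `.dump` file and prints its name; `SOURCE_URL` stands for your standalone connection string.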

## Dry-run first

Always start with a dry-run to validate the migration:

```bash
mosaic storage migrate-tier --to federated \
  --target-url postgresql://mosaic:mosaic@localhost:5433/mosaic \
  --dry-run
```

Expected output (partial example):

```
[migrate-tier] Analyzing source tier: pglite
[migrate-tier] Analyzing target tier: federated
[migrate-tier] Precondition: target is empty ✓
users: 5 rows
teams: 2 rows
conversations: 12 rows
messages: 187 rows
... (all tables listed)
[migrate-tier] NOTE: Source tier has no pgvector support. insights.embedding will be NULL on all migrated rows.
[migrate-tier] DRY-RUN COMPLETE (no data written). 206 total rows would be migrated.
```

Review the output. If it reports an error (for example, a non-empty target), fix it before proceeding.

## Run the migration

When ready, run the same command without `--dry-run`:

```bash
mosaic storage migrate-tier --to federated \
  --target-url postgresql://mosaic:mosaic@localhost:5433/mosaic \
  --yes
```

The `--yes` flag skips the confirmation prompt. Non-TTY environments such as CI cannot answer that prompt, so they must pass `--yes`.

The command will:

1. Acquire an advisory lock (blocks concurrent invocations)
2. Copy data from source to target in dependency order
3. Report rows migrated per table
4. Display any warnings (e.g., null vector embeddings)
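
In CI, the dry-run and the real run can be chained so the migration only starts when the dry-run succeeds. This sketch relies on one assumption the guide does not confirm: that `mosaic storage migrate-tier --dry-run` exits non-zero when it finds a blocking problem such as a non-empty target. `migrate_to_federated` is a hypothetical helper name.

```bash
# Hypothetical wrapper (not part of Mosaic): dry-run first, migrate only if
# the dry-run succeeds.
# ASSUMPTION: the dry-run exits non-zero on blocking problems (e.g. a
# non-empty target). Verify this before relying on the wrapper.
migrate_to_federated() {
  target_url="$1"
  if [ -z "$target_url" ]; then
    echo "usage: migrate_to_federated <target-url>" >&2
    return 2
  fi

  # Step 1: validate without writing anything.
  if ! mosaic storage migrate-tier --to federated --target-url "$target_url" --dry-run; then
    echo "dry-run failed; not running the migration" >&2
    return 1
  fi

  # Step 2: real run; --yes skips the prompt, which non-TTY CI shells need.
  mosaic storage migrate-tier --to federated --target-url "$target_url" --yes
}
```

Call it as `migrate_to_federated postgresql://mosaic:mosaic@localhost:5433/mosaic`; its exit status is that of whichever step ran last.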

## What gets migrated

All persistent, user-bound data is migrated in dependency order:

- **users, teams, team_members** — user and team ownership
- **accounts** — OAuth provider tokens (durable credentials)
- **projects, agents, missions, tasks** — all project and agent definitions
- **conversations, messages** — all chat history
- **preferences, insights, agent_logs** — preferences and observability
- **provider_credentials** — stored API keys and secrets
- **tickets, events, skills, routing_rules, appreciations** — auxiliary records

Full order is defined in code (`MIGRATION_ORDER` in `packages/storage/src/migrate-tier.ts`).

## What gets skipped and why

Three tables are intentionally not migrated:

| Table | Reason |
| ----------------- | ----------------------------------------------------------------------------------------------- |
| **sessions** | TTL'd auth sessions from the old environment; they will fail JWT verification on the new target |
| **verifications** | One-time tokens (email verify, password reset) that have either expired or been consumed |
| **admin_tokens** | Hashed tokens bound to the old environment's secret keys; must be re-issued |

**Note on accounts and provider_credentials:** These durable credentials ARE migrated because they are user-bound and required for resuming agent work on the target environment. After migration to a multi-tenant federated deployment, operators may want to audit or wipe these if users are untrusted or credentials should not be shared.
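
## Idempotency and concurrency

The migration is **idempotent**:

- Re-running is safe (uses `ON CONFLICT DO UPDATE` internally)
- Safe to retry after transient failures
- Concurrent invocations are blocked by a Postgres advisory lock; the second caller will wait

If a previous run appears stuck, connect to the federated PostgreSQL instance and list advisory locks together with the sessions involved:

```sql
SELECT l.pid, l.granted, l.classid, l.objid, a.state, a.query_start, a.query
FROM pg_locks AS l
JOIN pg_stat_activity AS a ON a.pid = l.pid
WHERE l.locktype = 'advisory';
```

Rows with `granted = true` belong to the session holding the lock; rows with `granted = false` are sessions waiting for it.

Advisory locks are owned by the session that took them, and `pg_advisory_unlock()` only releases locks held by the calling session, so running it from a new `psql` session cannot free a lock held by a stuck run. To force-release the lock (dangerous: this kills whatever that session is doing), terminate the holding session using the `pid` from the query above:

```sql
SELECT pg_terminate_backend(<pid>);
```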

## Verify the migration

After the migration completes, spot-check the target:

```bash
# Count rows on a few critical tables
psql postgresql://mosaic:mosaic@localhost:5433/mosaic -c \
  "SELECT 'users' AS table, COUNT(*) FROM users UNION ALL
   SELECT 'conversations' AS table, COUNT(*) FROM conversations UNION ALL
   SELECT 'messages' AS table, COUNT(*) FROM messages;"
```

Verify that a known user exists, looked up by email:

```bash
psql postgresql://mosaic:mosaic@localhost:5433/mosaic -c \
  "SELECT id, email FROM users WHERE email='<your-email>';"
```

Check `insights.embedding`: values should be NULL if the source was PGlite, or populated if the source was PostgreSQL with pgvector:

```bash
psql postgresql://mosaic:mosaic@localhost:5433/mosaic -c \
  "SELECT embedding IS NOT NULL AS has_vector FROM insights LIMIT 5;"
```
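
When the source is a `standalone` PostgreSQL database you can reach with `psql`, you can compare per-table counts between source and target instead of only inspecting the target. A minimal sketch with a hypothetical helper name (`compare_counts`). It does not work for a PGlite source; pass only migrated tables (not `sessions`, `verifications` or `admin_tokens`), and stop writes to the source first so the counts are stable.

```bash
# Hypothetical helper (not part of Mosaic): compare COUNT(*) per table
# between a standalone PostgreSQL source and the federated target.
# Usage: compare_counts SOURCE_URL TARGET_URL TABLE [TABLE...]
# Table names are interpolated into SQL as-is: pass trusted names only.
compare_counts() {
  cc_source="$1"
  cc_target="$2"
  shift 2
  cc_mismatches=0
  for cc_table in "$@"; do
    # -tA prints just the number; a psql failure aborts with status 2.
    cc_src="$(psql -d "$cc_source" -tAc "SELECT COUNT(*) FROM $cc_table")" || return 2
    cc_dst="$(psql -d "$cc_target" -tAc "SELECT COUNT(*) FROM $cc_table")" || return 2
    if [ "$cc_src" = "$cc_dst" ]; then
      echo "ok        $cc_table: $cc_src rows"
    else
      echo "MISMATCH  $cc_table: source=$cc_src target=$cc_dst"
      cc_mismatches=$((cc_mismatches + 1))
    fi
  done
  # Non-zero exit status if any table differed.
  [ "$cc_mismatches" -eq 0 ]
}
```

For example: `compare_counts "$SOURCE_URL" postgresql://mosaic:mosaic@localhost:5433/mosaic users teams conversations messages`, where `SOURCE_URL` stands for your standalone connection string.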

## Rollback

There is no in-place rollback. If the migration fails:

1. Return the target database to its pre-migration state, for example by restoring a pre-migration backup
2. Investigate the failure logs
3. Rerun the migration

Always test migrations in a staging environment first.