stack/docs/DOCKER-SWARM.md
Jason Woltje 6fd8e85266
fix(orchestrator): make provider-aware Claude key startup requirements
2026-02-17 17:15:42 -06:00

# Mosaic Stack - Docker Swarm Deployment
**⚠️ This guide has been superseded. Please see [SWARM-DEPLOYMENT.md](SWARM-DEPLOYMENT.md) for the complete, up-to-date deployment guide.**
This guide covers deploying Mosaic Stack to a Docker Swarm cluster with Traefik reverse proxy integration.
## Prerequisites
1. **Docker Swarm initialized:**
```bash
docker swarm init
```
2. **Traefik running on the swarm** with a network named `traefik-public`
3. **DNS or /etc/hosts configured** with your domain names:
- `mosaic.mosaicstack.dev` → Web UI
- `api.mosaicstack.dev` → API
- `auth.mosaicstack.dev` → Authentik SSO
## Quick Start
### 1. Configure Environment
Copy the swarm environment template:
```bash
cp .env.swarm.example .env
```
Edit `.env` and set the following **critical** values:
```bash
# Database passwords
POSTGRES_PASSWORD=your-secure-password-here
AUTHENTIK_POSTGRES_PASSWORD=your-secure-password-here
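# Secrets: generate each value with openssl rand -hex 32 or openssl rand -base64 50.
# Note: $(...) is only expanded if this file is sourced by a shell; otherwise paste the generated values.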
AUTHENTIK_SECRET_KEY=$(openssl rand -base64 50)
JWT_SECRET=$(openssl rand -base64 32)
ENCRYPTION_KEY=$(openssl rand -hex 32)
ORCHESTRATOR_API_KEY=$(openssl rand -base64 32)
COORDINATOR_API_KEY=$(openssl rand -base64 32)
# AI Provider for Orchestrator
AI_PROVIDER=ollama
# Claude API Key (only required when AI_PROVIDER=claude)
CLAUDE_API_KEY=your-claude-api-key
# Authentik Bootstrap
AUTHENTIK_BOOTSTRAP_PASSWORD=your-admin-password
AUTHENTIK_BOOTSTRAP_EMAIL=admin@yourdomain.com
```
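Placeholder values left in `.env` are easy to miss before a deploy. The following is a minimal sketch of a pre-deploy check, assuming the variable names from the template above; the function name `check_env_file` is hypothetical and not part of the repository. It flags variables that are missing, still set to a `your-...` placeholder, or still contain a literal `$(` that was never expanded.

```bash
# Hypothetical helper, not part of the repository: report required variables
# in an env file that are missing, empty, or still hold template values.
check_env_file() {
  local file="$1" status=0 name value
  for name in POSTGRES_PASSWORD AUTHENTIK_POSTGRES_PASSWORD AUTHENTIK_SECRET_KEY \
              JWT_SECRET ENCRYPTION_KEY ORCHESTRATOR_API_KEY COORDINATOR_API_KEY; do
    # Use the last uncommented NAME=value line; keep any "=" inside the value.
    value="$(grep -E "^${name}=" "$file" | tail -n 1 | cut -d= -f2- || true)"
    case "$value" in
      "")     echo "missing: $name";     status=1 ;;
      your-*) echo "placeholder: $name"; status=1 ;;
      '$('*)  echo "unexpanded command substitution: $name"; status=1 ;;
    esac
  done
  return "$status"
}
```

Running `check_env_file .env` prints one line per problem and returns a non-zero status if any were found, so it can gate a deploy script.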
### 2. Create Traefik Network (if it does not exist)
```bash
docker network create --driver=overlay traefik-public
```
### 3. Deploy the Stack
```bash
./scripts/deploy-swarm.sh mosaic
```
Or manually:
```bash
docker stack deploy -c docker-compose.swarm.yml mosaic
```
### 4. Verify Deployment
Check stack status:
```bash
docker stack services mosaic
docker stack ps mosaic
```
Check service logs:
```bash
docker service logs mosaic_api
docker service logs mosaic_web
docker service logs mosaic_postgres
```
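Reading the replica column by eye gets tedious once the stack has many services. As a sketch, the filter below reads `NAME REPLICAS` pairs and prints only the services that have fewer running tasks than desired. The function name `report_unready` is hypothetical; the input format assumes the `--format` option of `docker stack services` with the `.Name` and `.Replicas` placeholders.

```bash
# Hypothetical helper: read "NAME REPLICAS" lines (e.g. "mosaic_api 1/2") on
# stdin and print every service whose running count is below the desired count.
# Exits non-zero if at least one such service was found.
report_unready() {
  awk '{
    split($2, r, "/")            # r[1] = running, r[2] = desired
    if (r[1] + 0 < r[2] + 0) { print $1 " " $2; bad = 1 }
  }
  END { exit bad }'
}

# Usage (on a manager node):
#   docker stack services mosaic --format '{{.Name}} {{.Replicas}}' | report_unready
```

An empty result with a zero exit status means every service reports its full replica count.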
## Stack Services
The following services will be deployed:
| Service | Internal Port | Traefik Domain | Description |
| ------------------ | ------------- | ------------------------ | ------------------------ |
| `web` | 3000 | `mosaic.mosaicstack.dev` | Next.js Web UI |
| `api` | 3001 | `api.mosaicstack.dev` | NestJS API |
| `authentik-server` | 9000 | `auth.mosaicstack.dev` | Authentik SSO |
| `postgres` | 5432 | - | PostgreSQL 17 + pgvector |
| `valkey` | 6379 | - | Redis-compatible cache |
| `openbao` | 8200 | - | Secrets vault |
| `ollama` | 11434 | - | LLM service (optional) |
| `orchestrator` | 3001 | - | Agent orchestrator |
## Traefik Integration
Services are automatically registered with Traefik using labels defined in `deploy.labels`:
```yaml
deploy:
  labels:
    - "traefik.enable=true"
    - "traefik.http.routers.mosaic-web.rule=Host(`mosaic.mosaicstack.dev`)"
    - "traefik.http.routers.mosaic-web.entrypoints=web"
    - "traefik.http.services.mosaic-web.loadbalancer.server.port=3000"
```
**Important:** Traefik labels MUST be under `deploy.labels` for Docker Swarm (not at service level).
## Accessing Services
Once the stack is deployed and Traefik is configured, the services are available at:
- **Web UI:** http://mosaic.mosaicstack.dev
- **API:** http://api.mosaicstack.dev
- **Authentik:** http://auth.mosaicstack.dev
## Scaling Services
Scale specific services:
```bash
# Scale web frontend to 3 replicas
docker service scale mosaic_web=3
# Scale API to 2 replicas
docker service scale mosaic_api=2
```
**Note:** Database services (postgres, valkey) should NOT be scaled (remain at 1 replica).
## Updating Services
Update a specific service:
```bash
# Rebuild image
docker compose -f docker-compose.swarm.yml build api
# Update the service (on a multi-node swarm, every node must be able to pull this image from a registry)
docker service update --image mosaic-stack-api:latest mosaic_api
```
Or redeploy the entire stack:
```bash
./scripts/deploy-swarm.sh mosaic
```
## Rolling Updates
Docker Swarm supports rolling updates. To configure:
```yaml
deploy:
  update_config:
    parallelism: 1
    delay: 10s
    order: start-first
  rollback_config:
    parallelism: 1
    delay: 10s
```
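The same policy can also be applied to a service that is already running, without editing the compose file. The sketch below mirrors the YAML values above using the `--update-*` and `--rollback-*` options of `docker service update`; the wrapper name `apply_update_policy` is hypothetical.

```bash
# Hypothetical helper: apply the rolling-update policy shown above to one
# running service, e.g. apply_update_policy mosaic_api
apply_update_policy() {
  local service="$1"
  docker service update \
    --update-parallelism 1 \
    --update-delay 10s \
    --update-order start-first \
    --rollback-parallelism 1 \
    --rollback-delay 10s \
    "$service"
}

# If an update misbehaves, revert to the previous service definition:
#   docker service rollback mosaic_api
```

With `start-first`, the old and new tasks briefly run side by side, so this ordering suits stateless services such as `web` and `api` rather than the single-replica database services.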
## Troubleshooting
### Service Won't Start
Check service logs:
```bash
docker service logs mosaic_api --tail 100 --follow
```
Check service tasks:
```bash
docker service ps mosaic_api --no-trunc
```
### Traefik Not Routing
1. Verify service is on `traefik-public` network:
```bash
docker service inspect mosaic_web | grep -A 10 Networks
```
2. Check Traefik dashboard for registered routes:
- Usually at http://traefik.yourdomain.com/dashboard/
3. Verify domain DNS/hosts resolution:
```bash
ping mosaic.mosaicstack.dev
```
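When DNS is in doubt, the router can be exercised directly by sending the expected `Host` header to a swarm node. This is a minimal sketch that assumes Traefik's `web` entrypoint is published on port 80 of the address you pass in; the function name `check_route` and the default address are illustrative, not taken from the repository.

```bash
# Hypothetical helper: request a hostname through Traefik at a given address,
# bypassing DNS by setting the Host header, and print the HTTP status code.
check_route() {
  local host="$1" address="${2:-127.0.0.1}"
  curl --silent --output /dev/null --max-time 5 \
       --write-out '%{http_code}\n' \
       --header "Host: ${host}" "http://${address}/"
}

# Usage:
#   check_route mosaic.mosaicstack.dev            # against the local node
#   check_route api.mosaicstack.dev 10.0.0.5      # against a specific node
```

A `404` from Traefik typically means no router matched the host, which points at the labels. A `502` or `503` usually means the router matched but the backend service is unreachable.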
### Database Connection Issues
Check postgres is healthy:
```bash
docker service logs mosaic_postgres --tail 50
```
Verify DATABASE_URL in API service:
```bash
docker service inspect mosaic_api --format '{{json .Spec.TaskTemplate.ContainerSpec.Env}}' | jq
```
### Volume Permissions
If volume permission errors occur, check which user the service runs as:
```bash
# Orchestrator runs as user 1000:1000
docker service inspect mosaic_orchestrator | grep -A 5 User
```
## Backup & Restore
### Backup Volumes
```bash
# Backup postgres data (for a consistent copy, stop or scale down the database service first)
docker run --rm -v mosaic_postgres_data:/data -v "$(pwd)":/backup alpine \
  tar czf "/backup/postgres-backup-$(date +%Y%m%d).tar.gz" -C /data .
# Backup authentik data
docker run --rm -v mosaic_authentik_postgres_data:/data -v "$(pwd)":/backup alpine \
  tar czf "/backup/authentik-backup-$(date +%Y%m%d).tar.gz" -C /data .
```
### Restore Volumes
```bash
# Restore postgres data
docker run --rm -v mosaic_postgres_data:/data -v "$(pwd)":/backup alpine \
  tar xzf /backup/postgres-backup-20260208.tar.gz -C /data
# Restore authentik data
docker run --rm -v mosaic_authentik_postgres_data:/data -v "$(pwd)":/backup alpine \
  tar xzf /backup/authentik-backup-20260208.tar.gz -C /data
```
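Before relying on an archive for a restore, it is worth confirming that it can be read end to end. The sketch below is one way to do that with plain `tar`; the function name `verify_backup` is hypothetical.

```bash
# Hypothetical helper: confirm a backup archive exists, is non-empty, and can
# be listed completely by tar before it is used for a restore.
verify_backup() {
  local archive="$1"
  [ -s "$archive" ] || { echo "missing or empty: $archive" >&2; return 1; }
  tar tzf "$archive" > /dev/null 2>&1 || { echo "unreadable: $archive" >&2; return 1; }
  echo "ok: $archive"
}

# Usage:
#   verify_backup postgres-backup-20260208.tar.gz
```

This only checks that the archive is intact, not that PostgreSQL can start from its contents.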
## Removing the Stack
Remove all services and networks (volumes are preserved):
```bash
docker stack rm mosaic
```
Remove volumes (⚠️ **DATA WILL BE LOST**):
```bash
docker volume rm mosaic_postgres_data
docker volume rm mosaic_valkey_data
docker volume rm mosaic_authentik_postgres_data
# ... etc
```
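To see which volumes belong to the stack before deleting anything, the names can be listed by prefix. This sketch assumes the volumes were created by `docker stack deploy`, which prefixes them with the stack name; volumes declared as external in the compose file would not match. The function name `list_stack_volumes` is hypothetical.

```bash
# Hypothetical helper: list volumes whose names start with "<stack>_".
# The --filter name= match is a substring match, so grep anchors it to the prefix.
list_stack_volumes() {
  local stack="${1:-mosaic}"
  docker volume ls --quiet --filter "name=${stack}_" | grep "^${stack}_"
}

# Usage:
#   list_stack_volumes mosaic
```

Review the output first, then pass only the names you intend to delete to `docker volume rm`.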
## Security Considerations
1. **Change default passwords** in `.env` before deploying
2. **Use secrets management** for production:
```bash
# printf avoids the trailing newline that echo would store as part of the secret
printf '%s' "my-db-password" | docker secret create postgres_password -
```
3. **Enable TLS** in Traefik (Let's Encrypt)
4. **Restrict network access** by keeping backend services on separate overlay networks (for example, with `internal: true`) and applying host firewall rules
5. **Run services as non-root** (orchestrator already does this)
## Differences from Docker Compose
Key differences when running in Swarm mode:
| Feature | Docker Compose | Docker Swarm |
| ---------------- | ---------------------------------- | ----------------------- |
| Container names | `container_name: foo` | Auto-generated |
| Restart policy | `restart: unless-stopped` | `deploy.restart_policy` |
| Labels (Traefik) | Service level | `deploy.labels` |
| Networks | `bridge` driver | `overlay` driver |
| Scaling | Manual `docker compose up --scale` | `docker service scale` |
| Updates | Stop/start containers | Rolling updates |
## Reference
- **Compose file:** `docker-compose.swarm.yml`
- **Environment:** `.env.swarm.example`
- **Deployment script:** `scripts/deploy-swarm.sh`
- **Traefik example:** `../mosaic-telemetry/docker-compose.yml`