feat(#swarm): Add Docker Swarm deployment with AI provider configuration
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Add setup-wizard.sh for interactive configuration
- Add docker-compose.swarm.yml optimized for swarm deployment
- Make CLAUDE_API_KEY optional based on AI_PROVIDER setting
- Support multiple AI providers: Ollama, Claude API, OpenAI
- Add BETTER_AUTH_SECRET to .env.example
- Update deploy-swarm.sh to validate AI provider config
- Add comprehensive documentation (DOCKER-SWARM.md, SWARM-QUICKREF.md)

Changes:
- AI_PROVIDER env var controls which AI backend to use
- Ollama is default (no API key required)
- Claude API and OpenAI require respective API keys
- Deployment script validates based on selected provider
- Removed Authentik services from swarm compose (using external)
- Configured for upstream Traefik integration

DOCKER-SWARM.md (normal file, 299 lines)

# Mosaic Stack - Docker Swarm Deployment

This guide covers deploying Mosaic Stack to a Docker Swarm cluster with Traefik reverse proxy integration.

## Prerequisites

1. **Docker Swarm initialized:**

   ```bash
   docker swarm init
   ```

2. **Traefik running on the swarm** with a network named `traefik-public`

3. **DNS or /etc/hosts configured** with your domain names:
   - `mosaic.mosaicstack.dev` → Web UI
   - `api.mosaicstack.dev` → API
   - `auth.mosaicstack.dev` → Authentik SSO

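The first prerequisite can be checked from the shell before deploying. This is a sketch assuming Bash; `check_swarm_prereqs` is a hypothetical helper, not part of the repository. It reads two `docker info` fields: `Swarm.LocalNodeState` (should be `active`) and `Swarm.ControlAvailable` (`true` only on manager nodes, which is where `docker stack deploy` has to run).

```bash
# Hypothetical helper: confirm this node is an active swarm manager.
check_swarm_prereqs() {
  local state manager
  state="$(docker info --format '{{.Swarm.LocalNodeState}}')" || return 1
  if [ "$state" != "active" ]; then
    echo "swarm is not active on this node (state: $state); run: docker swarm init" >&2
    return 1
  fi
  # ControlAvailable is true only on managers; stack deploys must run there
  manager="$(docker info --format '{{.Swarm.ControlAvailable}}')" || return 1
  if [ "$manager" != "true" ]; then
    echo "this node is not a swarm manager; deploy from a manager node" >&2
    return 1
  fi
  echo "swarm prerequisites OK"
}
```

Run `check_swarm_prereqs` on the node you plan to deploy from; it prints `swarm prerequisites OK` when both checks pass.
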
## Quick Start

### 1. Configure Environment

Copy the swarm environment template:

```bash
cp .env.swarm.example .env
```

Edit `.env` and set the following **critical** values. Replace every placeholder with a literal value; for the secrets, paste the output of the `openssl` command shown above each one:

```bash
# Database passwords
POSTGRES_PASSWORD=your-secure-password-here
AUTHENTIK_POSTGRES_PASSWORD=your-secure-password-here

# Secrets
# openssl rand -base64 50 | tr -d '\n'   (strips the line break openssl adds after 64 characters)
AUTHENTIK_SECRET_KEY=your-generated-secret
# openssl rand -base64 32
JWT_SECRET=your-generated-secret
# openssl rand -hex 32
ENCRYPTION_KEY=your-generated-secret
# openssl rand -base64 32
ORCHESTRATOR_API_KEY=your-generated-secret
# openssl rand -base64 32
COORDINATOR_API_KEY=your-generated-secret

# Claude API key: only needed when AI_PROVIDER selects the Claude API
# (the default provider, Ollama, needs no key)
CLAUDE_API_KEY=your-claude-api-key

# Authentik bootstrap admin
AUTHENTIK_BOOTSTRAP_PASSWORD=your-admin-password
AUTHENTIK_BOOTSTRAP_EMAIL=admin@yourdomain.com
```

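If you prefer not to paste the secret values by hand, a small script can write them into `.env`. This is a sketch assuming Bash, `openssl`, and `awk`, and that `.env` was copied from `.env.swarm.example`; it replaces any existing `KEY=...` line for the five secrets listed above and appends keys that are missing. It does not touch the passwords, `CLAUDE_API_KEY`, or the Authentik bootstrap values.

```bash
#!/usr/bin/env bash
# Sketch: write freshly generated secrets into .env. Existing KEY=... lines are
# replaced and missing keys are appended. Assumes .env came from .env.swarm.example.
set -euo pipefail

ENV_FILE="${ENV_FILE:-.env}"
touch "$ENV_FILE"

set_env_var() {
  local key="$1" value="$2" tmp
  tmp="$(mktemp)"
  # Rewrite every line that starts with KEY=; append KEY=value if none matched
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$ENV_FILE" > "$tmp"
  # mktemp creates the file readable by its owner only; mv keeps that mode
  mv "$tmp" "$ENV_FILE"
}

# openssl wraps base64 output after 64 characters, so strip newlines from the 50-byte key
set_env_var AUTHENTIK_SECRET_KEY "$(openssl rand -base64 50 | tr -d '\n')"
set_env_var JWT_SECRET "$(openssl rand -base64 32)"
set_env_var ENCRYPTION_KEY "$(openssl rand -hex 32)"
set_env_var ORCHESTRATOR_API_KEY "$(openssl rand -base64 32)"
set_env_var COORDINATOR_API_KEY "$(openssl rand -base64 32)"
echo "secrets written to $ENV_FILE"
```

Run it once from the directory that contains `.env` (or set `ENV_FILE` to another path). Re-running it rotates every listed secret, so avoid that on a live deployment: anything signed or encrypted with the old values (for example with `ENCRYPTION_KEY`) would no longer validate or decrypt.
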
### 2. Create the Traefik Network (if it does not exist)

```bash
docker network create --driver=overlay traefik-public
```

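To make this step safe to repeat, check for the network before creating it. A minimal sketch; `ensure_overlay_network` is a hypothetical helper name, and `traefik-public` is its default because that is the network Traefik and the stack share in this guide.

```bash
# Hypothetical helper: create an overlay network only when it is missing,
# so re-running the setup is harmless.
ensure_overlay_network() {
  local name="${1:-traefik-public}"
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create --driver=overlay "$name"
  fi
}
```

Usage: `ensure_overlay_network traefik-public`.
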
### 3. Deploy the Stack

```bash
./deploy-swarm.sh mosaic
```

Or manually. Unlike `docker compose`, `docker stack deploy` does not read `.env` automatically, so export its variables into your shell first:

```bash
set -a; . ./.env; set +a   # export every variable defined in .env
docker stack deploy -c docker-compose.swarm.yml mosaic
```

### 4. Verify Deployment

Check stack status:

```bash
docker stack services mosaic
docker stack ps mosaic
```

Check service logs:

```bash
docker service logs mosaic_api
docker service logs mosaic_web
docker service logs mosaic_postgres
```

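Right after a deploy, `docker stack services` may still show tasks starting. The sketch below (Bash; `wait_for_stack` is a hypothetical helper) polls the `REPLICAS` column, which reads running/desired, until the two numbers match for every service or a timeout in seconds expires.

```bash
# Hypothetical helper: wait until every service in a stack has its desired replicas.
wait_for_stack() {
  local stack="${1:-mosaic}" timeout="${2:-300}" waited=0 out pending
  while true; do
    if ! out="$(docker stack services "$stack" --format '{{.Name}} {{.Replicas}}')" || [ -z "$out" ]; then
      echo "could not list services for stack $stack" >&2
      return 1
    fi
    # Replicas looks like "1/1", optionally followed by a note such as "(max 1 per node)"
    pending="$(printf '%s\n' "$out" | awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }')"
    if [ -z "$pending" ]; then
      echo "all services in $stack have their desired replicas"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out; still converging: $pending" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}
```

Usage: `wait_for_stack mosaic 300`. If it times out, `docker stack ps mosaic --no-trunc` shows why the pending tasks are not running.
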
## Stack Services

The following services will be deployed:

| Service            | Internal Port | Traefik Domain           | Description              |
| ------------------ | ------------- | ------------------------ | ------------------------ |
| `web`              | 3000          | `mosaic.mosaicstack.dev` | Next.js Web UI           |
| `api`              | 3001          | `api.mosaicstack.dev`    | NestJS API               |
| `authentik-server` | 9000          | `auth.mosaicstack.dev`   | Authentik SSO            |
| `postgres`         | 5432          | -                        | PostgreSQL 17 + pgvector |
| `valkey`           | 6379          | -                        | Redis-compatible cache   |
| `openbao`          | 8200          | -                        | Secrets vault            |
| `ollama`           | 11434         | -                        | LLM service (optional)   |
| `orchestrator`     | 3001          | -                        | Agent orchestrator       |

## Traefik Integration

Services are registered with Traefik through labels defined in `deploy.labels`:

```yaml
deploy:
  labels:
    - "traefik.enable=true"
    - "traefik.http.routers.mosaic-web.rule=Host(`mosaic.mosaicstack.dev`)"
    - "traefik.http.routers.mosaic-web.entrypoints=web"
    - "traefik.http.services.mosaic-web.loadbalancer.server.port=3000"
```

**Important:** Traefik labels MUST be under `deploy.labels` for Docker Swarm (not at service level), because Traefik's Swarm provider reads service labels, not container labels. Swarm also gives Traefik no port information, so the `loadbalancer.server.port` label is required.

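To confirm the labels really ended up under `deploy.labels`, inspect the service spec: Swarm stores `deploy.labels` in `.Spec.Labels`, while service-level `labels:` land on the containers (`.Spec.TaskTemplate.ContainerSpec.Labels`). A sketch with a hypothetical `check_traefik_labels` helper:

```bash
# Hypothetical helper: check that traefik.enable=true is set on the service spec
# (deploy.labels), not only on the containers.
check_traefik_labels() {
  local service="$1" labels
  labels="$(docker service inspect "$service" --format '{{json .Spec.Labels}}')" || return 1
  if printf '%s' "$labels" | grep -q '"traefik.enable":"true"'; then
    echo "$service: traefik labels are set under deploy.labels"
  else
    echo "$service: traefik.enable=true missing from deploy.labels" >&2
    return 1
  fi
}
```

Usage: `check_traefik_labels mosaic_web`.
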
## Accessing Services

Once deployed and Traefik is configured:

- **Web UI:** http://mosaic.mosaicstack.dev
- **API:** http://api.mosaicstack.dev
- **Authentik:** http://auth.mosaicstack.dev

## Scaling Services

Scale specific services:

```bash
# Scale web frontend to 3 replicas
docker service scale mosaic_web=3

# Scale API to 2 replicas
docker service scale mosaic_api=2
```

**Note:** Do NOT scale the database services (postgres, valkey); keep them at 1 replica. Extra replicas would either share one data directory or start with separate, empty node-local volumes, and neither database can run correctly that way.

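One way to make that rule hard to break by accident is a thin wrapper around `docker service scale`. This is an illustrative sketch; `safe_scale` is hypothetical and only knows about the two stateful services named in the note.

```bash
# Hypothetical wrapper: refuse to scale the single-instance stateful services.
safe_scale() {
  local service="$1" replicas="$2"
  case "$service" in
    *_postgres|*_valkey)
      echo "refusing to scale $service: stateful services stay at 1 replica" >&2
      return 1
      ;;
  esac
  docker service scale "${service}=${replicas}"
}
```

Usage: `safe_scale mosaic_web 3`.
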
## Updating Services

Update a specific service:

```bash
# Rebuild the image
docker compose -f docker-compose.swarm.yml build api

# On a multi-node swarm, push the image to a registry every node can pull from;
# a locally built image exists only on the node that built it
# docker compose -f docker-compose.swarm.yml push api

# Update the service (--force redeploys the tasks even if the image tag is unchanged)
docker service update --force --image mosaic-stack-api:latest mosaic_api
```

Or redeploy the entire stack:

```bash
./deploy-swarm.sh mosaic
```

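## Rolling Updates

Docker Swarm supports rolling updates. Configure them per service under `deploy`:

```yaml
deploy:
  update_config:
    parallelism: 1
    delay: 10s
    order: start-first
  rollback_config:
    parallelism: 1
    delay: 10s
```

`order: start-first` starts the new task before stopping the old one, so use it only for stateless services such as `web` and `api`. For postgres and valkey, keep the default `stop-first` so two instances never use the same data volume at once.
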
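If a rolling update goes wrong, `docker service rollback` reverts a service to its previous spec. The hypothetical helper below first prints the image each running task uses, so you can see what you are rolling back from.

```bash
# Hypothetical helper: show the running tasks' images, then roll the service back.
rollback_service() {
  local service="$1"
  docker service ps "$service" --filter desired-state=running \
    --format '{{.Name}} {{.Image}} {{.CurrentState}}'
  docker service rollback "$service"
}
```

Usage: `rollback_service mosaic_api`.
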
## Troubleshooting

### Service Won't Start

Check service logs:

```bash
docker service logs mosaic_api --tail 100 --follow
```

Check service tasks:

```bash
docker service ps mosaic_api --no-trunc
```

### Traefik Not Routing

1. Verify the service is on the `traefik-public` network:

   ```bash
   docker service inspect mosaic_web | grep -A 10 Networks
   ```

2. Check the Traefik dashboard for registered routes:
   - Usually at http://traefik.yourdomain.com/dashboard/

3. Verify domain DNS/hosts resolution:
   ```bash
   ping mosaic.mosaicstack.dev
   ```

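To separate DNS problems from routing problems, send a request straight to a swarm node with the `Host` header set, which bypasses DNS entirely. This sketch assumes the `web` entrypoint listens on port 80 (this guide does not state the port) and uses a hypothetical `check_route` helper. An `HTTP 404` from Traefik usually means no router matched the host, while `HTTP 000` means curl could not connect at all.

```bash
# Hypothetical helper: request a hostname through Traefik without relying on DNS.
# Assumes the `web` entrypoint listens on port 80 of the given node address.
check_route() {
  local host="$1" address="${2:-127.0.0.1}" code
  # curl reports 000 as the status code when it cannot connect
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 -H "Host: $host" "http://$address/")" || true
  echo "$host via $address -> HTTP $code"
}
```

Usage: `check_route mosaic.mosaicstack.dev 192.0.2.10`, replacing the address with one of your swarm nodes.
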
### Database Connection Issues

Check that postgres is healthy:

```bash
docker service logs mosaic_postgres --tail 50
```

Verify the `DATABASE_URL` passed to the API service (this prints every environment variable, including secrets):

```bash
docker service inspect mosaic_api --format '{{json .Spec.TaskTemplate.ContainerSpec.Env}}' | jq
```

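Logs only show past events; `pg_isready` checks whether the server is accepting connections right now. This sketch assumes the postgres image is based on the official PostgreSQL image (which ships `pg_isready`) and uses a hypothetical `postgres_ready` helper. Swarm names task containers `<stack>_<service>.<slot>.<task-id>`, so the lookup only succeeds on the node shown by `docker service ps mosaic_postgres`.

```bash
# Hypothetical helper: run pg_isready inside the local postgres task container.
postgres_ready() {
  local cid
  cid="$(docker ps -q --filter "name=${1:-mosaic_postgres}" | head -n 1)"
  if [ -z "$cid" ]; then
    echo "no postgres container on this node; check: docker service ps mosaic_postgres" >&2
    return 1
  fi
  # pg_isready exits 0 when the server accepts connections
  docker exec "$cid" pg_isready
}
```
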
### Volume Permissions

If volume permission errors occur, check which user the service runs as:

```bash
# Orchestrator runs as user 1000:1000
docker service inspect mosaic_orchestrator | grep -A 5 User
```

If nothing matches, the user is not set in the compose file and comes from the image itself (check it with `docker image inspect --format '{{.Config.User}}' <image>`).

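If the volume's files belong to a different user, one fix is to change their owner from a throwaway container. A sketch with a hypothetical `fix_volume_owner` helper: the volume name you pass must come from `docker volume ls` (this guide does not name the orchestrator's volume), and the command must run on the node that holds the volume.

```bash
# Hypothetical helper: chown a volume's contents using a throwaway alpine container.
fix_volume_owner() {
  local volume="$1" owner="${2:-1000:1000}"
  docker run --rm -v "${volume}:/data" alpine chown -R "$owner" /data
}
```

Usage: `fix_volume_owner <volume-name> 1000:1000`.
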
## Backup & Restore

### Backup Volumes

Volumes are local to the node that runs each task, so run these commands on that node. Copying the files of a running database can capture an inconsistent state; stop the service first (for example `docker service scale mosaic_postgres=0`) or take a logical dump instead.

```bash
# Backup postgres data
docker run --rm -v mosaic_postgres_data:/data -v "$(pwd)":/backup alpine \
  tar czf /backup/postgres-backup-$(date +%Y%m%d).tar.gz -C /data .

# Backup authentik data
docker run --rm -v mosaic_authentik_postgres_data:/data -v "$(pwd)":/backup alpine \
  tar czf /backup/authentik-backup-$(date +%Y%m%d).tar.gz -C /data .
```

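As an alternative to the file-level copy, a logical dump taken with `pg_dumpall` stays consistent while PostgreSQL keeps running. This sketch assumes an image based on the official PostgreSQL image, where local socket connections typically need no password, and a hypothetical `backup_postgres_dump` helper; it writes a gzipped SQL file to the current directory.

```bash
# Hypothetical helper: logical backup of all databases from the running postgres task.
backup_postgres_dump() {
  local cid file
  cid="$(docker ps -q --filter "name=mosaic_postgres" | head -n 1)"
  if [ -z "$cid" ]; then
    echo "no mosaic_postgres container on this node" >&2
    return 1
  fi
  file="postgres-dump-$(date +%Y%m%d).sql.gz"
  # Single quotes: ${POSTGRES_USER:-postgres} is expanded inside the container
  docker exec "$cid" sh -c 'pg_dumpall -U "${POSTGRES_USER:-postgres}"' | gzip > "$file"
  # PIPESTATUS[0] is the exit code of docker exec, not of gzip
  if [ "${PIPESTATUS[0]}" -ne 0 ]; then
    echo "pg_dumpall failed; removing $file" >&2
    rm -f "$file"
    return 1
  fi
  echo "wrote $file"
}
```
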
### Restore Volumes

Restore only while the corresponding database service is stopped, and only into an empty volume. Replace `20260208` with the date of the backup you want to restore.

```bash
# Restore postgres data
docker run --rm -v mosaic_postgres_data:/data -v "$(pwd)":/backup alpine \
  tar xzf /backup/postgres-backup-20260208.tar.gz -C /data

# Restore authentik data
docker run --rm -v mosaic_authentik_postgres_data:/data -v "$(pwd)":/backup alpine \
  tar xzf /backup/authentik-backup-20260208.tar.gz -C /data
```

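The postgres restore can be wrapped so the service is stopped while its volume is replaced. A sketch with a hypothetical `restore_postgres_volume` helper; it relies on `docker service scale` waiting for the service to converge (the default unless `--detach` is given) and empties the volume before extracting, so files missing from the backup do not linger. If extraction fails, postgres is left scaled to 0 so you can inspect the volume.

```bash
# Hypothetical helper: restore mosaic_postgres_data from a tarball in the current
# directory while the postgres service is scaled down.
restore_postgres_volume() {
  local archive="$1"   # file name only, e.g. postgres-backup-20260208.tar.gz
  if [ ! -f "$archive" ]; then
    echo "archive $archive not found in $(pwd)" >&2
    return 1
  fi
  # Without --detach, docker service scale waits until the task has stopped
  docker service scale mosaic_postgres=0 || return 1
  # Empty the volume, then extract; "$1" inside sh -c is the archive name
  docker run --rm -v mosaic_postgres_data:/data -v "$(pwd)":/backup alpine \
    sh -c 'find /data -mindepth 1 -delete && tar xzf "/backup/$1" -C /data' sh "$archive" || return 1
  docker service scale mosaic_postgres=1
}
```

Run it on the node that holds `mosaic_postgres_data`, from the directory containing the archive: `restore_postgres_volume postgres-backup-20260208.tar.gz`.
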
## Removing the Stack

Remove all services and networks (volumes are preserved):

```bash
docker stack rm mosaic
```

Remove volumes (⚠️ **DATA WILL BE LOST**). Volumes are local to each node, so run this on every node that held a task:

```bash
docker volume rm mosaic_postgres_data
docker volume rm mosaic_valkey_data
docker volume rm mosaic_authentik_postgres_data
# ... etc
```

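Rather than guessing the remaining names behind `# ... etc`, you can list a node's volumes by stack label. This sketch assumes Swarm applied its usual `com.docker.stack.namespace` label to the volumes it created for the stack; if the list comes back empty, fall back to `docker volume ls` and filter by name. `list_stack_volumes` is a hypothetical helper, and it only lists: review the output before removing anything.

```bash
# Hypothetical helper: list volume names that carry the stack's namespace label.
list_stack_volumes() {
  docker volume ls -q --filter "label=com.docker.stack.namespace=${1:-mosaic}"
}
```
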
## Security Considerations

1. **Change default passwords** in `.env` before deploying
2. **Use secrets management** for production:
   ```bash
   # printf stores no trailing newline and keeps the password out of shell history
   printf '%s' "$POSTGRES_PASSWORD" | docker secret create postgres_password -
   ```
3. **Enable TLS** in Traefik (Let's Encrypt)
4. **Restrict network access** by attaching only the publicly routed services (`web`, `api`, `authentik-server`) to `traefik-public` and keeping databases on internal overlay networks
5. **Run services as non-root** (orchestrator already does this)

## Differences from Docker Compose

Key differences when running in Swarm mode:

| Feature          | Docker Compose                     | Docker Swarm            |
| ---------------- | ---------------------------------- | ----------------------- |
| Container names  | `container_name: foo`              | Auto-generated          |
| Restart policy   | `restart: unless-stopped`          | `deploy.restart_policy` |
| Labels (Traefik) | Service level                      | `deploy.labels`         |
| Networks         | `bridge` driver                    | `overlay` driver        |
| Scaling          | Manual `docker compose up --scale` | `docker service scale`  |
| Updates          | Stop/start containers              | Rolling updates         |

## Reference

- **Compose file:** `docker-compose.swarm.yml`
- **Environment:** `.env.swarm.example`
- **Deployment script:** `deploy-swarm.sh`
- **Traefik example:** `../mosaic-telemetry/docker-compose.yml`