docs(coolify): update deployment docs with operations guide (#445)
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
This commit was merged in pull request #445.
@@ -93,37 +93,71 @@ Critical vars that were missing initially:
 - `BETTER_AUTH_URL` — **Required** in production. API won't start without it. Set to `https://api.mosaic.woltje.com`.

+## Operations
+
+### Restart Procedure (IMPORTANT)
+
+Coolify's `CleanupDocker` action periodically prunes unused images. During a restart (stop → start), images become "unused" when containers stop and may be pruned before the start phase runs. This causes "No such image" failures.
+
+**Always pre-pull images before any Coolify restart/start:**
+
+```bash
+ssh localadmin@10.1.1.44
+
+# 1. Pre-pull all images (run in parallel)
+docker pull git.mosaicstack.dev/mosaic/stack-postgres:latest &
+docker pull valkey/valkey:8-alpine &
+docker pull git.mosaicstack.dev/mosaic/stack-api:latest &
+docker pull git.mosaicstack.dev/mosaic/stack-web:latest &
+docker pull git.mosaicstack.dev/mosaic/stack-coordinator:latest &
+docker pull git.mosaicstack.dev/mosaic/stack-orchestrator:latest &
+wait
+
+# 2. Remove stale internal network (prevents "already exists" errors)
+docker network rm ug0ssok4g44wocok8kws8gg8_internal 2>/dev/null || true
+
+# 3. Start via Coolify API
+TOKEN="<from credentials.json>"
+curl -X POST "http://10.1.1.44:8000/api/v1/services/ug0ssok4g44wocok8kws8gg8/start" \
+  -H "Authorization: Bearer $TOKEN"
+
+# 4. Verify (wait ~30s for health checks)
+docker ps --filter 'name=ug0ssok4g44wocok8kws8gg8' --format 'table {{.Names}}\t{{.Status}}'
+```
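A note on step 1 of the procedure: a bare `wait` returns 0 even when one of the background `docker pull` commands fails, so a failed pull could go unnoticed until the start phase hits "No such image". The sketch below, assuming a POSIX-compatible shell, waits on each PID individually so that failures surface. `prepull_images` is a hypothetical helper name, not something Docker or Coolify provides.

```bash
# Sketch only: prepull_images is a hypothetical helper, not a Docker or Coolify command.
# Pulls every image passed as an argument in parallel and returns non-zero
# if at least one pull failed.
prepull_images() {
  pids=""
  for image in "$@"; do
    docker pull "$image" &      # same parallel pull as in the procedure
    pids="$pids $!"             # remember the PID of each background pull
  done

  failed=0
  for pid in $pids; do
    # `wait PID` returns that job's exit status, unlike a bare `wait`.
    wait "$pid" || failed=1
  done
  return "$failed"
}
```

It could be called with the six image names from step 1, for example `prepull_images valkey/valkey:8-alpine git.mosaicstack.dev/mosaic/stack-api:latest || exit 1`, so the Coolify start call is only reached when every image is cached locally.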
+
+### OTEL Configuration
+
+The coordinator's Python OTLP exporter initializes at import time, before checking `MOSAIC_TELEMETRY_ENABLED`. To suppress OTLP connection noise, set the standard OpenTelemetry env var in the service `.env`:
+
+```
+OTEL_SDK_DISABLED=true
+```
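If the variable is set by hand rather than through the Coolify UI, an idempotent edit avoids duplicate or conflicting entries in the `.env` file. The following is a minimal sketch under stated assumptions: `set_env_var` is a hypothetical helper, the path to the service `.env` is whatever your deployment uses, and Coolify may regenerate the file on redeploy.

```bash
# Sketch only: set_env_var is a hypothetical helper; the .env path is an assumption.
# Replaces any existing KEY=... line in the file with KEY=VALUE, or appends it.
set_env_var() {
  file=$1
  key=$2
  value=$3
  touch "$file"
  # Keep every line except existing assignments of this key.
  # grep exits 1 when nothing is kept, which is fine here.
  grep -v "^${key}=" "$file" > "${file}.tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "${file}.tmp"
  mv "${file}.tmp" "$file"
}
```

For the setting described here, the call would be `set_env_var /path/to/service/.env OTEL_SDK_DISABLED true`, followed by a restart of the coordinator so the new value is picked up.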
+
 ## Current State (2026-02-22)

-### Working
+### Verified Working

 - All 6 containers running and healthy
-- API health endpoint responds at `https://api.mosaic.woltje.com/health`
-- Database migrations completed
-- Inter-service networking (api→postgres, api→valkey) confirmed via health checks
+- Web UI at `https://mosaic.woltje.com/login` — 200 OK
+- API health at `https://api.mosaic.woltje.com/health` — healthy, PostgreSQL connected
+- CORS: `access-control-allow-origin: https://mosaic.woltje.com`
+- Runtime env injection: `NEXT_PUBLIC_API_URL=https://api.mosaic.woltje.com`, `AUTH_MODE=real`
+- Valkey: PONG
+- Coordinator: healthy, no OTLP noise (`OTEL_SDK_DISABLED=true`)
+- Orchestrator: healthy
+- TLS: Let's Encrypt certs (web + api), valid until May 23 2026
+- Auth endpoint: `/auth/get-session` responds correctly
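The HTTP checks in this list can be repeated after each restart. The sketch below is one way to do that with `curl`. `http_status_is` is a hypothetical helper name, and the assumption that the API health endpoint answers with status 200 is mine (the list only says "healthy").

```bash
# Sketch only: http_status_is is a hypothetical helper name.
# Succeeds when the URL answers with the expected HTTP status code.
http_status_is() {
  url=$1
  expected=$2
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
  actual=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
  [ "$actual" = "$expected" ]
}

# Example usage (left commented out so the block has no side effects):
# http_status_is https://mosaic.woltje.com/login 200 && echo "web ok"
# http_status_is https://api.mosaic.woltje.com/health 200 && echo "api ok"   # 200 is an assumption
```

When the connection fails, `curl` prints `000` as the status code, so the comparison fails and the helper returns non-zero.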

-### Issues
+### Resolved Issues

-1. **DNS: `mosaic.woltje.com` points to wrong server**
-   - Resolves to `10.1.1.45` (old Swarm node) instead of through Cloudflare (`174.137.97.162`)
-   - `api.mosaic.woltje.com` resolves correctly through Cloudflare
-   - Fix: Update Cloudflare DNS A record for `mosaic.woltje.com`
+- **#441**: Coordinator OTLP noise — fixed via `OTEL_SDK_DISABLED=true`
+- **#442**: Coolify managed lifecycle — root cause was image pruning during restart + CoolifyTask timeout on large pulls. Fix: pre-pull images before start.
+- **#443**: Full stack connectivity — all checks pass

-2. **Coordinator: OTLP exporter noise**
-   - Trying to export traces to `localhost:4318` which doesn't exist
-   - Container is healthy, errors are non-critical
-   - Fix: Set `MOSAIC_TELEMETRY_ENABLED=false` in Coolify env vars, or deploy an OTLP collector
+### Known Limitations

-3. **Coolify managed lifecycle**
-   - CoolifyTask was failing when starting the service via API/UI
-   - Containers were started manually via `docker compose up -d` from the service directory
-   - Coolify recognizes the containers (correct naming convention) but may not properly manage restarts/redeploys
-   - Needs investigation: check Coolify task logs, verify compose processing
+- Coolify restart is NOT safe without pre-pulling images first (CleanupDocker prunes between stop/start)
+- CoolifyTask has ~40s timeout — large image pulls will fail if not cached
-
-4. **Full connectivity verification needed**
-   - web→api communication untested (blocked by DNS issue)
-   - Orchestrator→valkey and orchestrator→api connectivity unverified
-   - Coordinator webhook endpoint untested

 ## SSH Access