# Mosaic Stack — Coolify Deployment

## Overview

Coolify deployment on VM `10.1.1.44` (Proxmox). Replaces the Docker Swarm deployment on w-docker0 (`10.1.1.45`).

## Architecture

```
Internet → Cloudflare → Public IP (174.137.97.162)
  → Main Traefik (10.1.1.43) — TCP TLS passthrough for *.woltje.com
  → Coolify Traefik (10.1.1.44) — terminates TLS via Cloudflare DNS-01 wildcard certs
  → Service containers
```

## Services (Core Stack)

| Service      | Image                                           | Internal Port   | External Domain         |
| ------------ | ----------------------------------------------- | --------------- | ----------------------- |
| postgres     | `git.mosaicstack.dev/mosaic/stack-postgres`     | 5432            | —                       |
| valkey       | `valkey/valkey:8-alpine`                        | 6379            | —                       |
| api          | `git.mosaicstack.dev/mosaic/stack-api`          | 3001            | `api.mosaic.woltje.com` |
| web          | `git.mosaicstack.dev/mosaic/stack-web`          | 3000            | `mosaic.woltje.com`     |
| coordinator  | `git.mosaicstack.dev/mosaic/stack-coordinator`  | 8000            | —                       |
| orchestrator | `git.mosaicstack.dev/mosaic/stack-orchestrator` | 3001 (internal) | —                       |

Matrix (synapse, element-web) and speech services (speaches, kokoro-tts) are NOT included in the core stack. Deploy separately if needed.

## Compose File

`docker-compose.coolify.yml` in the repo root. This is the Coolify-compatible version of the deployment compose.
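A minimal sketch of what a service entry in this file looks like under the Coolify conventions; this is an illustrative excerpt, not a copy of the real `docker-compose.coolify.yml`, and the second environment variable is shown only as an example of list-style syntax:

```yaml
# Hypothetical excerpt, not the actual docker-compose.coolify.yml.
services:
  api:
    image: git.mosaicstack.dev/mosaic/stack-api:latest
    restart: unless-stopped       # instead of Swarm restart policies
    environment:                  # list-style syntax, required for magic vars
      - SERVICE_FQDN_API_3001     # Coolify auto-assigns the domain from this
      - BETTER_AUTH_URL=${BETTER_AUTH_URL}
    # no deploy: block and no Traefik labels; Coolify manages routing
networks:
  default:
    driver: bridge                # bridge instead of overlay
```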
Key differences from the Swarm compose (`docker-compose.swarm.portainer.yml`):

- No `deploy:` blocks (Swarm-only)
- No Traefik labels (Coolify manages routing)
- Bridge network instead of overlay
- `restart: unless-stopped` instead of Swarm restart policies
- `SERVICE_FQDN_*` magic environment variables for Coolify domain assignment
- List-style environment syntax (required for Coolify magic vars)

## Coolify IDs

| Resource    | UUID                       |
| ----------- | -------------------------- |
| Project     | `rs04g008kgkkw4s0wgsk40w4` |
| Environment | `gko8csc804g8og0oosc8ccs8` |
| Service     | `ug0ssok4g44wocok8kws8gg8` |
| Server      | `as8kcogk08skskkcsok888g4` |

### Application UUIDs

| App          | UUID                        |
| ------------ | --------------------------- |
| postgres     | `jcw0ogskkw040os48ggkgkc8`  |
| valkey       | `skssgwcggc0c8owoogcso8og`  |
| api          | `mc40cgwwo8okwwoko84408k4k` |
| web          | `c48gcwgc40ok44scscowc8cc`  |
| coordinator  | `s8gwog4c44w08c8sgkcg04k8`  |
| orchestrator | `uo4wkg88co0ckc4c4k44sowc`  |

## Coolify API

Base URL: `http://10.1.1.44:8000/api/v1`

Auth: Bearer token from `credentials.json` → `coolify.app_token`

### Patterns & Gotchas

- **Compose must be base64-encoded** when sent via the `docker_compose_raw` field
- **`SERVICE_FQDN_*` magic vars**: Coolify reads these from the compose to auto-assign domains. Format: `SERVICE_FQDN_{NAME}_{PORT}` (e.g., `SERVICE_FQDN_API_3001`). Must use list-style env syntax (`- SERVICE_FQDN_API_3001`), NOT dict-style.
- **FQDN updates on sub-applications**: the Coolify API doesn't support updating FQDNs on compose service sub-apps via REST. Workaround: update directly in Coolify's PostgreSQL DB (`coolify-db` container, `service_applications` table).
- **Environment variable management**: use `PATCH /api/v1/services/{uuid}/envs` with `{ "key": "VAR_NAME", "value": "val", "is_preview": false }`
- **Service start**: `POST /api/v1/services/{uuid}/start`
- **Coolify uses PostgreSQL** (not SQLite) for its internal database — container `coolify-db`

### DB Access (for workarounds)

```bash
ssh localadmin@10.1.1.44
docker exec -it coolify-db psql -U coolify -d coolify

-- Check service app FQDNs
SELECT name, fqdn FROM service_applications
WHERE service_id = (
  SELECT id FROM services WHERE uuid = 'ug0ssok4g44wocok8kws8gg8'
);
```

## Environment Variables

All env vars are set via the Coolify API and stored in `/data/coolify/services/{uuid}/.env` on the node.

Critical vars that were missing initially:

- `BETTER_AUTH_URL` — **required** in production; the API won't start without it. Set to `https://api.mosaic.woltje.com`.

## Operations

### Restart Procedure (IMPORTANT)

Coolify's `CleanupDocker` action periodically prunes unused images. During a restart (stop → start), images become "unused" when containers stop and may be pruned before the start phase runs. This causes "No such image" failures.

**Always pre-pull images before any Coolify restart/start:**

```bash
ssh localadmin@10.1.1.44

# 1. Pre-pull all images (run in parallel)
docker pull git.mosaicstack.dev/mosaic/stack-postgres:latest &
docker pull valkey/valkey:8-alpine &
docker pull git.mosaicstack.dev/mosaic/stack-api:latest &
docker pull git.mosaicstack.dev/mosaic/stack-web:latest &
docker pull git.mosaicstack.dev/mosaic/stack-coordinator:latest &
docker pull git.mosaicstack.dev/mosaic/stack-orchestrator:latest &
wait

# 2. Remove stale internal network (prevents "already exists" errors)
docker network rm ug0ssok4g44wocok8kws8gg8_internal 2>/dev/null || true

# 3. Start via Coolify API
TOKEN=""
curl -X POST "http://10.1.1.44:8000/api/v1/services/ug0ssok4g44wocok8kws8gg8/start" \
  -H "Authorization: Bearer $TOKEN"

# 4.
# Verify (wait ~30s for health checks)
docker ps --filter 'name=ug0ssok4g44wocok8kws8gg8' --format 'table {{.Names}}\t{{.Status}}'
```

### OTEL Configuration

The coordinator's Python OTLP exporter initializes at import time, before checking `MOSAIC_TELEMETRY_ENABLED`. To suppress OTLP connection noise, set the standard OpenTelemetry env var in the service `.env`:

```
OTEL_SDK_DISABLED=true
```

## Current State (2026-02-22)

### Verified Working

- All 6 containers running and healthy
- Web UI at `https://mosaic.woltje.com/login` — 200 OK
- API health at `https://api.mosaic.woltje.com/health` — healthy, PostgreSQL connected
- CORS: `access-control-allow-origin: https://mosaic.woltje.com`
- Runtime env injection: `NEXT_PUBLIC_API_URL=https://api.mosaic.woltje.com`, `AUTH_MODE=real`
- Valkey: PONG
- Coordinator: healthy, no OTLP noise (`OTEL_SDK_DISABLED=true`)
- Orchestrator: healthy
- TLS: Let's Encrypt certs (web + api), valid until May 23 2026
- Auth endpoint: `/auth/get-session` responds correctly

### Resolved Issues

- **#441**: Coordinator OTLP noise — fixed via `OTEL_SDK_DISABLED=true`
- **#442**: Coolify managed lifecycle — root cause was image pruning during restart plus a CoolifyTask timeout on large pulls. Fix: pre-pull images before start.
- **#443**: Full stack connectivity — all checks pass

### Known Limitations

- Coolify restart is NOT safe without pre-pulling images first (CleanupDocker prunes between stop/start)
- CoolifyTask has a ~40s timeout — large image pulls will fail if not cached

## SSH Access

```bash
ssh localadmin@10.1.1.44

# Note: localadmin cannot sudo without TTY/password

# Use docker to access files:
docker run --rm -v /data/coolify/services:/srv alpine cat /srv/{uuid}/docker-compose.yml

# Use docker exec for Coolify DB:
docker exec -it coolify-db psql -U coolify -d coolify
```
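
## Appendix: Compose Update Sketch

Tying the API patterns above together, a dry-run sketch of preparing a compose update. Only the base64 requirement for `docker_compose_raw` is documented above; the exact `PATCH /services/{uuid}` endpoint shape is an assumption, so the curl command is printed rather than executed, and the inline compose string is a stand-in for the real file:

```shell
#!/bin/sh
# Dry-run sketch: build the base64-encoded payload Coolify expects in
# docker_compose_raw. UUID is the core-stack service from this document;
# the PATCH endpoint path is an assumption, so the curl is only printed.
UUID="ug0ssok4g44wocok8kws8gg8"

# Inline stand-in for docker-compose.coolify.yml:
compose='services:
  api:
    image: git.mosaicstack.dev/mosaic/stack-api:latest'

# Compose must be base64-encoded (tr strips wrapping for portability)
b64=$(printf '%s' "$compose" | base64 | tr -d '\n')
payload=$(printf '{"docker_compose_raw":"%s"}' "$b64")

echo "$payload"
echo "curl -X PATCH http://10.1.1.44:8000/api/v1/services/$UUID \\"
echo "  -H \"Authorization: Bearer \$TOKEN\" -H 'Content-Type: application/json' \\"
echo "  -d '$payload'"
```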