Co-authored-by: Jason Woltje <jason@diversecanvas.com> Co-committed-by: Jason Woltje <jason@diversecanvas.com>
# Mosaic Stack — Coolify Deployment

## Overview

Coolify deployment on VM `10.1.1.44` (Proxmox). Replaces the Docker Swarm deployment on w-docker0 (`10.1.1.45`).

## Architecture

```
Internet → Cloudflare → Public IP (174.137.97.162)
  → Main Traefik (10.1.1.43): TCP TLS passthrough for *.woltje.com
    → Coolify Traefik (10.1.1.44): terminates TLS via Cloudflare DNS-01 wildcard certs
      → Service containers
```

## Services (Core Stack)

| Service      | Image                                           | Internal Port   | External Domain         |
| ------------ | ----------------------------------------------- | --------------- | ----------------------- |
| postgres     | `git.mosaicstack.dev/mosaic/stack-postgres`     | 5432            | —                       |
| valkey       | `valkey/valkey:8-alpine`                        | 6379            | —                       |
| api          | `git.mosaicstack.dev/mosaic/stack-api`          | 3001            | `api.mosaic.woltje.com` |
| web          | `git.mosaicstack.dev/mosaic/stack-web`          | 3000            | `mosaic.woltje.com`     |
| coordinator  | `git.mosaicstack.dev/mosaic/stack-coordinator`  | 8000            | —                       |
| orchestrator | `git.mosaicstack.dev/mosaic/stack-orchestrator` | 3001 (internal) | —                       |

Matrix (synapse, element-web) and speech services (speaches, kokoro-tts) are NOT included in the core stack. Deploy them separately if needed.

## Compose File

`docker-compose.coolify.yml` in the repo root is the Coolify-compatible version of the deployment compose file.

Key differences from the Swarm compose (`docker-compose.swarm.portainer.yml`):

- No `deploy:` blocks (Swarm-only)
- No Traefik labels (Coolify manages routing)
- Bridge network instead of overlay
- `restart: unless-stopped` instead of Swarm restart policies
- `SERVICE_FQDN_*` magic environment variables for Coolify domain assignment
- List-style environment syntax (required for Coolify magic vars)

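
Since the magic variables only work in list-style syntax, a quick check for accidental dict-style entries can be useful before deploying. The following is a minimal sketch, not part of the repo: `find_dict_style_fqdn` is a hypothetical helper, and the pattern assumes one variable per line.

```bash
# Hypothetical helper (not in the repo): print SERVICE_FQDN_* entries that are
# written dict-style ("KEY: value") instead of list-style ("- KEY").
find_dict_style_fqdn() {
  local compose_file="$1"
  # Dict-style lines begin with the variable name followed by a colon.
  # List-style lines begin with "- " and therefore never match.
  grep -nE '^[[:space:]]*SERVICE_FQDN_[A-Za-z0-9_]+[[:space:]]*:' "$compose_file"
}

# Example:
# find_dict_style_fqdn docker-compose.coolify.yml && echo "convert these to list-style"
```

The function exits non-zero when nothing matches, so it can gate a deploy script directly.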
## Coolify IDs

| Resource    | UUID                       |
| ----------- | -------------------------- |
| Project     | `rs04g008kgkkw4s0wgsk40w4` |
| Environment | `gko8csc804g8og0oosc8ccs8` |
| Service     | `ug0ssok4g44wocok8kws8gg8` |
| Server      | `as8kcogk08skskkcsok888g4` |

### Application UUIDs

| App          | UUID                        |
| ------------ | --------------------------- |
| postgres     | `jcw0ogskkw040os48ggkgkc8`  |
| valkey       | `skssgwcggc0c8owoogcso8og`  |
| api          | `mc40cgwwo8okwwoko84408k4k` |
| web          | `c48gcwgc40ok44scscowc8cc`  |
| coordinator  | `s8gwog4c44w08c8sgkcg04k8`  |
| orchestrator | `uo4wkg88co0ckc4c4k44sowc`  |

## Coolify API

- Base URL: `http://10.1.1.44:8000/api/v1`
- Auth: Bearer token from `credentials.json` → `coolify.app_token`

### Patterns & Gotchas

- **Compose must be base64-encoded** when sent via the `docker_compose_raw` field.
- **`SERVICE_FQDN_*` magic vars**: Coolify reads these from the compose file to auto-assign domains. Format: `SERVICE_FQDN_{NAME}_{PORT}` (e.g., `SERVICE_FQDN_API_3001`). Must use list-style env syntax (`- SERVICE_FQDN_API_3001`), NOT dict-style.
- **FQDN updates on sub-applications**: The Coolify REST API doesn't support updating FQDNs on compose service sub-apps. Workaround: update them directly in Coolify's PostgreSQL DB (`coolify-db` container, `service_applications` table).
- **Environment variable management**: Use `PATCH /api/v1/services/{uuid}/envs` with the body `{ "key": "VAR_NAME", "value": "val", "is_preview": false }`.
- **Service start**: `POST /api/v1/services/{uuid}/start`
- **Coolify uses PostgreSQL** (not SQLite) for its internal database, in the `coolify-db` container.

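
The base64 requirement is easy to get wrong because some `base64` implementations wrap their output across lines. The sketch below shows one way to build the request body. `compose_payload` is a hypothetical helper, and the `PATCH /services/{uuid}` endpoint in the commented example is an assumption that should be checked against the Coolify API reference for the installed version.

```bash
# Hypothetical helper: build a JSON body that carries the compose file in the
# docker_compose_raw field, base64-encoded on a single line.
compose_payload() {
  local compose_file="$1" encoded
  # tr removes the line wraps that some base64 implementations insert.
  encoded=$(base64 < "$compose_file" | tr -d '\n')
  # The base64 alphabet (A-Z, a-z, 0-9, +, /, =) needs no JSON escaping.
  printf '{"docker_compose_raw":"%s"}' "$encoded"
}

# Example (assumed endpoint, verify before use):
# compose_payload docker-compose.coolify.yml |
#   curl -X PATCH "http://10.1.1.44:8000/api/v1/services/ug0ssok4g44wocok8kws8gg8" \
#     -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d @-
```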
### DB Access (for workarounds)

```bash
ssh localadmin@10.1.1.44

# Open an interactive psql session
docker exec -it coolify-db psql -U coolify -d coolify

# Or check service app FQDNs non-interactively (SQL is passed on stdin)
docker exec -i coolify-db psql -U coolify -d coolify <<'SQL'
SELECT name, fqdn FROM service_applications WHERE service_id = (
  SELECT id FROM services WHERE uuid = 'ug0ssok4g44wocok8kws8gg8'
);
SQL
```

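
For the FQDN workaround itself, the update could look roughly like the sketch below. It uses only the table and columns that appear in the query above (`service_applications.name`, `fqdn`, `service_id`). `fqdn_update_sql` is a hypothetical helper, and the exact value format Coolify expects in `fqdn` is an assumption, so inspect existing rows first and back up the database before writing to it.

```bash
# Hypothetical helper: print an UPDATE statement that sets the FQDN of one
# sub-application of a service. Arguments are trusted operator input and are
# not escaped, so do not pass values containing single quotes.
fqdn_update_sql() {
  local service_uuid="$1" app_name="$2" fqdn="$3"
  cat <<SQL
UPDATE service_applications SET fqdn = '$fqdn'
WHERE name = '$app_name' AND service_id = (
  SELECT id FROM services WHERE uuid = '$service_uuid'
);
SQL
}

# Example (run on the Coolify host):
# fqdn_update_sql ug0ssok4g44wocok8kws8gg8 api https://api.mosaic.woltje.com |
#   docker exec -i coolify-db psql -U coolify -d coolify
```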
## Environment Variables

All env vars are set via the Coolify API and stored in `/data/coolify/services/{uuid}/.env` on the node.

Critical vars that were missing initially:

- `BETTER_AUTH_URL`: **Required** in production. The API won't start without it. Set to `https://api.mosaic.woltje.com`.

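
Because a missing variable only shows up when the API fails to start, it may help to check the stored `.env` before starting the service. This is a minimal sketch: `has_env_var` is a hypothetical helper, and it assumes plain `KEY=value` lines without `export` prefixes.

```bash
# Hypothetical helper: read env-file content on stdin and succeed only if the
# named variable is present with a non-empty value.
has_env_var() {
  grep -qE "^$1=.+"
}

# Example (run on the Coolify host; reads the file through a container because
# localadmin cannot sudo):
# docker run --rm -v /data/coolify/services:/srv alpine \
#   cat /srv/ug0ssok4g44wocok8kws8gg8/.env | has_env_var BETTER_AUTH_URL \
#   || echo "BETTER_AUTH_URL is missing or empty"
```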
## Operations

### Restart Procedure (IMPORTANT)

Coolify's `CleanupDocker` action periodically prunes unused images. During a restart (stop → start), images become "unused" when containers stop and may be pruned before the start phase runs. This causes "No such image" failures.

**Always pre-pull images before any Coolify restart/start:**

```bash
ssh localadmin@10.1.1.44

# 1. Pre-pull all images (run in parallel)
docker pull git.mosaicstack.dev/mosaic/stack-postgres:latest &
docker pull valkey/valkey:8-alpine &
docker pull git.mosaicstack.dev/mosaic/stack-api:latest &
docker pull git.mosaicstack.dev/mosaic/stack-web:latest &
docker pull git.mosaicstack.dev/mosaic/stack-coordinator:latest &
docker pull git.mosaicstack.dev/mosaic/stack-orchestrator:latest &
wait

# 2. Remove stale internal network (prevents "already exists" errors)
docker network rm ug0ssok4g44wocok8kws8gg8_internal 2>/dev/null || true

# 3. Start via Coolify API
TOKEN="<from credentials.json>"
curl -X POST "http://10.1.1.44:8000/api/v1/services/ug0ssok4g44wocok8kws8gg8/start" \
  -H "Authorization: Bearer $TOKEN"

# 4. Verify (wait ~30s for health checks)
docker ps --filter 'name=ug0ssok4g44wocok8kws8gg8' --format 'table {{.Names}}\t{{.Status}}'
```

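
Instead of waiting a fixed ~30s in step 4, the verification could poll until every container reports healthy. The sketch below is one possible approach: `all_healthy` is a hypothetical helper, and it assumes every container in the stack defines a Docker health check (containers without one never show `(healthy)`).

```bash
# Hypothetical helper: read `docker ps` status lines on stdin and succeed only
# if there is at least one line and every line reports "(healthy)".
all_healthy() {
  local total=0 healthy=0 line
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    total=$((total + 1))
    case "$line" in
      *"(healthy)"*) healthy=$((healthy + 1)) ;;
    esac
  done
  [ "$total" -gt 0 ] && [ "$healthy" -eq "$total" ]
}

# Example polling loop (up to about one minute):
# for attempt in 1 2 3 4 5 6; do
#   docker ps --filter 'name=ug0ssok4g44wocok8kws8gg8' --format '{{.Status}}' | all_healthy && break
#   sleep 10
# done
```

Statuses such as `(unhealthy)` and `(health: starting)` do not match, so the loop keeps waiting until the checks actually pass.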
### OTEL Configuration

The coordinator's Python OTLP exporter initializes at import time, before checking `MOSAIC_TELEMETRY_ENABLED`. To suppress OTLP connection noise, set the standard OpenTelemetry env var in the service `.env`:

```
OTEL_SDK_DISABLED=true
```

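
Since env vars are managed through the Coolify API, the flag can be applied with the envs endpoint listed under Patterns & Gotchas. A minimal sketch follows: `env_payload` is a hypothetical helper, and it assumes the key and value contain no characters that need JSON escaping.

```bash
# Hypothetical helper: JSON body for PATCH /api/v1/services/{uuid}/envs.
# No escaping is performed, so keep keys and values free of quotes and backslashes.
env_payload() {
  printf '{"key":"%s","value":"%s","is_preview":false}' "$1" "$2"
}

# Example:
# curl -X PATCH "http://10.1.1.44:8000/api/v1/services/ug0ssok4g44wocok8kws8gg8/envs" \
#   -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
#   -d "$(env_payload OTEL_SDK_DISABLED true)"
```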
## Current State (2026-02-22)

### Verified Working

- All 6 containers running and healthy
- Web UI at `https://mosaic.woltje.com/login`: 200 OK
- API health at `https://api.mosaic.woltje.com/health`: healthy, PostgreSQL connected
- CORS: `access-control-allow-origin: https://mosaic.woltje.com`
- Runtime env injection: `NEXT_PUBLIC_API_URL=https://api.mosaic.woltje.com`, `AUTH_MODE=real`
- Valkey: PONG
- Coordinator: healthy, no OTLP noise (`OTEL_SDK_DISABLED=true`)
- Orchestrator: healthy
- TLS: Let's Encrypt certs (web + api), valid until May 23 2026
- Auth endpoint: `/auth/get-session` responds correctly

### Resolved Issues

- **#441**: Coordinator OTLP noise. Fixed via `OTEL_SDK_DISABLED=true`.
- **#442**: Coolify managed lifecycle. Root cause was image pruning during restart plus the CoolifyTask timeout on large pulls. Fix: pre-pull images before start.
- **#443**: Full stack connectivity. All checks pass.

### Known Limitations

- Coolify restart is NOT safe without pre-pulling images first (`CleanupDocker` prunes between stop and start).
- CoolifyTask has a ~40s timeout, so large image pulls fail if the image is not already cached.

## SSH Access

```bash
ssh localadmin@10.1.1.44

# Note: localadmin cannot sudo without TTY/password
# Use docker to access files:
docker run --rm -v /data/coolify/services:/srv alpine cat /srv/{uuid}/docker-compose.yml

# Use docker exec for Coolify DB:
docker exec -it coolify-db psql -U coolify -d coolify
```