Coolify: Fix managed service start (CoolifyTask failing) #442

Closed
opened 2026-02-22 07:21:39 +00:00 by jason.woltje · 1 comment
Owner

Root Cause

Coolify's restart operation (stop then start) combined with its periodic CleanupDocker action causes image pruning between the stop and start phases. When containers are stopped, images become 'unused' and get pruned. The subsequent start phase fails with 'No such image' errors.

Additionally, large images (400MB+) exceed CoolifyTask's ~40s timeout during pulls, so the start fails whenever an image actually needs to be downloaded rather than served from the local cache.

Resolution

Established a reliable start procedure:

  1. Pre-pull all 6 images via docker pull before Coolify operations
  2. Remove stale networks (ug0ssok4g44wocok8kws8gg8_internal) that block compose up
  3. Start via Coolify API — CoolifyTask completes in ~14s when images are cached
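The procedure above can be sketched as a shell script. The image list is a placeholder — the 6 actual image names are not recorded in this issue — and step 3 is left as a comment since the start itself goes through Coolify:

```shell
#!/usr/bin/env bash
# Sketch of the reliable start procedure. IMAGES is a placeholder list;
# substitute the service's 6 real images.
set -euo pipefail

IMAGES="registry.example.com/web:latest registry.example.com/api:latest"

# 1. Pre-pull so CoolifyTask never has to download inside its ~40s window
for img in $IMAGES; do
  docker pull "$img"
done

# 2. Drop the stale network that blocks compose up (ignore if absent)
docker network rm ug0ssok4g44wocok8kws8gg8_internal 2>/dev/null || true

# 3. Trigger the start via Coolify's UI/API; with all images cached,
#    CoolifyTask completes in ~14s instead of timing out
```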

Verified Lifecycle

  • Stop via API: all 6 containers removed
  • Start via API (with pre-pulled images): all 6 containers healthy in ~30s
  • Coolify dashboard shows running:healthy

Operational Note

Before any Coolify restart/start, always pre-pull images first. This is documented in docs/COOLIFY-DEPLOYMENT.md.

Author
Owner

Root Cause Found

The CoolifyTask failure was a red herring. The real issue was:

  1. Containers were started manually via docker compose up -d from the service directory, which created a work_internal network (from the docker compose project name "work")
  2. Containers ended up on TWO networks: ug0ssok4g44wocok8kws8gg8 (Coolify's) and work_internal (stale)
  3. Neither container had a traefik.docker.network label
  4. Traefik randomly picked which network to use per container — picked the wrong one for the web container
  5. Traefik tried to route to the web container via work_internal, which Traefik wasn't connected to → timeout
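The failure mode above can be confirmed with `docker inspect` (the container name `web` is illustrative): two network attachments plus an empty `traefik.docker.network` label means Traefik is choosing a network on its own.

```shell
# List every network the container is attached to; two entries here means
# Traefik may route over the wrong one
docker inspect web --format '{{range $name, $_ := .NetworkSettings.Networks}}{{$name}} {{end}}'

# Print the traefik.docker.network label; empty output means no pin is set
docker inspect web --format '{{index .Config.Labels "traefik.docker.network"}}'
```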

Resolution

  1. Force-removed all containers
  2. Removed stale work_internal network
  3. Recreated containers with correct project name: docker compose -p ug0ssok4g44wocok8kws8gg8 up -d
  4. Both web and API now accessible via HTTPS
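A sketch of the recovery commands, assuming the mis-started containers carry compose's `com.docker.compose.project=work` label and that this runs from the service's compose directory:

```shell
# 1. Force-remove the containers created under the wrong project name "work"
docker ps -aq --filter "label=com.docker.compose.project=work" | xargs -r docker rm -f

# 2. Remove the stale network left behind by that project
docker network rm work_internal

# 3. Recreate under Coolify's project name so containers land on the
#    ug0ssok4g44wocok8kws8gg8 network Traefik expects
docker compose -p ug0ssok4g44wocok8kws8gg8 up -d
```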

Remaining

Coolify's managed start (CoolifyTask) still needs to be verified — the containers were started via docker CLI, not through Coolify's UI/API. Coolify should be able to manage restarts/redeploys going forward.
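Once a Coolify-managed start has been tried, one quick sanity check is that every running container sits only on Coolify's network (a sketch; container names are taken from whatever is running):

```shell
# Print each running container with its network attachments; any line
# still mentioning work_internal indicates the stale network survived
for c in $(docker ps --format '{{.Names}}'); do
  nets=$(docker inspect "$c" --format '{{range $n, $_ := .NetworkSettings.Networks}}{{$n}} {{end}}')
  echo "$c: $nets"
done
```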


Reference: mosaic/stack#442