Compare commits

..

2 Commits

Author SHA1 Message Date
Jarvis
7210b7391a fix(ci): gitignore vite/vitest *.timestamp-*.mjs to stop turbo traversal race
All checks were successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
The push/ci lint step intermittently failed with:

  x Package traversal error: .../packages/macp/vitest.config.ts.timestamp-
    <n>.mjs: IO error ... No such file or directory (os error 2)

vite/vitest/esbuild write a transient *.timestamp-*.mjs next to a TS
config while loading it, then unlink it. The files were untracked but not
ignored, so turbo's package traversal hashed them and raced the unlink.
Ignoring them excludes them from turbo's input set and removes the race.

Same class of fix as the pglite timeout/OOM change in this PR: transient
test tooling artifacts destabilising CI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 23:37:33 -05:00
Jarvis
80570f7040 fix(db): stop pglite migration tests flaking CI on timeout + WASM OOM
Some checks failed
ci/woodpecker/push/ci Pipeline failed
ci/woodpecker/pr/ci Pipeline was successful
packages/db's migrate.test.ts spins up a real PGlite (WASM Postgres)
instance per test and applies the full drizzle migration set. Each case
takes ~3-5s locally and longer on CI, where turbo runs ~20 packages'
suites concurrently. Two failure modes resulted, bouncing between the
push/ci and pr/ci pipelines on identical SHAs:

  FAIL src/migrate.test.ts > runPgliteMigrations > ...
    Error: Test timed out in 5000ms.
    -> memory access out of bounds  (wasm:/wasm/...)

1. The 5s vitest default timeout expires mid-migration -> phantom
   'Test timed out in 5000ms'. Raise testTimeout/hookTimeout to 120s so
   legitimately-slow migrations finish.
2. Each PGlite WASM heap is multi-hundred-MB (RSS ~705MB for this file
   alone); parallel forks multiply the peak and tip the runner into the
   WASM OOM. Pin the package to a single fork so only one instance is
   resident at a time.

Also register packages/db/vitest.config.ts in eslint's allowDefaultProject
(alongside the gateway/storage vitest configs) so the typed lint can parse
the now-non-trivial config.

Verified: full db suite green 3x locally with the new config; each run
~13s, no timeouts, no OOM. eslint clean on both files.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 23:29:41 -05:00
4 changed files with 25 additions and 79 deletions

7
.gitignore vendored
View File

@@ -15,3 +15,10 @@ infra/step-ca/dev-password
# Scratch dirs created by the framework git-wrapper shell test harnesses
.mosaic-test-work/
# Transient config files vite/vitest/esbuild write next to a *.config.ts while
# loading it, then unlink. They are untracked but were not ignored, so turbo's
# package traversal hashed them and intermittently failed CI with "Package
# traversal error: ... .timestamp-*.mjs: No such file or directory" when the
# file vanished mid-scan. Ignoring them removes the race.
*.timestamp-*.mjs

View File

@@ -28,6 +28,7 @@ export default tseslint.config(
'apps/web/e2e/helpers/*.ts',
'apps/web/playwright.config.ts',
'apps/gateway/vitest.config.ts',
'packages/db/vitest.config.ts',
'packages/storage/vitest.config.ts',
'packages/mosaic/__tests__/*.ts',
'tools/federation-harness/*.ts',

View File

@@ -4,5 +4,22 @@ export default defineConfig({
test: {
globals: true,
environment: 'node',
// The migration suite spins up a real PGlite (WASM Postgres) instance per
// test and applies the full drizzle migration set. Each case legitimately
// takes ~5s locally and considerably longer on CI, where turbo runs many
// packages' test suites concurrently. The 5s vitest default then expires
// mid-migration and the run fails as a phantom "Test timed out in 5000ms"
// (often surfacing the underlying WASM `memory access out of bounds` when
// the heap is starved). Give migrations real headroom.
testTimeout: 120_000,
hookTimeout: 120_000,
// Each PGlite instance carries a multi-hundred-MB WASM heap. Running test
// files in parallel forks multiplies that peak and is what tips the CI
// runner into the WASM OOM. A single fork keeps only one instance resident
// at a time — slightly slower, but deterministic.
pool: 'forks',
poolOptions: {
forks: { singleFork: true },
},
},
});

View File

@@ -122,85 +122,6 @@ fi
mkdir -p "$MOSAIC_AGENT_WORKDIR"
# ── Pre-trust the workdir for the Claude runtime ─────────────────────────────
# Claude Code shows a one-time "Is this a project you trust?" folder-trust gate
# the first time it opens a directory. A fleet-launched agent has no human to
# answer it, so the pane stalls forever at the prompt while its heartbeat keeps
# reporting "healthy" (the pane process IS alive — it's just blocked).
#
# IMPORTANT: --dangerously-skip-permissions does NOT bypass this gate, and
# neither does `trustedProjectDirectories` in settings.json (verified empirically
# 2026-06-24). The ONLY thing the gate honors is the per-project record in
# ~/.claude.json: projects["<dir>"].hasTrustDialogAccepted == true (exactly what
# answering the prompt writes). So we pre-seed that record here.
#
# Idempotent, atomic, best-effort: any failure is non-fatal (the agent still
# launches — worst case it stalls on the gate, i.e. the pre-fix status quo).
# Only the claude runtime needs this; codex/pi have no such gate.
_ensure_claude_workdir_trusted() {
local workdir="$1"
# The path claude keys on is the resolved cwd it is launched in.
local rp
rp=$(cd "$workdir" 2>/dev/null && pwd -P) || rp="$workdir"
# ~/.claude.json lives next to the claude config dir; honor CLAUDE_CONFIG_DIR.
local claude_json="${MOSAIC_CLAUDE_JSON:-${CLAUDE_CONFIG_DIR:+$CLAUDE_CONFIG_DIR/.claude.json}}"
claude_json="${claude_json:-$HOME/.claude.json}"
if ! command -v python3 >/dev/null 2>&1; then
echo "WARNING: python3 not found; cannot pre-trust '$rp' for claude (agent may stall on the folder-trust gate)" >&2
return 1
fi
# Serialize concurrent agent launches that share ~/.claude.json (flock if available).
local lock="${claude_json}.mosaic-lock"
_seed() {
MOSAIC_CJ="$claude_json" MOSAIC_TRUST_DIR="$rp" python3 - <<'PY'
import json, os, sys, tempfile
cj = os.environ["MOSAIC_CJ"]
d = os.environ["MOSAIC_TRUST_DIR"]
try:
data = json.load(open(cj)) if os.path.exists(cj) else {}
if not isinstance(data, dict):
data = {}
except Exception:
# Never corrupt an unreadable/partial file — bail without writing.
sys.exit(2)
projects = data.setdefault("projects", {})
entry = projects.get(d)
if not isinstance(entry, dict):
entry = {}
projects[d] = entry
if entry.get("hasTrustDialogAccepted") is True:
sys.exit(0) # already trusted — nothing to do
entry["hasTrustDialogAccepted"] = True
tmp_dir = os.path.dirname(cj) or "."
fd, tmp = tempfile.mkstemp(dir=tmp_dir, prefix=".claude.json.mosaic.")
try:
with os.fdopen(fd, "w") as f:
json.dump(data, f, indent=2)
os.replace(tmp, cj) # atomic
except Exception:
try:
os.unlink(tmp)
except OSError:
pass
sys.exit(3)
PY
}
if command -v flock >/dev/null 2>&1; then
( flock 9; _seed ) 9>"$lock" 2>/dev/null || _seed
else
_seed
fi
}
case "$MOSAIC_AGENT_RUNTIME" in
claude)
_ensure_claude_workdir_trusted "$MOSAIC_AGENT_WORKDIR" \
|| echo "WARNING: could not pre-trust workdir for claude agent $AGENT_NAME" >&2
;;
esac
# ── Launch the tmux session (no exec — we continue to wire the heartbeat) ────
_tmux new-session -d -s "$AGENT_NAME" -c "$MOSAIC_AGENT_WORKDIR" \
bash -c "$PANE_SHELL_SNIPPET"