fix(framework/tools): wrapper hardening — TLS validation, cred-path fallback, no-CI fast-exit #551

Open
jason.woltje wants to merge 2 commits from fix/wrapper-hardening-tls-credpath-cicwait into main
Owner

Closes #550

One wrapper-hardening PR fixing three authorized defects in packages/mosaic/framework/tools/. Design pre-decided by the lead; implemented faithfully.

F-03 — TLS now validated for public hosts; -k retained only for private-IP / opt-in

  • New _mosaic_tls_opt helper in _lib/credentials.sh decides the curl TLS flag per target URL:
    • Public FQDN hosts → validate (no -k). MITM matters on the WAN — these were previously unverified.
    • Private-network IP literals → -k (10., 127., 192.168., 172.16–31.). These are self-signed certs on the trusted LAN (Portainer 172.16.0.253:10443, Proxmox 172.16.1.108:8007 / 172.20.1.20:8007, etc.) that legitimately need -k.
    • MOSAIC_INSECURE_TLS opt-in → -k for any explicit override.
  • mosaic_http / mosaic_http_post / mosaic_http_patch now use curl -sS $_tls: -s is kept, -S is added so real errors surface, and $_tls is deliberately left unquoted so that an empty value expands to no argument at all rather than an empty string. Headers/body/-X unchanged.
  • Woodpecker scripts (_lib.sh, pipeline-status.sh, pipeline-list.sh, pipeline-trigger.sh) talk only to the two CI hosts (ci.uscllc.com, ci.mosaicstack.dev), both public and TLS-valid (spot-verified HTTP 200 with no -k). For these, -sk → -sS (straight removal of -k; no helper needed since they never hit private hosts).
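
The per-URL decision described above could look roughly like the following. This is a minimal sketch, not the actual _mosaic_tls_opt: the function name mosaic_tls_opt_sketch is hypothetical, treating MOSAIC_INSECURE_TLS=1 as the opt-in value is an assumption taken from the test notes, and IPv6 literals are not handled.

```shell
# Hypothetical sketch of a per-URL TLS decision; the real helper may differ.
mosaic_tls_opt_sketch() {
  local url="$1" host

  # Explicit opt-in forces -k for any host (assumed here to be the value "1").
  if [ "${MOSAIC_INSECURE_TLS:-0}" = "1" ]; then
    printf '%s' '-k'
    return 0
  fi

  # Reduce the URL to its host: drop scheme, path, userinfo, then port.
  host="${url#*://}"
  host="${host%%/*}"
  host="${host##*@}"
  host="${host%%:*}"

  case "$host" in
    # Anything containing a character other than digits and dots is a
    # hostname, not an IPv4 literal: print nothing so curl validates TLS.
    *[!0-9.]*) return 0 ;;
    # Private-network IPv4 literals keep -k (self-signed certs on the LAN).
    10.*|127.*|192.168.*) printf '%s' '-k' ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) printf '%s' '-k' ;;
  esac
  return 0
}

# Intended call shape (not executed here):
#   _tls="$(mosaic_tls_opt_sketch "$url")"
#   curl -sS $_tls "$url"    # $_tls unquoted: empty expands to no argument
```

Checking for an IPv4 literal before matching prefixes keeps a public hostname that merely begins with "10." from being treated as a private address.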

F-02 — cred-path fallback chain; legacy stays as final fallback

MOSAIC_CREDENTIALS_FILE now resolves via: env first → ~/.config/mosaic/credentials.json → legacy ~/src/jarvis-brain/credentials.json. The legacy path is kept as the final fallback so the running fleet keeps working — the default is NOT flipped out from under it. (Verified on this fleet: with env unset, resolves to the legacy path since the standard config path doesn't yet exist, and load_credentials woodpecker succeeds.)
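
The resolution order can be sketched as below. This is an illustration under assumptions, not the code in credentials.sh: the function name resolve_credentials_file_sketch is hypothetical, and "exists" is assumed to mean a regular-file test on the standard config path.

```shell
# Hypothetical sketch of the fallback chain; the real logic may differ.
resolve_credentials_file_sketch() {
  local standard="$HOME/.config/mosaic/credentials.json"
  local legacy="$HOME/src/jarvis-brain/credentials.json"

  if [ -n "${MOSAIC_CREDENTIALS_FILE:-}" ]; then
    # 1. An explicit environment override always wins.
    printf '%s\n' "$MOSAIC_CREDENTIALS_FILE"
  elif [ -f "$standard" ]; then
    # 2. The standard config path, once it has been created.
    printf '%s\n' "$standard"
  else
    # 3. The legacy path stays the final fallback for the running fleet.
    printf '%s\n' "$legacy"
  fi
}
```

With the env variable unset and no file at the standard path, the sketch prints the legacy path, which matches the fleet verification described above.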

F-06 — pr-ci-wait.sh fast-exits no-CI repos

extract_state_from_status_json now emits a distinct no-status when there is genuinely no pipeline/status of any kind (empty state AND empty statuses) — distinct from unknown (ambiguous, keeps polling). The poll loop counts consecutive no-status results and, after 3 in a row, fast-exits 0 with [INFO] no CI configured for this repo/commit (distinct from a real failure). Modeled on git/ci-queue-wait.sh's existing no-status → proceed idiom. Repos that DO have pipelines are unchanged — any pipeline signal resets the streak, and pending still waits to timeout.
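
The streak logic can be illustrated without any network calls by replaying a list of observed states. This is a sketch under assumptions: simulate_poll_sketch is a hypothetical name, the [OK]/[FAIL]/[TIMEOUT] messages and the exit code 124 for running out of polls are placeholders, and resetting the streak on unknown is an assumption (the description only says unknown keeps polling).

```shell
NO_CI_MAX=3  # consecutive no-status polls required before the fast-exit

# Hypothetical replay of the poll loop: each argument is one observed state.
simulate_poll_sketch() {
  local streak=0 state
  for state in "$@"; do
    case "$state" in
      no-status)
        streak=$((streak + 1))
        if [ "$streak" -ge "$NO_CI_MAX" ]; then
          echo "[INFO] no CI configured for this repo/commit"
          return 0
        fi
        ;;
      success)
        echo "[OK] CI passed"            # placeholder message
        return 0
        ;;
      failure|error)
        echo "[FAIL] CI failed"          # placeholder message
        return 1
        ;;
      *)
        # pending, unknown, or any other signal: reset the streak, keep polling.
        streak=0
        ;;
    esac
  done
  echo "[TIMEOUT] gave up waiting"       # placeholder for the real timeout path
  return 124
}
```

For example, `simulate_poll_sketch no-status no-status pending no-status no-status` never reaches three in a row, so it falls through to the timeout path instead of fast-exiting.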

Not swept

F-04 was deliberately NOT addressed in this PR.

Testing

  • bash -n on all 6 touched files — clean.
  • _mosaic_tls_opt: ci.uscllc.com, git.mosaicstack.dev, m2m-api.uscllc.com → empty; 172.16.0.253:10443 (portainer), 172.16.1.108:8007, 172.20.1.20:8007, 192.168.1.5, 10.0.0.1 → -k; MOSAIC_INSECURE_TLS=1 on a public host → -k.
  • F-03 live (public, no -k): pipeline-status.sh -r usc/uconnect -n 3943 -a usc -f json through the patched code returned full pipeline JSON over a validated TLS connection.
  • F-03 private path: portainer base_url + all private LAN hosts confirmed to still receive -k.
  • F-02: cred-path fallback resolves correctly with env unset; load_credentials woodpecker exports successfully (no token values printed).
  • F-06: extractor maps empty payloads → no-status, real states correctly; counter unit-tested across 4 cases (pure no-CI fast-exits after 3; pending mid-stream resets streak; pipeline-bearing repo never fast-exits; real failure still exits 1).

Independent review note

Codex review raised two findings, both adjudicated against the documented environment / decided design:

  • (blocker) no-status could falsely pass delayed-CI PRs — kept per the lead's decided design; mirrors the established ci-queue-wait.sh idiom. The consecutive-3 requirement + streak-reset on any pipeline signal mitigates transient/delayed publication. Flagged here for visibility.
  • (should-fix) Woodpecker lost -k — false positive against the cert evidence: both CI hosts are public and TLS-valid (spot-verified). The generic helpers retain MOSAIC_INSECURE_TLS for any future edge case.

Fixes #550

jason.woltje added 1 commit 2026-06-18 19:03:14 +00:00
fix(framework/tools): wrapper hardening — TLS validation, cred-path fallback, no-CI fast-exit (#550)
Some checks failed
ci/woodpecker/push/ci Pipeline was canceled
ci/woodpecker/pr/ci Pipeline was canceled
b90aec2024
F-03: validate TLS by default. New _mosaic_tls_opt helper in _lib/credentials.sh
returns -k only for private-network IP literals (trusted LAN) or an explicit
MOSAIC_INSECURE_TLS opt-in; generic mosaic_http/_post/_patch helpers now use
`curl -sS $_tls` instead of `curl -sk`. Woodpecker scripts (_lib.sh,
pipeline-status/list/trigger.sh) talk only to the two public/valid CI hosts, so
`-sk` is changed to `-sS` (straight -k removal, no helper).

F-02: credentials.sh resolves MOSAIC_CREDENTIALS_FILE via a fallback chain —
env first, then ~/.config/mosaic/credentials.json, then the legacy
~/src/jarvis-brain/credentials.json retained as final fallback so the running
fleet keeps working.

F-06: pr-ci-wait.sh distinguishes a genuine no-CI condition (empty state AND no
statuses) as a new `no-status` state and fast-exits 0 after 3 consecutive empty
polls with a clear "no CI configured" message. Repos that DO have pipelines are
unaffected — any pipeline signal resets the streak and pending still waits.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kt2D8TsnDwhtzEAPijsNmR
jason.woltje added 1 commit 2026-06-18 19:18:39 +00:00
fix(pr-ci-wait): CI-history primary tier — close webhook-lag false-green (#550)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
9e8a9cfa8d
F-06 follow-up per Mos ruling. The no-CI fast-exit was a pure empty-poll streak
(NO_CI_MAX×interval ≈ 45s), so a slow-to-register pipeline (webhook/queue lag)
looked like 'no CI' and could false-green a merge gate before the pipeline existed.

Two-tier no-CI determination:
- PRIMARY: probe the repo's DEFAULT BRANCH commit status once at startup. If it
  has CI history, the repo runs CI → an empty status on the PR head means the
  pipeline has not REGISTERED yet → never fast-green; poll until it registers or
  timeout (both safe). Closes the webhook-lag false-green.
- SECONDARY: the empty-poll streak fast-exit now applies ONLY to genuinely CI-less
  repos (default branch also has no CI history). Preserves the original no-CI win.
- Probe failure → conservative REPO_HAS_CI=1 (assume CI; wait-then-timeout beats
  false-green). All early returns are explicit 'return 0' + guarded call so the
  probe can never abort under set -e.
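
The two tiers above can be sketched as follows. This is an illustration only: determine_repo_has_ci_sketch, may_fast_exit_sketch, probe_branch_state_stub, and the PROBE_FAKE variable are hypothetical names, and the stub stands in for the real commit-status lookup on the default branch.

```shell
NO_CI_MAX=3

# Hypothetical stand-in for the real default-branch status probe.
# PROBE_FAKE=fail simulates a probe error; any other value is the probed state.
probe_branch_state_stub() {
  [ "${PROBE_FAKE:-fail}" != "fail" ] || return 1
  printf '%s\n' "$PROBE_FAKE"
}

# PRIMARY tier: decide once at startup whether the repo runs CI at all.
determine_repo_has_ci_sketch() {
  local default_branch="$1" probe_state
  REPO_HAS_CI=1                          # conservative default: assume CI
  [ -n "$default_branch" ] || return 0   # empty branch name: stay conservative
  probe_state="$(probe_branch_state_stub "$default_branch")" || return 0
  if [ "$probe_state" = "no-status" ]; then
    REPO_HAS_CI=0                        # default branch has no CI history
  fi
  return 0                               # explicit, so set -e cannot abort
}

# SECONDARY tier: the streak fast-exit applies only to CI-less repos.
may_fast_exit_sketch() {
  local streak="$1"
  [ "$REPO_HAS_CI" -eq 0 ] && [ "$streak" -ge "$NO_CI_MAX" ]
}
```

A guarded call such as `determine_repo_has_ci_sketch "$branch" || true` mirrors the commit's note that the probe must never abort the script under set -e.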

Verified: bash -n + shellcheck clean; behavioral harness covers established-repo
(stays 1), CI-less (→0), empty-branch/probe-fail (conservative 1), and the
no-status gate (has-CI never fast-greens, CI-less fast-exits).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kt2D8TsnDwhtzEAPijsNmR
This pull request can be merged automatically.
This branch is out-of-date with the base branch

Reference: mosaicstack/stack#551