stack/docs/scratchpads/362-auth-session-chain-debug.md
Jason Woltje 8424a28faa
fix(auth): use set_config for transaction-scoped RLS context
2026-02-18 23:23:15 -06:00

362 - Auth Session Chain Debug (Authentik -> BetterAuth -> API Guard)

Context

  • Date (UTC): 2026-02-19
  • Environment under test: production domains
    • Web: https://app.mosaicstack.dev/login
    • API: https://api.mosaicstack.dev
    • IdP: https://auth.diversecanvas.com
  • Tooling: Playwright MCP + Chromium

Problem Statement

Users can complete the Authentik login and consent steps, but the Mosaic web app then returns to the login page and remains unauthenticated.

Timeline and Evidence

  1. Initial reproduction from web login:

    • POST /auth/sign-in/oauth2 returned 200 with Authentik authorize URL.
    • Authentik login flow and consent screen loaded correctly.
  2. First callback failure mode (before jarvis email fix):

    • Callback ended at API error redirect with error=email_is_missing.
    • Result URL: https://api.mosaicstack.dev/?error=email_is_missing.
  3. User updated Authentik account:

    • jarvis account email set to jarvis@mosaic.local.
    • email_is_missing failure no longer occurs.
  4. Current callback behavior (after email fix):

    • GET /auth/oauth2/callback/authentik?code=...&state=... returns 302 to https://app.mosaicstack.dev/.
    • Callback sets BetterAuth cookies:
      • __Secure-better-auth.state=...; Max-Age=0; ...
      • __Secure-better-auth.session_token=...; Max-Age=604800; Path=/; HttpOnly; Secure; SameSite=Lax
    • Browser cookie jar confirms session cookie present for api.mosaicstack.dev.
  5. Session validation mismatch (critical):

    • BetterAuth direct session endpoint succeeds:
      • GET /auth/get-session -> 200 with session payload.
    • Guarded API session endpoint fails:
      • GET /auth/session -> 401 with {"message":"Invalid or expired session", ...}
    • Reproduced repeatedly in same browser context immediately after callback.

Config Sync Notes

User synced local files with deployed Portainer stack:

  • .env updated with deployed values.
  • docker-compose.swarm.portainer.yml changed:
    • Removed BETTER_AUTH_URL env mapping from API service.

Observed auth behavior after sync:

  • Improvement: the email_is_missing callback error no longer occurs.
  • Remaining failure: /auth/session still returns 401 despite valid BetterAuth cookie and successful /auth/get-session.

Root Cause Hypothesis (Strong)

AuthGuard extracts BetterAuth session cookie token correctly, but AuthService.verifySession() validates it using Authorization: Bearer <token> instead of a BetterAuth cookie/header context.

Relevant code paths:

  • apps/api/src/auth/guards/auth.guard.ts
    • extracts __Secure-better-auth.session_token / better-auth.session_token
  • apps/api/src/auth/auth.service.ts
    • verifySession() calls auth.api.getSession({ headers: { authorization: "Bearer ..." } })

Why this matches evidence:

  • /auth/get-session (native BetterAuth endpoint reading request cookie) succeeds.
  • /auth/session (custom guard + verify path) fails for same browser session.
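
The guard-side extraction described above can be sketched as follows. This is a minimal illustration, not the code in auth.guard.ts: the helper name extractSessionToken and its parsing details are assumptions, and only the two cookie names come from these notes. The value is returned exactly as received, without decoding.

```typescript
// Cookie names taken from the notes above; everything else is illustrative.
const SESSION_COOKIE_NAMES = [
  "__Secure-better-auth.session_token",
  "better-auth.session_token",
];

/**
 * Hypothetical helper: returns the raw session cookie value from a Cookie
 * request header, or undefined when none of the known names is present.
 */
function extractSessionToken(cookieHeader: string | undefined): string | undefined {
  if (!cookieHeader) return undefined;

  const pairs = new Map<string, string>();
  for (const part of cookieHeader.split(";")) {
    // Split on the first "=" only, since the value itself may contain "=".
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    const name = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    if (!pairs.has(name)) pairs.set(name, value);
  }

  // Prefer the __Secure- prefixed cookie, then the unprefixed one.
  for (const name of SESSION_COOKIE_NAMES) {
    const value = pairs.get(name);
    if (value) return value;
  }
  return undefined;
}
```

Under the hypothesis above, this step works as intended; the mismatch is in what verifySession() does with the extracted value afterwards.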

Next Actions

  1. Fix verifySession() to validate using BetterAuth-compatible cookie header candidates first, with bearer fallback for API clients.
  2. Add/update unit tests in auth.service.spec.ts to cover cookie-first validation and bearer fallback.
  3. Re-run targeted API auth tests.
  4. Re-run Playwright auth chain to confirm:
    • callback sets cookie
    • /auth/session returns 200
    • web app transitions out of /login.

Implementation Update (2026-02-19)

Completed items:

  1. Updated backend session verification logic:

    • File: apps/api/src/auth/auth.service.ts
    • verifySession() now tries session resolution in this order:
      • cookie: __Secure-better-auth.session_token=<token>
      • cookie: better-auth.session_token=<token>
      • cookie: __Host-better-auth.session_token=<token>
      • authorization: Bearer <token> (fallback)
    • Added helper methods:
      • buildSessionHeaderCandidates()
      • isExpectedAuthError()
  2. Added/updated tests:

    • File: apps/api/src/auth/auth.service.spec.ts
    • Added RED->GREEN test:
      • should validate session token using secure BetterAuth cookie header
    • Updated fallback coverage test:
      • should fall back to Authorization header when cookie-based lookups miss
  3. Verification:

    • Command: pnpm --filter @mosaic/api test -- src/auth/auth.service.spec.ts
    • Result: pass (all tests green).
    • Command: pnpm --filter @mosaic/api lint
    • Result: pass.
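
The cookie-first resolution order from item 1 can be sketched roughly as below. This is an assumption-laden reconstruction, not the committed code: buildSessionHeaderCandidates is named in the notes but its signature is guessed here, resolveSession and SessionLookup are hypothetical, and the lookup callback stands in for auth.api.getSession. Error handling via isExpectedAuthError is omitted because its behavior is not described in these notes.

```typescript
type HeaderCandidate = Record<string, string>;

// Candidate order as listed above: three cookie names, then bearer fallback.
function buildSessionHeaderCandidates(token: string): HeaderCandidate[] {
  return [
    { cookie: `__Secure-better-auth.session_token=${token}` },
    { cookie: `better-auth.session_token=${token}` },
    { cookie: `__Host-better-auth.session_token=${token}` },
    { authorization: `Bearer ${token}` },
  ];
}

// Hypothetical stand-in for the session lookup call.
type SessionLookup<S> = (headers: HeaderCandidate) => Promise<S | null>;

// Tries each candidate in order and returns the first session found.
async function resolveSession<S>(
  token: string,
  lookup: SessionLookup<S>,
): Promise<S | null> {
  for (const headers of buildSessionHeaderCandidates(token)) {
    const session = await lookup(headers);
    if (session) return session;
  }
  return null;
}
```

Trying the cookie forms first matches how the browser session is presented to BetterAuth, while the bearer entry keeps non-browser API clients working.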

Remaining step (requires deploy):

  • Redeploy API with this patch and rerun live Playwright flow on app.mosaicstack.dev to confirm /auth/session returns 200 after callback.

Playwright Re-Check (2026-02-19, later run)

Live flow evidence after previous deploy attempt:

  1. OAuth callback succeeds:

    • GET https://api.mosaicstack.dev/auth/oauth2/callback/authentik?code=...&state=... -> 302
    • Redirect target observed: https://app.mosaicstack.dev/
    • Browser cookie jar includes:
      • __Secure-better-auth.session_token on api.mosaicstack.dev (HttpOnly, Secure, SameSite=Lax)
  2. Session bootstrap still fails immediately:

    • GET https://api.mosaicstack.dev/auth/session -> 500
    • Response body shape:
      • {"success":false,"message":"An unexpected error occurred","errorId":"...","path":"/auth/session","statusCode":500}
    • Web app returns to login because session fetch fails.
  3. Frontend version mismatch observed:

    • Live POST /auth/sign-in/oauth2 response from login flow still shows callback URL pointing to /dashboard.
    • Current repository login page uses callback URL /.
    • This indicates deployed web image is older than current develop code (or stale image tag in runtime).

Additional Code Fix Applied Locally (pending push/deploy)

Refined cookie candidate construction in API session verification:

  • File: apps/api/src/auth/auth.service.ts
    • Removed URL-encoding of session token when constructing cookie headers.
    • Cookie candidates now pass raw token value exactly as extracted from incoming cookie.

Why:

  • BetterAuth cookie tokens can contain characters like /, +, and =.
  • Re-encoding these values can mutate token bytes and cause lookup/parse failures.
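
A small illustration of why re-encoding is unsafe, using made-up token values (not real session data). It only demonstrates what encodeURIComponent does to these characters; how BetterAuth parses the resulting cookie is not shown here.

```typescript
// Illustrative values only.
const rawToken = "abc/def+ghi=";

// Re-encoding rewrites "/", "+" and "=" as percent escapes.
const reEncoded = encodeURIComponent(rawToken);
// reEncoded === "abc%2Fdef%2Bghi%3D"

// A value that is already percent-encoded gets double-encoded.
const alreadyEncoded = "abc%2Fdef";
const doubleEncoded = encodeURIComponent(alreadyEncoded);
// doubleEncoded === "abc%252Fdef"

// Passing the value through untouched keeps the cookie header byte-identical
// to what the browser sent.
const rawCookieHeader = `__Secure-better-auth.session_token=${rawToken}`;
```

In both cases the string handed to the session lookup differs from the one the browser sent, which is consistent with the lookup/parse failures noted above.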

Regression test added:

  • File: apps/api/src/auth/auth.service.spec.ts
    • should preserve raw cookie token value without URL re-encoding

Deployment actions executed:

  1. Pushed auth cookie fix commit to develop.
  2. Waited for Woodpecker pipeline success (mosaic/stack, build #514).
  3. On 10.1.1.90:
    • Ran /home/localadmin/mosaic/pull_all.sh.
    • Updated swarm services to :dev images:
      • stack_api
      • stack_web
      • stack_coordinator
      • stack_orchestrator
    • Verified service convergence.

Post-deploy behavior:

  • Initial /auth/session without cookies now returns 401 (expected).
  • OAuth callback succeeds and sets BetterAuth session cookie.
  • /auth/session still fails after callback, now due to a new backend 500.

New Root Cause Discovered (RLS interceptor SQL)

Live stack_api logs showed:

  • Auth guard successfully finds session cookie:
    • Session cookie found: __Secure-better-auth.session_token
  • Then failure inside RLS setup:
    • PostgreSQL 42601 syntax error at or near $1
    • Source: RlsContextInterceptor raw SQL while setting context vars
    • Request ends as 500 Request processing failed on /auth/session

Cause:

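  • SET LOCAL app.current_user_id = ${userId} was sent to PostgreSQL as SET LOCAL ... = $1, because the raw SQL template turns each interpolated value into a bind parameter.
  • PostgreSQL does not accept bind placeholders in SET assignment syntax, hence the 42601 syntax error at $1.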
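
The parameterization effect can be reproduced with a toy tagged template. The sql function below is hypothetical and only mimics, in simplified form, how tagged-template raw SQL helpers generally replace interpolations with numbered placeholders; it is not the API used by RlsContextInterceptor.

```typescript
interface ParameterizedQuery {
  text: string;
  values: unknown[];
}

// Hypothetical minimal tagged template: each interpolated value becomes $1, $2, ...
function sql(strings: TemplateStringsArray, ...values: unknown[]): ParameterizedQuery {
  let text = strings[0];
  values.forEach((_, i) => {
    text += `$${i + 1}${strings[i + 1]}`;
  });
  return { text, values };
}

const userId = "user-123"; // illustrative value
const query = sql`SET LOCAL app.current_user_id = ${userId}`;
// query.text   === "SET LOCAL app.current_user_id = $1"
// query.values contains "user-123"
```

The resulting statement text is what PostgreSQL rejects: SET is a utility command and its assignment cannot take a bind placeholder.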

RLS Fix Applied Locally (pending commit/deploy)

Files updated:

  • apps/api/src/common/interceptors/rls-context.interceptor.ts

    • Replaced SET LOCAL statements with parameter-safe, transaction-local calls:
      • SELECT set_config('app.current_user_id', ${userId}, true)
      • SELECT set_config('app.current_workspace_id', ${workspaceId}, true)
    • Keeps transaction scoping: the third argument (is_local = true) limits the setting to the current transaction, matching SET LOCAL.
  • apps/api/src/common/interceptors/rls-context.interceptor.spec.ts

    • Updated expected SQL template fragments to set_config(...).
  • apps/api/src/common/interceptors/rls-context.integration.spec.ts

    • Updated integration expectations to set_config(...).
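
A rough sketch of the parameter-safe form, written as plain query descriptors so it does not depend on any particular database client. The function name buildRlsContextQueries and the ParameterizedQuery shape are assumptions for illustration; only the two set_config calls come from the notes above.

```typescript
interface ParameterizedQuery {
  text: string;
  values: string[];
}

// Hypothetical helper: builds the set_config statements for one request.
// set_config is an ordinary function call, so its value argument can be bound.
function buildRlsContextQueries(
  userId: string,
  workspaceId?: string,
): ParameterizedQuery[] {
  const queries: ParameterizedQuery[] = [
    {
      text: "SELECT set_config('app.current_user_id', $1, true)",
      values: [userId],
    },
  ];
  if (workspaceId !== undefined) {
    queries.push({
      text: "SELECT set_config('app.current_workspace_id', $1, true)",
      values: [workspaceId],
    });
  }
  return queries;
}
```

Because the setting is transaction-local, these statements only have an effect when executed inside the same transaction as the queries that rely on the RLS context.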