# 362 - Auth Session Chain Debug (Authentik -> BetterAuth -> API Guard)

## Context

- Date (UTC): 2026-02-19
- Environment under test: production domains
  - Web: `https://app.mosaicstack.dev/login`
  - API: `https://api.mosaicstack.dev`
  - IdP: `https://auth.diversecanvas.com`
- Tooling: Playwright MCP + Chromium

## Problem Statement

Users can complete Authentik login and consent, but Mosaic web app returns to login and remains unauthenticated.

## Timeline and Evidence

1. Initial reproduction from web login:
   - `POST /auth/sign-in/oauth2` returned `200` with Authentik authorize URL.
   - Authentik login flow and consent screen loaded correctly.
2. First callback failure mode (before `jarvis` email fix):
   - Callback ended at API error redirect with `error=email_is_missing`.
   - Result URL: `https://api.mosaicstack.dev/?error=email_is_missing`.
3. User updated Authentik account:
   - `jarvis` account email set to `jarvis@mosaic.local`.
   - `email_is_missing` failure no longer occurs.
4. Current callback behavior (after email fix):
   - `GET /auth/oauth2/callback/authentik?code=...&state=...` returns `302` to `https://app.mosaicstack.dev/`.
   - Callback sets BetterAuth cookies:
     - `__Secure-better-auth.state=...; Max-Age=0; ...`
     - `__Secure-better-auth.session_token=...; Max-Age=604800; Path=/; HttpOnly; Secure; SameSite=Lax`
   - Browser cookie jar confirms session cookie present for `api.mosaicstack.dev`.
5. Session validation mismatch (critical):
   - BetterAuth direct session endpoint succeeds:
     - `GET /auth/get-session` -> `200` with session payload.
   - Guarded API session endpoint fails:
     - `GET /auth/session` -> `401` with `{"message":"Invalid or expired session", ...}`
   - Reproduced repeatedly in same browser context immediately after callback.

## Config Sync Notes

User synced local files with deployed Portainer stack:

- `.env` updated with deployed values.
- `docker-compose.swarm.portainer.yml` changed:
  - Removed `BETTER_AUTH_URL` env mapping from API service.


Observed auth behavior after sync:

- Improvement: removed `email_is_missing` callback error.
- Remaining failure: `/auth/session` still returns 401 despite valid BetterAuth cookie and successful `/auth/get-session`.

## Root Cause Hypothesis (Strong)

`AuthGuard` extracts BetterAuth session cookie token correctly, but `AuthService.verifySession()` validates it using `Authorization: Bearer <token>` instead of a BetterAuth cookie/header context.

Relevant code paths:

- `apps/api/src/auth/guards/auth.guard.ts`
  - extracts `__Secure-better-auth.session_token` / `better-auth.session_token`
- `apps/api/src/auth/auth.service.ts`
  - `verifySession()` calls `auth.api.getSession({ headers: { authorization: "Bearer ..." } })`

Why this matches evidence:

- `/auth/get-session` (native BetterAuth endpoint reading request cookie) succeeds.
- `/auth/session` (custom guard + verify path) fails for same browser session.

## Next Actions

1. Fix `verifySession()` to validate using BetterAuth-compatible cookie header candidates first, with bearer fallback for API clients.
2. Add/update unit tests in `auth.service.spec.ts` to cover cookie-first validation and bearer fallback.
3. Re-run targeted API auth tests.
4. Re-run Playwright auth chain to confirm:
   - callback sets cookie
   - `/auth/session` returns `200`
   - web app transitions out of `/login`.

## Implementation Update (2026-02-19)

Completed items:

1. Updated backend session verification logic:
   - File: `apps/api/src/auth/auth.service.ts`
   - `verifySession()` now tries session resolution in this order:
     - `cookie: __Secure-better-auth.session_token=<token>`
     - `cookie: better-auth.session_token=<token>`
     - `cookie: __Host-better-auth.session_token=<token>`
     - `authorization: Bearer <token>` (fallback)
   - Added helper methods:
     - `buildSessionHeaderCandidates()`
     - `isExpectedAuthError()`
2. Added/updated tests:
   - File: `apps/api/src/auth/auth.service.spec.ts`
   - Added RED->GREEN test:
     - `should validate session token using secure BetterAuth cookie header`
   - Updated fallback coverage test:
     - `should fall back to Authorization header when cookie-based lookups miss`
3. Verification:
   - Command: `pnpm --filter @mosaic/api test -- src/auth/auth.service.spec.ts`
     - Result: pass (all tests green).
   - Command: `pnpm --filter @mosaic/api lint`
     - Result: pass.

Remaining step (requires deploy):

- Redeploy API with this patch and rerun live Playwright flow on `app.mosaicstack.dev` to confirm `/auth/session` returns `200` after callback.

## Playwright Re-Check (2026-02-19, later run)

Live flow evidence after previous deploy attempt:

1. OAuth callback succeeds:
   - `GET https://api.mosaicstack.dev/auth/oauth2/callback/authentik?code=...&state=...` -> `302`
   - Redirect target observed: `https://app.mosaicstack.dev/`
   - Browser cookie jar includes:
     - `__Secure-better-auth.session_token` on `api.mosaicstack.dev` (HttpOnly, Secure, SameSite=Lax)
2. Session bootstrap still fails immediately:
   - `GET https://api.mosaicstack.dev/auth/session` -> `500`
   - Response body shape:
     - `{"success":false,"message":"An unexpected error occurred","errorId":"...","path":"/auth/session","statusCode":500}`
   - Web app returns to login because session fetch fails.
3. Frontend version mismatch observed:
   - Live `POST /auth/sign-in/oauth2` response from login flow still shows callback URL pointing to `/dashboard`.
   - Current repository login page uses callback URL `/`.
   - This indicates deployed web image is older than current `develop` code (or stale image tag in runtime).

## Additional Code Fix Applied Locally (pending push/deploy)

Refined cookie candidate construction in API session verification:

- File: `apps/api/src/auth/auth.service.ts`
- Removed URL-encoding of session token when constructing cookie headers.
- Cookie candidates now pass raw token value exactly as extracted from incoming cookie.

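
A minimal sketch of the cookie-first candidate order with the raw token, under stated assumptions: this is not the code in `auth.service.ts`. The `HeaderCandidate` type, the `verifyWithCandidates` wrapper, and the `resolveSession` callback are hypothetical names for illustration, and `resolveSession` stands in for whatever call performs the actual session lookup.

```typescript
// Hypothetical sketch; names other than buildSessionHeaderCandidates and the
// cookie names do not come from the repository.
type HeaderCandidate = Record<string, string>;

// Cookie names in the order listed in the implementation update.
const SESSION_COOKIE_NAMES: string[] = [
  "__Secure-better-auth.session_token",
  "better-auth.session_token",
  "__Host-better-auth.session_token",
];

function buildSessionHeaderCandidates(token: string): HeaderCandidate[] {
  const candidates: HeaderCandidate[] = [];
  for (const name of SESSION_COOKIE_NAMES) {
    // The token is interpolated as-is (no encodeURIComponent), so "/", "+",
    // and "=" reach the session lookup unchanged.
    candidates.push({ cookie: `${name}=${token}` });
  }
  // Bearer fallback for API clients that do not send cookies.
  candidates.push({ authorization: `Bearer ${token}` });
  return candidates;
}

// Tries each candidate in order and returns the first session that resolves.
// resolveSession is an assumed callback, not a BetterAuth API.
async function verifyWithCandidates<T>(
  token: string,
  resolveSession: (headers: HeaderCandidate) => Promise<T | null>,
): Promise<T | null> {
  for (const headers of buildSessionHeaderCandidates(token)) {
    const session = await resolveSession(headers);
    if (session !== null) {
      return session;
    }
  }
  return null;
}
```

For comparison, `encodeURIComponent("a/b+c=")` yields `a%2Fb%2Bc%3D`, which no longer matches the token string the browser sent.
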

Why:

- BetterAuth cookie tokens can contain characters like `/`, `+`, and `=`.
- Re-encoding these values can mutate token bytes and cause lookup/parse failures.

Regression test added:

- File: `apps/api/src/auth/auth.service.spec.ts`
  - `should preserve raw cookie token value without URL re-encoding`

## Deploy + Live Repro (after auth cookie fix deploy)

Deployment actions executed:

1. Pushed auth cookie fix commit to `develop`.
2. Waited for Woodpecker pipeline success (`mosaic/stack`, build `#514`).
3. On `10.1.1.90`:
   - Ran `/home/localadmin/mosaic/pull_all.sh`.
   - Updated swarm services to `:dev` images:
     - `stack_api`
     - `stack_web`
     - `stack_coordinator`
     - `stack_orchestrator`
   - Verified service convergence.

Post-deploy behavior:

- Initial `/auth/session` without cookies now returns `401` (expected).
- OAuth callback succeeds and sets BetterAuth session cookie.
- `/auth/session` still fails after callback, now due to a new backend `500`.

## New Root Cause Discovered (RLS interceptor SQL)

Live `stack_api` logs showed:

- Auth guard successfully finds session cookie:
  - `Session cookie found: __Secure-better-auth.session_token`
- Then failure inside RLS setup:
  - PostgreSQL `42601` syntax error at or near `$1`
  - Source: `RlsContextInterceptor` raw SQL while setting context vars
- Request ends as `500 Request processing failed` on `/auth/session`

Cause:

- `SET LOCAL app.current_user_id = ${userId}` became `SET LOCAL ... = $1` under parameterization.
- PostgreSQL does not accept bind placeholders in `SET` assignment syntax.

## RLS Fix Applied Locally (pending commit/deploy)

Files updated:

- `apps/api/src/common/interceptors/rls-context.interceptor.ts`
  - Replaced `SET LOCAL` statements with parameter-safe, transaction-local calls:
    - `SELECT set_config('app.current_user_id', ${userId}, true)`
    - `SELECT set_config('app.current_workspace_id', ${workspaceId}, true)`
  - Keeps transaction scoping (`true` => local to transaction).
- `apps/api/src/common/interceptors/rls-context.interceptor.spec.ts`
  - Updated expected SQL template fragments to `set_config(...)`.
- `apps/api/src/common/interceptors/rls-context.integration.spec.ts`
  - Updated integration expectations to `set_config(...)`.

## Deploy + Verify (RLS fix commit `8424a28`)

Pipeline and deploy sequence:

1. Commit `8424a28` pushed to `develop`.
2. Woodpecker pipeline `mosaic/stack#515` completed successfully.
3. Host deploy actions on `10.1.1.90`:
   - Ran `/home/localadmin/mosaic/pull_all.sh`
   - Updated swarm services (`stack_api`, `stack_web`, `stack_coordinator`, `stack_orchestrator`) to `:dev`

Observed issue after first restart:

- Playwright still reproduced `/auth/session` `500` after Authentik callback.
- `stack_api` logs still showed old RLS SQL failure (`SET LOCAL ... $1`), indicating runtime image drift/stale task.

Resolution:

1. Checked host image digest for API:
   - `git.mosaicstack.dev/mosaic/stack-api:dev` -> `sha256:fd0cbfe053ed27945577553d67da5cbda0bf71610006e5ccc197d5761e29a220`
2. Forced swarm API service to exact digest:
   - `docker service update --with-registry-auth --image git.mosaicstack.dev/mosaic/stack-api@sha256:fd0cbfe053ed27945577553d67da5cbda0bf71610006e5ccc197d5761e29a220 stack_api`
3. Verified new running task uses digest-pinned image.

Final verification (Playwright MCP):

- Login flow: `https://app.mosaicstack.dev/login` -> Authentik (`jarvis` / `jarvis`) -> redirect back to app.
- Session endpoint: `GET https://api.mosaicstack.dev/auth/session` -> `200`.
- App landed authenticated on `https://app.mosaicstack.dev/tasks` (not bounced to login).

Status:

- Auth chain is functioning end-to-end after digest-forced API rollout.
- Remaining console noise observed: missing `favicon.ico` (`404`) on app domain (non-blocking for auth).
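
For reference, the RLS failure mode recorded above can be reproduced without a database. The sketch below assumes the raw SQL goes through a tagged template that swaps each interpolated value for a positional placeholder; the `sql` tag and `ParameterizedQuery` type are hypothetical stand-ins, not the ORM's API, and `user-123` is an illustrative value.

```typescript
// Hypothetical stand-in for a parameterizing SQL tag: it shows only how the
// statement text looks once interpolated values become $1, $2, ...
interface ParameterizedQuery {
  text: string;
  values: unknown[];
}

function sql(strings: TemplateStringsArray, ...values: unknown[]): ParameterizedQuery {
  let text = "";
  strings.forEach((part, i) => {
    // Append the literal SQL fragment, then a placeholder if a value follows it.
    text += part;
    if (i < values.length) {
      text += `$${i + 1}`;
    }
  });
  return { text, values };
}

const userId = "user-123"; // illustrative value only

// SET is a utility statement, so PostgreSQL rejects the placeholder here
// (the 42601 syntax error at or near "$1" seen in the logs).
const broken = sql`SET LOCAL app.current_user_id = ${userId}`;

// set_config() is an ordinary function call, so its arguments can be bound.
// The third argument (true) limits the setting to the current transaction.
const fixed = sql`SELECT set_config('app.current_user_id', ${userId}, true)`;

console.log(broken.text); // SET LOCAL app.current_user_id = $1
console.log(fixed.text); // SELECT set_config('app.current_user_id', $1, true)
```
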