# 362 - Auth Session Chain Debug (Authentik -> BetterAuth -> API Guard)

## Context

- Date (UTC): 2026-02-19
- Environment under test: production domains
  - Web: `https://app.mosaicstack.dev/login`
  - API: `https://api.mosaicstack.dev`
  - IdP: `https://auth.diversecanvas.com`
- Tooling: Playwright MCP + Chromium

## Problem Statement

Users can complete the Authentik login and consent steps, but the Mosaic web app returns them to the login page and they remain unauthenticated.

## Timeline and Evidence

1. Initial reproduction from web login:
   - `POST /auth/sign-in/oauth2` returned `200` with Authentik authorize URL.
   - Authentik login flow and consent screen loaded correctly.

2. First callback failure mode (before `jarvis` email fix):
   - Callback ended at API error redirect with `error=email_is_missing`.
   - Result URL: `https://api.mosaicstack.dev/?error=email_is_missing`.

3. User updated Authentik account:
   - `jarvis` account email set to `jarvis@mosaic.local`.
   - `email_is_missing` failure no longer occurs.

4. Current callback behavior (after email fix):
   - `GET /auth/oauth2/callback/authentik?code=...&state=...` returns `302` to `https://app.mosaicstack.dev/`.
   - Callback sets BetterAuth cookies:
     - `__Secure-better-auth.state=...; Max-Age=0; ...`
     - `__Secure-better-auth.session_token=...; Max-Age=604800; Path=/; HttpOnly; Secure; SameSite=Lax`
   - Browser cookie jar confirms session cookie present for `api.mosaicstack.dev`.

5. Session validation mismatch (critical):
   - BetterAuth direct session endpoint succeeds:
     - `GET /auth/get-session` -> `200` with session payload.
   - Guarded API session endpoint fails:
     - `GET /auth/session` -> `401` with
       `{"message":"Invalid or expired session", ...}`
   - Reproduced repeatedly in same browser context immediately after callback.
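
The cookie evidence from step 4 can be checked mechanically. The sketch below parses a `Set-Cookie` value into its attributes; `parseSetCookie` is an illustrative helper (not part of BetterAuth or the Mosaic codebase), and the token value is a placeholder because the log elides the real one.

```typescript
// Hypothetical helper: split a Set-Cookie value into name, value, and attributes.
interface ParsedCookie {
  name: string;
  value: string;
  attributes: Record<string, string | true>;
}

function parseSetCookie(header: string): ParsedCookie {
  const parts = header.split(";").map((part) => part.trim());
  const pair = parts[0] ?? "";
  const eq = pair.indexOf("=");
  if (eq < 0) throw new Error("Set-Cookie value has no name=value pair");

  const attributes: Record<string, string | true> = {};
  for (const attr of parts.slice(1)) {
    if (attr === "") continue;
    const i = attr.indexOf("=");
    // Attribute names are case-insensitive, so normalize them to lower case.
    const key = (i < 0 ? attr : attr.slice(0, i)).toLowerCase();
    attributes[key] = i < 0 ? true : attr.slice(i + 1);
  }
  return { name: pair.slice(0, eq), value: pair.slice(eq + 1), attributes };
}

// Attribute list copied from the callback evidence; PLACEHOLDER stands in for the elided token.
const sessionCookie = parseSetCookie(
  "__Secure-better-auth.session_token=PLACEHOLDER; Max-Age=604800; Path=/; HttpOnly; Secure; SameSite=Lax",
);

// No Domain attribute means the cookie is host-only: sent only to the host that set it.
const hostOnly = !("domain" in sessionCookie.attributes);
const lifetimeDays = Number(sessionCookie.attributes["max-age"]) / 86400;
```

As logged, the header carries no `Domain` attribute, which is consistent with the cookie jar showing the session cookie on `api.mosaicstack.dev` only, and `Max-Age=604800` works out to a seven-day lifetime.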

## Config Sync Notes

User synced local files with deployed Portainer stack:

- `.env` updated with deployed values.
- `docker-compose.swarm.portainer.yml` changed:
  - Removed `BETTER_AUTH_URL` env mapping from API service.

Observed auth behavior after sync:

- Improvement: removed `email_is_missing` callback error.
- Remaining failure: `/auth/session` still returns `401` despite valid BetterAuth cookie and successful `/auth/get-session`.

## Root Cause Hypothesis (Strong)

`AuthGuard` extracts BetterAuth session cookie token correctly, but `AuthService.verifySession()` validates it using `Authorization: Bearer <token>` instead of a BetterAuth cookie/header context.

Relevant code paths:

- `apps/api/src/auth/guards/auth.guard.ts`
  - extracts `__Secure-better-auth.session_token` / `better-auth.session_token`
- `apps/api/src/auth/auth.service.ts`
  - `verifySession()` calls `auth.api.getSession({ headers: { authorization: "Bearer ..." } })`

Why this matches evidence:

- `/auth/get-session` (native BetterAuth endpoint reading request cookie) succeeds.
- `/auth/session` (custom guard + verify path) fails for same browser session.
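
The hypothesis can be illustrated with a toy model. `lookupSessionFromHeaders` is a hypothetical stand-in for a session lookup that reads only the `cookie` header; it is not BetterAuth's implementation, and the token and store contents are made up for the example.

```typescript
// Toy stand-in for a cookie-based session lookup (NOT BetterAuth's real implementation).
interface RequestHeaders {
  cookie?: string;
  authorization?: string;
}

const COOKIE_NAMES = ["__Secure-better-auth.session_token", "better-auth.session_token"];

// Hypothetical in-memory session store keyed by token.
const sessions = new Map<string, { userId: string }>([["tok123", { userId: "jarvis" }]]);

function lookupSessionFromHeaders(headers: RequestHeaders): { userId: string } | null {
  // Only the cookie header is consulted; an Authorization header is ignored entirely.
  if (headers.cookie === undefined) return null;
  for (const part of headers.cookie.split(";")) {
    const trimmed = part.trim();
    const eq = trimmed.indexOf("=");
    if (eq < 0) continue;
    if (COOKIE_NAMES.indexOf(trimmed.slice(0, eq)) >= 0) {
      return sessions.get(trimmed.slice(eq + 1)) ?? null;
    }
  }
  return null;
}

// Guard path before the fix: the cookie token is repackaged as a bearer token.
const viaBearer = lookupSessionFromHeaders({ authorization: "Bearer tok123" });

// Native endpoint: the browser's cookie header is passed through unchanged.
const viaCookie = lookupSessionFromHeaders({
  cookie: "__Secure-better-auth.session_token=tok123",
});
```

Under this model the same token resolves to a session through the cookie header and to `null` through the bearer header, which mirrors the `200` on `/auth/get-session` and the `401` on `/auth/session`.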

## Next Actions

1. Fix `verifySession()` to validate using BetterAuth-compatible cookie header candidates first, with bearer fallback for API clients.
2. Add/update unit tests in `auth.service.spec.ts` to cover cookie-first validation and bearer fallback.
3. Re-run targeted API auth tests.
4. Re-run Playwright auth chain to confirm:
   - callback sets cookie
   - `/auth/session` returns `200`
   - web app transitions out of `/login`.

## Implementation Update (2026-02-19)

Completed items:

1. Updated backend session verification logic:
   - File: `apps/api/src/auth/auth.service.ts`
   - `verifySession()` now tries session resolution in this order:
     - `cookie: __Secure-better-auth.session_token=<token>`
     - `cookie: better-auth.session_token=<token>`
     - `cookie: __Host-better-auth.session_token=<token>`
     - `authorization: Bearer <token>` (fallback)
   - Added helper methods:
     - `buildSessionHeaderCandidates()`
     - `isExpectedAuthError()`

2. Added/updated tests:
   - File: `apps/api/src/auth/auth.service.spec.ts`
   - Added RED->GREEN test:
     - `should validate session token using secure BetterAuth cookie header`
   - Updated fallback coverage test:
     - `should fall back to Authorization header when cookie-based lookups miss`

3. Verification:
   - Command: `pnpm --filter @mosaic/api test -- src/auth/auth.service.spec.ts`
   - Result: pass (all tests green).
   - Command: `pnpm --filter @mosaic/api lint`
   - Result: pass.
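
A minimal sketch of the cookie-first resolution order described in item 1. The helper names come from this log, but their signatures, the `SessionLookup` callback (standing in for the `auth.api.getSession` call), and the error classification in `isExpectedAuthError` are assumptions; the real code in `auth.service.ts` may differ.

```typescript
type SessionHeaders = Record<string, string>;
type Session = { userId: string };
// Assumed stand-in for the BetterAuth session lookup, injected so the sketch runs on its own.
type SessionLookup = (headers: SessionHeaders) => Promise<Session | null>;

// Candidate headers in the order listed above: three cookie names, then the bearer fallback.
function buildSessionHeaderCandidates(token: string): SessionHeaders[] {
  return [
    { cookie: `__Secure-better-auth.session_token=${token}` },
    { cookie: `better-auth.session_token=${token}` },
    { cookie: `__Host-better-auth.session_token=${token}` },
    { authorization: `Bearer ${token}` },
  ];
}

// Assumed rule: auth-style failures count as a miss; anything else is a real fault.
function isExpectedAuthError(error: unknown): boolean {
  return error instanceof Error && /unauthorized|invalid|expired/i.test(error.message);
}

async function verifySession(token: string, getSession: SessionLookup): Promise<Session | null> {
  for (const headers of buildSessionHeaderCandidates(token)) {
    try {
      const session = await getSession(headers);
      if (session !== null) return session; // first candidate that resolves wins
    } catch (error) {
      if (!isExpectedAuthError(error)) throw error; // surface unexpected failures
    }
  }
  return null; // every candidate missed
}
```

Trying the cookie forms first matches what the browser flow actually sends, while the bearer candidate keeps non-browser API clients working.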

Remaining step (requires deploy):

- Redeploy API with this patch and rerun live Playwright flow on `app.mosaicstack.dev` to confirm `/auth/session` returns `200` after callback.

## Playwright Re-Check (2026-02-19, later run)

Live flow evidence after previous deploy attempt:

1. OAuth callback succeeds:
   - `GET https://api.mosaicstack.dev/auth/oauth2/callback/authentik?code=...&state=...` -> `302`
   - Redirect target observed: `https://app.mosaicstack.dev/`
   - Browser cookie jar includes:
     - `__Secure-better-auth.session_token` on `api.mosaicstack.dev` (HttpOnly, Secure, SameSite=Lax)

2. Session bootstrap still fails immediately:
   - `GET https://api.mosaicstack.dev/auth/session` -> `500`
   - Response body shape:
     - `{"success":false,"message":"An unexpected error occurred","errorId":"...","path":"/auth/session","statusCode":500}`
   - Web app returns to login because session fetch fails.

3. Frontend version mismatch observed:
   - Live `POST /auth/sign-in/oauth2` response from login flow still shows callback URL pointing to `/dashboard`.
   - Current repository login page uses callback URL `/`.
   - This indicates deployed web image is older than current `develop` code (or stale image tag in runtime).

## Additional Code Fix Applied Locally (pending push/deploy)

Refined cookie candidate construction in API session verification:

- File: `apps/api/src/auth/auth.service.ts`
- Removed URL-encoding of session token when constructing cookie headers.
- Cookie candidates now pass raw token value exactly as extracted from incoming cookie.

Why:

- BetterAuth cookie tokens can contain characters like `/`, `+`, and `=`.
- Re-encoding these values can mutate token bytes and cause lookup/parse failures.

Regression test added:

- File: `apps/api/src/auth/auth.service.spec.ts`
- `should preserve raw cookie token value without URL re-encoding`
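
The effect of re-encoding can be seen with the standard `encodeURIComponent` function. The token strings below are placeholders chosen to contain the characters mentioned above, not real session tokens.

```typescript
// Placeholder token containing "/", "+", and "=" (not a real session token).
const rawToken = "abc/def+ghi=";

// encodeURIComponent percent-encodes all three characters.
const encodedToken = encodeURIComponent(rawToken);
console.log(encodedToken); // abc%2Fdef%2Bghi%3D

// If the incoming cookie value is already percent-encoded, encoding it again
// also escapes the "%" signs, so a single decode no longer restores the original.
const alreadyEncoded = "abc%2Bdef";
const doubleEncoded = encodeURIComponent(alreadyEncoded);
console.log(doubleEncoded); // abc%252Bdef
console.log(decodeURIComponent(doubleEncoded) === alreadyEncoded); // true only after exactly one decode

// Passing the value through untouched keeps the header byte-identical to what the browser sent.
const cookieHeader = `__Secure-better-auth.session_token=${rawToken}`;
console.log(cookieHeader);
```

Whether an encoded value actually breaks the lookup depends on how the consumer decodes it, so passing the token through unchanged is the option that does not depend on that behavior.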

## Deploy + Live Repro (after auth cookie fix deploy)

Deployment actions executed:

1. Pushed auth cookie fix commit to `develop`.
2. Waited for Woodpecker pipeline success (`mosaic/stack`, build `#514`).
3. On `10.1.1.90`:
   - Ran `/home/localadmin/mosaic/pull_all.sh`.
   - Updated swarm services to `:dev` images:
     - `stack_api`
     - `stack_web`
     - `stack_coordinator`
     - `stack_orchestrator`
   - Verified service convergence.

Post-deploy behavior:

- Initial `/auth/session` without cookies now returns `401` (expected).
- OAuth callback succeeds and sets BetterAuth session cookie.
- `/auth/session` still fails after callback, now due to a new backend `500`.

## New Root Cause Discovered (RLS interceptor SQL)

Live `stack_api` logs showed:

- Auth guard successfully finds session cookie:
  - `Session cookie found: __Secure-better-auth.session_token`
- Then failure inside RLS setup:
  - PostgreSQL `42601` syntax error at or near `$1`
  - Source: `RlsContextInterceptor` raw SQL while setting context vars
  - Request ends as `500 Request processing failed` on `/auth/session`

Cause:

- `SET LOCAL app.current_user_id = ${userId}` became `SET LOCAL ... = $1` under parameterization.
- PostgreSQL does not accept bind placeholders in `SET` assignment syntax.
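
To show how the interpolation turns into `$1`, here is a minimal sketch of a parameterizing tagged template. The `sql` tag is hypothetical and only imitates the general behavior of such query helpers; the interceptor's actual query API is not shown in this log.

```typescript
// Hypothetical `sql` tag: every ${value} becomes a numbered placeholder,
// and the value travels separately from the statement text.
interface ParameterizedQuery {
  text: string;
  values: unknown[];
}

function sql(strings: TemplateStringsArray, ...values: unknown[]): ParameterizedQuery {
  let text = strings[0] ?? "";
  values.forEach((_, i) => {
    text += `$${i + 1}${strings[i + 1] ?? ""}`;
  });
  return { text, values };
}

const userId = "user-123"; // placeholder value

// The statement text sent to the server no longer contains the user id.
const failing = sql`SET LOCAL app.current_user_id = ${userId}`;
console.log(failing.text); // SET LOCAL app.current_user_id = $1
console.log(failing.values); // [ 'user-123' ]
```

Because `SET` is a utility statement rather than a query with bindable expressions, the server rejects the `$1` in that position, which matches the `42601` error in the logs.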

## RLS Fix Applied Locally (pending commit/deploy)

Files updated:

- `apps/api/src/common/interceptors/rls-context.interceptor.ts`
  - Replaced `SET LOCAL` statements with parameter-safe, transaction-local calls:
    - `SELECT set_config('app.current_user_id', ${userId}, true)`
    - `SELECT set_config('app.current_workspace_id', ${workspaceId}, true)`
  - Keeps transaction scoping (`true` => local to transaction).

- `apps/api/src/common/interceptors/rls-context.interceptor.spec.ts`
  - Updated expected SQL template fragments to `set_config(...)`.

- `apps/api/src/common/interceptors/rls-context.integration.spec.ts`
  - Updated integration expectations to `set_config(...)`.
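
A sketch of what the replacement statements look like once parameterized. `buildRlsContextStatements` is a hypothetical helper written for illustration, and treating `workspaceId` as optional is an assumption; the real interceptor issues these statements through its own query API inside the request transaction.

```typescript
interface RlsStatement {
  text: string;
  values: string[];
}

// Hypothetical builder for the transaction-local context statements.
function buildRlsContextStatements(userId: string, workspaceId?: string): RlsStatement[] {
  // Setting names are fixed constants, never user input, so inlining them is safe.
  const settings: Array<[string, string | undefined]> = [
    ["app.current_user_id", userId],
    ["app.current_workspace_id", workspaceId],
  ];

  const statements: RlsStatement[] = [];
  for (const [name, value] of settings) {
    if (value === undefined) continue;
    // set_config(name, value, is_local) is an ordinary function call, so the value
    // can be a bind parameter; is_local = true limits it to the current transaction.
    statements.push({ text: `SELECT set_config('${name}', $1, true)`, values: [value] });
  }
  return statements;
}

for (const statement of buildRlsContextStatements("user-123", "workspace-456")) {
  console.log(statement.text, statement.values);
}
```

The third argument only has an effect when the statements run inside the same transaction as the queries they are meant to scope; outside a transaction the setting is discarded as soon as the statement finishes.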

## Deploy + Verify (RLS fix commit `8424a28`)

Pipeline and deploy sequence:

1. Commit `8424a28` pushed to `develop`.
2. Woodpecker pipeline `mosaic/stack#515` completed successfully.
3. Host deploy actions on `10.1.1.90`:
   - Ran `/home/localadmin/mosaic/pull_all.sh`
   - Updated swarm services (`stack_api`, `stack_web`, `stack_coordinator`, `stack_orchestrator`) to `:dev`

Observed issue after first restart:

- Playwright still reproduced `/auth/session` `500` after Authentik callback.
- `stack_api` logs still showed old RLS SQL failure (`SET LOCAL ... $1`), indicating runtime image drift/stale task.

Resolution:

1. Checked host image digest for API:
   - `git.mosaicstack.dev/mosaic/stack-api:dev` -> `sha256:fd0cbfe053ed27945577553d67da5cbda0bf71610006e5ccc197d5761e29a220`
2. Forced swarm API service to exact digest:
   - `docker service update --with-registry-auth --image git.mosaicstack.dev/mosaic/stack-api@sha256:fd0cbfe053ed27945577553d67da5cbda0bf71610006e5ccc197d5761e29a220 stack_api`
3. Verified new running task uses digest-pinned image.
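
The digest-pinning step can be expressed as a small helper for repeat use. Both function names are hypothetical and not part of the deploy scripts referenced in this log; the sketch only builds the image reference and the argument list and does not invoke Docker.

```typescript
// Hypothetical helper: turn "repo:tag" plus a digest into a digest-pinned "repo@sha256:..." reference.
function pinImageToDigest(imageRef: string, digest: string): string {
  if (!/^sha256:[0-9a-f]{64}$/.test(digest)) {
    throw new Error(`unexpected digest format: ${digest}`);
  }
  // Only look for the tag separator after the last "/", so a registry port is left alone.
  const lastSlash = imageRef.lastIndexOf("/");
  const tagColon = imageRef.indexOf(":", lastSlash + 1);
  const repository = tagColon < 0 ? imageRef : imageRef.slice(0, tagColon);
  return `${repository}@${digest}`;
}

// Argument list for the same `docker service update` invocation used in step 2.
function serviceUpdateArgs(service: string, pinnedImage: string): string[] {
  return ["service", "update", "--with-registry-auth", "--image", pinnedImage, service];
}

const pinned = pinImageToDigest(
  "git.mosaicstack.dev/mosaic/stack-api:dev",
  "sha256:fd0cbfe053ed27945577553d67da5cbda0bf71610006e5ccc197d5761e29a220",
);
console.log(["docker", ...serviceUpdateArgs("stack_api", pinned)].join(" "));
```

Pinning by digest removes the ambiguity of a mutable tag such as `:dev`: the running task either uses that exact image or the update fails visibly.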

Final verification (Playwright MCP):

- Login flow: `https://app.mosaicstack.dev/login` -> Authentik (`jarvis` / `jarvis`) -> redirect back to app.
- Session endpoint: `GET https://api.mosaicstack.dev/auth/session` -> `200`.
- App landed authenticated on `https://app.mosaicstack.dev/tasks` (not bounced to login).

Status:

- Auth chain is functioning end-to-end after digest-forced API rollout.
- Remaining console noise observed: missing `favicon.ico` (`404`) on app domain (non-blocking for auth).