362 - Auth Session Chain Debug (Authentik -> BetterAuth -> API Guard)
Context
- Date (UTC): 2026-02-19
- Environment under test: production domains
  - Web: https://app.mosaicstack.dev/login
  - API: https://api.mosaicstack.dev
  - IdP: https://auth.diversecanvas.com
- Tooling: Playwright MCP + Chromium
Problem Statement
Users can complete Authentik login and consent, but Mosaic web app returns to login and remains unauthenticated.
Timeline and Evidence
- Initial reproduction from web login:
  - `POST /auth/sign-in/oauth2` returned `200` with Authentik authorize URL.
  - Authentik login flow and consent screen loaded correctly.
- First callback failure mode (before `jarvis` email fix):
  - Callback ended at API error redirect with `error=email_is_missing`.
  - Result URL: `https://api.mosaicstack.dev/?error=email_is_missing`.
- User updated Authentik account:
  - `jarvis` account email set to `jarvis@mosaic.local`.
  - `email_is_missing` failure no longer occurs.
- Current callback behavior (after email fix):
  - `GET /auth/oauth2/callback/authentik?code=...&state=...` returns `302` to `https://app.mosaicstack.dev/`.
  - Callback sets BetterAuth cookies:
    - `__Secure-better-auth.state=...; Max-Age=0; ...`
    - `__Secure-better-auth.session_token=...; Max-Age=604800; Path=/; HttpOnly; Secure; SameSite=Lax`
  - Browser cookie jar confirms session cookie present for `api.mosaicstack.dev`.
- Session validation mismatch (critical):
  - BetterAuth direct session endpoint succeeds:
    - `GET /auth/get-session` -> `200` with session payload.
  - Guarded API session endpoint fails:
    - `GET /auth/session` -> `401` with `{"message":"Invalid or expired session", ...}`
  - Reproduced repeatedly in same browser context immediately after callback.
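The two `Set-Cookie` values in the callback evidence can be read mechanically: `Max-Age=0` clears the OAuth state cookie, while `Max-Age=604800` keeps the session cookie for 7 days (604800 / 86400). A minimal sketch of that reading follows; `parseSetCookie` and `describeLifetime` are hypothetical helpers written for this note, and the cookie values are placeholders, not captured tokens.

```typescript
// Hypothetical helper: split one Set-Cookie header value into name, value, attributes.
interface ParsedSetCookie {
  name: string;
  value: string;
  attributes: Map<string, string | true>;
}

function parseSetCookie(header: string): ParsedSetCookie {
  const [pair = "", ...attrParts] = header.split(";").map((part) => part.trim());
  const eq = pair.indexOf("=");
  // Split on the FIRST "=" only, so "=" characters inside the value survive.
  const name = eq === -1 ? pair : pair.slice(0, eq);
  const value = eq === -1 ? "" : pair.slice(eq + 1);
  const attributes = new Map<string, string | true>();
  for (const part of attrParts) {
    if (part === "") continue;
    const i = part.indexOf("=");
    if (i === -1) {
      attributes.set(part.toLowerCase(), true); // flag attribute, e.g. HttpOnly
    } else {
      attributes.set(part.slice(0, i).toLowerCase(), part.slice(i + 1));
    }
  }
  return { name, value, attributes };
}

// Hypothetical helper: turn Max-Age into a human-readable lifetime.
function describeLifetime(cookie: ParsedSetCookie): string {
  const maxAge = cookie.attributes.get("max-age");
  if (typeof maxAge !== "string") return "session cookie (no Max-Age)";
  const seconds = Number(maxAge);
  if (seconds <= 0) return "deleted immediately";
  return `persists for ${seconds / 86400} day(s)`;
}

// Placeholder values standing in for the elided "..." in the captured headers.
const stateCookie = parseSetCookie("__Secure-better-auth.state=; Max-Age=0");
const sessionCookie = parseSetCookie(
  "__Secure-better-auth.session_token=placeholder; Max-Age=604800; Path=/; HttpOnly; Secure; SameSite=Lax",
);
console.log(stateCookie.name, "->", describeLifetime(stateCookie));
console.log(sessionCookie.name, "->", describeLifetime(sessionCookie));
```

This only confirms that the callback left a usable session cookie behind; it says nothing about why the guarded endpoint rejects it.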
Config Sync Notes
User synced local files with deployed Portainer stack:
- `.env` updated with deployed values.
- `docker-compose.swarm.portainer.yml` changed:
  - Removed `BETTER_AUTH_URL` env mapping from API service.
Observed auth behavior after sync:
- Improvement: removed `email_is_missing` callback error.
- Remaining failure: `/auth/session` still returns `401` despite valid BetterAuth cookie and successful `/auth/get-session`.
Root Cause Hypothesis (Strong)
`AuthGuard` extracts the BetterAuth session cookie token correctly, but `AuthService.verifySession()` validates it using `Authorization: Bearer <token>` instead of a BetterAuth cookie/header context.
Relevant code paths:
- `apps/api/src/auth/guards/auth.guard.ts`
  - extracts `__Secure-better-auth.session_token` / `better-auth.session_token`
- `apps/api/src/auth/auth.service.ts`
  - `verifySession()` calls `auth.api.getSession({ headers: { authorization: "Bearer ..." } })`
Why this matches evidence:
- `/auth/get-session` (native BetterAuth endpoint reading request cookie) succeeds.
- `/auth/session` (custom guard + verify path) fails for same browser session.
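For orientation, the guard's extraction step (the part the hypothesis treats as working) might look roughly like the sketch below. This is not the code in `auth.guard.ts`; `extractSessionToken` and `SESSION_COOKIE_NAMES` are hypothetical names, and only the two cookie names come from the notes above.

```typescript
// Cookie names the guard is described as checking, in priority order.
const SESSION_COOKIE_NAMES = [
  "__Secure-better-auth.session_token",
  "better-auth.session_token",
] as const;

// Hypothetical helper: find the first known session cookie in a raw Cookie header.
function extractSessionToken(
  cookieHeader: string | undefined,
): { name: string; token: string } | null {
  if (!cookieHeader) return null;
  const jar = new Map<string, string>();
  for (const part of cookieHeader.split(";")) {
    const trimmed = part.trim();
    const eq = trimmed.indexOf("=");
    if (eq <= 0) continue; // skip malformed pairs with no name
    // Keep the value exactly as received; no decoding or re-encoding.
    jar.set(trimmed.slice(0, eq), trimmed.slice(eq + 1));
  }
  for (const name of SESSION_COOKIE_NAMES) {
    const token = jar.get(name);
    if (token) return { name, token };
  }
  return null;
}

// Placeholder header; the token value is illustrative only.
console.log(extractSessionToken("theme=dark; __Secure-better-auth.session_token=tok.sig"));
```

If extraction behaves like this, the token reaching `verifySession()` is intact, which points at how the token is presented to BetterAuth rather than at how it is read from the request.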
Next Actions
- Fix `verifySession()` to validate using BetterAuth-compatible cookie header candidates first, with bearer fallback for API clients.
- Add/update unit tests in `auth.service.spec.ts` to cover cookie-first validation and bearer fallback.
- Re-run targeted API auth tests.
- Re-run Playwright auth chain to confirm:
  - callback sets cookie
  - `/auth/session` returns `200`
  - web app transitions out of `/login`.
Implementation Update (2026-02-19)
Completed items:
- Updated backend session verification logic:
  - File: `apps/api/src/auth/auth.service.ts`
  - `verifySession()` now tries session resolution in this order:
    - `cookie: __Secure-better-auth.session_token=<token>`
    - `cookie: better-auth.session_token=<token>`
    - `cookie: __Host-better-auth.session_token=<token>`
    - `authorization: Bearer <token>` (fallback)
  - Added helper methods:
    - `buildSessionHeaderCandidates()`
    - `isExpectedAuthError()`
- Added/updated tests:
  - File: `apps/api/src/auth/auth.service.spec.ts`
  - Added RED->GREEN test:
    - `should validate session token using secure BetterAuth cookie header`
  - Updated fallback coverage test:
    - `should fall back to Authorization header when cookie-based lookups miss`
- Verification:
  - Command: `pnpm --filter @mosaic/api test -- src/auth/auth.service.spec.ts`
  - Result: pass (all tests green).
  - Command: `pnpm --filter @mosaic/api lint`
  - Result: pass.
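The cookie-first resolution order can be sketched as below. This is a minimal illustration, not the code in `auth.service.ts`: the signature of `buildSessionHeaderCandidates()` is assumed, `resolveSession` and `SessionLookup` are hypothetical names, and the lookup function is injected so the sketch does not depend on BetterAuth. Error classification (the role of `isExpectedAuthError()`) is left out.

```typescript
type HeaderCandidate = Record<string, string>;

// Assumed shape: one header set per attempt, in the order listed in the notes.
function buildSessionHeaderCandidates(token: string): HeaderCandidate[] {
  return [
    { cookie: `__Secure-better-auth.session_token=${token}` },
    { cookie: `better-auth.session_token=${token}` },
    { cookie: `__Host-better-auth.session_token=${token}` },
    { authorization: `Bearer ${token}` }, // fallback for API clients
  ];
}

// Hypothetical stand-in for the session lookup performed against BetterAuth.
type SessionLookup<S> = (headers: HeaderCandidate) => Promise<S | null>;

// Try each candidate in order and return the first session that resolves.
async function resolveSession<S>(
  token: string,
  lookup: SessionLookup<S>,
): Promise<S | null> {
  for (const headers of buildSessionHeaderCandidates(token)) {
    const session = await lookup(headers);
    if (session) return session;
  }
  return null;
}

// Usage with a fake lookup that only understands the plain cookie name.
const fakeLookup: SessionLookup<{ userId: string }> = async (headers) =>
  headers["cookie"]?.startsWith("better-auth.session_token=")
    ? { userId: "placeholder-user" }
    : null;

void resolveSession("placeholder-token", fakeLookup).then((session) => {
  console.log(session);
});
```

Injecting the lookup keeps the ordering logic testable on its own, which mirrors the cookie-first and bearer-fallback cases described for `auth.service.spec.ts`.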
Remaining step (requires deploy):
- Redeploy API with this patch and rerun live Playwright flow on `app.mosaicstack.dev` to confirm `/auth/session` returns `200` after callback.
Playwright Re-Check (2026-02-19, later run)
Live flow evidence after previous deploy attempt:
- OAuth callback succeeds:
  - `GET https://api.mosaicstack.dev/auth/oauth2/callback/authentik?code=...&state=...` -> `302`
  - Redirect target observed: `https://app.mosaicstack.dev/`
  - Browser cookie jar includes:
    - `__Secure-better-auth.session_token` on `api.mosaicstack.dev` (HttpOnly, Secure, SameSite=Lax)
- Session bootstrap still fails immediately:
  - `GET https://api.mosaicstack.dev/auth/session` -> `500`
  - Response body shape:
    - `{"success":false,"message":"An unexpected error occurred","errorId":"...","path":"/auth/session","statusCode":500}`
  - Web app returns to login because session fetch fails.
- Frontend version mismatch observed:
  - Live `POST /auth/sign-in/oauth2` response from login flow still shows callback URL pointing to `/dashboard`.
  - Current repository login page uses callback URL `/`.
  - This indicates deployed web image is older than current `develop` code (or stale image tag in runtime).
Additional Code Fix Applied Locally (pending push/deploy)
Refined cookie candidate construction in API session verification:
- File: `apps/api/src/auth/auth.service.ts`
  - Removed URL-encoding of session token when constructing cookie headers.
  - Cookie candidates now pass raw token value exactly as extracted from incoming cookie.
Why:
- BetterAuth cookie tokens can contain characters like `/`, `+`, and `=`.
- Re-encoding these values can mutate token bytes and cause lookup/parse failures.
Regression test added:
- File: `apps/api/src/auth/auth.service.spec.ts`
  - `should preserve raw cookie token value without URL re-encoding`
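The effect of re-encoding is easy to demonstrate with the standard `encodeURIComponent`, which escapes exactly the characters named above. The token below is a made-up placeholder and `buildCookieHeader` is a hypothetical helper, not the project's code.

```typescript
// Hypothetical helper: build a Cookie header value, optionally URL-encoding the token.
function buildCookieHeader(name: string, token: string, encode: boolean): string {
  return `${name}=${encode ? encodeURIComponent(token) : token}`;
}

// Placeholder token containing the characters called out in the notes.
const sampleToken = "abc/def+ghi=";

const rawHeader = buildCookieHeader("__Secure-better-auth.session_token", sampleToken, false);
const encodedHeader = buildCookieHeader("__Secure-better-auth.session_token", sampleToken, true);

console.log(rawHeader);     // __Secure-better-auth.session_token=abc/def+ghi=
console.log(encodedHeader); // __Secure-better-auth.session_token=abc%2Fdef%2Bghi%3D

// The two headers carry different byte sequences for the same logical token.
console.log(rawHeader === encodedHeader); // false
```

A consumer that compares or verifies the token as received sees a different string in the encoded case, which is consistent with the lookup/parse failures described above.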
Deploy + Live Repro (after auth cookie fix deploy)
Deployment actions executed:
- Pushed auth cookie fix commit to `develop`.
- Waited for Woodpecker pipeline success (`mosaic/stack`, build `#514`).
- On `10.1.1.90`:
  - Ran `/home/localadmin/mosaic/pull_all.sh`.
  - Updated swarm services to `:dev` images:
    - `stack_api`
    - `stack_web`
    - `stack_coordinator`
    - `stack_orchestrator`
  - Verified service convergence.
Post-deploy behavior:
- Initial `/auth/session` without cookies now returns `401` (expected).
- OAuth callback succeeds and sets BetterAuth session cookie.
- `/auth/session` still fails after callback, now due to a new backend `500`.
New Root Cause Discovered (RLS interceptor SQL)
Live stack_api logs showed:
- Auth guard successfully finds session cookie:
  - `Session cookie found: __Secure-better-auth.session_token`
- Then failure inside RLS setup:
  - PostgreSQL `42601` syntax error at or near `$1`
  - Source: `RlsContextInterceptor` raw SQL while setting context vars
  - Request ends as `500 Request processing failed` on `/auth/session`
Cause:
- `SET LOCAL app.current_user_id = ${userId}` became `SET LOCAL ... = $1` under parameterization.
- PostgreSQL does not accept bind placeholders in `SET` assignment syntax.
RLS Fix Applied Locally (pending commit/deploy)
Files updated:
- `apps/api/src/common/interceptors/rls-context.interceptor.ts`
  - Replaced `SET LOCAL` statements with parameter-safe, transaction-local calls:
    - `SELECT set_config('app.current_user_id', ${userId}, true)`
    - `SELECT set_config('app.current_workspace_id', ${workspaceId}, true)`
  - Keeps transaction scoping (`true` => local to transaction).
- `apps/api/src/common/interceptors/rls-context.interceptor.spec.ts`
  - Updated expected SQL template fragments to `set_config(...)`.
- `apps/api/src/common/interceptors/rls-context.integration.spec.ts`
  - Updated integration expectations to `set_config(...)`.
Deploy + Verify (RLS fix commit 8424a28)
Pipeline and deploy sequence:
- Commit `8424a28` pushed to `develop`.
- Woodpecker pipeline `mosaic/stack` `#515` completed successfully.
- Host deploy actions on `10.1.1.90`:
  - Ran `/home/localadmin/mosaic/pull_all.sh`
  - Updated swarm services (`stack_api`, `stack_web`, `stack_coordinator`, `stack_orchestrator`) to `:dev`
Observed issue after first restart:
- Playwright still reproduced `/auth/session` `500` after Authentik callback.
- `stack_api` logs still showed old RLS SQL failure (`SET LOCAL ... $1`), indicating runtime image drift/stale task.
Resolution:
- Checked host image digest for API:
  - `git.mosaicstack.dev/mosaic/stack-api:dev` -> `sha256:fd0cbfe053ed27945577553d67da5cbda0bf71610006e5ccc197d5761e29a220`
- Forced swarm API service to exact digest:
  - `docker service update --with-registry-auth --image git.mosaicstack.dev/mosaic/stack-api@sha256:fd0cbfe053ed27945577553d67da5cbda0bf71610006e5ccc197d5761e29a220 stack_api`
- Verified new running task uses digest-pinned image.
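The digest-pinned reference used in the `docker service update` command is the tagged reference with the tag replaced by `@<digest>`. A small sketch of that transformation, in case it is worth scripting for future deploys; `pinImageToDigest` is a hypothetical helper and not part of `pull_all.sh`.

```typescript
// Hypothetical helper: convert "repo:tag" (or "repo") into "repo@sha256:<digest>".
function pinImageToDigest(imageRef: string, digest: string): string {
  if (!/^sha256:[0-9a-f]{64}$/.test(digest)) {
    throw new Error(`not a sha256 digest: ${digest}`);
  }
  // Drop any existing digest suffix first.
  const withoutDigest = imageRef.split("@")[0] ?? imageRef;
  // A tag colon can only appear after the last "/", which keeps a registry
  // port such as "localhost:5000/app" from being mistaken for a tag.
  const lastSlash = withoutDigest.lastIndexOf("/");
  const tagColon = withoutDigest.indexOf(":", lastSlash + 1);
  const repository = tagColon === -1 ? withoutDigest : withoutDigest.slice(0, tagColon);
  return `${repository}@${digest}`;
}

// Values taken from the resolution steps above.
const pinned = pinImageToDigest(
  "git.mosaicstack.dev/mosaic/stack-api:dev",
  "sha256:fd0cbfe053ed27945577553d67da5cbda0bf71610006e5ccc197d5761e29a220",
);
console.log(pinned);
```

Pinning by digest removes the ambiguity of a mutable `:dev` tag, which is what let the stale task keep running after the first restart.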
Final verification (Playwright MCP):
- Login flow: `https://app.mosaicstack.dev/login` -> Authentik (`jarvis/jarvis`) -> redirect back to app.
- Session endpoint: `GET https://api.mosaicstack.dev/auth/session` -> `200`.
- App landed authenticated on `https://app.mosaicstack.dev/tasks` (not bounced to login).
Status:
- Auth chain is functioning end-to-end after digest-forced API rollout.
- Remaining console noise observed: missing `favicon.ico` (404) on app domain (non-blocking for auth).