From 491675b613ccc2d272eb0dfbde49d9c6b70dbf6a Mon Sep 17 00:00:00 2001
From: Jason Woltje
Date: Mon, 16 Feb 2026 04:43:38 -0600
Subject: [PATCH] docs: add auth & frontend remediation plan

Comprehensive plan for fixing the production 500 on POST /auth/sign-in/oauth2
and redesigning the frontend login page to be OIDC-aware with multi-method
authentication support.

Key areas covered:
- Backend: OIDC startup validation, auth config discovery endpoint,
  BetterAuth error handling, PKCE, session hardening, trustedOrigins extraction
- Frontend: Multi-method login page, PDA-friendly error display, adaptive UI
  based on backend-advertised providers, loading states, accessibility
- Security: CSRF rationale, secret leakage prevention, redirect URI validation,
  session idle timeout, OIDC health checks
- 6 implementation phases with file change map and testing strategy

Created with input from frontend design, backend, security, and auth
architecture specialist reviews.

Co-Authored-By: Claude Opus 4.6
---
 docs/plans/auth-frontend-remediation.md | 796 ++++++++++++++++++++++++
 1 file changed, 796 insertions(+)
 create mode 100644 docs/plans/auth-frontend-remediation.md

diff --git a/docs/plans/auth-frontend-remediation.md b/docs/plans/auth-frontend-remediation.md
new file mode 100644
index 0000000..ae98b61
--- /dev/null
+++ b/docs/plans/auth-frontend-remediation.md
@@ -0,0 +1,796 @@
+# Auth & Frontend Remediation Plan
+
+**Created:** 2026-02-16
+**Status:** Draft - Pending milestone/issue creation
+**Scope:** Backend auth hardening + frontend OIDC-aware multi-method login
+**Branch:** `develop`
+
+---
+
+## Executive Summary
+
+The Mosaic Stack authentication system has critical gaps that cause silent 500 errors
+in production and leave the frontend unable to adapt to backend configuration. The
+frontend login UI is hardcoded for OIDC-only authentication with no fallback, no error
+display, and no awareness of backend state.
+
+This plan addresses both sides with a phased approach: fix the backend validation and
+error handling first, then build a proper multi-method login UI that adapts to the
+backend's advertised capabilities.
+
+---
+
+## Table of Contents
+
+1. [Current State Assessment](#1-current-state-assessment)
+2. [Architecture Design](#2-architecture-design)
+3. [Backend Remediation](#3-backend-remediation)
+4. [Frontend Remediation](#4-frontend-remediation)
+5. [Security Hardening](#5-security-hardening)
+6. [Implementation Phases](#6-implementation-phases)
+7. [File Change Map](#7-file-change-map)
+8. [Testing Strategy](#8-testing-strategy)
+9. [Rollout & Rollback](#9-rollout--rollback)
+10. [Open Questions](#10-open-questions)
+
+---
+
+## 1. Current State Assessment
+
+### Backend
+
+| Area                      | Status                 | Issue                                                |
+| ------------------------- | ---------------------- | ---------------------------------------------------- |
+| OIDC startup validation   | Incomplete             | `OIDC_REDIRECT_URI` not validated                    |
+| BetterAuth error handling | Missing                | Silent 500s bypass NestJS exception filter           |
+| Auth config discovery     | Missing                | Frontend cannot learn what auth methods exist        |
+| Email/password backend    | Enabled but incomplete | No email verification service configured             |
+| Docker env vars           | Inconsistent           | Swarm compose has no default for `OIDC_REDIRECT_URI` |
+| trustedOrigins            | Hardcoded              | Production URLs in source code                       |
+| PKCE                      | Not enabled            | genericOAuth lacks `pkce: true`                      |
+
+### Frontend
+
+| Area               | Grade | Issue                                       |
+| ------------------ | ----- | ------------------------------------------- |
+| Auth flow          | C+    | OIDC-only, no fallback path                 |
+| OIDC awareness     | D     | Zero conditional logic, no env check        |
+| Login UI           | C     | Single OAuth button, no email/password form |
+| Error display      | D     | Callback errors silently lost               |
+| Session management | A-    | AuthProvider is solid                       |
+| Route protection   | B     | Component-level only, no middleware         |
+| Theme storage key  | Bug   | Still reads `"jarvis-theme"`                |
+
+### Root Cause of Production 500
+
+`POST /auth/sign-in/oauth2` returns 500 because:
+
+1. `OIDC_REDIRECT_URI` may be empty in the Swarm deployment (no default value)
+2. BetterAuth's genericOAuth plugin fails when constructing the authorization URL
+3. The error is swallowed — `toNodeHandler()` operates outside NestJS exception handling
+4. `validateOidcConfig()` checks only 3 of 4 required OIDC variables
+
+---
+
+## 2. Architecture Design
+
+### Auth Discovery Pattern
+
+The backend advertises available auth methods via `GET /auth/config`. The frontend
+fetches this on the login page and renders the appropriate UI dynamically.
+
+```
+Browser                     API                    Authentik
+   │                         │                         │
+   │  GET /auth/config       │                         │
+   ├────────────────────────►│                         │
+   │◄────────────────────────┤                         │
+   │  { providers: [...] }   │                         │
+   │                         │                         │
+   │  (render login UI       │                         │
+   │   based on providers)   │                         │
+   │                         │                         │
+   │  POST /auth/sign-in/... │                         │
+   ├────────────────────────►│                         │
+   │             (BetterAuth handles flow)             │
+   │                         ├────────────────────────►│
+   │                         │◄────────────────────────┤
+   │◄────────────────────────┤                         │
+   │  Set-Cookie + redirect  │                         │
+```
+
+**Why backend discovery (not build-time env var):**
+
+- Auth config can change without rebuilding the frontend Docker image
+- Health-aware: backend can disable a provider if its upstream is unreachable
+- Single source of truth: no risk of frontend/backend config drift
+
+### Auth Config Response Shape
+
+```typescript
+interface AuthConfigResponse {
+  providers: AuthProvider[];
+}
+
+interface AuthProvider {
+  id: string; // "authentik", "email"
+  name: string; // Display name for UI
+  type: "oauth" | "credentials";
+}
+```
+
+**What is NOT exposed:** client secrets, client IDs, issuer URLs, redirect URIs,
+session expiry times, rate limit thresholds. Only capability metadata.
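One way to make this guarantee structural, instead of relying only on review, is to build the response by copying an explicit allowlist of fields. This is a hypothetical sketch, not existing code: `InternalProviderConfig`, `toPublicProvider()`, and `buildAuthConfigResponse()` are illustrative names. The two public interfaces are redeclared so that the snippet runs on its own.

```typescript
// Public shape, redeclared from the AuthConfigResponse / AuthProvider types above.
interface AuthProvider {
  id: string;
  name: string;
  type: "oauth" | "credentials";
}

interface AuthConfigResponse {
  providers: AuthProvider[];
}

// Hypothetical server-side record that also carries values which must never
// leave the API (client credentials, issuer, redirect URI).
interface InternalProviderConfig extends AuthProvider {
  clientId?: string;
  clientSecret?: string;
  issuer?: string;
  redirectUri?: string;
}

// Copy only allowlisted fields instead of spreading the internal object,
// so any field added to InternalProviderConfig later is excluded by default.
function toPublicProvider(p: InternalProviderConfig): AuthProvider {
  return { id: p.id, name: p.name, type: p.type };
}

function buildAuthConfigResponse(internal: InternalProviderConfig[]): AuthConfigResponse {
  return { providers: internal.map(toPublicProvider) };
}

// Example data (hypothetical values): secrets exist internally but are not serialized.
const response = buildAuthConfigResponse([
  {
    id: "authentik",
    name: "Authentik",
    type: "oauth",
    clientId: "mosaic-stack",
    clientSecret: "example-secret",
    issuer: "https://auth.example.com/application/o/mosaic/",
    redirectUri: "https://app.example.com/auth/callback/authentik",
  },
  { id: "email", name: "Email", type: "credentials" },
]);

console.log(JSON.stringify(response));
// {"providers":[{"id":"authentik","name":"Authentik","type":"oauth"},{"id":"email","name":"Email","type":"credentials"}]}
```

With this approach, a newly added internal field stays off the wire unless someone also adds it to `toPublicProvider()`. The serialization test planned for the endpoint then becomes a second safeguard rather than the only one.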
+
+### Frontend Auth State Machine
+
+```
+loading ──► unauthenticated ──► authenticating ──► authenticated
+                   │                    │                  │
+                   │◄───── error ◄──────┘                  │
+                   │                                       │
+                   │◄──────────── session-expired ◄────────┘
+```
+
+States:
+
+- `loading` — checking `/auth/session` on mount
+- `unauthenticated` — no valid session, show login page
+- `authenticating` — OAuth redirect or form submission in progress
+- `authenticated` — valid session, user object available
+- `error` — auth failed (network, credentials, OAuth, backend)
+- `session-expired` — session ended mid-use, redirect to login
+
+---
+
+## 3. Backend Remediation
+
+### 3.1 Extend Startup Validation
+
+**File:** `apps/api/src/auth/auth.config.ts`
+
+Add `OIDC_REDIRECT_URI` to `REQUIRED_OIDC_ENV_VARS`. Add URL format validation:
+
+- Must be a valid URL
+- Path must start with `/auth/callback`
+- Warn if using `localhost` in production
+
+**Tests to add:** Missing var, invalid URL, invalid path, valid URL.
+
+### 3.2 Auth Config Discovery Endpoint
+
+**New endpoint:** `GET /auth/config` (public, no auth required)
+
+Returns the list of enabled providers:
+
+- Includes the `email` provider whenever `emailAndPassword.enabled` is true
+- Includes the `authentik` provider only when `OIDC_ENABLED=true` **and** the OIDC
+  provider is reachable (health check)
+
+Cache: `Cache-Control: public, max-age=300` (5 minutes).
+
+No rate limiting needed (read-only, public, low-risk).
+
+**OIDC Health Check:** Implement `isOidcProviderReachable()` in `AuthService` that
+fetches the OIDC discovery URL with a 2-second timeout. Cache the result for 30
+seconds to avoid repeated network calls. When Authentik is unreachable, the
+`authentik` provider is omitted from the config response, causing the frontend to
+hide the OAuth button and show only email/password login.
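Here is a minimal sketch of the health check. It assumes a runtime with global `fetch`, `AbortController`, and `setTimeout`, such as Node 18+. `probeDiscoveryUrl()` and `ReachabilityCache` are hypothetical helpers that `isOidcProviderReachable()` could delegate to. The 2-second timeout and 30-second TTL are the values from the paragraph above.

```typescript
// Resolves true only if the discovery document answers with a 2xx status
// before the timeout; network errors and aborts count as "unreachable".
async function probeDiscoveryUrl(url: string, timeoutMs = 2_000): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, { signal: controller.signal });
    return res.ok;
  } catch {
    return false;
  } finally {
    clearTimeout(timer);
  }
}

// Remembers the last probe result for ttlMs so that GET /auth/config does not
// contact the identity provider on every request. The clock is injectable so
// the caching logic can be tested without waiting in real time.
class ReachabilityCache {
  private cached: { value: boolean; expiresAt: number } | null = null;
  private readonly probe: () => Promise<boolean>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(probe: () => Promise<boolean>, ttlMs = 30_000, now: () => number = Date.now) {
    this.probe = probe;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async isReachable(): Promise<boolean> {
    const t = this.now();
    // Serve the cached result while it is still fresh.
    if (this.cached !== null && t < this.cached.expiresAt) {
      return this.cached.value;
    }
    const value = await this.probe();
    this.cached = { value, expiresAt: t + this.ttlMs };
    return value;
  }
}
```

Wiring the helpers together could look like `new ReachabilityCache(() => probeDiscoveryUrl(discoveryUrl))`. Here `discoveryUrl` is the issuer URL with any trailing slash removed, followed by `/.well-known/openid-configuration`, the standard OIDC Discovery location. One limitation of this sketch: requests that arrive while a probe is in flight each start their own probe. Caching the pending promise instead of the resolved value would collapse them into one.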
+
+**Secret leakage prevention:** The response must NOT contain `OIDC_CLIENT_SECRET`,
+`OIDC_CLIENT_ID`, `OIDC_ISSUER`, `OIDC_REDIRECT_URI`, `BETTER_AUTH_SECRET`,
+`JWT_SECRET`, `CSRF_SECRET`, session expiry times, or rate limit thresholds.
+Add an explicit test that serializes the response body and asserts none of these
+patterns appear.
+
+**Files:**
+
+- `apps/api/src/auth/auth.controller.ts` — add endpoint
+- `apps/api/src/auth/auth.service.ts` — add `getAuthConfig()` and `isOidcProviderReachable()`
+- `packages/shared/src/types/auth.types.ts` — add `AuthProvider`, `AuthConfigResponse`
+
+### 3.3 BetterAuth Error Handling Wrapper
+
+**File:** `apps/api/src/auth/auth.controller.ts`
+
+Wrap the `handler(req, res)` call in try/catch:
+
+- Log errors with full context (method, URL, stack trace)
+- If response not yet sent, throw `HttpException` to trigger `GlobalExceptionFilter`
+- If response already started, log warning only (can't throw after headers sent)
+
+### 3.4 Docker Compose Fixes
+
+**File:** `docker-compose.swarm.portainer.yml`
+
+Change line 115 from:
+
+```yaml
+OIDC_REDIRECT_URI: ${OIDC_REDIRECT_URI}
+```
+
+To:
+
+```yaml
+OIDC_REDIRECT_URI: ${OIDC_REDIRECT_URI:-}
+```
+
+Empty string is intentional — startup validation catches it when OIDC is enabled.
+
+### 3.5 Email/Password Status Decision
+
+BetterAuth `emailAndPassword: { enabled: true }` is set but incomplete:
+
+- No `sendVerificationEmail` callback configured
+- No `sendResetPassword` callback configured
+- No email service (SMTP/SendGrid) integrated
+
+**Decision:** Keep enabled without email verification for MVP. This provides a
+fallback login method when Authentik is unreachable. Users can sign in with
+email/password but cannot reset passwords or verify email addresses. A future
+milestone should add an email service (SMTP/SendGrid) with `sendVerificationEmail`
+and `sendResetPassword` callbacks.
+
+**Trade-off acknowledged:** The backend specialist recommended disabling it until an email
+service exists (safer). We chose to keep it enabled because: (a) it provides the only
+fallback when Authentik is down, (b) the risk is limited — no public sign-up means
+only admin-created accounts can use it, (c) password reset can be handled manually
+by admins until the email service is added.
+
+### 3.6 Extract trustedOrigins to Environment Variables
+
+**File:** `apps/api/src/auth/auth.config.ts`
+
+Replace hardcoded origins with a `getTrustedOrigins()` function that reads:
+
+- `NEXT_PUBLIC_APP_URL` (primary frontend URL)
+- `NEXT_PUBLIC_API_URL` (API's own origin)
+- `TRUSTED_ORIGINS` (comma-separated additional origins)
+- Development-only localhost fallbacks
+
+Align with CORS configuration in `main.ts` to use the same origin list.
+
+### 3.7 Enable PKCE
+
+**File:** `apps/api/src/auth/auth.config.ts`
+
+Add `pkce: true` to the genericOAuth provider config. PKCE (Proof Key for Code
+Exchange) prevents authorization code interception attacks. Authentik supports PKCE.
+
+---
+
+## 4. Frontend Remediation
+
+### 4.1 Login Page Redesign
+
+The login page adapts to the auth config returned by `GET /auth/config`:
+
+**When OIDC is enabled (OAuth + email/password):**
+
+```
+┌─────────────────────────────────┐
+│  Welcome to Mosaic Stack        │
+│                                 │
+│  [error banner if ?error param] │
+│                                 │
+│   ┌─────────────────────────┐   │
+│   │ Continue with Authentik │   │  ← OAuthButton (primary)
+│   └─────────────────────────┘   │
+│                                 │
+│  ──── or continue with email ── │  ← AuthDivider
+│                                 │
+│  Email:    [________________]   │
+│  Password: [_____________]      │  ← LoginForm (secondary)
+│           [ Continue ]          │
+│                                 │
+└─────────────────────────────────┘
+```
+
+**When OIDC is disabled (email/password only):**
+
+```
+┌─────────────────────────────────┐
+│  Welcome to Mosaic Stack        │
+│                                 │
+│  [error banner if ?error param] │
+│                                 │
+│  Email:    [________________]   │
+│  Password: [_____________]      │  ← LoginForm (primary)
+│           [ Continue ]          │
+│                                 │
+└─────────────────────────────────┘
+```
+
+### 4.2 New Components
+
+| Component              | File                                       | Purpose                                                 |
+| ---------------------- | ------------------------------------------ | ------------------------------------------------------- |
+| `OAuthButton`          | `components/auth/OAuthButton.tsx`          | Replaces `LoginButton`. Loading state, provider config. |
+| `LoginForm`            | `components/auth/LoginForm.tsx`            | Email/password form with validation                     |
+| `AuthErrorBanner`      | `components/auth/AuthErrorBanner.tsx`      | PDA-friendly error display                              |
+| `AuthDivider`          | `components/auth/AuthDivider.tsx`          | "or continue with email" separator                      |
+| `SessionExpiryWarning` | `components/auth/SessionExpiryWarning.tsx` | Floating banner when session nears expiry               |
+
+**Delete:** `components/auth/LoginButton.tsx` (replaced by `OAuthButton`)
+
+### 4.3 PDA-Friendly Error Messages
+
+All error messages follow PDA design principles. No alarming language.
+
+| Error Source          | Message                                                          |
+| --------------------- | ---------------------------------------------------------------- |
+| OAuth callback failed | "Authentication paused. Please try again when ready."            |
+| Invalid credentials   | "The email and password combination wasn't recognized."          |
+| Network failure       | "Unable to connect. Check your network and try again."           |
+| Backend 500           | "The service is taking a break. Please try again in a moment."   |
+| Backend 502/503       | "The service is temporarily unavailable. Try again in a moment." |
+| Backend 504           | "The connection took longer than expected. Check your network."  |
+| Rate limited          | "You've tried a few times. Take a moment and try again shortly." |
+| Session expired       | "Your session ended. Please sign in again when ready."           |
+
+**Colors:** Blue info banner (`bg-blue-50`, `border-blue-200`, `text-blue-700`).
+No red. No warning icons. Info icon only.
+
+### 4.4 Auth Config Fetching
+
+The login page fetches `GET /auth/config` on mount to determine which providers
+to render. If the request fails (network error or non-2xx status), fall back to
+showing only the email/password form (safest default).
+
+```typescript
+import { useEffect, useState } from "react";
+// AuthConfigResponse: from packages/shared/src/types/auth.types.ts (section 3.2)
+// API_BASE_URL: defined elsewhere in the web app
+
+// Safest default when the config cannot be loaded: email/password only
+const EMAIL_ONLY_CONFIG: AuthConfigResponse = {
+  providers: [{ id: "email", name: "Email", type: "credentials" }],
+};
+
+// In login page component
+const [authConfig, setAuthConfig] = useState<AuthConfigResponse | null>(null);
+
+useEffect(() => {
+  fetch(`${API_BASE_URL}/auth/config`)
+    .then((res) => {
+      // fetch() resolves on HTTP errors, so reject anything that is not 2xx
+      if (!res.ok) throw new Error(`GET /auth/config failed: ${res.status}`);
+      return res.json() as Promise<AuthConfigResponse>;
+    })
+    .then(setAuthConfig)
+    // Fallback: show email/password only
+    .catch(() => setAuthConfig(EMAIL_ONLY_CONFIG));
+}, []);
+```
+
+### 4.5 Loading States
+
+- **OAuth button:** Shows spinner + "Connecting..." during redirect
+- **Login form:** Inputs disabled + submit button shows spinner during API call
+- **Callback page:** Already has spinner (no changes needed)
+- **Session check:** Full-page spinner while AuthProvider checks `/auth/session`
+
+### 4.6 Error Display on Login Page
+
+The login page reads `?error=` query params from the callback redirect and displays
+them in the `AuthErrorBanner`. Error codes are sanitized against an allowlist (already
+implemented in callback page).
+
+### 4.7 Fix Theme Storage Key
+
+**File:** `apps/web/src/providers/ThemeProvider.tsx`
+
+Change `STORAGE_KEY` from `"jarvis-theme"` to `"mosaic-theme"`.
+
+### 4.8 Accessibility Requirements
+
+- All form inputs have associated `