stack/guides/VAULT-SECRETS.md
Hermes Agent 373e4558a3
chore(framework): canonize Vault-as-SSOT + ESO-default secrets policy
Encodes operator-approved (Jason, 2026-05-22) secrets policy as binding
framework rules across all Mosaic agent sessions and projects.

Changes:
- STANDARDS.md: add "Secrets handling (HARD RULE)" subsection under
  Non-Negotiables — Vault as SSOT, ESO bridge as default, Direct-Vault
  opt-in only, forbidden ${VAR:-default} for required values, forbidden
  .env in prod, required startup schema validation
- VAULT-SECRETS.md: add four new sections — architecture decision matrix
  (ESO vs Direct-Vault), full ESO bridge worked example (Vault path +
  ExternalSecret + Deployment YAML + zod/pydantic/Go validators),
  Direct-Vault opt-in pattern (AppRole provisioning + ESO bootstrap
  for chicken-and-egg), and forbidden patterns CI lint targets
- BOOTSTRAP.md: add "Secrets Bootstrap" required subsection with
  checklist for new apps (Vault path, README docs, ExternalSecret,
  secretKeyRef, schema validator, Direct-Vault justification)

All duplicate file paths kept in sync (md5-equal pairs):
  guides/ <-> packages/mosaic/framework/guides/
  packages/mosaic/framework/defaults/STANDARDS.md (single copy in repo)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 11:58:27 -05:00

Vault Secrets Management Guide

This guide applies when the project uses HashiCorp Vault for secrets management.

Before Starting

  1. Verify Vault access: vault status
  2. Authenticate: vault login (method depends on environment)
  3. Check your permissions for the required paths

Canonical Structure

ALL Vault secrets MUST follow this structure:

{mount}/{service}/{component}/{secret-name}

Components

  • mount: Environment-specific mount point
  • service: The service or application name
  • component: Logical grouping (database, api, oauth, etc.)
  • secret-name: Specific secret identifier

Environment Mounts

| Mount | Environment | Usage |
|---|---|---|
| secret-dev/ | Development | Local dev, CI |
| secret-staging/ | Staging | Pre-production testing |
| secret-prod/ | Production | Live systems |

Examples

# Database credentials
secret-prod/postgres/database/app
secret-prod/mysql/database/readonly
secret-staging/redis/auth/default

# API tokens
secret-prod/authentik/admin/token
secret-prod/stripe/api/live-key
secret-dev/sendgrid/api/test-key

# JWT/Authentication
secret-prod/backend-api/jwt/signing-key
secret-prod/auth-service/session/secret

# OAuth providers
secret-prod/backend-api/oauth/google
secret-prod/backend-api/oauth/github

# Internal services
secret-prod/loki/read-auth/admin
secret-prod/grafana/admin/password

Standard Field Names

Use consistent field names within secrets:

| Purpose | Fields |
|---|---|
| Credentials | username, password |
| Tokens | token |
| OAuth | client_id, client_secret |
| Connection | url, host, port |
| Keys | public_key, private_key |

Example Secret Structure

// secret-prod/postgres/database/app
{
  "username": "app_user",
  "password": "secure-password-here",
  "host": "db.example.com",
  "port": "5432",
  "database": "myapp"
}
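As an illustration of why consistent field names help, here is a minimal sketch that assembles a PostgreSQL connection URL from a secret shaped like the example above. The function name `postgres_dsn` is hypothetical, and the `postgresql://` URL form is an assumption about what the consuming driver accepts.

```python
from urllib.parse import quote


def postgres_dsn(secret: dict) -> str:
    """Build a PostgreSQL connection URL from the standard secret fields."""
    required = ("username", "password", "host", "port", "database")
    missing = [name for name in required if not secret.get(name)]
    if missing:
        raise KeyError(f"secret is missing fields: {', '.join(missing)}")
    # Percent-encode credentials so characters like '@' or '/' cannot break the URL.
    user = quote(secret["username"], safe="")
    password = quote(secret["password"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{secret['host']}:{secret['port']}/{secret['database']}"
    )
```

Because every database secret uses the same field names, one helper of this kind can serve any service that follows the convention.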

Rules

  1. DO NOT GUESS secret paths - Always verify the path exists
  2. Use helper scripts in scripts/vault/ when available
  3. All lowercase, hyphenated (kebab-case) for all path segments
  4. Standard field names - Use the conventions above
  5. No sensitive data in path names - Path itself should not reveal secrets
  6. Environment separation - Never reference prod secrets from dev
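The format rules above (four segments, a known mount, kebab-case) can be checked mechanically. The following is a rough sketch only: the function name `validate_secret_path` and the exact regular expression are assumptions, and the check says nothing about whether the path actually exists in Vault (rule 1 still requires verifying that).

```python
import re

# Mounts listed in the "Environment Mounts" table.
ALLOWED_MOUNTS = {"secret-dev", "secret-staging", "secret-prod"}
# Lowercase alphanumerics separated by single hyphens (kebab-case).
SEGMENT_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def validate_secret_path(path: str) -> list:
    """Return a list of rule violations; an empty list means the path conforms."""
    segments = path.strip("/").split("/")
    if len(segments) != 4:
        return [
            "expected 4 segments (mount/service/component/secret-name), "
            f"got {len(segments)}"
        ]
    problems = []
    mount, *rest = segments
    if mount not in ALLOWED_MOUNTS:
        problems.append(f"unknown mount: {mount}")
    for segment in rest:
        if not SEGMENT_RE.match(segment):
            problems.append(f"segment is not kebab-case: {segment}")
    return problems
```

For example, `validate_secret_path("secret-prod/postgres/database/app")` returns an empty list, while a path with an uppercase or underscored segment is reported.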

Deprecated Paths (DO NOT USE)

These legacy patterns are deprecated and should be migrated:

| Deprecated | Migrate To |
|---|---|
| secret/infrastructure/* | secret-{env}/{service}/... |
| secret/oauth/* | secret-{env}/{service}/oauth/{provider} |
| secret/database/* | secret-{env}/{service}/database/{user} |
| secret/credentials/* | secret-{env}/{service}/{component}/{name} |
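A migration audit could start by flagging references to the legacy prefixes listed above. This is a small sketch; the helper name `is_deprecated_path` is hypothetical, and it only detects the four prefixes from the table without proposing the replacement path.

```python
# Legacy prefixes from the "Deprecated Paths" table.
DEPRECATED_PREFIXES = (
    "secret/infrastructure/",
    "secret/oauth/",
    "secret/database/",
    "secret/credentials/",
)


def is_deprecated_path(path: str) -> bool:
    """Return True when the path starts with one of the legacy prefixes."""
    return path.lstrip("/").startswith(DEPRECATED_PREFIXES)
```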

Reading Secrets

CLI

# Read a secret
vault kv get secret-prod/postgres/database/app

# Get specific field
vault kv get -field=password secret-prod/postgres/database/app

# JSON output
vault kv get -format=json secret-prod/postgres/database/app

Application Code

Python (hvac):

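import os

import hvac

# Token auth shown for brevity; use the auth method your environment requires.
client = hvac.Client(url='https://vault.example.com', token=os.environ['VAULT_TOKEN'])
secret = client.secrets.kv.v2.read_secret_version(
    path='postgres/database/app',
    mount_point='secret-prod'
)
# KV v2 wraps the payload: the outer 'data' holds metadata plus the inner 'data' fields
password = secret['data']['data']['password']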

Node.js (node-vault):

const vault = require('node-vault')({ endpoint: 'https://vault.example.com' });
// CommonJS has no top-level await, so read inside an async function.
async function readDbPassword() {
  const secret = await vault.read('secret-prod/data/postgres/database/app');
  return secret.data.data.password;
}

Go:

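secret, err := client.Logical().Read("secret-prod/data/postgres/database/app")
// Read returns a nil secret (and a nil error) when nothing exists at the path.
if err != nil || secret == nil { log.Fatalf("vault read failed: %v", err) }
password := secret.Data["data"].(map[string]interface{})["password"].(string)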
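Note that the CLI examples take the path without a `data/` segment, while the Node.js and Go examples above include it: on a KV v2 mount the HTTP API reads secrets at `<mount>/data/<path>`, and `vault kv` adds that segment for you. A minimal sketch of the conversion follows; the helper name `kv2_api_path` is hypothetical, and it assumes the mount is a single path segment, as with the mounts in this guide.

```python
def kv2_api_path(cli_path: str) -> str:
    """Convert a `vault kv` CLI path into the KV v2 HTTP API read path."""
    mount, _, rest = cli_path.strip("/").partition("/")
    if not rest:
        raise ValueError(f"path has no secret below the mount: {cli_path!r}")
    return f"{mount}/data/{rest}"
```

With hvac's `kv.v2` helper the conversion is not needed, because the mount is passed separately as `mount_point`.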

Writing Secrets

Only authorized personnel should write secrets. If you need a new secret:

  1. Request through proper channels (ticket, PR to IaC repo)
  2. Follow the canonical structure
  3. Document the secret's purpose
  4. Set appropriate access policies

# Example (requires write permissions)
vault kv put secret-dev/myapp/database/app \
  username="dev_user" \
  password="dev-password" \
  host="localhost" \
  port="5432"

Troubleshooting

Permission Denied

Error: permission denied

  • Verify your token has read access to the path
  • Check if you're using the correct mount point
  • Confirm the secret path exists

Secret Not Found

Error: no value found at secret-prod/data/service/component/name

  • Verify the exact path (use vault kv list to explore)
  • Check for typos in service/component names
  • Confirm you're using the correct environment mount

Token Expired

Error: token expired

  • Re-authenticate: vault login
  • Check token TTL: vault token lookup
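To catch an expiring token before a long-running script fails, the output of `vault token lookup -format=json` can be inspected. The sketch below assumes the JSON carries the remaining lifetime in seconds at `data.ttl` and a null `data.expire_time` for tokens that never expire; the function name and the 300-second threshold are illustrative choices.

```python
import json


def needs_reauth(lookup_json: str, min_seconds: int = 300) -> bool:
    """Return True when the token has less than `min_seconds` left to live."""
    data = json.loads(lookup_json)["data"]
    # Assumption: non-expiring tokens report a null expire_time (and ttl 0).
    if data.get("expire_time") is None:
        return False
    return int(data["ttl"]) < min_seconds
```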

Security Best Practices

  1. Least privilege - Request only the permissions you need
  2. Short-lived tokens - Use tokens with appropriate TTLs
  3. Audit logging - All access is logged; act accordingly
  4. No local copies - Don't store secrets in files or env vars long-term
  5. Rotate on compromise - Immediately rotate any exposed secrets

Secrets Architecture Decision Matrix

Use this table to choose between the ESO bridge (default) and Direct-Vault (opt-in) patterns for every new app or integration.

| Factor | ESO Bridge (default) | Direct-Vault (opt-in) |
|---|---|---|
| Use-case | All static secrets (DB creds, API keys, signing keys, OAuth secrets) | Dynamic creds with short TTLs (DB rotation, AWS STS, PKI), per-request audit trails, or lease renewal mid-pod-lifecycle |
| App code change | None; reads standard env vars via secretKeyRef | Requires Vault client (hvac, node-vault, vault/api) in application code |
| Secret rotation | ESO re-syncs the k8s Secret on each refreshInterval poll; env vars set via secretKeyRef are read at container start, so a pod restart picks up the new value | App manages lease renewal or re-auth within the running process |
| Audit granularity | Access logged at Vault when ESO syncs; no per-request app audit | Every app request to Vault is a separate audit log entry |
| Operational burden | Low; ESO handles polling, sync, and k8s Secret lifecycle | Higher; app must handle auth, lease renewal, error paths, and token rotation |
| Justification required? | No; this is the default | Yes; document in project README under "Secrets architecture" |
| Example use cases | Web app DB password, OAuth client secret, JWT signing key, API token | Vault database secrets engine with 15-min TTL leases, AWS STS assume-role, Vault PKI short-lived certs |

Decision rule: If you are unsure, use ESO. Only justify Direct-Vault when the secret cannot be safely stored in a k8s Secret (too short-lived, per-request TTL required, or mid-lifecycle renewal needed).
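The decision rule can be written down as a small predicate, which may be useful in a project scaffolding or review checklist tool. This is only a sketch: the class and field names are hypothetical, and the three flags mirror the Direct-Vault criteria named in the matrix and the rule above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecretRequirements:
    """Answers to the questions that can justify Direct-Vault."""
    short_lived_dynamic_creds: bool = False  # e.g. DB leases, AWS STS, short-lived PKI certs
    per_request_audit: bool = False          # each app call must appear in the Vault audit log
    mid_lifecycle_renewal: bool = False      # leases renewed inside a running pod


def choose_pattern(req: SecretRequirements) -> str:
    """Return 'direct-vault' only when a listed criterion applies; otherwise 'eso'."""
    if (
        req.short_lived_dynamic_creds
        or req.per_request_audit
        or req.mid_lifecycle_renewal
    ):
        return "direct-vault"
    return "eso"
```

Every flag defaults to `False`, so an unanswered questionnaire resolves to ESO, matching the "if you are unsure" clause.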


ESO Bridge Pattern (Default)

This is the required default for all k8s workloads. Follow this exact pattern unless a documented dynamic-secrets requirement justifies Direct-Vault.

1. Provision Vault path

# Write the secrets for the app (run once; use IaC/Terraform for repeatable provisioning)
vault kv put secret/k3s/<app> \
  db_password="..." \
  api_key="..." \
  jwt_secret="..."

For k3s cluster workloads, store the app's secrets at the single path secret/k3s/<app>; the ExternalSecret below reads every field from that path.

2. ExternalSecret manifest

Commit this to the repo's deploy/ or k8s/ directory:

# deploy/external-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: <app>-secrets
  namespace: <namespace>
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend        # ClusterSecretStore name — verify with cluster admin
    kind: ClusterSecretStore
  target:
    name: <app>-secrets        # k8s Secret name that will be created
    creationPolicy: Owner
  data:
    - secretKey: DB_PASSWORD   # key in the k8s Secret
      remoteRef:
        key: secret/k3s/<app>  # Vault path
        property: db_password  # field within the Vault secret
    - secretKey: API_KEY
      remoteRef:
        key: secret/k3s/<app>
        property: api_key
    - secretKey: JWT_SECRET
      remoteRef:
        key: secret/k3s/<app>
        property: jwt_secret
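Since each `data` entry follows the same shape (upper-cased `secretKey`, lower-cased Vault `property`), the manifest can be generated rather than hand-written. The following is a sketch under that naming assumption; the function name `external_secret` is hypothetical, and the output is JSON, which `kubectl apply -f` accepts in place of YAML.

```python
import json


def external_secret(app: str, namespace: str, properties: list,
                    store: str = "vault-backend", refresh: str = "1h") -> dict:
    """Build an ExternalSecret manifest mirroring the YAML above."""
    vault_key = f"secret/k3s/{app}"
    return {
        "apiVersion": "external-secrets.io/v1beta1",
        "kind": "ExternalSecret",
        "metadata": {"name": f"{app}-secrets", "namespace": namespace},
        "spec": {
            "refreshInterval": refresh,
            "secretStoreRef": {"name": store, "kind": "ClusterSecretStore"},
            "target": {"name": f"{app}-secrets", "creationPolicy": "Owner"},
            "data": [
                # Assumed convention: env-style key is the upper-cased Vault field name.
                {"secretKey": prop.upper(),
                 "remoteRef": {"key": vault_key, "property": prop}}
                for prop in properties
            ],
        },
    }


if __name__ == "__main__":
    manifest = external_secret("myapp", "default", ["db_password", "api_key", "jwt_secret"])
    print(json.dumps(manifest, indent=2))
```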

3. Deployment manifest — reference synced k8s Secret

# deploy/deployment.yaml (env section)
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: <app>-secrets   # matches ExternalSecret target.name
        key: DB_PASSWORD
  - name: API_KEY
    valueFrom:
      secretKeyRef:
        name: <app>-secrets
        key: API_KEY
  - name: JWT_SECRET
    valueFrom:
      secretKeyRef:
        name: <app>-secrets
        key: JWT_SECRET
  - name: PORT
    value: "3000"              # safe-default: non-secret, no Vault needed
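A common failure mode is a Deployment that references a key the ExternalSecret never syncs, which leaves the pod unable to start. A pre-deploy check along these lines could catch it; the function name `missing_secret_keys` is hypothetical, and both arguments are assumed to be the already-parsed manifests (plain dicts and lists).

```python
def missing_secret_keys(env: list, external_secret: dict) -> list:
    """Return env var names whose secretKeyRef key is not synced by the ExternalSecret."""
    target = external_secret["spec"]["target"]["name"]
    synced = {item["secretKey"] for item in external_secret["spec"]["data"]}
    missing = []
    for var in env:
        ref = var.get("valueFrom", {}).get("secretKeyRef")
        # Only inspect references to the Secret this ExternalSecret owns.
        if ref and ref["name"] == target and ref["key"] not in synced:
            missing.append(var["name"])
    return missing
```

Plain `value:` entries such as `PORT` are ignored, since they do not depend on the synced Secret.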

4. App-side schema validation — TypeScript (zod)

Validate all required env vars at startup. Exit non-zero on missing values.

// src/env.ts
import { z } from 'zod';

const envSchema = z.object({
  DB_PASSWORD: z.string().min(1, 'DB_PASSWORD is required'),
  API_KEY: z.string().min(1, 'API_KEY is required'),
  JWT_SECRET: z.string().min(32, 'JWT_SECRET must be at least 32 chars'),
  PORT: z.coerce.number().default(3000),
  NODE_ENV: z.enum(['development', 'production', 'test']).default('production'),
});

const result = envSchema.safeParse(process.env);
if (!result.success) {
  console.error('Missing or invalid environment variables:');
  console.error(result.error.flatten().fieldErrors);
  process.exit(1);
}

export const env = result.data;

4b. App-side schema validation — Python (pydantic)

# src/config.py
import sys

from pydantic import Field, ValidationError
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # min_length rejects empty strings, matching the zod schema above
    db_password: str = Field(min_length=1)
    api_key: str = Field(min_length=1)
    jwt_secret: str = Field(min_length=32)
    port: int = 3000
    node_env: str = "production"

    model_config = SettingsConfigDict(env_file=None)  # no .env in prod

try:
    settings = Settings()
except ValidationError as e:
    print(f"Missing or invalid environment variables: {e}", file=sys.stderr)
    sys.exit(1)

4c. App-side schema validation — Go (envconfig)

// config/config.go
package config

import (
    "fmt"

    "github.com/kelseyhightower/envconfig"
)

type Config struct {
    DBPassword string `envconfig:"DB_PASSWORD" required:"true"`
    APIKey     string `envconfig:"API_KEY" required:"true"`
    JWTSecret  string `envconfig:"JWT_SECRET" required:"true"`
    Port       int    `envconfig:"PORT" default:"3000"`
}

func Load() (*Config, error) {
    var cfg Config
    if err := envconfig.Process("", &cfg); err != nil {
        return nil, fmt.Errorf("invalid environment: %w", err)
    }
    return &cfg, nil
}

// In main():
// cfg, err := config.Load()
// if err != nil { fmt.Fprintln(os.Stderr, err); os.Exit(1) }

Direct-Vault Opt-In Pattern

Use this pattern ONLY when a documented dynamic-secrets requirement applies (DB rotation with short TTLs, AWS STS, PKI, per-request audit). Document the justification in the project README under "Secrets architecture" before implementing.

When it is justified

  • Vault DB secrets engine with lease TTLs shorter than a typical pod lifecycle (< 1 hour)
  • AWS STS assume-role tokens generated per-request
  • Vault PKI short-lived certificates (< 24 hours) that must be renewed within a running pod
  • Per-request audit trail requirement (each app call must appear separately in Vault audit log)

Provision an AppRole for the app

# Enable AppRole auth (if not already enabled)
vault auth enable approle

# Create a Vault policy for the app
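vault policy write <app>-policy - <<EOF
# Exact path: secrets written directly at secret/k3s/<app>
path "secret/data/k3s/<app>" {
  capabilities = ["read"]
}
# Child paths: the trailing /* does not match the parent path itself
path "secret/data/k3s/<app>/*" {
  capabilities = ["read"]
}
path "database/creds/<app>-role" {
  capabilities = ["read"]
}
EOF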

# Create the AppRole
vault write auth/approle/role/<app>-role \
  token_policies="<app>-policy" \
  token_ttl=1h \
  token_max_ttl=4h \
  secret_id_ttl=0

# Retrieve role-id and secret-id
vault read auth/approle/role/<app>-role/role-id
vault write -f auth/approle/role/<app>-role/secret-id

Bootstrap AppRole credentials via ESO (solving the chicken-and-egg problem)

The AppRole role-id and secret-id are themselves secrets. Store them in Vault at a bootstrap path, then use ESO to sync them into a k8s Secret. The app reads that k8s Secret at startup to authenticate with Vault directly.

# Store the bootstrap credentials in Vault
vault kv put secret/k3s/<app>-bootstrap \
  role_id="<role-id>" \
  secret_id="<secret-id>"

# deploy/external-secret-bootstrap.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: <app>-vault-auth
  namespace: <namespace>
spec:
  refreshInterval: 24h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: <app>-vault-auth
    creationPolicy: Owner
  data:
    - secretKey: VAULT_ROLE_ID
      remoteRef:
        key: secret/k3s/<app>-bootstrap
        property: role_id
    - secretKey: VAULT_SECRET_ID
      remoteRef:
        key: secret/k3s/<app>-bootstrap
        property: secret_id

# deploy/deployment.yaml (env section for Direct-Vault app)
env:
  - name: VAULT_ADDR
    value: "https://vault.example.com"  # safe-default: non-secret cluster address
  - name: VAULT_ROLE_ID
    valueFrom:
      secretKeyRef:
        name: <app>-vault-auth
        key: VAULT_ROLE_ID
  - name: VAULT_SECRET_ID
    valueFrom:
      secretKeyRef:
        name: <app>-vault-auth
        key: VAULT_SECRET_ID

App-side Vault client pattern

// src/vault-client.ts — only exists in Direct-Vault apps
import vault from 'node-vault';
import { z } from 'zod';

const bootstrapSchema = z.object({
  VAULT_ADDR: z.string().url(),
  VAULT_ROLE_ID: z.string().min(1),
  VAULT_SECRET_ID: z.string().min(1),
});

const bootstrap = bootstrapSchema.parse(process.env);

const client = vault({ endpoint: bootstrap.VAULT_ADDR });

export async function getVaultClient() {
  const { auth } = await client.approleLogin({
    role_id: bootstrap.VAULT_ROLE_ID,
    secret_id: bootstrap.VAULT_SECRET_ID,
  });
  client.token = auth.client_token;
  return client;
}

Document in README under "Secrets architecture": the Vault path, why Direct-Vault is required, and the lease/renewal strategy.
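The `getVaultClient` function above performs a fresh AppRole login on every call. One possible renewal strategy, sketched here in Python, is to cache the token and log in again shortly before it expires. The class name `TokenCache` and the `login` callable are hypothetical; in a real app `login` would wrap the AppRole login and return the client token together with the `lease_duration` from the auth response.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Cache a Vault token and re-login shortly before it expires."""

    def __init__(
        self,
        login: Callable[[], Tuple[str, float]],  # returns (token, ttl_seconds)
        refresh_margin: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._login = login
        self._margin = refresh_margin
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, logging in again if the cached one is near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, ttl = self._login()
            self._token = token
            self._expires_at = now + ttl
        return self._token
```

Injecting the clock keeps the expiry logic testable without waiting for real time to pass. The sketch is not thread-safe; a concurrent app would need a lock around `get`.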


Forbidden Patterns (CI Lint Targets)

The following patterns are forbidden in all Mosaic projects. CI lint SHOULD catch these automatically (implementation tracked separately). Agents MUST NOT introduce these patterns.

1. Untagged fallback defaults for required values

# FORBIDDEN — required secret with silent fallback
environment:
  - DB_PASSWORD=${DB_PASSWORD:-changeme}
  - API_KEY=${API_KEY:-}

# REQUIRED — fast-fail on missing required values
environment:
  - DB_PASSWORD=${DB_PASSWORD:?DB_PASSWORD is required}
  - API_KEY=${API_KEY:?API_KEY is required}

# ALLOWED — true convenience default, tagged
environment:
  - PORT=${PORT:-3000}  # safe-default: non-secret, app works at any port

This applies to: docker-compose.yml, k8s manifests, Helm values.yaml, any env file committed to git.
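A lint for this rule could be as simple as a line scan. The sketch below is one possible approach, not the tracked CI implementation: it flags every `${VAR:-default}` expansion on a line that lacks the `# safe-default:` tag. The function name is hypothetical, and the scan does not cover the colon-less `${VAR-default}` form.

```python
import re

# Matches ${NAME:-anything}, including an empty default such as ${API_KEY:-}.
FALLBACK_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):-[^}]*\}")
SAFE_TAG = "# safe-default:"


def find_untagged_fallbacks(text: str) -> list:
    """Return (line_number, variable_name) for each untagged fallback default."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if SAFE_TAG in line:
            continue  # tagged convenience default, allowed
        for match in FALLBACK_RE.finditer(line):
            findings.append((lineno, match.group(1)))
    return findings
```

The `${VAR:?message}` fast-fail form does not match the pattern, so it passes the check.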

2. Vault KV calls in application source code (ESO-default projects)

# FORBIDDEN in ESO-default apps — direct Vault client in app source
import os
import hvac
client = hvac.Client(url=os.environ['VAULT_ADDR'])
secret = client.secrets.kv.v2.read_secret_version(path='myapp/db')

ESO-default apps read env vars only. Direct-Vault clients belong only in apps with a documented dynamic-secrets justification in README.

3. Hardcoded secrets or API keys in committed files

# FORBIDDEN — hardcoded credential
DB_PASSWORD = "supersecret123"
API_KEY = "sk-live-abc123"

No exceptions. CI lint must flag any variable whose name matches a common secret pattern (password, secret, api_key, token) and that is assigned a literal value instead of being read from the environment.
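A naive version of such a check is sketched below. It is an assumption-laden illustration rather than the tracked CI implementation: the regular expression, the function name, and the decision to skip `${...}` placeholders are all choices made for this example, and dedicated secret scanners are considerably more thorough.

```python
import re

# A name containing a secret-like word, then ':' or '=', then a quoted literal.
SECRET_ASSIGN_RE = re.compile(
    r"\b(\w*(?:password|secret|api_key|token)\w*)"
    r"""\s*[:=]\s*(["'])(.+?)\2""",
    re.IGNORECASE,
)


def find_hardcoded_secrets(text: str) -> list:
    """Return (line_number, variable_name) for secret-like names set to a literal."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SECRET_ASSIGN_RE.finditer(line):
            if match.group(3).startswith("${"):
                continue  # environment placeholder, not a literal value
            findings.append((lineno, match.group(1)))
    return findings
```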

4. .env files in production deployment paths

# FORBIDDEN — .env file in a production deploy path
deploy/.env
k8s/.env
docker/.env

# ALLOWED — local dev only
.env.example          # template only, no real values
.env                  # local dev, must be in .gitignore

.env files are acceptable in local-dev contexts only and MUST be in .gitignore. They are forbidden in any path that a CI pipeline or production deployment process reads directly.
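The path rule can be checked against the list of tracked files (for example, the output of `git ls-files`). This sketch assumes the production deploy directories are the three shown above; the set `DEPLOY_DIRS` and the function name are illustrative and would need to match each repo's layout.

```python
from pathlib import PurePosixPath

# Directories treated as production deploy paths (assumption: taken from the examples above).
DEPLOY_DIRS = {"deploy", "k8s", "docker"}


def forbidden_env_files(tracked_paths: list) -> list:
    """Return tracked .env files located anywhere under a deploy directory."""
    flagged = []
    for raw in tracked_paths:
        path = PurePosixPath(raw)
        in_deploy_dir = any(part in DEPLOY_DIRS for part in path.parts[:-1])
        if path.name == ".env" and in_deploy_dir:
            flagged.append(raw)
    return flagged
```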