Two should-fix findings from automated Codex review:

1. Vault KV v2 policy path: add an explicit path for the exact top-level `secret/data/k3s/<app>` entry alongside the wildcard `/*` sub-path rule. Without the exact path, apps reading the top-level secret get permission denied from Vault KV v2 even with the wildcard.
2. Go envconfig example: remove the unused `os` import from the config.go snippet (os was only referenced in a comment). Move the main() usage to a separate, clearly labelled main.go block to make both snippets copy-paste compilable.

Both fixes mirrored to duplicate path: guides/ <-> packages/mosaic/framework/guides/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
577 lines
16 KiB
Markdown
# Vault Secrets Management Guide

This guide applies when the project uses HashiCorp Vault for secrets management.

## Before Starting

1. Verify Vault access: `vault status`
2. Authenticate: `vault login` (method depends on environment)
3. Check your permissions for the required paths

## Canonical Structure

**ALL Vault secrets MUST follow this structure:**

```
{mount}/{service}/{component}/{secret-name}
```

### Components

- **mount**: Environment-specific mount point
- **service**: The service or application name
- **component**: Logical grouping (database, api, oauth, etc.)
- **secret-name**: Specific secret identifier

## Environment Mounts

| Mount             | Environment | Usage                  |
| ----------------- | ----------- | ---------------------- |
| `secret-dev/`     | Development | Local dev, CI          |
| `secret-staging/` | Staging     | Pre-production testing |
| `secret-prod/`    | Production  | Live systems           |

## Examples

```bash
# Database credentials
secret-prod/postgres/database/app
secret-prod/mysql/database/readonly
secret-staging/redis/auth/default

# API tokens
secret-prod/authentik/admin/token
secret-prod/stripe/api/live-key
secret-dev/sendgrid/api/test-key

# JWT/Authentication
secret-prod/backend-api/jwt/signing-key
secret-prod/auth-service/session/secret

# OAuth providers
secret-prod/backend-api/oauth/google
secret-prod/backend-api/oauth/github

# Internal services
secret-prod/loki/read-auth/admin
secret-prod/grafana/admin/password
```

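The four-segment structure can be checked mechanically. The following is a minimal sketch, assuming a path always has exactly four non-empty segments; `SecretPath` and `parse_secret_path` are hypothetical names for illustration, not part of any project script.

```python
from typing import NamedTuple


class SecretPath(NamedTuple):
    """The four segments of {mount}/{service}/{component}/{secret-name}."""

    mount: str
    service: str
    component: str
    secret_name: str


def parse_secret_path(path: str) -> SecretPath:
    """Split a canonical path into its segments (hypothetical helper)."""
    parts = path.strip("/").split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(
            f"expected mount/service/component/secret-name, got {path!r}"
        )
    return SecretPath(*parts)


if __name__ == "__main__":
    print(parse_secret_path("secret-prod/postgres/database/app"))
```

A path with too few segments, such as `secret-prod/postgres`, raises `ValueError` instead of being silently accepted.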
## Standard Field Names

Use consistent field names within secrets:

| Purpose     | Fields                       |
| ----------- | ---------------------------- |
| Credentials | `username`, `password`       |
| Tokens      | `token`                      |
| OAuth       | `client_id`, `client_secret` |
| Connection  | `url`, `host`, `port`        |
| Keys        | `public_key`, `private_key`  |

### Example Secret Structure

Fields stored at `secret-prod/postgres/database/app`:

```json
{
  "username": "app_user",
  "password": "secure-password-here",
  "host": "db.example.com",
  "port": "5432",
  "database": "myapp"
}
```

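Consistent field names let consuming code stay generic. As a rough sketch, assuming a secret shaped like the example above (including its `database` field), a connection URL could be assembled as follows; `postgres_dsn` is a hypothetical helper name.

```python
from urllib.parse import quote


def postgres_dsn(secret: dict) -> str:
    """Assemble a PostgreSQL connection URL from the standard field names."""
    # Percent-encode credentials so characters like '@' or '/' stay unambiguous.
    user = quote(secret["username"], safe="")
    password = quote(secret["password"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{secret['host']}:{secret['port']}/{secret['database']}"
    )
```

A missing field raises `KeyError`, which surfaces a misnamed field early rather than producing a malformed URL.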
## Rules

1. **DO NOT GUESS** secret paths - Always verify the path exists
2. **Use helper scripts** in `scripts/vault/` when available
3. **All lowercase, hyphenated** (kebab-case) for all path segments
4. **Standard field names** - Use the conventions above
5. **No sensitive data in path names** - Path itself should not reveal secrets
6. **Environment separation** - Never reference prod secrets from dev

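Rules 3 and 6 lend themselves to an automated check. This is a minimal sketch, assuming mounts are named `secret-<env>` as in the Environment Mounts table; `check_path` and its `current_env` parameter are hypothetical names, and the regular expression is one possible reading of "kebab-case".

```python
import re

# One or more lowercase alphanumeric words joined by single hyphens.
KEBAB_SEGMENT = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def check_path(path: str, current_env: str) -> list[str]:
    """Return rule violations for a secret path; an empty list means it passed."""
    problems = []
    segments = path.strip("/").split("/")
    for segment in segments:
        if not KEBAB_SEGMENT.match(segment):
            problems.append(f"segment {segment!r} is not lowercase kebab-case")
    # Rule 6: the mount must belong to the environment the caller runs in.
    if segments[0] != f"secret-{current_env}":
        problems.append(
            f"mount {segments[0]!r} does not match environment {current_env!r}"
        )
    return problems
```

For example, `check_path("secret-prod/Backend_API/jwt/signing-key", "dev")` reports both a casing problem and a cross-environment reference.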
## Deprecated Paths (DO NOT USE)

These legacy patterns are deprecated and should be migrated:

| Deprecated                | Migrate To                                  |
| ------------------------- | ------------------------------------------- |
| `secret/infrastructure/*` | `secret-{env}/{service}/...`                |
| `secret/oauth/*`          | `secret-{env}/{service}/oauth/{provider}`   |
| `secret/database/*`       | `secret-{env}/{service}/database/{user}`    |
| `secret/credentials/*`    | `secret-{env}/{service}/{component}/{name}` |

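A simple prefix check can flag the legacy patterns during review. The sketch below only detects them; it does not attempt the migration, because the target `{service}` cannot be derived from the old path. `is_deprecated` is a hypothetical helper name.

```python
# Legacy prefixes taken from the "Deprecated" column above.
DEPRECATED_PREFIXES = (
    "secret/infrastructure/",
    "secret/oauth/",
    "secret/database/",
    "secret/credentials/",
)


def is_deprecated(path: str) -> bool:
    """True when the path starts with one of the legacy prefixes."""
    return path.startswith(DEPRECATED_PREFIXES)
```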
## Reading Secrets

### CLI

```bash
# Read a secret
vault kv get secret-prod/postgres/database/app

# Get specific field
vault kv get -field=password secret-prod/postgres/database/app

# JSON output
vault kv get -format=json secret-prod/postgres/database/app
```

### Application Code

**Python (hvac):**

```python
import hvac

client = hvac.Client(url='https://vault.example.com')
secret = client.secrets.kv.v2.read_secret_version(
    path='postgres/database/app',
    mount_point='secret-prod'
)
password = secret['data']['data']['password']
```

**Node.js (node-vault):**

```javascript
const vault = require('node-vault')({ endpoint: 'https://vault.example.com' });

// `await` needs an async context in CommonJS modules
async function readPassword() {
  const secret = await vault.read('secret-prod/data/postgres/database/app');
  return secret.data.data.password;
}
```

**Go:**

```go
secret, err := client.Logical().Read("secret-prod/data/postgres/database/app")
if err != nil || secret == nil { // secret is nil when nothing exists at the path
	log.Fatalf("reading secret failed: %v", err)
}
password := secret.Data["data"].(map[string]interface{})["password"].(string)
```

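The CLI and hvac examples use the logical path, while the node-vault and Go examples call the raw API and therefore include the KV v2 `data/` segment after the mount. A minimal sketch of that translation, assuming the mount is the first path segment (true for the mounts listed in this guide); `kv2_api_path` is a hypothetical helper name.

```python
def kv2_api_path(cli_path: str) -> str:
    """Insert the KV v2 `data/` segment after the mount (first path segment)."""
    mount, _, rest = cli_path.strip("/").partition("/")
    if not rest:
        raise ValueError(f"path has no secret below the mount: {cli_path!r}")
    return f"{mount}/data/{rest}"
```

This is also why a "no value found" error reports `secret-prod/data/...` even when the command was given a path without `data/`.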
## Writing Secrets

Only authorized personnel should write secrets. If you need a new secret:

1. Request through proper channels (ticket, PR to IaC repo)
2. Follow the canonical structure
3. Document the secret's purpose
4. Set appropriate access policies

```bash
# Example (requires write permissions)
vault kv put secret-dev/myapp/database/app \
  username="dev_user" \
  password="dev-password" \
  host="localhost" \
  port="5432"
```

## Troubleshooting

### Permission Denied

```
Error: permission denied
```

- Verify your token has read access to the path
- Check if you're using the correct mount point
- Confirm the secret path exists

### Secret Not Found

```
Error: no value found at secret-prod/data/service/component/name
```

- Verify the exact path (use `vault kv list` to explore)
- Check for typos in service/component names
- Confirm you're using the correct environment mount

### Token Expired

```
Error: token expired
```

- Re-authenticate: `vault login`
- Check token TTL: `vault token lookup`

## Security Best Practices

1. **Least privilege** - Request only the permissions you need
2. **Short-lived tokens** - Use tokens with appropriate TTLs
3. **Audit logging** - All access is logged; act accordingly
4. **No local copies** - Don't store secrets in files or env vars long-term
5. **Rotate on compromise** - Immediately rotate any exposed secrets

---

## Secrets Architecture Decision Matrix

Use this table to choose between the ESO bridge (default) and Direct-Vault (opt-in) patterns for every new app or integration.

| Factor | ESO Bridge (default) | Direct-Vault (opt-in) |
| --- | --- | --- |
| **Use-case** | All static secrets (DB creds, API keys, signing keys, OAuth secrets) | Dynamic creds with short TTLs (DB rotation, AWS STS, PKI), per-request audit trails, or lease renewal mid-pod-lifecycle |
| **App code change** | None; reads standard env vars via `secretKeyRef` | Requires Vault client (`hvac`, `node-vault`, `vault/api`) in application code |
| **Secret rotation** | ESO re-syncs on Vault write; pod restart or secret refresh picks up new value | App manages lease renewal or re-auth within the running process |
| **Audit granularity** | Access logged at Vault when ESO syncs; no per-request app audit | Every app request to Vault is a separate audit log entry |
| **Operational burden** | Low: ESO handles polling, sync, and k8s Secret lifecycle | Higher: app must handle auth, lease renewal, error paths, and token rotation |
| **Justification required?** | No; this is the default | Yes; document in project README under "Secrets architecture" |
| **Example use cases** | Web app DB password, OAuth client secret, JWT signing key, API token | HashiCorp DB secrets engine with 15-min TTL leases, AWS STS assume-role, Vault PKI short-lived certs |

**Decision rule:** If you are unsure, use ESO. Only justify Direct-Vault when the secret cannot be safely stored in a k8s Secret (too short-lived, per-request TTL required, or mid-lifecycle renewal needed).

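The decision rule reduces to a single disjunction: any one of the three conditions justifies Direct-Vault, otherwise ESO applies. A small sketch of that logic follows; the function and parameter names are hypothetical and simply mirror the wording of the rule.

```python
def choose_pattern(
    too_short_lived: bool = False,
    per_request_ttl_required: bool = False,
    mid_lifecycle_renewal_needed: bool = False,
) -> str:
    """Return "direct-vault" only when a k8s Secret cannot safely hold the value."""
    if too_short_lived or per_request_ttl_required or mid_lifecycle_renewal_needed:
        return "direct-vault"
    # Default, including the "unsure" case where no condition is known to apply.
    return "eso"
```

Calling `choose_pattern()` with no arguments returns `"eso"`, matching the "if you are unsure" default.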
---

## ESO Bridge Pattern (Default)

This is the required default for all k8s workloads. Follow this exact pattern unless a documented dynamic-secrets requirement justifies Direct-Vault.

### 1. Provision Vault path

```bash
# Write the secrets for the app (run once; use IaC/Terraform for repeatable provisioning)
vault kv put secret/k3s/<app> \
  db_password="..." \
  api_key="..." \
  jwt_secret="..."
```

Use the canonical path structure: `secret/k3s/<app>` for k3s cluster workloads.

### 2. ExternalSecret manifest

Commit this to the repo's `deploy/` or `k8s/` directory:

```yaml
# deploy/external-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: <app>-secrets
  namespace: <namespace>
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend # ClusterSecretStore name; verify with cluster admin
    kind: ClusterSecretStore
  target:
    name: <app>-secrets # k8s Secret name that will be created
    creationPolicy: Owner
  data:
    - secretKey: DB_PASSWORD # key in the k8s Secret
      remoteRef:
        key: secret/k3s/<app> # Vault path
        property: db_password # field within the Vault secret
    - secretKey: API_KEY
      remoteRef:
        key: secret/k3s/<app>
        property: api_key
    - secretKey: JWT_SECRET
      remoteRef:
        key: secret/k3s/<app>
        property: jwt_secret
```

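Each `data` entry in the manifest follows the same shape: the Vault field name in lowercase becomes the k8s Secret key in uppercase. As an illustration only, the entries could be generated from a list of field names; `external_secret_data` is a hypothetical helper, and the upper-casing convention is inferred from the example above rather than required by ESO.

```python
def external_secret_data(vault_key: str, properties: list[str]) -> list[dict]:
    """Build ExternalSecret `spec.data` entries for the given Vault fields."""
    return [
        {
            "secretKey": prop.upper(),  # key in the k8s Secret, e.g. DB_PASSWORD
            "remoteRef": {"key": vault_key, "property": prop},
        }
        for prop in properties
    ]
```

The resulting list can be serialized with any YAML library and placed under `spec.data`.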
### 3. Deployment manifest: reference synced k8s Secret

```yaml
# deploy/deployment.yaml (env section)
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: <app>-secrets # matches ExternalSecret target.name
        key: DB_PASSWORD
  - name: API_KEY
    valueFrom:
      secretKeyRef:
        name: <app>-secrets
        key: API_KEY
  - name: JWT_SECRET
    valueFrom:
      secretKeyRef:
        name: <app>-secrets
        key: JWT_SECRET
  - name: PORT
    value: "3000" # safe-default: non-secret, no Vault needed
```

### 4. App-side schema validation: TypeScript (zod)

Validate all required env vars at startup. Exit non-zero on missing values.

```typescript
// src/env.ts
import { z } from 'zod';

const envSchema = z.object({
  DB_PASSWORD: z.string().min(1, 'DB_PASSWORD is required'),
  API_KEY: z.string().min(1, 'API_KEY is required'),
  JWT_SECRET: z.string().min(32, 'JWT_SECRET must be at least 32 chars'),
  PORT: z.coerce.number().default(3000),
  NODE_ENV: z.enum(['development', 'production', 'test']).default('production'),
});

const result = envSchema.safeParse(process.env);
if (!result.success) {
  console.error('Missing or invalid environment variables:');
  console.error(result.error.flatten().fieldErrors);
  process.exit(1);
}

export const env = result.data;
```

### 4b. App-side schema validation: Python (pydantic)

```python
# src/config.py
import sys

from pydantic import ValidationError
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    db_password: str
    api_key: str
    jwt_secret: str
    port: int = 3000
    node_env: str = "production"

    model_config = SettingsConfigDict(env_file=None)  # no .env in prod


try:
    settings = Settings()
except ValidationError as e:
    print(f"Missing or invalid environment variables: {e}", file=sys.stderr)
    sys.exit(1)
```

### 4c. App-side schema validation: Go (envconfig)

```go
// config/config.go
package config

import (
	"fmt"

	"github.com/kelseyhightower/envconfig"
)

type Config struct {
	DBPassword string `envconfig:"DB_PASSWORD" required:"true"`
	APIKey     string `envconfig:"API_KEY" required:"true"`
	JWTSecret  string `envconfig:"JWT_SECRET" required:"true"`
	Port       int    `envconfig:"PORT" default:"3000"`
}

func Load() (*Config, error) {
	var cfg Config
	if err := envconfig.Process("", &cfg); err != nil {
		return nil, fmt.Errorf("invalid environment: %w", err)
	}
	return &cfg, nil
}
```

In your `main.go`:

```go
// main.go (inside func main; import "fmt", "os", and your config package)
cfg, err := config.Load()
if err != nil {
	fmt.Fprintln(os.Stderr, err)
	os.Exit(1)
}
_ = cfg // pass cfg on to the rest of the app; Go rejects unused variables
```

---

## Direct-Vault Opt-In Pattern

Use this pattern ONLY when a documented dynamic-secrets requirement applies (DB rotation with short TTLs, AWS STS, PKI, per-request audit). Document the justification in the project README under "Secrets architecture" before implementing.

### When it is justified

- Vault DB secrets engine with lease TTLs shorter than a typical pod lifecycle (< 1 hour)
- AWS STS assume-role tokens generated per-request
- Vault PKI short-lived certificates (< 24 hours) that must be renewed within a running pod
- Per-request audit trail requirement (each app call must appear separately in Vault audit log)

### Provision an AppRole for the app

```bash
# Enable AppRole auth (if not already enabled)
vault auth enable approle

# Create a Vault policy for the app
# Note: KV v2 paths require both the exact path (for the top-level secret) and the
# wildcard (for sub-paths). Always include both to avoid permission denied errors.
vault policy write <app>-policy - <<EOF
path "secret/data/k3s/<app>" {
  capabilities = ["read"]
}
path "secret/data/k3s/<app>/*" {
  capabilities = ["read"]
}
path "database/creds/<app>-role" {
  capabilities = ["read"]
}
EOF

# Create the AppRole
vault write auth/approle/role/<app>-role \
  token_policies="<app>-policy" \
  token_ttl=1h \
  token_max_ttl=4h \
  secret_id_ttl=0

# Retrieve role-id and secret-id
vault read auth/approle/role/<app>-role/role-id
vault write -f auth/approle/role/<app>-role/secret-id
```

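The exact-path-plus-wildcard pairing is easy to forget when policies are written by hand, because a rule ending in `/*` does not cover the parent path itself. The sketch below renders both rules together from one input; `kv2_read_policy` is a hypothetical helper, and its output is plain policy text intended for `vault policy write`.

```python
def kv2_read_policy(mount: str, secret_path: str) -> str:
    """Render read rules for both the exact KV v2 path and its sub-paths."""
    base = f"{mount}/data/{secret_path.strip('/')}"
    rules = [
        # Doubled braces produce literal { and } in the f-string output.
        f'path "{p}" {{\n  capabilities = ["read"]\n}}'
        for p in (base, f"{base}/*")
    ]
    return "\n\n".join(rules) + "\n"


if __name__ == "__main__":
    print(kv2_read_policy("secret", "k3s/myapp"))
```

Generating both rules from a single path keeps them from drifting apart when the app name changes.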
### Bootstrap AppRole credentials via ESO (solving the chicken-and-egg problem)

The AppRole `role-id` and `secret-id` are themselves secrets. Store them in Vault at a bootstrap path, then use ESO to sync them into a k8s Secret. The app reads that k8s Secret at startup to authenticate with Vault directly.

```bash
# Store the bootstrap credentials in Vault
vault kv put secret/k3s/<app>-bootstrap \
  role_id="<role-id>" \
  secret_id="<secret-id>"
```

```yaml
# deploy/external-secret-bootstrap.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: <app>-vault-auth
  namespace: <namespace>
spec:
  refreshInterval: 24h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: <app>-vault-auth
    creationPolicy: Owner
  data:
    - secretKey: VAULT_ROLE_ID
      remoteRef:
        key: secret/k3s/<app>-bootstrap
        property: role_id
    - secretKey: VAULT_SECRET_ID
      remoteRef:
        key: secret/k3s/<app>-bootstrap
        property: secret_id
```

```yaml
# deploy/deployment.yaml (env section for Direct-Vault app)
env:
  - name: VAULT_ADDR
    value: "https://vault.example.com" # safe-default: non-secret cluster address
  - name: VAULT_ROLE_ID
    valueFrom:
      secretKeyRef:
        name: <app>-vault-auth
        key: VAULT_ROLE_ID
  - name: VAULT_SECRET_ID
    valueFrom:
      secretKeyRef:
        name: <app>-vault-auth
        key: VAULT_SECRET_ID
```

### App-side Vault client pattern

```typescript
// src/vault-client.ts (only exists in Direct-Vault apps)
import vault from 'node-vault';
import { z } from 'zod';

const bootstrapSchema = z.object({
  VAULT_ADDR: z.string().url(),
  VAULT_ROLE_ID: z.string().min(1),
  VAULT_SECRET_ID: z.string().min(1),
});

const bootstrap = bootstrapSchema.parse(process.env);

const client = vault({ endpoint: bootstrap.VAULT_ADDR });

export async function getVaultClient() {
  const { auth } = await client.approleLogin({
    role_id: bootstrap.VAULT_ROLE_ID,
    secret_id: bootstrap.VAULT_SECRET_ID,
  });
  client.token = auth.client_token;
  return client;
}
```

Document in README under "Secrets architecture": the Vault path, why Direct-Vault is required, and the lease/renewal strategy.

---

## Forbidden Patterns (CI Lint Targets)

The following patterns are forbidden in all Mosaic projects. CI lint SHOULD catch these automatically (implementation tracked separately). Agents MUST NOT introduce these patterns.

### 1. Untagged fallback defaults for required values

```yaml
# FORBIDDEN: required secret with silent fallback
environment:
  - DB_PASSWORD=${DB_PASSWORD:-changeme}
  - API_KEY=${API_KEY:-}

# REQUIRED: fast-fail on missing required values
environment:
  - DB_PASSWORD=${DB_PASSWORD:?DB_PASSWORD is required}
  - API_KEY=${API_KEY:?API_KEY is required}

# ALLOWED: true convenience default, tagged
environment:
  - PORT=${PORT:-3000} # safe-default: non-secret, app works at any port
```

This applies to: `docker-compose.yml`, k8s manifests, Helm `values.yaml`, any env file committed to git.

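Since the CI lint implementation is tracked separately, the following is only a rough sketch of how such a check might look, not the project's actual linter. It assumes the tag always appears on the same line as the fallback and only covers the `${VAR:-default}` form; `untagged_fallbacks` is a hypothetical name.

```python
import re

# Matches ${NAME:-anything}, including an empty default such as ${API_KEY:-}.
FALLBACK = re.compile(r"\$\{[A-Za-z_][A-Za-z0-9_]*:-[^}]*\}")
TAG = "# safe-default:"


def untagged_fallbacks(text: str) -> list[int]:
    """Return 1-based line numbers that use ${VAR:-default} without the tag."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if FALLBACK.search(line) and TAG not in line
    ]
```

Lines using the fast-fail form `${VAR:?message}` do not match the pattern and therefore pass.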
### 2. Vault KV calls in application source code (ESO-default projects)

```python
# FORBIDDEN in ESO-default apps: direct Vault client in app source
import os

import hvac

client = hvac.Client(url=os.environ['VAULT_ADDR'])
secret = client.secrets.kv.v2.read_secret_version(path='myapp/db')
```

ESO-default apps read env vars only. Direct-Vault clients belong only in apps with a documented dynamic-secrets justification in README.

### 3. Hardcoded secrets or API keys in committed files

```python
# FORBIDDEN: hardcoded credential
DB_PASSWORD = "supersecret123"
API_KEY = "sk-live-abc123"
```

No exceptions. CI lint must flag any string matching common secret patterns (`password`, `secret`, `api_key`, `token` assigned a literal non-env-var value).

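One way to approximate this rule is a regular expression over each line: an identifier containing one of the listed words, followed by `=` or `:`, followed by a quoted literal. This is a heuristic sketch under those assumptions, with a hypothetical function name; it will produce false positives (for example a UI label named `password_label`) and is not a substitute for a dedicated secret scanner.

```python
import re

# Identifier containing a secret-like word, then = or :, then a quoted literal.
HARDCODED = re.compile(
    r"""\b\w*(?:password|secret|api_key|token)\w*\s*[:=]\s*(["'])[^"']+\1""",
    re.IGNORECASE,
)


def hardcoded_secret_lines(text: str) -> list[int]:
    """Return 1-based line numbers that assign a string literal to a secret-like name."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if HARDCODED.search(line)
    ]
```

Values read from the environment, such as `os.environ["API_KEY"]`, are not flagged because the right-hand side does not start with a quote.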
### 4. `.env` files in production deployment paths

```
# FORBIDDEN: .env file in a production deploy path
deploy/.env
k8s/.env
docker/.env

# ALLOWED: local dev only
.env.example   # template only, no real values
.env           # local dev, must be in .gitignore
```

`.env` files are acceptable in local-dev contexts only and MUST be in `.gitignore`. They are forbidden in any path that a CI pipeline or production deployment process reads directly.

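A path-based check for this rule could look like the sketch below. It assumes the deploy directories are exactly the three shown in the example (`deploy`, `k8s`, `docker`) and that the list of committed paths comes from elsewhere, for instance the output of `git ls-files`; `forbidden_env_files` is a hypothetical name.

```python
from pathlib import PurePosixPath

# Directory names treated as production deploy paths (taken from the example).
DEPLOY_DIRS = {"deploy", "k8s", "docker"}


def forbidden_env_files(paths: list[str]) -> list[str]:
    """Return committed paths that place a .env file under a deploy directory."""
    flagged = []
    for raw in paths:
        path = PurePosixPath(raw)
        # Only the parent directories are checked, so a root-level .env passes here.
        if path.name == ".env" and DEPLOY_DIRS & set(path.parts[:-1]):
            flagged.append(raw)
    return flagged
```

A root-level `.env` is not flagged by this check; keeping it out of git is the job of `.gitignore`.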