Infrastructure & DevOps Guide

Before Starting

  1. Check assigned issue: ~/.mosaic/rails/git/issue-list.sh -a @me
  2. Create scratchpad: docs/scratchpads/{issue-number}-{short-name}.md
  3. Review existing infrastructure configuration

Vault Secrets Management

CRITICAL: Follow the canonical Vault structure for ALL secrets.

Structure

{mount}/{service}/{component}/{secret-name}

Examples:
- secret-prod/postgres/database/app
- secret-prod/redis/auth/default
- secret-prod/authentik/admin/token

Environment Mounts

  • secret-dev/ - Development environment
  • secret-staging/ - Staging environment
  • secret-prod/ - Production environment
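The path structure and mounts above can be checked mechanically before a secret is written. The following is a minimal sketch, not part of any existing tooling: `SecretPath` and `parse_secret_path` are hypothetical names, and the only mounts accepted are the three listed in this guide.

```python
from typing import NamedTuple

# Mounts listed in this guide; any other mount is rejected.
ENVIRONMENT_MOUNTS = {"secret-dev", "secret-staging", "secret-prod"}


class SecretPath(NamedTuple):
    mount: str
    service: str
    component: str
    secret_name: str


def parse_secret_path(path: str) -> SecretPath:
    """Split a path into its four canonical segments or raise ValueError."""
    parts = path.strip("/").split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(
            f"expected mount/service/component/secret-name, got {path!r}"
        )
    if parts[0] not in ENVIRONMENT_MOUNTS:
        raise ValueError(f"unknown mount {parts[0]!r}")
    return SecretPath(*parts)
```

For example, `parse_secret_path("secret-prod/postgres/database/app")` yields the four segments, while a three-segment path or an unlisted mount raises `ValueError`.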

Standard Field Names

  • Credentials: username, password
  • Tokens: token
  • OAuth: client_id, client_secret
  • Connection strings: url, host, port
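A small check can catch field names that drift from the list above (for example `pass` instead of `password`). This is a sketch under assumptions: the category keys (`credentials`, `token`, `oauth`, `connection`) and the function name are made up for illustration, and the check only flags unexpected names because the guide does not say which fields are mandatory.

```python
# Field names taken from the list above; the category keys are illustrative.
STANDARD_FIELDS = {
    "credentials": {"username", "password"},
    "token": {"token"},
    "oauth": {"client_id", "client_secret"},
    "connection": {"url", "host", "port"},
}


def nonstandard_fields(kind: str, data: dict) -> list[str]:
    """Return the keys in `data` that are not standard for this kind of secret."""
    allowed = STANDARD_FIELDS[kind]
    return sorted(set(data) - allowed)
```

An empty list means every key uses a standard name.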

See docs/vault-secrets-structure.md for complete reference.

Container Standards

Dockerfile Best Practices

# Use specific version tags
FROM node:20-alpine

# Create non-root user
RUN addgroup -S app && adduser -S app -G app

# Set working directory
WORKDIR /app

# Copy dependency files first (layer caching)
COPY package*.json ./
RUN npm ci --omit=dev

# Copy application code
COPY --chown=app:app . .

# Switch to non-root user
USER app

# Use exec form for CMD
CMD ["node", "server.js"]

Container Security

  • Use minimal base images (alpine, distroless)
  • Run as non-root user
  • Don't store secrets in images
  • Scan images for vulnerabilities
  • Pin dependency versions
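Two of the points above (pinned base images and a non-root user) are easy to check automatically. The sketch below is a deliberately simple line-based check, not a replacement for a real linter or image scanner: the function name is hypothetical, and it ignores line continuations and multi-stage details.

```python
def dockerfile_findings(text: str) -> list[str]:
    """Return human-readable findings; an empty list means nothing was spotted."""
    findings = []
    last_user = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        instruction, _, rest = line.partition(" ")
        args = rest.split()
        if instruction.upper() == "FROM":
            # Skip flags such as --platform=... to reach the image reference.
            image = next((a for a in args if not a.startswith("--")), "")
            # Look only at the last path segment so a registry port is not
            # mistaken for a tag.
            name = image.rsplit("/", 1)[-1]
            pinned = "@" in image or (":" in name and not name.endswith(":latest"))
            if not pinned:
                findings.append(f"base image {image!r} is not pinned to a specific tag")
        elif instruction.upper() == "USER" and args:
            last_user = args[0].split(":")[0]
    if last_user in (None, "root", "0"):
        findings.append("container runs as root; add a non-root USER")
    return findings
```

Run against the Dockerfile in the previous section, this sketch reports no findings; a file that starts from `node:latest` and never sets `USER` yields two.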

Kubernetes/Docker Compose

Resource Limits

Always set resource requests and limits so that a runaway container cannot starve other workloads (Kubernetes container spec shown):

resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "500m"
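
A common mistake is a request that exceeds its limit. The following sketch parses the quantity strings used above and compares them. It is illustrative only: the helper names are hypothetical, and only a subset of Kubernetes quantity suffixes (`m` for CPU; `Ki`, `Mi`, `Gi`, `k`, `M`, `G` for memory) is handled.

```python
# Subset of Kubernetes memory suffixes; binary (Ki, Mi, Gi) and decimal (k, M, G).
MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "k": 10**3, "M": 10**6, "G": 10**9,
}


def parse_cpu_millis(quantity: str) -> int:
    """Convert '100m' or '0.5' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def parse_memory_bytes(quantity: str) -> int:
    """Convert '128Mi' or a plain byte count into bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)


def requests_within_limits(resources: dict) -> bool:
    """True when both CPU and memory requests are at or below their limits."""
    req, lim = resources["requests"], resources["limits"]
    return (
        parse_cpu_millis(req["cpu"]) <= parse_cpu_millis(lim["cpu"])
        and parse_memory_bytes(req["memory"]) <= parse_memory_bytes(lim["memory"])
    )
```

The `resources` argument is expected to be the block above loaded into a dictionary, for instance by a YAML parser.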

Health Checks

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 3
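
The probes above assume the application serves `/health` and `/ready` on port 8080. As a rough sketch of what that could look like using only the Python standard library (the `AppState` flag, `probe_status`, and `serve` are hypothetical names, and a real service would normally use its own web framework):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class AppState:
    # Flip to True once startup work (connections, caches) has finished.
    ready = False


def probe_status(path: str, ready: bool) -> int:
    """Map a probe path to an HTTP status code."""
    if path == "/health":
        return 200  # the process is alive and able to answer
    if path == "/ready":
        return 200 if ready else 503  # 503 keeps traffic away until ready
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(probe_status(self.path, AppState.ready))
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the request log


def serve(port: int = 8080) -> None:
    """Block forever, answering probe requests on the given port."""
    HTTPServer(("", port), ProbeHandler).serve_forever()
```

Keeping liveness and readiness separate lets a slow-starting container stay out of rotation without being restarted.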

CI/CD Pipelines

Pipeline Stages

  1. Lint: Code style and static analysis
  2. Test: Unit and integration tests
  3. Build: Compile and package
  4. Scan: Security and vulnerability scanning
  5. Deploy: Environment-specific deployment
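The ordering matters because a failed stage should stop everything after it. A minimal sketch of that fail-fast behavior, assuming each stage is a callable that returns `True` on success (the `run_pipeline` name and the stage callables are hypothetical; a real pipeline would be defined in the CI system's own configuration):

```python
from typing import Callable, Optional

Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> tuple[list[str], Optional[str]]:
    """Run stages in order; stop at the first failure.

    Returns the names of completed stages and the name of the failed
    stage, or None when every stage passed.
    """
    completed = []
    for name, step in stages:
        if not step():
            return completed, name
        completed.append(name)
    return completed, None
```

With stages `lint`, `test`, `build`, `scan`, and `deploy`, a failing `test` stage returns `(["lint"], "test")` and the deploy step never runs.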

Pipeline Security

  • Use secrets management; never hardcode credentials in pipeline files
  • Pin action/image versions
  • Implement approval gates for production
  • Audit pipeline access

Monitoring & Logging

Logging Standards

  • Use structured logging (JSON)
  • Include correlation IDs
  • Log at appropriate levels (ERROR, WARN, INFO, DEBUG)
  • Never log sensitive data
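As an illustration of the first three points, here is a minimal sketch using Python's standard `logging` module. The JSON key names, `JsonFormatter`, and `build_logger` are assumptions made for this example, not an established convention of this project.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set via extra={"correlation_id": ...}; None when not provided.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `build_logger("api").info("cache warmed", extra={"correlation_id": "req-123"})` emits one JSON line carrying the correlation ID, which makes it possible to follow a single request across services.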

Metrics to Collect

  • Request latency (p50, p95, p99)
  • Error rates
  • Resource utilization (CPU, memory)
  • Business metrics
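For reference, the latency percentiles above can be computed from raw samples with the nearest-rank method. This is a sketch for understanding the numbers; production systems usually derive percentiles from histograms in the metrics backend, and the function name is hypothetical.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]
```

With samples of 80, 95, 110, 120, and 300 ms, p50 is 110 ms while p95 and p99 are both 300 ms, which shows how a single slow request dominates the tail.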

Alerting

  • Define SLOs (Service Level Objectives)
  • Alert on symptoms, not causes
  • Include runbook links in alerts
  • Avoid alert fatigue

Testing Infrastructure

Test Categories

  1. Unit tests: Terraform/Ansible logic
  2. Integration tests: Deployed resources work together
  3. Smoke tests: Critical paths after deployment
  4. Chaos tests: Failure mode validation

Infrastructure Testing Tools

  • Terraform: terraform validate, terraform plan
  • Ansible: ansible-lint, molecule
  • Kubernetes: kubectl apply --dry-run=client (or --dry-run=server), kubeval
  • General: Terratest, ServerSpec

Commit Format

chore(#67): Configure Redis cluster

- Add Redis StatefulSet with 3 replicas
- Configure persistence with PVC
- Add Vault secret for auth password

Refs #67
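
The format above can be checked in a commit hook. The sketch below infers the rules from the single example shown: a `type(#issue): Summary` header, a blank second line, and a closing `Refs #issue` line that matches the header. The regular expression and the function name are assumptions, and any lowercase word is accepted as the type because only `chore` appears in this guide.

```python
import re

# type(#issue): Summary
HEADER = re.compile(r"^(?P<type>[a-z]+)\(#(?P<issue>\d+)\): (?P<summary>\S.*)$")


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems; an empty list means the message conforms."""
    lines = message.strip().splitlines()
    header = HEADER.match(lines[0]) if lines else None
    if header is None:
        return ["header must look like 'type(#123): Summary'"]
    problems = []
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line must be blank")
    expected_footer = f"Refs #{header['issue']}"
    if lines[-1].strip() != expected_footer:
        problems.append(f"last line must be {expected_footer!r}")
    return problems
```

The example message above passes with no problems; a header without an issue number, or a footer that references a different issue, is reported.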

Before Completing

  1. Validate configuration syntax
  2. Run infrastructure tests
  3. Test in dev/staging first
  4. Document any manual steps required
  5. Update scratchpad and close issue