Infrastructure & DevOps Guide
Before Starting
- Check assigned issue: ~/.mosaic/rails/git/issue-list.sh -a @me
- Create scratchpad: docs/scratchpads/{issue-number}-{short-name}.md
- Review existing infrastructure configuration
Vault Secrets Management
CRITICAL: Follow canonical Vault structure for ALL secrets.
Structure
{mount}/{service}/{component}/{secret-name}
Examples:
- secret-prod/postgres/database/app
- secret-prod/redis/auth/default
- secret-prod/authentik/admin/token
Environment Mounts
- secret-dev/ - Development environment
- secret-staging/ - Staging environment
- secret-prod/ - Production environment
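The path structure and mounts above can be checked mechanically before a secret is written. The following is a minimal sketch, not part of the project tooling: the helper names `build_secret_path` and `is_canonical_path` are hypothetical, and the rule that segments contain only lowercase letters, digits, and hyphens is an assumption inferred from the examples.

```python
import re

# Mounts listed in this guide.
ALLOWED_MOUNTS = {"secret-dev", "secret-staging", "secret-prod"}

# Assumption: segments are lowercase letters, digits, and hyphens.
_SEGMENT = re.compile(r"[a-z0-9][a-z0-9-]*")


def build_secret_path(mount: str, service: str, component: str, secret_name: str) -> str:
    """Return '{mount}/{service}/{component}/{secret-name}' or raise ValueError."""
    if mount not in ALLOWED_MOUNTS:
        raise ValueError(f"unknown mount: {mount!r}")
    for segment in (service, component, secret_name):
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid path segment: {segment!r}")
    return "/".join((mount, service, component, secret_name))


def is_canonical_path(path: str) -> bool:
    """Check for exactly four valid segments under a known mount."""
    parts = path.split("/")
    if len(parts) != 4:
        return False
    mount, *rest = parts
    return mount in ALLOWED_MOUNTS and all(_SEGMENT.fullmatch(p) for p in rest)
```

For example, `build_secret_path("secret-prod", "postgres", "database", "app")` returns the first example path above, while a three-segment path such as `secret-prod/redis/default` is rejected by `is_canonical_path`.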
Standard Field Names
- Credentials: username, password
- Tokens: token
- OAuth: client_id, client_secret
- Connection strings: url, host, port
See docs/vault-secrets-structure.md for complete reference.
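As a rough illustration, a secret payload can be checked against the field names listed above before it is stored. This sketch assumes the list in this section is the full set; the complete reference may allow more names, and `nonstandard_fields` is a hypothetical helper.

```python
# Field names taken from the "Standard Field Names" list in this guide.
# Assumption: the referenced structure document may define additional names.
STANDARD_FIELDS = frozenset(
    {"username", "password", "token", "client_id", "client_secret", "url", "host", "port"}
)


def nonstandard_fields(secret: dict) -> list[str]:
    """Return the keys of a secret payload that are not standard field names."""
    return sorted(key for key in secret if key not in STANDARD_FIELDS)
```

A payload using `user` and `pass` instead of `username` and `password` would be reported, which catches naming drift early.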
Container Standards
Dockerfile Best Practices
# Use specific version tags
FROM node:20-alpine
# Create non-root user
RUN addgroup -S app && adduser -S app -G app
# Set working directory
WORKDIR /app
# Copy dependency files first (layer caching)
COPY package*.json ./
RUN npm ci --omit=dev
# Copy application code
COPY --chown=app:app . .
# Switch to non-root user
USER app
# Use exec form for CMD
CMD ["node", "server.js"]
Container Security
- Use minimal base images (alpine, distroless)
- Run as non-root user
- Don't store secrets in images
- Scan images for vulnerabilities
- Pin dependency versions
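Two of the points above (pinned base images, non-root user) lend themselves to a simple automated check. The sketch below is a heuristic only, not a replacement for a real Dockerfile linter or image scanner; the function name `dockerfile_findings` is hypothetical, and it treats a stage without an explicit USER as running as root even though a base image may already set a non-root user.

```python
def dockerfile_findings(text: str) -> list[str]:
    """Return findings for unpinned base images and a missing non-root USER."""
    findings = []
    stages = set()     # names introduced by "FROM ... AS name"
    final_user = None  # last USER instruction seen in the final stage
    for raw in text.splitlines():
        parts = raw.split()
        if not parts or parts[0].startswith("#"):
            continue
        instruction = parts[0].upper()
        # Drop flags such as --platform=... so args[0] is the image or user.
        args = [a for a in parts[1:] if not a.startswith("--")]
        if instruction == "FROM" and args:
            image = args[0]
            final_user = None  # heuristic: assume each stage starts as root
            exempt = image == "scratch" or image in stages or "@" in image
            # Only look for a tag in the last path component, so a registry
            # port such as registry:5000/node is not mistaken for a tag.
            tag = image.rsplit("/", 1)[-1].partition(":")[2]
            if not exempt and tag in ("", "latest"):
                findings.append(f"unpinned base image: {image}")
            if len(args) >= 3 and args[1].upper() == "AS":
                stages.add(args[2])
        elif instruction == "USER" and args:
            final_user = args[0]
    if final_user is None or final_user.split(":")[0] in ("root", "0"):
        findings.append("final stage runs as root (no non-root USER)")
    return findings
```

Run against the example Dockerfile in this guide, the function returns an empty list; a file containing only `FROM node` and a `CMD` yields both findings.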
Kubernetes/Docker Compose
Resource Limits
Always set resource limits to prevent runaway containers:
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "500m"
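A review script can confirm that both requests and limits are present and that no request exceeds its limit. The sketch below assumes the `resources` block has already been loaded into a dict (for example with a YAML parser) and supports only a subset of Kubernetes quantity suffixes; `resource_problems` and the parsing helpers are hypothetical names.

```python
# Subset of Kubernetes quantity suffixes (binary and decimal); an assumption
# made to keep the sketch short.
_MEMORY_FACTORS = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
    "k": 1000, "M": 1000**2, "G": 1000**3,
}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' to bytes."""
    for suffix, factor in _MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)


def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity such as '100m' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def resource_problems(resources: dict) -> list[str]:
    """List missing or inconsistent requests/limits for memory and cpu."""
    problems = []
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    for name, parse in (("memory", parse_memory), ("cpu", parse_cpu)):
        if name not in requests or name not in limits:
            problems.append(f"{name}: set both a request and a limit")
        elif parse(str(requests[name])) > parse(str(limits[name])):
            problems.append(f"{name}: request exceeds limit")
    return problems
```

With the values shown above, `resource_problems` returns an empty list.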
Health Checks
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 3
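The probes above expect the application to answer on /health and /ready. As a minimal sketch using only the Python standard library, the service side could look like the following. The response bodies, the `ready` flag, and the choice of 503 for "not ready" are assumptions; a real service would use its own framework's routing.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set this once startup work (for example, opening database connections) is done.
ready = threading.Event()


def probe_response(path: str, is_ready: bool) -> tuple[int, bytes]:
    """Map a probe path to an HTTP status code and body."""
    if path == "/health":
        return 200, b"ok"  # liveness: the process is up and serving
    if path == "/ready":
        return (200, b"ready") if is_ready else (503, b"not ready")
    return 404, b"not found"


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = probe_response(self.path, ready.is_set())
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep frequent probe traffic out of the logs


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Create (but do not start) a server on the port used by the probes."""
    return ThreadingHTTPServer(("0.0.0.0", port), ProbeHandler)
```

Calling `make_server().serve_forever()` starts the listener. Keeping liveness and readiness separate lets the orchestrator hold traffic back during startup without restarting the container.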
CI/CD Pipelines
Pipeline Stages
- Lint: Code style and static analysis
- Test: Unit and integration tests
- Build: Compile and package
- Scan: Security and vulnerability scanning
- Deploy: Environment-specific deployment
Pipeline Security
- Use secrets management (never hardcode secrets)
- Pin action/image versions
- Implement approval gates for production
- Audit pipeline access
Monitoring & Logging
Logging Standards
- Use structured logging (JSON)
- Include correlation IDs
- Log at appropriate levels (ERROR, WARN, INFO, DEBUG)
- Never log sensitive data
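A minimal sketch of structured JSON logging with a correlation ID, using only the Python standard library. The field names in the JSON payload and the `correlation_id` context variable are assumptions, not a project standard; adapt them to whatever your log pipeline expects.

```python
import contextvars
import json
import logging

# Set per request (for example from an incoming header); "-" means "none".
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level: int = logging.INFO, stream=None) -> None:
    """Route all logging through one JSON handler (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```

After `configure_logging()`, calling `correlation_id.set("req-123")` at the start of a request makes every log line emitted in that context carry the same ID, so related entries can be joined across services.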
Metrics to Collect
- Request latency (p50, p95, p99)
- Error rates
- Resource utilization (CPU, memory)
- Business metrics
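To make the latency percentiles concrete, here is a small sketch using the nearest-rank method on raw samples. This is an illustration only: monitoring systems typically estimate percentiles from histogram buckets rather than from stored samples, and the function name `percentile` is hypothetical.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least p% of samples at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]
```

For latencies of 80, 95, 110, 120, and 300 ms, p50 is 110 ms while p95 and p99 are both 300 ms, which shows how a single slow request dominates the tail.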
Alerting
- Define SLOs (Service Level Objectives)
- Alert on symptoms, not causes
- Include runbook links in alerts
- Avoid alert fatigue
Testing Infrastructure
Test Categories
- Unit tests: Terraform/Ansible logic
- Integration tests: Deployed resources work together
- Smoke tests: Critical paths after deployment
- Chaos tests: Failure mode validation
Infrastructure Testing Tools
- Terraform: terraform validate, terraform plan
- Ansible: ansible-lint, molecule
- Kubernetes: kubectl apply --dry-run=client, kubeval
- General: Terratest, ServerSpec
Commit Format
chore(#67): Configure Redis cluster
- Add Redis StatefulSet with 3 replicas
- Configure persistence with PVC
- Add Vault secret for auth password
Refs #67
Before Completing
- Validate configuration syntax
- Run infrastructure tests
- Test in dev/staging first
- Document any manual steps required
- Update scratchpad and close issue