centralize guides and rails under mosaic with runtime compatibility links
165
guides/infrastructure.md
Normal file
@@ -0,0 +1,165 @@
# Infrastructure & DevOps Guide

## Before Starting
1. Check assigned issue: `~/.mosaic/rails/git/issue-list.sh -a @me`
2. Create scratchpad: `docs/scratchpads/{issue-number}-{short-name}.md`
3. Review existing infrastructure configuration

## Vault Secrets Management

**CRITICAL**: Follow canonical Vault structure for ALL secrets.

### Structure
```
{mount}/{service}/{component}/{secret-name}

Examples:
- secret-prod/postgres/database/app
- secret-prod/redis/auth/default
- secret-prod/authentik/admin/token
```

### Environment Mounts
- `secret-dev/` - Development environment
- `secret-staging/` - Staging environment
- `secret-prod/` - Production environment

### Standard Field Names
- Credentials: `username`, `password`
- Tokens: `token`
- OAuth: `client_id`, `client_secret`
- Connection strings: `url`, `host`, `port`

See `docs/vault-secrets-structure.md` for complete reference.
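
The path convention above can be checked mechanically before a secret is written. The following is a minimal sketch, not part of the Mosaic tooling: the function name `parse_secret_path` and the rule that segments are lowercase letters, digits, and hyphens are assumptions made for illustration.

```python
import re

# Mounts taken from the "Environment Mounts" list.
ALLOWED_MOUNTS = {"secret-dev", "secret-staging", "secret-prod"}

# Assumption: each segment is lowercase letters, digits, and hyphens.
SEGMENT = re.compile(r"^[a-z0-9][a-z0-9-]*$")


def parse_secret_path(path: str) -> dict[str, str]:
    """Split {mount}/{service}/{component}/{secret-name} or raise ValueError."""
    parts = path.strip("/").split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 segments, got {len(parts)}: {path!r}")
    mount, service, component, secret_name = parts
    if mount not in ALLOWED_MOUNTS:
        raise ValueError(f"unknown mount {mount!r}")
    for segment in (service, component, secret_name):
        if not SEGMENT.match(segment):
            raise ValueError(f"invalid segment {segment!r}")
    return {
        "mount": mount,
        "service": service,
        "component": component,
        "secret_name": secret_name,
    }
```

For example, `parse_secret_path("secret-prod/redis/auth/default")` returns the four named parts, while a path with an unlisted mount or a missing segment raises `ValueError`.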

## Container Standards

### Dockerfile Best Practices
```dockerfile
# Use specific version tags
FROM node:20-alpine

# Create non-root user
RUN addgroup -S app && adduser -S app -G app

# Set working directory
WORKDIR /app

# Copy dependency files first (layer caching)
COPY package*.json ./
# --omit=dev replaces the deprecated --only=production flag
RUN npm ci --omit=dev

# Copy application code
COPY --chown=app:app . .

# Switch to non-root user
USER app

# Use exec form for CMD
CMD ["node", "server.js"]
```

### Container Security
- Use minimal base images (alpine, distroless)
- Run as non-root user
- Don't store secrets in images
- Scan images for vulnerabilities
- Pin dependency versions

## Kubernetes/Docker Compose

### Resource Limits
Always set resource limits to prevent runaway containers:
```yaml
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "500m"
```
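
A rule like this is easy to enforce in review tooling. As a rough sketch, the hypothetical helper below takes a container spec that has already been parsed into a Python dict (for example from a manifest) and reports which of the four settings shown above are absent. The function name and the decision to require both `requests` and `limits` are assumptions.

```python
def missing_resource_settings(container: dict) -> list[str]:
    """Return the dotted names of any missing requests/limits entries."""
    resources = container.get("resources") or {}
    missing = []
    for section in ("requests", "limits"):
        values = resources.get(section) or {}
        for key in ("memory", "cpu"):
            if key not in values:
                missing.append(f"resources.{section}.{key}")
    return missing
```

An empty list means the container declares memory and CPU for both sections; anything else can be printed as a review comment or used to fail a pipeline step.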

### Health Checks
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 3
```
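
The probes above assume the application answers on `/health` and `/ready`. A minimal sketch of such endpoints using only the Python standard library might look like the following. The names `probe_response`, `ProbeHandler`, and `serve` are illustrative, and the choice to return 503 until a readiness flag is set is an assumption about how the service signals startup.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set this event once startup work (migrations, cache warmup) has finished.
READY = threading.Event()


def probe_response(path: str, is_ready: bool) -> tuple[int, bytes]:
    """Map a request path to an HTTP status and body."""
    if path == "/health":
        return 200, b"ok"  # liveness: the process is up and responding
    if path == "/ready":
        return (200, b"ready") if is_ready else (503, b"not ready")
    return 404, b"not found"


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = probe_response(self.path, READY.is_set())
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    """Block and serve probe requests on the port used in the YAML above."""
    HTTPServer((host, port), ProbeHandler).serve_forever()
```

Keeping the path-to-status mapping in a plain function makes it testable without opening a socket. Call `READY.set()` when the service can accept traffic, and run `serve()` in a background thread or a separate process.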

## CI/CD Pipelines

### Pipeline Stages
1. **Lint**: Code style and static analysis
2. **Test**: Unit and integration tests
3. **Build**: Compile and package
4. **Scan**: Security and vulnerability scanning
5. **Deploy**: Environment-specific deployment

### Pipeline Security
- Use secrets management (not hardcoded)
- Pin action/image versions
- Implement approval gates for production
- Audit pipeline access

## Monitoring & Logging

### Logging Standards
- Use structured logging (JSON)
- Include correlation IDs
- Log at appropriate levels (ERROR, WARN, INFO, DEBUG)
- Never log sensitive data
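
One way to meet the first two points with the Python standard library is sketched below. It is an illustration under stated assumptions: the field names in the JSON payload, the `correlation_id` context variable, and the `build_logger` helper are all hypothetical, not an existing Mosaic convention.

```python
import contextvars
import json
import logging

# Holds the correlation ID for the current request or task.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to stream (stderr if None)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A request handler would call `correlation_id.set(...)` with the incoming ID before doing any work, so every line logged during that request carries the same value.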

### Metrics to Collect
- Request latency (p50, p95, p99)
- Error rates
- Resource utilization (CPU, memory)
- Business metrics
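
To make the latency percentiles concrete, here is a small sketch that computes p50, p95, and p99 from raw samples with the standard library. In production these values usually come from a metrics backend rather than from in-process lists; the function name and the use of the inclusive interpolation method are assumptions for illustration.

```python
import statistics


def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Return p50, p95, and p99 for a list of latency samples."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; index i holds the (i + 1)th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

The tail percentiles matter because a mean can look healthy while the slowest 1% of requests are many times slower than the median.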

### Alerting
- Define SLOs (Service Level Objectives)
- Alert on symptoms, not causes
- Include runbook links in alerts
- Avoid alert fatigue

## Testing Infrastructure

### Test Categories
1. **Unit tests**: Terraform/Ansible logic
2. **Integration tests**: Deployed resources work together
3. **Smoke tests**: Critical paths after deployment
4. **Chaos tests**: Failure mode validation

### Infrastructure Testing Tools
- Terraform: `terraform validate`, `terraform plan`
- Ansible: `ansible-lint`, Molecule
- Kubernetes: `kubectl apply --dry-run=client`, kubeval
- General: Terratest, ServerSpec

## Commit Format
```
chore(#67): Configure Redis cluster

- Add Redis StatefulSet with 3 replicas
- Configure persistence with PVC
- Add Vault secret for auth password

Refs #67
```
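
A commit hook could check this shape automatically. The sketch below infers the rules from the single example above, so treat them as assumptions: a lowercase type, the issue number in parentheses, and a `Refs #N` footer that repeats the same number. The function name `check_commit_message` is hypothetical.

```python
import re

# Assumed subject shape: type(#issue): Summary
SUBJECT = re.compile(r"^(?P<type>[a-z]+)\(#(?P<issue>\d+)\): (?P<summary>\S.*)$")


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    lines = message.strip().splitlines()
    if not lines:
        return ["commit message is empty"]
    match = SUBJECT.match(lines[0])
    if not match:
        return ["subject must look like 'type(#issue): Summary'"]
    problems = []
    footer = f"Refs #{match['issue']}"
    if footer not in [line.strip() for line in lines[1:]]:
        problems.append(f"missing footer {footer!r}")
    return problems
```

Returning a list of problems rather than raising lets a hook print every issue at once before rejecting the commit.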

## Before Completing
1. Validate configuration syntax
2. Run infrastructure tests
3. Test in dev/staging first
4. Document any manual steps required
5. Update scratchpad and close issue