# Orchestrator Security Documentation

## Overview

This document outlines the security measures implemented in the Mosaic Orchestrator Docker container and deployment configuration.

## Docker Security Hardening

### Multi-Stage Build

The Dockerfile uses a **4-stage build process** to minimize the attack surface:

1. **Base Stage**: Minimal Alpine base with pnpm enabled
2. **Dependencies Stage**: Installs production dependencies only
3. **Builder Stage**: Builds the application with all dependencies
4. **Runtime Stage**: Final minimal image containing only the built artifacts and production dependencies

**Benefits:**

- Reduces final image size by excluding build tools and dev dependencies
- Minimizes attack surface by removing unnecessary packages
- Separates build-time from runtime environments

### Base Image Security

**Image:** `node:20-alpine`

**Security Scan Results** (Trivy, 2026-02-02):

- Alpine Linux: **0 vulnerabilities**
- Node.js packages: **0 vulnerabilities**
- Base image size: ~180 MB (vs. 1 GB+ for full `node` images)

**Why Alpine?**

- Minimal attack surface (only essential packages)
- Security-focused distribution
- Regular security updates
- Small image size reduces download time and storage

### Non-Root User

**User:** `node` (UID: 1000, GID: 1000)

The container runs as a non-root user to reduce the risk of privilege escalation.

**Implementation:**

```dockerfile
# Dockerfile
USER node

# docker-compose.yml
user: "1000:1000"
```

**Security Benefits:**

- An attacker who compromises the process does not gain root inside the container
- Limits the blast radius of potential vulnerabilities
- Follows the principle of least privilege

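The Compose `user:` setting can override the image's `USER`, so it can be worth checking what a running container actually uses. Below is a minimal TypeScript sketch. It assumes the value comes from `docker inspect --format '{{.Config.User}}' <container>`. The helper name `isNonRootUser` is hypothetical and not part of the orchestrator's code.

```typescript
/**
 * Hypothetical helper: does a `docker inspect` Config.User value denote a
 * non-root user? Accepted formats: "user", "uid", "user:group", "uid:gid".
 * Named users other than "root" are assumed to be non-root; resolving a
 * name to its UID would require the container's /etc/passwd.
 */
function isNonRootUser(user: string | undefined): boolean {
  // An empty value means no user was set, so the process runs as root.
  if (user === undefined || user.trim() === "") return false;
  const [userPart = ""] = user.trim().split(":");
  if (userPart === "" || userPart === "root") return false;
  // Numeric UID: only 0 is root.
  if (/^\d+$/.test(userPart)) return Number(userPart) !== 0;
  return true;
}

console.log(isNonRootUser("1000:1000")); // true
console.log(isNonRootUser("node"));      // true
console.log(isNonRootUser("0:0"));       // false
console.log(isNonRootUser(""));          // false
```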
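### File Permissions

All application files are owned by `node:node`:

```dockerfile
COPY --from=builder --chown=node:node /app/apps/orchestrator/dist ./dist
COPY --from=dependencies --chown=node:node /app/node_modules ./node_modules
```

**Permissions:**

- Application code: Intended to be read/execute only. Because `--chown=node:node` makes the runtime user the owner, these files remain writable by `node`. To make them actually read-only, either run the container with a read-only root filesystem (`read_only: true` in Compose) or copy the files without `--chown`, so they stay root-owned and, with typical 644/755 modes, not writable by `node`.
- Workspace volume: Read/write (required for git operations)
- Docker socket: Bind-mounted with `:ro`, which protects the socket file itself but does not restrict Docker API calls made through it
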
### Health Checks

**Dockerfile Health Check:**

```dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3001/health || exit 1
```

**Benefits:**

- Docker tracks the container's health status (`starting`, `healthy`, `unhealthy`), visible in `docker ps` and `docker inspect`
- Orchestrators that act on health status (e.g. Docker Swarm) replace unhealthy containers. Plain Docker and Docker Compose only mark the container `unhealthy`, because restart policies react to container exits rather than to health status.
- Minimal overhead (uses the BusyBox `wget` already present in Alpine)

**Endpoint:** `GET /health`

- Returns 200 OK when the service is healthy
- No authentication required (internal endpoint)

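The HEALTHCHECK timings also bound how long a broken service can go unnoticed. The rules are:

- Docker runs each probe `interval` after the previous one completes.
- A probe that exceeds `timeout` counts as failed.
- The container becomes `unhealthy` after `retries` consecutive failures.
- Failures during `start-period` are not counted.

The hypothetical helper below works through that arithmetic for the values in the Dockerfile:

```typescript
/** Timing parameters of a Docker HEALTHCHECK, in seconds. */
interface HealthcheckTiming {
  intervalS: number; // --interval
  timeoutS: number;  // --timeout
  retries: number;   // --retries
}

/**
 * Hypothetical helper: worst-case seconds from the service breaking to Docker
 * marking the container "unhealthy". Assumes the break happens right after a
 * successful probe and outside the start period. `probeS` is how long each
 * failing probe runs: about 0 for an immediate error (connection refused),
 * or up to the timeout if the endpoint hangs.
 */
function worstCaseDetectionS(t: HealthcheckTiming, probeS: number): number {
  // A probe cannot run longer than the timeout before it counts as failed.
  const probeDuration = Math.min(probeS, t.timeoutS);
  // Each probe starts `interval` after the previous one completes; the status
  // flips to unhealthy after `retries` consecutive failed probes.
  return t.retries * (t.intervalS + probeDuration);
}

// Values from the Dockerfile HEALTHCHECK above.
const orchestratorCheck: HealthcheckTiming = { intervalS: 30, timeoutS: 10, retries: 3 };
console.log(worstCaseDetectionS(orchestratorCheck, 0));  // 90  (fails fast)
console.log(worstCaseDetectionS(orchestratorCheck, 10)); // 120 (hangs until timeout)
```

With these settings, a hung `/health` endpoint can take up to about two minutes to be flagged. Lowering `--interval` or `--retries` shortens this, at the cost of more frequent probes or more sensitivity to transient failures.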
### Capability Management

**docker-compose.yml:**

```yaml
cap_drop:
  - ALL
cap_add:
  - NET_BIND_SERVICE
```

**Dropped Capabilities:**

- ALL (start with zero privileges)

**Added Capabilities:**

- NET_BIND_SERVICE (allows binding to ports below 1024). The orchestrator listens on port 3001, which is unprivileged, so this capability is not needed for that listener and is a candidate for removal.

**Why minimal capabilities?**

- Reduces attack surface
- Prevents privilege escalation
- Limits kernel access

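Capability settings can be changed at deploy time, so the effective values are best confirmed on the running container. The sketch below assumes the `HostConfig` object from `docker inspect` has been parsed into an object. `checkCapabilities` and its allowlist parameter are hypothetical. Depending on the Docker version, capability names may be reported with or without a `CAP_` prefix, so both forms are normalized:

```typescript
/** Subset of the `HostConfig` object reported by `docker inspect`. */
interface CapabilityConfig {
  CapAdd: string[] | null;
  CapDrop: string[] | null;
}

// Normalize "cap_net_bind_service", "CAP_NET_BIND_SERVICE" and
// "NET_BIND_SERVICE" to the same form.
function normalizeCap(name: string): string {
  const upper = name.trim().toUpperCase();
  return upper.startsWith("CAP_") ? upper.slice(4) : upper;
}

/**
 * Hypothetical check: returns a list of problems, empty when all capabilities
 * are dropped and only allowlisted ones are added back.
 */
function checkCapabilities(host: CapabilityConfig, allowed: string[]): string[] {
  const problems: string[] = [];
  const dropped = (host.CapDrop ?? []).map(normalizeCap);
  if (!dropped.includes("ALL")) problems.push("CapDrop does not include ALL");
  const allow = new Set(allowed.map(normalizeCap));
  for (const cap of (host.CapAdd ?? []).map(normalizeCap)) {
    if (!allow.has(cap)) problems.push(`unexpected added capability: ${cap}`);
  }
  return problems;
}
```

For example, `checkCapabilities(info[0].HostConfig, ["NET_BIND_SERVICE"])` returns an empty list when the container matches the Compose file above. Here `info` is the parsed JSON array printed by `docker inspect`.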
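### Read-Only Docker Socket

The Docker socket is bind-mounted with the `:ro` flag:

```yaml
volumes:
  - /var/run/docker.sock:/var/run/docker.sock:ro
```

**Note:** The orchestrator needs Docker access to spawn agent containers. This is intentional and required for functionality.

The `:ro` flag only makes the socket *file* read-only: it cannot be replaced or removed from inside the container. It does **not** restrict the Docker API. Any process that can connect to the socket can still create, start, and remove containers, so treat this mount as full Docker API access.

**Mitigation:**

- The non-root user limits what an attacker can do inside the orchestrator container. It does not limit Docker API calls once the socket is reachable.
- Capability restrictions prevent privilege escalation within the orchestrator container. They do not apply to containers created through the socket.
- Monitoring and the killswitch can detect and stop anomalous activity.
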
### Temporary Filesystem

A tmpfs mount is configured for `/tmp`:

```yaml
tmpfs:
  - /tmp:noexec,nosuid,size=100m
```

**Security Benefits:**

- `noexec`: Prevents direct execution of binaries from `/tmp` (scripts can still be run through an interpreter, e.g. `sh /tmp/script.sh`)
- `nosuid`: Ignores setuid/setgid bits
- Size limit: Caps `/tmp` at 100 MB. Because tmpfs is memory-backed, this limits how much RAM temporary files can consume (guarding against memory exhaustion) rather than disk usage.

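The same kind of runtime check can be applied to the tmpfs options. `docker inspect` reports tmpfs mounts under `HostConfig.Tmpfs` as a map from path to option string; for this service that is presumably `{"/tmp": "noexec,nosuid,size=100m"}`. Below is a minimal, hypothetical parser and check. It follows the mount convention that a later option overrides an earlier opposite one:

```typescript
/** Parsed tmpfs option string such as "noexec,nosuid,size=100m". */
interface TmpfsOptions {
  flags: Set<string>;
  sizeBytes: number | null; // null when there is no byte-valued size= option
}

const SIZE_UNITS: Record<string, number> = { "": 1, k: 1024, m: 1024 ** 2, g: 1024 ** 3 };
// A later option overrides an earlier opposite one ("noexec,exec" is executable).
const OPPOSITE = new Map<string, string>([
  ["exec", "noexec"], ["noexec", "exec"], ["suid", "nosuid"], ["nosuid", "suid"],
]);

function parseTmpfsOptions(opts: string): TmpfsOptions {
  const flags = new Set<string>();
  let sizeBytes: number | null = null;
  for (const raw of opts.split(",")) {
    const opt = raw.trim();
    if (opt === "") continue;
    const size = /^size=(\d+)([kmg]?)$/i.exec(opt);
    if (size) {
      sizeBytes = Number(size[1]) * (SIZE_UNITS[(size[2] ?? "").toLowerCase()] ?? 1);
      continue;
    }
    const opposite = OPPOSITE.get(opt);
    if (opposite !== undefined) flags.delete(opposite);
    flags.add(opt);
  }
  return { flags, sizeBytes };
}

/** Hypothetical check: returns a list of problems, empty if /tmp is hardened. */
function checkTmpfs(opts: string, maxBytes: number): string[] {
  const { flags, sizeBytes } = parseTmpfsOptions(opts);
  const problems: string[] = [];
  for (const required of ["noexec", "nosuid"]) {
    if (!flags.has(required)) problems.push(`missing ${required}`);
  }
  if (sizeBytes === null) problems.push("no size limit");
  else if (sizeBytes > maxBytes) problems.push(`size ${sizeBytes} exceeds ${maxBytes} bytes`);
  return problems;
}
```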
### Security Options

```yaml
security_opt:
  - no-new-privileges:true
```

**no-new-privileges:**

- Prevents processes from gaining additional privileges through `execve`
- setuid/setgid bits and file capabilities are not honored when binaries are executed
- Blocks a common privilege-escalation path inside the container

### Network Isolation

**Network:** `mosaic-internal` (bridge network)

The orchestrator is **not exposed** to the public network. Within the stack, it communicates with:

- Valkey (internal)
- API (internal)
- Docker daemon (local socket)

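Whether the orchestrator is reachable from outside can also be confirmed from `docker inspect`. `NetworkSettings.Ports` maps each exposed container port to its host bindings (`null` when the port is not published). `HostConfig.NetworkMode` shows whether host networking is in use. Below is a hypothetical check over those two fields, assuming they are passed in directly:

```typescript
/** A host port binding as reported in NetworkSettings.Ports. */
interface PortBinding {
  HostIp: string;
  HostPort: string;
}

/** Subset of `docker inspect` output relevant to network exposure. */
interface ExposureInfo {
  networkMode: string; // HostConfig.NetworkMode
  ports: Record<string, PortBinding[] | null> | null; // NetworkSettings.Ports
}

/** Hypothetical check: lists every way the container is reachable via the host. */
function findHostExposure(info: ExposureInfo): string[] {
  const findings: string[] = [];
  // Host networking shares the host's network stack, bypassing the bridge.
  if (info.networkMode === "host") findings.push("container uses host networking");
  const ports = info.ports ?? {};
  for (const containerPort of Object.keys(ports)) {
    // null means the port is reachable by other containers but not published.
    for (const binding of ports[containerPort] ?? []) {
      findings.push(`${containerPort} published on ${binding.HostIp || "0.0.0.0"}:${binding.HostPort}`);
    }
  }
  return findings;
}
```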
### Labels and Metadata

The container includes comprehensive labels for tracking and compliance:

```dockerfile
LABEL org.opencontainers.image.source="https://git.mosaicstack.dev/mosaic/stack"
LABEL org.opencontainers.image.vendor="Mosaic Stack"
LABEL com.mosaic.security=hardened
LABEL com.mosaic.security.non-root=true
```

## Runtime Security

### Environment Variables

Configuration, including secrets, is passed via environment variables:

- `AI_PROVIDER`: Orchestrator LLM provider
- `CLAUDE_API_KEY`: Claude credentials (required only for `AI_PROVIDER=claude`)
- `VALKEY_URL`: Valkey (cache) connection string

**Best Practices:**

- Never commit secrets to git
- Use `.env` files for local development
- Use a secrets manager such as Vault in production

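Validating these variables at startup turns a missing or malformed secret into an immediate, explicit failure instead of a runtime error later. This is a sketch under assumptions:

- The function name is hypothetical.
- `AI_PROVIDER` is treated as required; the document does not say whether it has a default.
- Only the `claude` rule stated above is enforced.

```typescript
/**
 * Hypothetical startup check for the variables listed above. AI_PROVIDER is
 * treated as required (an assumption); other provider values are accepted
 * as-is, since only the "claude" rule is documented.
 */
function validateOrchestratorEnv(env: Record<string, string | undefined>): string[] {
  const errors: string[] = [];
  const provider = env.AI_PROVIDER?.trim();
  if (!provider) errors.push("AI_PROVIDER is not set");
  if (provider === "claude" && !env.CLAUDE_API_KEY?.trim()) {
    errors.push("CLAUDE_API_KEY is required when AI_PROVIDER=claude");
  }
  const valkeyUrl = env.VALKEY_URL?.trim();
  if (!valkeyUrl) {
    errors.push("VALKEY_URL is not set");
  } else {
    try {
      new URL(valkeyUrl); // throws a TypeError on malformed input
    } catch {
      // Do not echo the value: connection strings can embed passwords.
      errors.push("VALKEY_URL is not a valid URL");
    }
  }
  return errors;
}
```

At startup this could be called as `validateOrchestratorEnv(process.env)`, exiting if any errors are returned. The messages name variables but never echo their values.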
### Volume Security

**Workspace Volume:**

```yaml
volumes:
  - orchestrator_workspace:/workspace
```

**Security Considerations:**

- Persistent storage for git operations
- Writable by the `node` user
- Isolated from other services
- Regular cleanup via lifecycle management

### Monitoring and Logging

The orchestrator logs all operations for audit trails:

- Agent spawning/termination
- Quality gate results
- Git operations
- Killswitch activations

**Log Security:**

- Secrets are redacted from logs
- Logs stored in Docker volumes
- Rotation configured to prevent disk exhaustion

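One way to implement the redaction described above is sketched below as a hypothetical example, not the orchestrator's actual logger. It replaces two things: the literal values of known secrets (for example the value of `CLAUDE_API_KEY`), and any `key=value` pair whose key name suggests a secret.

```typescript
// Key=value pairs whose key name suggests a secret (case-insensitive).
const SECRET_PAIR = /\b([A-Za-z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD)[A-Za-z0-9_]*)=[^\s&]+/gi;

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/**
 * Hypothetical redactor: masks the literal values of known secrets and the
 * value part of secret-looking key=value pairs.
 */
function makeRedactor(secretValues: string[]): (line: string) => string {
  // Skip empty or very short values, which would mangle unrelated text.
  const values = secretValues.filter((v) => v.length >= 8);
  // Longest first, so a secret that contains another is masked as a whole.
  values.sort((a, b) => b.length - a.length);
  const literal = values.length > 0 ? new RegExp(values.map(escapeRegExp).join("|"), "g") : null;
  return (line: string): string => {
    const masked = literal ? line.replace(literal, "[REDACTED]") : line;
    return masked.replace(SECRET_PAIR, "$1=[REDACTED]");
  };
}
```

Build the redactor once at startup, e.g. `makeRedactor([process.env.CLAUDE_API_KEY ?? ""])`, and apply it to every log line before it is written. Pattern-based redaction is a safety net and can over-match: any name containing `key`, such as `monkey`, is affected. It does not replace keeping secrets out of log calls in the first place.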
## Security Checklist

- [x] Multi-stage Docker build
- [x] Non-root user (node:node, UID 1000)
- [x] Minimal base image (node:20-alpine)
- [x] No unnecessary packages
- [x] Health check in Dockerfile
- [x] Security scan passes (0 vulnerabilities)
- [x] Capability restrictions (drop ALL, add minimal)
- [x] No new privileges flag
- [x] Read-only mounts where possible
- [x] Tmpfs with noexec/nosuid
- [x] Network isolation
- [x] Comprehensive labels
- [x] Environment-based secrets

## Known Limitations

### Docker Socket Access

The orchestrator requires access to the Docker socket (`/var/run/docker.sock`) to spawn agent containers.

**Risk:**

- Docker socket access provides root-equivalent privileges on the host
- A compromised orchestrator could spawn malicious containers (e.g. privileged containers or containers that mount the host filesystem)

**Mitigations:**

1. **Non-root user**: Limits what an attacker can do inside the orchestrator container (does not restrict Docker API calls)
2. **Capability restrictions**: Prevent privilege escalation within the orchestrator container (not inherited by containers it spawns)
3. **Killswitch**: Emergency stop for all agents
4. **Monitoring**: Audit logs track all Docker operations
5. **Network isolation**: Orchestrator not exposed publicly

**Future Improvements:**

- Consider Docker-in-Docker (DinD) for better isolation (note that the standard DinD setup itself requires a privileged container)
- Implement a Docker socket proxy with ACLs
- Evaluate Kubernetes Pod Security Standards, enforced via Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25)

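To illustrate the socket-proxy idea: a proxy sits between the orchestrator and `/var/run/docker.sock` and forwards only allowlisted Docker Engine API calls. The sketch below shows only the decision logic. The allowlist, the function names, and the rejected `HostConfig` fields are hypothetical choices, not an existing Mosaic component. Nor is it a complete policy: devices, security options, and named volumes, for instance, are not examined. Real agent containers may need additional endpoints or mounts, which would call for narrower rules than the blanket bind-mount ban used here.

```typescript
/** A Docker Engine API request as seen by a socket proxy. */
interface DockerApiRequest {
  method: string;
  path: string;   // e.g. "/v1.43/containers/create?name=agent-1"
  body?: unknown; // parsed JSON body, if any
}

// Hypothetical allowlist: just enough to run, inspect and remove agent containers.
const ALLOWED_ENDPOINTS: Array<[string, RegExp]> = [
  ["GET", /^\/containers\/json$/],
  ["GET", /^\/containers\/[^/]+\/json$/],
  ["POST", /^\/containers\/create$/],
  ["POST", /^\/containers\/[^/]+\/(start|stop|kill|wait)$/],
  ["DELETE", /^\/containers\/[^/]+$/],
];

// Drop the query string and an optional API version prefix such as "/v1.43".
function normalizePath(path: string): string {
  const [withoutQuery = ""] = path.split("?");
  return withoutQuery.replace(/^\/v\d+(\.\d+)?(?=\/)/, "");
}

function isAllowed(req: DockerApiRequest): boolean {
  const method = req.method.toUpperCase();
  const path = normalizePath(req.path);
  if (!ALLOWED_ENDPOINTS.some(([m, re]) => m === method && re.test(path))) return false;

  // Allowlisting endpoints is not enough: container creation must also be
  // screened, or the caller can request a privileged or host-mounting container.
  if (method === "POST" && path === "/containers/create") {
    const body = (req.body ?? {}) as { HostConfig?: Record<string, unknown> };
    const host = body.HostConfig ?? {};
    if (host.Privileged === true) return false;
    if (host.NetworkMode === "host" || host.PidMode === "host") return false;
    if (Array.isArray(host.Binds) && host.Binds.length > 0) return false;
    if (Array.isArray(host.Mounts) && host.Mounts.some((m: { Type?: string }) => m.Type === "bind")) return false;
    if (Array.isArray(host.CapAdd) && host.CapAdd.length > 0) return false;
  }
  return true;
}
```

The body check is the important part. An endpoint-only allowlist that permits `POST /containers/create` still allows a privileged container, or one that mounts `/`, which is exactly the root-equivalent risk described above.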
### Workspace Writes

The workspace volume must be writable for git operations.

**Risk:**

- Code execution via malicious git hooks
- Data exfiltration via commit/push

**Mitigations:**

1. **Isolated volume**: Workspace not shared with other services
2. **Non-root user**: Limits blast radius
3. **Quality gates**: Code review before commit
4. **Secret scanning**: git-secrets helps prevent committing credentials

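For the git-hook risk, one possible hardening is to run every git command with `core.hooksPath` pointed at `/dev/null`, so git finds no hook scripts at all. This document does not state that the orchestrator does this. Unlike `--no-verify`, which skips only some commit and push hooks, this covers every hook type. A minimal sketch with a hypothetical helper name:

```typescript
/**
 * Hypothetical helper: prefix git arguments so that no repository hooks run.
 * With core.hooksPath=/dev/null, git looks for hooks inside /dev/null, which
 * is not a directory, so pre-commit, post-checkout, pre-push etc. never run.
 * A -c option on the command line takes precedence over repository config.
 */
function gitArgsWithoutHooks(args: readonly string[]): string[] {
  return ["-c", "core.hooksPath=/dev/null", ...args];
}

// Example: a hook-free commit in a (hypothetical) checkout under /workspace.
const commitArgs = gitArgsWithoutHooks(["-C", "/workspace/repo", "commit", "-m", "agent: apply changes"]);
```

Pass the resulting argument vector to `child_process.execFile("git", args)` rather than interpolating it into a shell string. This also avoids shell injection through commit messages or paths.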
## Compliance

This security configuration aligns with:

- **CIS Docker Benchmark**: Targets the applicable controls. The Docker socket mount is a documented exception to the benchmark's recommendation not to mount the Docker socket inside containers (see Known Limitations).
- **OWASP Container Security**: Follows best practices
- **NIST SP 800-190**: Application Container Security Guide

## Security Audits

- **Last Security Scan:** 2026-02-02
- **Tool:** Trivy v0.69
- **Results:** 0 vulnerabilities (HIGH/CRITICAL)

**Recommended Scan Frequency:**

- Weekly automated scans
- On-demand before production deployments
- After base image updates

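Trivy can enforce the HIGH/CRITICAL criterion on its own with `--severity HIGH,CRITICAL --exit-code 1`. When per-severity counts are also wanted, for example to record them in this section, the JSON report from `trivy image --format json` can be summarized directly. The sketch below relies only on the `Results[].Vulnerabilities[].Severity` fields of that report, and the function names are hypothetical:

```typescript
/** Minimal shape of a `trivy image --format json` report used here. */
interface TrivyReport {
  Results?: Array<{
    Target: string;
    Vulnerabilities?: Array<{ VulnerabilityID: string; Severity: string }> | null;
  }>;
}

/** Count findings per severity across all scanned targets. */
function countBySeverity(report: TrivyReport): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const result of report.Results ?? []) {
    // Targets without findings may omit Vulnerabilities or set it to null.
    for (const vuln of result.Vulnerabilities ?? []) {
      counts[vuln.Severity] = (counts[vuln.Severity] ?? 0) + 1;
    }
  }
  return counts;
}

/** Hypothetical gate: passes only when no blocking-severity findings exist. */
function passesScanGate(report: TrivyReport, blocking: string[] = ["HIGH", "CRITICAL"]): boolean {
  const counts = countBySeverity(report);
  return blocking.every((severity) => (counts[severity] ?? 0) === 0);
}
```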
## Reporting Security Issues

If you discover a security vulnerability, please report it through one of these channels:

- **Email:** security@mosaicstack.dev
- **Issue Tracker:** Use the "security" label (private issues only)

**Do NOT:**

- Open public issues for security vulnerabilities
- Disclose vulnerabilities before a patch is available

## References

- [Docker Security Best Practices](https://docs.docker.com/engine/security/)
- [CIS Docker Benchmark](https://www.cisecurity.org/benchmark/docker)
- [OWASP Container Security](https://owasp.org/www-project-docker-top-10/)
- [Alpine Linux Security](https://alpinelinux.org/about/)

---

**Document Version:** 1.0
**Last Updated:** 2026-02-02
**Maintained By:** Mosaic Security Team