feat: rename rails/ to tools/ and add service tool suites (#4)
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
This commit was merged in pull request #4.
@@ -1,7 +1,7 @@
# Infrastructure & DevOps Guide

## Before Starting
-1. Check assigned issue: `~/.config/mosaic/rails/git/issue-list.sh -a @me`
+1. Check assigned issue: `~/.config/mosaic/tools/git/issue-list.sh -a @me`
2. Create scratchpad: `docs/scratchpads/{issue-number}-{short-name}.md`
3. Review existing infrastructure configuration
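
Step 2 can be scripted. The following is a minimal sketch, assuming the short name is slugified to lowercase with hyphens. The slug rule, the file's initial heading, and the helper names `scratchpad_path` and `create_scratchpad` are illustrative and not part of the Mosaic tools.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not part of ~/.config/mosaic/tools.

# Print the scratchpad path for an issue number and a free-form short name.
scratchpad_path() {
  local issue="$1" name="$2" slug
  # Assumed slug rule: lowercase, runs of non-alphanumerics become one hyphen.
  slug=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')
  printf 'docs/scratchpads/%s-%s.md\n' "$issue" "$slug"
}

# Create the scratchpad if it does not exist yet, then print its path.
create_scratchpad() {
  local path
  path=$(scratchpad_path "$1" "$2")
  mkdir -p "$(dirname "$path")"
  [ -e "$path" ] || printf '# Issue #%s: %s\n' "$1" "$2" > "$path"
  printf '%s\n' "$path"
}
```

For example, `create_scratchpad 42 "Fix Ingress TLS"` creates and prints `docs/scratchpads/42-fix-ingress-tls.md`.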
@@ -97,10 +97,10 @@ readinessProbe:
periodSeconds: 3
```

## CI/CD Pipelines

### Pipeline Stages
1. **Lint**: Code style and static analysis
2. **Test**: Unit and integration tests
3. **Build**: Compile and package
4. **Scan**: Security and vulnerability scanning
@@ -109,65 +109,96 @@ readinessProbe:
### Pipeline Security
- Use secrets management (not hardcoded)
- Pin action/image versions
- Implement approval gates for production
- Audit pipeline access

## Steered-Autonomous Deployment (Hard Rule)

In lights-out mode, the agent owns deployment end-to-end when deployment is in scope.
The human is escalation-only for missing access, hard policy conflicts, or irreversible risk.

### Deployment Target Selection

1. Use explicit target from `docs/PRD.md` / `docs/PRD.json` or `docs/DEPLOYMENT.md`.
2. If unspecified, infer from existing project config/integration.
3. If multiple targets exist, choose the target already wired in CI/CD and document rationale.

### Supported Targets

-- **Portainer**: Deploy via configured stack webhook/API, then verify service health and container status.
+- **Portainer**: Deploy via `~/.config/mosaic/tools/portainer/stack-redeploy.sh`, then verify with `stack-status.sh`.
-- **Coolify**: Trigger deployment via Coolify API/webhook, then verify deployment status and endpoint health.
+- **Coolify**: Deploy via `~/.config/mosaic/tools/coolify/deploy.sh -u <uuid>`, then verify with `service-status.sh`.
- **Vercel**: Deploy via `vercel` CLI or connected Git integration, then verify preview/production URL health.
- **Other SaaS providers**: Use provider CLI/API/runbook with the same validation and rollback gates.
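
Step 1 of Deployment Target Selection, followed by dispatch to a supported target, might be sketched as below. This is an illustration under stated assumptions: the guide does not define how a target is declared in the docs, so the keyword search is a guess, and the function names `explicit_target` and `deploy_entrypoint` are hypothetical. Steps 2 and 3 (inferring from project config, preferring the CI-wired target) are project-specific and left out.

```bash
#!/usr/bin/env bash
# Illustrative sketch; the keyword-based detection is an assumption.

# Print the first supported target named in the given files, checked in order.
explicit_target() {
  local file match
  for file in "$@"; do
    [ -f "$file" ] || continue
    match=$(grep -ioE -m1 'portainer|coolify|vercel' "$file" \
      | head -n1 | tr '[:upper:]' '[:lower:]')
    if [ -n "$match" ]; then
      printf '%s\n' "$match"
      return 0
    fi
  done
  return 1
}

# Map a target to the deploy entry point named in this guide.
deploy_entrypoint() {
  case "$1" in
    portainer) printf '%s\n' "$HOME/.config/mosaic/tools/portainer/stack-redeploy.sh" ;;
    coolify)   printf '%s\n' "$HOME/.config/mosaic/tools/coolify/deploy.sh" ;;
    vercel)    printf '%s\n' "vercel" ;;
    *)         echo "unsupported target: $1" >&2; return 1 ;;
  esac
}
```

A caller could combine the two as `target=$(explicit_target docs/PRD.md docs/PRD.json docs/DEPLOYMENT.md) && deploy_entrypoint "$target"`, falling back to steps 2 and 3 when no explicit target is found.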

### Coolify API Operations

```bash
# List projects and services
~/.config/mosaic/tools/coolify/project-list.sh
~/.config/mosaic/tools/coolify/service-list.sh

# Check service status
~/.config/mosaic/tools/coolify/service-status.sh -u <uuid>

# Set env vars (takes effect on next deploy)
~/.config/mosaic/tools/coolify/env-set.sh -u <uuid> -k KEY -v VALUE

# Deploy
~/.config/mosaic/tools/coolify/deploy.sh -u <uuid>
```

**Known Coolify Limitations:**
- FQDN updates on compose sub-apps not supported via API (DB workaround required)
- Compose files must be base64-encoded in `docker_compose_raw` field
- Magic variables (`SERVICE_FQDN_*`) require list-style env syntax, not dict-style
- Rate limit: 200 requests per interval
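
The base64 and list-style limitations above can be illustrated together. This is a sketch only: the service name, the image reference, and the `SERVICE_FQDN_APP` identifier are hypothetical, and `write_compose` and `encode_compose` are not part of the Mosaic tools.

```bash
#!/usr/bin/env bash
# Sketch only: service name, image, and SERVICE_FQDN_APP are hypothetical.

# Write a compose file that uses list-style environment entries,
# which the limitation above says the magic variables require.
write_compose() {
  cat > "$1" <<'YAML'
services:
  app:
    image: registry.example.com/app@sha256:<digest>
    environment:
      - SERVICE_FQDN_APP
      - LOG_LEVEL=info
YAML
}

# Base64-encode a compose file as a single line,
# suitable as the value of the docker_compose_raw field.
encode_compose() {
  base64 < "$1" | tr -d '\n'
}
```

Stripping newlines with `tr` avoids relying on a line-wrap flag, which differs between `base64` implementations.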

### Stack Health Check

Verify all infrastructure services are reachable:

```bash
~/.config/mosaic/tools/health/stack-health.sh
```

### Image Tagging and Promotion (Hard Rule)

For containerized deployments:

1. Build immutable image tags: `sha-<shortsha>` and `v{base-version}-rc.{build}`.
2. Use mutable environment tags only as pointers: `testing`, optional `staging`, and `prod`.
3. Deploy by immutable digest, not by mutable tag alone.
4. Promote the exact tested digest between environments (no rebuild between testing and prod).
5. Do not use `latest` or `dev` as deployment references.

Blue-green is the default strategy for production promotion.
Canary is allowed only when automated SLO/error-rate gates and auto-rollback triggers are implemented.
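
The tag formats and the digest-only deploy rule above could be enforced with small helpers like these. This is a sketch under assumptions: the 7-character short SHA length is a guess (the guide only says `<shortsha>`), and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative helpers; the 7-character short SHA is an assumption.

# Print the two immutable tags for a build: sha-<shortsha> and v<base>-rc.<build>.
immutable_tags() {
  local sha="$1" base_version="$2" build="$3"
  printf 'sha-%s\n' "$(printf '%s' "$sha" | cut -c1-7)"
  printf 'v%s-rc.%s\n' "$base_version" "$build"
}

# Succeed only for references pinned by digest (name@sha256:<64 hex chars>).
is_digest_ref() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# Reject the forbidden tags, then require a digest-pinned reference.
check_deploy_ref() {
  local ref="$1"
  case "$ref" in
    *:latest|*:dev) echo "forbidden tag in: $ref" >&2; return 1 ;;
  esac
  is_digest_ref "$ref" || { echo "not pinned by digest: $ref" >&2; return 1; }
}
```

`immutable_tags 0123456789abcdef 1.4.0 12` prints `sha-0123456` and `v1.4.0-rc.12`. `check_deploy_ref` fails for `app:latest`, `app:dev`, and any tag-only reference such as `app:prod`.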

### Post-Deploy Validation (REQUIRED)

1. Health endpoints return expected status.
2. Critical smoke tests pass in target environment.
3. Running version and digest match the promoted release candidate.
4. Observability signals (errors/latency) are within expected thresholds.
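
Checks 1 and 3 lend themselves to a short script. The sketch below assumes an HTTP health endpoint and that the caller already knows the running and promoted digests. The function names, the default expected status of 200, and the 10-second timeout are assumptions. Smoke tests and observability thresholds (checks 2 and 4) are project-specific and not shown.

```bash
#!/usr/bin/env bash
# Illustrative sketch of checks 1 and 3; names and defaults are assumptions.

# Check 1: the health endpoint returns the expected HTTP status.
health_ok() {
  local url="$1" expected="${2:-200}" status
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || return 1
  [ "$status" = "$expected" ]
}

# Check 3: the running digest equals the promoted release-candidate digest.
digest_matches() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Run the checks in order and stop at the first failure.
validate_deploy() {
  local url="$1" running="$2" promoted="$3"
  health_ok "$url" || { echo "health check failed: $url" >&2; return 1; }
  digest_matches "$running" "$promoted" || { echo "digest mismatch" >&2; return 1; }
}
```

A non-zero exit from `validate_deploy` is the signal to start the rollback path.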

### Rollback Rule

If post-deploy validation fails:

1. Execute rollback/redeploy-safe path immediately.
2. Mark deployment as blocked in `docs/TASKS.md`.
3. Record failure evidence and next remediation step in scratchpad and release notes.
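
The three rollback steps could be wired together roughly as follows. This is a sketch with loud assumptions: the guide does not define the line format of `docs/TASKS.md` or the scratchpad, so the `[BLOCKED]` marker and the section layout are invented for illustration, and the rollback command is supplied by the caller. Release notes would need the same entry and are not handled here.

```bash
#!/usr/bin/env bash
# Illustrative sketch; the TASKS.md and scratchpad formats are assumptions.

# Steps 2 and 3: mark the deployment blocked and record the evidence.
record_failed_deploy() {
  local release="$1" evidence="$2" next_step="$3" scratchpad="$4"
  local stamp
  stamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  mkdir -p docs
  printf -- '- [BLOCKED] deploy %s (%s): post-deploy validation failed\n' \
    "$release" "$stamp" >> docs/TASKS.md
  {
    printf '\n## Deployment failure %s\n' "$stamp"
    printf -- '- Release: %s\n' "$release"
    printf -- '- Evidence: %s\n' "$evidence"
    printf -- '- Next step: %s\n' "$next_step"
  } >> "$scratchpad"
}

# Step 1 first: run the caller-supplied rollback command, then record the failure.
on_validation_failure() {
  local rollback_cmd="$1"
  shift
  "$rollback_cmd" || echo "rollback command failed: $rollback_cmd" >&2
  record_failed_deploy "$@"
}
```

The failure is recorded even when the rollback command itself fails, so the blocked state is never lost.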

### Registry Retention and Cleanup

Cleanup MUST be automated.

- Keep all final release tags (`vX.Y.Z`) indefinitely.
- Keep active environment digests (`prod`, `testing`, and active blue/green slots).
- Keep recent RC tags (`vX.Y.Z-rc.N`) based on retention window.
- Remove stale `sha-*` and RC tags outside retention window if they are not actively deployed.
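
The retention rules above reduce to a per-tag decision. Here is a minimal sketch, assuming the cleanup job already knows each tag's age in days and whether it is actively deployed. The function name, the argument order, and the choice to leave unrecognized tags alone are assumptions. The window length is a parameter because the guide does not fix one.

```bash
#!/usr/bin/env bash
# Illustrative sketch; argument layout and fallback behaviour are assumptions.

# Print "keep" or "remove" for one tag.
# Args: tag, age in days, "yes"/"no" for actively deployed, retention window in days.
retention_decision() {
  local tag="$1" age_days="$2" deployed="$3" window_days="$4"
  # Final release tags (vX.Y.Z) are kept indefinitely.
  if printf '%s\n' "$tag" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo keep
    return
  fi
  # Anything actively deployed is kept regardless of age.
  if [ "$deployed" = yes ]; then
    echo keep
    return
  fi
  # sha-* and RC tags are kept only inside the retention window.
  if printf '%s\n' "$tag" | grep -Eq '^(sha-[0-9a-f]+|v[0-9]+\.[0-9]+\.[0-9]+-rc\.[0-9]+)$'; then
    if [ "$age_days" -le "$window_days" ]; then echo keep; else echo remove; fi
    return
  fi
  # Unrecognized tags (for example environment pointers) are left alone.
  echo keep
}
```

With a 30-day window, `retention_decision v1.2.3-rc.4 45 no 30` prints `remove`, while the same tag with `yes` for actively deployed prints `keep`.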

## Monitoring & Logging

### Logging Standards
- Use structured logging (JSON)