feat: rename rails/ to tools/ and add service tool suites (#4)

Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
This commit was merged in pull request #4.
This commit is contained in:
2026-02-22 17:52:23 +00:00
committed by jason.woltje
parent 248db8935c
commit a8e580e1a3
158 changed files with 2481 additions and 213 deletions

View File

@@ -1,7 +1,7 @@
# Infrastructure & DevOps Guide
## Before Starting
1. Check assigned issue: `~/.config/mosaic/rails/git/issue-list.sh -a @me`
1. Check assigned issue: `~/.config/mosaic/tools/git/issue-list.sh -a @me`
2. Create scratchpad: `docs/scratchpads/{issue-number}-{short-name}.md`
3. Review existing infrastructure configuration
@@ -97,10 +97,10 @@ readinessProbe:
periodSeconds: 3
```
## CI/CD Pipelines
### Pipeline Stages
1. **Lint**: Code style and static analysis
## CI/CD Pipelines
### Pipeline Stages
1. **Lint**: Code style and static analysis
2. **Test**: Unit and integration tests
3. **Build**: Compile and package
4. **Scan**: Security and vulnerability scanning
@@ -109,65 +109,96 @@ readinessProbe:
### Pipeline Security
- Use secrets management (not hardcoded)
- Pin action/image versions
- Implement approval gates for production
- Audit pipeline access
## Steered-Autonomous Deployment (Hard Rule)
In lights-out mode, the agent owns deployment end-to-end when deployment is in scope.
The human is escalation-only for missing access, hard policy conflicts, or irreversible risk.
### Deployment Target Selection
1. Use explicit target from `docs/PRD.md` / `docs/PRD.json` or `docs/DEPLOYMENT.md`.
2. If unspecified, infer from existing project config/integration.
3. If multiple targets exist, choose the target already wired in CI/CD and document rationale.
### Supported Targets
- **Portainer**: Deploy via configured stack webhook/API, then verify service health and container status.
- **Coolify**: Trigger deployment via Coolify API/webhook, then verify deployment status and endpoint health.
- **Vercel**: Deploy via `vercel` CLI or connected Git integration, then verify preview/production URL health.
- **Other SaaS providers**: Use provider CLI/API/runbook with the same validation and rollback gates.
### Image Tagging and Promotion (Hard Rule)
For containerized deployments:
1. Build immutable image tags: `sha-<shortsha>` and `v{base-version}-rc.{build}`.
2. Use mutable environment tags only as pointers: `testing`, optional `staging`, and `prod`.
3. Deploy by immutable digest, not by mutable tag alone.
4. Promote the exact tested digest between environments (no rebuild between testing and prod).
5. Do not use `latest` or `dev` as deployment references.
Blue-green is the default strategy for production promotion.
Canary is allowed only when automated SLO/error-rate gates and auto-rollback triggers are implemented.
### Post-Deploy Validation (REQUIRED)
1. Health endpoints return expected status.
2. Critical smoke tests pass in target environment.
3. Running version and digest match the promoted release candidate.
4. Observability signals (errors/latency) are within expected thresholds.
### Rollback Rule
If post-deploy validation fails:
1. Execute rollback/redeploy-safe path immediately.
2. Mark deployment as blocked in `docs/TASKS.md`.
3. Record failure evidence and next remediation step in scratchpad and release notes.
### Registry Retention and Cleanup
Cleanup MUST be automated.
- Keep all final release tags (`vX.Y.Z`) indefinitely.
- Keep active environment digests (`prod`, `testing`, and active blue/green slots).
- Keep recent RC tags (`vX.Y.Z-rc.N`) based on retention window.
- Remove stale `sha-*` and RC tags outside retention window if they are not actively deployed.
## Monitoring & Logging
- Implement approval gates for production
- Audit pipeline access
## Steered-Autonomous Deployment (Hard Rule)
In lights-out mode, the agent owns deployment end-to-end when deployment is in scope.
The human is escalation-only for missing access, hard policy conflicts, or irreversible risk.
### Deployment Target Selection
1. Use explicit target from `docs/PRD.md` / `docs/PRD.json` or `docs/DEPLOYMENT.md`.
2. If unspecified, infer from existing project config/integration.
3. If multiple targets exist, choose the target already wired in CI/CD and document rationale.
### Supported Targets
- **Portainer**: Deploy via `~/.config/mosaic/tools/portainer/stack-redeploy.sh`, then verify with `stack-status.sh`.
- **Coolify**: Deploy via `~/.config/mosaic/tools/coolify/deploy.sh -u <uuid>`, then verify with `service-status.sh`.
- **Vercel**: Deploy via `vercel` CLI or connected Git integration, then verify preview/production URL health.
- **Other SaaS providers**: Use provider CLI/API/runbook with the same validation and rollback gates.
### Coolify API Operations
```bash
# List projects and services
~/.config/mosaic/tools/coolify/project-list.sh
~/.config/mosaic/tools/coolify/service-list.sh
# Check service status
~/.config/mosaic/tools/coolify/service-status.sh -u <uuid>
# Set env vars (takes effect on next deploy)
~/.config/mosaic/tools/coolify/env-set.sh -u <uuid> -k KEY -v VALUE
# Deploy
~/.config/mosaic/tools/coolify/deploy.sh -u <uuid>
```
**Known Coolify Limitations:**
- FQDN updates on compose sub-apps not supported via API (DB workaround required)
- Compose files must be base64-encoded in `docker_compose_raw` field
- Magic variables (`SERVICE_FQDN_*`) require list-style env syntax, not dict-style
- Rate limit: 200 requests per interval
### Stack Health Check
Verify all infrastructure services are reachable:
```bash
~/.config/mosaic/tools/health/stack-health.sh
```
### Image Tagging and Promotion (Hard Rule)
For containerized deployments:
1. Build immutable image tags: `sha-<shortsha>` and `v{base-version}-rc.{build}`.
2. Use mutable environment tags only as pointers: `testing`, optional `staging`, and `prod`.
3. Deploy by immutable digest, not by mutable tag alone.
4. Promote the exact tested digest between environments (no rebuild between testing and prod).
5. Do not use `latest` or `dev` as deployment references.
Blue-green is the default strategy for production promotion.
Canary is allowed only when automated SLO/error-rate gates and auto-rollback triggers are implemented.
### Post-Deploy Validation (REQUIRED)
1. Health endpoints return expected status.
2. Critical smoke tests pass in target environment.
3. Running version and digest match the promoted release candidate.
4. Observability signals (errors/latency) are within expected thresholds.
### Rollback Rule
If post-deploy validation fails:
1. Execute rollback/redeploy-safe path immediately.
2. Mark deployment as blocked in `docs/TASKS.md`.
3. Record failure evidence and next remediation step in scratchpad and release notes.
### Registry Retention and Cleanup
Cleanup MUST be automated.
- Keep all final release tags (`vX.Y.Z`) indefinitely.
- Keep active environment digests (`prod`, `testing`, and active blue/green slots).
- Keep recent RC tags (`vX.Y.Z-rc.N`) based on retention window.
- Remove stale `sha-*` and RC tags outside retention window if they are not actively deployed.
## Monitoring & Logging
### Logging Standards
- Use structured logging (JSON)