From 8f63b3e1dc10157ee8e6df3921290ea39bc54e52 Mon Sep 17 00:00:00 2001
From: Jason Woltje
Date: Sun, 1 Feb 2026 01:26:01 -0600
Subject: [PATCH] docs: Add Mosaic Component Architecture and Guard Rails design docs

- mosaic-component-architecture.md: OpenClaw wrapper pattern, component naming, job tracking, chat integration, database schema
- guard-rails-capability-permissions.md: Capability-based permission model

Related: #162 (M4.2 Infrastructure Epic)

Co-Authored-By: Claude Opus 4.5
---
 .../mosaic-component-architecture.md | 314 ++++++++++++++++++
 1 file changed, 314 insertions(+)
 create mode 100644 docs/3-architecture/mosaic-component-architecture.md

diff --git a/docs/3-architecture/mosaic-component-architecture.md b/docs/3-architecture/mosaic-component-architecture.md
new file mode 100644
index 0000000..a45323c
--- /dev/null
+++ b/docs/3-architecture/mosaic-component-architecture.md
@@ -0,0 +1,314 @@
+# Mosaic Component Architecture Design
+
+## Strategic Decision
+
+**OpenClaw as execution engine, Mosaic as control layer.**
+
+- **Now (M1-M2):** Wrapper approach - use OpenClaw, add Mosaic controls
+- **After M2:** Evaluate - is OpenClaw working for us?
+- **If needed:** Fork or rebuild with lessons learned
+
+**Why:** 355+ contributors maintain OpenClaw. We maintain only the wrapper. Ship faster, pivot later if needed.
+
+## Philosophy
+
+**Mosaic** = pieces combining to create a beautiful, larger picture.
+
+Each component has a **dedicated function** (single responsibility). Focused tasks = agents stay on rails. If an agent only does one thing, it can't wander off-track.
+
+## Overview
+
+Establish the pattern for how Mosaic's control layer wraps OpenClaw's execution layer, with full job step tracking and event logging.
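As a rough illustration of the wrapper pattern described in the Overview, the sketch below wraps a stand-in executor and records lifecycle events around each run. `Stitcher`, `Executor`, and `JobEvent` are hypothetical names chosen for this example; the executor is a plain synchronous function, not OpenClaw's actual interface, and events are kept in memory rather than persisted.

```typescript
// Hypothetical names throughout: Stitcher, Executor, and JobEvent are
// illustrative only and are not taken from Mosaic or OpenClaw.
type JobEventType = "job.created" | "job.started" | "job.completed" | "job.failed";

interface JobEvent {
  jobId: number;
  type: JobEventType;
  payload: string | null;
}

// Stand-in for the execution layer. A real dispatch would go through a queue.
type Executor = (task: string) => string;

class Stitcher {
  readonly events: JobEvent[] = [];
  private nextJobId = 1;
  private readonly execute: Executor;

  constructor(execute: Executor) {
    this.execute = execute;
  }

  // Wrap one execution: record events before and after handing off the task.
  run(task: string): number {
    const jobId = this.nextJobId++;
    this.emit(jobId, "job.created", task);
    this.emit(jobId, "job.started", null);
    try {
      const result = this.execute(task);
      this.emit(jobId, "job.completed", result);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      this.emit(jobId, "job.failed", message);
    }
    return jobId;
  }

  private emit(jobId: number, type: JobEventType, payload: string | null): void {
    this.events.push({ jobId, type, payload });
  }
}

const stitcher = new Stitcher((task) => `done: ${task}`);
stitcher.run("fix issue #42");
console.log(stitcher.events.map((e) => e.type).join(" -> "));
```

The point of the sketch is that the execution layer never emits events itself: every event originates in the wrapper, which keeps the audit trail independent of whatever engine sits underneath.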
+
+## Component Naming
+
+| Component | Dedicated Function | Rails |
+| -------------------- | ------------------------------------------------------------------------ | ------------------------------------ |
+| **@mosaic** | Gitea bot user - triggers workflow on issue assignment/mention | Webhook receiver only |
+| **mosaic-stitcher** | Orchestrates workflow, sequences jobs, manages priorities | Control plane only, no execution |
+| **mosaic-bridge** | Chat integrations (Discord, Mattermost, Slack) - commands in, status out | I/O only, no execution |
+| **mosaic-runner** | Fetches information, gathers context, reads repos | Read-only operations |
+| **mosaic-weaver** | Implements code changes, writes files | Write operations, scoped to worktree |
+| **mosaic-inspector** | Runs quality gates (build, lint, test) | Validation only, no modifications |
+| **mosaic-herald** | Reports status, creates PR comments, notifications | Output/reporting only |
+
+**Why this works:** Each component has exactly ONE job. Can't go off rails if there's only one rail.
+
+**Note:** Names are placeholders. Components are modular plugins, so names can change later.
+
+## Architecture
+
+```
+┌─────────────────┐              ┌─────────────────┐
+│     @mosaic     │              │  mosaic-bridge  │
+│   (Gitea Bot)   │              │   (Chat I/O)    │
+│  Webhook Trigger│              │  Discord/MM/etc │
+└────────┬────────┘              └────────┬────────┘
+         │ Issue assigned                 │ Commands
+         └───────────────┬────────────────┘
+                         ▼
+┌─────────────────────────────────────────────────────────────┐
+│                MOSAIC STACK (Control Layer)                 │
+│                                                             │
+│  ┌──────────────────────────────────────────────────────┐   │
+│  │              MOSAIC-STITCHER (Wrapper)               │   │
+│  │  ┌───────────┐  ┌───────────┐  ┌───────────────────┐ │   │
+│  │  │   Guard   │  │  Quality  │  │   Job Tracking    │ │   │
+│  │  │   Rails   │  │   Rails   │  │  (Events/Steps)   │ │   │
+│  │  │  (perms)  │  │  (gates)  │  │                   │ │   │
+│  │  └───────────┘  └───────────┘  └───────────────────┘ │   │
+│  └──────────────────────────┬───────────────────────────┘   │
+│                             │                               │
+└─────────────────────────────┼────────────────────────────────┘
+                              │ Dispatch with constraints
+                              ▼
+┌─────────────────────────────────────────────────────────────┐
+│                 OPENCLAW (Execution Layer)                  │
+│                 355+ contributors maintain                  │
+│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐    │
+│  │   Agent   │ │  Session  │ │ Multi-LLM │ │  Discord  │    │
+│  │ Spawning  │ │  Manager  │ │  Support  │ │  Integr.  │    │
+│  └───────────┘ └───────────┘ └───────────┘ └───────────┘    │
+│                                                             │
+│  Agent Profiles (Mosaic-defined constraints):               │
+│  ┌─────────┐ ┌─────────┐ ┌───────────┐ ┌─────────┐          │
+│  │ RUNNER  │ │ WEAVER  │ │ INSPECTOR │ │ HERALD  │          │
+│  │ (read)  │ │ (write) │ │ (validate)│ │ (report)│          │
+│  └─────────┘ └─────────┘ └───────────┘ └─────────┘          │
+└─────────────────────────────────────────────────────────────┘
+```
+
+**Key insight:** Agent profiles (runner, weaver, etc.) are **constraints passed to OpenClaw**, not separate containers. OpenClaw spawns agents, Mosaic controls what they're allowed to do.
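To make the "profiles are constraints, not containers" idea concrete, here is a minimal sketch that models each profile as a set of capabilities attached to a dispatch. The capability names mirror the labels in the diagram (read, write, validate, report). The `PROFILES` table, `isAllowed`, and `buildDispatch` are hypothetical, and the payload shape is an assumption rather than OpenClaw's real dispatch format.

```typescript
// Hypothetical capability model. Names and payload shape are assumptions
// made for illustration, not an existing Mosaic or OpenClaw API.
type Capability = "read" | "write" | "validate" | "report";
type ProfileName = "runner" | "weaver" | "inspector" | "herald";

// One capability per profile, matching the labels in the diagram above.
const PROFILES: Record<ProfileName, ReadonlySet<Capability>> = {
  runner: new Set<Capability>(["read"]),
  weaver: new Set<Capability>(["write"]),
  inspector: new Set<Capability>(["validate"]),
  herald: new Set<Capability>(["report"]),
};

// Check performed by the control layer before an action is permitted.
function isAllowed(profile: ProfileName, capability: Capability): boolean {
  return PROFILES[profile].has(capability);
}

interface Dispatch {
  task: string;
  profile: ProfileName;
  capabilities: Capability[];
}

// Build the constraint payload that would accompany a dispatched task.
function buildDispatch(profile: ProfileName, task: string): Dispatch {
  return { task, profile, capabilities: Array.from(PROFILES[profile]) };
}

console.log(buildDispatch("weaver", "implement fix for issue #42"));
console.log(isAllowed("inspector", "write"));
```

Because a profile is only data, adding or renaming one means editing a table rather than deploying a new service.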
+
+## Relationship to Non-AI Coordinator (M4.1)
+
+This architecture **complements** the Non-AI Coordinator Pattern:
+
+| Layer | Responsibility | Milestone |
+| --------------------------------- | ----------------------------------------------------------------------------------- | --------- |
+| **Non-AI Coordinator** | Orchestration logic (when to assign, context monitoring, quality gates enforcement) | M4.1 |
+| **Mosaic Component Architecture** | Execution infrastructure (job tracking, OpenClaw integration, chat commands) | M4.2 |
+
+The Non-AI Coordinator uses this infrastructure to dispatch and monitor jobs.
+
+## Chat Integration (mosaic-bridge)
+
+**Control Mosaic Stack via Discord, Mattermost, Slack, etc.**
+
+```
+#mosaic-control
+├── User: "@mosaic fix issue #42"
+├── Mosaic: "🚀 Started job #123 for issue #42" [link to thread]
+│
+└── Thread: "Job #123: Fix issue #42"
+    ├── 📖 Runner: Gathering context... ✓
+    ├── 🧵 Weaver: Implementing... ✓
+    ├── 🔍 Inspector: Running tests... ✓
+    ├── 📢 Herald: PR created → #456
+    └── [Full event log: /api/jobs/123/events]
+```
+
+### Noise Management Strategy
+
+| Channel | Purpose | Verbosity |
+| ----------------------- | ---------------------------------- | ------------------------- |
+| `#mosaic-control` | Commands + summaries | Low (milestones only) |
+| Job threads | Per-job activity | Medium (step completions) |
+| `/api/jobs/{id}/events` | Full audit log | High (everything) |
+| DMs (optional) | Private updates to triggering user | Configurable |
+
+### Commands (via chat)
+
+```
+@mosaic fix      # Start job for issue
+@mosaic status   # Get job status
+@mosaic cancel   # Cancel running job
+@mosaic verbose  # Stream full logs to thread
+@mosaic quiet    # Reduce notifications
+@mosaic help     # Show commands
+```
+
+### Integration lives at Mosaic layer, not OpenClaw
+
+- **mosaic-bridge** handles Discord/Mattermost/Slack APIs
+- **mosaic-stitcher** receives commands, dispatches jobs
+- **mosaic-herald** sends status updates back through bridge
+- OpenClaw has NO direct chat access (stays focused on execution)
+
+## Key Components
+
+### 1. Mosaic-Stitcher (The Wrapper)
+
+The control layer that wraps OpenClaw:
+
+- Receives webhooks from @mosaic bot
+- Applies Guard Rails (capability permissions)
+- Applies Quality Rails (mandatory gates)
+- Tracks all job steps and events
+- Dispatches work to OpenClaw with constraints
+
+### 2. OpenClaw (Execution Engine)
+
+Community-maintained agent swarm (355+ contributors):
+
+- Spawns and manages AI agent sessions
+- Multi-LLM support (Claude, GPT, Ollama, etc.)
+- Session management and recovery
+- We use it as-is, wrapped by Mosaic-Stitcher
+
+### 3. Agent Profiles (Constraints for OpenClaw)
+
+Mosaic-defined capability constraints passed to OpenClaw agents:
+
+- **runner** - read-only: fetch context, read files, query APIs
+- **weaver** - write: implement code, scoped to git worktree
+- **inspector** - validate: run gates, no modifications
+- **herald** - report: PR comments, notifications, status updates
+
+### 4. Job Structure
+
+Every job contains granular steps:
+
+| Phase | Steps |
+| ---------- | ------------------------------------------------------- |
+| SETUP | Clone repo, create worktree, install deps |
+| EXECUTION | Read requirements, analyze code, implement, write tests |
+| VALIDATION | Lint gate, typecheck gate, test gate, coverage gate |
+| CLEANUP | Stage, commit, push, create PR |
+
+### 5. Event Logging (Event Sourcing)
+
+Every action emits an event:
+
+- `job.created`, `job.queued`, `job.started`, `job.completed`, `job.failed`
+- `step.started`, `step.progress`, `step.output`, `step.completed`
+- `ai.tool_called`, `ai.tokens_used`, `ai.artifact_created`
+- `gate.started`, `gate.passed`, `gate.failed`
+
+Storage:
+
+- PostgreSQL: Immutable audit log (permanent)
+- Valkey Streams: Recent events (last 1000 per job)
+- Valkey Pub/Sub: Real-time streaming
+
+### 6. Queue Architecture
+
+**BullMQ** is preferred over the plain ValkeyService because it provides:
+
+- Job progress tracking (0-100%)
+- Automatic retry with exponential backoff
+- Rate limiting
+- Job dependencies
+- Rich lifecycle events
+
+Uses the same Valkey instance that is already configured.
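The "automatic retry with exponential backoff" item above can be pictured with a small, library-free sketch. It assumes a simple doubling rule (base delay, then 2x, 4x, and so on). `RetryPolicy`, `backoffDelayMs`, and `retrySchedule` are hypothetical helpers, and BullMQ's built-in strategy may differ in detail. In BullMQ itself this behaviour is normally requested through the `attempts` and `backoff` job options rather than written by hand.

```typescript
// Hypothetical helpers. The doubling rule is an assumption for illustration
// and is not claimed to match BullMQ's exact internal formula.
interface RetryPolicy {
  attempts: number; // total tries, including the first one
  baseDelayMs: number; // delay before the first retry
}

// Delay before retry number `retry` (1-based): base, 2x base, 4x base, ...
function backoffDelayMs(retry: number, baseDelayMs: number): number {
  if (!Number.isInteger(retry) || retry < 1) {
    throw new RangeError("retry must be a positive integer");
  }
  return baseDelayMs * Math.pow(2, retry - 1);
}

// List the waits between tries for a given policy.
function retrySchedule(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < policy.attempts; retry++) {
    delays.push(backoffDelayMs(retry, policy.baseDelayMs));
  }
  return delays;
}

console.log(retrySchedule({ attempts: 4, baseDelayMs: 1000 }));
```

A policy of four attempts with a one-second base yields three waits, since the first try happens immediately.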
+
+## Database Schema
+
+```sql
+-- Runner jobs (links to existing agent_tasks)
+CREATE TABLE runner_jobs (
+  id UUID PRIMARY KEY,
+  workspace_id UUID NOT NULL,
+  agent_task_id UUID REFERENCES agent_tasks(id),
+  type VARCHAR(100), -- 'git-status', 'code-task', 'priority-calc'
+  status VARCHAR(50), -- PENDING → QUEUED → RUNNING → COMPLETED/FAILED
+  priority INT,
+  progress_percent INT,
+  result JSONB,
+  error TEXT,
+  created_at TIMESTAMPTZ,
+  started_at TIMESTAMPTZ,
+  completed_at TIMESTAMPTZ
+);
+
+-- Job steps (granular tracking)
+CREATE TABLE job_steps (
+  id UUID PRIMARY KEY,
+  job_id UUID REFERENCES runner_jobs(id),
+  ordinal INT,
+  phase VARCHAR(50), -- setup, execution, validation, cleanup
+  name VARCHAR(255),
+  type VARCHAR(50), -- command, ai-action, gate, artifact
+  status VARCHAR(50),
+  output TEXT,
+  tokens_input INT,
+  tokens_output INT,
+  started_at TIMESTAMPTZ,
+  completed_at TIMESTAMPTZ,
+  duration_ms INT
+);
+
+-- Job events (immutable audit log)
+CREATE TABLE job_events (
+  id UUID PRIMARY KEY,
+  job_id UUID REFERENCES runner_jobs(id),
+  step_id UUID REFERENCES job_steps(id),
+  type VARCHAR(100),
+  timestamp TIMESTAMPTZ,
+  actor VARCHAR(100),
+  payload JSONB
+);
+```
+
+## Deployment Model
+
+**Mosaic wrapper + OpenClaw instance:**
+
+```
+docker-compose.yml:
+  mosaic-stitcher: # Control layer (our code)
+  mosaic-bridge:   # Chat integrations (Discord, Mattermost, Slack)
+  openclaw:        # Execution layer (community code)
+  valkey:          # Queue + cache
+  postgres:        # Job store, events
+```
+
+**NOT separate containers per agent type.** Runner/weaver/inspector are **agent profiles** (constraints), not services. OpenClaw spawns agents with the profile constraints we define.
+
+All services:
+
+- Share Valkey (BullMQ queues)
+- Share PostgreSQL (job store, events)
+- Communicate via queue (stitcher → openclaw)
+
+## New Modules (in API for now, extract to containers later)
+
+```
+apps/api/src/
+├── stitcher/     # Workflow engine, job creation
+├── runner-jobs/  # Job CRUD, queue submission
+├── job-steps/    # Step tracking
+├── job-events/   # Event logging, WebSocket gateway
+└── workers/      # BullMQ processors (one per component type)
+```
+
+## Implementation Phases
+
+1. **Core Infrastructure** - BullMQ setup, database migrations
+2. **Coordinator Service** - Job submission, status polling, cancel/retry
+3. **Runner Worker** - Claude Code integration, step-by-step execution
+4. **Real-time Status** - WebSocket gateway, SSE for CLI
+5. **Integration Testing** - End-to-end tests
+
+## Files to Modify
+
+- `apps/api/src/app.module.ts` - Import new modules
+- `apps/api/src/valkey/valkey.service.ts` - Share connection with BullMQ
+- `apps/api/src/quality-orchestrator/` - Integrate with runner for gates
+- `package.json` - Add `@nestjs/bullmq`, `bullmq`
+
+## Verification
+
+1. Create a test job via API
+2. Verify job appears in BullMQ queue
+3. Runner picks up and executes with step events
+4. WebSocket receives real-time updates
+5. All events persisted to PostgreSQL
+6. Quality gates run before completion
+
+## Related Documentation
+
+- [Guard Rails: Capability-Based Permission System](./guard-rails-capability-permissions.md)
+- [Quality Rails Architecture](./quality-rails-orchestration-architecture.md)
+- [Non-AI Coordinator Pattern](./non-ai-coordinator-architecture.md)