docs: Add Mosaic Component Architecture and Guard Rails design docs

- mosaic-component-architecture.md: OpenClaw wrapper pattern, component naming, job tracking, chat integration, database schema
- guard-rails-capability-permissions.md: Capability-based permission model

Related: #162 (M4.2 Infrastructure Epic)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 01:26:01 -06:00
parent e045cb5a45
commit 8f63b3e1dc
1 changed files with 314 additions and 0 deletions
--- a/docs/3-architecture/mosaic-component-architecture.md
+++ b/docs/3-architecture/mosaic-component-architecture.md
@@ -0,0 +1,314 @@
+# Mosaic Component Architecture Design
+
+## Strategic Decision
+
+**OpenClaw as execution engine, Mosaic as control layer.**
+
+- **Now (M1-M2):** Wrapper approach - use OpenClaw, add Mosaic controls
+- **After M2:** Evaluate - is OpenClaw working for us?
+- **If needed:** Fork or rebuild with lessons learned
+
+**Why:** 355+ contributors maintain OpenClaw. We maintain only the wrapper. Ship faster, pivot later if needed.
+
+## Philosophy
+
+**Mosaic** = pieces combining to create a beautiful, larger picture.
+
+Each component has a **dedicated function** (single responsibility). Narrowly scoped tasks keep agents on rails: an agent that does only one thing has little room to wander off-track.
+
+## Overview
+
+This document establishes how Mosaic's control layer wraps OpenClaw's execution layer, with full job step tracking and event logging.
+
+## Component Naming
+
+| Component            | Dedicated Function                                                       | Rails                                |
+| -------------------- | ------------------------------------------------------------------------ | ------------------------------------ |
+| **@mosaic**          | Gitea bot user - triggers workflow on issue assignment/mention           | Webhook receiver only                |
+| **mosaic-stitcher**  | Orchestrates workflow, sequences jobs, manages priorities                | Control plane only, no execution     |
+| **mosaic-bridge**    | Chat integrations (Discord, Mattermost, Slack) - commands in, status out | I/O only, no execution               |
+| **mosaic-runner**    | Fetches information, gathers context, reads repos                        | Read-only operations                 |
+| **mosaic-weaver**    | Implements code changes, writes files                                    | Write operations, scoped to worktree |
+| **mosaic-inspector** | Runs quality gates (build, lint, test)                                   | Validation only, no modifications    |
+| **mosaic-herald**    | Reports status, creates PR comments, notifications                       | Output/reporting only                |
+
+**Why this works:** Each component has exactly ONE job. Can't go off rails if there's only one rail.
+
+**Note:** Names are placeholders. Components are modular plugins, so names can change later.
+
+## Architecture
+
+```
+┌─────────────────┐              ┌─────────────────┐
+│    @mosaic      │              │  mosaic-bridge  │
+│  (Gitea Bot)    │              │  (Chat I/O)     │
+│  Webhook Trigger│              │  Discord/MM/etc │
+└────────┬────────┘              └────────┬────────┘
+         │ Issue assigned                 │ Commands
+         └───────────────┬────────────────┘
+                         ▼
+┌─────────────────────────────────────────────────────────────┐
+│              MOSAIC STACK (Control Layer)                    │
+│                                                              │
+│  ┌──────────────────────────────────────────────────────┐   │
+│  │              MOSAIC-STITCHER (Wrapper)               │   │
+│  │  ┌───────────┐ ┌───────────┐ ┌───────────────────┐   │   │
+│  │  │ Guard     │ │ Quality   │ │ Job Tracking      │   │   │
+│  │  │ Rails     │ │ Rails     │ │ (Events/Steps)    │   │   │
+│  │  │ (perms)   │ │ (gates)   │ │                   │   │   │
+│  │  └───────────┘ └───────────┘ └───────────────────┘   │   │
+│  └──────────────────────────┬───────────────────────────┘   │
+│                             │                                │
+└─────────────────────────────┼────────────────────────────────┘
+                              │ Dispatch with constraints
+                              ▼
+┌─────────────────────────────────────────────────────────────┐
+│                 OPENCLAW (Execution Layer)                   │
+│                 355+ contributors maintain                   │
+│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐   │
+│  │ Agent     │ │ Session   │ │ Multi-LLM │ │ Discord   │   │
+│  │ Spawning  │ │ Manager   │ │ Support   │ │ Integr.   │   │
+│  └───────────┘ └───────────┘ └───────────┘ └───────────┘   │
+│                                                              │
+│  Agent Profiles (Mosaic-defined constraints):               │
+│  ┌─────────┐ ┌─────────┐ ┌───────────┐ ┌─────────┐         │
+│  │ RUNNER  │ │ WEAVER  │ │ INSPECTOR │ │ HERALD  │         │
+│  │ (read)  │ │ (write) │ │ (validate)│ │ (report)│         │
+│  └─────────┘ └─────────┘ └───────────┘ └─────────┘         │
+└─────────────────────────────────────────────────────────────┘
+```
+
+**Key insight:** Agent profiles (runner, weaver, etc.) are **constraints passed to OpenClaw**, not separate containers. OpenClaw spawns the agents; Mosaic controls what they are allowed to do.
+
+## Relationship to Non-AI Coordinator (M4.1)
+
+This architecture **complements** the Non-AI Coordinator Pattern:
+
+| Layer                             | Responsibility                                                                      | Milestone |
+| --------------------------------- | ----------------------------------------------------------------------------------- | --------- |
+| **Non-AI Coordinator**            | Orchestration logic (when to assign, context monitoring, quality gates enforcement) | M4.1      |
+| **Mosaic Component Architecture** | Execution infrastructure (job tracking, OpenClaw integration, chat commands)        | M4.2      |
+
+The Non-AI Coordinator uses this infrastructure to dispatch and monitor jobs.
+
+## Chat Integration (mosaic-bridge)
+
+**Control Mosaic Stack via Discord, Mattermost, Slack, etc.**
+
+```
+#mosaic-control
+├── User: "@mosaic fix issue #42"
+├── Mosaic: "🚀 Started job #123 for issue #42" [link to thread]
+│
+└── Thread: "Job #123: Fix issue #42"
+    ├── 📖 Runner: Gathering context... ✓
+    ├── 🧵 Weaver: Implementing... ✓
+    ├── 🔍 Inspector: Running tests... ✓
+    ├── 📢 Herald: PR created → #456
+    └── [Full event log: /api/jobs/123/events]
+```
+
+### Noise Management Strategy
+
+| Channel                 | Purpose                            | Verbosity                 |
+| ----------------------- | ---------------------------------- | ------------------------- |
+| `#mosaic-control`       | Commands + summaries               | Low (milestones only)     |
+| Job threads             | Per-job activity                   | Medium (step completions) |
+| `/api/jobs/{id}/events` | Full audit log                     | High (everything)         |
+| DMs (optional)          | Private updates to triggering user | Configurable              |
+
+### Commands (via chat)
+
+```
+@mosaic fix <issue>        # Start job for issue
+@mosaic status <job>       # Get job status
+@mosaic cancel <job>       # Cancel running job
+@mosaic verbose <job>      # Stream full logs to thread
+@mosaic quiet              # Reduce notifications
+@mosaic help               # Show commands
+```
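+
+A minimal sketch of how mosaic-bridge might parse these commands before handing them to mosaic-stitcher. The `ChatCommand` type and `parseCommand` function are hypothetical names, not existing code. Accepting an optional `issue`/`job` word and a `#` prefix is an assumption based on the chat example above.
+
+```typescript
+// Hypothetical command shape; illustrative only, not from the Mosaic codebase.
+type ChatCommand =
+  | { kind: "fix"; issue: number }
+  | { kind: "status" | "cancel" | "verbose"; job: number }
+  | { kind: "quiet" | "help" };
+
+// Parse a message such as "@mosaic fix #42" or "@mosaic fix issue #42".
+// Returns null when the message is not addressed to the bot or is malformed.
+function parseCommand(message: string): ChatCommand | null {
+  const pattern = /^@mosaic\s+(\w+)(?:\s+(?:(?:issue|job)\s+)?#?(\d+))?\s*$/i;
+  const match = pattern.exec(message.trim());
+  if (!match) return null;
+
+  const verb = match[1].toLowerCase();
+  const id = match[2] === undefined ? undefined : Number(match[2]);
+
+  switch (verb) {
+    case "fix":
+      return id === undefined ? null : { kind: "fix", issue: id };
+    case "status":
+      return id === undefined ? null : { kind: "status", job: id };
+    case "cancel":
+      return id === undefined ? null : { kind: "cancel", job: id };
+    case "verbose":
+      return id === undefined ? null : { kind: "verbose", job: id };
+    case "quiet":
+      return id === undefined ? { kind: "quiet" } : null;
+    case "help":
+      return id === undefined ? { kind: "help" } : null;
+    default:
+      return null; // Unknown verb: the bridge could reply with the help text.
+  }
+}
+```
+
+Returning `null` for anything malformed keeps the bridge I/O-only: it either forwards a well-formed command or does nothing.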
+
+### Integration lives at Mosaic layer, not OpenClaw
+
+- **mosaic-bridge** handles Discord/Mattermost/Slack APIs
+- **mosaic-stitcher** receives commands, dispatches jobs
+- **mosaic-herald** sends status updates back through bridge
+- OpenClaw has NO direct chat access (stays focused on execution)
+
+## Key Components
+
+### 1. Mosaic-Stitcher (The Wrapper)
+
+The control layer that wraps OpenClaw:
+
+- Receives webhooks from @mosaic bot
+- Applies Guard Rails (capability permissions)
+- Applies Quality Rails (mandatory gates)
+- Tracks all job steps and events
+- Dispatches work to OpenClaw with constraints
+
+### 2. OpenClaw (Execution Engine)
+
+Community-maintained agent swarm (355+ contributors):
+
+- Spawns and manages AI agent sessions
+- Multi-LLM support (Claude, GPT, Ollama, etc.)
+- Session management and recovery
+- We use it as-is, wrapped by Mosaic-Stitcher
+
+### 3. Agent Profiles (Constraints for OpenClaw)
+
+Mosaic-defined capability constraints passed to OpenClaw agents:
+
+- **runner** - read-only: fetch context, read files, query APIs
+- **weaver** - write: implement code, scoped to git worktree
+- **inspector** - validate: run gates, no modifications
+- **herald** - report: PR comments, notifications, status updates
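+
+The profiles above could be modelled as capability sets that the stitcher checks before dispatching work. This is a sketch only: the capability names and the `isAllowed` helper are hypothetical, and giving weaver and inspector read access is an assumption. The real vocabulary belongs to the Guard Rails design.
+
+```typescript
+// Hypothetical capability vocabulary; the Guard Rails doc defines the real one.
+type Capability =
+  | "repo.read"
+  | "api.query"
+  | "worktree.write"
+  | "gate.run"
+  | "pr.comment"
+  | "notify";
+
+type Profile = "runner" | "weaver" | "inspector" | "herald";
+
+// Each profile maps to the only capabilities its agents may use.
+const PROFILE_CAPABILITIES: Record<Profile, ReadonlySet<Capability>> = {
+  runner: new Set<Capability>(["repo.read", "api.query"]),
+  weaver: new Set<Capability>(["repo.read", "worktree.write"]), // read access assumed
+  inspector: new Set<Capability>(["repo.read", "gate.run"]), // read access assumed
+  herald: new Set<Capability>(["pr.comment", "notify"]),
+};
+
+// Deny by default: anything not listed for the profile is rejected.
+function isAllowed(profile: Profile, capability: Capability): boolean {
+  return PROFILE_CAPABILITIES[profile].has(capability);
+}
+```
+
+With this shape, `isAllowed("runner", "worktree.write")` is `false` while `isAllowed("weaver", "worktree.write")` is `true`, matching the read-only and write rails described above.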
+
+### 4. Job Structure
+
+Every job contains granular steps:
+
+| Phase      | Steps                                                   |
+| ---------- | ------------------------------------------------------- |
+| SETUP      | Clone repo, create worktree, install deps               |
+| EXECUTION  | Read requirements, analyze code, implement, write tests |
+| VALIDATION | Lint gate, typecheck gate, test gate, coverage gate     |
+| CLEANUP    | Stage, commit, push, create PR                          |
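+
+As an illustration, the phase table can be flattened into an ordered step plan, which is roughly what the `ordinal` and `phase` columns of `job_steps` would hold. The `StepPlan` type and `planSteps` function are hypothetical, and starting ordinals at 1 is an assumption.
+
+```typescript
+type Phase = "setup" | "execution" | "validation" | "cleanup";
+
+interface StepPlan {
+  ordinal: number; // Position of the step within the whole job
+  phase: Phase;
+  name: string;
+}
+
+// Steps per phase, copied from the table above, in execution order.
+const PHASE_STEPS: ReadonlyArray<[Phase, readonly string[]]> = [
+  ["setup", ["clone repo", "create worktree", "install deps"]],
+  ["execution", ["read requirements", "analyze code", "implement", "write tests"]],
+  ["validation", ["lint gate", "typecheck gate", "test gate", "coverage gate"]],
+  ["cleanup", ["stage", "commit", "push", "create PR"]],
+];
+
+// Flatten the phases into one list with job-wide ordinals starting at 1.
+function planSteps(): StepPlan[] {
+  const plan: StepPlan[] = [];
+  for (const [phase, names] of PHASE_STEPS) {
+    for (const name of names) {
+      plan.push({ ordinal: plan.length + 1, phase, name });
+    }
+  }
+  return plan;
+}
+```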
+
+### 5. Event Logging (Event Sourcing)
+
+Every action emits an event:
+
+- `job.created`, `job.queued`, `job.started`, `job.completed`, `job.failed`
+- `step.started`, `step.progress`, `step.output`, `step.completed`
+- `ai.tool_called`, `ai.tokens_used`, `ai.artifact_created`
+- `gate.started`, `gate.passed`, `gate.failed`
+
+Storage:
+
+- PostgreSQL: Immutable audit log (permanent)
+- Valkey Streams: Recent events (last 1000 per job)
+- Valkey Pub/Sub: Real-time streaming
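+
+The three storage roles can be sketched with an in-memory stand-in: an append-only array in place of PostgreSQL, a capped per-job buffer in place of Valkey Streams, and a listener set in place of Valkey Pub/Sub. The `InMemoryEventLog` class is hypothetical and exists only to show the fan-out. It is not how the real services would be wired.
+
+```typescript
+interface JobEvent {
+  jobId: string;
+  type: string; // e.g. "step.started", "gate.passed"
+  timestamp: Date;
+  actor: string;
+  payload: unknown;
+}
+
+type Listener = (event: JobEvent) => void;
+
+class InMemoryEventLog {
+  private readonly audit: JobEvent[] = []; // stand-in for the permanent audit log
+  private readonly recent = new Map<string, JobEvent[]>(); // stand-in for streams
+  private readonly listeners = new Set<Listener>(); // stand-in for pub/sub
+
+  constructor(private readonly recentLimit = 1000) {}
+
+  // Register a real-time listener; the returned function unsubscribes it.
+  subscribe(listener: Listener): () => void {
+    this.listeners.add(listener);
+    return () => {
+      this.listeners.delete(listener);
+    };
+  }
+
+  // Record one event in all three places.
+  emit(event: JobEvent): void {
+    const frozen = Object.freeze({ ...event }); // shallow freeze: events are immutable
+    this.audit.push(frozen);
+
+    const buffer = this.recent.get(event.jobId) ?? [];
+    buffer.push(frozen);
+    if (buffer.length > this.recentLimit) buffer.shift(); // drop the oldest
+    this.recent.set(event.jobId, buffer);
+
+    this.listeners.forEach((listener) => listener(frozen));
+  }
+
+  recentFor(jobId: string): readonly JobEvent[] {
+    return this.recent.get(jobId) ?? [];
+  }
+
+  auditCount(): number {
+    return this.audit.length;
+  }
+}
+```
+
+The point of the split is that the capped buffer can lose old events without affecting the audit log, which is never trimmed.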
+
+### 6. Queue Architecture
+
+**BullMQ** is preferred over using the plain ValkeyService directly because it provides:
+
+- Job progress tracking (0-100%)
+- Automatic retry with exponential backoff
+- Rate limiting
+- Job dependencies
+- Rich lifecycle events
+
+BullMQ uses the same Valkey instance that is already configured.
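+
+To make the retry behaviour concrete, the sketch below computes the delay schedule for a job configured with exponential backoff. The `RetryOptions` interface is a local type that mirrors the shape of BullMQ's `attempts` and `backoff` job options. `retryDelays` is a hypothetical helper, and the doubling formula is assumed from BullMQ's documented exponential strategy rather than taken from Mosaic code.
+
+```typescript
+// Local mirror of the relevant BullMQ job options (not imported from bullmq).
+interface RetryOptions {
+  attempts: number; // Total tries, including the first one
+  backoff: { type: "exponential"; delay: number }; // Base delay in milliseconds
+}
+
+// Delay before each retry, assuming delay * 2^(retry - 1).
+function retryDelays(options: RetryOptions): number[] {
+  const delays: number[] = [];
+  // The first attempt is not a retry, so there are attempts - 1 retries.
+  for (let retry = 1; retry < options.attempts; retry++) {
+    delays.push(Math.round(Math.pow(2, retry - 1) * options.backoff.delay));
+  }
+  return delays;
+}
+
+const schedule = retryDelays({
+  attempts: 4,
+  backoff: { type: "exponential", delay: 1000 },
+});
+// schedule is [1000, 2000, 4000]
+```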
+
+## Database Schema
+
+```sql
+-- Runner jobs (links to existing agent_tasks)
+CREATE TABLE runner_jobs (
+  id UUID PRIMARY KEY,
+  workspace_id UUID NOT NULL,
+  agent_task_id UUID REFERENCES agent_tasks(id),
+  type VARCHAR(100),  -- 'git-status', 'code-task', 'priority-calc'
+  status VARCHAR(50), -- PENDING → QUEUED → RUNNING → COMPLETED/FAILED
+  priority INT,
+  progress_percent INT,
+  result JSONB,
+  error TEXT,
+  created_at TIMESTAMPTZ,
+  started_at TIMESTAMPTZ,
+  completed_at TIMESTAMPTZ
+);
+
+-- Job steps (granular tracking)
+CREATE TABLE job_steps (
+  id UUID PRIMARY KEY,
+  job_id UUID REFERENCES runner_jobs(id),
+  ordinal INT,
+  phase VARCHAR(50),  -- setup, execution, validation, cleanup
+  name VARCHAR(255),
+  type VARCHAR(50),   -- command, ai-action, gate, artifact
+  status VARCHAR(50),
+  output TEXT,
+  tokens_input INT,
+  tokens_output INT,
+  started_at TIMESTAMPTZ,
+  completed_at TIMESTAMPTZ,
+  duration_ms INT
+);
+
+-- Job events (immutable audit log)
+CREATE TABLE job_events (
+  id UUID PRIMARY KEY,
+  job_id UUID REFERENCES runner_jobs(id),
+  step_id UUID REFERENCES job_steps(id),
+  type VARCHAR(100),
+  timestamp TIMESTAMPTZ,
+  actor VARCHAR(100),
+  payload JSONB
+);
+```
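+
+The `status` column comment implies a small state machine. A sketch of a transition guard the job service could apply before updating a row is shown below. `canTransition` is a hypothetical helper, and treating COMPLETED and FAILED as terminal is an assumption: cancellation and retry would need extra states or edges that the schema comment does not define.
+
+```typescript
+type JobStatus = "PENDING" | "QUEUED" | "RUNNING" | "COMPLETED" | "FAILED";
+
+// Allowed next states, following PENDING → QUEUED → RUNNING → COMPLETED/FAILED.
+const NEXT_STATUS: Record<JobStatus, readonly JobStatus[]> = {
+  PENDING: ["QUEUED"],
+  QUEUED: ["RUNNING"],
+  RUNNING: ["COMPLETED", "FAILED"],
+  COMPLETED: [], // terminal (assumed)
+  FAILED: [], // terminal (assumed; a retry edge is not modelled here)
+};
+
+// True only when `to` is a direct successor of `from`.
+function canTransition(from: JobStatus, to: JobStatus): boolean {
+  return NEXT_STATUS[from].indexOf(to) !== -1;
+}
+```
+
+Rejecting skipped states (for example PENDING straight to RUNNING) keeps `job.queued` and `job.started` events consistent with the stored status.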
+
+## Deployment Model
+
+**Mosaic wrapper + OpenClaw instance:**
+
+```
+docker-compose.yml:
+  mosaic-stitcher:    # Control layer (our code)
+  mosaic-bridge:      # Chat integrations (Discord, Mattermost, Slack)
+  openclaw:           # Execution layer (community code)
+  valkey:             # Queue + cache
+  postgres:           # Job store, events
+```
+
+**NOT separate containers per agent type.** Runner/weaver/inspector are **agent profiles** (constraints), not services. OpenClaw spawns agents with the profile constraints we define.
+
+All services:
+
+- Share Valkey (BullMQ queues)
+- Share PostgreSQL (job store, events)
+- Communicate via queue (stitcher → openclaw)
+
+## New Modules (in API for now, extract to containers later)
+
+```
+apps/api/src/
+├── stitcher/         # Workflow engine, job creation
+├── runner-jobs/      # Job CRUD, queue submission
+├── job-steps/        # Step tracking
+├── job-events/       # Event logging, WebSocket gateway
+└── workers/          # BullMQ processors (one per component type)
+```
+
+## Implementation Phases
+
+1. **Core Infrastructure** - BullMQ setup, database migrations
+2. **Coordinator Service** - Job submission, status polling, cancel/retry
+3. **Runner Worker** - Claude Code integration, step-by-step execution
+4. **Real-time Status** - WebSocket gateway, SSE for CLI
+5. **Integration Testing** - End-to-end tests
+
+## Files to Modify
+
+- `apps/api/src/app.module.ts` - Import new modules
+- `apps/api/src/valkey/valkey.service.ts` - Share connection with BullMQ
+- `apps/api/src/quality-orchestrator/` - Integrate with runner for gates
+- `package.json` - Add `@nestjs/bullmq`, `bullmq`
+
+## Verification
+
+1. Create a test job via the API
+2. Verify the job appears in the BullMQ queue
+3. Verify the runner picks up the job and executes it, emitting step events
+4. Verify a WebSocket client receives real-time updates
+5. Verify all events are persisted to PostgreSQL
+6. Verify quality gates run before the job completes
+
+## Related Documentation
+
+- [Guard Rails: Capability-Based Permission System](./guard-rails-capability-permissions.md)
+- [Quality Rails Architecture](./quality-rails-orchestration-architecture.md)
+- [Non-AI Coordinator Pattern](./non-ai-coordinator-architecture.md)