# Mosaic Component Architecture Design

## Strategic Decision

**OpenClaw as execution engine, Mosaic as control layer.**

- **Now (M1-M2):** Wrapper approach - use OpenClaw, add Mosaic controls
- **After M2:** Evaluate - is OpenClaw working for us?
- **If needed:** Fork or rebuild with lessons learned

**Why:** 355+ contributors maintain OpenClaw. We maintain only the wrapper. Ship faster, pivot later if needed.

## Philosophy

**Mosaic** = pieces combining to create a beautiful, larger picture.

Each component has a **dedicated function** (single responsibility). Focused tasks = agents stay on rails. If an agent only does one thing, it can't wander off-track.

## Overview

Establish the pattern for how Mosaic's control layer wraps OpenClaw's execution layer, with full job step tracking and event logging.

## Component Naming

| Component            | Dedicated Function                                                       | Rails                                |
| -------------------- | ------------------------------------------------------------------------ | ------------------------------------ |
| **@mosaic**          | Gitea bot user - triggers workflow on issue assignment/mention           | Webhook receiver only                |
| **mosaic-stitcher**  | Orchestrates workflow, sequences jobs, manages priorities                | Control plane only, no execution     |
| **mosaic-bridge**    | Chat integrations (Discord, Mattermost, Slack) - commands in, status out | I/O only, no execution               |
| **mosaic-runner**    | Fetches information, gathers context, reads repos                        | Read-only operations                 |
| **mosaic-weaver**    | Implements code changes, writes files                                    | Write operations, scoped to worktree |
| **mosaic-inspector** | Runs quality gates (build, lint, test)                                   | Validation only, no modifications    |
| **mosaic-herald**    | Reports status, creates PR comments, notifications                       | Output/reporting only                |

**Why this works:** Each component has exactly ONE job. Can't go off rails if there's only one rail.

**Note:** Names are placeholders. Components are modular plugins, so names can change later.
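The Rails column can be read as a lookup: each component maps to exactly one class of operation, and everything else is denied. The TypeScript sketch below illustrates that check under stated assumptions; the `Rail` values, the operation names, and `canPerform` are hypothetical and do not correspond to an existing Mosaic or OpenClaw API.

```typescript
// Hypothetical sketch: one rail per component, mirroring the Component Naming table.
type Component =
  | "@mosaic"
  | "mosaic-stitcher"
  | "mosaic-bridge"
  | "mosaic-runner"
  | "mosaic-weaver"
  | "mosaic-inspector"
  | "mosaic-herald";

// Each rail is the single class of operation a component may perform.
type Rail = "webhook" | "control" | "chat-io" | "read" | "write" | "validate" | "report";

const RAILS: Record<Component, Rail> = {
  "@mosaic": "webhook",
  "mosaic-stitcher": "control",
  "mosaic-bridge": "chat-io",
  "mosaic-runner": "read",
  "mosaic-weaver": "write",
  "mosaic-inspector": "validate",
  "mosaic-herald": "report",
};

// Hypothetical operation names, each tagged with the rail it belongs to.
const OPERATION_RAIL: Record<string, Rail> = {
  "receive-webhook": "webhook",
  "enqueue-job": "control",
  "post-chat-message": "chat-io",
  "read-file": "read",
  "write-file": "write",
  "run-tests": "validate",
  "comment-on-pr": "report",
};

function canPerform(component: Component, operation: string): boolean {
  // Unknown operations are denied rather than allowed.
  if (!Object.prototype.hasOwnProperty.call(OPERATION_RAIL, operation)) return false;
  // A component may act only if the operation sits on its one rail.
  return RAILS[component] === OPERATION_RAIL[operation];
}
```

Denying unknown operations by default means a newly added capability has to be assigned to a rail explicitly before any component can use it.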
## Architecture

```
   ┌─────────────────┐                   ┌─────────────────┐
   │     @mosaic     │                   │  mosaic-bridge  │
   │   (Gitea Bot)   │                   │   (Chat I/O)    │
   │ Webhook Trigger │                   │ Discord/MM/etc  │
   └────────┬────────┘                   └────────┬────────┘
            │ Issue assigned                      │ Commands
            └──────────────────┬──────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                MOSAIC STACK (Control Layer)                 │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │               MOSAIC-STITCHER (Wrapper)               │  │
│  │  ┌───────────┐ ┌───────────┐ ┌─────────────────────┐  │  │
│  │  │   Guard   │ │  Quality  │ │    Job Tracking     │  │  │
│  │  │   Rails   │ │   Rails   │ │   (Events/Steps)    │  │  │
│  │  │  (perms)  │ │  (gates)  │ │                     │  │  │
│  │  └───────────┘ └───────────┘ └─────────────────────┘  │  │
│  └───────────────────────────┬───────────────────────────┘  │
│                              │                              │
└──────────────────────────────┼──────────────────────────────┘
                               │ Dispatch with constraints
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                 OPENCLAW (Execution Layer)                  │
│                 355+ contributors maintain                  │
│   ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐   │
│   │   Agent   │ │  Session  │ │ Multi-LLM │ │  Discord  │   │
│   │ Spawning  │ │  Manager  │ │  Support  │ │  Integr.  │   │
│   └───────────┘ └───────────┘ └───────────┘ └───────────┘   │
│                                                             │
│   Agent Profiles (Mosaic-defined constraints):              │
│   ┌─────────┐ ┌─────────┐ ┌───────────┐ ┌─────────┐         │
│   │  RUNNER │ │  WEAVER │ │ INSPECTOR │ │  HERALD │         │
│   │  (read) │ │ (write) │ │ (validate)│ │ (report)│         │
│   └─────────┘ └─────────┘ └───────────┘ └─────────┘         │
└─────────────────────────────────────────────────────────────┘
```

**Key insight:** Agent profiles (runner, weaver, etc.) are **constraints passed to OpenClaw**, not separate containers. OpenClaw spawns agents, Mosaic controls what they're allowed to do.
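To make the "constraints, not containers" idea concrete, the sketch below models two profile rules from this document: only the weaver may write, and its writes must stay inside its git worktree. It assumes a Node.js runtime; `AgentProfile`, `PROFILES`, and `isWriteAllowed` are hypothetical names, and how Mosaic would actually hand such constraints to OpenClaw is not specified here.

```typescript
import * as path from "node:path";

// Hypothetical profile shape that Mosaic might attach to an OpenClaw dispatch.
interface AgentProfile {
  name: "runner" | "weaver" | "inspector" | "herald";
  canWrite: boolean;
}

const PROFILES: Record<AgentProfile["name"], AgentProfile> = {
  runner: { name: "runner", canWrite: false },
  weaver: { name: "weaver", canWrite: true },
  inspector: { name: "inspector", canWrite: false },
  herald: { name: "herald", canWrite: false },
};

// Allow a write only if the profile may write AND the target resolves inside the worktree.
// Lexical check only: symlinks inside the worktree are not resolved here.
function isWriteAllowed(profile: AgentProfile, worktree: string, target: string): boolean {
  if (!profile.canWrite) return false;
  const root = path.resolve(worktree);
  const resolved = path.resolve(root, target); // absolute targets replace the root
  const rel = path.relative(root, resolved);
  // "" is the root itself; ".." or "../x" escapes it; an absolute rel means another drive (Windows).
  return rel !== "" && rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel);
}
```

Because the check is purely lexical, a production guard would likely also resolve symlinks (for example with `fs.realpathSync`) before comparing paths.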
## Relationship to Non-AI Coordinator (M4.1)

This architecture **complements** the Non-AI Coordinator Pattern:

| Layer                             | Responsibility                                                                      | Milestone |
| --------------------------------- | ----------------------------------------------------------------------------------- | --------- |
| **Non-AI Coordinator**            | Orchestration logic (when to assign, context monitoring, quality gate enforcement)  | M4.1      |
| **Mosaic Component Architecture** | Execution infrastructure (job tracking, OpenClaw integration, chat commands)        | M4.2      |

The Non-AI Coordinator uses this infrastructure to dispatch and monitor jobs.

## Chat Integration (mosaic-bridge)

**Control Mosaic Stack via Discord, Mattermost, Slack, etc.**

```
#mosaic-control
├── User: "@mosaic fix issue #42"
├── Mosaic: "🚀 Started job #123 for issue #42" [link to thread]
│
└── Thread: "Job #123: Fix issue #42"
    ├── 📖 Runner: Gathering context... ✓
    ├── 🧵 Weaver: Implementing... ✓
    ├── 🔍 Inspector: Running tests... ✓
    ├── 📢 Herald: PR created → #456
    └── [Full event log: /api/jobs/123/events]
```

### Noise Management Strategy

| Channel                 | Purpose                            | Verbosity                 |
| ----------------------- | ---------------------------------- | ------------------------- |
| `#mosaic-control`       | Commands + summaries               | Low (milestones only)     |
| Job threads             | Per-job activity                   | Medium (step completions) |
| `/api/jobs/{id}/events` | Full audit log                     | High (everything)         |
| DMs (optional)          | Private updates to triggering user | Configurable              |

### Commands (via chat)

```
@mosaic fix        # Start job for issue
@mosaic status     # Get job status
@mosaic cancel     # Cancel running job
@mosaic verbose    # Stream full logs to thread
@mosaic quiet      # Reduce notifications
@mosaic help       # Show commands
```

### Integration lives at Mosaic layer, not OpenClaw

- **mosaic-bridge** handles Discord/Mattermost/Slack APIs
- **mosaic-stitcher** receives commands, dispatches jobs
- **mosaic-herald** sends status updates back through bridge
- OpenClaw has NO direct chat access (stays focused on execution)

## Key Components

### 1. Mosaic-Stitcher (The Wrapper)

The control layer that wraps OpenClaw:

- Receives webhooks from @mosaic bot
- Applies Guard Rails (capability permissions)
- Applies Quality Rails (mandatory gates)
- Tracks all job steps and events
- Dispatches work to OpenClaw with constraints

### 2. OpenClaw (Execution Engine)

Community-maintained agent swarm (355+ contributors):

- Spawns and manages AI agent sessions
- Multi-LLM support (Claude, GPT, Ollama, etc.)
- Session management and recovery
- Used as-is, wrapped by Mosaic-Stitcher

### 3. Agent Profiles (Constraints for OpenClaw)

Mosaic-defined capability constraints passed to OpenClaw agents:

- **runner** - read-only: fetch context, read files, query APIs
- **weaver** - write: implement code, scoped to git worktree
- **inspector** - validate: run gates, no modifications
- **herald** - report: PR comments, notifications, status updates

### 4. Job Structure

Every job contains granular steps:

| Phase      | Steps                                                   |
| ---------- | ------------------------------------------------------- |
| SETUP      | Clone repo, create worktree, install deps               |
| EXECUTION  | Read requirements, analyze code, implement, write tests |
| VALIDATION | Lint gate, typecheck gate, test gate, coverage gate     |
| CLEANUP    | Stage, commit, push, create PR                          |

### 5. Event Logging (Event Sourcing)

Every action emits an event:

- `job.created`, `job.queued`, `job.started`, `job.completed`, `job.failed`
- `step.started`, `step.progress`, `step.output`, `step.completed`
- `ai.tool_called`, `ai.tokens_used`, `ai.artifact_created`
- `gate.started`, `gate.passed`, `gate.failed`

Storage:

- PostgreSQL: Immutable audit log (permanent)
- Valkey Streams: Recent events (last 1000 per job)
- Valkey Pub/Sub: Real-time streaming

### 6. Queue Architecture

**BullMQ** over plain ValkeyService because it provides:

- Job progress tracking (0-100%)
- Automatic retry with exponential backoff
- Rate limiting
- Job dependencies
- Rich lifecycle events

BullMQ uses the same Valkey instance that is already configured.

## Database Schema

```sql
-- Runner jobs (links to existing agent_tasks)
CREATE TABLE runner_jobs (
  id UUID PRIMARY KEY,
  workspace_id UUID NOT NULL,
  agent_task_id UUID REFERENCES agent_tasks(id),
  type VARCHAR(100),        -- 'git-status', 'code-task', 'priority-calc'
  status VARCHAR(50),       -- PENDING → QUEUED → RUNNING → COMPLETED/FAILED
  priority INT,
  progress_percent INT,
  result JSONB,
  error TEXT,
  created_at TIMESTAMPTZ,
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ
);

-- Job steps (granular tracking)
CREATE TABLE job_steps (
  id UUID PRIMARY KEY,
  job_id UUID REFERENCES runner_jobs(id),
  ordinal INT,
  phase VARCHAR(50),        -- setup, execution, validation, cleanup
  name VARCHAR(255),
  type VARCHAR(50),         -- command, ai-action, gate, artifact
  status VARCHAR(50),
  output TEXT,
  tokens_input INT,
  tokens_output INT,
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ,
  duration_ms INT
);

-- Job events (immutable audit log)
CREATE TABLE job_events (
  id UUID PRIMARY KEY,
  job_id UUID REFERENCES runner_jobs(id),
  step_id UUID REFERENCES job_steps(id),
  type VARCHAR(100),
  timestamp TIMESTAMPTZ,
  actor VARCHAR(100),
  payload JSONB
);
```

## Deployment Model

**Mosaic wrapper + OpenClaw instance:**

```
docker-compose.yml:
  mosaic-stitcher:   # Control layer (our code)
  mosaic-bridge:     # Chat integrations (Discord, Mattermost, Slack)
  openclaw:          # Execution layer (community code)
  valkey:            # Queue + cache
  postgres:          # Job store, events
```

**NOT separate containers per agent type.** Runner/weaver/inspector are **agent profiles** (constraints), not services. OpenClaw spawns agents with the profile constraints we define.
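Because every service reads and writes the same `runner_jobs` table, it helps to agree on which `status` changes are legal. The schema comment lists PENDING → QUEUED → RUNNING → COMPLETED/FAILED; the sketch below encodes exactly that path as a transition table and pairs each status with a `job.*` event name from the Event Logging list. The `JobStatus` type, the status-to-event mapping (in particular PENDING → `job.created`), and the `transition` helper are assumptions for illustration. Cancellation and retry states are deliberately omitted because the schema does not define them.

```typescript
// Statuses taken from the runner_jobs.status comment in the schema.
type JobStatus = "PENDING" | "QUEUED" | "RUNNING" | "COMPLETED" | "FAILED";

// Allowed next states; COMPLETED and FAILED are terminal.
const TRANSITIONS: Record<JobStatus, readonly JobStatus[]> = {
  PENDING: ["QUEUED"],
  QUEUED: ["RUNNING"],
  RUNNING: ["COMPLETED", "FAILED"],
  COMPLETED: [],
  FAILED: [],
};

// Assumed mapping: the event emitted on entering each status.
const EVENT_FOR_STATUS: Record<JobStatus, string> = {
  PENDING: "job.created",
  QUEUED: "job.queued",
  RUNNING: "job.started",
  COMPLETED: "job.completed",
  FAILED: "job.failed",
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Rejects an illegal change before it is persisted; returns the event type to emit.
function transition(from: JobStatus, to: JobStatus): string {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal job status transition: ${from} -> ${to}`);
  }
  return EVENT_FOR_STATUS[to];
}
```

Keeping the table in one shared module means the stitcher and the workers reject the same invalid updates, regardless of which service attempts them.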
All services:

- Share Valkey (BullMQ queues)
- Share PostgreSQL (job store, events)
- Communicate via queue (stitcher → openclaw)

## New Modules (in API for now, extract to containers later)

```
apps/api/src/
├── stitcher/      # Workflow engine, job creation
├── runner-jobs/   # Job CRUD, queue submission
├── job-steps/     # Step tracking
├── job-events/    # Event logging, WebSocket gateway
└── workers/       # BullMQ processors (one per component type)
```

## Implementation Phases

1. **Core Infrastructure** - BullMQ setup, database migrations
2. **Coordinator Service** - Job submission, status polling, cancel/retry
3. **Runner Worker** - Claude Code integration, step-by-step execution
4. **Real-time Status** - WebSocket gateway, SSE for CLI
5. **Integration Testing** - End-to-end tests

## Files to Modify

- `apps/api/src/app.module.ts` - Import new modules
- `apps/api/src/valkey/valkey.service.ts` - Share connection with BullMQ
- `apps/api/src/quality-orchestrator/` - Integrate with runner for gates
- `package.json` - Add `@nestjs/bullmq`, `bullmq`

## Verification

1. Create a test job via API
2. Verify job appears in BullMQ queue
3. Runner picks up and executes with step events
4. WebSocket receives real-time updates
5. All events persisted to PostgreSQL
6. Quality gates run before completion

## Related Documentation

- [Guard Rails: Capability-Based Permission System](./guard-rails-capability-permissions.md)
- [Quality Rails Architecture](./quality-rails-orchestration-architecture.md)
- [Non-AI Coordinator Pattern](./non-ai-coordinator-architecture.md)