stack/docs/3-architecture/mosaic-component-architecture.md
Jason Woltje 8f63b3e1dc
docs: Add Mosaic Component Architecture and Guard Rails design docs
- mosaic-component-architecture.md: OpenClaw wrapper pattern, component naming,
  job tracking, chat integration, database schema
- guard-rails-capability-permissions.md: Capability-based permission model

Related: #162 (M4.2 Infrastructure Epic)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 01:26:01 -06:00


Mosaic Component Architecture Design

Strategic Decision

OpenClaw as execution engine, Mosaic as control layer.

  • Now (M1-M2): Wrapper approach - use OpenClaw, add Mosaic controls
  • After M2: Evaluate - is OpenClaw working for us?
  • If needed: Fork or rebuild with lessons learned

Why: 355+ contributors maintain OpenClaw. We maintain only the wrapper. Ship faster, pivot later if needed.

Philosophy

Mosaic = pieces combining to create a beautiful, larger picture.

Each component has a dedicated function (single responsibility). Focused tasks = agents stay on rails. If an agent only does one thing, it can't wander off-track.

Overview

Establish the pattern for how Mosaic's control layer wraps OpenClaw's execution layer, with full job step tracking and event logging.

Component Naming

| Component | Dedicated Function | Rails |
|---|---|---|
| @mosaic | Gitea bot user - triggers workflow on issue assignment/mention | Webhook receiver only |
| mosaic-stitcher | Orchestrates workflow, sequences jobs, manages priorities | Control plane only, no execution |
| mosaic-bridge | Chat integrations (Discord, Mattermost, Slack) - commands in, status out | I/O only, no execution |
| mosaic-runner | Fetches information, gathers context, reads repos | Read-only operations |
| mosaic-weaver | Implements code changes, writes files | Write operations, scoped to worktree |
| mosaic-inspector | Runs quality gates (build, lint, test) | Validation only, no modifications |
| mosaic-herald | Reports status, creates PR comments, notifications | Output/reporting only |

Why this works: Each component has exactly ONE job. Can't go off rails if there's only one rail.

Note: Names are placeholders. Components are modular plugins—names can change later.

Architecture

┌─────────────────┐              ┌─────────────────┐
│    @mosaic      │              │  mosaic-bridge  │
│  (Gitea Bot)    │              │  (Chat I/O)     │
│  Webhook Trigger│              │  Discord/MM/etc │
└────────┬────────┘              └────────┬────────┘
         │ Issue assigned                 │ Commands
         └───────────────┬────────────────┘
                         ▼
┌─────────────────────────────────────────────────────────────┐
│              MOSAIC STACK (Control Layer)                    │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │              MOSAIC-STITCHER (Wrapper)               │   │
│  │  ┌───────────┐ ┌───────────┐ ┌───────────────────┐   │   │
│  │  │ Guard     │ │ Quality   │ │ Job Tracking      │   │   │
│  │  │ Rails     │ │ Rails     │ │ (Events/Steps)    │   │   │
│  │  │ (perms)   │ │ (gates)   │ │                   │   │   │
│  │  └───────────┘ └───────────┘ └───────────────────┘   │   │
│  └──────────────────────────┬───────────────────────────┘   │
│                             │                                │
└─────────────────────────────┼────────────────────────────────┘
                              │ Dispatch with constraints
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                 OPENCLAW (Execution Layer)                   │
│                 355+ contributors maintain                   │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐   │
│  │ Agent     │ │ Session   │ │ Multi-LLM │ │ Discord   │   │
│  │ Spawning  │ │ Manager   │ │ Support   │ │ Integr.   │   │
│  └───────────┘ └───────────┘ └───────────┘ └───────────┘   │
│                                                              │
│  Agent Profiles (Mosaic-defined constraints):               │
│  ┌─────────┐ ┌─────────┐ ┌───────────┐ ┌─────────┐         │
│  │ RUNNER  │ │ WEAVER  │ │ INSPECTOR │ │ HERALD  │         │
│  │ (read)  │ │ (write) │ │ (validate)│ │ (report)│         │
│  └─────────┘ └─────────┘ └───────────┘ └─────────┘         │
└─────────────────────────────────────────────────────────────┘

Key insight: Agent profiles (runner, weaver, etc.) are constraints passed to OpenClaw, not separate containers. OpenClaw spawns agents, Mosaic controls what they're allowed to do.
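The constraint-passing idea can be sketched as a dispatch payload. This is a minimal illustration: the type and field names are invented here, not OpenClaw's actual API.

```typescript
// Hypothetical shape of what mosaic-stitcher sends to OpenClaw.
// All names are illustrative; OpenClaw's real interface may differ.
type AgentProfile = "runner" | "weaver" | "inspector" | "herald";

interface AgentDispatch {
  jobId: string;
  profile: AgentProfile;   // which constraint set applies
  capabilities: string[];  // allowed operations (Guard Rails)
  worktree?: string;       // write scope, only set for weaver
}

// A weaver dispatch: write access, but scoped to a single worktree.
const dispatch: AgentDispatch = {
  jobId: "job-123",
  profile: "weaver",
  capabilities: ["fs:read", "fs:write", "git:commit"],
  worktree: "/worktrees/job-123",
};
```

The point of the shape: the profile travels with every dispatch, so OpenClaw never has to know Mosaic's policy, only obey the constraints it is handed.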

Relationship to Non-AI Coordinator (M4.1)

This architecture complements the Non-AI Coordinator Pattern:

| Layer | Responsibility | Milestone |
|---|---|---|
| Non-AI Coordinator | Orchestration logic (when to assign, context monitoring, quality gate enforcement) | M4.1 |
| Mosaic Component Architecture | Execution infrastructure (job tracking, OpenClaw integration, chat commands) | M4.2 |

The Non-AI Coordinator uses this infrastructure to dispatch and monitor jobs.

Chat Integration (mosaic-bridge)

Control the Mosaic Stack via Discord, Mattermost, Slack, etc.

#mosaic-control
├── User: "@mosaic fix issue #42"
├── Mosaic: "🚀 Started job #123 for issue #42" [link to thread]
│
└── Thread: "Job #123: Fix issue #42"
    ├── 📖 Runner: Gathering context... ✓
    ├── 🧵 Weaver: Implementing... ✓
    ├── 🔍 Inspector: Running tests... ✓
    ├── 📢 Herald: PR created → #456
    └── [Full event log: /api/jobs/123/events]

Noise Management Strategy

| Channel | Purpose | Verbosity |
|---|---|---|
| #mosaic-control | Commands + summaries | Low (milestones only) |
| Job threads | Per-job activity | Medium (step completions) |
| /api/jobs/{id}/events | Full audit log | High (everything) |
| DMs (optional) | Private updates to triggering user | Configurable |

Commands (via chat)

@mosaic fix <issue>        # Start job for issue
@mosaic status <job>       # Get job status
@mosaic cancel <job>       # Cancel running job
@mosaic verbose <job>      # Stream full logs to thread
@mosaic quiet              # Reduce notifications
@mosaic help               # Show commands
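One way the bridge could tokenize these commands is a small regex-based parser. A minimal sketch; `parseCommand` is hypothetical, not existing Mosaic code:

```typescript
// Parsed form of an "@mosaic <verb> [arg]" chat message.
interface MosaicCommand {
  verb: string;   // fix, status, cancel, verbose, quiet, help
  arg?: string;   // issue or job reference, if present
}

// Returns null for messages that are not addressed to @mosaic.
function parseCommand(message: string): MosaicCommand | null {
  const m = message.trim().match(/^@mosaic\s+(\w+)(?:\s+#?(\S+))?$/);
  if (!m) return null;
  return { verb: m[1], arg: m[2] };
}

// parseCommand("@mosaic fix #42") → { verb: "fix", arg: "42" }
```

Keeping parsing in mosaic-bridge means each chat platform only needs a thin adapter that extracts the raw message text.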

Integration lives at the Mosaic layer, not in OpenClaw:

  • mosaic-bridge handles Discord/Mattermost/Slack APIs
  • mosaic-stitcher receives commands, dispatches jobs
  • mosaic-herald sends status updates back through bridge
  • OpenClaw has NO direct chat access (stays focused on execution)

Key Components

1. Mosaic-Stitcher (The Wrapper)

The control layer that wraps OpenClaw:

  • Receives webhooks from @mosaic bot
  • Applies Guard Rails (capability permissions)
  • Applies Quality Rails (mandatory gates)
  • Tracks all job steps and events
  • Dispatches work to OpenClaw with constraints

2. OpenClaw (Execution Engine)

Community-maintained agent swarm (355+ contributors):

  • Spawns and manages AI agent sessions
  • Multi-LLM support (Claude, GPT, Ollama, etc.)
  • Session management and recovery
  • We use as-is, wrapped by Mosaic-Stitcher

3. Agent Profiles (Constraints for OpenClaw)

Mosaic-defined capability constraints passed to OpenClaw agents:

  • runner - read-only: fetch context, read files, query APIs
  • weaver - write: implement code, scoped to git worktree
  • inspector - validate: run gates, no modifications
  • herald - report: PR comments, notifications, status updates
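These profiles reduce to a capability map plus a deny-by-default check. A sketch of how Guard Rails could enforce them; the capability names are invented for illustration:

```typescript
// Guard Rails as data: each profile's allowed capabilities.
// Capability identifiers here are made up for the example.
const profileCapabilities: Record<string, readonly string[]> = {
  runner:    ["fs:read", "api:query"],
  weaver:    ["fs:read", "fs:write", "git:commit"],
  inspector: ["fs:read", "exec:gates"],
  herald:    ["api:comment", "api:notify"],
};

// Deny by default: unknown profiles get no capabilities at all.
function isAllowed(profile: string, capability: string): boolean {
  return (profileCapabilities[profile] ?? []).includes(capability);
}

// isAllowed("runner", "fs:write") → false: runner is read-only
```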

4. Job Structure

Every job contains granular steps:

| Phase | Steps |
|---|---|
| SETUP | Clone repo, create worktree, install deps |
| EXECUTION | Read requirements, analyze code, implement, write tests |
| VALIDATION | Lint gate, typecheck gate, test gate, coverage gate |
| CLEANUP | Stage, commit, push, create PR |
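The phases could be encoded as a declarative step plan that the stitcher walks and records into `job_steps`. A sketch with illustrative names:

```typescript
// One step plan per job type; ordinal matches the job_steps column.
type Phase = "setup" | "execution" | "validation" | "cleanup";

interface StepDef {
  ordinal: number;
  phase: Phase;
  name: string;
}

// Hypothetical plan for a 'code-task' job, mirroring the phase table.
const codeTaskPlan: StepDef[] = [
  { ordinal: 1, phase: "setup",      name: "create-worktree" },
  { ordinal: 2, phase: "execution",  name: "implement" },
  { ordinal: 3, phase: "validation", name: "test-gate" },
  { ordinal: 4, phase: "cleanup",    name: "create-pr" },
];
```

Declaring steps as data keeps job tracking uniform: the same runner loop executes any plan and emits the same step events.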

5. Event Logging (Event Sourcing)

Every action emits an event:

  • job.created, job.queued, job.started, job.completed, job.failed
  • step.started, step.progress, step.output, step.completed
  • ai.tool_called, ai.tokens_used, ai.artifact_created
  • gate.started, gate.passed, gate.failed
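The event names above can be captured in a single envelope type, which keeps every emitter consistent. `makeEvent` is a hypothetical helper, not existing Mosaic code:

```typescript
// Every event name from the list above, as a closed union.
type MosaicEventType =
  | "job.created" | "job.queued" | "job.started" | "job.completed" | "job.failed"
  | "step.started" | "step.progress" | "step.output" | "step.completed"
  | "ai.tool_called" | "ai.tokens_used" | "ai.artifact_created"
  | "gate.started" | "gate.passed" | "gate.failed";

interface JobEvent {
  jobId: string;
  type: MosaicEventType;
  timestamp: string;                 // ISO-8601, set at emit time
  actor: string;                     // e.g. "mosaic-inspector"
  payload: Record<string, unknown>;
}

function makeEvent(
  jobId: string,
  type: MosaicEventType,
  actor: string,
  payload: Record<string, unknown> = {},
): JobEvent {
  return { jobId, type, actor, payload, timestamp: new Date().toISOString() };
}
```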

Storage:

  • PostgreSQL: Immutable audit log (permanent)
  • Valkey Streams: Recent events (last 1000 per job)
  • Valkey Pub/Sub: Real-time streaming
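Since every event goes to all three stores, the write path is a simple fan-out. A sketch under an invented `EventSink` abstraction standing in for the Postgres and Valkey clients:

```typescript
// Made-up abstraction over the three destinations: Postgres audit log,
// Valkey stream (recent), and Valkey pub/sub (live).
interface EventSink {
  name: string;
  write(event: Record<string, unknown>): Promise<void>;
}

// Every sink receives every event; the Postgres audit log remains
// the durable source of truth, the Valkey tiers are derived views.
async function recordEvent(
  event: Record<string, unknown>,
  sinks: EventSink[],
): Promise<void> {
  await Promise.all(sinks.map((s) => s.write(event)));
}
```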

6. Queue Architecture

BullMQ over plain ValkeyService because:

  • Job progress tracking (0-100%)
  • Automatic retry with exponential backoff
  • Rate limiting
  • Job dependencies
  • Rich lifecycle events

Uses same Valkey instance already configured.
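BullMQ's built-in exponential backoff waits roughly `base * 2^(n-1)` before retry `n`. A pure sketch of that schedule; in practice the equivalent is passed to `queue.add()` as `{ attempts, backoff: { type: "exponential", delay: base } }`:

```typescript
// Compute the wait (ms) before each retry under exponential backoff.
// There is one wait before each retry, i.e. attempts - 1 entries.
function backoffDelays(attempts: number, baseMs: number): number[] {
  return Array.from({ length: attempts - 1 }, (_, i) => baseMs * 2 ** i);
}

// backoffDelays(4, 1000) → [1000, 2000, 4000]
```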

Database Schema

-- Runner jobs (links to existing agent_tasks)
CREATE TABLE runner_jobs (
  id UUID PRIMARY KEY,
  workspace_id UUID NOT NULL,
  agent_task_id UUID REFERENCES agent_tasks(id),
  type VARCHAR(100),  -- 'git-status', 'code-task', 'priority-calc'
  status VARCHAR(50), -- PENDING → QUEUED → RUNNING → COMPLETED/FAILED
  priority INT,
  progress_percent INT,
  result JSONB,
  error TEXT,
  created_at TIMESTAMPTZ,
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ
);

-- Job steps (granular tracking)
CREATE TABLE job_steps (
  id UUID PRIMARY KEY,
  job_id UUID REFERENCES runner_jobs(id),
  ordinal INT,
  phase VARCHAR(50),  -- setup, execution, validation, cleanup
  name VARCHAR(255),
  type VARCHAR(50),   -- command, ai-action, gate, artifact
  status VARCHAR(50),
  output TEXT,
  tokens_input INT,
  tokens_output INT,
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ,
  duration_ms INT
);

-- Job events (immutable audit log)
CREATE TABLE job_events (
  id UUID PRIMARY KEY,
  job_id UUID REFERENCES runner_jobs(id),
  step_id UUID REFERENCES job_steps(id),
  type VARCHAR(100),
  timestamp TIMESTAMPTZ,
  actor VARCHAR(100),
  payload JSONB
);

Deployment Model

Mosaic wrapper + OpenClaw instance:

docker-compose.yml:
  mosaic-stitcher:    # Control layer (our code)
  mosaic-bridge:      # Chat integrations (Discord, Mattermost, Slack)
  openclaw:           # Execution layer (community code)
  valkey:             # Queue + cache
  postgres:           # Job store, events

NOT separate containers per agent type. Runner/weaver/inspector are agent profiles (constraints), not services. OpenClaw spawns agents with the profile constraints we define.

All services:

  • Share Valkey (BullMQ queues)
  • Share PostgreSQL (job store, events)
  • Communicate via queue (stitcher → openclaw)

New Modules (in API for now, extract to containers later)

apps/api/src/
├── stitcher/         # Workflow engine, job creation
├── runner-jobs/      # Job CRUD, queue submission
├── job-steps/        # Step tracking
├── job-events/       # Event logging, WebSocket gateway
└── workers/          # BullMQ processors (one per component type)

Implementation Phases

  1. Core Infrastructure - BullMQ setup, database migrations
  2. Coordinator Service - Job submission, status polling, cancel/retry
  3. Runner Worker - Claude Code integration, step-by-step execution
  4. Real-time Status - WebSocket gateway, SSE for CLI
  5. Integration Testing - End-to-end tests

Files to Modify

  • apps/api/src/app.module.ts - Import new modules
  • apps/api/src/valkey/valkey.service.ts - Share connection with BullMQ
  • apps/api/src/quality-orchestrator/ - Integrate with runner for gates
  • package.json - Add @nestjs/bullmq, bullmq

Verification

  1. Create a test job via API
  2. Verify job appears in BullMQ queue
  3. Runner picks up and executes with step events
  4. WebSocket receives real-time updates
  5. All events persisted to PostgreSQL
  6. Quality gates run before completion