stack/docs/3-architecture/mosaic-component-architecture.md
Jason Woltje 8f63b3e1dc
docs: Add Mosaic Component Architecture and Guard Rails design docs
- mosaic-component-architecture.md: OpenClaw wrapper pattern, component naming,
  job tracking, chat integration, database schema
- guard-rails-capability-permissions.md: Capability-based permission model

Related: #162 (M4.2 Infrastructure Epic)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 01:26:01 -06:00


Mosaic Component Architecture Design

Strategic Decision

OpenClaw as execution engine, Mosaic as control layer.

  • Now (M1-M2): Wrapper approach - use OpenClaw, add Mosaic controls
  • After M2: Evaluate - is OpenClaw working for us?
  • If needed: Fork or rebuild with lessons learned

Why: 355+ contributors maintain OpenClaw. We maintain only the wrapper. Ship faster, pivot later if needed.

Philosophy

Mosaic = pieces combining to create a beautiful, larger picture.

Each component has a dedicated function (single responsibility). Focused tasks = agents stay on rails. If an agent only does one thing, it can't wander off-track.

Overview

Establish the pattern for how Mosaic's control layer wraps OpenClaw's execution layer, with full job step tracking and event logging.

Component Naming

| Component | Dedicated Function | Rails |
|---|---|---|
| @mosaic | Gitea bot user - triggers workflow on issue assignment/mention | Webhook receiver only |
| mosaic-stitcher | Orchestrates workflow, sequences jobs, manages priorities | Control plane only, no execution |
| mosaic-bridge | Chat integrations (Discord, Mattermost, Slack) - commands in, status out | I/O only, no execution |
| mosaic-runner | Fetches information, gathers context, reads repos | Read-only operations |
| mosaic-weaver | Implements code changes, writes files | Write operations, scoped to worktree |
| mosaic-inspector | Runs quality gates (build, lint, test) | Validation only, no modifications |
| mosaic-herald | Reports status, creates PR comments, notifications | Output/reporting only |

Why this works: Each component has exactly ONE job. Can't go off rails if there's only one rail.

Note: Names are placeholders. Components are modular plugins—names can change later.

Architecture

┌─────────────────┐              ┌─────────────────┐
│    @mosaic      │              │  mosaic-bridge  │
│  (Gitea Bot)    │              │  (Chat I/O)     │
│  Webhook Trigger│              │  Discord/MM/etc │
└────────┬────────┘              └────────┬────────┘
         │ Issue assigned                 │ Commands
         └───────────────┬────────────────┘
                         ▼
┌─────────────────────────────────────────────────────────────┐
│              MOSAIC STACK (Control Layer)                    │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │              MOSAIC-STITCHER (Wrapper)               │   │
│  │  ┌───────────┐ ┌───────────┐ ┌───────────────────┐   │   │
│  │  │ Guard     │ │ Quality   │ │ Job Tracking      │   │   │
│  │  │ Rails     │ │ Rails     │ │ (Events/Steps)    │   │   │
│  │  │ (perms)   │ │ (gates)   │ │                   │   │   │
│  │  └───────────┘ └───────────┘ └───────────────────┘   │   │
│  └──────────────────────────┬───────────────────────────┘   │
│                             │                                │
└─────────────────────────────┼────────────────────────────────┘
                              │ Dispatch with constraints
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                 OPENCLAW (Execution Layer)                   │
│                 355+ contributors maintain                   │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐   │
│  │ Agent     │ │ Session   │ │ Multi-LLM │ │ Discord   │   │
│  │ Spawning  │ │ Manager   │ │ Support   │ │ Integr.   │   │
│  └───────────┘ └───────────┘ └───────────┘ └───────────┘   │
│                                                              │
│  Agent Profiles (Mosaic-defined constraints):               │
│  ┌─────────┐ ┌─────────┐ ┌───────────┐ ┌─────────┐         │
│  │ RUNNER  │ │ WEAVER  │ │ INSPECTOR │ │ HERALD  │         │
│  │ (read)  │ │ (write) │ │ (validate)│ │ (report)│         │
│  └─────────┘ └─────────┘ └───────────┘ └─────────┘         │
└─────────────────────────────────────────────────────────────┘

Key insight: Agent profiles (runner, weaver, etc.) are constraints passed to OpenClaw, not separate containers. OpenClaw spawns agents, Mosaic controls what they're allowed to do.
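The constraint-passing idea can be sketched as a dispatch payload. This is a minimal illustration: the type and field names are invented here, not OpenClaw's actual API.

```typescript
// Hypothetical shape of what mosaic-stitcher sends to OpenClaw.
// All names are illustrative; OpenClaw's real interface may differ.
type AgentProfile = "runner" | "weaver" | "inspector" | "herald";

interface AgentDispatch {
  jobId: string;
  profile: AgentProfile;   // which constraint set applies
  capabilities: string[];  // allowed operations (Guard Rails)
  worktree?: string;       // write scope, only set for weaver
}

// A weaver dispatch: write access, but scoped to a single worktree.
const dispatch: AgentDispatch = {
  jobId: "job-123",
  profile: "weaver",
  capabilities: ["fs:read", "fs:write", "git:commit"],
  worktree: "/worktrees/job-123",
};
```

The point of the shape: the profile travels with every dispatch, so OpenClaw never has to know Mosaic's policy, only obey the constraints it is handed.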

Relationship to Non-AI Coordinator (M4.1)

This architecture complements the Non-AI Coordinator Pattern:

| Layer | Responsibility | Milestone |
|---|---|---|
| Non-AI Coordinator | Orchestration logic (when to assign, context monitoring, quality gate enforcement) | M4.1 |
| Mosaic Component Architecture | Execution infrastructure (job tracking, OpenClaw integration, chat commands) | M4.2 |

The Non-AI Coordinator uses this infrastructure to dispatch and monitor jobs.

Chat Integration (mosaic-bridge)

Control the Mosaic Stack via Discord, Mattermost, Slack, etc.

#mosaic-control
├── User: "@mosaic fix issue #42"
├── Mosaic: "🚀 Started job #123 for issue #42" [link to thread]
│
└── Thread: "Job #123: Fix issue #42"
    ├── 📖 Runner: Gathering context... ✓
    ├── 🧵 Weaver: Implementing... ✓
    ├── 🔍 Inspector: Running tests... ✓
    ├── 📢 Herald: PR created → #456
    └── [Full event log: /api/jobs/123/events]

Noise Management Strategy

| Channel | Purpose | Verbosity |
|---|---|---|
| #mosaic-control | Commands + summaries | Low (milestones only) |
| Job threads | Per-job activity | Medium (step completions) |
| /api/jobs/{id}/events | Full audit log | High (everything) |
| DMs (optional) | Private updates to triggering user | Configurable |

Commands (via chat)

@mosaic fix <issue>        # Start job for issue
@mosaic status <job>       # Get job status
@mosaic cancel <job>       # Cancel running job
@mosaic verbose <job>      # Stream full logs to thread
@mosaic quiet              # Reduce notifications
@mosaic help               # Show commands
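One way the bridge could tokenize these commands is a small regex-based parser. A minimal sketch; `parseCommand` is hypothetical, not existing Mosaic code:

```typescript
// Parsed form of an "@mosaic <verb> [arg]" chat message.
interface MosaicCommand {
  verb: string;   // fix, status, cancel, verbose, quiet, help
  arg?: string;   // issue or job reference, if present
}

// Returns null for messages that are not addressed to @mosaic.
function parseCommand(message: string): MosaicCommand | null {
  const m = message.trim().match(/^@mosaic\s+(\w+)(?:\s+#?(\S+))?$/);
  if (!m) return null;
  return { verb: m[1], arg: m[2] };
}

// parseCommand("@mosaic fix #42") → { verb: "fix", arg: "42" }
```

Keeping parsing in mosaic-bridge means each chat platform only needs a thin adapter that extracts the raw message text.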

Integration lives at the Mosaic layer, not in OpenClaw:

  • mosaic-bridge handles Discord/Mattermost/Slack APIs
  • mosaic-stitcher receives commands, dispatches jobs
  • mosaic-herald sends status updates back through bridge
  • OpenClaw has NO direct chat access (stays focused on execution)

Key Components

1. Mosaic-Stitcher (The Wrapper)

The control layer that wraps OpenClaw:

  • Receives webhooks from @mosaic bot
  • Applies Guard Rails (capability permissions)
  • Applies Quality Rails (mandatory gates)
  • Tracks all job steps and events
  • Dispatches work to OpenClaw with constraints

2. OpenClaw (Execution Engine)

Community-maintained agent swarm (355+ contributors):

  • Spawns and manages AI agent sessions
  • Multi-LLM support (Claude, GPT, Ollama, etc.)
  • Session management and recovery
  • We use as-is, wrapped by Mosaic-Stitcher

3. Agent Profiles (Constraints for OpenClaw)

Mosaic-defined capability constraints passed to OpenClaw agents:

  • runner - read-only: fetch context, read files, query APIs
  • weaver - write: implement code, scoped to git worktree
  • inspector - validate: run gates, no modifications
  • herald - report: PR comments, notifications, status updates
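These profiles reduce to a capability map plus a deny-by-default check. A sketch of how Guard Rails could enforce them; the capability names are invented for illustration:

```typescript
// Guard Rails as data: each profile's allowed capabilities.
// Capability identifiers here are made up for the example.
const profileCapabilities: Record<string, readonly string[]> = {
  runner:    ["fs:read", "api:query"],
  weaver:    ["fs:read", "fs:write", "git:commit"],
  inspector: ["fs:read", "exec:gates"],
  herald:    ["api:comment", "api:notify"],
};

// Deny by default: unknown profiles get no capabilities at all.
function isAllowed(profile: string, capability: string): boolean {
  return (profileCapabilities[profile] ?? []).includes(capability);
}

// isAllowed("runner", "fs:write") → false: runner is read-only
```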

4. Job Structure

Every job contains granular steps:

| Phase | Steps |
|---|---|
| SETUP | Clone repo, create worktree, install deps |
| EXECUTION | Read requirements, analyze code, implement, write tests |
| VALIDATION | Lint gate, typecheck gate, test gate, coverage gate |
| CLEANUP | Stage, commit, push, create PR |
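The phases could be encoded as a declarative step plan that the stitcher walks and records into `job_steps`. A sketch with illustrative names:

```typescript
// One step plan per job type; ordinal matches the job_steps column.
type Phase = "setup" | "execution" | "validation" | "cleanup";

interface StepDef {
  ordinal: number;
  phase: Phase;
  name: string;
}

// Hypothetical plan for a 'code-task' job, mirroring the phase table.
const codeTaskPlan: StepDef[] = [
  { ordinal: 1, phase: "setup",      name: "create-worktree" },
  { ordinal: 2, phase: "execution",  name: "implement" },
  { ordinal: 3, phase: "validation", name: "test-gate" },
  { ordinal: 4, phase: "cleanup",    name: "create-pr" },
];
```

Declaring steps as data keeps job tracking uniform: the same runner loop executes any plan and emits the same step events.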

5. Event Logging (Event Sourcing)

Every action emits an event:

  • job.created, job.queued, job.started, job.completed, job.failed
  • step.started, step.progress, step.output, step.completed
  • ai.tool_called, ai.tokens_used, ai.artifact_created
  • gate.started, gate.passed, gate.failed
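The event names above can be captured in a single envelope type, which keeps every emitter consistent. `makeEvent` is a hypothetical helper, not existing Mosaic code:

```typescript
// Every event name from the list above, as a closed union.
type MosaicEventType =
  | "job.created" | "job.queued" | "job.started" | "job.completed" | "job.failed"
  | "step.started" | "step.progress" | "step.output" | "step.completed"
  | "ai.tool_called" | "ai.tokens_used" | "ai.artifact_created"
  | "gate.started" | "gate.passed" | "gate.failed";

interface JobEvent {
  jobId: string;
  type: MosaicEventType;
  timestamp: string;                 // ISO-8601, set at emit time
  actor: string;                     // e.g. "mosaic-inspector"
  payload: Record<string, unknown>;
}

function makeEvent(
  jobId: string,
  type: MosaicEventType,
  actor: string,
  payload: Record<string, unknown> = {},
): JobEvent {
  return { jobId, type, actor, payload, timestamp: new Date().toISOString() };
}
```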

Storage:

  • PostgreSQL: Immutable audit log (permanent)
  • Valkey Streams: Recent events (last 1000 per job)
  • Valkey Pub/Sub: Real-time streaming
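Since every event goes to all three stores, the write path is a simple fan-out. A sketch under an invented `EventSink` abstraction standing in for the Postgres and Valkey clients:

```typescript
// Made-up abstraction over the three destinations: Postgres audit log,
// Valkey stream (recent), and Valkey pub/sub (live).
interface EventSink {
  name: string;
  write(event: Record<string, unknown>): Promise<void>;
}

// Every sink receives every event; the Postgres audit log remains
// the durable source of truth, the Valkey tiers are derived views.
async function recordEvent(
  event: Record<string, unknown>,
  sinks: EventSink[],
): Promise<void> {
  await Promise.all(sinks.map((s) => s.write(event)));
}
```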

6. Queue Architecture

BullMQ over plain ValkeyService because:

  • Job progress tracking (0-100%)
  • Automatic retry with exponential backoff
  • Rate limiting
  • Job dependencies
  • Rich lifecycle events

Uses same Valkey instance already configured.
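BullMQ's built-in exponential backoff waits roughly `base * 2^(n-1)` before retry `n`. A pure sketch of that schedule; in practice the equivalent is passed to `queue.add()` as `{ attempts, backoff: { type: "exponential", delay: base } }`:

```typescript
// Compute the wait (ms) before each retry under exponential backoff.
// There is one wait before each retry, i.e. attempts - 1 entries.
function backoffDelays(attempts: number, baseMs: number): number[] {
  return Array.from({ length: attempts - 1 }, (_, i) => baseMs * 2 ** i);
}

// backoffDelays(4, 1000) → [1000, 2000, 4000]
```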

Database Schema

-- Runner jobs (links to existing agent_tasks)
CREATE TABLE runner_jobs (
  id UUID PRIMARY KEY,
  workspace_id UUID NOT NULL,
  agent_task_id UUID REFERENCES agent_tasks(id),
  type VARCHAR(100),  -- 'git-status', 'code-task', 'priority-calc'
  status VARCHAR(50), -- PENDING → QUEUED → RUNNING → COMPLETED/FAILED
  priority INT,
  progress_percent INT,
  result JSONB,
  error TEXT,
  created_at TIMESTAMPTZ,
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ
);

-- Job steps (granular tracking)
CREATE TABLE job_steps (
  id UUID PRIMARY KEY,
  job_id UUID REFERENCES runner_jobs(id),
  ordinal INT,
  phase VARCHAR(50),  -- setup, execution, validation, cleanup
  name VARCHAR(255),
  type VARCHAR(50),   -- command, ai-action, gate, artifact
  status VARCHAR(50),
  output TEXT,
  tokens_input INT,
  tokens_output INT,
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ,
  duration_ms INT
);

-- Job events (immutable audit log)
CREATE TABLE job_events (
  id UUID PRIMARY KEY,
  job_id UUID REFERENCES runner_jobs(id),
  step_id UUID REFERENCES job_steps(id),
  type VARCHAR(100),
  timestamp TIMESTAMPTZ,
  actor VARCHAR(100),
  payload JSONB
);

Deployment Model

Mosaic wrapper + OpenClaw instance:

docker-compose.yml:
  mosaic-stitcher:    # Control layer (our code)
  mosaic-bridge:      # Chat integrations (Discord, Mattermost, Slack)
  openclaw:           # Execution layer (community code)
  valkey:             # Queue + cache
  postgres:           # Job store, events

NOT separate containers per agent type. Runner/weaver/inspector are agent profiles (constraints), not services. OpenClaw spawns agents with the profile constraints we define.

All services:

  • Share Valkey (BullMQ queues)
  • Share PostgreSQL (job store, events)
  • Communicate via queue (stitcher → openclaw)

New Modules (in API for now, extract to containers later)

apps/api/src/
├── stitcher/         # Workflow engine, job creation
├── runner-jobs/      # Job CRUD, queue submission
├── job-steps/        # Step tracking
├── job-events/       # Event logging, WebSocket gateway
└── workers/          # BullMQ processors (one per component type)

Implementation Phases

  1. Core Infrastructure - BullMQ setup, database migrations
  2. Coordinator Service - Job submission, status polling, cancel/retry
  3. Runner Worker - Claude Code integration, step-by-step execution
  4. Real-time Status - WebSocket gateway, SSE for CLI
  5. Integration Testing - End-to-end tests

Files to Modify

  • apps/api/src/app.module.ts - Import new modules
  • apps/api/src/valkey/valkey.service.ts - Share connection with BullMQ
  • apps/api/src/quality-orchestrator/ - Integrate with runner for gates
  • package.json - Add @nestjs/bullmq, bullmq

Verification

  1. Create a test job via API
  2. Verify job appears in BullMQ queue
  3. Runner picks up and executes with step events
  4. WebSocket receives real-time updates
  5. All events persisted to PostgreSQL
  6. Quality gates run before completion