
# Mosaic Component Architecture Design

## Strategic Decision

**OpenClaw as execution engine, Mosaic as control layer.**

- **Now (M1-M2):** Wrapper approach - use OpenClaw, add Mosaic controls
- **After M2:** Evaluate - is OpenClaw working for us?
- **If needed:** Fork or rebuild with lessons learned

**Why:** 355+ contributors maintain OpenClaw. We maintain only the wrapper. Ship faster, pivot later if needed.

## Philosophy

**Mosaic** = pieces combining to create a beautiful, larger picture.

Each component has a **dedicated function** (single responsibility). Focused tasks keep agents on rails: an agent that does only one thing cannot wander off-track.

## Overview

This document establishes the pattern by which Mosaic's control layer wraps OpenClaw's execution layer, with full job step tracking and event logging.

## Component Naming

| Component            | Dedicated Function                                                       | Rails                                |
| -------------------- | ------------------------------------------------------------------------ | ------------------------------------ |
| **@mosaic**          | Gitea bot user - triggers workflow on issue assignment/mention           | Webhook receiver only                |
| **mosaic-stitcher**  | Orchestrates workflow, sequences jobs, manages priorities                | Control plane only, no execution     |
| **mosaic-bridge**    | Chat integrations (Discord, Mattermost, Slack) - commands in, status out | I/O only, no execution               |
| **mosaic-runner**    | Fetches information, gathers context, reads repos                        | Read-only operations                 |
| **mosaic-weaver**    | Implements code changes, writes files                                    | Write operations, scoped to worktree |
| **mosaic-inspector** | Runs quality gates (build, lint, test)                                   | Validation only, no modifications    |
| **mosaic-herald**    | Reports status, creates PR comments, notifications                       | Output/reporting only                |

**Why this works:** Each component has exactly ONE job. An agent cannot go off the rails when there is only one rail.

**Note:** Names are placeholders. Components are modular plugins, so names can change later.

## Architecture

```
┌─────────────────┐              ┌─────────────────┐
│    @mosaic      │              │  mosaic-bridge  │
│  (Gitea Bot)    │              │  (Chat I/O)     │
│  Webhook Trigger│              │  Discord/MM/etc │
└────────┬────────┘              └────────┬────────┘
         │ Issue assigned                 │ Commands
         └───────────────┬────────────────┘
                         ▼
┌─────────────────────────────────────────────────────────────┐
│              MOSAIC STACK (Control Layer)                    │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │              MOSAIC-STITCHER (Wrapper)               │   │
│  │  ┌───────────┐ ┌───────────┐ ┌───────────────────┐   │   │
│  │  │ Guard     │ │ Quality   │ │ Job Tracking      │   │   │
│  │  │ Rails     │ │ Rails     │ │ (Events/Steps)    │   │   │
│  │  │ (perms)   │ │ (gates)   │ │                   │   │   │
│  │  └───────────┘ └───────────┘ └───────────────────┘   │   │
│  └──────────────────────────┬───────────────────────────┘   │
│                             │                                │
└─────────────────────────────┼────────────────────────────────┘
                              │ Dispatch with constraints
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                 OPENCLAW (Execution Layer)                   │
│                 355+ contributors maintain                   │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐   │
│  │ Agent     │ │ Session   │ │ Multi-LLM │ │ Discord   │   │
│  │ Spawning  │ │ Manager   │ │ Support   │ │ Integr.   │   │
│  └───────────┘ └───────────┘ └───────────┘ └───────────┘   │
│                                                              │
│  Agent Profiles (Mosaic-defined constraints):               │
│  ┌─────────┐ ┌─────────┐ ┌───────────┐ ┌─────────┐         │
│  │ RUNNER  │ │ WEAVER  │ │ INSPECTOR │ │ HERALD  │         │
│  │ (read)  │ │ (write) │ │ (validate)│ │ (report)│         │
│  └─────────┘ └─────────┘ └───────────┘ └─────────┘         │
└─────────────────────────────────────────────────────────────┘
```

**Key insight:** Agent profiles (runner, weaver, etc.) are **constraints passed to OpenClaw**, not separate containers. OpenClaw spawns agents; Mosaic controls what they are allowed to do.

## Relationship to Non-AI Coordinator (M4.1)

This architecture **complements** the Non-AI Coordinator Pattern:

| Layer                             | Responsibility                                                                      | Milestone |
| --------------------------------- | ----------------------------------------------------------------------------------- | --------- |
| **Non-AI Coordinator**            | Orchestration logic (when to assign, context monitoring, quality gate enforcement)  | M4.1      |
| **Mosaic Component Architecture** | Execution infrastructure (job tracking, OpenClaw integration, chat commands)        | M4.2      |

The Non-AI Coordinator uses this infrastructure to dispatch and monitor jobs.

## Chat Integration (mosaic-bridge)

**Control Mosaic Stack via Discord, Mattermost, Slack, etc.**

```
#mosaic-control
├── User: "@mosaic fix issue #42"
├── Mosaic: "🚀 Started job #123 for issue #42" [link to thread]
│
└── Thread: "Job #123: Fix issue #42"
    ├── 📖 Runner: Gathering context... ✓
    ├── 🧵 Weaver: Implementing... ✓
    ├── 🔍 Inspector: Running tests... ✓
    ├── 📢 Herald: PR created → #456
    └── [Full event log: /api/jobs/123/events]
```

### Noise Management Strategy

| Channel                 | Purpose                            | Verbosity                 |
| ----------------------- | ---------------------------------- | ------------------------- |
| `#mosaic-control`       | Commands + summaries               | Low (milestones only)     |
| Job threads             | Per-job activity                   | Medium (step completions) |
| `/api/jobs/{id}/events` | Full audit log                     | High (everything)         |
| DMs (optional)          | Private updates to triggering user | Configurable              |
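
The routing in the table above can be sketched as a function from event type to the sinks that should receive it. This is a minimal illustration, not the bridge's actual logic: the sink names and the choice of which event types count as milestones or step completions are assumptions, and the optional DM channel is left out.

```typescript
// Hypothetical sink names for the three always-on rows of the table.
type Sink = "control-channel" | "job-thread" | "audit-log";

// Assumption: which event types count as milestones or step completions.
const MILESTONE_EVENTS = ["job.started", "job.completed", "job.failed"];
const THREAD_EVENTS = ["step.completed", "gate.passed", "gate.failed"];

function sinksFor(eventType: string): Sink[] {
  // The audit log is the high-verbosity sink: it receives every event.
  const sinks: Sink[] = ["audit-log"];
  const isMilestone = MILESTONE_EVENTS.indexOf(eventType) !== -1;
  if (isMilestone || THREAD_EVENTS.indexOf(eventType) !== -1) {
    sinks.push("job-thread");
  }
  if (isMilestone) {
    sinks.push("control-channel");
  }
  return sinks;
}
```

With this split, a chatty event such as `step.output` reaches only the audit log, while `job.failed` reaches all three sinks.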

### Commands (via chat)

```
@mosaic fix <issue>        # Start job for issue
@mosaic status <job>       # Get job status
@mosaic cancel <job>       # Cancel running job
@mosaic verbose <job>      # Stream full logs to thread
@mosaic quiet              # Reduce notifications
@mosaic help               # Show commands
```
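
A minimal sketch of how the bridge might parse these commands into typed values before handing them to the stitcher. The `Command` type and the `parseCommand` and `parseId` helpers are hypothetical names, and the sketch assumes issue and job identifiers are plain integers with an optional `#` prefix.

```typescript
// Hypothetical command shapes; one variant per chat command.
type Command =
  | { kind: "fix"; issue: number }
  | { kind: "status"; job: number }
  | { kind: "cancel"; job: number }
  | { kind: "verbose"; job: number }
  | { kind: "quiet" }
  | { kind: "help" };

// Parses "42" or "#42" into a number; returns null for anything else.
function parseId(arg: string): number | null {
  const m = /^#?(\d+)$/.exec(arg.trim());
  return m ? parseInt(m[1], 10) : null;
}

function parseCommand(message: string): Command | null {
  const match = /^@mosaic\s+([a-z]+)(?:\s+(.+))?$/i.exec(message.trim());
  if (!match) return null;
  const verb = match[1].toLowerCase();
  const rest = (match[2] || "").trim();

  switch (verb) {
    case "fix": {
      // Accept both "fix 42" and "fix issue #42".
      const issue = parseId(rest.replace(/^issue\s+/i, ""));
      return issue === null ? null : { kind: "fix", issue };
    }
    case "status": {
      const job = parseId(rest);
      return job === null ? null : { kind: "status", job };
    }
    case "cancel": {
      const job = parseId(rest);
      return job === null ? null : { kind: "cancel", job };
    }
    case "verbose": {
      const job = parseId(rest);
      return job === null ? null : { kind: "verbose", job };
    }
    case "quiet":
      return rest === "" ? { kind: "quiet" } : null;
    case "help":
      return rest === "" ? { kind: "help" } : null;
    default:
      return null; // Unknown verb: the caller can reply with the help text.
  }
}
```

Returning `null` for anything unrecognized keeps the bridge I/O-only: it never guesses at intent, it only forwards well-formed commands.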

### Integration lives at the Mosaic layer, not in OpenClaw

- **mosaic-bridge** handles Discord/Mattermost/Slack APIs
- **mosaic-stitcher** receives commands, dispatches jobs
- **mosaic-herald** sends status updates back through bridge
- OpenClaw has NO direct chat access (stays focused on execution)

## Key Components

### 1. Mosaic-Stitcher (The Wrapper)

The control layer that wraps OpenClaw:

- Receives webhooks from @mosaic bot
- Applies Guard Rails (capability permissions)
- Applies Quality Rails (mandatory gates)
- Tracks all job steps and events
- Dispatches work to OpenClaw with constraints

### 2. OpenClaw (Execution Engine)

Community-maintained agent swarm (355+ contributors):

- Spawns and manages AI agent sessions
- Multi-LLM support (Claude, GPT, Ollama, etc.)
- Session management and recovery
- Used as-is, wrapped by Mosaic-Stitcher

### 3. Agent Profiles (Constraints for OpenClaw)

Mosaic-defined capability constraints passed to OpenClaw agents:

- **runner** - read-only: fetch context, read files, query APIs
- **weaver** - write: implement code, scoped to git worktree
- **inspector** - validate: run gates, no modifications
- **herald** - report: PR comments, notifications, status updates
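
The profiles above can be sketched as plain data plus a permission check. This is an illustration under stated assumptions: the `Capability` names, the `AgentProfile` shape, and `isAllowed` are hypothetical, each profile is given exactly one capability to mirror the list, and how the constraints are actually passed to OpenClaw is not shown.

```typescript
// Hypothetical capability vocabulary; the real Guard Rails names may differ.
type Capability = "read" | "write" | "validate" | "report";
type ProfileName = "runner" | "weaver" | "inspector" | "herald";

interface AgentProfile {
  capabilities: Capability[];
  // True when file access is confined to the job's git worktree.
  worktreeScoped: boolean;
}

const AGENT_PROFILES: Record<ProfileName, AgentProfile> = {
  runner: { capabilities: ["read"], worktreeScoped: false },
  weaver: { capabilities: ["write"], worktreeScoped: true },
  inspector: { capabilities: ["validate"], worktreeScoped: false },
  herald: { capabilities: ["report"], worktreeScoped: false },
};

// Returns true only if the profile explicitly lists the capability.
function isAllowed(profile: ProfileName, capability: Capability): boolean {
  return AGENT_PROFILES[profile].capabilities.indexOf(capability) !== -1;
}
```

A deny-by-default check like this matches the "one rail" idea: anything a profile does not list is refused.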

### 4. Job Structure

Every job contains granular steps:

| Phase      | Steps                                                   |
| ---------- | ------------------------------------------------------- |
| SETUP      | Clone repo, create worktree, install deps               |
| EXECUTION  | Read requirements, analyze code, implement, write tests |
| VALIDATION | Lint gate, typecheck gate, test gate, coverage gate     |
| CLEANUP    | Stage, commit, push, create PR                          |
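
As a sketch, the phase table can be flattened into an ordered step list of the kind the `job_steps` table stores (`ordinal`, `phase`, `name`). The `PlannedStep` type and `planSteps` function are hypothetical, and 1-based ordinals are an assumption.

```typescript
type Phase = "setup" | "execution" | "validation" | "cleanup";

interface PlannedStep {
  ordinal: number; // Position within the whole job, 1-based (assumption).
  phase: Phase;
  name: string;
}

// Step names copied from the phase table.
const PHASE_STEPS: { phase: Phase; steps: string[] }[] = [
  { phase: "setup", steps: ["clone repo", "create worktree", "install deps"] },
  { phase: "execution", steps: ["read requirements", "analyze code", "implement", "write tests"] },
  { phase: "validation", steps: ["lint gate", "typecheck gate", "test gate", "coverage gate"] },
  { phase: "cleanup", steps: ["stage", "commit", "push", "create PR"] },
];

// Flattens the phases into one list, numbering steps in execution order.
function planSteps(): PlannedStep[] {
  const planned: PlannedStep[] = [];
  for (const { phase, steps } of PHASE_STEPS) {
    for (const name of steps) {
      planned.push({ ordinal: planned.length + 1, phase, name });
    }
  }
  return planned;
}
```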

### 5. Event Logging (Event Sourcing)

Every action emits an event:

- `job.created`, `job.queued`, `job.started`, `job.completed`, `job.failed`
- `step.started`, `step.progress`, `step.output`, `step.completed`
- `ai.tool_called`, `ai.tokens_used`, `ai.artifact_created`
- `gate.started`, `gate.passed`, `gate.failed`

Storage:

- PostgreSQL: Immutable audit log (permanent)
- Valkey Streams: Recent events (last 1000 per job)
- Valkey Pub/Sub: Real-time streaming
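
A minimal in-memory sketch of the "recent events" tier, to show the capping behaviour only. It does not talk to Valkey; in a real deployment the cap would more likely be applied by the stream itself (for example with `XADD ... MAXLEN`). The `JobEventRecord` shape loosely follows the `job_events` columns, and the `RecentEvents` class is a hypothetical name.

```typescript
type JobEventType =
  | "job.created" | "job.queued" | "job.started" | "job.completed" | "job.failed"
  | "step.started" | "step.progress" | "step.output" | "step.completed"
  | "ai.tool_called" | "ai.tokens_used" | "ai.artifact_created"
  | "gate.started" | "gate.passed" | "gate.failed";

interface JobEventRecord {
  jobId: string;
  type: JobEventType;
  timestamp: string; // ISO 8601
  actor: string;
  payload: Record<string, unknown>;
}

// In-memory stand-in for the per-job capped stream.
class RecentEvents {
  // Object.create(null) avoids collisions with inherited keys such as "constructor".
  private byJob: Record<string, JobEventRecord[]> = Object.create(null);
  private maxPerJob: number;

  constructor(maxPerJob: number = 1000) {
    this.maxPerJob = maxPerJob;
  }

  // Appends an event and drops the oldest one once the cap is exceeded.
  append(event: JobEventRecord): void {
    let list = this.byJob[event.jobId];
    if (!list) {
      list = [];
      this.byJob[event.jobId] = list;
    }
    list.push(event);
    if (list.length > this.maxPerJob) {
      list.shift();
    }
  }

  // Returns a copy so callers cannot mutate the buffer.
  recent(jobId: string): JobEventRecord[] {
    return (this.byJob[jobId] || []).slice();
  }
}
```

The permanent PostgreSQL log is unaffected by this cap; only the fast, recent view is trimmed.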

### 6. Queue Architecture

**BullMQ** is used instead of the plain `ValkeyService` because it provides:

- Job progress tracking (0-100%)
- Automatic retry with exponential backoff
- Rate limiting
- Job dependencies
- Rich lifecycle events

BullMQ uses the same Valkey instance that is already configured.
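
To make "exponential backoff" concrete, here is a generic sketch of the delay schedule. It does not call BullMQ, which exposes retries through job options such as `attempts` and `backoff`. The `backoffDelayMs` helper and its `maxMs` cap are assumptions added for illustration.

```typescript
// Delay before retry number `attempt` (1-based): baseMs, 2*baseMs, 4*baseMs, ...
// The maxMs cap is an assumption for this sketch.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  if (attempt < 1) {
    throw new RangeError("attempt must be >= 1");
  }
  return Math.min(maxMs, baseMs * Math.pow(2, attempt - 1));
}
```

With a 1000 ms base and a 30000 ms cap, the first three retries wait 1000, 2000, and 4000 ms, and later retries level off at the cap.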

## Database Schema

```sql
-- Runner jobs (links to existing agent_tasks)
CREATE TABLE runner_jobs (
  id UUID PRIMARY KEY,
  workspace_id UUID NOT NULL,
  agent_task_id UUID REFERENCES agent_tasks(id),
  type VARCHAR(100),  -- 'git-status', 'code-task', 'priority-calc'
  status VARCHAR(50), -- PENDING → QUEUED → RUNNING → COMPLETED/FAILED
  priority INT,
  progress_percent INT,
  result JSONB,
  error TEXT,
  created_at TIMESTAMPTZ,
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ
);

-- Job steps (granular tracking)
CREATE TABLE job_steps (
  id UUID PRIMARY KEY,
  job_id UUID REFERENCES runner_jobs(id),
  ordinal INT,
  phase VARCHAR(50),  -- setup, execution, validation, cleanup
  name VARCHAR(255),
  type VARCHAR(50),   -- command, ai-action, gate, artifact
  status VARCHAR(50),
  output TEXT,
  tokens_input INT,
  tokens_output INT,
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ,
  duration_ms INT
);

-- Job events (immutable audit log)
CREATE TABLE job_events (
  id UUID PRIMARY KEY,
  job_id UUID REFERENCES runner_jobs(id),
  step_id UUID REFERENCES job_steps(id),
  type VARCHAR(100),
  timestamp TIMESTAMPTZ,
  actor VARCHAR(100),
  payload JSONB
);
```
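
The status comment on `runner_jobs` (PENDING → QUEUED → RUNNING → COMPLETED/FAILED) implies a small state machine. A minimal sketch of enforcing it in application code follows; `ALLOWED_NEXT` and `transition` are hypothetical names, and cancel or retry would need extra states or edges that the schema comment does not define.

```typescript
type JobStatus = "PENDING" | "QUEUED" | "RUNNING" | "COMPLETED" | "FAILED";

// Allowed next states, taken from the schema comment; terminal states have none.
const ALLOWED_NEXT: Record<JobStatus, JobStatus[]> = {
  PENDING: ["QUEUED"],
  QUEUED: ["RUNNING"],
  RUNNING: ["COMPLETED", "FAILED"],
  COMPLETED: [],
  FAILED: [],
};

// Returns the new status, or throws if the transition is not allowed.
function transition(from: JobStatus, to: JobStatus): JobStatus {
  if (ALLOWED_NEXT[from].indexOf(to) === -1) {
    throw new Error(`Illegal job status transition: ${from} -> ${to}`);
  }
  return to;
}
```

Checking transitions in one place means a worker cannot accidentally move a finished job back to RUNNING.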

## Deployment Model

**Mosaic wrapper + OpenClaw instance:**

```
docker-compose.yml:
  mosaic-stitcher:    # Control layer (our code)
  mosaic-bridge:      # Chat integrations (Discord, Mattermost, Slack)
  openclaw:           # Execution layer (community code)
  valkey:             # Queue + cache
  postgres:           # Job store, events
```

**NOT separate containers per agent type.** Runner/weaver/inspector are **agent profiles** (constraints), not services. OpenClaw spawns agents with the profile constraints we define.

All services:

- Share Valkey (BullMQ queues)
- Share PostgreSQL (job store, events)
- Communicate via queue (stitcher → openclaw)

## New Modules (in API for now, extract to containers later)

```
apps/api/src/
├── stitcher/         # Workflow engine, job creation
├── runner-jobs/      # Job CRUD, queue submission
├── job-steps/        # Step tracking
├── job-events/       # Event logging, WebSocket gateway
└── workers/          # BullMQ processors (one per component type)
```

## Implementation Phases

1. **Core Infrastructure** - BullMQ setup, database migrations
2. **Coordinator Service** - Job submission, status polling, cancel/retry
3. **Runner Worker** - Claude Code integration, step-by-step execution
4. **Real-time Status** - WebSocket gateway, SSE for CLI
5. **Integration Testing** - End-to-end tests

## Files to Modify

- `apps/api/src/app.module.ts` - Import new modules
- `apps/api/src/valkey/valkey.service.ts` - Share connection with BullMQ
- `apps/api/src/quality-orchestrator/` - Integrate with runner for gates
- `package.json` - Add `@nestjs/bullmq`, `bullmq`

## Verification

1. Create a test job via API
2. Verify job appears in BullMQ queue
3. Runner picks up and executes with step events
4. WebSocket receives real-time updates
5. All events persisted to PostgreSQL
6. Quality gates run before completion

## Related Documentation

- [Guard Rails: Capability-Based Permission System](./guard-rails-capability-permissions.md)
- [Quality Rails Architecture](./quality-rails-orchestration-architecture.md)
- [Non-AI Coordinator Pattern](./non-ai-coordinator-architecture.md)