docs: Add Mosaic Component Architecture and Guard Rails design docs
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- mosaic-component-architecture.md: OpenClaw wrapper pattern, component naming, job tracking, chat integration, database schema
- guard-rails-capability-permissions.md: Capability-based permission model

Related: #162 (M4.2 Infrastructure Epic)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
314
docs/3-architecture/mosaic-component-architecture.md
Normal file
@@ -0,0 +1,314 @@
# Mosaic Component Architecture Design

## Strategic Decision

**OpenClaw as execution engine, Mosaic as control layer.**

- **Now (M1-M2):** Wrapper approach - use OpenClaw, add Mosaic controls
- **After M2:** Evaluate - is OpenClaw working for us?
- **If needed:** Fork or rebuild with lessons learned

**Why:** 355+ contributors maintain OpenClaw. We maintain only the wrapper. Ship faster, pivot later if needed.

## Philosophy

**Mosaic** = pieces combining to create a beautiful, larger picture.

Each component has a **dedicated function** (single responsibility). Focused tasks = agents stay on rails. If an agent only does one thing, it can't wander off-track.

## Overview

Establish the pattern for how Mosaic's control layer wraps OpenClaw's execution layer, with full job step tracking and event logging.

## Component Naming

| Component            | Dedicated Function                                                       | Rails                                |
| -------------------- | ------------------------------------------------------------------------ | ------------------------------------ |
| **@mosaic**          | Gitea bot user - triggers workflow on issue assignment/mention           | Webhook receiver only                |
| **mosaic-stitcher**  | Orchestrates workflow, sequences jobs, manages priorities                | Control plane only, no execution     |
| **mosaic-bridge**    | Chat integrations (Discord, Mattermost, Slack) - commands in, status out | I/O only, no execution               |
| **mosaic-runner**    | Fetches information, gathers context, reads repos                        | Read-only operations                 |
| **mosaic-weaver**    | Implements code changes, writes files                                    | Write operations, scoped to worktree |
| **mosaic-inspector** | Runs quality gates (build, lint, test)                                   | Validation only, no modifications    |
| **mosaic-herald**    | Reports status, creates PR comments, notifications                       | Output/reporting only                |

**Why this works:** Each component has exactly ONE job. Can't go off rails if there's only one rail.

**Note:** Names are placeholders. Components are modular plugins, so names can change later.

## Architecture

```
┌─────────────────┐              ┌─────────────────┐
│     @mosaic     │              │  mosaic-bridge  │
│   (Gitea Bot)   │              │   (Chat I/O)    │
│ Webhook Trigger │              │ Discord/MM/etc  │
└────────┬────────┘              └────────┬────────┘
         │ Issue assigned                 │ Commands
         └───────────────┬────────────────┘
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                MOSAIC STACK (Control Layer)                 │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │               MOSAIC-STITCHER (Wrapper)               │  │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────────────┐  │  │
│  │  │   Guard   │  │  Quality  │  │   Job Tracking    │  │  │
│  │  │   Rails   │  │   Rails   │  │  (Events/Steps)   │  │  │
│  │  │  (perms)  │  │  (gates)  │  │                   │  │  │
│  │  └───────────┘  └───────────┘  └───────────────────┘  │  │
│  └───────────────────────────┬───────────────────────────┘  │
│                              │                              │
└──────────────────────────────┼──────────────────────────────┘
                               │ Dispatch with constraints
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                 OPENCLAW (Execution Layer)                  │
│                 355+ contributors maintain                  │
│   ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐   │
│   │   Agent   │ │  Session  │ │ Multi-LLM │ │  Discord  │   │
│   │ Spawning  │ │  Manager  │ │  Support  │ │  Integr.  │   │
│   └───────────┘ └───────────┘ └───────────┘ └───────────┘   │
│                                                             │
│  Agent Profiles (Mosaic-defined constraints):               │
│   ┌─────────┐   ┌─────────┐   ┌───────────┐   ┌─────────┐   │
│   │ RUNNER  │   │ WEAVER  │   │ INSPECTOR │   │ HERALD  │   │
│   │ (read)  │   │ (write) │   │ (validate)│   │ (report)│   │
│   └─────────┘   └─────────┘   └───────────┘   └─────────┘   │
└─────────────────────────────────────────────────────────────┘
```

**Key insight:** Agent profiles (runner, weaver, etc.) are **constraints passed to OpenClaw**, not separate containers. OpenClaw spawns agents; Mosaic controls what they're allowed to do.

## Relationship to Non-AI Coordinator (M4.1)

This architecture **complements** the Non-AI Coordinator Pattern:

| Layer                             | Responsibility                                                                      | Milestone |
| --------------------------------- | ----------------------------------------------------------------------------------- | --------- |
| **Non-AI Coordinator**            | Orchestration logic (when to assign, context monitoring, quality gates enforcement) | M4.1      |
| **Mosaic Component Architecture** | Execution infrastructure (job tracking, OpenClaw integration, chat commands)        | M4.2      |

The Non-AI Coordinator uses this infrastructure to dispatch and monitor jobs.

## Chat Integration (mosaic-bridge)

**Control Mosaic Stack via Discord, Mattermost, Slack, etc.**

```
#mosaic-control
├── User: "@mosaic fix issue #42"
├── Mosaic: "🚀 Started job #123 for issue #42" [link to thread]
│
└── Thread: "Job #123: Fix issue #42"
    ├── 📖 Runner: Gathering context... ✓
    ├── 🧵 Weaver: Implementing... ✓
    ├── 🔍 Inspector: Running tests... ✓
    ├── 📢 Herald: PR created → #456
    └── [Full event log: /api/jobs/123/events]
```

### Noise Management Strategy

| Channel                 | Purpose                            | Verbosity                 |
| ----------------------- | ---------------------------------- | ------------------------- |
| `#mosaic-control`       | Commands + summaries               | Low (milestones only)     |
| Job threads             | Per-job activity                   | Medium (step completions) |
| `/api/jobs/{id}/events` | Full audit log                     | High (everything)         |
| DMs (optional)          | Private updates to triggering user | Configurable              |

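One way to read the table above is as a set of verbosity tiers, where each destination shows events up to a ceiling. The sketch below illustrates that reading only. The sink names, the numeric tiers, and the mapping from event types to tiers are all assumptions; the document does not specify which events count as milestones.

```typescript
// Verbosity tiers from the table: low = milestones, medium = step completions,
// high = everything. The numeric encoding is an assumption for illustration.
type Verbosity = 0 | 1 | 2;
const LOW: Verbosity = 0;
const MEDIUM: Verbosity = 1;
const HIGH: Verbosity = 2;

// Hypothetical sink names standing in for the channel, the thread, and the API.
type Sink = "control-channel" | "job-thread" | "events-api";

// Most detailed tier each sink is willing to show.
const SINK_CEILING: Record<Sink, Verbosity> = {
  "control-channel": LOW,
  "job-thread": MEDIUM,
  "events-api": HIGH,
};

// Hypothetical classification of event types into tiers.
function tierOf(eventType: string): Verbosity {
  if (["job.started", "job.completed", "job.failed"].includes(eventType)) return LOW;
  if (eventType === "step.completed") return MEDIUM;
  return HIGH;
}

// An event reaches every sink whose ceiling is at or above the event's tier.
function sinksFor(eventType: string): Sink[] {
  const tier = tierOf(eventType);
  return (Object.keys(SINK_CEILING) as Sink[]).filter(
    (sink) => SINK_CEILING[sink] >= tier,
  );
}
```

Under these assumptions a milestone such as `job.started` reaches all three sinks, while a fine-grained event such as `ai.tokens_used` is visible only through the events API. The configurable DM row is left out of the sketch.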
### Commands (via chat)

```
@mosaic fix <issue>       # Start job for issue
@mosaic status <job>      # Get job status
@mosaic cancel <job>      # Cancel running job
@mosaic verbose <job>     # Stream full logs to thread
@mosaic quiet             # Reduce notifications
@mosaic help              # Show commands
```

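The command list above could be turned into a small parser along the following lines. This is a minimal sketch, not the mosaic-bridge implementation: the `ChatCommand` type, the helper names, and the choice to accept both `42` and `#42` are assumptions made for illustration.

```typescript
// Hypothetical command shapes; none of these names come from the Mosaic codebase.
type JobVerb = "status" | "cancel" | "verbose";

type ChatCommand =
  | { kind: "fix"; issue: number }
  | { kind: JobVerb; job: number }
  | { kind: "quiet" }
  | { kind: "help" };

const BOT_MENTION = "@mosaic";

// Accept "42" or "#42" as an issue or job reference.
function parseRef(raw: string | undefined): number | null {
  const match = /^#?(\d+)$/.exec(raw ?? "");
  return match ? Number(match[1]) : null;
}

function jobCommand(kind: JobVerb, raw: string | undefined): ChatCommand | null {
  const job = parseRef(raw);
  return job === null ? null : { kind, job };
}

// Returns null when the message is not addressed to the bot,
// names an unknown command, or lacks a required argument.
function parseCommand(message: string): ChatCommand | null {
  const [mention, verb, ...args] = message.trim().split(/\s+/);
  if (mention !== BOT_MENTION) return null;

  switch (verb) {
    case "fix": {
      // Tolerate the "fix issue #42" phrasing used in the chat transcript.
      const issue = parseRef(args[0] === "issue" ? args[1] : args[0]);
      return issue === null ? null : { kind: "fix", issue };
    }
    case "status":
      return jobCommand("status", args[0]);
    case "cancel":
      return jobCommand("cancel", args[0]);
    case "verbose":
      return jobCommand("verbose", args[0]);
    case "quiet":
      return { kind: "quiet" };
    case "help":
      return { kind: "help" };
    default:
      return null;
  }
}
```

Returning a tagged union rather than raw strings lets the dispatcher handle each command with an exhaustive `switch`, and malformed input never reaches the job queue.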
### Integration lives at Mosaic layer, not OpenClaw

- **mosaic-bridge** handles Discord/Mattermost/Slack APIs
- **mosaic-stitcher** receives commands, dispatches jobs
- **mosaic-herald** sends status updates back through bridge
- OpenClaw has NO direct chat access (stays focused on execution)

## Key Components

### 1. Mosaic-Stitcher (The Wrapper)

The control layer that wraps OpenClaw:

- Receives webhooks from @mosaic bot
- Applies Guard Rails (capability permissions)
- Applies Quality Rails (mandatory gates)
- Tracks all job steps and events
- Dispatches work to OpenClaw with constraints

### 2. OpenClaw (Execution Engine)

Community-maintained agent swarm (355+ contributors):

- Spawns and manages AI agent sessions
- Multi-LLM support (Claude, GPT, Ollama, etc.)
- Session management and recovery
- Used as-is, wrapped by Mosaic-Stitcher

### 3. Agent Profiles (Constraints for OpenClaw)

Mosaic-defined capability constraints passed to OpenClaw agents:

- **runner** - read-only: fetch context, read files, query APIs
- **weaver** - write: implement code, scoped to git worktree
- **inspector** - validate: run gates, no modifications
- **herald** - report: PR comments, notifications, status updates

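The four profiles can be pictured as a lookup from profile name to an allowed capability set, checked before any action is dispatched. The sketch below is illustrative only. The capability names are invented here, and it assumes the weaver and inspector may also read; the real model is defined in the Guard Rails document.

```typescript
// Capability names are illustrative assumptions, not OpenClaw or Mosaic identifiers.
type Capability = "read" | "write" | "run-gates" | "report";
type ProfileName = "runner" | "weaver" | "inspector" | "herald";

// Assumption: weaver and inspector need read access to do their jobs.
const PROFILE_CAPABILITIES: Record<ProfileName, ReadonlySet<Capability>> = {
  runner: new Set<Capability>(["read"]),
  weaver: new Set<Capability>(["read", "write"]),
  inspector: new Set<Capability>(["read", "run-gates"]),
  herald: new Set<Capability>(["report"]),
};

function isAllowed(profile: ProfileName, capability: Capability): boolean {
  return PROFILE_CAPABILITIES[profile].has(capability);
}

// Throwing variant for use at a dispatch boundary.
function assertAllowed(profile: ProfileName, capability: Capability): void {
  if (!isAllowed(profile, capability)) {
    throw new Error(`Profile "${profile}" may not perform "${capability}"`);
  }
}
```

Because the profile is plain data, the same table could be serialized and passed to OpenClaw as the constraint set for a spawned agent. Worktree scoping for the weaver's writes is omitted from this sketch.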
### 4. Job Structure

Every job contains granular steps:

| Phase      | Steps                                                   |
| ---------- | ------------------------------------------------------- |
| SETUP      | Clone repo, create worktree, install deps               |
| EXECUTION  | Read requirements, analyze code, implement, write tests |
| VALIDATION | Lint gate, typecheck gate, test gate, coverage gate     |
| CLEANUP    | Stage, commit, push, create PR                          |

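As a rough illustration, the phase table can be expanded into an ordered step plan with one ordinal per step. The type and function names below are hypothetical, and deriving progress from the share of completed steps is an assumption rather than something the document specifies.

```typescript
type Phase = "setup" | "execution" | "validation" | "cleanup";

interface PlannedStep {
  ordinal: number; // 1-based position across the whole job
  phase: Phase;
  name: string;
}

// Phases and steps copied from the table above, in execution order.
const PHASE_STEPS: ReadonlyArray<readonly [Phase, readonly string[]]> = [
  ["setup", ["Clone repo", "Create worktree", "Install deps"]],
  ["execution", ["Read requirements", "Analyze code", "Implement", "Write tests"]],
  ["validation", ["Lint gate", "Typecheck gate", "Test gate", "Coverage gate"]],
  ["cleanup", ["Stage", "Commit", "Push", "Create PR"]],
];

// Flatten the phase table into a single ordered list of steps.
function planSteps(): PlannedStep[] {
  const steps: PlannedStep[] = [];
  for (const [phase, names] of PHASE_STEPS) {
    for (const name of names) {
      steps.push({ ordinal: steps.length + 1, phase, name });
    }
  }
  return steps;
}

// Assumption: progress is the share of completed steps, rounded down.
function progressPercent(completedSteps: number, totalSteps: number): number {
  if (totalSteps === 0) return 0;
  return Math.floor((completedSteps * 100) / totalSteps);
}
```

With the 15 steps listed in the table, finishing the three setup steps would put a job at 20% under this scheme.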
### 5. Event Logging (Event Sourcing)

Every action emits an event:

- `job.created`, `job.queued`, `job.started`, `job.completed`, `job.failed`
- `step.started`, `step.progress`, `step.output`, `step.completed`
- `ai.tool_called`, `ai.tokens_used`, `ai.artifact_created`
- `gate.started`, `gate.passed`, `gate.failed`

Storage:

- PostgreSQL: Immutable audit log (permanent)
- Valkey Streams: Recent events (last 1000 per job)
- Valkey Pub/Sub: Real-time streaming

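The three storage roles can be illustrated with an in-memory stand-in: an append-only array for the audit log, a capped per-job buffer for recent events, and a listener set for real-time delivery. This is a sketch for reasoning about the design, assuming a hypothetical `InMemoryEventLog` class. It does not talk to PostgreSQL or Valkey.

```typescript
type JobEventType =
  | "job.created" | "job.queued" | "job.started" | "job.completed" | "job.failed"
  | "step.started" | "step.progress" | "step.output" | "step.completed"
  | "ai.tool_called" | "ai.tokens_used" | "ai.artifact_created"
  | "gate.started" | "gate.passed" | "gate.failed";

interface JobEvent {
  jobId: string;
  stepId?: string;
  type: JobEventType;
  timestamp: Date;
  actor: string;
  payload: Record<string, unknown>;
}

type Listener = (event: Readonly<JobEvent>) => void;

// Hypothetical in-memory model of the three stores listed above.
class InMemoryEventLog {
  private readonly audit: Readonly<JobEvent>[] = []; // stands in for PostgreSQL
  private readonly recent = new Map<string, Readonly<JobEvent>[]>(); // Valkey Streams
  private readonly listeners = new Set<Listener>(); // Valkey Pub/Sub
  private readonly recentLimit: number;

  constructor(recentLimit: number = 1000) {
    this.recentLimit = recentLimit;
  }

  // Returns a function that removes the listener again.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  append(event: JobEvent): void {
    const frozen = Object.freeze({ ...event }); // shallow freeze only
    this.audit.push(frozen);

    // Keep only the newest `recentLimit` events per job.
    const buffer = this.recent.get(event.jobId) ?? [];
    buffer.push(frozen);
    if (buffer.length > this.recentLimit) buffer.shift();
    this.recent.set(event.jobId, buffer);

    this.listeners.forEach((listener) => listener(frozen));
  }

  auditTrail(jobId: string): Readonly<JobEvent>[] {
    return this.audit.filter((e) => e.jobId === jobId);
  }

  recentEvents(jobId: string): Readonly<JobEvent>[] {
    return [...(this.recent.get(jobId) ?? [])];
  }
}
```

The audit trail only ever grows, while the recent buffer drops its oldest entry once the cap is exceeded, which mirrors the permanent versus last-1000 split described above.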
### 6. Queue Architecture

**BullMQ** over plain ValkeyService because it provides:

- Job progress tracking (0-100%)
- Automatic retry with exponential backoff
- Rate limiting
- Job dependencies
- Rich lifecycle events

Uses the same Valkey instance that is already configured.

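To make the retry behaviour concrete, the sketch below computes an exponential backoff schedule. The `RetryOptions` interface is declared locally and only mirrors the shape of BullMQ's `attempts` and `backoff` job options. The doubling formula follows our reading of the BullMQ documentation and should be checked against the installed version; the default values are hypothetical.

```typescript
// Mirrors the shape of BullMQ's `attempts` and `backoff` job options;
// declared locally so the sketch runs without the library installed.
interface RetryOptions {
  attempts: number; // total attempts, including the first run
  backoff: { type: "exponential"; delay: number };
}

// Hypothetical defaults, not values taken from the Mosaic codebase.
const JOB_RETRY: RetryOptions = {
  attempts: 4,
  backoff: { type: "exponential", delay: 1000 },
};

// Delay before retry number `retry` (1-based), doubling each time.
function retryDelayMs(retry: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (retry - 1);
}

// With N attempts there are at most N - 1 retries after the first failure.
function retrySchedule(options: RetryOptions): number[] {
  const retries = Math.max(options.attempts - 1, 0);
  return Array.from({ length: retries }, (_, i) =>
    retryDelayMs(i + 1, options.backoff.delay),
  );
}
```

With these defaults a failing job would be retried after 1, 2, and 4 seconds before being marked as failed.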
## Database Schema

```sql
-- Runner jobs (links to existing agent_tasks)
CREATE TABLE runner_jobs (
  id UUID PRIMARY KEY,
  workspace_id UUID NOT NULL,
  agent_task_id UUID REFERENCES agent_tasks(id),
  type VARCHAR(100),   -- 'git-status', 'code-task', 'priority-calc'
  status VARCHAR(50),  -- PENDING → QUEUED → RUNNING → COMPLETED/FAILED
  priority INT,
  progress_percent INT,
  result JSONB,
  error TEXT,
  created_at TIMESTAMPTZ,
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ
);

-- Job steps (granular tracking)
CREATE TABLE job_steps (
  id UUID PRIMARY KEY,
  job_id UUID REFERENCES runner_jobs(id),
  ordinal INT,
  phase VARCHAR(50),   -- setup, execution, validation, cleanup
  name VARCHAR(255),
  type VARCHAR(50),    -- command, ai-action, gate, artifact
  status VARCHAR(50),
  output TEXT,
  tokens_input INT,
  tokens_output INT,
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ,
  duration_ms INT
);

-- Job events (immutable audit log)
CREATE TABLE job_events (
  id UUID PRIMARY KEY,
  job_id UUID REFERENCES runner_jobs(id),
  step_id UUID REFERENCES job_steps(id),
  type VARCHAR(100),
  timestamp TIMESTAMPTZ,
  actor VARCHAR(100),
  payload JSONB
);
```

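The `status` column comment describes a one-way lifecycle, which could be enforced in application code before a row is updated. The following is a minimal sketch with hypothetical names. It covers only the five states named in the schema comment and assumes no other transitions, such as retry or cancel, exist.

```typescript
type JobStatus = "PENDING" | "QUEUED" | "RUNNING" | "COMPLETED" | "FAILED";

// Allowed next states, following PENDING → QUEUED → RUNNING → COMPLETED/FAILED.
const NEXT_STATUS: Record<JobStatus, readonly JobStatus[]> = {
  PENDING: ["QUEUED"],
  QUEUED: ["RUNNING"],
  RUNNING: ["COMPLETED", "FAILED"],
  COMPLETED: [],
  FAILED: [],
};

// A status with no successors is terminal.
function isTerminal(status: JobStatus): boolean {
  return NEXT_STATUS[status].length === 0;
}

// Returns the new status, or throws when the move skips or reverses a stage.
function transition(current: JobStatus, next: JobStatus): JobStatus {
  if (!NEXT_STATUS[current].includes(next)) {
    throw new Error(`Illegal status transition: ${current} -> ${next}`);
  }
  return next;
}
```

Keeping the allowed moves in one table means a later addition, for example a cancelled state for `@mosaic cancel`, would be a data change rather than a new code path.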
## Deployment Model

**Mosaic wrapper + OpenClaw instance:**

```
docker-compose.yml:
  mosaic-stitcher:  # Control layer (our code)
  mosaic-bridge:    # Chat integrations (Discord, Mattermost, Slack)
  openclaw:         # Execution layer (community code)
  valkey:           # Queue + cache
  postgres:         # Job store, events
```

**NOT separate containers per agent type.** Runner/weaver/inspector are **agent profiles** (constraints), not services. OpenClaw spawns agents with the profile constraints we define.

All services:

- Share Valkey (BullMQ queues)
- Share PostgreSQL (job store, events)
- Communicate via queue (stitcher → openclaw)

## New Modules (in API for now, extract to containers later)

```
apps/api/src/
├── stitcher/      # Workflow engine, job creation
├── runner-jobs/   # Job CRUD, queue submission
├── job-steps/     # Step tracking
├── job-events/    # Event logging, WebSocket gateway
└── workers/       # BullMQ processors (one per component type)
```

## Implementation Phases

1. **Core Infrastructure** - BullMQ setup, database migrations
2. **Coordinator Service** - Job submission, status polling, cancel/retry
3. **Runner Worker** - Claude Code integration, step-by-step execution
4. **Real-time Status** - WebSocket gateway, SSE for CLI
5. **Integration Testing** - End-to-end tests

## Files to Modify

- `apps/api/src/app.module.ts` - Import new modules
- `apps/api/src/valkey/valkey.service.ts` - Share connection with BullMQ
- `apps/api/src/quality-orchestrator/` - Integrate with runner for gates
- `package.json` - Add `@nestjs/bullmq`, `bullmq`

## Verification

1. Create a test job via API
2. Verify job appears in BullMQ queue
3. Runner picks up the job and executes it with step events
4. WebSocket receives real-time updates
5. All events are persisted to PostgreSQL
6. Quality gates run before completion

## Related Documentation

- [Guard Rails: Capability-Based Permission System](./guard-rails-capability-permissions.md)
- [Quality Rails Architecture](./quality-rails-orchestration-architecture.md)
- [Non-AI Coordinator Pattern](./non-ai-coordinator-architecture.md)