
Jason Woltje 1bed5b3573 chore(orchestrator): Bootstrap MS23 mission-control TASKS.md + PRD (#698 )

Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>

2026-03-07 00:27:24 +00:00


PRD: MS23 — Mission Control Dashboard & Agent Provider Interface

Metadata

Owner: Jason Woltje
Date: 2026-03-06
Status: draft
Mission ID: ms23-mission-control-20260306
Target Version: 0.0.23
Roadmap Milestone: M6 — Orchestration (0.0.6 trajectory)
Depends On: MS22 Phase 2 (Named Agent Fleet) — COMPLETE
Related Docs:
- ~/src/jarvis-brain/docs/planning/MISSION-CONTROL-UI-PRD.md (concept origin)
- ~/src/jarvis-brain/docs/planning/FLEET-EVOLUTION-PLAN.md
- docs/PRD-MS22-P2-AGENT-FLEET.md

Problem Statement

The Mosaic orchestration backend is fully operational: agents spawn, execute tasks, publish lifecycle events via Valkey pub/sub, and can be killed via API. The frontend exposes rudimentary widgets (AgentStatusWidget, OrchestratorEventsWidget) that show aggregate status.

What's missing is operational visibility and control at the session level. There is no way to:

- See what an individual agent is actually saying and doing (a conversation stream per agent)
- Inject a message into a running agent session without terminating it (barge-in)
- Understand the parent/child relationships between orchestrators and their subagents
- Connect Mosaic's orchestration layer to external agent runtimes (OpenClaw sessions, Codex ACP, raw PTY agents) through a consistent, extensible interface

Jason operates multiple projects in parallel — multiple orchestrating agents running simultaneously across missions. Today this requires context-switching between terminals, Discord channels, and status widgets. Mission Control solves this.

Mosaic is designed to be an enterprise-grade, multi-user AI operations platform. Not every user will use OpenClaw. Not every team will use Codex. Mosaic must provide a plugin adapter interface that allows any agent runtime to integrate with the same orchestration harness, control plane, and UI.

Objectives

- Mission Control Dashboard: a single-pane-of-glass view with N orchestrator panels in a responsive grid, each showing a live agent chat stream with full operator controls
- Per-Agent Conversation Streaming: stream individual agent message logs (not just lifecycle events) to the frontend via SSE
- Barge-In / Message Injection: the operator can inject messages directly into any running agent session, with an audit trail
- Subagent Tree Tracking: agents report parent/child relationships; the UI renders the full agent roster as a tree
- Agent Provider Interface (API): a formal plugin adapter interface that any agent runtime can implement to integrate with Mosaic's orchestration layer
- OpenClaw Provider Adapter: a reference implementation of the Agent Provider Interface for OpenClaw ACP sessions
- Operator Controls: pause, resume, graceful terminate, and hard kill per agent, plus a kill-all panic button
- Audit Trail: all operator interventions (barge-in, kill, pause) logged with timestamp, user, target, and content

Scope

In Scope

- Agent conversation log storage and streaming API (per-agent SSE stream of messages)
- Barge-in endpoint: inject an operator message into a running agent session
- Pause / resume agent execution
- Subagent tree: parent-agent relationship recorded at spawn registration
- Agent Provider Interface: TypeScript interface + NestJS plugin module
- OpenClaw adapter: implements the Agent Provider Interface for OpenClaw sessions
- Mission Control page (`/mission-control`) with a grid of orchestrator panels
- OrchestratorPanel component: live chat stream + barge-in input + operator controls
- Global Agent Roster: tree-view sidebar showing all agents and subagents, with kill buttons
- Audit log: UI and API for operator action history
- Roles: operator (full control) and observer (read-only), applied to all new endpoints

Out of Scope

- Mobile layout (desktop-first; responsive grid with a 1200px minimum width)
- Multi-user concurrent barge-in coordination (single operator per session)
- Historical session replay / time-travel debugging (future milestone)
- Codex ACP adapter (follow-on once the OpenClaw adapter validates the interface)
- Raw PTY adapter (follow-on)
- Agent-to-agent communication graph visualization (future)
- Agent marketplace / plugin registry UI (future)

Current State Assessment

What Exists (Do Not Rebuild)

| Component | Location | Status |
| --- | --- | --- |
| AgentSpawnerService | `apps/orchestrator/src/spawner/` | ✅ Production |
| AgentLifecycleService | `apps/orchestrator/src/spawner/` | ✅ Production |
| KillswitchService | `apps/orchestrator/src/killswitch/` | ✅ Production |
| AgentEventsService | `apps/orchestrator/src/api/agents/` | ✅ SSE lifecycle events |
| `GET /agents` | Orchestrator API | ✅ Lists all agents |
| `POST /agents/:id/kill` | Orchestrator API | ✅ Kills agent |
| `POST /agents/kill-all` | Orchestrator API | ✅ Kills all |
| `GET /agents/events` | Orchestrator API | ✅ SSE lifecycle stream |
| AgentStatusWidget | `apps/web/src/components/widgets/` | ✅ Polls agent list |
| OrchestratorEventsWidget | `apps/web/src/components/widgets/` | ✅ SSE lifecycle events |
| HUD widget grid | `apps/web/src/components/hud/` | ✅ Drag/resize/add/remove |
| Chat component | `apps/web/src/components/chat/` | ✅ Chat UI exists |
| Socket.io | `apps/api/` (speech.gateway.ts) | ✅ WebSocket pattern established |
| CoordinatorIntegration | `apps/api/src/coordinator-integration/` | ✅ API ↔ Orchestrator bridge |

What's Missing (Build This)

| Gap | Priority |
| --- | --- |
| Per-agent conversation message log (DB + API) | P0 |
| Per-agent SSE message stream | P0 |
| Barge-in endpoint (`POST /agents/:id/inject`) | P0 |
| Pause / resume endpoints | P1 |
| Subagent tree (parentAgentId on registration) | P0 |
| Agent Provider Interface (plugin API) | P0 |
| OpenClaw adapter (implements provider interface) | P1 |
| Mission Control page (`/mission-control`) | P0 |
| OrchestratorPanel component | P0 |
| Global Agent Roster (tree view) | P0 |
| Audit log (DB + API + UI) | P1 |

Architecture

Agent Provider Interface

Mosaic defines a standard contract. Any agent runtime that implements this interface integrates natively with Mission Control.

```typescript
// packages/shared/src/agent-provider.interface.ts

export interface AgentSession {
  sessionId: string;
  parentSessionId?: string; // For subagent tree
  provider: string; // "internal" | "openclaw" | "codex" | ...
  status: AgentSessionStatus;
  taskId?: string;
  missionId?: string;
  agentType?: string;
  spawnedAt: string;
  startedAt?: string;
  completedAt?: string;
  error?: string;
  metadata?: Record<string, unknown>;
}

export type AgentSessionStatus =
  | "spawning"
  | "running"
  | "waiting"
  | "paused"
  | "completed"
  | "failed"
  | "killed";

export interface AgentMessage {
  messageId: string;
  sessionId: string;
  role: "agent" | "user" | "system" | "operator";
  content: string;
  timestamp: string;
  metadata?: Record<string, unknown>;
}

export interface IAgentProvider {
  readonly providerName: string;

  /** List all currently active sessions */
  listSessions(): Promise<AgentSession[]>;

  /** Get a single session's current state */
  getSession(sessionId: string): Promise<AgentSession | null>;

  /** Get recent messages for a session */
  getMessages(sessionId: string, limit?: number): Promise<AgentMessage[]>;

  /** Subscribe to a session's message stream. Returns unsubscribe fn. */
  subscribeToMessages(sessionId: string, handler: (message: AgentMessage) => void): () => void;

  /** Inject an operator message into a running session (barge-in) */
  injectMessage(sessionId: string, content: string, operatorId: string): Promise<void>;

  /** Pause a running agent session */
  pause(sessionId: string): Promise<void>;

  /** Resume a paused agent session */
  resume(sessionId: string): Promise<void>;

  /** Graceful terminate: allow the agent to finish its current step */
  terminate(sessionId: string): Promise<void>;

  /** Hard kill: immediate termination */
  kill(sessionId: string): Promise<void>;
}
```
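The interface leaves implicit which calls are legal in which state. The sketch below assumes a provider rejects actions that make no sense for the current `AgentSessionStatus`. The allowed-state sets and the helper names `canApply` / `assertCanApply` are hypothetical. They are derived from the doc comments above and from the "pause before send" flow, which implies inject must work on a paused session.

```typescript
// Hypothetical guard. The allowed source states are assumptions drawn from the
// IAgentProvider doc comments (pause a running session, resume a paused one).
type AgentSessionStatus =
  | "spawning"
  | "running"
  | "waiting"
  | "paused"
  | "completed"
  | "failed"
  | "killed";

type OperatorAction = "inject" | "pause" | "resume" | "terminate" | "kill";

// Terminal states (completed, failed, killed) appear in no set, so every action is rejected there.
// Assumption: a still-spawning session can only be hard-killed, since it has no step to finish.
const ALLOWED_FROM: Record<OperatorAction, ReadonlySet<AgentSessionStatus>> = {
  inject: new Set<AgentSessionStatus>(["running", "waiting", "paused"]),
  pause: new Set<AgentSessionStatus>(["running", "waiting"]),
  resume: new Set<AgentSessionStatus>(["paused"]),
  terminate: new Set<AgentSessionStatus>(["running", "waiting", "paused"]),
  kill: new Set<AgentSessionStatus>(["spawning", "running", "waiting", "paused"]),
};

export function canApply(action: OperatorAction, status: AgentSessionStatus): boolean {
  return ALLOWED_FROM[action].has(status);
}

export function assertCanApply(action: OperatorAction, status: AgentSessionStatus): void {
  if (!canApply(action, status)) {
    throw new Error(`Cannot ${action} a session in state "${status}"`);
  }
}
```

A provider implementation could call `assertCanApply` before forwarding a request, and the proxy API could map the error to HTTP 409. Neither behavior is specified by this PRD.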

Internal Provider

The existing orchestrator's Docker-based agents are exposed through IAgentProvider as the "internal" provider. There is no behavior change: the provider simply wraps existing services behind the interface.

OpenClaw Provider

Connects to an OpenClaw gateway via its REST API:

```text
GET  /sessions               → listSessions()
GET  /sessions/:key/history  → getMessages()
POST /sessions/:key/send     → injectMessage()
OpenClaw SSE or polling      → subscribeToMessages()
```

Configuration is stored per workspace in the DB (AgentProviderConfig table): gateway URL and API token.
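Where a gateway offers no push stream, `subscribeToMessages()` has to be built from polling. The sketch below makes two assumptions. First, a hypothetical `fetchHistory` function calls `GET /sessions/:key/history` and maps OpenClaw's response (whose shape is not specified here) to `AgentMessage`. Second, `messageId` is stable across polls.

```typescript
// Polling fallback for subscribeToMessages(). Hypothetical: `fetchHistory` wraps
// GET /sessions/:key/history and maps its (unspecified) payload to AgentMessage.
interface AgentMessage {
  messageId: string;
  sessionId: string;
  role: "agent" | "user" | "system" | "operator";
  content: string;
  timestamp: string; // assumed ISO-8601
}

type FetchHistory = (sessionId: string) => Promise<AgentMessage[]>;

export class PollingMessageBridge {
  // IDs already delivered; assumes messageId is stable across polls.
  private readonly seen = new Set<string>();

  constructor(
    private readonly sessionId: string,
    private readonly fetchHistory: FetchHistory,
    private readonly handler: (message: AgentMessage) => void,
  ) {}

  /** Fetches history once and delivers only unseen messages, oldest first. */
  async pollOnce(): Promise<number> {
    const history = await this.fetchHistory(this.sessionId);
    const fresh = history
      .filter((m) => !this.seen.has(m.messageId))
      .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
    for (const m of fresh) {
      this.seen.add(m.messageId);
      this.handler(m);
    }
    return fresh.length;
  }

  /** Starts polling and returns the unsubscribe function IAgentProvider expects. */
  start(intervalMs = 2000): () => void {
    let stopped = false;
    let timer: ReturnType<typeof setTimeout> | undefined;
    const tick = async (): Promise<void> => {
      if (stopped) return;
      try {
        await this.pollOnce();
      } catch {
        // Gateway error: keep the subscription alive and retry on the next tick.
      }
      // Re-arm only after the poll finishes, so a slow gateway never sees overlapping requests.
      if (!stopped) timer = setTimeout(tick, intervalMs);
    };
    timer = setTimeout(tick, 0);
    return () => {
      stopped = true;
      if (timer !== undefined) clearTimeout(timer);
    };
  }
}
```

The `seen` set grows with the session. A real adapter would probably keep only IDs newer than the last delivered timestamp. Chaining `setTimeout` instead of using `setInterval` is a deliberate choice: a slow gateway never has two polls in flight.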

Provider Registry

```typescript
// apps/api/src/agent-providers/provider-registry.service.ts
// Public surface only; method bodies omitted.
@Injectable()
export class AgentProviderRegistry {
  register(provider: IAgentProvider): void;
  getProvider(name: string): IAgentProvider;
  getAllProviders(): IAgentProvider[];
  listAllSessions(): Promise<AgentSession[]>; // Aggregates across all providers
}
```
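The signatures above leave open what happens when one provider is unreachable. The runnable sketch below assumes, though the PRD does not say so, that an unreachable external gateway should not blank out the aggregated session list for every other provider. `@Injectable()` is dropped so the class runs without NestJS. The interfaces are trimmed to the fields the registry touches, and the `onProviderError` hook is hypothetical.

```typescript
// Standalone sketch: @Injectable() omitted so this runs without NestJS; interfaces trimmed.
// The onProviderError hook is hypothetical (e.g. for logging or a degraded-provider badge).
interface AgentSession {
  sessionId: string;
  provider: string;
  status: string;
}

interface IAgentProvider {
  readonly providerName: string;
  listSessions(): Promise<AgentSession[]>;
}

export class AgentProviderRegistry {
  private readonly providers = new Map<string, IAgentProvider>();

  constructor(
    private readonly onProviderError: (providerName: string, error: unknown) => void = () => {},
  ) {}

  register(provider: IAgentProvider): void {
    if (this.providers.has(provider.providerName)) {
      throw new Error(`Agent provider "${provider.providerName}" is already registered`);
    }
    this.providers.set(provider.providerName, provider);
  }

  getProvider(name: string): IAgentProvider {
    const provider = this.providers.get(name);
    if (!provider) throw new Error(`Unknown agent provider "${name}"`);
    return provider;
  }

  getAllProviders(): IAgentProvider[] {
    return Array.from(this.providers.values());
  }

  /** Aggregates across providers; a failing provider contributes no sessions instead of failing the call. */
  async listAllSessions(): Promise<AgentSession[]> {
    const perProvider = await Promise.all(
      this.getAllProviders().map(async (provider): Promise<AgentSession[]> => {
        try {
          return await provider.listSessions();
        } catch (error) {
          this.onProviderError(provider.providerName, error);
          return [];
        }
      }),
    );
    return ([] as AgentSession[]).concat(...perProvider);
  }
}
```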

Database Schema

New Tables

```prisma
// AgentConversationMessage — stores all agent messages for streaming + history
model AgentConversationMessage {
  id          String   @id @default(cuid())
  sessionId   String                         // matches agentId in orchestrator
  provider    String   @default("internal")  // "internal" | "openclaw" | ...
  role        String                         // "agent" | "user" | "system" | "operator"
  content     String
  timestamp   DateTime @default(now())
  metadata    Json     @default("{}")

  @@index([sessionId, timestamp])
}

// AgentSessionTree — tracks parent/child relationships
model AgentSessionTree {
  id              String   @id @default(cuid())
  sessionId       String   @unique
  parentSessionId String?
  provider        String   @default("internal")
  missionId       String?
  taskId          String?
  agentType       String?
  status          String   @default("spawning")
  spawnedAt       DateTime @default(now())
  completedAt     DateTime?
  metadata        Json     @default("{}")

  @@index([parentSessionId])
  @@index([missionId])
}

// AgentProviderConfig — external provider registration per workspace
model AgentProviderConfig {
  id          String   @id @default(cuid())
  workspaceId String
  name        String                         // "openclaw-prod", "codex-team", ...
  provider    String                         // "openclaw" | "codex" | ...
  gatewayUrl  String
  credentials Json     @default("{}")        // Encrypted via CryptoService
  isActive    Boolean  @default(true)
  createdAt   DateTime @default(now())
  updatedAt   DateTime @updatedAt

  @@unique([workspaceId, name])
}

// OperatorAuditLog — all operator interventions
model OperatorAuditLog {
  id         String   @id @default(cuid())
  userId     String
  sessionId  String
  provider   String
  action     String                          // "barge-in" | "kill" | "pause" | "resume" | "kill-all"
  content    String?                         // For barge-in: message injected
  metadata   Json     @default("{}")
  createdAt  DateTime @default(now())

  @@index([sessionId])
  @@index([userId])
  @@index([createdAt])
}
```
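`AgentProviderConfig.credentials` is described as encrypted via CryptoService (AES-256-GCM). The PRD does not describe CryptoService itself, so the block below only sketches the technique using Node's built-in `crypto` module. The `{ iv, tag, data }` envelope and the function names are hypothetical, and key management is out of scope.

```typescript
// Sketch of AES-256-GCM for provider credentials using node:crypto.
// Hypothetical: the envelope shape and function names; the real CryptoService may differ.
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

export interface EncryptedEnvelope {
  iv: string; // base64, 12-byte nonce, unique per encryption
  tag: string; // base64, 16-byte GCM authentication tag
  data: string; // base64 ciphertext
}

export function encryptCredentials(plain: Record<string, unknown>, key: Buffer): EncryptedEnvelope {
  if (key.length !== 32) throw new Error("AES-256 requires a 32-byte key");
  const iv = randomBytes(12); // never reuse an IV with the same key under GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(JSON.stringify(plain), "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

export function decryptCredentials(envelope: EncryptedEnvelope, key: Buffer): Record<string, unknown> {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(envelope.iv, "base64"));
  decipher.setAuthTag(Buffer.from(envelope.tag, "base64"));
  // final() throws if the ciphertext or tag was tampered with.
  const plain = Buffer.concat([decipher.update(Buffer.from(envelope.data, "base64")), decipher.final()]);
  return JSON.parse(plain.toString("utf8")) as Record<string, unknown>;
}
```

The envelope is itself JSON, so it fits the existing `Json` column without a schema change. The auth tag is what makes tampering detectable at decrypt time.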

API Endpoints

Orchestrator API — New Endpoints

```text
POST /agents/:agentId/inject           # Barge-in: inject operator message
POST /agents/:agentId/pause            # Pause agent execution
POST /agents/:agentId/resume           # Resume paused agent
GET  /agents/:agentId/messages         # Get message history (paginated)
GET  /agents/:agentId/messages/stream  # SSE: live message stream for this agent
GET  /agents/tree                      # Full subagent tree (all agents with parent/child)
```
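`GET /agents/tree` has to turn flat `AgentSessionTree` rows, each holding an optional `parentSessionId`, into the nested structure the Global Agent Roster renders. The sketch below assumes rows arrive unordered and that a row whose parent is not in the result set is shown as a root rather than dropped. `SessionTreeRow` and `buildSessionTree` are hypothetical names, and the parent links are assumed to form no cycles.

```typescript
// Hypothetical tree builder for GET /agents/tree; row fields mirror AgentSessionTree.
interface SessionTreeRow {
  sessionId: string;
  parentSessionId: string | null;
  provider: string;
  status: string;
  spawnedAt: string; // ISO-8601
}

export interface SessionTreeNode extends SessionTreeRow {
  children: SessionTreeNode[];
}

export function buildSessionTree(rows: SessionTreeRow[]): SessionTreeNode[] {
  const nodes = new Map<string, SessionTreeNode>();
  for (const row of rows) nodes.set(row.sessionId, { ...row, children: [] });

  const roots: SessionTreeNode[] = [];
  nodes.forEach((node) => {
    const parent = node.parentSessionId ? nodes.get(node.parentSessionId) : undefined;
    // Orphans (parent not loaded or already pruned) become roots instead of disappearing.
    if (parent) parent.children.push(node);
    else roots.push(node);
  });

  // Oldest first at every level, so the roster order stays stable between refreshes.
  const sortLevel = (level: SessionTreeNode[]): void => {
    level.sort((a, b) => Date.parse(a.spawnedAt) - Date.parse(b.spawnedAt));
    level.forEach((n) => sortLevel(n.children));
  };
  sortLevel(roots);
  return roots;
}
```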

Main API — New Endpoints

```text
# Agent Provider Management
GET    /api/agent-providers                        # List configured providers
POST   /api/agent-providers                        # Register external provider
PATCH  /api/agent-providers/:id                    # Update provider config
DELETE /api/agent-providers/:id                    # Remove provider

# Unified Session View (aggregates all providers)
GET    /api/mission-control/sessions               # All active sessions (all providers)
GET    /api/mission-control/sessions/:id           # Single session details
GET    /api/mission-control/sessions/:id/messages  # Message history
GET    /api/mission-control/sessions/:id/stream    # SSE message stream (proxied)
POST   /api/mission-control/sessions/:id/inject    # Barge-in (proxied to provider)
POST   /api/mission-control/sessions/:id/pause     # Pause (proxied)
POST   /api/mission-control/sessions/:id/resume    # Resume (proxied)
POST   /api/mission-control/sessions/:id/kill      # Kill (proxied)
GET    /api/mission-control/tree                   # Full agent tree (all providers)
GET    /api/mission-control/audit                  # Operator audit log (paginated)
```
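The `.../stream` endpoints emit `text/event-stream`. The sketch below shows the framing, following the WHATWG Server-Sent Events format. Two design choices in it are hypothetical, not requirements stated above: using `messageId` as the SSE `id`, and replaying from the `Last-Event-ID` header on reconnect.

```typescript
// Sketch of SSE framing for the message stream endpoints.
// Hypothetical: messageId as the SSE id, and replay driven by Last-Event-ID.
interface AgentMessage {
  messageId: string;
  sessionId: string;
  role: "agent" | "user" | "system" | "operator";
  content: string;
  timestamp: string;
}

/** One SSE event: an `id:` line, one `data:` line, and a blank-line terminator. */
export function toSseFrame(message: AgentMessage): string {
  // JSON.stringify escapes newlines inside strings, so the payload stays on one data: line.
  return `id: ${message.messageId}\ndata: ${JSON.stringify(message)}\n\n`;
}

/** Comment line: EventSource ignores it, but it keeps idle proxies from closing the stream. */
export const SSE_HEARTBEAT = ": keep-alive\n\n";

/** On reconnect, replay history after the client's Last-Event-ID (all of it if the id is unknown). */
export function messagesAfter(history: AgentMessage[], lastEventId?: string): AgentMessage[] {
  if (!lastEventId) return history;
  const index = history.findIndex((m) => m.messageId === lastEventId);
  return index === -1 ? history : history.slice(index + 1);
}
```

When a browser `EventSource` reconnects, it resends the last received `id` as the `Last-Event-ID` header. It cannot set custom headers, which fits the cookie-based stream authentication listed under Security Considerations.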

Authorization

All Mission Control endpoints require authentication and workspace context.

- operator role: full access (read, inject, kill, pause)
- observer role: read-only (no inject, kill, or pause)
- admin role: full access plus provider config management
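The role matrix above can live in one lookup that middleware or a NestJS guard consults, so the frontend never decides permissions. The sketch below uses hypothetical action names. It assumes resume and terminate travel with pause and kill for operators, and that kill-all counts as a kill.

```typescript
// Hypothetical permission matrix mirroring the operator / observer / admin roles above.
type Role = "observer" | "operator" | "admin";
type MissionControlAction =
  | "read"
  | "inject"
  | "pause"
  | "resume"
  | "terminate"
  | "kill"
  | "kill-all"
  | "manage-providers";

const READ_ONLY: readonly MissionControlAction[] = ["read"];
// Assumption: resume/terminate/kill-all are grouped with pause/kill for operators.
const OPERATOR: readonly MissionControlAction[] = [
  ...READ_ONLY, "inject", "pause", "resume", "terminate", "kill", "kill-all",
];

const PERMISSIONS: Record<Role, ReadonlySet<MissionControlAction>> = {
  observer: new Set<MissionControlAction>(READ_ONLY),
  operator: new Set<MissionControlAction>(OPERATOR),
  admin: new Set<MissionControlAction>([...OPERATOR, "manage-providers"]),
};

/** Deny-by-default check a guard could call before any proxied action. */
export function isAllowed(role: Role, action: MissionControlAction): boolean {
  return PERMISSIONS[role].has(action);
}
```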

Frontend — Mission Control Page

Route

/mission-control: a new top-level page in the web app, linked in the sidebar under "Orchestration"

Layout

```text
┌─────────────────────────────────────────────────────────────────┐
│ ⚙ MISSION CONTROL              [+ Add Panel] [🔴 KILL ALL]     │
├──────────────────────────────────────┬──────────────────────────┤
│                                      │ ACTIVE AGENTS            │
│  ┌──────────────┬──────────────┐     │ ▼ ms22 [internal] 🟢    │
│  │ [Panel: ms22]│ [Panel: SAGE]│     │   ├ codex-1 task-api 🟢 │
│  │  🟢 3 agents │  🟡 1 agent  │     │   ├ codex-2 task-ui  🟢 │
│  │              │              │     │   └ glm-1   task-db  🟡 │
│  │ [chat stream]│ [chat stream]│     │ ▼ SAGE [openclaw] 🟢    │
│  │              │              │     │   └ codex-1 task-prd 🟢 │
│  │ [input    ▶] │ [input    ▶] │     │                          │
│  │ [⚡][⏸][💀] │ [⚡][⏸][💀] │     │ [⏸ pause] [💀 kill] per │
│  └──────────────┴──────────────┘     │ agent                   │
│                                      │                          │
│  [+ Add Orchestrator Panel]          │ [📋 Audit Log]          │
└──────────────────────────────────────┴──────────────────────────┘
```

Components

MissionControlPage (/app/mission-control/page.tsx)

- Fetches active sessions from /api/mission-control/sessions
- Renders N OrchestratorPanel components in a responsive CSS grid
- Sidebar: GlobalAgentRoster
- Header: session count, Kill All button (with confirm dialog)

OrchestratorPanel (components/mission-control/OrchestratorPanel.tsx)

- Props: sessionId, provider, title
- Subscribes to /api/mission-control/sessions/:id/stream (SSE)
- Renders a scrollable message list (role-tagged, styled by role)
- Input box + Send button (barge-in → POST /inject)
- Header: status badge, agent count, elapsed time, ⚡ Barge-In toggle, ⏸ Pause, 💀 Kill
- Expandable to full-screen (modal overlay)
- Color-coded border by status (green/yellow/red/gray)
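An OrchestratorPanel that loads history (`GET .../messages`) and also listens on the SSE stream can receive the same message twice. The sketch below dedupes and orders the two sources. It assumes `messageId` is unique per message and timestamps are ISO-8601; the function name and the 500-message cap are hypothetical.

```typescript
// Hypothetical merge for panel state: dedupe by messageId, order by timestamp, cap length.
interface AgentMessage {
  messageId: string;
  sessionId: string;
  role: "agent" | "user" | "system" | "operator";
  content: string;
  timestamp: string; // assumed ISO-8601
}

export function mergeMessages(
  current: AgentMessage[],
  incoming: AgentMessage[],
  maxLength = 500,
): AgentMessage[] {
  const byId = new Map<string, AgentMessage>();
  for (const m of current) byId.set(m.messageId, m);
  for (const m of incoming) byId.set(m.messageId, m); // a duplicate keeps its slot; the later copy wins
  const merged = Array.from(byId.values()).sort(
    (a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp),
  );
  // Keep only the newest messages so a long-running agent does not grow the DOM without bound.
  return merged.length > maxLength ? merged.slice(merged.length - maxLength) : merged;
}
```

In React this could run inside a state updater, e.g. `setMessages((prev) => mergeMessages(prev, [event]))`, so concurrent SSE events never overwrite each other.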

GlobalAgentRoster (components/mission-control/GlobalAgentRoster.tsx)

- Fetches /api/mission-control/tree
- Renders the tree: orchestrator session → indented subagents
- Per row: provider badge, status dot, task label, elapsed time, Kill button
- Real-time updates via polling or SSE events

BargeInInput (components/mission-control/BargeInInput.tsx)

- Elevated textarea that renders inside a panel
- "Pause before send" checkbox (pauses the agent before the message is injected)
- Sends to POST /inject and shows a confirmation
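The "pause before send" option implies an ordering: the pause must complete before the injected message is sent. The sketch below shows that sequence against the proxy endpoints. Three things in it are assumptions: the `MissionControlClient` interface, the `{ content }` request body, and leaving the agent paused afterwards so the operator decides when to resume.

```typescript
// Hypothetical client and send sequence for BargeInInput.
interface MissionControlClient {
  pause(sessionId: string): Promise<void>;
  inject(sessionId: string, content: string): Promise<void>;
}

export async function sendBargeIn(
  client: MissionControlClient,
  sessionId: string,
  content: string,
  pauseBeforeSend: boolean,
): Promise<void> {
  const text = content.trim();
  if (!text) throw new Error("Barge-in message is empty");
  // Wait for the pause to land so the agent cannot start a new step before the message arrives.
  if (pauseBeforeSend) await client.pause(sessionId);
  await client.inject(sessionId, text);
  // Assumption: the session stays paused; the operator resumes explicitly.
}

// fetch-backed client for POST /api/mission-control/sessions/:id/{pause,inject}.
export function httpClient(baseUrl: string): MissionControlClient {
  const post = async (path: string, body?: unknown): Promise<void> => {
    const res = await fetch(`${baseUrl}${path}`, {
      method: "POST",
      credentials: "include", // session cookie auth
      headers: { "Content-Type": "application/json" },
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`${path} failed with HTTP ${res.status}`);
  };
  return {
    pause: (id) => post(`/api/mission-control/sessions/${encodeURIComponent(id)}/pause`),
    inject: (id, content) =>
      post(`/api/mission-control/sessions/${encodeURIComponent(id)}/inject`, { content }),
  };
}
```

Given the barge-in rate limit, a real client would also surface HTTP 429 distinctly. Here any non-2xx status simply throws.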

AuditLogDrawer (components/mission-control/AuditLogDrawer.tsx)

- Slide-in drawer from the right
- Paginated table: timestamp, user, action, session, content preview
- Triggered from the sidebar "Audit Log" button

KillAllDialog (components/mission-control/KillAllDialog.tsx)

- Confirmation modal with a provider scope selector
- "Kill all internal agents" / "Kill all (all providers)"
- Requires typing "KILL ALL" to confirm

Implementation Phases

Phase 0 — Foundation (Backend Core)

Backend infrastructure required before any UI work.

| Task | Description | Scope | Est. (tokens) |
| --- | --- | --- | --- |
| MS23-P0-001 | Prisma schema: AgentConversationMessage, AgentSessionTree, AgentProviderConfig, OperatorAuditLog (see the mosaic-queue note below) | api | 15K |
| MS23-P0-002 | Agent message ingestion: wire spawner/lifecycle to write messages to DB | orchestrator | 20K |
| MS23-P0-003 | Orchestrator API: `GET /agents/:id/messages` + SSE stream endpoint | orchestrator | 20K |
| MS23-P0-004 | Orchestrator API: `POST /agents/:id/inject` + pause/resume | orchestrator | 15K |
| MS23-P0-005 | Subagent tree: `parentAgentId` on spawn registration + `GET /agents/tree` | orchestrator | 15K |
| MS23-P0-006 | Unit + integration tests for all P0 orchestrator endpoints | orchestrator | 20K |

Phase 0 gate: All orchestrator endpoints tested and green. Per-agent message stream verified via curl/SSE client.

mosaic-queue Integration Note

mosaic-queue (~/src/mosaic-queue) is a standalone Valkey-backed task registry (CLI + MCP server) that agents use to claim and complete tasks in a pull model. It is complementary to — not a replacement for — the orchestrator's internal QueueService (which is push-based agent dispatch).

Schema impact on MS23-P0-001:

- `AgentSessionTree.taskId` should be `String?` and may optionally reference a mosaic-queue task key.
- Add `AgentSessionTree.taskSource String? @default("internal")`, with values "internal" | "mosaic-queue" | "external".

This allows Mission Control's agent roster to resolve task metadata (title, priority, status) from the correct source.

Future integration point: mosaic-queue Phase 3 ("coordinator integration") will wire the coordinator to claim tasks from mosaic-queue and spawn orchestrator agents against them. When that ships, Mission Control will inherit rich task context (title, lane, priority, retry count) from the queue automatically — no rework needed in MS23's data model if taskSource is present from the start.

No blocking dependency: mosaic-queue Phase 3 is not required for MS23. The taskSource field is additive and can be null initially.
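With `taskSource` in place, the roster can pick a resolver per source. The sketch below uses hypothetical resolver functions and a hypothetical `TaskInfo` shape. A null `taskSource` is treated as "internal". An unreachable mosaic-queue falls back to the raw `taskId` instead of failing the roster, consistent with Phase 3 not being a blocking dependency.

```typescript
// Hypothetical task-metadata resolution keyed by AgentSessionTree.taskSource.
type TaskSource = "internal" | "mosaic-queue" | "external";

interface TaskInfo {
  taskId: string;
  title: string;
  priority?: string;
  status?: string;
}

type TaskResolver = (taskId: string) => Promise<TaskInfo | null>;

export async function resolveTask(
  taskId: string | null,
  taskSource: TaskSource | null,
  resolvers: Partial<Record<TaskSource, TaskResolver>>,
): Promise<TaskInfo | null> {
  if (!taskId) return null;
  const resolver = resolvers[taskSource ?? "internal"]; // null source treated as "internal"
  const fallback: TaskInfo = { taskId, title: taskId }; // show the raw id when details are unavailable
  if (!resolver) return fallback;
  try {
    return (await resolver(taskId)) ?? fallback;
  } catch {
    return fallback; // e.g. the queue MCP/API is unreachable
  }
}
```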

Phase 1 — Provider Interface (Plugin Architecture)

| Task | Description | Scope | Est. (tokens) |
| --- | --- | --- | --- |
| MS23-P1-001 | `IAgentProvider` interface + shared types in `packages/shared` | shared | 10K |
| MS23-P1-002 | `InternalAgentProvider`: wrap existing orchestrator services behind interface | api | 20K |
| MS23-P1-003 | `AgentProviderRegistry`: register/retrieve providers, aggregate listSessions | api | 15K |
| MS23-P1-004 | `AgentProviderConfig` CRUD API (`/api/agent-providers`) | api | 15K |
| MS23-P1-005 | Mission Control proxy API (`/api/mission-control/*`): routes to registry, handles SSE proxying, writes audit log | api | 30K |
| MS23-P1-006 | Unit tests for registry, proxy service, internal provider | api | 20K |

Phase 1 gate: Unified /api/mission-control/sessions returns sessions from internal provider. Proxy routes correctly to internal provider for kill/pause/inject. Audit log persisted.

Phase 2 — Mission Control UI

| Task | Description | Scope | Est. (tokens) |
| --- | --- | --- | --- |
| MS23-P2-001 | `/mission-control` page route + layout shell | web | 10K |
| MS23-P2-002 | `OrchestratorPanel` component: SSE message stream, chat display | web | 25K |
| MS23-P2-003 | `BargeInInput` component: inject message, pause-before-send | web | 15K |
| MS23-P2-004 | Panel operator controls: pause, resume, graceful kill, hard kill | web | 15K |
| MS23-P2-005 | `GlobalAgentRoster` sidebar: tree view, per-agent kill | web | 20K |
| MS23-P2-006 | `KillAllDialog`: confirmation modal with scope selector | web | 10K |
| MS23-P2-007 | `AuditLogDrawer`: paginated audit history | web | 15K |
| MS23-P2-008 | Panel grid: responsive layout, add/remove panels, expand to full-screen | web | 20K |
| MS23-P2-009 | Frontend tests (vitest + Playwright E2E for mission control page) | web | 25K |

Phase 2 gate: Mission Control page renders with live panels. Barge-in sends and displays. Kill triggers confirmation and removes agent from roster. Audit log shows entries. All tests green.

Phase 3 — OpenClaw Provider Adapter

| Task | Description | Scope | Est. (tokens) |
| --- | --- | --- | --- |
| MS23-P3-001 | `OpenClawProvider`: implement `IAgentProvider` against OpenClaw REST API | api | 25K |
| MS23-P3-002 | OpenClaw session polling / SSE bridge: translate OpenClaw events to `AgentMessage` | api | 20K |
| MS23-P3-003 | Provider config UI: register OpenClaw gateway (URL + API token) in Settings | web | 15K |
| MS23-P3-004 | E2E test: OpenClaw provider registered → sessions appear in Mission Control | api+web | 20K |

Phase 3 gate: OpenClaw sessions visible in Mission Control alongside internal agents. Barge-in to OpenClaw session injects message and shows in panel stream.

Phase 4 — Verification & Release

| Task | Description | Scope | Est. (tokens) |
| --- | --- | --- | --- |
| MS23-P4-001 | Full QA: all gates (lint, typecheck, unit, E2E) | stack | 10K |
| MS23-P4-002 | Security review: auth on all new endpoints, audit log integrity, barge-in rate limiting | api | 10K |
| MS23-P4-003 | Deploy to production (mosaic.woltje.com), smoke test with live agents | stack | 5K |
| MS23-P4-004 | Update ROADMAP.md + CHANGELOG.md, tag v0.0.23 | stack | 3K |

Completion Gates (Mandatory)

Per the Mosaic E2E delivery framework, a task is NOT done until:

- Code review (independent review of every changed file)
- Security review (auth, input validation, error leakage)
- QA / tests green (pnpm turbo lint typecheck test)
- CI pipeline green after merge
- Gitea issue closed
- Docs updated for any API or schema changes

Token Budget Estimate

| Phase | Tasks | Estimate (tokens) |
| --- | --- | --- |
| Phase 0: Backend Core | 6 | ~105K |
| Phase 1: Provider Interface | 6 | ~110K |
| Phase 2: Mission Control UI | 9 | ~155K |
| Phase 3: OpenClaw Adapter | 4 | ~80K |
| Phase 4: Verification | 4 | ~28K |
| Total | 29 | ~478K |

Recommended split: Codex for UI (Phase 2) and routine API work; Sonnet for provider interface design and complex streaming logic.

Security Considerations

- All Mission Control endpoints require an authenticated session and workspace membership
- Barge-in is rate-limited to 10 requests/minute per operator per session
- Kill All requires explicit confirmation (UI + double-confirm pattern)
- External provider credentials are stored encrypted (AES-256-GCM via CryptoService)
- The audit log is append-only; there is no delete endpoint
- SSE streams are authenticated via the session cookie (no unauthenticated streams)
- Operator actions are tagged with userId for full traceability
- The observer role is enforced at the middleware level, so it cannot be bypassed from the frontend
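The 10 requests/minute barge-in limit could be enforced with a sliding-window log keyed by operator and session. The sketch below assumes a single API instance; multiple replicas would need shared state (e.g. in Valkey), which is not shown. The class name and the injectable clock are hypothetical.

```typescript
// Hypothetical in-memory sliding-window limiter: `limit` requests per `windowMs` per (operator, session).
export class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();

  constructor(
    private readonly limit = 10,
    private readonly windowMs = 60_000,
    private readonly now: () => number = Date.now, // injectable clock for tests
  ) {}

  /** Returns true and records the request if under the limit; otherwise returns false without recording. */
  tryAcquire(operatorId: string, sessionId: string): boolean {
    const key = JSON.stringify([operatorId, sessionId]); // unambiguous composite key
    const t = this.now();
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(key) ?? []).filter((ts) => t - ts < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // caller should respond with HTTP 429
    }
    recent.push(t);
    this.hits.set(key, recent);
    return true;
  }
}
```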

Open Questions

- Panel persistence: Should the grid layout (which sessions are pinned as panels) be stored in the DB per user or in localStorage? Recommendation: DB, for cross-device consistency.
- Message retention: How long should AgentConversationMessage records be kept? Suggested: a 30-day default with a configurable workspace policy.
- OpenClaw barge-in protocol: Does OpenClaw's sessions_send API support injection mid-run, or does it queue behind the current turn? This needs verification against the OpenClaw API before MS23-P3-001.
- Subagent reporting: Internal agents currently don't self-report a parentAgentId at spawn time, so the orchestrator spawner needs to accept this field. This is a straightforward addition to SpawnAgentDto.
- SSE vs WebSocket for message streaming: The current orchestrator uses SSE (one-way push). For barge-in confirmation/ack, SSE is sufficient because inject is a separate REST call. There is no need to upgrade to bidirectional WebSocket for Phases 0 to 2.
- mosaic-queue Phase 3 timing: mosaic-queue's coordinator integration phase is not yet scheduled. If it ships during MS23 development, the taskSource field in AgentSessionTree is the integration point, and no schema migration is required. The Mission Control roster can conditionally render task details from mosaic-queue when taskSource === "mosaic-queue" and the queue MCP/API is reachable.

Success Criteria

- Operator can open Mission Control and see all running orchestrator sessions as live panels
- Each panel shows the agent's actual conversation messages in real time
- Operator can type into any panel and inject a message; it appears in the stream tagged [OPERATOR]
- Operator can pause, resume, gracefully terminate, or hard-kill any agent from the panel or roster
- Global Agent Roster shows the full parent → subagent tree across all providers
- Kill All button with confirmation terminates all active agents
- All operator actions appear in the Audit Log with full attribution
- OpenClaw sessions registered as an external provider appear in Mission Control alongside internal agents
- observer role users can see everything but cannot inject, pause, or kill
- All CI gates green, deployed to production
