stack/docs/PRD-MS23-mission-control.md
2026-03-07 00:27:24 +00:00

PRD: MS23 — Mission Control Dashboard & Agent Provider Interface

Metadata

  • Owner: Jason Woltje
  • Date: 2026-03-06
  • Status: draft
  • Mission ID: ms23-mission-control-20260306
  • Target Version: 0.0.23
  • Roadmap Milestone: M6 — Orchestration (0.0.6 trajectory)
  • Depends On: MS22 Phase 2 (Named Agent Fleet) — COMPLETE
  • Related Docs:
    • ~/src/jarvis-brain/docs/planning/MISSION-CONTROL-UI-PRD.md (concept origin)
    • ~/src/jarvis-brain/docs/planning/FLEET-EVOLUTION-PLAN.md
    • docs/PRD-MS22-P2-AGENT-FLEET.md

Problem Statement

The Mosaic orchestration backend is fully operational: agents spawn, execute tasks, publish lifecycle events via Valkey pub/sub, and can be killed via API. The frontend exposes rudimentary widgets (AgentStatusWidget, OrchestratorEventsWidget) that show aggregate status.

What's missing is operational visibility and control at the session level. There is no way to:

  1. See what an individual agent is actually saying and doing (conversation stream per agent)
  2. Inject a message into a running agent session without terminating it (barge-in)
  3. Understand the parent/child relationship between orchestrators and their subagents
  4. Connect Mosaic's orchestration layer to external agent runtimes (OpenClaw sessions, Codex ACP, raw PTY agents) through a consistent, extensible interface

Jason operates multiple projects in parallel — multiple orchestrating agents running simultaneously across missions. Today this requires context-switching between terminals, Discord channels, and status widgets. Mission Control solves this.

Mosaic is designed to be an enterprise-grade, multi-user AI operations platform. Not every user will use OpenClaw. Not every team will use Codex. Mosaic must provide a plugin adapter interface that allows any agent runtime to integrate with the same orchestration harness, control plane, and UI.


Objectives

  1. Mission Control Dashboard — Single-pane-of-glass view: N orchestrator panels in a responsive grid, each showing a live agent chat stream with full operator controls
  2. Per-Agent Conversation Streaming — Stream individual agent message logs (not just lifecycle events) to the frontend via SSE
  3. Barge-In / Message Injection — Operator can inject messages directly into any running agent session with audit trail
  4. Subagent Tree Tracking — Agents report parent/child relationships; UI renders the full agent roster as a tree
  5. Agent Provider Interface (API) — Formal plugin adapter interface that any agent runtime can implement to integrate with Mosaic's orchestration layer
  6. OpenClaw Provider Adapter — Reference implementation of the Agent Provider Interface for OpenClaw ACP sessions
  7. Operator Controls — Pause, resume, graceful terminate, hard kill per agent; kill-all panic button
  8. Audit Trail — All operator interventions (barge-in, kill, pause) logged with timestamp, user, target, and content

Scope

In Scope

  • Agent conversation log storage and streaming API (per-agent SSE stream of messages)
  • Barge-in endpoint: inject operator message into running agent session
  • Pause / resume agent execution
  • Subagent tree: parent-agent relationship on spawn registration
  • Agent Provider Interface: TypeScript interface + NestJS plugin module
  • OpenClaw adapter: implements Agent Provider Interface for OpenClaw sessions
  • Mission Control page (/mission-control) with grid of orchestrator panels
  • OrchestratorPanel component: live chat stream + barge-in input + operator controls
  • Global Agent Roster: tree view sidebar showing all agents + subagents with kill buttons
  • Audit log: UI and API for operator action history
  • Roles: operator (full control) and observer (read-only), applied to all new endpoints

Out of Scope

  • Mobile layout (desktop-first, responsive grid min-width 1200px)
  • Multi-user concurrent barge-in coordination (single operator per session)
  • Historical session replay / time-travel debugging (future milestone)
  • Codex ACP adapter (follow-on after OpenClaw adapter validates interface)
  • Raw PTY adapter (follow-on)
  • Agent-to-agent communication graph visualization (future)
  • Agent marketplace / plugin registry UI (future)

Current State Assessment

What Exists (Do Not Rebuild)

| Component | Location | Status |
| --- | --- | --- |
| AgentSpawnerService | apps/orchestrator/src/spawner/ | Production |
| AgentLifecycleService | apps/orchestrator/src/spawner/ | Production |
| KillswitchService | apps/orchestrator/src/killswitch/ | Production |
| AgentEventsService | apps/orchestrator/src/api/agents/ | SSE lifecycle events |
| GET /agents | Orchestrator API | Lists all agents |
| POST /agents/:id/kill | Orchestrator API | Kills agent |
| POST /agents/kill-all | Orchestrator API | Kills all |
| GET /agents/events | Orchestrator API | SSE lifecycle stream |
| AgentStatusWidget | apps/web/src/components/widgets/ | Polls agent list |
| OrchestratorEventsWidget | apps/web/src/components/widgets/ | SSE lifecycle events |
| HUD widget grid | apps/web/src/components/hud/ | Drag/resize/add/remove |
| Chat component | apps/web/src/components/chat/ | Chat UI exists |
| Socket.io | apps/api/ (speech.gateway.ts) | WebSocket pattern established |
| CoordinatorIntegration | apps/api/src/coordinator-integration/ | API ↔ Orchestrator bridge |

What's Missing (Build This)

| Gap | Priority |
| --- | --- |
| Per-agent conversation message log (DB + API) | P0 |
| Per-agent SSE message stream | P0 |
| Barge-in endpoint (POST /agents/:id/inject) | P0 |
| Pause / resume endpoints | P1 |
| Subagent tree (parentAgentId on registration) | P0 |
| Agent Provider Interface (plugin API) | P0 |
| OpenClaw adapter (implements provider interface) | P1 |
| Mission Control page (/mission-control) | P0 |
| OrchestratorPanel component | P0 |
| Global Agent Roster (tree view) | P0 |
| Audit log (DB + API + UI) | P1 |

Architecture

Agent Provider Interface

Mosaic defines a standard contract. Any agent runtime that implements this interface integrates natively with Mission Control.

// packages/shared/src/agent-provider.interface.ts

export interface AgentSession {
  sessionId: string;
  parentSessionId?: string; // For subagent tree
  provider: string; // "internal" | "openclaw" | "codex" | ...
  status: AgentSessionStatus;
  taskId?: string;
  missionId?: string;
  agentType?: string;
  spawnedAt: string;
  startedAt?: string;
  completedAt?: string;
  error?: string;
  metadata?: Record<string, unknown>;
}

export type AgentSessionStatus =
  | "spawning"
  | "running"
  | "waiting"
  | "paused"
  | "completed"
  | "failed"
  | "killed";

export interface AgentMessage {
  messageId: string;
  sessionId: string;
  role: "agent" | "user" | "system" | "operator";
  content: string;
  timestamp: string;
  metadata?: Record<string, unknown>;
}

export interface IAgentProvider {
  readonly providerName: string;

  /** List all currently active sessions */
  listSessions(): Promise<AgentSession[]>;

  /** Get a single session's current state */
  getSession(sessionId: string): Promise<AgentSession | null>;

  /** Get recent messages for a session */
  getMessages(sessionId: string, limit?: number): Promise<AgentMessage[]>;

  /** Subscribe to a session's message stream. Returns unsubscribe fn. */
  subscribeToMessages(sessionId: string, handler: (message: AgentMessage) => void): () => void;

  /** Inject an operator message into a running session (barge-in) */
  injectMessage(sessionId: string, content: string, operatorId: string): Promise<void>;

  /** Pause a running agent session */
  pause(sessionId: string): Promise<void>;

  /** Resume a paused agent session */
  resume(sessionId: string): Promise<void>;

  /** Graceful terminate — allow agent to finish current step */
  terminate(sessionId: string): Promise<void>;

  /** Hard kill — immediate termination */
  kill(sessionId: string): Promise<void>;
}
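
To make the streaming and barge-in contract concrete, here is a minimal in-memory sketch of three of the methods above (getMessages, subscribeToMessages, injectMessage). It is an illustration under assumptions, not the internal provider: the class name InMemoryMessageHub and the publish helper are hypothetical, the methods are synchronous for brevity (the interface returns Promises), and recording operatorId in metadata is one possible choice.

```typescript
// Trimmed copy of the AgentMessage shape defined above.
interface AgentMessage {
  messageId: string;
  sessionId: string;
  role: "agent" | "user" | "system" | "operator";
  content: string;
  timestamp: string;
  metadata?: Record<string, unknown>;
}

type MessageHandler = (message: AgentMessage) => void;

// Hypothetical in-memory stand-in; a real provider would persist and stream.
class InMemoryMessageHub {
  private readonly handlers = new Map<string, Set<MessageHandler>>();
  private readonly history = new Map<string, AgentMessage[]>();
  private nextId = 1;

  /** Return the most recent `limit` messages for a session. */
  getMessages(sessionId: string, limit: number = 50): AgentMessage[] {
    const all = this.history.get(sessionId) ?? [];
    return limit > 0 ? all.slice(-limit) : [];
  }

  /** Register a handler and return the unsubscribe function. */
  subscribeToMessages(sessionId: string, handler: MessageHandler): () => void {
    const set = this.handlers.get(sessionId) ?? new Set<MessageHandler>();
    set.add(handler);
    this.handlers.set(sessionId, set);
    return () => {
      set.delete(handler);
    };
  }

  /** Barge-in: store and broadcast a message tagged with the operator role. */
  injectMessage(sessionId: string, content: string, operatorId: string): AgentMessage {
    const message: AgentMessage = {
      messageId: `msg-${this.nextId++}`,
      sessionId,
      role: "operator",
      content,
      timestamp: new Date().toISOString(),
      metadata: { operatorId }, // assumption: attribution travels in metadata
    };
    this.publish(message);
    return message;
  }

  /** Append to history, then notify every live subscriber of that session. */
  publish(message: AgentMessage): void {
    const log = this.history.get(message.sessionId) ?? [];
    log.push(message);
    this.history.set(message.sessionId, log);
    this.handlers.get(message.sessionId)?.forEach((handler) => handler(message));
  }
}
```

A subscriber that has called its unsubscribe function no longer receives messages, but the message is still recorded in history, which is the behavior an SSE endpoint would need when clients disconnect and later reload the backlog.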

Internal Provider

The existing orchestrator's Docker-based agents implement IAgentProvider as the "internal" provider. No behavior change — just wraps existing services behind the interface.

OpenClaw Provider

Connects to an OpenClaw gateway via its REST API:

  • GET /sessions → listSessions()
  • GET /sessions/:key/history → getMessages()
  • POST /sessions/:key/send → injectMessage()
  • OpenClaw SSE or polling → subscribeToMessages()

Configuration is stored per workspace in the DB (AgentProviderConfig table): gateway URL and API token.
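
The route mapping for the OpenClaw adapter can be sketched as a small request builder. Only the three paths listed above come from this document; the function name buildGatewayRequest, the GatewayRequest shape, and the URL-encoding of the session key are assumptions, and authentication headers and response parsing are deliberately left out because the OpenClaw payload format is not specified here.

```typescript
type OpenClawOperation = "listSessions" | "getMessages" | "injectMessage";

interface GatewayRequest {
  method: "GET" | "POST";
  url: string;
}

// Hypothetical helper: maps a provider operation to the gateway route above.
function buildGatewayRequest(
  gatewayUrl: string,
  operation: OpenClawOperation,
  sessionKey?: string,
): GatewayRequest {
  // Drop trailing slashes so the configured URL can be stored either way.
  const base = gatewayUrl.replace(/\/+$/, "");

  if (operation === "listSessions") {
    return { method: "GET", url: `${base}/sessions` };
  }
  if (!sessionKey) {
    throw new Error(`${operation} requires a session key`);
  }

  // Assumption: session keys may contain characters that need escaping.
  const key = encodeURIComponent(sessionKey);
  return operation === "getMessages"
    ? { method: "GET", url: `${base}/sessions/${key}/history` }
    : { method: "POST", url: `${base}/sessions/${key}/send` };
}
```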

Provider Registry

// apps/api/src/agent-providers/provider-registry.service.ts
@Injectable()
export class AgentProviderRegistry {
  register(provider: IAgentProvider): void;
  getProvider(name: string): IAgentProvider;
  getAllProviders(): IAgentProvider[];
  listAllSessions(): Promise<AgentSession[]>; // Aggregates across all providers
}
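
One possible shape for the registry body is sketched below, without the NestJS decorator so it runs standalone. The class name InMemoryProviderRegistry and the trimmed SessionProvider type are hypothetical, and two behaviors are assumptions rather than requirements from this PRD: duplicate provider names are rejected, and a provider whose listSessions() call fails contributes an empty list instead of failing the whole aggregate.

```typescript
// Trimmed versions of the shared types, enough for aggregation.
interface AgentSession {
  sessionId: string;
  provider: string;
  status: string;
  spawnedAt: string;
}

interface SessionProvider {
  readonly providerName: string;
  listSessions(): Promise<AgentSession[]>;
}

class InMemoryProviderRegistry {
  private readonly providers = new Map<string, SessionProvider>();

  register(provider: SessionProvider): void {
    if (this.providers.has(provider.providerName)) {
      throw new Error(`Provider already registered: ${provider.providerName}`);
    }
    this.providers.set(provider.providerName, provider);
  }

  getProvider(name: string): SessionProvider {
    const provider = this.providers.get(name);
    if (!provider) {
      throw new Error(`Unknown agent provider: ${name}`);
    }
    return provider;
  }

  getAllProviders(): SessionProvider[] {
    return Array.from(this.providers.values());
  }

  /** Query every provider in parallel and flatten the results. */
  async listAllSessions(): Promise<AgentSession[]> {
    const perProvider = await Promise.all(
      this.getAllProviders().map((provider) =>
        // Assumption: an unreachable provider should not hide the others.
        provider.listSessions().catch((): AgentSession[] => []),
      ),
    );
    return perProvider.reduce<AgentSession[]>((all, sessions) => all.concat(sessions), []);
  }
}
```

Swallowing provider errors keeps the dashboard usable when one gateway is down, at the cost of hiding the failure; a production version would likely surface a per-provider health flag alongside the session list.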

Database Schema

New Tables

// AgentConversationMessage — stores all agent messages for streaming + history
model AgentConversationMessage {
  id          String   @id @default(cuid())
  sessionId   String                         // matches agentId in orchestrator
  provider    String   @default("internal")  // "internal" | "openclaw" | ...
  role        String                         // "agent" | "user" | "system" | "operator"
  content     String
  timestamp   DateTime @default(now())
  metadata    Json     @default("{}")

  @@index([sessionId, timestamp])
}

// AgentSessionTree — tracks parent/child relationships
model AgentSessionTree {
  id              String   @id @default(cuid())
  sessionId       String   @unique
  parentSessionId String?
  provider        String   @default("internal")
  missionId       String?
  taskId          String?
  agentType       String?
  status          String   @default("spawning")
  spawnedAt       DateTime @default(now())
  completedAt     DateTime?
  metadata        Json     @default("{}")

  @@index([parentSessionId])
  @@index([missionId])
}

// AgentProviderConfig — external provider registration per workspace
model AgentProviderConfig {
  id          String   @id @default(cuid())
  workspaceId String
  name        String                         // "openclaw-prod", "codex-team", ...
  provider    String                         // "openclaw" | "codex" | ...
  gatewayUrl  String
  credentials Json     @default("{}")        // Encrypted via CryptoService
  isActive    Boolean  @default(true)
  createdAt   DateTime @default(now())
  updatedAt   DateTime @updatedAt

  @@unique([workspaceId, name])
}

// OperatorAuditLog — all operator interventions
model OperatorAuditLog {
  id         String   @id @default(cuid())
  userId     String
  sessionId  String
  provider   String
  action     String                          // "barge-in" | "kill" | "pause" | "resume" | "kill-all"
  content    String?                         // For barge-in: message injected
  metadata   Json     @default("{}")
  createdAt  DateTime @default(now())

  @@index([sessionId])
  @@index([userId])
  @@index([createdAt])
}

API Endpoints

Orchestrator API — New Endpoints

POST /agents/:agentId/inject       — Barge-in: inject operator message
POST /agents/:agentId/pause        — Pause agent execution
POST /agents/:agentId/resume       — Resume paused agent
GET  /agents/:agentId/messages     — Get message history (paginated)
GET  /agents/:agentId/messages/stream — SSE: live message stream for this agent
GET  /agents/tree                  — Full subagent tree (all agents with parent/child)
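
For the /messages/stream endpoint, each AgentMessage has to be serialized as a Server-Sent Events frame. The sketch below shows one way to do that and how a client could decode it. The event name "agent-message" and both function names are assumptions; the frame layout (id, event, and data fields terminated by a blank line) follows the standard SSE wire format.

```typescript
interface AgentMessage {
  messageId: string;
  sessionId: string;
  role: "agent" | "user" | "system" | "operator";
  content: string;
  timestamp: string;
}

// Serialize one message as a single SSE frame.
function toSseFrame(message: AgentMessage, eventName: string = "agent-message"): string {
  // JSON.stringify escapes newlines inside strings, so the payload always
  // fits on one "data:" line even when content spans several lines.
  const payload = JSON.stringify(message);
  return `id: ${message.messageId}\nevent: ${eventName}\ndata: ${payload}\n\n`;
}

// Decode the data field of a single frame back into a message.
function parseSseFrame(frame: string): AgentMessage {
  const data = frame
    .split("\n")
    .filter((line) => line.indexOf("data:") === 0)
    .map((line) => line.slice(5).replace(/^ /, "")) // strip "data:" and one space
    .join("\n");
  return JSON.parse(data) as AgentMessage;
}
```

Using messageId as the SSE id field lets a reconnecting browser send Last-Event-ID, which the server could use to replay missed messages from AgentConversationMessage.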

Main API — New Endpoints

# Agent Provider Management
GET    /api/agent-providers                     — List configured providers
POST   /api/agent-providers                     — Register external provider
PATCH  /api/agent-providers/:id                 — Update provider config
DELETE /api/agent-providers/:id                 — Remove provider

# Unified Session View (aggregates all providers)
GET    /api/mission-control/sessions            — All active sessions (all providers)
GET    /api/mission-control/sessions/:id        — Single session details
GET    /api/mission-control/sessions/:id/messages — Message history
GET    /api/mission-control/sessions/:id/stream   — SSE message stream (proxied)
POST   /api/mission-control/sessions/:id/inject   — Barge-in (proxied to provider)
POST   /api/mission-control/sessions/:id/pause    — Pause (proxied)
POST   /api/mission-control/sessions/:id/resume   — Resume (proxied)
POST   /api/mission-control/sessions/:id/kill     — Kill (proxied)
GET    /api/mission-control/tree                  — Full agent tree (all providers)
GET    /api/mission-control/audit                 — Operator audit log (paginated)

Authorization

All Mission Control endpoints require auth + workspace context.

  • operator role: full access (read + inject + kill + pause)
  • observer role: read-only (no inject, no kill, no pause)
  • admin role: full access + provider config management
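
The role matrix above could be expressed as a lookup table that a guard or middleware consults. This is a sketch under assumptions: the action names, the ROLE_ACTIONS constant, and canPerform are hypothetical, and "resume" is grouped with "pause" even though the list above does not name it separately.

```typescript
type Role = "observer" | "operator" | "admin";
type Action = "read" | "inject" | "pause" | "resume" | "kill" | "manage-providers";

// Allowed actions per role, mirroring the three bullets above.
const ROLE_ACTIONS: Record<Role, ReadonlyArray<Action>> = {
  observer: ["read"],
  operator: ["read", "inject", "pause", "resume", "kill"],
  admin: ["read", "inject", "pause", "resume", "kill", "manage-providers"],
};

// Returns true when the role is allowed to perform the action.
function canPerform(role: Role, action: Action): boolean {
  return ROLE_ACTIONS[role].indexOf(action) !== -1;
}
```

Keeping the matrix in one constant means the server-side check and any UI affordances (for example, hiding the Kill button for observers) can share a single source of truth, while enforcement still happens on the server.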

Frontend — Mission Control Page

Route

/mission-control — new top-level page in the web app, linked in sidebar under "Orchestration"

Layout

┌─────────────────────────────────────────────────────────────────┐
│ ⚙ MISSION CONTROL              [+ Add Panel] [🔴 KILL ALL]     │
├──────────────────────────────────────┬──────────────────────────┤
│                                      │ ACTIVE AGENTS            │
│  ┌──────────────┬──────────────┐     │ ▼ ms22 [internal] 🟢    │
│  │ [Panel: ms22]│ [Panel: SAGE]│     │   ├ codex-1 task-api 🟢 │
│  │  🟢 3 agents │  🟡 1 agent  │     │   ├ codex-2 task-ui  🟢 │
│  │              │              │     │   └ glm-1   task-db  🟡 │
│  │ [chat stream]│ [chat stream]│     │ ▼ SAGE [openclaw] 🟢    │
│  │              │              │     │   └ codex-1 task-prd 🟢 │
│  │ [input    ▶] │ [input    ▶] │     │                          │
│  │ [⚡][⏸][💀] │ [⚡][⏸][💀] │     │ [⏸ pause] [💀 kill] per │
│  └──────────────┴──────────────┘     │ agent                   │
│                                      │                          │
│  [+ Add Orchestrator Panel]          │ [📋 Audit Log]          │
└──────────────────────────────────────┴──────────────────────────┘

Components

MissionControlPage (/app/mission-control/page.tsx)

  • Fetches active sessions from /api/mission-control/sessions
  • Renders N OrchestratorPanel in a responsive CSS grid
  • Sidebar: GlobalAgentRoster
  • Header: session count, Kill All button (confirm dialog)

OrchestratorPanel (components/mission-control/OrchestratorPanel.tsx)

  • Props: sessionId, provider, title
  • Subscribes to /api/mission-control/sessions/:id/stream (SSE)
  • Renders scrollable message list (role-tagged, styled by role)
  • Input box + Send button (barge-in → POST /inject)
  • Header: status badge, agent count, elapsed time, Barge-In toggle, ⏸ Pause, 💀 Kill
  • Expandable to full-screen (modal overlay)
  • Color-coded border by status (green/yellow/red/gray)

GlobalAgentRoster (components/mission-control/GlobalAgentRoster.tsx)

  • Fetches /api/mission-control/tree
  • Renders tree: orch session → indented subagents
  • Per-row: provider badge, status dot, task label, elapsed, Kill button
  • Real-time updates via polling or SSE events
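
The roster tree can be assembled from flat rows that carry sessionId and parentSessionId, as stored in AgentSessionTree. The sketch below is one way to do it; buildSessionTree and the SessionNode type are hypothetical names, and treating a session whose parent is absent from the result set as a root is an assumption. Cycles in parent links are not handled.

```typescript
interface SessionRow {
  sessionId: string;
  parentSessionId: string | null;
  status: string;
}

interface SessionNode extends SessionRow {
  children: SessionNode[];
}

// Build a forest of sessions from flat rows in two passes.
function buildSessionTree(rows: SessionRow[]): SessionNode[] {
  // Pass 1: index every row by sessionId.
  const nodes = new Map<string, SessionNode>();
  rows.forEach((row) => nodes.set(row.sessionId, { ...row, children: [] }));

  // Pass 2: attach each node to its parent, or treat it as a root.
  const roots: SessionNode[] = [];
  nodes.forEach((node) => {
    const parent = node.parentSessionId ? nodes.get(node.parentSessionId) : undefined;
    if (parent && parent !== node) {
      parent.children.push(node);
    } else {
      roots.push(node); // no parent, or parent already pruned from the data
    }
  });
  return roots;
}
```

Both passes are linear in the number of rows, so the tree can be rebuilt on every poll or SSE update without noticeable cost at the session counts this dashboard targets.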

BargeInInput (components/mission-control/BargeInInput.tsx)

  • Elevated textarea that renders inside a panel
  • "Pause before send" checkbox
  • Sends to POST /inject, shows confirmation
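
The "Pause before send" option implies an ordering: pause first, then inject. A minimal sketch of that sequence follows. The bargeIn function and the BargeInTarget type are hypothetical, trimming and rejecting empty input is an assumption, and the session is left paused afterwards because this document does not say whether it should auto-resume.

```typescript
// The two IAgentProvider methods this flow depends on.
interface BargeInTarget {
  pause(sessionId: string): Promise<void>;
  injectMessage(sessionId: string, content: string, operatorId: string): Promise<void>;
}

// Send an operator message, optionally pausing the agent first.
async function bargeIn(
  target: BargeInTarget,
  sessionId: string,
  content: string,
  operatorId: string,
  pauseBeforeSend: boolean,
): Promise<void> {
  const trimmed = content.trim();
  if (trimmed.length === 0) {
    throw new Error("Barge-in message must not be empty");
  }
  if (pauseBeforeSend) {
    // Wait for the pause to be acknowledged before injecting.
    await target.pause(sessionId);
  }
  await target.injectMessage(sessionId, trimmed, operatorId);
}
```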

AuditLogDrawer (components/mission-control/AuditLogDrawer.tsx)

  • Slide-in drawer from right
  • Paginated table: timestamp, user, action, session, content preview
  • Triggered from sidebar "Audit Log" button

KillAllDialog (components/mission-control/KillAllDialog.tsx)

  • Confirmation modal with provider scope selector
  • "Kill all internal agents" / "Kill all (all providers)"
  • Requires typing "KILL ALL" to confirm

Implementation Phases

Phase 0 — Foundation (Backend Core)

Backend infrastructure required before any UI work.

| Task | Description | Scope | Est. (tokens) |
| --- | --- | --- | --- |
| MS23-P0-001 | Prisma schema: AgentConversationMessage, AgentSessionTree, AgentProviderConfig, OperatorAuditLog (see mosaic-queue note below) | api | 15K |
| MS23-P0-002 | Agent message ingestion: wire spawner/lifecycle to write messages to DB | orchestrator | 20K |
| MS23-P0-003 | Orchestrator API: GET /agents/:id/messages + SSE stream endpoint | orchestrator | 20K |
| MS23-P0-004 | Orchestrator API: POST /agents/:id/inject + pause/resume | orchestrator | 15K |
| MS23-P0-005 | Subagent tree: parentAgentId on spawn registration + GET /agents/tree | orchestrator | 15K |
| MS23-P0-006 | Unit + integration tests for all P0 orchestrator endpoints | orchestrator | 20K |

Phase 0 gate: All orchestrator endpoints tested and green. Per-agent message stream verified via curl/SSE client.

mosaic-queue Integration Note

mosaic-queue (~/src/mosaic-queue) is a standalone Valkey-backed task registry (CLI + MCP server) that agents use to claim and complete tasks in a pull model. It is complementary to — not a replacement for — the orchestrator's internal QueueService (which is push-based agent dispatch).

Schema impact on MS23-P0-001:

  • AgentSessionTree.taskId should be String? and optionally reference a mosaic-queue task key
  • Add AgentSessionTree.taskSource String? @default("internal") — values: "internal" | "mosaic-queue" | "external"
  • This allows Mission Control's agent roster to resolve task metadata (title, priority, status) from the correct source

Future integration point: mosaic-queue Phase 3 ("coordinator integration") will wire the coordinator to claim tasks from mosaic-queue and spawn orchestrator agents against them. When that ships, Mission Control will inherit rich task context (title, lane, priority, retry count) from the queue automatically — no rework needed in MS23's data model if taskSource is present from the start.

No blocking dependency: mosaic-queue Phase 3 is not required for MS23. The taskSource field is additive and can be null initially.

Phase 1 — Provider Interface (Plugin Architecture)

| Task | Description | Scope | Est. (tokens) |
| --- | --- | --- | --- |
| MS23-P1-001 | IAgentProvider interface + shared types in packages/shared | shared | 10K |
| MS23-P1-002 | InternalAgentProvider: wrap existing orchestrator services behind interface | api | 20K |
| MS23-P1-003 | AgentProviderRegistry: register/retrieve providers, aggregate listSessions | api | 15K |
| MS23-P1-004 | AgentProviderConfig CRUD API (/api/agent-providers) | api | 15K |
| MS23-P1-005 | Mission Control proxy API (/api/mission-control/*): routes to registry, handles SSE proxying, writes audit log | api | 30K |
| MS23-P1-006 | Unit tests for registry, proxy service, internal provider | api | 20K |

Phase 1 gate: Unified /api/mission-control/sessions returns sessions from internal provider. Proxy routes correctly to internal provider for kill/pause/inject. Audit log persisted.

Phase 2 — Mission Control UI

| Task | Description | Scope | Est. (tokens) |
| --- | --- | --- | --- |
| MS23-P2-001 | /mission-control page route + layout shell | web | 10K |
| MS23-P2-002 | OrchestratorPanel component: SSE message stream, chat display | web | 25K |
| MS23-P2-003 | BargeInInput component: inject message, pause-before-send | web | 15K |
| MS23-P2-004 | Panel operator controls: pause, resume, graceful kill, hard kill | web | 15K |
| MS23-P2-005 | GlobalAgentRoster sidebar: tree view, per-agent kill | web | 20K |
| MS23-P2-006 | KillAllDialog: confirmation modal with scope selector | web | 10K |
| MS23-P2-007 | AuditLogDrawer: paginated audit history | web | 15K |
| MS23-P2-008 | Panel grid: responsive layout, add/remove panels, expand to full-screen | web | 20K |
| MS23-P2-009 | Frontend tests (vitest + Playwright E2E for mission control page) | web | 25K |

Phase 2 gate: Mission Control page renders with live panels. Barge-in sends and displays. Kill triggers confirmation and removes agent from roster. Audit log shows entries. All tests green.

Phase 3 — OpenClaw Provider Adapter

| Task | Description | Scope | Est. (tokens) |
| --- | --- | --- | --- |
| MS23-P3-001 | OpenClawProvider: implement IAgentProvider against OpenClaw REST API | api | 25K |
| MS23-P3-002 | OpenClaw session polling / SSE bridge: translate OpenClaw events to AgentMessage | api | 20K |
| MS23-P3-003 | Provider config UI: register OpenClaw gateway (URL + API token) in Settings | web | 15K |
| MS23-P3-004 | E2E test: OpenClaw provider registered → sessions appear in Mission Control | api+web | 20K |

Phase 3 gate: OpenClaw sessions visible in Mission Control alongside internal agents. Barge-in to OpenClaw session injects message and shows in panel stream.

Phase 4 — Verification & Release

| Task | Description | Scope | Est. (tokens) |
| --- | --- | --- | --- |
| MS23-P4-001 | Full QA: all gates (lint, typecheck, unit, E2E) | stack | 10K |
| MS23-P4-002 | Security review: auth on all new endpoints, audit log integrity, barge-in rate limiting | api | 10K |
| MS23-P4-003 | Deploy to production (mosaic.woltje.com), smoke test with live agents | stack | 5K |
| MS23-P4-004 | Update ROADMAP.md + CHANGELOG.md, tag v0.0.23 | stack | 3K |

Completion Gates (Mandatory)

Per Mosaic E2E delivery framework — a task is NOT done until:

  • Code review (independent review of every changed file)
  • Security review (auth, input validation, error leakage)
  • QA / tests green (pnpm turbo lint typecheck test)
  • CI pipeline green after merge
  • Gitea issue closed
  • Docs updated for any API or schema changes

Token Budget Estimate

| Phase | Tasks | Estimate |
| --- | --- | --- |
| Phase 0 — Backend Core | 6 | ~105K |
| Phase 1 — Provider Interface | 6 | ~110K |
| Phase 2 — Mission Control UI | 9 | ~155K |
| Phase 3 — OpenClaw Adapter | 4 | ~80K |
| Phase 4 — Verification | 4 | ~28K |
| Total | 29 | ~478K |

Recommended split: Codex for UI (Phase 2) and routine API work. Sonnet for provider interface design and complex streaming logic.


Security Considerations

  • All Mission Control endpoints require authenticated session + workspace membership
  • Barge-in rate-limited: 10 requests/minute per operator per session
  • Kill All requires explicit double confirmation: a confirmation modal plus typing "KILL ALL" (see KillAllDialog)
  • External provider credentials stored encrypted (AES-256-GCM via CryptoService)
  • Audit log is append-only; no delete endpoint
  • SSE streams authenticated via session cookie (no unauthenticated streams)
  • Operator actions tagged with userId for full traceability
  • observer role enforced at middleware level — cannot be bypassed by frontend
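
The barge-in limit of 10 requests/minute per operator per session could be enforced with a sliding-window counter. The sketch below keeps state in process memory, which is an assumption made for illustration; the class name BargeInRateLimiter is hypothetical, and a multi-instance deployment would need a shared store instead.

```typescript
class BargeInRateLimiter {
  // Timestamps (ms) of accepted requests, keyed by operator and session.
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number = 10, windowMs: number = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Returns true and records the request if it fits within the window. */
  tryAcquire(operatorId: string, sessionId: string, nowMs: number = Date.now()): boolean {
    // JSON-encode the pair so IDs containing separators cannot collide.
    const key = JSON.stringify([operatorId, sessionId]);
    const recent = (this.hits.get(key) ?? []).filter((t) => nowMs - t < this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) {
      recent.push(nowMs);
    }
    this.hits.set(key, recent);
    return allowed;
  }
}
```

Rejected requests are not recorded, so an operator who hits the limit regains capacity as soon as the oldest accepted request leaves the window. Passing nowMs explicitly keeps the limiter deterministic in tests.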

Open Questions

  1. Panel persistence: Should the grid layout (which sessions are pinned as panels) be stored in DB per user or in localStorage? Recommend DB for cross-device consistency.

  2. Message retention: How long to keep AgentConversationMessage records? Suggest 30-day default with configurable workspace policy.

  3. OpenClaw barge-in protocol: Does OpenClaw's sessions_send API support injection mid-run, or does it queue behind the current turn? Needs verification against OpenClaw API before MS23-P3-001.

  4. Subagent reporting: Internal agents currently don't self-report a parentAgentId at spawn time, so the orchestrator spawner needs to accept this field. This is a straightforward addition to SpawnAgentDto.

  5. SSE vs WebSocket for message streaming: Current orchestrator uses SSE (one-way push). For barge-in confirmation/ack, SSE is sufficient (inject is a separate REST call). No need to upgrade to bidirectional WebSocket for Phase 0-2.

  6. mosaic-queue Phase 3 timing: mosaic-queue's coordinator integration phase is not yet scheduled. If it ships during MS23 development, the taskSource field in AgentSessionTree is the integration point — no schema migration required. The Mission Control roster can conditionally render task details from mosaic-queue when taskSource === "mosaic-queue" and the queue MCP/API is reachable.


Success Criteria

  1. Operator can open Mission Control and see all running orchestrator sessions as live panels
  2. Each panel shows the agent's actual conversation messages in real time
  3. Operator can type into any panel and inject a message; it appears in the stream tagged [OPERATOR]
  4. Operator can pause, resume, gracefully terminate, or hard-kill any agent from the panel or roster
  5. Global Agent Roster shows the full parent → subagent tree across all providers
  6. Kill All button with confirmation terminates all active agents
  7. All operator actions appear in the Audit Log with full attribution
  8. OpenClaw sessions registered as an external provider appear in Mission Control alongside internal agents
  9. observer role users can see everything but cannot inject, pause, or kill
  10. All CI gates green, deployed to production