From 36095ad80fb0decc1c1e9be8bd87af4925ed0958 Mon Sep 17 00:00:00 2001
From: Jason Woltje
Date: Sat, 21 Mar 2026 20:10:48 +0000
Subject: [PATCH] chore: bootstrap Harness Foundation mission (Phase 9) (#289)

Co-authored-by: Jason Woltje
Co-committed-by: Jason Woltje
---
 docs/MISSION-MANIFEST.md             |  83 +++---
 docs/PRD-Harness_Foundation.md       | 391 +++++++++++++++++++++++++++
 docs/TASKS.md                        | 162 +++++------
 docs/scratchpads/harness-20260321.md |  60 ++++
 4 files changed, 556 insertions(+), 140 deletions(-)
 create mode 100644 docs/PRD-Harness_Foundation.md
 create mode 100644 docs/scratchpads/harness-20260321.md

diff --git a/docs/MISSION-MANIFEST.md b/docs/MISSION-MANIFEST.md
index e49ccb1..a4fd9af 100644
--- a/docs/MISSION-MANIFEST.md
+++ b/docs/MISSION-MANIFEST.md
@@ -1,45 +1,42 @@
-# Mission Manifest — MVP
+# Mission Manifest — Harness Foundation
 
 > Persistent document tracking full mission scope, status, and session history.
 > Updated by the orchestrator at each phase transition and milestone completion.
 
 ## Mission
 
-**ID:** mvp-20260312
-**Statement:** Build Mosaic Stack v0.1.0 — a self-hosted, multi-user AI agent platform with web dashboard, TUI, remote control, shared memory, mission orchestration, and extensible skill/plugin architecture. All TypeScript. Pi as agent harness. Brain as knowledge layer. Queue as coordination backbone.
-**Phase:** Complete
-**Current Milestone:** Phase 8: Polish & Beta (v0.1.0) — DONE
-**Progress:** 9 / 9 milestones
-**Status:** complete
-**Last Updated:** 2026-03-16 UTC
+**ID:** harness-20260321
+**Statement:** Transform Mosaic Stack from a functional demo into a real multi-provider, task-routing AI harness. Persist all conversations, integrate frontier LLM providers (Anthropic, OpenAI, OpenRouter, Z.ai, Ollama), build granular task-aware agent routing, harden agent sessions, replace cron with BullMQ, and design the channel protocol for future Matrix/remote integration.
+**Phase:** Execution
+**Current Milestone:** M1: Conversation Persistence & Context
+**Progress:** 0 / 7 milestones
+**Status:** active
+**Last Updated:** 2026-03-21 UTC
 
 ## Success Criteria
 
-- [x] AC-1: Core chat flow — login, send message, streamed response, conversations persist
-- [x] AC-2: TUI integration — `mosaic tui` connects to gateway, same context as web
-- [x] AC-3: Discord remote control — bot responds, routes through gateway, threads work
-- [x] AC-4: Gateway orchestration — multi-provider routing, fallback, concurrent sessions
-- [x] AC-5: Task & project management — CRUD, kanban, mission tracking, brain MCP tools
-- [x] AC-6: Memory system — auto-capture, semantic search, preferences, log summarization
-- [x] AC-7: Auth & RBAC — email/password, Authentik SSO, role enforcement
-- [x] AC-8: Multi-provider LLM — 3+ providers routing correctly
-- [x] AC-9: MCP — gateway MCP endpoint, brain + queue tools via MCP
-- [x] AC-10: Deployment — `docker compose up` from clean state, CLI on bare metal
-- [x] AC-11: @mosaic/\* packages — all 7 migrated packages build, test, integrate
+- [ ] AC-1: Send messages in TUI → restart TUI → resume conversation → agent has full history and context
+- [ ] AC-2: Route a coding task to Claude Opus 4.6, a simple question to Haiku, a summarization to GLM-5 — all via granular routing rules
+- [ ] AC-3: Two users exist, User A's memory searches never return User B's data
+- [ ] AC-4: `/model claude-sonnet-4-6` in TUI switches the active model for subsequent messages
+- [ ] AC-5: `/agent coding-agent` in TUI switches to a different agent with different system prompt and tools
+- [ ] AC-6: BullMQ jobs execute on schedule, failures retry with backoff, admin can inspect via `/api/admin/jobs`
+- [ ] AC-7: Channel protocol document exists with Matrix integration points defined, reviewed, and approved
+- [ ] AC-8: Embeddings run on Ollama local models (no external API dependency for vector operations)
+- [ ] AC-9: All five providers (Anthropic, OpenAI, OpenRouter, Z.ai, Ollama) connect, list models, and complete chat requests
+- [ ] AC-10: Routing transparency — TUI displays which model was selected and the routing reason for each response
 
 ## Milestones
 
-| # | ID | Name | Status | Branch | Issue | Started | Completed |
-| --- | ------ | --------------------------------------- | ------ | ------ | ----- | ---------- | ---------- |
-| 0 | ms-157 | Phase 0: Foundation (v0.0.1) | done | — | — | 2026-03-13 | 2026-03-13 |
-| 1 | ms-158 | Phase 1: Core API (v0.0.2) | done | — | — | 2026-03-13 | 2026-03-13 |
-| 2 | ms-159 | Phase 2: Agent Layer (v0.0.3) | done | — | — | 2026-03-13 | 2026-03-12 |
-| 3 | ms-160 | Phase 3: Web Dashboard (v0.0.4) | done | — | — | 2026-03-12 | 2026-03-13 |
-| 4 | ms-161 | Phase 4: Memory & Intelligence (v0.0.5) | done | — | — | 2026-03-13 | 2026-03-13 |
-| 5 | ms-162 | Phase 5: Remote Control (v0.0.6) | done | — | #99 | 2026-03-14 | 2026-03-14 |
-| 6 | ms-163 | Phase 6: CLI & Tools (v0.0.7) | done | — | #104 | 2026-03-14 | 2026-03-14 |
-| 7 | ms-164 | Phase 7: Feature Completion (v0.0.8) | done | — | — | 2026-03-15 | 2026-03-15 |
-| 8 | ms-165 | Phase 8: Polish & Beta (v0.1.0) | done | — | — | 2026-03-15 | 2026-03-15 |
+| # | ID | Name | Status | Branch | Issue | Started | Completed |
+| --- | ------ | ---------------------------------- | ----------- | ------ | --------- | ------- | --------- |
+| 1 | ms-166 | Conversation Persistence & Context | not-started | — | #224–#231 | — | — |
+| 2 | ms-167 | Security & Isolation | not-started | — | #232–#239 | — | — |
+| 3 | ms-168 | Provider Integration | not-started | — | #240–#251 | — | — |
+| 4 | ms-169 | Agent Routing Engine | not-started | — | #252–#264 | — | — |
+| 5 | ms-170 | Agent Session Hardening | not-started | — | #265–#272 | — | — |
+| 6 | ms-171 | Job Queue Foundation | not-started | — | #273–#280 | — | — |
+| 7 | ms-172 | Channel Protocol Design | not-started | — | #281–#288 | — | — |
 
 ## Deployment
 
@@ -48,6 +45,12 @@
 | Docker Compose (dev) | localhost | docker compose up |
 | Production | TBD | Docker Swarm via Portainer |
 
+## Coordination
+
+- **Primary Agent:** claude-opus-4-6
+- **Sibling Agents:** codex (for pure coding tasks), sonnet (for review/standard work)
+- **Shared Contracts:** docs/PRD-Harness_Foundation.md, docs/TASKS.md
+
 ## Token Budget
 
 | Metric | Value |
@@ -58,22 +61,10 @@
 
 ## Session History
 
-| Session | Runtime | Started | Duration | Ended Reason | Last Task |
-| ------- | ----------------- | -------------------- | -------- | ------------- | ---------------- |
-| 1 | claude-opus-4-6 | 2026-03-13 01:00 UTC | — | context limit | Planning gate |
-| 2 | claude-opus-4-6 | 2026-03-13 | — | context limit | P5-002, P6-005 |
-| 3 | claude-opus-4-6 | 2026-03-13 | — | context limit | P0-006 |
-| 4 | claude-opus-4-6 | 2026-03-12 | — | context limit | Docker fix |
-| 5 | claude-opus-4-6 | 2026-03-12 | — | context limit | P1-009 |
-| 6 | claude-opus-4-6 | 2026-03-12 | — | context limit | P2-006, FIX-01 |
-| 7 | claude-opus-4-6 | 2026-03-12 | — | context limit | P2-007 |
-| 8 | claude-opus-4-6 | 2026-03-12 | — | context limit | Phase 2 complete |
-| 9 | claude-opus-4-6 | 2026-03-12 | — | context limit | P3-007 |
-| 10 | claude-opus-4-6 | 2026-03-13 | — | context limit | P3-008 |
-| 11 | claude-opus-4-6 | 2026-03-14 | — | context limit | P7 rescope |
-| 12 | claude-opus-4-6 | 2026-03-15 | — | context limit | P7 planning |
-| 13 | claude-sonnet-4-6 | 2026-03-16 | — | complete | P8-019 verify |
+| Session | Runtime | Started | Duration | Ended Reason | Last Task |
+| ------- | --------------- | ---------- | -------- | ------------ | ------------- |
+| 1 | claude-opus-4-6 | 2026-03-21 | — | — | Planning gate |
 
 ## Scratchpad
 
-Path: `docs/scratchpads/mvp-20260312.md`
+Path: `docs/scratchpads/harness-20260321.md`
diff --git a/docs/PRD-Harness_Foundation.md b/docs/PRD-Harness_Foundation.md
new file mode 100644
index 0000000..36f6338
--- /dev/null
+++ b/docs/PRD-Harness_Foundation.md
@@ -0,0 +1,391 @@
+# PRD: Harness Foundation — Phase 9
+
+## Metadata
+
+- **Owner:** Jason Woltje
+- **Date:** 2026-03-21
+- **Status:** draft
+- **Phase:** 9 (post-MVP)
+- **Version Target:** v0.2.0
+- **Agent Harness:** [Pi SDK](https://github.com/badlogic/pi-mono)
+- **Best-Guess Mode:** true
+- **Repo:** `git.mosaicstack.dev/mosaic/mosaic-stack`
+
+---
+
+## Problem Statement
+
+Mosaic Stack v0.1.0 delivered a functional skeleton — gateway boots, TUI connects, single-agent chat streams, basic auth works. But the system is not usable as a daily-driver harness:
+
+1. **Chat messages are fire-and-forget.** The WebSocket gateway never calls ConversationsRepo. Context is lost on disconnect. Conversations can't be resumed with history. Cross-interface continuity (TUI → WebUI → Matrix) is impossible.
+
+2. **Single provider (Ollama) with local models only.** No access to frontier models (Claude Opus 4.6, Codex gpt-5.4, GLM-5). The routing engine exists but has never been tested with real providers.
+
+3. **No task-aware agent routing.** A coding task and a summarization task route to the same agent with the same model. There is no mechanism to match tasks to agents by capability, cost tier, or specialization.
+
+4. **Memory is not user-scoped.** Insight vector search returns all users' data. Deploying multi-user is a security violation.
+
+5. **Agent configs exist in DB but are ignored.** Stored system prompts, model preferences, and tool allowlists don't apply to sessions. The `/model` and `/agent` slash commands are stubbed.
+
+6. **No job queue.** Background processing (summarization, GC, tier management) runs on fragile cron. No retry, no monitoring, no async task dispatch foundation for future agent orchestration.
+
+7. **Plugin system is hollow.** Zero implementations. No defined message protocol. Blocks all remote interfaces (Matrix, Discord, Telegram) planned for Phase 10+.
+
+**What this phase solves:** Transform Mosaic from a demo into a real multi-provider, task-routing AI harness that persists everything, routes intelligently, and is architecturally ready for multi-agent and remote control.
+
+---
+
+## Objectives
+
+1. **Persistent conversations** — Every message saved, every conversation resumable, full context available across interfaces
+2. **Multi-provider LLM access** — Anthropic, OpenAI, OpenRouter, Z.ai, Ollama with proper auth flows
+3. **Task-aware agent routing** — Granular routing rules that match tasks to the right agent + model by capability, cost, and domain
+4. **Security isolation** — All data queries user-scoped, ready for multi-user deployment
+5. **Session hardening** — Agent configs apply, model/agent switching works mid-session
+6. **Reliable background processing** — BullMQ job queue replaces fragile cron
+7. **Channel protocol design** — Architecture for Matrix and remote interfaces, built into the foundation now
+
+---
+
+## Scope
+
+### In Scope
+
+1. Conversation persistence — wire ChatGateway to ConversationsRepo, context loading on resume
+2. Multi-provider integration — Anthropic, OpenAI, OpenRouter, Z.ai, Ollama with auth flows
+3. Task-aware agent routing — granular routing rules with task classification and fallback chains
+4. Security isolation — user-scoped queries on all data paths (memory, conversations, agents)
+5. Agent session hardening — configs apply, model/agent switching, session resume
+6. Job queue — BullMQ replacing cron for background processing
+7. Channel protocol design — architecture document for Matrix and remote interfaces
+8. Embedding migration — Ollama-local embeddings replacing OpenAI dependency
+
+### Out of Scope
+
+1. Matrix homeserver deployment + appservice (Phase 10)
+2. Multi-agent orchestration / supervisor-worker pattern (Phase 10+)
+3. WebUI rebuild (future)
+4. Self-managing memory — compaction, merge, forget (future)
+5. Team workspace isolation (future)
+6. Remote channel plugins — WhatsApp, Discord, Telegram (Phase 10+, via Matrix)
+7. Fine-grained RBAC — project/agent/team roles (future)
+8. Agent-to-agent communication (Phase 10+)
+
+## User/Stakeholder Requirements
+
+1. As a user, I can resume a conversation after closing the TUI and the agent remembers the full context
+2. As a user, I can use frontier models (Claude Opus 4.6, Codex gpt-5.4) without manual provider configuration
+3. As a user, the system automatically selects the best model for my task (coding → powerful model, simple question → cheap model)
+4. As a user, I can override the automatic model selection with `/model ` at any time
+5. As a user, I can switch between specialized agents mid-session with `/agent `
+6. As an admin, I can define routing rules that control which models handle which task types
+7. As an admin, I can monitor background job health and retry failed jobs
+8. As a user, my conversations, memories, and preferences are invisible to other users
+
+## Functional Requirements
+
+1. FR-1: ChatGateway persists every message (user, assistant, tool call, thinking) to the conversations/messages tables
+2. FR-2: On session resume with an existing conversationId, message history is loaded from DB and injected into the agent session context
+3. FR-3: When conversation history exceeds 80% of the model's context window, older messages are summarized and prepended as a context checkpoint
+4. FR-4: Five LLM providers are registered with the gateway: Anthropic (Claude Sonnet 4.6, Opus 4.6, Haiku 4.5), OpenAI (Codex gpt-5.4), OpenRouter (dynamic model list), Z.ai (GLM-5), Ollama (local models)
+5. FR-5: Each provider supports API key auth; Anthropic and OpenAI additionally support OAuth (URL-display + callback pattern)
+6. FR-6: Provider credentials are stored per-user in the DB (encrypted), not in environment variables
+7. FR-7: A routing engine classifies each user message by taskType, complexity, domain, and required capabilities, then selects the optimal provider/model via priority-ordered rules
+8. FR-8: Default routing rules are seeded on first run; admins can customize system-wide rules; users can set per-session overrides
+9. FR-9: Routing decisions are transparent — the TUI shows which model was selected and why
+10. FR-10: Agent configs (system prompt, default model, tool allowlist, skills) stored in DB are applied when creating agent sessions
+11. FR-11: `/model ` switches the active model for subsequent messages in the current session
+12. FR-12: `/agent ` switches to a different agent config, loading its system prompt, tools, and default model
+13. FR-13: All memory queries (insight vector search, preferences) filter by userId
+14. FR-14: BullMQ handles background jobs (summarization, GC, tier management) with retry, backoff, and monitoring
+15. FR-15: Embeddings are served locally via Ollama (nomic-embed-text or mxbai-embed-large) with no external API dependency
+
+## Non-Functional Requirements
+
+1. **Security:** All data queries include userId filter. Provider credentials encrypted at rest. No cross-user data leakage. OAuth tokens stored securely with refresh handling.
+2. **Performance:** Message persistence adds <50ms to message relay latency. Routing classification <100ms per message. Provider health checks run on a configurable interval (default 60s) without blocking requests.
+3. **Reliability:** BullMQ jobs retry with exponential backoff (3 attempts default). Provider failover: if primary provider is unhealthy, fallback chain activates automatically. Conversation context survives TUI restart.
+4. **Observability:** Routing decisions logged with classification details. Job execution logged to agent_logs. Provider health status exposed via `/api/providers/health`. Session metrics (tokens, model switches, duration) persisted in DB.
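FR-3's 80% rule is easier to review with the arithmetic written out. The following is a minimal sketch, not gateway code. The `Msg` shape, the `planContext` helper, and the 4-characters-per-token estimate are all hypothetical. It checks the whole history against the threshold and, when the history is over budget, keeps the newest messages that fit and marks the older ones for summarization.

```typescript
// Hypothetical message shape; the real messages table stores more metadata (see M1-003).
interface Msg {
  role: "user" | "assistant" | "tool" | "system";
  content: string;
}

interface ContextPlan {
  toSummarize: Msg[]; // oldest messages to fold into a "context checkpoint"
  keep: Msg[]; // most recent messages passed to the model verbatim
}

// Assumption: ~4 characters per token. A real implementation would use the
// selected model's tokenizer or the provider's reported token counts.
function estimateTokens(m: Msg): number {
  return Math.ceil(m.content.length / 4);
}

// FR-3 sketch: if the full history exceeds `threshold` of the context window,
// keep the newest messages that fit inside that budget and mark the rest for
// summarization. Message order is preserved in both halves.
function planContext(history: Msg[], contextWindow: number, threshold = 0.8): ContextPlan {
  const budget = Math.floor(contextWindow * threshold);
  const total = history.reduce((sum, m) => sum + estimateTokens(m), 0);
  if (total <= budget) {
    return { toSummarize: [], keep: history };
  }

  // Walk backwards from the newest message while messages still fit.
  let used = 0;
  let cut = history.length;
  for (let i = history.length - 1; i >= 0; i--) {
    const tokens = estimateTokens(history[i]);
    if (used + tokens > budget) break;
    used += tokens;
    cut = i;
  }
  return { toSummarize: history.slice(0, cut), keep: history.slice(cut) };
}
```

In practice, the budget for the verbatim messages should also leave room for the summary itself and for the model's reply. The sketch leaves that out for brevity.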
+
+## Acceptance Criteria
+
+- [ ] AC-1: Send messages in TUI → restart TUI → resume conversation → agent has full history and context
+- [ ] AC-2: Route a coding task to Claude Opus 4.6, a simple question to Haiku, a summarization to GLM-5 — all via granular routing rules
+- [ ] AC-3: Two users exist, User A's memory searches never return User B's data
+- [ ] AC-4: `/model claude-sonnet-4-6` in TUI switches the active model for subsequent messages
+- [ ] AC-5: `/agent coding-agent` in TUI switches to a different agent with different system prompt and tools
+- [ ] AC-6: BullMQ jobs execute on schedule, failures retry with backoff, admin can inspect via `/api/admin/jobs`
+- [ ] AC-7: Channel protocol document exists with Matrix integration points defined, reviewed, and approved
+- [ ] AC-8: Embeddings run on Ollama local models (no external API dependency for vector operations)
+- [ ] AC-9: All five providers (Anthropic, OpenAI, OpenRouter, Z.ai, Ollama) connect, list models, and complete chat requests
+- [ ] AC-10: Routing transparency — TUI displays which model was selected and the routing reason for each response
+
+## Testing and Verification Expectations
+
+1. **Baseline checks:** `pnpm typecheck`, `pnpm lint`, `pnpm format:check` — all green before any push
+2. **Unit tests:** Routing engine rules matching, task classifier, provider adapter registration, message persistence
+3. **Integration tests:** Two-user isolation (M2-007), provider round-trip (M3-012), routing end-to-end (M4-013), session resume with context (M1-008)
+4. **Situational tests per milestone:** Each milestone has a verify task that exercises the delivered functionality end-to-end
+5. **Evidence format:** Test output + manual verification notes in scratchpad per milestone
+
+## Constraints and Dependencies
+
+| Type | Item | Notes |
+| ---------- | ------------------------------- | -------------------------------------------------------------------------------------- |
+| Dependency | `@anthropic-ai/sdk` | npm, required for M3-002 |
+| Dependency | `openai` | npm, required for M3-003 |
+| Dependency | `bullmq` | npm, Valkey-compatible, required for M6 |
+| Dependency | Ollama embedding models | `ollama pull nomic-embed-text`, required for M3-009 |
+| Dependency | Pi SDK provider adapter support | ASSUMPTION: supported — verify in M3-001 |
+| External | Anthropic OAuth credentials | Requires Anthropic Console setup |
+| External | OpenAI OAuth credentials | Requires OpenAI Platform setup |
+| External | Z.ai API key | Requires Z.ai account |
+| External | OpenRouter API key | Requires OpenRouter account |
+| Constraint | Valkey 8 compatibility | BullMQ requires Redis 6+; Valkey 8 is compatible |
+| Constraint | Embedding dimension migration | Switching from 1536 (OpenAI) to 768/1024 (Ollama) requires re-embedding or fresh start |
+
+---
+
+## Assumptions
+
+1. ASSUMPTION: Pi SDK supports custom provider adapters for all target LLM providers. If not, adapters wrap native SDKs behind Pi's interface. **Rationale:** Gateway already uses Pi with Ollama via a custom adapter pattern.
+2. ASSUMPTION: BullMQ is Valkey-compatible. **Rationale:** BullMQ documents Redis 6+ compatibility; Valkey 8 is Redis-compatible.
+3. ASSUMPTION: Ollama can serve embedding models (nomic-embed-text, mxbai-embed-large) with acceptable quality. **Rationale:** Ollama supports embedding endpoints natively.
+4. ASSUMPTION: Anthropic and OpenAI OAuth flows can be handled via URL-display + token callback pattern (same as existing provider auth). **Rationale:** Both providers offer standard OAuth 2.0 flows.
+5. ASSUMPTION: Z.ai GLM-5 uses an API format compatible with OpenAI or has a documented SDK. **Rationale:** Most LLM providers converge on OpenAI-compatible APIs.
+6. ASSUMPTION: The existing Pi SDK session model supports mid-session model switching without destroying session state. If not, we destroy and recreate with conversation history. **Rationale:** Acceptable fallback — context is persisted in DB.
+7. ASSUMPTION: Channel protocol design can be completed without a running Matrix homeserver. **Rationale:** Matrix protocol is well-documented; design is architecture, not integration.
+
+---
+
+## Milestones
+
+### Milestone 1: Conversation Persistence & Context
+
+**Goal:** Every message persisted. Every conversation resumable with full context.
+
+| Task | Description |
+| ------ | ------------------------------------------------------------------------------------------------------------ |
+| M1-001 | Wire ChatGateway.handleMessage() → ConversationsRepo.addMessage() for user messages |
+| M1-002 | Wire agent event relay → ConversationsRepo.addMessage() for assistant responses (text, tool calls, thinking) |
+| M1-003 | Store message metadata: model used, provider, token counts, tool call details, timestamps |
+| M1-004 | On session resume (existing conversationId), load message history from DB and inject into Pi session context |
+| M1-005 | Context window management: if history exceeds model context, summarize older messages and prepend summary |
+| M1-006 | Conversation search: full-text search on messages table via `/api/conversations/search` |
+| M1-007 | TUI: `/history` command to display conversation message count and context usage |
+| M1-008 | Verify: send messages → kill TUI → resume with `-c ` → agent references prior context |
+
+### Milestone 2: Security & Isolation
+
+**Goal:** All data queries user-scoped. Safe for multi-user deployment.
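Here is a minimal sketch of this isolation rule applied to vector search. It uses a hypothetical in-memory `InsightStore` in place of the real pgvector-backed InsightsRepo. The point it shows is that `userId` is a required argument and the filter runs before ranking. In SQL, the equivalent is a `WHERE user_id = $1` predicate in the same statement as `ORDER BY embedding <=> $2 LIMIT $3`; the column names there are assumptions.

```typescript
// Hypothetical row shape for a stored insight.
interface Insight {
  id: string;
  userId: string;
  embedding: number[];
  text: string;
}

// Cosine similarity; returns 0 for zero-length vectors instead of NaN.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class InsightStore {
  private rows: Insight[] = [];

  add(row: Insight): void {
    this.rows.push(row);
  }

  // userId is mandatory, so a caller cannot silently search everyone's data.
  // Filtering happens before ranking: taking a global top-k and filtering
  // afterwards would return fewer than k hits whenever other users' rows rank higher.
  searchByEmbedding(userId: string, query: number[], k: number): Insight[] {
    return this.rows
      .filter((r) => r.userId === userId)
      .map((r) => ({ r, score: cosine(r.embedding, query) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((x) => x.r);
  }
}
```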
+
+| Task | Description |
+| ------ | --------------------------------------------------------------------------------------------------------------- |
+| M2-001 | Audit InsightsRepo: add `userId` filter to `searchByEmbedding()` vector search |
+| M2-002 | Audit InsightsRepo: add `userId` filter to `findByUser()`, `decayOldInsights()` |
+| M2-003 | Audit PreferencesRepo: verify all queries filter by userId |
+| M2-004 | Audit agent memory tools: verify `memory_search`, `memory_save_*`, `memory_get_*` all scope to session user |
+| M2-005 | Audit ConversationsRepo: verify ownership check on findById, update, delete, addMessage, findMessages |
+| M2-006 | Audit AgentsRepo: verify `findAccessible()` returns only user's agents + system agents |
+| M2-007 | Add integration test: create two users, populate data for each, verify cross-user isolation on every query path |
+| M2-008 | Audit Valkey keys: verify session keys include userId or are not enumerable across users |
+
+### Milestone 3: Provider Integration
+
+**Goal:** Five providers operational with proper auth, health checking, and capability metadata.
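This milestone depends on one adapter contract. Below is a hypothetical TypeScript shape for `IProviderAdapter` together with a small registry that lists models only from providers whose last health probe succeeded. The `listModels` and `healthCheck` names follow M3-001. The signatures, the `ModelInfo` fields, and the `ProviderRegistry` class are assumptions. `register()` sits on the registry rather than on the adapter, and `createClient()` is left out.

```typescript
// Hypothetical per-model metadata (a subset of the M3-008 capability matrix).
interface ModelInfo {
  id: string;
  provider: string;
  tier: "cheap" | "standard" | "premium" | "local";
  contextWindow: number;
  supportsTools: boolean;
}

interface HealthStatus {
  healthy: boolean;
  error?: string;
}

// Method names follow M3-001; the signatures are assumptions.
interface IProviderAdapter {
  readonly name: string;
  listModels(): Promise<ModelInfo[]>;
  healthCheck(): Promise<HealthStatus>;
}

class ProviderRegistry {
  private readonly adapters = new Map<string, IProviderAdapter>();
  private readonly health = new Map<string, HealthStatus>();

  register(adapter: IProviderAdapter): void {
    this.adapters.set(adapter.name, adapter);
  }

  // One probe round (M3-007 would run this on an interval). A probe that
  // throws marks that provider unhealthy instead of failing the whole round.
  async refreshHealth(): Promise<void> {
    const adapters = Array.from(this.adapters.values());
    await Promise.all(
      adapters.map(async (adapter) => {
        try {
          this.health.set(adapter.name, await adapter.healthCheck());
        } catch (err) {
          this.health.set(adapter.name, { healthy: false, error: String(err) });
        }
      }),
    );
  }

  isHealthy(provider: string): boolean {
    return this.health.get(provider)?.healthy === true;
  }

  // Models from providers whose last probe succeeded; unprobed providers are excluded.
  async availableModels(): Promise<ModelInfo[]> {
    const healthy = Array.from(this.adapters.values()).filter((a) => this.isHealthy(a.name));
    const lists = await Promise.all(healthy.map((a) => a.listModels()));
    return ([] as ModelInfo[]).concat(...lists);
  }
}
```

Keeping health state in the registry lets request paths call the synchronous `isHealthy()` without waiting for a probe. This matches the NFR that health checks must not block requests.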
+
+| Task | Description |
+| ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| M3-001 | Refactor ProviderService into provider adapter pattern: `IProviderAdapter` interface with `register()`, `listModels()`, `healthCheck()`, `createClient()` |
+| M3-002 | Anthropic adapter: `@anthropic-ai/sdk`, register Claude Sonnet 4.6, Opus 4.6 + Haiku 4.5, OAuth flow (URL display + callback), API key fallback |
+| M3-003 | OpenAI adapter: `openai` SDK, register Codex gpt-5.4, OAuth flow, API key fallback |
+| M3-004 | OpenRouter adapter: OpenAI-compatible client, API key auth, dynamic model list from `/api/v1/models` |
+| M3-005 | Z.ai GLM adapter: register GLM-5, API key auth, research and implement API format |
+| M3-006 | Ollama adapter: refactor existing Ollama integration into adapter pattern, add embedding model support |
+| M3-007 | Provider health check: periodic probe (configurable interval), status per provider, expose via `/api/providers/health` |
+| M3-008 | Model capability matrix: define per-model metadata (tier, context window, tool support, vision, streaming, embedding capable) |
+| M3-009 | Refactor EmbeddingService: replace OpenAI-hardcoded client with provider-agnostic interface, Ollama as default (nomic-embed-text or mxbai-embed-large) |
+| M3-010 | OAuth token storage: persist provider tokens per user in DB (encrypted), refresh flow |
+| M3-011 | Provider config UI support: `/api/providers` CRUD for user-scoped provider credentials |
+| M3-012 | Verify: each provider connects, lists models, completes a chat request, handles errors gracefully |
+
+### Milestone 4: Agent Routing Engine
+
+**Goal:** Granular, rule-based routing that matches tasks to the right agent and model by capability, cost, and domain specialization.
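Routing starts with classification. Below is a minimal rule-based classifier in the spirit of M4-005 ("rule-based regex/keyword initially"). The task types and complexity levels come from M4-002. The keyword patterns, the word-count thresholds, and the `classify` name are illustrative assumptions, not seed data.

```typescript
// Task types and complexity levels from M4-002.
type TaskType = "coding" | "research" | "summarization" | "conversation" | "analysis" | "creative";
type Complexity = "simple" | "moderate" | "complex";

interface Classification {
  taskType: TaskType;
  complexity: Complexity;
}

// Illustrative keyword patterns (assumptions, not seed data). First match wins,
// so more specific intents are listed before broader ones.
const TASK_PATTERNS: Array<[TaskType, RegExp]> = [
  ["summarization", /\b(summari[sz]e|tl;?dr|recap)\b/i],
  ["coding", /\b(code|function|bug|refactor|typescript|compile|stack trace)\b/i],
  ["research", /\b(research|compare|sources?|look up)\b/i],
  ["analysis", /\b(analy[sz]e|evaluate|trade-?offs?)\b/i],
  ["creative", /\b(poem|story|slogan|brainstorm)\b/i],
];

// Words that hint at multi-step work (also an assumption).
const MULTI_STEP = /\b(then|step by step|architecture|end-to-end)\b/i;

function classify(message: string): Classification {
  const hit = TASK_PATTERNS.find(([, pattern]) => pattern.test(message));
  const taskType: TaskType = hit ? hit[0] : "conversation";

  // Crude complexity proxy: message length, bumped up by multi-step wording.
  const words = message.trim().split(/\s+/).length;
  const multiStep = MULTI_STEP.test(message);
  let complexity: Complexity = "simple";
  if (words > 80 || (multiStep && words > 25)) complexity = "complex";
  else if (words > 20 || multiStep) complexity = "moderate";

  return { taskType, complexity };
}
```

A keyword pass like this is far inside the <100ms classification budget in the NFRs. An LLM-assisted classifier could later replace `classify` and keep the same return type.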
+
+| Task | Description |
+| ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| M4-001 | Define routing rule schema: `RoutingRule { name, priority, conditions[], action }` stored in DB |
+| M4-002 | Condition types: `taskType` (coding, research, summarization, conversation, analysis, creative), `complexity` (simple, moderate, complex), `domain` (frontend, backend, devops, docs, general), `costTier` (cheap, standard, premium), `requiredCapabilities` (tools, vision, long-context, reasoning) |
+| M4-003 | Action types: `routeTo { provider, model, agentConfigId?, systemPromptOverride?, toolAllowlist? }` |
+| M4-004 | Default routing rules (seed data): coding → Opus 4.6, simple Q&A → Sonnet 4.6, summarization → GLM-5, research → Codex gpt-5.4, local/offline → Ollama llama3.2 |
+| M4-005 | Task classification: lightweight classifier that infers taskType + complexity from user message (can be rule-based regex/keyword initially, LLM-assisted later) |
+| M4-006 | Routing decision pipeline: classify task → match rules by priority → select best available provider/model → fallback chain if primary unavailable |
+| M4-007 | Routing override: user can force a specific model via `/model ` regardless of routing rules |
+| M4-008 | Routing transparency: include routing decision in `session:info` event (why this model was selected) |
+| M4-009 | Routing rules CRUD: `/api/routing/rules` — list, create, update, delete, reorder priority |
+| M4-010 | Per-user routing overrides: users can customize default rules for their sessions |
+| M4-011 | Agent specialization: agents can declare capabilities in their config (domains, preferred models, tool sets) |
+| M4-012 | Routing integration: wire routing engine into ChatGateway — every new message triggers routing decision before agent dispatch |
+| M4-013 | Verify: send a coding question → routed to Opus; send "summarize this" → routed to GLM-5; send "what time is it" → routed to cheap tier |
+
+### Milestone 5: Agent Session Hardening
+
+**Goal:** Agent configs apply to sessions. Model and agent switching work mid-session.
+
+| Task | Description |
+| ------ | ------------------------------------------------------------------------------------------------------------------------------------------ |
+| M5-001 | Wire ChatGateway: on session create, load agent config from DB (system prompt, model, provider, tool allowlist, skills) |
+| M5-002 | `/model ` command: end-to-end wiring — TUI → socket `command:execute` → gateway switches provider/model → new messages use new model |
+| M5-003 | `/agent ` command: switch to different agent config mid-session — loads new system prompt, tools, and default model |
+| M5-004 | Session ↔ conversation binding: persist sessionId on conversation record, allow session resume via conversation ID |
+| M5-005 | Session info broadcast: on model/agent switch, emit `session:info` with updated provider, model, agent name |
+| M5-006 | Agent creation from TUI: `/agent new` command creates agent config via gateway API |
+| M5-007 | Session metrics: track per-session token usage, model switches, duration — persist in DB |
+| M5-008 | Verify: start TUI → `/model claude-opus-4-6` → verify response uses Opus → `/agent research-bot` → verify system prompt changes |
+
+### Milestone 6: Job Queue Foundation
+
+**Goal:** Reliable background processing via BullMQ. Foundation for future agent task orchestration.
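The Reliability NFR calls for three attempts with exponential backoff by default. BullMQ's documentation describes its built-in `exponential` backoff as waiting `2 ^ (attempts - 1) * delay` milliseconds. Confirm this against the version pinned in M6-001. The sketch below copies only the shape of BullMQ's `attempts`/`backoff` job options and computes the resulting schedule locally, so it runs without Valkey. `RetryOptions`, `DEFAULT_JOB_OPTIONS`, and `retrySchedule` are hypothetical names.

```typescript
// Mirrors the shape of BullMQ's `attempts` / `backoff` job options
// (verify against the installed version); defined locally so this runs without Valkey.
interface RetryOptions {
  attempts: number; // total tries, including the first run
  backoff: { type: "exponential" | "fixed"; delay: number }; // delay in ms
}

const DEFAULT_JOB_OPTIONS: RetryOptions = {
  attempts: 3, // NFR default
  backoff: { type: "exponential", delay: 1000 },
};

// Delay before each retry. After the n-th failed attempt (n = 1, 2, ...),
// exponential backoff waits delay * 2^(n - 1); fixed backoff always waits delay.
function retrySchedule(opts: RetryOptions): number[] {
  const delays: number[] = [];
  for (let failed = 1; failed < opts.attempts; failed++) {
    delays.push(
      opts.backoff.type === "exponential" ? opts.backoff.delay * 2 ** (failed - 1) : opts.backoff.delay,
    );
  }
  return delays;
}
```

With the defaults above, a job that keeps failing runs three times, with waits of 1 s and 2 s between runs, and is then marked failed. That failed state is what the M6-006 admin endpoint would list. With the real library, these fields go in the options passed to `Queue.add(name, data, opts)` or in the queue's `defaultJobOptions`.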
+
+| Task | Description |
+| ------ | ------------------------------------------------------------------------------------------------------------ |
+| M6-001 | Add BullMQ dependency, configure with Valkey connection |
+| M6-002 | Create queue service: typed job definitions, worker registration, error handling with exponential backoff |
+| M6-003 | Migrate summarization cron → BullMQ repeatable job |
+| M6-004 | Migrate GC (session cleanup) → BullMQ repeatable job |
+| M6-005 | Migrate tier management (log archival) → BullMQ repeatable job |
+| M6-006 | Admin jobs API: `GET /api/admin/jobs` — list active/completed/failed jobs, retry failed, pause/resume queues |
+| M6-007 | Job event logging: emit job start/complete/fail events to agent_logs for observability |
+| M6-008 | Verify: jobs execute on schedule, deliberate failure retries with backoff, admin endpoint shows job history |
+
+### Milestone 7: Channel Protocol Design
+
+**Goal:** Architecture document defining how remote interfaces (Matrix, Discord, Telegram) will integrate. No code — design only. Built into foundation now so Phase 10+ doesn't require gateway rewrites.
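This milestone ships no production code, but a type sketch could make the M7-001 and M7-002 discussions concrete. Everything below is hypothetical: the fields of the canonical `ChannelMessage`, the `IChannelAdapter` method signatures, and the in-memory `LoopbackAdapter` that exists only to exercise the contract.

```typescript
// Hypothetical canonical message that every channel adapter translates to and from.
interface ChannelMessage {
  channel: "tui" | "web" | "matrix" | "discord" | "telegram";
  channelUserId: string; // identity on the remote side, e.g. a Matrix user ID
  mosaicUserId?: string; // set by identity mapping; absent means the account is unlinked
  conversationId?: string; // room/thread mapped to a Mosaic conversation
  content: string;
  attachments: { name: string; mimeType: string; url: string }[];
  sentAt: string; // ISO-8601 timestamp
}

interface IChannelAdapter {
  readonly channel: ChannelMessage["channel"];
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  health(): Promise<boolean>;
  // Inbound: remote event -> canonical message handed to the gateway.
  onMessage(handler: (msg: ChannelMessage) => Promise<void>): void;
  // Outbound: gateway reply -> remote channel.
  sendMessage(msg: ChannelMessage): Promise<void>;
}

// In-memory adapter used only to exercise the contract.
class LoopbackAdapter implements IChannelAdapter {
  readonly channel = "matrix" as const;
  readonly sent: ChannelMessage[] = [];
  private connected = false;
  private handler?: (msg: ChannelMessage) => Promise<void>;

  async connect(): Promise<void> {
    this.connected = true;
  }

  async disconnect(): Promise<void> {
    this.connected = false;
  }

  async health(): Promise<boolean> {
    return this.connected;
  }

  onMessage(handler: (msg: ChannelMessage) => Promise<void>): void {
    this.handler = handler;
  }

  async sendMessage(msg: ChannelMessage): Promise<void> {
    if (!this.connected) throw new Error("not connected");
    this.sent.push(msg);
  }

  // Simulates an event arriving from the remote channel.
  async simulateInbound(msg: ChannelMessage): Promise<void> {
    if (this.handler) await this.handler(msg);
  }
}
```

In this sketch, `mosaicUserId` is optional so that auth bridging (M7-005) remains a separate step. A gateway-side handler would drop or challenge messages from unlinked identities before any agent sees them.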
+ +| Task | Description | +| ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| M7-001 | Define `IChannelAdapter` interface: lifecycle (connect, disconnect, health), message flow (receiveMessage → gateway, sendMessage ← gateway), identity mapping (channel user ↔ Mosaic user) | +| M7-002 | Define channel message protocol: canonical message format that all adapters translate to/from (content, metadata, attachments, thread context) | +| M7-003 | Design Matrix integration: appservice registration, room ↔ conversation mapping, space ↔ team mapping, agent ghost users, power levels for human observation | +| M7-004 | Design conversation multiplexing: same conversation accessible from TUI + WebUI + Matrix simultaneously, real-time sync via gateway events | +| M7-005 | Design remote auth bridging: how a Matrix/Discord message authenticates to Mosaic (token linking, OAuth bridge, invite-based provisioning) | +| M7-006 | Design agent-to-agent communication via Matrix rooms: room per agent pair, human can join to observe, message format for structured agent dialogue | +| M7-007 | Design multi-user isolation in Matrix: space-per-team, room visibility rules, encryption considerations, admin visibility | +| M7-008 | Publish architecture doc: `docs/architecture/channel-protocol.md` — reviewed and approved before Phase 10 | + +--- + +## Technical Approach + +### Pi SDK Provider Adapter Pattern + +The agent layer stays on Pi SDK. Provider diversity is solved at the adapter layer below Pi: + +``` +Provider SDKs (@anthropic-ai/sdk, openai, etc.) + → IProviderAdapter implementations + → ProviderRegistry (Pi SDK compatible) + → Agent Session (Pi SDK) — tool loops, streaming, context + → AgentService — lifecycle, routing, events + → ChatGateway — WebSocket to all interfaces +``` + +Adding a provider means implementing `IProviderAdapter`. 
Everything above stays unchanged. + +### Routing Decision Flow + +``` +User sends message + → Task classifier (regex/keyword, optionally LLM-assisted) + → { taskType, complexity, domain, requiredCapabilities } + → RoutingEngine.resolve(classification, userOverrides, availableProviders) + → Match rules by priority + → Check provider health + → Apply fallback chain + → Return { provider, model, agentConfigId } + → AgentService.createOrResumeSession(routingResult) + → Session uses selected provider/model + → Emit session:info with routing decision explanation +``` + +### Embedding Strategy + +Replace OpenAI-hardcoded embedding service with provider-agnostic interface: + +- **Default:** Ollama serving `nomic-embed-text` (768-dim) or `mxbai-embed-large` (1024-dim) +- **Fallback:** Any OpenAI-compatible embedding API +- **Migration:** Update pgvector column dimension if switching from 1536 (OpenAI) to 768/1024 (Ollama models) +- **No external API dependency** for vector operations in default configuration + +### Context Window Management + +When conversation history exceeds model context: + +1. Calculate token count of full history +2. If exceeds 80% of model context window, trigger summarization +3. Summarize oldest N messages into a condensed context block +4. Prepend summary + keep recent messages within context budget +5. 
Store summary as a "context checkpoint" message in DB + +### Model Reference + +| Provider | Model | Tier | Context | Tools | Vision | Embedding | +| ---------- | ----------------- | ---------- | ------- | ------ | ------ | -------------- | +| Anthropic | Claude Opus 4.6 | premium | 200K | yes | yes | no | +| Anthropic | Claude Sonnet 4.6 | standard | 200K | yes | yes | no | +| Anthropic | Claude Haiku 4.5 | cheap | 200K | yes | yes | no | +| OpenAI | Codex gpt-5.4 | premium | 128K+ | yes | yes | no | +| Z.ai | GLM-5 | standard | TBD | TBD | TBD | no | +| OpenRouter | varies | varies | varies | varies | varies | no | +| Ollama | llama3.2 | local/free | 128K | yes | no | no | +| Ollama | nomic-embed-text | — | — | — | — | yes (768-dim) | +| Ollama | mxbai-embed-large | — | — | — | — | yes (1024-dim) | + +### Default Routing Rules (Seed Data) + +| Priority | Condition | Route To | +| -------- | ------------------------------------------------------------- | ------------- | +| 1 | taskType=coding AND complexity=complex | Opus 4.6 | +| 2 | taskType=coding AND complexity=moderate | Sonnet 4.6 | +| 3 | taskType=coding AND complexity=simple | Codex gpt-5.4 | +| 4 | taskType=research | Codex gpt-5.4 | +| 5 | taskType=summarization | GLM-5 | +| 6 | taskType=analysis AND requiredCapabilities includes reasoning | Opus 4.6 | +| 7 | taskType=conversation | Sonnet 4.6 | +| 8 | taskType=creative | Sonnet 4.6 | +| 9 | costTier=cheap OR domain=general | Haiku 4.5 | +| 10 | fallback (no rule matched) | Sonnet 4.6 | +| 99 | provider=ollama forced OR offline mode | llama3.2 | + +Rules are user-customizable. Admins set system defaults; users override for their sessions. 
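The seed rules are meant to be evaluated in priority order. As a minimal TypeScript sketch, not the Mosaic implementation, the "match rules by priority → check provider health → apply fallback chain" steps could look like the block below. `Classification`, `RoutingRule`, `RouteTo`, `resolveRoute`, the provider keys (`anthropic`, `zai`), and the model ids other than `claude-sonnet-4-6` are hypothetical. Conditions are modeled as predicate functions rather than the stored `conditions[]` schema planned for M4-001, and the catch-all default is passed as an argument instead of being stored as a priority-10 rule.

```typescript
// Hypothetical sketch of priority-ordered routing with a provider health check.
// Type names, provider keys, and most model ids are assumptions for illustration.

type TaskType = "coding" | "research" | "summarization" | "analysis" | "conversation" | "creative";
type Complexity = "simple" | "moderate" | "complex";

interface Classification {
  taskType: TaskType;
  complexity: Complexity;
  domain: string;
  requiredCapabilities: string[];
}

interface RouteTo {
  provider: string;
  model: string;
}

interface RoutingRule {
  name: string;
  priority: number; // lower number is evaluated first
  matches: (c: Classification) => boolean;
  action: RouteTo;
}

interface RoutingResult {
  route: RouteTo;
  reason: string; // human-readable explanation, e.g. for a session:info event
}

function resolveRoute(
  rules: RoutingRule[],
  c: Classification,
  healthyProviders: ReadonlySet<string>,
  fallback: RouteTo,
): RoutingResult {
  // Copy before sorting so the caller's rule array is not mutated.
  const ordered = [...rules].sort((a, b) => a.priority - b.priority);
  for (const rule of ordered) {
    if (!rule.matches(c)) continue;
    // A matching rule whose provider is unhealthy is skipped, so the next
    // matching rule (if any) acts as the fallback chain.
    if (!healthyProviders.has(rule.action.provider)) continue;
    return { route: rule.action, reason: `matched rule "${rule.name}" (priority ${rule.priority})` };
  }
  return { route: fallback, reason: "no healthy rule matched; using default" };
}

// A few rows from the seed table, expressed as rules.
const rules: RoutingRule[] = [
  {
    name: "complex coding",
    priority: 1,
    matches: (c) => c.taskType === "coding" && c.complexity === "complex",
    action: { provider: "anthropic", model: "claude-opus-4-6" },
  },
  {
    name: "moderate coding",
    priority: 2,
    matches: (c) => c.taskType === "coding" && c.complexity === "moderate",
    action: { provider: "anthropic", model: "claude-sonnet-4-6" },
  },
  {
    name: "summarization",
    priority: 5,
    matches: (c) => c.taskType === "summarization",
    action: { provider: "zai", model: "glm-5" },
  },
];

const fallback: RouteTo = { provider: "anthropic", model: "claude-sonnet-4-6" };

// Z.ai is treated as unhealthy here, so the summarization rule is skipped.
const result = resolveRoute(
  rules,
  { taskType: "summarization", complexity: "simple", domain: "general", requiredCapabilities: [] },
  new Set(["anthropic"]),
  fallback,
);
console.log(result.route.model, "-", result.reason);
// prints: claude-sonnet-4-6 - no healthy rule matched; using default
```

Skipping an unhealthy provider and continuing down the list is one possible reading of "check provider health → apply fallback chain". Per-user overrides (M4-010) and the `/model` override (M4-007) could be layered on top, for example by short-circuiting before rule evaluation when an explicit model is set.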
+ +--- + +## Risks and Open Questions + +| Risk | Impact | Mitigation | +| ------------------------------------------------------- | ------------------------- | ----------------------------------------------------------------------------------------------------------------- | +| Pi SDK doesn't support custom provider adapters cleanly | High — blocks M3 | Verify in M3-001; fallback: wrap native SDKs and bypass Pi's registry, feeding responses into Pi's session format | +| BullMQ + Valkey incompatibility | Medium — blocks M6 | Test in M6-001 before migrating jobs; fallback: use `bullmq` with `ioredis` directly | +| Embedding dimension migration (1536 → 768/1024) | Medium — data migration | Run migration script to re-embed existing insights; or start fresh if insight count is low | +| Z.ai GLM-5 API undocumented | Low — blocks one provider | Deprioritize; other 4 providers cover all use cases | +| Context window summarization quality | Medium — affects UX | Start with simple truncation; add LLM summarization iteratively | +| OAuth flow complexity in TUI (no browser redirect) | Medium | URL-display + clipboard + Valkey poll token pattern (already designed in P8-012) | + +### Open Questions + +1. What is the Z.ai GLM-5 API format? OpenAI-compatible or custom SDK? (Research in M3-005) +2. Should routing classification use LLM-assisted classification from the start, or rule-based only? (ASSUMPTION: rule-based first, LLM-assisted later) +3. What Ollama embedding model provides the best quality/performance tradeoff? (Test nomic-embed-text vs mxbai-embed-large in M3-009) +4. Should provider credentials be stored in DB per-user, or remain environment-variable based for system-wide providers? (ASSUMPTION: hybrid — env vars for system defaults, DB for per-user overrides) + +--- + +## Milestone / Delivery Intent + +1. **Target version:** v0.2.0 +2. **Milestone count:** 7 +3. 
**Definition of done:** All 10 acceptance criteria verified with evidence, all quality gates green, PRD status updated to `completed` +4. **Delivery order:** M1 (persistence) → M2 (security) → M3 (providers) → M4 (routing) → M5 (sessions) → M6 (jobs) → M7 (channel design) +5. **M1 and M2 are prerequisites** — no provider or routing work begins until conversations persist and data is user-scoped diff --git a/docs/TASKS.md b/docs/TASKS.md index 10f6f97..d98f9af 100644 --- a/docs/TASKS.md +++ b/docs/TASKS.md @@ -1,100 +1,74 @@ -# Tasks — MVP +# Tasks — Harness Foundation > Single-writer: orchestrator only. Workers read but never modify. > > **`agent` column values:** `codex` | `sonnet` | `haiku` | `glm-5` | `opus` | `—` (auto/default) > Pipeline crons pick the cheapest capable model. Override with a specific value when a task genuinely needs it. -> Examples: `opus` for major architecture decisions, `codex` for pure coding, `haiku` for review/verify gates, `glm-5` for cost-sensitive coding. 
-| id | status | agent | milestone | description | pr | notes | -| ------ | ------ | ------- | -------------------------------------------------------------------------------------------------- | ------------------------------------------------------------ | ------------- | ----- | -| P0-001 | done | Phase 0 | Scaffold monorepo | #60 | #1 | -| P0-002 | done | Phase 0 | @mosaic/types — migrate and extend shared types | #65 | #2 | -| P0-003 | done | Phase 0 | @mosaic/db — Drizzle schema and PG connection | #67 | #3 | -| P0-004 | done | Phase 0 | @mosaic/auth — BetterAuth email/password setup | #68 | #4 | -| P0-005 | done | Phase 0 | Docker Compose — PG 17, Valkey 8, SigNoz | #65 | #5 | -| P0-006 | done | Phase 0 | OTEL foundation — OpenTelemetry SDK setup | #65 | #6 | -| P0-007 | done | Phase 0 | CI pipeline — Woodpecker config | #69 | #7 | -| P0-008 | done | Phase 0 | Project docs — AGENTS.md, CLAUDE.md, README | #69 | #8 | -| P0-009 | done | Phase 0 | Verify Phase 0 — CI green, all packages build | #70 | #9 | -| P1-001 | done | Phase 1 | apps/gateway scaffold — NestJS + Fastify adapter | #61 | #10 | -| P1-002 | done | Phase 1 | Auth middleware — BetterAuth session validation | #71 | #11 | -| P1-003 | done | Phase 1 | @mosaic/brain — migrate from v0, PG backend | #71 | #12 | -| P1-004 | done | Phase 1 | @mosaic/queue — migrate from v0 | #71 | #13 | -| P1-005 | done | Phase 1 | Gateway routes — conversations CRUD + messages | #72 | #14 | -| P1-006 | done | Phase 1 | Gateway routes — tasks, projects, missions CRUD | #72 | #15 | -| P1-007 | done | Phase 1 | WebSocket server — chat streaming | #61 | #16 | -| P1-008 | done | Phase 1 | Basic agent dispatch — single provider | #61 | #17 | -| P1-009 | done | Phase 1 | Verify Phase 1 — gateway functional, API tested | #73 | #18 | -| P2-001 | done | Phase 2 | @mosaic/agent — Pi SDK integration + agent pool | #61 | #19 | -| P2-002 | done | Phase 2 | Multi-provider support — Anthropic + Ollama | #74 | #20 | -| P2-003 | done | 
Phase 2 | Agent routing engine — cost/capability matrix | #75 | #21 | -| P2-004 | done | Phase 2 | Tool registration — brain, queue, memory tools | #76 | #22 | -| P2-005 | done | Phase 2 | @mosaic/coord — migrate from v0, gateway integration | #77 | #23 | -| P2-006 | done | Phase 2 | Agent session management — tmux + monitoring | #78 | #24 | -| P2-007 | done | Phase 2 | Verify Phase 2 — multi-provider routing works | #79 | #25 | -| P3-001 | done | Phase 3 | apps/web scaffold — Next.js 16 + BetterAuth + Tailwind | #82 | #26 | -| P3-002 | done | Phase 3 | Auth pages — login, registration, SSO redirect | #83 | #27 | -| P3-003 | done | Phase 3 | Chat UI — conversations, messages, streaming | #84 | #28 | -| P3-004 | done | Phase 3 | Task management — list view + kanban board | #86 | #29 | -| P3-005 | done | Phase 3 | Project & mission views — dashboard + PRD viewer | #87 | #30 | -| P3-006 | done | Phase 3 | Settings — provider config, profile, integrations | #88 | #31 | -| P3-007 | done | Phase 3 | Admin panel — user management, RBAC | #89 | #32 | -| P3-008 | done | Phase 3 | Verify Phase 3 — web dashboard functional E2E | — | #33 | -| P4-001 | done | Phase 4 | @mosaic/memory — preference + insight stores | — | #34 | -| P4-002 | done | Phase 4 | Semantic search — pgvector embeddings + search API | — | #35 | -| P4-003 | done | Phase 4 | @mosaic/log — log ingest, parsing, tiered storage | — | #36 | -| P4-004 | done | Phase 4 | Summarization pipeline — Haiku-tier LLM + cron | — | #37 | -| P4-005 | done | Phase 4 | Memory integration — inject into agent sessions | — | #38 | -| P4-006 | done | Phase 4 | Skill management — catalog, install, config | — | #39 | -| P4-007 | done | Phase 4 | Verify Phase 4 — memory + log pipeline working | — | #40 | -| P5-001 | done | Phase 5 | Plugin host — gateway plugin loading + channel interface | — | #41 | -| P5-002 | done | Phase 5 | @mosaic/discord-plugin — Discord bot + channel plugin | #61 | #42 | -| P5-003 | done | Phase 5 | 
@mosaic/telegram-plugin — Telegraf bot + channel plugin | — | #43 | -| P5-004 | done | Phase 5 | SSO — Authentik OIDC adapter end-to-end | — | #44 | -| P5-005 | done | Phase 5 | Verify Phase 5 — Discord + Telegram + SSO working | #99 | #45 | -| P6-001 | done | Phase 6 | @mosaic/cli — unified CLI binary + subcommands | #104 | #46 | -| P6-002 | done | Phase 6 | @mosaic/prdy — migrate PRD wizard from v0 | #101 | #47 | -| P6-003 | done | Phase 6 | @mosaic/quality-rails — migrate scaffolder from v0 | #100 | #48 | -| P6-004 | done | Phase 6 | @mosaic/mosaic — install wizard for v1 | #103 | #49 | -| P6-005 | done | Phase 6 | Pi TUI integration — mosaic tui | #61 | #50 | -| P6-006 | done | Phase 6 | Verify Phase 6 — CLI functional, all subcommands | — | #51 | -| P7-009 | done | Phase 7 | Web chat — WebSocket integration, streaming, conversation switching | #136 | #120 W1 done | -| P7-001 | done | Phase 7 | MCP endpoint hardening — streamable HTTP transport | #137 | #52 W1 done | -| P7-010 | done | Phase 7 | Web conversation management — list, search, rename, delete, archive | #139 | #121 W2 done | -| P7-015 | done | Phase 7 | Agent tool expansion — file ops, git, shell exec, web fetch | #138 | #126 W2 done | -| P7-011 | done | Phase 7 | Web project detail views — missions, tasks, PRDs, dashboards | #140 | #122 W3 done | -| P7-016 | done | Phase 7 | MCP client — gateway connects to external MCP servers as tools | #141 | #127 W3 done | -| P7-012 | done | Phase 7 | Web provider management UI — add, configure, test LLM providers | #142 | #123 W4 done | -| P7-017 | done | Phase 7 | Agent skill invocation — load and execute skills from catalog | #143 | #128 W4 done | -| P7-013 | done | Phase 7 | Web settings persistence — profile, preferences save to DB | #145 | #124 W5 done | -| P7-018 | done | Phase 7 | CLI model/provider switching — --model, --provider, /model in TUI | #144 | #129 W5 done | -| P7-014 | done | Phase 7 | Web admin panel — user CRUD, role assignment, system 
health | #150 | #125 W6 done | -| P7-019 | done | Phase 7 | CLI session management — list, resume, destroy sessions | #146 | #130 W6 done | -| P7-020 | done | Phase 7 | Coord DB migration — project-scoped missions, multi-tenant RBAC | #149 | #131 W7 done | -| FIX-02 | done | Backlog | TUI agent:end — fix React state updater side-effect | #147 | #133 W8 done | -| FIX-03 | done | Backlog | Agent session — cwd sandbox, system prompt, tool restrictions | #148 | #134 W8 done | -| P7-004 | done | Phase 7 | E2E test suite — Playwright critical paths | #152 | #55 W9 done | -| P7-006 | done | Phase 7 | Documentation — user guide, admin guide, dev guide | #151 | #57 W9 done | -| P7-007 | done | Phase 7 | Bare-metal deployment docs + .env.example | #153 | #58 W9 done | -| P7-021 | done | Phase 7 | Verify Phase 7 — feature-complete platform E2E | — | #132 W10 done | -| P8-005 | done | Phase 8 | CLI command architecture — DB schema + brain repo + gateway endpoints | #158 | | -| P8-006 | done | Phase 8 | CLI command architecture — agent, mission, prdy commands + TUI mods | #158 | | -| P8-007 | done | Phase 8 | DB migrations — preferences.mutable + teams + team_members + projects.teamId | #175 | #160 | -| P8-008 | done | Phase 8 | @mosaic/types — CommandDef, CommandManifest, new socket events | #174 | #161 | -| P8-009 | done | Phase 8 | TUI Phase 1 — slash command parsing, local commands, system message rendering, InputBar wiring | #176 | #162 | -| P8-010 | done | Phase 8 | Gateway Phase 2 — CommandRegistryService, CommandExecutorService, socket + REST commands | #178 | #163 | -| P8-011 | done | Phase 8 | Gateway Phase 3 — PreferencesService, /preferences REST, /system Valkey override, prompt injection | #180 | #164 | -| P8-012 | done | Phase 8 | Gateway Phase 4 — /agent, /provider (URL+clipboard), /mission, /prdy, /tools commands | #181 | #165 | -| P8-013 | done | Phase 8 | Gateway Phase 5 — MosaicPlugin lifecycle, ReloadService, hot reload, system:reload TUI | #182 | #166 | -| 
P8-014 | done | Phase 8 | Gateway Phase 6 — SessionGCService (all tiers), /gc command, cron integration | #179 | #167 | -| P8-015 | done | Phase 8 | Gateway Phase 7 — WorkspaceService, ProjectBootstrapService, teams project ownership | #183 | #168 | -| P8-016 | done | Phase 8 | Security — file/git/shell tool strict path hardening, sandbox escape prevention | #177 | #169 | -| P8-017 | done | Phase 8 | TUI Phase 8 — autocomplete sidebar, fuzzy match, arg hints, up-arrow history | #184 | #170 | -| P8-018 | done | Phase 8 | Spin-off plan stubs — Gatekeeper, Task Queue Unification, Chroot Sandboxing | — | #171 | -| P8-019 | done | Phase 8 | Verify Platform Architecture — integration + E2E verification | #185 | #172 | -| P8-001 | done | codex | Phase 8 | Additional SSO providers — WorkOS + Keycloak | #220 | #53 | -| P8-002 | done | codex | Phase 8 | Additional LLM providers — Codex, Z.ai, LM Studio, llama.cpp | #212 | #54 | -| P8-003 | done | codex | Phase 8 | Performance optimization | #211 | #56 | -| P8-004 | done | haiku | Phase 8 | Beta release gate — v0.1.0 tag | — | #59 | -| FIX-01 | done | Backlog | Call piSession.dispose() in AgentService.destroySession | #78 | #62 | +| id | status | agent | milestone | description | pr | notes | +| ------ | ----------- | ------ | ------------------ | --------------------------------------------------------------------------------------------------------------- | --- | ------------------------- | +| M1-001 | not-started | sonnet | M1: Persistence | Wire ChatGateway.handleMessage() → ConversationsRepo.addMessage() for user messages | — | #224 | +| M1-002 | not-started | sonnet | M1: Persistence | Wire agent event relay → ConversationsRepo.addMessage() for assistant responses (text, tool calls, thinking) | — | #225 | +| M1-003 | not-started | sonnet | M1: Persistence | Store message metadata: model used, provider, token counts, tool call details, timestamps | — | #226 | +| M1-004 | not-started | sonnet | M1: Persistence | On 
session resume, load message history from DB and inject into Pi session context | — | #227 | +| M1-005 | not-started | opus | M1: Persistence | Context window management: summarize older messages when history exceeds 80% of model context | — | #228 | +| M1-006 | not-started | sonnet | M1: Persistence | Conversation search: full-text search on messages table via /api/conversations/search | — | #229 | +| M1-007 | not-started | sonnet | M1: Persistence | TUI: /history command to display conversation message count and context usage | — | #230 | +| M1-008 | not-started | haiku | M1: Persistence | Verify: send messages → kill TUI → resume with -c → agent references prior context | — | #231 | +| M2-001 | not-started | sonnet | M2: Security | Audit InsightsRepo: add userId filter to searchByEmbedding() vector search | — | #232 | +| M2-002 | not-started | sonnet | M2: Security | Audit InsightsRepo: add userId filter to findByUser(), decayOldInsights() | — | #233 | +| M2-003 | not-started | sonnet | M2: Security | Audit PreferencesRepo: verify all queries filter by userId | — | #234 | +| M2-004 | not-started | sonnet | M2: Security | Audit agent memory tools: verify `memory_search`, `memory_save_*`, `memory_get_*` scope to session user | — | #235 | +| M2-005 | not-started | sonnet | M2: Security | Audit ConversationsRepo: verify ownership check on findById, update, delete, addMessage, findMessages | — | #236 | +| M2-006 | not-started | sonnet | M2: Security | Audit AgentsRepo: verify findAccessible() returns only user's agents + system agents | — | #237 | +| M2-007 | not-started | sonnet | M2: Security | Integration test: create two users, populate data, verify cross-user isolation on every query path | — | #238 TDD | +| M2-008 | not-started | sonnet | M2: Security | Audit Valkey keys: verify session keys include userId or are not enumerable across users | — | #239 | +| M3-001 | not-started | opus | M3: Providers | Refactor ProviderService into IProviderAdapter pattern:
register(), listModels(), healthCheck(), createClient() | — | #240 Verify Pi SDK compat | +| M3-002 | not-started | sonnet | M3: Providers | Anthropic adapter: @anthropic-ai/sdk, Claude Sonnet 4.6 + Opus 4.6 + Haiku 4.5, OAuth + API key | — | #241 | +| M3-003 | not-started | sonnet | M3: Providers | OpenAI adapter: openai SDK, Codex gpt-5.4, OAuth + API key | — | #242 | +| M3-004 | not-started | sonnet | M3: Providers | OpenRouter adapter: OpenAI-compatible client, API key, dynamic model list from /api/v1/models | — | #243 | +| M3-005 | not-started | sonnet | M3: Providers | Z.ai GLM adapter: GLM-5, API key, research API format | — | #244 | +| M3-006 | not-started | sonnet | M3: Providers | Ollama adapter: refactor existing integration into adapter pattern, add embedding model support | — | #245 | +| M3-007 | not-started | sonnet | M3: Providers | Provider health check: periodic probe, configurable interval, status per provider, /api/providers/health | — | #246 | +| M3-008 | not-started | sonnet | M3: Providers | Model capability matrix: per-model metadata (tier, context window, tool support, vision, streaming, embedding) | — | #247 | +| M3-009 | not-started | sonnet | M3: Providers | Refactor EmbeddingService: provider-agnostic interface, Ollama default (nomic-embed-text or mxbai-embed-large) | — | #248 Dim migration | +| M3-010 | not-started | sonnet | M3: Providers | OAuth token storage: persist provider tokens per user in DB (encrypted), refresh flow | — | #249 | +| M3-011 | not-started | sonnet | M3: Providers | Provider config UI support: /api/providers CRUD for user-scoped provider credentials | — | #250 | +| M3-012 | not-started | haiku | M3: Providers | Verify: each provider connects, lists models, completes chat request, handles errors | — | #251 | +| M4-001 | not-started | opus | M4: Routing | Define routing rule schema: RoutingRule { name, priority, conditions[], action } stored in DB | — | #252 DB migration | +| M4-002 | not-started | opus | M4: 
Routing | Condition types: taskType, complexity, domain, costTier, requiredCapabilities | — | #253 | +| M4-003 | not-started | opus | M4: Routing | Action types: routeTo { provider, model, agentConfigId?, systemPromptOverride?, toolAllowlist? } | — | #254 | +| M4-004 | not-started | sonnet | M4: Routing | Default routing rules seed data: coding→Opus, Q&A→Sonnet, summarization→GLM-5, research→Codex, offline→Ollama | — | #255 | +| M4-005 | not-started | opus | M4: Routing | Task classification: infer taskType + complexity from user message (regex/keyword first, LLM-assisted later) | — | #256 | +| M4-006 | not-started | opus | M4: Routing | Routing decision pipeline: classify → match rules → check health → fallback chain → return result | — | #257 | +| M4-007 | not-started | sonnet | M4: Routing | Routing override: /model forces specific model regardless of routing rules | — | #258 | +| M4-008 | not-started | sonnet | M4: Routing | Routing transparency: include routing decision in session:info event (model + reason) | — | #259 | +| M4-009 | not-started | sonnet | M4: Routing | Routing rules CRUD: /api/routing/rules — list, create, update, delete, reorder priority | — | #260 | +| M4-010 | not-started | sonnet | M4: Routing | Per-user routing overrides: users customize default rules for their sessions | — | #261 | +| M4-011 | not-started | sonnet | M4: Routing | Agent specialization: agents declare capabilities in config (domains, preferred models, tool sets) | — | #262 | +| M4-012 | not-started | sonnet | M4: Routing | Routing integration: wire into ChatGateway — every message triggers routing before agent dispatch | — | #263 | +| M4-013 | not-started | haiku | M4: Routing | Verify: coding→Opus, summarize→GLM-5, simple→Haiku, override via /model works | — | #264 | +| M5-001 | not-started | sonnet | M5: Sessions | Wire ChatGateway: on session create, load agent config from DB (system prompt, model, provider, tools, skills) | — | #265 | +| M5-002 | not-started | sonnet | 
M5: Sessions | /model command: end-to-end wiring — TUI → socket → gateway switches provider/model → new messages use it | — | #266 | +| M5-003 | not-started | sonnet | M5: Sessions | /agent command: switch agent config mid-session — loads new system prompt, tools, default model | — | #267 | +| M5-004 | not-started | sonnet | M5: Sessions | Session ↔ conversation binding: persist sessionId on conversation record, resume via conversationId | — | #268 | +| M5-005 | not-started | sonnet | M5: Sessions | Session info broadcast: on model/agent switch, emit session:info with updated state | — | #269 | +| M5-006 | not-started | sonnet | M5: Sessions | Agent creation from TUI: /agent new command creates agent config via gateway API | — | #270 | +| M5-007 | not-started | sonnet | M5: Sessions | Session metrics: per-session token usage, model switches, duration — persist in DB | — | #271 | +| M5-008 | not-started | haiku | M5: Sessions | Verify: /model switches model, /agent switches agent, session resume loads config | — | #272 | +| M6-001 | not-started | sonnet | M6: Jobs | Add BullMQ dependency, configure with Valkey connection | — | #273 Test compat first | +| M6-002 | not-started | sonnet | M6: Jobs | Create queue service: typed job definitions, worker registration, error handling with exponential backoff | — | #274 | +| M6-003 | not-started | sonnet | M6: Jobs | Migrate summarization cron → BullMQ repeatable job | — | #275 | +| M6-004 | not-started | sonnet | M6: Jobs | Migrate GC (session cleanup) → BullMQ repeatable job | — | #276 | +| M6-005 | not-started | sonnet | M6: Jobs | Migrate tier management (log archival) → BullMQ repeatable job | — | #277 | +| M6-006 | not-started | sonnet | M6: Jobs | Admin jobs API: GET /api/admin/jobs — list, status, retry, pause/resume queues | — | #278 | +| M6-007 | not-started | sonnet | M6: Jobs | Job event logging: emit job start/complete/fail events to agent_logs | — | #279 | +| M6-008 | not-started | haiku | M6: Jobs | Verify: 
jobs execute on schedule, failure retries with backoff, admin endpoint shows history | — | #280 | +| M7-001 | not-started | opus | M7: Channel Design | Define IChannelAdapter interface: lifecycle, message flow, identity mapping | — | #281 Architecture | +| M7-002 | not-started | opus | M7: Channel Design | Define channel message protocol: canonical format all adapters translate to/from | — | #282 Architecture | +| M7-003 | not-started | opus | M7: Channel Design | Design Matrix integration: appservice, room↔conversation, space↔team, agent ghosts, power levels | — | #283 Architecture | +| M7-004 | not-started | opus | M7: Channel Design | Design conversation multiplexing: same conversation from TUI+WebUI+Matrix, real-time sync | — | #284 Architecture | +| M7-005 | not-started | opus | M7: Channel Design | Design remote auth bridging: Matrix/Discord auth → Mosaic identity (token linking, OAuth bridge) | — | #285 Architecture | +| M7-006 | not-started | opus | M7: Channel Design | Design agent-to-agent communication via Matrix rooms: room per agent pair, human observation | — | #286 Architecture | +| M7-007 | not-started | opus | M7: Channel Design | Design multi-user isolation in Matrix: space-per-team, room visibility, encryption, admin access | — | #287 Architecture | +| M7-008 | not-started | haiku | M7: Channel Design | Publish docs/architecture/channel-protocol.md — reviewed and approved | — | #288 | diff --git a/docs/scratchpads/harness-20260321.md b/docs/scratchpads/harness-20260321.md new file mode 100644 index 0000000..247181c --- /dev/null +++ b/docs/scratchpads/harness-20260321.md @@ -0,0 +1,60 @@ +# Mission Scratchpad — Harness Foundation + +> Append-only log. NEVER delete entries. NEVER overwrite sections. +> This is the orchestrator's working memory across sessions. + +## Original Mission Prompt + +``` +Jason wants to get the gateway and TUI working as a real daily-driver harness. 
+The system needs: multi-provider LLM access, task-aware agent routing, conversation persistence, +security isolation, session hardening, job queue foundation, and channel protocol design for +future Matrix/remote integration. + +Provider decisions: Anthropic (Sonnet 4.6, Opus 4.6), OpenAI (Codex gpt-5.4), Z.ai (GLM-5), +OpenRouter, Ollama. Embeddings via Ollama local models. + +Pi SDK stays as agent runtime. Build with Matrix integration in mind but foundation first. +Agent routing per task with granular specification is required. +``` + +## Planning Decisions + +### 2026-03-21 — Phase 9 PRD and mission setup + +- PRD created as `docs/PRD-Harness_Foundation.md` with canonical Mosaic template format +- 7 milestones, 65 tasks total +- Milestone order: M1 (persistence) → M2 (security) → M3 (providers) → M4 (routing) → M5 (sessions) → M6 (jobs) → M7 (channel design) +- M1 and M2 are hard prerequisites — no provider or routing work until conversations persist and data is user-scoped +- Pi SDK kept as agent runtime; providers plug in via adapter pattern underneath +- Embeddings migrated from OpenAI to Ollama local (nomic-embed-text or mxbai-embed-large) +- BullMQ chosen for job queue (Valkey-compatible, TypeScript-native) +- Channel protocol is design-only in this phase; Matrix implementation deferred to Phase 10 +- Models confirmed: Claude Sonnet 4.6, Opus 4.6, Haiku 4.5, Codex gpt-5.4, GLM-5, Ollama locals +- Routing engine: rule-based classification first, LLM-assisted later +- Default routing: coding-complex→Opus, coding-moderate→Sonnet, coding-simple→Codex, research→Codex, summarization→GLM-5, conversation→Sonnet, cheap/general→Haiku, offline→Ollama + +### Architecture decisions + +- Provider adapter pattern: each provider implements IProviderAdapter, registered in Pi SDK's provider registry +- Routing flow: classify message → match rules by priority → check provider health → fallback chain → dispatch +- Context window management: summarize older messages when
history exceeds 80% of model context +- OAuth pattern: URL-display + clipboard + Valkey poll token (same as P8-012 design) +- Embedding dimension: migration from 1536 (OpenAI) to 768/1024 (Ollama) — may require re-embedding existing insights + +## Session Log + +| Session | Date | Milestone | Tasks Done | Outcome | +| ------- | ---------- | --------- | -------------------------------- | ---------------------------------------------- | +| 1 | 2026-03-21 | Planning | PRD, manifest, tasks, scratchpad | Mission initialized, planning gate in progress | + +## Open Questions + +1. Z.ai GLM-5 API format — OpenAI-compatible or custom? (Research in M3-005) +2. Which Ollama embedding model: nomic-embed-text (768-dim) vs mxbai-embed-large (1024-dim)? (Test in M3-009) +3. Provider credentials: env vars for system defaults + DB for per-user overrides? (ASSUMPTION: hybrid) +4. Pi SDK provider adapter support — needs verification in M3-001 before committing to adapter pattern + +## Corrections + +