chore: bootstrap Harness Foundation mission (Phase 9) (#289)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
This commit was merged in pull request #289.
@@ -1,45 +1,42 @@
# Mission Manifest — MVP
# Mission Manifest — Harness Foundation

> Persistent document tracking full mission scope, status, and session history.
> Updated by the orchestrator at each phase transition and milestone completion.

## Mission

**ID:** mvp-20260312
**Statement:** Build Mosaic Stack v0.1.0 — a self-hosted, multi-user AI agent platform with web dashboard, TUI, remote control, shared memory, mission orchestration, and extensible skill/plugin architecture. All TypeScript. Pi as agent harness. Brain as knowledge layer. Queue as coordination backbone.
**Phase:** Complete
**Current Milestone:** Phase 8: Polish & Beta (v0.1.0) — DONE
**Progress:** 9 / 9 milestones
**Status:** complete
**Last Updated:** 2026-03-16 UTC
**ID:** harness-20260321
**Statement:** Transform Mosaic Stack from a functional demo into a real multi-provider, task-routing AI harness. Persist all conversations, integrate frontier LLM providers (Anthropic, OpenAI, OpenRouter, Z.ai, Ollama), build granular task-aware agent routing, harden agent sessions, replace cron with BullMQ, and design the channel protocol for future Matrix/remote integration.
**Phase:** Execution
**Current Milestone:** M1: Conversation Persistence & Context
**Progress:** 0 / 7 milestones
**Status:** active
**Last Updated:** 2026-03-21 UTC

## Success Criteria

- [x] AC-1: Core chat flow — login, send message, streamed response, conversations persist
- [x] AC-2: TUI integration — `mosaic tui` connects to gateway, same context as web
- [x] AC-3: Discord remote control — bot responds, routes through gateway, threads work
- [x] AC-4: Gateway orchestration — multi-provider routing, fallback, concurrent sessions
- [x] AC-5: Task & project management — CRUD, kanban, mission tracking, brain MCP tools
- [x] AC-6: Memory system — auto-capture, semantic search, preferences, log summarization
- [x] AC-7: Auth & RBAC — email/password, Authentik SSO, role enforcement
- [x] AC-8: Multi-provider LLM — 3+ providers routing correctly
- [x] AC-9: MCP — gateway MCP endpoint, brain + queue tools via MCP
- [x] AC-10: Deployment — `docker compose up` from clean state, CLI on bare metal
- [x] AC-11: @mosaic/\* packages — all 7 migrated packages build, test, integrate
- [ ] AC-1: Send messages in TUI → restart TUI → resume conversation → agent has full history and context
- [ ] AC-2: Route a coding task to Claude Opus 4.6, a simple question to Haiku, a summarization to GLM-5 — all via granular routing rules
- [ ] AC-3: Two users exist, User A's memory searches never return User B's data
- [ ] AC-4: `/model claude-sonnet-4-6` in TUI switches the active model for subsequent messages
- [ ] AC-5: `/agent coding-agent` in TUI switches to a different agent with different system prompt and tools
- [ ] AC-6: BullMQ jobs execute on schedule, failures retry with backoff, admin can inspect via `/api/admin/jobs`
- [ ] AC-7: Channel protocol document exists with Matrix integration points defined, reviewed, and approved
- [ ] AC-8: Embeddings run on Ollama local models (no external API dependency for vector operations)
- [ ] AC-9: All five providers (Anthropic, OpenAI, OpenRouter, Z.ai, Ollama) connect, list models, and complete chat requests
- [ ] AC-10: Routing transparency — TUI displays which model was selected and the routing reason for each response

## Milestones

| # | ID | Name | Status | Branch | Issue | Started | Completed |
| --- | ------ | --------------------------------------- | ------ | ------ | ----- | ---------- | ---------- |
| 0 | ms-157 | Phase 0: Foundation (v0.0.1) | done | — | — | 2026-03-13 | 2026-03-13 |
| 1 | ms-158 | Phase 1: Core API (v0.0.2) | done | — | — | 2026-03-13 | 2026-03-13 |
| 2 | ms-159 | Phase 2: Agent Layer (v0.0.3) | done | — | — | 2026-03-13 | 2026-03-12 |
| 3 | ms-160 | Phase 3: Web Dashboard (v0.0.4) | done | — | — | 2026-03-12 | 2026-03-13 |
| 4 | ms-161 | Phase 4: Memory & Intelligence (v0.0.5) | done | — | — | 2026-03-13 | 2026-03-13 |
| 5 | ms-162 | Phase 5: Remote Control (v0.0.6) | done | — | #99 | 2026-03-14 | 2026-03-14 |
| 6 | ms-163 | Phase 6: CLI & Tools (v0.0.7) | done | — | #104 | 2026-03-14 | 2026-03-14 |
| 7 | ms-164 | Phase 7: Feature Completion (v0.0.8) | done | — | — | 2026-03-15 | 2026-03-15 |
| 8 | ms-165 | Phase 8: Polish & Beta (v0.1.0) | done | — | — | 2026-03-15 | 2026-03-15 |
| # | ID | Name | Status | Branch | Issue | Started | Completed |
| --- | ------ | ---------------------------------- | ----------- | ------ | --------- | ------- | --------- |
| 1 | ms-166 | Conversation Persistence & Context | not-started | — | #224–#231 | — | — |
| 2 | ms-167 | Security & Isolation | not-started | — | #232–#239 | — | — |
| 3 | ms-168 | Provider Integration | not-started | — | #240–#251 | — | — |
| 4 | ms-169 | Agent Routing Engine | not-started | — | #252–#264 | — | — |
| 5 | ms-170 | Agent Session Hardening | not-started | — | #265–#272 | — | — |
| 6 | ms-171 | Job Queue Foundation | not-started | — | #273–#280 | — | — |
| 7 | ms-172 | Channel Protocol Design | not-started | — | #281–#288 | — | — |

## Deployment

@@ -48,6 +45,12 @@
| Docker Compose (dev) | localhost | docker compose up |
| Production | TBD | Docker Swarm via Portainer |

## Coordination

- **Primary Agent:** claude-opus-4-6
- **Sibling Agents:** codex (for pure coding tasks), sonnet (for review/standard work)
- **Shared Contracts:** docs/PRD-Harness_Foundation.md, docs/TASKS.md

## Token Budget

| Metric | Value |
@@ -58,22 +61,10 @@

## Session History

| Session | Runtime | Started | Duration | Ended Reason | Last Task |
| ------- | ----------------- | -------------------- | -------- | ------------- | ---------------- |
| 1 | claude-opus-4-6 | 2026-03-13 01:00 UTC | — | context limit | Planning gate |
| 2 | claude-opus-4-6 | 2026-03-13 | — | context limit | P5-002, P6-005 |
| 3 | claude-opus-4-6 | 2026-03-13 | — | context limit | P0-006 |
| 4 | claude-opus-4-6 | 2026-03-12 | — | context limit | Docker fix |
| 5 | claude-opus-4-6 | 2026-03-12 | — | context limit | P1-009 |
| 6 | claude-opus-4-6 | 2026-03-12 | — | context limit | P2-006, FIX-01 |
| 7 | claude-opus-4-6 | 2026-03-12 | — | context limit | P2-007 |
| 8 | claude-opus-4-6 | 2026-03-12 | — | context limit | Phase 2 complete |
| 9 | claude-opus-4-6 | 2026-03-12 | — | context limit | P3-007 |
| 10 | claude-opus-4-6 | 2026-03-13 | — | context limit | P3-008 |
| 11 | claude-opus-4-6 | 2026-03-14 | — | context limit | P7 rescope |
| 12 | claude-opus-4-6 | 2026-03-15 | — | context limit | P7 planning |
| 13 | claude-sonnet-4-6 | 2026-03-16 | — | complete | P8-019 verify |
| Session | Runtime | Started | Duration | Ended Reason | Last Task |
| ------- | --------------- | ---------- | -------- | ------------ | ------------- |
| 1 | claude-opus-4-6 | 2026-03-21 | — | — | Planning gate |

## Scratchpad

Path: `docs/scratchpads/mvp-20260312.md`
Path: `docs/scratchpads/harness-20260321.md`

391
docs/PRD-Harness_Foundation.md
Normal file
@@ -0,0 +1,391 @@
# PRD: Harness Foundation — Phase 9

## Metadata

- **Owner:** Jason Woltje
- **Date:** 2026-03-21
- **Status:** draft
- **Phase:** 9 (post-MVP)
- **Version Target:** v0.2.0
- **Agent Harness:** [Pi SDK](https://github.com/badlogic/pi-mono)
- **Best-Guess Mode:** true
- **Repo:** `git.mosaicstack.dev/mosaic/mosaic-stack`

---

## Problem Statement

Mosaic Stack v0.1.0 delivered a functional skeleton — gateway boots, TUI connects, single-agent chat streams, basic auth works. But the system is not usable as a daily-driver harness:

1. **Chat messages are fire-and-forget.** The WebSocket gateway never calls ConversationsRepo. Context is lost on disconnect. Conversations can't be resumed with history. Cross-interface continuity (TUI → WebUI → Matrix) is impossible.

2. **Single provider (Ollama) with local models only.** No access to frontier models (Claude Opus 4.6, Codex gpt-5.4, GLM-5). The routing engine exists but has never been tested with real providers.

3. **No task-aware agent routing.** A coding task and a summarization task route to the same agent with the same model. There is no mechanism to match tasks to agents by capability, cost tier, or specialization.

4. **Memory is not user-scoped.** Insight vector search returns all users' data, so a multi-user deployment would expose each user's memories to every other user.

5. **Agent configs exist in DB but are ignored.** Stored system prompts, model preferences, and tool allowlists don't apply to sessions. The `/model` and `/agent` slash commands are stubbed.

6. **No job queue.** Background processing (summarization, GC, tier management) runs on fragile cron. No retry, no monitoring, no async task dispatch foundation for future agent orchestration.

7. **Plugin system is hollow.** There are zero implementations and no defined message protocol, which blocks all remote interfaces (Matrix, Discord, Telegram) planned for Phase 10+.

**What this phase solves:** Transform Mosaic from a demo into a real multi-provider, task-routing AI harness that persists everything, routes intelligently, and is architecturally ready for multi-agent and remote control.

---

## Objectives

1. **Persistent conversations** — Every message saved, every conversation resumable, full context available across interfaces
2. **Multi-provider LLM access** — Anthropic, OpenAI, OpenRouter, Z.ai, Ollama with proper auth flows
3. **Task-aware agent routing** — Granular routing rules that match tasks to the right agent + model by capability, cost, and domain
4. **Security isolation** — All data queries user-scoped, ready for multi-user deployment
5. **Session hardening** — Agent configs apply, model/agent switching works mid-session
6. **Reliable background processing** — BullMQ job queue replaces fragile cron
7. **Channel protocol design** — Architecture for Matrix and remote interfaces, built into the foundation now

---

## Scope

### In Scope

1. Conversation persistence — wire ChatGateway to ConversationsRepo, context loading on resume
2. Multi-provider integration — Anthropic, OpenAI, OpenRouter, Z.ai, Ollama with auth flows
3. Task-aware agent routing — granular routing rules with task classification and fallback chains
4. Security isolation — user-scoped queries on all data paths (memory, conversations, agents)
5. Agent session hardening — configs apply, model/agent switching, session resume
6. Job queue — BullMQ replacing cron for background processing
7. Channel protocol design — architecture document for Matrix and remote interfaces
8. Embedding migration — Ollama-local embeddings replacing OpenAI dependency

### Out of Scope

1. Matrix homeserver deployment + appservice (Phase 10)
2. Multi-agent orchestration / supervisor-worker pattern (Phase 10+)
3. WebUI rebuild (future)
4. Self-managing memory — compaction, merge, forget (future)
5. Team workspace isolation (future)
6. Remote channel plugins — WhatsApp, Discord, Telegram (Phase 10+, via Matrix)
7. Fine-grained RBAC — project/agent/team roles (future)
8. Agent-to-agent communication (Phase 10+)

## User/Stakeholder Requirements

1. As a user, I can resume a conversation after closing the TUI and the agent remembers the full context
2. As a user, I can use frontier models (Claude Opus 4.6, Codex gpt-5.4) without manual provider configuration
3. As a user, the system automatically selects the best model for my task (coding → powerful model, simple question → cheap model)
4. As a user, I can override the automatic model selection with `/model <name>` at any time
5. As a user, I can switch between specialized agents mid-session with `/agent <name>`
6. As an admin, I can define routing rules that control which models handle which task types
7. As an admin, I can monitor background job health and retry failed jobs
8. As a user, my conversations, memories, and preferences are invisible to other users

## Functional Requirements

1. FR-1: ChatGateway persists every message (user, assistant, tool call, thinking) to the conversations/messages tables
2. FR-2: On session resume with an existing conversationId, message history is loaded from DB and injected into the agent session context
3. FR-3: When conversation history exceeds 80% of the model's context window, older messages are summarized and prepended as a context checkpoint
4. FR-4: Five LLM providers are registered with the gateway: Anthropic (Claude Sonnet 4.6, Opus 4.6, Haiku 4.5), OpenAI (Codex gpt-5.4), OpenRouter (dynamic model list), Z.ai (GLM-5), Ollama (local models)
5. FR-5: Each provider supports API key auth; Anthropic and OpenAI additionally support OAuth (URL-display + callback pattern)
6. FR-6: Provider credentials are stored per-user in the DB (encrypted), not in environment variables
7. FR-7: A routing engine classifies each user message by taskType, complexity, domain, and required capabilities, then selects the optimal provider/model via priority-ordered rules
8. FR-8: Default routing rules are seeded on first run; admins can customize system-wide rules; users can set per-session overrides
9. FR-9: Routing decisions are transparent — the TUI shows which model was selected and why
10. FR-10: Agent configs (system prompt, default model, tool allowlist, skills) stored in DB are applied when creating agent sessions
11. FR-11: `/model <name>` switches the active model for subsequent messages in the current session
12. FR-12: `/agent <name>` switches to a different agent config, loading its system prompt, tools, and default model
13. FR-13: All memory queries (insight vector search, preferences) filter by userId
14. FR-14: BullMQ handles background jobs (summarization, GC, tier management) with retry, backoff, and monitoring
15. FR-15: Embeddings are served locally via Ollama (nomic-embed-text or mxbai-embed-large) with no external API dependency

## Non-Functional Requirements

1. **Security:** All data queries include userId filter. Provider credentials encrypted at rest. No cross-user data leakage. OAuth tokens stored securely with refresh handling.
2. **Performance:** Message persistence adds <50ms to message relay latency. Routing classification <100ms per message. Provider health checks run on configurable interval (default 60s) without blocking requests.
3. **Reliability:** BullMQ jobs retry with exponential backoff (3 attempts default). Provider failover: if primary provider is unhealthy, fallback chain activates automatically. Conversation context survives TUI restart.
4. **Observability:** Routing decisions logged with classification details. Job execution logged to agent_logs. Provider health status exposed via `/api/providers/health`. Session metrics (tokens, model switches, duration) persisted in DB.

## Acceptance Criteria

- [ ] AC-1: Send messages in TUI → restart TUI → resume conversation → agent has full history and context
- [ ] AC-2: Route a coding task to Claude Opus 4.6, a simple question to Haiku, a summarization to GLM-5 — all via granular routing rules
- [ ] AC-3: Two users exist, User A's memory searches never return User B's data
- [ ] AC-4: `/model claude-sonnet-4-6` in TUI switches the active model for subsequent messages
- [ ] AC-5: `/agent coding-agent` in TUI switches to a different agent with different system prompt and tools
- [ ] AC-6: BullMQ jobs execute on schedule, failures retry with backoff, admin can inspect via `/api/admin/jobs`
- [ ] AC-7: Channel protocol document exists with Matrix integration points defined, reviewed, and approved
- [ ] AC-8: Embeddings run on Ollama local models (no external API dependency for vector operations)
- [ ] AC-9: All five providers (Anthropic, OpenAI, OpenRouter, Z.ai, Ollama) connect, list models, and complete chat requests
- [ ] AC-10: Routing transparency — TUI displays which model was selected and the routing reason for each response

## Testing and Verification Expectations

1. **Baseline checks:** `pnpm typecheck`, `pnpm lint`, `pnpm format:check` — all green before any push
2. **Unit tests:** Routing engine rules matching, task classifier, provider adapter registration, message persistence
3. **Integration tests:** Two-user isolation (M2-007), provider round-trip (M3-012), routing end-to-end (M4-013), session resume with context (M1-008)
4. **Situational tests per milestone:** Each milestone has a verify task that exercises the delivered functionality end-to-end
5. **Evidence format:** Test output + manual verification notes in scratchpad per milestone

## Constraints and Dependencies

| Type | Item | Notes |
| ---------- | ------------------------------- | -------------------------------------------------------------------------------------- |
| Dependency | `@anthropic-ai/sdk` | npm, required for M3-002 |
| Dependency | `openai` | npm, required for M3-003 |
| Dependency | `bullmq` | npm, Valkey-compatible, required for M6 |
| Dependency | Ollama embedding models | `ollama pull nomic-embed-text`, required for M3-009 |
| Dependency | Pi SDK provider adapter support | ASSUMPTION: supported — verify in M3-001 |
| External | Anthropic OAuth credentials | Requires Anthropic Console setup |
| External | OpenAI OAuth credentials | Requires OpenAI Platform setup |
| External | Z.ai API key | Requires Z.ai account |
| External | OpenRouter API key | Requires OpenRouter account |
| Constraint | Valkey 8 compatibility | BullMQ requires Redis 6+; Valkey 8 is compatible |
| Constraint | Embedding dimension migration | Switching from 1536 (OpenAI) to 768/1024 (Ollama) requires re-embedding or fresh start |

---

## Assumptions

1. ASSUMPTION: Pi SDK supports custom provider adapters for all target LLM providers. If not, adapters wrap native SDKs behind Pi's interface. **Rationale:** Gateway already uses Pi with Ollama via a custom adapter pattern.
2. ASSUMPTION: BullMQ is Valkey-compatible. **Rationale:** BullMQ documents Redis 6+ compatibility; Valkey 8 is Redis-compatible.
3. ASSUMPTION: Ollama can serve embedding models (nomic-embed-text, mxbai-embed-large) with acceptable quality. **Rationale:** Ollama supports embedding endpoints natively.
4. ASSUMPTION: Anthropic and OpenAI OAuth flows can be handled via URL-display + token callback pattern (same as existing provider auth). **Rationale:** Both providers offer standard OAuth 2.0 flows.
5. ASSUMPTION: Z.ai GLM-5 uses an API format compatible with OpenAI or has a documented SDK. **Rationale:** Most LLM providers converge on OpenAI-compatible APIs.
6. ASSUMPTION: The existing Pi SDK session model supports mid-session model switching without destroying session state. If not, we destroy and recreate with conversation history. **Rationale:** Acceptable fallback — context is persisted in DB.
7. ASSUMPTION: Channel protocol design can be completed without a running Matrix homeserver. **Rationale:** Matrix protocol is well-documented; design is architecture, not integration.

---

## Milestones

### Milestone 1: Conversation Persistence & Context

**Goal:** Every message persisted. Every conversation resumable with full context.

| Task | Description |
| ------ | ------------------------------------------------------------------------------------------------------------ |
| M1-001 | Wire ChatGateway.handleMessage() → ConversationsRepo.addMessage() for user messages |
| M1-002 | Wire agent event relay → ConversationsRepo.addMessage() for assistant responses (text, tool calls, thinking) |
| M1-003 | Store message metadata: model used, provider, token counts, tool call details, timestamps |
| M1-004 | On session resume (existing conversationId), load message history from DB and inject into Pi session context |
| M1-005 | Context window management: when history exceeds 80% of the model's context window (FR-3), summarize older messages and prepend the summary as a context checkpoint |
| M1-006 | Conversation search: full-text search on messages table via `/api/conversations/search` |
| M1-007 | TUI: `/history` command to display conversation message count and context usage |
| M1-008 | Verify: send messages → kill TUI → resume with `-c <id>` → agent references prior context |
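
FR-3 and M1-005 call for summarizing older history once it passes 80% of the model's context window. A minimal sketch of the split decision, assuming each persisted message carries a token count (M1-003) and that the summarization call itself is supplied by the caller. `StoredMessage`, `planContext`, and `buildContext` are hypothetical names, not existing ConversationsRepo APIs.

```typescript
// Hypothetical shape of a persisted message; token counts come from M1-003 metadata.
interface StoredMessage {
  role: "user" | "assistant" | "tool" | "system";
  content: string;
  tokens: number;
}

interface ContextPlan {
  summarizeFirst: StoredMessage[]; // older messages folded into a checkpoint summary
  keepVerbatim: StoredMessage[]; // newest messages injected unchanged
}

// Split history so the verbatim tail plus a summary of size `summaryReserve`
// fits within `threshold` (80% by default) of the model's context window.
function planContext(
  history: StoredMessage[],
  contextWindow: number,
  threshold = 0.8,
  summaryReserve = 0,
): ContextPlan {
  const budget = Math.floor(contextWindow * threshold);
  const total = history.reduce((sum, m) => sum + m.tokens, 0);
  if (total <= budget) {
    return { summarizeFirst: [], keepVerbatim: history };
  }

  // Walk backwards from the newest message, keeping as many as fit.
  const keepBudget = budget - summaryReserve;
  let used = 0;
  let cut = history.length;
  for (let i = history.length - 1; i >= 0; i--) {
    if (used + history[i].tokens > keepBudget) break;
    used += history[i].tokens;
    cut = i;
  }
  return { summarizeFirst: history.slice(0, cut), keepVerbatim: history.slice(cut) };
}

// Produce the context to inject on resume; `summarize` stands in for an LLM call.
function buildContext(
  history: StoredMessage[],
  contextWindow: number,
  summarize: (older: StoredMessage[]) => string,
): { checkpoint: string | null; messages: StoredMessage[] } {
  const plan = planContext(history, contextWindow);
  const checkpoint = plan.summarizeFirst.length > 0 ? summarize(plan.summarizeFirst) : null;
  return { checkpoint, messages: plan.keepVerbatim };
}
```

Walking backwards keeps the most recent turns verbatim, which is usually what the agent needs most on resume; `summaryReserve` leaves room for the checkpoint text itself.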

### Milestone 2: Security & Isolation

**Goal:** All data queries user-scoped. Safe for multi-user deployment.

| Task | Description |
| ------ | --------------------------------------------------------------------------------------------------------------- |
| M2-001 | Audit InsightsRepo: add `userId` filter to `searchByEmbedding()` vector search |
| M2-002 | Audit InsightsRepo: add `userId` filter to `findByUser()`, `decayOldInsights()` |
| M2-003 | Audit PreferencesRepo: verify all queries filter by userId |
| M2-004 | Audit agent memory tools: verify `memory_search`, `memory_save_*`, `memory_get_*` all scope to session user |
| M2-005 | Audit ConversationsRepo: verify ownership check on findById, update, delete, addMessage, findMessages |
| M2-006 | Audit AgentsRepo: verify `findAccessible()` returns only user's agents + system agents |
| M2-007 | Add integration test: create two users, populate data for each, verify cross-user isolation on every query path |
| M2-008 | Audit Valkey keys: verify session keys include userId or are not enumerable across users |
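
One way to close the M2-001 gap is to make the owner a required argument, so an unscoped search cannot be written at all. The in-memory sketch below shows that shape; `Insight` and `ScopedInsightsStore` are hypothetical stand-ins for InsightsRepo, which would express the same rule in SQL as a `WHERE user_id = $1` filter ahead of a pgvector `ORDER BY embedding <=> $2` ranking (column names assumed).

```typescript
interface Insight {
  id: string;
  userId: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical stand-in for InsightsRepo: no search method exists without a userId.
class ScopedInsightsStore {
  private readonly rows: Insight[] = [];

  add(row: Insight): void {
    this.rows.push(row);
  }

  searchByEmbedding(userId: string, query: number[], limit = 5): Insight[] {
    return this.rows
      .filter((row) => row.userId === userId) // scope first, then rank
      .map((row) => ({ row, score: cosineSimilarity(row.embedding, query) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, limit)
      .map(({ row }) => row);
  }
}
```

The same pattern (seed rows for two owners, then assert every result belongs to the caller) is what M2-007 would exercise against the real repositories.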

### Milestone 3: Provider Integration

**Goal:** Five providers operational with proper auth, health checking, and capability metadata.

| Task | Description |
| ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| M3-001 | Refactor ProviderService into provider adapter pattern: `IProviderAdapter` interface with `register()`, `listModels()`, `healthCheck()`, `createClient()` |
| M3-002 | Anthropic adapter: `@anthropic-ai/sdk`, register Claude Sonnet 4.6, Opus 4.6, and Haiku 4.5 (per FR-4), OAuth flow (URL display + callback), API key fallback |
| M3-003 | OpenAI adapter: `openai` SDK, register Codex gpt-5.4, OAuth flow, API key fallback |
| M3-004 | OpenRouter adapter: OpenAI-compatible client, API key auth, dynamic model list from `/api/v1/models` |
| M3-005 | Z.ai GLM adapter: register GLM-5, API key auth, research and implement API format |
| M3-006 | Ollama adapter: refactor existing Ollama integration into adapter pattern, add embedding model support |
| M3-007 | Provider health check: periodic probe (configurable interval), status per provider, expose via `/api/providers/health` |
| M3-008 | Model capability matrix: define per-model metadata (tier, context window, tool support, vision, streaming, embedding capable) |
| M3-009 | Refactor EmbeddingService: replace OpenAI-hardcoded client with provider-agnostic interface, Ollama as default (nomic-embed-text or mxbai-embed-large) |
| M3-010 | OAuth token storage: persist provider tokens per user in DB (encrypted), refresh flow |
| M3-011 | Provider config UI support: `/api/providers` CRUD for user-scoped provider credentials |
| M3-012 | Verify: each provider connects, lists models, completes a chat request, handles errors gracefully |
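
M3-001 names four methods on `IProviderAdapter`. Below is one possible TypeScript shape for it, plus a registry that caches health results so the request path reads the last probe instead of waiting on one (per the Performance requirement). The method signatures, `ModelInfo` fields, and `ProviderRegistry` are assumptions for illustration; the final interface depends on what Pi SDK expects (see the M3-001 assumption). `createClient()` returns `unknown` because each SDK's client type differs.

```typescript
type Tier = "cheap" | "standard" | "premium";

interface ModelInfo {
  id: string;
  provider: string;
  tier: Tier;
  supportsTools: boolean;
}

interface HealthStatus {
  healthy: boolean;
  checkedAt: number;
  error?: string;
}

interface IProviderAdapter {
  readonly name: string;
  register(): Promise<void>; // e.g. validate credentials and register models with Pi
  listModels(): Promise<ModelInfo[]>;
  healthCheck(): Promise<boolean>;
  createClient(): unknown; // provider-specific SDK client
}

class ProviderRegistry {
  private readonly adapters = new Map<string, IProviderAdapter>();
  private readonly health = new Map<string, HealthStatus>();

  async add(adapter: IProviderAdapter): Promise<void> {
    await adapter.register();
    this.adapters.set(adapter.name, adapter);
  }

  // Intended to run on a timer, off the request path; a throwing probe counts as unhealthy.
  async probeAll(now: number = Date.now()): Promise<void> {
    await Promise.all(
      Array.from(this.adapters.values()).map(async (adapter) => {
        try {
          const healthy = await adapter.healthCheck();
          this.health.set(adapter.name, { healthy, checkedAt: now });
        } catch (err) {
          this.health.set(adapter.name, { healthy: false, checkedAt: now, error: String(err) });
        }
      }),
    );
  }

  // Providers that have never been probed count as unhealthy.
  isHealthy(name: string): boolean {
    return this.health.get(name)?.healthy ?? false;
  }

  async allModels(): Promise<ModelInfo[]> {
    const lists = await Promise.all(
      Array.from(this.adapters.values()).map((adapter) => adapter.listModels()),
    );
    return ([] as ModelInfo[]).concat(...lists);
  }
}
```

A timer would call `probeAll()` on the configured interval (default 60s). Because unprobed providers read as unhealthy, calling `probeAll()` once at startup avoids routing against an empty health table.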

### Milestone 4: Agent Routing Engine

**Goal:** Granular, rule-based routing that matches tasks to the right agent and model by capability, cost, and domain specialization.

| Task | Description |
| ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| M4-001 | Define routing rule schema: `RoutingRule { name, priority, conditions[], action }` stored in DB |
| M4-002 | Condition types: `taskType` (coding, research, summarization, conversation, analysis, creative), `complexity` (simple, moderate, complex), `domain` (frontend, backend, devops, docs, general), `costTier` (cheap, standard, premium), `requiredCapabilities` (tools, vision, long-context, reasoning) |
| M4-003 | Action types: `routeTo { provider, model, agentConfigId?, systemPromptOverride?, toolAllowlist? }` |
| M4-004 | Default routing rules (seed data): coding → Opus 4.6, simple Q&A → Haiku 4.5 (cheap tier, per AC-2), summarization → GLM-5, research → Codex gpt-5.4, local/offline → Ollama llama3.2 |
| M4-005 | Task classification: lightweight classifier that infers taskType + complexity from user message (can be rule-based regex/keyword initially, LLM-assisted later) |
| M4-006 | Routing decision pipeline: classify task → match rules by priority → select best available provider/model → fallback chain if primary unavailable |
| M4-007 | Routing override: user can force a specific model via `/model <name>` regardless of routing rules |
| M4-008 | Routing transparency: include routing decision in `session:info` event (why this model was selected) |
| M4-009 | Routing rules CRUD: `/api/routing/rules` — list, create, update, delete, reorder priority |
| M4-010 | Per-user routing overrides: users can customize default rules for their sessions |
| M4-011 | Agent specialization: agents can declare capabilities in their config (domains, preferred models, tool sets) |
| M4-012 | Routing integration: wire routing engine into ChatGateway — every new message triggers routing decision before agent dispatch |
| M4-013 | Verify: send a coding question → routed to Opus; send "summarize this" → routed to GLM-5; send "what time is it" → routed to cheap tier |
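
M4-005 allows the first classifier to be rule-based. A minimal keyword sketch producing the `taskType` and `complexity` fields from M4-002; the regexes and the word-count thresholds are placeholder assumptions rather than tuned values, and `domain` and `requiredCapabilities` are left out for brevity.

```typescript
type TaskType = "coding" | "research" | "summarization" | "conversation" | "analysis" | "creative";
type Complexity = "simple" | "moderate" | "complex";

interface Classification {
  taskType: TaskType;
  complexity: Complexity;
}

// Placeholder keyword table; the first matching pattern wins, so order matters.
const TASK_PATTERNS: Array<[TaskType, RegExp]> = [
  ["summarization", /\b(summari[sz]e|summary|tl;?dr|recap)\b/i],
  ["coding", /\b(function|refactor|bug|compile|typescript|stack trace|regex)\b/i],
  ["research", /\b(research|compare|sources?|find out)\b/i],
  ["analysis", /\b(analy[sz]e|evaluate|trade-?offs?)\b/i],
  ["creative", /\b(poem|story|brainstorm)\b/i],
];

function classify(message: string): Classification {
  const match = TASK_PATTERNS.find(([, pattern]) => pattern.test(message));
  const taskType: TaskType = match ? match[0] : "conversation";

  // Crude complexity proxy: message length in words (thresholds are assumptions).
  const words = message.trim().split(/\s+/).filter((w) => w.length > 0).length;
  const complexity: Complexity = words < 12 ? "simple" : words < 80 ? "moderate" : "complex";
  return { taskType, complexity };
}
```

Because the first match wins, listing summarization before coding means "summarize this function" is classified as summarization rather than coding.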

### Milestone 5: Agent Session Hardening

**Goal:** Agent configs apply to sessions. Model and agent switching work mid-session.

| Task | Description |
| ------ | ------------------------------------------------------------------------------------------------------------------------------------------ |
| M5-001 | Wire ChatGateway: on session create, load agent config from DB (system prompt, model, provider, tool allowlist, skills) |
| M5-002 | `/model <name>` command: end-to-end wiring — TUI → socket `command:execute` → gateway switches provider/model → new messages use new model |
| M5-003 | `/agent <name>` command: switch to different agent config mid-session — loads new system prompt, tools, and default model |
| M5-004 | Session ↔ conversation binding: persist sessionId on conversation record, allow session resume via conversation ID |
| M5-005 | Session info broadcast: on model/agent switch, emit `session:info` with updated provider, model, agent name |
| M5-006 | Agent creation from TUI: `/agent new` command creates agent config via gateway API |
| M5-007 | Session metrics: track per-session token usage, model switches, duration — persist in DB |
| M5-008 | Verify: start TUI → `/model claude-opus-4-6` → verify response uses Opus → `/agent research-bot` → verify system prompt changes |
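
The precedence between `/model`, `/agent`, and automatic routing (M4-007, FR-11, FR-12) can be captured as a small pure state update before any socket wiring. A sketch under these assumptions: `SessionState`, `parseSlashCommand`, `applyCommand`, and `effectiveModel` are hypothetical names, and switching agents clears a manual model override because FR-12 says the new agent's default model is loaded.

```typescript
interface SessionState {
  agentName: string;
  modelOverride: string | null; // set by /model; wins over routing rules (M4-007)
}

type SlashCommand =
  | { kind: "model"; name: string }
  | { kind: "agent"; name: string }
  | { kind: "unknown"; raw: string };

// Returns null for ordinary chat input.
function parseSlashCommand(input: string): SlashCommand | null {
  const text = input.trim();
  if (!text.startsWith("/")) return null;
  const m = /^\/(model|agent)\s+(\S+)$/.exec(text);
  if (!m) return { kind: "unknown", raw: text };
  return m[1] === "model" ? { kind: "model", name: m[2] } : { kind: "agent", name: m[2] };
}

function applyCommand(state: SessionState, cmd: SlashCommand): SessionState {
  switch (cmd.kind) {
    case "model":
      return { ...state, modelOverride: cmd.name };
    case "agent":
      // The new agent's default model applies, so any manual override is dropped.
      return { agentName: cmd.name, modelOverride: null };
    default:
      return state;
  }
}

// Model actually used for the next message.
function effectiveModel(state: SessionState, routedModel: string): string {
  return state.modelOverride ?? routedModel;
}
```

`/agent new` (M5-006) would need to be matched before the generic `/agent <name>` form; otherwise it parses as a switch to an agent called `new`.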

### Milestone 6: Job Queue Foundation

**Goal:** Reliable background processing via BullMQ. Foundation for future agent task orchestration.

| Task | Description |
| ------ | ------------------------------------------------------------------------------------------------------------ |
| M6-001 | Add BullMQ dependency, configure with Valkey connection |
| M6-002 | Create queue service: typed job definitions, worker registration, error handling with exponential backoff |
| M6-003 | Migrate summarization cron → BullMQ repeatable job |
| M6-004 | Migrate GC (session cleanup) → BullMQ repeatable job |
| M6-005 | Migrate tier management (log archival) → BullMQ repeatable job |
| M6-006 | Admin jobs API: `GET /api/admin/jobs` — list active/completed/failed jobs, retry failed, pause/resume queues |
| M6-007 | Job event logging: emit job start/complete/fail events to agent_logs for observability |
| M6-008 | Verify: jobs execute on schedule, deliberate failure retries with backoff, admin endpoint shows job history |
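
The retry policy in the Reliability requirement (3 attempts, exponential backoff) maps onto BullMQ's per-job `attempts` and `backoff: { type: "exponential", delay }` options. The sketch below only builds those options and previews the resulting delay schedule, so the policy can be unit-tested without Valkey. The 1 s base delay, `RetryOptions`, and the helper names are assumptions; the doubling formula follows how BullMQ documents its exponential strategy (2^(attempts - 1) * delay), which is worth re-checking against the installed version.

```typescript
// Mirrors the subset of BullMQ job options used here; no bullmq import needed for the sketch.
interface RetryOptions {
  attempts: number; // total runs, including the first
  backoff: { type: "exponential"; delay: number };
}

function retryOptions(attempts = 3, baseDelayMs = 1000): RetryOptions {
  if (attempts < 1) throw new Error("attempts must be >= 1");
  return { attempts, backoff: { type: "exponential", delay: baseDelayMs } };
}

// Delay before each retry: 2^(failed - 1) * delay, where `failed` counts failures so far.
// With `attempts` total runs there are `attempts - 1` retries.
function retrySchedule(opts: RetryOptions): number[] {
  const delays: number[] = [];
  for (let failed = 1; failed < opts.attempts; failed++) {
    delays.push(2 ** (failed - 1) * opts.backoff.delay);
  }
  return delays;
}
```

In the queue service these options would go in the third argument of `queue.add(name, data, opts)`, or in `defaultJobOptions` when constructing the `Queue`, so every job type inherits the same policy.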

### Milestone 7: Channel Protocol Design

**Goal:** Architecture document defining how remote interfaces (Matrix, Discord, Telegram) will integrate. No code — design only. Built into foundation now so Phase 10+ doesn't require gateway rewrites.

| Task | Description |
| ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| M7-001 | Define `IChannelAdapter` interface: lifecycle (connect, disconnect, health), message flow (receiveMessage → gateway, sendMessage ← gateway), identity mapping (channel user ↔ Mosaic user) |
| M7-002 | Define channel message protocol: canonical message format that all adapters translate to/from (content, metadata, attachments, thread context) |
| M7-003 | Design Matrix integration: appservice registration, room ↔ conversation mapping, space ↔ team mapping, agent ghost users, power levels for human observation |
| M7-004 | Design conversation multiplexing: same conversation accessible from TUI + WebUI + Matrix simultaneously, real-time sync via gateway events |
| M7-005 | Design remote auth bridging: how a Matrix/Discord message authenticates to Mosaic (token linking, OAuth bridge, invite-based provisioning) |
| M7-006 | Design agent-to-agent communication via Matrix rooms: room per agent pair, human can join to observe, message format for structured agent dialogue |
| M7-007 | Design multi-user isolation in Matrix: space-per-team, room visibility rules, encryption considerations, admin visibility |
| M7-008 | Publish architecture doc: `docs/architecture/channel-protocol.md` — reviewed and approved before Phase 10 |
|
||||
|
||||
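M7 is design-only, but a concrete shape helps reviewers of M7-001 and M7-002. A minimal TypeScript sketch, assuming a promise-based adapter and a single inbound handler registered by the gateway; `ChannelMessage`, the `IChannelAdapter` method names, and `LoopbackAdapter` are all hypothetical, and the real contract is exactly what this milestone is meant to decide.

```typescript
// Hypothetical canonical message (M7-002): every adapter translates to and from this shape.
interface ChannelMessage {
  channel: string; // e.g. "matrix", "discord", "telegram"
  conversationId: string; // Mosaic conversation this message belongs to
  senderId: string; // Mosaic user id, after identity mapping
  content: string;
  threadId?: string; // thread context, when the channel supports threads
  attachments?: { name: string; mimeType: string; url: string }[];
  metadata?: Record<string, unknown>;
}

// Hypothetical adapter contract (M7-001): lifecycle, message flow, identity mapping.
interface IChannelAdapter {
  readonly name: string;
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  health(): Promise<{ ok: boolean; detail?: string }>;
  onMessage(handler: (msg: ChannelMessage) => Promise<void>): void; // channel -> gateway
  sendMessage(msg: ChannelMessage): Promise<void>; // gateway -> channel
  resolveUser(channelUserId: string): Promise<string | undefined>; // channel user -> Mosaic user
}

// Toy in-memory adapter that only demonstrates the contract; it talks to no real channel.
class LoopbackAdapter implements IChannelAdapter {
  readonly name = "loopback";
  readonly sent: ChannelMessage[] = [];
  private connected = false;
  private handler?: (msg: ChannelMessage) => Promise<void>;
  private readonly identities: Map<string, string>;

  constructor(identities: Map<string, string>) {
    this.identities = identities;
  }

  async connect(): Promise<void> {
    this.connected = true;
  }
  async disconnect(): Promise<void> {
    this.connected = false;
  }
  async health(): Promise<{ ok: boolean; detail?: string }> {
    return { ok: this.connected };
  }
  onMessage(handler: (msg: ChannelMessage) => Promise<void>): void {
    this.handler = handler;
  }
  async sendMessage(msg: ChannelMessage): Promise<void> {
    this.sent.push(msg);
  }
  async resolveUser(channelUserId: string): Promise<string | undefined> {
    return this.identities.get(channelUserId);
  }

  // Simulates an inbound channel event: map the sender's identity, then hand off to the gateway.
  async simulateInbound(channelUserId: string, conversationId: string, content: string): Promise<void> {
    const senderId = await this.resolveUser(channelUserId);
    if (!senderId) throw new Error(`unlinked channel user: ${channelUserId}`);
    if (!this.handler) throw new Error("no gateway handler registered");
    await this.handler({ channel: this.name, conversationId, senderId, content });
  }
}

async function demo(): Promise<void> {
  const adapter = new LoopbackAdapter(new Map([["@alice:example.org", "user-1"]]));
  await adapter.connect();
  // Stand-in for the gateway: reply through the same adapter.
  adapter.onMessage(async (msg) => {
    await adapter.sendMessage({ ...msg, senderId: "agent", content: `echo: ${msg.content}` });
  });
  await adapter.simulateInbound("@alice:example.org", "conv-42", "hello");
  console.log(adapter.sent[0].content); // prints echo: hello
}

void demo();
```

Keeping identity mapping inside the adapter means the gateway only ever sees Mosaic user ids, so a message from an unlinked Matrix or Discord account can be rejected before it reaches an agent.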
---

## Technical Approach

### Pi SDK Provider Adapter Pattern

The agent layer stays on Pi SDK. Provider diversity is solved at the adapter layer below Pi:

```
Provider SDKs (@anthropic-ai/sdk, openai, etc.)
→ IProviderAdapter implementations
→ ProviderRegistry (Pi SDK compatible)
→ Agent Session (Pi SDK) — tool loops, streaming, context
→ AgentService — lifecycle, routing, events
→ ChatGateway — WebSocket to all interfaces
```

Adding a provider means implementing `IProviderAdapter`. Everything above stays unchanged.

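A sketch of the registry layer, assuming adapters expose async `listModels()` and `healthCheck()` as listed in M3-001 (registration lives on the registry here, and `createClient()` is omitted for brevity). All shapes are hypothetical until M3-001 confirms how Pi SDK's provider registry accepts custom adapters.

```typescript
// Hypothetical shapes; the real contract depends on what Pi SDK accepts (M3-001).
interface ModelInfo {
  id: string;
  contextWindow: number;
}

interface IProviderAdapter {
  readonly provider: string; // "anthropic", "openai", "openrouter", "zai", "ollama"
  listModels(): Promise<ModelInfo[]>;
  healthCheck(): Promise<boolean>;
}

// The layers above (agent session, AgentService) talk only to this registry,
// never to a provider SDK directly.
class ProviderRegistry {
  private readonly adapters = new Map<string, IProviderAdapter>();

  register(adapter: IProviderAdapter): void {
    if (this.adapters.has(adapter.provider)) {
      throw new Error(`provider already registered: ${adapter.provider}`);
    }
    this.adapters.set(adapter.provider, adapter);
  }

  get(provider: string): IProviderAdapter {
    const adapter = this.adapters.get(provider);
    if (!adapter) throw new Error(`unknown provider: ${provider}`);
    return adapter;
  }

  // Probe every adapter in parallel; a probe that throws counts as unhealthy.
  async healthyProviders(): Promise<string[]> {
    const results = await Promise.all(
      Array.from(this.adapters.values()).map(async (a) => {
        try {
          return (await a.healthCheck()) ? a.provider : null;
        } catch {
          return null;
        }
      }),
    );
    return results.filter((p): p is string => p !== null);
  }
}

// Stub standing in for a local Ollama adapter.
const stubOllama: IProviderAdapter = {
  provider: "ollama",
  listModels: async () => [{ id: "llama3.2", contextWindow: 128000 }],
  healthCheck: async () => true,
};

const registry = new ProviderRegistry();
registry.register(stubOllama);
registry.healthyProviders().then((names) => console.log(names)); // prints [ 'ollama' ]
```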
### Routing Decision Flow

```
User sends message
→ Task classifier (regex/keyword, optionally LLM-assisted)
→ { taskType, complexity, domain, requiredCapabilities }
→ RoutingEngine.resolve(classification, userOverrides, availableProviders)
→ Match rules by priority
→ Check provider health
→ Apply fallback chain
→ Return { provider, model, agentConfigId }
→ AgentService.createOrResumeSession(routingResult)
→ Session uses selected provider/model
→ Emit session:info with routing decision explanation
```

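The classifier at the top of this flow can be sketched as a pure function, assuming rule-based classification only (the current answer to open question 2). Every keyword pattern, threshold, and name below is an illustrative assumption; M4-005 owns the real heuristics.

```typescript
type TaskType = "coding" | "research" | "summarization" | "analysis" | "conversation" | "creative";
type Complexity = "simple" | "moderate" | "complex";

interface Classification {
  taskType: TaskType;
  complexity: Complexity;
  domain: string;
  requiredCapabilities: string[];
}

// Illustrative keyword patterns, checked in order; the first match wins.
const TASK_PATTERNS: Array<[TaskType, RegExp]> = [
  ["summarization", /\b(summari[sz]e|tl;?dr|recap)\b/i],
  ["coding", /\b(refactor|bug|function|compile|typescript|stack trace|implement)\b/i],
  ["research", /\b(research|find sources|look up)\b/i],
  ["analysis", /\b(analy[sz]e|trade-?offs?)\b/i],
  ["creative", /\b(poem|story|brainstorm|slogan)\b/i],
];

function classify(message: string): Classification {
  const match = TASK_PATTERNS.find(([, pattern]) => pattern.test(message));
  const taskType: TaskType = match ? match[0] : "conversation";

  // Crude complexity heuristic: escalation keywords first, then message length in words.
  const words = message.trim().split(/\s+/).length;
  let complexity: Complexity = "simple";
  if (/\b(architecture|redesign|migrat\w*|across the codebase)\b/i.test(message) || words > 150) {
    complexity = "complex";
  } else if (words > 40) {
    complexity = "moderate";
  }

  const requiredCapabilities = /\b(prove|step by step)\b/i.test(message) ? ["reasoning"] : [];
  // Domain detection is out of scope for this sketch.
  return { taskType, complexity, domain: "general", requiredCapabilities };
}

console.log(classify("Refactor this TypeScript function").taskType); // prints coding
```

Pattern order matters: summarization is checked before coding, so "summarize this bug report" is classified as summarization rather than sent to a coding model.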
### Embedding Strategy

Replace the OpenAI-hardcoded embedding service with a provider-agnostic interface:

- **Default:** Ollama serving `nomic-embed-text` (768-dim) or `mxbai-embed-large` (1024-dim)
- **Fallback:** Any OpenAI-compatible embedding API
- **Migration:** Update the pgvector column dimension when switching from 1536 (OpenAI) to 768 or 1024 (Ollama models); existing insights must then be re-embedded (or discarded), because a `vector(n)` column only accepts vectors of exactly n dimensions
- **No external API dependency** for vector operations in the default configuration

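A sketch of the provider-agnostic interface and of the migration decision, assuming the model names and dimensions listed above. `EmbeddingProvider`, `planEmbeddingSwitch`, and `embedChecked` are hypothetical names, and `"openai-current"` is a placeholder key because this document does not name the current OpenAI model.

```typescript
// Dimensions per the strategy above; "openai-current" is a hypothetical placeholder key.
const EMBEDDING_DIMS: Record<string, number> = {
  "openai-current": 1536,
  "nomic-embed-text": 768,
  "mxbai-embed-large": 1024,
};

// Hypothetical provider-agnostic interface for M3-009.
interface EmbeddingProvider {
  readonly model: string;
  embed(texts: string[]): Promise<number[][]>;
}

function dimensionsOf(model: string): number {
  const dim: number | undefined = EMBEDDING_DIMS[model];
  if (dim === undefined) throw new Error(`unknown embedding model: ${model}`);
  return dim;
}

// What a model switch implies for the pgvector column and the stored vectors.
function planEmbeddingSwitch(currentModel: string, targetModel: string) {
  const fromDim = dimensionsOf(currentModel);
  const toDim = dimensionsOf(targetModel);
  return {
    alterColumn: fromDim !== toDim, // the vector(n) column type must change
    // Vectors from different models live in different spaces, so any model
    // change means re-embedding, even when the dimension happens to match.
    reembed: currentModel !== targetModel,
    fromDim,
    toDim,
  };
}

// Guard before writing: reject vectors whose length differs from the column's dimension.
async function embedChecked(provider: EmbeddingProvider, texts: string[], columnDim: number): Promise<number[][]> {
  const vectors = await provider.embed(texts);
  for (const v of vectors) {
    if (v.length !== columnDim) {
      throw new Error(`${provider.model} returned ${v.length} dims, column expects ${columnDim}`);
    }
  }
  return vectors;
}

console.log(planEmbeddingSwitch("openai-current", "nomic-embed-text"));
// prints { alterColumn: true, reembed: true, fromDim: 1536, toDim: 768 }
```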
### Context Window Management

When conversation history approaches the model's context window:

1. Calculate the token count of the full history
2. If it exceeds 80% of the model's context window, trigger summarization
3. Summarize the oldest N messages into a condensed context block
4. Prepend the summary and keep the most recent messages within the context budget
5. Store the summary as a "context checkpoint" message in the DB

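Steps 1 to 4 can be sketched as a pure planning function. It assumes a rough estimate of 4 characters per token and reserves half of the 80% budget for the verbatim recent tail; both are illustrative assumptions, and a real implementation would use the provider's tokenizer. The summarization call itself and the checkpoint write in step 5 are left out.

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// Assumption: ~4 characters per token. A real implementation would use
// the provider's tokenizer or token-count API instead.
function estimateTokens(messages: ChatMessage[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

interface ContextPlan {
  summarize: ChatMessage[]; // oldest messages to condense into a checkpoint
  keep: ChatMessage[]; // recent messages sent verbatim
}

// Steps 1-4: count tokens, compare with 80% of the window, and if over budget
// split off the oldest messages so the recent tail fits.
function planContext(history: ChatMessage[], contextWindow: number, threshold = 0.8): ContextPlan {
  const budget = Math.floor(contextWindow * threshold);
  if (estimateTokens(history) <= budget) return { summarize: [], keep: history };

  // Walk backwards, keeping the newest messages that fit in half the budget;
  // the other half is headroom for the summary block and the reply (assumed split).
  const keep: ChatMessage[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens([history[i]]);
    if (used + cost > budget / 2) break;
    keep.unshift(history[i]);
    used += cost;
  }
  return { summarize: history.slice(0, history.length - keep.length), keep };
}

// Ten messages of 400 characters each (~100 tokens apiece) against a 1000-token window.
const longHistory: ChatMessage[] = Array.from({ length: 10 }, (_, i): ChatMessage => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: "x".repeat(400),
}));
const plan = planContext(longHistory, 1000);
console.log(plan.summarize.length, plan.keep.length); // prints 6 4
```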
### Model Reference

| Provider | Model | Tier | Context | Tools | Vision | Embedding |
| ---------- | ----------------- | ---------- | ------- | ------ | ------ | -------------- |
| Anthropic | Claude Opus 4.6 | premium | 200K | yes | yes | no |
| Anthropic | Claude Sonnet 4.6 | standard | 200K | yes | yes | no |
| Anthropic | Claude Haiku 4.5 | cheap | 200K | yes | yes | no |
| OpenAI | Codex gpt-5.4 | premium | 128K+ | yes | yes | no |
| Z.ai | GLM-5 | standard | TBD | TBD | TBD | no |
| OpenRouter | varies | varies | varies | varies | varies | no |
| Ollama | llama3.2 | local/free | 128K | yes | no | no |
| Ollama | nomic-embed-text | — | — | — | — | yes (768-dim) |
| Ollama | mxbai-embed-large | — | — | — | — | yes (1024-dim) |

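The model capability matrix (M3-008) could live as typed data filtered by a request's requirements. A sketch assuming the values in the table above; the record shape and `capableModels` are hypothetical, and the model strings are the table's display names rather than provider API model IDs.

```typescript
type Tier = "premium" | "standard" | "cheap" | "local";

interface ModelCapabilities {
  provider: string;
  model: string;
  tier: Tier;
  contextWindow: number; // tokens
  tools: boolean;
  vision: boolean;
}

// Rows with fully known values from the table. GLM-5 (TBD) and OpenRouter (varies)
// are omitted; Codex's "128K+" is recorded as its 128K lower bound.
const MODELS: ModelCapabilities[] = [
  { provider: "anthropic", model: "Claude Opus 4.6", tier: "premium", contextWindow: 200_000, tools: true, vision: true },
  { provider: "anthropic", model: "Claude Sonnet 4.6", tier: "standard", contextWindow: 200_000, tools: true, vision: true },
  { provider: "anthropic", model: "Claude Haiku 4.5", tier: "cheap", contextWindow: 200_000, tools: true, vision: true },
  { provider: "openai", model: "Codex gpt-5.4", tier: "premium", contextWindow: 128_000, tools: true, vision: true },
  { provider: "ollama", model: "llama3.2", tier: "local", contextWindow: 128_000, tools: true, vision: false },
];

interface Requirements {
  tools?: boolean;
  vision?: boolean;
  minContext?: number;
}

// Models that satisfy every stated requirement; unspecified requirements are ignored.
function capableModels(req: Requirements, models: ModelCapabilities[] = MODELS): string[] {
  return models
    .filter((m) => (!req.tools || m.tools) && (!req.vision || m.vision))
    .filter((m) => req.minContext === undefined || m.contextWindow >= req.minContext)
    .map((m) => m.model);
}

console.log(capableModels({ vision: true, minContext: 150_000 }));
// prints [ 'Claude Opus 4.6', 'Claude Sonnet 4.6', 'Claude Haiku 4.5' ]
```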
### Default Routing Rules (Seed Data)

| Priority | Condition | Route To |
| -------- | ------------------------------------------------------------- | ------------- |
| 1 | taskType=coding AND complexity=complex | Opus 4.6 |
| 2 | taskType=coding AND complexity=moderate | Sonnet 4.6 |
| 3 | taskType=coding AND complexity=simple | Codex gpt-5.4 |
| 4 | taskType=research | Codex gpt-5.4 |
| 5 | taskType=summarization | GLM-5 |
| 6 | taskType=analysis AND requiredCapabilities includes reasoning | Opus 4.6 |
| 7 | taskType=conversation | Sonnet 4.6 |
| 8 | taskType=creative | Sonnet 4.6 |
| 9 | costTier=cheap OR domain=general | Haiku 4.5 |
| 10 | fallback (no rule matched) | Sonnet 4.6 |
| 99 | provider=ollama forced OR offline mode | llama3.2 |

Rules are user-customizable. Admins set system defaults; users override for their sessions.

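The match, health-check, and fallback steps can be sketched as follows, assuming rules are evaluated in ascending priority order and that a matched rule whose provider is unhealthy falls through to the next match. Under that reading, the priority-99 Ollama row acts as the last resort when cloud providers are down. `RoutingRule`, `resolveRoute`, and the provider ids (such as `"zai"`) are hypothetical.

```typescript
interface Classification {
  taskType: string;
  complexity: "simple" | "moderate" | "complex";
  requiredCapabilities: string[];
}

interface Route {
  provider: string;
  model: string;
}

interface RoutingRule {
  priority: number; // lower number = evaluated first
  when: (c: Classification) => boolean;
  routeTo: Route;
}

// A few seed rows from the table. Predicates are an illustrative assumption;
// M4-001 plans to store conditions as data in the DB.
const SEED_RULES: RoutingRule[] = [
  { priority: 1, when: (c) => c.taskType === "coding" && c.complexity === "complex", routeTo: { provider: "anthropic", model: "Opus 4.6" } },
  { priority: 3, when: (c) => c.taskType === "coding" && c.complexity === "simple", routeTo: { provider: "openai", model: "Codex gpt-5.4" } },
  { priority: 5, when: (c) => c.taskType === "summarization", routeTo: { provider: "zai", model: "GLM-5" } },
  { priority: 10, when: () => true, routeTo: { provider: "anthropic", model: "Sonnet 4.6" } }, // fallback
  // Assumption: only reached when every earlier matching provider is unhealthy.
  { priority: 99, when: () => true, routeTo: { provider: "ollama", model: "llama3.2" } },
];

function resolveRoute(
  c: Classification,
  rules: RoutingRule[],
  healthy: Set<string>,
  override?: Route, // e.g. set by /model (M4-007)
): Route & { reason: string } {
  if (override) return { ...override, reason: "user override" };
  const ordered = [...rules].sort((a, b) => a.priority - b.priority);
  for (const rule of ordered) {
    if (!rule.when(c)) continue;
    if (healthy.has(rule.routeTo.provider)) {
      return { ...rule.routeTo, reason: `matched rule priority ${rule.priority}` };
    }
    // Matched, but the provider failed its health check: fall through to the next match.
  }
  throw new Error("no healthy provider for this request");
}

const req: Classification = { taskType: "summarization", complexity: "simple", requiredCapabilities: [] };
console.log(resolveRoute(req, SEED_RULES, new Set(["anthropic", "ollama"])).model); // prints Sonnet 4.6
```

The returned `reason` string is the kind of explanation M4-008 proposes to include in the `session:info` event.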
---

## Risks and Open Questions

| Risk | Impact | Mitigation |
| ------------------------------------------------------- | ------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| Pi SDK doesn't support custom provider adapters cleanly | High — blocks M3 | Verify in M3-001; fallback: wrap native SDKs and bypass Pi's registry, feeding responses into Pi's session format |
| BullMQ + Valkey incompatibility | Medium — blocks M6 | Test in M6-001 before migrating jobs; fallback: use `bullmq` with `ioredis` directly |
| Embedding dimension migration (1536 → 768/1024) | Medium — data migration | Run migration script to re-embed existing insights; or start fresh if insight count is low |
| Z.ai GLM-5 API undocumented | Low — blocks one provider | Deprioritize; other 4 providers cover all use cases |
| Context window summarization quality | Medium — affects UX | Start with simple truncation; add LLM summarization iteratively |
| OAuth flow complexity in TUI (no browser redirect) | Medium | URL-display + clipboard + Valkey poll token pattern (already designed in P8-012) |

### Open Questions

1. What is the Z.ai GLM-5 API format? OpenAI-compatible or custom SDK? (Research in M3-005)
2. Should routing classification use LLM-assisted classification from the start, or rule-based only? (ASSUMPTION: rule-based first, LLM-assisted later)
3. What Ollama embedding model provides the best quality/performance tradeoff? (Test nomic-embed-text vs mxbai-embed-large in M3-009)
4. Should provider credentials be stored in DB per-user, or remain environment-variable based for system-wide providers? (ASSUMPTION: hybrid — env vars for system defaults, DB for per-user overrides)

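If the hybrid assumption in question 4 holds, credential lookup reduces to a per-user lookup with a system-wide fallback. A minimal sketch with hypothetical names (`CredentialStore`, `resolveCredential`); the environment variable names are assumptions, and decryption and OAuth refresh (M3-010) are left out.

```typescript
// Hypothetical per-user store; M3-010 plans encrypted per-user storage in the DB.
interface CredentialStore {
  get(userId: string, provider: string): string | undefined;
}

// Assumed environment variable names for system-wide keys; the PRD does not fix them.
const SYSTEM_ENV_KEYS: Record<string, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  openrouter: "OPENROUTER_API_KEY",
};

type Env = Record<string, string | undefined>;

// Hybrid lookup: the user's own credential wins; otherwise fall back to the system default.
function resolveCredential(
  userId: string,
  provider: string,
  store: CredentialStore,
  env: Env,
): { source: "user" | "system"; key: string } | undefined {
  const userKey = store.get(userId, provider);
  if (userKey) return { source: "user", key: userKey };
  const envName: string | undefined = SYSTEM_ENV_KEYS[provider];
  const systemKey = envName ? env[envName] : undefined;
  return systemKey ? { source: "system", key: systemKey } : undefined;
}

// In-memory store and environment for illustration only.
const userKeys = new Map<string, string>([["user-1:openai", "sk-user"]]);
const memoryStore: CredentialStore = { get: (u, p) => userKeys.get(`${u}:${p}`) };
const env: Env = { ANTHROPIC_API_KEY: "sk-system" };

console.log(resolveCredential("user-1", "openai", memoryStore, env)?.source); // prints user
console.log(resolveCredential("user-1", "anthropic", memoryStore, env)?.source); // prints system
```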
---

## Milestone / Delivery Intent

1. **Target version:** v0.2.0
2. **Milestone count:** 7
3. **Definition of done:** All 10 acceptance criteria verified with evidence, all quality gates green, PRD status updated to `completed`
4. **Delivery order:** M1 (persistence) → M2 (security) → M3 (providers) → M4 (routing) → M5 (sessions) → M6 (jobs) → M7 (channel design)
5. **M1 and M2 are prerequisites** — no provider or routing work begins until conversations persist and data is user-scoped

docs/TASKS.md
@@ -1,100 +1,74 @@
# Tasks — MVP
# Tasks — Harness Foundation

> Single-writer: orchestrator only. Workers read but never modify.
>
> **`agent` column values:** `codex` | `sonnet` | `haiku` | `glm-5` | `opus` | `—` (auto/default)
> Pipeline crons pick the cheapest capable model. Override with a specific value when a task genuinely needs it.
> Examples: `opus` for major architecture decisions, `codex` for pure coding, `haiku` for review/verify gates, `glm-5` for cost-sensitive coding.

| id | status | agent | milestone | description | pr | notes |
| ------ | ------ | ------- | -------------------------------------------------------------------------------------------------- | ------------------------------------------------------------ | ------------- | ----- |
| P0-001 | done | | Phase 0 | Scaffold monorepo | #60 | #1 |
| P0-002 | done | | Phase 0 | @mosaic/types — migrate and extend shared types | #65 | #2 |
| P0-003 | done | | Phase 0 | @mosaic/db — Drizzle schema and PG connection | #67 | #3 |
| P0-004 | done | | Phase 0 | @mosaic/auth — BetterAuth email/password setup | #68 | #4 |
| P0-005 | done | | Phase 0 | Docker Compose — PG 17, Valkey 8, SigNoz | #65 | #5 |
| P0-006 | done | | Phase 0 | OTEL foundation — OpenTelemetry SDK setup | #65 | #6 |
| P0-007 | done | | Phase 0 | CI pipeline — Woodpecker config | #69 | #7 |
| P0-008 | done | | Phase 0 | Project docs — AGENTS.md, CLAUDE.md, README | #69 | #8 |
| P0-009 | done | | Phase 0 | Verify Phase 0 — CI green, all packages build | #70 | #9 |
| P1-001 | done | | Phase 1 | apps/gateway scaffold — NestJS + Fastify adapter | #61 | #10 |
| P1-002 | done | | Phase 1 | Auth middleware — BetterAuth session validation | #71 | #11 |
| P1-003 | done | | Phase 1 | @mosaic/brain — migrate from v0, PG backend | #71 | #12 |
| P1-004 | done | | Phase 1 | @mosaic/queue — migrate from v0 | #71 | #13 |
| P1-005 | done | | Phase 1 | Gateway routes — conversations CRUD + messages | #72 | #14 |
| P1-006 | done | | Phase 1 | Gateway routes — tasks, projects, missions CRUD | #72 | #15 |
| P1-007 | done | | Phase 1 | WebSocket server — chat streaming | #61 | #16 |
| P1-008 | done | | Phase 1 | Basic agent dispatch — single provider | #61 | #17 |
| P1-009 | done | | Phase 1 | Verify Phase 1 — gateway functional, API tested | #73 | #18 |
| P2-001 | done | | Phase 2 | @mosaic/agent — Pi SDK integration + agent pool | #61 | #19 |
| P2-002 | done | | Phase 2 | Multi-provider support — Anthropic + Ollama | #74 | #20 |
| P2-003 | done | | Phase 2 | Agent routing engine — cost/capability matrix | #75 | #21 |
| P2-004 | done | | Phase 2 | Tool registration — brain, queue, memory tools | #76 | #22 |
| P2-005 | done | | Phase 2 | @mosaic/coord — migrate from v0, gateway integration | #77 | #23 |
| P2-006 | done | | Phase 2 | Agent session management — tmux + monitoring | #78 | #24 |
| P2-007 | done | | Phase 2 | Verify Phase 2 — multi-provider routing works | #79 | #25 |
| P3-001 | done | | Phase 3 | apps/web scaffold — Next.js 16 + BetterAuth + Tailwind | #82 | #26 |
| P3-002 | done | | Phase 3 | Auth pages — login, registration, SSO redirect | #83 | #27 |
| P3-003 | done | | Phase 3 | Chat UI — conversations, messages, streaming | #84 | #28 |
| P3-004 | done | | Phase 3 | Task management — list view + kanban board | #86 | #29 |
| P3-005 | done | | Phase 3 | Project & mission views — dashboard + PRD viewer | #87 | #30 |
| P3-006 | done | | Phase 3 | Settings — provider config, profile, integrations | #88 | #31 |
| P3-007 | done | | Phase 3 | Admin panel — user management, RBAC | #89 | #32 |
| P3-008 | done | | Phase 3 | Verify Phase 3 — web dashboard functional E2E | — | #33 |
| P4-001 | done | | Phase 4 | @mosaic/memory — preference + insight stores | — | #34 |
| P4-002 | done | | Phase 4 | Semantic search — pgvector embeddings + search API | — | #35 |
| P4-003 | done | | Phase 4 | @mosaic/log — log ingest, parsing, tiered storage | — | #36 |
| P4-004 | done | | Phase 4 | Summarization pipeline — Haiku-tier LLM + cron | — | #37 |
| P4-005 | done | | Phase 4 | Memory integration — inject into agent sessions | — | #38 |
| P4-006 | done | | Phase 4 | Skill management — catalog, install, config | — | #39 |
| P4-007 | done | | Phase 4 | Verify Phase 4 — memory + log pipeline working | — | #40 |
| P5-001 | done | | Phase 5 | Plugin host — gateway plugin loading + channel interface | — | #41 |
| P5-002 | done | | Phase 5 | @mosaic/discord-plugin — Discord bot + channel plugin | #61 | #42 |
| P5-003 | done | | Phase 5 | @mosaic/telegram-plugin — Telegraf bot + channel plugin | — | #43 |
| P5-004 | done | | Phase 5 | SSO — Authentik OIDC adapter end-to-end | — | #44 |
| P5-005 | done | | Phase 5 | Verify Phase 5 — Discord + Telegram + SSO working | #99 | #45 |
| P6-001 | done | | Phase 6 | @mosaic/cli — unified CLI binary + subcommands | #104 | #46 |
| P6-002 | done | | Phase 6 | @mosaic/prdy — migrate PRD wizard from v0 | #101 | #47 |
| P6-003 | done | | Phase 6 | @mosaic/quality-rails — migrate scaffolder from v0 | #100 | #48 |
| P6-004 | done | | Phase 6 | @mosaic/mosaic — install wizard for v1 | #103 | #49 |
| P6-005 | done | | Phase 6 | Pi TUI integration — mosaic tui | #61 | #50 |
| P6-006 | done | | Phase 6 | Verify Phase 6 — CLI functional, all subcommands | — | #51 |
| P7-009 | done | | Phase 7 | Web chat — WebSocket integration, streaming, conversation switching | #136 | #120 W1 done |
| P7-001 | done | | Phase 7 | MCP endpoint hardening — streamable HTTP transport | #137 | #52 W1 done |
| P7-010 | done | | Phase 7 | Web conversation management — list, search, rename, delete, archive | #139 | #121 W2 done |
| P7-015 | done | | Phase 7 | Agent tool expansion — file ops, git, shell exec, web fetch | #138 | #126 W2 done |
| P7-011 | done | | Phase 7 | Web project detail views — missions, tasks, PRDs, dashboards | #140 | #122 W3 done |
| P7-016 | done | | Phase 7 | MCP client — gateway connects to external MCP servers as tools | #141 | #127 W3 done |
| P7-012 | done | | Phase 7 | Web provider management UI — add, configure, test LLM providers | #142 | #123 W4 done |
| P7-017 | done | | Phase 7 | Agent skill invocation — load and execute skills from catalog | #143 | #128 W4 done |
| P7-013 | done | | Phase 7 | Web settings persistence — profile, preferences save to DB | #145 | #124 W5 done |
| P7-018 | done | | Phase 7 | CLI model/provider switching — --model, --provider, /model in TUI | #144 | #129 W5 done |
| P7-014 | done | | Phase 7 | Web admin panel — user CRUD, role assignment, system health | #150 | #125 W6 done |
| P7-019 | done | | Phase 7 | CLI session management — list, resume, destroy sessions | #146 | #130 W6 done |
| P7-020 | done | | Phase 7 | Coord DB migration — project-scoped missions, multi-tenant RBAC | #149 | #131 W7 done |
| FIX-02 | done | | Backlog | TUI agent:end — fix React state updater side-effect | #147 | #133 W8 done |
| FIX-03 | done | | Backlog | Agent session — cwd sandbox, system prompt, tool restrictions | #148 | #134 W8 done |
| P7-004 | done | | Phase 7 | E2E test suite — Playwright critical paths | #152 | #55 W9 done |
| P7-006 | done | | Phase 7 | Documentation — user guide, admin guide, dev guide | #151 | #57 W9 done |
| P7-007 | done | | Phase 7 | Bare-metal deployment docs + .env.example | #153 | #58 W9 done |
| P7-021 | done | | Phase 7 | Verify Phase 7 — feature-complete platform E2E | — | #132 W10 done |
| P8-005 | done | | Phase 8 | CLI command architecture — DB schema + brain repo + gateway endpoints | #158 | |
| P8-006 | done | | Phase 8 | CLI command architecture — agent, mission, prdy commands + TUI mods | #158 | |
| P8-007 | done | | Phase 8 | DB migrations — preferences.mutable + teams + team_members + projects.teamId | #175 | #160 |
| P8-008 | done | | Phase 8 | @mosaic/types — CommandDef, CommandManifest, new socket events | #174 | #161 |
| P8-009 | done | | Phase 8 | TUI Phase 1 — slash command parsing, local commands, system message rendering, InputBar wiring | #176 | #162 |
| P8-010 | done | | Phase 8 | Gateway Phase 2 — CommandRegistryService, CommandExecutorService, socket + REST commands | #178 | #163 |
| P8-011 | done | | Phase 8 | Gateway Phase 3 — PreferencesService, /preferences REST, /system Valkey override, prompt injection | #180 | #164 |
| P8-012 | done | | Phase 8 | Gateway Phase 4 — /agent, /provider (URL+clipboard), /mission, /prdy, /tools commands | #181 | #165 |
| P8-013 | done | | Phase 8 | Gateway Phase 5 — MosaicPlugin lifecycle, ReloadService, hot reload, system:reload TUI | #182 | #166 |
| P8-014 | done | | Phase 8 | Gateway Phase 6 — SessionGCService (all tiers), /gc command, cron integration | #179 | #167 |
| P8-015 | done | | Phase 8 | Gateway Phase 7 — WorkspaceService, ProjectBootstrapService, teams project ownership | #183 | #168 |
| P8-016 | done | | Phase 8 | Security — file/git/shell tool strict path hardening, sandbox escape prevention | #177 | #169 |
| P8-017 | done | | Phase 8 | TUI Phase 8 — autocomplete sidebar, fuzzy match, arg hints, up-arrow history | #184 | #170 |
| P8-018 | done | | Phase 8 | Spin-off plan stubs — Gatekeeper, Task Queue Unification, Chroot Sandboxing | — | #171 |
| P8-019 | done | | Phase 8 | Verify Platform Architecture — integration + E2E verification | #185 | #172 |
| P8-001 | done | codex | Phase 8 | Additional SSO providers — WorkOS + Keycloak | #220 | #53 |
| P8-002 | done | codex | Phase 8 | Additional LLM providers — Codex, Z.ai, LM Studio, llama.cpp | #212 | #54 |
| P8-003 | done | codex | Phase 8 | Performance optimization | #211 | #56 |
| P8-004 | done | haiku | Phase 8 | Beta release gate — v0.1.0 tag | — | #59 |
| FIX-01 | done | | Backlog | Call piSession.dispose() in AgentService.destroySession | #78 | #62 |

| id | status | agent | milestone | description | pr | notes |
| ------ | ----------- | ------ | ------------------ | --------------------------------------------------------------------------------------------------------------- | --- | ------------------------- |
| M1-001 | not-started | sonnet | M1: Persistence | Wire ChatGateway.handleMessage() → ConversationsRepo.addMessage() for user messages | — | #224 |
| M1-002 | not-started | sonnet | M1: Persistence | Wire agent event relay → ConversationsRepo.addMessage() for assistant responses (text, tool calls, thinking) | — | #225 |
| M1-003 | not-started | sonnet | M1: Persistence | Store message metadata: model used, provider, token counts, tool call details, timestamps | — | #226 |
| M1-004 | not-started | sonnet | M1: Persistence | On session resume, load message history from DB and inject into Pi session context | — | #227 |
| M1-005 | not-started | opus | M1: Persistence | Context window management: summarize older messages when history exceeds 80% of model context | — | #228 |
| M1-006 | not-started | sonnet | M1: Persistence | Conversation search: full-text search on messages table via /api/conversations/search | — | #229 |
| M1-007 | not-started | sonnet | M1: Persistence | TUI: /history command to display conversation message count and context usage | — | #230 |
| M1-008 | not-started | haiku | M1: Persistence | Verify: send messages → kill TUI → resume with -c → agent references prior context | — | #231 |
| M2-001 | not-started | sonnet | M2: Security | Audit InsightsRepo: add userId filter to searchByEmbedding() vector search | — | #232 |
| M2-002 | not-started | sonnet | M2: Security | Audit InsightsRepo: add userId filter to findByUser(), decayOldInsights() | — | #233 |
| M2-003 | not-started | sonnet | M2: Security | Audit PreferencesRepo: verify all queries filter by userId | — | #234 |
| M2-004 | not-started | sonnet | M2: Security | Audit agent memory tools: verify `memory_search`, `memory_save_*`, `memory_get_*` scope to session user | — | #235 |
| M2-005 | not-started | sonnet | M2: Security | Audit ConversationsRepo: verify ownership check on findById, update, delete, addMessage, findMessages | — | #236 |
| M2-006 | not-started | sonnet | M2: Security | Audit AgentsRepo: verify findAccessible() returns only user's agents + system agents | — | #237 |
| M2-007 | not-started | sonnet | M2: Security | Integration test: create two users, populate data, verify cross-user isolation on every query path | — | #238 TDD |
| M2-008 | not-started | sonnet | M2: Security | Audit Valkey keys: verify session keys include userId or are not enumerable across users | — | #239 |
| M3-001 | not-started | opus | M3: Providers | Refactor ProviderService into IProviderAdapter pattern: register(), listModels(), healthCheck(), createClient() | — | #240 Verify Pi SDK compat |
| M3-002 | not-started | sonnet | M3: Providers | Anthropic adapter: @anthropic-ai/sdk, Claude Sonnet 4.6 + Opus 4.6 + Haiku 4.5, OAuth + API key | — | #241 |
| M3-003 | not-started | sonnet | M3: Providers | OpenAI adapter: openai SDK, Codex gpt-5.4, OAuth + API key | — | #242 |
| M3-004 | not-started | sonnet | M3: Providers | OpenRouter adapter: OpenAI-compatible client, API key, dynamic model list from /api/v1/models | — | #243 |
| M3-005 | not-started | sonnet | M3: Providers | Z.ai GLM adapter: GLM-5, API key, research API format | — | #244 |
| M3-006 | not-started | sonnet | M3: Providers | Ollama adapter: refactor existing integration into adapter pattern, add embedding model support | — | #245 |
| M3-007 | not-started | sonnet | M3: Providers | Provider health check: periodic probe, configurable interval, status per provider, /api/providers/health | — | #246 |
| M3-008 | not-started | sonnet | M3: Providers | Model capability matrix: per-model metadata (tier, context window, tool support, vision, streaming, embedding) | — | #247 |
| M3-009 | not-started | sonnet | M3: Providers | Refactor EmbeddingService: provider-agnostic interface, Ollama default (nomic-embed-text or mxbai-embed-large) | — | #248 Dim migration |
| M3-010 | not-started | sonnet | M3: Providers | OAuth token storage: persist provider tokens per user in DB (encrypted), refresh flow | — | #249 |
| M3-011 | not-started | sonnet | M3: Providers | Provider config UI support: /api/providers CRUD for user-scoped provider credentials | — | #250 |
| M3-012 | not-started | haiku | M3: Providers | Verify: each provider connects, lists models, completes chat request, handles errors | — | #251 |
| M4-001 | not-started | opus | M4: Routing | Define routing rule schema: RoutingRule { name, priority, conditions[], action } stored in DB | — | #252 DB migration |
| M4-002 | not-started | opus | M4: Routing | Condition types: taskType, complexity, domain, costTier, requiredCapabilities | — | #253 |
| M4-003 | not-started | opus | M4: Routing | Action types: routeTo { provider, model, agentConfigId?, systemPromptOverride?, toolAllowlist? } | — | #254 |
| M4-004 | not-started | sonnet | M4: Routing | Default routing rules seed data: coding→Opus, Q&A→Sonnet, summarization→GLM-5, research→Codex, offline→Ollama | — | #255 |
| M4-005 | not-started | opus | M4: Routing | Task classification: infer taskType + complexity from user message (regex/keyword first, LLM-assisted later) | — | #256 |
| M4-006 | not-started | opus | M4: Routing | Routing decision pipeline: classify → match rules → check health → fallback chain → return result | — | #257 |
| M4-007 | not-started | sonnet | M4: Routing | Routing override: /model forces specific model regardless of routing rules | — | #258 |
| M4-008 | not-started | sonnet | M4: Routing | Routing transparency: include routing decision in session:info event (model + reason) | — | #259 |
| M4-009 | not-started | sonnet | M4: Routing | Routing rules CRUD: /api/routing/rules — list, create, update, delete, reorder priority | — | #260 |
| M4-010 | not-started | sonnet | M4: Routing | Per-user routing overrides: users customize default rules for their sessions | — | #261 |
| M4-011 | not-started | sonnet | M4: Routing | Agent specialization: agents declare capabilities in config (domains, preferred models, tool sets) | — | #262 |
| M4-012 | not-started | sonnet | M4: Routing | Routing integration: wire into ChatGateway — every message triggers routing before agent dispatch | — | #263 |
| M4-013 | not-started | haiku | M4: Routing | Verify: coding→Opus, summarize→GLM-5, simple→Haiku, override via /model works | — | #264 |
| M5-001 | not-started | sonnet | M5: Sessions | Wire ChatGateway: on session create, load agent config from DB (system prompt, model, provider, tools, skills) | — | #265 |
| M5-002 | not-started | sonnet | M5: Sessions | /model command: end-to-end wiring — TUI → socket → gateway switches provider/model → new messages use it | — | #266 |
| M5-003 | not-started | sonnet | M5: Sessions | /agent command: switch agent config mid-session — loads new system prompt, tools, default model | — | #267 |
| M5-004 | not-started | sonnet | M5: Sessions | Session ↔ conversation binding: persist sessionId on conversation record, resume via conversationId | — | #268 |
| M5-005 | not-started | sonnet | M5: Sessions | Session info broadcast: on model/agent switch, emit session:info with updated state | — | #269 |
| M5-006 | not-started | sonnet | M5: Sessions | Agent creation from TUI: /agent new command creates agent config via gateway API | — | #270 |
| M5-007 | not-started | sonnet | M5: Sessions | Session metrics: per-session token usage, model switches, duration — persist in DB | — | #271 |
| M5-008 | not-started | haiku | M5: Sessions | Verify: /model switches model, /agent switches agent, session resume loads config | — | #272 |
| M6-001 | not-started | sonnet | M6: Jobs | Add BullMQ dependency, configure with Valkey connection | — | #273 Test compat first |
| M6-002 | not-started | sonnet | M6: Jobs | Create queue service: typed job definitions, worker registration, error handling with exponential backoff | — | #274 |
| M6-003 | not-started | sonnet | M6: Jobs | Migrate summarization cron → BullMQ repeatable job | — | #275 |
| M6-004 | not-started | sonnet | M6: Jobs | Migrate GC (session cleanup) → BullMQ repeatable job | — | #276 |
| M6-005 | not-started | sonnet | M6: Jobs | Migrate tier management (log archival) → BullMQ repeatable job | — | #277 |
| M6-006 | not-started | sonnet | M6: Jobs | Admin jobs API: GET /api/admin/jobs — list, status, retry, pause/resume queues | — | #278 |
| M6-007 | not-started | sonnet | M6: Jobs | Job event logging: emit job start/complete/fail events to agent_logs | — | #279 |
| M6-008 | not-started | haiku | M6: Jobs | Verify: jobs execute on schedule, failure retries with backoff, admin endpoint shows history | — | #280 |
| M7-001 | not-started | opus | M7: Channel Design | Define IChannelAdapter interface: lifecycle, message flow, identity mapping | — | #281 Architecture |
| M7-002 | not-started | opus | M7: Channel Design | Define channel message protocol: canonical format all adapters translate to/from | — | #282 Architecture |
| M7-003 | not-started | opus | M7: Channel Design | Design Matrix integration: appservice, room↔conversation, space↔team, agent ghosts, power levels | — | #283 Architecture |
| M7-004 | not-started | opus | M7: Channel Design | Design conversation multiplexing: same conversation from TUI+WebUI+Matrix, real-time sync | — | #284 Architecture |
| M7-005 | not-started | opus | M7: Channel Design | Design remote auth bridging: Matrix/Discord auth → Mosaic identity (token linking, OAuth bridge) | — | #285 Architecture |
| M7-006 | not-started | opus | M7: Channel Design | Design agent-to-agent communication via Matrix rooms: room per agent pair, human observation | — | #286 Architecture |
| M7-007 | not-started | opus | M7: Channel Design | Design multi-user isolation in Matrix: space-per-team, room visibility, encryption, admin access | — | #287 Architecture |
| M7-008 | not-started | haiku | M7: Channel Design | Publish docs/architecture/channel-protocol.md — reviewed and approved | — | #288 |

docs/scratchpads/harness-20260321.md
@@ -0,0 +1,60 @@
# Mission Scratchpad — Harness Foundation

> Append-only log. NEVER delete entries. NEVER overwrite sections.
> This is the orchestrator's working memory across sessions.

## Original Mission Prompt

```
Jason wants to get the gateway and TUI working as a real daily-driver harness.
The system needs: multi-provider LLM access, task-aware agent routing, conversation persistence,
security isolation, session hardening, job queue foundation, and channel protocol design for
future Matrix/remote integration.

Provider decisions: Anthropic (Sonnet 4.6, Opus 4.6), OpenAI (Codex gpt-5.4), Z.ai (GLM-5),
OpenRouter, Ollama. Embeddings via Ollama local models.

Pi SDK stays as agent runtime. Build with Matrix integration in mind but foundation first.
Agent routing per task with granular specification is required.
```

## Planning Decisions

### 2026-03-21 — Phase 9 PRD and mission setup

- PRD created as `docs/PRD-Harness_Foundation.md` with canonical Mosaic template format
- 7 milestones, 71 tasks total
- Milestone order: M1 (persistence) → M2 (security) → M3 (providers) → M4 (routing) → M5 (sessions) → M6 (jobs) → M7 (channel design)
- M1 and M2 are hard prerequisites — no provider or routing work until conversations persist and data is user-scoped
- Pi SDK kept as agent runtime; providers plug in via adapter pattern underneath
- Embeddings migrated from OpenAI to Ollama local (nomic-embed-text or mxbai-embed-large)
- BullMQ chosen for job queue (Valkey-compatible, TypeScript-native)
- Channel protocol is design-only in this phase; Matrix implementation deferred to Phase 10
- Models confirmed: Claude Sonnet 4.6, Opus 4.6, Haiku 4.5, Codex gpt-5.4, GLM-5, Ollama locals
- Routing engine: rule-based classification first, LLM-assisted later
- Default routing: coding-complex→Opus, coding-moderate→Sonnet, coding-simple→Codex, research→Codex, summarization→GLM-5, conversation→Sonnet, cheap/general→Haiku, offline→Ollama

### Architecture decisions

- Provider adapter pattern: each provider implements IProviderAdapter, registered in Pi SDK's provider registry
- Routing flow: classify message → match rules by priority → check provider health → fallback chain → dispatch
- Context window management: summarize older messages when history exceeds 80% of model context
- OAuth pattern: URL-display + clipboard + Valkey poll token (same as P8-012 design)
- Embedding dimension: migration from 1536 (OpenAI) to 768/1024 (Ollama) — may require re-embedding existing insights

## Session Log

| Session | Date | Milestone | Tasks Done | Outcome |
| ------- | ---------- | --------- | -------------------------------- | ---------------------------------------------- |
| 1 | 2026-03-21 | Planning | PRD, manifest, tasks, scratchpad | Mission initialized, planning gate in progress |

## Open Questions

1. Z.ai GLM-5 API format — OpenAI-compatible or custom? (Research in M3-005)
2. Which Ollama embedding model: nomic-embed-text (768-dim) vs mxbai-embed-large (1024-dim)? (Test in M3-009)
3. Provider credentials: env vars for system defaults + DB for per-user overrides? (ASSUMPTION: hybrid)
4. Pi SDK provider adapter support — needs verification in M3-001 before committing to adapter pattern

## Corrections

<!-- Record any corrections to earlier decisions or assumptions. -->