stack/docs/PRD-Harness_Foundation.md
chore: bootstrap Harness Foundation mission (Phase 9) (#289)
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-03-21 20:10:48 +00:00

# PRD: Harness Foundation — Phase 9
## Metadata
- **Owner:** Jason Woltje
- **Date:** 2026-03-21
- **Status:** draft
- **Phase:** 9 (post-MVP)
- **Version Target:** v0.2.0
- **Agent Harness:** [Pi SDK](https://github.com/badlogic/pi-mono)
- **Best-Guess Mode:** true
- **Repo:** `git.mosaicstack.dev/mosaic/mosaic-stack`
---
## Problem Statement
Mosaic Stack v0.1.0 delivered a functional skeleton — gateway boots, TUI connects, single-agent chat streams, basic auth works. But the system is not usable as a daily-driver harness:
1. **Chat messages are fire-and-forget.** The WebSocket gateway never calls ConversationsRepo. Context is lost on disconnect. Conversations can't be resumed with history. Cross-interface continuity (TUI → WebUI → Matrix) is impossible.
2. **Single provider (Ollama) with local models only.** No access to frontier models (Claude Opus 4.6, Codex gpt-5.4, GLM-5). The routing engine exists but has never been tested with real providers.
3. **No task-aware agent routing.** A coding task and a summarization task route to the same agent with the same model. There is no mechanism to match tasks to agents by capability, cost tier, or specialization.
4. **Memory is not user-scoped.** Insight vector search returns all users' data. Deploying multi-user in this state would be a security violation.
5. **Agent configs exist in DB but are ignored.** Stored system prompts, model preferences, and tool allowlists don't apply to sessions. The `/model` and `/agent` slash commands are stubbed.
6. **No job queue.** Background processing (summarization, GC, tier management) runs on fragile cron. No retry, no monitoring, no async task dispatch foundation for future agent orchestration.
7. **Plugin system is hollow.** Zero implementations. No defined message protocol. Blocks all remote interfaces (Matrix, Discord, Telegram) planned for Phase 10+.
**What this phase solves:** Transform Mosaic from a demo into a real multi-provider, task-routing AI harness that persists everything, routes intelligently, and is architecturally ready for multi-agent and remote control.
---
## Objectives
1. **Persistent conversations** — Every message saved, every conversation resumable, full context available across interfaces
2. **Multi-provider LLM access** — Anthropic, OpenAI, OpenRouter, Z.ai, Ollama with proper auth flows
3. **Task-aware agent routing** — Granular routing rules that match tasks to the right agent + model by capability, cost, and domain
4. **Security isolation** — All data queries user-scoped, ready for multi-user deployment
5. **Session hardening** — Agent configs apply, model/agent switching works mid-session
6. **Reliable background processing** — BullMQ job queue replaces fragile cron
7. **Channel protocol design** — Architecture for Matrix and remote interfaces, built into the foundation now
---
## Scope
### In Scope
1. Conversation persistence — wire ChatGateway to ConversationsRepo, context loading on resume
2. Multi-provider integration — Anthropic, OpenAI, OpenRouter, Z.ai, Ollama with auth flows
3. Task-aware agent routing — granular routing rules with task classification and fallback chains
4. Security isolation — user-scoped queries on all data paths (memory, conversations, agents)
5. Agent session hardening — configs apply, model/agent switching, session resume
6. Job queue — BullMQ replacing cron for background processing
7. Channel protocol design — architecture document for Matrix and remote interfaces
8. Embedding migration — Ollama-local embeddings replacing OpenAI dependency
### Out of Scope
1. Matrix homeserver deployment + appservice (Phase 10)
2. Multi-agent orchestration / supervisor-worker pattern (Phase 10+)
3. WebUI rebuild (future)
4. Self-managing memory — compaction, merge, forget (future)
5. Team workspace isolation (future)
6. Remote channel plugins — WhatsApp, Discord, Telegram (Phase 10+, via Matrix)
7. Fine-grained RBAC — project/agent/team roles (future)
8. Agent-to-agent communication (Phase 10+)
## User/Stakeholder Requirements
1. As a user, I can resume a conversation after closing the TUI and the agent remembers the full context
2. As a user, I can use frontier models (Claude Opus 4.6, Codex gpt-5.4) without manual provider configuration
3. As a user, the system automatically selects the best model for my task (coding → powerful model, simple question → cheap model)
4. As a user, I can override the automatic model selection with `/model <name>` at any time
5. As a user, I can switch between specialized agents mid-session with `/agent <name>`
6. As an admin, I can define routing rules that control which models handle which task types
7. As an admin, I can monitor background job health and retry failed jobs
8. As a user, my conversations, memories, and preferences are invisible to other users
## Functional Requirements
1. FR-1: ChatGateway persists every message (user, assistant, tool call, thinking) to the conversations/messages tables
2. FR-2: On session resume with an existing conversationId, message history is loaded from DB and injected into the agent session context
3. FR-3: When conversation history exceeds 80% of the model's context window, older messages are summarized and prepended as a context checkpoint
4. FR-4: Five LLM providers are registered with the gateway: Anthropic (Claude Sonnet 4.6, Opus 4.6, Haiku 4.5), OpenAI (Codex gpt-5.4), OpenRouter (dynamic model list), Z.ai (GLM-5), Ollama (local models)
5. FR-5: Each provider supports API key auth; Anthropic and OpenAI additionally support OAuth (URL-display + callback pattern)
6. FR-6: Provider credentials are stored per-user in the DB (encrypted), not in environment variables
7. FR-7: A routing engine classifies each user message by taskType, complexity, domain, and required capabilities, then selects the optimal provider/model via priority-ordered rules
8. FR-8: Default routing rules are seeded on first run; admins can customize system-wide rules; users can set per-session overrides
9. FR-9: Routing decisions are transparent — the TUI shows which model was selected and why
10. FR-10: Agent configs (system prompt, default model, tool allowlist, skills) stored in DB are applied when creating agent sessions
11. FR-11: `/model <name>` switches the active model for subsequent messages in the current session
12. FR-12: `/agent <name>` switches to a different agent config, loading its system prompt, tools, and default model
13. FR-13: All memory queries (insight vector search, preferences) filter by userId
14. FR-14: BullMQ handles background jobs (summarization, GC, tier management) with retry, backoff, and monitoring
15. FR-15: Embeddings are served locally via Ollama (nomic-embed-text or mxbai-embed-large) with no external API dependency
## Non-Functional Requirements
1. **Security:** All data queries include userId filter. Provider credentials encrypted at rest. No cross-user data leakage. OAuth tokens stored securely with refresh handling.
2. **Performance:** Message persistence adds <50ms to message relay latency. Routing classification takes <100ms per message. Provider health checks run on a configurable interval (default 60s) without blocking requests.
3. **Reliability:** BullMQ jobs retry with exponential backoff (3 attempts by default). Provider failover: if the primary provider is unhealthy, the fallback chain activates automatically. Conversation context survives TUI restart.
4. **Observability:** Routing decisions logged with classification details. Job execution logged to agent_logs. Provider health status exposed via `/api/providers/health`. Session metrics (tokens, model switches, duration) persisted in DB.
## Acceptance Criteria
- [ ] AC-1: Send messages in TUI → restart TUI → resume conversation → agent has full history and context
- [ ] AC-2: Route a coding task to Claude Opus 4.6, a simple question to Haiku, a summarization to GLM-5 — all via granular routing rules
- [ ] AC-3: Two users exist, User A's memory searches never return User B's data
- [ ] AC-4: `/model claude-sonnet-4-6` in TUI switches the active model for subsequent messages
- [ ] AC-5: `/agent coding-agent` in TUI switches to a different agent with different system prompt and tools
- [ ] AC-6: BullMQ jobs execute on schedule, failures retry with backoff, admin can inspect via `/api/admin/jobs`
- [ ] AC-7: Channel protocol document exists with Matrix integration points defined, reviewed, and approved
- [ ] AC-8: Embeddings run on Ollama local models (no external API dependency for vector operations)
- [ ] AC-9: All five providers (Anthropic, OpenAI, OpenRouter, Z.ai, Ollama) connect, list models, and complete chat requests
- [ ] AC-10: Routing transparency — TUI displays which model was selected and the routing reason for each response
## Testing and Verification Expectations
1. **Baseline checks:** `pnpm typecheck`, `pnpm lint`, `pnpm format:check` — all green before any push
2. **Unit tests:** Routing engine rules matching, task classifier, provider adapter registration, message persistence
3. **Integration tests:** Two-user isolation (M2-007), provider round-trip (M3-012), routing end-to-end (M4-013), session resume with context (M1-008)
4. **Situational tests per milestone:** Each milestone has a verify task that exercises the delivered functionality end-to-end
5. **Evidence format:** Test output + manual verification notes in scratchpad per milestone
## Constraints and Dependencies
| Type | Item | Notes |
| ---------- | ------------------------------- | -------------------------------------------------------------------------------------- |
| Dependency | `@anthropic-ai/sdk` | npm, required for M3-002 |
| Dependency | `openai` | npm, required for M3-003 |
| Dependency | `bullmq` | npm, Valkey-compatible, required for M6 |
| Dependency | Ollama embedding models | `ollama pull nomic-embed-text`, required for M3-009 |
| Dependency | Pi SDK provider adapter support | ASSUMPTION: supported — verify in M3-001 |
| External | Anthropic OAuth credentials | Requires Anthropic Console setup |
| External | OpenAI OAuth credentials | Requires OpenAI Platform setup |
| External | Z.ai API key | Requires Z.ai account |
| External | OpenRouter API key | Requires OpenRouter account |
| Constraint | Valkey 8 compatibility | BullMQ requires Redis 6+; Valkey 8 is compatible |
| Constraint | Embedding dimension migration | Switching from 1536 (OpenAI) to 768/1024 (Ollama) requires re-embedding or fresh start |
---
## Assumptions
1. ASSUMPTION: Pi SDK supports custom provider adapters for all target LLM providers. If not, adapters wrap native SDKs behind Pi's interface. **Rationale:** Gateway already uses Pi with Ollama via a custom adapter pattern.
2. ASSUMPTION: BullMQ is Valkey-compatible. **Rationale:** BullMQ documents Redis 6+ compatibility; Valkey 8 is Redis-compatible.
3. ASSUMPTION: Ollama can serve embedding models (nomic-embed-text, mxbai-embed-large) with acceptable quality. **Rationale:** Ollama supports embedding endpoints natively.
4. ASSUMPTION: Anthropic and OpenAI OAuth flows can be handled via URL-display + token callback pattern (same as existing provider auth). **Rationale:** Both providers offer standard OAuth 2.0 flows.
5. ASSUMPTION: Z.ai GLM-5 uses an API format compatible with OpenAI or has a documented SDK. **Rationale:** Most LLM providers converge on OpenAI-compatible APIs.
6. ASSUMPTION: The existing Pi SDK session model supports mid-session model switching without destroying session state. If not, we destroy and recreate with conversation history. **Rationale:** Acceptable fallback — context is persisted in DB.
7. ASSUMPTION: Channel protocol design can be completed without a running Matrix homeserver. **Rationale:** Matrix protocol is well-documented; design is architecture, not integration.
---
## Milestones
### Milestone 1: Conversation Persistence & Context
**Goal:** Every message persisted. Every conversation resumable with full context.
| Task | Description |
| ------ | ------------------------------------------------------------------------------------------------------------ |
| M1-001 | Wire ChatGateway.handleMessage() → ConversationsRepo.addMessage() for user messages |
| M1-002 | Wire agent event relay → ConversationsRepo.addMessage() for assistant responses (text, tool calls, thinking) |
| M1-003 | Store message metadata: model used, provider, token counts, tool call details, timestamps |
| M1-004 | On session resume (existing conversationId), load message history from DB and inject into Pi session context |
| M1-005 | Context window management: if history exceeds model context, summarize older messages and prepend summary |
| M1-006 | Conversation search: full-text search on messages table via `/api/conversations/search` |
| M1-007 | TUI: `/history` command to display conversation message count and context usage |
| M1-008 | Verify: send messages → kill TUI → resume with `-c <id>` → agent references prior context |
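The repo wiring in M1-001 through M1-003 could be sketched as follows. `MessageRecord` and `toMessageRecord` are illustrative names, not the actual ConversationsRepo API; field names follow the metadata list in M1-003.

```typescript
// Sketch of the record ChatGateway could persist per FR-1 and M1-003.
// Names are illustrative, not the real repo contract.
type MessageRole = "user" | "assistant" | "tool_call" | "thinking";

interface MessageRecord {
  conversationId: string;
  role: MessageRole;
  content: string;
  // Metadata per M1-003: model used, provider, token counts
  model?: string;
  provider?: string;
  tokensIn?: number;
  tokensOut?: number;
  createdAt: string; // ISO timestamp
}

function toMessageRecord(
  conversationId: string,
  role: MessageRole,
  content: string,
  meta: Partial<Pick<MessageRecord, "model" | "provider" | "tokensIn" | "tokensOut">> = {},
): MessageRecord {
  return { conversationId, role, content, createdAt: new Date().toISOString(), ...meta };
}
```

Every relay path (user message in, agent event out) would build one of these and hand it to `ConversationsRepo.addMessage()` before or alongside the WebSocket emit.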
### Milestone 2: Security & Isolation
**Goal:** All data queries user-scoped. Safe for multi-user deployment.
| Task | Description |
| ------ | --------------------------------------------------------------------------------------------------------------- |
| M2-001 | Audit InsightsRepo: add `userId` filter to `searchByEmbedding()` vector search |
| M2-002 | Audit InsightsRepo: add `userId` filter to `findByUser()`, `decayOldInsights()` |
| M2-003 | Audit PreferencesRepo: verify all queries filter by userId |
| M2-004 | Audit agent memory tools: verify `memory_search`, `memory_save_*`, `memory_get_*` all scope to session user |
| M2-005 | Audit ConversationsRepo: verify ownership check on findById, update, delete, addMessage, findMessages |
| M2-006 | Audit AgentsRepo: verify `findAccessible()` returns only user's agents + system agents |
| M2-007 | Add integration test: create two users, populate data for each, verify cross-user isolation on every query path |
| M2-008 | Audit Valkey keys: verify session keys include userId or are not enumerable across users |
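The shape of the M2-001 fix can be sketched as a parameterized query builder. Table and column names are assumptions about the insights schema, and `<=>` is pgvector's cosine-distance operator:

```typescript
// Illustrative parameterized vector search for M2-001: the query must always
// carry the user_id predicate. Schema names are assumptions.
function buildInsightSearch(userId: string, embedding: number[], limit = 10) {
  return {
    text: `SELECT id, content, embedding <=> $1 AS distance
           FROM insights
           WHERE user_id = $2        -- the M2-001 fix: never omit this filter
           ORDER BY embedding <=> $1
           LIMIT $3`,
    values: [`[${embedding.join(",")}]`, userId, limit],
  };
}
```

The M2-007 integration test then asserts that no query path in the repo layer can be reached without a `userId` argument.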
### Milestone 3: Provider Integration
**Goal:** Five providers operational with proper auth, health checking, and capability metadata.
| Task | Description |
| ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| M3-001 | Refactor ProviderService into provider adapter pattern: `IProviderAdapter` interface with `register()`, `listModels()`, `healthCheck()`, `createClient()` |
| M3-002 | Anthropic adapter: `@anthropic-ai/sdk`, register Claude Sonnet 4.6 + Opus 4.6, OAuth flow (URL display + callback), API key fallback |
| M3-003 | OpenAI adapter: `openai` SDK, register Codex gpt-5.4, OAuth flow, API key fallback |
| M3-004 | OpenRouter adapter: OpenAI-compatible client, API key auth, dynamic model list from `/api/v1/models` |
| M3-005 | Z.ai GLM adapter: register GLM-5, API key auth, research and implement API format |
| M3-006 | Ollama adapter: refactor existing Ollama integration into adapter pattern, add embedding model support |
| M3-007 | Provider health check: periodic probe (configurable interval), status per provider, expose via `/api/providers/health` |
| M3-008 | Model capability matrix: define per-model metadata (tier, context window, tool support, vision, streaming, embedding capable) |
| M3-009 | Refactor EmbeddingService: replace OpenAI-hardcoded client with provider-agnostic interface, Ollama as default (nomic-embed-text or mxbai-embed-large) |
| M3-010 | OAuth token storage: persist provider tokens per user in DB (encrypted), refresh flow |
| M3-011 | Provider config UI support: `/api/providers` CRUD for user-scoped provider credentials |
| M3-012 | Verify: each provider connects, lists models, completes a chat request, handles errors gracefully |
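The M3-001 interface could look like the sketch below. The method names come from the task description; the return types are assumptions, since the real shapes depend on what the Pi SDK expects from a registered provider:

```typescript
// Sketch of the IProviderAdapter interface from M3-001. Return types are
// guesses; ModelInfo mirrors the M3-008 capability matrix.
interface ModelInfo {
  id: string;
  tier: "cheap" | "standard" | "premium" | "local";
  contextWindow: number;
  supportsTools: boolean;
  supportsVision: boolean;
  embeddingCapable: boolean;
}

interface IProviderAdapter {
  readonly name: string;                // "anthropic", "openai", "openrouter", "zai", "ollama"
  register(): Promise<void>;            // auth (API key or OAuth) + gateway registration
  listModels(): Promise<ModelInfo[]>;
  healthCheck(): Promise<{ healthy: boolean; latencyMs?: number }>;
  createClient(model: string): unknown; // Pi-SDK-compatible client handle
}
```

Each of M3-002 through M3-006 then becomes one class implementing this interface around the corresponding native SDK.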
### Milestone 4: Agent Routing Engine
**Goal:** Granular, rule-based routing that matches tasks to the right agent and model by capability, cost, and domain specialization.
| Task | Description |
| ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| M4-001 | Define routing rule schema: `RoutingRule { name, priority, conditions[], action }` stored in DB |
| M4-002 | Condition types: `taskType` (coding, research, summarization, conversation, analysis, creative), `complexity` (simple, moderate, complex), `domain` (frontend, backend, devops, docs, general), `costTier` (cheap, standard, premium), `requiredCapabilities` (tools, vision, long-context, reasoning) |
| M4-003 | Action types: `routeTo { provider, model, agentConfigId?, systemPromptOverride?, toolAllowlist? }` |
| M4-004 | Default routing rules (seed data): coding → Opus 4.6, simple Q&A → Sonnet 4.6, summarization → GLM-5, research → Codex gpt-5.4, local/offline → Ollama llama3.2 |
| M4-005 | Task classification: lightweight classifier that infers taskType + complexity from user message (can be rule-based regex/keyword initially, LLM-assisted later) |
| M4-006 | Routing decision pipeline: classify task → match rules by priority → select best available provider/model → fallback chain if primary unavailable |
| M4-007 | Routing override: user can force a specific model via `/model <name>` regardless of routing rules |
| M4-008 | Routing transparency: include routing decision in `session:info` event (why this model was selected) |
| M4-009 | Routing rules CRUD: `/api/routing/rules` — list, create, update, delete, reorder priority |
| M4-010 | Per-user routing overrides: users can customize default rules for their sessions |
| M4-011 | Agent specialization: agents can declare capabilities in their config (domains, preferred models, tool sets) |
| M4-012 | Routing integration: wire routing engine into ChatGateway — every new message triggers routing decision before agent dispatch |
| M4-013 | Verify: send a coding question → routed to Opus; send "summarize this" → routed to GLM-5; send "what time is it" → routed to cheap tier |
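A minimal sketch of the M4-001 schema and a priority-ordered matcher, using the condition vocabulary from M4-002. Exact field names and the matching semantics (all conditions must hold; lowest priority number wins) are assumptions:

```typescript
// Sketch of the RoutingRule schema (M4-001) plus first-match resolution (M4-006).
type TaskType = "coding" | "research" | "summarization" | "conversation" | "analysis" | "creative";
type Complexity = "simple" | "moderate" | "complex";

interface Classification {
  taskType: TaskType;
  complexity: Complexity;
  domain: string;
  requiredCapabilities: string[];
}

interface RoutingRule {
  name: string;
  priority: number; // lower number = higher priority
  conditions: Partial<Classification>;
  action: { provider: string; model: string };
}

function resolveRoute(rules: RoutingRule[], c: Classification): RoutingRule | undefined {
  return [...rules]
    .sort((a, b) => a.priority - b.priority)
    .find((r) =>
      Object.entries(r.conditions).every(([key, want]) =>
        Array.isArray(want)
          ? want.every((cap) => c.requiredCapabilities.includes(cap))
          : c[key as keyof Classification] === want,
      ),
    );
}
```

A rule with empty `conditions` matches everything, which is how the priority-10 fallback in the seed data works. Provider health checks and the fallback chain (M4-006) would wrap this resolution step.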
### Milestone 5: Agent Session Hardening
**Goal:** Agent configs apply to sessions. Model and agent switching work mid-session.
| Task | Description |
| ------ | ------------------------------------------------------------------------------------------------------------------------------------------ |
| M5-001 | Wire ChatGateway: on session create, load agent config from DB (system prompt, model, provider, tool allowlist, skills) |
| M5-002 | `/model <name>` command: end-to-end wiring — TUI → socket `command:execute` → gateway switches provider/model → new messages use new model |
| M5-003 | `/agent <name>` command: switch to different agent config mid-session — loads new system prompt, tools, and default model |
| M5-004 | Session ↔ conversation binding: persist sessionId on conversation record, allow session resume via conversation ID |
| M5-005 | Session info broadcast: on model/agent switch, emit `session:info` with updated provider, model, agent name |
| M5-006 | Agent creation from TUI: `/agent new` command creates agent config via gateway API |
| M5-007 | Session metrics: track per-session token usage, model switches, duration — persist in DB |
| M5-008 | Verify: start TUI → `/model claude-opus-4-6` → verify response uses Opus → `/agent research-bot` → verify system prompt changes |
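The gateway side of M5-002/M5-003 reduces to a small command dispatch over mutable session state. `SessionState` and `applyCommand` are illustrative; the real `/agent` path would also load the new config's system prompt and tools from the DB:

```typescript
// Sketch of slash-command handling in the gateway (M5-002, M5-003).
interface SessionState {
  agentId: string;
  provider: string;
  model: string;
}

function applyCommand(session: SessionState, input: string): SessionState {
  const [cmd, arg] = input.trim().split(/\s+/, 2);
  if (!arg) return session;                                  // bare command: no change
  if (cmd === "/model") return { ...session, model: arg };   // M4-007: override wins over routing
  if (cmd === "/agent") return { ...session, agentId: arg }; // real impl loads config from DB
  return session;
}
```

After each switch the gateway would emit the updated `session:info` event per M5-005.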
### Milestone 6: Job Queue Foundation
**Goal:** Reliable background processing via BullMQ. Foundation for future agent task orchestration.
| Task | Description |
| ------ | ------------------------------------------------------------------------------------------------------------ |
| M6-001 | Add BullMQ dependency, configure with Valkey connection |
| M6-002 | Create queue service: typed job definitions, worker registration, error handling with exponential backoff |
| M6-003 | Migrate summarization cron → BullMQ repeatable job |
| M6-004 | Migrate GC (session cleanup) → BullMQ repeatable job |
| M6-005 | Migrate tier management (log archival) → BullMQ repeatable job |
| M6-006 | Admin jobs API: `GET /api/admin/jobs` — list active/completed/failed jobs, retry failed, pause/resume queues |
| M6-007 | Job event logging: emit job start/complete/fail events to agent_logs for observability |
| M6-008 | Verify: jobs execute on schedule, deliberate failure retries with backoff, admin endpoint shows job history |
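The typed job definitions from M6-002 might be shaped like the sketch below, with options matching BullMQ's `JobsOptions` (`attempts` plus exponential `backoff`). Job names mirror the cron tasks being migrated; the cron patterns and delay values are placeholders, not decided schedules:

```typescript
// Sketch of M6-002's typed job definitions with the default retry policy
// (3 attempts, exponential backoff) from the NFRs.
type JobName = "summarization" | "gc" | "tier-management";

interface JobSpec {
  name: JobName;
  repeatPattern: string; // cron expression for a BullMQ repeatable job
  opts: { attempts: number; backoff: { type: "exponential"; delay: number } };
}

const defaultOpts = { attempts: 3, backoff: { type: "exponential" as const, delay: 5_000 } };

const jobs: JobSpec[] = [
  { name: "summarization", repeatPattern: "*/15 * * * *", opts: defaultOpts },
  { name: "gc", repeatPattern: "0 * * * *", opts: defaultOpts },
  { name: "tier-management", repeatPattern: "0 3 * * *", opts: defaultOpts },
];
// Registration would be roughly:
//   queue.add(job.name, payload, { repeat: { pattern: job.repeatPattern }, ...job.opts })
```

Keeping the specs as data makes M6-006's admin API trivial to back: the same array drives registration, listing, and pause/resume.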
### Milestone 7: Channel Protocol Design
**Goal:** Architecture document defining how remote interfaces (Matrix, Discord, Telegram) will integrate. No code — design only. Built into foundation now so Phase 10+ doesn't require gateway rewrites.
| Task | Description |
| ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| M7-001 | Define `IChannelAdapter` interface: lifecycle (connect, disconnect, health), message flow (receiveMessage → gateway, sendMessage ← gateway), identity mapping (channel user ↔ Mosaic user) |
| M7-002 | Define channel message protocol: canonical message format that all adapters translate to/from (content, metadata, attachments, thread context) |
| M7-003 | Design Matrix integration: appservice registration, room ↔ conversation mapping, space ↔ team mapping, agent ghost users, power levels for human observation |
| M7-004 | Design conversation multiplexing: same conversation accessible from TUI + WebUI + Matrix simultaneously, real-time sync via gateway events |
| M7-005 | Design remote auth bridging: how a Matrix/Discord message authenticates to Mosaic (token linking, OAuth bridge, invite-based provisioning) |
| M7-006 | Design agent-to-agent communication via Matrix rooms: room per agent pair, human can join to observe, message format for structured agent dialogue |
| M7-007 | Design multi-user isolation in Matrix: space-per-team, room visibility rules, encryption considerations, admin visibility |
| M7-008 | Publish architecture doc: `docs/architecture/channel-protocol.md` — reviewed and approved before Phase 10 |
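As a starting point for M7-002, the canonical message format might look like this. Field names are a proposal for the design doc, not a settled spec:

```typescript
// Proposed canonical channel message (M7-002): every adapter translates its
// native format to and from this shape.
interface ChannelMessage {
  channel: "tui" | "webui" | "matrix" | "discord" | "telegram";
  channelUserId: string;   // identity on the remote channel, e.g. a Matrix MXID
  mosaicUserId: string;    // mapped Mosaic identity (M7-001 identity mapping)
  conversationId: string;  // maps to a Matrix room under M7-003
  content: string;
  attachments?: { mimeType: string; url: string }[];
  threadContext?: { parentMessageId: string };
  metadata?: Record<string, unknown>;
  timestamp: string;       // ISO 8601
}
```

Because `conversationId` is channel-agnostic, the M7-004 multiplexing design falls out naturally: TUI, WebUI, and Matrix adapters all address the same conversation record.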
---
## Technical Approach
### Pi SDK Provider Adapter Pattern
The agent layer stays on Pi SDK. Provider diversity is solved at the adapter layer below Pi:
```
Provider SDKs (@anthropic-ai/sdk, openai, etc.)
→ IProviderAdapter implementations
→ ProviderRegistry (Pi SDK compatible)
→ Agent Session (Pi SDK) — tool loops, streaming, context
→ AgentService — lifecycle, routing, events
→ ChatGateway — WebSocket to all interfaces
```
Adding a provider means implementing `IProviderAdapter`. Everything above stays unchanged.
### Routing Decision Flow
```
User sends message
→ Task classifier (regex/keyword, optionally LLM-assisted)
→ { taskType, complexity, domain, requiredCapabilities }
→ RoutingEngine.resolve(classification, userOverrides, availableProviders)
→ Match rules by priority
→ Check provider health
→ Apply fallback chain
→ Return { provider, model, agentConfigId }
→ AgentService.createOrResumeSession(routingResult)
→ Session uses selected provider/model
→ Emit session:info with routing decision explanation
```
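The classifier step above is rule-based initially (M4-005). A minimal keyword sketch, where the patterns and length thresholds are placeholders an LLM-assisted pass would later replace:

```typescript
// Minimal rule-based task classifier sketch (M4-005). Keyword patterns and
// complexity thresholds are illustrative placeholders.
function classifyTask(message: string): { taskType: string; complexity: string } {
  const m = message.toLowerCase();
  const taskType =
    /\b(code|function|bug|refactor|implement)\b/.test(m) ? "coding" :
    /\b(summari[sz]e|tl;dr)\b/.test(m) ? "summarization" :
    /\b(research|compare|investigate)\b/.test(m) ? "research" :
    "conversation";
  // Crude proxy: longer prompts tend to carry more complex tasks.
  const complexity = m.length > 400 ? "complex" : m.length > 100 ? "moderate" : "simple";
  return { taskType, complexity };
}
```

The output feeds directly into `RoutingEngine.resolve()` in the pipeline above; swapping in an LLM-assisted classifier later changes only this function.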
### Embedding Strategy
Replace OpenAI-hardcoded embedding service with provider-agnostic interface:
- **Default:** Ollama serving `nomic-embed-text` (768-dim) or `mxbai-embed-large` (1024-dim)
- **Fallback:** Any OpenAI-compatible embedding API
- **Migration:** Update pgvector column dimension if switching from 1536 (OpenAI) to 768/1024 (Ollama models)
- **No external API dependency** for vector operations in default configuration
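Because pgvector columns are fixed-width, the migration step reduces to altering the column dimension and re-embedding every row. A sketch of the bookkeeping, where the table/column names and the assumption that a single 1536-dim OpenAI model was previously in use are illustrative:

```typescript
// Sketch of the dimension migration behind the embedding switch. The
// "insights" table name and "embedding" column are assumptions.
const EMBEDDING_DIMS: Record<string, number> = {
  "openai-legacy": 1536,     // whichever 1536-dim OpenAI model was hardcoded before
  "nomic-embed-text": 768,
  "mxbai-embed-large": 1024,
};

function migrationSql(model: string): string {
  const dim = EMBEDDING_DIMS[model];
  if (!dim) throw new Error(`unknown embedding model: ${model}`);
  // Re-embedding of existing rows must follow; old vectors are incompatible
  // with the new width and with the new model's vector space.
  return `ALTER TABLE insights ALTER COLUMN embedding TYPE vector(${dim});`;
}
```

Note that even if the widths matched, vectors from different models live in different spaces, so re-embedding (or a fresh start) is required either way.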
### Context Window Management
When conversation history exceeds model context:
1. Calculate token count of full history
2. If exceeds 80% of model context window, trigger summarization
3. Summarize oldest N messages into a condensed context block
4. Prepend summary + keep recent messages within context budget
5. Store summary as a "context checkpoint" message in DB
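The steps above can be sketched as a pure planning function. The 80% trigger comes from FR-3; the choice to keep recent messages within roughly half the window after summarization is an assumption, and per-message token counts are assumed precomputed (M1-003 stores them):

```typescript
// Sketch of the FR-3 checkpoint trigger: decide which messages to summarize
// and which to keep verbatim. The 50% keep-budget is an assumed tuning value.
interface Msg { id: string; tokens: number; }

function planCheckpoint(history: Msg[], contextWindow: number) {
  const total = history.reduce((n, m) => n + m.tokens, 0);
  if (total <= contextWindow * 0.8) return { summarize: [] as Msg[], keep: history };
  // Walk backwards, keeping the newest messages until the keep-budget is full;
  // everything older is condensed into a single checkpoint message.
  const budget = contextWindow * 0.5;
  let used = 0;
  let cut = history.length;
  for (let i = history.length - 1; i >= 0; i--) {
    if (used + history[i].tokens > budget) break;
    used += history[i].tokens;
    cut = i;
  }
  return { summarize: history.slice(0, cut), keep: history.slice(cut) };
}
```

The `summarize` slice is what gets condensed and stored as the "context checkpoint" message in step 5.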
### Model Reference
| Provider | Model | Tier | Context | Tools | Vision | Embedding |
| ---------- | ----------------- | ---------- | ------- | ------ | ------ | -------------- |
| Anthropic | Claude Opus 4.6 | premium | 200K | yes | yes | no |
| Anthropic | Claude Sonnet 4.6 | standard | 200K | yes | yes | no |
| Anthropic | Claude Haiku 4.5 | cheap | 200K | yes | yes | no |
| OpenAI | Codex gpt-5.4 | premium | 128K+ | yes | yes | no |
| Z.ai | GLM-5 | standard | TBD | TBD | TBD | no |
| OpenRouter | varies | varies | varies | varies | varies | no |
| Ollama | llama3.2 | local/free | 128K | yes | no | no |
| Ollama | nomic-embed-text | — | — | — | — | yes (768-dim) |
| Ollama | mxbai-embed-large | — | — | — | — | yes (1024-dim) |
### Default Routing Rules (Seed Data)
| Priority | Condition | Route To |
| -------- | ------------------------------------------------------------- | ------------- |
| 1 | taskType=coding AND complexity=complex | Opus 4.6 |
| 2 | taskType=coding AND complexity=moderate | Sonnet 4.6 |
| 3 | taskType=coding AND complexity=simple | Codex gpt-5.4 |
| 4 | taskType=research | Codex gpt-5.4 |
| 5 | taskType=summarization | GLM-5 |
| 6 | taskType=analysis AND requiredCapabilities includes reasoning | Opus 4.6 |
| 7 | taskType=conversation | Sonnet 4.6 |
| 8 | taskType=creative | Sonnet 4.6 |
| 9 | costTier=cheap OR domain=general | Haiku 4.5 |
| 10 | fallback (no rule matched) | Sonnet 4.6 |
| 99 | provider=ollama forced OR offline mode | llama3.2 |
Rules are user-customizable. Admins set system defaults; users override for their sessions.
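The table above expressed as seed data, roughly how M4-004 might ship it. Conditions use the M4-002 vocabulary; the model id slugs are illustrative:

```typescript
// The default routing table as seed data (M4-004). Model id slugs are guesses
// at the internal naming convention, not confirmed identifiers.
const defaultRules = [
  { priority: 1, conditions: { taskType: "coding", complexity: "complex" }, model: "claude-opus-4-6" },
  { priority: 2, conditions: { taskType: "coding", complexity: "moderate" }, model: "claude-sonnet-4-6" },
  { priority: 3, conditions: { taskType: "coding", complexity: "simple" }, model: "codex-gpt-5-4" },
  { priority: 4, conditions: { taskType: "research" }, model: "codex-gpt-5-4" },
  { priority: 5, conditions: { taskType: "summarization" }, model: "glm-5" },
  { priority: 6, conditions: { taskType: "analysis", requiredCapabilities: ["reasoning"] }, model: "claude-opus-4-6" },
  { priority: 7, conditions: { taskType: "conversation" }, model: "claude-sonnet-4-6" },
  { priority: 8, conditions: { taskType: "creative" }, model: "claude-sonnet-4-6" },
  { priority: 9, conditions: { costTier: "cheap" }, model: "claude-haiku-4-5" },   // or domain=general
  { priority: 10, conditions: {}, model: "claude-sonnet-4-6" },                    // fallback: matches anything
  { priority: 99, conditions: { provider: "ollama" }, model: "llama3.2" },         // forced local / offline mode
];
```

Seeding the rules as plain rows keeps the `/api/routing/rules` CRUD (M4-009) and per-user overrides (M4-010) on the same data path as the defaults.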
---
## Risks and Open Questions
| Risk | Impact | Mitigation |
| ------------------------------------------------------- | ------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| Pi SDK doesn't support custom provider adapters cleanly | High — blocks M3 | Verify in M3-001; fallback: wrap native SDKs and bypass Pi's registry, feeding responses into Pi's session format |
| BullMQ + Valkey incompatibility | Medium — blocks M6 | Test in M6-001 before migrating jobs; fallback: use `bullmq` with `ioredis` directly |
| Embedding dimension migration (1536 → 768/1024) | Medium — data migration | Run migration script to re-embed existing insights; or start fresh if insight count is low |
| Z.ai GLM-5 API undocumented | Low — blocks one provider | Deprioritize; other 4 providers cover all use cases |
| Context window summarization quality | Medium — affects UX | Start with simple truncation; add LLM summarization iteratively |
| OAuth flow complexity in TUI (no browser redirect) | Medium | URL-display + clipboard + Valkey poll token pattern (already designed in P8-012) |
### Open Questions
1. What is the Z.ai GLM-5 API format? OpenAI-compatible or custom SDK? (Research in M3-005)
2. Should routing classification use LLM-assisted classification from the start, or rule-based only? (ASSUMPTION: rule-based first, LLM-assisted later)
3. What Ollama embedding model provides the best quality/performance tradeoff? (Test nomic-embed-text vs mxbai-embed-large in M3-009)
4. Should provider credentials be stored in DB per-user, or remain environment-variable based for system-wide providers? (ASSUMPTION: hybrid — env vars for system defaults, DB for per-user overrides)
---
## Milestone / Delivery Intent
1. **Target version:** v0.2.0
2. **Milestone count:** 7
3. **Definition of done:** All 10 acceptance criteria verified with evidence, all quality gates green, PRD status updated to `completed`
4. **Delivery order:** M1 (persistence) → M2 (security) → M3 (providers) → M4 (routing) → M5 (sessions) → M6 (jobs) → M7 (channel design)
5. **M1 and M2 are prerequisites** — no provider or routing work begins until conversations persist and data is user-scoped