stack/docs/PRD-Harness_Foundation.md
chore: bootstrap Harness Foundation mission (Phase 9) (#289)
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-03-21 20:10:48 +00:00

# PRD: Harness Foundation — Phase 9
## Metadata
- **Owner:** Jason Woltje
- **Date:** 2026-03-21
- **Status:** draft
- **Phase:** 9 (post-MVP)
- **Version Target:** v0.2.0
- **Agent Harness:** [Pi SDK](https://github.com/badlogic/pi-mono)
- **Best-Guess Mode:** true
- **Repo:** `git.mosaicstack.dev/mosaic/mosaic-stack`
---
## Problem Statement
Mosaic Stack v0.1.0 delivered a functional skeleton — gateway boots, TUI connects, single-agent chat streams, basic auth works. But the system is not usable as a daily-driver harness:
1. **Chat messages are fire-and-forget.** The WebSocket gateway never calls ConversationsRepo. Context is lost on disconnect. Conversations can't be resumed with history. Cross-interface continuity (TUI → WebUI → Matrix) is impossible.
2. **Single provider (Ollama) with local models only.** No access to frontier models (Claude Opus 4.6, Codex gpt-5.4, GLM-5). The routing engine exists but has never been tested with real providers.
3. **No task-aware agent routing.** A coding task and a summarization task route to the same agent with the same model. There is no mechanism to match tasks to agents by capability, cost tier, or specialization.
4. **Memory is not user-scoped.** Insight vector search returns all users' data. Deploying multi-user in this state would be a security violation.
5. **Agent configs exist in DB but are ignored.** Stored system prompts, model preferences, and tool allowlists don't apply to sessions. The `/model` and `/agent` slash commands are stubbed.
6. **No job queue.** Background processing (summarization, GC, tier management) runs on fragile cron. No retry, no monitoring, no async task dispatch foundation for future agent orchestration.
7. **Plugin system is hollow.** Zero implementations. No defined message protocol. Blocks all remote interfaces (Matrix, Discord, Telegram) planned for Phase 10+.
**What this phase solves:** Transform Mosaic from a demo into a real multi-provider, task-routing AI harness that persists everything, routes intelligently, and is architecturally ready for multi-agent and remote control.
---
## Objectives
1. **Persistent conversations** — Every message saved, every conversation resumable, full context available across interfaces
2. **Multi-provider LLM access** — Anthropic, OpenAI, OpenRouter, Z.ai, Ollama with proper auth flows
3. **Task-aware agent routing** — Granular routing rules that match tasks to the right agent + model by capability, cost, and domain
4. **Security isolation** — All data queries user-scoped, ready for multi-user deployment
5. **Session hardening** — Agent configs apply, model/agent switching works mid-session
6. **Reliable background processing** — BullMQ job queue replaces fragile cron
7. **Channel protocol design** — Architecture for Matrix and remote interfaces, built into the foundation now
---
## Scope
### In Scope
1. Conversation persistence — wire ChatGateway to ConversationsRepo, context loading on resume
2. Multi-provider integration — Anthropic, OpenAI, OpenRouter, Z.ai, Ollama with auth flows
3. Task-aware agent routing — granular routing rules with task classification and fallback chains
4. Security isolation — user-scoped queries on all data paths (memory, conversations, agents)
5. Agent session hardening — configs apply, model/agent switching, session resume
6. Job queue — BullMQ replacing cron for background processing
7. Channel protocol design — architecture document for Matrix and remote interfaces
8. Embedding migration — Ollama-local embeddings replacing OpenAI dependency
### Out of Scope
1. Matrix homeserver deployment + appservice (Phase 10)
2. Multi-agent orchestration / supervisor-worker pattern (Phase 10+)
3. WebUI rebuild (future)
4. Self-managing memory — compaction, merge, forget (future)
5. Team workspace isolation (future)
6. Remote channel plugins — WhatsApp, Discord, Telegram (Phase 10+, via Matrix)
7. Fine-grained RBAC — project/agent/team roles (future)
8. Agent-to-agent communication (Phase 10+)
## User/Stakeholder Requirements
1. As a user, I can resume a conversation after closing the TUI and the agent remembers the full context
2. As a user, I can use frontier models (Claude Opus 4.6, Codex gpt-5.4) without manual provider configuration
3. As a user, the system automatically selects the best model for my task (coding → powerful model, simple question → cheap model)
4. As a user, I can override the automatic model selection with `/model <name>` at any time
5. As a user, I can switch between specialized agents mid-session with `/agent <name>`
6. As an admin, I can define routing rules that control which models handle which task types
7. As an admin, I can monitor background job health and retry failed jobs
8. As a user, my conversations, memories, and preferences are invisible to other users
## Functional Requirements
1. FR-1: ChatGateway persists every message (user, assistant, tool call, thinking) to the conversations/messages tables
2. FR-2: On session resume with an existing conversationId, message history is loaded from DB and injected into the agent session context
3. FR-3: When conversation history exceeds 80% of the model's context window, older messages are summarized and prepended as a context checkpoint
4. FR-4: Five LLM providers are registered with the gateway: Anthropic (Claude Sonnet 4.6, Opus 4.6, Haiku 4.5), OpenAI (Codex gpt-5.4), OpenRouter (dynamic model list), Z.ai (GLM-5), Ollama (local models)
5. FR-5: Each provider supports API key auth; Anthropic and OpenAI additionally support OAuth (URL-display + callback pattern)
6. FR-6: Provider credentials are stored per-user in the DB (encrypted), not in environment variables
7. FR-7: A routing engine classifies each user message by taskType, complexity, domain, and required capabilities, then selects the optimal provider/model via priority-ordered rules
8. FR-8: Default routing rules are seeded on first run; admins can customize system-wide rules; users can set per-session overrides
9. FR-9: Routing decisions are transparent — the TUI shows which model was selected and why
10. FR-10: Agent configs (system prompt, default model, tool allowlist, skills) stored in DB are applied when creating agent sessions
11. FR-11: `/model <name>` switches the active model for subsequent messages in the current session
12. FR-12: `/agent <name>` switches to a different agent config, loading its system prompt, tools, and default model
13. FR-13: All memory queries (insight vector search, preferences) filter by userId
14. FR-14: BullMQ handles background jobs (summarization, GC, tier management) with retry, backoff, and monitoring
15. FR-15: Embeddings are served locally via Ollama (nomic-embed-text or mxbai-embed-large) with no external API dependency
## Non-Functional Requirements
1. **Security:** All data queries include userId filter. Provider credentials encrypted at rest. No cross-user data leakage. OAuth tokens stored securely with refresh handling.
2. **Performance:** Message persistence adds <50ms to message relay latency. Routing classification takes <100ms per message. Provider health checks run on a configurable interval (default 60s) without blocking requests.
3. **Reliability:** BullMQ jobs retry with exponential backoff (3 attempts by default). Provider failover: if the primary provider is unhealthy, the fallback chain activates automatically. Conversation context survives TUI restart.
4. **Observability:** Routing decisions logged with classification details. Job execution logged to agent_logs. Provider health status exposed via `/api/providers/health`. Session metrics (tokens, model switches, duration) persisted in DB.
## Acceptance Criteria
- [ ] AC-1: Send messages in TUI → restart TUI → resume conversation → agent has full history and context
- [ ] AC-2: Route a coding task to Claude Opus 4.6, a simple question to Haiku, a summarization to GLM-5 — all via granular routing rules
- [ ] AC-3: Two users exist, User A's memory searches never return User B's data
- [ ] AC-4: `/model claude-sonnet-4-6` in TUI switches the active model for subsequent messages
- [ ] AC-5: `/agent coding-agent` in TUI switches to a different agent with different system prompt and tools
- [ ] AC-6: BullMQ jobs execute on schedule, failures retry with backoff, admin can inspect via `/api/admin/jobs`
- [ ] AC-7: Channel protocol document exists with Matrix integration points defined, reviewed, and approved
- [ ] AC-8: Embeddings run on Ollama local models (no external API dependency for vector operations)
- [ ] AC-9: All five providers (Anthropic, OpenAI, OpenRouter, Z.ai, Ollama) connect, list models, and complete chat requests
- [ ] AC-10: Routing transparency — TUI displays which model was selected and the routing reason for each response
## Testing and Verification Expectations
1. **Baseline checks:** `pnpm typecheck`, `pnpm lint`, `pnpm format:check` — all green before any push
2. **Unit tests:** Routing engine rules matching, task classifier, provider adapter registration, message persistence
3. **Integration tests:** Two-user isolation (M2-007), provider round-trip (M3-012), routing end-to-end (M4-013), session resume with context (M1-008)
4. **Situational tests per milestone:** Each milestone has a verify task that exercises the delivered functionality end-to-end
5. **Evidence format:** Test output + manual verification notes in scratchpad per milestone
## Constraints and Dependencies
| Type | Item | Notes |
| ---------- | ------------------------------- | -------------------------------------------------------------------------------------- |
| Dependency | `@anthropic-ai/sdk` | npm, required for M3-002 |
| Dependency | `openai` | npm, required for M3-003 |
| Dependency | `bullmq` | npm, Valkey-compatible, required for M6 |
| Dependency | Ollama embedding models | `ollama pull nomic-embed-text`, required for M3-009 |
| Dependency | Pi SDK provider adapter support | ASSUMPTION: supported — verify in M3-001 |
| External | Anthropic OAuth credentials | Requires Anthropic Console setup |
| External | OpenAI OAuth credentials | Requires OpenAI Platform setup |
| External | Z.ai API key | Requires Z.ai account |
| External | OpenRouter API key | Requires OpenRouter account |
| Constraint | Valkey 8 compatibility | BullMQ requires Redis 6+; Valkey 8 is compatible |
| Constraint | Embedding dimension migration | Switching from 1536 (OpenAI) to 768/1024 (Ollama) requires re-embedding or fresh start |
---
## Assumptions
1. ASSUMPTION: Pi SDK supports custom provider adapters for all target LLM providers. If not, adapters wrap native SDKs behind Pi's interface. **Rationale:** Gateway already uses Pi with Ollama via a custom adapter pattern.
2. ASSUMPTION: BullMQ is Valkey-compatible. **Rationale:** BullMQ documents Redis 6+ compatibility; Valkey 8 is Redis-compatible.
3. ASSUMPTION: Ollama can serve embedding models (nomic-embed-text, mxbai-embed-large) with acceptable quality. **Rationale:** Ollama supports embedding endpoints natively.
4. ASSUMPTION: Anthropic and OpenAI OAuth flows can be handled via URL-display + token callback pattern (same as existing provider auth). **Rationale:** Both providers offer standard OAuth 2.0 flows.
5. ASSUMPTION: Z.ai GLM-5 uses an API format compatible with OpenAI or has a documented SDK. **Rationale:** Most LLM providers converge on OpenAI-compatible APIs.
6. ASSUMPTION: The existing Pi SDK session model supports mid-session model switching without destroying session state. If not, we destroy and recreate with conversation history. **Rationale:** Acceptable fallback — context is persisted in DB.
7. ASSUMPTION: Channel protocol design can be completed without a running Matrix homeserver. **Rationale:** Matrix protocol is well-documented; design is architecture, not integration.
---
## Milestones
### Milestone 1: Conversation Persistence & Context
**Goal:** Every message persisted. Every conversation resumable with full context.
| Task | Description |
| ------ | ------------------------------------------------------------------------------------------------------------ |
| M1-001 | Wire ChatGateway.handleMessage() → ConversationsRepo.addMessage() for user messages |
| M1-002 | Wire agent event relay → ConversationsRepo.addMessage() for assistant responses (text, tool calls, thinking) |
| M1-003 | Store message metadata: model used, provider, token counts, tool call details, timestamps |
| M1-004 | On session resume (existing conversationId), load message history from DB and inject into Pi session context |
| M1-005 | Context window management: if history exceeds model context, summarize older messages and prepend summary |
| M1-006 | Conversation search: full-text search on messages table via `/api/conversations/search` |
| M1-007 | TUI: `/history` command to display conversation message count and context usage |
| M1-008 | Verify: send messages → kill TUI → resume with `-c <id>` → agent references prior context |
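The repo wiring in M1-001 through M1-003 could be sketched as follows. `MessageRecord` and `toMessageRecord` are illustrative names, not the actual ConversationsRepo API; field names follow the metadata list in M1-003.

```typescript
// Sketch of the record ChatGateway could persist per FR-1 and M1-003.
// Names are illustrative, not the real repo contract.
type MessageRole = "user" | "assistant" | "tool_call" | "thinking";

interface MessageRecord {
  conversationId: string;
  role: MessageRole;
  content: string;
  // Metadata per M1-003: model used, provider, token counts
  model?: string;
  provider?: string;
  tokensIn?: number;
  tokensOut?: number;
  createdAt: string; // ISO timestamp
}

function toMessageRecord(
  conversationId: string,
  role: MessageRole,
  content: string,
  meta: Partial<Pick<MessageRecord, "model" | "provider" | "tokensIn" | "tokensOut">> = {},
): MessageRecord {
  return { conversationId, role, content, createdAt: new Date().toISOString(), ...meta };
}
```

Every relay path (user message in, agent event out) would build one of these and hand it to `ConversationsRepo.addMessage()` before or alongside the WebSocket emit.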
### Milestone 2: Security & Isolation
**Goal:** All data queries user-scoped. Safe for multi-user deployment.
| Task | Description |
| ------ | --------------------------------------------------------------------------------------------------------------- |
| M2-001 | Audit InsightsRepo: add `userId` filter to `searchByEmbedding()` vector search |
| M2-002 | Audit InsightsRepo: add `userId` filter to `findByUser()`, `decayOldInsights()` |
| M2-003 | Audit PreferencesRepo: verify all queries filter by userId |
| M2-004 | Audit agent memory tools: verify `memory_search`, `memory_save_*`, `memory_get_*` all scope to session user |
| M2-005 | Audit ConversationsRepo: verify ownership check on findById, update, delete, addMessage, findMessages |
| M2-006 | Audit AgentsRepo: verify `findAccessible()` returns only user's agents + system agents |
| M2-007 | Add integration test: create two users, populate data for each, verify cross-user isolation on every query path |
| M2-008 | Audit Valkey keys: verify session keys include userId or are not enumerable across users |
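The shape of the M2-001 fix can be sketched as a parameterized query builder. Table and column names are assumptions about the insights schema, and `<=>` is pgvector's cosine-distance operator:

```typescript
// Illustrative parameterized vector search for M2-001: the query must always
// carry the user_id predicate. Schema names are assumptions.
function buildInsightSearch(userId: string, embedding: number[], limit = 10) {
  return {
    text: `SELECT id, content, embedding <=> $1 AS distance
           FROM insights
           WHERE user_id = $2        -- the M2-001 fix: never omit this filter
           ORDER BY embedding <=> $1
           LIMIT $3`,
    values: [`[${embedding.join(",")}]`, userId, limit],
  };
}
```

The M2-007 integration test then asserts that no query path in the repo layer can be reached without a `userId` argument.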
### Milestone 3: Provider Integration
**Goal:** Five providers operational with proper auth, health checking, and capability metadata.
| Task | Description |
| ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| M3-001 | Refactor ProviderService into provider adapter pattern: `IProviderAdapter` interface with `register()`, `listModels()`, `healthCheck()`, `createClient()` |
| M3-002 | Anthropic adapter: `@anthropic-ai/sdk`, register Claude Sonnet 4.6 + Opus 4.6, OAuth flow (URL display + callback), API key fallback |
| M3-003 | OpenAI adapter: `openai` SDK, register Codex gpt-5.4, OAuth flow, API key fallback |
| M3-004 | OpenRouter adapter: OpenAI-compatible client, API key auth, dynamic model list from `/api/v1/models` |
| M3-005 | Z.ai GLM adapter: register GLM-5, API key auth, research and implement API format |
| M3-006 | Ollama adapter: refactor existing Ollama integration into adapter pattern, add embedding model support |
| M3-007 | Provider health check: periodic probe (configurable interval), status per provider, expose via `/api/providers/health` |
| M3-008 | Model capability matrix: define per-model metadata (tier, context window, tool support, vision, streaming, embedding capable) |
| M3-009 | Refactor EmbeddingService: replace OpenAI-hardcoded client with provider-agnostic interface, Ollama as default (nomic-embed-text or mxbai-embed-large) |
| M3-010 | OAuth token storage: persist provider tokens per user in DB (encrypted), refresh flow |
| M3-011 | Provider config UI support: `/api/providers` CRUD for user-scoped provider credentials |
| M3-012 | Verify: each provider connects, lists models, completes a chat request, handles errors gracefully |
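The M3-001 interface could look like the sketch below. The method names come from the task description; the return types are assumptions, since the real shapes depend on what the Pi SDK expects from a registered provider:

```typescript
// Sketch of the IProviderAdapter interface from M3-001. Return types are
// guesses; ModelInfo mirrors the M3-008 capability matrix.
interface ModelInfo {
  id: string;
  tier: "cheap" | "standard" | "premium" | "local";
  contextWindow: number;
  supportsTools: boolean;
  supportsVision: boolean;
  embeddingCapable: boolean;
}

interface IProviderAdapter {
  readonly name: string;                // "anthropic", "openai", "openrouter", "zai", "ollama"
  register(): Promise<void>;            // auth (API key or OAuth) + gateway registration
  listModels(): Promise<ModelInfo[]>;
  healthCheck(): Promise<{ healthy: boolean; latencyMs?: number }>;
  createClient(model: string): unknown; // Pi-SDK-compatible client handle
}
```

Each of M3-002 through M3-006 then becomes one class implementing this interface around the corresponding native SDK.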
### Milestone 4: Agent Routing Engine
**Goal:** Granular, rule-based routing that matches tasks to the right agent and model by capability, cost, and domain specialization.
| Task | Description |
| ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| M4-001 | Define routing rule schema: `RoutingRule { name, priority, conditions[], action }` stored in DB |
| M4-002 | Condition types: `taskType` (coding, research, summarization, conversation, analysis, creative), `complexity` (simple, moderate, complex), `domain` (frontend, backend, devops, docs, general), `costTier` (cheap, standard, premium), `requiredCapabilities` (tools, vision, long-context, reasoning) |
| M4-003 | Action types: `routeTo { provider, model, agentConfigId?, systemPromptOverride?, toolAllowlist? }` |
| M4-004 | Default routing rules (seed data): coding → Opus 4.6, simple Q&A → Sonnet 4.6, summarization → GLM-5, research → Codex gpt-5.4, local/offline → Ollama llama3.2 |
| M4-005 | Task classification: lightweight classifier that infers taskType + complexity from user message (can be rule-based regex/keyword initially, LLM-assisted later) |
| M4-006 | Routing decision pipeline: classify task → match rules by priority → select best available provider/model → fallback chain if primary unavailable |
| M4-007 | Routing override: user can force a specific model via `/model <name>` regardless of routing rules |
| M4-008 | Routing transparency: include routing decision in `session:info` event (why this model was selected) |
| M4-009 | Routing rules CRUD: `/api/routing/rules` — list, create, update, delete, reorder priority |
| M4-010 | Per-user routing overrides: users can customize default rules for their sessions |
| M4-011 | Agent specialization: agents can declare capabilities in their config (domains, preferred models, tool sets) |
| M4-012 | Routing integration: wire routing engine into ChatGateway — every new message triggers routing decision before agent dispatch |
| M4-013 | Verify: send a coding question → routed to Opus; send "summarize this" → routed to GLM-5; send "what time is it" → routed to cheap tier |
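A minimal sketch of the M4-001 schema and a priority-ordered matcher, using the condition vocabulary from M4-002. Exact field names and the matching semantics (all conditions must hold; lowest priority number wins) are assumptions:

```typescript
// Sketch of the RoutingRule schema (M4-001) plus first-match resolution (M4-006).
type TaskType = "coding" | "research" | "summarization" | "conversation" | "analysis" | "creative";
type Complexity = "simple" | "moderate" | "complex";

interface Classification {
  taskType: TaskType;
  complexity: Complexity;
  domain: string;
  requiredCapabilities: string[];
}

interface RoutingRule {
  name: string;
  priority: number; // lower number = higher priority
  conditions: Partial<Classification>;
  action: { provider: string; model: string };
}

function resolveRoute(rules: RoutingRule[], c: Classification): RoutingRule | undefined {
  return [...rules]
    .sort((a, b) => a.priority - b.priority)
    .find((r) =>
      Object.entries(r.conditions).every(([key, want]) =>
        Array.isArray(want)
          ? want.every((cap) => c.requiredCapabilities.includes(cap))
          : c[key as keyof Classification] === want,
      ),
    );
}
```

A rule with empty `conditions` matches everything, which is how the priority-10 fallback in the seed data works. Provider health checks and the fallback chain (M4-006) would wrap this resolution step.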
### Milestone 5: Agent Session Hardening
**Goal:** Agent configs apply to sessions. Model and agent switching work mid-session.
| Task | Description |
| ------ | ------------------------------------------------------------------------------------------------------------------------------------------ |
| M5-001 | Wire ChatGateway: on session create, load agent config from DB (system prompt, model, provider, tool allowlist, skills) |
| M5-002 | `/model <name>` command: end-to-end wiring — TUI → socket `command:execute` → gateway switches provider/model → new messages use new model |
| M5-003 | `/agent <name>` command: switch to different agent config mid-session — loads new system prompt, tools, and default model |
| M5-004 | Session ↔ conversation binding: persist sessionId on conversation record, allow session resume via conversation ID |
| M5-005 | Session info broadcast: on model/agent switch, emit `session:info` with updated provider, model, agent name |
| M5-006 | Agent creation from TUI: `/agent new` command creates agent config via gateway API |
| M5-007 | Session metrics: track per-session token usage, model switches, duration — persist in DB |
| M5-008 | Verify: start TUI → `/model claude-opus-4-6` → verify response uses Opus → `/agent research-bot` → verify system prompt changes |
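The gateway side of M5-002/M5-003 reduces to a small command dispatch over mutable session state. `SessionState` and `applyCommand` are illustrative; the real `/agent` path would also load the new config's system prompt and tools from the DB:

```typescript
// Sketch of slash-command handling in the gateway (M5-002, M5-003).
interface SessionState {
  agentId: string;
  provider: string;
  model: string;
}

function applyCommand(session: SessionState, input: string): SessionState {
  const [cmd, arg] = input.trim().split(/\s+/, 2);
  if (!arg) return session;                                  // bare command: no change
  if (cmd === "/model") return { ...session, model: arg };   // M4-007: override wins over routing
  if (cmd === "/agent") return { ...session, agentId: arg }; // real impl loads config from DB
  return session;
}
```

After each switch the gateway would emit the updated `session:info` event per M5-005.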
### Milestone 6: Job Queue Foundation
**Goal:** Reliable background processing via BullMQ. Foundation for future agent task orchestration.
| Task | Description |
| ------ | ------------------------------------------------------------------------------------------------------------ |
| M6-001 | Add BullMQ dependency, configure with Valkey connection |
| M6-002 | Create queue service: typed job definitions, worker registration, error handling with exponential backoff |
| M6-003 | Migrate summarization cron → BullMQ repeatable job |
| M6-004 | Migrate GC (session cleanup) → BullMQ repeatable job |
| M6-005 | Migrate tier management (log archival) → BullMQ repeatable job |
| M6-006 | Admin jobs API: `GET /api/admin/jobs` — list active/completed/failed jobs, retry failed, pause/resume queues |
| M6-007 | Job event logging: emit job start/complete/fail events to agent_logs for observability |
| M6-008 | Verify: jobs execute on schedule, deliberate failure retries with backoff, admin endpoint shows job history |
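The typed job definitions from M6-002 might be shaped like the sketch below, with options matching BullMQ's `JobsOptions` (`attempts` plus exponential `backoff`). Job names mirror the cron tasks being migrated; the cron patterns and delay values are placeholders, not decided schedules:

```typescript
// Sketch of M6-002's typed job definitions with the default retry policy
// (3 attempts, exponential backoff) from the NFRs.
type JobName = "summarization" | "gc" | "tier-management";

interface JobSpec {
  name: JobName;
  repeatPattern: string; // cron expression for a BullMQ repeatable job
  opts: { attempts: number; backoff: { type: "exponential"; delay: number } };
}

const defaultOpts = { attempts: 3, backoff: { type: "exponential" as const, delay: 5_000 } };

const jobs: JobSpec[] = [
  { name: "summarization", repeatPattern: "*/15 * * * *", opts: defaultOpts },
  { name: "gc", repeatPattern: "0 * * * *", opts: defaultOpts },
  { name: "tier-management", repeatPattern: "0 3 * * *", opts: defaultOpts },
];
// Registration would be roughly:
//   queue.add(job.name, payload, { repeat: { pattern: job.repeatPattern }, ...job.opts })
```

Keeping the specs as data makes M6-006's admin API trivial to back: the same array drives registration, listing, and pause/resume.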
### Milestone 7: Channel Protocol Design
**Goal:** Architecture document defining how remote interfaces (Matrix, Discord, Telegram) will integrate. No code — design only. Built into foundation now so Phase 10+ doesn't require gateway rewrites.
| Task | Description |
| ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| M7-001 | Define `IChannelAdapter` interface: lifecycle (connect, disconnect, health), message flow (receiveMessage → gateway, sendMessage ← gateway), identity mapping (channel user ↔ Mosaic user) |
| M7-002 | Define channel message protocol: canonical message format that all adapters translate to/from (content, metadata, attachments, thread context) |
| M7-003 | Design Matrix integration: appservice registration, room ↔ conversation mapping, space ↔ team mapping, agent ghost users, power levels for human observation |
| M7-004 | Design conversation multiplexing: same conversation accessible from TUI + WebUI + Matrix simultaneously, real-time sync via gateway events |
| M7-005 | Design remote auth bridging: how a Matrix/Discord message authenticates to Mosaic (token linking, OAuth bridge, invite-based provisioning) |
| M7-006 | Design agent-to-agent communication via Matrix rooms: room per agent pair, human can join to observe, message format for structured agent dialogue |
| M7-007 | Design multi-user isolation in Matrix: space-per-team, room visibility rules, encryption considerations, admin visibility |
| M7-008 | Publish architecture doc: `docs/architecture/channel-protocol.md` — reviewed and approved before Phase 10 |
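As a starting point for M7-002, the canonical message format might look like this. Field names are a proposal for the design doc, not a settled spec:

```typescript
// Proposed canonical channel message (M7-002): every adapter translates its
// native format to and from this shape.
interface ChannelMessage {
  channel: "tui" | "webui" | "matrix" | "discord" | "telegram";
  channelUserId: string;   // identity on the remote channel, e.g. a Matrix MXID
  mosaicUserId: string;    // mapped Mosaic identity (M7-001 identity mapping)
  conversationId: string;  // maps to a Matrix room under M7-003
  content: string;
  attachments?: { mimeType: string; url: string }[];
  threadContext?: { parentMessageId: string };
  metadata?: Record<string, unknown>;
  timestamp: string;       // ISO 8601
}
```

Because `conversationId` is channel-agnostic, the M7-004 multiplexing design falls out naturally: TUI, WebUI, and Matrix adapters all address the same conversation record.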
---
## Technical Approach
### Pi SDK Provider Adapter Pattern
The agent layer stays on Pi SDK. Provider diversity is solved at the adapter layer below Pi:
```
Provider SDKs (@anthropic-ai/sdk, openai, etc.)
→ IProviderAdapter implementations
→ ProviderRegistry (Pi SDK compatible)
→ Agent Session (Pi SDK) — tool loops, streaming, context
→ AgentService — lifecycle, routing, events
→ ChatGateway — WebSocket to all interfaces
```
Adding a provider means implementing `IProviderAdapter`. Everything above stays unchanged.
### Routing Decision Flow
```
User sends message
→ Task classifier (regex/keyword, optionally LLM-assisted)
→ { taskType, complexity, domain, requiredCapabilities }
→ RoutingEngine.resolve(classification, userOverrides, availableProviders)
→ Match rules by priority
→ Check provider health
→ Apply fallback chain
→ Return { provider, model, agentConfigId }
→ AgentService.createOrResumeSession(routingResult)
→ Session uses selected provider/model
→ Emit session:info with routing decision explanation
```
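The classifier step above is rule-based initially (M4-005). A minimal keyword sketch, where the patterns and length thresholds are placeholders an LLM-assisted pass would later replace:

```typescript
// Minimal rule-based task classifier sketch (M4-005). Keyword patterns and
// complexity thresholds are illustrative placeholders.
function classifyTask(message: string): { taskType: string; complexity: string } {
  const m = message.toLowerCase();
  const taskType =
    /\b(code|function|bug|refactor|implement)\b/.test(m) ? "coding" :
    /\b(summari[sz]e|tl;dr)\b/.test(m) ? "summarization" :
    /\b(research|compare|investigate)\b/.test(m) ? "research" :
    "conversation";
  // Crude proxy: longer prompts tend to carry more complex tasks.
  const complexity = m.length > 400 ? "complex" : m.length > 100 ? "moderate" : "simple";
  return { taskType, complexity };
}
```

The output feeds directly into `RoutingEngine.resolve()` in the pipeline above; swapping in an LLM-assisted classifier later changes only this function.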
### Embedding Strategy
Replace OpenAI-hardcoded embedding service with provider-agnostic interface:
- **Default:** Ollama serving `nomic-embed-text` (768-dim) or `mxbai-embed-large` (1024-dim)
- **Fallback:** Any OpenAI-compatible embedding API
- **Migration:** Update pgvector column dimension if switching from 1536 (OpenAI) to 768/1024 (Ollama models)
- **No external API dependency** for vector operations in default configuration
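Because pgvector columns are fixed-width, the migration step reduces to altering the column dimension and re-embedding every row. A sketch of the bookkeeping, where the table/column names and the assumption that a single 1536-dim OpenAI model was previously in use are illustrative:

```typescript
// Sketch of the dimension migration behind the embedding switch. The
// "insights" table name and "embedding" column are assumptions.
const EMBEDDING_DIMS: Record<string, number> = {
  "openai-legacy": 1536,     // whichever 1536-dim OpenAI model was hardcoded before
  "nomic-embed-text": 768,
  "mxbai-embed-large": 1024,
};

function migrationSql(model: string): string {
  const dim = EMBEDDING_DIMS[model];
  if (!dim) throw new Error(`unknown embedding model: ${model}`);
  // Re-embedding of existing rows must follow; old vectors are incompatible
  // with the new width and with the new model's vector space.
  return `ALTER TABLE insights ALTER COLUMN embedding TYPE vector(${dim});`;
}
```

Note that even if the widths matched, vectors from different models live in different spaces, so re-embedding (or a fresh start) is required either way.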
### Context Window Management
When conversation history exceeds model context:
1. Calculate token count of full history
2. If exceeds 80% of model context window, trigger summarization
3. Summarize oldest N messages into a condensed context block
4. Prepend summary + keep recent messages within context budget
5. Store summary as a "context checkpoint" message in DB
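The steps above can be sketched as a pure planning function. The 80% trigger comes from FR-3; the choice to keep recent messages within roughly half the window after summarization is an assumption, and per-message token counts are assumed precomputed (M1-003 stores them):

```typescript
// Sketch of the FR-3 checkpoint trigger: decide which messages to summarize
// and which to keep verbatim. The 50% keep-budget is an assumed tuning value.
interface Msg { id: string; tokens: number; }

function planCheckpoint(history: Msg[], contextWindow: number) {
  const total = history.reduce((n, m) => n + m.tokens, 0);
  if (total <= contextWindow * 0.8) return { summarize: [] as Msg[], keep: history };
  // Walk backwards, keeping the newest messages until the keep-budget is full;
  // everything older is condensed into a single checkpoint message.
  const budget = contextWindow * 0.5;
  let used = 0;
  let cut = history.length;
  for (let i = history.length - 1; i >= 0; i--) {
    if (used + history[i].tokens > budget) break;
    used += history[i].tokens;
    cut = i;
  }
  return { summarize: history.slice(0, cut), keep: history.slice(cut) };
}
```

The `summarize` slice is what gets condensed and stored as the "context checkpoint" message in step 5.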
### Model Reference
| Provider | Model | Tier | Context | Tools | Vision | Embedding |
| ---------- | ----------------- | ---------- | ------- | ------ | ------ | -------------- |
| Anthropic | Claude Opus 4.6 | premium | 200K | yes | yes | no |
| Anthropic | Claude Sonnet 4.6 | standard | 200K | yes | yes | no |
| Anthropic | Claude Haiku 4.5 | cheap | 200K | yes | yes | no |
| OpenAI | Codex gpt-5.4 | premium | 128K+ | yes | yes | no |
| Z.ai | GLM-5 | standard | TBD | TBD | TBD | no |
| OpenRouter | varies | varies | varies | varies | varies | no |
| Ollama | llama3.2 | local/free | 128K | yes | no | no |
| Ollama | nomic-embed-text | — | — | — | — | yes (768-dim) |
| Ollama | mxbai-embed-large | — | — | — | — | yes (1024-dim) |
### Default Routing Rules (Seed Data)
| Priority | Condition | Route To |
| -------- | ------------------------------------------------------------- | ------------- |
| 1 | taskType=coding AND complexity=complex | Opus 4.6 |
| 2 | taskType=coding AND complexity=moderate | Sonnet 4.6 |
| 3 | taskType=coding AND complexity=simple | Codex gpt-5.4 |
| 4 | taskType=research | Codex gpt-5.4 |
| 5 | taskType=summarization | GLM-5 |
| 6 | taskType=analysis AND requiredCapabilities includes reasoning | Opus 4.6 |
| 7 | taskType=conversation | Sonnet 4.6 |
| 8 | taskType=creative | Sonnet 4.6 |
| 9 | costTier=cheap OR domain=general | Haiku 4.5 |
| 10 | fallback (no rule matched) | Sonnet 4.6 |
| 99 | provider=ollama forced OR offline mode | llama3.2 |
Rules are user-customizable. Admins set system defaults; users override for their sessions.
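The table above expressed as seed data, roughly how M4-004 might ship it. Conditions use the M4-002 vocabulary; the model id slugs are illustrative:

```typescript
// The default routing table as seed data (M4-004). Model id slugs are guesses
// at the internal naming convention, not confirmed identifiers.
const defaultRules = [
  { priority: 1, conditions: { taskType: "coding", complexity: "complex" }, model: "claude-opus-4-6" },
  { priority: 2, conditions: { taskType: "coding", complexity: "moderate" }, model: "claude-sonnet-4-6" },
  { priority: 3, conditions: { taskType: "coding", complexity: "simple" }, model: "codex-gpt-5-4" },
  { priority: 4, conditions: { taskType: "research" }, model: "codex-gpt-5-4" },
  { priority: 5, conditions: { taskType: "summarization" }, model: "glm-5" },
  { priority: 6, conditions: { taskType: "analysis", requiredCapabilities: ["reasoning"] }, model: "claude-opus-4-6" },
  { priority: 7, conditions: { taskType: "conversation" }, model: "claude-sonnet-4-6" },
  { priority: 8, conditions: { taskType: "creative" }, model: "claude-sonnet-4-6" },
  { priority: 9, conditions: { costTier: "cheap" }, model: "claude-haiku-4-5" },   // or domain=general
  { priority: 10, conditions: {}, model: "claude-sonnet-4-6" },                    // fallback: matches anything
  { priority: 99, conditions: { provider: "ollama" }, model: "llama3.2" },         // forced local / offline mode
];
```

Seeding the rules as plain rows keeps the `/api/routing/rules` CRUD (M4-009) and per-user overrides (M4-010) on the same data path as the defaults.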
---
## Risks and Open Questions
| Risk | Impact | Mitigation |
| ------------------------------------------------------- | ------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| Pi SDK doesn't support custom provider adapters cleanly | High — blocks M3 | Verify in M3-001; fallback: wrap native SDKs and bypass Pi's registry, feeding responses into Pi's session format |
| BullMQ + Valkey incompatibility | Medium — blocks M6 | Test in M6-001 before migrating jobs; fallback: use `bullmq` with `ioredis` directly |
| Embedding dimension migration (1536 → 768/1024) | Medium — data migration | Run migration script to re-embed existing insights; or start fresh if insight count is low |
| Z.ai GLM-5 API undocumented | Low — blocks one provider | Deprioritize; other 4 providers cover all use cases |
| Context window summarization quality | Medium — affects UX | Start with simple truncation; add LLM summarization iteratively |
| OAuth flow complexity in TUI (no browser redirect) | Medium | URL-display + clipboard + Valkey poll token pattern (already designed in P8-012) |
### Open Questions
1. What is the Z.ai GLM-5 API format? OpenAI-compatible or custom SDK? (Research in M3-005)
2. Should routing classification use LLM-assisted classification from the start, or rule-based only? (ASSUMPTION: rule-based first, LLM-assisted later)
3. What Ollama embedding model provides the best quality/performance tradeoff? (Test nomic-embed-text vs mxbai-embed-large in M3-009)
4. Should provider credentials be stored in DB per-user, or remain environment-variable based for system-wide providers? (ASSUMPTION: hybrid — env vars for system defaults, DB for per-user overrides)
---
## Milestone / Delivery Intent
1. **Target version:** v0.2.0
2. **Milestone count:** 7
3. **Definition of done:** All 10 acceptance criteria verified with evidence, all quality gates green, PRD status updated to `completed`
4. **Delivery order:** M1 (persistence) → M2 (security) → M3 (providers) → M4 (routing) → M5 (sessions) → M6 (jobs) → M7 (channel design)
5. **M1 and M2 are prerequisites** — no provider or routing work begins until conversations persist and data is user-scoped