# Knowledge Module - Design Document

> **Status:** Draft
> **Author:** Agent (Jarvis)
> **Created:** 2025-01-29
> **Related:** [Agent Orchestration](./agent-orchestration.md)

## Problem Statement

Development teams and AI agents working on complex projects need a way to:

1. **Capture decisions**: Why was X chosen over Y?
2. **Track connections**: How does component A relate to concept B?
3. **Search contextually**: Find relevant context without knowing exact keywords
4. **Evolve understanding**: Knowledge changes; track that evolution
5. **Share across boundaries**: Human and agent access to the same knowledge base

### Current Pain Points

- **Scattered documentation**: README, comments, Slack threads, memory files
- **No explicit linking**: Connections exist but aren't captured
- **Agent amnesia**: Each session starts fresh, relies on file search
- **No decision archaeology**: Hard to find *why* something was decided
- **Human/agent mismatch**: Humans browse, agents grep

## Requirements

### Functional Requirements

| ID | Requirement | Priority |
|----|-------------|----------|
| FR1 | Create, read, update, delete knowledge entries | P0 |
| FR2 | Wiki-style linking between entries (`[[link]]` syntax) | P0 |
| FR3 | Tagging and categorization | P0 |
| FR4 | Full-text search | P0 |
| FR5 | Semantic/vector search for agents | P1 |
| FR6 | Graph visualization of connections | P1 |
| FR7 | Version history and diff view | P1 |
| FR8 | Timeline view of changes | P2 |
| FR9 | Import from markdown files | P2 |
| FR10 | Export to markdown/PDF | P2 |

### Non-Functional Requirements

| ID | Requirement | Target |
|----|-------------|--------|
| NFR1 | Search response time | < 200ms |
| NFR2 | Entry render time | < 100ms |
| NFR3 | Graph render (< 1000 nodes) | < 500ms |
| NFR4 | Multi-tenant isolation | Complete |
| NFR5 | API-first design | All features via API |

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                          Mosaic Web UI                          │
├─────────────────────────────────────────────────────────────────┤
│  Knowledge Browser  │  Graph View  │  Search  │  Timeline       │
└─────────┬───────────┴──────┬───────┴────┬─────┴────┬────────────┘
          │                  │            │          │
          ▼                  ▼            ▼          ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Knowledge API (NestJS)                      │
├─────────────────────────────────────────────────────────────────┤
│  EntryController  │  SearchController  │  GraphController       │
│  TagController    │  LinkController    │  VersionController     │
└─────────┬─────────┴─────────┬──────────┴──────────┬─────────────┘
          │                   │                     │
          ▼                   ▼                     ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────┐
│   PostgreSQL     │ │     Valkey       │ │    Vector Store      │
│                  │ │                  │ │    (pgvector)        │
│ - entries        │ │ - search cache   │ │                      │
│ - entry_versions │ │ - graph cache    │ │ - embeddings         │
│ - entry_links    │ │ - hot entries    │ │ - semantic index     │
│ - tags           │ │                  │ │                      │
└──────────────────┘ └──────────────────┘ └──────────────────────┘
```

## Data Model

### Core Entities

```prisma
// Entry - A single knowledge entry (document/page)
model KnowledgeEntry {
  id          String    @id @default(cuid())
  workspaceId String
  workspace   Workspace @relation(fields: [workspaceId], references: [id])

  slug        String       // URL-friendly identifier
  title       String
  content     String       @db.Text // Markdown content
  contentHtml String?      @db.Text // Rendered HTML (cached)
  summary     String?      // Auto-generated or manual summary

  status      EntryStatus  @default(DRAFT)
  visibility  Visibility   @default(PRIVATE)

  // Metadata
  createdAt   DateTime     @default(now())
  updatedAt   DateTime     @updatedAt
  createdBy   String
  updatedBy   String

  // Relations
  tags          KnowledgeEntryTag[]
  outgoingLinks KnowledgeLink[]         @relation("SourceEntry")
  incomingLinks KnowledgeLink[]         @relation("TargetEntry")
  versions      KnowledgeEntryVersion[]
  embedding     KnowledgeEmbedding?

  @@unique([workspaceId, slug])
  @@index([workspaceId, status])
  @@index([workspaceId, updatedAt])
}

enum EntryStatus {
  DRAFT
  PUBLISHED
  ARCHIVED
}

enum Visibility {
  PRIVATE    // Only creator
  WORKSPACE  // All workspace members
  PUBLIC     // Anyone with link
}

// Version history
model KnowledgeEntryVersion {
  id         String         @id @default(cuid())
  entryId    String
  entry      KnowledgeEntry @relation(fields: [entryId], references: [id], onDelete: Cascade)

  version    Int
  title      String
  content    String         @db.Text
  summary    String?

  createdAt  DateTime       @default(now())
  createdBy  String
  changeNote String?        // Optional commit message

  // The unique constraint already provides an index on (entryId, version)
  @@unique([entryId, version])
}

// Wiki-style links between entries
model KnowledgeLink {
  id        String         @id @default(cuid())

  sourceId  String
  source    KnowledgeEntry @relation("SourceEntry", fields: [sourceId], references: [id], onDelete: Cascade)

  targetId  String
  target    KnowledgeEntry @relation("TargetEntry", fields: [targetId], references: [id], onDelete: Cascade)

  // Link metadata
  linkText  String         // The text used in [[link|display text]]
  context   String?        // Surrounding text for context

  createdAt DateTime       @default(now())

  @@unique([sourceId, targetId])
  @@index([sourceId])
  @@index([targetId])
}

// Tags for categorization
model KnowledgeTag {
  id          String    @id @default(cuid())
  workspaceId String
  workspace   Workspace @relation(fields: [workspaceId], references: [id])

  name        String
  slug        String
  color       String?   // Hex color for UI
  description String?

  entries     KnowledgeEntryTag[]

  @@unique([workspaceId, slug])
}

model KnowledgeEntryTag {
  entryId String
  entry   KnowledgeEntry @relation(fields: [entryId], references: [id], onDelete: Cascade)

  tagId   String
  tag     KnowledgeTag   @relation(fields: [tagId], references: [id], onDelete: Cascade)

  @@id([entryId, tagId])
}

// Vector embeddings for semantic search
model KnowledgeEmbedding {
  id        String         @id @default(cuid())
  entryId   String         @unique
  entry     KnowledgeEntry @relation(fields: [entryId], references: [id], onDelete: Cascade)

  embedding Unsupported("vector(1536)") // OpenAI ada-002 dimension
  model     String         // Which model generated this

  createdAt DateTime       @default(now())
  updatedAt DateTime       @updatedAt

  // The HNSW index (vector_cosine_ops) is created in a raw SQL migration;
  // Prisma's @@index(type: ...) does not offer a pgvector index method.
}
```

### Frontmatter Schema

Entries support YAML frontmatter for structured metadata:

```yaml
---
title: Agent Orchestration Design
status: published
tags: [architecture, agents, orchestration]
created: 2025-01-29
updated: 2025-01-29
author: jarvis
related:
  - "[[task-queues]]"
  - "[[valkey-patterns]]"
decision:
  status: accepted
  date: 2025-01-29
  participants: [jason, jarvis]
  supersedes: null
---
```

## API Endpoints

### Entry Management

```
POST   /api/knowledge/entries                     Create entry
GET    /api/knowledge/entries                     List entries (paginated)
GET    /api/knowledge/entries/:slug               Get entry by slug
PUT    /api/knowledge/entries/:slug               Update entry
DELETE /api/knowledge/entries/:slug               Delete entry (soft delete → archive)

GET    /api/knowledge/entries/:slug/versions      List versions
GET    /api/knowledge/entries/:slug/versions/:v   Get specific version
POST   /api/knowledge/entries/:slug/restore/:v    Restore to version
```

### Search

```
GET    /api/knowledge/search?q=...                Full-text search
POST   /api/knowledge/search/semantic             Semantic search (vector)
GET    /api/knowledge/search/suggestions          Autocomplete suggestions
```

### Graph

```
GET    /api/knowledge/graph                       Full graph (nodes + edges)
GET    /api/knowledge/graph/:slug                 Subgraph centered on entry
GET    /api/knowledge/graph/stats                 Graph statistics
```

### Tags

```
GET    /api/knowledge/tags                        List all tags
POST   /api/knowledge/tags                        Create tag
PUT    /api/knowledge/tags/:slug                  Update tag
DELETE /api/knowledge/tags/:slug                  Delete tag
GET    /api/knowledge/tags/:slug/entries          Entries with tag
```

### Links

```
GET    /api/knowledge/entries/:slug/links/outgoing   Outgoing links
GET    /api/knowledge/entries/:slug/links/incoming   Incoming links (backlinks)
GET    /api/knowledge/entries/:slug/links/broken     Broken links
POST   /api/knowledge/links/resolve                  Resolve [[link]] to entry
```

## Link Processing

### Wiki-Link Syntax

The module supports Obsidian-compatible wiki-link syntax:

```markdown
Basic link:      [[entry-slug]]
Display text:    [[entry-slug|Custom Display Text]]
Header link:     [[entry-slug#section-header]]
Block link:      [[entry-slug#^block-id]]
```

### Link Resolution Flow

```
┌─────────────────┐
│  Entry Content  │
│ "See [[design]] │
│  for details"   │
└────────┬────────┘
         │ Parse
         ▼
┌─────────────────┐
│  Extract Links  │
│  [[design]]     │
└────────┬────────┘
         │ Resolve
         ▼
┌─────────────────┐
│  Find Target    │
│  slug: "design" │
│  OR title match │
│  OR fuzzy match │
└────────┬────────┘
         │
    ┌────┴────┐
    ▼         ▼
┌───────┐ ┌───────────┐
│ Found │ │ Not Found │
│       │ │ (broken)  │
└───┬───┘ └─────┬─────┘
    │           │
    ▼           ▼
┌───────────────────────┐
│  Create/Update Link   │
│ Record in entry_links │
│ Mark broken if needed │
└───────────────────────┘
```

### Automatic Link Detection

On entry save:

1. Parse content for `[[...]]` patterns
2. Resolve each link to target entry
3. Update `KnowledgeLink` records
4. Flag broken links for UI warning

## Search Implementation

The SQL in this document assumes the Prisma models are mapped to snake_case tables and columns (for example `KnowledgeEntry` to `knowledge_entries` and `workspaceId` to `workspace_id`) via `@@map` and `@map`.

### Full-Text Search (PostgreSQL)

```sql
-- Create search index
ALTER TABLE knowledge_entries ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(summary, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(content, '')), 'C')
  ) STORED;

CREATE INDEX idx_knowledge_search ON knowledge_entries USING GIN(search_vector);

-- Search query
SELECT id, slug, title,
       ts_rank(search_vector, query) AS rank,
       ts_headline('english', content, query) AS snippet
FROM knowledge_entries, plainto_tsquery('english', $1) query
WHERE search_vector @@ query
  AND workspace_id = $2
ORDER BY rank DESC
LIMIT 20;
```

### Semantic Search (pgvector)

```sql
-- HNSW index for cosine distance (created here because Prisma cannot declare it)
CREATE INDEX idx_knowledge_embedding_hnsw ON knowledge_embeddings
  USING hnsw (embedding vector_cosine_ops);

-- Semantic search query
SELECT e.id, e.slug, e.title, e.summary,
       1 - (emb.embedding <=> $1::vector) AS similarity
FROM knowledge_entries e
JOIN knowledge_embeddings emb ON e.id = emb.entry_id
WHERE e.workspace_id = $2
  AND 1 - (emb.embedding <=> $1::vector) > 0.7  -- similarity threshold
ORDER BY emb.embedding <=> $1::vector
LIMIT 10;
```

### Embedding Generation

```typescript
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function generateEmbedding(entry: KnowledgeEntry): Promise<number[]> {
  const text = `${entry.title}\n\n${entry.summary || ''}\n\n${entry.content}`;

  // Use OpenAI or local model
  const response = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    // Crude cap to stay under the model's token limit;
    // slice() counts characters, not tokens.
    input: text.slice(0, 8000),
  });

  return response.data[0].embedding;
}
```

## Graph Visualization

### Data Structure

```typescript
interface KnowledgeGraph {
  nodes: GraphNode[];
  edges: GraphEdge[];
  stats: GraphStats;
}

interface GraphNode {
  id: string;
  slug: string;
  title: string;
  type: 'entry' | 'tag' | 'external';
  status: EntryStatus;
  linkCount: number; // in + out
  tags: string[];
  updatedAt: string;
}

interface GraphEdge {
  id: string;
  source: string; // node id
  target: string; // node id
  type: 'link' | 'tag';
  label?: string;
}

interface GraphStats {
  nodeCount: number;
  edgeCount: number;
  orphanCount: number; // entries with no links
  brokenLinkCount: number;
  avgConnections: number;
}
```

### Graph Query

```sql
-- Get full graph for workspace
WITH nodes AS (
  SELECT id, slug, title, 'entry' AS type, status,
         (SELECT COUNT(*) FROM knowledge_links
          WHERE source_id = e.id OR target_id = e.id) AS link_count,
         updated_at
  FROM knowledge_entries e
  WHERE workspace_id = $1 AND status != 'ARCHIVED'
),
edges AS (
  -- Join both endpoints against nodes so no edge points at an archived
  -- (and therefore missing) node
  SELECT l.id, l.source_id AS source, l.target_id AS target,
         'link' AS type, l.link_text AS label
  FROM knowledge_links l
  JOIN nodes s ON l.source_id = s.id
  JOIN nodes t ON l.target_id = t.id
)
SELECT json_build_object(
  -- json_agg returns NULL for an empty set, so fall back to an empty array
  'nodes', (SELECT coalesce(json_agg(nodes), '[]'::json) FROM nodes),
  'edges', (SELECT coalesce(json_agg(edges), '[]'::json) FROM edges)
) AS graph;
```

### Frontend Rendering

Use D3.js force-directed graph or Cytoscape.js:

```typescript
// Graph component configuration
const graphConfig = {
  layout: 'force-directed',
  physics: {
    repulsion: 100,
    springLength: 150,
    springStrength: 0.05,
  },
  nodeSize: (node: GraphNode) => Math.sqrt(node.linkCount) * 10 + 20,
  nodeColor: (node: GraphNode) => {
    switch (node.status) {
      case 'PUBLISHED': return '#22c55e';
      case 'DRAFT': return '#f59e0b';
      case 'ARCHIVED': return '#6b7280';
      default: return '#6b7280'; // tag/external nodes or unknown status
    }
  },
  edgeStyle: {
    color: '#94a3b8',
    width: 1,
    arrows: 'to',
  },
};
```

## Caching Strategy

### Valkey Key Patterns

```
knowledge:{workspaceId}:entry:{slug}           Entry cache (JSON)
knowledge:{workspaceId}:entry:{slug}:html      Rendered HTML cache
knowledge:{workspaceId}:graph                  Full graph cache
knowledge:{workspaceId}:graph:{slug}           Subgraph cache
knowledge:{workspaceId}:search:{hash}          Search result cache
knowledge:{workspaceId}:tags                   Tag list cache
knowledge:{workspaceId}:recent                 Recent entries list
```

### Cache Invalidation

```typescript
// Assumes `valkey` is an ioredis-compatible client.
async function invalidateEntryCache(workspaceId: string, slug: string) {
  const keys = [
    `knowledge:${workspaceId}:entry:${slug}`,
    `knowledge:${workspaceId}:entry:${slug}:html`,
    `knowledge:${workspaceId}:graph`, // Full graph affected
    `knowledge:${workspaceId}:graph:${slug}`,
    `knowledge:${workspaceId}:recent`,
  ];

  // Also invalidate subgraphs for linked entries
  const linkedSlugs = await getLinkedEntrySlugs(workspaceId, slug);
  for (const linked of linkedSlugs) {
    keys.push(`knowledge:${workspaceId}:graph:${linked}`);
  }

  await valkey.del(...keys);

  // Invalidate search caches (pattern delete). SCAN is used instead of KEYS
  // so the pattern match does not block the server on a large keyspace.
  const pattern = `knowledge:${workspaceId}:search:*`;
  let cursor = '0';
  do {
    const [next, batch] = await valkey.scan(cursor, 'MATCH', pattern, 'COUNT', 100);
    if (batch.length) await valkey.del(...batch);
    cursor = next;
  } while (cursor !== '0');
}
```

## UI Components

### Entry Editor

```
┌────────────────────────────────────────────────────────────────┐
│ [📄] Agent Orchestration Design                    [Save] [···]│
├────────────────────────────────────────────────────────────────┤
│ Status: [Published ▼]   Tags: [architecture] [agents] [+]      │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│ # Problem Statement                                            │
│                                                                │
│ Development teams and AI agents working on complex projects    │
│ need a way to [[capture-decisions|capture decisions]]...       │
│                                                                │
│ See also: [[task-queues]] and [[valkey-patterns]]              │
│                                                                │
│ ───────────────────────────────────────────────────────────────│
│ Backlinks (3):                                                 │
│ • [[mosaic-roadmap]] - "...implements agent orchestration..."  │
│ • [[design-index]] - "Core designs: [[agent-orchestration]]"   │
│ • [[jarvis-memory]] - "Created orchestration design..."        │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```

### Graph View

```
┌────────────────────────────────────────────────────────────────┐
│ Knowledge Graph                          [Filter ▼] [Layout ▼] │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│                      ┌─────────┐                               │
│                      │ valkey  │                               │
│                      │patterns │                               │
│                      └────┬────┘                               │
│                           │                                    │
│              ┌────────────┼────────────┐                       │
│              │            │            │                       │
│              ▼            ▼            ▼                       │
│           ┌──────┐    ┌────────┐   ┌────────┐                  │
│           │cache │    │  task  │   │ agent  │◄─────┐           │
│           │layer │    │ queues │   │  orch  │      │           │
│           └──────┘    └────────┘   └───┬────┘      │           │
│                                        │           │           │
│                                        ▼           │           │
│                                    ┌────────┐ ┌────┴───┐       │
│                                    │recovery│ │ mosaic │       │
│                                    │patterns│ │roadmap │       │
│                                    └────────┘ └────────┘       │
│                                                                │
│ 🟢 Published (6)  🟡 Draft (2)  ⚪ Orphan (0)                  │
└────────────────────────────────────────────────────────────────┘
```

### Search Results

```
┌────────────────────────────────────────────────────────────────┐
│ 🔍 [agent recovery                                   ] [Search]│
├────────────────────────────────────────────────────────────────┤
│                                                                │
│ 📄 Agent Orchestration - Recovery Patterns                     │
│    ...automatic **recovery** when an **agent** fails or the... │
│    Tags: architecture, agents • Updated 2 hours ago            │
│                                                                │
│ 📄 Agent Health Monitoring                                     │
│    ...heartbeat monitoring enables **recovery** of stale...    │
│    Tags: agents, monitoring • Updated 1 day ago                │
│                                                                │
│ 📄 Task Queue Design                                           │
│    ...retry logic with exponential backoff for **agent**...    │
│    Tags: architecture, queues • Updated 3 days ago             │
│                                                                │
│ ────────────────────────────────────────────────────────────── │
│ Also try: Semantic search for conceptually related entries     │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```

## Implementation Phases

### Phase 1: Foundation (Week 1-2)

**Goal:** Basic CRUD + storage working

- [ ] Database schema + migrations
- [ ] Entry CRUD API endpoints
- [ ] Basic markdown rendering
- [ ] Tag management
- [ ] Entry list/detail pages

**Deliverables:**

- Can create, edit, view, delete entries
- Tags work
- Basic search (title/slug match)

### Phase 2: Linking (Week 2-3)

**Goal:** Wiki-link functionality

- [ ] Link parser (`[[...]]` syntax)
- [ ] Link resolution logic
- [ ] Broken link detection
- [ ] Backlinks display
- [ ] Link autocomplete in editor

**Deliverables:**

- Links between entries work
- Backlinks show on entry pages
- Editor suggests links as you type

### Phase 3: Search (Week 3-4)

**Goal:** Full-text + semantic search

- [ ] PostgreSQL full-text search setup
- [ ] Search API endpoint
- [ ] Search UI with highlighting
- [ ] pgvector extension setup
- [ ] Embedding generation pipeline
- [ ] Semantic search API

**Deliverables:**

- Fast full-text search
- Semantic search for "fuzzy" queries
- Search results with snippets

### Phase 4: Graph (Week 4-5)

**Goal:** Visual knowledge graph

- [ ] Graph data API
- [ ] D3.js/Cytoscape integration
- [ ] Interactive graph view
- [ ] Subgraph (entry-centered) view
- [ ] Graph statistics

**Deliverables:**

- Can view full knowledge graph
- Can explore from any entry
- Visual indicators for status/orphans

### Phase 5: Polish (Week 5-6)

**Goal:** Production-ready

- [ ] Version history UI
- [ ] Diff view between versions
- [ ] Import from markdown files
- [ ] Export functionality
- [ ] Performance optimization
- [ ] Caching implementation
- [ ] Documentation

**Deliverables:**

- Version history works
- Can import existing docs
- Performance is acceptable
- Module is documented

## Integration Points

### Agent Access

The Knowledge module should be accessible to agents via API:

```typescript
// Agent tool for knowledge access.
// Return types are indicative; the exact result shapes are not fixed yet.
interface KnowledgeTools {
  // Search
  searchKnowledge(query: string): Promise<KnowledgeEntry[]>;
  semanticSearch(query: string): Promise<KnowledgeEntry[]>;

  // CRUD
  getEntry(slug: string): Promise<KnowledgeEntry | null>;
  createEntry(data: CreateEntryInput): Promise<KnowledgeEntry>;
  updateEntry(slug: string, data: UpdateEntryInput): Promise<KnowledgeEntry>;

  // Graph
  getRelatedEntries(slug: string): Promise<KnowledgeEntry[]>;
  getBacklinks(slug: string): Promise<KnowledgeEntry[]>;
}
```

### Clawdbot Integration

For Clawdbot specifically, the Knowledge module could:

1. Sync with `memory/*.md` files
2. Provide semantic search for `memory_search` tool
3. Generate embeddings for memory entries
4. Visualize agent memory as a knowledge graph

## Success Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| Entry creation time | < 200ms | API response time |
| Search latency (full-text) | < 100ms | p95 response time |
| Search latency (semantic) | < 300ms | p95 response time |
| Graph render (100 nodes) | < 200ms | Client-side time |
| Graph render (1000 nodes) | < 1s | Client-side time |
| Adoption | 50+ entries/workspace | After 1 month |
| Link density | > 2 links/entry avg | Graph statistics |

## Open Questions

1. **Embedding model**: Use OpenAI embeddings or self-hosted? (Cost vs privacy)
2. **Real-time collab**: Do we need multiplayer editing? (CRDT complexity)
3. **Permissions**: Entry-level permissions or workspace-level only?
4. **Templates**: Support entry templates (ADR, design doc, etc.)?
5. **Attachments**: Allow images/files in entries?

## References

- [Obsidian](https://obsidian.md/): Wiki-link syntax inspiration
- [Roam Research](https://roamresearch.com/): Block-level linking
- [pgvector](https://github.com/pgvector/pgvector): PostgreSQL vector extension
- [Mosaic Agent Orchestration](./agent-orchestration.md): Related design
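
## Appendix: Wiki-Link Parsing Sketch

The parse step described under Link Processing (extract `[[...]]` patterns on entry save) could look roughly like the following. This is a minimal sketch, not the module's actual parser: the `ParsedWikiLink` shape, the `WIKI_LINK` pattern, and the `parseWikiLinks` name are all assumptions made for illustration, and resolving a slug to an entry (slug, title, or fuzzy match) is left out.

```typescript
// Hypothetical shape for one parsed [[...]] occurrence (not part of the schema).
interface ParsedWikiLink {
  raw: string;          // full match, e.g. "[[design#intro|Design]]"
  slug: string;         // target entry slug
  anchor?: string;      // section header or ^block-id, without the leading '#'
  displayText?: string; // text after '|', if present
}

// Covers [[slug]], [[slug|text]], [[slug#header]] and [[slug#^block-id]].
const WIKI_LINK = /\[\[([^\[\]|#]+)(?:#([^\[\]|]+))?(?:\|([^\[\]]+))?\]\]/;

function parseWikiLinks(content: string): ParsedWikiLink[] {
  // Build a fresh global regex per call so lastIndex never leaks between calls.
  const pattern = new RegExp(WIKI_LINK.source, 'g');
  const links: ParsedWikiLink[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(content)) !== null) {
    links.push({
      raw: m[0],
      slug: m[1].trim(),
      anchor: m[2] ? m[2].trim() : undefined,
      displayText: m[3] ? m[3].trim() : undefined,
    });
  }
  return links;
}
```

Each parsed link would then feed the resolution flow: look up `slug` within the workspace, record a `KnowledgeLink` when a target is found, and flag the link as broken otherwise.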