stack/docs/design/knowledge-module.md
Jason Woltje 91399f597f docs(design): add Knowledge Module design and implementation plan
- Full design document with architecture, data model, API specs
- 28 implementation issues across 5 phases (~127h total)
- Wiki-link syntax, semantic search, graph visualization
- Integration points for agent access

Ref: memory/2025-01-29-agent-orchestration.md
2026-01-29 15:38:50 -06:00


# Knowledge Module - Design Document
> **Status:** Draft
> **Author:** Agent (Jarvis)
> **Created:** 2025-01-29
> **Related:** [Agent Orchestration](./agent-orchestration.md)
## Problem Statement
Development teams and AI agents working on complex projects need a way to:
1. **Capture decisions** — Why was X chosen over Y?
2. **Track connections** — How does component A relate to concept B?
3. **Search contextually** — Find relevant context without knowing exact keywords
4. **Evolve understanding** — Knowledge changes; track that evolution
5. **Share across boundaries** — Human and agent access to the same knowledge base
### Current Pain Points
- **Scattered documentation** — README, comments, Slack threads, memory files
- **No explicit linking** — Connections exist but aren't captured
- **Agent amnesia** — Each session starts fresh, relies on file search
- **No decision archaeology** — Hard to find *why* something was decided
- **Human/agent mismatch** — Humans browse, agents grep
## Requirements
### Functional Requirements
| ID | Requirement | Priority |
|----|-------------|----------|
| FR1 | Create, read, update, delete knowledge entries | P0 |
| FR2 | Wiki-style linking between entries (`[[link]]` syntax) | P0 |
| FR3 | Tagging and categorization | P0 |
| FR4 | Full-text search | P0 |
| FR5 | Semantic/vector search for agents | P1 |
| FR6 | Graph visualization of connections | P1 |
| FR7 | Version history and diff view | P1 |
| FR8 | Timeline view of changes | P2 |
| FR9 | Import from markdown files | P2 |
| FR10 | Export to markdown/PDF | P2 |
### Non-Functional Requirements
| ID | Requirement | Target |
|----|-------------|--------|
| NFR1 | Search response time | < 200ms |
| NFR2 | Entry render time | < 100ms |
| NFR3 | Graph render (< 1000 nodes) | < 500ms |
| NFR4 | Multi-tenant isolation | Complete |
| NFR5 | API-first design | All features via API |
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────┐
│                          Mosaic Web UI                          │
├─────────────────────────────────────────────────────────────────┤
│  Knowledge Browser  │  Graph View  │  Search  │  Timeline       │
└─────────┬───────────┴──────┬───────┴────┬─────┴────┬────────────┘
          │                  │            │          │
          ▼                  ▼            ▼          ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Knowledge API (NestJS)                      │
├─────────────────────────────────────────────────────────────────┤
│  EntryController  │  SearchController  │  GraphController       │
│  TagController    │  LinkController    │  VersionController     │
└─────────┬─────────┴─────────┬──────────┴──────────┬─────────────┘
          │                   │                     │
          ▼                   ▼                     ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────┐
│    PostgreSQL    │ │      Valkey      │ │     Vector Store     │
│                  │ │                  │ │      (pgvector)      │
│ - entries        │ │ - search cache   │ │                      │
│ - entry_versions │ │ - graph cache    │ │ - embeddings         │
│ - entry_links    │ │ - hot entries    │ │ - semantic index     │
│ - tags           │ │                  │ │                      │
└──────────────────┘ └──────────────────┘ └──────────────────────┘
```
## Data Model
### Core Entities
```prisma
// Entry - A single knowledge entry (document/page)
model KnowledgeEntry {
  id          String    @id @default(cuid())
  workspaceId String
  workspace   Workspace @relation(fields: [workspaceId], references: [id])

  slug        String      // URL-friendly identifier
  title       String
  content     String      @db.Text // Markdown content
  contentHtml String?     @db.Text // Rendered HTML (cached)
  summary     String?     // Auto-generated or manual summary
  status      EntryStatus @default(DRAFT)
  visibility  Visibility  @default(PRIVATE)

  // Metadata
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt
  createdBy String
  updatedBy String

  // Relations
  tags          KnowledgeEntryTag[]
  outgoingLinks KnowledgeLink[]         @relation("SourceEntry")
  incomingLinks KnowledgeLink[]         @relation("TargetEntry")
  versions      KnowledgeEntryVersion[]
  embedding     KnowledgeEmbedding?

  @@unique([workspaceId, slug])
  @@index([workspaceId, status])
  @@index([workspaceId, updatedAt])
}
enum EntryStatus {
  DRAFT
  PUBLISHED
  ARCHIVED
}

enum Visibility {
  PRIVATE   // Only creator
  WORKSPACE // All workspace members
  PUBLIC    // Anyone with link
}
// Version history
model KnowledgeEntryVersion {
  id      String         @id @default(cuid())
  entryId String
  entry   KnowledgeEntry @relation(fields: [entryId], references: [id], onDelete: Cascade)

  version Int
  title   String
  content String  @db.Text
  summary String?

  createdAt  DateTime @default(now())
  createdBy  String
  changeNote String?  // Optional commit message

  // The unique constraint already creates an index on (entryId, version),
  // so no separate @@index is needed.
  @@unique([entryId, version])
}
// Wiki-style links between entries
// Note: targetId is required here, so an unresolved [[link]] cannot be stored as-is;
// tracking broken links would need a nullable target or a separate table.
model KnowledgeLink {
  id       String         @id @default(cuid())
  sourceId String
  source   KnowledgeEntry @relation("SourceEntry", fields: [sourceId], references: [id], onDelete: Cascade)
  targetId String
  target   KnowledgeEntry @relation("TargetEntry", fields: [targetId], references: [id], onDelete: Cascade)

  // Link metadata
  linkText  String   // The text used in [[link|display text]]
  context   String?  // Surrounding text for context
  createdAt DateTime @default(now())

  @@unique([sourceId, targetId])
  @@index([sourceId])
  @@index([targetId])
}
// Tags for categorization
model KnowledgeTag {
  id          String    @id @default(cuid())
  workspaceId String
  workspace   Workspace @relation(fields: [workspaceId], references: [id])

  name        String
  slug        String
  color       String? // Hex color for UI
  description String?

  entries KnowledgeEntryTag[]

  @@unique([workspaceId, slug])
}

model KnowledgeEntryTag {
  entryId String
  entry   KnowledgeEntry @relation(fields: [entryId], references: [id], onDelete: Cascade)
  tagId   String
  tag     KnowledgeTag   @relation(fields: [tagId], references: [id], onDelete: Cascade)

  @@id([entryId, tagId])
}
// Vector embeddings for semantic search
model KnowledgeEmbedding {
  id      String         @id @default(cuid())
  entryId String         @unique
  entry   KnowledgeEntry @relation(fields: [entryId], references: [id], onDelete: Cascade)

  embedding Unsupported("vector(1536)") // OpenAI ada-002 dimension
  model     String                      // Which model generated this

  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt

  // Prisma's @@index has no HNSW (pgvector) index type, so the index is
  // created in a raw SQL migration instead, for example:
  //   CREATE INDEX ON knowledge_embeddings USING hnsw (embedding vector_cosine_ops);
}
```
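The `slug` field is described only as a URL-friendly identifier that is unique per workspace. One way to derive it from a title is sketched below; `slugify` and `uniqueSlug` are hypothetical helpers, and the numeric-suffix rule for collisions is an assumption rather than part of the design.

```typescript
// Hypothetical helper: derive a URL-friendly slug from an entry title.
export function slugify(title: string): string {
  return title
    .normalize('NFKD')               // split accented letters into base letter + diacritic
    .replace(/[\u0300-\u036f]/g, '') // drop the combining diacritics
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')     // collapse every other run of characters into one hyphen
    .replace(/^-+|-+$/g, '');        // trim leading and trailing hyphens
}

// Hypothetical helper: pick a slug that is not already taken in the workspace.
// `taken` stands in for a lookup against the (workspaceId, slug) unique constraint.
export function uniqueSlug(title: string, taken: ReadonlySet<string>): string {
  const base = slugify(title) || 'entry';
  if (!taken.has(base)) return base;
  let n = 2;
  while (taken.has(`${base}-${n}`)) n += 1;
  return `${base}-${n}`;
}
```

In a real service the uniqueness check would still have to rely on the database constraint, since two concurrent requests could pick the same slug.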
### Frontmatter Schema
Entries support YAML frontmatter for structured metadata:
```yaml
---
title: Agent Orchestration Design
status: published
tags: [architecture, agents, orchestration]
created: 2025-01-29
updated: 2025-01-29
author: jarvis
related:
  - "[[task-queues]]"
  - "[[valkey-patterns]]"
decision:
  status: accepted
  date: 2025-01-29
  participants: [jason, jarvis]
  supersedes: null
---
```
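Before the YAML can be interpreted, the frontmatter has to be separated from the Markdown body. The sketch below shows only that split; `splitFrontmatter` is a hypothetical name, and parsing the returned YAML text is assumed to be delegated to a YAML library.

```typescript
export interface SplitEntry {
  frontmatter: string | null; // raw YAML text without the --- fences, or null if absent
  body: string;               // Markdown content after the frontmatter
}

// Hypothetical helper: split an entry into raw frontmatter and Markdown body.
export function splitFrontmatter(raw: string): SplitEntry {
  // Frontmatter must open on the very first line and close with a line that is only ---
  const match = /^---\r?\n([\s\S]*?)\r?\n---[ \t]*(?:\r?\n|$)/.exec(raw);
  if (!match) return { frontmatter: null, body: raw };
  return { frontmatter: match[1], body: raw.slice(match[0].length) };
}
```

An entry without a leading `---` line is returned unchanged as the body, so plain Markdown files remain valid entries.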
## API Endpoints
### Entry Management
```
POST /api/knowledge/entries Create entry
GET /api/knowledge/entries List entries (paginated)
GET /api/knowledge/entries/:slug Get entry by slug
PUT /api/knowledge/entries/:slug Update entry
DELETE /api/knowledge/entries/:slug Delete entry (soft delete → archive)
GET /api/knowledge/entries/:slug/versions List versions
GET /api/knowledge/entries/:slug/versions/:v Get specific version
POST /api/knowledge/entries/:slug/restore/:v Restore to version
```
### Search
```
GET /api/knowledge/search?q=... Full-text search
POST /api/knowledge/search/semantic Semantic search (vector)
GET /api/knowledge/search/suggestions Autocomplete suggestions
```
### Graph
```
GET /api/knowledge/graph Full graph (nodes + edges)
GET /api/knowledge/graph/:slug Subgraph centered on entry
GET /api/knowledge/graph/stats Graph statistics
```
### Tags
```
GET /api/knowledge/tags List all tags
POST /api/knowledge/tags Create tag
PUT /api/knowledge/tags/:slug Update tag
DELETE /api/knowledge/tags/:slug Delete tag
GET /api/knowledge/tags/:slug/entries Entries with tag
```
### Links
```
GET /api/knowledge/entries/:slug/links/outgoing Outgoing links
GET /api/knowledge/entries/:slug/links/incoming Incoming links (backlinks)
GET /api/knowledge/entries/:slug/links/broken Broken links
POST /api/knowledge/links/resolve Resolve [[link]] to entry
```
## Link Processing
### Wiki-Link Syntax
The module supports Obsidian-compatible wiki-link syntax:
```markdown
Basic link: [[entry-slug]]
Display text: [[entry-slug|Custom Display Text]]
Header link: [[entry-slug#section-header]]
Block link: [[entry-slug#^block-id]]
```
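A parser for the four forms above could look roughly like this. It is a minimal sketch: `parseWikiLinks` and the `WikiLink` shape are hypothetical, and it does not skip links that appear inside code fences.

```typescript
export interface WikiLink {
  raw: string;          // full match, e.g. "[[entry-slug#section-header]]"
  target: string;       // text before any "#" or "|"
  header?: string;      // section header after "#"
  blockId?: string;     // block id after "#^"
  displayText?: string; // text after "|"
  index: number;        // offset of the match in the source text
}

// [[target]] with an optional #fragment (or #^block) and an optional |display text
const WIKI_LINK = /\[\[([^\[\]|#]+)(?:#(\^)?([^\[\]|]+))?(?:\|([^\[\]]+))?\]\]/g;

export function parseWikiLinks(content: string): WikiLink[] {
  const links: WikiLink[] = [];
  for (const m of content.matchAll(WIKI_LINK)) {
    const [raw, target, caret, fragment, display] = m;
    const link: WikiLink = { raw, target: target.trim(), index: m.index ?? 0 };
    if (fragment !== undefined) {
      // A leading "^" marks a block reference; otherwise the fragment is a header
      if (caret) link.blockId = fragment.trim();
      else link.header = fragment.trim();
    }
    if (display !== undefined) link.displayText = display.trim();
    links.push(link);
  }
  return links;
}
```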
### Link Resolution Flow
```
┌─────────────────┐
│ Entry Content   │
│ "See [[design]] │
│ for details"    │
└────────┬────────┘
         │ Parse
         ▼
┌─────────────────┐
│ Extract Links   │
│ [[design]]      │
└────────┬────────┘
         │ Resolve
         ▼
┌─────────────────┐
│ Find Target     │
│ slug: "design"  │
│ OR title match  │
│ OR fuzzy match  │
└────────┬────────┘
    ┌────┴────┐
    ▼         ▼
┌───────┐ ┌───────────┐
│ Found │ │ Not Found │
│       │ │ (broken)  │
└───┬───┘ └─────┬─────┘
    │           │
    ▼           ▼
┌───────────────────────┐
│ Create/Update Link    │
│ Record in entry_links │
│ Mark broken if needed │
└───────────────────────┘
```
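The "Find Target" step in the flow above tries a slug match, then a title match, then a fuzzy match. A minimal in-memory sketch of that order follows. The design does not say what "fuzzy" means, so a unique substring match on the normalized title or slug is used here as a stand-in; `resolveLink`, `EntryRef`, and `Resolution` are hypothetical names.

```typescript
export interface EntryRef { id: string; slug: string; title: string; }

export type Resolution =
  | { status: 'found'; entry: EntryRef; matchedBy: 'slug' | 'title' | 'fuzzy' }
  | { status: 'broken' };

// Lowercase and reduce punctuation to spaces so "task-queues" and "Task Queues" compare equal
const normalize = (s: string): string => s.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();

export function resolveLink(target: string, entries: readonly EntryRef[]): Resolution {
  const wanted = target.trim();

  // 1. Exact slug match
  const bySlug = entries.find((e) => e.slug === wanted);
  if (bySlug) return { status: 'found', entry: bySlug, matchedBy: 'slug' };

  // 2. Case-insensitive title match
  const byTitle = entries.find((e) => e.title.toLowerCase() === wanted.toLowerCase());
  if (byTitle) return { status: 'found', entry: byTitle, matchedBy: 'title' };

  // 3. Fuzzy stand-in (assumption): accept only if exactly one entry contains the target
  const needle = normalize(wanted);
  const candidates = needle
    ? entries.filter((e) => normalize(e.title).includes(needle) || normalize(e.slug).includes(needle))
    : [];
  if (candidates.length === 1) return { status: 'found', entry: candidates[0], matchedBy: 'fuzzy' };

  return { status: 'broken' };
}
```

Treating an ambiguous fuzzy match as broken is a conservative choice for this sketch: it avoids silently linking to the wrong entry.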
### Automatic Link Detection
On entry save:
1. Parse content for `[[...]]` patterns
2. Resolve each link to target entry
3. Update `KnowledgeLink` records
4. Flag broken links for UI warning
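
Step 3 amounts to comparing the links already stored for an entry with the links parsed from the new content. The sketch below computes that difference in memory; `diffLinks`, `StoredLink`, and `LinkDiff` are hypothetical names, and persisting the result is left to the service layer.

```typescript
export interface StoredLink { targetId: string; linkText: string; }

export interface LinkDiff {
  create: StoredLink[];      // links present in the new content only
  update: StoredLink[];      // same target, changed link text
  removeTargetIds: string[]; // links that no longer appear in the content
}

export function diffLinks(existing: readonly StoredLink[], parsed: readonly StoredLink[]): LinkDiff {
  const before = new Map<string, StoredLink>();
  for (const l of existing) before.set(l.targetId, l);

  // The schema allows one link per (source, target), so keep the first occurrence only
  const after = new Map<string, StoredLink>();
  for (const l of parsed) if (!after.has(l.targetId)) after.set(l.targetId, l);

  const diff: LinkDiff = { create: [], update: [], removeTargetIds: [] };
  for (const [targetId, link] of after) {
    const prev = before.get(targetId);
    if (!prev) diff.create.push(link);
    else if (prev.linkText !== link.linkText) diff.update.push(link);
  }
  for (const targetId of before.keys()) {
    if (!after.has(targetId)) diff.removeTargetIds.push(targetId);
  }
  return diff;
}
```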
## Search Implementation
### Full-Text Search (PostgreSQL)
```sql
-- Create search index
ALTER TABLE knowledge_entries
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(summary, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(content, '')), 'C')
  ) STORED;

CREATE INDEX idx_knowledge_search ON knowledge_entries USING GIN(search_vector);

-- Search query
SELECT id, slug, title,
       ts_rank(search_vector, query) AS rank,
       ts_headline('english', content, query) AS snippet
FROM knowledge_entries, plainto_tsquery('english', $1) query
WHERE search_vector @@ query
  AND workspace_id = $2
ORDER BY rank DESC
LIMIT 20;
```
### Semantic Search (pgvector)
```sql
-- Semantic search query
-- <=> is pgvector's cosine distance, so 1 - distance is cosine similarity
SELECT e.id, e.slug, e.title, e.summary,
       1 - (emb.embedding <=> $1::vector) AS similarity
FROM knowledge_entries e
JOIN knowledge_embeddings emb ON e.id = emb.entry_id
WHERE e.workspace_id = $2
  AND 1 - (emb.embedding <=> $1::vector) > 0.7 -- similarity threshold
ORDER BY emb.embedding <=> $1::vector
LIMIT 10;
```
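For reference, the similarity value in the query above is one minus pgvector's cosine distance. The same quantity can be computed in application code, for example to check the 0.7 threshold in a unit test. `cosineSimilarity` and `passesThreshold` are hypothetical helpers, not part of the design.

```typescript
// Cosine similarity between two vectors of equal dimension
export function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same dimension');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention chosen for this sketch: a zero vector is similar to nothing
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Mirrors the WHERE clause: keep a candidate only above the similarity threshold
export function passesThreshold(query: readonly number[], candidate: readonly number[], threshold = 0.7): boolean {
  return cosineSimilarity(query, candidate) > threshold;
}
```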
### Embedding Generation
```typescript
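import OpenAI from 'openai';
import type { KnowledgeEntry } from '@prisma/client'; // assumes the generated Prisma client

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function generateEmbedding(entry: KnowledgeEntry): Promise<number[]> {
  const text = `${entry.title}\n\n${entry.summary || ''}\n\n${entry.content}`;
  // Use OpenAI or local model
  const response = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    // Rough guard only: this truncates by characters, while the model's
    // input limit is counted in tokens
    input: text.slice(0, 8000),
  });
  return response.data[0].embedding;
}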
```
## Graph Visualization
### Data Structure
```typescript
interface KnowledgeGraph {
  nodes: GraphNode[];
  edges: GraphEdge[];
  stats: GraphStats;
}

interface GraphNode {
  id: string;
  slug: string;
  title: string;
  type: 'entry' | 'tag' | 'external';
  status: EntryStatus;
  linkCount: number; // in + out
  tags: string[];
  updatedAt: string;
}

interface GraphEdge {
  id: string;
  source: string; // node id
  target: string; // node id
  type: 'link' | 'tag';
  label?: string;
}

interface GraphStats {
  nodeCount: number;
  edgeCount: number;
  orphanCount: number; // entries with no links
  brokenLinkCount: number;
  avgConnections: number;
}
```
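The `GraphStats` fields can be derived from the node and edge lists. The sketch below is one possible reading: it treats `avgConnections` as the mean number of link endpoints per node, which is an assumption, and takes `brokenLinkCount` as an input because broken links are not edges in the graph. `computeGraphStats` is a hypothetical name.

```typescript
interface StatNode { id: string; }
interface StatEdge { source: string; target: string; }

export interface GraphStats {
  nodeCount: number;
  edgeCount: number;
  orphanCount: number;
  brokenLinkCount: number;
  avgConnections: number;
}

export function computeGraphStats(
  nodes: readonly StatNode[],
  edges: readonly StatEdge[],
  brokenLinkCount = 0,
): GraphStats {
  // Degree per node, counting incoming and outgoing links alike
  const degree = new Map<string, number>();
  for (const n of nodes) degree.set(n.id, 0);
  for (const e of edges) {
    for (const id of [e.source, e.target]) {
      const current = degree.get(id);
      if (current !== undefined) degree.set(id, current + 1); // ignore endpoints outside the node set
    }
  }

  let orphanCount = 0;
  let totalDegree = 0;
  for (const d of degree.values()) {
    if (d === 0) orphanCount += 1;
    totalDegree += d;
  }

  return {
    nodeCount: nodes.length,
    edgeCount: edges.length,
    orphanCount,
    brokenLinkCount,
    avgConnections: nodes.length === 0 ? 0 : totalDegree / nodes.length,
  };
}
```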
### Graph Query
```sql
-- Get full graph for workspace
WITH nodes AS (
  SELECT
    e.id, e.slug, e.title, 'entry' AS type, e.status,
    (SELECT COUNT(*)
       FROM knowledge_links l
      WHERE l.source_id = e.id OR l.target_id = e.id) AS link_count,
    e.updated_at
  FROM knowledge_entries e
  WHERE e.workspace_id = $1 AND e.status != 'ARCHIVED'
),
edges AS (
  -- Keep only edges whose endpoints are both in the node set, so the
  -- result never references archived or foreign entries
  SELECT
    l.id, l.source_id AS source, l.target_id AS target,
    'link' AS type, l.link_text AS label
  FROM knowledge_links l
  JOIN nodes s ON l.source_id = s.id
  JOIN nodes t ON l.target_id = t.id
)
SELECT
  json_build_object(
    -- json_agg returns NULL for an empty set; fall back to an empty array
    'nodes', (SELECT coalesce(json_agg(nodes), '[]'::json) FROM nodes),
    'edges', (SELECT coalesce(json_agg(edges), '[]'::json) FROM edges)
  ) AS graph;
```
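The subgraph endpoint (`GET /api/knowledge/graph/:slug`) needs the neighborhood of one entry rather than the whole workspace. A breadth-first walk over the edge list is one way to compute it. The `depth` parameter and the choice to follow links in both directions are assumptions, and `subgraphAround` is a hypothetical name.

```typescript
export interface Edge { source: string; target: string; }

export function subgraphAround(
  center: string,
  edges: readonly Edge[],
  depth = 2,
): { nodeIds: string[]; edges: Edge[] } {
  // Undirected adjacency so that backlinks are part of the neighborhood
  const neighbors = new Map<string, Set<string>>();
  const connect = (a: string, b: string): void => {
    const set = neighbors.get(a) ?? new Set<string>();
    set.add(b);
    neighbors.set(a, set);
  };
  for (const e of edges) {
    connect(e.source, e.target);
    connect(e.target, e.source);
  }

  // Breadth-first expansion, one level per iteration
  const seen = new Set<string>([center]);
  let frontier = [center];
  for (let level = 0; level < depth && frontier.length > 0; level++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const n of neighbors.get(id) ?? []) {
        if (!seen.has(n)) {
          seen.add(n);
          next.push(n);
        }
      }
    }
    frontier = next;
  }

  return {
    nodeIds: [...seen],
    edges: edges.filter((e) => seen.has(e.source) && seen.has(e.target)),
  };
}
```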
### Frontend Rendering
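Render the graph with either a D3.js force-directed layout or Cytoscape.js. The configuration below is library-neutral; its option names would need mapping onto whichever library is chosen: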
```typescript
// Graph component configuration
const graphConfig = {
  layout: 'force-directed',
  physics: {
    repulsion: 100,
    springLength: 150,
    springStrength: 0.05,
  },
  // Size grows with the square root of the link count so hubs stay readable
  nodeSize: (node: GraphNode) => Math.sqrt(node.linkCount) * 10 + 20,
  nodeColor: (node: GraphNode) => {
    switch (node.status) {
      case 'PUBLISHED': return '#22c55e';
      case 'DRAFT': return '#f59e0b';
      case 'ARCHIVED': return '#6b7280';
      default: return '#6b7280'; // unknown status: fall back to gray
    }
  },
  edgeStyle: {
    color: '#94a3b8',
    width: 1,
    arrows: 'to',
  },
};
```
## Caching Strategy
### Valkey Key Patterns
```
knowledge:{workspaceId}:entry:{slug} Entry cache (JSON)
knowledge:{workspaceId}:entry:{slug}:html Rendered HTML cache
knowledge:{workspaceId}:graph Full graph cache
knowledge:{workspaceId}:graph:{slug} Subgraph cache
knowledge:{workspaceId}:search:{hash} Search result cache
knowledge:{workspaceId}:tags Tag list cache
knowledge:{workspaceId}:recent Recent entries list
```
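The key patterns above can be centralized in a small builder so that reads and invalidation never drift apart. The design does not say how `{hash}` is derived; the sketch below assumes a truncated SHA-256 over the normalized query and its sorted parameters. `knowledgeKeys` and `searchCacheHash` are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

// One builder per key pattern listed in the table above
export const knowledgeKeys = {
  entry: (ws: string, slug: string) => `knowledge:${ws}:entry:${slug}`,
  entryHtml: (ws: string, slug: string) => `knowledge:${ws}:entry:${slug}:html`,
  graph: (ws: string) => `knowledge:${ws}:graph`,
  subgraph: (ws: string, slug: string) => `knowledge:${ws}:graph:${slug}`,
  search: (ws: string, hash: string) => `knowledge:${ws}:search:${hash}`,
  tags: (ws: string) => `knowledge:${ws}:tags`,
  recent: (ws: string) => `knowledge:${ws}:recent`,
};

// Assumption: the cache hash covers the query text plus any filter/paging parameters
export function searchCacheHash(
  query: string,
  params: Record<string, string | number | boolean> = {},
): string {
  // Sort parameters so that their order does not change the key
  const sorted = Object.entries(params).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  const canonical = JSON.stringify([query.trim().toLowerCase(), sorted]);
  return createHash('sha256').update(canonical).digest('hex').slice(0, 16);
}
```

Lowercasing the query is safe only if the search itself is case-insensitive, which holds for the `plainto_tsquery` full-text query but would need checking for other search modes.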
### Cache Invalidation
```typescript
async function invalidateEntryCache(workspaceId: string, slug: string) {
  const keys = [
    `knowledge:${workspaceId}:entry:${slug}`,
    `knowledge:${workspaceId}:entry:${slug}:html`,
    `knowledge:${workspaceId}:graph`, // Full graph affected
    `knowledge:${workspaceId}:graph:${slug}`,
    `knowledge:${workspaceId}:recent`,
  ];

  // Also invalidate subgraphs for linked entries
  const linkedSlugs = await getLinkedEntrySlugs(workspaceId, slug);
  for (const linked of linkedSlugs) {
    keys.push(`knowledge:${workspaceId}:graph:${linked}`);
  }
  await valkey.del(...keys);

  // Invalidate search caches (pattern delete).
  // KEYS walks the whole keyspace and blocks the server while it runs,
  // so a production version should iterate with SCAN instead.
  const searchKeys = await valkey.keys(`knowledge:${workspaceId}:search:*`);
  if (searchKeys.length) await valkey.del(...searchKeys);
}
```
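A pattern delete can also be done incrementally with `SCAN`, which returns keys in batches instead of walking the keyspace in one blocking call. The sketch below is written against a minimal client interface modelled on the ioredis call style (`scan(cursor, 'MATCH', pattern, 'COUNT', n)`); which client library the project uses is not stated, so `ScanCapableClient` and `deleteByPattern` are assumptions.

```typescript
// Minimal client surface assumed by this sketch (ioredis-style signatures)
export interface ScanCapableClient {
  scan(
    cursor: string,
    matchToken: 'MATCH',
    pattern: string,
    countToken: 'COUNT',
    count: number,
  ): Promise<[string, string[]]>;
  del(...keys: string[]): Promise<number>;
}

// Delete every key matching `pattern`, one SCAN batch at a time.
// Returns the number of keys the server reported as deleted.
export async function deleteByPattern(
  client: ScanCapableClient,
  pattern: string,
  batchSize = 100,
): Promise<number> {
  let cursor = '0';
  let deleted = 0;
  do {
    const [next, keys] = await client.scan(cursor, 'MATCH', pattern, 'COUNT', batchSize);
    if (keys.length > 0) deleted += await client.del(...keys);
    cursor = next;
  } while (cursor !== '0'); // a returned cursor of "0" means the iteration is complete
  return deleted;
}
```

With such a helper, the search-cache step could become `await deleteByPattern(valkey, \`knowledge:${workspaceId}:search:*\`)`, provided the client exposes compatible `scan` and `del` methods.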
## UI Components
### Entry Editor
```
┌────────────────────────────────────────────────────────────────┐
│ [📄] Agent Orchestration Design [Save] [···]│
├────────────────────────────────────────────────────────────────┤
│ Status: [Published ▼] Tags: [architecture] [agents] [+] │
├────────────────────────────────────────────────────────────────┤
│ │
│ # Problem Statement │
│ │
│ Development teams and AI agents working on complex projects │
│ need a way to [[capture-decisions|capture decisions]]... │
│ │
│ See also: [[task-queues]] and [[valkey-patterns]] │
│ │
│ ─────────────────────────────────────────────────────────────│
│ Backlinks (3): │
│ • [[mosaic-roadmap]] - "...implements agent orchestration..." │
│ • [[design-index]] - "Core designs: [[agent-orchestration]]" │
│ • [[jarvis-memory]] - "Created orchestration design..." │
│ │
└────────────────────────────────────────────────────────────────┘
```
### Graph View
```
┌────────────────────────────────────────────────────────────────┐
│ Knowledge Graph                          [Filter ▼] [Layout ▼] │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│                         ┌─────────┐                            │
│                         │ valkey  │                            │
│                         │patterns │                            │
│                         └────┬────┘                            │
│                              │                                 │
│                 ┌────────────┼────────────┐                    │
│                 │            │            │                    │
│                 ▼            ▼            ▼                    │
│              ┌──────┐    ┌────────┐   ┌────────┐               │
│              │cache │    │  task  │   │ agent  │◄─────┐        │
│              │layer │    │ queues │   │  orch  │      │        │
│              └──────┘    └────────┘   └───┬────┘      │        │
│                                           │           │        │
│                                           ▼           │        │
│                                       ┌────────┐ ┌────┴───┐    │
│                                       │recovery│ │ mosaic │    │
│                                       │patterns│ │roadmap │    │
│                                       └────────┘ └────────┘    │
│                                                                │
│  🟢 Published (6)  🟡 Draft (2)  ⚪ Orphan (0)                 │
└────────────────────────────────────────────────────────────────┘
```
### Search Results
```
┌────────────────────────────────────────────────────────────────┐
│ 🔍 [agent recovery ] [Search]│
├────────────────────────────────────────────────────────────────┤
│ │
│ 📄 Agent Orchestration - Recovery Patterns │
│ ...automatic **recovery** when an **agent** fails or the... │
│ Tags: architecture, agents • Updated 2 hours ago │
│ │
│ 📄 Agent Health Monitoring │
│ ...heartbeat monitoring enables **recovery** of stale... │
│ Tags: agents, monitoring • Updated 1 day ago │
│ │
│ 📄 Task Queue Design │
│ ...retry logic with exponential backoff for **agent**... │
│ Tags: architecture, queues • Updated 3 days ago │
│ │
│ ───────────────────────────────────────────────────────────── │
│ Also try: Semantic search for conceptually related entries │
│ │
└────────────────────────────────────────────────────────────────┘
```
## Implementation Phases
### Phase 1: Foundation (Week 1-2)
**Goal:** Basic CRUD + storage working
- [ ] Database schema + migrations
- [ ] Entry CRUD API endpoints
- [ ] Basic markdown rendering
- [ ] Tag management
- [ ] Entry list/detail pages

**Deliverables:**
- Can create, edit, view, delete entries
- Tags work
- Basic search (title/slug match)
### Phase 2: Linking (Week 2-3)
**Goal:** Wiki-link functionality
- [ ] Link parser (`[[...]]` syntax)
- [ ] Link resolution logic
- [ ] Broken link detection
- [ ] Backlinks display
- [ ] Link autocomplete in editor

**Deliverables:**
- Links between entries work
- Backlinks show on entry pages
- Editor suggests links as you type
### Phase 3: Search (Week 3-4)
**Goal:** Full-text + semantic search
- [ ] PostgreSQL full-text search setup
- [ ] Search API endpoint
- [ ] Search UI with highlighting
- [ ] pgvector extension setup
- [ ] Embedding generation pipeline
- [ ] Semantic search API

**Deliverables:**
- Full-text search that meets the latency target in Success Metrics
- Semantic search for "fuzzy" queries
- Search results with snippets
### Phase 4: Graph (Week 4-5)
**Goal:** Visual knowledge graph
- [ ] Graph data API
- [ ] D3.js/Cytoscape integration
- [ ] Interactive graph view
- [ ] Subgraph (entry-centered) view
- [ ] Graph statistics

**Deliverables:**
- Can view full knowledge graph
- Can explore from any entry
- Visual indicators for status/orphans
### Phase 5: Polish (Week 5-6)
**Goal:** Production-ready
- [ ] Version history UI
- [ ] Diff view between versions
- [ ] Import from markdown files
- [ ] Export functionality
- [ ] Performance optimization
- [ ] Caching implementation
- [ ] Documentation

**Deliverables:**
- Version history works
- Can import existing docs
- Performance meets the targets in Success Metrics
- Module is documented
## Integration Points
### Agent Access
The Knowledge module should be accessible to agents via API:
```typescript
// Agent tool for knowledge access
interface KnowledgeTools {
  // Search
  searchKnowledge(query: string): Promise<SearchResult[]>;
  semanticSearch(query: string): Promise<SearchResult[]>;

  // CRUD
  getEntry(slug: string): Promise<KnowledgeEntry>;
  createEntry(data: CreateEntryInput): Promise<KnowledgeEntry>;
  updateEntry(slug: string, data: UpdateEntryInput): Promise<KnowledgeEntry>;

  // Graph
  getRelatedEntries(slug: string): Promise<KnowledgeEntry[]>;
  getBacklinks(slug: string): Promise<KnowledgeEntry[]>;
}
```
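Part of this interface could be backed by a thin HTTP client over the endpoints listed under API Endpoints. The sketch below covers four of the methods. `createKnowledgeClient` and `FetchLike` are hypothetical names, the `{ query }` request body for semantic search is an assumption because the design does not define it, and responses are left as `unknown` rather than guessing their shape.

```typescript
// Minimal fetch-like signature so the client can be tested without a network
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

export function createKnowledgeClient(baseUrl: string, fetchFn: FetchLike) {
  async function request(
    path: string,
    init?: { method?: string; headers?: Record<string, string>; body?: string },
  ): Promise<unknown> {
    const res = await fetchFn(`${baseUrl}${path}`, init);
    if (!res.ok) throw new Error(`Knowledge API ${path} failed with status ${res.status}`);
    return res.json();
  }

  return {
    getEntry: (slug: string) =>
      request(`/api/knowledge/entries/${encodeURIComponent(slug)}`),
    searchKnowledge: (query: string) =>
      request(`/api/knowledge/search?q=${encodeURIComponent(query)}`),
    semanticSearch: (query: string) =>
      request('/api/knowledge/search/semantic', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ query }), // assumed body shape
      }),
    getBacklinks: (slug: string) =>
      request(`/api/knowledge/entries/${encodeURIComponent(slug)}/links/incoming`),
  };
}
```

Injecting the fetch function keeps the client independent of the runtime; in an agent process the platform's own `fetch` would typically be passed in, along with whatever authentication headers the API ends up requiring.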
### Clawdbot Integration
For Clawdbot specifically, the Knowledge module could:
1. Sync with `memory/*.md` files
2. Provide semantic search for `memory_search` tool
3. Generate embeddings for memory entries
4. Visualize agent memory as a knowledge graph
## Success Metrics
| Metric | Target | Measurement |
|--------|--------|-------------|
| Entry creation time | < 200ms | API response time |
| Search latency (full-text) | < 100ms | p95 response time |
| Search latency (semantic) | < 300ms | p95 response time |
| Graph render (100 nodes) | < 200ms | Client-side time |
| Graph render (1000 nodes) | < 1s | Client-side time |
| Adoption | 50+ entries/workspace | After 1 month |
| Link density | > 2 links/entry avg | Graph statistics |
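
Two of the measurements above are simple to compute from raw data: a p95 latency from a list of response times, and link density from the graph counts. The sketch uses the nearest-rank percentile definition, which is one of several common conventions and an assumption here; `percentile` and `linkDensity` are hypothetical helpers.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it
export function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Average number of links per entry, as used by the "Link density" metric
export function linkDensity(linkCount: number, entryCount: number): number {
  return entryCount === 0 ? 0 : linkCount / entryCount;
}
```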
## Open Questions
1. **Embedding model** — Use OpenAI embeddings or self-hosted? (Cost vs privacy)
2. **Real-time collab** — Do we need multiplayer editing? (CRDT complexity)
3. **Permissions** — Entry-level permissions or workspace-level only?
4. **Templates** — Support entry templates (ADR, design doc, etc.)?
5. **Attachments** — Allow images/files in entries?
## References
- [Obsidian](https://obsidian.md/) — Wiki-link syntax inspiration
- [Roam Research](https://roamresearch.com/) — Block-level linking
- [pgvector](https://github.com/pgvector/pgvector) — PostgreSQL vector extension
- [Mosaic Agent Orchestration](./agent-orchestration.md) — Related design