# Knowledge Module - Design Document

> **Status:** Draft
> **Author:** Agent (Jarvis)
> **Created:** 2025-01-29
> **Related:** [Agent Orchestration](./agent-orchestration.md)

## Problem Statement

Development teams and AI agents working on complex projects need a way to:

1. **Capture decisions** — Why was X chosen over Y?
2. **Track connections** — How does component A relate to concept B?
3. **Search contextually** — Find relevant context without knowing exact keywords
4. **Evolve understanding** — Knowledge changes; track that evolution
5. **Share across boundaries** — Human and agent access to the same knowledge base

### Current Pain Points

- **Scattered documentation** — README, comments, Slack threads, memory files
- **No explicit linking** — Connections exist but aren't captured
- **Agent amnesia** — Each session starts fresh, relies on file search
- **No decision archaeology** — Hard to find *why* something was decided
- **Human/agent mismatch** — Humans browse, agents grep

## Requirements

### Functional Requirements

| ID | Requirement | Priority |
|----|-------------|----------|
| FR1 | Create, read, update, delete knowledge entries | P0 |
| FR2 | Wiki-style linking between entries (`[[link]]` syntax) | P0 |
| FR3 | Tagging and categorization | P0 |
| FR4 | Full-text search | P0 |
| FR5 | Semantic/vector search for agents | P1 |
| FR6 | Graph visualization of connections | P1 |
| FR7 | Version history and diff view | P1 |
| FR8 | Timeline view of changes | P2 |
| FR9 | Import from markdown files | P2 |
| FR10 | Export to markdown/PDF | P2 |

### Non-Functional Requirements

| ID | Requirement | Target |
|----|-------------|--------|
| NFR1 | Search response time | < 200ms |
| NFR2 | Entry render time | < 100ms |
| NFR3 | Graph render (< 1000 nodes) | < 500ms |
| NFR4 | Multi-tenant isolation | Complete |
| NFR5 | API-first design | All features via API |

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                        Mosaic Web UI                            │
├─────────────────────────────────────────────────────────────────┤
│  Knowledge Browser  │  Graph View  │  Search  │  Timeline       │
└─────────┬───────────┴──────┬───────┴────┬─────┴────┬────────────┘
          │                  │            │          │
          ▼                  ▼            ▼          ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Knowledge API (NestJS)                     │
├─────────────────────────────────────────────────────────────────┤
│  EntryController  │  SearchController  │  GraphController       │
│  TagController    │  LinkController    │  VersionController     │
└─────────┬─────────┴─────────┬──────────┴──────────┬─────────────┘
          │                   │                     │
          ▼                   ▼                     ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────┐
│   PostgreSQL     │ │     Valkey       │ │   Vector Store       │
│                  │ │                  │ │   (pgvector)         │
│ - entries        │ │ - search cache   │ │                      │
│ - entry_versions │ │ - graph cache    │ │ - embeddings         │
│ - entry_links    │ │ - hot entries    │ │ - semantic index     │
│ - tags           │ │                  │ │                      │
└──────────────────┘ └──────────────────┘ └──────────────────────┘
```

## Data Model

### Core Entities

```prisma
// Entry - A single knowledge entry (document/page)
model KnowledgeEntry {
  id            String    @id @default(cuid())
  workspaceId   String
  workspace     Workspace @relation(fields: [workspaceId], references: [id])

  slug          String    // URL-friendly identifier
  title         String
  content       String    @db.Text  // Markdown content
  contentHtml   String?   @db.Text  // Rendered HTML (cached)
  summary       String?   // Auto-generated or manual summary

  status        EntryStatus @default(DRAFT)
  visibility    Visibility  @default(PRIVATE)

  // Metadata
  createdAt     DateTime  @default(now())
  updatedAt     DateTime  @updatedAt
  createdBy     String
  updatedBy     String

  // Relations
  tags          KnowledgeEntryTag[]
  outgoingLinks KnowledgeLink[] @relation("SourceEntry")
  incomingLinks KnowledgeLink[] @relation("TargetEntry")
  versions      KnowledgeEntryVersion[]
  embedding     KnowledgeEmbedding?

  @@unique([workspaceId, slug])
  @@index([workspaceId, status])
  @@index([workspaceId, updatedAt])
}

enum EntryStatus {
  DRAFT
  PUBLISHED
  ARCHIVED
}

enum Visibility {
  PRIVATE     // Only creator
  WORKSPACE   // All workspace members
  PUBLIC      // Anyone with link
}

// Version history
model KnowledgeEntryVersion {
  id          String   @id @default(cuid())
  entryId     String
  entry       KnowledgeEntry @relation(fields: [entryId], references: [id], onDelete: Cascade)

  version     Int
  title       String
  content     String   @db.Text
  summary     String?

  createdAt   DateTime @default(now())
  createdBy   String
  changeNote  String?  // Optional commit message

  @@unique([entryId, version])  // The unique constraint already creates an index on (entryId, version)
}

// Wiki-style links between entries
model KnowledgeLink {
  id          String   @id @default(cuid())

  sourceId    String
  source      KnowledgeEntry @relation("SourceEntry", fields: [sourceId], references: [id], onDelete: Cascade)

  targetId    String
  target      KnowledgeEntry @relation("TargetEntry", fields: [targetId], references: [id], onDelete: Cascade)

  // Link metadata
  linkText    String   // The text used in [[link|display text]]
  context     String?  // Surrounding text for context

  createdAt   DateTime @default(now())

  @@unique([sourceId, targetId])
  @@index([sourceId])
  @@index([targetId])
}

// Tags for categorization
model KnowledgeTag {
  id          String   @id @default(cuid())
  workspaceId String
  workspace   Workspace @relation(fields: [workspaceId], references: [id])

  name        String
  slug        String
  color       String?  // Hex color for UI
  description String?

  entries     KnowledgeEntryTag[]

  @@unique([workspaceId, slug])
}

model KnowledgeEntryTag {
  entryId     String
  entry       KnowledgeEntry @relation(fields: [entryId], references: [id], onDelete: Cascade)

  tagId       String
  tag         KnowledgeTag @relation(fields: [tagId], references: [id], onDelete: Cascade)

  @@id([entryId, tagId])
}

// Vector embeddings for semantic search
model KnowledgeEmbedding {
  id          String   @id @default(cuid())
  entryId     String   @unique
  entry       KnowledgeEntry @relation(fields: [entryId], references: [id], onDelete: Cascade)

  embedding   Unsupported("vector(1536)")  // OpenAI ada-002 dimension
  model       String   // Which model generated this

  createdAt   DateTime @default(now())
  updatedAt   DateTime @updatedAt

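  // Prisma's @@index has no HNSW type, so the vector index is created in a raw SQL migration, e.g.:
  // CREATE INDEX idx_knowledge_embedding ON knowledge_embeddings USING hnsw (embedding vector_cosine_ops);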
}
```
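
The schema describes `slug` only as a URL-friendly identifier and does not say how it is derived. As a minimal sketch, assuming slugs are lowercase ASCII with hyphens and that collisions inside a workspace (see `@@unique([workspaceId, slug])`) are resolved with a numeric suffix, the derivation might look like this. `slugify` and `uniqueSlug` are hypothetical helper names, not part of the design.

```typescript
// Hypothetical helpers: the design does not specify how slugs are derived.
function slugify(title: string): string {
  return title
    .normalize('NFKD')                 // split accented characters into base + combining mark
    .replace(/[\u0300-\u036f]/g, '')   // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')       // collapse every other run of characters into one hyphen
    .replace(/^-+|-+$/g, '');          // trim leading and trailing hyphens
}

// Appends -2, -3, ... until the slug is not in the set of slugs already taken.
function uniqueSlug(title: string, taken: ReadonlySet<string>): string {
  const base = slugify(title) || 'untitled';
  let candidate = base;
  for (let n = 2; taken.has(candidate); n++) {
    candidate = `${base}-${n}`;
  }
  return candidate;
}
```

In a real service the `taken` set would come from a lookup scoped to the workspace, and the unique constraint would still act as the final guard against races.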

### Frontmatter Schema

Entries support YAML frontmatter for structured metadata:

```yaml
---
title: Agent Orchestration Design
status: published
tags: [architecture, agents, orchestration]
created: 2025-01-29
updated: 2025-01-29
author: jarvis
related:
  - "[[task-queues]]"
  - "[[valkey-patterns]]"
decision:
  status: accepted
  date: 2025-01-29
  participants: [jason, jarvis]
  supersedes: null
---
```
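
Before the YAML can be parsed, the frontmatter has to be separated from the markdown body. The sketch below shows only that split and leaves YAML parsing to whichever library the project picks. `splitFrontmatter` is a hypothetical name, and the rule that frontmatter must begin on the first line is an assumption borrowed from common static-site tooling.

```typescript
interface SplitEntry {
  frontmatter: string | null; // raw YAML text, not parsed here
  body: string;               // markdown that follows the closing ---
}

// Hypothetical helper: separates a leading --- ... --- block from the body.
function splitFrontmatter(source: string): SplitEntry {
  // The block must open on the very first line and close with a line containing only ---
  const match = /^---\r?\n([\s\S]*?)\r?\n---[ \t]*(?:\r?\n|$)/.exec(source);
  if (!match) return { frontmatter: null, body: source };
  return { frontmatter: match[1], body: source.slice(match[0].length) };
}
```

Entries without frontmatter pass through unchanged, so the same function can sit in front of both imported files and entries written in the editor.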

## API Endpoints

### Entry Management

```
POST   /api/knowledge/entries              Create entry
GET    /api/knowledge/entries              List entries (paginated)
GET    /api/knowledge/entries/:slug        Get entry by slug
PUT    /api/knowledge/entries/:slug        Update entry
DELETE /api/knowledge/entries/:slug        Delete entry (soft delete → archive)

GET    /api/knowledge/entries/:slug/versions     List versions
GET    /api/knowledge/entries/:slug/versions/:v  Get specific version
POST   /api/knowledge/entries/:slug/restore/:v   Restore to version
```
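
The version endpoints imply an append-only history, but the design does not say what `restore` does to versions newer than the restored one. The in-memory sketch below assumes that restoring copies the old version forward as a new version, so nothing is ever deleted. `VersionHistory` and its methods are hypothetical and stand in for the `KnowledgeEntryVersion` table.

```typescript
interface VersionRecord {
  version: number;
  title: string;
  content: string;
  changeNote?: string;
}

// Hypothetical in-memory stand-in for the version table of a single entry.
class VersionHistory {
  private readonly versions: VersionRecord[] = [];

  // Appends a new version; numbers start at 1 and are never reused.
  save(title: string, content: string, changeNote?: string): VersionRecord {
    const record: VersionRecord = { version: this.versions.length + 1, title, content, changeNote };
    this.versions.push(record);
    return record;
  }

  get(version: number): VersionRecord | undefined {
    return this.versions.find((v) => v.version === version);
  }

  // Assumption: restore copies the old version forward instead of removing newer ones.
  restore(version: number): VersionRecord {
    const old = this.get(version);
    if (!old) throw new Error(`Version ${version} not found`);
    return this.save(old.title, old.content, `Restored from version ${version}`);
  }

  list(): readonly VersionRecord[] {
    return this.versions;
  }
}
```

Keeping restore additive means the diff view can always show what was replaced, at the cost of one extra row per restore.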

### Search

```
GET    /api/knowledge/search?q=...         Full-text search
POST   /api/knowledge/search/semantic      Semantic search (vector)
GET    /api/knowledge/search/suggestions   Autocomplete suggestions
```

### Graph

```
GET    /api/knowledge/graph                Full graph (nodes + edges)
GET    /api/knowledge/graph/:slug          Subgraph centered on entry
GET    /api/knowledge/graph/stats          Graph statistics
```

### Tags

```
GET    /api/knowledge/tags                 List all tags
POST   /api/knowledge/tags                 Create tag
PUT    /api/knowledge/tags/:slug           Update tag
DELETE /api/knowledge/tags/:slug           Delete tag
GET    /api/knowledge/tags/:slug/entries   Entries with tag
```

### Links

```
GET    /api/knowledge/entries/:slug/links/outgoing  Outgoing links
GET    /api/knowledge/entries/:slug/links/incoming  Incoming links (backlinks)
GET    /api/knowledge/entries/:slug/links/broken    Broken links
POST   /api/knowledge/links/resolve                 Resolve [[link]] to entry
```

## Link Processing

### Wiki-Link Syntax

The module supports Obsidian-compatible wiki-link syntax:

```markdown
Basic link: [[entry-slug]]
Display text: [[entry-slug|Custom Display Text]]
Header link: [[entry-slug#section-header]]
Block link: [[entry-slug#^block-id]]
```
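
The four forms above can be extracted with a single regular expression. This is a minimal sketch: `parseWikiLinks` and the `WikiLink` shape are hypothetical, and the parser does not skip links that appear inside code fences or inline code, which a production parser probably should.

```typescript
interface WikiLink {
  raw: string;          // full match including the brackets
  target: string;       // entry slug (or title, before resolution)
  anchor?: string;      // section header or ^block-id
  displayText?: string; // text after the | separator
}

// Hypothetical parser for [[target]], [[target#anchor]], [[target|text]], [[target#anchor|text]].
function parseWikiLinks(content: string): WikiLink[] {
  const pattern = /\[\[([^\[\]|#]+)(?:#([^\[\]|]+))?(?:\|([^\[\]]+))?\]\]/g;
  const links: WikiLink[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(content)) !== null) {
    const link: WikiLink = { raw: m[0], target: m[1].trim() };
    if (m[2] !== undefined) link.anchor = m[2].trim();
    if (m[3] !== undefined) link.displayText = m[3].trim();
    links.push(link);
  }
  return links;
}
```

Block references keep their leading `^` in `anchor`, so callers can tell a block link from a header link without a separate field.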

### Link Resolution Flow

```
┌─────────────────┐
│ Entry Content   │
│ "See [[design]] │
│  for details"   │
└────────┬────────┘
         │ Parse
         ▼
┌─────────────────┐
│ Extract Links   │
│ [[design]]      │
└────────┬────────┘
         │ Resolve
         ▼
┌─────────────────┐
│ Find Target     │
│ slug: "design"  │
│ OR title match  │
│ OR fuzzy match  │
└────────┬────────┘
         │
    ┌────┴────┐
    ▼         ▼
┌───────┐ ┌───────────┐
│ Found │ │ Not Found │
│       │ │ (broken)  │
└───┬───┘ └─────┬─────┘
    │           │
    ▼           ▼
┌───────────────────────┐
│ Create/Update Link    │
│ Record in entry_links │
│ Mark broken if needed │
└───────────────────────┘
```
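
The "Find Target" step lists three strategies without fixing their order or defining "fuzzy". The sketch below assumes they are tried in the order shown (slug, then title, then fuzzy) and that fuzzy means the closest slug within a small Levenshtein edit distance. Both are assumptions, and `resolveLink`, `EntryRef`, and `maxDistance` are hypothetical names.

```typescript
interface EntryRef { id: string; slug: string; title: string; }

type Resolution =
  | { status: 'found'; entry: EntryRef; via: 'slug' | 'title' | 'fuzzy' }
  | { status: 'broken' };

// Edit distance using two rows of the standard dynamic-programming table.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed precedence: exact slug, then case-insensitive title, then nearest slug.
function resolveLink(target: string, entries: EntryRef[], maxDistance = 2): Resolution {
  const wanted = target.trim().toLowerCase();

  const bySlug = entries.find((e) => e.slug === wanted);
  if (bySlug) return { status: 'found', entry: bySlug, via: 'slug' };

  const byTitle = entries.find((e) => e.title.trim().toLowerCase() === wanted);
  if (byTitle) return { status: 'found', entry: byTitle, via: 'title' };

  let best: EntryRef | undefined;
  let bestDistance = maxDistance + 1;
  for (const e of entries) {
    const d = levenshtein(wanted, e.slug);
    if (d < bestDistance) { best = e; bestDistance = d; }
  }
  return best ? { status: 'found', entry: best, via: 'fuzzy' } : { status: 'broken' };
}
```

Returning `via` lets the UI flag fuzzy matches for confirmation, since a silent near-match can link to the wrong entry.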

### Automatic Link Detection

On entry save:
1. Parse content for `[[...]]` patterns
2. Resolve each link to target entry
3. Update `KnowledgeLink` records
4. Flag broken links for UI warning
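
Step 3 above amounts to a diff between the links parsed from the new content and the rows already stored for the entry. A minimal sketch of that diff follows, assuming links have already been resolved to target IDs. `diffLinks` and the record shapes are hypothetical, and broken links are left out because the design does not yet say how they are stored.

```typescript
interface ResolvedLink { targetId: string; linkText: string; }
interface StoredLink { id: string; targetId: string; linkText: string; }

interface LinkDiff {
  toCreate: ResolvedLink[];
  toUpdate: { id: string; linkText: string }[];
  toDeleteIds: string[];
}

// Hypothetical helper: computes the row changes needed after an entry is saved.
function diffLinks(parsed: ResolvedLink[], stored: StoredLink[]): LinkDiff {
  // The schema allows one row per (sourceId, targetId), so the first occurrence of a target wins.
  const wanted = new Map<string, ResolvedLink>();
  for (const link of parsed) {
    if (!wanted.has(link.targetId)) wanted.set(link.targetId, link);
  }

  const diff: LinkDiff = { toCreate: [], toUpdate: [], toDeleteIds: [] };
  const seen = new Set<string>();
  for (const row of stored) {
    seen.add(row.targetId);
    const next = wanted.get(row.targetId);
    if (!next) diff.toDeleteIds.push(row.id);
    else if (next.linkText !== row.linkText) diff.toUpdate.push({ id: row.id, linkText: next.linkText });
  }
  wanted.forEach((link, targetId) => {
    if (!seen.has(targetId)) diff.toCreate.push(link);
  });
  return diff;
}
```

Applying the three lists inside one transaction keeps backlinks consistent if the save fails partway through.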

## Search Implementation

### Full-Text Search (PostgreSQL)

```sql
-- Create search index
ALTER TABLE knowledge_entries
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
  setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
  setweight(to_tsvector('english', coalesce(summary, '')), 'B') ||
  setweight(to_tsvector('english', coalesce(content, '')), 'C')
) STORED;

CREATE INDEX idx_knowledge_search ON knowledge_entries USING GIN(search_vector);

-- Search query
SELECT id, slug, title,
       ts_rank(search_vector, query) as rank,
       ts_headline('english', content, query) as snippet
FROM knowledge_entries, plainto_tsquery('english', $1) query
WHERE search_vector @@ query
  AND workspace_id = $2
ORDER BY rank DESC
LIMIT 20;
```

### Semantic Search (pgvector)

```sql
-- Semantic search query
SELECT e.id, e.slug, e.title, e.summary,
       1 - (emb.embedding <=> $1::vector) as similarity
FROM knowledge_entries e
JOIN knowledge_embeddings emb ON e.id = emb.entry_id
WHERE e.workspace_id = $2
  AND 1 - (emb.embedding <=> $1::vector) > 0.7  -- similarity threshold
ORDER BY emb.embedding <=> $1::vector
LIMIT 10;
```
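
In pgvector, `<=>` is the cosine distance operator, so `1 - (a <=> b)` in the query above is cosine similarity. For unit tests or small in-memory datasets, the same ranking can be reproduced in application code. This is a sketch for illustration only: `cosineSimilarity` and `rankBySimilarity` are hypothetical names, and the defaults mirror the `0.7` threshold and `LIMIT 10` from the query.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same dimension');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Similarity is undefined for a zero vector; this sketch treats it as "not similar".
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

interface Scored<T> { item: T; similarity: number; }

// Filters by threshold, then sorts by descending similarity and truncates.
function rankBySimilarity<T>(
  query: number[],
  candidates: { item: T; embedding: number[] }[],
  threshold = 0.7,
  limit = 10,
): Scored<T>[] {
  return candidates
    .map((c) => ({ item: c.item, similarity: cosineSimilarity(query, c.embedding) }))
    .filter((s) => s.similarity > threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

A fixed threshold depends on the embedding model, so the `0.7` value is worth revisiting if the model changes.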

### Embedding Generation

```typescript
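import OpenAI from 'openai';

const openai = new OpenAI(); // Reads OPENAI_API_KEY from the environment

async function generateEmbedding(entry: KnowledgeEntry): Promise<number[]> {
  const text = `${entry.title}\n\n${entry.summary || ''}\n\n${entry.content}`;

  // Use OpenAI or local model
  const response = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    // Rough guard only: slice() counts characters, not tokens (the model limit is 8191 tokens)
    input: text.slice(0, 8000),
  });

  return response.data[0].embedding;
}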
```

## Graph Visualization

### Data Structure

```typescript
interface KnowledgeGraph {
  nodes: GraphNode[];
  edges: GraphEdge[];
  stats: GraphStats;
}

interface GraphNode {
  id: string;
  slug: string;
  title: string;
  type: 'entry' | 'tag' | 'external';
  status: EntryStatus;
  linkCount: number;      // in + out
  tags: string[];
  updatedAt: string;
}

interface GraphEdge {
  id: string;
  source: string;         // node id
  target: string;         // node id
  type: 'link' | 'tag';
  label?: string;
}

interface GraphStats {
  nodeCount: number;
  edgeCount: number;
  orphanCount: number;    // entries with no links
  brokenLinkCount: number;
  avgConnections: number;
}
```
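
`GraphStats` can be derived from the node and edge lists alone. The sketch below redefines minimal shapes so it stands on its own. It assumes an orphan is a node with no incident edge and that `avgConnections` is the mean of in-plus-out link counts per node, neither of which the design spells out. `computeGraphStats` is a hypothetical name, and `brokenLinkCount` is passed in because it cannot be derived from resolved edges.

```typescript
interface NodeLike { id: string; }
interface EdgeLike { source: string; target: string; }

interface GraphStats {
  nodeCount: number;
  edgeCount: number;
  orphanCount: number;
  brokenLinkCount: number;
  avgConnections: number;
}

// Hypothetical helper: derives statistics from a node list and an edge list.
function computeGraphStats(nodes: NodeLike[], edges: EdgeLike[], brokenLinkCount = 0): GraphStats {
  // Degree counts both directions, matching linkCount ("in + out") on GraphNode.
  const degree = new Map<string, number>();
  for (const n of nodes) degree.set(n.id, 0);
  for (const e of edges) {
    for (const id of [e.source, e.target]) {
      // Edges that point at nodes outside the list are ignored.
      if (degree.has(id)) degree.set(id, (degree.get(id) ?? 0) + 1);
    }
  }

  let orphanCount = 0;
  let totalDegree = 0;
  degree.forEach((d) => {
    if (d === 0) orphanCount++;
    totalDegree += d;
  });

  return {
    nodeCount: nodes.length,
    edgeCount: edges.length,
    orphanCount,
    brokenLinkCount,
    avgConnections: nodes.length > 0 ? totalDegree / nodes.length : 0,
  };
}
```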

### Graph Query

```sql
-- Get full graph for workspace
WITH nodes AS (
  SELECT
    e.id, e.slug, e.title, 'entry' AS type, e.status,
    (SELECT COUNT(*) FROM knowledge_links l WHERE l.source_id = e.id OR l.target_id = e.id) AS link_count,
    e.updated_at
  FROM knowledge_entries e
  WHERE e.workspace_id = $1 AND e.status != 'ARCHIVED'
),
edges AS (
  -- Join both endpoints against nodes so no edge points at an archived or foreign entry
  SELECT
    l.id, l.source_id AS source, l.target_id AS target, 'link' AS type, l.link_text AS label
  FROM knowledge_links l
  JOIN nodes s ON l.source_id = s.id
  JOIN nodes t ON l.target_id = t.id
)
SELECT
  json_build_object(
    -- json_agg returns NULL for zero rows; coalesce to an empty array for the client
    'nodes', (SELECT coalesce(json_agg(nodes), '[]'::json) FROM nodes),
    'edges', (SELECT coalesce(json_agg(edges), '[]'::json) FROM edges)
  ) AS graph;
```
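
The subgraph endpoint (`GET /api/knowledge/graph/:slug`) needs the set of entries within some distance of a center entry. One way to get it, sketched here, is a breadth-first search over the edge list. The sketch assumes links are followed in both directions so backlinks are included, and the `depth` parameter and the name `subgraphNodeIds` are hypothetical.

```typescript
interface Edge { source: string; target: string; }

// Hypothetical helper: returns the ids reachable from centerId within `depth` hops.
function subgraphNodeIds(centerId: string, edges: Edge[], depth = 1): Set<string> {
  // Build an undirected adjacency list so incoming links count as neighbors too.
  const neighbors = new Map<string, string[]>();
  const connect = (from: string, to: string): void => {
    const list = neighbors.get(from);
    if (list) list.push(to);
    else neighbors.set(from, [to]);
  };
  for (const e of edges) {
    connect(e.source, e.target);
    connect(e.target, e.source);
  }

  // Breadth-first search, one level per loop iteration.
  const visited = new Set<string>([centerId]);
  let frontier = [centerId];
  for (let level = 0; level < depth && frontier.length > 0; level++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const n of neighbors.get(id) ?? []) {
        if (!visited.has(n)) {
          visited.add(n);
          next.push(n);
        }
      }
    }
    frontier = next;
  }
  return visited;
}
```

For large workspaces the same traversal could be pushed into PostgreSQL with a recursive CTE instead of loading every edge into memory.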

### Frontend Rendering

Use D3.js force-directed graph or Cytoscape.js:

```typescript
// Graph component configuration
const graphConfig = {
  layout: 'force-directed',
  physics: {
    repulsion: 100,
    springLength: 150,
    springStrength: 0.05,
  },
  nodeSize: (node: GraphNode) => Math.sqrt(node.linkCount) * 10 + 20,
  nodeColor: (node: GraphNode) => {
    switch (node.status) {
      case 'PUBLISHED': return '#22c55e';
      case 'DRAFT': return '#f59e0b';
      case 'ARCHIVED': return '#6b7280';
      default: return '#6b7280'; // Fallback for any status added later
    }
  },
  edgeStyle: {
    color: '#94a3b8',
    width: 1,
    arrows: 'to',
  },
};
```

## Caching Strategy

### Valkey Key Patterns

```
knowledge:{workspaceId}:entry:{slug}          Entry cache (JSON)
knowledge:{workspaceId}:entry:{slug}:html     Rendered HTML cache
knowledge:{workspaceId}:graph                 Full graph cache
knowledge:{workspaceId}:graph:{slug}          Subgraph cache
knowledge:{workspaceId}:search:{hash}         Search result cache
knowledge:{workspaceId}:tags                  Tag list cache
knowledge:{workspaceId}:recent                Recent entries list
```
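
The `{hash}` segment of the search key is not defined above. A minimal sketch, assuming the hash is a truncated SHA-256 of the normalized query and its filters, is shown below. `searchCacheKey`, the `filters` parameter, and the 16-character truncation are all hypothetical choices.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: builds knowledge:{workspaceId}:search:{hash} for a query.
function searchCacheKey(
  workspaceId: string,
  query: string,
  filters: Record<string, string> = {},
): string {
  // Normalize so that equivalent requests share one cache entry:
  // trim and lowercase the query, and sort filters by name.
  const normalized = JSON.stringify({
    q: query.trim().toLowerCase(),
    filters: Object.keys(filters).sort().map((k) => [k, filters[k]]),
  });
  const hash = createHash('sha256').update(normalized).digest('hex').slice(0, 16);
  return `knowledge:${workspaceId}:search:${hash}`;
}
```

Because the workspace ID is part of the key rather than the hash, the pattern `knowledge:{workspaceId}:search:*` still matches every cached search for a workspace.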

### Cache Invalidation

```typescript
async function invalidateEntryCache(workspaceId: string, slug: string) {
  const keys = [
    `knowledge:${workspaceId}:entry:${slug}`,
    `knowledge:${workspaceId}:entry:${slug}:html`,
    `knowledge:${workspaceId}:graph`,  // Full graph affected
    `knowledge:${workspaceId}:graph:${slug}`,
    `knowledge:${workspaceId}:recent`,
  ];

  // Also invalidate subgraphs for linked entries
  const linkedSlugs = await getLinkedEntrySlugs(workspaceId, slug);
  for (const linked of linkedSlugs) {
    keys.push(`knowledge:${workspaceId}:graph:${linked}`);
  }

  await valkey.del(...keys);

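  // Invalidate search caches. SCAN iterates incrementally, whereas KEYS blocks the
  // server while it walks the whole keyspace. (Assumes an ioredis-compatible client.)
  let cursor = '0';
  do {
    const [next, searchKeys] = await valkey.scan(
      cursor, 'MATCH', `knowledge:${workspaceId}:search:*`, 'COUNT', 100,
    );
    if (searchKeys.length) await valkey.del(...searchKeys);
    cursor = next;
  } while (cursor !== '0');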
}
```

## UI Components

### Entry Editor

```
┌────────────────────────────────────────────────────────────────┐
│ [📄] Agent Orchestration Design                    [Save] [···]│
├────────────────────────────────────────────────────────────────┤
│ Status: [Published ▼]  Tags: [architecture] [agents] [+]       │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  # Problem Statement                                           │
│                                                                │
│  Development teams and AI agents working on complex projects   │
│  need a way to [[capture-decisions|capture decisions]]...      │
│                                                                │
│  See also: [[task-queues]] and [[valkey-patterns]]            │
│                                                                │
│  ─────────────────────────────────────────────────────────────│
│  Backlinks (3):                                                │
│  • [[mosaic-roadmap]] - "...implements agent orchestration..." │
│  • [[design-index]] - "Core designs: [[agent-orchestration]]"  │
│  • [[jarvis-memory]] - "Created orchestration design..."       │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```

### Graph View

```
┌────────────────────────────────────────────────────────────────┐
│ Knowledge Graph                          [Filter ▼] [Layout ▼] │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│            ┌─────────┐                                         │
│            │ valkey  │                                         │
│            │patterns │                                         │
│            └────┬────┘                                         │
│                 │                                              │
│    ┌────────────┼────────────┐                                │
│    │            │            │                                │
│    ▼            ▼            ▼                                │
│ ┌──────┐   ┌────────┐   ┌────────┐                           │
│ │cache │   │  task  │   │ agent  │◄─────┐                    │
│ │layer │   │ queues │   │ orch   │      │                    │
│ └──────┘   └────────┘   └───┬────┘      │                    │
│                             │           │                     │
│                             ▼           │                     │
│                        ┌────────┐  ┌────┴───┐                │
│                        │recovery│  │ mosaic │                │
│                        │patterns│  │roadmap │                │
│                        └────────┘  └────────┘                │
│                                                                │
│ 🟢 Published (6)  🟡 Draft (2)  ⚪ Orphan (0)                  │
└────────────────────────────────────────────────────────────────┘
```

### Search Results

```
┌────────────────────────────────────────────────────────────────┐
│ 🔍 [agent recovery                                    ] [Search]│
├────────────────────────────────────────────────────────────────┤
│                                                                │
│ 📄 Agent Orchestration - Recovery Patterns                     │
│    ...automatic **recovery** when an **agent** fails or the... │
│    Tags: architecture, agents  •  Updated 2 hours ago          │
│                                                                │
│ 📄 Agent Health Monitoring                                     │
│    ...heartbeat monitoring enables **recovery** of stale...    │
│    Tags: agents, monitoring  •  Updated 1 day ago              │
│                                                                │
│ 📄 Task Queue Design                                           │
│    ...retry logic with exponential backoff for **agent**...    │
│    Tags: architecture, queues  •  Updated 3 days ago           │
│                                                                │
│ ─────────────────────────────────────────────────────────────  │
│ Also try: Semantic search for conceptually related entries     │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```

## Implementation Phases

### Phase 1: Foundation (Week 1-2)

**Goal:** Basic CRUD + storage working

- [ ] Database schema + migrations
- [ ] Entry CRUD API endpoints
- [ ] Basic markdown rendering
- [ ] Tag management
- [ ] Entry list/detail pages

**Deliverables:**
- Can create, edit, view, delete entries
- Tags work
- Basic search (title/slug match)

### Phase 2: Linking (Week 2-3)

**Goal:** Wiki-link functionality

- [ ] Link parser (`[[...]]` syntax)
- [ ] Link resolution logic
- [ ] Broken link detection
- [ ] Backlinks display
- [ ] Link autocomplete in editor

**Deliverables:**
- Links between entries work
- Backlinks show on entry pages
- Editor suggests links as you type

### Phase 3: Search (Week 3-4)

**Goal:** Full-text + semantic search

- [ ] PostgreSQL full-text search setup
- [ ] Search API endpoint
- [ ] Search UI with highlighting
- [ ] pgvector extension setup
- [ ] Embedding generation pipeline
- [ ] Semantic search API

**Deliverables:**
- Fast full-text search
- Semantic search for "fuzzy" queries
- Search results with snippets

### Phase 4: Graph (Week 4-5)

**Goal:** Visual knowledge graph

- [ ] Graph data API
- [ ] D3.js/Cytoscape integration
- [ ] Interactive graph view
- [ ] Subgraph (entry-centered) view
- [ ] Graph statistics

**Deliverables:**
- Can view full knowledge graph
- Can explore from any entry
- Visual indicators for status/orphans

### Phase 5: Polish (Week 5-6)

**Goal:** Production-ready

- [ ] Version history UI
- [ ] Diff view between versions
- [ ] Import from markdown files
- [ ] Export functionality
- [ ] Performance optimization
- [ ] Caching implementation
- [ ] Documentation

**Deliverables:**
- Version history works
- Can import existing docs
- Performance is acceptable
- Module is documented

## Integration Points

### Agent Access

The Knowledge module should be accessible to agents via API:

```typescript
// Agent tool for knowledge access
interface KnowledgeTools {
  // Search
  searchKnowledge(query: string): Promise<SearchResult[]>;
  semanticSearch(query: string): Promise<SearchResult[]>;

  // CRUD
  getEntry(slug: string): Promise<KnowledgeEntry>;
  createEntry(data: CreateEntryInput): Promise<KnowledgeEntry>;
  updateEntry(slug: string, data: UpdateEntryInput): Promise<KnowledgeEntry>;

  // Graph
  getRelatedEntries(slug: string): Promise<KnowledgeEntry[]>;
  getBacklinks(slug: string): Promise<KnowledgeEntry[]>;
}
```
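
Each tool in this interface maps onto one of the HTTP endpoints listed under API Endpoints. The sketch below shows that mapping for a few of the tools as a pure function, which keeps it easy to test without a network. `ToolCall`, `HttpRequest`, and `toRequest` are hypothetical names, and the JSON body for semantic search is an assumption because the request shape is not specified.

```typescript
type ToolCall =
  | { tool: 'searchKnowledge'; query: string }
  | { tool: 'semanticSearch'; query: string }
  | { tool: 'getEntry'; slug: string }
  | { tool: 'getBacklinks'; slug: string };

interface HttpRequest {
  method: 'GET' | 'POST';
  path: string;
  body?: unknown;
}

// Hypothetical mapping from an agent tool call to a Knowledge API request.
function toRequest(call: ToolCall): HttpRequest {
  switch (call.tool) {
    case 'searchKnowledge':
      return { method: 'GET', path: `/api/knowledge/search?q=${encodeURIComponent(call.query)}` };
    case 'semanticSearch':
      // Assumption: the query travels in a JSON body as { query }.
      return { method: 'POST', path: '/api/knowledge/search/semantic', body: { query: call.query } };
    case 'getEntry':
      return { method: 'GET', path: `/api/knowledge/entries/${encodeURIComponent(call.slug)}` };
    case 'getBacklinks':
      return {
        method: 'GET',
        path: `/api/knowledge/entries/${encodeURIComponent(call.slug)}/links/incoming`,
      };
  }
}
```

An agent runtime would pass the result to its HTTP client along with workspace credentials, which are outside the scope of this sketch.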

### Clawdbot Integration

For Clawdbot specifically, the Knowledge module could:

1. Sync with `memory/*.md` files
2. Provide semantic search for `memory_search` tool
3. Generate embeddings for memory entries
4. Visualize agent memory as a knowledge graph

## Success Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| Entry creation time | < 200ms | API response time |
| Search latency (full-text) | < 100ms | p95 response time |
| Search latency (semantic) | < 300ms | p95 response time |
| Graph render (100 nodes) | < 200ms | Client-side time |
| Graph render (1000 nodes) | < 1s | Client-side time |
| Adoption | 50+ entries/workspace | After 1 month |
| Link density | > 2 links/entry avg | Graph statistics |

## Open Questions

1. **Embedding model** — Use OpenAI embeddings or self-hosted? (Cost vs privacy)
2. **Real-time collab** — Do we need multiplayer editing? (CRDT complexity)
3. **Permissions** — Entry-level permissions or workspace-level only?
4. **Templates** — Support entry templates (ADR, design doc, etc.)?
5. **Attachments** — Allow images/files in entries?

## References

- [Obsidian](https://obsidian.md/) — Wiki-link syntax inspiration
- [Roam Research](https://roamresearch.com/) — Block-level linking
- [pgvector](https://github.com/pgvector/pgvector) — PostgreSQL vector extension
- [Mosaic Agent Orchestration](./agent-orchestration.md) — Related design