Knowledge Module - Design Document

Status: Draft
Author: Agent (Jarvis)
Created: 2025-01-29
Related: Agent Orchestration

Problem Statement

Development teams and AI agents working on complex projects need a way to:

  1. Capture decisions — Why was X chosen over Y?
  2. Track connections — How does component A relate to concept B?
  3. Search contextually — Find relevant context without knowing exact keywords
  4. Evolve understanding — Knowledge changes; track that evolution
  5. Share across boundaries — Human and agent access to the same knowledge base

Current Pain Points

  • Scattered documentation — README, comments, Slack threads, memory files
  • No explicit linking — Connections exist but aren't captured
  • Agent amnesia — Each session starts fresh, relies on file search
  • No decision archaeology — Hard to find why something was decided
  • Human/agent mismatch — Humans browse, agents grep

Requirements

Functional Requirements

ID     Requirement                                            Priority
FR1    Create, read, update, delete knowledge entries         P0
FR2    Wiki-style linking between entries ([[link]] syntax)   P0
FR3    Tagging and categorization                             P0
FR4    Full-text search                                       P0
FR5    Semantic/vector search for agents                      P1
FR6    Graph visualization of connections                     P1
FR7    Version history and diff view                          P1
FR8    Timeline view of changes                               P2
FR9    Import from markdown files                             P2
FR10   Export to markdown/PDF                                 P2

Non-Functional Requirements

ID     Requirement                   Target
NFR1   Search response time          < 200 ms
NFR2   Entry render time             < 100 ms
NFR3   Graph render (< 1000 nodes)   < 500 ms
NFR4   Multi-tenant isolation        Complete
NFR5   API-first design              All features via API

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                        Mosaic Web UI                            │
├─────────────────────────────────────────────────────────────────┤
│  Knowledge Browser  │  Graph View  │  Search  │  Timeline       │
└─────────┬───────────┴──────┬───────┴────┬─────┴────┬────────────┘
          │                  │            │          │
          ▼                  ▼            ▼          ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Knowledge API (NestJS)                     │
├─────────────────────────────────────────────────────────────────┤
│  EntryController  │  SearchController  │  GraphController       │
│  TagController    │  LinkController    │  VersionController     │
└─────────┬─────────┴─────────┬──────────┴──────────┬─────────────┘
          │                   │                     │
          ▼                   ▼                     ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────┐
│   PostgreSQL     │ │     Valkey       │ │   Vector Store       │
│                  │ │                  │ │   (pgvector)         │
│ - entries        │ │ - search cache   │ │                      │
│ - entry_versions │ │ - graph cache    │ │ - embeddings         │
│ - entry_links    │ │ - hot entries    │ │ - semantic index     │
│ - tags           │ │                  │ │                      │
└──────────────────┘ └──────────────────┘ └──────────────────────┘

Data Model

Core Entities

// Entry - A single knowledge entry (document/page)
model KnowledgeEntry {
  id            String    @id @default(cuid())
  workspaceId   String
  workspace     Workspace @relation(fields: [workspaceId], references: [id])

  slug          String    // URL-friendly identifier
  title         String
  content       String    @db.Text  // Markdown content
  contentHtml   String?   @db.Text  // Rendered HTML (cached)
  summary       String?   // Auto-generated or manual summary

  status        EntryStatus @default(DRAFT)
  visibility    Visibility  @default(PRIVATE)

  // Metadata
  createdAt     DateTime  @default(now())
  updatedAt     DateTime  @updatedAt
  createdBy     String
  updatedBy     String

  // Relations
  tags          KnowledgeEntryTag[]
  outgoingLinks KnowledgeLink[] @relation("SourceEntry")
  incomingLinks KnowledgeLink[] @relation("TargetEntry")
  versions      KnowledgeEntryVersion[]
  embedding     KnowledgeEmbedding?

  @@unique([workspaceId, slug])
  @@index([workspaceId, status])
  @@index([workspaceId, updatedAt])
}

enum EntryStatus {
  DRAFT
  PUBLISHED
  ARCHIVED
}

enum Visibility {
  PRIVATE     // Only creator
  WORKSPACE   // All workspace members
  PUBLIC      // Anyone with link
}

// Version history
model KnowledgeEntryVersion {
  id          String   @id @default(cuid())
  entryId     String
  entry       KnowledgeEntry @relation(fields: [entryId], references: [id], onDelete: Cascade)

  version     Int
  title       String
  content     String   @db.Text
  summary     String?

  createdAt   DateTime @default(now())
  createdBy   String
  changeNote  String?  // Optional commit message

  @@unique([entryId, version])
  @@index([entryId, version])
}

// Wiki-style links between entries
model KnowledgeLink {
  id          String   @id @default(cuid())

  sourceId    String
  source      KnowledgeEntry @relation("SourceEntry", fields: [sourceId], references: [id], onDelete: Cascade)

  targetId    String
  target      KnowledgeEntry @relation("TargetEntry", fields: [targetId], references: [id], onDelete: Cascade)

  // Link metadata
  linkText    String   // The text used in [[link|display text]]
  context     String?  // Surrounding text for context

  createdAt   DateTime @default(now())

  @@unique([sourceId, targetId])
  @@index([sourceId])
  @@index([targetId])
}

// Tags for categorization
model KnowledgeTag {
  id          String   @id @default(cuid())
  workspaceId String
  workspace   Workspace @relation(fields: [workspaceId], references: [id])

  name        String
  slug        String
  color       String?  // Hex color for UI
  description String?

  entries     KnowledgeEntryTag[]

  @@unique([workspaceId, slug])
}

model KnowledgeEntryTag {
  entryId     String
  entry       KnowledgeEntry @relation(fields: [entryId], references: [id], onDelete: Cascade)

  tagId       String
  tag         KnowledgeTag @relation(fields: [tagId], references: [id], onDelete: Cascade)

  @@id([entryId, tagId])
}

// Vector embeddings for semantic search
model KnowledgeEmbedding {
  id          String   @id @default(cuid())
  entryId     String   @unique
  entry       KnowledgeEntry @relation(fields: [entryId], references: [id], onDelete: Cascade)

  embedding   Unsupported("vector(1536)")  // OpenAI ada-002 dimension
  model       String   // Which model generated this

  createdAt   DateTime @default(now())
  updatedAt   DateTime @updatedAt

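  // Prisma's @@index has no HNSW index type, so the vector index is created
  // in a raw SQL migration instead (index name is illustrative):
  //   CREATE INDEX idx_knowledge_embedding ON knowledge_embeddings
  //     USING hnsw (embedding vector_cosine_ops);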
}
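
The slug field is described only as a URL-friendly identifier that is unique per workspace. A minimal sketch of how one might be derived from a title follows; `slugify`, `uniqueSlug`, and the `-2`, `-3` suffix convention are assumptions for illustration, not part of the design.

```typescript
// Hypothetical helper: derive a URL-friendly slug from an entry title.
function slugify(title: string): string {
  return title
    .normalize("NFKD")                // split accented characters into base + mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")      // collapse runs of other characters into one hyphen
    .replace(/^-+|-+$/g, "");         // trim leading and trailing hyphens
}

// Append -2, -3, ... until the slug is free within the workspace.
// `taken` stands in for a lookup against @@unique([workspaceId, slug]).
function uniqueSlug(title: string, taken: ReadonlySet<string>): string {
  const base = slugify(title) || "untitled";
  if (!taken.has(base)) return base;
  let n = 2;
  while (taken.has(`${base}-${n}`)) n++;
  return `${base}-${n}`;
}
```

In a real service the uniqueness check would still need to rely on the database constraint, since two concurrent creates could pick the same suffix.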

Frontmatter Schema

Entries support YAML frontmatter for structured metadata:

---
title: Agent Orchestration Design
status: published
tags: [architecture, agents, orchestration]
created: 2025-01-29
updated: 2025-01-29
author: jarvis
related:
  - "[[task-queues]]"
  - "[[valkey-patterns]]"
decision:
  status: accepted
  date: 2025-01-29
  participants: [jason, jarvis]
  supersedes: null
---
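
Before the YAML can be validated, the frontmatter has to be separated from the markdown body. The sketch below does only that split and leaves YAML parsing to whichever library the implementation chooses; `splitFrontmatter` and `SplitEntry` are hypothetical names.

```typescript
interface SplitEntry {
  frontmatter: string | null; // raw YAML between the --- fences, or null if absent
  body: string;               // markdown after the closing fence
}

function splitFrontmatter(raw: string): SplitEntry {
  // The block must open on the first line and close with a line containing only ---
  const match = /^---\r?\n([\s\S]*?)\r?\n---[ \t]*(?:\r?\n|$)/.exec(raw);
  if (!match) return { frontmatter: null, body: raw };
  return { frontmatter: match[1], body: raw.slice(match[0].length) };
}
```

Entries without a leading `---` line pass through unchanged, so plain markdown imports keep working.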

API Endpoints

Entry Management

POST   /api/knowledge/entries              Create entry
GET    /api/knowledge/entries              List entries (paginated)
GET    /api/knowledge/entries/:slug        Get entry by slug
PUT    /api/knowledge/entries/:slug        Update entry
DELETE /api/knowledge/entries/:slug        Delete entry (soft delete → archive)

GET    /api/knowledge/entries/:slug/versions     List versions
GET    /api/knowledge/entries/:slug/versions/:v  Get specific version
POST   /api/knowledge/entries/:slug/restore/:v   Restore to version

Search

GET    /api/knowledge/search?q=...         Full-text search
POST   /api/knowledge/search/semantic      Semantic search (vector)
GET    /api/knowledge/search/suggestions   Autocomplete suggestions

Graph

GET    /api/knowledge/graph                Full graph (nodes + edges)
GET    /api/knowledge/graph/:slug          Subgraph centered on entry
GET    /api/knowledge/graph/stats          Graph statistics

Tags

GET    /api/knowledge/tags                 List all tags
POST   /api/knowledge/tags                 Create tag
PUT    /api/knowledge/tags/:slug           Update tag
DELETE /api/knowledge/tags/:slug           Delete tag
GET    /api/knowledge/tags/:slug/entries   Entries with tag

Links

GET    /api/knowledge/entries/:slug/links/outgoing  Outgoing links
GET    /api/knowledge/entries/:slug/links/incoming  Incoming links (backlinks)
GET    /api/knowledge/entries/:slug/links/broken    Broken links
POST   /api/knowledge/links/resolve                 Resolve [[link]] to entry

The module supports Obsidian-compatible wiki-link syntax:

Basic link: [[entry-slug]]
Display text: [[entry-slug|Custom Display Text]]
Header link: [[entry-slug#section-header]]
Block link: [[entry-slug#^block-id]]
┌─────────────────┐
│ Entry Content   │
│ "See [[design]] │
│  for details"   │
└────────┬────────┘
         │ Parse
         ▼
┌─────────────────┐
│ Extract Links   │
│ [[design]]      │
└────────┬────────┘
         │ Resolve
         ▼
┌─────────────────┐
│ Find Target     │
│ slug: "design"  │
│ OR title match  │
│ OR fuzzy match  │
└────────┬────────┘
         │
    ┌────┴────┐
    ▼         ▼
┌───────┐ ┌───────────┐
│ Found │ │ Not Found │
│       │ │ (broken)  │
└───┬───┘ └─────┬─────┘
    │           │
    ▼           ▼
┌───────────────────────┐
│ Create/Update Link    │
│ Record in entry_links │
│ Mark broken if needed │
└───────────────────────┘
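
The parse step has to recognize the four link forms listed earlier. A minimal regex-based sketch is shown below; `parseWikiLinks` and the `WikiLink` shape are hypothetical, and it does not try to skip links inside code spans or handle nested brackets.

```typescript
interface WikiLink {
  raw: string;      // full match, e.g. "[[design|the design]]"
  slug: string;     // target entry slug
  display?: string; // text after "|"
  header?: string;  // text after "#"
  blockId?: string; // text after "#^"
}

function parseWikiLinks(content: string): WikiLink[] {
  // A fresh regex per call keeps the global flag's lastIndex state local.
  // Groups: 1 = slug, 2 = "^" marker, 3 = header or block id, 4 = display text
  const pattern = /\[\[([^\[\]|#]+)(?:#(\^?)([^\[\]|]+))?(?:\|([^\[\]]+))?\]\]/g;
  const links: WikiLink[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(content)) !== null) {
    const link: WikiLink = { raw: m[0], slug: m[1].trim() };
    if (m[3]) {
      if (m[2]) link.blockId = m[3].trim();
      else link.header = m[3].trim();
    }
    if (m[4]) link.display = m[4].trim();
    links.push(link);
  }
  return links;
}
```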

On entry save:

  1. Parse content for [[...]] patterns
  2. Resolve each link to target entry
  3. Update KnowledgeLink records
  4. Flag broken links for UI warning
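
Steps 2 to 4 can be sketched as a pure function that compares the resolved targets with the links already stored. The resolution order (slug, then title, then fuzzy) follows the flow diagram; the fuzzy rule used here (compare after stripping case and punctuation) is an assumption, as are the names `resolveTarget`, `planLinkSync`, and `LinkPlan`. How broken links are persisted is not specified, so they are returned separately.

```typescript
interface EntryRef { id: string; slug: string; title: string; }

// Assumed fuzzy rule: lowercase and keep only letters and digits.
const normalize = (s: string): string => s.toLowerCase().replace(/[^a-z0-9]/g, "");

// Resolution order: exact slug, then case-insensitive title, then fuzzy match.
function resolveTarget(target: string, entries: EntryRef[]): EntryRef | null {
  return (
    entries.find((e) => e.slug === target) ??
    entries.find((e) => e.title.toLowerCase() === target.toLowerCase()) ??
    entries.find((e) => normalize(e.slug) === normalize(target) || normalize(e.title) === normalize(target)) ??
    null
  );
}

interface LinkPlan {
  create: string[]; // target ids that need a new KnowledgeLink row
  remove: string[]; // target ids whose rows are no longer referenced
  broken: string[]; // link targets that resolved to nothing
}

function planLinkSync(targets: string[], existingTargetIds: string[], entries: EntryRef[]): LinkPlan {
  const resolved = new Set<string>();
  const broken: string[] = [];
  for (const t of targets) {
    const hit = resolveTarget(t, entries);
    if (hit) resolved.add(hit.id);
    else broken.push(t);
  }
  const existing = new Set(existingTargetIds);
  return {
    create: Array.from(resolved).filter((id) => !existing.has(id)),
    remove: existingTargetIds.filter((id) => !resolved.has(id)),
    broken,
  };
}
```

Keeping the plan separate from the database writes makes the save path easy to test and lets the writes run in a single transaction.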

Search Implementation

Full-Text Search (PostgreSQL)

-- Create search index
ALTER TABLE knowledge_entries
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
  setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
  setweight(to_tsvector('english', coalesce(summary, '')), 'B') ||
  setweight(to_tsvector('english', coalesce(content, '')), 'C')
) STORED;

CREATE INDEX idx_knowledge_search ON knowledge_entries USING GIN(search_vector);

-- Search query
SELECT id, slug, title,
       ts_rank(search_vector, query) as rank,
       ts_headline('english', content, query) as snippet
FROM knowledge_entries, plainto_tsquery('english', $1) query
WHERE search_vector @@ query
  AND workspace_id = $2
ORDER BY rank DESC
LIMIT 20;

Semantic Search (pgvector)

-- Semantic search query
SELECT e.id, e.slug, e.title, e.summary,
       1 - (emb.embedding <=> $1::vector) as similarity
FROM knowledge_entries e
JOIN knowledge_embeddings emb ON e.id = emb.entry_id
WHERE e.workspace_id = $2
  AND 1 - (emb.embedding <=> $1::vector) > 0.7  -- similarity threshold
ORDER BY emb.embedding <=> $1::vector
LIMIT 10;
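
In pgvector, `<=>` is the cosine distance operator, so `1 - (a <=> b)` in the query above is cosine similarity. The same calculation in application code, as a sketch for unit tests or for re-ranking results client-side (`cosineSimilarity` is a hypothetical helper):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Zero vectors yield NaN.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Counterpart of pgvector's <=> operator
const cosineDistance = (a: number[], b: number[]): number => 1 - cosineSimilarity(a, b);
```

With the 0.7 threshold from the query, two vectors 45 degrees apart (similarity of about 0.707) would just pass.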

Embedding Generation

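import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment
const openai = new OpenAI();

async function generateEmbedding(entry: KnowledgeEntry): Promise<number[]> {
  const text = `${entry.title}\n\n${entry.summary || ""}\n\n${entry.content}`;

  // Use OpenAI or local model
  const response = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    // Crude cap to stay under the model's 8,191-token input limit;
    // note this truncates by characters, not tokens
    input: text.slice(0, 8000),
  });

  return response.data[0].embedding;
}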
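
The returned array still has to be passed to the `$1::vector` parameter used in the semantic search query. pgvector accepts the text form `[x1,x2,...]`, so one option is to serialize it explicitly. A sketch, where `toVectorLiteral` is a hypothetical helper and the 1536 default mirrors the `vector(1536)` column:

```typescript
// Serialize an embedding into pgvector's text input format, e.g. "[0.1,-2,3]".
function toVectorLiteral(embedding: number[], dimensions = 1536): string {
  if (embedding.length !== dimensions) {
    throw new Error(`expected ${dimensions} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}
```

Checking the dimension up front gives a clearer error than the database would if the embedding model is ever swapped for one with a different output size.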

Graph Visualization

Data Structure

interface KnowledgeGraph {
  nodes: GraphNode[];
  edges: GraphEdge[];
  stats: GraphStats;
}

interface GraphNode {
  id: string;
  slug: string;
  title: string;
  type: "entry" | "tag" | "external";
  status: EntryStatus;
  linkCount: number; // in + out
  tags: string[];
  updatedAt: string;
}

interface GraphEdge {
  id: string;
  source: string; // node id
  target: string; // node id
  type: "link" | "tag";
  label?: string;
}

interface GraphStats {
  nodeCount: number;
  edgeCount: number;
  orphanCount: number; // entries with no links
  brokenLinkCount: number;
  avgConnections: number;
}
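
The statistics can be derived from the node and edge lists alone. In the sketch below, `avgConnections` is taken to be the average degree (each edge touches two nodes); that interpretation, like the function name `computeGraphStats`, is an assumption. The broken-link count comes from link resolution, so it is passed in.

```typescript
interface GraphStats {
  nodeCount: number;
  edgeCount: number;
  orphanCount: number;
  brokenLinkCount: number;
  avgConnections: number;
}

function computeGraphStats(
  nodeIds: string[],
  edges: { source: string; target: string }[],
  brokenLinkCount = 0,
): GraphStats {
  // Any node that appears on either end of an edge is connected
  const connected = new Set<string>();
  for (const e of edges) {
    connected.add(e.source);
    connected.add(e.target);
  }
  return {
    nodeCount: nodeIds.length,
    edgeCount: edges.length,
    orphanCount: nodeIds.filter((id) => !connected.has(id)).length,
    brokenLinkCount,
    // Average degree; 0 for an empty graph to avoid dividing by zero
    avgConnections: nodeIds.length === 0 ? 0 : (2 * edges.length) / nodeIds.length,
  };
}
```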

Graph Query

-- Get full graph for workspace
WITH nodes AS (
  SELECT
    e.id, e.slug, e.title, 'entry' AS type, e.status,
    (SELECT COUNT(*) FROM knowledge_links l
      WHERE l.source_id = e.id OR l.target_id = e.id) AS link_count,
    e.updated_at
  FROM knowledge_entries e
  WHERE e.workspace_id = $1 AND e.status != 'ARCHIVED'
),
edges AS (
  -- Keep only edges whose endpoints are both in the node set,
  -- so links to or from archived entries do not dangle
  SELECT
    l.id, l.source_id AS source, l.target_id AS target, 'link' AS type, l.link_text AS label
  FROM knowledge_links l
  JOIN nodes s ON l.source_id = s.id
  JOIN nodes t ON l.target_id = t.id
)
SELECT
  json_build_object(
    -- json_agg returns NULL for an empty set, so fall back to an empty array
    'nodes', (SELECT coalesce(json_agg(nodes), '[]'::json) FROM nodes),
    'edges', (SELECT coalesce(json_agg(edges), '[]'::json) FROM edges)
  ) AS graph;
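
The entry-centered subgraph endpoint needs a bounded neighborhood rather than the whole workspace. One way to derive it from the edge list is a breadth-first walk with a depth limit. The sketch below treats links as undirected so that backlinks are included; that choice, the depth parameter, and the names `neighborhood` and `subgraphEdges` are assumptions.

```typescript
interface Edge { source: string; target: string; }

// Collect every node id within maxDepth hops of centerId.
function neighborhood(centerId: string, edges: Edge[], maxDepth: number): Set<string> {
  // Undirected adjacency list, so backlinks count as neighbours too
  const adjacent = new Map<string, string[]>();
  const connect = (from: string, to: string): void => {
    const list = adjacent.get(from);
    if (list) list.push(to);
    else adjacent.set(from, [to]);
  };
  for (const e of edges) {
    connect(e.source, e.target);
    connect(e.target, e.source);
  }

  const seen = new Set<string>([centerId]);
  let frontier = [centerId];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const n of adjacent.get(id) ?? []) {
        if (!seen.has(n)) {
          seen.add(n);
          next.push(n);
        }
      }
    }
    frontier = next;
  }
  return seen;
}

// Edges whose endpoints both fall inside the selected neighborhood
function subgraphEdges(ids: Set<string>, edges: Edge[]): Edge[] {
  return edges.filter((e) => ids.has(e.source) && ids.has(e.target));
}
```

For large workspaces the same walk could be pushed into SQL with a recursive CTE; the in-memory version is mainly useful when the full graph is already cached.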

Frontend Rendering

Use D3.js force-directed graph or Cytoscape.js:

// Graph component configuration
const graphConfig = {
  layout: "force-directed",
  physics: {
    repulsion: 100,
    springLength: 150,
    springStrength: 0.05,
  },
  nodeSize: (node: GraphNode) => Math.sqrt(node.linkCount) * 10 + 20,
  nodeColor: (node: GraphNode) => {
    switch (node.status) {
      case "PUBLISHED":
        return "#22c55e";
      case "DRAFT":
        return "#f59e0b";
      case "ARCHIVED":
      default:
        return "#6b7280"; // grey for archived or unrecognized status
    }
  },
  edgeStyle: {
    color: "#94a3b8",
    width: 1,
    arrows: "to",
  },
};

Caching Strategy

Valkey Key Patterns

knowledge:{workspaceId}:entry:{slug}          Entry cache (JSON)
knowledge:{workspaceId}:entry:{slug}:html     Rendered HTML cache
knowledge:{workspaceId}:graph                 Full graph cache
knowledge:{workspaceId}:graph:{slug}          Subgraph cache
knowledge:{workspaceId}:search:{hash}         Search result cache
knowledge:{workspaceId}:tags                  Tag list cache
knowledge:{workspaceId}:recent                Recent entries list
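
The `{hash}` segment of the search key is not defined further. A plausible sketch is to hash the normalized query together with its parameters so that equivalent searches share a cache entry. The normalization rules, the 16-character truncation, and the names `entryKey` and `searchKey` are assumptions.

```typescript
import { createHash } from "node:crypto";

const entryKey = (workspaceId: string, slug: string): string =>
  `knowledge:${workspaceId}:entry:${slug}`;

function searchKey(
  workspaceId: string,
  query: string,
  params: Record<string, string | number> = {},
): string {
  // Collapse whitespace and case so trivially different queries share a key
  const normalized = query.trim().toLowerCase().replace(/\s+/g, " ");
  // Sort parameter names so their order does not change the hash
  const sortedParams = Object.keys(params)
    .sort()
    .map((k) => `${k}=${params[k]}`)
    .join("&");
  const hash = createHash("sha256")
    .update(`${normalized}?${sortedParams}`)
    .digest("hex")
    .slice(0, 16);
  return `knowledge:${workspaceId}:search:${hash}`;
}
```

Lowercasing is safe for the full-text query shown earlier because `plainto_tsquery` is case-insensitive; a search mode that is case-sensitive would need to skip that step.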

Cache Invalidation

async function invalidateEntryCache(workspaceId: string, slug: string) {
  const keys = [
    `knowledge:${workspaceId}:entry:${slug}`,
    `knowledge:${workspaceId}:entry:${slug}:html`,
    `knowledge:${workspaceId}:graph`, // Full graph affected
    `knowledge:${workspaceId}:graph:${slug}`,
    `knowledge:${workspaceId}:recent`,
  ];

  // Also invalidate subgraphs for linked entries
  const linkedSlugs = await getLinkedEntrySlugs(workspaceId, slug);
  for (const linked of linkedSlugs) {
    keys.push(`knowledge:${workspaceId}:graph:${linked}`);
  }

  await valkey.del(...keys);

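  // Invalidate search caches (pattern delete).
  // Note: KEYS walks the entire keyspace and blocks the server while it runs;
  // prefer incremental SCAN once the keyspace grows.
  const searchKeys = await valkey.keys(`knowledge:${workspaceId}:search:*`);
  if (searchKeys.length) await valkey.del(...searchKeys);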
}
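
Because `KEYS` is a blocking, full-keyspace command, the pattern delete is a candidate for cursor-based `SCAN`. The sketch below is written against a minimal hypothetical `ScanClient` interface rather than a specific client library, since the design does not name one; real clients expose `SCAN` with slightly different signatures.

```typescript
// Assumed minimal client surface: scan returns [nextCursor, keys].
interface ScanClient {
  scan(cursor: string, pattern: string, count: number): Promise<[string, string[]]>;
  del(...keys: string[]): Promise<number>;
}

// Delete all keys matching a pattern in batches; returns how many were removed.
async function deleteByPattern(
  client: ScanClient,
  pattern: string,
  batchSize = 100,
): Promise<number> {
  let cursor = "0"; // SCAN starts at cursor 0
  let deleted = 0;
  do {
    const [next, keys] = await client.scan(cursor, pattern, batchSize);
    if (keys.length > 0) deleted += await client.del(...keys);
    cursor = next;
  } while (cursor !== "0"); // the server returns 0 when the iteration is complete
  return deleted;
}
```

A call such as `deleteByPattern(client, "knowledge:ws1:search:*")` would then stand in for the `keys` + `del` pair.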

UI Components

Entry Editor

┌────────────────────────────────────────────────────────────────┐
│ [📄] Agent Orchestration Design                    [Save] [···]│
├────────────────────────────────────────────────────────────────┤
│ Status: [Published ▼]  Tags: [architecture] [agents] [+]       │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  # Problem Statement                                           │
│                                                                │
│  Development teams and AI agents working on complex projects   │
│  need a way to [[capture-decisions|capture decisions]]...      │
│                                                                │
│  See also: [[task-queues]] and [[valkey-patterns]]            │
│                                                                │
│  ─────────────────────────────────────────────────────────────│
│  Backlinks (3):                                                │
│  • [[mosaic-roadmap]] - "...implements agent orchestration..." │
│  • [[design-index]] - "Core designs: [[agent-orchestration]]"  │
│  • [[jarvis-memory]] - "Created orchestration design..."       │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Graph View

┌────────────────────────────────────────────────────────────────┐
│ Knowledge Graph                          [Filter ▼] [Layout ▼] │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│            ┌─────────┐                                         │
│            │ valkey  │                                         │
│            │patterns │                                         │
│            └────┬────┘                                         │
│                 │                                              │
│    ┌────────────┼────────────┐                                │
│    │            │            │                                │
│    ▼            ▼            ▼                                │
│ ┌──────┐   ┌────────┐   ┌────────┐                           │
│ │cache │   │  task  │   │ agent  │◄─────┐                    │
│ │layer │   │ queues │   │ orch   │      │                    │
│ └──────┘   └────────┘   └───┬────┘      │                    │
│                             │           │                     │
│                             ▼           │                     │
│                        ┌────────┐  ┌────┴───┐                │
│                        │recovery│  │ mosaic │                │
│                        │patterns│  │roadmap │                │
│                        └────────┘  └────────┘                │
│                                                                │
│ 🟢 Published (6)  🟡 Draft (2)  ⚪ Orphan (0)                  │
└────────────────────────────────────────────────────────────────┘

Search Results

┌────────────────────────────────────────────────────────────────┐
│ 🔍 [agent recovery                                    ] [Search]│
├────────────────────────────────────────────────────────────────┤
│                                                                │
│ 📄 Agent Orchestration - Recovery Patterns                     │
│    ...automatic **recovery** when an **agent** fails or the... │
│    Tags: architecture, agents  •  Updated 2 hours ago          │
│                                                                │
│ 📄 Agent Health Monitoring                                     │
│    ...heartbeat monitoring enables **recovery** of stale...    │
│    Tags: agents, monitoring  •  Updated 1 day ago              │
│                                                                │
│ 📄 Task Queue Design                                           │
│    ...retry logic with exponential backoff for **agent**...    │
│    Tags: architecture, queues  •  Updated 3 days ago           │
│                                                                │
│ ─────────────────────────────────────────────────────────────  │
│ Also try: Semantic search for conceptually related entries     │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Implementation Phases

Phase 1: Foundation (Week 1-2)

Goal: Basic CRUD + storage working

  • Database schema + migrations
  • Entry CRUD API endpoints
  • Basic markdown rendering
  • Tag management
  • Entry list/detail pages

Deliverables:

  • Can create, edit, view, delete entries
  • Tags work
  • Basic search (title/slug match)

Phase 2: Linking (Week 2-3)

Goal: Wiki-link functionality

  • Link parser ([[...]] syntax)
  • Link resolution logic
  • Broken link detection
  • Backlinks display
  • Link autocomplete in editor

Deliverables:

  • Links between entries work
  • Backlinks show on entry pages
  • Editor suggests links as you type

Phase 3: Search (Week 3-4)

Goal: Full-text + semantic search

  • PostgreSQL full-text search setup
  • Search API endpoint
  • Search UI with highlighting
  • pgvector extension setup
  • Embedding generation pipeline
  • Semantic search API

Deliverables:

  • Fast full-text search
  • Semantic search for "fuzzy" queries
  • Search results with snippets

Phase 4: Graph (Week 4-5)

Goal: Visual knowledge graph

  • Graph data API
  • D3.js/Cytoscape integration
  • Interactive graph view
  • Subgraph (entry-centered) view
  • Graph statistics

Deliverables:

  • Can view full knowledge graph
  • Can explore from any entry
  • Visual indicators for status/orphans

Phase 5: Polish (Week 5-6)

Goal: Production-ready

  • Version history UI
  • Diff view between versions
  • Import from markdown files
  • Export functionality
  • Performance optimization
  • Caching implementation
  • Documentation

Deliverables:

  • Version history works
  • Can import existing docs
  • Performance is acceptable
  • Module is documented

Integration Points

Agent Access

The Knowledge module should be accessible to agents via API:

// Agent tool for knowledge access
interface KnowledgeTools {
  // Search
  searchKnowledge(query: string): Promise<SearchResult[]>;
  semanticSearch(query: string): Promise<SearchResult[]>;

  // CRUD
  getEntry(slug: string): Promise<KnowledgeEntry>;
  createEntry(data: CreateEntryInput): Promise<KnowledgeEntry>;
  updateEntry(slug: string, data: UpdateEntryInput): Promise<KnowledgeEntry>;

  // Graph
  getRelatedEntries(slug: string): Promise<KnowledgeEntry[]>;
  getBacklinks(slug: string): Promise<KnowledgeEntry[]>;
}
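
Since every feature is exposed through the API, the read-side tools can be thin wrappers over the endpoints listed earlier. A sketch of three of them follows; `createKnowledgeReader` and `FetchLike` are hypothetical, response bodies are left as `unknown` because their shape is not specified, and authentication is omitted.

```typescript
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Injected so tests can pass a fake; the global fetch also satisfies this type.
type FetchLike = (url: string) => Promise<HttpResponse>;

function createKnowledgeReader(baseUrl: string, fetchFn: FetchLike) {
  async function get(path: string): Promise<unknown> {
    const res = await fetchFn(`${baseUrl}${path}`);
    if (!res.ok) throw new Error(`Knowledge API returned ${res.status} for ${path}`);
    return res.json();
  }
  const entryPath = (slug: string): string =>
    `/api/knowledge/entries/${encodeURIComponent(slug)}`;

  return {
    searchKnowledge: (query: string) =>
      get(`/api/knowledge/search?q=${encodeURIComponent(query)}`),
    getEntry: (slug: string) => get(entryPath(slug)),
    getBacklinks: (slug: string) => get(`${entryPath(slug)}/links/incoming`),
  };
}
```

Injecting the fetch function keeps the wrapper testable without a running API instance.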

Clawdbot Integration

For Clawdbot specifically, the Knowledge module could:

  1. Sync with memory/*.md files
  2. Provide semantic search for memory_search tool
  3. Generate embeddings for memory entries
  4. Visualize agent memory as a knowledge graph

Success Metrics

Metric                       Target                  Measurement
Entry creation time          < 200 ms                API response time
Search latency (full-text)   < 100 ms                p95 response time
Search latency (semantic)    < 300 ms                p95 response time
Graph render (100 nodes)     < 200 ms                Client-side time
Graph render (1000 nodes)    < 1 s                   Client-side time
Adoption                     50+ entries/workspace   After 1 month
Link density                 > 2 links/entry avg     Graph statistics
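
The search latency targets are stated as p95 values. For checking them against collected timings, a nearest-rank percentile is one simple definition; the choice of method and the name `percentile` are assumptions, and monitoring tools may interpolate differently.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b); // copy so the input is not mutated
  const rank = Math.ceil((p * sorted.length) / 100);    // 1-based rank
  return sorted[rank - 1];
}
```

With 20 latency samples, `percentile(samples, 95)` picks the 19th smallest value, which is then compared against the 100 ms or 300 ms target.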

Open Questions

  1. Embedding model — Use OpenAI embeddings or self-hosted? (Cost vs privacy)
  2. Real-time collab — Do we need multiplayer editing? (CRDT complexity)
  3. Permissions — Entry-level permissions or workspace-level only?
  4. Templates — Support entry templates (ADR, design doc, etc.)?
  5. Attachments — Allow images/files in entries?
