stack/docs/scratchpads/69-embedding-generation.md
Jason Woltje 3dfa603a03 feat(#69): implement embedding generation pipeline
Generate embeddings for knowledge entries using Ollama via BullMQ job queue.

Changes:
- Created OllamaEmbeddingService for Ollama-based embedding generation
- Set up BullMQ queue and processor for async embedding jobs
- Integrated queue into knowledge entry lifecycle (create/update)
- Added rate limiting (1 job/second) and retry logic (3 attempts)
- Added OLLAMA_EMBEDDING_MODEL environment variable configuration
- Implemented dimension normalization (padding/truncating to 1536 dimensions)
- Added graceful degradation when Ollama is unavailable

Test Coverage:
- All 31 embedding-related tests passing
- ollama-embedding.service.spec.ts: 13 tests
- embedding-queue.spec.ts: 6 tests
- embedding.processor.spec.ts: 5 tests
- Build and linting successful

Fixes #69

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-02 15:06:11 -06:00


Issue #69: [KNOW-017] Embedding Generation Pipeline

Objective

Generate embeddings for knowledge entries using the LLM infrastructure (Ollama) to enable semantic search capabilities.

Approach

  1. Create an embedding service that interfaces with Ollama
  2. Set up BullMQ job queue for async embedding generation
  3. Create background worker to process embedding jobs
  4. Hook into entry creation/update lifecycle to queue jobs
  5. Handle rate limiting and error scenarios gracefully
  6. Add configuration for model selection
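Step 4 above can be sketched without the Nest wiring. This is a minimal, framework-free illustration of queueing an embedding job on entry save; the `EmbeddingQueue` interface, the `generate-embedding` job name, and the class name are hypothetical stand-ins for the real BullMQ queue service, not the actual implementation:

```typescript
// Sketch of step 4: queue an embedding job whenever an entry is created or
// updated, so embedding generation happens asynchronously. The Queue
// interface and job name here are illustrative, not the project's real API.
interface EmbeddingQueue {
  add(jobName: string, payload: { entryId: string; text: string }): Promise<void>;
}

export class KnowledgeEntryLifecycle {
  constructor(private readonly queue: EmbeddingQueue) {}

  // Called after an entry is persisted; the request returns immediately
  // while the embedding is generated in the background worker.
  async onEntrySaved(entryId: string, text: string): Promise<void> {
    await this.queue.add('generate-embedding', { entryId, text });
  }
}
```

In the real code this lives in the knowledge service, with the BullMQ `Queue` injected via Nest's DI.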

Progress

  • Create scratchpad
  • Review existing schema for embedding column
  • Review existing Ollama integration
  • Set up BullMQ infrastructure
  • Write tests for embedding service (TDD)
  • Implement embedding service (OllamaEmbeddingService)
  • Create job queue and worker
  • Hook into entry lifecycle
  • Add rate limiting (1 job per second via queue delay)
  • Add configuration (OLLAMA_EMBEDDING_MODEL env var)
  • Build and verify (all tests passing, build successful)
  • Run code review
  • Run QA checks
  • Commit and close issue

Summary

Successfully implemented the embedding generation pipeline for knowledge entries using Ollama.

Files Created

  1. apps/api/src/knowledge/services/ollama-embedding.service.ts - Ollama-based embedding service
  2. apps/api/src/knowledge/services/ollama-embedding.service.spec.ts - Tests (13 tests)
  3. apps/api/src/knowledge/queues/embedding-queue.service.ts - BullMQ queue service
  4. apps/api/src/knowledge/queues/embedding-queue.spec.ts - Tests (6 tests)
  5. apps/api/src/knowledge/queues/embedding.processor.ts - Background worker processor
  6. apps/api/src/knowledge/queues/embedding.processor.spec.ts - Tests (5 tests)
  7. apps/api/src/knowledge/queues/index.ts - Export index

Files Modified

  1. apps/api/src/knowledge/knowledge.module.ts - Added BullMQ queue registration and new services
  2. apps/api/src/knowledge/knowledge.service.ts - Updated to use queue for async embedding generation
  3. apps/api/src/app.module.ts - Added BullModule.forRoot() configuration
  4. .env.example - Added OLLAMA_EMBEDDING_MODEL configuration
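The `BullModule.forRoot()` addition to `app.module.ts` would look roughly like the fragment below. The `REDIS_HOST`/`REDIS_PORT` env var names are assumptions (the project may read its Valkey connection from elsewhere):

```typescript
// app.module.ts (sketch): BullMQ root configuration pointing at the
// Redis-compatible store (Valkey). Env var names are assumptions.
import { Module } from '@nestjs/common';
import { BullModule } from '@nestjs/bullmq';

@Module({
  imports: [
    BullModule.forRoot({
      connection: {
        host: process.env.REDIS_HOST ?? 'localhost',
        port: Number(process.env.REDIS_PORT ?? 6379),
      },
    }),
  ],
})
export class AppModule {}
```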

Key Features

  • Async embedding generation using BullMQ job queue
  • Automatic queuing on entry create/update
  • Rate limiting: 1 job per second to prevent overwhelming Ollama
  • Retry logic: 3 attempts with exponential backoff
  • Configurable embedding model via OLLAMA_EMBEDDING_MODEL env var
  • Dimension normalization (padding/truncating to 1536 dimensions)
  • Graceful degradation if Ollama is unavailable
  • Job cleanup: auto-remove completed jobs after 24h, failed after 7 days
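The retry, rate-limit, and cleanup behavior listed above maps naturally onto BullMQ options. A sketch of the values implied by the feature list (in BullMQ, retry and cleanup are job options while rate limiting is a worker-side limiter; the exact option shapes in the real code may differ):

```typescript
// Sketch of BullMQ options matching the documented behavior. Retry and
// cleanup settings are job options; the limiter is applied to the Worker.
export const embeddingJobOptions = {
  attempts: 3,                                   // retry up to 3 times
  backoff: { type: 'exponential', delay: 1000 }, // exponential backoff between attempts
  removeOnComplete: { age: 24 * 60 * 60 },       // drop completed jobs after 24h (seconds)
  removeOnFail: { age: 7 * 24 * 60 * 60 },       // drop failed jobs after 7 days
};

// At most 1 job per 1000 ms, matching the "1 job/second" rate limit above.
export const embeddingWorkerLimiter = { max: 1, duration: 1000 };
```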

Test Coverage

  • All 31 embedding-related tests passing
  • Build successful
  • Linting clean
  • TypeScript compilation successful

Testing

  • Unit tests for embedding service
  • Integration tests for job queue
  • E2E tests for entry creation with embedding generation
  • Target: 85% coverage minimum

Notes

  • Using Ollama for embedding generation (local/remote)
  • BullMQ for job queue (Redis-compatible, works with Valkey)
  • Embeddings stored in pgvector column from schema (knowledge_embeddings table)
  • Need to ensure graceful degradation if Ollama unavailable
  • BullMQ is already installed (@nestjs/bullmq: ^11.0.4, bullmq: ^5.67.2)
  • Existing EmbeddingService uses OpenAI - need to refactor to use Ollama
  • OllamaService already has embed() method for generating embeddings
  • Default embedding model for Ollama: "nomic-embed-text" (produces 768-dim vectors)
  • Schema expects 1536-dim vectors - need to check if we need to update schema or use different model
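The 768- vs 1536-dimension mismatch noted above is what the pipeline's normalization step resolves: shorter vectors are zero-padded and longer ones truncated to fit the schema. A minimal sketch (the function name is illustrative):

```typescript
// Sketch of dimension normalization: pad shorter vectors with zeros and
// truncate longer ones so every embedding matches the schema's 1536 dims.
export function normalizeDimensions(vector: number[], target = 1536): number[] {
  if (vector.length === target) return vector;
  if (vector.length > target) return vector.slice(0, target);
  return [...vector, ...new Array(target - vector.length).fill(0)];
}
```

Zero-padding preserves relative similarity among vectors produced by the same model, since the appended dimensions contribute nothing to dot products.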

Technical Decisions

  1. Refactor existing EmbeddingService to use Ollama instead of OpenAI
  2. Keep the same public API for EmbeddingService to minimize changes
  3. Add BullMQ queue module for async processing
  4. Create a consumer/processor for embedding jobs
  5. Hook into knowledge entry lifecycle (onCreate, onUpdate) to queue jobs
  6. Add configuration for embedding model selection
  7. Implement rate limiting using delays between jobs
  8. Add retry logic for failed embedding generation
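Decisions 4 and 8 can be sketched together without the Nest wiring: a processor body that generates, normalizes, and stores an embedding, letting errors propagate so BullMQ's retry logic takes over. All names here are illustrative; `embed()` and `store()` stand in for the `OllamaService.embed()` call and the pgvector write:

```typescript
// Framework-free sketch of the embedding job processor body. Errors from
// embed() or store() are not caught here: they propagate to BullMQ, which
// retries the job (3 attempts with backoff) before marking it failed.
type EmbedFn = (text: string) => Promise<number[]>;
type StoreFn = (entryId: string, embedding: number[]) => Promise<void>;

export async function processEmbeddingJob(
  embed: EmbedFn,
  store: StoreFn,
  job: { entryId: string; text: string },
  dims = 1536,
): Promise<number[]> {
  const raw = await embed(job.text);
  // Pad or truncate to the schema's expected dimensionality.
  const normalized =
    raw.length >= dims
      ? raw.slice(0, dims)
      : [...raw, ...new Array(dims - raw.length).fill(0)];
  await store(job.entryId, normalized);
  return normalized;
}
```

In the real processor this body would sit inside a `@Processor`-decorated `WorkerHost` subclass, with graceful degradation handled at enqueue time when Ollama is unreachable.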