Generate embeddings for knowledge entries using Ollama via BullMQ job queue.

Changes:
- Created OllamaEmbeddingService for Ollama-based embedding generation
- Set up BullMQ queue and processor for async embedding jobs
- Integrated queue into knowledge entry lifecycle (create/update)
- Added rate limiting (1 job/second) and retry logic (3 attempts)
- Added OLLAMA_EMBEDDING_MODEL environment variable configuration
- Implemented dimension normalization (padding/truncating to 1536 dimensions)
- Added graceful degradation when Ollama is unavailable

Test Coverage:
- All 31 embedding-related tests passing
- ollama-embedding.service.spec.ts: 13 tests
- embedding-queue.spec.ts: 6 tests
- embedding.processor.spec.ts: 5 tests
- Build and linting successful

Fixes #69

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Issue #69: [KNOW-017] Embedding Generation Pipeline
Objective
Generate embeddings for knowledge entries using the LLM infrastructure (Ollama) to enable semantic search capabilities.
Approach
- Create an embedding service that interfaces with Ollama
- Set up BullMQ job queue for async embedding generation
- Create background worker to process embedding jobs
- Hook into entry creation/update lifecycle to queue jobs
- Handle rate limiting and error scenarios gracefully
- Add configuration for model selection
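The embedding call at the heart of this approach can be sketched as follows. This is a hypothetical helper, not the actual OllamaEmbeddingService; it assumes Ollama's `/api/embeddings` endpoint (which returns `{ embedding: number[] }`) and injects the fetch function so the logic can be unit-tested with a stub:

```typescript
// Hypothetical sketch of the Ollama embedding call. The fetch function is
// injected so the service can be exercised without a running Ollama server.
type FetchLike = (
  url: string,
  init: { method: string; body: string },
) => Promise<{ json(): Promise<any> }>;

export async function generateEmbedding(
  text: string,
  model: string,
  fetchFn: FetchLike,
  baseUrl = "http://localhost:11434",
): Promise<number[] | null> {
  try {
    // Ollama's embeddings endpoint responds with { embedding: number[] }
    const res = await fetchFn(`${baseUrl}/api/embeddings`, {
      method: "POST",
      body: JSON.stringify({ model, prompt: text }),
    });
    const data = await res.json();
    return data.embedding ?? null;
  } catch {
    // Graceful degradation: return null instead of throwing when Ollama is down
    return null;
  }
}
```

Returning `null` rather than throwing keeps entry creation working even when the embedding backend is unreachable, matching the graceful-degradation requirement.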
Progress
- Create scratchpad
- Review existing schema for embedding column
- Review existing Ollama integration
- Set up BullMQ infrastructure
- Write tests for embedding service (TDD)
- Implement embedding service (OllamaEmbeddingService)
- Create job queue and worker
- Hook into entry lifecycle
- Add rate limiting (1 job per second via queue delay)
- Add configuration (OLLAMA_EMBEDDING_MODEL env var)
- Build and verify (all tests passing, build successful)
- Run code review
- Run QA checks
- Commit and close issue
Summary
Successfully implemented the embedding generation pipeline for knowledge entries using Ollama.
Files Created
- apps/api/src/knowledge/services/ollama-embedding.service.ts - Ollama-based embedding service
- apps/api/src/knowledge/services/ollama-embedding.service.spec.ts - Tests (13 tests)
- apps/api/src/knowledge/queues/embedding-queue.service.ts - BullMQ queue service
- apps/api/src/knowledge/queues/embedding-queue.spec.ts - Tests (6 tests)
- apps/api/src/knowledge/queues/embedding.processor.ts - Background worker processor
- apps/api/src/knowledge/queues/embedding.processor.spec.ts - Tests (5 tests)
- apps/api/src/knowledge/queues/index.ts - Export index
Files Modified
- apps/api/src/knowledge/knowledge.module.ts - Added BullMQ queue registration and new services
- apps/api/src/knowledge/knowledge.service.ts - Updated to use queue for async embedding generation
- apps/api/src/app.module.ts - Added BullModule.forRoot() configuration
- .env.example - Added OLLAMA_EMBEDDING_MODEL configuration
Key Features
- Async embedding generation using BullMQ job queue
- Automatic queuing on entry create/update
- Rate limiting: 1 job per second to prevent overwhelming Ollama
- Retry logic: 3 attempts with exponential backoff
- Configurable embedding model via OLLAMA_EMBEDDING_MODEL env var
- Dimension normalization (padding/truncating to 1536 dimensions)
- Graceful degradation if Ollama is unavailable
- Job cleanup: auto-remove completed jobs after 24h, failed after 7 days
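The rate limiting, retry, and cleanup features above map directly onto BullMQ job and worker options. A sketch of the option shapes (the actual names used in the codebase may differ; these are plain object literals so no Redis connection is needed):

```typescript
// Sketch of the BullMQ options implied by the features listed above.

// Per-job options passed to queue.add(): 3 attempts with exponential backoff,
// auto-removal of completed jobs after 24h and failed jobs after 7 days.
export const jobOptions = {
  attempts: 3,
  backoff: { type: "exponential", delay: 1000 }, // 1s, 2s, 4s between retries
  removeOnComplete: { age: 24 * 60 * 60 },       // age is in seconds
  removeOnFail: { age: 7 * 24 * 60 * 60 },
};

// Worker-level rate limiter: at most 1 job per 1000 ms, so Ollama is never
// asked for more than one embedding per second.
export const workerOptions = {
  limiter: { max: 1, duration: 1000 },
};
```

Putting the limiter on the worker (rather than spacing jobs with manual delays) keeps the rate cap enforced even when many entries are created in a burst.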
Test Coverage
- All 31 embedding-related tests passing
- Build successful
- Linting clean
- TypeScript compilation successful
Testing
- Unit tests for embedding service
- Integration tests for job queue
- E2E tests for entry creation with embedding generation
- Target: 85% coverage minimum
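If the project uses Jest (an assumption; adjust to the actual test runner config), the 85% coverage target can be enforced rather than just tracked:

```typescript
// Sketch of enforcing the 85% minimum coverage target via Jest's
// coverageThreshold option (assumes a Jest-based test setup).
export const coverageConfig = {
  collectCoverage: true,
  coverageThreshold: {
    global: { branches: 85, functions: 85, lines: 85, statements: 85 },
  },
};
```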
Notes
- Using Ollama for embedding generation (local/remote)
- BullMQ for job queue (Redis-compatible, works with Valkey)
- Embeddings stored in pgvector column from schema (knowledge_embeddings table)
- Need to ensure graceful degradation if Ollama is unavailable
- BullMQ is already installed (@nestjs/bullmq: ^11.0.4, bullmq: ^5.67.2)
- Existing EmbeddingService uses OpenAI - need to refactor to use Ollama
- OllamaService already has embed() method for generating embeddings
- Default embedding model for Ollama: "nomic-embed-text" (produces 768-dim vectors)
- Schema expects 1536-dim vectors - need to check whether to update the schema or use a different model
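The 768-vs-1536 dimension mismatch noted above is what the normalization step resolves. A sketch of the idea (hypothetical helper name; the real service method may differ):

```typescript
// Sketch of dimension normalization: nomic-embed-text emits 768-dim vectors,
// while the pgvector column expects 1536 dims. Shorter vectors are zero-padded,
// longer ones truncated.
export function normalizeDimensions(vec: number[], target = 1536): number[] {
  if (vec.length === target) return vec;
  if (vec.length > target) return vec.slice(0, target); // truncate
  return [...vec, ...new Array(target - vec.length).fill(0)]; // zero-pad
}
```

Zero-padding does not change dot products or cosine similarity between vectors padded from the same source dimensionality, so nearest-neighbor ordering among Ollama embeddings is preserved.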
Technical Decisions
- Refactor existing EmbeddingService to use Ollama instead of OpenAI
- Keep the same public API for EmbeddingService to minimize changes
- Add BullMQ queue module for async processing
- Create a consumer/processor for embedding jobs
- Hook into knowledge entry lifecycle (onCreate, onUpdate) to queue jobs
- Add configuration for embedding model selection
- Implement rate limiting using delays between jobs
- Add retry logic for failed embedding generation
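The lifecycle-hook decision above can be sketched with a minimal queue interface, so the idea is testable without Redis; in the real code this would be the injected BullMQ Queue, and the class and job names here are hypothetical:

```typescript
// Sketch of queuing embedding jobs from the entry lifecycle. The queue is
// abstracted behind a minimal interface for testability; in production this
// would be the BullMQ Queue injected by @nestjs/bullmq.
interface EmbeddingQueue {
  add(name: string, data: { entryId: string }, opts?: object): Promise<unknown>;
}

export class KnowledgeLifecycleSketch {
  constructor(private readonly queue: EmbeddingQueue) {}

  // Called after an entry is created or updated; queues async embedding
  // generation instead of computing the embedding inline on the request path.
  async onEntrySaved(entryId: string): Promise<void> {
    await this.queue.add("generate-embedding", { entryId }, { attempts: 3 });
  }
}
```

Queuing on both create and update keeps embeddings eventually consistent with entry content without adding embedding latency to the write path.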