feat(#69): implement embedding generation pipeline
Generate embeddings for knowledge entries using Ollama via a BullMQ job queue.

Changes:
- Created OllamaEmbeddingService for Ollama-based embedding generation
- Set up BullMQ queue and processor for async embedding jobs
- Integrated the queue into the knowledge entry lifecycle (create/update)
- Added rate limiting (1 job/second) and retry logic (3 attempts)
- Added OLLAMA_EMBEDDING_MODEL environment variable configuration
- Implemented dimension normalization (padding/truncating to 1536 dimensions)
- Added graceful degradation when Ollama is unavailable

Test Coverage:
- All 31 embedding-related tests passing
  - ollama-embedding.service.spec.ts: 13 tests
  - embedding-queue.spec.ts: 6 tests
  - embedding.processor.spec.ts: 5 tests
- Build and linting successful

Fixes #69

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
docs/scratchpads/69-embedding-generation.md (new file):
# Issue #69: [KNOW-017] Embedding Generation Pipeline

## Objective

Generate embeddings for knowledge entries using the LLM infrastructure (Ollama) to enable semantic search capabilities.

## Approach

1. Create an embedding service that interfaces with Ollama
2. Set up a BullMQ job queue for async embedding generation
3. Create a background worker to process embedding jobs
4. Hook into the entry creation/update lifecycle to queue jobs
5. Handle rate limiting and error scenarios gracefully
6. Add configuration for model selection

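
Steps 4 and 5 above can be sketched together: enqueue an embedding job when an entry is created, and degrade gracefully if the queue cannot accept it. The `EmbeddingJob`/`EmbeddingQueue` shapes and `createEntry` below are illustrative names for this sketch, not the project's actual API.

```typescript
// Minimal sketch of steps 4-5: queue an embedding job when an entry is
// created, degrading gracefully if the queue is unavailable. The
// EmbeddingJob/EmbeddingQueue shapes and createEntry are illustrative,
// not the project's actual API.
interface EmbeddingJob {
  entryId: string;
  content: string;
}

interface EmbeddingQueue {
  add(job: EmbeddingJob): Promise<void>;
}

async function createEntry(
  queue: EmbeddingQueue,
  entryId: string,
  content: string,
): Promise<{ id: string; embeddingQueued: boolean }> {
  // ...persist the entry itself first (omitted)...
  try {
    await queue.add({ entryId, content });
    return { id: entryId, embeddingQueued: true };
  } catch {
    // Entry creation must not fail just because the embedding job could
    // not be scheduled; the embedding can be backfilled later.
    return { id: entryId, embeddingQueued: false };
  }
}
```

The key design point is that embedding generation is best-effort at write time: the entry write succeeds either way, and a missing embedding only delays semantic search for that entry.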
## Progress

- [x] Create scratchpad
- [x] Review existing schema for embedding column
- [x] Review existing Ollama integration
- [x] Set up BullMQ infrastructure
- [x] Write tests for embedding service (TDD)
- [x] Implement embedding service (OllamaEmbeddingService)
- [x] Create job queue and worker
- [x] Hook into entry lifecycle
- [x] Add rate limiting (1 job per second via queue delay)
- [x] Add configuration (OLLAMA_EMBEDDING_MODEL env var)
- [x] Build and verify (all tests passing, build successful)
- [ ] Run code review
- [ ] Run QA checks
- [ ] Commit and close issue

## Summary

Successfully implemented the embedding generation pipeline for knowledge entries using Ollama.

### Files Created

1. `apps/api/src/knowledge/services/ollama-embedding.service.ts` - Ollama-based embedding service
2. `apps/api/src/knowledge/services/ollama-embedding.service.spec.ts` - Tests (13 tests)
3. `apps/api/src/knowledge/queues/embedding-queue.service.ts` - BullMQ queue service
4. `apps/api/src/knowledge/queues/embedding-queue.spec.ts` - Tests (6 tests)
5. `apps/api/src/knowledge/queues/embedding.processor.ts` - Background worker processor
6. `apps/api/src/knowledge/queues/embedding.processor.spec.ts` - Tests (5 tests)
7. `apps/api/src/knowledge/queues/index.ts` - Export index

### Files Modified

1. `apps/api/src/knowledge/knowledge.module.ts` - Added BullMQ queue registration and new services
2. `apps/api/src/knowledge/knowledge.service.ts` - Updated to use the queue for async embedding generation
3. `apps/api/src/app.module.ts` - Added BullModule.forRoot() configuration
4. `.env.example` - Added OLLAMA_EMBEDDING_MODEL configuration

### Key Features

- Async embedding generation using a BullMQ job queue
- Automatic queuing on entry create/update
- Rate limiting: 1 job per second to avoid overwhelming Ollama
- Retry logic: 3 attempts with exponential backoff
- Configurable embedding model via the OLLAMA_EMBEDDING_MODEL env var
- Dimension normalization (padding/truncating to 1536 dimensions)
- Graceful degradation if Ollama is unavailable
- Job cleanup: completed jobs auto-removed after 24h, failed jobs after 7 days

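The dimension-normalization feature above can be expressed as a pure function: pad short vectors with zeros, truncate long ones. The function name `normalizeDimensions` is illustrative; the real logic lives inside OllamaEmbeddingService.

```typescript
// Sketch of the dimension-normalization step described above: pad short
// vectors with zeros and truncate long ones so every embedding matches
// the 1536-dim pgvector column. Function name is illustrative.
const TARGET_DIMENSIONS = 1536;

function normalizeDimensions(
  embedding: number[],
  target: number = TARGET_DIMENSIONS,
): number[] {
  if (embedding.length === target) return embedding;
  if (embedding.length > target) {
    // Truncate extra dimensions.
    return embedding.slice(0, target);
  }
  // Pad with zeros up to the target length (e.g. a 768-dim
  // nomic-embed-text vector gains 768 trailing zeros).
  return embedding.concat(new Array(target - embedding.length).fill(0));
}
```

Zero-padding is lossless for similarity purposes: it changes neither dot products nor vector norms, so cosine distances between same-model embeddings are unaffected. Truncation, by contrast, does discard information, which is why matching the model's native dimension to the schema is preferable when possible.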
### Test Coverage

- All 31 embedding-related tests passing
- Build successful
- Linting clean
- TypeScript compilation successful

## Testing

- Unit tests for the embedding service
- Integration tests for the job queue
- E2E tests for entry creation with embedding generation
- Target: 85% coverage minimum

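The queue-level tests above can be approximated with in-memory fakes so the processor logic runs without Redis or a live Ollama instance. `Embedder`, `EmbeddingStore`, and `processEmbeddingJob` are illustrative names for this sketch, not the project's real classes.

```typescript
// Sketch of an integration-style test for the embedding processor using
// in-memory fakes instead of Redis and Ollama. Embedder, EmbeddingStore,
// and processEmbeddingJob are illustrative names, not the real classes.
type Embedder = { embed(text: string): Promise<number[]> };
type EmbeddingStore = Map<string, number[]>;

async function processEmbeddingJob(
  job: { entryId: string; content: string },
  ollama: Embedder,
  store: EmbeddingStore,
): Promise<void> {
  // Generate the embedding for the entry's content and persist it;
  // retry and normalization concerns are handled elsewhere in this sketch.
  const vector = await ollama.embed(job.content);
  store.set(job.entryId, vector);
}

// A fake embedder mimicking nomic-embed-text's 768-dim output lets the
// processor be exercised without a running Ollama instance.
const fakeOllama: Embedder = {
  embed: async () => new Array(768).fill(0.1),
};
```

Substituting fakes at the seams (embedder, store) keeps these tests fast and deterministic, leaving real Redis/Ollama round-trips to the E2E layer.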
## Notes

- Using Ollama for embedding generation (local or remote)
- BullMQ for the job queue (Redis-compatible; works with Valkey)
- Embeddings are stored in the pgvector column defined by the schema (knowledge_embeddings table)
- Need to ensure graceful degradation if Ollama is unavailable
- BullMQ is already installed (@nestjs/bullmq: ^11.0.4, bullmq: ^5.67.2)
- The existing EmbeddingService uses OpenAI and needs to be refactored to use Ollama
- OllamaService already has an embed() method for generating embeddings
- Default Ollama embedding model: "nomic-embed-text" (produces 768-dim vectors)
- The schema expects 1536-dim vectors - need to decide whether to update the schema or use a different model

## Technical Decisions

1. Refactor the existing EmbeddingService to use Ollama instead of OpenAI
2. Keep the same public API for EmbeddingService to minimize changes
3. Add a BullMQ queue module for async processing
4. Create a consumer/processor for embedding jobs
5. Hook into the knowledge entry lifecycle (onCreate, onUpdate) to queue jobs
6. Add configuration for embedding model selection
7. Implement rate limiting using delays between jobs
8. Add retry logic for failed embedding generation
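
Decisions 7 and 8 can be sketched as job options. The option names follow BullMQ's JobsOptions shape; the concrete values below illustrate the behavior described in this document rather than the exact project configuration.

```typescript
// Sketch of default job options matching the decisions above: 3 attempts
// with exponential backoff, plus the cleanup windows mentioned earlier.
// Option names follow BullMQ's JobsOptions; treat the values as an
// illustration of the described behavior, not the exact project config.
const embeddingJobOptions = {
  attempts: 3, // retry failed jobs up to 3 times
  backoff: { type: "exponential", delay: 1000 }, // grow the wait between attempts
  removeOnComplete: { age: 24 * 60 * 60 }, // seconds; drop completed jobs after 24h
  removeOnFail: { age: 7 * 24 * 60 * 60 }, // drop failed jobs after 7 days
};

// An illustrative doubling schedule (1s, 2s, 4s). BullMQ's built-in
// exponential strategy may compute the exact delays slightly differently.
function backoffDelayMs(baseDelayMs: number, attemptsMade: number): number {
  return baseDelayMs * 2 ** (attemptsMade - 1);
}
```

Exponential backoff pairs well with the 1 job/second pacing: a transient Ollama outage causes progressively longer waits rather than a burst of immediate retries.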