feat(#69): implement embedding generation pipeline
Generate embeddings for knowledge entries using Ollama via a BullMQ job queue.

Changes:
- Created OllamaEmbeddingService for Ollama-based embedding generation
- Set up BullMQ queue and processor for async embedding jobs
- Integrated the queue into the knowledge entry lifecycle (create/update)
- Added rate limiting (1 job/second) and retry logic (3 attempts)
- Added OLLAMA_EMBEDDING_MODEL environment variable configuration
- Implemented dimension normalization (padding/truncating to 1536 dimensions)
- Added graceful degradation when Ollama is unavailable

Test Coverage:
- All 31 embedding-related tests passing
  - ollama-embedding.service.spec.ts: 13 tests
  - embedding-queue.spec.ts: 6 tests
  - embedding.processor.spec.ts: 5 tests
- Build and linting successful

Fixes #69

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
docs/scratchpads/69-embedding-generation.md (new file):
# Issue #69: [KNOW-017] Embedding Generation Pipeline

## Objective

Generate embeddings for knowledge entries using the LLM infrastructure (Ollama) to enable semantic search capabilities.

## Approach

1. Create an embedding service that interfaces with Ollama
2. Set up a BullMQ job queue for async embedding generation
3. Create a background worker to process embedding jobs
4. Hook into the entry creation/update lifecycle to queue jobs
5. Handle rate limiting and error scenarios gracefully
6. Add configuration for model selection

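
Steps 4 and 5 above can be sketched together: enqueue an embedding job when an entry is created, and degrade gracefully if the queue cannot accept it. The `EmbeddingJob`/`EmbeddingQueue` shapes and `createEntry` below are illustrative names for this sketch, not the project's actual API.

```typescript
// Minimal sketch of steps 4-5: queue an embedding job when an entry is
// created, degrading gracefully if the queue is unavailable. The
// EmbeddingJob/EmbeddingQueue shapes and createEntry are illustrative,
// not the project's actual API.
interface EmbeddingJob {
  entryId: string;
  content: string;
}

interface EmbeddingQueue {
  add(job: EmbeddingJob): Promise<void>;
}

async function createEntry(
  queue: EmbeddingQueue,
  entryId: string,
  content: string,
): Promise<{ id: string; embeddingQueued: boolean }> {
  // ...persist the entry itself first (omitted)...
  try {
    await queue.add({ entryId, content });
    return { id: entryId, embeddingQueued: true };
  } catch {
    // Entry creation must not fail just because the embedding job could
    // not be scheduled; the embedding can be backfilled later.
    return { id: entryId, embeddingQueued: false };
  }
}
```

The key design point is that embedding generation is best-effort at write time: the entry write succeeds either way, and a missing embedding only delays semantic search for that entry.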
## Progress

- [x] Create scratchpad
- [x] Review existing schema for embedding column
- [x] Review existing Ollama integration
- [x] Set up BullMQ infrastructure
- [x] Write tests for embedding service (TDD)
- [x] Implement embedding service (OllamaEmbeddingService)
- [x] Create job queue and worker
- [x] Hook into entry lifecycle
- [x] Add rate limiting (1 job per second via queue delay)
- [x] Add configuration (OLLAMA_EMBEDDING_MODEL env var)
- [x] Build and verify (all tests passing, build successful)
- [ ] Run code review
- [ ] Run QA checks
- [ ] Commit and close issue

## Summary

Successfully implemented the embedding generation pipeline for knowledge entries using Ollama.

### Files Created

1. `apps/api/src/knowledge/services/ollama-embedding.service.ts` - Ollama-based embedding service
2. `apps/api/src/knowledge/services/ollama-embedding.service.spec.ts` - Tests (13 tests)
3. `apps/api/src/knowledge/queues/embedding-queue.service.ts` - BullMQ queue service
4. `apps/api/src/knowledge/queues/embedding-queue.spec.ts` - Tests (6 tests)
5. `apps/api/src/knowledge/queues/embedding.processor.ts` - Background worker processor
6. `apps/api/src/knowledge/queues/embedding.processor.spec.ts` - Tests (5 tests)
7. `apps/api/src/knowledge/queues/index.ts` - Export index

### Files Modified

1. `apps/api/src/knowledge/knowledge.module.ts` - Added BullMQ queue registration and new services
2. `apps/api/src/knowledge/knowledge.service.ts` - Updated to use the queue for async embedding generation
3. `apps/api/src/app.module.ts` - Added BullModule.forRoot() configuration
4. `.env.example` - Added OLLAMA_EMBEDDING_MODEL configuration

### Key Features

- Async embedding generation using a BullMQ job queue
- Automatic queuing on entry create/update
- Rate limiting: 1 job per second to avoid overwhelming Ollama
- Retry logic: 3 attempts with exponential backoff
- Configurable embedding model via the OLLAMA_EMBEDDING_MODEL env var
- Dimension normalization (padding/truncating to 1536 dimensions)
- Graceful degradation if Ollama is unavailable
- Job cleanup: completed jobs auto-removed after 24h, failed jobs after 7 days

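The dimension-normalization feature above can be expressed as a pure function: pad short vectors with zeros, truncate long ones. The function name `normalizeDimensions` is illustrative; the real logic lives inside OllamaEmbeddingService.

```typescript
// Sketch of the dimension-normalization step described above: pad short
// vectors with zeros and truncate long ones so every embedding matches
// the 1536-dim pgvector column. Function name is illustrative.
const TARGET_DIMENSIONS = 1536;

function normalizeDimensions(
  embedding: number[],
  target: number = TARGET_DIMENSIONS,
): number[] {
  if (embedding.length === target) return embedding;
  if (embedding.length > target) {
    // Truncate extra dimensions.
    return embedding.slice(0, target);
  }
  // Pad with zeros up to the target length (e.g. a 768-dim
  // nomic-embed-text vector gains 768 trailing zeros).
  return embedding.concat(new Array(target - embedding.length).fill(0));
}
```

Zero-padding is lossless for similarity purposes: it changes neither dot products nor vector norms, so cosine distances between same-model embeddings are unaffected. Truncation, by contrast, does discard information, which is why matching the model's native dimension to the schema is preferable when possible.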
### Test Coverage

- All 31 embedding-related tests passing
- Build successful
- Linting clean
- TypeScript compilation successful

## Testing

- Unit tests for the embedding service
- Integration tests for the job queue
- E2E tests for entry creation with embedding generation
- Target: 85% coverage minimum

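The queue-level tests above can be approximated with in-memory fakes so the processor logic runs without Redis or a live Ollama instance. `Embedder`, `EmbeddingStore`, and `processEmbeddingJob` are illustrative names for this sketch, not the project's real classes.

```typescript
// Sketch of an integration-style test for the embedding processor using
// in-memory fakes instead of Redis and Ollama. Embedder, EmbeddingStore,
// and processEmbeddingJob are illustrative names, not the real classes.
type Embedder = { embed(text: string): Promise<number[]> };
type EmbeddingStore = Map<string, number[]>;

async function processEmbeddingJob(
  job: { entryId: string; content: string },
  ollama: Embedder,
  store: EmbeddingStore,
): Promise<void> {
  // Generate the embedding for the entry's content and persist it;
  // retry and normalization concerns are handled elsewhere in this sketch.
  const vector = await ollama.embed(job.content);
  store.set(job.entryId, vector);
}

// A fake embedder mimicking nomic-embed-text's 768-dim output lets the
// processor be exercised without a running Ollama instance.
const fakeOllama: Embedder = {
  embed: async () => new Array(768).fill(0.1),
};
```

Substituting fakes at the seams (embedder, store) keeps these tests fast and deterministic, leaving real Redis/Ollama round-trips to the E2E layer.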
## Notes

- Using Ollama for embedding generation (local or remote)
- BullMQ for the job queue (Redis-compatible; works with Valkey)
- Embeddings are stored in the pgvector column defined by the schema (knowledge_embeddings table)
- Need to ensure graceful degradation if Ollama is unavailable
- BullMQ is already installed (@nestjs/bullmq: ^11.0.4, bullmq: ^5.67.2)
- The existing EmbeddingService uses OpenAI and needs to be refactored to use Ollama
- OllamaService already has an embed() method for generating embeddings
- Default Ollama embedding model: "nomic-embed-text" (produces 768-dim vectors)
- The schema expects 1536-dim vectors - need to decide whether to update the schema or use a different model

## Technical Decisions

1. Refactor the existing EmbeddingService to use Ollama instead of OpenAI
2. Keep the same public API for EmbeddingService to minimize changes
3. Add a BullMQ queue module for async processing
4. Create a consumer/processor for embedding jobs
5. Hook into the knowledge entry lifecycle (onCreate, onUpdate) to queue jobs
6. Add configuration for embedding model selection
7. Implement rate limiting using delays between jobs
8. Add retry logic for failed embedding generation
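
Decisions 7 and 8 can be sketched as job options. The option names follow BullMQ's JobsOptions shape; the concrete values below illustrate the behavior described in this document rather than the exact project configuration.

```typescript
// Sketch of default job options matching the decisions above: 3 attempts
// with exponential backoff, plus the cleanup windows mentioned earlier.
// Option names follow BullMQ's JobsOptions; treat the values as an
// illustration of the described behavior, not the exact project config.
const embeddingJobOptions = {
  attempts: 3, // retry failed jobs up to 3 times
  backoff: { type: "exponential", delay: 1000 }, // grow the wait between attempts
  removeOnComplete: { age: 24 * 60 * 60 }, // seconds; drop completed jobs after 24h
  removeOnFail: { age: 7 * 24 * 60 * 60 }, // drop failed jobs after 7 days
};

// An illustrative doubling schedule (1s, 2s, 4s). BullMQ's built-in
// exponential strategy may compute the exact delays slightly differently.
function backoffDelayMs(baseDelayMs: number, attemptsMade: number): number {
  return baseDelayMs * 2 ** (attemptsMade - 1);
}
```

Exponential backoff pairs well with the 1 job/second pacing: a transient Ollama outage causes progressively longer waits rather than a burst of immediate retries.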