stack/docs/scratchpads/69-embedding-generation.md

# Issue #69: [KNOW-017] Embedding Generation Pipeline
## Objective
Generate embeddings for knowledge entries using the LLM infrastructure (Ollama) to enable semantic search capabilities.
## Approach
1. Create an embedding service that interfaces with Ollama
2. Set up BullMQ job queue for async embedding generation
3. Create background worker to process embedding jobs
4. Hook into entry creation/update lifecycle to queue jobs
5. Handle rate limiting and error scenarios gracefully
6. Add configuration for model selection
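The rate-limiting step above ("1 job per second via queue delay") can be sketched as a small scheduler that computes the per-job `delay` to attach when enqueuing. This is an illustrative sketch, not the actual project code; the class and method names are assumptions.

```typescript
// Illustrative sketch: compute a BullMQ-style per-job delay so embedding
// jobs are spaced at least `intervalMs` apart (1 job/sec by default).
class EmbeddingJobScheduler {
  private lastScheduledAt = Number.NEGATIVE_INFINITY;

  /** Delay (ms) for the next job so jobs run at least intervalMs apart. */
  nextDelay(nowMs: number, intervalMs = 1000): number {
    const earliest = Math.max(nowMs, this.lastScheduledAt + intervalMs);
    this.lastScheduledAt = earliest;
    return earliest - nowMs;
  }
}
```

An idle queue schedules the next job immediately (delay 0); a burst of enqueues spreads out at one-second intervals.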
## Progress
- [x] Create scratchpad
- [x] Review existing schema for embedding column
- [x] Review existing Ollama integration
- [x] Set up BullMQ infrastructure
- [x] Write tests for embedding service (TDD)
- [x] Implement embedding service (OllamaEmbeddingService)
- [x] Create job queue and worker
- [x] Hook into entry lifecycle
- [x] Add rate limiting (1 job per second via queue delay)
- [x] Add configuration (OLLAMA_EMBEDDING_MODEL env var)
- [x] Build and verify (all tests passing, build successful)
- [x] Commit changes (commit 3dfa603)
- [x] Close issue #69
## Summary
Successfully implemented the embedding generation pipeline for knowledge entries using Ollama.
### Files Created
1. `apps/api/src/knowledge/services/ollama-embedding.service.ts` - Ollama-based embedding service
2. `apps/api/src/knowledge/services/ollama-embedding.service.spec.ts` - Tests (13 tests)
3. `apps/api/src/knowledge/queues/embedding-queue.service.ts` - BullMQ queue service
4. `apps/api/src/knowledge/queues/embedding-queue.spec.ts` - Tests (6 tests)
5. `apps/api/src/knowledge/queues/embedding.processor.ts` - Background worker processor
6. `apps/api/src/knowledge/queues/embedding.processor.spec.ts` - Tests (5 tests)
7. `apps/api/src/knowledge/queues/index.ts` - Export index
### Files Modified
1. `apps/api/src/knowledge/knowledge.module.ts` - Added BullMQ queue registration and new services
2. `apps/api/src/knowledge/knowledge.service.ts` - Updated to use queue for async embedding generation
3. `apps/api/src/app.module.ts` - Added BullModule.forRoot() configuration
4. `.env.example` - Added OLLAMA_EMBEDDING_MODEL configuration
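The `BullModule.forRoot()` registration added to `app.module.ts` likely looks something like the following. This is a hedged sketch: the env var names (`REDIS_HOST`, `REDIS_PORT`) and defaults are assumptions, not taken from the actual file.

```typescript
// Sketch of the root BullMQ registration (connection details assumed).
import { Module } from '@nestjs/common';
import { BullModule } from '@nestjs/bullmq';

@Module({
  imports: [
    BullModule.forRoot({
      connection: {
        host: process.env.REDIS_HOST ?? 'localhost',
        // Valkey is wire-compatible, so the same connection config works.
        port: Number(process.env.REDIS_PORT ?? 6379),
      },
    }),
  ],
})
export class AppModule {}
```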
### Key Features
- Async embedding generation using BullMQ job queue
- Automatic queuing on entry create/update
- Rate limiting: 1 job per second to prevent overwhelming Ollama
- Retry logic: 3 attempts with exponential backoff
- Configurable embedding model via OLLAMA_EMBEDDING_MODEL env var
- Dimension normalization (padding/truncating to 1536 dimensions)
- Graceful degradation if Ollama is unavailable
- Job cleanup: auto-remove completed jobs after 24h, failed after 7 days
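The retry and cleanup behavior listed above maps directly onto BullMQ's `JobsOptions`. The fragment below is a sketch of what those options would look like, not a copy of the project's code; the variable name is illustrative.

```typescript
// Sketch: BullMQ job options matching the features above.
import { JobsOptions } from 'bullmq';

const embeddingJobOptions: JobsOptions = {
  attempts: 3,                                    // retry up to 3 times
  backoff: { type: 'exponential', delay: 1000 },  // 1s, 2s, 4s between attempts
  removeOnComplete: { age: 24 * 60 * 60 },        // seconds: drop after 24h
  removeOnFail: { age: 7 * 24 * 60 * 60 },        // drop failed jobs after 7 days
};
```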
### Test Coverage
- All 31 embedding-related tests passing
- Build successful
- Linting clean
- TypeScript compilation successful
## Testing
- Unit tests for embedding service
- Integration tests for job queue
- E2E tests for entry creation with embedding generation
- Target: 85% coverage minimum
## Notes
- Using Ollama for embedding generation (local/remote)
- BullMQ for job queue (Redis-compatible, works with Valkey)
- Embeddings stored in pgvector column from schema (knowledge_embeddings table)
- Need to ensure graceful degradation if Ollama unavailable
- BullMQ is already installed (@nestjs/bullmq: ^11.0.4, bullmq: ^5.67.2)
- Existing EmbeddingService uses OpenAI - need to refactor to use Ollama
- OllamaService already has embed() method for generating embeddings
- Default embedding model for Ollama: "nomic-embed-text" (produces 768-dim vectors)
- Schema expects 1536-dim vectors; resolved by normalizing (padding/truncating) Ollama's 768-dim output to 1536 dimensions rather than changing the schema
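The dimension mismatch noted above (768-dim model output vs. a 1536-dim pgvector column) is handled by the normalization step mentioned in Key Features. A minimal sketch of that step, with an assumed function name:

```typescript
// Sketch: pad short vectors with zeros, truncate long ones, so every
// embedding matches the pgvector column's fixed dimensionality.
function normalizeDimensions(vector: number[], target = 1536): number[] {
  if (vector.length === target) return vector;
  if (vector.length > target) return vector.slice(0, target); // truncate
  return [...vector, ...new Array(target - vector.length).fill(0)]; // zero-pad
}
```

Zero-padding preserves cosine similarity between vectors from the same model, since the appended dimensions contribute nothing to the dot product.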
## Technical Decisions
1. Refactor existing EmbeddingService to use Ollama instead of OpenAI
2. Keep the same public API for EmbeddingService to minimize changes
3. Add BullMQ queue module for async processing
4. Create a consumer/processor for embedding jobs
5. Hook into knowledge entry lifecycle (onCreate, onUpdate) to queue jobs
6. Add configuration for embedding model selection
7. Implement rate limiting using delays between jobs
8. Add retry logic for failed embedding generation
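Decisions 4, 5, and 8 above can be sketched together: the entry lifecycle hook enqueues a job rather than embedding inline, and a queue failure must not fail the write (graceful degradation). All names here are illustrative, not the project's actual identifiers.

```typescript
// Sketch: queue an embedding job on entry create/update; tolerate queue failure.
interface EmbeddingQueue {
  enqueue(job: { entryId: string; text: string }): Promise<void>;
}

interface KnowledgeEntry {
  id: string;
  title: string;
  content: string;
}

class KnowledgeServiceSketch {
  constructor(private readonly queue: EmbeddingQueue) {}

  async create(entry: KnowledgeEntry): Promise<KnowledgeEntry> {
    // ...persist the entry here (omitted)...
    try {
      // Async embedding generation: the worker picks this up later.
      await this.queue.enqueue({
        entryId: entry.id,
        text: `${entry.title}\n${entry.content}`,
      });
    } catch {
      // Degrade gracefully: the write succeeds even if the queue is down;
      // embeddings can be backfilled later.
    }
    return entry;
  }
}
```

Keeping the queue behind a narrow interface also makes the lifecycle hook trivially testable with an in-memory stub.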