Issues resolved:

- #68: pgvector Setup
  * Added pgvector vector index migration for knowledge_embeddings
  * Vector index uses HNSW algorithm with cosine distance
  * Optimized for 1536-dimension OpenAI embeddings
- #69: Embedding Generation Pipeline
  * Created EmbeddingService with OpenAI integration
  * Automatic embedding generation on entry create/update
  * Batch processing endpoint for existing entries
  * Async generation to avoid blocking API responses
  * Content preparation with title weighting
- #70: Semantic Search API
  * POST /api/knowledge/search/semantic - pure vector search
  * POST /api/knowledge/search/hybrid - RRF combined search
  * POST /api/knowledge/embeddings/batch - batch generation
  * Comprehensive test coverage
  * Full documentation in docs/SEMANTIC_SEARCH.md

Technical details:
- Uses OpenAI text-embedding-3-small model (1536 dims)
- HNSW index for O(log n) similarity search
- Reciprocal Rank Fusion for hybrid search
- Graceful degradation when OpenAI not configured
- Async embedding generation for performance

Configuration:
- Added OPENAI_API_KEY to .env.example
- Optional feature - disabled if API key not set
- Falls back to keyword search in hybrid mode
Semantic Search Implementation
This document describes the semantic search implementation for the Mosaic Stack Knowledge Module using OpenAI embeddings and PostgreSQL pgvector.
Overview
The semantic search feature enables AI-powered similarity search across knowledge entries using vector embeddings. It complements the existing full-text search with semantic understanding, allowing users to find relevant content even when exact keywords don't match.
Architecture
Components
- EmbeddingService - Generates and manages OpenAI embeddings
- SearchService - Enhanced with semantic and hybrid search methods
- KnowledgeService - Automatically generates embeddings on entry create/update
- pgvector - PostgreSQL extension for vector similarity search
Database Schema
Knowledge Embeddings Table
model KnowledgeEmbedding {
  id        String         @id @default(uuid()) @db.Uuid
  entryId   String         @unique @map("entry_id") @db.Uuid
  entry     KnowledgeEntry @relation(fields: [entryId], references: [id], onDelete: Cascade)
  embedding Unsupported("vector(1536)")
  model     String
  createdAt DateTime       @default(now()) @map("created_at") @db.Timestamptz
  updatedAt DateTime       @updatedAt @map("updated_at") @db.Timestamptz

  @@index([entryId])
  @@map("knowledge_embeddings")
}
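Because Prisma maps the vector column as Unsupported, reads and writes of the embedding typically go through raw SQL, which means serializing the number array into pgvector's text literal. A minimal sketch (the helper name toVectorLiteral is illustrative, not the project's actual API):

```typescript
// Sketch: pgvector accepts vectors as a text literal like "[0.1,0.2,0.3]".
// Since Prisma treats the column as Unsupported("vector(1536)"), inserts
// and updates usually go through $executeRaw with a serialized literal.

function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

// Illustrative raw write (shown as a comment; names are assumptions):
// await prisma.$executeRaw`
//   UPDATE knowledge_embeddings
//   SET embedding = ${toVectorLiteral(vec)}::vector
//   WHERE entry_id = ${entryId}::uuid
// `;
```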
Vector Index
An HNSW (Hierarchical Navigable Small World) index is created for fast similarity search:
CREATE INDEX knowledge_embeddings_embedding_idx
ON knowledge_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Configuration
Environment Variables
Add to your .env file:
# Optional: Required for semantic search
OPENAI_API_KEY=sk-...
Get your API key from: https://platform.openai.com/api-keys
OpenAI Model
The default embedding model is text-embedding-3-small (1536 dimensions). This provides:
- High quality embeddings
- Cost-effective pricing
- Fast generation speed
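As a rough sketch of what generating one of these embeddings looks like (this is not the project's actual EmbeddingService code, just a direct POST to OpenAI's /v1/embeddings REST endpoint using Node 18+ global fetch):

```typescript
// Assumed sketch: calling OpenAI's embeddings endpoint directly.
const EMBEDDING_MODEL = "text-embedding-3-small";

// Pure helper so the request shape is visible on its own.
function buildEmbeddingRequest(input: string) {
  return { model: EMBEDDING_MODEL, input };
}

async function embed(text: string): Promise<number[]> {
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildEmbeddingRequest(text)),
  });
  if (!res.ok) throw new Error(`OpenAI embeddings request failed: ${res.status}`);
  const json = (await res.json()) as { data: { embedding: number[] }[] };
  return json.data[0].embedding; // 1536 floats for text-embedding-3-small
}
```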
API Endpoints
1. Semantic Search
POST /api/knowledge/search/semantic
Search using vector similarity only.
Request:
{
"query": "database performance optimization",
"status": "PUBLISHED"
}
Query Parameters:
- page (optional): Page number (default: 1)
- limit (optional): Results per page (default: 20)
Response:
{
"data": [
{
"id": "uuid",
"slug": "postgres-indexing",
"title": "PostgreSQL Indexing Strategies",
"content": "...",
"rank": 0.87,
"tags": [...],
...
}
],
"pagination": {
"page": 1,
"limit": 20,
"total": 15,
"totalPages": 1
},
"query": "database performance optimization"
}
2. Hybrid Search (Recommended)
POST /api/knowledge/search/hybrid
Combines vector similarity and full-text search using Reciprocal Rank Fusion (RRF).
Request:
{
"query": "indexing strategies",
"status": "PUBLISHED"
}
Benefits of Hybrid Search:
- Best of both worlds: semantic understanding + keyword matching
- Better ranking for exact matches
- Improved recall and precision
- Resilient to edge cases
3. Batch Embedding Generation
POST /api/knowledge/embeddings/batch
Generate embeddings for all existing entries. Useful for:
- Initial setup after enabling semantic search
- Regenerating embeddings after model updates
Request:
{
"status": "PUBLISHED"
}
Response:
{
"message": "Generated 42 embeddings out of 45 entries",
"total": 45,
"success": 42
}
Permissions: Requires ADMIN role
Automatic Embedding Generation
Embeddings are automatically generated when:
- Creating an entry - Embedding generated asynchronously after creation
- Updating an entry - Embedding regenerated if title or content changes
The generation happens asynchronously to avoid blocking API responses.
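The fire-and-forget pattern behind this can be sketched as follows (names like onEntrySaved and generateEmbeddingFor are hypothetical, not the project's actual functions):

```typescript
// Sketch of fire-and-forget embedding generation: the entry is saved and
// the response returns immediately; the embedding promise runs in the
// background and only logs on failure.

type Entry = { id: string; title: string; content: string };

const events: string[] = [];

async function generateEmbeddingFor(entry: Entry): Promise<void> {
  // Stand-in for the real API call + knowledge_embeddings upsert.
  await Promise.resolve();
  events.push(`embedded:${entry.id}`);
}

function onEntrySaved(entry: Entry): Entry {
  // Deliberately NOT awaited: a slow or failing OpenAI call must not
  // block or fail the create/update response.
  generateEmbeddingFor(entry).catch((err) =>
    console.error(`embedding failed for ${entry.id}:`, err),
  );
  events.push(`responded:${entry.id}`);
  return entry;
}
```

Errors in the background task are caught and logged rather than propagated, which is what makes the feature degrade gracefully when OpenAI is unavailable.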
Content Preparation
Before generating embeddings, content is prepared by:
- Combining title and content
- Weighting title more heavily (appears twice)
- This improves semantic matching on titles
function prepareContentForEmbedding(title, content) {
  // Title appears twice so it carries more weight in the embedding
  return `${title}\n\n${title}\n\n${content}`.trim();
}
Search Algorithms
Vector Similarity Search
Uses cosine distance to find semantically similar entries:
SELECT *
FROM knowledge_entries e
INNER JOIN knowledge_embeddings emb ON e.id = emb.entry_id
ORDER BY emb.embedding <=> query_embedding
LIMIT 20
- <=> operator: cosine distance
- Lower distance = higher similarity
- Efficient with HNSW index
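The cosine distance that `<=>` computes is 1 minus cosine similarity; a minimal sketch of the math:

```typescript
// Cosine distance as computed by pgvector's <=> operator:
// distance = 1 - (a . b) / (|a| * |b|). Lower distance = more similar.

function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The `rank` value in the semantic search response (e.g. 0.87) is presumably a similarity score derived from this distance, so higher rank means a closer match.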
Hybrid Search (RRF Algorithm)
Reciprocal Rank Fusion combines rankings from multiple sources:
RRF(d) = sum(1 / (k + rank_i))
Where:
- d = document
- k = constant (60 is standard)
- rank_i = rank from source i
Example:
Document ranks in two searches:
- Vector search: rank 3
- Keyword search: rank 1
RRF score = 1/(60+3) + 1/(60+1) = 0.0159 + 0.0164 = 0.0323
Higher RRF score = better combined ranking.
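The fusion step above can be sketched in a few lines (illustrative code, not the actual SearchService implementation):

```typescript
// Reciprocal Rank Fusion over ranked result lists. Each input is an
// ordered array of document ids (best first); k = 60 by convention.

function rrfFuse(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, idx) => {
      const rank = idx + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Fused order: ids sorted by descending RRF score.
function rrfOrder(rankings: string[][], k = 60): string[] {
  return [...rrfFuse(rankings, k).entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Note that a document only found by one source still gets a score, so hybrid search never returns fewer results than either source alone would for the fused candidate set.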
Performance Considerations
Index Parameters
The HNSW index uses:
- m = 16: Max connections per layer (balances accuracy/memory)
- ef_construction = 64: Build quality (higher = more accurate, slower build)
Query Performance
- Typical query time: 10-50ms (with index)
- Without index: 1000ms+ (not recommended)
- Embedding generation: 100-300ms per entry
Cost (OpenAI API)
Using text-embedding-3-small:
- ~$0.00002 per 1000 tokens
- Average entry (~500 tokens): $0.00001
- 10,000 entries: ~$0.10
Very cost-effective for most use cases.
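The estimates above follow from simple arithmetic; a sketch using the ~$0.00002/1K-token rate quoted here (OpenAI pricing may change):

```typescript
// Embedding cost estimate for text-embedding-3-small at the rate
// quoted above (~$0.00002 per 1,000 tokens; subject to change).

const USD_PER_1K_TOKENS = 0.00002;

function embeddingCostUSD(entries: number, avgTokensPerEntry = 500): number {
  return (entries * avgTokensPerEntry * USD_PER_1K_TOKENS) / 1000;
}
```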
Migration Guide
1. Run Migrations
cd apps/api
pnpm prisma migrate deploy
This creates:
- knowledge_embeddings table
- Vector index on embeddings
2. Configure OpenAI API Key
# Add to .env
OPENAI_API_KEY=sk-...
3. Generate Embeddings for Existing Entries
curl -X POST http://localhost:3001/api/knowledge/embeddings/batch \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"status": "PUBLISHED"}'
Or use the web UI (Admin dashboard → Knowledge → Generate Embeddings).
4. Test Semantic Search
curl -X POST http://localhost:3001/api/knowledge/search/hybrid \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"query": "your search query"}'
Troubleshooting
"OpenAI API key not configured"
Cause: OPENAI_API_KEY environment variable not set
Solution: Add the API key to your .env file and restart the API server
Semantic search returns no results
Possible causes:
- No embeddings generated
  - Run the batch generation endpoint
  - Check the knowledge_embeddings table
- Query too specific
  - Try broader terms
  - Use hybrid search for better recall
- Index not created
  - Check migration status
  - Verify the index exists: \di knowledge_embeddings_embedding_idx in psql
Slow query performance
Solutions:
1. Verify the index exists and is being used:
   EXPLAIN ANALYZE
   SELECT * FROM knowledge_embeddings
   ORDER BY embedding <=> '[...]'::vector
   LIMIT 20;
2. Adjust index parameters (requires recreating the index):
   DROP INDEX knowledge_embeddings_embedding_idx;
   CREATE INDEX knowledge_embeddings_embedding_idx
   ON knowledge_embeddings
   USING hnsw (embedding vector_cosine_ops)
   WITH (m = 32, ef_construction = 128); -- Higher values improve accuracy at the cost of build time and memory
Future Enhancements
Potential improvements:
- Custom embeddings: Support for local embedding models (Ollama, etc.)
- Chunking: Split large entries into chunks for better granularity
- Reranking: Add cross-encoder reranking for top results
- Caching: Cache query embeddings for repeated searches
- Multi-modal: Support image/file embeddings