# Semantic Search Implementation

This document describes the semantic search implementation for the Mosaic Stack Knowledge Module using OpenAI embeddings and PostgreSQL pgvector.

## Overview

The semantic search feature enables AI-powered similarity search across knowledge entries using vector embeddings. It complements the existing full-text search with semantic understanding, allowing users to find relevant content even when exact keywords don't match.

## Architecture

### Components

1. **EmbeddingService** - Generates and manages OpenAI embeddings
2. **SearchService** - Enhanced with semantic and hybrid search methods
3. **KnowledgeService** - Automatically generates embeddings on entry create/update
4. **pgvector** - PostgreSQL extension for vector similarity search

### Database Schema

#### Knowledge Embeddings Table
```prisma
model KnowledgeEmbedding {
  id        String         @id @default(uuid()) @db.Uuid
  entryId   String         @unique @map("entry_id") @db.Uuid
  entry     KnowledgeEntry @relation(fields: [entryId], references: [id], onDelete: Cascade)

  embedding Unsupported("vector(1536)")
  model     String

  createdAt DateTime @default(now()) @map("created_at") @db.Timestamptz
  updatedAt DateTime @updatedAt @map("updated_at") @db.Timestamptz

  @@index([entryId])
  @@map("knowledge_embeddings")
}
```
#### Vector Index

An HNSW (Hierarchical Navigable Small World) index is created for fast similarity search:
```sql
CREATE INDEX knowledge_embeddings_embedding_idx
ON knowledge_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```
## Configuration

### Environment Variables

Add to your `.env` file:
```env
# Optional overall; required for semantic search
OPENAI_API_KEY=sk-...
```
Get your API key from: https://platform.openai.com/api-keys

### OpenAI Model

The default embedding model is `text-embedding-3-small` (1536 dimensions). This provides:

- High quality embeddings
- Cost-effective pricing
- Fast generation speed

## API Endpoints

### 1. Semantic Search

**POST** `/api/knowledge/search/semantic`

Search using vector similarity only.

**Request:**
```json
{
  "query": "database performance optimization",
  "status": "PUBLISHED"
}
```
**Query Parameters:**

- `page` (optional): Page number (default: 1)
- `limit` (optional): Results per page (default: 20)

**Response:**
```json
{
  "data": [
    {
      "id": "uuid",
      "slug": "postgres-indexing",
      "title": "PostgreSQL Indexing Strategies",
      "content": "...",
      "rank": 0.87,
      "tags": [...],
      ...
    }
  ],
  "pagination": {
    "page": 1,
    "limit": 20,
    "total": 15,
    "totalPages": 1
  },
  "query": "database performance optimization"
}
```
### 2. Hybrid Search (Recommended)

**POST** `/api/knowledge/search/hybrid`

Combines vector similarity and full-text search using Reciprocal Rank Fusion (RRF).

**Request:**
```json
{
  "query": "indexing strategies",
  "status": "PUBLISHED"
}
```
**Benefits of Hybrid Search:**

- Best of both worlds: semantic understanding + keyword matching
- Better ranking for exact matches
- Improved recall and precision
- Resilient to edge cases

### 3. Batch Embedding Generation

**POST** `/api/knowledge/embeddings/batch`

Generate embeddings for all existing entries. Useful for:

- Initial setup after enabling semantic search
- Regenerating embeddings after model updates

**Request:**
```json
{
  "status": "PUBLISHED"
}
```
**Response:**
```json
{
  "message": "Generated 42 embeddings out of 45 entries",
  "total": 45,
  "success": 42
}
```
**Permissions:** Requires ADMIN role

## Automatic Embedding Generation

Embeddings are automatically generated when:

1. **Creating an entry** - Embedding generated asynchronously after creation
2. **Updating an entry** - Embedding regenerated if title or content changes

The generation happens asynchronously to avoid blocking API responses.
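A minimal sketch of this fire-and-forget pattern (the class and method names here are illustrative, not the actual KnowledgeService/EmbeddingService API):

```typescript
// Hypothetical sketch: names are illustrative; the real services differ.
type Entry = { id: string; title: string; content: string };

class EmbeddingScheduler {
  public generated: string[] = [];

  // Stand-in for the real OpenAI call + knowledge_embeddings upsert.
  private async generateForEntry(entry: Entry): Promise<void> {
    this.generated.push(entry.id);
  }

  // Called from create/update handlers without awaiting, so the API
  // response is not blocked; failures are logged, not propagated.
  schedule(entry: Entry): void {
    this.generateForEntry(entry).catch((err) =>
      console.error(`Embedding generation failed for ${entry.id}:`, err),
    );
  }
}
```

The promise is intentionally not awaited; the `.catch` prevents an unhandled rejection from surfacing if the embedding call fails.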
### Content Preparation

Before generating embeddings, content is prepared by:

1. Combining the title and content
2. Weighting the title more heavily: it appears twice, which improves semantic matching on titles
```typescript
prepareContentForEmbedding(title: string, content: string): string {
  // The title appears twice so it is weighted more heavily in the embedding
  return `${title}\n\n${title}\n\n${content}`.trim();
}
```
## Search Algorithms

### Vector Similarity Search

Uses cosine distance to find semantically similar entries:
```sql
SELECT *
FROM knowledge_entries e
INNER JOIN knowledge_embeddings emb ON e.id = emb.entry_id
ORDER BY emb.embedding <=> query_embedding
LIMIT 20
```
- `<=>` operator: cosine distance
- Lower distance = higher similarity
- Efficient with HNSW index
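For intuition, the quantity computed by `<=>` can be sketched in TypeScript (illustrative only; pgvector computes this natively in the database):

```typescript
// Cosine distance = 1 - cosine similarity: 0 means identical direction,
// 1 means orthogonal, 2 means opposite. Illustrative only; pgvector's
// `<=>` operator computes this natively.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineDistance([1, 0], [2, 0]); // 0 (same direction)
cosineDistance([1, 0], [0, 1]); // 1 (orthogonal)
```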
### Hybrid Search (RRF Algorithm)

Reciprocal Rank Fusion combines rankings from multiple sources:

```
RRF(d) = sum(1 / (k + rank_i))
```

Where:

- `d` = document
- `k` = constant (60 is standard)
- `rank_i` = rank from source i
**Example:**

Document ranks in two searches:

- Vector search: rank 3
- Keyword search: rank 1

RRF score = 1/(60+3) + 1/(60+1) = 0.0159 + 0.0164 = 0.0323

Higher RRF score = better combined ranking.
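The fusion step can be sketched in TypeScript, using the same `k = 60` constant and 1-based ranks as above:

```typescript
// Combine per-source ranks (1-based) into an RRF score. A document
// missing from a source contributes nothing for that source.
function rrfScore(ranks: Array<number | undefined>, k = 60): number {
  return ranks.reduce<number>(
    (sum, rank) => (rank === undefined ? sum : sum + 1 / (k + rank)),
    0,
  );
}

// The worked example above: vector rank 3, keyword rank 1.
rrfScore([3, 1]); // ≈ 0.0323
```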
## Performance Considerations

### Index Parameters

The HNSW index uses:

- `m = 16`: Max connections per layer (balances accuracy/memory)
- `ef_construction = 64`: Build quality (higher = more accurate, slower build)
### Query Performance

- **Typical query time:** 10-50ms (with index)
- **Without index:** 1000ms+ (not recommended)
- **Embedding generation:** 100-300ms per entry
### Cost (OpenAI API)

Using `text-embedding-3-small`:

- ~$0.00002 per 1000 tokens
- Average entry (~500 tokens): $0.00001
- 10,000 entries: ~$0.10

Very cost-effective for most use cases.
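The arithmetic behind these figures can be checked with a quick sketch (the per-token price is the one quoted above; consult OpenAI's pricing page for current rates):

```typescript
// ~$0.00002 per 1000 tokens for text-embedding-3-small, as quoted above.
const COST_PER_1K_TOKENS = 0.00002;

// Estimated one-time embedding cost in USD for a batch of entries.
function estimateEmbeddingCost(entryCount: number, avgTokensPerEntry = 500): number {
  return entryCount * (avgTokensPerEntry / 1000) * COST_PER_1K_TOKENS;
}

estimateEmbeddingCost(10_000); // ≈ 0.1 → ~$0.10 for 10,000 entries
```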
## Migration Guide

### 1. Run Migrations
```bash
cd apps/api
pnpm prisma migrate deploy
```
This creates:

- `knowledge_embeddings` table
- Vector index on embeddings

### 2. Configure OpenAI API Key
```bash
# Add to .env
OPENAI_API_KEY=sk-...
```
### 3. Generate Embeddings for Existing Entries
```bash
curl -X POST http://localhost:3001/api/knowledge/embeddings/batch \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"status": "PUBLISHED"}'
```
Or use the web UI (Admin dashboard → Knowledge → Generate Embeddings).

### 4. Test Semantic Search
```bash
curl -X POST http://localhost:3001/api/knowledge/search/hybrid \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "your search query"}'
```
## Troubleshooting

### "OpenAI API key not configured"

**Cause:** `OPENAI_API_KEY` environment variable not set

**Solution:** Add the API key to your `.env` file and restart the API server

### Semantic search returns no results

**Possible causes:**

1. **No embeddings generated**
   - Run the batch generation endpoint
   - Check the `knowledge_embeddings` table

2. **Query too specific**
   - Try broader terms
   - Use hybrid search for better recall

3. **Index not created**
   - Check migration status
   - Verify the index exists: `\di knowledge_embeddings_embedding_idx` in psql

### Slow query performance

**Solutions:**

1. Verify the index exists and is being used:
```sql
EXPLAIN ANALYZE
SELECT * FROM knowledge_embeddings
ORDER BY embedding <=> '[...]'::vector
LIMIT 20;
```
2. Adjust index parameters (requires recreation):
```sql
DROP INDEX knowledge_embeddings_embedding_idx;
CREATE INDEX knowledge_embeddings_embedding_idx
ON knowledge_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128); -- Higher values
```
## Future Enhancements

Potential improvements:

1. **Custom embeddings**: Support for local embedding models (Ollama, etc.)
2. **Chunking**: Split large entries into chunks for better granularity
3. **Reranking**: Add cross-encoder reranking for top results
4. **Caching**: Cache query embeddings for repeated searches
5. **Multi-modal**: Support image/file embeddings
## References

- [OpenAI Embeddings Guide](https://platform.openai.com/docs/guides/embeddings)
- [pgvector Documentation](https://github.com/pgvector/pgvector)
- [HNSW Algorithm Paper](https://arxiv.org/abs/1603.09320)
- [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf)