feat: add semantic search with pgvector (closes #68, #69, #70)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed

Issues resolved:
- #68: pgvector Setup
  * Added pgvector vector index migration for knowledge_embeddings
  * Vector index uses HNSW algorithm with cosine distance
  * Optimized for 1536-dimension OpenAI embeddings

- #69: Embedding Generation Pipeline
  * Created EmbeddingService with OpenAI integration
  * Automatic embedding generation on entry create/update
  * Batch processing endpoint for existing entries
  * Async generation to avoid blocking API responses
  * Content preparation with title weighting

- #70: Semantic Search API
  * POST /api/knowledge/search/semantic - pure vector search
  * POST /api/knowledge/search/hybrid - RRF combined search
  * POST /api/knowledge/embeddings/batch - batch generation
  * Comprehensive test coverage
  * Full documentation in docs/SEMANTIC_SEARCH.md

Technical details:
- Uses OpenAI text-embedding-3-small model (1536 dims)
- HNSW index for O(log n) similarity search
- Reciprocal Rank Fusion for hybrid search
- Graceful degradation when OpenAI not configured
- Async embedding generation for performance

Configuration:
- Added OPENAI_API_KEY to .env.example
- Optional feature - disabled if API key not set
- Falls back to keyword search in hybrid mode
This commit is contained in:
Jason Woltje
2026-01-30 00:24:41 -06:00
parent 22cd68811d
commit 3ec2059470
14 changed files with 1408 additions and 5 deletions

20
pnpm-lock.yaml generated
View File

@@ -113,6 +113,9 @@ importers:
ollama:
specifier: ^0.6.3
version: 0.6.3
openai:
specifier: ^6.17.0
version: 6.17.0(ws@8.19.0)(zod@4.3.6)
reflect-metadata:
specifier: ^0.2.2
version: 0.2.2
@@ -4076,6 +4079,18 @@ packages:
resolution: {integrity: sha512-YgBpdJHPyQ2UE5x+hlSXcnejzAvD0b22U2OuAP+8OnlJT+PjWPxtgmGqKKc+RgTM63U9gN0YzrYc71R2WT/hTA==}
engines: {node: '>=18'}
openai@6.17.0:
resolution: {integrity: sha512-NHRpPEUPzAvFOAFs9+9pC6+HCw/iWsYsKCMPXH5Kw7BpMxqd8g/A07/1o7Gx2TWtCnzevVRyKMRFqyiHyAlqcA==}
hasBin: true
peerDependencies:
ws: ^8.18.0
zod: ^3.25 || ^4.0
peerDependenciesMeta:
ws:
optional: true
zod:
optional: true
optionator@0.9.4:
resolution: {integrity: sha512-6IpQ7mKUxRcZNLIObR0hz7lxsapSSIYNZJwXPGeF0mTVqGKFIXj1DQcMoT22S3ROcLyY/rz0PWaWZ9ayWmad9g==}
engines: {node: '>= 0.8.0'}
@@ -9134,6 +9149,11 @@ snapshots:
is-inside-container: 1.0.0
wsl-utils: 0.1.0
openai@6.17.0(ws@8.19.0)(zod@4.3.6):
optionalDependencies:
ws: 8.19.0
zod: 4.3.6
optionator@0.9.4:
dependencies:
deep-is: 0.1.4