feat: Add semantic search with pgvector (closes #68, #69, #70) #119

Merged
jason.woltje merged 2 commits from feature/semantic-search into develop 2026-01-30 21:20:32 +00:00
Owner

Summary

  • Implement embedding service with pgvector support
  • Add semantic search service with similarity ranking
  • Create comprehensive integration tests
  • Add SEMANTIC_SEARCH.md documentation

Test plan

  • Unit tests pass for embedding service
  • Integration tests pass for semantic search
  • Vector search returns relevant results
  • Database migration works correctly

Closes #68, #69, #70

## Summary - Implement embedding service with pgvector support - Add semantic search service with similarity ranking - Create comprehensive integration tests - Add SEMANTIC_SEARCH.md documentation ## Test plan - [x] Unit tests pass for embedding service - [x] Integration tests pass for semantic search - [x] Vector search returns relevant results - [x] Database migration works correctly Closes #68, #69, #70
jason.woltje added 6 commits 2026-01-30 21:19:24 +00:00
Quality Rails provides mechanical enforcement of code quality through
pre-commit hooks and CI/CD pipelines, preventing ~70% of common issues.

What's added:
- Pre-commit hooks via husky (formatting enforcement enabled)
- Enhanced ESLint rules (no-explicit-any, security plugin, etc.)
- lint-staged configuration (currently formatting-only mode)
- Woodpecker CI pipeline template (.woodpecker.yml)
- eslint-plugin-security for vulnerability detection
- Documentation (docs/quality-rails-status.md)

Current status:
- Strict enforcement DISABLED until existing violations are fixed
- Found 1,226 violations (1,121 errors, 105 warnings)
- Priority: Fix explicit 'any' types first
- Pre-commit currently only enforces Prettier formatting

Next steps:
1. Fix existing lint violations
2. Enable strict pre-commit enforcement
3. Configure CI/CD pipeline

Based on quality-rails from ~/src/quality-rails (monorepo template)
See docs/quality-rails-status.md for detailed roadmap.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Strict enforcement now active:
- Format all changed files (auto-fix)
- Lint entire packages that have changed files
- Type-check affected packages
- Block commit if ANY warnings or errors

Impact: If you touch a file in a package with existing violations,
you must clean up the entire package before committing.

This forces incremental cleanup while preventing new violations.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
BREAKING CHANGE: Strict lint enforcement is now ACTIVE

Pre-commit hooks now block commits if:
- Affected package has ANY lint errors or warnings
- Affected package has ANY type errors

Impact: If you touch a file in a package with existing violations,
you MUST fix ALL violations in that package before committing.

This forces incremental cleanup:
- Work in @mosaic/shared → Fix all @mosaic/shared violations
- Work in @mosaic/api → Fix all @mosaic/api violations
- Work in clean packages → No extra work required

Fixed regex to handle absolute paths from lint-staged.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Strict enforcement is now ACTIVE and blocking commits.

Updated documentation to reflect:
- Pre-commit hooks are actively blocking violations
- Package-level enforcement strategy
- How developers should handle blocked commits
- Next steps for incremental cleanup

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove deprecated shebang that will fail in husky v10.

Before (deprecated):
  #!/bin/sh

After (v10-compatible):
  Direct commands without shebang

Ref: https://github.com/typicode/husky/issues/1476

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
feat: add semantic search with pgvector (closes #68, #69, #70)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
3ec2059470
Issues resolved:
- #68: pgvector Setup
  * Added pgvector vector index migration for knowledge_embeddings
  * Vector index uses HNSW algorithm with cosine distance
  * Optimized for 1536-dimension OpenAI embeddings

- #69: Embedding Generation Pipeline
  * Created EmbeddingService with OpenAI integration
  * Automatic embedding generation on entry create/update
  * Batch processing endpoint for existing entries
  * Async generation to avoid blocking API responses
  * Content preparation with title weighting

- #70: Semantic Search API
  * POST /api/knowledge/search/semantic - pure vector search
  * POST /api/knowledge/search/hybrid - RRF combined search
  * POST /api/knowledge/embeddings/batch - batch generation
  * Comprehensive test coverage
  * Full documentation in docs/SEMANTIC_SEARCH.md

Technical details:
- Uses OpenAI text-embedding-3-small model (1536 dims)
- HNSW index for O(log n) similarity search
- Reciprocal Rank Fusion for hybrid search
- Graceful degradation when OpenAI not configured
- Async embedding generation for performance

Configuration:
- Added OPENAI_API_KEY to .env.example
- Optional feature - disabled if API key not set
- Falls back to keyword search in hybrid mode
jason.woltje added 1 commit 2026-01-30 21:20:24 +00:00
Merge branch 'develop' into feature/semantic-search
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
eca6a9efe2
jason.woltje merged commit f64e04c10c into develop 2026-01-30 21:20:32 +00:00
jason.woltje deleted branch feature/semantic-search 2026-01-30 21:20:33 +00:00
Sign in to join this conversation.