mosaic/stack

Files

Jason Woltje 7101864a15 fix(#189 ): add composite database index for job_events table

Add composite index [jobId, timestamp] to improve query performance
for the most common job_events access patterns.

Changes:
- Add @@index([jobId, timestamp]) to JobEvent model in schema.prisma
- Create migration 20260202122655_add_job_events_composite_index
- Add performance tests to validate index effectiveness
- Document index design rationale in scratchpad
- Fix lint errors in api-key.guard, herald.service, runner-jobs.service

Rationale:
The composite index [jobId, timestamp] optimizes the dominant query
pattern used across all services:
- JobEventsService.getEventsByJobId (WHERE jobId, ORDER BY timestamp)
- RunnerJobsService.streamEvents (WHERE jobId + timestamp range)
- RunnerJobsService.findOne (implicit jobId filter + timestamp order)

This index provides:
- Fast filtering by jobId (highly selective)
- Efficient timestamp-based ordering
- Optimal support for timestamp range queries
- Backward compatibility with jobId-only queries

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-02 12:30:19 -06:00

5.6 KiB

Raw Permalink Blame History

Issue #189: Add Composite Database Index for job_events Table

Objective

Add an optimal composite index to the job_events table to improve query performance based on common access patterns identified in the codebase.

Analysis of Query Patterns

Current Schema (line 1193-1213 in schema.prisma)

model JobEvent {
  id     String  @id @default(uuid()) @db.Uuid
  jobId  String  @map("job_id") @db.Uuid
  stepId String? @map("step_id") @db.Uuid

  // Event details
  type      String
  timestamp DateTime @db.Timestamptz
  actor     String
  payload   Json

  // Relations
  job  RunnerJob @relation(fields: [jobId], references: [id], onDelete: Cascade)
  step JobStep?  @relation(fields: [stepId], references: [id], onDelete: Cascade)

  @@index([jobId])
  @@index([stepId])
  @@index([timestamp])
  @@index([type])
  @@map("job_events")
}

Identified Query Patterns

1. JobEventsService.getEventsByJobId (lines 71-106)

// WHERE clause: { jobId, [type?], [stepId?] }
// ORDER BY: { timestamp: "asc" }
// Pagination: skip, take

Columns used in WHERE: jobId, optionally type, optionally stepId
Columns used in ORDER BY: timestamp

2. JobEventsService.findByJob (lines 202-219)

// WHERE clause: { jobId }
// ORDER BY: { timestamp: "asc" }

Columns used in WHERE: jobId
Columns used in ORDER BY: timestamp

3. RunnerJobsService.findOne (lines 120-144)

// events: { orderBy: { timestamp: "asc" } }

Uses relation through jobId (implicit WHERE)
Columns used in ORDER BY: timestamp

4. RunnerJobsService.streamEvents (lines 269-275)

// WHERE clause: { jobId, timestamp: { gt: lastEventTime } }
// ORDER BY: { timestamp: "asc" }

Columns used in WHERE: jobId, timestamp (range query)
Columns used in ORDER BY: timestamp

5. HeraldService.broadcastJobEvent (lines 73-81)

// WHERE clause: { jobId, type: JOB_CREATED }
// Uses findFirst

Columns used in WHERE: jobId, type

Composite Index Design

Most Common Access Pattern

The dominant query pattern across all services is:

WHERE jobId = ? [AND type = ?] [AND stepId = ?]
ORDER BY timestamp ASC

Recommended Composite Index

@@index([jobId, timestamp])

Rationale

Covers the most frequent query: Filtering by jobId + ordering by timestamp
Efficient for range queries: RunnerJobsService.streamEvents uses timestamp > lastEventTime which benefits from the composite index
Supports partial matching: Queries filtering only by jobId can still use the index effectively
Complements existing indexes: We keep the single-column indexes for type and stepId since they're used independently in some queries

Alternative Considered

@@index([jobId, type, timestamp])

Rejected because:

type filtering is used in only 2 out of 5 query patterns
Would create a larger index with marginal benefit
Single-column type index is sufficient for the rare queries that filter by type alone

Approach

Step 1: Write Performance Tests (TDD - RED)

Create test file: apps/api/src/job-events/job-events.performance.spec.ts

Test query performance for getEventsByJobId
Test query performance for streamEvents with timestamp range
Measure query execution time before index

Step 2: Create Prisma Migration (TDD - GREEN)

Add composite index @@index([jobId, timestamp]) to schema.prisma
Generate migration using pnpm prisma:migrate dev
Run migration against test database

Step 3: Verify Performance Improvement

Re-run performance tests
Verify query times improved
Document results in this scratchpad

Step 4: Commit and Update Issue

Commit with format: fix(#189): add composite database index for job_events table
Update issue #189 with completion status

Progress

Analyze schema and query patterns
Identify optimal composite index
Document rationale
Write performance tests
Add composite index to schema
Create migration file
Apply migration (pending database schema sync)
Run performance tests
Verify performance improvement
Commit changes
Update issue

Testing

Performance tests will validate:

Query execution time improvement for jobId + timestamp queries
Index is used by PostgreSQL query planner (EXPLAIN ANALYZE)
No regression in other query patterns

Notes

The composite index [jobId, timestamp] is optimal because:
- jobId is highly selective (unique per job)
- timestamp ordering is always required
- This pattern appears in 100% of job event queries
Existing single-column indexes remain valuable for admin queries that filter by type or stepId alone
PostgreSQL can efficiently use this composite index for range queries on timestamp

Migration Status

Migration file created: 20260202122655_add_job_events_composite_index/migration.sql
Database status: The job_events table hasn't been created yet in the local database
Pending migrations: The database has migration history divergence. The following migrations need to be applied first:
- 20260129232349_add_agent_task_model
- 20260130002000_add_knowledge_embeddings_vector_index
- 20260131115600_add_llm_provider_instance
- 20260201205935_add_job_tracking (creates job_events table)
- 20260202122655_add_job_events_composite_index (this migration)
Note: The migration is ready and will be applied automatically when prisma migrate dev or prisma migrate deploy is run with synchronized migration history