stack/docs/reports/codebase-review-2026-02-05/03-qa-test-coverage.md
Jason Woltje 9dfbf8cf61
chore: Remove pre-created task files, add review reports
- Delete docs/tasks.md (let orchestrator bootstrap from scratch)
- Delete docs/claude/task-tracking.md (superseded by universal guide)
- Add codebase review reports for orchestrator to parse

Tests orchestrator's autonomous bootstrap capability.
2026-02-05 15:08:02 -06:00

Mosaic Stack - QA & Test Coverage Report

Date: 2026-02-05
Scope: All workspaces (api, web, orchestrator, coordinator, packages)
Total Test Files: 552 | Total Test Cases: ~3,685


Overall Test Health

| Workspace | Tests | Files | Coverage | Grade | Key Issue |
|---|---|---|---|---|---|
| apps/orchestrator | ~452 | 19 | 85% enforced | A | Near-complete, well-structured |
| apps/api | ~2,174 | 143 | Not enforced | B- | 21 untested services, weak assertions |
| apps/web | ~555 | 51 | 85% on components/lib | C+ | 76 untested components, 23 skipped |
| apps/coordinator | ~504 | 23 | 16% reported | D | Coverage crisis despite test files |
| packages/shared | ~25 | 1 | N/A | B+ | Adequate for scope |
| packages/ui | ~15 | 1 | N/A | D+ | 9 of 10 components untested |

Critical Coverage Gaps

GAP-1: Coordinator 16% Line Coverage [CRITICAL - Priority 10/10]

Despite having 23 test files and ~504 test cases, the coordinator reports only 16% line coverage with 14 of 22 source files at 0% execution. Files at 0% include the core coordinator.py, queue.py, webhook.py, security.py, parser.py, and metrics.py.

Root Cause (likely): Tests import types/models but mock everything, so actual source code never executes; or coverage run only executes a subset of tests.

Action: Run `cd apps/coordinator && python -m pytest tests/ -v --cov=src --cov-report=term-missing` and diagnose.

GAP-2: knowledge.service.ts - 916 Lines, No Tests [CRITICAL - Priority 9/10]

The largest service file in the API has no direct unit tests. Core CRUD operations, pagination, filtering, slug generation, cache invalidation, and embedding queue integration are all untested. Only version-specific tests exist.

Regressions at risk: Pagination off-by-one, slug collision handling, stale cache after updates, embedding queue not triggered.
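
The pagination and slug-collision risks can be pinned down with a few boundary-value checks. The following is a minimal sketch, not the service's actual code: `pageWindow`, `uniqueSlug`, and `assertEqual` are hypothetical helpers invented for illustration, and the real logic inside knowledge.service.ts may be structured differently. In the repository these checks would live in the project's test runner rather than a hand-rolled helper.

```typescript
// Hypothetical helpers standing in for logic inside knowledge.service.ts.
// Names and signatures are assumptions for illustration only.

interface PageWindow {
  skip: number;
  take: number;
  totalPages: number;
}

// Convert a 1-based page number into an offset/limit window.
function pageWindow(total: number, page: number, pageSize: number): PageWindow {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return {
    skip: (page - 1) * pageSize,
    take: pageSize,
    totalPages: Math.ceil(total / pageSize),
  };
}

// Build a URL slug and append -2, -3, ... until it no longer collides.
function uniqueSlug(title: string, existing: ReadonlySet<string>): string {
  const base = title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  let candidate = base;
  for (let n = 2; existing.has(candidate); n++) {
    candidate = `${base}-${n}`;
  }
  return candidate;
}

// Minimal assertion helper so the sketch runs without a test framework.
function assertEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Boundary cases where off-by-one errors usually surface.
assertEqual(pageWindow(41, 1, 20).skip, 0, "first page starts at offset 0");
assertEqual(pageWindow(41, 3, 20).skip, 40, "third page offset");
assertEqual(pageWindow(41, 1, 20).totalPages, 3, "partial last page counts");
assertEqual(pageWindow(40, 1, 20).totalPages, 2, "exact multiple adds no page");

// Collision handling.
assertEqual(uniqueSlug("Hello, World!", new Set<string>()), "hello-world", "basic slug");
assertEqual(
  uniqueSlug("Hello World", new Set<string>(["hello-world", "hello-world-2"])),
  "hello-world-3",
  "skips suffixes that are already taken",
);
```

The cases worth copying are the boundaries: the first page, a total that is an exact multiple of the page size, and a slug whose first suffix is already taken.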

GAP-3: admin.guard.ts - Security Guard, No Tests [CRITICAL - Priority 9/10]

This guard determines system admin access by checking workspace ownership. No tests verify it correctly grants/denies admin access.

Regressions at risk: Non-admin users gaining admin access, valid admins locked out, missing ForbiddenException.
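
The three regressions listed above map onto three test cases: an owner is granted access, a non-owner is rejected, and a request without a user is rejected. The sketch below assumes the guard's rule is "the user owns at least one workspace"; that rule, along with `canActivateAdmin`, `RequestLike`, and `ForbiddenError` (a stand-in for the framework's ForbiddenException), is an assumption made for illustration. The real guard likely performs an asynchronous database lookup; the lookup is synchronous here only to keep the example dependency-free.

```typescript
// Stand-in for the framework's ForbiddenException so the sketch has no dependencies.
class ForbiddenError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ForbiddenError";
    // Keeps instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, ForbiddenError.prototype);
  }
}

interface RequestLike {
  user?: { id: string };
}

// Hypothetical guard logic: admin access requires owning a workspace.
function canActivateAdmin(
  request: RequestLike,
  ownsWorkspace: (userId: string) => boolean,
): boolean {
  const user = request.user;
  if (!user) {
    throw new ForbiddenError("Authentication required");
  }
  if (!ownsWorkspace(user.id)) {
    throw new ForbiddenError("Admin access required");
  }
  return true;
}

// Fails unless the callback throws ForbiddenError specifically.
function expectForbidden(run: () => unknown, label: string): void {
  try {
    run();
  } catch (err) {
    if (err instanceof ForbiddenError) return;
    throw new Error(`${label}: threw something other than ForbiddenError`);
  }
  throw new Error(`${label}: expected ForbiddenError but the call succeeded`);
}

// Fake ownership lookup replacing the database query.
const owners = new Set<string>(["user-owner"]);
const ownsWorkspace = (userId: string): boolean => owners.has(userId);

// 1. A workspace owner is granted access.
if (canActivateAdmin({ user: { id: "user-owner" } }, ownsWorkspace) !== true) {
  throw new Error("owner should be granted admin access");
}

// 2. An authenticated non-owner is denied.
expectForbidden(
  () => canActivateAdmin({ user: { id: "user-member" } }, ownsWorkspace),
  "non-owner",
);

// 3. A request with no user attached is denied.
expectForbidden(() => canActivateAdmin({}, ownsWorkspace), "anonymous request");
```

Checking the error type, not just that something was thrown, is what covers the "missing ForbiddenException" regression.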

GAP-4: embeddings.service.ts - 249 Lines, Raw SQL, No Tests [CRITICAL - Priority 9/10]

Uses raw SQL for pgvector operations. No tests exist for embedding validation, vector SQL construction, or similarity search.

Regressions at risk: SQL injection through embedding data, invalid vector dimensions, wrong search results.
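
One way to make these risks testable is to separate embedding validation and query construction from query execution, so both can be unit-tested without a database. The sketch below is an assumption about how that could look, not a description of embeddings.service.ts: `toVectorLiteral`, `buildSimilarityQuery`, the `knowledge_embeddings` table, the `embedding` column, and the dimension of 4 are all hypothetical. The `<=>` cosine-distance operator and the `::vector` cast are standard pgvector syntax.

```typescript
// Assumed embedding width; the real value depends on the model and column definition.
const EXPECTED_DIMENSIONS = 4;

// Validate an embedding and serialize it as a pgvector literal such as "[0.1,0.2]".
function toVectorLiteral(
  embedding: readonly number[],
  dimensions: number = EXPECTED_DIMENSIONS,
): string {
  if (embedding.length !== dimensions) {
    throw new RangeError(`expected ${dimensions} dimensions, got ${embedding.length}`);
  }
  for (const value of embedding) {
    // Rejects NaN, Infinity, and non-number values smuggled in through untyped input.
    if (typeof value !== "number" || !Number.isFinite(value)) {
      throw new TypeError("embedding values must be finite numbers");
    }
  }
  return `[${embedding.join(",")}]`;
}

interface ParameterizedQuery {
  text: string;
  values: unknown[];
}

// Build a similarity search whose SQL text never contains caller-supplied data.
// Table and column names are hypothetical.
function buildSimilarityQuery(
  embedding: readonly number[],
  limit: number,
): ParameterizedQuery {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  return {
    text: "SELECT id FROM knowledge_embeddings ORDER BY embedding <=> $1::vector LIMIT $2",
    values: [toVectorLiteral(embedding), limit],
  };
}

const query = buildSimilarityQuery([0.1, -2, 3.5, 0], 5);
// query.values[0] is "[0.1,-2,3.5,0]"; query.text contains only placeholders.
```

Unit tests can then assert that malformed embeddings are rejected and that the SQL text contains only placeholders, leaving result correctness to integration tests against a real pgvector instance.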

GAP-5: widget-data.service.ts - 695 Lines, No Tests [HIGH - Priority 8/10]

Second-largest untested file. Fetches data from multiple sources for dashboard widgets.

GAP-6: ideas.service.ts - 321 Lines, No Tests [HIGH - Priority 8/10]

User-facing CRUD feature with domain/project associations and activity logging.


Untested Files by Workspace

apps/api - 21 Untested Service/Controller Files

| File | Lines | Risk |
|---|---|---|
| knowledge/knowledge.service.ts | 916 | HIGH |
| widgets/widget-data.service.ts | 695 | HIGH |
| ideas/ideas.service.ts | 321 | HIGH |
| database/embeddings.service.ts | 249 | HIGH |
| ideas/ideas.controller.ts | 123 | MEDIUM |
| widgets/widgets.controller.ts | 129 | MEDIUM |
| widgets/widgets.service.ts | 59 | MEDIUM |
| users/preferences.service.ts | 99 | MEDIUM |
| users/preferences.controller.ts | 56 | MEDIUM |
| common/throttler/throttler-storage.service.ts | 80+ | MEDIUM |
| auth/guards/admin.guard.ts | 46 | SECURITY |
| federation/audit.service.ts | 80+ | LOW |
| common/throttler/throttler-api-key.guard.ts | - | MEDIUM |
| knowledge/import-export.controller.ts | - | MEDIUM |
| knowledge/knowledge.controller.ts | - | MEDIUM |
| knowledge/stats.controller.ts | - | LOW |
| knowledge/queues/embedding-queue.service.ts | - | MEDIUM |
| layouts/layouts.controller.ts | - | LOW |
| cron/cron.controller.ts | - | LOW |
| bridge/parser/command-parser.service.ts | - | MEDIUM |
| app.service.ts | - | LOW |

Additionally, 22 DTO directories lack validation tests.

apps/web - 76 Untested Component/Page Files

Critical pages (user-facing routes):

  • Main dashboard page.tsx
  • Calendar page
  • Knowledge page + 5 sub-pages
  • Federation connections + 2 sub-pages
  • Settings (4 sub-pages)

Critical components:

  • Chat system: Chat.tsx, ChatInput.tsx, MessageList.tsx, ConversationSidebar.tsx, BackendStatusBanner.tsx
  • Dashboard widgets: DomainOverview, QuickCapture, RecentTasks, UpcomingEvents
  • HUD system: HUD.tsx, WidgetGrid.tsx, WidgetRenderer.tsx, WidgetWrapper.tsx
  • Knowledge: EntryCard, EntryList, EntryViewer, EntryFilters, VersionHistory, ImportExportActions, StatsDashboard
  • Navigation.tsx
  • Workspace: WorkspaceCard, WorkspaceSettings, MemberList, InviteMember
  • Team: TeamCard, TeamMemberList, TeamSettings

Untested API client modules (11 files):

  • chat.ts, domains.ts, events.ts, federation.ts, ideas.ts, knowledge.ts, personalities.ts, teams.ts, api.ts, auth-client.ts

Untested hooks: useChat.ts, useLayout.ts

apps/orchestrator - 1 Untested File

  • health/health.service.ts (minimal risk)

packages/ui - 9 Untested Components

  • Avatar, Badge, Card, Input, Modal, Select, Textarea, Toast (only Button tested)

23 Skipped Tests (apps/web)

| File | Count | Reason |
|---|---|---|
| CalendarWidget.test.tsx | 5 | Component migrated from setTimeout mock data to real API |
| TasksWidget.test.tsx | 6 | Same - setTimeout mock data mismatch |
| QuickCaptureWidget.test.tsx | 3 | Submit and keyboard shortcut tests |
| LinkAutocomplete.test.tsx | 9 | Debounce search, keyboard nav, link insertion, dropdown |

Action: Re-enable and update tests to match current component implementations.


Test Anti-Patterns Found

Placeholder Assertions (expect(true).toBe(true))

| File | Line | Context |
|---|---|---|
| ChatOverlay.test.tsx | 259, 267 | Responsive design tests |
| rejection-handler.service.spec.ts | 307 | Notification sending |
| semantic-search.integration.spec.ts | 122 | Conditional branch |

Impact: Tests always pass, provide zero regression protection.

Sole toBeDefined() Assertions (30+ instances)

Most concerning in:

  • llm-telemetry.decorator.spec.ts -- 6 tests verify decorator doesn't throw but never check span attributes
  • federation/query.service.spec.ts -- 8 tests
  • federation/query.controller.spec.ts -- 3 tests
  • layouts.service.spec.ts -- 2 tests
  • workspace-settings.service.spec.ts -- 1 test

Impact: Tests verify existence but not correctness. Regressions slip through.
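
The gap between an existence check and a value check is easiest to see against a deliberately broken function. In the sketch below, `buildUsage` and `TokenUsage` are hypothetical and unrelated to the project's real telemetry code; the two boolean constants mirror what a sole `toBeDefined()` assertion and a value-level assertion would each conclude.

```typescript
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

// Hypothetical helper with a deliberate bug: the two counts are swapped.
function buildUsage(promptTokens: number, completionTokens: number): TokenUsage {
  return { promptTokens: completionTokens, completionTokens: promptTokens };
}

const usage = buildUsage(128, 32);

// Equivalent of a sole toBeDefined() assertion: passes despite the bug.
const existenceCheckPasses: boolean = usage !== undefined;

// Equivalent of asserting on the actual values: catches the bug.
const valueCheckPasses: boolean =
  usage.promptTokens === 128 && usage.completionTokens === 32;

// existenceCheckPasses is true, valueCheckPasses is false.
console.log({ existenceCheckPasses, valueCheckPasses });
```

The same reasoning applies to the placeholder assertions listed earlier: a test earns its keep only if some plausible bug would make it fail.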

Testing Implementation Details Instead of Behavior

  • cors.spec.ts -- Tests CORS by asserting on JS objects, not actual HTTP headers/middleware
  • Button.test.tsx -- Asserts on CSS class names (bg-blue-600) instead of behavior

Impact: Tests break on implementation changes even when behavior is correct.

Potential Flaky Patterns

setTimeout-based timing in 5 test files:

  • runner-jobs.service.spec.ts:620,833
  • semantic-search.integration.spec.ts:153
  • mcp/stdio-transport.spec.ts (6 instances)
  • coordinator-integration.service.concurrency.spec.ts:170
  • health.controller.spec.ts:63 (1100ms wait)
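
Real-time waits such as the 1100ms delay can often be removed by making time an injectable dependency, so a test advances a fake clock instead of sleeping. The sketch below assumes the code under test is waiting for a time-based value to expire, which may not match what these specs actually do; `TtlCache` and `Clock` are hypothetical names. The fake-timer APIs built into common test runners (for example `jest.useFakeTimers()` or `vi.useFakeTimers()`) are an alternative when the production code cannot take a clock parameter.

```typescript
// A clock is any function returning the current time in milliseconds.
type Clock = () => number;

// Hypothetical single-value cache whose entry expires after ttlMs.
class TtlCache<T> {
  private readonly ttlMs: number;
  private readonly now: Clock;
  private entry: { value: T; expiresAt: number } | undefined;

  // Production code uses the default clock; tests inject a controllable one.
  constructor(ttlMs: number, now: Clock = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value while it is fresh, otherwise reload it.
  get(load: () => T): T {
    const current = this.now();
    if (this.entry !== undefined && current < this.entry.expiresAt) {
      return this.entry.value;
    }
    const value = load();
    this.entry = { value, expiresAt: current + this.ttlMs };
    return value;
  }
}

// Test setup: a fake clock the test moves forward by hand.
let fakeNow = 0;
let loadCount = 0;
const cache = new TtlCache<number>(1000, () => fakeNow);
const load = (): number => ++loadCount;

const first = cache.get(load); // loads: 1

fakeNow = 999; // just before expiry
const second = cache.get(load); // still cached: 1

fakeNow = 1000; // exactly at expiry
const third = cache.get(load); // reloads: 2

if (first !== 1 || second !== 1 || third !== 2) {
  throw new Error("cache did not expire at the expected boundary");
}
```

Because no real time passes, the test runs in microseconds and cannot fail under CI load, and it can probe the exact expiry boundary rather than padding the wait by 100ms.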

Missing Test Categories

No Playwright E2E Tests

The project documents Playwright as the E2E framework but no playwright.config.ts or E2E test files exist.

No DTO Validation Tests

22 DTO directories lack validation testing. DTOs define input validation rules via class-validator decorators, but these are never tested in isolation.

Limited Integration Tests

Only 8 integration test files exist across the entire codebase. Most module interactions are untested.


| Priority | Item | Effort | Impact |
|---|---|---|---|
| P0 | Investigate coordinator 16% coverage | 2hr | Unblocks all coordinator testing |
| P0 | knowledge.service.ts unit tests | 4hr | Covers largest untested service |
| P0 | admin.guard.ts unit tests | 1hr | Security-critical |
| P0 | embeddings.service.ts unit tests | 2hr | Raw SQL validation |
| P1 | widget-data.service.ts unit tests | 3hr | Dashboard reliability |
| P1 | ideas.service.ts unit tests | 2hr | User-facing CRUD |
| P1 | Re-enable 23 skipped widget tests | 2hr | Immediate coverage gain |
| P1 | Replace placeholder assertions | 1hr | Fix false-positive tests |
| P2 | Chat system component tests | 3hr | Core user interaction |
| P2 | API client module tests (11 files) | 3hr | Request/response validation |
| P2 | Throttler storage tests | 2hr | Security infrastructure |
| P2 | Preferences service tests | 1hr | User settings |
| P3 | Strengthen toBeDefined-only tests | 2hr | Better regression detection |
| P3 | UI package component tests | 3hr | Design system reliability |
| P3 | Playwright E2E setup + smoke tests | 4hr | End-to-end confidence |

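Estimated total effort: ~5-6 days for P0+P1 items. The per-item estimates above sum to 17 hours for P0+P1 (9hr P0, 8hr P1) and 35 hours across all priorities, so the day figure does not follow from the itemized hours alone and should be reconciled before planning.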


Positive Test Observations

  1. Orchestrator is exemplary -- 452 tests, near-complete coverage, behavioral testing, good mocking
  2. Federation security tests are thorough -- Crypto, signature, timeout, workspace access, capability guard
  3. API client test (web) is comprehensive -- 721 lines covering error handling, retries, auth
  4. Sanitization utilities well-tested -- XSS prevention, log sanitization, query builder
  5. Coverage thresholds enforced -- 85% on orchestrator and web components/lib
  6. Concurrency tests exist -- coordinator-integration and runner-jobs
  7. Good test infrastructure -- Shared fixtures, proper NestJS testing module usage