Jason Woltje 9dfbf8cf61
# Mosaic Stack - QA & Test Coverage Report

**Date:** 2026-02-05
**Scope:** All workspaces (api, web, orchestrator, coordinator, packages)
**Total Test Files:** 552 | **Total Test Cases:** ~3,685

---
## Overall Test Health
| Workspace | Tests | Files | Coverage | Grade | Key Issue |
| ----------------- | ------ | ----- | --------------------- | ------ | ------------------------------------- |
| apps/orchestrator | ~452 | 19 | 85% enforced | **A** | Near-complete, well-structured |
| apps/api | ~2,174 | 143 | Not enforced | **B-** | 21 untested services, weak assertions |
| apps/web | ~555 | 51 | 85% on components/lib | **C+** | 76 untested components, 23 skipped |
| apps/coordinator | ~504 | 23 | **16% reported** | **D** | Coverage crisis despite test files |
| packages/shared | ~25 | 1 | N/A | **B+** | Adequate for scope |
| packages/ui | ~15 | 1 | N/A | **D+** | 9 of 10 components untested |
---
## Critical Coverage Gaps
### GAP-1: Coordinator 16% Line Coverage [CRITICAL - Priority 10/10]
Despite having 23 test files and ~504 test cases, the coordinator reports only 16% line coverage with 14 of 22 source files at 0% execution. Files at 0% include the core `coordinator.py`, `queue.py`, `webhook.py`, `security.py`, `parser.py`, and `metrics.py`.
**Root Cause (likely):** Either the tests import types/models but mock everything else, so the actual source code never executes, or the coverage run executes only a subset of the tests.
**Action:** Run `cd apps/coordinator && python -m pytest tests/ -v --cov=src --cov-report=term-missing` and diagnose.
### GAP-2: knowledge.service.ts - 916 Lines, No Tests [CRITICAL - Priority 9/10]
The largest service file in the API has no direct unit tests. Core CRUD operations, pagination, filtering, slug generation, cache invalidation, and embedding queue integration are all untested. Only version-specific tests exist.
**Regressions at risk:** Pagination off-by-one, slug collision handling, stale cache after updates, embedding queue not triggered.
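
As a rough illustration of the boundary cases such tests would pin down, here is a minimal sketch. `paginate` and `uniqueSlug` are hypothetical stand-ins written for this report, not the actual methods of `knowledge.service.ts`, which may paginate and generate slugs differently.

```typescript
// Hypothetical helpers for illustration only; not the real service code.

interface Page<T> {
  items: T[];
  page: number; // 1-based page index
  totalPages: number;
}

// Return one 1-based page of `all`; pages past the end yield an empty list.
function paginate<T>(all: T[], page: number, limit: number): Page<T> {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const start = (page - 1) * limit; // off-by-one bugs usually hide here
  return {
    items: all.slice(start, start + limit),
    page,
    totalPages: Math.ceil(all.length / limit),
  };
}

// Derive a URL slug and append -2, -3, ... until it is not in `taken`.
function uniqueSlug(title: string, taken: Set<string>): string {
  const base = title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  let candidate = base;
  for (let n = 2; taken.has(candidate); n++) {
    candidate = `${base}-${n}`;
  }
  return candidate;
}
```

Tests against the real service would assert the same edges: the first page, the final partial page, a page past the end, and a title whose slug is already taken more than once.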
### GAP-3: admin.guard.ts - Security Guard, No Tests [CRITICAL - Priority 9/10]
This guard determines system admin access by checking workspace ownership. No tests verify it correctly grants/denies admin access.
**Regressions at risk:** Non-admin users gaining admin access, valid admins locked out, missing ForbiddenException.
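
The grant/deny matrix a guard test should cover can be sketched as below. This assumes the rule is "owns at least one workspace", which is only a guess based on the description above; `assertAdmin`, `Workspace`, and the local `ForbiddenException` class are hypothetical stand-ins so the sketch runs without NestJS.

```typescript
// Stand-in for NestJS's ForbiddenException so the sketch has no dependencies.
class ForbiddenException extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ForbiddenException";
  }
}

interface Workspace {
  id: string;
  ownerId: string;
}

// Hypothetical guard rule: a user is an admin if they own any workspace.
function assertAdmin(userId: string | undefined, workspaces: Workspace[]): true {
  if (!userId) {
    throw new ForbiddenException("Authentication required");
  }
  const ownsAny = workspaces.some((w) => w.ownerId === userId);
  if (!ownsAny) {
    throw new ForbiddenException("Admin access required");
  }
  return true;
}

// Helper: true only if `fn` throws a ForbiddenException.
function isForbidden(fn: () => unknown): boolean {
  try {
    fn();
    return false;
  } catch (e) {
    return e instanceof Error && e.name === "ForbiddenException";
  }
}

const workspaces: Workspace[] = [{ id: "ws-1", ownerId: "alice" }];

const ownerAllowed = assertAdmin("alice", workspaces); // grants access
const memberDenied = isForbidden(() => assertAdmin("bob", workspaces));
const anonymousDenied = isForbidden(() => assertAdmin(undefined, workspaces));
const noWorkspacesDenied = isForbidden(() => assertAdmin("alice", []));
```

Each of the three regressions listed above maps to one case here: a non-owner must be rejected, an owner must be admitted, and the rejection must surface as a `ForbiddenException` rather than a silent `false` or a generic error.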
### GAP-4: embeddings.service.ts - 249 Lines, Raw SQL, No Tests [CRITICAL - Priority 9/10]
Uses raw SQL for pgvector operations. No tests exist for embedding validation, vector SQL construction, or similarity search.
**Regressions at risk:** SQL injection through embedding data, invalid vector dimensions, wrong search results.
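
One way to make these risks testable is to validate the embedding before it reaches SQL and to pass it as a bound parameter. The sketch below is an assumption-laden illustration: `toVectorLiteral`, `buildSimilarityQuery`, the `entries` table, the `embedding` column, and the 3-dimension default are all hypothetical. Only the pgvector text format (`[x,y,z]`) and its cosine-distance operator `<=>` are taken from pgvector itself.

```typescript
// Hypothetical dimension; the real value depends on the embedding model.
const EXPECTED_DIMENSIONS = 3;

// Validate an embedding and render it as a pgvector text literal.
function toVectorLiteral(
  embedding: unknown,
  dims: number = EXPECTED_DIMENSIONS,
): string {
  if (!Array.isArray(embedding)) {
    throw new TypeError("embedding must be an array");
  }
  if (embedding.length !== dims) {
    throw new RangeError(`expected ${dims} dimensions, got ${embedding.length}`);
  }
  for (const value of embedding) {
    // Rejects strings, NaN, and Infinity, so no caller-controlled text
    // can end up inside the literal.
    if (typeof value !== "number" || !Number.isFinite(value)) {
      throw new TypeError("embedding must contain only finite numbers");
    }
  }
  return `[${embedding.join(",")}]`;
}

// Build a parameterized similarity query: the vector travels as bound
// parameter $1 and is cast to vector, never concatenated into the SQL text.
function buildSimilarityQuery(
  embedding: unknown,
  limit: number,
): { text: string; values: [string, number] } {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  return {
    text: "SELECT id FROM entries ORDER BY embedding <=> $1::vector LIMIT $2",
    values: [toVectorLiteral(embedding), limit],
  };
}
```

Unit tests can then assert three things without a database: malformed input is rejected, the literal has the expected shape, and the SQL text never changes with the input.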
### GAP-5: widget-data.service.ts - 695 Lines, No Tests [HIGH - Priority 8/10]
Second-largest untested file. Fetches data from multiple sources for dashboard widgets.
### GAP-6: ideas.service.ts - 321 Lines, No Tests [HIGH - Priority 8/10]
User-facing CRUD feature with domain/project associations and activity logging.

---
## Untested Files by Workspace
### apps/api - 21 Untested Service/Controller Files
| File | Lines | Risk |
| --------------------------------------------- | ----- | -------- |
| knowledge/knowledge.service.ts | 916 | HIGH |
| widgets/widget-data.service.ts | 695 | HIGH |
| ideas/ideas.service.ts | 321 | HIGH |
| database/embeddings.service.ts | 249 | HIGH |
| ideas/ideas.controller.ts | 123 | MEDIUM |
| widgets/widgets.controller.ts | 129 | MEDIUM |
| widgets/widgets.service.ts | 59 | MEDIUM |
| users/preferences.service.ts | 99 | MEDIUM |
| users/preferences.controller.ts | 56 | MEDIUM |
| common/throttler/throttler-storage.service.ts | 80+ | MEDIUM |
| auth/guards/admin.guard.ts | 46 | SECURITY |
| federation/audit.service.ts | 80+ | LOW |
| common/throttler/throttler-api-key.guard.ts | - | MEDIUM |
| knowledge/import-export.controller.ts | - | MEDIUM |
| knowledge/knowledge.controller.ts | - | MEDIUM |
| knowledge/stats.controller.ts | - | LOW |
| knowledge/queues/embedding-queue.service.ts | - | MEDIUM |
| layouts/layouts.controller.ts | - | LOW |
| cron/cron.controller.ts | - | LOW |
| bridge/parser/command-parser.service.ts | - | MEDIUM |
| app.service.ts                                | -     | LOW      |

Additionally, 22 DTO directories lack validation tests.
### apps/web - 76 Untested Component/Page Files
**Critical pages (user-facing routes):**
- Main dashboard page.tsx
- Calendar page
- Knowledge page + 5 sub-pages
- Federation connections + 2 sub-pages
- Settings (4 sub-pages)

**Critical components:**
- Chat system: Chat.tsx, ChatInput.tsx, MessageList.tsx, ConversationSidebar.tsx, BackendStatusBanner.tsx
- Dashboard widgets: DomainOverview, QuickCapture, RecentTasks, UpcomingEvents
- HUD system: HUD.tsx, WidgetGrid.tsx, WidgetRenderer.tsx, WidgetWrapper.tsx
- Knowledge: EntryCard, EntryList, EntryViewer, EntryFilters, VersionHistory, ImportExportActions, StatsDashboard
- Navigation.tsx
- Workspace: WorkspaceCard, WorkspaceSettings, MemberList, InviteMember
- Team: TeamCard, TeamMemberList, TeamSettings

**Untested API client modules (11 files; 10 named here):**
- chat.ts, domains.ts, events.ts, federation.ts, ideas.ts, knowledge.ts, personalities.ts, teams.ts, api.ts, auth-client.ts

**Untested hooks:** useChat.ts, useLayout.ts
### apps/orchestrator - 1 Untested File
- health/health.service.ts (minimal risk)
### packages/ui - 9 Untested Components
- Avatar, Badge, Card, Input, Modal, Select, Textarea, Toast (8 of the 9 are named here; only Button is tested)
---
## 23 Skipped Tests (apps/web)
| File | Count | Reason |
| --------------------------- | ----- | -------------------------------------------------------- |
| CalendarWidget.test.tsx | 5 | Component migrated from setTimeout mock data to real API |
| TasksWidget.test.tsx | 6 | Same - setTimeout mock data mismatch |
| QuickCaptureWidget.test.tsx | 3 | Submit and keyboard shortcut tests |
| LinkAutocomplete.test.tsx   | 9     | Debounce search, keyboard nav, link insertion, dropdown  |

**Action:** Re-enable and update tests to match current component implementations.

---
## Test Anti-Patterns Found
### Placeholder Assertions (expect(true).toBe(true))
| File | Line | Context |
| ----------------------------------- | -------- | ----------------------- |
| ChatOverlay.test.tsx | 259, 267 | Responsive design tests |
| rejection-handler.service.spec.ts | 307 | Notification sending |
| semantic-search.integration.spec.ts | 122      | Conditional branch      |

**Impact:** Tests always pass, provide zero regression protection.
### Sole toBeDefined() Assertions (30+ instances)
Most concerning in:
- `llm-telemetry.decorator.spec.ts` -- 6 tests verify decorator doesn't throw but never check span attributes
- `federation/query.service.spec.ts` -- 8 tests
- `federation/query.controller.spec.ts` -- 3 tests
- `layouts.service.spec.ts` -- 2 tests
- `workspace-settings.service.spec.ts` -- 1 test

**Impact:** Tests verify existence but not correctness. Regressions slip through.
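
A small sketch of why placeholder and existence-only assertions give no protection, using a deliberately buggy hypothetical function (`total` is invented for this illustration and does not come from the codebase). The three checks stand in for `expect(true).toBe(true)`, a sole `toBeDefined()`, and a value assertion.

```typescript
// Hypothetical function with a deliberate bug: it should sum the numbers
// but returns how many there are instead.
function total(values: number[]): number {
  return values.length; // BUG: should be values.reduce((a, b) => a + b, 0)
}

const result = total([10, 20, 30]);

// Equivalent of expect(true).toBe(true): never looks at the result at all.
const placeholderPasses: boolean = true;

// Equivalent of a sole toBeDefined(): satisfied by any returned value.
const existenceOnlyPasses: boolean = result !== undefined;

// Behavioral check: compares against the value the caller relies on.
const behavioralPasses: boolean = result === 60;

console.log({ placeholderPasses, existenceOnlyPasses, behavioralPasses });
// prints { placeholderPasses: true, existenceOnlyPasses: true, behavioralPasses: false }
```

Only the behavioral check notices the bug. Strengthening the flagged specs means replacing each placeholder or `toBeDefined()` with an assertion on the specific value, call, or side effect the test name promises.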
### Testing Implementation Details Instead of Behavior
- `cors.spec.ts` -- Tests CORS by asserting on JS objects, not actual HTTP headers/middleware
- `Button.test.tsx` -- Asserts on CSS class names (`bg-blue-600`) instead of behavior

**Impact:** Tests break on implementation changes even when behavior is correct.
### Potential Flaky Patterns
setTimeout-based timing in 5 test files:
- `runner-jobs.service.spec.ts:620,833`
- `semantic-search.integration.spec.ts:153`
- `mcp/stdio-transport.spec.ts` (6 instances)
- `coordinator-integration.service.concurrency.spec.ts:170`
- `health.controller.spec.ts:63` (1100ms wait)
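
Real-time waits like the ones listed above can usually be replaced by controlling the clock from the test. The following is a minimal sketch of that idea with a hand-rolled fake scheduler; `Scheduler`, `FakeScheduler`, and `HealthCache` are hypothetical names and do not reflect how the flagged specs or services are actually written.

```typescript
// A scheduler abstraction so tests can control time instead of sleeping.
interface Scheduler {
  setTimeout(fn: () => void, ms: number): void;
}

// Fake scheduler: callbacks fire only when advance() moves the clock
// to or past their due time. Minimal on purpose: callbacks scheduled
// during advance() wait for the next advance() call.
class FakeScheduler implements Scheduler {
  private now = 0;
  private queue: { due: number; fn: () => void }[] = [];

  setTimeout(fn: () => void, ms: number): void {
    this.queue.push({ due: this.now + ms, fn });
  }

  advance(ms: number): void {
    this.now += ms;
    const ready = this.queue
      .filter((t) => t.due <= this.now)
      .sort((a, b) => a.due - b.due);
    this.queue = this.queue.filter((t) => t.due > this.now);
    for (const t of ready) {
      t.fn();
    }
  }
}

// Hypothetical unit under test: marks itself stale after a TTL.
class HealthCache {
  stale = false;

  constructor(scheduler: Scheduler, ttlMs: number) {
    scheduler.setTimeout(() => {
      this.stale = true;
    }, ttlMs);
  }
}

const scheduler = new FakeScheduler();
const cache = new HealthCache(scheduler, 1000);

scheduler.advance(999);
const staleBeforeTtl = cache.stale; // false: TTL has not elapsed yet

scheduler.advance(1);
const staleAfterTtl = cache.stale; // true: clock reached 1000 ms
```

The test runs instantly and cannot race the event loop. Where injecting a scheduler is impractical, the fake-timer utilities shipped with Jest (`jest.useFakeTimers()`) and Vitest (`vi.useFakeTimers()`) serve the same purpose.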
---
## Missing Test Categories
### No Playwright E2E Tests
The project documents Playwright as the E2E framework but no playwright.config.ts or E2E test files exist.
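
For reference, a starting configuration might look like the sketch below. The file location, the `./e2e` directory, the port, and the `pnpm dev` command are assumptions made for illustration and would need to match the actual web app setup; the option names themselves are standard Playwright Test configuration fields.

```typescript
// playwright.config.ts (hypothetical location: apps/web)
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./e2e", // assumed directory for E2E specs
  retries: process.env.CI ? 2 : 0, // retry only in CI
  use: {
    baseURL: "http://localhost:3000", // assumed dev-server address
    trace: "on-first-retry", // capture a trace when a test is retried
  },
  projects: [{ name: "chromium", use: { ...devices["Desktop Chrome"] } }],
  webServer: {
    command: "pnpm dev", // assumed start command
    url: "http://localhost:3000",
    reuseExistingServer: !process.env.CI,
  },
});
```

A first smoke suite could then cover the critical pages listed earlier (dashboard, calendar, knowledge, settings) with one navigation-and-render check each.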
### No DTO Validation Tests
22 DTO directories lack validation testing. DTOs define input validation rules via class-validator decorators, but these are never tested in isolation.
### Limited Integration Tests
Only 8 integration test files exist across the entire codebase. Most module interactions are untested.

---
## Recommended Test Additions (Priority Order)
| Priority | Item | Effort | Impact |
| -------- | ------------------------------------ | ------ | -------------------------------- |
| P0 | Investigate coordinator 16% coverage | 2hr | Unblocks all coordinator testing |
| P0 | knowledge.service.ts unit tests | 4hr | Covers largest untested service |
| P0 | admin.guard.ts unit tests | 1hr | Security-critical |
| P0 | embeddings.service.ts unit tests | 2hr | Raw SQL validation |
| P1 | widget-data.service.ts unit tests | 3hr | Dashboard reliability |
| P1 | ideas.service.ts unit tests | 2hr | User-facing CRUD |
| P1       | Re-enable 23 skipped web tests       | 2hr    | Immediate coverage gain          |
| P1 | Replace placeholder assertions | 1hr | Fix false-positive tests |
| P2 | Chat system component tests | 3hr | Core user interaction |
| P2 | API client module tests (11 files) | 3hr | Request/response validation |
| P2 | Throttler storage tests | 2hr | Security infrastructure |
| P2 | Preferences service tests | 1hr | User settings |
| P3 | Strengthen toBeDefined-only tests | 2hr | Better regression detection |
| P3 | UI package component tests | 3hr | Design system reliability |
| P3       | Playwright E2E setup + smoke tests   | 4hr    | End-to-end confidence            |

**Estimated total effort: ~5-6 days for P0+P1 items** (the P0 and P1 line items above sum to 17 hours; all items sum to 35 hours)

---
## Positive Test Observations
1. **Orchestrator is exemplary** -- 452 tests, near-complete coverage, behavioral testing, good mocking
2. **Federation security tests are thorough** -- Crypto, signature, timeout, workspace access, capability guard
3. **API client test (web) is comprehensive** -- 721 lines covering error handling, retries, auth
4. **Sanitization utilities well-tested** -- XSS prevention, log sanitization, query builder
5. **Coverage thresholds enforced** -- 85% on orchestrator and web components/lib
6. **Concurrency tests exist** -- coordinator-integration and runner-jobs
7. **Good test infrastructure** -- Shared fixtures, proper NestJS testing module usage