Mosaic Stack - QA & Test Coverage Report
Date: 2026-02-05
Scope: All workspaces (api, web, orchestrator, coordinator, packages)
Total Test Files: 552 | Total Test Cases: ~3,685
Overall Test Health
| Workspace | Tests | Files | Coverage | Grade | Key Issue |
|---|---|---|---|---|---|
| apps/orchestrator | ~452 | 19 | 85% enforced | A | Near-complete, well-structured |
| apps/api | ~2,174 | 143 | Not enforced | B- | 21 untested services, weak assertions |
| apps/web | ~555 | 51 | 85% on components/lib | C+ | 76 untested components, 23 skipped |
| apps/coordinator | ~504 | 23 | 16% reported | D | Coverage crisis despite test files |
| packages/shared | ~25 | 1 | N/A | B+ | Adequate for scope |
| packages/ui | ~15 | 1 | N/A | D+ | 9 of 10 components untested |
Critical Coverage Gaps
GAP-1: Coordinator 16% Line Coverage [CRITICAL - Priority 10/10]
Despite having 23 test files and ~504 test cases, the coordinator reports only 16% line coverage with 14 of 22 source files at 0% execution. Files at 0% include the core coordinator.py, queue.py, webhook.py, security.py, parser.py, and metrics.py.
Root Cause (likely): Either the tests import types/models but mock everything else, so the source code under test never executes, or the coverage run executes only a subset of the tests.
Action: Run `cd apps/coordinator && python -m pytest tests/ -v --cov=src --cov-report=term-missing` and use the missing-line report to determine which of the two causes applies.
GAP-2: knowledge.service.ts - 916 Lines, No Tests [CRITICAL - Priority 9/10]
The largest service file in the API has no direct unit tests. Core CRUD operations, pagination, filtering, slug generation, cache invalidation, and embedding queue integration are all untested. Only version-specific tests exist.
Regressions at risk: Pagination off-by-one, slug collision handling, stale cache after updates, embedding queue not triggered.
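The pagination and slug-collision risks are easiest to pin down when the arithmetic lives in small pure functions. The sketch below is illustrative only: `pageToOffset` and `uniqueSlug` are hypothetical helpers, not functions from knowledge.service.ts, and the `-2`, `-3` suffix scheme is an assumption about how collisions might be resolved.

```typescript
// Hypothetical helpers for illustration; not taken from knowledge.service.ts.

/** Convert a 1-based page number into a zero-based row offset. */
function pageToOffset(page: number, limit: number): number {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be an integer >= 1");
  }
  return (page - 1) * limit;
}

/** Slugify a title, then append -2, -3, ... until the slug is not already taken. */
function uniqueSlug(title: string, taken: ReadonlySet<string>): string {
  const base =
    title
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics
      .replace(/^-+|-+$/g, "") || "entry"; // trim hyphens; fall back if empty
  if (!taken.has(base)) {
    return base;
  }
  let suffix = 2;
  while (taken.has(`${base}-${suffix}`)) {
    suffix += 1;
  }
  return `${base}-${suffix}`;
}
```

A unit test for the real service would assert the same boundaries: the first page starts at offset 0, and a second entry with an identical title receives a distinct slug.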
GAP-3: admin.guard.ts - Security Guard, No Tests [CRITICAL - Priority 9/10]
This guard determines system admin access by checking workspace ownership. No tests verify it correctly grants/denies admin access.
Regressions at risk: Non-admin users gaining admin access, valid admins locked out, missing ForbiddenException.
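A spec for this guard would need at least three cases: grant, deny, and unauthenticated. The sketch below shows those cases without depending on NestJS. Everything in it is a stand-in: `AdminGuardSketch`, `OwnershipLookup`, and `ForbiddenError` (in place of NestJS's `ForbiddenException`) are hypothetical, and the real guard in auth/guards/admin.guard.ts may be structured differently.

```typescript
// Stand-ins so the sketch runs without NestJS; names are hypothetical.
class ForbiddenError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ForbiddenError";
  }
}

interface OwnershipLookup {
  ownsAnyWorkspace(userId: string): boolean;
}

class AdminGuardSketch {
  constructor(private readonly lookup: OwnershipLookup) {}

  /** Returns true for workspace owners; throws ForbiddenError otherwise. */
  canActivate(userId: string | undefined): boolean {
    if (!userId || !this.lookup.ownsAnyWorkspace(userId)) {
      throw new ForbiddenError("Admin access required");
    }
    return true;
  }
}

// Fake ownership data replaces the database lookup.
const owners = new Set<string>(["owner-1"]);
const guard = new AdminGuardSketch({
  ownsAnyWorkspace: (id: string) => owners.has(id),
});

/** True only if the guard rejects the user with a ForbiddenError. */
function denies(userId: string | undefined): boolean {
  try {
    guard.canActivate(userId);
    return false;
  } catch (err) {
    return err instanceof Error && err.name === "ForbiddenError";
  }
}

const guardResults = {
  ownerGranted: guard.canActivate("owner-1"),
  strangerDenied: denies("user-2"),
  anonymousDenied: denies(undefined),
};
```

Checking the error type, and not merely that something was thrown, is what covers the "missing ForbiddenException" regression.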
GAP-4: embeddings.service.ts - 249 Lines, Raw SQL, No Tests [CRITICAL - Priority 9/10]
Uses raw SQL for pgvector operations. No tests exist for embedding validation, vector SQL construction, or similarity search.
Regressions at risk: SQL injection through embedding data, invalid vector dimensions, wrong search results.
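One way to make these risks testable is to validate the embedding before it reaches SQL and to pass it as a bound parameter. The following is a minimal sketch under stated assumptions: `toVectorLiteral` and `buildSimilarityQuery` are hypothetical helpers, the table and column names are invented for illustration, and the expected dimension is a parameter because the report does not state it. The `[x,y,z]` text form and the `<=>` cosine-distance operator are standard pgvector.

```typescript
// Hypothetical helpers for illustration; not taken from embeddings.service.ts.

/** Validate an embedding and serialize it in pgvector's text form, e.g. "[0.5,1,-2]". */
function toVectorLiteral(embedding: readonly number[], expectedDim: number): string {
  if (embedding.length !== expectedDim) {
    throw new RangeError(`expected ${expectedDim} dimensions, got ${embedding.length}`);
  }
  for (const value of embedding) {
    // Runtime type check guards against untyped callers passing strings.
    if (typeof value !== "number" || !Number.isFinite(value)) {
      throw new TypeError("embedding must contain only finite numbers");
    }
  }
  return `[${embedding.join(",")}]`;
}

interface ParameterizedQuery {
  text: string;
  values: Array<string | number>;
}

/** Build a similarity query with bound parameters instead of string interpolation. */
function buildSimilarityQuery(
  embedding: readonly number[],
  dim: number,
  limit: number,
): ParameterizedQuery {
  return {
    // Table and column names are assumptions made for this sketch.
    text: "SELECT id FROM knowledge_embeddings ORDER BY embedding <=> $1::vector LIMIT $2",
    values: [toVectorLiteral(embedding, dim), limit],
  };
}
```

With this shape, unit tests can assert that malformed vectors are rejected and that the SQL text never contains embedding data, without needing a live database.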
GAP-5: widget-data.service.ts - 695 Lines, No Tests [HIGH - Priority 8/10]
Second-largest untested file. Fetches data from multiple sources for dashboard widgets.
GAP-6: ideas.service.ts - 321 Lines, No Tests [HIGH - Priority 8/10]
User-facing CRUD feature with domain/project associations and activity logging.
Untested Files by Workspace
apps/api - 21 Untested Service/Controller Files
| File | Lines | Risk |
|---|---|---|
| knowledge/knowledge.service.ts | 916 | HIGH |
| widgets/widget-data.service.ts | 695 | HIGH |
| ideas/ideas.service.ts | 321 | HIGH |
| database/embeddings.service.ts | 249 | HIGH |
| ideas/ideas.controller.ts | 123 | MEDIUM |
| widgets/widgets.controller.ts | 129 | MEDIUM |
| widgets/widgets.service.ts | 59 | MEDIUM |
| users/preferences.service.ts | 99 | MEDIUM |
| users/preferences.controller.ts | 56 | MEDIUM |
| common/throttler/throttler-storage.service.ts | 80+ | MEDIUM |
| auth/guards/admin.guard.ts | 46 | SECURITY |
| federation/audit.service.ts | 80+ | LOW |
| common/throttler/throttler-api-key.guard.ts | - | MEDIUM |
| knowledge/import-export.controller.ts | - | MEDIUM |
| knowledge/knowledge.controller.ts | - | MEDIUM |
| knowledge/stats.controller.ts | - | LOW |
| knowledge/queues/embedding-queue.service.ts | - | MEDIUM |
| layouts/layouts.controller.ts | - | LOW |
| cron/cron.controller.ts | - | LOW |
| bridge/parser/command-parser.service.ts | - | MEDIUM |
| app.service.ts | - | LOW |
Additionally, 22 DTO directories lack validation tests.
apps/web - 76 Untested Component/Page Files
Critical pages (user-facing routes):
- Main dashboard page.tsx
- Calendar page
- Knowledge page + 5 sub-pages
- Federation connections + 2 sub-pages
- Settings (4 sub-pages)
Critical components:
- Chat system: Chat.tsx, ChatInput.tsx, MessageList.tsx, ConversationSidebar.tsx, BackendStatusBanner.tsx
- Dashboard widgets: DomainOverview, QuickCapture, RecentTasks, UpcomingEvents
- HUD system: HUD.tsx, WidgetGrid.tsx, WidgetRenderer.tsx, WidgetWrapper.tsx
- Knowledge: EntryCard, EntryList, EntryViewer, EntryFilters, VersionHistory, ImportExportActions, StatsDashboard
- Navigation.tsx
- Workspace: WorkspaceCard, WorkspaceSettings, MemberList, InviteMember
- Team: TeamCard, TeamMemberList, TeamSettings
Untested API client modules (11 files):
- chat.ts, domains.ts, events.ts, federation.ts, ideas.ts, knowledge.ts, personalities.ts, teams.ts, api.ts, auth-client.ts
Untested hooks: useChat.ts, useLayout.ts
apps/orchestrator - 1 Untested File
- health/health.service.ts (minimal risk)
packages/ui - 9 Untested Components
- Avatar, Badge, Card, Input, Modal, Select, Textarea, Toast (only Button tested)
23 Skipped Tests (apps/web)
| File | Count | Reason |
|---|---|---|
| CalendarWidget.test.tsx | 5 | Component migrated from setTimeout mock data to real API |
| TasksWidget.test.tsx | 6 | Same - setTimeout mock data mismatch |
| QuickCaptureWidget.test.tsx | 3 | Submit and keyboard shortcut tests |
| LinkAutocomplete.test.tsx | 9 | Debounce search, keyboard nav, link insertion, dropdown |
Action: Re-enable and update tests to match current component implementations.
Test Anti-Patterns Found
Placeholder Assertions (expect(true).toBe(true))
| File | Line | Context |
|---|---|---|
| ChatOverlay.test.tsx | 259, 267 | Responsive design tests |
| rejection-handler.service.spec.ts | 307 | Notification sending |
| semantic-search.integration.spec.ts | 122 | Conditional branch |
Impact: These tests always pass regardless of what the code does, so they provide no regression protection.
Sole toBeDefined() Assertions (30+ instances)
Most concerning in:
- llm-telemetry.decorator.spec.ts -- 6 tests verify decorator doesn't throw but never check span attributes
- federation/query.service.spec.ts -- 8 tests
- federation/query.controller.spec.ts -- 3 tests
- layouts.service.spec.ts -- 2 tests
- workspace-settings.service.spec.ts -- 1 test
Impact: Tests verify existence but not correctness. Regressions slip through.
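The difference between an existence check and a value check can be shown without a test framework. In this sketch, `takeFirst` is a hypothetical function with a deliberate off-by-one bug. The two boolean checks mirror what `expect(result).toBeDefined()` and `expect(result).toEqual([1, 2])` would verify.

```typescript
// Hypothetical function under test with a deliberate off-by-one bug.
function takeFirst<T>(items: readonly T[], limit: number): T[] {
  return items.slice(0, limit + 1); // bug: should be items.slice(0, limit)
}

const result = takeFirst([1, 2, 3], 2);

// Existence check, the analogue of toBeDefined(): passes despite the bug.
const existenceCheckPasses = result !== undefined;

// Value check, the analogue of toEqual([1, 2]): fails, exposing the bug.
const valueCheckPasses =
  result.length === 2 && result[0] === 1 && result[1] === 2;
```

Here `existenceCheckPasses` is true while `valueCheckPasses` is false, which is the gap a `toBeDefined()`-only test leaves open.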
Testing Implementation Details Instead of Behavior
- cors.spec.ts -- Tests CORS by asserting on JS objects, not actual HTTP headers/middleware
- Button.test.tsx -- Asserts on CSS class names (bg-blue-600) instead of behavior
Impact: Tests break on implementation changes even when behavior is correct.
Potential Flaky Patterns
setTimeout-based timing in 5 test files:
- runner-jobs.service.spec.ts:620,833
- semantic-search.integration.spec.ts:153
- mcp/stdio-transport.spec.ts (6 instances)
- coordinator-integration.service.concurrency.spec.ts:170
- health.controller.spec.ts:63 (1100ms wait)
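Real-time waits like these can usually be replaced by a clock the test controls. Test runners such as Jest and Vitest provide fake timers for this (`jest.useFakeTimers()`, `vi.useFakeTimers()`). The framework-free sketch below shows the same idea with an injected clock. `ManualClock` and `HealthCache` are hypothetical names for illustration, not code from the repository, and the clock is deliberately minimal: callbacks scheduled while `advance()` is running fire on a later `advance()` call.

```typescript
type ScheduledTask = { dueAt: number; run: () => void };

/** Minimal manual clock: time only moves when the test calls advance(). */
class ManualClock {
  private now = 0;
  private tasks: ScheduledTask[] = [];

  schedule(run: () => void, delayMs: number): void {
    this.tasks.push({ dueAt: this.now + delayMs, run });
  }

  advance(ms: number): void {
    this.now += ms;
    const due = this.tasks.filter((t) => t.dueAt <= this.now);
    this.tasks = this.tasks.filter((t) => t.dueAt > this.now);
    // Run due callbacks in the order they were meant to fire.
    due.sort((a, b) => a.dueAt - b.dueAt).forEach((t) => t.run());
  }
}

// Hypothetical code under test: marks itself stale once a TTL elapses.
class HealthCache {
  stale = false;

  constructor(clock: ManualClock, ttlMs: number) {
    clock.schedule(() => {
      this.stale = true;
    }, ttlMs);
  }
}

const clock = new ManualClock();
const cache = new HealthCache(clock, 1000);

clock.advance(999);
const staleBeforeTtl = cache.stale; // still fresh 1 ms before the TTL

clock.advance(1);
const staleAfterTtl = cache.stale; // stale exactly at the TTL
```

The test covers a 1000 ms TTL boundary without waiting in real time, so it cannot fail because of scheduler jitter on a loaded CI machine.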
Missing Test Categories
No Playwright E2E Tests
The project documents Playwright as the E2E framework but no playwright.config.ts or E2E test files exist.
No DTO Validation Tests
22 DTO directories lack validation testing. DTOs define input validation rules via class-validator decorators, but these are never tested in isolation.
Limited Integration Tests
Only 8 integration test files exist across the entire codebase. Most module interactions are untested.
Recommended Test Additions (Priority Order)
| Priority | Item | Effort | Impact |
|---|---|---|---|
| P0 | Investigate coordinator 16% coverage | 2hr | Unblocks all coordinator testing |
| P0 | knowledge.service.ts unit tests | 4hr | Covers largest untested service |
| P0 | admin.guard.ts unit tests | 1hr | Security-critical |
| P0 | embeddings.service.ts unit tests | 2hr | Raw SQL validation |
| P1 | widget-data.service.ts unit tests | 3hr | Dashboard reliability |
| P1 | ideas.service.ts unit tests | 2hr | User-facing CRUD |
| P1 | Re-enable 23 skipped widget tests | 2hr | Immediate coverage gain |
| P1 | Replace placeholder assertions | 1hr | Fix false-positive tests |
| P2 | Chat system component tests | 3hr | Core user interaction |
| P2 | API client module tests (11 files) | 3hr | Request/response validation |
| P2 | Throttler storage tests | 2hr | Security infrastructure |
| P2 | Preferences service tests | 1hr | User settings |
| P3 | Strengthen toBeDefined-only tests | 2hr | Better regression detection |
| P3 | UI package component tests | 3hr | Design system reliability |
| P3 | Playwright E2E setup + smoke tests | 4hr | End-to-end confidence |
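Estimated total effort: ~5-6 days for P0+P1 items. Note that the per-item estimates in the table sum to 17 hours for P0+P1 and 35 hours across all priorities, so the day figure should be re-checked against the table.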
Positive Test Observations
- Orchestrator is exemplary -- 452 tests, near-complete coverage, behavioral testing, good mocking
- Federation security tests are thorough -- Crypto, signature, timeout, workspace access, capability guard
- API client test (web) is comprehensive -- 721 lines covering error handling, retries, auth
- Sanitization utilities well-tested -- XSS prevention, log sanitization, query builder
- Coverage thresholds enforced -- 85% on orchestrator and web components/lib
- Concurrency tests exist -- coordinator-integration and runner-jobs
- Good test infrastructure -- Shared fixtures, proper NestJS testing module usage