Mosaic Stack - QA & Test Coverage Report

Date: 2026-02-05
Scope: All workspaces (api, web, orchestrator, coordinator, packages)
Total Test Files: 552 | Total Test Cases: ~3,685

Overall Test Health

Workspace	Tests	Files	Coverage	Grade	Key Issue
apps/orchestrator	~452	19	85% enforced	A	Near-complete, well-structured
apps/api	~2,174	143	Not enforced	B-	21 untested services, weak assertions
apps/web	~555	51	85% on components/lib	C+	76 untested components, 23 skipped
apps/coordinator	~504	23	16% reported	D	Coverage crisis despite test files
packages/shared	~25	1	N/A	B+	Adequate for scope
packages/ui	~15	1	N/A	D+	9 of 10 components untested

Critical Coverage Gaps

GAP-1: Coordinator 16% Line Coverage [CRITICAL - Priority 10/10]

Despite having 23 test files and ~504 test cases, the coordinator reports only 16% line coverage with 14 of 22 source files at 0% execution. Files at 0% include the core coordinator.py, queue.py, webhook.py, security.py, parser.py, and metrics.py.

Root Cause (likely): Either the tests import types/models but mock everything else, so the source under test never executes, or the coverage run executes only a subset of the tests.

Action: Run cd apps/coordinator && python -m pytest tests/ -v --cov=src --cov-report=term-missing, then use the missing-line output to determine which of the two causes applies.

GAP-2: knowledge.service.ts - 916 Lines, No Tests [CRITICAL - Priority 9/10]

The largest service file in the API has no direct unit tests. Core CRUD operations, pagination, filtering, slug generation, cache invalidation, and embedding queue integration are all untested. Only version-specific tests exist.

Regressions at risk: Pagination off-by-one, slug collision handling, stale cache after updates, embedding queue not triggered.
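
The pagination and slug risks above are easiest to pin down when the arithmetic lives in small pure functions. The following is a minimal sketch under stated assumptions: pageToOffset, totalPages, slugify, and uniqueSlug are hypothetical names, and the -2, -3 suffix scheme is an assumed collision policy, since this report does not show how knowledge.service.ts implements either concern.

```typescript
// Hypothetical helpers: knowledge.service.ts may compute these differently.

/** Convert a 1-based page number into a zero-based row offset. */
function pageToOffset(page: number, limit: number): number {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be an integer >= 1");
  }
  return (page - 1) * limit;
}

/** Number of pages needed for a result count; an empty result has zero pages. */
function totalPages(totalItems: number, limit: number): number {
  return Math.ceil(totalItems / limit);
}

/** Lowercase, hyphen-separated slug built from a title. */
function slugify(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters
    .replace(/^-+|-+$/g, ""); // strip leading and trailing hyphens
}

/** Append -2, -3, ... until the slug no longer collides (assumed policy). */
function uniqueSlug(title: string, existing: ReadonlySet<string>): string {
  const base = slugify(title);
  if (!existing.has(base)) return base;
  let suffix = 2;
  while (existing.has(`${base}-${suffix}`)) suffix += 1;
  return `${base}-${suffix}`;
}

// Boundary cases worth asserting in a unit test:
console.log(pageToOffset(1, 20)); // 0
console.log(totalPages(40, 20)); // 2
console.log(totalPages(41, 20)); // 3
console.log(uniqueSlug("Hello World", new Set(["hello-world"]))); // hello-world-2
```

The cases that matter most are the boundaries: the first page, a total that is an exact multiple of the limit, and a title whose slug is already taken more than once.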

GAP-3: admin.guard.ts - Security Guard, No Tests [CRITICAL - Priority 9/10]

This guard determines system admin access by checking workspace ownership. No tests verify it correctly grants/denies admin access.

Regressions at risk: Non-admin users gaining admin access, valid admins locked out, missing ForbiddenException.
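
The three regressions listed above map directly onto three test cases: an owner is granted access, a non-owner is rejected, and the rejection is a ForbiddenException rather than a silent false. The sketch below is an illustration only. The ForbiddenException class is a local stand-in for the NestJS one so the snippet runs without dependencies, and assertSystemAdmin, Workspace, and RequestUser are hypothetical; the rule "owns at least one workspace" is an assumption about what "checking workspace ownership" means in admin.guard.ts.

```typescript
// Local stand-in for the NestJS ForbiddenException (assumption for self-containment).
class ForbiddenException extends Error {
  readonly status = 403;
  constructor(message = "Forbidden") {
    super(message);
    this.name = "ForbiddenException";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical shapes: the guard's real inputs are not shown in this report.
interface Workspace {
  id: string;
  ownerId: string;
}
interface RequestUser {
  id: string;
}

/** Grants access only when the user owns at least one workspace (assumed rule). */
function assertSystemAdmin(
  user: RequestUser | undefined,
  workspaces: readonly Workspace[],
): true {
  if (!user) throw new ForbiddenException("Authentication required");
  const ownsWorkspace = workspaces.some((w) => w.ownerId === user.id);
  if (!ownsWorkspace) throw new ForbiddenException("System admin access required");
  return true;
}

/** Test helper: returns the thrown error, or undefined if nothing was thrown. */
function captureError(fn: () => unknown): unknown {
  try {
    fn();
    return undefined;
  } catch (err) {
    return err;
  }
}

const allWorkspaces: Workspace[] = [{ id: "w1", ownerId: "alice" }];

// Case 1: a valid admin is not locked out.
console.log(assertSystemAdmin({ id: "alice" }, allWorkspaces)); // true

// Case 2 and 3: a non-owner is denied, and the denial is a ForbiddenException.
const denied = captureError(() => assertSystemAdmin({ id: "bob" }, allWorkspaces));
console.log(denied instanceof ForbiddenException); // true
```

In the real spec, the same three cases would be driven through the guard's canActivate method with a mocked execution context.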

GAP-4: embeddings.service.ts - 249 Lines, Raw SQL, No Tests [CRITICAL - Priority 9/10]

Uses raw SQL for pgvector operations. No tests exist for embedding validation, vector SQL construction, or similarity search.

Regressions at risk: SQL injection through embedding data, invalid vector dimensions, wrong search results.
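
One way to make the embedding risks testable is to validate the vector before it reaches SQL and to pass it as a bind parameter. The sketch below assumes pgvector's text input form ("[1,2,3]") and its cosine distance operator (<=>). The function names, the entries table, the embedding column, and the dimension count of 3 are all hypothetical, since this report does not show how embeddings.service.ts builds its queries.

```typescript
// Hypothetical dimension: the report does not state the embedding size in use.
const EXPECTED_DIMENSIONS = 3;

/** Validate an embedding and serialise it to pgvector's text form, e.g. "[0.1,0.2,0.3]". */
function toVectorLiteral(
  embedding: unknown,
  dimensions: number = EXPECTED_DIMENSIONS,
): string {
  if (!Array.isArray(embedding)) {
    throw new TypeError("embedding must be an array");
  }
  if (embedding.length !== dimensions) {
    throw new RangeError(`expected ${dimensions} dimensions, got ${embedding.length}`);
  }
  for (const value of embedding) {
    // Rejects strings, NaN, and Infinity, so no caller-supplied text reaches SQL.
    if (typeof value !== "number" || !Number.isFinite(value)) {
      throw new TypeError("embedding values must be finite numbers");
    }
  }
  return `[${embedding.join(",")}]`;
}

/** Build a parameterised similarity query; values travel as bind parameters. */
function buildSimilarityQuery(
  embedding: unknown,
  limit: number,
): { text: string; values: [string, number] } {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  return {
    // Hypothetical table and column names.
    text: "SELECT id FROM entries ORDER BY embedding <=> $1::vector LIMIT $2",
    values: [toVectorLiteral(embedding), limit],
  };
}

console.log(toVectorLiteral([0.1, 0.2, 0.3])); // [0.1,0.2,0.3]
console.log(buildSimilarityQuery([1, 2, 3], 5).values); // [ '[1,2,3]', 5 ]
```

Unit tests for the real service would then cover at least a wrong dimension count, a non-numeric element, and a check that the generated SQL text never contains the embedding values themselves.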

GAP-5: widget-data.service.ts - 695 Lines, No Tests [HIGH]

Second-largest untested file. Fetches data from multiple sources for dashboard widgets.

GAP-6: ideas.service.ts - 321 Lines, No Tests [HIGH - Priority 8/10]

User-facing CRUD feature with domain/project associations and activity logging.

Untested Files by Workspace

apps/api - 21 Untested Service/Controller Files

File	Lines	Risk
knowledge/knowledge.service.ts	916	HIGH
widgets/widget-data.service.ts	695	HIGH
ideas/ideas.service.ts	321	HIGH
database/embeddings.service.ts	249	HIGH
ideas/ideas.controller.ts	123	MEDIUM
widgets/widgets.controller.ts	129	MEDIUM
widgets/widgets.service.ts	59	MEDIUM
users/preferences.service.ts	99	MEDIUM
users/preferences.controller.ts	56	MEDIUM
common/throttler/throttler-storage.service.ts	80+	MEDIUM
auth/guards/admin.guard.ts	46	SECURITY
federation/audit.service.ts	80+	LOW
common/throttler/throttler-api-key.guard.ts	-	MEDIUM
knowledge/import-export.controller.ts	-	MEDIUM
knowledge/knowledge.controller.ts	-	MEDIUM
knowledge/stats.controller.ts	-	LOW
knowledge/queues/embedding-queue.service.ts	-	MEDIUM
layouts/layouts.controller.ts	-	LOW
cron/cron.controller.ts	-	LOW
bridge/parser/command-parser.service.ts	-	MEDIUM
app.service.ts	-	LOW

Additionally, 22 DTO directories lack validation tests.

apps/web - 76 Untested Component/Page Files

Critical pages (user-facing routes):

Main dashboard page.tsx
Calendar page
Knowledge page + 5 sub-pages
Federation connections + 2 sub-pages
Settings (4 sub-pages)

Critical components:

Chat system: Chat.tsx, ChatInput.tsx, MessageList.tsx, ConversationSidebar.tsx, BackendStatusBanner.tsx
Dashboard widgets: DomainOverview, QuickCapture, RecentTasks, UpcomingEvents
HUD system: HUD.tsx, WidgetGrid.tsx, WidgetRenderer.tsx, WidgetWrapper.tsx
Knowledge: EntryCard, EntryList, EntryViewer, EntryFilters, VersionHistory, ImportExportActions, StatsDashboard
Navigation.tsx
Workspace: WorkspaceCard, WorkspaceSettings, MemberList, InviteMember
Team: TeamCard, TeamMemberList, TeamSettings

Untested API client modules (11 files reported; 10 are named here):

chat.ts, domains.ts, events.ts, federation.ts, ideas.ts, knowledge.ts, personalities.ts, teams.ts, api.ts, auth-client.ts

Untested hooks: useChat.ts, useLayout.ts

apps/orchestrator - 1 Untested File

health/health.service.ts (minimal risk)

packages/ui - 9 Untested Components

Avatar, Badge, Card, Input, Modal, Select, Textarea, Toast (only Button is tested; the ninth untested component is not named)

23 Skipped Tests (apps/web)

File	Count	Reason
CalendarWidget.test.tsx	5	Component migrated from setTimeout mock data to real API
TasksWidget.test.tsx	6	Same - setTimeout mock data mismatch
QuickCaptureWidget.test.tsx	3	Submit and keyboard shortcut tests
LinkAutocomplete.test.tsx	9	Debounce search, keyboard nav, link insertion, dropdown

Action: Re-enable and update tests to match current component implementations.

Test Anti-Patterns Found

Placeholder Assertions (expect(true).toBe(true))

File	Line	Context
ChatOverlay.test.tsx	259, 267	Responsive design tests
rejection-handler.service.spec.ts	307	Notification sending
semantic-search.integration.spec.ts	122	Conditional branch

Impact: These tests always pass and provide zero regression protection.

Sole toBeDefined() Assertions (30+ instances)

Most concerning in:

llm-telemetry.decorator.spec.ts -- 6 tests verify decorator doesn't throw but never check span attributes
federation/query.service.spec.ts -- 8 tests
federation/query.controller.spec.ts -- 3 tests
layouts.service.spec.ts -- 2 tests
workspace-settings.service.spec.ts -- 1 test

Impact: Tests verify existence but not correctness. Regressions slip through.
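
The gap between an existence check and a correctness check can be shown without a test framework. In the sketch below, buildLlmSpan is a hypothetical function with a deliberate regression (a misspelled attribute key), and the attribute names are invented for illustration. weakCheck mirrors what a sole toBeDefined() assertion verifies, and the same reasoning applies to expect(true).toBe(true), which verifies even less.

```typescript
interface Span {
  name: string;
  attributes: Record<string, string | number>;
}

// Hypothetical function under test, with a deliberate regression:
// the model attribute is written under a misspelled key.
function buildLlmSpan(model: string, promptTokens: number): Span {
  return {
    name: "llm.call",
    attributes: { "llm.modle": model, "llm.prompt_tokens": promptTokens },
  };
}

/** Equivalent of a sole toBeDefined() assertion: passes for any non-undefined value. */
function weakCheck(span: Span | undefined): boolean {
  return span !== undefined;
}

/** Behavioural check: verifies the attributes a consumer actually relies on. */
function strongCheck(span: Span, model: string, promptTokens: number): boolean {
  return (
    span.name === "llm.call" &&
    span.attributes["llm.model"] === model &&
    span.attributes["llm.prompt_tokens"] === promptTokens
  );
}

const exampleSpan = buildLlmSpan("example-model", 42);
console.log(weakCheck(exampleSpan)); // true: the regression goes unnoticed
console.log(strongCheck(exampleSpan, "example-model", 42)); // false: the regression is caught
```

In the affected specs, the equivalent change is to replace each lone toBeDefined() with assertions on the returned values or recorded calls.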

Testing Implementation Details Instead of Behavior

cors.spec.ts -- Tests CORS by asserting on JS objects, not actual HTTP headers/middleware
Button.test.tsx -- Asserts on CSS class names (bg-blue-600) instead of behavior

Impact: Tests break on implementation changes even when behavior is correct.

Potential Flaky Patterns

setTimeout-based timing in 5 test files:

runner-jobs.service.spec.ts:620,833
semantic-search.integration.spec.ts:153
mcp/stdio-transport.spec.ts (6 instances)
coordinator-integration.service.concurrency.spec.ts:170
health.controller.spec.ts:63 (1100ms wait)
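
A common way to remove real waits such as the 1100ms one above is to inject the clock, so the test advances time explicitly instead of sleeping. This is a sketch of that technique only: UptimeTracker and createFakeClock are hypothetical, and the report does not show what health.controller.spec.ts actually waits for. A test framework's fake timers would be an alternative with the same goal.

```typescript
/** Clock abstraction so tests control time instead of sleeping. */
type Clock = () => number;

// Hypothetical uptime tracker; the real health controller is not shown in this report.
class UptimeTracker {
  private readonly now: Clock;
  private readonly startedAt: number;

  constructor(now: Clock = Date.now) {
    this.now = now;
    this.startedAt = now();
  }

  /** Whole seconds elapsed since construction. */
  uptimeSeconds(): number {
    return Math.floor((this.now() - this.startedAt) / 1000);
  }
}

/** Manually advanced clock for tests. */
function createFakeClock(startMs = 0): { now: Clock; advance: (ms: number) => void } {
  let current = startMs;
  return {
    now: () => current,
    advance: (ms: number) => {
      current += ms;
    },
  };
}

const fakeClock = createFakeClock();
const tracker = new UptimeTracker(fakeClock.now);
console.log(tracker.uptimeSeconds()); // 0
fakeClock.advance(1100); // stands in for a real 1100ms setTimeout wait
console.log(tracker.uptimeSeconds()); // 1
```

The test becomes deterministic and runs in microseconds, and production code keeps using Date.now through the default argument.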

Missing Test Categories

No Playwright E2E Tests

The project documents Playwright as the E2E framework but no playwright.config.ts or E2E test files exist.

No DTO Validation Tests

22 DTO directories lack validation testing. DTOs define input validation rules via class-validator decorators, but these are never tested in isolation.

Limited Integration Tests

Only 8 integration test files exist across the entire codebase. Most module interactions are untested.

Recommended Test Additions (Priority Order)

Priority	Item	Effort	Impact
P0	Investigate coordinator 16% coverage	2hr	Unblocks all coordinator testing
P0	knowledge.service.ts unit tests	4hr	Covers largest untested service
P0	admin.guard.ts unit tests	1hr	Security-critical
P0	embeddings.service.ts unit tests	2hr	Raw SQL validation
P1	widget-data.service.ts unit tests	3hr	Dashboard reliability
P1	ideas.service.ts unit tests	2hr	User-facing CRUD
P1	Re-enable 23 skipped widget tests	2hr	Immediate coverage gain
P1	Replace placeholder assertions	1hr	Fix false-positive tests
P2	Chat system component tests	3hr	Core user interaction
P2	API client module tests (11 files)	3hr	Request/response validation
P2	Throttler storage tests	2hr	Security infrastructure
P2	Preferences service tests	1hr	User settings
P3	Strengthen toBeDefined-only tests	2hr	Better regression detection
P3	UI package component tests	3hr	Design system reliability
P3	Playwright E2E setup + smoke tests	4hr	End-to-end confidence

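Estimated total effort: ~5-6 days for P0+P1 items. The per-item estimates in the table sum to 17 hours for P0+P1 and 35 hours across all priorities, so the day figure appears to include overhead beyond the listed hours.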

Positive Test Observations

Orchestrator is exemplary -- 452 tests, near-complete coverage, behavioral testing, good mocking
Federation security tests are thorough -- Crypto, signature, timeout, workspace access, capability guard
API client test (web) is comprehensive -- 721 lines covering error handling, retries, auth
Sanitization utilities well-tested -- XSS prevention, log sanitization, query builder
Coverage thresholds enforced -- 85% on orchestrator and web components/lib
Concurrency tests exist -- coordinator-integration and runner-jobs
Good test infrastructure -- Shared fixtures, proper NestJS testing module usage
