Files
stack/docs/reports/codebase-review-2026-02-05/02-code-quality-review.md
Jason Woltje 9dfbf8cf61
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
chore: Remove pre-created task files, add review reports
- Delete docs/tasks.md (let orchestrator bootstrap from scratch)
- Delete docs/claude/task-tracking.md (superseded by universal guide)
- Add codebase review reports for orchestrator to parse

Tests orchestrator's autonomous bootstrap capability.
2026-02-05 15:08:02 -06:00

218 lines
10 KiB
Markdown

# Mosaic Stack - Code Quality Review
**Date:** 2026-02-05
**Reviewers:** 3 parallel code quality analysis agents
**Total Findings:** 29 (11 Critical/High, 8 Medium, 10 Low/Code Smell)
---
## apps/api - Code Quality (7 Findings)
### CQ-API-1: Memory Leak - setTimeout Not Cleared in WebSocket Gateway [HIGH]
- **File:** `apps/api/src/websocket/websocket.gateway.ts:105-166`
- **Issue:** If workspaceMembership query throws, timeoutId is never cleared. Each failed connection leaks a timer.
- **Fix:** Call clearTimeout(timeoutId) in catch block at line 159.
### CQ-API-2: Memory Leak - setInterval Not Cleared in Runner Jobs [HIGH]
- **File:** `apps/api/src/runner-jobs/runner-jobs.service.ts:310-449`
- **Issue:** keepAliveInterval at line 315 is never explicitly cleared with clearInterval().
- **Fix:** Add clearInterval(keepAliveInterval) in the finally block.
### CQ-API-3: Activity Logging Re-Throws (Defeats Fire-and-Forget) [MEDIUM]
- **File:** `apps/api/src/activity/activity.service.ts:23-31`
- **Issue:** logActivity catches errors, logs them, then re-throws. Activity logging failures break primary operations.
- **Fix:** Make fire-and-forget (don't re-throw) or document that callers must handle.
### CQ-API-4: Missing Cleanup for Redis Event Listeners [MEDIUM]
- **File:** `apps/api/src/valkey/valkey.service.ts:43-53`
- **Issue:** Event listeners registered on Redis client but never removed in onModuleDestroy.
- **Fix:** Call this.client.removeAllListeners() before quit() in onModuleDestroy.
### CQ-API-5: Race Condition in Throttler Fallback Storage [LOW]
- **File:** `apps/api/src/common/throttler/throttler-storage.service.ts:140-154`
- **Issue:** In-memory fallback incrementMemory has non-atomic read-modify-write.
- **Fix:** Use mutex/lock, or document in-memory mode as best-effort.
### CQ-API-6: Hardcoded TODO Values in OIDC Federation [CRITICAL]
- **File:** `apps/api/src/federation/oidc.service.ts:140-141`
- **Issue:** JWT validation uses hardcoded "https://auth.example.com" and "mosaic-client-id".
- **Fix:** Load from environment variables. Fail fast if unconfigured.
### CQ-API-7: N+1 Query in Knowledge Service Tag Lookup [LOW]
- **File:** `apps/api/src/knowledge/knowledge.service.ts:817-827`
- **Issue:** Promise.all with individual findFirst queries per tag name.
- **Fix:** Replace with single findMany WHERE name IN tagNames.
---
## apps/web - Code Quality (12 Findings)
### CQ-WEB-1: Stale Closure in useWebSocket Hook [CRITICAL]
- **File:** `apps/web/src/hooks/useWebSocket.ts:127-137`
- **Issue:** Callback functions in useEffect deps cause WebSocket disconnect/reconnect on every parent re-render.
- **Fix:** Use refs for callbacks. Only include stable values (workspaceId, token) in deps.
### CQ-WEB-2: Missing Dependency in FilterBar useEffect [HIGH]
- **File:** `apps/web/src/components/filters/FilterBar.tsx:33-50`
- **Issue:** useEffect accesses filters and onFilterChange but they're not in the dependency array.
- **Fix:** Add missing dependencies or use functional updates.
### CQ-WEB-3: Race Condition in LinkAutocomplete [HIGH]
- **File:** `apps/web/src/components/knowledge/LinkAutocomplete.tsx:96-107`
- **Issue:** Debounced search doesn't cancel in-flight requests. Older results can overwrite newer ones.
- **Fix:** Use AbortController to cancel in-flight requests.
### CQ-WEB-4: Stale Messages in useChat Hook [CRITICAL]
- **File:** `apps/web/src/hooks/useChat.ts:146-236`
- **Issue:** sendMessage captures messages in closure. Rapid sends use stale state. Data loss.
- **Fix:** Use functional state updates: setMessages(prev => [...prev, userMessage]).
### CQ-WEB-5: Incorrect Boolean Logic in ReactFlowEditor [CRITICAL]
- **File:** `apps/web/src/components/mindmap/ReactFlowEditor.tsx:213-214`
- **Issue:** `readOnly ?? !selectedNode` should be `readOnly || !selectedNode`. The ?? operator with boolean never evaluates right side when readOnly is false.
- **Fix:** Change ?? to ||.
### CQ-WEB-6: Memory Leak in Chat Component Timers [MEDIUM]
- **File:** `apps/web/src/components/chat/Chat.tsx:146-174`
- **Issue:** Timer cleanup structure is confusing and may leak in edge cases.
- **Fix:** Restructure to only create timers when isChatLoading is true with proper cleanup.
### CQ-WEB-7: KanbanBoard No UI Revert on Error [CRITICAL]
- **File:** `apps/web/src/components/kanban/KanbanBoard.tsx:97-117`
- **Issue:** PATCH error logged but UI not reverted. User sees task in wrong column.
- **Fix:** Implement optimistic updates with rollback on error.
### CQ-WEB-8: No React.memo Usage Anywhere [PERFORMANCE]
- **Files:** All component files
- **Issue:** Zero components use React.memo despite 115+ uses of useMemo/useCallback and large lists.
- **Fix:** Wrap pure components in React.memo (TaskItem, KanbanColumn, etc.).
### CQ-WEB-9: DOM Manipulation in LinkAutocomplete [PERFORMANCE]
- **File:** `apps/web/src/components/knowledge/LinkAutocomplete.tsx:112-165`
- **Issue:** Creates temporary DOM element on every position calculation, causing layout thrashing.
- **Fix:** Use textarea-caret-position library or cached measurement.
### CQ-WEB-10: Missing Loading States on Many Pages [CODE SMELL]
- **Files:** tasks/page.tsx, calendar/page.tsx, multiple federation pages
- **Issue:** Pages use mock data with TODO comments. No loading states.
- **Fix:** Implement loading/error states even with mock data.
### CQ-WEB-11: Accessibility - Missing Form Labels [ACCESSIBILITY]
- **File:** `apps/web/src/components/filters/FilterBar.tsx:115-188`
- **Issue:** Checkboxes lack proper label associations for screen readers.
- **Fix:** Add explicit id/htmlFor or aria-label.
### CQ-WEB-12: SSR Issue - Window Check After DOM Usage [BUG]
- **File:** `apps/web/src/components/mindmap/ReactFlowEditor.tsx:238-245`
- **Issue:** Checks typeof window after already using document.addEventListener.
- **Fix:** Use useEffect for all DOM access. Consistently check typeof window.
---
## apps/orchestrator - Code Quality (10 Findings)
### CQ-ORCH-1: Memory Leak - Agent Sessions Never Cleaned Up [CRITICAL]
- **File:** `apps/orchestrator/src/spawner/agent-spawner.service.ts:19`
- **Issue:** Sessions Map never has entries removed. No cleanup on terminal states.
- **Fix:** Add session cleanup after completed/failed/killed transitions.
### CQ-ORCH-2: Race Condition - Dual State Management [HIGH]
- **File:** `apps/orchestrator/src/api/agents/agents.controller.ts:91-118`
- **Issue:** Two sources of truth (in-memory Map + Valkey). Can get out of sync during rapid transitions.
- **Fix:** Use Valkey as single source of truth. Remove in-memory Map.
### CQ-ORCH-3: Redis KEYS Command in Production [CRITICAL]
- **File:** `apps/orchestrator/src/valkey/valkey.client.ts:116, 187`
- **Issue:** KEYS is O(N) and blocks entire Redis instance.
- **Fix:** Use SCAN for non-blocking iteration.
### CQ-ORCH-4: No Timeout Cleanup for AbortController [HIGH]
- **File:** `apps/orchestrator/src/coordinator/coordinator-client.service.ts:67-81, 156-166`
- **Issue:** clearTimeout only called on success path, not in catch block. Timer leak per retry.
- **Fix:** Use try-finally to guarantee cleanup.
### CQ-ORCH-5: TOCTOU Race in State Transitions [HIGH]
- **File:** `apps/orchestrator/src/spawner/agent-lifecycle.service.ts:31-53`
- **Issue:** Read state, validate, then write is not atomic. Concurrent transitions can overwrite each other.
- **Fix:** Use Valkey Lua script for atomic check-and-set.
### CQ-ORCH-6: N+1 Query in listTasks/listAgents [CRITICAL]
- **File:** `apps/orchestrator/src/valkey/valkey.client.ts:114-127, 185-198`
- **Issue:** 1 + N queries (one KEYS + one GET per key). 1000 tasks = 1001 round trips.
- **Fix:** Use MGET for batch retrieval after SCAN. 100-500x faster.
### CQ-ORCH-7: Docker Force Remove Loses Data [MEDIUM]
- **File:** `apps/orchestrator/src/spawner/docker-sandbox.service.ts:179`
- **Issue:** force: true always used, bypassing graceful shutdown. Work-in-progress lost.
- **Fix:** Stop before remove. Only force as last resort.
### CQ-ORCH-8: KillAllAgents Doesn't Distinguish Error Types [MEDIUM]
- **File:** `apps/orchestrator/src/killswitch/killswitch.service.ts:125-137`
- **Issue:** "Already terminated" counted same as "cleanup failed." Misleading metrics.
- **Fix:** Add "skipped" category for expected state transition failures.
### CQ-ORCH-9: Duplicate Validation Logic [CODE SMELL]
- **Files:** `agent-spawner.service.ts:98-119`, `agents.controller.ts:192-213`
- **Issue:** Both implement identical spawn request validation.
- **Fix:** Remove from controller. Rely on DTOs + ValidationPipe.
### CQ-ORCH-10: BullMQ Job Retention May Cause Memory Growth [LOW]
- **File:** `apps/orchestrator/src/queue/queue.service.ts:52-60`
- **Issue:** Retention of 1000 failed jobs for 24 hours may be excessive under load.
- **Fix:** Make configurable via environment variables.
---
## Remediation Priority Matrix
| Priority | ID | Title | Effort |
| -------- | --------- | ------------------------------ | ------ |
| P0 | CQ-API-6 | Hardcoded OIDC values | 30min |
| P0 | CQ-ORCH-3 | KEYS -> SCAN | 1hr |
| P0 | CQ-ORCH-6 | N+1 queries -> MGET | 1hr |
| P0 | CQ-WEB-5 | Boolean logic bug (?? -> \|\|) | 5min |
| P1 | CQ-ORCH-1 | Session Map cleanup | 1hr |
| P1 | CQ-API-1 | WebSocket timer leak | 30min |
| P1 | CQ-API-2 | Runner jobs interval leak | 30min |
| P1 | CQ-WEB-1 | useWebSocket stale closure | 1hr |
| P1 | CQ-WEB-4 | useChat stale messages | 1hr |
| P1 | CQ-ORCH-5 | Atomic state transitions | 2hr |
| P2 | CQ-WEB-7 | Kanban optimistic rollback | 1hr |
| P2 | CQ-ORCH-2 | Single source of truth | 2hr |
| P2 | CQ-ORCH-4 | AbortController cleanup | 30min |
| P2 | CQ-API-4 | Redis listener cleanup | 30min |
| P3 | CQ-WEB-8 | React.memo adoption | 2hr |
| P3 | CQ-WEB-11 | Accessibility labels | 1hr |
| P3 | Remaining | Lower priority items | 3hr |
**Estimated total remediation effort: 2-3 days**