feat: M13-SpeechServices — TTS & STT integration #409

Merged

jason.woltje merged 20 commits from feature/m13-speech-services into develop

2026-02-15 18:37:54 +00:00

Author	SHA1	Message	Date
Jason Woltje	cf28efa880	merge: resolve conflicts with develop (M10-Telemetry + M12-MatrixBridge) Merge origin/develop into feature/m13-speech-services to incorporate M10-Telemetry and M12-MatrixBridge changes. Resolved 4 conflicts: - .env.example: Added speech config alongside telemetry + matrix config - Makefile: Added speech targets alongside matrix targets - app.module.ts: Import both MosaicTelemetryModule and SpeechModule - docs/tasks.md: Combined all milestone task tracking sections Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 12:31:08 -06:00
Jason Woltje	af9c5799af	fix(#388): address PR review findings — fix WebSocket/REST bugs, improve error handling, fix types and comments Critical fixes: - Fix FormData field name mismatch (audio -> file) to match backend FileInterceptor - Add /speech namespace to WebSocket connection URL - Pass auth token in WebSocket handshake options - Wrap audio.play() in try-catch for NotAllowedError and DOMException handling - Replace bare catch block with named error parameter and descriptive message - Add connect_error and disconnect event handlers to WebSocket - Update JSDoc to accurately describe batch transcription (not real-time partial) Important fixes: - Emit transcription-error before disconnect in gateway auth failures - Capture MediaRecorder error details and clean up media tracks on error - Change TtsDefaultConfig.format type from string to AudioFormat - Define canonical SPEECH_TIERS and AUDIO_FORMATS arrays as single source of truth - Fix voice count from 54 to 53 in provider, AGENTS.md, and docs - Fix inaccurate comments (Piper formats, tier prop, SpeachesProvider, TextValidationPipe) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 03:44:33 -06:00
Jason Woltje	dcbc8d1053	chore(orchestrator): finalize M13-SpeechServices tasks.md — all 18/18 done All tasks completed successfully across 7 phases: - Phase 1: Config + Module foundation (2/2) - Phase 2: STT + TTS providers (5/5) - Phase 3: Middleware + REST endpoints (3/3) - Phase 4: WebSocket streaming (1/1) - Phase 5: Docker/DevOps (2/2) - Phase 6: Frontend components (3/3) - Phase 7: E2E tests + Documentation (2/2) Total: ~500+ tests across API and web packages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 03:27:21 -06:00
Jason Woltje	d2c7602430	test(#405): add E2E integration tests for speech services Adds comprehensive integration tests covering all 9 required scenarios: 1. REST transcription (POST /speech/transcribe) 2. REST synthesis (POST /speech/synthesize) 3. Provider fallback (premium -> default -> fallback chain) 4. WebSocket streaming transcription lifecycle 5. Audio MIME type validation (reject invalid formats) 6. File size limit enforcement (25 MB max) 7. Authentication on all endpoints (401 without token) 8. Voice listing with tier filtering (GET /speech/voices) 9. Health check status (GET /speech/health) Uses NestJS testing module with mocked providers (CI-compatible). 30 test cases, all passing. Fixes #405	2026-02-15 03:26:05 -06:00
Jason Woltje	24065aa199	docs(#406): add speech services documentation Comprehensive documentation for the speech services module: - docs/SPEECH.md: Architecture, API reference, WebSocket protocol, environment variables, provider configuration, Docker setup, GPU VRAM budget, and frontend integration examples - apps/api/src/speech/AGENTS.md: Module structure, provider pattern, how to add new providers, gotchas, and test patterns - README.md: Speech capabilities section with quick start Fixes #406 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 03:23:22 -06:00
Jason Woltje	bc86947d01	feat(#404): add speech settings page with provider config Implements the SpeechSettings component with four sections: - STT settings (enable/disable, language preference) - TTS settings (enable/disable, voice selector, tier preference, auto-play, speed control) - Voice preview with test button - Provider status with health indicators Also adds Slider UI component and getHealthStatus API client function. 30 unit tests covering all sections, toggles, voice loading, and PDA-friendly design. Fixes #404 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 03:16:27 -06:00
Jason Woltje	74d6c1092e	feat(#403): add audio playback component for TTS output Implements AudioPlayer inline component with play/pause, progress bar, speed control (0.5x-2x), download, and duration display. Adds TextToSpeechButton "Read aloud" component that synthesizes text via the speech API and integrates AudioPlayer for playback. Includes useTextToSpeech hook with API integration, audio caching, and playback state management. All 32 tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 03:05:39 -06:00
Jason Woltje	28c9e6fe65	feat(#397): implement WebSocket streaming transcription gateway Add SpeechGateway with Socket.IO namespace /speech for real-time streaming transcription. Supports start-transcription, audio-chunk, and stop-transcription events with session management, authentication, and buffer size rate limiting. Includes 29 unit tests covering authentication, session lifecycle, error handling, cleanup, and client isolation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 02:54:41 -06:00
Jason Woltje	b3d6d73348	feat(#400): add Docker Compose swarm/prod deployment for speech services Add docker/docker-compose.sample.speech.yml for standalone speech services deployment in Docker Swarm with Portainer compatibility: - Speaches (STT + basic TTS) with Whisper model configuration - Kokoro TTS (default high-quality TTS) always deployed - Chatterbox TTS (premium, GPU) commented out as optional - Traefik labels for reverse proxy routing with TLS - Health checks on all services - Volume persistence for Whisper models - GPU reservation via Swarm generic resources for Chatterbox - Environment variable substitution for Portainer - Comprehensive header documentation Fixes #400	2026-02-15 02:51:13 -06:00
Jason Woltje	527262af38	feat(#392): create /api/speech/transcribe REST endpoint Add SpeechController with POST /api/speech/transcribe for audio transcription and GET /api/speech/health for provider status. Uses AudioValidationPipe for file upload validation and returns results in standard { data: T } envelope. Includes 10 unit tests covering transcribe with options, error propagation, and all health status combinations. Fixes #392 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 02:47:52 -06:00
Jason Woltje	6c465566f6	feat(#395): implement Piper TTS provider via OpenedAI Speech Add fallback-tier TTS provider using Piper via OpenedAI Speech for ultra-lightweight CPU-only synthesis. Maps 6 standard OpenAI voice names (alloy, echo, fable, onyx, nova, shimmer) to Piper voices. Update factory to use the new PiperTtsProvider class, replacing the inline stub. Includes 37 unit tests covering provider identity, voice mapping, and voice listing. Fixes #395 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 02:39:20 -06:00
Jason Woltje	7b4fda6011	feat(#398): add audio/text validation pipes and speech DTOs Create AudioValidationPipe for MIME type and file size validation, TextValidationPipe for TTS text input validation, and DTOs for transcribe/synthesize endpoints. Includes 36 unit tests. Fixes #398	2026-02-15 02:37:54 -06:00
Jason Woltje	d37c78f503	feat(#394): implement Chatterbox TTS provider with voice cloning Add ChatterboxSynthesizeOptions interface with referenceAudio and emotionExaggeration fields, and comprehensive unit tests (26 tests) covering voice cloning, emotion control, clamping, graceful degradation, and cross-language support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 02:29:38 -06:00
Jason Woltje	79b1d81d27	feat(#393): implement Kokoro-FastAPI TTS provider with voice catalog Extract KokoroTtsProvider from factory into its own module with: - Full voice catalog of 54 built-in voices across 8 languages - Voice metadata parsing from ID prefix (language, gender, accent) - Exported constants for supported formats and speed range - Comprehensive unit tests (48 tests) - Fix lint/type errors in chatterbox provider (Prettier + unsafe cast) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 02:27:47 -06:00
Jason Woltje	b5edb4f37e	feat(#391): add base TTS provider and factory classes Add the BaseTTSProvider abstract class and TTS provider factory that were part of the tiered TTS architecture but missed from the previous commit. - BaseTTSProvider: abstract base with synthesize(), listVoices(), isHealthy() - tts-provider.factory: creates Kokoro/Chatterbox/Piper providers from config - 30 tests (22 base provider + 8 factory) Refs #391	2026-02-15 02:20:24 -06:00
Jason Woltje	3ae9e53bcc	feat(#391): implement tiered TTS provider architecture with base class Add abstract BaseTTSProvider class that implements common OpenAI-compatible TTS logic using the OpenAI SDK with configurable baseURL. Includes synthesize(), listVoices(), and isHealthy() methods. Create TTS provider factory that dynamically registers Kokoro (default), Chatterbox (premium), and Piper (fallback) providers based on configuration. Update SpeechModule to use the factory for TTS_PROVIDERS injection token. Also fixes lint error in speaches-stt.provider.ts (Array<T> -> T[]). 30 tests added (22 base provider + 8 factory), all passing. Fixes #391	2026-02-15 02:19:46 -06:00
Jason Woltje	c40373fa3b	feat(#389): create SpeechModule with provider abstraction layer Add SpeechModule with provider interfaces and service skeleton for multi-tier TTS fallback (premium -> default -> fallback) and STT transcription support. Includes 27 unit tests covering provider selection, fallback logic, and availability checks. - ISTTProvider interface with transcribe/isHealthy methods - ITTSProvider interface with synthesize/listVoices/isHealthy methods - Shared types: SpeechTier, TranscriptionResult, SynthesisResult, etc. - SpeechService with graceful TTS fallback chain - NestJS injection tokens (STT_PROVIDER, TTS_PROVIDERS) - SpeechModule registered in AppModule - ConfigModule integration via speechConfig registerAs factory Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 02:09:45 -06:00
Jason Woltje	52553c8266	feat(#399): add Docker Compose dev overlay for speech services Add docker-compose.speech.yml with three speech services: - Speaches (STT via Whisper + basic TTS) on port 8090 - Kokoro-FastAPI (default TTS) on port 8880 - Chatterbox TTS (premium, GPU-required) on port 8881 behind the premium-tts profile All services include health checks, connect to the mosaic-internal network, and follow existing naming/labeling conventions. Makefile targets added: speech-up, speech-down, speech-logs. Fixes #399 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 02:06:21 -06:00
Jason Woltje	4cc43bece6	feat(#401): add speech services config and env vars Add SpeechConfig with typed configuration and startup validation for STT (Whisper/Speaches), TTS default (Kokoro), TTS premium (Chatterbox), and TTS fallback (Piper/OpenedAI). Includes registerAs factory for NestJS ConfigModule integration, .env.example documentation, and 51 unit tests covering all validation paths. Refs #401	2026-02-15 02:03:21 -06:00
Jason Woltje	fb53272fa9	chore(orchestrator): Bootstrap M13-SpeechServices tasks.md 18 tasks across 7 phases for TTS & STT integration. Estimated total: ~322K tokens. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 01:56:06 -06:00
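
For reviewers skimming the log, the tiered fallback described in c40373fa3b and 3ae9e53bcc (premium -> default -> fallback) could look roughly like the sketch below. This is not the code from the PR: `TtsProviderSketch`, `TIER_ORDER`, and `synthesizeWithFallback` are hypothetical names, the interface is reduced to the `synthesize()` and `isHealthy()` methods named in the commit messages, and starting the chain at the requested tier is an assumption.

```typescript
// Hypothetical sketch of a tiered TTS fallback chain. Only the tier names and
// the synthesize()/isHealthy() method names are taken from the commit log.
type SpeechTier = "premium" | "default" | "fallback";

interface TtsProviderSketch {
  readonly tier: SpeechTier;
  isHealthy(): Promise<boolean>;
  synthesize(text: string): Promise<Uint8Array>;
}

// Tiers are tried in this order, starting from the requested tier (assumption).
const TIER_ORDER: readonly SpeechTier[] = ["premium", "default", "fallback"];

async function synthesizeWithFallback(
  providers: Map<SpeechTier, TtsProviderSketch>,
  text: string,
  preferred: SpeechTier = "default",
): Promise<{ tier: SpeechTier; audio: Uint8Array }> {
  const errors: string[] = [];
  for (const tier of TIER_ORDER.slice(TIER_ORDER.indexOf(preferred))) {
    const provider = providers.get(tier);
    if (!provider) {
      errors.push(`${tier}: not configured`);
      continue;
    }
    try {
      // Skip providers that report themselves unhealthy.
      if (!(await provider.isHealthy())) {
        errors.push(`${tier}: unhealthy`);
        continue;
      }
      return { tier, audio: await provider.synthesize(text) };
    } catch (err) {
      // A failed health check or synthesis moves on to the next tier.
      errors.push(`${tier}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All TTS providers failed (${errors.join("; ")})`);
}
```

Returning the tier alongside the audio lets a caller see which provider actually served the request, which is useful when a premium request silently degrades.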

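The upload checks mentioned in 7b4fda6011 and d2c7602430 (MIME type validation and a 25 MB size limit) can be illustrated with a framework-free sketch. The function and constant names are hypothetical, the allowed MIME list is an assumed example rather than the list the PR uses, and treating 25 MB as 25 * 1024 * 1024 bytes and rejecting empty files are also assumptions.

```typescript
// Hypothetical sketch of audio upload validation; not the PR's AudioValidationPipe.
const MAX_AUDIO_BYTES = 25 * 1024 * 1024; // 25 MB limit from the commit log (binary MB assumed)

// Assumed example list; the formats actually accepted by the PR are not shown in the log.
const ALLOWED_AUDIO_MIME_TYPES: ReadonlySet<string> = new Set([
  "audio/wav",
  "audio/mpeg",
  "audio/webm",
  "audio/ogg",
  "audio/flac",
]);

interface UploadedAudio {
  mimetype: string;
  size: number; // size in bytes
}

type AudioValidationResult = { ok: true } | { ok: false; reason: string };

function validateAudioUpload(file: UploadedAudio): AudioValidationResult {
  // Drop parameters such as ";codecs=opus" before comparing.
  const baseType = (file.mimetype.split(";")[0] ?? "").trim().toLowerCase();
  if (!ALLOWED_AUDIO_MIME_TYPES.has(baseType)) {
    return { ok: false, reason: `Unsupported audio type: ${baseType || "(none)"}` };
  }
  if (file.size <= 0) {
    return { ok: false, reason: "Audio file is empty" };
  }
  if (file.size > MAX_AUDIO_BYTES) {
    return { ok: false, reason: `Audio file exceeds ${MAX_AUDIO_BYTES} bytes` };
  }
  return { ok: true };
}
```

In a NestJS pipe, a failed result would typically be turned into a `BadRequestException`; the plain result type here keeps the sketch independent of the framework.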