Critical fixes:
- Fix FormData field name mismatch (audio -> file) to match backend FileInterceptor
- Add /speech namespace to WebSocket connection URL
- Pass auth token in WebSocket handshake options
- Wrap audio.play() in try-catch for NotAllowedError and DOMException handling
- Replace bare catch block with named error parameter and descriptive message
- Add connect_error and disconnect event handlers to WebSocket
- Update JSDoc to accurately describe batch transcription (not real-time partial)
Important fixes:
- Emit transcription-error before disconnect in gateway auth failures
- Capture MediaRecorder error details and clean up media tracks on error
- Change TtsDefaultConfig.format type from string to AudioFormat
- Define canonical SPEECH_TIERS and AUDIO_FORMATS arrays as single source of truth
- Fix voice count from 54 to 53 in provider, AGENTS.md, and docs
- Fix inaccurate comments (Piper formats, tier prop, SpeachesProvider, TextValidationPipe)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comprehensive documentation for the speech services module:
- docs/SPEECH.md: Architecture, API reference, WebSocket protocol,
environment variables, provider configuration, Docker setup,
GPU VRAM budget, and frontend integration examples
- apps/api/src/speech/AGENTS.md: Module structure, provider pattern,
how to add new providers, gotchas, and test patterns
- README.md: Speech capabilities section with quick start
Fixes#406
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements the SpeechSettings component with four sections:
- STT settings (enable/disable, language preference)
- TTS settings (enable/disable, voice selector, tier preference, auto-play, speed control)
- Voice preview with test button
- Provider status with health indicators
Also adds Slider UI component and getHealthStatus API client function.
30 unit tests covering all sections, toggles, voice loading, and PDA-friendly design.
Fixes#404
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements AudioPlayer inline component with play/pause, progress bar,
speed control (0.5x-2x), download, and duration display. Adds
TextToSpeechButton "Read aloud" component that synthesizes text via
the speech API and integrates AudioPlayer for playback. Includes
useTextToSpeech hook with API integration, audio caching, and
playback state management. All 32 tests passing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add SpeechGateway with Socket.IO namespace /speech for real-time
streaming transcription. Supports start-transcription, audio-chunk,
and stop-transcription events with session management, authentication,
and buffer size rate limiting. Includes 29 unit tests covering
authentication, session lifecycle, error handling, cleanup, and
client isolation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add SpeechController with POST /api/speech/transcribe for audio
transcription and GET /api/speech/health for provider status.
Uses AudioValidationPipe for file upload validation and returns
results in standard { data: T } envelope.
Includes 10 unit tests covering transcribe with options, error
propagation, and all health status combinations.
Fixes#392
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add fallback-tier TTS provider using Piper via OpenedAI Speech for
ultra-lightweight CPU-only synthesis. Maps 6 standard OpenAI voice
names (alloy, echo, fable, onyx, nova, shimmer) to Piper voices.
Update factory to use the new PiperTtsProvider class, replacing the
inline stub. Includes 37 unit tests covering provider identity,
voice mapping, and voice listing.
Fixes#395
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create AudioValidationPipe for MIME type and file size validation,
TextValidationPipe for TTS text input validation, and DTOs for
transcribe/synthesize endpoints. Includes 36 unit tests.
Fixes#398
Add ChatterboxSynthesizeOptions interface with referenceAudio and
emotionExaggeration fields, and comprehensive unit tests (26 tests)
covering voice cloning, emotion control, clamping, graceful degradation,
and cross-language support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract KokoroTtsProvider from factory into its own module with:
- Full voice catalog of 54 built-in voices across 8 languages
- Voice metadata parsing from ID prefix (language, gender, accent)
- Exported constants for supported formats and speed range
- Comprehensive unit tests (48 tests)
- Fix lint/type errors in chatterbox provider (Prettier + unsafe cast)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add the BaseTTSProvider abstract class and TTS provider factory that were
part of the tiered TTS architecture but missed from the previous commit.
- BaseTTSProvider: abstract base with synthesize(), listVoices(), isHealthy()
- tts-provider.factory: creates Kokoro/Chatterbox/Piper providers from config
- 30 tests (22 base provider + 8 factory)
Refs #391
Add abstract BaseTTSProvider class that implements common OpenAI-compatible
TTS logic using the OpenAI SDK with configurable baseURL. Includes synthesize(),
listVoices(), and isHealthy() methods. Create TTS provider factory that
dynamically registers Kokoro (default), Chatterbox (premium), and Piper
(fallback) providers based on configuration. Update SpeechModule to use
the factory for TTS_PROVIDERS injection token.
Also fixes lint error in speaches-stt.provider.ts (Array<T> -> T[]).
30 tests added (22 base provider + 8 factory), all passing.
Fixes#391
Add docker-compose.speech.yml with three speech services:
- Speaches (STT via Whisper + basic TTS) on port 8090
- Kokoro-FastAPI (default TTS) on port 8880
- Chatterbox TTS (premium, GPU-required) on port 8881 behind
the premium-tts profile
All services include health checks, connect to the mosaic-internal
network, and follow existing naming/labeling conventions. Makefile
targets added: speech-up, speech-down, speech-logs.
Fixes#399
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add SpeechConfig with typed configuration and startup validation for
STT (Whisper/Speaches), TTS default (Kokoro), TTS premium (Chatterbox),
and TTS fallback (Piper/OpenedAI). Includes registerAs factory for
NestJS ConfigModule integration, .env.example documentation, and 51
unit tests covering all validation paths.
Refs #401