Critical fixes:
- Fix FormData field name mismatch (audio -> file) to match backend FileInterceptor
- Add /speech namespace to WebSocket connection URL
- Pass auth token in WebSocket handshake options
- Wrap audio.play() in try-catch for NotAllowedError and DOMException handling
- Replace bare catch block with named error parameter and descriptive message
- Add connect_error and disconnect event handlers to WebSocket
- Update JSDoc to accurately describe batch transcription (not real-time partial)
Important fixes:
- Emit transcription-error before disconnect in gateway auth failures
- Capture MediaRecorder error details and clean up media tracks on error
- Change TtsDefaultConfig.format type from string to AudioFormat
- Define canonical SPEECH_TIERS and AUDIO_FORMATS arrays as single source of truth
- Fix voice count from 54 to 53 in provider, AGENTS.md, and docs
- Fix inaccurate comments (Piper formats, tier prop, SpeachesProvider, TextValidationPipe)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comprehensive documentation for the speech services module:
- docs/SPEECH.md: Architecture, API reference, WebSocket protocol,
environment variables, provider configuration, Docker setup,
GPU VRAM budget, and frontend integration examples
- apps/api/src/speech/AGENTS.md: Module structure, provider pattern,
how to add new providers, gotchas, and test patterns
- README.md: Speech capabilities section with quick start
Fixes#406
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>