docs(#406): add speech services documentation

Comprehensive documentation for the speech services module:

- docs/SPEECH.md: Architecture, API reference, WebSocket protocol, environment variables, provider configuration, Docker setup, GPU VRAM budget, and frontend integration examples
- apps/api/src/speech/AGENTS.md: Module structure, provider pattern, how to add new providers, gotchas, and test patterns
- README.md: Speech capabilities section with quick start

Fixes #406

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 03:23:22 -06:00
parent bc86947d01
commit 24065aa199
3 changed files with 1213 additions and 13 deletions
--- a/README.md
+++ b/README.md
@@ -19,19 +19,20 @@ Mosaic Stack is a modern, PDA-friendly platform designed to help users manage th

 ## Technology Stack

-| Layer          | Technology                                   |
-| -------------- | -------------------------------------------- |
-| **Frontend**   | Next.js 16 + React + TailwindCSS + Shadcn/ui |
-| **Backend**    | NestJS + Prisma ORM                          |
-| **Database**   | PostgreSQL 17 + pgvector                     |
-| **Cache**      | Valkey (Redis-compatible)                    |
-| **Auth**       | Authentik (OIDC) via BetterAuth              |
-| **AI**         | Ollama (local or remote)                     |
-| **Messaging**  | MoltBot (stock + plugins)                    |
-| **Real-time**  | WebSockets (Socket.io)                       |
-| **Monorepo**   | pnpm workspaces + TurboRepo                  |
-| **Testing**    | Vitest + Playwright                          |
-| **Deployment** | Docker + docker-compose                      |
+| Layer          | Technology                                     |
+| -------------- | ---------------------------------------------- |
+| **Frontend**   | Next.js 16 + React + TailwindCSS + Shadcn/ui   |
+| **Backend**    | NestJS + Prisma ORM                            |
+| **Database**   | PostgreSQL 17 + pgvector                       |
+| **Cache**      | Valkey (Redis-compatible)                      |
+| **Auth**       | Authentik (OIDC) via BetterAuth                |
+| **AI**         | Ollama (local or remote)                       |
+| **Messaging**  | MoltBot (stock + plugins)                      |
+| **Real-time**  | WebSockets (Socket.io)                         |
+| **Speech**     | Speaches (STT) + Kokoro/Chatterbox/Piper (TTS) |
+| **Monorepo**   | pnpm workspaces + TurboRepo                    |
+| **Testing**    | Vitest + Playwright                            |
+| **Deployment** | Docker + docker-compose                        |

 ## Quick Start

@@ -356,6 +357,29 @@ Mosaic Stack includes a sophisticated agent orchestration system for autonomous

 See [Agent Orchestration Design](docs/design/agent-orchestration.md) for architecture details.

+## Speech Services
+
+Mosaic Stack includes integrated speech-to-text (STT) and text-to-speech (TTS) capabilities through a modular provider architecture. Each component is optional and independently configurable.
+
+- **Speech-to-Text** - Transcribe audio files and real-time audio streams using Whisper (via Speaches)
+- **Text-to-Speech** - Synthesize speech with 54+ voices across 8 languages (via Kokoro, CPU-based)
+- **Premium Voice Cloning** - Clone voices from audio samples with emotion control (via Chatterbox, GPU)
+- **Fallback TTS** - Ultra-lightweight CPU fallback for low-resource environments (via Piper/OpenedAI Speech)
+- **WebSocket Streaming** - Real-time streaming transcription via Socket.IO `/speech` namespace
+- **Automatic Fallback** - TTS tier system with graceful degradation (premium -> default -> fallback)
+
+**Quick Start:**
+
+```bash
+# Start speech services alongside core stack
+make speech-up
+
+# Or with Docker Compose directly
+docker compose -f docker-compose.yml -f docker-compose.speech.yml up -d
+```
+
+See [Speech Services Documentation](docs/SPEECH.md) for architecture details, API reference, provider configuration, and deployment options.
+
 ## Current Implementation Status

 ### ✅ Completed (v0.0.1-0.0.6)
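
Note on the README hunk above: the added "Automatic Fallback" bullet describes a TTS tier system that degrades from premium to default to fallback. A minimal sketch of how such a chain could behave is shown here. All names in it (`TtsTier`, `TtsProvider`, `fallbackChain`, `synthesizeWithFallback`) are hypothetical and chosen for illustration; they are not taken from the Mosaic Stack codebase, whose actual provider interface is documented in docs/SPEECH.md.

```typescript
// Hypothetical sketch of tiered TTS fallback (premium -> default -> fallback).
// None of these names come from the Mosaic Stack source; they are assumptions.
type TtsTier = "premium" | "default" | "fallback";

interface TtsProvider {
  tier: TtsTier;
  synthesize(text: string): Promise<Uint8Array>;
}

// Degradation order as described in the README bullet.
const TIER_ORDER: TtsTier[] = ["premium", "default", "fallback"];

// Returns the tiers to try, starting at the requested tier and degrading from there.
function fallbackChain(startTier: TtsTier): TtsTier[] {
  return TIER_ORDER.slice(TIER_ORDER.indexOf(startTier));
}

// Tries each configured provider in tier order and returns the first success.
async function synthesizeWithFallback(
  providers: TtsProvider[],
  text: string,
  startTier: TtsTier = "premium",
): Promise<{ tier: TtsTier; audio: Uint8Array }> {
  const errors: string[] = [];
  for (const tier of fallbackChain(startTier)) {
    const provider = providers.find((p) => p.tier === tier);
    if (!provider) continue; // this tier is not configured; move to the next one
    try {
      return { tier, audio: await provider.synthesize(text) };
    } catch (err) {
      // Record the failure and degrade to the next tier.
      errors.push(`${tier}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`No TTS tier succeeded: ${errors.join("; ")}`);
}
```

Returning the tier alongside the audio lets a caller see when a request was served by a lower tier than the one it asked for, which is one way to make the degradation visible rather than silent.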