docs(#406): add speech services documentation

Comprehensive documentation for the speech services module:
- docs/SPEECH.md: Architecture, API reference, WebSocket protocol,
  environment variables, provider configuration, Docker setup,
  GPU VRAM budget, and frontend integration examples
- apps/api/src/speech/AGENTS.md: Module structure, provider pattern,
  how to add new providers, gotchas, and test patterns
- README.md: Speech capabilities section with quick start

Fixes #406

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 03:23:22 -06:00
parent bc86947d01
commit 24065aa199
3 changed files with 1213 additions and 13 deletions


@@ -19,19 +19,20 @@ Mosaic Stack is a modern, PDA-friendly platform designed to help users manage th
## Technology Stack
| Layer | Technology |
| -------------- | -------------------------------------------- |
| **Frontend** | Next.js 16 + React + TailwindCSS + Shadcn/ui |
| **Backend** | NestJS + Prisma ORM |
| **Database** | PostgreSQL 17 + pgvector |
| **Cache** | Valkey (Redis-compatible) |
| **Auth** | Authentik (OIDC) via BetterAuth |
| **AI** | Ollama (local or remote) |
| **Messaging** | MoltBot (stock + plugins) |
| **Real-time** | WebSockets (Socket.io) |
| **Monorepo** | pnpm workspaces + TurboRepo |
| **Testing** | Vitest + Playwright |
| **Deployment** | Docker + docker-compose |
| Layer | Technology |
| -------------- | ---------------------------------------------- |
| **Frontend** | Next.js 16 + React + TailwindCSS + Shadcn/ui |
| **Backend** | NestJS + Prisma ORM |
| **Database** | PostgreSQL 17 + pgvector |
| **Cache** | Valkey (Redis-compatible) |
| **Auth** | Authentik (OIDC) via BetterAuth |
| **AI** | Ollama (local or remote) |
| **Messaging** | MoltBot (stock + plugins) |
| **Real-time** | WebSockets (Socket.io) |
| **Speech** | Speaches (STT) + Kokoro/Chatterbox/Piper (TTS) |
| **Monorepo** | pnpm workspaces + TurboRepo |
| **Testing** | Vitest + Playwright |
| **Deployment** | Docker + docker-compose |
## Quick Start
@@ -356,6 +357,29 @@ Mosaic Stack includes a sophisticated agent orchestration system for autonomous
See [Agent Orchestration Design](docs/design/agent-orchestration.md) for architecture details.
## Speech Services
Mosaic Stack includes integrated speech-to-text (STT) and text-to-speech (TTS) capabilities through a modular provider architecture. Each component is optional and independently configurable.
- **Speech-to-Text** - Transcribe audio files and real-time audio streams using Whisper (via Speaches)
- **Text-to-Speech** - Synthesize speech with 54+ voices across 8 languages (via Kokoro, CPU-based)
- **Premium Voice Cloning** - Clone voices from audio samples with emotion control (via Chatterbox, GPU)
- **Fallback TTS** - Ultra-lightweight CPU fallback for low-resource environments (via Piper/OpenedAI Speech)
- **WebSocket Streaming** - Real-time streaming transcription via Socket.IO `/speech` namespace
- **Automatic Fallback** - Tiered TTS with graceful degradation: when a tier is unavailable, requests fall through to the next one (premium -> default -> fallback)
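
The tier fallback described in the list above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the `TtsProvider` interface, `fallbackChain`, and `synthesizeWithFallback` are hypothetical names, and only the tier order (premium -> default -> fallback) comes from this README.

```typescript
// Tier names follow the README wording; everything else here is an assumption.
type TtsTier = "premium" | "default" | "fallback";

// Hypothetical provider shape: one provider per tier.
interface TtsProvider {
  tier: TtsTier;
  // Resolves with synthesized audio bytes, or rejects if the provider is unavailable.
  synthesize(text: string): Promise<Uint8Array>;
}

const TIER_ORDER: readonly TtsTier[] = ["premium", "default", "fallback"];

// Returns the tiers to try, in order, starting at the requested tier and
// skipping any tier that is not configured.
function fallbackChain(start: TtsTier, configured: ReadonlySet<TtsTier>): TtsTier[] {
  return TIER_ORDER.slice(TIER_ORDER.indexOf(start)).filter((t) => configured.has(t));
}

// Tries each tier in the chain until one succeeds; collects errors along the way.
async function synthesizeWithFallback(
  providers: TtsProvider[],
  text: string,
  start: TtsTier = "premium",
): Promise<{ tier: TtsTier; audio: Uint8Array }> {
  const configured = new Set<TtsTier>(providers.map((p) => p.tier));
  const errors: string[] = [];
  for (const tier of fallbackChain(start, configured)) {
    const provider = providers.find((p) => p.tier === tier);
    if (!provider) continue;
    try {
      return { tier, audio: await provider.synthesize(text) };
    } catch (err) {
      errors.push(`${tier}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All TTS tiers failed: ${errors.join("; ")}`);
}
```

Returning the tier that actually served the request lets a caller surface degraded quality to the user instead of failing silently.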
**Quick Start:**
```bash
# Start speech services alongside core stack
make speech-up
# Or with Docker Compose directly
docker compose -f docker-compose.yml -f docker-compose.speech.yml up -d
```
See [Speech Services Documentation](docs/SPEECH.md) for architecture details, API reference, provider configuration, and deployment options.
## Current Implementation Status
### ✅ Completed (v0.0.1-0.0.6)