docs(#406): add speech services documentation
All checks were successful
ci/woodpecker/push/api Pipeline was successful
Comprehensive documentation for the speech services module:

- docs/SPEECH.md: Architecture, API reference, WebSocket protocol, environment variables, provider configuration, Docker setup, GPU VRAM budget, and frontend integration examples
- apps/api/src/speech/AGENTS.md: Module structure, provider pattern, how to add new providers, gotchas, and test patterns
- README.md: Speech capabilities section with quick start

Fixes #406

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
README.md: 50 changed lines
@@ -19,19 +19,20 @@ Mosaic Stack is a modern, PDA-friendly platform designed to help users manage th

## Technology Stack

| Layer | Technology |
| -------------- | -------------------------------------------- |
| **Frontend** | Next.js 16 + React + TailwindCSS + Shadcn/ui |
| **Backend** | NestJS + Prisma ORM |
| **Database** | PostgreSQL 17 + pgvector |
| **Cache** | Valkey (Redis-compatible) |
| **Auth** | Authentik (OIDC) via BetterAuth |
| **AI** | Ollama (local or remote) |
| **Messaging** | MoltBot (stock + plugins) |
| **Real-time** | WebSockets (Socket.io) |
| **Monorepo** | pnpm workspaces + TurboRepo |
| **Testing** | Vitest + Playwright |
| **Deployment** | Docker + docker-compose |

| Layer | Technology |
| -------------- | ---------------------------------------------- |
| **Frontend** | Next.js 16 + React + TailwindCSS + Shadcn/ui |
| **Backend** | NestJS + Prisma ORM |
| **Database** | PostgreSQL 17 + pgvector |
| **Cache** | Valkey (Redis-compatible) |
| **Auth** | Authentik (OIDC) via BetterAuth |
| **AI** | Ollama (local or remote) |
| **Messaging** | MoltBot (stock + plugins) |
| **Real-time** | WebSockets (Socket.io) |
| **Speech** | Speaches (STT) + Kokoro/Chatterbox/Piper (TTS) |
| **Monorepo** | pnpm workspaces + TurboRepo |
| **Testing** | Vitest + Playwright |
| **Deployment** | Docker + docker-compose |

## Quick Start
@@ -356,6 +357,29 @@ Mosaic Stack includes a sophisticated agent orchestration system for autonomous

See [Agent Orchestration Design](docs/design/agent-orchestration.md) for architecture details.

## Speech Services

Mosaic Stack includes integrated speech-to-text (STT) and text-to-speech (TTS) capabilities through a modular provider architecture. Each component is optional and independently configurable.

- **Speech-to-Text** - Transcribe audio files and real-time audio streams using Whisper (via Speaches)
- **Text-to-Speech** - Synthesize speech with 54+ voices across 8 languages (via Kokoro, CPU-based)
- **Premium Voice Cloning** - Clone voices from audio samples with emotion control (via Chatterbox, GPU)
- **Fallback TTS** - Ultra-lightweight CPU fallback for low-resource environments (via Piper/OpenedAI Speech)
- **WebSocket Streaming** - Real-time streaming transcription over the Socket.IO `/speech` namespace
- **Automatic Fallback** - TTS tier system with graceful degradation (premium -> default -> fallback)
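
The tier order in the last bullet can be read as "use the highest-priority TTS tier that is currently available". The shell sketch below illustrates only that selection rule and is not the speech module's implementation. It assumes the caller already knows which tiers are up and passes their names as arguments; the function `pick_tts_tier` and the variable `TTS_TIER_ORDER` are hypothetical, and the tier-to-provider mapping in the comments is inferred from the bullets.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of premium -> default -> fallback TTS tier selection.
# Assumed mapping (inferred from the README bullets): premium = Chatterbox (GPU),
# default = Kokoro (CPU), fallback = Piper/OpenedAI Speech (CPU).
# How availability is detected (health checks, config flags) is NOT shown;
# callers pass the names of the tiers that are currently usable.

# Priority order, highest first. Expanded unquoted below so it splits on spaces.
TTS_TIER_ORDER="premium default fallback"

# Print the highest-priority tier among the arguments; return 1 if none match.
pick_tts_tier() {
  local tier candidate
  for tier in $TTS_TIER_ORDER; do
    for candidate in "$@"; do
      if [ "$candidate" = "$tier" ]; then
        printf '%s\n' "$tier"
        return 0
      fi
    done
  done
  return 1
}

# Example: the premium (GPU) tier is down, so the default tier is chosen.
pick_tts_tier default fallback   # prints: default
```

Because the priority lives in a single list, further degradation needs no extra code: if only `fallback` is passed, that tier is returned, and an empty argument list yields a non-zero exit status that a caller could treat as "TTS unavailable".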

**Quick Start:**

```bash
# Start speech services alongside core stack
make speech-up

# Or with Docker Compose directly
docker compose -f docker-compose.yml -f docker-compose.speech.yml up -d
```
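
Docker Compose merges only the files passed with `-f` on each invocation, so follow-up commands such as `ps`, `logs`, or `down` need the same two flags to see the speech containers. A minimal wrapper that repeats them is sketched below; `speech_compose` and its `DRY_RUN` switch are hypothetical helpers, not part of the repository, and the sketch assumes it is run from the repository root where both Compose files live.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: always pass both Compose files, then any subcommand.
# Assumes docker-compose.yml and docker-compose.speech.yml are in the
# current directory (the repository root).
# Set DRY_RUN=1 to print the command instead of running it.
speech_compose() {
  # Rebuild this function's positional parameters as the full command line.
  set -- docker compose -f docker-compose.yml -f docker-compose.speech.yml "$@"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage:
#   speech_compose up -d        # same as the Docker Compose Quick Start command
#   speech_compose ps           # list the core and speech containers
#   speech_compose logs -f      # follow logs from all services
#   speech_compose down         # stop the combined stack
```

Alternatively, Compose reads the `COMPOSE_FILE` environment variable, so exporting `COMPOSE_FILE=docker-compose.yml:docker-compose.speech.yml` (colon-separated on Linux and macOS) gives plain `docker compose` commands the same file set.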

See [Speech Services Documentation](docs/SPEECH.md) for architecture details, API reference, provider configuration, and deployment options.

## Current Implementation Status

### ✅ Completed (v0.0.1-0.0.6)