EPIC: M13-SpeechServices — TTS & STT Integration #388

Closed
opened 2026-02-15 07:33:00 +00:00 by jason.woltje · 1 comment
Overview

Integrate Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities into Mosaic Stack using a tiered, OpenAI-compatible API architecture.

Architecture

All speech providers expose OpenAI-compatible endpoints for both TTS and STT, enabling a single NestJS integration pattern through one OpenAI-compatible npm client with configurable base URLs.
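As a sketch of what "one integration pattern, many base URLs" means in practice, the helpers below build OpenAI-compatible speech requests against any provider host. The endpoint paths (`/audio/speech`, `/audio/transcriptions`) are the standard OpenAI audio routes and the host URLs are placeholders; none of this is confirmed project code.

```typescript
// Sketch only: OpenAI-compatible request builders with a configurable base URL.
// Hosts, ports, and model names below are illustrative assumptions.

interface SpeechProviderConfig {
  baseUrl: string;   // e.g. "http://kokoro:8880/v1" (hypothetical service host)
  apiKey?: string;   // most self-hosted providers accept any bearer token
}

// Text-to-speech: POST JSON to the standard OpenAI speech route.
function ttsRequest(cfg: SpeechProviderConfig, text: string, voice: string): Request {
  return new Request(`${cfg.baseUrl}/audio/speech`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      ...(cfg.apiKey ? { Authorization: `Bearer ${cfg.apiKey}` } : {}),
    },
    body: JSON.stringify({ model: "tts-1", input: text, voice }),
  });
}

// Speech-to-text: POST multipart form data to the standard transcription route.
function sttRequest(cfg: SpeechProviderConfig, audio: Blob): Request {
  const form = new FormData();
  form.append("file", audio, "audio.wav");
  form.append("model", "whisper-1");
  return new Request(`${cfg.baseUrl}/audio/transcriptions`, {
    method: "POST",
    headers: cfg.apiKey ? { Authorization: `Bearer ${cfg.apiKey}` } : undefined,
    body: form,
  });
}
```

Because every provider speaks the same dialect, swapping tiers is just a change of `baseUrl`.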

TTS Tiers

| Tier | Provider | License | Runs On | Use Case |
|------|----------|---------|---------|----------|
| Default | Kokoro-82M via Kokoro-FastAPI | Apache 2.0 | CPU | Fast, good quality, always available |
| Premium | Chatterbox via Chatterbox-TTS-Server | MIT | GPU | Voice cloning, best quality, emotion control |
| Fallback | Piper via OpenedAI Speech | GPL | CPU | Ultra-lightweight, Home Assistant compatible |
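The tier ordering above implies a fallback chain: prefer Premium when the GPU service is healthy, otherwise Default, otherwise the lightweight Fallback. A minimal selection sketch (tier names mirror the table; the function itself is illustrative, not the epic's actual API):

```typescript
// Sketch: ordered tier fallback for TTS, mirroring the tiers table.
type TtsTier = "premium" | "default" | "fallback";

// Best-to-worst preference order.
const TIER_ORDER: TtsTier[] = ["premium", "default", "fallback"];

// Return the first available tier at or below the preferred one.
function pickTier(
  available: Set<TtsTier>,
  preferred: TtsTier = "premium",
): TtsTier | undefined {
  const start = TIER_ORDER.indexOf(preferred);
  return TIER_ORDER.slice(start).find((t) => available.has(t));
}
```

So a request that asks for Premium while only the CPU services are up would degrade to Default rather than fail.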

STT

| Provider | License | Runs On | Use Case |
|----------|---------|---------|----------|
| Speaches + faster-whisper | MIT | GPU (CPU fallback) | Primary STT, OpenAI-compatible, 7-8% WER |

Key Insight

Speaches can serve both STT (faster-whisper) and TTS (Kokoro/Piper) in a single container for simplified deployment.
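For the simplified deployment, that single container could look roughly like the Compose fragment below. This is a hedged sketch only: the image name, tag, and port are assumptions and would need to be verified against the Speaches project's own documentation.

```yaml
# Sketch, not a verified deployment: one Speaches service covering both
# STT (faster-whisper) and TTS. Image/tag/port are assumptions.
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest   # assumed image name
    ports:
      - "8000:8000"                              # assumed service port
```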

Research

Full research: jarvis-brain

Scope

  • NestJS SpeechModule with provider abstraction
  • STT transcription (REST + WebSocket streaming)
  • TTS synthesis (REST + streaming)
  • Three TTS providers (Kokoro, Chatterbox, Piper)
  • Docker Compose dev + swarm deployment
  • Frontend voice I/O components
  • E2E integration tests
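The first scope item, a SpeechModule with provider abstraction, suggests a shape like the following. Interface and class names here are hypothetical placeholders, not the module's real API:

```typescript
// Sketch of a provider abstraction for the SpeechModule scope item.
// All names are hypothetical.

interface TtsProvider {
  readonly name: string;
  // Synthesize text to raw audio bytes.
  synthesize(text: string, voice: string): Promise<ArrayBuffer>;
}

// Simple registry the module could expose; NestJS DI wiring omitted.
class ProviderRegistry {
  private providers = new Map<string, TtsProvider>();

  register(p: TtsProvider): void {
    this.providers.set(p.name, p);
  }

  get(name: string): TtsProvider {
    const p = this.providers.get(name);
    if (!p) throw new Error(`unknown TTS provider: ${name}`);
    return p;
  }
}
```

Each of the three TTS providers (Kokoro, Chatterbox, Piper) would then be one `TtsProvider` implementation registered under its own name.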

Issues

Track sub-issues in this milestone.

jason.woltje added this to the M13-SpeechServices (0.0.13) milestone 2026-02-15 07:33:00 +00:00

M13-SpeechServices milestone complete. All 18 sub-issues (#389-#406) implemented and closed. 62 files changed, 13,613 lines added. 500+ tests across API and web packages. Branch: feature/m13-speech-services. PR to develop pending.


Reference: mosaic/stack#388