Streaming AI responses via Matrix message edits #383

Closed
opened 2026-02-15 07:01:39 +00:00 by jason.woltje · 1 comment
Owner

Summary

Implement streaming AI chat responses in Matrix rooms using incremental message edits. This fills the gap left by the unimplemented streamChatMessage in the REST API.

Context

The current LLM chat endpoint (/api/llm/chat) is request-response only, and streamChatMessage in apps/web/src/api/chat.ts is marked as not implemented. The Matrix protocol natively supports message edits (the m.replace relation), making them a natural transport for streaming LLM output.

Implementation

Flow

  1. User sends message in Matrix room (or thread)
  2. Bot sends initial response: "Thinking..." (or typing indicator)
  3. As LLM tokens stream in, bot edits the response message with accumulated text
  4. Final edit includes complete response + token usage metadata
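The edit events in steps 3–4 would follow the standard Matrix m.replace shape. A minimal sketch of building such an event's content (buildEdit is a hypothetical helper, not an existing service method; the field names come from the Matrix spec):

```typescript
// Content of a Matrix edit event (m.replace relation), as sent for each
// streaming update. Clients that don't understand edits render `body`,
// conventionally prefixed with "* ".
interface EditContent {
  msgtype: "m.text";
  body: string;
  "m.new_content": { msgtype: "m.text"; body: string };
  "m.relates_to": { rel_type: "m.replace"; event_id: string };
}

// Hypothetical helper: build the edit content for one streaming update,
// replacing the bot's original "Thinking..." message.
function buildEdit(originalEventId: string, accumulated: string): EditContent {
  return {
    msgtype: "m.text",
    body: `* ${accumulated}`, // fallback rendering for non-edit-aware clients
    "m.new_content": { msgtype: "m.text", body: accumulated },
    "m.relates_to": { rel_type: "m.replace", event_id: originalEventId },
  };
}
```

Each subsequent edit targets the same original event_id, so the room shows a single message whose body grows as tokens arrive.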

Chunking Strategy

  • Buffer tokens and edit every ~500ms (not per-token — too many API calls)
  • Use Matrix typing indicator (m.typing) while generating
  • Final message replaces the streaming content with clean formatted output
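The buffering described above can be sketched as a small loop over an AsyncIterable of tokens; `flush` stands in for the Matrix edit call, and the function names are illustrative, not part of the existing codebase:

```typescript
// Accumulate streamed tokens and flush at most once per intervalMs,
// with a final flush so the last edit always carries the complete text.
async function streamWithBuffer(
  tokens: AsyncIterable<string>,
  flush: (text: string) => Promise<void>,
  intervalMs = 500,
): Promise<string> {
  let accumulated = "";
  let lastSent = "";
  let lastFlushAt = Date.now();
  for await (const token of tokens) {
    accumulated += token;
    if (Date.now() - lastFlushAt >= intervalMs) {
      await flush(accumulated);
      lastSent = accumulated;
      lastFlushAt = Date.now();
    }
  }
  if (accumulated !== lastSent) {
    await flush(accumulated); // final edit with the full response
  }
  return accumulated;
}
```

Skipping the final flush when nothing changed avoids sending a redundant duplicate edit after the stream ends.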

LLM Integration

  • Call the existing LLM service with streaming enabled
  • This requires the LLM providers (Claude, OpenAI, Ollama) to support streaming responses
  • If a provider doesn't support streaming, fall back to request-response (send complete message)
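One way to express the fallback is a provider interface with an optional streaming method; if it is absent, the complete reply is emitted as a single chunk. The interface below is an assumption for illustration, not the project's actual IChatProvider:

```typescript
// Assumed provider shape: every provider can do request-response chat;
// streaming-capable providers additionally expose streamChat.
interface ChatProvider {
  chat(prompt: string): Promise<string>;
  streamChat?(prompt: string): AsyncIterable<string>;
}

// Uniform entry point: callers always consume an AsyncIterable, whether
// the provider streams token-by-token or answers in one shot.
async function* respond(
  provider: ChatProvider,
  prompt: string,
): AsyncIterable<string> {
  if (provider.streamChat) {
    yield* provider.streamChat(prompt);
  } else {
    // Fallback: request-response, delivered as a single chunk.
    yield await provider.chat(prompt);
  }
}
```

Downstream code (buffering, edits, typing indicator) then never needs to know which kind of provider it is talking to.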

Typing Indicator

  • Send m.typing event when processing starts
  • Clear typing indicator when response is complete or errored
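The "complete or errored" requirement is naturally a try/finally wrapper. A minimal sketch, where `setTyping` is an assumed abstraction over the homeserver typing call (matrix-js-sdk's client.sendTyping, for instance):

```typescript
// Run `work` with the typing indicator on, guaranteeing it is cleared
// whether the work resolves or throws.
async function withTyping<T>(
  setTyping: (typing: boolean) => Promise<void>,
  work: () => Promise<T>,
): Promise<T> {
  await setTyping(true);
  try {
    return await work();
  } finally {
    await setTyping(false); // cleared on success and on error
  }
}
```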

Acceptance Criteria

  • LLM responses stream incrementally in Matrix rooms
  • Message edits used (not multiple messages)
  • Typing indicator shown during generation
  • Graceful fallback for non-streaming providers
  • Rate-limited edits (at most one edit per 500ms)
  • Final message is clean and complete
  • Token usage shown in final message (optional reaction or footer)

Refs

  • Current chat API: apps/api/src/llm/
  • Unimplemented streaming: apps/web/src/api/chat.ts (search for streamChatMessage)
  • Matrix message editing: m.replace relation type
  • EPIC: #377
  • Depends on: #378, #381
jason.woltje added the ai label 2026-02-15 07:01:39 +00:00
jason.woltje added this to the M12-MatrixBridge (0.0.12) milestone 2026-02-15 07:01:51 +00:00
Author
Owner

Completed in commit 93cd314 on branch feature/m12-matrix-bridge.

  • MatrixStreamingService with editMessage (m.replace), setTypingIndicator, streamResponse
  • Rate-limited edits at 500ms intervals
  • LLM-agnostic AsyncIterable interface
  • Thread support via MSC3440
  • Graceful error handling with typing cleanup
  • Optional editMessage added to IChatProvider interface
  • 20 tests pass, 132 total bridge tests pass

Reference: mosaic/stack#383