Product Release2025-11-14

Real-time AI chat over Server-Sent Events — streaming, cancellable, connection-leased

Streaming chat delivers LLM tokens progressively over Server-Sent Events with connection-lease management against exhaustion and mid-stream cancellation. The knowledge-aware assistant integrates document context as a first-class multi-turn interaction.

Waiting for a multi-second LLM response to complete before showing anything to the user is a perceived-latency loss the chat surface cannot afford. This release moves the platform's AI chat surface to Server-Sent Events so tokens stream to the browser as the model produces them ; the user sees the answer composing in real time.

Token-progressive delivery. Each SSE event carries the next chunk of the LLM response ; the browser appends to the rendered message without buffering.
Connection-lease management. Each active chat session leases a connection from a bounded pool ; pool exhaustion under spikes blocks new sessions briefly rather than crashing existing ones. Operators can size the pool against expected concurrency.
Mid-stream cancellation. A user-initiated Stop cancels the in-flight LLM call cleanly ; the connection releases back to the pool immediately, the partial response is preserved in the session history.
Knowledge-aware assistant mode. The assistant consults the RAG layer (HNSW on Informix or pgvector on PostgreSQL) before each turn ; retrieved document context flows into the LLM call alongside the user message, with the source citations rendered in the chat as inline references.

See the feature →

← All posts