From 95e23f9aae2e483b332fc3fe1894ad74b89429ff Mon Sep 17 00:00:00 2001 From: Mike Christensen Date: Tue, 9 Dec 2025 18:59:37 +0000 Subject: [PATCH 1/5] ait/message-per-token: add token publishing Includes continuous token streams, correlating tokens for distinct responses, and explicit start/end events. --- .../token-streaming/message-per-token.mdx | 367 ++---------------- 1 file changed, 42 insertions(+), 325 deletions(-) diff --git a/src/pages/docs/ai-transport/features/token-streaming/message-per-token.mdx b/src/pages/docs/ai-transport/features/token-streaming/message-per-token.mdx index 7e0f48e794..15a6803fec 100644 --- a/src/pages/docs/ai-transport/features/token-streaming/message-per-token.mdx +++ b/src/pages/docs/ai-transport/features/token-streaming/message-per-token.mdx @@ -12,11 +12,15 @@ This pattern is useful when clients only care about the most recent part of a re - **Code assistance in an editor**: Streamed tokens become part of the file on disk as they are accepted, so past tokens do not need to be replayed from Ably. - **Autocomplete**: A fresh response is streamed for each change a user makes to a document, with only the latest suggestion being relevant. -## Publishing tokens +To get started with token streaming, all you need to do is: -Publish tokens from a [Realtime](/docs/api/realtime-sdk) client, which maintains a persistent connection to the Ably service. This allows you to publish at very high message rates with the lowest possible latencies, while preserving guarantees around message delivery order. For more information, see [Realtime and REST](/docs/basics#realtime-and-rest). +* [Use a channel](#use) +* [Publish tokens from your server](#publish) +* [Subscribe to the token stream](#subscribe) -[Channels](/docs/channels) separate message traffic into different topics. For token streaming, each conversation or session typically has its own channel. +## Use a channel + +[Channels](/docs/channels) are used to separate message traffic into different topics. For token streaming, each conversation or session typically has its own channel. Use the [`get()`](/docs/api/realtime-sdk/channels#get) method to create or retrieve a channel instance: @@ -26,37 +30,27 @@ const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}'); ``` -When publishing tokens, don't await the `channel.publish()` call. Ably rolls up acknowledgments and debounces them for efficiency, which means awaiting each publish would unnecessarily slow down your token stream. Messages are still published in the order that `publish()` is called, so delivery order is not affected. +## Publish tokens from your server + +Publishing tokens to a channel is how your AI agent communicates responses to clients. Subscribers receive tokens in realtime as they're published. + + + +Initialize an Ably Realtime client on your server: ```javascript -// ✅ Do this - publish without await for maximum throughput -for await (const event of stream) { - if (event.type === 'token') { - channel.publish('token', event.text); - } -} +import Ably from 'ably'; -// ❌ Don't do this - awaiting each publish reduces throughput -for await (const event of stream) { - if (event.type === 'token') { - await channel.publish('token', event.text); - } -} +const realtime = new Ably.Realtime({ key: 'YOUR_API_KEY' }); ``` -This approach maximizes throughput while maintaining ordering guarantees, allowing you to stream tokens as fast as your AI model generates them. 
- -## Streaming patterns - -Ably is a pub/sub messaging platform, so you can structure your messages however works best for your application. Below are common patterns for streaming tokens, each showing both agent-side publishing and client-side subscription. Choose the approach that fits your use case, or create your own variation. - -### Continuous token stream +### Continuous token stream -For simple streaming scenarios such as live transcription, where all tokens are part of a continuous stream, simply publish each token as a message. - -#### Publish tokens +For simple streaming scenarios such as live transcription, where all tokens are part of a continuous stream, simply publish each token as a message on the channel: ```javascript @@ -65,33 +59,15 @@ const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}'); // Example: stream returns events like { type: 'token', text: 'Hello' } for await (const event of stream) { if (event.type === 'token') { - channel.publish('token', event.text); + await channel.publish('token', event.text); } } ``` -#### Subscribe to tokens - - -```javascript -const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}'); - -// Subscribe to token messages -await channel.subscribe('token', (message) => { - const token = message.data; - console.log(token); // log each token as it arrives -}); -``` - +### Token stream with distinct responses -This pattern is simple and works well when you're displaying a single, continuous stream of tokens. - -### Token stream with multiple responses - -For applications with multiple responses, such as chat conversations, include a `responseId` in message [extras](/docs/messages#properties) to correlate tokens together that belong to the same response. - -#### Publish tokens +For applications with multiple, distinct responses, such as chat conversations, include a `responseId` in message [extras](/docs/messages#properties) to correlate tokens together that belong to the same response: ```javascript @@ -100,7 +76,7 @@ const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}'); // Example: stream returns events like { type: 'token', text: 'Hello', responseId: 'resp_abc123' } for await (const event of stream) { if (event.type === 'token') { - channel.publish({ + await channel.publish({ name: 'token', data: event.text, extras: { @@ -114,66 +90,36 @@ for await (const event of stream) { ``` -#### Subscribe to tokens - -Use the `responseId` header in message extras to correlate tokens. The `responseId` allows you to group tokens belonging to the same response and correctly handle token delivery for multiple responses, even when delivered concurrently. - - -```javascript -const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}'); - -// Track responses by ID -const responses = new Map(); - -await channel.subscribe('token', (message) => { - const token = message.data; - const responseId = message.extras?.headers?.responseId; - - if (!responseId) { - console.warn('Token missing responseId'); - return; - } - - // Create an empty response - if (!responses.has(responseId)) { - responses.set(responseId, ''); - } - - // Append token to response - responses.set(responseId, responses.get(responseId) + token); -}); -``` - - -### Token stream with explicit start/stop events +Clients use the `responseId` to group tokens belonging to the same response. -In some cases, your AI model response stream may include explicit events to mark response boundaries. 
You can indicate the event type, such as a response start/stop event, using the Ably message name. +### Token stream with explicit start/end events -#### Publish tokens +In some cases, your AI model response stream may include explicit events to mark response boundaries: ```javascript const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}'); // Example: stream returns events like: -// { type: 'message_start', responseId: 'resp_abc123' } -// { type: 'message_delta', responseId: 'resp_abc123', text: 'Hello' } -// { type: 'message_stop', responseId: 'resp_abc123' } +// { type: 'start', responseId: 'resp_abc123', metadata: { model: 'llama-3' } } +// { type: 'token', responseId: 'resp_abc123', text: 'Hello' } +// { type: 'end', responseId: 'resp_abc123' } for await (const event of stream) { - if (event.type === 'message_start') { + if (event.type === 'start') { // Publish response start - channel.publish({ - name: 'start', + await channel.publish({ + name: 'response.start', extras: { headers: { - responseId: event.responseId + responseId: event.responseId, + model: event.metadata?.model } } }); - } else if (event.type === 'message_delta') { + } else if (event.type === 'token') { // Publish tokens - channel.publish({ + await channel.publish({ name: 'token', data: event.text, extras: { @@ -182,10 +128,10 @@ for await (const event of stream) { } } }); - } else if (event.type === 'message_stop') { - // Publish response stop - channel.publish({ - name: 'stop', + } else if (event.type === 'end') { + // Publish response complete + await channel.publish({ + name: 'response.complete', extras: { headers: { responseId: event.responseId @@ -197,233 +143,4 @@ for await (const event of stream) { ``` -#### Subscribe to tokens - -Handle each event type to manage response lifecycle: - - -```javascript -const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}'); - -const responses = new Map(); - -// Handle response start -await channel.subscribe('start', (message) => { - const responseId = message.extras?.headers?.responseId; - responses.set(responseId, ''); -}); - -// Handle tokens -await channel.subscribe('token', (message) => { - const responseId = message.extras?.headers?.responseId; - const token = message.data; - - const currentText = responses.get(responseId) || ''; - responses.set(responseId, currentText + token); -}); - -// Handle response stop -await channel.subscribe('stop', (message) => { - const responseId = message.extras?.headers?.responseId; - const finalText = responses.get(responseId); - console.log('Response complete:', finalText); -}); -``` - - -## Client hydration - -When clients connect or reconnect, such as after a page refresh, they often need to catch up on tokens that were published while they were offline or before they joined. Ably provides several approaches to hydrate client state depending on your application's requirements. 
- - - -### Using rewind for recent history - -The simplest approach is to use Ably's [rewind](/docs/channels/options/rewind) channel option to automatically retrieve recent tokens when attaching to a channel: - - -```javascript -// Use rewind to receive recent historical messages -const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}', { - params: { rewind: '2m' } // or rewind: 100 for message count -}); - -// Subscribe to receive both recent historical and live messages, -// which are delivered in order to the subscription -await channel.subscribe('token', (message) => { - const token = message.data; - - // Process tokens from both recent history and live stream - console.log('Token received:', token); -}); -``` - - -Rewind supports two formats: - -- **Time-based**: Use a time interval like `'30s'` or `'2m'` to retrieve messages from that time period -- **Count-based**: Use a number like `50` or `100` to retrieve the most recent N messages (maximum 100) - - - -By default, rewind is limited to the last 2 minutes of messages. This is usually sufficient for scenarios where clients need only recent context, such as for continuous token streaming, or when the response stream from a given model request does not exceed 2 minutes. If you need more than 2 minutes of history, see [Using history for longer persistence](#history). - -### Using history for longer persistence - -For applications that need to retrieve tokens beyond the 2-minute rewind window, enable [persistence](/docs/storage-history/storage#all-message-persistence) on your channel. Use [channel history](/docs/storage-history/history) with the [`untilAttach` option](/docs/storage-history/history#continuous-history) to paginate back through history to obtain historical tokens, while preserving continuity with the delivery of live tokens: - - -```javascript -// Use a channel in a namespace called 'persisted', which has persistence enabled -const channel = realtime.channels.get('persisted:{{RANDOM_CHANNEL_NAME}}'); - -let response = ''; - -// Subscribe to live messages (implicitly attaches the channel) -await channel.subscribe('token', (message) => { - // Append the token to the end of the response - response += message.data; -}); - -// Fetch history up until the point of attachment -let page = await channel.history({ untilAttach: true }); - -// Paginate backwards through history -while (page) { - // Messages are newest-first, so prepend them to response - for (const message of page.items) { - response = message.data + response; - } - - // Move to next page if available - page = page.hasNext() ? await page.next() : null; -} -``` - - -### Hydrating an in-progress live response - -A common pattern is to persist complete model responses in your database while using Ably for live token delivery of the in-progress response. - -The client loads completed responses from your database, then reaches back into Ably channel history until it encounters a token for a response it's already loaded. - -You can retrieve partial history using either the [rewind](#rewind) or [history](#history) pattern. 
- -#### Hydrate using rewind - -Load completed responses from your database, then use rewind to catch up on any in-progress responses, skipping any tokens that belong to a response that was already loaded: - - -```javascript -// Load completed responses from database -const completedResponses = await loadResponsesFromDatabase(); - -// Use rewind to receive recent historical messages -const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}', { - params: { rewind: '2m' } -}); - -// Track in progress responses by ID -const inProgressResponses = new Map(); - -// Subscribe to receive both recent historical and live messages, -// which are delivered in order to the subscription -await channel.subscribe('token', (message) => { - const token = message.data; - const responseId = message.extras?.headers?.responseId; - - if (!responseId) { - console.warn('Token missing responseId'); - return; - } - - // Skip tokens for responses already hydrated from database - if (completedResponses.has(responseId)) { - return; - } - - // Create an empty in-progress response - if (!inProgressResponses.has(responseId)) { - inProgressResponses.set(responseId, ''); - } - - // Append tokens for new responses - inProgressResponses.set(responseId, inProgressResponses.get(responseId) + token); -}); -``` - - -#### Hydrate using history - -Load completed responses from your database, then paginate backwards through history to catch up on in-progress responses until you reach a token that belongs to a response you've already loaded: - - -```javascript -// Load completed responses from database -const completedResponses = await loadResponsesFromDatabase(); - -// Use a channel in a namespace called 'persisted', which has persistence enabled -const channel = realtime.channels.get('persisted:{{RANDOM_CHANNEL_NAME}}'); - -// Track in progress responses by ID -const inProgressResponses = new Map(); - -// Subscribe to live tokens (implicitly attaches) -await channel.subscribe('token', (message) => { - const token = message.data; - const responseId = message.extras?.headers?.responseId; - - if (!responseId) { - console.warn('Token missing responseId'); - return; - } - - // Skip tokens for responses already hydrated from database - if (completedResponses.has(responseId)) { - return; - } - - // Create an empty in-progress response - if (!inProgressResponses.has(responseId)) { - inProgressResponses.set(responseId, ''); - } - - // Append live tokens for in-progress responses - inProgressResponses.set(responseId, inProgressResponses.get(responseId) + token); -}); - -// Paginate backwards through history until we encounter a hydrated response -let page = await channel.history({ untilAttach: true }); - -// Paginate backwards through history -let done = false; -while (page && !done) { - // Messages are newest-first, so prepend them to response - for (const message of page.items) { - const token = message.data; - const responseId = message.extras?.headers?.responseId; - - // Stop when we reach a response already loaded from database - if (completedResponses.has(responseId)) { - done = true; - break; - } - - // Create an empty in-progress response - if (!inProgressResponses.has(responseId)) { - inProgressResponses.set(responseId, ''); - } - - // Prepend historical tokens for in-progress responses - inProgressResponses.set(responseId, token + inProgressResponses.get(responseId)); - } - - // Move to next page if available - page = page.hasNext() ? 
await page.next() : null; -} -``` - +This pattern provides explicit boundaries, making it easier for clients to manage response state. From f221bc48d2a7cd4cf244c40b56f3e40e7e513ca9 Mon Sep 17 00:00:00 2001 From: Mike Christensen Date: Tue, 9 Dec 2025 21:34:29 +0000 Subject: [PATCH 2/5] ait/message-per-token: token streaming patterns Splits each token streaming approach into distinct patterns and shows both the publish and subscribe side behaviour alongside one another. --- .../token-streaming/message-per-token.mdx | 160 ++++++++++++++---- 1 file changed, 125 insertions(+), 35 deletions(-) diff --git a/src/pages/docs/ai-transport/features/token-streaming/message-per-token.mdx b/src/pages/docs/ai-transport/features/token-streaming/message-per-token.mdx index 15a6803fec..0d466bf7cb 100644 --- a/src/pages/docs/ai-transport/features/token-streaming/message-per-token.mdx +++ b/src/pages/docs/ai-transport/features/token-streaming/message-per-token.mdx @@ -30,27 +30,37 @@ const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}'); ``` -## Publish tokens from your server - -Publishing tokens to a channel is how your AI agent communicates responses to clients. Subscribers receive tokens in realtime as they're published. - - - -Initialize an Ably Realtime client on your server: +When publishing tokens, don't await the `channel.publish()` call. Ably rolls up acknowledgments and debounces them for efficiency, which means awaiting each publish would unnecessarily slow down your token stream. Messages are still published in the order that `publish()` is called, so delivery order is not affected. ```javascript -import Ably from 'ably'; +// ✅ Do this - publish without await for maximum throughput +for await (const event of stream) { + if (event.type === 'token') { + channel.publish('token', event.text); + } +} -const realtime = new Ably.Realtime({ key: 'YOUR_API_KEY' }); +// ❌ Don't do this - awaiting each publish reduces throughput +for await (const event of stream) { + if (event.type === 'token') { + await channel.publish('token', event.text); + } +} ``` -### Continuous token stream +This approach maximizes throughput while maintaining ordering guarantees, allowing you to stream tokens as fast as your AI model generates them. + +## Streaming patterns -For simple streaming scenarios such as live transcription, where all tokens are part of a continuous stream, simply publish each token as a message on the channel: +Ably is a pub/sub messaging platform, so you can structure your messages however works best for your application. Below are common patterns for streaming tokens, each showing both agent-side publishing and client-side subscription. Choose the approach that fits your use case, or create your own variation. + +### Continuous token stream + +For simple streaming scenarios such as live transcription, where all tokens are part of a continuous stream, simply publish each token as a message. 
+ +#### Publish tokens ```javascript @@ -59,15 +69,33 @@ const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}'); // Example: stream returns events like { type: 'token', text: 'Hello' } for await (const event of stream) { if (event.type === 'token') { - await channel.publish('token', event.text); + channel.publish('token', event.text); } } ``` -### Token stream with distinct responses +#### Subscribe to tokens + + +```javascript +const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}'); + +// Subscribe to token messages +await channel.subscribe('token', (message) => { + const token = message.data; + console.log(token); // log each token as it arrives +}); +``` + + +This pattern is simple and works well when you're displaying a single, continuous stream of tokens. + +### Token stream with multiple responses -For applications with multiple, distinct responses, such as chat conversations, include a `responseId` in message [extras](/docs/messages#properties) to correlate tokens together that belong to the same response: +For applications with multiple responses, such as chat conversations, include a `responseId` in message [extras](/docs/messages#properties) to correlate tokens together that belong to the same response. + +#### Publish tokens ```javascript @@ -76,7 +104,7 @@ const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}'); // Example: stream returns events like { type: 'token', text: 'Hello', responseId: 'resp_abc123' } for await (const event of stream) { if (event.type === 'token') { - await channel.publish({ + channel.publish({ name: 'token', data: event.text, extras: { @@ -90,36 +118,66 @@ for await (const event of stream) { ``` -Clients use the `responseId` to group tokens belonging to the same response. +#### Subscribe to tokens + +Use the `responseId` header in message extras to correlate tokens. The `responseId` allows you to group tokens belonging to the same response and correctly handle token delivery for multiple responses, even when delivered concurrently. + + +```javascript +const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}'); + +// Track responses by ID +const responses = new Map(); -### Token stream with explicit start/end events +await channel.subscribe('token', (message) => { + const token = message.data; + const responseId = message.extras?.headers?.responseId; -In some cases, your AI model response stream may include explicit events to mark response boundaries: + if (!responseId) { + console.warn('Token missing responseId'); + return; + } + + // Create an empty response + if (!responses.has(responseId)) { + responses.set(responseId, ''); + } + + // Append token to response + responses.set(responseId, responses.get(responseId) + token); +}); +``` + + +### Token stream with explicit start/stop events + +In some cases, your AI model response stream may include explicit events to mark response boundaries. You can indicate the event type, such as a response start/stop event, using the Ably message name. 
+ +#### Publish tokens ```javascript const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}'); // Example: stream returns events like: -// { type: 'start', responseId: 'resp_abc123', metadata: { model: 'llama-3' } } -// { type: 'token', responseId: 'resp_abc123', text: 'Hello' } -// { type: 'end', responseId: 'resp_abc123' } +// { type: 'message_start', responseId: 'resp_abc123' } +// { type: 'message_delta', responseId: 'resp_abc123', text: 'Hello' } +// { type: 'message_stop', responseId: 'resp_abc123' } for await (const event of stream) { - if (event.type === 'start') { + if (event.type === 'message_start') { // Publish response start - await channel.publish({ - name: 'response.start', + channel.publish({ + name: 'start', extras: { headers: { - responseId: event.responseId, - model: event.metadata?.model + responseId: event.responseId } } }); - } else if (event.type === 'token') { + } else if (event.type === 'message_delta') { // Publish tokens - await channel.publish({ + channel.publish({ name: 'token', data: event.text, extras: { @@ -128,10 +186,10 @@ for await (const event of stream) { } } }); - } else if (event.type === 'end') { - // Publish response complete - await channel.publish({ - name: 'response.complete', + } else if (event.type === 'message_stop') { + // Publish response stop + channel.publish({ + name: 'stop', extras: { headers: { responseId: event.responseId @@ -143,4 +201,36 @@ for await (const event of stream) { ``` -This pattern provides explicit boundaries, making it easier for clients to manage response state. +#### Subscribe to tokens + +Handle each event type to manage response lifecycle: + + +```javascript +const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}'); + +const responses = new Map(); + +// Handle response start +await channel.subscribe('start', (message) => { + const responseId = message.extras?.headers?.responseId; + responses.set(responseId, ''); +}); + +// Handle tokens +await channel.subscribe('token', (message) => { + const responseId = message.extras?.headers?.responseId; + const token = message.data; + + const currentText = responses.get(responseId) || ''; + responses.set(responseId, currentText + token); +}); + +// Handle response stop +await channel.subscribe('stop', (message) => { + const responseId = message.extras?.headers?.responseId; + const finalText = responses.get(responseId); + console.log('Response complete:', finalText); +}); +``` + From c6ea25a06bdc3d67d1f88bf9cb90565668188f5d Mon Sep 17 00:00:00 2001 From: Fiona Corden Date: Sun, 14 Dec 2025 15:29:17 +0000 Subject: [PATCH 3/5] AI Transport overview page General overview intro page for AIT, giving a summary of major feature groups --- src/pages/docs/ai-transport/index.mdx | 53 +++++++++++++++++++++++++++ 1 file changed, 53 insertions(+) diff --git a/src/pages/docs/ai-transport/index.mdx b/src/pages/docs/ai-transport/index.mdx index fb2f2b271e..c635f98ade 100644 --- a/src/pages/docs/ai-transport/index.mdx +++ b/src/pages/docs/ai-transport/index.mdx @@ -4,3 +4,56 @@ meta_description: "Learn more about Ably's AI Transport and the features that en redirect_from: - /docs/products/ai-transport --- + +Ably AI Transport is a solution for building stateful, steerable, multi-device AI experiences into new or existing applications. You can use AI Transport as the transport layer with any LLM or agent framework, without rebuilding your existing stack or being locked to a particular vendor. 
+ +## Key features + +AI Transport builds on [Ably Pub/Sub](/docs/basics) to enable reliable interactive AI experiences at scale and provides the following key features: + +// TODO: Check this lines up with planned doc sections +* [Reliable, resumable token streams](#reliable-tokens) +* [Session management](#session) +* [Complex message patterns](#message) +* [Enterprise controls](#enterprise) + +### Reliable, resumable token streams + +Ably AI Transport enables you to [stream tokens](/docs/ai-transport/features/token-streaming) reliably from your AI agent to your client devices using a channel. The token stream survives reconnections and token history is available for any new clients that connect. + +### Session management + +With AI Transport, communication between the client and agent is not tied to the connection state of either party. This gives you much more control over how to manage session lifetime and opens up opportunities for improved user experience, for example push notifications to offline users when a long-running task completes. + +// TODO: links + +### Complex message patterns + +Truly interactive AI experiences require more than a simple HTTP request-response exchange between a single client and agent. AI transport allows the use of [complex messaging patterns](//TODO: Link here), for example: +* A user can communicate with multiple agents in a single channel and receive their responses simultaneously, along with any references or citations the model may supply. +* A user can send additional requests or redirections while a response is in progress, instead of waiting until the response is complete. + +### Enterprise controls + +Ably's platform provides [integrations](/docs/platform/integrations) and capabilities to ensure that your application will meet the requirements of enterprise environments, for example [message auditing](/docs/platform/integrations/streaming), [client identification](/docs/auth/identified-clients) and [RBAC](/docs/auth/capabilities). + +## Model SDKs and frameworks + +Ably AI Transport can be used as the transport layer for any model or agent framework you choose to use in your stack, giving you full control of your technology decisions. Ably's how-to guides show example integrations with a selection of common SDKs and frameworks. + +// TODO: link these to our guides, or to their own docs? Also check that this list is accurate at the point we release +* OpenAI Responses API +* OpenAI Agents SDK +* Claude Client SDK +* Claude Agents SDK +* LangChain +* LangGraph +* Vercel AI SDK Core + +If your chosen model or framework is not listed and you need additional help to design the integration with Ably then please contact [Support](/support). + +## Next steps + +Follow the Getting Started guides to learn more about Ably AI Transport and how it will work with your application. 
+ +// TODO: Link to getting started and other documentation when available From 9928ace20ebecead631109b2c895ef4a84d319aa Mon Sep 17 00:00:00 2001 From: Fiona Corden Date: Sun, 14 Dec 2025 15:29:50 +0000 Subject: [PATCH 4/5] Token streaming intro Overview page for token streaming - set direction, link to later pages --- src/data/nav/aitransport.ts | 5 ++ .../features/token-streaming/index.mdx | 49 +++++++++++++++++++ 2 files changed, 54 insertions(+) create mode 100644 src/pages/docs/ai-transport/features/token-streaming/index.mdx diff --git a/src/data/nav/aitransport.ts b/src/data/nav/aitransport.ts index a0cea2f5cc..e75d13d56b 100644 --- a/src/data/nav/aitransport.ts +++ b/src/data/nav/aitransport.ts @@ -21,6 +21,11 @@ export default { { name: 'Token streaming', pages: [ + { + name: 'Overview', + link: '/docs/ai-transport/features/token-streaming', + index: true, + }, { name: 'Message per token', link: '/docs/ai-transport/features/token-streaming/message-per-token', diff --git a/src/pages/docs/ai-transport/features/token-streaming/index.mdx b/src/pages/docs/ai-transport/features/token-streaming/index.mdx new file mode 100644 index 0000000000..7d108c318e --- /dev/null +++ b/src/pages/docs/ai-transport/features/token-streaming/index.mdx @@ -0,0 +1,49 @@ +--- +title: Token streaming +meta_description: "Learn about token streaming with Ably AI Transport, including common patterns and the features provided by the Ably solution." +--- + +Token streaming is a technique used with Large Language Models (LLMs) where the model's response is transmitted progressively as each token is generated, rather than waiting for the complete response before transmission begins. This allows users to see the response appear incrementally, similar to watching someone type in real-time, giving an improved user experience. This is normally accomplished by streaming the tokens as the response to an HTTP request from the client. + +// diagram of HTTP request/response + +If an HTTP stream is interrupted, for example because the client loses network connection, then any tokens that were transmitted during the interruption will be lost. Ably AI Transport solves this problem by streaming tokens to a [Pub/Sub channel](docs/channels), which is not tied to the connection state of either the client or the agent. A client that [reconnects](/docs/connect/states#connection-state-recovery) can receive any tokens transmitted while it was disconnected. If a new client connects, for example because the user has moved to a different device, then it is possible to hydrate the new client with all the tokens transmitted for the current request as well as the output from any previous requests. The exact mechanism for doing this will depend on which [token streaming pattern](#patterns) you choose to use. + +// diagram of Ably channel + +The Ably platform guarantees that messages will be [delivered in order](/docs/platform/architecture/message-ordering#ordering-guarantees) and [exactly once](/docs/platform/architecture/idempotency), so your client application does not have to handle duplicate or out-of-order messages. + +## Token streaming patterns + +Ably AI Transport is built on the Pub/Sub messaging platform, which allows you to use whatever message structure and pattern works best for your application. AI transport supports two token streaming patterns over realtime connections, so you can choose the one that best fits your requirements and customise it for your application. 
+ +// TODO - find a reference link for realtime connections + +### Message-per-response +Token streaming with [message-per-response](/docs/ai-transport/features/token-streaming/message-per-response) is a pattern where a complete response appears as one message in the channel history. The agent publishes the individual tokens over a realtime connection while the response is in progress and Ably concatenates the tokens into a single message in channel history. Connected clients receive the individual tokens messages as they are published. New clients will first receive a single message containing all the tokens that have been published for this response, followed by any newly published token messages. + +This pattern is useful when clients must always receive all tokens for the response and you need to efficiently rehydrate the client with a full conversation log. This is the recommended pattern for most applications. + +// diagram? + +### Message-per-token +Token streaming with [message-per-token](/docs/ai-transport/features/token-streaming/message-per-token) is a pattern where every token generated by your model is published as its own Ably message. Each token then appears as one message in the channel history. + +This pattern is useful when clients only care about the most recent part of a response and you are happy to treat the channel history as a short sliding window rather than a full conversation log. For example: + +- **Backend-stored responses**: The backend writes complete responses to a database and clients load those full responses from there, while Ably is used only to deliver live tokens for the current in-progress response. +- **Live transcription, captioning, or translation**: A viewer who joins a live stream only needs the last few tokens for the current "frame" of subtitles, not the entire transcript so far. +- **Code assistance in an editor**: Streamed tokens become part of the file on disk as they are accepted, so past tokens do not need to be replayed from Ably. +- **Autocomplete**: A fresh response is streamed for each change a user makes to a document, with only the latest suggestion being relevant. + +// diagram? + +## Message events + +Different models and frameworks use different events to signal what is being sent to the client, for example start/stop events to mark the beginning and end of a streamed response. When you publish a message to an Ably channel, you can set the [message name](/docs/messages#properties) to the event type your client expects. + +## Next steps

-Ably AI Transport is built on the Pub/Sub messaging platform, which allows you to use whatever message structure and pattern works best for your application. AI transport supports two token streaming patterns over realtime connections, so you can choose the one that best fits your requirements and customise it for your application.
-
-// TODO - find a reference link for realtime connections
+Ably AI Transport is built on the Pub/Sub messaging platform, which allows you to use whatever message structure and pattern works best for your application. AI Transport supports two token streaming patterns using a [Realtime](/docs/api/realtime-sdk) client,
+so you can choose the one that best fits your requirements and customise it for your application. The Realtime client maintains a persistent connection to the Ably service. This allows you to publish at very high message rates with the lowest possible latencies,
+while preserving guarantees around message delivery order. For more information, see [Realtime and REST](/docs/basics#realtime-and-rest).

 ### Message-per-response
-Token streaming with [message-per-response](/docs/ai-transport/features/token-streaming/message-per-response) is a pattern where a complete response appears as one message in the channel history. The agent publishes the individual tokens over a realtime connection while the response is in progress and Ably concatenates the tokens into a single message in channel history. Connected clients receive the individual tokens messages as they are published. New clients will first receive a single message containing all the tokens that have been published for this response, followed by any newly published token messages.
+Token streaming with [message-per-response](/docs/ai-transport/features/token-streaming/message-per-response) enables you to stream AI-generated content as individual tokens in realtime, while maintaining a clean, compacted message history. Each AI response becomes a single message on an Ably channel that grows as tokens are appended, resulting in efficient storage and easy retrieval of complete responses.

-This pattern is useful when clients must always receive all tokens for the response and you need to efficiently rehydrate the client with a full conversation log. This is the recommended pattern for most applications.
-
-// diagram?
+This pattern is the recommended approach for most applications. It is useful when clients joining mid-stream need to catch up efficiently without receiving thousands of individual tokens, and when clients build up a long conversation history that must be loaded efficiently on new or reconnecting devices.

 ### Message-per-token
 Token streaming with [message-per-token](/docs/ai-transport/features/token-streaming/message-per-token) is a pattern where every token generated by your model is published as its own Ably message. Each token then appears as one message in the channel history.

 This pattern is useful when clients only care about the most recent part of a response and you are happy to treat the channel history as a short sliding window rather than a full conversation log. For example:

 - **Backend-stored responses**: The backend writes complete responses to a database and clients load those full responses from there, while Ably is used only to deliver live tokens for the current in-progress response.
 - **Live transcription, captioning, or translation**: A viewer who joins a live stream only needs the last few tokens for the current "frame" of subtitles, not the entire transcript so far.
 - **Code assistance in an editor**: Streamed tokens become part of the file on disk as they are accepted, so past tokens do not need to be replayed from Ably.
 - **Autocomplete**: A fresh response is streamed for each change a user makes to a document, with only the latest suggestion being relevant.

-// diagram?
-
 ## Message events

 Different models and frameworks use different events to signal what is being sent to the client, for example start/stop events to mark the beginning and end of a streamed response. 
When you publish a message to an Ably channel, you can set the [message name](/docs/messages#properties) to the event type your client expects.
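+
+For example, a publisher could map the event types emitted by its model stream directly onto Ably message names. The sketch below is illustrative rather than prescriptive: it assumes an initialized Realtime client (`realtime`) and a model stream that emits `message_start`, `message_delta` and `message_stop` events, as in the [message-per-token](/docs/ai-transport/features/token-streaming/message-per-token) examples, and publishes them as `start`, `token` and `stop` messages. Substitute whatever event types your own model or framework emits.
+
+<Code>
+```javascript
+const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}');
+
+// Map model stream event types to the Ably message names clients subscribe to
+const eventNames = {
+  message_start: 'start',
+  message_delta: 'token',
+  message_stop: 'stop'
+};
+
+for await (const event of stream) {
+  const name = eventNames[event.type];
+  if (!name) continue; // skip event types that clients don't need
+
+  // The message name tells subscribers which lifecycle event this is;
+  // data is empty for start/stop events, which carry no token text
+  channel.publish({
+    name,
+    data: event.text,
+    extras: {
+      headers: {
+        responseId: event.responseId
+      }
+    }
+  });
+}
+```
+</Code>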