diff --git a/src/data/nav/aitransport.ts b/src/data/nav/aitransport.ts
index a0cea2f5cc..e75d13d56b 100644
--- a/src/data/nav/aitransport.ts
+++ b/src/data/nav/aitransport.ts
@@ -21,6 +21,11 @@ export default {
     {
       name: 'Token streaming',
       pages: [
+        {
+          name: 'Overview',
+          link: '/docs/ai-transport/features/token-streaming',
+          index: true,
+        },
         {
           name: 'Message per token',
           link: '/docs/ai-transport/features/token-streaming/message-per-token',
diff --git a/src/images/content/diagrams/ai-transport-before-and-after.png b/src/images/content/diagrams/ai-transport-before-and-after.png
new file mode 100644
index 0000000000..d29ae4a6f1
Binary files /dev/null and b/src/images/content/diagrams/ai-transport-before-and-after.png differ
diff --git a/src/pages/docs/ai-transport/features/token-streaming/index.mdx b/src/pages/docs/ai-transport/features/token-streaming/index.mdx
new file mode 100644
index 0000000000..703d203a32
--- /dev/null
+++ b/src/pages/docs/ai-transport/features/token-streaming/index.mdx
@@ -0,0 +1,43 @@
+---
+title: Token streaming
+meta_description: "Learn about token streaming with Ably AI Transport, including common patterns and the features provided by the Ably solution."
+---
+
+Token streaming is a technique used with Large Language Models (LLMs) where the model's response is transmitted progressively as each token is generated, rather than waiting for the complete response before transmission begins. This allows users to see the response appear incrementally, similar to watching someone type in real time, which improves the user experience. It is normally accomplished by streaming the tokens in the response to an HTTP request from the client.
+
+![Diagram comparing token streaming before and after Ably AI Transport](../../../../../images/content/diagrams/ai-transport-before-and-after.png)
+
+If an HTTP stream is interrupted, for example because the client loses network connection, any tokens transmitted during the interruption are lost. Ably AI Transport solves this problem by streaming tokens to a [Pub/Sub channel](/docs/channels), which is not tied to the connection state of either the client or the agent. A client that [reconnects](/docs/connect/states#connection-state-recovery) can receive any tokens transmitted while it was disconnected. If a new client connects, for example because the user has moved to a different device, it is possible to hydrate the new client with all the tokens transmitted for the current request, as well as the output from any previous requests. The exact mechanism for doing this depends on which [token streaming pattern](#patterns) you choose.
+
+The Ably platform guarantees that messages are [delivered in order](/docs/platform/architecture/message-ordering#ordering-guarantees) and [exactly once](/docs/platform/architecture/idempotency), so your client application does not have to handle duplicate or out-of-order messages.
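+
+For example, because the platform handles ordering and duplicate suppression, a subscriber can simply append each token as it arrives. The following is a minimal sketch assuming the JavaScript Realtime SDK; the `token` message name is illustrative:
+
+```javascript
+// Create or retrieve the channel carrying the token stream
+const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}');
+
+let response = '';
+
+// Append each token as it arrives; ordering and exactly-once
+// delivery are guaranteed by the platform
+await channel.subscribe('token', (message) => {
+  response += message.data;
+});
+```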
+
+## Token streaming patterns
+
+Ably AI Transport is built on the Pub/Sub messaging platform, which allows you to use whatever message structure and pattern works best for your application. AI Transport supports two token streaming patterns using a [Realtime](/docs/api/realtime-sdk) client, so you can choose the one that best fits your requirements and customise it for your application. The Realtime client maintains a persistent connection to the Ably service, which allows you to publish at very high message rates with the lowest possible latencies while preserving guarantees around message delivery order. For more information, see [Realtime and REST](/docs/basics#realtime-and-rest).
+
+### Message-per-response
+
+Token streaming with [message-per-response](/docs/ai-transport/features/token-streaming/message-per-response) enables you to stream AI-generated content as individual tokens in realtime, while maintaining a clean, compacted message history. Each AI response becomes a single message on an Ably channel that grows as tokens are appended, resulting in efficient storage and easy retrieval of complete responses.
+
+This pattern is the recommended approach for most applications. It is useful when you want clients joining mid-stream to catch up efficiently without replaying thousands of individual tokens, and when clients build up a long conversation history that must be loaded efficiently on new or reconnecting devices.
+
+### Message-per-token
+
+Token streaming with [message-per-token](/docs/ai-transport/features/token-streaming/message-per-token) is a pattern where every token generated by your model is published as its own Ably message. Each token then appears as one message in the channel history.
+
+This pattern is useful when clients only care about the most recent part of a response and you are happy to treat the channel history as a short sliding window rather than a full conversation log. For example:
+
+- **Backend-stored responses**: The backend writes complete responses to a database and clients load those full responses from there, while Ably is used only to deliver live tokens for the current in-progress response.
+- **Live transcription, captioning, or translation**: A viewer who joins a live stream only needs the last few tokens for the current "frame" of subtitles, not the entire transcript so far.
+- **Code assistance in an editor**: Streamed tokens become part of the file on disk as they are accepted, so past tokens do not need to be replayed from Ably.
+- **Autocomplete**: A fresh response is streamed for each change a user makes to a document, with only the latest suggestion being relevant.
+
+## Message events
+
+Different models and frameworks use different events to signal what is being sent to the client, for example start/stop events to mark the beginning and end of a streamed response. When you publish a message to an Ably channel, you can set the [message name](/docs/messages#properties) to the event type your client expects.
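+
+For example, a publisher might map a model's lifecycle events onto message names. The following is a sketch assuming the JavaScript Realtime SDK; the `start`, `token` and `stop` names and the `responseId` payload are illustrative, not required by Ably:
+
+```javascript
+// Mark the start of a streamed response
+await channel.publish('start', { responseId });
+
+// Publish each generated token as it arrives
+await channel.publish('token', tokenText);
+
+// Mark the end of the streamed response
+await channel.publish('stop', { responseId });
+```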
+
+## Next steps
+
+To get started with token streaming, all you need to do is:
-Publish tokens from a [Realtime](/docs/api/realtime-sdk) client, which maintains a persistent connection to the Ably service. This allows you to publish at very high message rates with the lowest possible latencies, while preserving guarantees around message delivery order. For more information, see [Realtime and REST](/docs/basics#realtime-and-rest).
+
+* [Use a channel](#use)
+* [Publish tokens from your server](#publish)
+* [Subscribe to the token stream](#subscribe)
-[Channels](/docs/channels) separate message traffic into different topics. For token streaming, each conversation or session typically has its own channel.
+
+## Use a channel
+
+[Channels](/docs/channels) are used to separate message traffic into different topics. For token streaming, each conversation or session typically has its own channel.
+
+Use the [`get()`](/docs/api/realtime-sdk/channels#get) method to create or retrieve a channel instance.
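+
+A minimal sketch, assuming the JavaScript Realtime SDK and an illustrative channel name:
+
+```javascript
+// Create or retrieve the channel instance for this conversation or session
+const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}');
+```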
@@ -230,200 +234,3 @@ await channel.subscribe('stop', (message) => {
 });
 ```
-
-## Client hydration
-
-When clients connect or reconnect, such as after a page refresh, they often need to catch up on tokens that were published while they were offline or before they joined. Ably provides several approaches to hydrate client state depending on your application's requirements.
-
-### Using rewind for recent history
-
-The simplest approach is to use Ably's [rewind](/docs/channels/options/rewind) channel option to automatically retrieve recent tokens when attaching to a channel:
-
-```javascript
-// Use rewind to receive recent historical messages
-const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}', {
-  params: { rewind: '2m' } // or rewind: 100 for message count
-});
-
-// Subscribe to receive both recent historical and live messages,
-// which are delivered in order to the subscription
-await channel.subscribe('token', (message) => {
-  const token = message.data;
-
-  // Process tokens from both recent history and live stream
-  console.log('Token received:', token);
-});
-```
-
-Rewind supports two formats:
-
-- **Time-based**: Use a time interval like `'30s'` or `'2m'` to retrieve messages from that time period
-- **Count-based**: Use a number like `50` or `100` to retrieve the most recent N messages (maximum 100)
-
-By default, rewind is limited to the last 2 minutes of messages. This is usually sufficient for scenarios where clients need only recent context, such as for continuous token streaming, or when the response stream from a given model request does not exceed 2 minutes. If you need more than 2 minutes of history, see [Using history for longer persistence](#history).
-
-### Using history for longer persistence
-
-For applications that need to retrieve tokens beyond the 2-minute rewind window, enable [persistence](/docs/storage-history/storage#all-message-persistence) on your channel. Use [channel history](/docs/storage-history/history) with the [`untilAttach` option](/docs/storage-history/history#continuous-history) to paginate back through history to obtain historical tokens, while preserving continuity with the delivery of live tokens:
-
-```javascript
-// Use a channel in a namespace called 'persisted', which has persistence enabled
-const channel = realtime.channels.get('persisted:{{RANDOM_CHANNEL_NAME}}');
-
-let response = '';
-
-// Subscribe to live messages (implicitly attaches the channel)
-await channel.subscribe('token', (message) => {
-  // Append the token to the end of the response
-  response += message.data;
-});
-
-// Fetch history up until the point of attachment
-let page = await channel.history({ untilAttach: true });
-
-// Paginate backwards through history
-while (page) {
-  // Messages are newest-first, so prepend them to response
-  for (const message of page.items) {
-    response = message.data + response;
-  }
-
-  // Move to next page if available
-  page = page.hasNext() ? await page.next() : null;
-}
-```
-
-### Hydrating an in-progress live response
-
-A common pattern is to persist complete model responses in your database while using Ably for live token delivery of the in-progress response.
-
-The client loads completed responses from your database, then reaches back into Ably channel history until it encounters a token for a response it's already loaded.
-
-You can retrieve partial history using either the [rewind](#rewind) or [history](#history) pattern.
-
-#### Hydrate using rewind
-
-Load completed responses from your database, then use rewind to catch up on any in-progress responses, skipping any tokens that belong to a response that was already loaded:
-
-```javascript
-// Load completed responses from database
-const completedResponses = await loadResponsesFromDatabase();
-
-// Use rewind to receive recent historical messages
-const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}', {
-  params: { rewind: '2m' }
-});
-
-// Track in progress responses by ID
-const inProgressResponses = new Map();
-
-// Subscribe to receive both recent historical and live messages,
-// which are delivered in order to the subscription
-await channel.subscribe('token', (message) => {
-  const token = message.data;
-  const responseId = message.extras?.headers?.responseId;
-
-  if (!responseId) {
-    console.warn('Token missing responseId');
-    return;
-  }
-
-  // Skip tokens for responses already hydrated from database
-  if (completedResponses.has(responseId)) {
-    return;
-  }
-
-  // Create an empty in-progress response
-  if (!inProgressResponses.has(responseId)) {
-    inProgressResponses.set(responseId, '');
-  }
-
-  // Append tokens for new responses
-  inProgressResponses.set(responseId, inProgressResponses.get(responseId) + token);
-});
-```
-
-#### Hydrate using history
-
-Load completed responses from your database, then paginate backwards through history to catch up on in-progress responses until you reach a token that belongs to a response you've already loaded:
-
-```javascript
-// Load completed responses from database
-const completedResponses = await loadResponsesFromDatabase();
-
-// Use a channel in a namespace called 'persisted', which has persistence enabled
-const channel = realtime.channels.get('persisted:{{RANDOM_CHANNEL_NAME}}');
-
-// Track in progress responses by ID
-const inProgressResponses = new Map();
-
-// Subscribe to live tokens (implicitly attaches)
-await channel.subscribe('token', (message) => {
-  const token = message.data;
-  const responseId = message.extras?.headers?.responseId;
-
-  if (!responseId) {
-    console.warn('Token missing responseId');
-    return;
-  }
-
-  // Skip tokens for responses already hydrated from database
-  if (completedResponses.has(responseId)) {
-    return;
-  }
-
-  // Create an empty in-progress response
-  if (!inProgressResponses.has(responseId)) {
-    inProgressResponses.set(responseId, '');
-  }
-
-  // Append live tokens for in-progress responses
-  inProgressResponses.set(responseId, inProgressResponses.get(responseId) + token);
-});
-
-// Paginate backwards through history until we encounter a hydrated response
-let page = await channel.history({ untilAttach: true });
-
-// Paginate backwards through history
-let done = false;
-while (page && !done) {
-  // Messages are newest-first, so prepend them to response
-  for (const message of page.items) {
-    const token = message.data;
-    const responseId = message.extras?.headers?.responseId;
-
-    // Stop when we reach a response already loaded from database
-    if (completedResponses.has(responseId)) {
-      done = true;
-      break;
-    }
-
-    // Create an empty in-progress response
-    if (!inProgressResponses.has(responseId)) {
-      inProgressResponses.set(responseId, '');
-    }
-
-    // Prepend historical tokens for in-progress responses
-    inProgressResponses.set(responseId, token + inProgressResponses.get(responseId));
-  }
-
-  // Move to next page if available
-  page = page.hasNext() ? await page.next() : null;
-}
-```
-
diff --git a/src/pages/docs/ai-transport/index.mdx b/src/pages/docs/ai-transport/index.mdx
index fb2f2b271e..c635f98ade 100644
--- a/src/pages/docs/ai-transport/index.mdx
+++ b/src/pages/docs/ai-transport/index.mdx
@@ -4,3 +4,56 @@ meta_description: "Learn more about Ably's AI Transport and the features that en
 redirect_from:
   - /docs/products/ai-transport
 ---
+
+Ably AI Transport is a solution for building stateful, steerable, multi-device AI experiences into new or existing applications. You can use AI Transport as the transport layer with any LLM or agent framework, without rebuilding your existing stack or being locked into a particular vendor.
+
+## Key features
+
+AI Transport builds on [Ably Pub/Sub](/docs/basics) to enable reliable, interactive AI experiences at scale, and provides the following key features:
+
+// TODO: Check this lines up with planned doc sections
+* [Reliable, resumable token streams](#reliable-tokens)
+* [Session management](#session)
+* [Complex message patterns](#message)
+* [Enterprise controls](#enterprise)
+
+### Reliable, resumable token streams
+
+Ably AI Transport enables you to [stream tokens](/docs/ai-transport/features/token-streaming) reliably from your AI agent to your client devices using a channel. The token stream survives reconnections, and token history is available to any new clients that connect.
+
+### Session management
+
+With AI Transport, communication between the client and agent is not tied to the connection state of either party. This gives you much more control over how to manage session lifetime and opens up opportunities for an improved user experience, for example push notifications to offline users when a long-running task completes.
+
+// TODO: links
+
+### Complex message patterns
+
+Truly interactive AI experiences require more than a simple HTTP request-response exchange between a single client and agent. AI Transport allows the use of [complex messaging patterns](//TODO: Link here), for example:
+
+* A user can communicate with multiple agents in a single channel and receive their responses simultaneously, along with any references or citations the model may supply.
+* A user can send additional requests or redirections while a response is in progress, instead of waiting until the response is complete.
+
+### Enterprise controls
+
+Ably's platform provides [integrations](/docs/platform/integrations) and capabilities to ensure that your application meets the requirements of enterprise environments, for example [message auditing](/docs/platform/integrations/streaming), [client identification](/docs/auth/identified-clients), and [RBAC](/docs/auth/capabilities).
+
+## Model SDKs and frameworks
+
+Ably AI Transport can be used as the transport layer for any model or agent framework you choose to use in your stack, giving you full control of your technology decisions. Ably's how-to guides show example integrations with a selection of common SDKs and frameworks:
+
+// TODO: link these to our guides, or to their own docs? Also check that this list is accurate at the point we release
+* OpenAI Responses API
+* OpenAI Agents SDK
+* Claude Client SDK
+* Claude Agents SDK
+* LangChain
+* LangGraph
+* Vercel AI SDK Core
+
+If your chosen model or framework is not listed and you need additional help designing the integration with Ably, contact [Support](/support).
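+
+To illustrate the integration pattern, the following sketch forwards a streamed OpenAI Chat Completions response to an Ably channel; the channel name, model and `token` message name are illustrative, and error handling is omitted:
+
+```javascript
+import Ably from 'ably';
+import OpenAI from 'openai';
+
+const ably = new Ably.Realtime({ key: process.env.ABLY_API_KEY });
+const openai = new OpenAI();
+
+const channel = ably.channels.get('conversation:123');
+
+// Request a streamed completion from the model
+const stream = await openai.chat.completions.create({
+  model: 'gpt-4o-mini',
+  messages: [{ role: 'user', content: 'Hello!' }],
+  stream: true,
+});
+
+// Publish each token to the channel as it is generated
+for await (const chunk of stream) {
+  const token = chunk.choices[0]?.delta?.content;
+  if (token) {
+    await channel.publish('token', token);
+  }
+}
+```
+
+The same pattern applies to any framework that exposes a token-level stream.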
+
+## Next steps
+
+Follow the Getting Started guides to learn more about Ably AI Transport and how it will work with your application.
+
+// TODO: Link to getting started and other documentation when available