5 changes: 5 additions & 0 deletions src/data/nav/aitransport.ts
@@ -21,6 +21,11 @@ export default {
    {
      name: 'Token streaming',
      pages: [
        {
          name: 'Overview',
          link: '/docs/ai-transport/features/token-streaming',
          index: true,
        },
        {
          name: 'Message per token',
          link: '/docs/ai-transport/features/token-streaming/message-per-token',
43 changes: 43 additions & 0 deletions src/pages/docs/ai-transport/features/token-streaming/index.mdx
@@ -0,0 +1,43 @@
---
title: Token streaming
meta_description: "Learn about token streaming with Ably AI Transport, including common patterns and the features provided by the Ably solution."
---

Token streaming is a technique used with Large Language Models (LLMs) where the model's response is transmitted progressively as each token is generated, rather than waiting for the complete response before transmission begins. This allows users to see the response appear incrementally, similar to watching someone type in real time, which improves the user experience. Token streaming is normally accomplished by streaming the tokens as the response to an HTTP request from the client.

![Ably AIT network diagram](../../../../../images/content/diagrams/ai-transport-before-and-after.png)

If an HTTP stream is interrupted, for example because the client loses network connection, then any tokens that were transmitted during the interruption will be lost. Ably AI Transport solves this problem by streaming tokens to a [Pub/Sub channel](/docs/channels), which is not tied to the connection state of either the client or the agent. A client that [reconnects](/docs/connect/states#connection-state-recovery) can receive any tokens transmitted while it was disconnected. If a new client connects, for example because the user has moved to a different device, then it is possible to hydrate the new client with all the tokens transmitted for the current request as well as the output from any previous requests. The exact mechanism depends on which [token streaming pattern](#patterns) you choose.

The Ably platform guarantees that messages will be [delivered in order](/docs/platform/architecture/message-ordering#ordering-guarantees) and [exactly once](/docs/platform/architecture/idempotency), so your client application does not have to handle duplicate or out-of-order messages.

## Token streaming patterns <a id="patterns"/>

Ably AI Transport is built on the Pub/Sub messaging platform, which allows you to use whatever message structure and pattern works best for your application. AI Transport supports two token streaming patterns using a [Realtime](/docs/api/realtime-sdk) client, so you can choose the one that best fits your requirements and customise it for your application. The Realtime client maintains a persistent connection to the Ably service. This allows you to publish at very high message rates with the lowest possible latencies, while preserving guarantees around message delivery order. For more information, see [Realtime and REST](/docs/basics#realtime-and-rest).
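
As a minimal sketch, creating a Realtime client looks like this (the API key placeholder is illustrative):

<Code>
```javascript
import * as Ably from 'ably';

// Create a Realtime client, which maintains a persistent
// connection to the Ably service
const realtime = new Ably.Realtime({ key: '{{API_KEY}}' });
```
</Code>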

### Message-per-response <a id="pattern-per-response"/>
Token streaming with [message-per-response](/docs/ai-transport/features/token-streaming/message-per-response) enables you to stream AI-generated content as individual tokens in realtime, while maintaining a clean, compacted message history. Each AI response becomes a single message on an Ably channel that grows as tokens are appended, resulting in efficient storage and easy retrieval of complete responses.

This pattern is the recommended approach for most applications. It is useful when you want clients joining mid-stream to catch up efficiently without receiving thousands of individual tokens, or when clients build up a long conversation history that must be loaded efficiently on new or reconnecting devices.

### Message-per-token <a id="pattern-per-token"/>
Token streaming with [message-per-token](/docs/ai-transport/features/token-streaming/message-per-token) is a pattern where every token generated by your model is published as its own Ably message. Each token then appears as one message in the channel history.

This pattern is useful when clients only care about the most recent part of a response and you are happy to treat the channel history as a short sliding window rather than a full conversation log. For example:

- **Backend-stored responses**: The backend writes complete responses to a database and clients load those full responses from there, while Ably is used only to deliver live tokens for the current in-progress response.
- **Live transcription, captioning, or translation**: A viewer who joins a live stream only needs the last few tokens for the current "frame" of subtitles, not the entire transcript so far.
- **Code assistance in an editor**: Streamed tokens become part of the file on disk as they are accepted, so past tokens do not need to be replayed from Ably.
- **Autocomplete**: A fresh response is streamed for each change a user makes to a document, with only the latest suggestion being relevant.
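
As a sketch of the publishing side for this pattern, assuming `channel` is an Ably channel instance and `modelStream` is an async iterable of tokens from your model:

<Code>
```javascript
// Publish each generated token as its own message
// ('modelStream' is an assumed async iterable of model tokens)
for await (const token of modelStream) {
  await channel.publish('token', token);
}
```
</Code>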

## Message events <a id="events"/>

Different models and frameworks use different events to signal what is being sent to the client, for example start/stop events to mark the beginning and end of a streamed response. When you publish a message to an Ably channel, you can set the [message name](/docs/messages#properties) to the event type your client expects.
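
For example, a hedged sketch using `start`, `token` and `stop` message names (the event names and the `responseId` header are illustrative, not prescribed):

<Code>
```javascript
// Signal the beginning of a streamed response
await channel.publish({ name: 'start', extras: { headers: { responseId } } });

// Publish each token under the event name the client expects
await channel.publish({ name: 'token', data: token, extras: { headers: { responseId } } });

// Signal the end of the streamed response
await channel.publish({ name: 'stop', extras: { headers: { responseId } } });
```
</Code>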

## Next steps <a id="next"/>

Read more about token streaming with the [message-per-response](/docs/ai-transport/features/token-streaming/message-per-response) and [message-per-token](/docs/ai-transport/features/token-streaming/message-per-token) patterns. Alternatively, check out the how-to guides to see how to implement these patterns with a variety of models and frameworks.

// TODO: guide links
src/pages/docs/ai-transport/features/token-streaming/message-per-token.mdx
@@ -12,11 +12,15 @@ This pattern is useful when clients only care about the most recent part of a re
- **Code assistance in an editor**: Streamed tokens become part of the file on disk as they are accepted, so past tokens do not need to be replayed from Ably.
- **Autocomplete**: A fresh response is streamed for each change a user makes to a document, with only the latest suggestion being relevant.

To get started with token streaming, all you need to do is:

* [Use a channel](#use)
* [Publish tokens from your server](#publish)
* [Subscribe to the token stream](#subscribe)

## Use a channel <a id="use"/>

[Channels](/docs/channels) are used to separate message traffic into different topics. For token streaming, each conversation or session typically has its own channel.

Use the [`get()`](/docs/api/realtime-sdk/channels#get) method to create or retrieve a channel instance:
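
A minimal sketch, using the docs' channel name placeholder:

<Code>
```javascript
// Create or retrieve the channel for this conversation
const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}');
```
</Code>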

@@ -230,200 +234,3 @@ await channel.subscribe('stop', (message) => {
});
```
</Code>

## Client hydration <a id="hydration"/>

When clients connect or reconnect, such as after a page refresh, they often need to catch up on tokens that were published while they were offline or before they joined. Ably provides several approaches to hydrate client state depending on your application's requirements.

<Aside data-type="note">
If you need to retrieve and process large amounts of historical data, consider using the [message-per-response](/docs/ai-transport/features/token-streaming/message-per-response) pattern instead, in which the complete response appears as a single message in history.
</Aside>

### Using rewind for recent history <a id="rewind"/>

The simplest approach is to use Ably's [rewind](/docs/channels/options/rewind) channel option to automatically retrieve recent tokens when attaching to a channel:

<Code>
```javascript
// Use rewind to receive recent historical messages
const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}', {
  params: { rewind: '2m' } // or rewind: '100' for message count
});

// Subscribe to receive both recent historical and live messages,
// which are delivered in order to the subscription
await channel.subscribe('token', (message) => {
  const token = message.data;

  // Process tokens from both recent history and live stream
  console.log('Token received:', token);
});
```
</Code>

Rewind supports two formats:

- **Time-based**: Use a time interval like `'30s'` or `'2m'` to retrieve messages from that time period
- **Count-based**: Use a number like `50` or `100` to retrieve the most recent N messages (maximum 100)

<Aside data-type="note">
At most 100 messages will be retrieved in a rewind request. If more messages exist within the specified interval, only the most recent 100 are sent.
</Aside>

By default, rewind is limited to the last 2 minutes of messages. This is usually sufficient for scenarios where clients need only recent context, such as continuous token streaming, or when the response stream from a given model request does not exceed 2 minutes. If you need more than 2 minutes of history, see [Using history for longer persistence](#history).

### Using history for longer persistence <a id="history"/>

For applications that need to retrieve tokens beyond the 2-minute rewind window, enable [persistence](/docs/storage-history/storage#all-message-persistence) on your channel. Use [channel history](/docs/storage-history/history) with the [`untilAttach` option](/docs/storage-history/history#continuous-history) to paginate back through history to obtain historical tokens, while preserving continuity with the delivery of live tokens:

<Code>
```javascript
// Use a channel in a namespace called 'persisted', which has persistence enabled
const channel = realtime.channels.get('persisted:{{RANDOM_CHANNEL_NAME}}');

let response = '';

// Subscribe to live messages (implicitly attaches the channel)
await channel.subscribe('token', (message) => {
  // Append the token to the end of the response
  response += message.data;
});

// Fetch history up until the point of attachment
let page = await channel.history({ untilAttach: true });

// Paginate backwards through history
while (page) {
  // Messages are newest-first, so prepend them to response
  for (const message of page.items) {
    response = message.data + response;
  }

  // Move to next page if available
  page = page.hasNext() ? await page.next() : null;
}
</Code>

### Hydrating an in-progress live response <a id="live-response"/>

A common pattern is to persist complete model responses in your database while using Ably for live token delivery of the in-progress response.

The client loads completed responses from your database, then reaches back into Ably channel history until it encounters a token for a response it's already loaded.

You can retrieve partial history using either the [rewind](#rewind) or [history](#history) pattern.

#### Hydrate using rewind

Load completed responses from your database, then use rewind to catch up on any in-progress responses, skipping any tokens that belong to a response that was already loaded:

<Code>
```javascript
// Load completed responses from database
// (loadResponsesFromDatabase is assumed to return a Map keyed by responseId)
const completedResponses = await loadResponsesFromDatabase();

// Use rewind to receive recent historical messages
const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}', {
  params: { rewind: '2m' }
});

// Track in-progress responses by ID
const inProgressResponses = new Map();

// Subscribe to receive both recent historical and live messages,
// which are delivered in order to the subscription
await channel.subscribe('token', (message) => {
  const token = message.data;
  const responseId = message.extras?.headers?.responseId;

  if (!responseId) {
    console.warn('Token missing responseId');
    return;
  }

  // Skip tokens for responses already hydrated from the database
  if (completedResponses.has(responseId)) {
    return;
  }

  // Create an empty in-progress response
  if (!inProgressResponses.has(responseId)) {
    inProgressResponses.set(responseId, '');
  }

  // Append tokens for new responses
  inProgressResponses.set(responseId, inProgressResponses.get(responseId) + token);
});
```
</Code>

#### Hydrate using history

Load completed responses from your database, then paginate backwards through history to catch up on in-progress responses until you reach a token that belongs to a response you've already loaded:

<Code>
```javascript
// Load completed responses from database
const completedResponses = await loadResponsesFromDatabase();

// Use a channel in a namespace called 'persisted', which has persistence enabled
const channel = realtime.channels.get('persisted:{{RANDOM_CHANNEL_NAME}}');

// Track in-progress responses by ID
const inProgressResponses = new Map();

// Subscribe to live tokens (implicitly attaches)
await channel.subscribe('token', (message) => {
  const token = message.data;
  const responseId = message.extras?.headers?.responseId;

  if (!responseId) {
    console.warn('Token missing responseId');
    return;
  }

  // Skip tokens for responses already hydrated from the database
  if (completedResponses.has(responseId)) {
    return;
  }

  // Create an empty in-progress response
  if (!inProgressResponses.has(responseId)) {
    inProgressResponses.set(responseId, '');
  }

  // Append live tokens for in-progress responses
  inProgressResponses.set(responseId, inProgressResponses.get(responseId) + token);
});

// Fetch history up until the point of attachment
let page = await channel.history({ untilAttach: true });

// Paginate backwards through history until we encounter a hydrated response
let done = false;
while (page && !done) {
  // Messages are newest-first, so prepend them to the response
  for (const message of page.items) {
    const token = message.data;
    const responseId = message.extras?.headers?.responseId;

    // Stop when we reach a response already loaded from the database
    if (completedResponses.has(responseId)) {
      done = true;
      break;
    }

    // Create an empty in-progress response
    if (!inProgressResponses.has(responseId)) {
      inProgressResponses.set(responseId, '');
    }

    // Prepend historical tokens for in-progress responses
    inProgressResponses.set(responseId, token + inProgressResponses.get(responseId));
  }

  // Move to next page if available
  page = page.hasNext() ? await page.next() : null;
}
```
</Code>
53 changes: 53 additions & 0 deletions src/pages/docs/ai-transport/index.mdx
@@ -4,3 +4,56 @@ meta_description: "Learn more about Ably's AI Transport and the features that en
redirect_from:
- /docs/products/ai-transport
---

Ably AI Transport is a solution for building stateful, steerable, multi-device AI experiences into new or existing applications. You can use AI Transport as the transport layer with any LLM or agent framework, without rebuilding your existing stack or being locked to a particular vendor.

## Key features

AI Transport builds on [Ably Pub/Sub](/docs/basics) to enable reliable interactive AI experiences at scale and provides the following key features:

// TODO: Check this lines up with planned doc sections
* [Reliable, resumable token streams](#reliable-tokens)
* [Session management](#session)
* [Complex message patterns](#message)
* [Enterprise controls](#enterprise)

### Reliable, resumable token streams <a id="reliable-tokens"/>

Ably AI Transport enables you to [stream tokens](/docs/ai-transport/features/token-streaming) reliably from your AI agent to your client devices using a channel. The token stream survives reconnections and token history is available for any new clients that connect.

### Session management <a id="session"/>

With AI Transport, communication between the client and agent is not tied to the connection state of either party. This gives you much more control over how to manage session lifetime and opens up opportunities to improve the user experience, such as sending push notifications to offline users when a long-running task completes.

// TODO: links
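
As one illustrative sketch (the message name and payload are assumptions), an agent could attach a push notification to the completion message using Ably's push support:

<Code>
```javascript
// Publish a task-complete event with an attached push notification,
// so offline users are alerted when a long-running task finishes
// (message name and payload shape are illustrative)
await channel.publish({
  name: 'task-complete',
  data: { taskId: 'abc123' },
  extras: {
    push: {
      notification: {
        title: 'Task complete',
        body: 'Your long-running request has finished.'
      }
    }
  }
});
```
</Code>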

### Complex message patterns <a id="message"/>

Truly interactive AI experiences require more than a simple HTTP request-response exchange between a single client and agent. AI Transport allows the use of [complex messaging patterns](//TODO: Link here), for example:
* A user can communicate with multiple agents in a single channel and receive their responses simultaneously, along with any references or citations the model may supply.
* A user can send additional requests or redirections while a response is in progress, instead of waiting until the response is complete.
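
As an illustrative sketch (the message name and data shape are assumptions), a client could render streams from multiple agents on a single channel:

<Code>
```javascript
// Receive responses from multiple agents on one channel
// ('agent-response' and the data shape are illustrative)
await channel.subscribe('agent-response', (message) => {
  const { agentId, token } = message.data;

  // Render each agent's stream separately in the UI
  // (appendToAgentTranscript is a hypothetical helper)
  appendToAgentTranscript(agentId, token);
});
```
</Code>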

### Enterprise controls <a id="enterprise"/>

Ably's platform provides [integrations](/docs/platform/integrations) and capabilities to ensure that your application will meet the requirements of enterprise environments, for example [message auditing](/docs/platform/integrations/streaming), [client identification](/docs/auth/identified-clients) and [RBAC](/docs/auth/capabilities).
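
For example, a sketch of issuing a token that limits a client to subscribing and reading history on conversation channels (the `rest` client, clientId and channel namespace are illustrative):

<Code>
```javascript
// Issue a token request scoped to subscribe and history operations
// ('rest' is an assumed Ably REST client; names are illustrative)
const tokenRequest = await rest.auth.createTokenRequest({
  clientId: 'user-123',
  capability: JSON.stringify({ 'conversation:*': ['subscribe', 'history'] })
});
```
</Code>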

## Model SDKs and frameworks <a id="sdks-frameworks"/>

Ably AI Transport can be used as the transport layer for any model or agent framework you choose to use in your stack, giving you full control of your technology decisions. Ably's how-to guides show example integrations with a selection of common SDKs and frameworks.

// TODO: link these to our guides, or to their own docs? Also check that this list is accurate at the point we release
* OpenAI Responses API
* OpenAI Agents SDK
* Claude Client SDK
* Claude Agents SDK
* LangChain
* LangGraph
* Vercel AI SDK Core

If your chosen model or framework is not listed and you need additional help designing the integration with Ably, please contact [Support](/support).

## Next steps

Follow the Getting Started guides to learn more about Ably AI Transport and how it will work with your application.

// TODO: Link to getting started and other documentation when available