131 changes: 131 additions & 0 deletions content/partials/types/_message.textile
@@ -0,0 +1,131 @@
A @Message@ represents an individual message that is sent to or received from Ably.

h6(#name).
default: name
csharp: Name

The event name, if provided.<br>__Type: @String@__

h6(#data).
default: data
csharp: Data

The message payload, if provided.<br>__Type: <span lang="default">@String@, @StringBuffer@, @JSON Object@</span><span lang="java">@String@, @ByteArray@, @JSONObject@, @JSONArray@</span><span lang="csharp">@String@, @byte[]@, @plain C# object that can be serialized to JSON@</span><span lang="ruby">@String@, @Binary@ (ASCII-8BIT String), @Hash@, @Array@</span><span lang="python">@String@, @Bytearray@, @Dict@, @List@</span><span lang="php">@String@, @Binary String@, @Associative Array@, @Array@</span><span lang="objc">@NSString *@, @NSData *@, @NSDictionary *@, @NSArray *@</span><span lang="swift">@String@, @NSData@, @Dictionary@, @Array@</span><span lang="flutter">@String@, @Map@, @List@</span>__

h6(#extras).
default: extras
csharp: Extras

Metadata and/or ancillary payloads, if provided. Valid payloads include "@push@":/docs/push/publish#payload, "@headers@" (a map of strings to strings for arbitrary customer-supplied metadata), "@ephemeral@":/docs/pub-sub/advanced#ephemeral, and "@privileged@":/docs/platform/integrations/webhooks#skipping objects.<br>__Type: <span lang="java">@JSONObject@, @JSONArray@</span><span lang="csharp">plain C# object that can be converted to JSON</span><span lang="jsall">@JSON Object@</span><span lang="ruby">@Hash@, @Array@</span><span lang="python">@Dict@, @List@</span><span lang="swift">@Dictionary@, @Array@</span><span lang="objc">@NSDictionary *@, @NSArray *@</span><span lang="php">@Associative Array@, @Array@</span>__

h6(#id).
default: id
csharp: Id

A unique ID assigned by Ably to this message.<br>__Type: @String@__

h6(#client-id).
default: clientId
csharp: ClientId
ruby: client_id
python: client_id

The client ID of the publisher of this message.<br>__Type: @String@__

h6(#connection-id).
default: connectionId
csharp: ConnectionId
ruby: connection_id
python: connection_id

The connection ID of the publisher of this message.<br>__Type: @String@__

h6(#connection-key).
default: connectionKey
csharp,go: ConnectionKey
ruby,python: connection_key

A connection key, which can optionally be included for a REST publish as part of the "publishing on behalf of a realtime client functionality":/docs/pub-sub/advanced#publish-on-behalf.<br>__Type: @String@__

h6(#timestamp).
default: timestamp
csharp: Timestamp

Timestamp when the message was first received by Ably, as <span lang="default">milliseconds since the epoch</span><span lang="ruby">a @Time@ object</span>.<br>__Type: <span lang="default">@Integer@</span><span lang="java">@Long Integer@</span><span lang="csharp">@DateTimeOffset@</span><span lang="ruby">@Time@</span><span lang="objc,swift">@NSDate@</span>__

h6(#encoding).
default: encoding
csharp: Encoding

This will typically be empty as all messages received from Ably are automatically decoded client-side using this value. However, if the message encoding cannot be processed, this attribute will contain the remaining transformations not applied to the @data@ payload.<br>__Type: @String@__

blang[jsall].

h6(#action).
default: action

The action type of the message, one of the "@MessageAction@":#message-action enum values.<br>__Type: @int enum { MESSAGE_CREATE, MESSAGE_UPDATE, MESSAGE_DELETE, META, MESSAGE_SUMMARY }@__
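
As an illustrative sketch, assuming ably-js, where the action is surfaced as a string such as @message.create@ (the exact representation may vary by SDK, and @renderMessage@ is a hypothetical UI helper):

```[javascript]
channel.subscribe((message) => {
  switch (message.action) {
    case 'message.update':
      // Re-render the edited message in place.
      break;
    case 'message.delete':
      // Remove the message from the UI.
      break;
    default:
      // 'message.create' and other actions: append as a new message.
      renderMessage(message);
  }
});
```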

h6(#serial).
default: serial

A server-assigned identifier that will be the same in all future updates of this message. It can be used to add "annotations":/docs/messages/annotations to a message or to "update or delete":/docs/messages/updates-deletes it. Serial will only be set if you enable annotations, updates, deletes, and appends in "channel rules":/docs/channels#rules.<br>__Type: @String@__

h6(#annotations).
default: annotations

An object containing information about annotations that have been made to the message.<br>__Type: "@MessageAnnotations@":/docs/api/realtime-sdk/types#message-annotations__

h6(#version).
default: version

An object containing version metadata for messages that have been updated or deleted. See "updating and deleting messages":/docs/messages/updates-deletes for more information.<br>__Type: "@MessageVersion@":#message-version__

h3(#constructors).
default: Message constructors

h6(#message-from-encoded).
default: Message.fromEncoded

bq(definition).
default: Message.fromEncoded(Object encodedMsg, ChannelOptions channelOptions?) -> Message

A static factory method to create a "@Message@":/docs/api/realtime-sdk/types#message from a deserialized @Message@-like object encoded using Ably's wire protocol.

h4. Parameters

- encodedMsg := a @Message@-like deserialized object.<br>__Type: @Object@__
- channelOptions := an optional "@ChannelOptions@":/docs/api/realtime-sdk/types#channel-options. If you have an encrypted channel, use this so that the library can decrypt the data.<br>__Type: @Object@__

h4. Returns

A "@Message@":/docs/api/realtime-sdk/types#message object

h6(#message-from-encoded-array).
default: Message.fromEncodedArray

bq(definition).
default: Message.fromEncodedArray(Object[] encodedMsgs, ChannelOptions channelOptions?) -> Message[]

A static factory method to create an array of "@Messages@":/docs/api/realtime-sdk/types#message from an array of deserialized @Message@-like objects encoded using Ably's wire protocol.

h4. Parameters

- encodedMsgs := an array of @Message@-like deserialized objects.<br>__Type: @Array@__
- channelOptions := an optional "@ChannelOptions@":/docs/api/realtime-sdk/types#channel-options. If you have an encrypted channel, use this so that the library can decrypt the data.<br>__Type: @Object@__

h4. Returns

An @Array@ of "@Message@":/docs/api/realtime-sdk/types#message objects.
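
A minimal sketch, again assuming ably-js, with @webhookBody@ as an illustrative stand-in for a parsed webhook payload:

```[javascript]
// Decode a whole batch of wire-format messages in one call.
const messages = await Ably.Realtime.Message.fromEncodedArray(webhookBody.messages);
messages.forEach((message) => console.log(message.name, message.data));
```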

h3(#message-version).
default: MessageVersion

h4. Properties

|_. Property |_. Description |_. Type |
| serial | An Ably-generated ID that uniquely identifies this version of the message. Can be compared lexicographically to determine version ordering. For an original message with an action of @message.create@, this will be equal to the top-level @serial@. | @String@ |
| timestamp | The time this version was created (when the update or delete operation was performed). For an original message, this will be equal to the top-level @timestamp@. | <span lang="default">@Integer@</span><span lang="java">@Long Integer@</span><span lang="csharp">@DateTimeOffset@</span><span lang="ruby">@Time@</span><span lang="objc,swift">@NSDate@</span> |
| clientId | The client identifier of the user who performed the update or delete operation. Only present for @message.update@ and @message.delete@ actions. | @String@ (optional) |
| description | Optional description provided when the update or delete was performed. Only present for @message.update@ and @message.delete@ actions. | @String@ (optional) |
| metadata | Optional metadata provided when the update or delete was performed. Only present for @message.update@ and @message.delete@ actions. | @Object@ (optional) |
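
As a sketch of the lexicographic ordering property (plain JavaScript string comparison; @incoming@ and @current@ are hypothetical message variables):

```[javascript]
// Version serials sort lexicographically, so a string comparison
// is enough to decide which version of a message is newer.
if (incoming.version.serial > current.version.serial) {
  current = incoming; // apply the later update or delete
}
```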
5 changes: 5 additions & 0 deletions src/data/nav/aitransport.ts
@@ -21,6 +21,11 @@ export default {
{
name: 'Token streaming',
pages: [
{
name: 'Overview',
link: '/docs/ai-transport/features/token-streaming',
index: true,
},
{
name: 'Message per response',
link: '/docs/ai-transport/features/token-streaming/message-per-response',
44 changes: 44 additions & 0 deletions src/pages/docs/ai-transport/features/token-streaming/index.mdx
@@ -0,0 +1,44 @@
---
title: Token streaming
meta_description: "Learn about token streaming with Ably AI Transport, including common patterns and the features provided by the Ably solution."
---

Token streaming is a technique used with Large Language Models (LLMs) where the model's response is emitted progressively as each token is generated, rather than waiting for the complete response before transmission begins. This allows users to see the response appear incrementally, similar to watching someone type in realtime, giving an improved user experience.
Contributor comment:

In general, we prefer to use the single word "realtime" at Ably.
(This is not what most of the internet seems to do, but alas this is our convention.)

Contributor comment:

"This is normally accomplished by streaming the tokens as the response to an HTTP request from the client."

I think this can be moved out into a new paragraph. I think the intro paragraph should focus on the description of what token streaming is before getting into how it is implemented.

Then, I would suggest colocating this statement with the content that follows after the image, since that paragraph starts by describing the weakness of this approach.


![Ably AIT network diagram](../../../../../images/content/diagrams/ai-transport-before-and-after.png)

Token streaming is normally accomplished by streaming the tokens as the response to an HTTP request from the client. If an HTTP stream is interrupted, for example because the client loses network connection, then any tokens that were transmitted during the interruption will be lost. Ably AI Transport solves this problem by streaming tokens to a [Pub/Sub channel](/docs/channels), which is not tied to the connection state of either the client or the agent:

- A client that [reconnects](/docs/connect/states#connection-state-recovery) can receive any tokens transmitted while it was disconnected.
- If a new client connects, for example because the user has moved to a different device, it can be hydrated with all the tokens transmitted for the current request, as well as the output from any previous requests.

The detailed mechanism for doing this depends on which [token streaming pattern](#patterns) you choose.

Contributor comment (on lines +10 to +11):

This is a bit of a wall of text, but there are some nice bits of value prop in there. Can we pull those out, perhaps into bullets?

The Ably platform guarantees that messages from a given realtime publisher will be [delivered in order](/docs/platform/architecture/message-ordering#ordering-guarantees) and [exactly once](/docs/platform/architecture/idempotency), so your client application does not have to handle duplicate or out-of-order messages.
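
For example, a server-side publisher might look like the following minimal sketch (assuming the ably-js SDK; the channel name and the `llmStream` async iterator are illustrative stand-ins for your own setup):

```javascript
import Ably from 'ably';

const ably = new Ably.Realtime('YOUR_ABLY_API_KEY');
const channel = ably.channels.get('ai:conversation-123');

// llmStream stands in for whatever async token iterator your model SDK exposes.
for await (const token of llmStream) {
  // Publishing to the channel decouples delivery from the client's
  // HTTP connection, so disconnected clients can catch up later.
  await channel.publish('token', token);
}
```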

## Token streaming patterns <a id="patterns"/>

Ably AI Transport is built on the Pub/Sub messaging platform, which allows you to use whatever message structure and pattern works best for your application. AI Transport supports two token streaming patterns using a [Realtime](/docs/api/realtime-sdk) client, so you can choose the one that best fits your requirements and customise it for your application. The Realtime client maintains a persistent connection to the Ably service. This allows you to publish at very high message rates with the lowest possible latencies, while preserving guarantees around message delivery order. For more information, see [Realtime and REST](/docs/basics#realtime-and-rest).

### Message-per-response <a id="pattern-per-response"/>
Token streaming with [message-per-response](/docs/ai-transport/features/token-streaming/message-per-response) enables you to stream LLM-generated content as individual tokens in realtime, while maintaining a clean, compacted message history. Each LLM response becomes a single message on an Ably channel that grows as tokens are appended, resulting in efficient storage and easy retrieval of complete responses.

This pattern is the recommended approach for most applications. It is useful if you want clients joining mid-stream to catch up efficiently without receiving thousands of individual tokens, or if clients build up a long conversation history that must be loaded efficiently on new or reconnecting devices. For example:

- Chat experiences: The full chat history must be replayed from Ably whenever the user changes device or when a new participant joins the chat, allowing both users and agents to maintain context.
- Long-running and asynchronous tasks: Users need to catch up quickly when they reconnect to check progress throughout the task lifetime, but do not need to receive the individual tokens that make up the response.
- Backend-stored responses: The backend writes complete responses to a database and clients load those full responses from there, while Ably is used to deliver the current in-progress response.

### Message-per-token <a id="pattern-per-token"/>
Token streaming with [message-per-token](/docs/ai-transport/features/token-streaming/message-per-token) is a pattern where every token generated by your model is published as an independent Ably message. Each token then appears as one message in the channel history.

This pattern is useful when clients only care about the most recent part of a response and you are happy to treat the channel history as a short sliding window rather than a full conversation log, or if you need to preserve the specific token fragmentation that was generated by the model. For example:

- Live transcription, captioning, or translation: A viewer who joins a live stream only needs sufficient tokens for the current "frame" of subtitles, not the entire transcript so far.
- Code assistance in an editor: Streamed tokens become part of the file on disk as they are accepted, so past tokens do not need to be replayed from Ably.
- Autocomplete: A fresh response is streamed for each change a user makes to a document, with only the latest suggestion being relevant.

## Message events <a id="events"/>

Different models and frameworks use different events to signal what is being sent to the client, such as start/stop events to mark the beginning and end of a streamed response. When you publish a message to an Ably channel, you can set the [message name](/docs/messages#properties) to the event type your client expects.
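
As a hypothetical sketch, a model's stream lifecycle might map onto message names like this (the event names, `responseId`, and `llmStream` are illustrative, not a fixed Ably schema):

```javascript
await channel.publish('stream-start', { responseId });
for await (const token of llmStream) {
  await channel.publish('token', token);
}
await channel.publish('stream-end', { responseId });
```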

## Next steps <a id="next"/>

Read more about token streaming with the [message-per-response](/docs/ai-transport/features/token-streaming/message-per-response) and [message-per-token](/docs/ai-transport/features/token-streaming/message-per-token) patterns. Alternatively, check out the how-to guides to see how to implement these patterns with a variety of models and frameworks.

// TODO: guide links
@@ -22,7 +22,7 @@ Standard Ably message [size limits](/docs/platform/pricing/limits#message) apply

## Enable appends <a id="enable"/>

Message append functionality requires "Message annotations, updates, deletes and appends" to be enabled in a [channel rule](/docs/channels#rules) associated with the channel.
Message append functionality requires the "Message annotations, updates, deletes, and appends" [channel rule](/docs/channels#rules) to be enabled for your channel or [namespace](/docs/channels#namespaces).

<Aside data-type="important">
When the "Message updates and deletes" channel rule is enabled, messages are persisted irrespective of whether or not persistence has also been explicitly enabled. This will be reflected in increased usage since [we charge for persisting messages](https://faqs.ably.com/how-does-ably-count-messages).
@@ -34,7 +34,7 @@ To enable the channel rule:
2. Navigate to the "Configuration" > "Rules" section from the left-hand navigation bar.
3. Choose "Add new rule".
4. Enter a channel name or namespace pattern (e.g. `ai:*` for all channels starting with `ai:`).
5. Select the "Message annotations, updates, deletes and appends" option from the list.
5. Select the "Message annotations, updates, deletes, and appends" rule from the list.
6. Click "Create channel rule".

The examples on this page use the `ai:` namespace prefix, which assumes you have configured the rule for `ai:*`.