Overview/ait 189 intro token #3035
Open
GregHolmes wants to merge 11 commits into AIT-129-AIT-Docs-release-branch from overview/ait-189-intro-token-fixed
+182
−2
Changes from all commits (11 commits)
71ff6eb chore: update message annotations terminology to include appends (matt423)
5b6d42f AI Transport overview page (rainbowFi)
c75c7d3 Token streaming intro (rainbowFi)
f88133a Update token streaming overview with diagram and align with compact h… (rainbowFi)
c95c68f Apply suggestions from code review (rainbowFi)
bf5ae10 Fixup: code review comments (rainbowFi)
dbc88b6 Fix-ups from review comments (rainbowFi)
0187e3a Update overview page with notes (rainbowFi)
a9a091c Remove product overview page, which will be separate PR (rainbowFi)
ce90afc Revert "Remove product overview page, which will be separate PR" (rainbowFi)
b560500 Revert overview to match base branch, will be separate PR (rainbowFi)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,131 @@ | ||
| A @Message@ represents an individual message that is sent to or received from Ably. | ||
|
|
||
| h6(#name). | ||
| default: name | ||
| csharp: Name | ||
|
|
||
| The event name, if provided. <br>__Type: @String@__ | ||
|
|
||
| h6(#data). | ||
| default: data | ||
| csharp: Data | ||
|
|
||
| The message payload, if provided.<br>__Type: <span lang="default">@String@, @StringBuffer@, @JSON Object@</span><span lang="java">@String@, @ByteArray@, @JSONObject@, @JSONArray@</span><span lang="csharp">@String@, @byte[]@, @plain C# object that can be serialized to JSON@</span><span lang="ruby">@String@, @Binary@ (ASCII-8BIT String), @Hash@, @Array@</span><span lang="python">@String@, @Bytearray@, @Dict@, @List@</span><span lang="php">@String@, @Binary String@, @Associative Array@, @Array@</span><span lang="objc">@NSString *@, @NSData *@, @NSDictionary *@, @NSArray *@</span><span lang="swift">@String@, @NSData@, @Dictionary@, @Array@</span><span lang="flutter">@String@, @Map@, @List@</span>__ | ||
|
|
||
| h6(#extras). | ||
| default: extras | ||
| csharp: Extras | ||
|
|
||
| Metadata and/or ancillary payloads, if provided. Valid payloads include "@push@":/docs/push/publish#payload, "@headers@" (a map of strings to strings for arbitrary customer-supplied metadata), "@ephemeral@":/docs/pub-sub/advanced#ephemeral, and "@privileged@":/docs/platform/integrations/webhooks#skipping objects.<br>__Type: <span lang="java">@JSONObject@, @JSONArray@</span><span lang="csharp">plain C# object that can be converted to JSON</span><span lang="jsall">@JSON Object@</span><span lang="ruby">@Hash@, @Array@</span><span lang="python">@Dict@, @List@</span><span lang="swift">@Dictionary@, @Array@</span><span lang="objc">@NSDictionary *@, @NSArray *@</span><span lang="php">@Associative Array@, @Array@</span>__ | ||
|
|
||
| h6(#id). | ||
| default: id | ||
| csharp: Id | ||
|
|
||
| A unique ID assigned by Ably to this message.<br>__Type: @String@__ | ||
|
|
||
| h6(#client-id). | ||
| default: clientId | ||
| csharp: ClientId | ||
| ruby: client_id | ||
| python: client_id | ||
|
|
||
| The client ID of the publisher of this message.<br>__Type: @String@__ | ||
|
|
||
| h6(#connection-id). | ||
| default: connectionId | ||
| csharp: ConnectionId | ||
| ruby: connection_id | ||
| python: connection_id | ||
|
|
||
| The connection ID of the publisher of this message.<br>__Type: @String@__ | ||
|
|
||
| h6(#connection-key). | ||
| default: connectionKey | ||
| csharp,go: ConnectionKey | ||
| ruby,python: connection_key | ||
|
|
||
| A connection key, which can optionally be included for a REST publish as part of the "publishing on behalf of a realtime client functionality":/docs/pub-sub/advanced#publish-on-behalf.<br>__Type: @String@__ | ||
|
|
||
| h6(#timestamp). | ||
| default: timestamp | ||
| csharp: Timestamp | ||
|
|
||
| Timestamp when the message was first received by Ably, as <span lang="default">milliseconds since the epoch</span><span lang="ruby">a @Time@ object</span>.<br>__Type: <span lang="default">@Integer@</span><span lang="java">@Long Integer@</span><span lang="csharp">@DateTimeOffset@</span><span lang="ruby">@Time@</span><span lang="objc,swift">@NSDate@</span>__ | ||
|
|
||
| h6(#encoding). | ||
| default: encoding | ||
| csharp: Encoding | ||
|
|
||
| This will typically be empty as all messages received from Ably are automatically decoded client-side using this value. However, if the message encoding cannot be processed, this attribute will contain the remaining transformations not applied to the @data@ payload.<br>__Type: @String@__ | ||
|
|
||
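Taken together, the fields above describe a fully decoded message. A hypothetical sketch of what such an object might look like in JavaScript (every value below is invented for illustration, not real Ably output):

```javascript
// A hypothetical decoded Message-like object illustrating the fields above.
// All values are invented for illustration only.
const message = {
  name: 'greeting',                      // event name, if provided
  data: { text: 'hello' },               // payload (a JSON object in this case)
  extras: { headers: { tier: 'pro' } },  // arbitrary customer-supplied metadata
  id: 'msg-id-0001',                     // unique ID assigned by Ably
  clientId: 'user-123',                  // client ID of the publisher
  connectionId: 'conn-abc',              // connection ID of the publisher
  timestamp: 1700000000000,              // ms since epoch when first received
  encoding: ''                           // empty once fully decoded client-side
};
console.log(`${message.clientId} sent "${message.name}"`);
```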
| blang[jsall]. | ||
|
|
||
| h6(#action). | ||
| default: action | ||
|
|
||
| The action type of the message, one of the "@MessageAction@":#message-action enum values.<br>__Type: @int enum { MESSAGE_CREATE, MESSAGE_UPDATE, MESSAGE_DELETE, META, MESSAGE_SUMMARY }@__ | ||
|
|
||
| h6(#serial). | ||
| default: serial | ||
|
|
||
| A server-assigned identifier that will be the same in all future updates of this message. It can be used to add "annotations":/docs/messages/annotations to a message or to "update or delete":/docs/messages/updates-deletes it. Serial will only be set if you enable annotations, updates, deletes, and appends in "channel rules":/docs/channels#rules .<br>__Type: @String@__ | ||
|
|
||
| h6(#annotations). | ||
| default: annotations | ||
|
|
||
| An object containing information about annotations that have been made to the message.<br>__Type: "@MessageAnnotations@":/docs/api/realtime-sdk/types#message-annotations__ | ||
|
|
||
| h6(#version). | ||
| default: version | ||
|
|
||
| An object containing version metadata for messages that have been updated or deleted. See "updating and deleting messages":/docs/messages/updates-deletes for more information.<br>__Type: "@MessageVersion@":#message-version__ | ||
|
|
||
| h3(#constructors). | ||
| default: Message constructors | ||
|
|
||
| h6(#message-from-encoded). | ||
| default: Message.fromEncoded | ||
|
|
||
| bq(definition). | ||
| default: Message.fromEncoded(Object encodedMsg, ChannelOptions channelOptions?) -> Message | ||
|
|
||
| A static factory method to create a "@Message@":/docs/api/realtime-sdk/types#message from a deserialized @Message@-like object encoded using Ably's wire protocol. | ||
|
|
||
| h4. Parameters | ||
|
|
||
| - encodedMsg := a @Message@-like deserialized object.<br>__Type: @Object@__ | ||
| - channelOptions := an optional "@ChannelOptions@":/docs/api/realtime-sdk/types#channel-options. If you have an encrypted channel, use this so that the library can decrypt the data.<br>__Type: @Object@__ | ||
|
|
||
| h4. Returns | ||
|
|
||
| A "@Message@":/docs/api/realtime-sdk/types#message object | ||
|
|
||
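The decoding that @fromEncoded@ performs can be pictured with a small sketch. This is not the library's implementation; it only shows the idea of applying a remaining @encoding@ transformation (here a single hypothetical @base64@ step) to the @data@ payload and clearing @encoding@ once done:

```javascript
// Sketch only: apply a 'base64' encoding step the way a fromEncoded-style
// helper might, clearing `encoding` once the data is decoded.
function decodeBase64Step(encodedMsg) {
  if (encodedMsg.encoding === 'base64') {
    return {
      ...encodedMsg,
      data: Buffer.from(encodedMsg.data, 'base64').toString('utf8'),
      encoding: '' // nothing left to decode
    };
  }
  return encodedMsg; // already decoded, or an encoding this sketch ignores
}

const decoded = decodeBase64Step({ name: 'greeting', data: 'aGVsbG8=', encoding: 'base64' });
console.log(decoded.data); // 'hello'
```

The real method additionally handles encryption (via the @channelOptions@ parameter) and chained encodings, which this sketch omits.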
| h6(#message-from-encoded-array). | ||
| default: Message.fromEncodedArray | ||
|
|
||
| bq(definition). | ||
| default: Message.fromEncodedArray(Object[] encodedMsgs, ChannelOptions channelOptions?) -> Message[] | ||
|
|
||
| A static factory method to create an array of "@Messages@":/docs/api/realtime-sdk/types#message from an array of deserialized @Message@-like objects encoded using Ably's wire protocol. | ||
|
|
||
| h4. Parameters | ||
|
|
||
| - encodedMsgs := an array of @Message@-like deserialized objects.<br>__Type: @Array@__ | ||
| - channelOptions := an optional "@ChannelOptions@":/docs/api/realtime-sdk/types#channel-options. If you have an encrypted channel, use this so that the library can decrypt the data.<br>__Type: @Object@__ | ||
|
|
||
| h4. Returns | ||
|
|
||
| An @Array@ of "@Message@":/docs/api/realtime-sdk/types#message objects | ||
|
|
||
| h3(#message-version). | ||
| default: MessageVersion | ||
|
|
||
| h4. Properties | ||
|
|
||
| |_. Property |_. Description |_. Type | | ||
| | serial | An Ably-generated ID that uniquely identifies this version of the message. Can be compared lexicographically to determine version ordering. For an original message with an action of @message.create@, this will be equal to the top-level @serial@. | @String@ | | ||
| | timestamp | The time this version was created (when the update or delete operation was performed). For an original message, this will be equal to the top-level @timestamp@. | <span lang="default">@Integer@</span><span lang="java">@Long Integer@</span><span lang="csharp">@DateTimeOffset@</span><span lang="ruby">@Time@</span><span lang="objc,swift">@NSDate@</span> | | ||
| | clientId | The client identifier of the user who performed the update or delete operation. Only present for @message.update@ and @message.delete@ actions. | @String@ (optional) | | ||
| | description | Optional description provided when the update or delete was performed. Only present for @message.update@ and @message.delete@ actions. | @String@ (optional) | | ||
| | metadata | Optional metadata provided when the update or delete was performed. Only present for @message.update@ and @message.delete@ actions. | @Object@ (optional) | |
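Because version @serial@ values order lexicographically, plain string comparison is enough to pick the newest version of a message. A sketch with invented serial values (the real format is opaque; only the ordering guarantee matters):

```javascript
// Hypothetical version serials: the format shown is invented, but
// lexicographic (plain string) comparison reflects version ordering.
const versions = [
  { serial: '01700000000000-001', description: 'original' },
  { serial: '01700000000500-001', description: 'first edit' }
];
const latest = versions.reduce((a, b) => (a.serial > b.serial ? a : b));
console.log(latest.description); // 'first edit'
```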
44 changes: 44 additions & 0 deletions
src/pages/docs/ai-transport/features/token-streaming/index.mdx
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| --- | ||
| title: Token streaming | ||
| meta_description: "Learn about token streaming with Ably AI Transport, including common patterns and the features provided by the Ably solution." | ||
| --- | ||
|
|
||
| Token streaming is a technique used with Large Language Models (LLMs) where the model's response is emitted progressively as each token is generated, rather than waiting for the complete response before transmission begins. This allows users to see the response appear incrementally, similar to watching someone type in realtime, giving an improved user experience. | ||
|
|
||
| This is normally accomplished by streaming the tokens as the response to an HTTP request from the client. | ||
|
|
||
|  | ||
|
|
||
| If an HTTP stream is interrupted, for example because the client loses network connection, then any tokens that were transmitted during the interruption will be lost. Ably AI Transport solves this problem by streaming tokens to a [Pub/Sub channel](/docs/channels), which is not tied to the connection state of either the client or the agent. A client that [reconnects](/docs/connect/states#connection-state-recovery) can receive any tokens transmitted while it was disconnected. If a new client connects, for example because the user has moved to a different device, then it is possible to hydrate the new client with all the tokens transmitted for the current request as well as the output from any previous requests. The detailed mechanism for doing this will depend on which [token streaming pattern](#patterns) you choose to use. | ||
|
|
||
|
Comment on lines +10 to +11
Contributor: This is a bit of a wall of text, but there are some nice bits of value prop in there. Can we pull those out, perhaps into bullets? |
||
| The Ably platform guarantees that messages from a given realtime publisher will be [delivered in order](/docs/platform/architecture/message-ordering#ordering-guarantees) and [exactly once](/docs/platform/architecture/idempotency), so your client application does not have to handle duplicate or out-of-order messages. | ||
|
|
||
| ## Token streaming patterns <a id="patterns"/> | ||
|
|
||
| Ably AI Transport is built on the Pub/Sub messaging platform, which allows you to use whatever message structure and pattern works best for your application. AI Transport supports two token streaming patterns using a [Realtime](/docs/api/realtime-sdk) client, so you can choose the one that best fits your requirements and customise it for your application. The Realtime client maintains a persistent connection to the Ably service. This allows you to publish at very high message rates with the lowest possible latencies, while preserving guarantees around message delivery order. For more information, see [Realtime and REST](/docs/basics#realtime-and-rest). | ||
|
|
||
| ### Message-per-response <a id="pattern-per-response"/> | ||
|
||
| Token streaming with [message-per-response](/docs/ai-transport/features/token-streaming/message-per-response) enables you to stream LLM-generated content as individual tokens in realtime, while maintaining a clean, compacted message history. Each LLM response becomes a single message on an Ably channel that grows as tokens are appended, resulting in efficient storage and easy retrieval of complete responses. | ||
|
|
||
| This pattern is the recommended approach for most applications. It is useful if you want clients joining mid-stream to catch up efficiently without receiving thousands of individual tokens and if clients build up a long conversation history that must be efficiently loaded on new or reconnecting devices. For example: | ||
|
|
||
| - Chat experiences: The full chat history must be replayed from Ably whenever the user changes device or when a new participant joins the chat, allowing both users and agents to maintain context. | ||
| - Long-running and asynchronous tasks: Users need to catch up quickly when they reconnect to check progress throughout the task lifetime, but do not need to receive the individual tokens that make up the response. | ||
| - Backend-stored responses: The backend writes complete responses to a database and clients load those full responses from there, while Ably is used to deliver the current in-progress response. | ||
|
|
||
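A minimal sketch of the client side of this pattern, assuming (hypothetically) that each incoming append event carries the response's `serial` and the appended chunk as `data`; the real event shape depends on your setup:

```javascript
// Sketch: keep one growing response per message serial (message-per-response).
// The event shape ({ serial, data }) is assumed here for illustration.
const responses = new Map();

function onAppend(event) {
  const soFar = responses.get(event.serial) ?? '';
  responses.set(event.serial, soFar + event.data); // each append extends the response
}

[
  { serial: 'resp-1', data: 'The answer ' },
  { serial: 'resp-1', data: 'is 42.' }
].forEach(onAppend);

console.log(responses.get('resp-1')); // 'The answer is 42.'
```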
| ### Message-per-token <a id="pattern-per-token"/> | ||
| Token streaming with [message-per-token](/docs/ai-transport/features/token-streaming/message-per-token) is a pattern where every token generated by your model is published as an independent Ably message. Each token then appears as one message in the channel history. | ||
|
|
||
| This pattern is useful when clients only care about the most recent part of a response and you are happy to treat the channel history as a short sliding window rather than a full conversation log, or if you need to preserve the specific token fragmentation that was generated by the model. For example: | ||
|
|
||
| - Live transcription, captioning, or translation: A viewer who joins a live stream only needs sufficient tokens for the current "frame" of subtitles, not the entire transcript so far. | ||
| - Code assistance in an editor: Streamed tokens become part of the file on disk as they are accepted, so past tokens do not need to be replayed from Ably. | ||
| - Autocomplete: A fresh response is streamed for each change a user makes to a document, with only the latest suggestion being relevant. | ||
|
|
||
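Because Ably delivers a publisher's messages in order and exactly once, a client consuming a message-per-token stream can simply concatenate what it receives. A sketch (the one-token-per-message shape is the pattern described above; the function names are invented):

```javascript
// Sketch: assemble a response from message-per-token messages by concatenation.
// Ordering and exactly-once delivery are platform guarantees, so no reordering
// or de-duplication logic is needed here.
function makeAssembler() {
  let text = '';
  return {
    onToken(message) { text += message.data; }, // one token per message
    current() { return text; }
  };
}

const assembler = makeAssembler();
['Hel', 'lo', ', world'].forEach((token) => assembler.onToken({ data: token }));
console.log(assembler.current()); // 'Hello, world'
```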
| ## Message events <a id="events"/> | ||
|
|
||
| Different models and frameworks use different events to signal what is being sent to the client, such as start/stop events to mark the beginning and end of a streamed response. When you publish a message to an Ably channel, you can set the [message name](/docs/messages#properties) to the event type your client expects. | ||
|
|
||
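For example, a client might dispatch on the message name to track the lifecycle of a streamed response. The event names below (`stream.start` and so on) are invented for illustration, not a fixed Ably vocabulary; use whatever your model or framework emits:

```javascript
// Sketch: dispatch on the Ably message name to track a streamed response.
// The event names are hypothetical examples.
function handleStreamEvent(state, message) {
  switch (message.name) {
    case 'stream.start': return { streaming: true, text: '' };
    case 'stream.token': return { ...state, text: state.text + message.data };
    case 'stream.end':   return { ...state, streaming: false };
    default:             return state; // ignore unrelated events
  }
}

let state = { streaming: false, text: '' };
for (const msg of [
  { name: 'stream.start' },
  { name: 'stream.token', data: 'Hi' },
  { name: 'stream.end' }
]) {
  state = handleStreamEvent(state, msg);
}
console.log(state); // { streaming: false, text: 'Hi' }
```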
| ## Next steps <a id="next"/> | ||
|
|
||
| Read more about token streaming with the [message-per-response](/docs/ai-transport/features/token-streaming/message-per-response) and [message-per-token](/docs/ai-transport/features/token-streaming/message-per-token) patterns. Alternatively, check out the how-to guides to see how to implement these patterns with a variety of models and frameworks. | ||
|
|
||
| // TODO: guide links | ||
In general, we prefer to use single word "realtime" at Ably.
(This is not what most of the internet seems to do, but alas this is our convention)
"This is normally accomplished by streaming the tokens as the response to an HTTP request from the client."
I think this can be moved out into a new paragraph. I think the intro paragraph should focus on the description of what token streaming is before getting into how it is implemented.
Then, I would suggest colocating this statement with the content that follows after the image, since that paragraph starts by describing the weakness of this approach.