-
Notifications
You must be signed in to change notification settings - Fork 590
WIP: Reduce copies, eliminate a lot of UTF-16 transcoding #1120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
Tyler-IN
wants to merge
8
commits into
modelcontextprotocol:main
Choose a base branch
from
Tyler-IN:perf/utf8-contentblock-streaming
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
WIP: Reduce copies, eliminate a lot of UTF-16 transcoding #1120
Tyler-IN
wants to merge
8
commits into
modelcontextprotocol:main
from
Tyler-IN:perf/utf8-contentblock-streaming
+2,142
−425
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Co-authored-by: ericstj <8918108+ericstj@users.noreply.github.com>
Author
|
Based on #1070 |
Author
|
Key protocol / model updates
Serialization / JSON-RPC improvements
Transport / streaming changes (stdio + streams)
HTTP transport / client plumbing
Cancellation / lifecycle utilities
URI template parsing
Tests / samples
Behavioral notes / compatibility
|
29fa71b to
30422d2
Compare
… transports
This change set focuses on reducing allocations and unnecessary transcoding across the MCP wire path (JSON-RPC + MCP content), especially for line-delimited stream transports (stdio / raw streams) and for text content blocks that frequently originate as UTF-8 already.
Key protocol / model updates
- ContentBlock JSON converter now reads the "text" property directly into UTF-8 bytes (including unescaping) without first materializing a UTF-16 string.
- Added Utf8TextContentBlock ("type":"text") to allow hot paths to keep text payloads as UTF-8; TextContentBlock now supports a cached UTF-16 string backed by a Utf8Text buffer.
- Added opt-in deserialization behavior to materialize Utf8TextContentBlock instead of TextContentBlock via McpJsonUtilities.CreateOptions(materializeUtf8TextContentBlocks: true); DefaultOptions remains compatible and continues to materialize TextContentBlock.
- ImageContentBlock and AudioContentBlock still have Data as a base64 string; added DataUtf8 and DecodedData helpers to avoid repeated conversions.
- BlobResourceContents updated to cache and interoperate between (1) base64 string (Blob), (2) base64 UTF-8 bytes (BlobUtf8), and (3) decoded bytes (DecodedData)
Serialization / JSON-RPC improvements
- JsonRpcMessage converter now determines concrete message type (request/notification/response/error) by scanning the top-level payload with Utf8JsonReader.ValueTextEquals and skipping values in-place, avoiding JsonElement.GetRawText() / UTF-16 round-trips.
- McpJsonUtilities now exposes CreateOptions(...) to produce an options instance that can override the ContentBlock converter (materializeUtf8TextContentBlocks) while still chaining MEAI type info resolver support.
- Added McpTextUtilities helpers (UTF-8 decode, base64 encode on older TFMs, and common whitespace checks used by transports).
Transport / streaming changes (stdio + streams)
- StreamClientSessionTransport and StreamServerTransport reading loops were refactored to a newline-delimited (LF) byte scanner:
- Parses messages directly from pooled byte buffers + a reusable MemoryStream buffer.
- Handles both LF and CRLF by trimming a trailing '\r' after splitting on '\n'.
- Avoids UTF-16 materialization for parsing and (optionally) for logging by only decoding to string when trace logging is enabled.
- The previous StreamClientSessionTransport impl. became TextStreamClientSessionTransport for the "text writer/reader" client transport.
HTTP transport / client plumbing
- McpHttpClient now uses JsonContent.Create on NET TFMs and a new JsonTypeInfoHttpContent<T> on non-NET TFMs to serialize via JsonTypeInfo without buffering to compute Content-Length.
- StreamableHttpClientSessionTransport and SseClientSessionTransport disposal paths now consistently cancel and "defuse" CTS instances to reduce races/leaks during teardown.
- StreamableHttpSession updates activity tracking and shutdown ordering (dispose transport first, then cancel, then await server run) for cleaner termination.
Cancellation / lifecycle utilities
- Added CanceledTokenSource: a singleton already-canceled CancellationTokenSource plus a Defuse(ref cts, ...) helper to safely swap out mutable CTS fields during disposal.
URI template parsing
- UriTemplate parsing/formatting updated with a more explicit template-expression regex and improved query expression handling; uses GeneratedRegex and (on NET) a non-backtracking regex option for performance.
Tests / samples
- Added ProcessStartInfoUtilities to robustly locate executables on PATH and to handle Windows .cmd/.bat invocation semantics when UseShellExecute=false; updated integration tests accordingly. (This is for my npx.cmd etc.)
- Added TextMaterializationTestHelpers to let tests run with either TextContentBlock or Utf8TextContentBlock materialization.
- Updated tests and samples to align with the new base64-string Data representation for image/audio blocks and to avoid unnecessary allocations in transport tests.
Behavioral notes / compatibility
- Wire format remains MCP/JSON-RPC compatible: messages are still newline-delimited JSON; CRLF continues to work.
- The Utf8TextContentBlock materialization is opt-in via McpJsonUtilities.CreateOptions(materializeUtf8TextContentBlocks: true); default behavior preserves prior materialized types for text blocks.
30422d2 to
50a8255
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR reduces UTF-16 string materialization and intermediate byte[]/string copies across MCP JSON serialization/deserialization and tool-result content handling, while preserving existing public APIs and behaviors by default.
Motivation and Context
Several hot paths (tool result content, JSON-RPC messaging, content block parsing, and URI template processing) were creating avoidable UTF-16 strings and/or staging buffers (e.g.,
GetRawText(),Encoding.UTF8.GetBytes(...),ToArray()patterns). This change set focuses on enabling more UTF-8-first processing and streaming writes, minimizing allocations/copies and making it easier to keep data as UTF-8 until (and unless) astringis actually required.How Has This Been Tested?
dotnet test --filter "(Execution!=Manual)"(passes locally)Breaking Changes
None intended.
TextContentBlockunless opted-in.Types of changes
Checklist
Additional context
Key changes in this branch include:
Utf8TextContentBlockand a configurable option to choose whether JSON “text” blocks materialize asTextContentBlock(default) orUtf8TextContentBlock(opt-in), with tests parameterized accordingly.TextContentBlockserialization/deserialization paths to reduce UTF-16 transitions where feasible (prefer operating on UTF-8 bytes and cachingstringonly when needed).ToArray()/ staging patterns in tool content result paths and tests; replaced literal UTF-8 conversions with"..."u8where applicable.PATHEXThandling on Windows) to reliably locate executables such asnpxin conformance tests.UriTemplateprocessing to reduce allocations (including netstandard2.0-compatible approaches)ChangesCurrently disposes of them atCancellationTokenSource(CTS) intended lifespans to indefinite, rely on being detached from the object graph and later disposed as they're finalized by the GC.Disposebut replaces them with aCanceledTokenSourcethat doesn't get disposed but only returns cancelled tokens which avoidsObjectDisposedExceptionwhen checking if the tokens are canceled.