From 1953a117d02f8983395b244f9c26dd9ecf29a34f Mon Sep 17 00:00:00 2001
From: avempali
Date: Mon, 15 Dec 2025 15:45:00 -0800
Subject: [PATCH] docs: Add RFD for agent-guided user selection feature

Proposes `session/select` request allowing agents to present interactive menus with prompts and markdown-renderable options. Supports single/multi-select modes with optional free-text input for gathering structured user input and exposing agent capabilities in a discoverable way.
---
 docs/rfds/agent-guided-user-selection.mdx | 259 ++++++++++++++++++++++
 1 file changed, 259 insertions(+)
 create mode 100644 docs/rfds/agent-guided-user-selection.mdx

diff --git a/docs/rfds/agent-guided-user-selection.mdx b/docs/rfds/agent-guided-user-selection.mdx
new file mode 100644
index 0000000..2316110
--- /dev/null
+++ b/docs/rfds/agent-guided-user-selection.mdx
@@ -0,0 +1,259 @@
+---
+title: "Agent-Guided User Selection"
+---
+
+Author(s): [akhil-vempali](https://github.com/akhil-vempali)
+
+## Elevator pitch
+
+> What are you proposing to change?
+
+Allow agents to dynamically present interactive menus to users during a session. These menus consist of a prompt (question) and a set of markdown-renderable options, enabling agents to guide users through workflows, gather structured input, and expose agent-specific actions in a discoverable way.
+
+## Status quo
+
+> How do things work today and what problems does this cause? Why would we change things?
+
+Currently, agents have limited mechanisms for soliciting structured input from users:
+
+1. **Free-form prompts only**: Agents must rely on natural language responses, which can be ambiguous and require additional parsing/validation.
+
+2. **No discoverability**: Users don't know what options or capabilities an agent supports unless explicitly told through conversation. There's no standardized way to present available actions.
+
+3. **Guided workflows are cumbersome**: Multi-step processes require agents to describe options in prose and hope users respond with recognizable input. This leads to friction and error-prone interactions.
+
+4. **Agent-specific actions are hidden**: Agents with specialized capabilities (e.g., deployment options, code generation styles, environment configurations) have no structured way to expose these to users.
+
+5. **Context-dependent options require explanation**: When available actions change based on project state, file type, or session context, agents must repeatedly explain what's possible.
+
+## What we propose to do about it
+
+> What are you proposing to improve the situation?
+
+Introduce a new agent-to-client request that allows agents to present interactive menus to users. Key characteristics:
+
+- **Agent-initiated**: The agent sends a request to the client with a prompt and options
+- **Markdown-renderable options**: Each option can include rich markdown content for clear presentation
+- **Configurable selection mode**: Agent specifies whether single-select or multi-select is allowed
+- **Optional free-text input**: Agent can enable an "other" option allowing users to provide custom input
+- **Callback-based response**: The client returns the user's selection(s) back to the agent via a dedicated response mechanism
+- **Dynamic timing**: Menus can be presented at session start, during conversations, or based on context changes
+
+## Shiny future
+
+> How will things play out once this feature exists?
+
+Once implemented, agents can create rich, guided experiences:
+
+- **Onboarding flows**: New users are presented with setup options rather than needing to know what to ask
+- **Workflow wizards**: Multi-step processes become intuitive click-through experiences
+- **Context-aware suggestions**: As users work, agents surface relevant actions ("I noticed you're in a test file - would you like to: Run tests / Generate test cases / View coverage")
+- **Configuration dialogs**: Complex agent settings can be presented as structured choices rather than requiring users to remember syntax
+- **Domain-specific actions**: Specialized agents (CI/CD, database, cloud deployment) can expose their unique capabilities in discoverable menus
+
+Users get a more guided, less error-prone experience. Agents get structured, unambiguous input. Clients can render these menus in ways that fit their UI paradigm (dropdowns, modal dialogs, inline buttons, etc.).
+
+## Implementation details and plan
+
+> Tell me more about your implementation. What is your detailed implementation plan?
+
+
+
+### Protocol Changes
+
+This proposal follows the same pattern as `session/request_permission`, providing a consistent interaction model for agent-initiated user input.
+
+#### New Request: `session/select`
+
+The agent sends this request to the client to present a selection menu and await user response:
+
+```json
+{
+  "jsonrpc": "2.0",
+  "id": 5,
+  "method": "session/select",
+  "params": {
+    "sessionId": "sess_abc123def456",
+    "prompt": "How would you like to proceed with the refactoring?",
+    "options": [
+      {
+        "optionId": "inline",
+        "name": "Inline refactor",
+        "description": "Refactor in place, modifying existing files"
+      },
+      {
+        "optionId": "new-files",
+        "name": "Create new files",
+        "description": "Generate refactored code in new files, preserving originals"
+      },
+      {
+        "optionId": "dry-run",
+        "name": "Dry run",
+        "description": "Show what would change without making modifications"
+      }
+    ],
+    "selectionMode": "single",
+    "allowFreeText": true,
+    "freeTextPlaceholder": "Or describe a different approach..."
+  }
+}
+```
+
+**Request Parameters:**
+
+- `sessionId` *(SessionId, required)*: The session ID for this request.
+- `prompt` *(string, required)*: The question or instruction to display to the user. Supports markdown.
+- `options` *(SelectionOption[], required)*: Available [selection options](#selection-options) for the user to choose from.
+- `selectionMode` *(SelectionMode, required)*: Whether the user can select one option (`single`) or multiple options (`multiple`).
+- `allowFreeText` *(boolean, optional)*: If `true`, the client should provide a free-text input field in addition to the options.
+- `freeTextPlaceholder` *(string, optional)*: Placeholder text to display in the free-text input field (if enabled).
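The request parameters above could be modelled on the agent side roughly as follows. This is a minimal sketch, not part of the proposal: `SelectRequestParams` and `checkSelectParams` are hypothetical names, and the three sanity checks (something to choose from, unique `optionId` values, no placeholder without free text) are assumptions the RFD does not mandate.

```typescript
// Hypothetical TypeScript shapes mirroring the request parameters listed above.
type SelectionMode = "single" | "multiple";

interface SelectionOption {
  optionId: string;
  name: string;
  description?: string; // may contain markdown
}

interface SelectRequestParams {
  sessionId: string;
  prompt: string;
  options: SelectionOption[];
  selectionMode: SelectionMode;
  allowFreeText?: boolean;
  freeTextPlaceholder?: string;
}

// Returns a list of problems; an empty list means the params look well-formed.
// These checks are assumptions for illustration, not rules from the RFD.
function checkSelectParams(params: SelectRequestParams): string[] {
  const problems: string[] = [];
  if (params.options.length === 0 && !params.allowFreeText) {
    problems.push("no options and free text disabled");
  }
  const seen: string[] = [];
  for (const option of params.options) {
    if (seen.indexOf(option.optionId) !== -1) {
      problems.push("duplicate optionId: " + option.optionId);
    }
    seen.push(option.optionId);
  }
  if (params.freeTextPlaceholder !== undefined && !params.allowFreeText) {
    problems.push("freeTextPlaceholder set but allowFreeText is not enabled");
  }
  return problems;
}

// The refactoring menu from the JSON example, expressed with these types.
const refactorParams: SelectRequestParams = {
  sessionId: "sess_abc123def456",
  prompt: "How would you like to proceed with the refactoring?",
  options: [
    { optionId: "inline", name: "Inline refactor" },
    { optionId: "new-files", name: "Create new files" },
    { optionId: "dry-run", name: "Dry run" },
  ],
  selectionMode: "single",
  allowFreeText: true,
  freeTextPlaceholder: "Or describe a different approach...",
};

console.log(checkSelectParams(refactorParams));
```

An agent could run a check like this before sending `session/select`, so that malformed menus are caught locally rather than surfacing as confusing client UI.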
+
+#### Response
+
+The client responds with the user's selection, following the same outcome pattern as `session/request_permission`:
+
+```json
+{
+  "jsonrpc": "2.0",
+  "id": 5,
+  "result": {
+    "outcome": {
+      "outcome": "selected",
+      "optionIds": ["inline"],
+      "freeText": null
+    }
+  }
+}
+```
+
+If the prompt turn is cancelled before the user responds, the client **MUST** respond with the `cancelled` outcome:
+
+```json
+{
+  "jsonrpc": "2.0",
+  "id": 5,
+  "result": {
+    "outcome": {
+      "outcome": "cancelled"
+    }
+  }
+}
+```
+
+**Response Fields:**
+
+- `outcome` *(SelectionOutcome, required)*: The user's decision, either:
+  - `cancelled` - The [prompt turn was cancelled](./prompt-turn#cancellation)
+  - `selected` with `optionIds` - The IDs of the selected option(s)
+  - `selected` with `freeText` - Custom text provided by the user (if `allowFreeText` was enabled)
+
+### Selection Options
+
+Each selection option provided to the Client contains:
+
+- `optionId` *(string, required)*: Unique identifier for this option.
+- `name` *(string, required)*: Human-readable label to display to the user.
+- `description` *(string, optional)*: Extended description of this option. Supports markdown for rich formatting.
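On the agent side, the response fields above map naturally onto a discriminated union keyed by `outcome`. The sketch below is illustrative only: `OfferedMenu` and `describeOutcome` are hypothetical names, and the consistency rules it applies (returned IDs must have been offered, at most one ID in `single` mode, free text only when enabled) are assumptions about how an agent might interpret a response, not requirements stated in this RFD.

```typescript
// Hypothetical model of the SelectionOutcome described above.
type SelectionOutcome =
  | { outcome: "cancelled" }
  | { outcome: "selected"; optionIds: string[]; freeText?: string | null };

// The parts of the original request the agent needs to interpret a response.
interface OfferedMenu {
  optionIds: string[];
  selectionMode: "single" | "multiple";
  allowFreeText: boolean;
}

// Summarises an outcome, flagging responses inconsistent with what was offered.
// The "invalid" rules are assumptions for illustration.
function describeOutcome(menu: OfferedMenu, outcome: SelectionOutcome): string {
  if (outcome.outcome === "cancelled") {
    return "cancelled";
  }
  const unknown = outcome.optionIds.filter((id) => menu.optionIds.indexOf(id) === -1);
  if (unknown.length > 0) {
    return "invalid: unknown optionId(s) " + unknown.join(", ");
  }
  const hasFreeText = typeof outcome.freeText === "string" && outcome.freeText.length > 0;
  if (hasFreeText && !menu.allowFreeText) {
    return "invalid: free text was not enabled";
  }
  if (menu.selectionMode === "single" && outcome.optionIds.length > 1) {
    return "invalid: more than one option in single mode";
  }
  if (outcome.optionIds.length === 0 && !hasFreeText) {
    return "invalid: nothing selected";
  }
  return hasFreeText
    ? "free text: " + outcome.freeText
    : "selected: " + outcome.optionIds.join(", ");
}

// The menu from the refactoring example and the response shown above.
const refactorMenu: OfferedMenu = {
  optionIds: ["inline", "new-files", "dry-run"],
  selectionMode: "single",
  allowFreeText: true,
};

console.log(
  describeOutcome(refactorMenu, { outcome: "selected", optionIds: ["inline"], freeText: null })
);
```

Treating `freeText: null` and an absent `freeText` the same way keeps the agent tolerant of clients that omit the field; whether a response may carry both `optionIds` and `freeText` is left open by the RFD, so this sketch simply reports the free text when present.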
+
+### Selection Mode
+
+Controls how many options the user can select:
+
+- `single` - User must select exactly one option
+- `multiple` - User can select one or more options (checkboxes)
+
+### Example: Multi-Select with Free Text
+
+```json
+{
+  "jsonrpc": "2.0",
+  "id": 6,
+  "method": "session/select",
+  "params": {
+    "sessionId": "sess_abc123def456",
+    "prompt": "Which files should I include in the review?",
+    "options": [
+      {
+        "optionId": "modified",
+        "name": "Modified files",
+        "description": "Files changed in this branch"
+      },
+      {
+        "optionId": "tests",
+        "name": "Test files",
+        "description": "Include related test files"
+      },
+      {
+        "optionId": "deps",
+        "name": "Dependencies",
+        "description": "Include files that depend on modified files"
+      }
+    ],
+    "selectionMode": "multiple",
+    "allowFreeText": true,
+    "freeTextPlaceholder": "Or specify file paths..."
+  }
+}
+```
+
+Response with multiple selections:
+
+```json
+{
+  "jsonrpc": "2.0",
+  "id": 6,
+  "result": {
+    "outcome": {
+      "outcome": "selected",
+      "optionIds": ["modified", "tests"],
+      "freeText": null
+    }
+  }
+}
+```
+
+### Considerations
+
+- **Default selection**: Should agents be able to pre-select an option? Could add an optional `defaultOptionIds` field.
+- **Validation**: For multi-select, should there be min/max selection constraints?
+- **Grouping**: Should options support grouping/categories for complex menus?
+
+## Frequently asked questions
+
+> What questions have arisen over the course of authoring this document or during subsequent discussions?
+
+### What alternative approaches did you consider, and why did you settle on this one?
+
+1. **Extending slash commands**: We considered making slash commands more dynamic, but this doesn't solve the "agent needs to ask a question" use case - slash commands are user-initiated.
+
+2. **Structured content blocks**: We considered adding menu-like content to `session/update` messages, but this conflates display with interaction. A dedicated request/response pattern provides clearer semantics for "agent needs input."
+
+3. **Form-based approach**: A full form system (text fields, checkboxes, etc.) was considered but adds significant complexity. Menus with optional free-text cover the 80% case while remaining simple.
+
+### Why not extend `session/request_permission` for this?
+
+While `session/select` follows a similar interaction pattern to `session/request_permission`, the permission system is tightly coupled to tool calls via the required `toolCallId` field. This makes it unsuitable for general-purpose user input gathering where no tool call is involved.
+
+`session/select` provides a standalone mechanism for agents to gather structured input at any point during a session, whether for onboarding, workflow decisions, or configuration, without requiring a tool call context.
+
+### What if the client doesn't support rich rendering?
+
+Clients should gracefully degrade. At minimum, options can be rendered as a numbered list in plain text. The `description` field is optional, so basic implementations can show just labels.
+
+### How is cancellation handled?
+
+Following the same pattern as `session/request_permission`:
+
+- If the client sends a `session/cancel` notification to cancel an ongoing prompt turn, it **MUST** respond to all pending `session/select` requests with the `cancelled` outcome.
+- The agent should handle cancellation gracefully, typically by aborting the current workflow or falling back to a default behavior.
+
+### Can the user dismiss the menu without selecting?
+
+If the user dismisses the menu (e.g., clicks outside, presses Escape), the client **SHOULD** treat this as a cancellation and return the `cancelled` outcome. This provides consistent behavior and allows agents to handle the case explicitly.
+
+## Revision history
+
+