Feat/agent tool resilience sample #4086
Demonstrates timeout protection, automatic retry, and dynamic fallback patterns for multi-agent workflows using AgentTool.
Summary of Changes
Hello @sarojrout, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new sample that provides a robust reference implementation for building resilient multi-agent systems within the ADK framework. It addresses the challenge of managing sub-agent timeouts and failures by demonstrating how to combine existing ADK components to achieve timeout protection, automatic retries, intelligent fallback mechanisms, and user-friendly error recovery, all without requiring core framework changes.
Code Review
This pull request introduces a valuable sample demonstrating resilience patterns for multi-agent systems, including timeouts, retries, and fallbacks. The implementation is well-structured and provides a clear example for developers. My review includes suggestions to improve code clarity and maintainability in agent.py by refactoring the timeout handling logic and ensuring consistency in configuration. I also noted a minor issue in the README.md file.
try:
    while True:
        # Check overall timeout
        elapsed = time.time() - start_time
        if elapsed >= self.timeout:
            # Timeout exceeded
            yield Event(
                content=types.Content(
                    role='assistant',
                    parts=[
                        types.Part.from_text(
                            text=f"Timeout: {self.timeout_error_message}"
                        )
                    ],
                ),
            )
            return

        # Calculate remaining time
        remaining = self.timeout - elapsed
        if remaining <= 0:
            yield Event(
                content=types.Content(
                    role='assistant',
                    parts=[
                        types.Part.from_text(
                            text=f"Timeout: {self.timeout_error_message}"
                        )
                    ],
                ),
            )
            return

        # Get next event with timeout check
        try:
            event = await asyncio.wait_for(
                agen.__anext__(),
                timeout=min(remaining, 0.5)  # Check frequently
            )
            yield event
        except StopAsyncIteration:
            # Generator finished normally
            break
        except asyncio.TimeoutError:
            # This iteration timed out, but check overall timeout
            if time.time() - start_time >= self.timeout:
                yield Event(
                    content=types.Content(
                        role='assistant',
                        parts=[
                            types.Part.from_text(
                                text=f"Timeout: {self.timeout_error_message}"
                            )
                        ],
                    ),
                )
                return
            # Otherwise, continue waiting for next event
            continue
        except Exception:
            # Re-raise other exceptions
            raise
The `run_async_with_events` method can be simplified for better readability and maintainability.
- The code to create and yield a timeout `Event` is duplicated. This can be extracted into a local helper function.
- The check `if remaining <= 0:` is redundant because the preceding check `if elapsed >= self.timeout:` already covers this condition, making it unreachable. Removing this dead code will make the logic clearer.
Here is a suggested refactoring that addresses both points:
try:
    def _create_timeout_event() -> Event:
        return Event(
            content=types.Content(
                role='assistant',
                parts=[
                    types.Part.from_text(
                        text=f"Timeout: {self.timeout_error_message}"
                    )
                ],
            ),
        )

    while True:
        # Check overall timeout
        elapsed = time.time() - start_time
        if elapsed >= self.timeout:
            # Timeout exceeded
            yield _create_timeout_event()
            return

        # Get next event with timeout check
        remaining = self.timeout - elapsed
        try:
            event = await asyncio.wait_for(
                agen.__anext__(),
                timeout=min(remaining, 0.5)  # Check frequently
            )
            yield event
        except StopAsyncIteration:
            # Generator finished normally
            break
        except asyncio.TimeoutError:
            # This iteration timed out, but check overall timeout
            if time.time() - start_time >= self.timeout:
                yield _create_timeout_event()
                return
            # Otherwise, continue waiting for next event
            continue
        except Exception:
            # Re-raise other exceptions
            raise

…from being marked as final
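For anyone who wants to experiment with the timeout loop outside ADK, here is a minimal standalone sketch. It is not the PR's code: it assumes plain Python `asyncio`, and `iterate_with_deadline`, `slow_events`, `collect`, and the `TIMEOUT` sentinel are hypothetical names introduced for illustration, with the sentinel standing in for the timeout `Event`. The sketch awaits each `__anext__()` with the full remaining budget instead of a 0.5 s poll, because `asyncio.wait_for` cancels the awaitable it wraps when its timeout fires; whether that affects the sample's sub-agent generator is worth verifying.

```python
import asyncio
import time

# Hypothetical sentinel standing in for the timeout Event in the sample.
TIMEOUT = "Timeout: sub-agent did not finish in time"


async def iterate_with_deadline(agen, timeout, timeout_item=TIMEOUT):
    """Yield items from `agen` until it finishes or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                yield timeout_item
                return
            try:
                # Wait for the next item, but never past the overall deadline.
                item = await asyncio.wait_for(agen.__anext__(), timeout=remaining)
            except StopAsyncIteration:
                return  # source generator finished normally
            except asyncio.TimeoutError:
                yield timeout_item
                return
            yield item
    finally:
        # Release the source generator whether we finished or timed out.
        await agen.aclose()


async def slow_events():
    """Hypothetical stand-in for a sub-agent's event stream."""
    yield "first"
    await asyncio.sleep(10)  # simulates a stalled sub-agent
    yield "second"


async def collect(agen):
    """Gather every item from an async generator into a list."""
    return [item async for item in agen]
```

Running `asyncio.run(collect(iterate_with_deadline(slow_events(), timeout=0.2)))` should return the first item followed by the sentinel, since the second item arrives only after the deadline.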
@ryanaiagent, can you please get this sample reviewed and merged so that others can take a pull?
Please ensure you have read the contribution guide before creating a pull request.
Link to Issue or Description of Change
1. Link to an existing issue (if applicable):
2. Or, if no issue exists, describe the change:
If applicable, please follow the issue templates to provide as much detail as
possible.
Problem:
Currently, building resilient multi-agent systems with `AgentTool` requires significant custom code. When sub-agents time out or fail, developers must:
This creates a high barrier to entry and leads to inconsistent implementations across different projects.
Solution:
This PR adds a working sample (`contributing/samples/agent_tool_resilience/`) that demonstrates how to build resilient multi-agent systems using ADK's existing components:
- `TimeoutAgentTool` wrapper - Adds timeout protection to `AgentTool`
  - `AgentTool.run_async()` with `asyncio.wait_for()`
- Integration with `ReflectAndRetryToolPlugin` - Handles automatic retries
- Prompt-based dynamic routing - Enables intelligent fallback
- Error recovery agent - Provides user-friendly error analysis
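The way these pieces fit together can be pictured with a minimal, ADK-free sketch. Everything in it is an assumption made for illustration: `call_with_timeout`, `call_with_resilience`, `slow_primary`, and `quick_fallback` are hypothetical names, plain re-invocation stands in for what `ReflectAndRetryToolPlugin` does, and a hard-coded fallback stands in for the prompt-based routing that the sample leaves to the model. Only `asyncio.wait_for()` and the 5-second default are taken from the PR description.

```python
import asyncio


async def call_with_timeout(func, timeout, *args):
    """Await func(*args); return (ok, result_or_error_message)."""
    try:
        return True, await asyncio.wait_for(func(*args), timeout=timeout)
    except asyncio.TimeoutError:
        return False, f"Timeout: no result within {timeout} seconds"
    except Exception as exc:  # any other failure becomes an error message
        return False, f"Error: {exc}"


async def call_with_resilience(primary, fallback, request,
                               timeout=5.0, max_retries=2):
    """Try `primary` (with retries), then `fallback`, then report the error."""
    last_error = None
    for _attempt in range(1 + max_retries):
        ok, result = await call_with_timeout(primary, timeout, request)
        if ok:
            return result
        last_error = result

    # Primary exhausted its attempts: route the request to the fallback.
    ok, result = await call_with_timeout(fallback, timeout, request)
    if ok:
        return result
    return f"Both agents failed. Last primary error: {last_error}"


async def slow_primary(request):
    """Hypothetical sub-agent that never answers in time."""
    await asyncio.sleep(10)
    return f"primary answer to {request!r}"


async def quick_fallback(request):
    """Hypothetical fallback sub-agent that answers immediately."""
    return f"fallback answer to {request!r}"
```

Calling `asyncio.run(call_with_resilience(slow_primary, quick_fallback, "q", timeout=0.05, max_retries=1))` exercises the timeout, retry, and fallback path in well under a second. Returning error strings rather than raising mirrors the sample's approach of surfacing a readable timeout message to the coordinating agent.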
Why this solution:
Testing Plan
Please describe the tests that you ran to verify your changes. This is required
for all PRs that are not small documentation or typo fixes.
Unit Tests:
Note: This is a sample addition, not a core feature change. The sample code itself is tested through manual E2E testing. The `TimeoutAgentTool` wrapper uses standard Python `asyncio.wait_for()` which is well-tested.
Please include a summary of passed `pytest` results.
Manual End-to-End (E2E) Tests:
Setup:
Normal Operation:
Timeout Scenario:
`timeout=5.0` in `agent.py` as part of `TimeoutAgentTool`
Test Results Summary:
`ReflectAndRetryToolPlugin` works
Checklist
- `TimeoutAgentTool` includes detailed docstrings
- Complex timeout logic for async generators is commented
- Agent instructions explain error handling protocols
- Manual E2E tests demonstrate all scenarios
- No core changes - only sample addition
`adk web`
Additional context
What This PR Adds
Files Added:
- `contributing/samples/agent_tool_resilience/agent.py` - Complete implementation (~320 lines)
- `contributing/samples/agent_tool_resilience/__init__.py` - Package initialization
- `contributing/samples/agent_tool_resilience/README.md` - User documentation
Key Features:
- `TimeoutAgentTool` wrapper
- `ReflectAndRetryToolPlugin`
Impact
Screenshots