Native Ollama LLM Integration + Example Project + Full Unit Tests #3570
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Summary of Changes
Hello @ayman3000, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances ADK's LLM provider ecosystem by introducing a native Ollama integration. This new backend directly addresses and fixes critical tool-calling failures that previously affected cloud-based Ollama models when used via LiteLLM. By providing robust and reliable tool support, this change makes ADK fully compatible with enterprise cloud inference, hybrid local and cloud deployments, and advanced agent workflows that depend on accurate function calling, thereby broadening ADK's accessibility and utility.
Highlights
Response from ADK Triaging Agent
Hello @ayman3000, thank you for this significant contribution! Before we can proceed with the review, could you please sign the Contributor License Agreement (CLA)? It seems the CLA check is currently failing. This will help us to move forward with your PR. Thanks!
Code Review
This is a fantastic contribution that adds native Ollama support to ADK, filling a significant gap. The implementation is well-structured and the inclusion of a full example project and unit tests is excellent.
I've found a few issues, mainly related to the completeness of the response parsing and corresponding tests. Specifically:
- The `LlmResponse` is missing usage metadata and the model version from the Ollama response.
- The unit tests for these features are present but will fail due to the missing implementation and use an incorrect mock response structure.
- There's a potential bug in handling malformed JSON in tool call arguments.
My review includes suggestions to address these points. Once these are fixed, this will be a very solid addition to the project. Great work!
```python
except json.JSONDecodeError:
  logger.debug(
      'Failed to parse tool call arguments as JSON: %s', arguments
  )
```
If `json.loads(arguments)` fails, `arguments` remains a string. This string is then passed as the `args` parameter to `types.FunctionCall`, which expects a dictionary-like object. This will likely cause a validation error downstream. It would be safer to treat unparseable arguments as an empty dictionary and log a warning.
Suggested change:
```python
except json.JSONDecodeError:
  logger.warning(
      'Failed to parse tool call arguments as JSON: %s. Defaulting to empty arguments.',
      arguments,
  )
  arguments = {}
```
```python
def test_to_llm_response_usage_metadata():
  o = Ollama()
  resp = mock_response_ok(
      text="Hi",
      usage={"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
  )

  out = o._to_llm_response(resp)

  assert out.usage_metadata.prompt_token_count == 10
  assert out.usage_metadata.candidates_token_count == 5
  assert out.usage_metadata.total_token_count == 15
```
This test for usage metadata will fail because the implementation in `_to_llm_response` does not parse it. Additionally, the mock response structure in this test does not match the actual Ollama API response. The Ollama API returns `prompt_eval_count` and `eval_count` at the top level of the JSON response, not a nested `usage` dictionary. The test should be updated to reflect the correct API response format and to work with the corrected implementation.
Suggested change:
```python
def test_to_llm_response_usage_metadata():
  o = Ollama()
  resp = mock_response_ok(text="Hi")
  resp["prompt_eval_count"] = 10
  resp["eval_count"] = 5
  out = o._to_llm_response(resp)
  assert out.usage_metadata is not None
  assert out.usage_metadata.prompt_token_count == 10
  assert out.usage_metadata.candidates_token_count == 5
  assert out.usage_metadata.total_token_count == 15
```
```python
async def test_model_override(monkeypatch):
  resp = mock_response_ok("Hello")

  async def fake_thread(fn, *args):
    return resp

  monkeypatch.setattr("asyncio.to_thread", fake_thread)

  o = Ollama(model="default")
  req = LlmRequest(
      model="override",
      contents=[types.Content(role="user", parts=[types.Part.from_text("X")])]
  )

  out = [r async for r in o.generate_content_async(req)][0]
  assert out.model_version == "override"
```
This test will fail because `_to_llm_response` does not set `model_version` from the Ollama response. The test should also be improved to assert that the correct model name is sent in the request payload, and the mock response should include the `model` field, as the real Ollama API would.
Suggested change:
```python
async def test_model_override(monkeypatch):
  resp = mock_response_ok("Hello")
  resp['model'] = 'override'

  async def fake_thread(fn, *args):
    payload = args[0]
    assert payload['model'] == 'override'
    return resp

  monkeypatch.setattr("asyncio.to_thread", fake_thread)

  o = Ollama(model="default")
  req = LlmRequest(
      model="override",
      contents=[types.Content(role="user", parts=[types.Part.from_text("X")])]
  )

  out = [r async for r in o.generate_content_async(req)][0]
  assert out.model_version == "override"
```
```python
return LlmResponse(
    content=types.Content(role='model', parts=parts),
)
```
The implementation is missing parsing of usage metadata and model version from the Ollama response. The Ollama API provides `prompt_eval_count`, `eval_count`, and `model`, which should be mapped to `usage_metadata` and `model_version` in the `LlmResponse` to provide complete information and fix the failing tests.
Suggested change:
```python
usage_metadata = None
prompt_tokens = response_json.get('prompt_eval_count')
completion_tokens = response_json.get('eval_count')
if prompt_tokens is not None and completion_tokens is not None:
  usage_metadata = types.GenerateContentResponseUsageMetadata(
      prompt_token_count=prompt_tokens,
      candidates_token_count=completion_tokens,
      total_token_count=prompt_tokens + completion_tokens,
  )
return LlmResponse(
    content=types.Content(role='model', parts=parts),
    usage_metadata=usage_metadata,
    model_version=response_json.get('model'),
)
```
…parsing, updated tests
Thanks for the review! Please let me know if you would like any additional adjustments.
Hi @ayman3000, Thank you for your work on this pull request. We appreciate the effort you've invested.
…ty, consistency, and ADK compliance
/gemini review
Code Review
This pull request introduces a significant and valuable feature: native Ollama LLM integration. This is a well-executed contribution that addresses a critical gap in tool-calling support for Ollama cloud models. The implementation is robust, including a new backend, a comprehensive example project, and a full suite of unit tests. My review focuses on enhancing the implementation's correctness, improving documentation consistency, and increasing maintainability. Key suggestions include ensuring environment variable support for the Ollama host works as documented, improving the robustness of the example code, and discussing an implementation detail in role mapping for future-proofing. Overall, this is an excellent addition to the ADK ecosystem.
```python
host: str = Field(
    default="http://localhost:11434",
    description="Base URL of the Ollama server.",
)
```
The `host` field is hardcoded to `http://localhost:11434` and does not respect the `OLLAMA_API_BASE` environment variable mentioned in the documentation. This should be implemented to match the documented behavior. Using `os.environ.get` will allow users to configure the host via environment variables. Remember to add `import os` at the top of the file.
Suggested change:
```python
host: str = Field(
    default=os.environ.get('OLLAMA_API_BASE', 'http://localhost:11434'),
    description='Base URL of the Ollama server.',
)
```
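If this change is adopted, configuring a remote host could look like the following sketch (the address mirrors the README example; the model name is illustrative and not part of the suggested diff):

```python
import os

# Hypothetical usage of the suggested change: the env var must be set before
# the module defining Ollama is imported, since the Field default is
# evaluated at class-definition time.
os.environ["OLLAMA_API_BASE"] = "http://192.168.1.20:11434"

from google.adk.models.ollama import Ollama

llm = Ollama(model="gpt-oss:20b-cloud")
print(llm.host)  # expected: http://192.168.1.20:11434
```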
```python
import random
from google.adk.agents.llm_agent import Agent
from google.adk.models.ollama import Ollama
```
```python
def check_prime(numbers: list[int]) -> str:
  primes = []
  for number in numbers:
    number = int(number)
    if number <= 1:
      continue
    for i in range(2, int(number ** 0.5) + 1):
      if number % i == 0:
        break
    else:
      primes.append(number)
  return "No prime numbers found." if not primes else f"{', '.join(map(str, primes))} are prime numbers."
```
The `check_prime` function in this README is inconsistent with the version in `agent.py`. It also has a potential issue where the type hint is `list[int]` but the code performs `int(number)`, which could lead to unexpected behavior if the list contains non-integers. It would be best to align this example with the more robust implementation in `agent.py` and ensure type hints match the code's logic.
Override using an environment variable:
```bash
export OLLAMA_API_BASE="http://192.168.1.20:11434"
```
This documentation suggests that the Ollama host can be configured via the `OLLAMA_API_BASE` environment variable. However, the current implementation of the `Ollama` class in `ollama_llm.py` does not read from this environment variable. To avoid confusion, the implementation should be updated to support this feature as documented.
```python
def check_prime(numbers: list[int]) -> str:
  """Check if a given list of numbers are prime.

  Args:
    numbers: The list of numbers to check.

  Returns:
    A str indicating which number is prime.
  """
  primes = set()
  for number in numbers:
    number = int(number)
    if number <= 1:
      continue
    is_prime = True
    for i in range(2, int(number**0.5) + 1):
      if number % i == 0:
        is_prime = False
        break
    if is_prime:
      primes.add(number)
  return (
      "No prime numbers found."
      if not primes
      else f"{', '.join(str(num) for num in primes)} are prime numbers."
  )
```
The `check_prime` function is type-hinted to receive a `list[int]`, but it defensively casts `number = int(number)`. This suggests it might receive non-integer values from the LLM. To make this function more robust and the type hint accurate, I recommend changing the type hint to `list[Any]` (and adding the import) and wrapping the cast in a try-except block to gracefully handle any values that cannot be converted to an integer. I've also added `sorted()` to the output for deterministic ordering of prime numbers.
```python
def check_prime(numbers: list) -> str:
  """Check if a given list of numbers are prime.

  Args:
    numbers: The list of numbers to check.

  Returns:
    A str indicating which number is prime.
  """
  primes = set()
  for number in numbers:
    try:
      number = int(number)
    except (ValueError, TypeError):
      continue  # Safely skip non-numeric values
    if number <= 1:
      continue
    is_prime = True
    for i in range(2, int(number**0.5) + 1):
      if number % i == 0:
        is_prime = False
        break
    if is_prime:
      primes.add(number)
  return (
      "No prime numbers found."
      if not primes
      else f"{', '.join(str(num) for num in sorted(list(primes)))} are prime numbers."
  )
```

```
You roll dice and answer questions about the outcome of the dice rolls.
You can roll dice of different sizes.
You can use multiple tools in parallel by calling functions in parallel(in one request and in one round).
It is ok to discuss previous dice roles, and comment on the dice rolls.
```
```python
def _map_role(self, role: Optional[str]) -> str:
  if role in ("model", "assistant"):
    return "assistant"
  if role == "system":
    return "system"
  # "user", "tool", or anything else defaults to "user".
  return "user"
```
This function maps the `tool` role to `user`. While this is a clever workaround that likely improves compatibility with models that don't fully support the native `tool` role, it deviates from the official Ollama API specification. For long-term maintainability, it would be beneficial to add a comment explaining why this mapping is necessary. This will help future developers understand the design choice and know when it might be safe to switch to the native `tool` role as model support improves.
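A sketch of what such a comment could look like, reusing the mapping shown above (the wording is illustrative, not part of the PR):

```python
def _map_role(self, role: Optional[str]) -> str:
  if role in ("model", "assistant"):
    return "assistant"
  if role == "system":
    return "system"
  # NOTE (illustrative wording): ADK's "tool" role is deliberately mapped to
  # "user" because many models served through Ollama do not yet handle the
  # native "tool" role reliably for function responses; revisit this mapping
  # as model support improves.
  return "user"
```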
```python
def _post_chat(self, payload: dict[str, Any]) -> dict[str, Any]:
  """Perform a blocking POST /api/chat call to Ollama."""
  url = self.host.rstrip("/") + _CHAT_ENDPOINT
  data = json.dumps(payload).encode("utf-8")
  request = urllib.request.Request(
      url,
      data=data,
      headers={"Content-Type": "application/json"},
      method="POST",
  )

  try:
    with urllib.request.urlopen(
        request, timeout=self.request_timeout
    ) as response:
      response_body = response.read().decode("utf-8")
  except urllib.error.URLError as exc:
    raise RuntimeError(exc.reason) from exc
  except urllib.error.HTTPError as exc:
    message = exc.read().decode("utf-8", errors="ignore")
    raise RuntimeError(f"{exc.code}: {message}") from exc

  return json.loads(response_body)
```
The `_post_chat` method uses the blocking `urllib.request` library and runs it in a separate thread via `asyncio.to_thread`. For a more modern and efficient asynchronous implementation, consider refactoring this to use an async-native HTTP client library like `httpx`. This would eliminate the overhead of thread creation and management, making the code more performant and idiomatic for an async codebase.
Here's an example of how it could be implemented with `httpx`:
```python
import httpx

async def _post_chat_async(self, payload: dict[str, Any]) -> dict[str, Any]:
  url = self.host.rstrip('/') + _CHAT_ENDPOINT
  async with httpx.AsyncClient(timeout=self.request_timeout) as client:
    try:
      response = await client.post(
          url,
          json=payload,
          headers={'Content-Type': 'application/json'},
      )
      response.raise_for_status()
      return response.json()
    except httpx.HTTPStatusError as exc:
      raise RuntimeError(
          f'{exc.response.status_code}: {exc.response.text}'
      ) from exc
    except httpx.RequestError as exc:
      raise RuntimeError(f'Ollama connection failed: {exc}') from exc
```

You would then call `await self._post_chat_async(payload)` directly from `generate_content_async`.
All review comments have been addressed. The PR is ready for re-review. Thanks!
🚀 PR: Native Ollama LLM Integration + Example Project + Full Unit Tests
Includes: Critical Fix for Ollama Cloud Tool-Calling + Comparison Test with LiteLLM
🔗 Link to Issue or Description of Change
No existing issue.
Submitting this as a major feature contribution that fills a critical gap in ADK’s LLM provider ecosystem.
❗ Problem
Google ADK currently relies on LiteLLM for Ollama usage (`ollama_chat/...`). However, LiteLLM still fails with Ollama when using cloud models such as:
- `glm-4.6:cloud`
- `gpt-oss:20b-cloud`

The failures include:
❌ LiteLLM + Ollama Cloud → Broken Tool Calling
In short:
This makes ADK incomplete for:
🔥 Why This Feature Is Critical
✔ This PR provides the first working, reliable tool-calling path for Ollama Cloud models inside ADK.
Native Ollama backend → Stable.
LiteLLM backend → Broken for cloud models.
The ADK ecosystem must support Ollama Cloud + tools — and this PR provides exactly that.
✅ Solution — Full Native Ollama Support
This PR adds:
✔ 1. New Native Backend — `ollama_llm.py`
Implements `BaseLlm` with robust support for:
🧩 Core Features
- `POST /api/chat`
- `google_adk.xxx`
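For reference, the endpoint the backend wraps can be exercised directly. The following is a minimal sketch of a raw `/api/chat` call (the model name and prompt are illustrative; the exact payload built by `ollama_llm.py` may differ):

```python
import json
import urllib.request

# Minimal sketch of a direct, non-streaming call to Ollama's /api/chat
# endpoint, the same endpoint the native backend wraps. The model name and
# prompt are illustrative assumptions, not taken from the PR's code.
payload = {
    "model": "gpt-oss:20b-cloud",
    "messages": [{"role": "user", "content": "Roll a 6-sided die."}],
    "stream": False,
}
request = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(request, timeout=60) as response:
    reply = json.loads(response.read().decode("utf-8"))
print(reply["message"]["content"])
```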
🎛 Model Routing
Supports all of these: