Description:
I am willing to provide any additional information, logs, or perform any necessary steps to help trace and resolve this issue until it is fully fixed.
I am experiencing persistent RESOURCE_EXHAUSTED errors (code 429) when using batch processing with the Gemini API, even though my enqueued tokens are well below the documented limits. This is severely impacting my workflow, and I believe it may be a backend issue or misconfiguration rather than actual quota exhaustion.
Environment:
- SDK: google-generative-ai (latest version)
- Model: gemini-2.5-flash
- Python version: 3.13
- Project Details:
- Project Name: projects/1045068963884
- Project Number: 1045068963884
- Project ID: gen-lang-client-0203365644
Steps to Reproduce:
- Set up a paid billing account (we have a Pro-tier account enabled for higher limits).
- Use batch processing API as per the documentation: https://ai.google.dev/gemini-api/docs/rate-limits#batch-api.
- Initially submitted batches of 1,000 requests, totaling ~2.5 million enqueued tokens; some requests returned 429 errors.
- Reduced the batch size to 500, then to 250; enqueued tokens are now under 700,000, but the errors persist.
- Checked usage in AI Studio (https://aistudio.google.com/usage?tab=rate-limit), which shows no exceedance.
Error Message:
```
RESOURCE_EXHAUSTED. {'error': {'code': 429, 'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. ', 'status': 'RESOURCE_EXHAUSTED', 'details': [{'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Learn more about Gemini API quotas', 'url': 'https://ai.google.dev/gemini-api/docs/rate-limits'}]}]}}
```
Expected Behavior:
According to the rate limits docs, the enqueued-token limit for gemini-2.5-flash on a paid tier should be at least 3 million (more in higher tiers), so requests this far below the limit should not fail.
Actual Behavior:
Errors occur even at low usage. There is also no clear way to monitor the exact number of enqueued tokens per batch in real time, which makes debugging difficult.
Additional Context:
- We proceeded strictly according to the limits mentioned in the docs.
- There is no dedicated monitoring tool for enqueued tokens; adding one would greatly improve transparency.
- This feels like a policy issue where paid customers are left without adequate support channels. I am very frustrated as we've paid for Pro access specifically for batch processing.
Please let me know what else I can provide—screenshots, code snippets, or retry attempts—to help diagnose this.