Description:
I am willing to provide any additional information, logs, or perform any necessary steps to help trace and resolve this issue until it is fully fixed.
I am experiencing persistent RESOURCE_EXHAUSTED errors (code 429) when using batch processing with the Gemini API, even though my enqueued tokens are well below the documented limits. This is severely impacting my workflow, and I believe it may be a backend issue or misconfiguration rather than actual quota exhaustion.
Environment:
- SDK: google-generative-ai (latest version)
- Model: gemini-2.5-flash
- Python version: 3.13
- Project Details:
- Project Name: projects/1045068963884
- Project Number: 1045068963884
- Project ID: gen-lang-client-0203365644
Steps to Reproduce:
- Set up a paid billing account (we have a Pro-tier account enabled for higher limits).
- Use batch processing API as per the documentation: https://ai.google.dev/gemini-api/docs/rate-limits#batch-api.
- Initially submitted batches of 1,000 requests, totaling ~2.5 million enqueued tokens; some requests returned 429 errors.
- Reduced the batch size to 500, then to 250; enqueued tokens are now under 700,000, but the errors persist.
- Checked usage in AI Studio (https://aistudio.google.com/usage?tab=rate-limit), which shows no exceedance.
Error Message:
```
RESOURCE_EXHAUSTED. {'error': {'code': 429, 'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. ', 'status': 'RESOURCE_EXHAUSTED', 'details': [{'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Learn more about Gemini API quotas', 'url': 'https://ai.google.dev/gemini-api/docs/rate-limits'}]}]}}
```
Expected Behavior:
According to the rate limits docs, the enqueued-token limit for gemini-2.5-flash on a paid tier should be at least 3 million (more in higher tiers), so requests this far below the limit should not fail.
Actual Behavior:
Errors occur even at low usage. There is also no clear way to monitor the exact number of enqueued tokens per batch in real time, which makes debugging difficult.
Additional Context:
- We proceeded strictly according to the limits mentioned in the docs.
- There is no dedicated monitoring tool for enqueued tokens; adding one would greatly improve transparency.
- This feels like a policy issue where paid customers are left without adequate support channels. I am very frustrated as we've paid for Pro access specifically for batch processing.
Please let me know what else I can provide—screenshots, code snippets, or retry attempts—to help diagnose this.