Persistent RESOURCE_EXHAUSTED (429) Error in Batch Processing Despite Usage Below Limits #1901

@AlirezaHaghi

Description

I am willing to provide additional information or logs, or perform any necessary steps, to help trace this issue until it is fully resolved.

I am experiencing persistent RESOURCE_EXHAUSTED (code 429) errors when using batch processing with the Gemini API, even though my enqueued token count is well below the documented limits. This is severely impacting my workflow, and I suspect a backend issue or misconfiguration rather than actual quota exhaustion.

Environment:

  • SDK: google-generative-ai (latest version)
  • Model: gemini-2.5-flash
  • Python version: 3.13
  • Project Details:
    • Project Name: projects/1045068963884
    • Project Number: 1045068963884
    • Project ID: gen-lang-client-0203365644

Steps to Reproduce:

  1. Set up a paid billing account (we have a Pro-tier account enabled for higher limits).
  2. Use batch processing API as per the documentation: https://ai.google.dev/gemini-api/docs/rate-limits#batch-api.
  3. Initially submitted batches of size 1000, totaling ~2.5 million enqueued tokens; some requests returned 429 errors.
  4. Reduced the batch size to 500, then to 250, bringing enqueued tokens under 700,000, but the errors persist.
  5. Checked usage in AI Studio (https://aistudio.google.com/usage?tab=rate-limit), which shows no exceedance.
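The submission flow in steps 3-4 can be sketched as follows. This is a minimal sketch, not my exact script; `submit_batch` is a placeholder for whatever SDK call actually submits the batch, and I can share the real code on request:

```python
from typing import Callable, List


def chunk_requests(requests: List[dict], batch_size: int) -> List[List[dict]]:
    """Split the full request list into batches of at most batch_size items."""
    return [requests[i:i + batch_size] for i in range(0, len(requests), batch_size)]


def submit_all(requests: List[dict],
               batch_size: int,
               submit_batch: Callable[[List[dict]], None]) -> int:
    """Submit every chunk via the provided callable; returns the batch count.

    submit_batch is a stand-in for the real SDK submission call.
    """
    batches = chunk_requests(requests, batch_size)
    for batch in batches:
        submit_batch(batch)
    return len(batches)
```

Dropping `batch_size` from 1000 to 500 to 250 only changes the chunking; the 429s appear regardless.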

Error Message:

RESOURCE_EXHAUSTED. {'error': {'code': 429, 'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. ', 'status': 'RESOURCE_EXHAUSTED', 'details': [{'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Learn more about Gemini API quotas', 'url': 'https://ai.google.dev/gemini-api/docs/rate-limits'}]}]}}
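As a workaround I retry on these 429s with exponential backoff and jitter. This is a sketch under the assumption that the failure surfaces as an exception; in the real code the `except` clause should catch the SDK's specific 429/RESOURCE_EXHAUSTED error type rather than bare `Exception`:

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Exponential delays: base * 2**attempt, capped at `cap` seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def call_with_retry(call: Callable[[], T], retries: int = 5, base: float = 1.0) -> T:
    """Retry call() on failure, sleeping with exponential backoff plus jitter."""
    last_exc = None
    for delay in backoff_delays(retries, base=base):
        try:
            return call()
        except Exception as exc:  # real code: catch the SDK's 429 error specifically
            last_exc = exc
            time.sleep(delay + random.uniform(0, delay * 0.1))  # jitter
    raise last_exc
```

Even with backoff the batches eventually fail, which is why I suspect this is not ordinary transient throttling.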

Expected Behavior:
According to the rate-limits documentation, gemini-2.5-flash on a paid tier allows up to 3 million enqueued tokens (more on higher tiers), so batches below that limit should be accepted without 429 errors.

Actual Behavior:
429 errors occur even though usage is well below the limit. There is also no clear way to monitor the exact number of enqueued tokens per batch in real time, which makes debugging difficult.
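As a stopgap I track enqueued tokens client-side before submitting. This is a rough sketch; the 4-characters-per-token ratio is my own heuristic, not an official figure, and the SDK's token-counting endpoint would be more accurate if it could be applied to batch jobs:

```python
from typing import List


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; ~4 chars/token is a heuristic, not an API guarantee."""
    return max(1, round(len(text) / chars_per_token))


def enqueued_token_total(prompts: List[str]) -> int:
    """Sum of estimated tokens across all prompts queued in a batch."""
    return sum(estimate_tokens(p) for p in prompts)
```

By this estimate my reduced batches stay well under 700,000 enqueued tokens, yet the 429s continue.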

Additional Context:

  • We stayed strictly within the limits stated in the docs.
  • There is no dedicated monitoring tool for enqueued tokens; adding one would improve transparency.
  • Paid customers currently lack an adequate support channel for issues like this, which is frustrating given that we pay for Pro access specifically for batch processing.

Please let me know what else I can provide (screenshots, code snippets, or retry attempts) to help diagnose this.

Metadata

Labels

  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
