Labels
- priority: p2 (Moderately-important priority. Fix may not be included in next release.)
- type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
Description
Summary
When processing a YouTube video for the first time (uncached), Gemini randomly truncates the transcript to a small portion of the video (0.5%-27% coverage). Subsequent requests for the same video work correctly due to caching.
Environment details
- Programming language: Python
- OS: Linux (WSL2)
- Language runtime version: Python 3.13.6
- Package version: google-genai 1.52.0
Configuration
- Model: `gemini-3-flash-preview`
- Settings: `thinking_budget=0`, `fps=0.1`, `media_resolution=LOW`
Observations
We ran two sequential tests on the same video: first without explicit offsets, then with offsets. The first test processes the video fresh; the second benefits from caching.
Video 1: https://www.youtube.com/watch?v=lMAnY2B1UnM (27:21 duration)
First uncached run:
| Test Order | Offsets | Cached | Last Timestamp | Coverage |
|---|---|---|---|---|
| 1st | No | No | 00:09 | 0.5% |
| 2nd | Yes | Yes | 27:08 | 99.2% |
Video 2: https://www.youtube.com/watch?v=tdIUMkXxtHg (25:30 duration)
First uncached run:
| Test Order | Offsets | Cached | Last Timestamp | Coverage |
|---|---|---|---|---|
| 1st | No | No | 25:21 | 99.4% |
| 2nd | Yes | Yes | 06:49 | 26.7% |
Subsequent runs (both cached):
Both tests achieve 90-99% coverage consistently once caching is active.
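The coverage figures in the tables above appear to be the last transcript timestamp divided by the video duration. The sketch below reproduces them under that assumption; `to_seconds` and `coverage_percent` are illustrative helper names, not part of `google-genai`.

```python
def to_seconds(stamp: str) -> int:
    """Convert an "MM:SS" (or "HH:MM:SS") string to a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def coverage_percent(last_timestamp: str, duration: str) -> float:
    """Share of the video reached by the last transcript segment, in percent.

    Assumes coverage = last timestamp / duration, which matches the tables.
    """
    return round(100 * to_seconds(last_timestamp) / to_seconds(duration), 1)


# (label, last timestamp, video duration) taken from the tables above
runs = [
    ("Video 1, 1st test", "00:09", "27:21"),
    ("Video 1, 2nd test", "27:08", "27:21"),
    ("Video 2, 1st test", "25:21", "25:30"),
    ("Video 2, 2nd test", "06:49", "25:30"),
]

for label, last, duration in runs:
    print(f"{label}: {coverage_percent(last, duration)}%")
```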
Key Findings
- First request is unreliable: The first (uncached) request for a video randomly truncates the transcript
- Caching masks the issue: The second request reports `cached_content_token_count` in its usage metadata and works correctly
- Not offset-related: Truncation occurs randomly regardless of offset settings
- Same input tokens: Both requests show identical `prompt_token_count`, confirming all video data is sent
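One way to compare the two requests is to read the token counts from each response's `usage_metadata`. The following is a minimal sketch: `cache_summary` is a hypothetical helper, and the stand-in objects and token numbers are placeholders for illustration only, not values from our runs. With a real response you would pass the object returned by `client.models.generate_content`.

```python
from types import SimpleNamespace


def cache_summary(response) -> dict:
    """Summarize prompt and cached token counts from a response object."""
    usage = getattr(response, "usage_metadata", None)
    prompt = getattr(usage, "prompt_token_count", None)
    # The cached count may be missing or None on an uncached request.
    cached = getattr(usage, "cached_content_token_count", None) or 0
    return {
        "prompt_tokens": prompt,
        "cached_tokens": cached,
        "served_from_cache": cached > 0,
    }


# Stand-in responses with placeholder numbers (not measured values).
first = SimpleNamespace(usage_metadata=SimpleNamespace(
    prompt_token_count=1000, cached_content_token_count=None))
second = SimpleNamespace(usage_metadata=SimpleNamespace(
    prompt_token_count=1000, cached_content_token_count=900))

print(cache_summary(first))
print(cache_summary(second))
```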
Reproduction
```python
from google import genai
from google.genai import types

VIDEO_URL = "https://www.youtube.com/watch?v=NEW_VIDEO_ID"  # Use a fresh video

PROMPT = """Transcribe this video. Return JSON with format:
{
  "transcript_segments": [
    {"timestamp": "MM:SS", "text": "transcribed text"}
  ]
}
Include all speech from the entire video."""

SCHEMA = {
    "type": "object",
    "properties": {
        "transcript_segments": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "timestamp": {"type": "string"},
                    "text": {"type": "string"},
                },
                "required": ["timestamp", "text"],
            },
        }
    },
    "required": ["transcript_segments"],
}

client = genai.Client(api_key="...")

# First request to a NEW video (uncached) - randomly truncates
video_part = types.Part(
    file_data=types.FileData(
        file_uri=VIDEO_URL,
        mime_type="video/mp4",
    ),
    video_metadata=types.VideoMetadata(fps=0.1),
)

config = types.GenerateContentConfig(
    response_mime_type="application/json",
    response_json_schema=SCHEMA,
    media_resolution=types.MediaResolution.MEDIA_RESOLUTION_LOW,
    max_output_tokens=65536,
    thinking_config=types.ThinkingConfig(thinking_budget=0),
)

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[video_part, PROMPT],
    config=config,
)
# Response may be truncated to the first few seconds of the video
```
Expected Behavior
First-time video processing should reliably transcribe the entire video, not randomly truncate.
Impact
- Applications processing new videos may silently receive incomplete transcripts
- The issue is masked by caching, making it hard to detect in testing
- Users may only notice when processing videos for the first time
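Until the underlying issue is fixed, an application could guard against silent truncation by comparing the last transcript timestamp with the known video duration. This is only a sketch under assumptions: `looks_truncated` is a hypothetical helper, the video duration must be obtained separately, and the 0.9 threshold is an arbitrary choice loosely based on the 90-99% coverage seen on good runs.

```python
import json


def last_timestamp_seconds(response_text: str) -> int:
    """Return the last segment's timestamp in seconds (0 if no segments)."""
    segments = json.loads(response_text)["transcript_segments"]
    if not segments:
        return 0
    seconds = 0
    for part in segments[-1]["timestamp"].split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def looks_truncated(response_text: str, duration_seconds: int,
                    min_coverage: float = 0.9) -> bool:
    """Flag a transcript whose last timestamp falls short of the duration."""
    return last_timestamp_seconds(response_text) < min_coverage * duration_seconds


# Placeholder response shaped like the schema in the reproduction script.
sample = '{"transcript_segments": [{"timestamp": "00:09", "text": "..."}]}'
print(looks_truncated(sample, duration_seconds=27 * 60 + 21))
```

A caller might retry the request or log a warning when the check returns `True`. A video that ends with a long silent stretch would also trip this check, so the threshold may need tuning per use case.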