
First-time YouTube video processing randomly truncates transcript #1898

Description

Summary

When processing a YouTube video for the first time (uncached), Gemini randomly truncates the transcript to a small portion of the video (0.5%-27% coverage). Subsequent requests for the same video work correctly due to caching.

Environment details

  • Programming language: Python
  • OS: Linux (WSL2)
  • Language runtime version: Python 3.13.6
  • Package version: google-genai 1.52.0

Configuration

  • Model: gemini-3-flash-preview
  • Settings: thinking_budget=0, fps=0.1, media_resolution=LOW

Observations

We ran two sequential tests on each video: first without explicit offsets, then with offsets. The first test processes the video fresh; the second benefits from caching.

Video 1: https://www.youtube.com/watch?v=lMAnY2B1UnM (27:21 duration)

Initial run (video not previously processed):

Test Order | Offsets | Cached | Last Timestamp | Coverage
1st        | No      | No     | 00:09          | 0.5%
2nd        | Yes     | Yes    | 27:08          | 99.2%

Video 2: https://www.youtube.com/watch?v=tdIUMkXxtHg (25:30 duration)

Initial run (video not previously processed):

Test Order | Offsets | Cached | Last Timestamp | Coverage
1st        | No      | No     | 25:21          | 99.4%
2nd        | Yes     | Yes    | 06:49          | 26.7%

Subsequent runs (both cached):

Both tests achieve 90-99% coverage consistently once caching is active.
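
Coverage above is derived by comparing the last returned timestamp against the known video duration. A minimal sketch of that calculation (the parse_mmss and coverage helpers are ours, written for this report, not part of the SDK):

# Sketch: coverage = last transcript timestamp / known video duration (MM:SS).
def parse_mmss(ts: str) -> int:
    minutes, seconds = ts.split(":")
    return int(minutes) * 60 + int(seconds)

def coverage(last_timestamp: str, duration: str) -> float:
    return parse_mmss(last_timestamp) / parse_mmss(duration)

print(f"{coverage('00:09', '27:21'):.1%}")  # 0.5%, video 1 first test
print(f"{coverage('27:08', '27:21'):.1%}")  # 99.2%, video 1 second test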

Key Findings

  1. First request is unreliable: The first (uncached) request for a video randomly truncates the transcript.
  2. Caching masks the issue: The second request reports a nonzero cached_content_token_count and returns a complete transcript.
  3. Not offset-related: Truncation occurs randomly regardless of offset settings.
  4. Same input tokens: Both requests report an identical prompt_token_count, confirming that the full video data is sent (see the usage-metadata check after this list).
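
For reference, here is roughly how we checked whether a request was served from cache. The snippet assumes a response object from generate_content as in the Reproduction section below; prompt_token_count, cached_content_token_count, and candidates_token_count are the usage-metadata fields exposed by recent google-genai releases, so treat this as a sketch rather than verified output.

# Inspect usage metadata after generate_content to see whether the implicit cache was hit.
usage = response.usage_metadata
print("prompt tokens:       ", usage.prompt_token_count)
print("cached prompt tokens:", usage.cached_content_token_count)  # None/0 on the first, uncached request
print("output tokens:       ", usage.candidates_token_count)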

Reproduction

from google import genai
from google.genai import types

VIDEO_URL = "https://www.youtube.com/watch?v=NEW_VIDEO_ID"  # Use a fresh video

PROMPT = """Transcribe this video. Return JSON with format:
{
  "transcript_segments": [
    {"timestamp": "MM:SS", "text": "transcribed text"}
  ]
}
Include all speech from the entire video."""

SCHEMA = {
    "type": "object",
    "properties": {
        "transcript_segments": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "timestamp": {"type": "string"},
                    "text": {"type": "string"},
                },
                "required": ["timestamp", "text"],
            },
        }
    },
    "required": ["transcript_segments"],
}

client = genai.Client(api_key="...")

# First request to a NEW video (uncached) - randomly truncates
video_part = types.Part(
    file_data=types.FileData(
        file_uri=VIDEO_URL,
        mime_type="video/mp4"
    ),
    video_metadata=types.VideoMetadata(fps=0.1),
)

config = types.GenerateContentConfig(
    response_mime_type="application/json",
    response_json_schema=SCHEMA,
    media_resolution=types.MediaResolution.MEDIA_RESOLUTION_LOW,
    max_output_tokens=65536,
    thinking_config=types.ThinkingConfig(thinking_budget=0),
)

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[video_part, PROMPT],
    config=config,
)
# Response may be truncated to first few seconds of video
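
To make the truncation visible without manual inspection, we also dump the last segment's timestamp from the JSON response. Only json and response.text are used here; the rest is a convenience check, not part of the SDK.

import json

# Quick truncation check: how far into the video does the transcript reach?
data = json.loads(response.text)
segments = data["transcript_segments"]
print("segments returned:", len(segments))
print("last timestamp:   ", segments[-1]["timestamp"] if segments else "none")
# On a bad first run this prints something like 00:09 for a 27-minute video.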

Expected Behavior

First-time video processing should reliably transcribe the entire video, not randomly truncate.

Impact

  • Applications processing new videos may silently receive incomplete transcripts
  • The issue is masked by caching, making it hard to detect in testing
  • Users may only notice when processing videos for the first time

Metadata

Labels

  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
