Is cached_content_token_count supposed to only count "full" cache hits? #1896

@sgdantas

Description

I was reading over implicit context caching here https://docs.cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview, and my understanding is that implicit caching works with partial hits as long as the prefix is fixed. However, I was only able to get a non-zero cached_content_token_count when the requests were exactly the same.
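My mental model of the prefix matching, as a plain-Python sketch (the helper and the FIXED_PREFIX constant are mine for illustration, not the SDK's types):

```python
# Illustrative only: implicit caching matches on an identical leading prefix
# of the request, so the stable content should come first and the varying
# text last. FIXED_PREFIX stands in for the image part in the snippet below.
FIXED_PREFIX = ["<image part: a-man-and-a-dog.png>"]

def build_contents(variable_text: str) -> list:
    # Same prefix every call; only the tail differs.
    return FIXED_PREFIX + [variable_text]

r1 = build_contents("Describe this image with three words.")
r2 = build_contents("What is this image about?")

# The shared prefix is everything before the first differing element.
shared = 0
for a, b in zip(r1, r2):
    if a != b:
        break
    shared += 1
```

Under this model, `shared` would cover the whole image part, which is why I expected a partial hit to show up in the usage metadata.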

Using slightly different code than the notebook in the docs gives me (at least judging by the usage metadata) no cache hits at all:

from google.genai import Client, types

GCP_PROJECT = "my-project"  # placeholder: your Vertex AI project ID

def main():
    client = Client(
        vertexai=True,
        project=GCP_PROJECT,
        location="us-central1",
    )
    MODEL_ID = "gemini-2.5-flash"
    NUM_ATTEMPTS = 3
    texts = [
        "Write a short and engaging blog post based on this image.",
        "Describe this image with three words.",
        "What is this image about?",
    ]
    for i in range(NUM_ATTEMPTS):
        response = client.models.generate_content(
            model=MODEL_ID,
            contents=[
                types.Part.from_uri(
                    file_uri="https://storage.googleapis.com/cloud-samples-data/generative-ai/image/a-man-and-a-dog.png",
                    mime_type="image/png",
                ),
                texts[i],
            ],
        )

        # cached_content_token_count is None when nothing was served from cache
        cached_token_count = response.usage_metadata.cached_content_token_count or 0

        print(f"#{i + 1} Attempt")
        print(f"Input tokens: {response.usage_metadata.prompt_token_count}")
        print(f"Cached tokens: {cached_token_count}")
        print(f"Output tokens: {response.usage_metadata.candidates_token_count}")
        print(f"Total tokens: {response.usage_metadata.total_token_count}")
        print()

        if cached_token_count > 0:
            print(response.usage_metadata.cache_tokens_details)

if __name__ == "__main__":
    main()

This results in:

#1 Attempt
Input tokens: 2334
Cached tokens: 0
Output tokens: 316
Total tokens: 4012

#2 Attempt
Input tokens: 2329
Cached tokens: 0
Output tokens: 6
Total tokens: 3208

#3 Attempt
Input tokens: 2328
Cached tokens: 0
Output tokens: 259
Total tokens: 3527

I was expecting at least the image tokens to be cached.
Are partial cache hits reflected in cached_content_token_count?
Also, is there a difference between caching system instructions (passed via the config parameter) and caching contents?
The way we've been working is to define the fixed instructions in a types.GenerateContentConfig and pass the variable text as types.Part(text=text).
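To make the comparison concrete, here is roughly how the two request shapes differ, as plain dicts mirroring the REST generateContent payload (the field names are my assumption from the public API reference, not code we run):

```python
fixed_instructions = "You are a concise image-captioning assistant."
user_text = "What is this image about?"

# Variant A (our current setup): fixed instructions live outside contents,
# as the request-level systemInstruction (set via GenerateContentConfig).
request_a = {
    "systemInstruction": {"parts": [{"text": fixed_instructions}]},
    "contents": [{"role": "user", "parts": [{"text": user_text}]}],
}

# Variant B: the same fixed instructions are simply the first part of
# contents, followed by the variable text.
request_b = {
    "contents": [{"role": "user", "parts": [
        {"text": fixed_instructions},
        {"text": user_text},
    ]}],
}
```

The question is whether the prefix matching for implicit caching treats variant A's systemInstruction the same way it treats variant B's leading content part.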

Metadata

Labels

priority: p3 (Desirable enhancement or fix. May not be included in next release.)
type: question (Request for information or clarification. Not an issue.)
