@Yuqi-Du
Contributor

@Yuqi-Du Yuqi-Du commented Apr 14, 2025

This PR adds ModelUsage as part of the BatchedRerankingResponse.
The idea is to generalize ModelUsage so that it is shared by both the reranking and the embedding model services.

public class ModelUsage {
  public final ProviderType providerType;
  public final String provider;
  public final String model;

  // requestBytes is calculated from the HTTP request body sent to the provider.
  private int requestBytes = 0;

  // If Content-Length exists in the provider's HTTP response, use that;
  // otherwise, calculate the bytes from the HTTP response body.
  private int responseBytes = 0;

  private int promptTokens = 0;
  private int totalTokens = 0;

  // (constructor and accessors omitted)
}
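
As a rough illustration of the byte-counting rules described in the comments above, here is a minimal sketch. The class and method names (`ByteCounting`, `requestBytes`, `responseBytes`) are hypothetical and not taken from this PR. It also assumes UTF-8 bodies and a Content-Length header passed in as a nullable string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of this PR: illustrates the counting rules only.
final class ByteCounting {

  private ByteCounting() {}

  // Request size: the number of bytes in the HTTP body sent to the provider.
  // Assumes the body is encoded as UTF-8.
  static int requestBytes(String requestBody) {
    return requestBody.getBytes(StandardCharsets.UTF_8).length;
  }

  // Response size: prefer the Content-Length header when it is present and
  // parses as a non-negative integer; otherwise count the bytes of the body.
  static int responseBytes(String contentLengthHeader, String responseBody) {
    if (contentLengthHeader != null) {
      try {
        int parsed = Integer.parseInt(contentLengthHeader.trim());
        if (parsed >= 0) {
          return parsed;
        }
      } catch (NumberFormatException e) {
        // Malformed header: fall back to measuring the body below.
      }
    }
    return responseBody.getBytes(StandardCharsets.UTF_8).length;
  }
}
```

Falling back to the body length keeps the metric populated when a provider omits Content-Length, for example with chunked responses. In that case the count reflects the decoded body rather than the bytes on the wire.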

Note that the gRPC prototype has also changed to reflect the shared modelUsage structure,
so this PR will have corresponding EGW changes.

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CLA Signed: DataStax CLA

@Yuqi-Du Yuqi-Du requested a review from a team as a code owner April 14, 2025 19:14
@amorton
Contributor

amorton commented Jun 5, 2025

Picking this work up, and combining it with #1865

@amorton amorton changed the title from "Reranking model usage metering structure." to "Usage metering for embedding and reranking" Jun 17, 2025
Contributor

@amorton amorton left a comment

reviewed by yuqi

@amorton amorton merged commit 48857a2 into main Jun 18, 2025
3 checks passed
@amorton amorton deleted the yuqi/rerank-metering branch June 18, 2025 18:14
Hazel-Datastax added a commit that referenced this pull request Jun 23, 2025

3 participants