
This fixes the issue where token counts appeared to nearly double at each
call in the REPL display. The root cause was that the Gemini API returns
cumulative token counts for multi-turn conversations, but the display was
showing these cumulative values as if they were per-request metrics.

Changes:
- Modified SessionTokens struct to track previous totals for each component
  (prompt, response, cached, thoughts, tool_use)
- Updated RecordMetrics to calculate per-request deltas by subtracting
  previous totals from current API response values
- Each TokenMetrics now represents only the cost of that single request
  (input + output + cache), not cumulative session totals
- Updated tests to expect correct per-request delta calculations

Now when making multiple requests in a session:
- First request shows accurate token usage
- Second request shows ONLY the tokens consumed by that request
- Session totals accumulate correctly from per-request deltas
- No more doubling or misleading token metrics

Modified the event display system to use the per-request token metrics
that have already been calculated with deltas, instead of creating new
metrics from raw API values.

Changes:
- Added GetLastMetric() method to SessionTokens to retrieve the most
  recently recorded metric with correct per-request delta calculations
- Updated PrintEventEnhanced() in event.go to use GetLastMetric()
  instead of reconstructing metrics from UsageMetadata

This ensures that when displaying token metrics in the spinner:
- Metrics show only the tokens consumed by the current request
- Cached tokens are properly accounted for
- No misleading cumulative values are displayed
- Each event shows accurate per-request token consumption

- Add TestFormatSessionSummary_WithThinkingTokens to verify thinking tokens in session summaries
- Add TestFormatGlobalSummary to test global summary formatting
- All 11 tracking tests pass
- Verifies that thinking tokens are properly formatted and displayed in all metrics contexts

Display improvements:
- Inline session summary now shows 'Session: 28K actual | 26K cached (92%) | 2K response'
  instead of raw totals, making cache efficiency immediately visible

Session summary now displays:
- 💰 Cost Metrics: actual vs cached tokens with cost savings estimate
- 🔧 Token Breakdown: detailed component breakdown (input, output, thinking, tool use)
- 📈 Session Efficiency: cache hit rate %, avg tokens/request, session duration

Key metrics that matter:
- 'Actual Tokens' = new prompt + response (what you pay for)
- 'Cached Tokens' = tokens reused from prompt cache (90% cost savings)
- 'Cache Hit Rate %' = percentage of processed tokens saved via caching
- 'Cost Savings' = rough estimate of token savings from cache reuse

This helps users understand cache efficiency and actual API costs at a glance.
…ementation

- METRICS_DISPLAY_GUIDE.md: User-friendly reference for understanding token metrics
  - Explains actual vs cached tokens
  - Cache hit rate interpretation
  - Cost savings calculations
  - Optimization tips

- Updated metrics formatter (RenderTokenMetrics):
  - Shows 'Session: 28K actual | 26K cached (92%) | 2K response' format
  - Calculates cache efficiency percentage
  - Uses compact notation for readability
  - New formatCompactNumber() helper for K/M abbreviations

- Updated session summary (FormatSessionSummary):
  - Three-section layout: Cost Metrics, Token Breakdown, Session Efficiency
  - Shows actual tokens (prompt + response) separately
  - Displays cache hit rate percentage
  - Shows estimated cost savings from caching
  - Better labels and structure for understanding
…indicators

Inline Session Summary:
- Changed 'actual' → 'cost' (clearer what you're paying for)
- Changed 'response' → 'out' (shorter, clear output indicator)
- Added cache efficiency indicators (🚀 excellent, ✅ good, ⚠️ modest, ❌ minimal)
- Format: 'Session: cost:21K | cached:20K (48% good) | out:442'

Session Summary Details:
- 'Actual Tokens' → 'New Tokens' (explicit about new vs cached)
- 'Cached Tokens' → 'Cache Reuse' (describes what cached tokens represent)
- 'Saved Cost' → 'Cost Savings' (more action-oriented)
- 'Total Proc' → 'API Billing' (clarifies billing metric)

Benefits:
- Users immediately see if cache is working (via emoji indicator)
- Terminology clearly describes cost vs reuse vs billing
- Visual feedback on cache effectiveness: 80%+=excellent, 50%+=good, 20%+=modest, <20%=minimal
- Easier to scan and understand at a glance
- No manual percentage interpretation needed

Changed 'out:X' → 'response:X' in the inline session summary.

Why this improves clarity:
- 'out' was ambiguous and unclear
- 'response' explicitly means AI output tokens
- Matches terminology in session summary breakdown
- Clear unit: tokens (always abbreviated as numbers)
- Consistent with 'cost', 'cached', 'response' terminology

Example:
  Before: Session: cost:21K | cached:20K (48% good) | out:186
  After:  Session: cost:21K | cached:20K (48% good) | response:186

All values are in TOKENS (abbreviated as K for thousands)

Changed 'response:X' → 'token:X' in the inline session summary.

Why 'token' is better:
- Explicit unit: token (not ambiguous like 'response')
- All three metrics now clearly use same unit: cost, cached, token
- No confusion with bytes/octets
- Simple and direct
- Perfect match for API concept

Display now:
  Session: cost:21K | cached:20K (48% ⚠️ modest) | token:512

All values clearly in TOKENS:
- cost:X = new tokens you paid for
- cached:X = tokens reused from cache
- token:X = output tokens generated
- K = thousands (21K = 21,000 tokens)

Merged by @raphaelmansuy as commit f2b611b into main on Nov 16, 2025 (10 checks passed).