Investigate/token-doubling-issue #19
Merged
This fixes the issue where token counts appeared to nearly double at each call in the REPL display. The root cause was that the Gemini API returns cumulative token counts for multi-turn conversations, but the display was showing these cumulative values as if they were per-request metrics.

Changes:
- Modified the SessionTokens struct to track previous totals for each component (prompt, response, cached, thoughts, tool_use)
- Updated RecordMetrics to calculate per-request deltas by subtracting previous totals from the current API response values
- Each TokenMetrics now represents only the cost of that single request (input + output + cache), not cumulative session totals
- Updated tests to expect correct per-request delta calculations

Now when making multiple requests in a session:
- The first request shows accurate token usage
- The second request shows ONLY the tokens consumed by that request
- Session totals accumulate correctly from per-request deltas
- No more doubling or misleading token metrics
Modified the event display system to use the per-request token metrics that have already been calculated with deltas, instead of creating new metrics from raw API values.

Changes:
- Added a GetLastMetric() method to SessionTokens to retrieve the most recently recorded metric with correct per-request delta calculations
- Updated PrintEventEnhanced() in event.go to use GetLastMetric() instead of reconstructing metrics from UsageMetadata

This ensures that when displaying token metrics in the spinner:
- Metrics show only the tokens consumed by the current request
- Cached tokens are properly accounted for
- No misleading cumulative values are displayed
- Each event shows accurate per-request token consumption
- Add TestFormatSessionSummary_WithThinkingTokens to verify thinking tokens in session summaries
- Add TestFormatGlobalSummary to test global summary formatting
- All 11 tracking tests passing
- Verifies thinking tokens are properly formatted and displayed in all metrics contexts
Display improvements:
- The inline session summary now shows 'Session: 28K actual | 26K cached (92%) | 2K response' instead of raw totals, making cache efficiency immediately visible

The session summary now displays:
- 💰 Cost Metrics: actual vs. cached tokens with a cost-savings estimate
- 🔧 Token Breakdown: detailed component breakdown (input, output, thinking, tool use)
- 📈 Session Efficiency: cache hit rate %, avg tokens/request, session duration

Key metrics that matter:
- 'Actual Tokens' = new prompt + response (what you pay for)
- 'Cached Tokens' = tokens reused from the prompt cache (90% cost savings)
- 'Cache Hit Rate %' = percentage of processed tokens saved via caching
- 'Cost Savings' = rough estimate of token savings from cache reuse

This helps users understand cache efficiency and actual API costs at a glance.
…ementation

- METRICS_DISPLAY_GUIDE.md: user-friendly reference for understanding token metrics
  - Explains actual vs. cached tokens
  - Cache hit rate interpretation
  - Cost-savings calculations
  - Optimization tips
- Updated metrics formatter (RenderTokenMetrics):
  - Shows the 'Session: 28K actual | 26K cached (92%) | 2K response' format
  - Calculates the cache-efficiency percentage
  - Uses compact notation for readability
  - New formatCompactNumber() helper for K/M abbreviations
- Updated session summary (FormatSessionSummary):
  - Three-section layout: Cost Metrics, Token Breakdown, Session Efficiency
  - Shows actual tokens (prompt + response) separately
  - Displays the cache hit rate percentage
  - Shows estimated cost savings from caching
  - Better labels and structure for understanding
…indicators

Inline session summary:
- Changed 'actual' → 'cost' (clearer what you're paying for)
- Changed 'response' → 'out' (shorter, clear output indicator)
- Added cache-efficiency indicators (excellent, ✅ good, ⚠️ modest, ❌ minimal)
- Format: 'Session: cost:21K | cached:20K (48% good) | out:442'

Session summary details:
- 'Actual Tokens' → 'New Tokens' (explicit about new vs. cached)
- 'Cached Tokens' → 'Cache Reuse' (describes what cached tokens represent)
- 'Saved Cost' → 'Cost Savings' (more action-oriented)
- 'Total Proc' → 'API Billing' (clarifies the billing metric)

Benefits:
- Users immediately see whether the cache is working (via the emoji indicator)
- Terminology clearly distinguishes cost vs. reuse vs. billing
- Visual feedback on cache effectiveness: 80%+ = excellent, 50%+ = good, 20%+ = modest, <20% = minimal
- Easier to scan and understand at a glance
- No manual percentage interpretation needed
Changed 'out:X' → 'response:X' in the inline session summary.

Why this improves clarity:
- 'out' was ambiguous and unclear
- 'response' explicitly means AI output tokens
- Matches the terminology in the session summary breakdown
- Clear unit: tokens (always abbreviated as numbers)
- Consistent with the 'cost', 'cached', 'response' terminology

Example:
- Before: Session: cost:21K | cached:20K (48% good) | out:186
- After: Session: cost:21K | cached:20K (48% good) | response:186

All values are in TOKENS (abbreviated as K for thousands).
Changed 'response:X' → 'token:X' in the inline session summary.

Why 'token' is better:
- Explicit unit: token (not ambiguous like 'response')
- All three metrics now clearly use the same unit: cost, cached, token
- No confusion with bytes/octets
- Simple and direct, a perfect match for the API concept

Display now: Session: cost:21K | cached:20K (48% ⚠️ modest) | token:512

All values are clearly in TOKENS:
- cost:X = new tokens you paid for
- cached:X = tokens reused from the cache
- token:X = output tokens generated
- K = thousands (21K = 21,000 tokens)