
martani (Contributor) commented Dec 17, 2025

Issue

The count invariant (count == zero_count + sum(bucket_counts)) was being violated during Base2ExponentialHistogramAggregation::Merge() operations when the combined bucket range of two histograms exceeded max_buckets by exactly 1.
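The invariant is easy to state as a predicate. The following is a minimal sketch that assumes a hypothetical, simplified data-point struct (HistogramPointSketch, positive buckets only). It is not the SDK's real point type. It only shows which check a consumer of the exported point would perform:

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical, simplified exponential histogram data point. The field names
// mirror the invariant above; this is NOT the SDK's actual point type.
struct HistogramPointSketch {
  uint64_t count = 0;
  uint64_t zero_count = 0;
  std::vector<uint64_t> bucket_counts;  // positive buckets only, for brevity
};

// Returns true when count == zero_count + sum(bucket_counts).
bool CountInvariantHolds(const HistogramPointSketch &p) {
  const uint64_t bucket_total = std::accumulate(
      p.bucket_counts.begin(), p.bucket_counts.end(), uint64_t{0});
  return p.count == p.zero_count + bucket_total;
}
```

For example, a point with count 10 but bucket counts {1, 2, 2, 2, 2} (one count lost) fails this check.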

Changes

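The offending condition if (pos_max_index > pos_min_index + max_buckets) ... does not trigger downscaling when the combined range exceeds max_buckets by exactly 1. Indices pos_min_index through pos_max_index span pos_max_index - pos_min_index + 1 buckets, so downscaling is required whenever pos_max_index - pos_min_index + 1 > max_buckets, i.e. pos_max_index >= pos_min_index + max_buckets.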

Step-by-step example with max_buckets=5 at scale 0:

  • First histogram: add values 2, 4, 8, 16, 32; it spans indices 0-4
  • Second histogram: add values 4, 8, 16, 32, 64; it spans indices 1-5 (64 extends one position beyond the first histogram's range)
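The indices in this example follow the OpenTelemetry convention that, at scale 0, bucket i covers the interval (2^i, 2^(i+1)]. An exact power of two therefore lands in the bucket below its exponent. The following is a minimal sketch for positive, normal values at scale 0 only. IndexAtScale0 is a hypothetical helper, not the SDK's indexer, which also handles other scales and subnormal values:

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: bucket index of a positive, normal value v at scale 0,
// where bucket i covers (2^i, 2^(i+1)]. Examples: 2 -> 0, 32 -> 4, 64 -> 5.
int32_t IndexAtScale0(double v) {
  int exp = 0;
  // std::frexp decomposes v as mantissa * 2^exp with mantissa in [0.5, 1).
  const double mantissa = std::frexp(v, &exp);
  // mantissa == 0.5 means v is exactly 2^(exp - 1), the upper (inclusive)
  // boundary of bucket exp - 2; any other v lies inside bucket exp - 1.
  return mantissa == 0.5 ? exp - 2 : exp - 1;
}
```

Applied to the two value sets above, this yields indices 0-4 and 1-5.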

During Merge:

  • pos_min_index = min(0, 1) = 0
  • pos_max_index = max(4, 5) = 5
  • combined range = indices 0-5 = 6 positions = max_buckets + 1

Old condition check:

  • pos_max_index > pos_min_index + max_buckets
  • 5 > 0 + 5
  • 5 > 5 = FALSE <-- downscaling NOT triggered (BUG!)

Fixed condition check:

  • pos_max_index >= pos_min_index + max_buckets
  • 5 >= 0 + 5
  • 5 >= 5 = TRUE <-- downscaling correctly triggered
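The two checks can be compared side by side. The following is a minimal sketch with hypothetical function names (CombineRanges, OldNeedsDownscale, FixedNeedsDownscale, RangeExceeds) that isolates the comparison from the rest of Merge(). It shows that the fixed condition is equivalent to "the number of buckets spanned exceeds max_buckets":

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper type: inclusive positive-bucket index range.
struct IndexRange {
  int32_t min_index;
  int32_t max_index;
};

// Combined range covered by two histograms being merged.
IndexRange CombineRanges(IndexRange a, IndexRange b) {
  return {std::min(a.min_index, b.min_index), std::max(a.max_index, b.max_index)};
}

// Old check: misses the case where the range is exactly max_buckets + 1.
bool OldNeedsDownscale(int32_t pos_min_index, int32_t pos_max_index, int32_t max_buckets) {
  return pos_max_index > pos_min_index + max_buckets;
}

// Fixed check, as in the PR.
bool FixedNeedsDownscale(int32_t pos_min_index, int32_t pos_max_index, int32_t max_buckets) {
  return pos_max_index >= pos_min_index + max_buckets;
}

// Equivalent formulation: the inclusive range spans more than max_buckets buckets.
bool RangeExceeds(int32_t pos_min_index, int32_t pos_max_index, int32_t max_buckets) {
  return pos_max_index - pos_min_index + 1 > max_buckets;
}
```

With the example ranges, CombineRanges({0, 4}, {1, 5}) gives 0-5. OldNeedsDownscale(0, 5, 5) is false, while FixedNeedsDownscale(0, 5, 5) and RangeExceeds(0, 5, 5) are both true. A range that fits exactly (0-4) triggers neither check.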

When downscaling wasn't triggered but the range exceeded max_buckets, MergeBuckets() created a result that couldn't fit in the circular buffer, causing bucket counts to be silently lost.
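To see how a count can disappear without an error, consider a deliberately simplified model. FixedWindowBuckets is a hypothetical fixed-capacity bucket window, not the SDK's circular buffer or MergeBuckets(). In this model an index outside the window is simply dropped. It also shows why one downscaling step fixes the example: going from scale 0 to scale -1 maps index i to i >> 1, so indices 0-5 collapse to 0-2:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified fixed-capacity bucket window. It is NOT the SDK's
// circular buffer; it only illustrates how an out-of-window index is dropped.
class FixedWindowBuckets {
 public:
  FixedWindowBuckets(int32_t start_index, int32_t capacity)
      : start_index_(start_index), counts_(static_cast<size_t>(capacity), 0) {}

  // Adds n to bucket `index`; returns false (and drops the count) when the
  // index does not fit in the window.
  bool Increment(int32_t index, uint64_t n) {
    const int64_t offset = static_cast<int64_t>(index) - start_index_;
    if (offset < 0 || offset >= static_cast<int64_t>(counts_.size())) return false;
    counts_[static_cast<size_t>(offset)] += n;
    return true;
  }

  uint64_t Total() const {
    uint64_t total = 0;
    for (uint64_t c : counts_) total += c;
    return total;
  }

 private:
  int32_t start_index_;
  std::vector<uint64_t> counts_;
};

// Merges the example histograms (indices 0-4 and 1-5, one count each, so
// count == 10) into a window of max_buckets slots starting at index 0,
// without downscaling. Returns the bucket total.
uint64_t MergeExampleWithoutDownscale(int32_t max_buckets) {
  FixedWindowBuckets merged(0, max_buckets);
  for (int32_t i = 0; i <= 4; ++i) merged.Increment(i, 1);  // first histogram
  for (int32_t i = 1; i <= 5; ++i) merged.Increment(i, 1);  // second histogram
  return merged.Total();
}

// Same merge after one downscaling step: index i maps to i >> 1
// (floor division by 2 for the non-negative indices used here).
uint64_t MergeExampleDownscaledOnce(int32_t max_buckets) {
  FixedWindowBuckets merged(0, max_buckets);
  for (int32_t i = 0; i <= 4; ++i) merged.Increment(i >> 1, 1);
  for (int32_t i = 1; i <= 5; ++i) merged.Increment(i >> 1, 1);
  return merged.Total();
}
```

With max_buckets = 5, the first function returns 9 even though count is 10, which is exactly the invariant violation described above. The downscaled merge keeps all 10 counts.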

Prometheus rejects histograms whose counts do not match; see the validation at: https://github.com/prometheus/prometheus/blob/main/model/histogram/histogram.go#L474

For significant contributions please make sure you have completed the following items:

  • [ ] CHANGELOG.md updated for non-trivial changes
  • [x] Unit tests have been added
  • [ ] Changes in public API reviewed


linux-foundation-easycla bot commented Dec 17, 2025

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: martani / name: martani (9fbc1d4)

marcalff added the pr:waiting-on-cla (Waiting on CLA) label Dec 17, 2025

codecov bot commented Dec 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.93%. Comparing base (5fc4707) to head (9fbc1d4).
⚠️ Report is 1 commit behind head on main.


@@           Coverage Diff           @@
##             main    #3793   +/-   ##
=======================================
  Coverage   89.93%   89.93%           
=======================================
  Files         225      225           
  Lines        7163     7163           
=======================================
  Hits         6441     6441           
  Misses        722      722           
Files with missing lines Coverage Δ
...egation/base2_exponential_histogram_aggregation.cc 91.33% <100.00%> (ø)

martani changed the title [WIP] Fix off-by-one error in Base2ExponentialHistogramAggregation::Merge() downscaling to Fix off-by-one error in Base2ExponentialHistogramAggregation::Merge() downscaling Dec 17, 2025
martani marked this pull request as ready for review December 17, 2025 23:28
martani requested a review from a team as a code owner December 17, 2025 23:28
marcalff removed the pr:waiting-on-cla (Waiting on CLA) label Dec 18, 2025
marcalff (Member) left a comment

LGTM, with detailed root cause analysis, comments and test cases: great work.

Thanks for the patch.

marcalff changed the title Fix off-by-one error in Base2ExponentialHistogramAggregation::Merge() downscaling to [SDK] Fix off-by-one error in Base2ExponentialHistogramAggregation::Merge() downscaling Dec 18, 2025
marcalff merged commit 162246f into open-telemetry:main Dec 18, 2025
67 checks passed