-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][broker] Move newPosition to nextValidLedger:-1 to avoid cursor position and ledger inconsistency in ManagedCursorImpl #25117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes cursor position and ledger inconsistency in ManagedCursorImpl by moving the mark delete position to nextValidLedger:-1 when all entries in the current ledger are consumed. This addresses potential inconsistencies where the cursor position and ledger state could become misaligned.
Key Changes
- Modified
ManagedCursorImpl.asyncMarkDelete()to intelligently move cursor position tonextValidLedger:-1when appropriate - Added snapshot positions to local variables to prevent race conditions
- Added four comprehensive test cases to verify the new mark delete behavior
- Updated existing tests to reflect the new cursor positioning behavior
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java |
Core logic change to move cursor position to nextValidLedger:-1 when ledger entries are fully consumed; snapshots positions to avoid race conditions |
managed-ledger/src/test/java/org/apache/bookkeeper/mledger/impl/ManagedCursorTest.java |
Added four new test cases (testAsyncMarkDeleteMoveToNextLedgerInNonRolloverScenario, testAsyncMarkDeleteMayMoveToNextLedgerInRolloverScenario, testAsyncMarkDeleteMoveToNextLedgerOneByOne, testAsyncMarkDeleteNextLedgerMinusOneEntryIdPosition) and updated existing tests to use relaxed assertions |
managed-ledger/src/test/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerTest.java |
Updated testNoRetention test to use Awaitility for proper async handling and updated assertions for cursor position validation; fixed flaky tests |
pulsar-broker/src/test/java/org/apache/pulsar/compaction/CompactedTopicTest.java |
Updated test assertions to expect cursor position at currentLedgerId:-1 after compaction |
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/persistent/PersistentTopicProtectedMethodsTest.java |
Changed exact equality assertions to greater-than-or-equal assertions for mark delete position |
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/PersistentMessageFinderTest.java |
Updated message expiry tests to expect mark delete position at nextLedgerId:-1 |
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/MessageTTLTest.java |
Fixed spelling from "exacly" to "exactly" and updated test to expect cursor at currentLedgerId:-1 |
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/BacklogQuotaManagerTest.java |
Updated assertions to reflect that mark deleting moves cursor to nextLedgerId:-1 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/test/java/org/apache/pulsar/compaction/CompactedTopicTest.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/MessageTTLTest.java
Outdated
Show resolved
Hide resolved
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #25117 +/- ##
============================================
+ Coverage 74.44% 74.56% +0.12%
- Complexity 34046 34077 +31
============================================
Files 1899 1899
Lines 149655 149711 +56
Branches 17393 17405 +12
============================================
+ Hits 111412 111636 +224
+ Misses 29368 29199 -169
- Partials 8875 8876 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
BewareMyPower
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid cursor position and ledger inconsistency in ManagedCursorImpl whenever possible.
Could you share which test can show the inconsistency? I see some tests are added to ManagedCursorTest, which only verify the new behavior of this PR. But it's still confusing to me that what's wrong with the current behavior.
| assertThat(ledger.getCursors().getSlowestCursorPosition()).isGreaterThanOrEqualTo(lastPosition); | ||
| // cursor.asyncMarkDelete() and maybeUpdateCursorBeforeTrimmingConsumedLedger may run concurrently, | ||
| // so we can't assert properties equals here. | ||
| // assertEquals(cursor.getProperties(), properties); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use comment to remove the code. Just remove it.
BTW, could it be replaced by Awaitility.await()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, could it be replaced by Awaitility.await()?
Seems can't, the race condition in mark delete and ledger rollover cause this test flaky, and may impact the properties user set in asyncDelete process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so let's get #25110 merged before this PR first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#25110 also has race condition mentioned by #25117 (comment).
|
This PR is inspired by #25087. If a cursor consumes the last entry of previous ledger, I think we should move the
But, unfortunately, we can not ensure happens-before between
I think current behavior is not wrong, they all worked well besides the following code(see PR description). pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java Lines 2184 to 2195 in ab65faa
|
Could there be any real issue caused by this inconsistency? It would be helpful if you can share a test to show the impacts. For example, generally computing I think it's a bit different from #25087. From what I understand, #25087 mainly fixes the bug that |
|
I think at least we should snapshot immutable positions into local variables to avoid race condition, or phenomena and logs will not match. The volatile positions may be updated by other threads when retrieving it again or logging. |
In continuous consuming, this should rarely happens. I was intened to do this thing: make sure Now, I intend to consolidate all the logic for moving the |
|
After this PR, if we want to move pulsar/managed-ledger/src/test/java/org/apache/bookkeeper/mledger/impl/ManagedCursorTest.java Lines 2069 to 2080 in d72dc04
|
19325f8 to
ad55d0b
Compare
…position and ledger inconsistency in ManagedCursorImpl
…ce condition, add comments
…hTimestampNonRecoverableException() test
…mestampNonRecoverableException test
…TopicUnloading test
…ursorLedgerFailed test
…ursorLedgerFailed test
…TimeQuotaWithEmptyLedger test
…meBasedBacklogQuotaCheckWhenNoBacklog test
…sInRolloverScenario test
ad55d0b to
ffa7d8f
Compare
Motivation
Since
nextValidLedger:-1is a validmarkDeletePosition, we should movenewPositiontonextValidLedger:-1to avoid cursor position and ledger inconsistency inManagedCursorImplwhenever possible.PR #25087 also made this change.
Modifications
ManagedCursorImpl.asyncMarkDelete()method, movenewPositiontonextValidLedger:-1if we reach the following conditions:a. if
lastConfirmedEntry >= newPosition, and next ledger exists, and current ledger entries are all consumed.b. if
lastConfirmedEntry < newPosition, next ledger exists, andnewPosition == nextValidLedger:-1.I think the previous code might have a little problem.
a. If
newPosition == nextValidLedger:n, n is an non-negative number, we might set newnewPositiontonextValidLedger:nwhich is greater thanlastConfirmedEntry.b. If
markDeletePositionis lagging behind the LAC by several ledgers, andnextValidLedgerexists, we may ack some un-consumed messages.pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Lines 2184 to 2195 in ab65faa
And snapshot positions into a local variable to avoid race condition.
Add
testAsyncMarkDeleteMoveToNextLedgerInNonRolloverScenario,testAsyncMarkDeleteMayMoveToNextLedgerInRolloverScenario,testAsyncMarkDeleteMoveToNextLedgerOneByOne,testAsyncMarkDeleteNextLedgerMinusOneEntryIdPositiontests inManagedCursorTestto verify the code change.Fix tests due to this PR's code change.
Fix some flaky tests introduced by PR [fix][broker] Fix cursor position persistence in ledger trimming #25087.
Verifying this change
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: oneby-wang#19