Skip to content

Conversation

@swuferhong
Copy link
Contributor

Purpose

Linked issue: #1397

Brief change log

Tests

API and Format

Documentation

@swuferhong swuferhong marked this pull request as draft August 3, 2025 08:11
@swuferhong swuferhong force-pushed the rebalance-plan branch 9 times, most recently from 061ed6b to 6bf4aca Compare August 6, 2025 03:54
@polyzos polyzos force-pushed the main branch 3 times, most recently from d88c76c to 434a4f4 Compare August 31, 2025 15:13
@swuferhong swuferhong marked this pull request as ready for review September 3, 2025 07:05
@swuferhong swuferhong force-pushed the rebalance-plan branch 3 times, most recently from 967ccb0 to 85341ca Compare September 4, 2025 01:34
@swuferhong swuferhong force-pushed the rebalance-plan branch 3 times, most recently from 60fd7ef to ed3a37a Compare October 23, 2025 08:31
@swuferhong swuferhong force-pushed the rebalance-plan branch 2 times, most recently from 18e28c0 to 541782e Compare January 5, 2026 02:54
Copy link
Contributor

@platinumhamburg platinumhamburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@swuferhong Thanks for driving this critical and important work—I have a few comments on this PR.

public RebalancePlan(
RebalanceStatus rebalanceStatus, Map<TableBucket, RebalancePlanForBucket> bucketPlan) {
String rebalanceId,
RebalanceStatus rebalanceStatus,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inclusion of RebalanceStatus inside RebalancePlan is confusing to me. From the overall design, it appears that the concepts of Plan and Execution are intentionally separated—an elegant approach that aligns with common software architecture principles. However, in the current implementation, they seem tightly coupled.

Beyond this issue, the name rebalanceId feels ambiguous: does it refer to a planId, a progressId, or is it meant to unify both? This coupling isn’t ideal. For example, after a plan is generated, its execution might fail and be retried, which would break the one-to-one mapping between a plan and its corresponding progress instance.

After a broader review, I realized the original design intent: in fact, RebalanceProgress (or RebalanceTask) is the first-class entity. The Plan is merely an internal, ephemeral artifact—generated once per RebalanceProgress, never reused, never externally observed, and not subject to retries.

Given this understanding, perhaps what should be stored in ZooKeeper isn’t the RebalancePlan class itself, but rather RebalanceProgress, with the plan as an internal field within it. @swuferhong @wuchong What do you think?

Copy link
Member

@wuchong wuchong Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@platinumhamburg I agree.

How about this:

CompletableFuture<RebalancePlan> rebalance(List<GoalType> priorityGoals, boolean dryRun);
==>
// triggers a rebalance task, and returns the rebalance id, in the future we can make the plan generation async
CompletableFuture<String> rebalance(List<GoalType> priorityGoals);
// returns a dry run plan for the goals, introduce in future, remove for now. 
// CompletableFuture<RebalancePlan> rebalancePlan(List<GoalType> priorityGoals);

// return optional progress. empty if no rebalance in progress. 
CompletableFuture<Optional<RebalanceProgress>> listRebalanceProgress(@Nullable String rebalanceId);

// RebalanceProgress still contains rebalance id, status, and plan to check the detailed progress per bucket. 


// In ZK, we should json serialize RebalanceTask to distinguish class `RebalanceProgress` and `RebalancePlan`
// because it records the top-level status, but not record per-bucket rebalance status

What do you think @swuferhong ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. The new api: CompletableFuture<RebalancePlan> rebalancePlan(List<GoalType> priorityGoals); will be introduce in an individual issue.

@swuferhong swuferhong force-pushed the rebalance-plan branch 4 times, most recently from 6616c63 to d16bd4f Compare January 7, 2026 01:45
Copy link
Contributor

@platinumhamburg platinumhamburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some comments.

@swuferhong swuferhong force-pushed the rebalance-plan branch 3 times, most recently from 5a0e50b to 995e3d2 Compare January 7, 2026 06:39
(replica) -> {
// For rebalance case. the replica state already set to null in method
// stopRemovedReplicasOfReassignedBucket. so we need not reset it again.
if (coordinatorContext.getReplicaState(replica) != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that null is a very ambiguous state—would it be better to define a clear, explicit state for use in rebalance scenarios?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null is not define only for reblance. NonExist will change the state in coordinatorContext to null.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants