Skip to content

Conversation

@luoyuxia
Copy link
Contributor

@luoyuxia luoyuxia commented Dec 16, 2025

Purpose

Linked issue: close #1893

Brief change log

  • Introduce a option to control how long the tiering will last before force to complete
  • Tiering Enumerator will record the deadline of tiering for each table, when the deadline reach for a table
    • mark all the pending split for the table to force to ignore so that although reader recieve the split, it can just ignore it to force to complete the tiering (we still need to send the split since the tiering commiter operator need to collect all the split for one round of tiering)
    • send a event to TieringReader to notify it the table reach the max duration
  • when TieringReader` recieves the event, force to complete the split

Tests

TieringSourceReaderTest, TieringSourceReaderTest#testTableReachMaxTieringDuration,TieringITCase

API and Format

Documentation

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements tiering timeout support to prevent long-running table tiering operations from blocking other tables. The feature introduces a maximum duration for tiering a single table, after which it will be force-completed or skipped.

Key Changes

  • Added forceIgnore flag to TieringSplit and its subclasses to mark splits that should be skipped due to timeout
  • Implemented periodic timeout checking in TieringSourceEnumerator with configurable max duration and detection interval
  • Introduced TieringReachMaxDurationEvent to notify readers when a table reaches max tiering duration
  • Updated split handling in TieringSplitReader to force-complete in-progress log splits and skip new splits when timeout occurs

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 18 comments.

Show a summary per file
File Description
TieringSplit.java Added forceIgnore field and methods to mark splits for skipping
TieringSnapshotSplit.java Added constructors supporting forceIgnore parameter
TieringLogSplit.java Added constructors supporting forceIgnore parameter
TieringSplitSerializer.java Updated serialization to include forceIgnore field
TieringSourceEnumerator.java Implemented periodic timeout checking, deadline tracking, and timeout event broadcasting
TieringSplitReader.java Added logic to force-complete log splits and skip splits marked with forceIgnore
TieringSourceReader.java Integrated timeout event handling and custom fetcher manager
TieringSourceFetcherManager.java New class to manage timeout notifications to split readers
TieringReachMaxDurationEvent.java New event class to signal table timeout to readers
TieringSourceOptions.java Added configuration options for max duration and detection interval
TieringSource.java Updated builders to support new timeout configuration parameters
LakeTieringJobBuilder.java Wired up timeout configuration from Fluss config
TieringSplitGenerator.java Removed hardcoded numberOfSplits parameter, changed log levels to DEBUG
TieringSplitSerializerTest.java Added tests for forceIgnore serialization and updated string representations
TieringSourceEnumeratorTest.java Added timeout test, refactored assertions to use containsExactlyInAnyOrderElementsOf, added helper methods
TieringSourceReaderTest.java New test file for testing timeout event handling at reader level
Comments suppressed due to low confidence (2)

fluss-flink/fluss-flink-common/src/main/java/org/apache/fluss/flink/tiering/source/split/TieringSplitGenerator.java:339

  • The log level was changed from INFO to DEBUG for this message about skipping splits when offset conditions are met. This is appropriate as it reduces log verbosity for expected behavior. However, ensure this doesn't make it harder to diagnose why certain tables aren't being tiered, as this condition might occur frequently during normal operation and could be useful for troubleshooting.
                LOG.debug(
                        "The lastCommittedBucketOffset {} is equals or bigger than latestBucketOffset {}, skip generating split for bucket {}",
                        lastCommittedBucketOffset,
                        latestBucketOffset,
                        tableBucket);
                return Optional.empty();
            }
        }
    }

    private Optional<TieringSplit> generateSplitForLogTableBucket(
            TablePath tablePath,
            TableBucket tableBucket,
            @Nullable String partitionName,
            @Nullable Long lastCommittedBucketOffset,
            long latestBucketOffset) {
        if (latestBucketOffset <= 0) {
            LOG.debug(
                    "The latestBucketOffset {} is equals or less than 0, skip generating split for bucket {}",
                    latestBucketOffset,
                    tableBucket);
            return Optional.empty();
        }

        // the bucket is never been tiered
        if (lastCommittedBucketOffset == null) {
            // the bucket is never been tiered, scan fluss log from the earliest offset
            return Optional.of(
                    new TieringLogSplit(
                            tablePath,
                            tableBucket,
                            partitionName,
                            EARLIEST_OFFSET,
                            latestBucketOffset));
        } else {
            // the bucket has been tiered, scan remain fluss log
            if (lastCommittedBucketOffset < latestBucketOffset) {
                return Optional.of(
                        new TieringLogSplit(
                                tablePath,
                                tableBucket,
                                partitionName,
                                lastCommittedBucketOffset,
                                latestBucketOffset));
            }
        }
        LOG.debug(
                "The lastCommittedBucketOffset {} is equals or bigger than latestBucketOffset {}, skip generating split for bucket {}",
                lastCommittedBucketOffset,
                latestBucketOffset,
                tableBucket);

fluss-flink/fluss-flink-common/src/main/java/org/apache/fluss/flink/tiering/source/TieringSplitReader.java:168

  • When a table reaches the max tiering duration, the timeout check only forces completion for log splits (lines 159-161 and 292). However, snapshot splits can also be in progress. If a table times out while processing snapshot splits, they won't be force-completed, leading to potential hangs or inconsistent behavior. Consider extending the timeout handling to snapshot splits as well.
        // may read snapshot firstly
        if (currentSnapshotSplitReader != null) {
            // for snapshot split, we don't force to complete it
            // since we rely on the log offset for the snapshot to
            // do next tiering, if force to complete, we can't get the log offset
            CloseableIterator<RecordAndPos> recordIterator = currentSnapshotSplitReader.readBatch();
            if (recordIterator == null) {
                LOG.info("Split {} is finished", currentSnapshotSplit.splitId());
                return finishCurrentSnapshotSplit();
            } else {
                return forSnapshotSplitRecords(
                        currentSnapshotSplit.getTableBucket(), recordIterator);
            }
        } else {
            if (currentLogScanner != null) {
                if (timeoutTables.contains(currentTableId)) {
                    return forceCompleteTieringLogRecords();
                }
                ScanRecords scanRecords = currentLogScanner.poll(pollTimeout);
                // force to complete records
                return forLogRecords(scanRecords);
            } else {
                return emptyTableBucketWriteResultWithSplitIds();
            }
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +301 to +311
for (TieringSplit tieringSplit : pendingSplits) {
if (tieringSplit.getTableBucket().getTableId() == reachMaxDurationTable) {
// force ignore this tiering split since the tiering for this table is timeout,
// we have to force to set to ignore the tiering split so that the
// tiering source reader can ignore them directly
tieringSplit.forceIgnore();
} else {
// we can break directly, if found any one split's table id is not equal to the
// timeout
// table, the following split must be not equal to the table id
break;
Copy link

Copilot AI Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pendingSplits list is accessed both from the periodic timeout check (line 301) and the assignSplits method (line 350), which can be called from different threads via handleSplitRequest and addSplitsBack. While assignSplits synchronizes on readersAwaitingSplit, the timeout check doesn't use any synchronization, which could lead to a ConcurrentModificationException or inconsistent state when iterating and modifying pendingSplits concurrently.

Suggested change
for (TieringSplit tieringSplit : pendingSplits) {
if (tieringSplit.getTableBucket().getTableId() == reachMaxDurationTable) {
// force ignore this tiering split since the tiering for this table is timeout,
// we have to force to set to ignore the tiering split so that the
// tiering source reader can ignore them directly
tieringSplit.forceIgnore();
} else {
// we can break directly, if found any one split's table id is not equal to the
// timeout
// table, the following split must be not equal to the table id
break;
// Access to pendingSplits must be synchronized consistently with other paths
// (e.g., assignSplits) to avoid concurrent modification.
synchronized (readersAwaitingSplit) {
for (TieringSplit tieringSplit : pendingSplits) {
if (tieringSplit.getTableBucket().getTableId() == reachMaxDurationTable) {
// force ignore this tiering split since the tiering for this table is
// timeout, we have to force to set to ignore the tiering split so that the
// tiering source reader can ignore them directly
tieringSplit.forceIgnore();
} else {
// we can break directly, if found any one split's table id is not equal to
// the timeout table, the following split must be not equal to the table id
break;
}

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handleReachMaxTieringDurationTables,handleSplitRequest, addSplitsBack is done in a single thread


private void assignSplits() {
/* This method may be called from both addSplitsBack and handleSplitRequest, make it thread safe. */
// todo: do we need to add lock?
Copy link

Copilot AI Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO comment at line 340 asks "do we need to add lock?" but this question is now more critical given that the timeout checking code (lines 301-312) accesses pendingSplits without synchronization while assignSplits() synchronizes on readersAwaitingSplit. This creates a potential race condition. The TODO should either be resolved by adding proper synchronization or be updated to reflect the new concurrency concerns.

Copilot uses AI. Check for mistakes.
@luoyuxia luoyuxia force-pushed the tiering-support-commit-by-time branch 3 times, most recently from 466629a to bbd9d38 Compare December 25, 2025 08:55
@luoyuxia luoyuxia requested a review from Copilot December 25, 2025 08:58
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

fluss-flink/fluss-flink-common/src/main/java/org/apache/fluss/flink/tiering/source/enumerator/TieringSourceEnumerator.java:350

  • There's a potential race condition between reading and modifying pendingSplits. The handleReachMaxTieringDurationTables method (lines 301-313) iterates and modifies pendingSplits, while assignSplits (line 350) also modifies it. Both methods can be called concurrently from different async callbacks, but only readersAwaitingSplit is synchronized, not pendingSplits. This could lead to concurrent modification exceptions or inconsistent state.
        // todo: do we need to add lock?
        synchronized (readersAwaitingSplit) {
            if (!readersAwaitingSplit.isEmpty()) {
                final Integer[] readers = readersAwaitingSplit.toArray(new Integer[0]);
                for (Integer nextAwaitingReader : readers) {
                    if (!context.registeredReaders().containsKey(nextAwaitingReader)) {
                        readersAwaitingSplit.remove(nextAwaitingReader);
                        continue;
                    }
                    if (!pendingSplits.isEmpty()) {
                        TieringSplit tieringSplit = pendingSplits.remove(0);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@luoyuxia luoyuxia force-pushed the tiering-support-commit-by-time branch from bbd9d38 to 9ae01ad Compare December 25, 2025 09:55
@luoyuxia luoyuxia marked this pull request as ready for review December 25, 2025 11:32
@luoyuxia luoyuxia requested a review from Copilot December 25, 2025 11:32
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

this.pollTieringTableIntervalMs = pollTieringTableIntervalMs;
this.tieringTableDurationMaxMs = tieringTableDurationMaxMs;
this.tieringTableDurationDetectIntervalMs = tieringTableDurationDetectIntervalMs;
this.pendingSplits = new ArrayList<>();
Copy link

Copilot AI Dec 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pendingSplits list is not thread-safe (ArrayList), but it's being accessed from multiple async callbacks (handleReachMaxTieringDurationTables at line 301 and generateTieringSplits at line 431) and from assignSplits which has synchronization only on readersAwaitingSplit. This could lead to ConcurrentModificationException or other race conditions. Consider using a thread-safe collection like CopyOnWriteArrayList or adding proper synchronization around all accesses to pendingSplits.

Suggested change
this.pendingSplits = new ArrayList<>();
this.pendingSplits = Collections.synchronizedList(new ArrayList<>());

Copilot uses AI. Check for mistakes.
@luoyuxia luoyuxia force-pushed the tiering-support-commit-by-time branch from 9ae01ad to bb3f75e Compare December 25, 2025 11:56
@luoyuxia luoyuxia requested a review from leonardBang December 25, 2025 11:56
@luoyuxia luoyuxia force-pushed the tiering-support-commit-by-time branch from bb3f75e to c72407d Compare December 25, 2025 12:00
@luoyuxia
Copy link
Contributor Author

@leonardBang Coud you please help me review this pr when you got some time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tiering Service supports splitting large splits and committing them separately to the lake.

1 participant