@binary-signal
Contributor

Purpose

Linked issue: close #2164

Fix IndexOutOfBoundsException when writing rows with array columns where the total number of array elements exceeds INITIAL_CAPACITY (1024) while the row count stays below it.

Brief change log

In ArrowWriter.writeRow(), the handleSafe flag is determined by comparing row count against INITIAL_CAPACITY:

boolean handleSafe = recordsCount >= INITIAL_CAPACITY;

When handleSafe is false, Arrow writers call vector.set(), which does not grow the buffer automatically. The bug is in ArrowArrayWriter.doWrite(), which passes the parent's handleSafe flag to the element writer even though array element indices grow with the cumulative element count, not the row count.

Example: 250 rows with 10-element arrays → row count (250) < 1024 so handleSafe = false, but total elements (2500) exceeds the vector's initial capacity, causing IndexOutOfBoundsException.
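
The example above can be reproduced with a small, self-contained model. This is only a sketch: ToyVector, writeRows, and the alwaysSafeElements flag are hypothetical stand-ins invented for illustration, not the real Arrow or ArrowArrayWriter classes. It assumes the element vector starts with exactly INITIAL_CAPACITY slots, that set() rejects out-of-range indices, and that setSafe() grows the buffer.

```java
import java.util.Arrays;

class ArrayWriteOverflowDemo {
    static final int INITIAL_CAPACITY = 1024;

    /** Toy stand-in for an Arrow value vector (hypothetical, not the Arrow API). */
    static final class ToyVector {
        private int[] data = new int[INITIAL_CAPACITY];

        /** Unchecked-growth write: fails once the index passes the current capacity. */
        void set(int index, int value) {
            if (index >= data.length) {
                throw new IndexOutOfBoundsException(
                        "index " + index + " >= capacity " + data.length);
            }
            data[index] = value;
        }

        /** Safe write: doubles the buffer until the index fits. */
        void setSafe(int index, int value) {
            while (index >= data.length) {
                data = Arrays.copyOf(data, data.length * 2);
            }
            data[index] = value;
        }
    }

    /**
     * Writes rows of fixed-size arrays into one shared element vector.
     * Returns the first element index whose write failed, or -1 if all writes succeeded.
     */
    static int writeRows(int rows, int elementsPerRow, boolean alwaysSafeElements) {
        ToyVector elements = new ToyVector();
        int offset = 0;
        for (int recordsCount = 0; recordsCount < rows; recordsCount++) {
            // The parent flag depends only on the row count.
            boolean handleSafe = recordsCount >= INITIAL_CAPACITY;
            boolean elementSafe = alwaysSafeElements || handleSafe;
            for (int arrIndex = 0; arrIndex < elementsPerRow; arrIndex++) {
                // The element index grows with the cumulative element count.
                int fieldIndex = offset + arrIndex;
                try {
                    if (elementSafe) {
                        elements.setSafe(fieldIndex, arrIndex);
                    } else {
                        elements.set(fieldIndex, arrIndex);
                    }
                } catch (IndexOutOfBoundsException e) {
                    return fieldIndex;
                }
            }
            offset += elementsPerRow;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(writeRows(250, 10, false)); // prints 1024
        System.out.println(writeRows(250, 10, true));  // prints -1
    }
}
```

In this model the row-count-based flag fails at element index 1024 (row 102, counting from zero), well before the row count reaches INITIAL_CAPACITY, while forcing safe writes for elements lets all 2500 writes succeed.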

Fix:
Always use safe writes (handleSafe = true) for array element writers in ArrowArrayWriter.doWrite(), since element indices can exceed INITIAL_CAPACITY independently of row count.

// Before
elementWriter.write(fieldIndex, array, arrIndex, handleSafe);

// After
elementWriter.write(fieldIndex, array, arrIndex, true);

Tests

  • Added ArrowReaderWriterTest#testArrayWriterWithManyElements: writes 200 rows with 10-element arrays (2000 total elements), verifying serialization succeeds and data can be read back correctly.

API and Format

No API or storage format changes.

Documentation

No documentation changes needed. This is a bug fix.

Contributor

@rionmonster rionmonster left a comment

LGTM — looks pretty straightforward. Approved! 👍

Contributor

@XuQianJin-Stars XuQianJin-Stars left a comment

+1

Contributor

@vamossagar12 vamossagar12 left a comment

LGTM

  for (int arrIndex = 0; arrIndex < array.size(); arrIndex++) {
      int fieldIndex = offset + arrIndex;
-     elementWriter.write(fieldIndex, array, arrIndex, handleSafe);
+     // Always use safe writes for array elements because the element index (offset +
Contributor

We should mention in the comments on the class that the handleSafe field is ignored when writing.

@platinumhamburg
Contributor

+1, just a minor suggestion.

binary-signal and others added 2 commits December 31, 2025 10:15
…nt count exceeds INITIAL_CAPACITY (apache#2165)

Signed-off-by: binary-signal <binary-signal@github.noreply.com>
…andleSafe=true with dynamic check based on fieldIndex
@wuchong wuchong force-pushed the fix-arrow-writer-index-out-of-bounds branch from e73e61f to dfd1166 Compare December 31, 2025 02:18
Member

@wuchong wuchong left a comment

I appended a commit from @platinumhamburg #2287 to improve the fix.

@wuchong wuchong changed the title from "safe writes for array" to "[common] Fix IndexOutOfBoundsException in ArrowArrayWriter when element count exceeds INITIAL_CAPACITY" Dec 31, 2025
+     // arrIndex) can exceed INITIAL_CAPACITY even when the row count doesn't. The parent's
+     // handleSafe is based on row count, but array element indices grow based on the total
+     // number of elements across all arrays, which can be much larger.
+     elementWriter.write(fieldIndex, array, arrIndex, true);
Contributor

Suggested change
- elementWriter.write(fieldIndex, array, arrIndex, true);
+ boolean elementHandleSafe = fieldIndex >= ArrowWriter.INITIAL_CAPACITY;
+ elementWriter.write(fieldIndex, array, arrIndex, elementHandleSafe);

Contributor

Perhaps we could use an explicit conditional check here instead of hard-coding true.
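
To illustrate what a per-element check buys, here is a rough sketch of the suggested condition in a toy model. CountingVector and writeRows are hypothetical names made up for this example and are not the Arrow or ArrowArrayWriter API. The sketch assumes the element vector is allocated with at least INITIAL_CAPACITY slots, so indices below that bound can take the non-growing path.

```java
import java.util.Arrays;

class ElementHandleSafeDemo {
    static final int INITIAL_CAPACITY = 1024;

    /** Toy vector that counts how often each write path is taken (hypothetical). */
    static final class CountingVector {
        int[] data = new int[INITIAL_CAPACITY];
        int fastWrites;
        int safeWrites;

        /** Non-growing write, valid only while index < capacity. */
        void set(int index, int value) {
            if (index >= data.length) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            data[index] = value;
            fastWrites++;
        }

        /** Growing write: doubles the buffer until the index fits. */
        void setSafe(int index, int value) {
            while (index >= data.length) {
                data = Arrays.copyOf(data, data.length * 2);
            }
            data[index] = value;
            safeWrites++;
        }
    }

    /** Writes rows of fixed-size arrays, choosing the write path per element index. */
    static CountingVector writeRows(int rows, int elementsPerRow) {
        CountingVector elements = new CountingVector();
        int offset = 0;
        for (int row = 0; row < rows; row++) {
            for (int arrIndex = 0; arrIndex < elementsPerRow; arrIndex++) {
                int fieldIndex = offset + arrIndex;
                // The flag follows the element index, not the row count.
                boolean elementHandleSafe = fieldIndex >= INITIAL_CAPACITY;
                if (elementHandleSafe) {
                    elements.setSafe(fieldIndex, arrIndex);
                } else {
                    elements.set(fieldIndex, arrIndex);
                }
            }
            offset += elementsPerRow;
        }
        return elements;
    }

    public static void main(String[] args) {
        CountingVector v = writeRows(250, 10);
        System.out.println(v.fastWrites + " fast, " + v.safeWrites + " safe");
    }
}
```

With 250 rows of 10 elements, the first 1024 element writes take the non-growing path and the remaining 1476 take the growing path. Compared with always passing true, this keeps the cheaper path for small batches while still avoiding the out-of-bounds write.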

@wuchong wuchong merged commit 7fad656 into apache:main Dec 31, 2025
6 checks passed
wuchong pushed a commit that referenced this pull request Dec 31, 2025
…nt count exceeds INITIAL_CAPACITY (#2165)

Signed-off-by: binary-signal <binary-signal@github.noreply.com>
Development

Successfully merging this pull request may close these issues.

[Bug] IndexOutOfBoundsException when writing rows with array columns to KV table

6 participants