-
Notifications
You must be signed in to change notification settings - Fork 469
[common] Fix IndexOutOfBoundsException in ArrowArrayWriter when element count exceeds INITIAL_CAPACITY #2165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[common] Fix IndexOutOfBoundsException in ArrowArrayWriter when element count exceeds INITIAL_CAPACITY #2165
Conversation
rionmonster
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM — looks pretty straightforward. Approved! 👍
XuQianJin-Stars
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
vamossagar12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| for (int arrIndex = 0; arrIndex < array.size(); arrIndex++) { | ||
| int fieldIndex = offset + arrIndex; | ||
| elementWriter.write(fieldIndex, array, arrIndex, handleSafe); | ||
| // Always use safe writes for array elements because the element index (offset + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should mention in the comments on the class that the handleSafe field is ignored when writing.
|
+1,just a minor suggestion |
…nt count exceeds INITIAL_CAPACITY (apache#2165) Signed-off-by: binary-signal <binary-signal@github.noreply.com>
…andleSafe=true with dynamic check based on fieldIndex
e73e61f to
dfd1166
Compare
wuchong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appended a commit from @platinumhamburg #2287 to improve the fix.
| // arrIndex) can exceed INITIAL_CAPACITY even when the row count doesn't. The parent's | ||
| // handleSafe is based on row count, but array element indices grow based on the total | ||
| // number of elements across all arrays, which can be much larger. | ||
| elementWriter.write(fieldIndex, array, arrIndex, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| elementWriter.write(fieldIndex, array, arrIndex, true); | |
| boolean elementHandleSafe = fieldIndex >= ArrowWriter.INITIAL_CAPACITY; | |
| elementWriter.write(fieldIndex, array, arrIndex, elementHandleSafe); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could apply a clear conditional judgment here.
…nt count exceeds INITIAL_CAPACITY (#2165) Signed-off-by: binary-signal <binary-signal@github.noreply.com>
Purpose
Linked issue: close #2164
Fix
IndexOutOfBoundsExceptionwhen writing rows with array columns where the total number of array elements exceedsINITIAL_CAPACITY(1024) while the row count stays below it.Brief change log
In
ArrowWriter.writeRow(), thehandleSafeflag is determined by comparing row count againstINITIAL_CAPACITY:When
handleSafe = false, Arrow writers usevector.set()which doesn't auto-grow the buffer. The bug is inArrowArrayWriter.doWrite()which passes the parent'shandleSafeflag to the element writer. However, array element indices grow based on cumulative element count, not row count.Example: 250 rows with 10-element arrays → row count (250) < 1024 so
handleSafe = false, but total elements (2500) exceeds the vector's initial capacity, causingIndexOutOfBoundsException.Fix:
Always use safe writes (
handleSafe = true) for array element writers inArrowArrayWriter.doWrite(), since element indices can exceedINITIAL_CAPACITYindependently of row count.Tests
ArrowReaderWriterTest#testArrayWriterWithManyElements: writes 200 rows with 10-element arrays (2000 total elements), verifying serialization succeeds and data can be read back correctly.API and Format
No API or storage format changes.
Documentation
No documentation changes needed. This is a bug fix.