-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask #9529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
adoroszlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @navinko for the patch.
Can it be replaced in processTableInParallel, too?
ozone/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/OmTableInsightTask.java
Lines 197 to 198 in 3b2d4e9
| Table<String, byte[]> table = omMetadataManager.getStore() | |
| .getTable(tableName, StringCodec.get(), ByteArrayCodec.get(), TableCache.CacheType.NO_CACHE); |
If so, import will also become unused:
ozone/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/OmTableInsightTask.java
Line 37 in 3b2d4e9
| import org.apache.hadoop.hdds.utils.db.ByteArrayCodec; |
|
Thanks @adoroszlai for reviewing
I see the processTableInParallel performs many background operation to make it work in parallel which requires key to be Comparable. String as Key are comparable but CodecBuffer not. --> processTableSequentially Table<byte[], byte[]> table = omMetadataManager.getStore() --> processTableInParallel , The keys are string which are comparable and under ParallelTableIteratorOperation.java we see there are operations like splitting the ranges for multiple worker and they do key comparison like key.compareTo(startKey) , since strings are comprable. Table<String, byte[]> table = omMetadataManager.getStore() This seems to be the reason there is separate flow for sequential vs parallel processing and CodecBuffer should not be used existing implementation of processTableSequentially. |
I meant replacing Table<String, CodecBuffer> table = omMetadataManager.getStore()
.getTable(tableName, StringCodec.get(), CodecBufferCodec.get(true), TableCache.CacheType.NO_CACHE);
...
try (ParallelTableIteratorOperation<String, CodecBuffer> parallelIter = new ParallelTableIteratorOperation<>( |
Please correct me if my understanding is wrong . |
Is |
Yes 'ByteArrayCodec' is thread safe . My bad , I am not very sure on other implementation details for CodecBufferCodec but looks like the Codec implementation CodecBufferCodec reuse CodecBuffer instances so once CodecBufferCodec will be used as values across multiple thread the CodeBuffer can be modified by other thread in ParallelTableIteratorOperation. |
|
@swamirishi , it would be great if you could review this. Thank you in advance. |
CodecBufferCodec can be also used in parallel function. As long as one table iterator is used by only one thread we can use CodecBufferCodec there. We can make the value use CodecBufferCodec it would inturn lead to using the RDBStoreCodecBufferIterator otherwise currently the parallel iterator would use RBDStoreByteArrayIterator which is going to be unoptimal. For this we need to get rid of the batching logic which I have seen during my previous benchmark is unnecessary since we would not be blocked on IO ever from rocksdb. Let us get rid of this valueExecutor logic Lines 83 to 85 in ad891ec
|
swamirishi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@navinko Thanks for the patch I have added a comment to remove the ByteArrayCodec from the parallel operation as well. Please check and lmk if you agree with it.
|
This should ideally work |
What changes were proposed in this pull request?
Replaced ByteArrayCodec with CodecBufferCodec inside processTableSequentially method.
Added unit test for processTableSequentially.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14155
How was this patch tested?
Validated with existing and new unit tests .
CI:
https://github.com/navinko/ozone/actions/runs/20353031938