Add HDF5 comprehensive tests #2369
Draft
+438
−131
This PR aims to add new tests for the HDF5 readers, using new test input datasets generated with a new R script. It is meant as a first step for SYSTEMDS-3929.
The existing tests used three datasets that were committed to the repository, for example src/test/scripts/functions/io/hdf5/in/transfusion_1.h5. I could not find a generator script for these in the codebase.
In this PR, the test .h5 files generated by the script include datasets with different dimensionalities (2d, 3d, 4d) and datatypes (doubles, integers, strings). The tests are table-driven and live in a single file. All data is generated using R and the library "rhdf5", which was already being used for validation.
The test loads each dataset with systemds and with R and verifies that the outputs match using TestUtils.compareMatrices.
Currently, all tests fail because some message types (11 and 12) are not supported by the HDF5 implementation in systemds.
Message 11 is the Filter Pipeline Message
Message 12 is the Attribute Message
These messages seem to be written by default by rhdf5 version 2.54.0.
Please correct me if I'm wrong: since ReaderHDF5 implements MatrixReader, only 2d datasets are supported, and this implementation should flatten higher-dimensional datasets into 2d.
The systemds implementation currently assumes only 2d datasets:
H5RootObject.java
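To make the flattening question concrete, here is a minimal sketch of one possible interpretation, not the existing systemds behavior: keep the first dimension as rows and collapse all remaining dimensions into columns, following HDF5's row-major storage order. The class FlattenSketch and its methods flattenShape and flatColumn are hypothetical names used only for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of systemds: illustrates one way an N-d
// HDF5 dataset shape could be mapped onto a 2d matrix.
class FlattenSketch {

    // Keeps the first dimension as rows and collapses the remaining
    // dimensions into columns (assumption: row-major layout).
    static long[] flattenShape(long[] dims) {
        if (dims.length == 0)
            return new long[] {1, 1}; // assumption: a scalar becomes 1x1
        long cols = 1;
        for (int i = 1; i < dims.length; i++)
            cols = Math.multiplyExact(cols, dims[i]); // throws on overflow
        return new long[] {dims[0], cols};
    }

    // Column of an N-d index in the flattened matrix; the row is index[0].
    static long flatColumn(long[] dims, long[] index) {
        long col = 0;
        for (int i = 1; i < dims.length; i++)
            col = col * dims[i] + index[i];
        return col;
    }

    public static void main(String[] args) {
        long[] dims = {2, 3, 4};
        System.out.println(Arrays.toString(flattenShape(dims)));    // prints [2, 12]
        System.out.println(flatColumn(dims, new long[] {1, 2, 3})); // prints 11
    }
}
```

Under this reading, a 2d dataset is unchanged and a 2x3x4 dataset becomes a 2x12 matrix. Whether this matches what systemds intends is exactly the open question.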
I would like to know what systemds aims to support regarding HDF5 so that the tests can reflect that. After that, I can start fixing bugs and potentially implementing the missing features.