
@luccadibe

This PR adds new tests for the HDF5 readers, using test input datasets generated by a new R script. It is meant as a first step toward SYSTEMDS-3929.

The existing tests use three committed datasets (for example, src/test/scripts/functions/io/hdf5/in/transfusion_1.h5), but I could not find a script in the codebase that generates them.

In this PR, the .h5 files generated by the script contain datasets with different shapes (2d, 3d, 4d) and datatypes (doubles, integers, strings), and the tests are table-driven in a single file. All data is generated in R with the rhdf5 library, which the existing tests already used for validation.

Each test loads the dataset with both SystemDS and R and checks that the outputs match using TestUtils.compareMatrices.
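The table-driven layout described above can be sketched roughly as follows. This is a hypothetical illustration, not the test in this PR: the class name, file names, dataset path, and the `runAll` helper are all made up, and the SystemDS/R reading and comparison step is left as a pluggable check.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a table-driven HDF5 test: one row per generated .h5 file.
// File names and the dataset path are illustrative, not the ones used in this PR.
final class Hdf5TestCases {

    // Element types covered by the R generator script.
    enum ValueType { DOUBLE, INTEGER, STRING }

    // One table row: which file/dataset to read and the shape/type R wrote.
    record Case(String file, String dataset, int[] dims, ValueType type) {
        int rank() {
            return dims.length;
        }
    }

    static final List<Case> CASES = List.of(
        new Case("matrix_2d_double.h5", "/data", new int[] {10, 5}, ValueType.DOUBLE),
        new Case("tensor_3d_int.h5", "/data", new int[] {4, 3, 2}, ValueType.INTEGER),
        new Case("tensor_4d_double.h5", "/data", new int[] {2, 2, 3, 3}, ValueType.DOUBLE),
        new Case("matrix_2d_string.h5", "/data", new int[] {6, 2}, ValueType.STRING));

    // Applies the same check to every row and returns how many rows ran. In a real
    // test, the check would read c.file() with SystemDS and with R, then compare.
    static int runAll(Consumer<Case> check) {
        int ran = 0;
        for (Case c : CASES) {
            check.accept(c);
            ran++;
        }
        return ran;
    }

    private Hdf5TestCases() {}
}
```

Keeping the cases in one list means adding a new shape or datatype is a one-line change; the same list could also be handed to JUnit 4's `Parameterized` runner so that each row is reported as its own test.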

Currently, all tests fail because some HDF5 object header message types (11 and 12) are not supported by the SystemDS HDF5 implementation:
Message 11 is the Filter Pipeline message.
Message 12 is the Attribute message.
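For reference, these IDs come from the object header message list in the HDF5 File Format Specification. The sketch below is a hypothetical way for a reader to name unsupported messages in its error output and to separate messages it could ignore from ones it must handle: attributes only carry metadata, whereas a filter pipeline means the stored chunks are encoded (for example, deflate-compressed) and must be decoded. The class, the map, and the `canSkip` policy are illustrative assumptions and do not exist in SystemDS.

```java
import java.util.Map;

// Illustrative only: a few HDF5 object header message type IDs from the
// HDF5 File Format Specification, and how a reader might treat them.
final class Hdf5MessageTypes {

    static final Map<Integer, String> NAMES = Map.of(
        0x0001, "Dataspace",
        0x0003, "Datatype",
        0x0008, "Data Layout",
        0x000B, "Filter Pipeline",
        0x000C, "Attribute",
        0x0010, "Object Header Continuation");

    // Hypothetical policy: an Attribute message only holds metadata, so a matrix
    // reader could skip it; a Filter Pipeline changes how the raw data is stored,
    // so it cannot be ignored and needs an actual decoder.
    static boolean canSkip(int type) {
        return type == 0x000C; // Attribute
    }

    // Human-readable name for error messages, e.g. "unsupported: Filter Pipeline".
    static String describe(int type) {
        return NAMES.getOrDefault(type, "Unknown message type " + type);
    }

    private Hdf5MessageTypes() {}
}
```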

rhdf5 version 2.54.0 appears to write these messages by default.

Please correct me if I'm wrong: since ReaderHDF5 implements MatrixReader, only 2d datasets are supported, so the reader would need to flatten higher-dimensional datasets into 2d.
The SystemDS implementation currently assumes that all datasets are 2d:

H5RootObject.java

	public void setDimensions(int[] dimensions) {
		this.dimensions = dimensions;
		this.row = dimensions[0];
		this.col = dimensions[1];
	}
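One possible direction, assuming higher-dimensional datasets should simply be flattened: keep the first dimension as rows and collapse the remaining dimensions into columns. Because HDF5 stores dataset elements in row-major (C) order, this mapping does not require reordering the data. Note also that the current `setDimensions` would throw an `ArrayIndexOutOfBoundsException` for a 1-d dataset, since it reads `dimensions[1]`. The `Hdf5Shape` class and its methods below are hypothetical, not existing SystemDS code.

```java
import java.util.Arrays;

// Hypothetical sketch: map an N-d HDF5 dataspace onto a 2-d matrix shape.
// The first dimension becomes the rows; the remaining ones are collapsed into columns.
final class Hdf5Shape {

    // Returns {rows, cols} for a dataset shape as stored in the file.
    static int[] flattenTo2d(int[] dims) {
        if (dims == null || dims.length == 0)
            throw new IllegalArgumentException("scalar or missing dataspace");
        if (dims.length == 1)
            return new int[] {dims[0], 1}; // treat a 1-d dataset as a column vector
        long cols = 1;
        for (int i = 1; i < dims.length; i++)
            cols *= dims[i];
        if (cols > Integer.MAX_VALUE)
            throw new IllegalArgumentException("too many columns: " + Arrays.toString(dims));
        return new int[] {dims[0], (int) cols};
    }

    // Row-major (C order, as HDF5 stores data): element idx of an N-d dataset lands
    // at row idx[0] and at the column given by the linear offset of idx[1..] in that row.
    static int[] toCell(int[] dims, int[] idx) {
        int col = 0;
        for (int i = 1; i < dims.length; i++)
            col = col * dims[i] + idx[i];
        return new int[] {idx[0], col};
    }

    private Hdf5Shape() {}
}
```

For example, a 4 x 3 x 2 dataset would become a 4 x 6 matrix, and element (1, 2, 1) would land at row 1, column 5. `setDimensions` could then take `row` and `col` from `flattenTo2d(dimensions)` instead of indexing `dimensions[0]` and `dimensions[1]` directly.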

I would like to know what SystemDS aims to support for HDF5 so that the tests can reflect it; after that, I can start fixing bugs and potentially implementing the missing features.



Projects

Status: In Progress
