Is there interest in a new HDF5 wrapper? #104

@1uc

HighFive has the following properties:

  • C++14,
  • header-only,
  • "safe" by exclusive ownership,
  • automatic conversion of C++ types to HDF5 types,
  • and stable/mature.

Hence, it has a few weaknesses:

  1. The combo of "C++14 and mature" means we'll never make it to C++26 (reflection). Even if we did, it's very much not clear we could make it worthwhile for users (not just library maintainers).
  2. Header-only prevents us from having a single singleton. Each shared library will have its own copy of the global variable. This is what's blocking us from making HighFive thread-safe (not concurrent, just safe).
  3. If an HDF5 object is created with HighFive, it can only be destroyed by HighFive. Ownership can't be transferred to a different library. In exchange for this inconvenience, it's impossible to free a HighFive-controlled ID too often or too rarely.

Other motivating reasons to rewrite:

  1. An architecture like nanobind's, which at its core is just one RAII-style wrapper that supports both modes of acquiring an id (with and without incrementing the reference count) and both modes of releasing an id (with and without decrementing the reference count). A library like nanobind doesn't have sole ownership of the id; if the programmer chooses to, they can transfer ownership to a different library. It's a nice feature.
  2. Reflection can likely be used to automatically map C++ structs to HDF5 types.
  3. We have mdspan now. I suspect we can build a nice library around it, and users won't need to create their own specializations of the Inspector.
  4. Allow users to explicitly control allocation of arrays (during reading).
  5. Concepts are an excellent improvement over SFINAE.
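
To make the first point concrete, here is a minimal sketch of such a handle. It is not HighFive or nanobind code: `fake_id`, `inc_ref`, `dec_ref`, `ref_count` and `Handle` are hypothetical names, and the reference counts live in a toy table so the example compiles without HDF5. A real wrapper would presumably call `H5Iinc_ref` and `H5Idec_ref` instead. The names `steal`, `borrow` and `release` mirror nanobind's vocabulary.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Stand-in for an HDF5 id and its reference count (assumption: a real
// implementation would use hid_t with H5Iinc_ref / H5Idec_ref).
using fake_id = std::int64_t;
constexpr fake_id invalid_id = -1;

inline std::unordered_map<fake_id, int>& refcounts() {
    static std::unordered_map<fake_id, int> table;
    return table;
}
inline void inc_ref(fake_id id) { ++refcounts()[id]; }
inline void dec_ref(fake_id id) {
    auto it = refcounts().find(id);
    assert(it != refcounts().end() && it->second > 0);
    if (--it->second == 0) refcounts().erase(it);  // the object is "closed"
}
inline int ref_count(fake_id id) {
    auto it = refcounts().find(id);
    return it == refcounts().end() ? 0 : it->second;
}

class Handle {
public:
    Handle() = default;

    // Acquire without incrementing: take over a reference the caller owns.
    static Handle steal(fake_id id) { return Handle(id); }
    // Acquire with incrementing: share an id that someone else keeps owning.
    static Handle borrow(fake_id id) { inc_ref(id); return Handle(id); }

    Handle(const Handle& other) : id_(other.id_) { if (valid()) inc_ref(id_); }
    Handle(Handle&& other) noexcept : id_(std::exchange(other.id_, invalid_id)) {}
    Handle& operator=(Handle other) noexcept { std::swap(id_, other.id_); return *this; }
    ~Handle() { reset(); }

    // Release with decrementing: drop our reference.
    void reset() { if (valid()) dec_ref(std::exchange(id_, invalid_id)); }
    // Release without decrementing: hand our reference to the caller,
    // e.g. to pass ownership to a different library.
    fake_id release() noexcept { return std::exchange(id_, invalid_id); }

    fake_id get() const noexcept { return id_; }
    bool valid() const noexcept { return id_ != invalid_id; }

private:
    explicit Handle(fake_id id) : id_(id) {}
    fake_id id_ = invalid_id;
};
```

With this shape, `Handle::steal(id)` adopts an id returned by a create/open call, `Handle::borrow(id)` wraps an id owned by another library, and `release()` is the escape hatch that exclusive ownership rules out.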

Some issues we can't clean up:

  • HighFive uses size_t everywhere in the interface as a "wrapper" for hsize_t. It seems quite daunting to confidently fix this without any breakage of existing code. Therefore, 32-bit won't work.
  • HighFive is very liberal about using std::vector for shapes. This will cause pointless allocations, which likely don't cost much runtime, but aren't desirable for embedded applications.
  • The mapping of C++ types to HDF5 types is globally unique: std::array is always a one-dimensional array of "somethings". It's never a scalar HDF5 object of type ARRAY. The user gets no say in the matter.
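
To illustrate the size_t point: a new library could keep the HDF5 size type in its interface and narrow only at explicit, checked boundaries. The sketch below is an assumption, not existing HighFive code. `hsize_like` stands in for hsize_t (assumed here to be an unsigned 64-bit integer) and `checked_narrow` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Stand-in for HDF5's hsize_t (assumption: unsigned, 64 bits wide).
using hsize_like = std::uint64_t;

// Convert an HDF5-sized value to a narrower unsigned type, throwing instead
// of silently truncating. On a 32-bit platform, To = std::size_t is 32 bits.
template <class To>
To checked_narrow(hsize_like n) {
    static_assert(std::numeric_limits<To>::is_integer &&
                      !std::numeric_limits<To>::is_signed,
                  "To must be an unsigned integer type");
    if (n > static_cast<hsize_like>(std::numeric_limits<To>::max())) {
        throw std::overflow_error("dimension does not fit into target type");
    }
    return static_cast<To>(n);
}
```

On a 64-bit platform `checked_narrow<std::size_t>` never throws; instantiating it with `std::uint32_t` shows what a 32-bit build would see for a dimension larger than 4 GiB.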

I'm thinking it should be a layered library:

  1. Low level: thoughtless C++ wrappers of HDF5 functions, e.g. error codes are converted to exceptions and the different HDF5 IDs get RAII-style wrappers.
  2. Mid level: Somewhat convenient ways of creating files, groups, datasets, attributes and everything else that exists in HDF5. Everything should be possible; worst case by calling the HDF5 C API. This layer has no surprises and should make no choices.
  3. High level: Much more opinionated and permitted to make choices on the user's behalf, e.g. compression requires chunking. In this layer the library is permitted to add chunking if the user requested compression, but didn't request any chunking. This is also a convenience layer to make common tasks as simple as possible.
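
A rough sketch of how the mid and high levels might differ on the compression example. All names (`DatasetOptions`, `validate_mid_level`, `resolve_high_level`) are hypothetical, and the fallback chunk shape (one chunk spanning the whole dataset) is only a placeholder policy; a real heuristic would likely cap the chunk size.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical description of a dataset to be created.
struct DatasetOptions {
    std::vector<std::uint64_t> shape;
    std::optional<std::vector<std::uint64_t>> chunks;  // empty: not chunked
    std::optional<unsigned> deflate_level;             // empty: no compression
};

// Mid level: makes no choices. An inconsistent request is reported as-is.
inline void validate_mid_level(const DatasetOptions& opts) {
    if (opts.deflate_level && !opts.chunks) {
        throw std::invalid_argument("compression requires chunking");
    }
}

// High level: allowed to fill in what the user left out.
inline DatasetOptions resolve_high_level(DatasetOptions opts) {
    if (opts.deflate_level && !opts.chunks) {
        // Placeholder policy: a single chunk covering the dataset. Chunk
        // extents must be positive, so empty dimensions are clamped to 1.
        std::vector<std::uint64_t> chunks(opts.shape.size());
        std::transform(opts.shape.begin(), opts.shape.end(), chunks.begin(),
                       [](std::uint64_t n) { return std::max<std::uint64_t>(n, 1); });
        opts.chunks = std::move(chunks);
    }
    return opts;
}
```

The high-level entry point would call `resolve_high_level` and then go through the same mid-level path, so the opinionated behaviour stays in one place and can be bypassed by using the mid level directly.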

I can't see a nice way to transform HighFive. Instead, I'm thinking one could start over and create a new library for a new C++ phase/era. The question is then: is there any interest in a different wrapper? Maybe HighFive is good enough and, anyway, we don't really need these types of wrapper libraries anymore because an LLM can generate the boilerplate for us. However, if there is interest, please let me know.
