__pack_tensor__ should be in the beginning of the file to avoid seeking the whole file #162

@dmikushin

Hi @FrancescAlted ,

I have another concern about __pack_tensor__. According to hexedit, the __pack_tensor__ entry is located at the end of the .bl2 file. I think this is an inefficient choice for large files. Suppose I have a 10 GiB .bl2 file. I don't want to read it entirely, but knowing its shape is essential for almost any use case. So in order to read the shape, c-blosc2 would need to fseek() to the end of the file. Of course, seeking is much faster than reading the content, but the file I/O would still need to traverse the on-disk layout of a large, possibly fragmented file in the filesystem. So why not eliminate all this extra load on the filesystem by always placing metadata nodes at the beginning of the file? Is there an industry standard or practice that requires metadata to be placed at the end of the file?
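
To make the two access patterns concrete, here is a minimal Python sketch. It uses a made-up toy layout (an 8-byte length field plus the metadata bytes), not the actual Blosc2 frame format, and all function names are hypothetical. It only contrasts reading metadata stored at the start of a file with reading metadata stored at the end.

```python
import os
import struct
import tempfile

# Toy layout, NOT the real Blosc2 frame format: an 8-byte little-endian
# length field stored next to the metadata bytes.
LEN = struct.Struct("<Q")


def write_header_first(path, meta, payload):
    """Write [len][meta][payload]: metadata at the start of the file."""
    with open(path, "wb") as f:
        f.write(LEN.pack(len(meta)))
        f.write(meta)
        f.write(payload)


def write_trailer_last(path, meta, payload):
    """Write [payload][meta][len]: metadata at the end of the file."""
    with open(path, "wb") as f:
        f.write(payload)
        f.write(meta)
        f.write(LEN.pack(len(meta)))


def read_header_first(path):
    """Sequential read from offset 0; no seek is needed."""
    with open(path, "rb") as f:
        (n,) = LEN.unpack(f.read(LEN.size))
        return f.read(n)


def read_trailer_last(path):
    """Two seeks relative to the end of the file, then a read."""
    with open(path, "rb") as f:
        f.seek(-LEN.size, os.SEEK_END)        # jump to the length field
        (n,) = LEN.unpack(f.read(LEN.size))
        f.seek(-(LEN.size + n), os.SEEK_END)  # jump back to the metadata
        return f.read(n)


def demo():
    meta = b"shape=(1000,1000)"
    payload = bytes(1 << 20)  # 1 MiB of zeros standing in for the chunks
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "header.bin")
        b = os.path.join(d, "trailer.bin")
        write_header_first(a, meta, payload)
        write_trailer_last(b, meta, payload)
        return read_header_first(a), read_trailer_last(b)
```

Both readers return the same bytes. The difference is that the header-first reader touches only the first few bytes of the file, while the trailer reader has to position itself at the end before it can learn anything about the contents.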
