On-the-fly Conversion During Download #18

@Integer-Ctrl

Currently, files from the Databus are downloaded in their original compression format. A helpful enhancement would be to allow users to specify a target compression format that files should be converted to on the fly during download.

This would make it easier to bring datasets into a single, consistent compression format, save disk space, or feed data into pipelines that expect a specific format.

Proposed Feature:
Introduce a new CLI option, for example: --convert-to [format]. The supported formats could initially be limited to a small set (e.g., bz2, gz, xz).
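
A minimal sketch of how such an option could be declared, using Python's standard `argparse` module. The program name `databus-download-sketch`, the `build_parser` helper, and the `SUPPORTED_FORMATS` tuple are hypothetical names chosen for illustration; the actual client may use a different CLI framework and naming.

```python
import argparse

# Initial set of target formats named in the proposal (assumption: exactly these three).
SUPPORTED_FORMATS = ("bz2", "gz", "xz")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the proposed --convert-to option (illustrative only)."""
    parser = argparse.ArgumentParser(prog="databus-download-sketch")
    parser.add_argument(
        "--convert-to",
        choices=SUPPORTED_FORMATS,
        default=None,  # None means: keep the original compression format
        help="Target compression format to convert downloaded files to.",
    )
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without real CLI input.
    args = build_parser().parse_args(["--convert-to", "gz"])
    print(args.convert_to)
```

Restricting the values with `choices` lets the parser reject unsupported formats up front, so the download code never sees an unknown target.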

Implementation Considerations:

  • Already compressed files:
    • Such files must be decompressed before being re-compressed into the target format
    • The process should skip recompression if the source and target formats are identical
  • Filtering files to convert:
    • Not all files may need conversion
    • Introduce an optional CLI option like --convert-from [format] to specify which source formats should be converted (e.g., only convert .bz2 files to .gz)
    • Alternatively, allow automatic detection of compressible formats (based on file extension)
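
The considerations above could be sketched roughly as follows, using only Python's standard library. The function names `detect_format` and `convert_file`, and the parameters `convert_to` and `convert_from`, are hypothetical and simply mirror the proposed CLI options. For simplicity the sketch converts a file that already exists on disk and leaves the original in place; a true on-the-fly version would presumably apply the same decompress-then-recompress streaming to the download response instead.

```python
import bz2
import gzip
import lzma
import shutil
from pathlib import Path
from typing import Iterable, Optional

# Maps a file extension to the stdlib function that opens that format.
OPENERS = {"bz2": bz2.open, "gz": gzip.open, "xz": lzma.open}


def detect_format(path: Path) -> Optional[str]:
    """Guess the compression format from the file extension; None if not recognized."""
    ext = path.suffix.lstrip(".").lower()
    return ext if ext in OPENERS else None


def convert_file(
    path: Path,
    convert_to: str,
    convert_from: Optional[Iterable[str]] = None,
) -> Path:
    """Recompress `path` into `convert_to` and return the resulting path.

    The file is returned unchanged when it is already in the target format,
    or when `convert_from` is given and the source format is not listed in it.
    """
    source = detect_format(path)
    if source == convert_to:
        return path  # source and target are identical: skip recompression
    if convert_from is not None and source not in set(convert_from):
        return path  # filtered out by the --convert-from style selection

    if source is not None:
        # Already compressed: swap the extension and decompress while reading.
        target = path.with_suffix("." + convert_to)
        reader = OPENERS[source]
    else:
        # Uncompressed input: append the new extension and read the raw bytes.
        target = path.with_name(path.name + "." + convert_to)
        reader = open

    # Stream in chunks so large files never have to fit in memory.
    with reader(path, "rb") as src, OPENERS[convert_to](target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

With these assumptions, `convert_file(Path("data.ttl.bz2"), "gz")` would write `data.ttl.gz` next to the original, while `convert_file(Path("data.ttl.gz"), "xz", convert_from=["bz2"])` would return the input path untouched. Detection by extension is the simplest option; checking magic bytes would be more robust against misnamed files.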

Labels: download (Issue related to data download functionality from databus)
