
Conversation

Symbiomatrix commented Jun 6, 2025

One-line fix: GGUFWriter will cache to a file instead of living in RAM.
It's naturally a bit slower, but I think it's unreasonable to demand enough RAM to hold a full Flux model in memory (what is it, 24 GB? I get crashes at 54% on 12 GB) for what is a one-time operation.
As mentioned in #285, this will allow running conversions even on low-resource containers (Colab, and presumably Spaces).
In terms of disk space, you need less than 100 GB free, mainly for the base, temp, and converted files; Colab supplies that much.
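
For reference, here is roughly what the change amounts to; a minimal sketch assuming the convert script builds the writer the way gguf-py's own tools do (the path, arch string, and exact constructor arguments are illustrative and vary by gguf-py version):

```python
import gguf

out_path = "flux1-dev-F16.gguf"  # illustrative output path
arch = "flux"                    # illustrative architecture string

# Before: GGUFWriter keeps every added tensor in RAM until the final
# write, which for a full Flux checkpoint means tens of GB of memory.
# writer = gguf.GGUFWriter(out_path, arch)

# After: use_temp_file=True spools tensor data to a temporary file on
# disk instead, trading some speed and temp disk space for RAM.
writer = gguf.GGUFWriter(out_path, arch, use_temp_file=True)
```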

Symbiomatrix marked this pull request as draft on June 6, 2025 at 22:37
Symbiomatrix (Author) commented:

It's acting a bit weird; I added some prints to see if it gets stuck somewhere during writing.

Symbiomatrix marked this pull request as ready for review on June 6, 2025 at 23:00
Symbiomatrix (Author) commented:

Double-checked: it seems to be working correctly, it's just that when you use a temp file the write_tensors_to_file function takes significantly longer (another 5 minutes, I reckon). Interrupting it mid-run will leave a corrupted file.

Colab probably needs an extra wait before it actually reclaims the space.
Symbiomatrix (Author) commented Jun 11, 2025

I added a model deletion flag, but Colab doesn't appear to free any disk space while the process is running. The only way I could get it to work with FP32 was to return the writer after writing the header, remove the source model, and then proceed with writing the tensors (per branch efficiency1). A semi-manual hack, but effective. HF Spaces seem to be even more limited at 50 GB, Colab at 70 GB.
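
For context, the ordering described above looks roughly like this; a hypothetical sketch of the efficiency1 approach, not the actual branch code (function name, arguments, and the state_dict handling are made up for illustration):

```python
import os
import gguf

def convert_low_disk(src_path: str, dst_path: str, arch: str, state_dict: dict):
    # Spool tensor data to a temp file rather than RAM (see the fix above).
    writer = gguf.GGUFWriter(dst_path, arch, use_temp_file=True)
    for name, tensor in state_dict.items():
        writer.add_tensor(name, tensor)  # data is copied out of the source here

    # Write the header and metadata first...
    writer.write_header_to_file()
    writer.write_kv_data_to_file()

    # ...then remove the source model so the container has enough free
    # disk for the tensor data that is about to be written out.
    os.remove(src_path)

    writer.write_tensors_to_file()
    writer.close()
```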

city96 (Owner) commented Jun 12, 2025

Sorry, I'm pretty swamped with IRL stuff lately, so I can't really review or test things at the moment, but here are some ideas:

There is a branch with an auto-convert Gradio tool here that might have some useful stuff you can reuse for the Google Colab space, including some of the logic: https://github.com/city96/ComfyUI-GGUF/blob/auto_convert/tools/tool_auto.py (plus PR #274).

Some other notes for this PR specifically:

  • I think adding use_temp_file=True would make sense behind a launch arg, with the description mentioning that it saves RAM but makes things slightly slower and, I guess, adds some wear to the SSD (also, could it crash if the user's temp directory is on the system drive and that drive is running low on space? Maybe keep the current default and add a note to the readme). Dunno if this needs a version check to make sure old versions of gguf-py don't break if that flag is present; see the sketch after this list.
  • There is this PR that I still need to look at. I want to add it to the Gradio tool on the auto_convert branch and then make it an HF Space similar to gguf-my-repo, but sadly I've had zero time to work on any of this in the past few months:
    convert : ability to lazy-load safetensors remotely without downloading to disk (ggml-org/llama.cpp#12820)
  • Deleting the input file is a definite no-go for the main repo. I don't want any kind of destructive operation that could result in people having to re-download files or, even worse, losing custom fine-tuned models.
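
A rough sketch of what the launch-arg idea from the first bullet could look like; the flag name, help text, and signature-based version check are all illustrative, not an agreed interface:

```python
import argparse
import inspect
import gguf

parser = argparse.ArgumentParser()
parser.add_argument(
    "--use-temp-file", action="store_true",
    help="Buffer tensor data in a temp file to save RAM "
         "(slightly slower and writes extra data to the temp drive).",
)
args = parser.parse_args()

writer_kwargs = {}
if args.use_temp_file:
    # Only pass the keyword if the installed gguf-py version accepts it,
    # so older versions don't break when the flag is used.
    if "use_temp_file" in inspect.signature(gguf.GGUFWriter.__init__).parameters:
        writer_kwargs["use_temp_file"] = True
    else:
        print("warning: installed gguf-py ignores use_temp_file, falling back to in-RAM writes")

writer = gguf.GGUFWriter("model.gguf", "flux", **writer_kwargs)  # illustrative path/arch
```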
