Collaborator

@RissyRan RissyRan commented Dec 30, 2025

Description

Verify the converted safetensor checkpoint (GCS or local) matches the remote HuggingFace checkpoint reference.

  • Add the standalone script
  • Move the helper functions to utils.py

We may enable lazy loading in a follow-up to save memory.
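For readers unfamiliar with this kind of check, the comparison can be sketched roughly as below. This is a minimal illustration, not the script added in this PR: `compare_checkpoints`, `MismatchReport`, the `(shape, flat values)` tensor stand-in, and the tolerance defaults are all hypothetical names and choices made for the example.

```python
import math
from dataclasses import dataclass, field

# Hypothetical stand-in for a tensor: (shape, flat list of values).
Tensor = tuple[tuple[int, ...], list[float]]


@dataclass
class MismatchReport:
    """Collects every kind of disagreement between two checkpoints."""
    missing_keys: list[str] = field(default_factory=list)     # only in reference
    unexpected_keys: list[str] = field(default_factory=list)  # only in converted
    shape_mismatches: list[str] = field(default_factory=list)
    value_mismatches: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not (self.missing_keys or self.unexpected_keys
                    or self.shape_mismatches or self.value_mismatches)


def compare_checkpoints(converted: dict[str, Tensor],
                        reference: dict[str, Tensor],
                        rtol: float = 1e-5,
                        atol: float = 1e-8) -> MismatchReport:
    """Compare key sets first, then shapes, then values within a tolerance."""
    report = MismatchReport()
    report.missing_keys = sorted(reference.keys() - converted.keys())
    report.unexpected_keys = sorted(converted.keys() - reference.keys())
    for key in sorted(converted.keys() & reference.keys()):
        conv_shape, conv_vals = converted[key]
        ref_shape, ref_vals = reference[key]
        if conv_shape != ref_shape:
            report.shape_mismatches.append(key)
            continue  # values are not comparable when shapes differ
        if not all(math.isclose(a, b, rel_tol=rtol, abs_tol=atol)
                   for a, b in zip(conv_vals, ref_vals)):
            report.value_mismatches.append(key)
    return report
```

Checking keys before shapes and shapes before values keeps the report readable: a renamed parameter shows up once as a key mismatch rather than as a cascade of value errors.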

Tests

  • Successfully ran and loaded from GCS: test
  • Successfully ran and loaded from local: test
  • Mismatch report: key mismatch test

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov bot commented Dec 30, 2025

Codecov Report

❌ Patch coverage is 0% with 143 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...c/MaxText/utils/ckpt_conversion/compare_hf_ckpt.py | 0.00% | 129 Missing ⚠️ |
| src/MaxText/utils/ckpt_conversion/utils/utils.py | 0.00% | 11 Missing ⚠️ |
| src/MaxText/utils/ckpt_conversion/to_maxtext.py | 0.00% | 3 Missing ⚠️ |

📢 Thoughts on this report? Let us know!

🤖 Hi @RissyRan, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@github-actions github-actions bot left a comment

📋 Review Summary

This pull request introduces a valuable verification script to ensure the correctness of converted safetensor checkpoints. The script is well-structured and provides clear logging. The refactoring of helper functions into a shared utils.py file is a good improvement for code organization.

🔍 General Feedback

  • The new verification script is a great addition for improving the reliability of the checkpoint conversion process.
  • The parallel loading of safetensor files using ThreadPoolExecutor is a good choice for performance.
  • The detailed logging in the verification script is helpful for debugging potential mismatches.

One minor logging issue was found and commented on. Overall, this is a solid contribution.
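As an illustration of the parallel-loading pattern mentioned above, a minimal sketch might look like the following. It is not the code from this PR: `load_shards_parallel`, the injected `load_shard` callable, and the `max_workers` default are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def load_shards_parallel(shard_paths: list[str],
                         load_shard: Callable[[str], dict[str, Any]],
                         max_workers: int = 8) -> dict[str, Any]:
    """Load checkpoint shards concurrently and merge them into one dict.

    `load_shard` is a caller-supplied function (hypothetical here) that maps
    a shard path to a dict of parameter name -> tensor.
    """
    merged: dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so merging is deterministic
        # even though the loads themselves run concurrently.
        for path, tensors in zip(shard_paths, pool.map(load_shard, shard_paths)):
            duplicates = merged.keys() & tensors.keys()
            if duplicates:
                raise ValueError(f"{path} repeats keys: {sorted(duplicates)}")
            merged.update(tensors)
    return merged
```

Threads suit this workload because shard loading is typically I/O-bound (disk or network reads). In practice `load_shard` could wrap a real reader such as `safetensors.numpy.load_file`, though that pairing is an assumption for the sketch rather than what the script necessarily does.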

Collaborator

@shuningjin shuningjin left a comment

Thank you for adding this utility! Looks good overall.

@RissyRan RissyRan force-pushed the checkpoint_test branch 4 times, most recently from 20f43c9 to ad787ee Compare December 30, 2025 23:54
@RissyRan RissyRan assigned hengtaoguo and unassigned hengtaoguo and shuningjin Dec 31, 2025
Collaborator

@hengtaoguo hengtaoguo left a comment

Thanks! I also wonder if this is equivalent to the forward logits check, which also requires all layers identical to have matching results. It's great to have a double insurance though.

Collaborator Author

> Thanks! I also wonder if this is equivalent to the forward logits check, which also requires all layers identical to have matching results. It's great to have a double insurance though.

Thanks Hengtao! No, it's not the same. The forward logit test checks an Orbax checkpoint loaded into MaxText against HF. This test, however, is specifically for checking a safetensor checkpoint (converted by to_huggingface) against the reference HF checkpoint.

@copybara-service copybara-service bot merged commit bca71b4 into main Dec 31, 2025
28 of 29 checks passed
@copybara-service copybara-service bot deleted the checkpoint_test branch December 31, 2025 02:17