
Conversation

@shuningjin (Collaborator) commented Dec 30, 2025

Description

Main change
Onboard a test for deepseek2-16b; plan to run the full workflow in XLML on v5p-8.

  • end_to_end/tpu/deepseek/v2-16b/test_deepseek.sh: ckpt conversion, logit check, train, finetune, decode

Onboard a test for deepseek3-671b; plan to run step 2 in XLML on v5p-128.

  • end_to_end/tpu/deepseek/v3-671b/1_test_deepseek.sh
  • end_to_end/tpu/deepseek/v3-671b/2_test_deepseek.sh: logit check (scanned), decode (unscanned). Training and finetuning are left out here, as they are covered by the ubench nightly test.

XLML PR: GoogleCloudPlatform/ml-auto-solutions#1112

FIXES: b/423057893

Other change

  • Minor formatting changes to other test scripts.
  • max_logging.log behavior changed with PR#2873; the absl logging level now has to be set explicitly in several scripts so that their logs remain visible (see the sketch below).
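
A minimal sketch of that logging setup, assuming max_logging routes messages through absl.logging after PR#2873 (illustrative only, not the exact diff in this PR):

from absl import logging as absl_logging

# Raise absl's verbosity to INFO so that max_logging.log output stays visible.
absl_logging.set_verbosity(absl_logging.INFO)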

Tests

Preparation

  • Checkpoints uploaded:
    • gs://maxtext-deepseek/deepseek2-16b (hf-bf16)
    • gs://maxtext-deepseek/deepseek3-671b (hf-bf16, scanned, unscanned)
  • Golden logits uploaded:
    • gs://maxtext-test-assets/golden_data_deepseek2-16b.jsonl
    • gs://maxtext-test-assets/golden_data_deepseek3-671b.jsonl

bash end_to_end/tpu/deepseek/v3-671b/2_test_deepseek.sh

bash end_to_end/tpu/deepseek/v2-16b/test_deepseek.sh

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov bot commented Dec 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

@shuningjin force-pushed the shuningjin-xlml-ds branch 4 times, most recently from dbed671 to 4f5a1c4 on December 30, 2025 at 17:54
@github-actions bot commented:

🤖 Hi @shuningjin, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@github-actions bot left a comment:

📋 Review Summary

This Pull Request introduces new test scripts for DeepSeek models (v2-16b and v3-671b) and refactors existing test scripts for GPT-OSS, Llama4, and Mixtral. The changes aim to onboard DeepSeek models, improve consistency in script practices, and adjust logging verbosity.

🔍 General Feedback

  • The refactoring of test scripts into modular 1_test_deepseek.sh and 2_test_deepseek.sh for DeepSeek v3 is a positive step for organization and clarity.
  • Consistent application of absl.logging.set_verbosity(absl.logging.INFO) across relevant Python files is good for debugging and monitoring.
  • The general updates to how BASE_OUTPUT_PATH is handled in the shell scripts improve robustness.
  • Consider reviewing the commented-out training and fine-tuning steps in the DeepSeek test scripts to ensure they are intentionally disabled and to add explanatory comments if needed.
  • Ensure all golden logit paths are robust and not dependent on transient local paths or specific dates.

@RissyRan (Collaborator) left a comment:

Great work! Also thanks for keeping them up-to-date! A few minor comments

# Run fine-tuning - megablox implementation
python3 -m MaxText.train "${MAXTEXT_PKG_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/MaxText}/"configs/base.yml base_output_directory=${BASE_OUTPUT_PATH} run_name=megablox_fine_tuning model_name=${MODEL_NAME} tokenizer_type=huggingface tokenizer_path=${TOKENIZER_PATH} dataset_path=${DATASET_PATH} enable_checkpointing=true async_checkpointing=false load_parameters_path=${SCANNED_CKPT_PATH} scan_layers=True attention=flash sparse_matmul=True megablox=True dtype=bfloat16 weight_dtype=bfloat16 per_device_batch_size=4 steps=5 max_target_length=1024 ici_fsdp_parallelism=1 ici_expert_parallelism=4 checkpoint_storage_concurrent_gb=1024

# Run supervised fine-tuning - megablox implementation
@RissyRan (Collaborator) commented:

Let's remove it if it's just a comment?

@shuningjin (Collaborator, Author) replied Dec 30, 2025:

Removed this unused command for SFT.

# Run pre-training - megablox implementation
python3 -m MaxText.train "${MAXTEXT_PKG_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/MaxText}/"configs/base.yml base_output_directory=${BASE_OUTPUT_PATH} run_name=megablox_pre_training model_name=${MODEL_NAME} tokenizer_type=huggingface tokenizer_path=${TOKENIZER_PATH} dataset_type=synthetic enable_checkpointing=false attention=flash sparse_matmul=True megablox=True dtype=bfloat16 weight_dtype=bfloat16 per_device_batch_size=4 steps=5 max_target_length=1024 ici_fsdp_parallelism=4

# Run fine-tuning - megablox implementation
@RissyRan (Collaborator) commented:

Let's update megablox to tokamax gmm. We may deprecate megablox sometime later, and most workloads are using tokamax gmm now.

@shuningjin (Collaborator, Author) replied Dec 30, 2025:

Thanks for clarifying the MoE strategies! I updated to tokamax_gmm for training and decoding for DeepSeek.


python3 -m tests.forward_pass_logit_checker ${MAXTEXT_PKG_DIR}/configs/base.yml base_output_directory=${BASE_OUTPUT_PATH} run_name=forward_logits_check load_parameters_path=${SCANNED_CKPT_PATH} scan_layers=true attention=dot_product per_device_batch_size=1 model_name=${MODEL_NAME} max_prefill_predict_length=4 max_target_length=4 async_checkpointing=false sparse_matmul=false ici_fsdp_parallelism=1 ici_expert_parallelism=-1 checkpoint_storage_concurrent_gb=1024 weight_dtype=float32 dtype=float32 activations_in_float32=true matmul_precision=highest float32_logits=true float32_qk_product=true --golden_logits_path=${GOLDEN_LOGITS_DISK_LOCATION} --atol=1.5 --rtol=1.5 --max_kl_div=0.1

# Run pre-training - megablox implementation
@RissyRan (Collaborator) commented:

Similar comment as above.

@shuningjin (Collaborator, Author) replied Dec 30, 2025:

I assume you mean remove (rather than update with tokamax)? Removed train/finetune/SFT.

idx=$(date +%Y-%m-%d)

# By default, we'll use "llama4-17b-16e"
if [ -z "${MODEL_VARIATION}" ]; then
@RissyRan (Collaborator) commented:

Do we need to set this from XLML? Or could we directly export MODEL_VARIATION="llama4-17b-16e" without the if/else?

@shuningjin (Collaborator, Author) replied:

For other models, we have separate test scripts for each variant. For llama4, this serves as the unified script. Do we need to account for Maverick as well?

@RissyRan (Collaborator) replied:

I see. I think we should do something similar to the GPT-OSS models: pass the variant type instead of duplicating the scripts. No need to do it in this PR.


# Test whether the forward pass logits match the golden logits - matmul implementation
python3 -m tests.forward_pass_logit_checker "${MAXTEXT_PKG_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/MaxText}/"configs/base.yml base_output_directory=${BASE_OUTPUT_PATH} load_parameters_path=${UNSCANNED_CKPT_PATH} run_name=matmul_forward_pass_test per_device_batch_size=4 model_name=mixtral-8x7b tokenizer_path="${MAXTEXT_ASSETS_ROOT:-${MAXTEXT_PKG_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/MaxText/assets}}"/tokenizer.mistral-v1 ici_tensor_parallelism=1 ici_fsdp_parallelism=-1 max_prefill_predict_length=11 max_target_length=11 dataset_type=synthetic dtype=float32 megablox=False sparse_matmul=False scan_layers=false --atol=3 --rtol=1 --token_size=4
python3 -m tests.forward_pass_logit_checker "${MAXTEXT_PKG_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/MaxText}/"configs/base.yml base_output_directory=${BASE_OUTPUT_PATH} load_parameters_path=${UNSCANNED_CKPT_PATH} run_name=matmul_forward_pass_test per_device_batch_size=1 model_name=mixtral-8x7b tokenizer_path="${MAXTEXT_ASSETS_ROOT:-${MAXTEXT_PKG_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/MaxText/assets}}"/tokenizer.mistral-v1 ici_tensor_parallelism=1 ici_fsdp_parallelism=-1 max_prefill_predict_length=11 max_target_length=11 dataset_type=synthetic dtype=float32 megablox=False sparse_matmul=False scan_layers=false --token_size=4 --atol=3 --rtol=1 --max_kl_div=2e-3
@RissyRan (Collaborator) commented:

Do you know if --atol=3 --rtol=1 is still needed?

@shuningjin (Collaborator, Author) replied:

Removed --atol=3 --rtol=1; instead, the check uses the criterion --max_kl_div=2e-3.
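
For reference, a minimal sketch of what a KL-divergence criterion like --max_kl_div checks, assuming golden and test logits of shape [tokens, vocab] (illustrative code only, not the actual forward_pass_logit_checker implementation):

import jax.numpy as jnp
from jax.nn import log_softmax

def max_kl_divergence(golden_logits, test_logits):
  # Per-token KL(golden || test) over the vocabulary axis, via log-softmax for stability.
  golden_logp = log_softmax(golden_logits, axis=-1)
  test_logp = log_softmax(test_logits, axis=-1)
  kl = jnp.sum(jnp.exp(golden_logp) * (golden_logp - test_logp), axis=-1)
  return float(jnp.max(kl))

# The comparison passes when, e.g., max_kl_divergence(golden, actual) <= 2e-3.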

@RissyRan (Collaborator) left a comment:

Thanks for the updates!
