
ArcherCodeR is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement learning.


✨ Archer

🏹️ Reinforcement Learning for Enhanced Reasoning in LLMs 🎯

Github | Model | Data | Wandb | Zhihu

Overview

The Archer series focuses on RL algorithms and training for small and medium-scale models, aiming to deepen the community's understanding of the fundamental principles of reinforcement learning (RL) on large language models (LLMs). All released content is comprehensively open-sourced to advance research in the community.

Archer significantly improves reasoning performance over DAPO and outperforms previous 1.5B-level SOTA reasoning models.

Archer is an open-source initiative enhancing reasoning in large language models through scalable, rule-governed reinforcement learning. We provide full-stack reproducibility, including:

  • Training code and pipelines
  • Curated datasets
  • Trained models
  • Complete training logs

Current Models:

Evaluation

We conduct evaluation on both mathematical and coding benchmarks. Due to the high variance of the outputs from reasoning models, we report avg@K (pass@1 performance averaged over K outputs) and pass@K for each benchmark. The detailed results are shown in the table below.
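The two metrics above can be sketched in a few lines. The pass@K estimator below uses the standard unbiased combinatorial form (1 − C(n−c, k)/C(n, k) over n samples with c correct); the function names are illustrative and may differ from this repository's actual evaluation code:

```python
from math import comb

def avg_at_k(num_correct: int, k: int) -> float:
    """avg@k: pass@1 averaged over k sampled outputs,
    i.e. the fraction of the k outputs that are correct."""
    return num_correct / k

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of
    k outputs drawn (without replacement) from n samples, of which
    c are correct, solves the problem."""
    if n - c < k:
        # Every size-k subset must contain a correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 4 samples, 1 correct -> pass@2 = 1 - C(3,2)/C(4,2) = 0.5
print(pass_at_k(4, 1, 2))
```

Averaging these per-problem values across a benchmark yields the reported avg@K and pass@K numbers.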

Getting Started

Installation

# Create a Python 3.10 environment.
conda create -n archer python=3.10 -y
conda activate archer

# Install dependencies.
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124
wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install --no-cache-dir flash_attn-2.7.3+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

git clone https://github.com/wizard-III/ArcherCodeR.git
cd ArcherCodeR
pip install -e .

Data Preparation

Download the training and test data from Hugging Face.

python tools/download_datasets.py

Initialize Ray Cluster

We provide a one-click script to initialize a Ray cluster across any number of machines. Run the following command on the head node:

bash ./tools/start_ray.sh

Note:

  • Please replace your_wandb_api_key in export WANDB_API_KEY=your_wandb_api_key with your actual key.
  • Hostfile locations vary across operating systems (e.g., on our machines it is located at /etc/mpi/hostfile). Locate the file on your server and modify its contents accordingly.

Training

We currently provide only the script and data needed to reproduce the results of Archer-Code-1.5B.

bash ./scripts/train/run_archer_qwen2.5_1.5b_code.sh

Evaluation

Step 1: Convert model format

Run the following command to convert the model to Hugging Face format:

bash ./tools/model_merge.sh

Step 2: Run evaluation

Execute the script below to evaluate model performance on the LiveCodeBench v5 benchmark:

bash ./scripts/eval/run_eval.sh

Note: Please update the path parameters in the scripts above as needed.

Technical Report

Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR

Acknowledgements

Citation

Please cite the following:

@article{wang2025stabilizing,
  title={Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR},
  author={Wang, Jiakang and Liu, Runze and Zhang, Fuzheng and Li, Xiu and Zhou, Guorui},
  journal={arXiv preprint arXiv:2507.15778},
  year={2025}
}
