The Archer series focuses on research into RL algorithms and training for small- and medium-scale models, aiming to deepen the community's understanding of the fundamental principles of reinforcement learning (RL) on large language models (LLMs). All released content is fully open-sourced to advance community research.
Archer significantly improves reasoning performance over DAPO and outperforms previous 1.5B-scale SOTA reasoning models.
Archer is an open-source initiative enhancing reasoning in large language models through scalable, rule-governed reinforcement learning. We provide full-stack reproducibility including:
- Training code and pipelines
- Curated datasets
- Trained models
- Complete training logs
Current Models:
- Archer-Code-1.5B - SOTA among similarly-sized models.
We evaluate on both mathematical and coding benchmarks. Because reasoning models produce high-variance outputs, we report avg@K (pass@1 performance averaged over K outputs) and pass@K for each benchmark. Detailed results are shown in the table below.
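For reference, avg@K and pass@K can be computed from per-problem sample counts as sketched below. This uses the standard unbiased pass@k estimator; the sample counts in the example are illustrative, not results from our evaluation:

```python
from math import comb

def avg_at_k(num_samples: int, num_correct: int) -> float:
    """avg@K: pass@1 averaged over all K sampled outputs, i.e. the fraction correct."""
    return num_correct / num_samples

def pass_at_k(num_samples: int, num_correct: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    if num_samples - num_correct < k:
        # Every size-k subset contains at least one correct sample.
        return 1.0
    return 1.0 - comb(num_samples - num_correct, k) / comb(num_samples, k)

# Illustrative example: 16 samples drawn for one problem, 4 of them correct.
print(round(avg_at_k(16, 4), 3))      # 0.25
print(round(pass_at_k(16, 4, 8), 3))  # 0.962
```

Benchmark-level numbers are then the mean of these per-problem values.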
```bash
# Install a Python 3.10 environment.
conda create -n archer python=3.10 -y
conda activate archer

# Install dependencies.
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124
wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install --no-cache-dir flash_attn-2.7.3+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

cd ArcherCodeR
pip install -e .
```

Download the training and test data from Hugging Face:
```bash
python tools/download_datasets.py
```

We have provided a one-click script to initialize Ray environments on any number of machines. Run the following command on the head node:
```bash
bash ./tools/start_ray.sh
```

Note:
- Please replace `your_wandb_api_key` in `export WANDB_API_KEY=your_wandb_api_key` with your actual key.
- Hostfile locations vary across operating systems (e.g., on our machine it is located at `/etc/mpi/hostfile`). Locate the file on your server and update its contents accordingly.
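The exact hostfile format depends on your MPI distribution; an OpenMPI-style hostfile typically lists one node per line with its slot count, as in this illustrative example (hostnames and slot counts are placeholders, not values from our setup):

```
node-0 slots=8
node-1 slots=8
```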
We currently provide only the script and data needed to reproduce the results of Archer-Code-1.5B.
```bash
bash ./scripts/train/run_archer_qwen2.5_1.5b_code.sh
```

Run the following command to convert the model to Hugging Face format:
```bash
bash ./tools/model_merge.sh
```

Execute the script below to evaluate model performance on the LiveCodeBench v5 benchmark:
```bash
bash ./scripts/eval/run_eval.sh
```

Note: Please update the path parameters in the scripts above as needed.
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR
- We build our model upon DeepSeek-R1-Distill-Qwen-1.5B.
- Training was carried out with a modified version of verl.
Please cite the following:
```bibtex
@article{wang2025stabilizing,
  title={Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR},
  author={Wang, Jiakang and Liu, Runze and Zhang, Fuzheng and Li, Xiu and Zhou, Guorui},
  journal={arXiv preprint arXiv:2507.15778},
  year={2025}
}
```
