Reproduce the paper results
Download the dataset
huggingface-cli download --repo-type dataset PeterJinGo/nq_hotpotqa_train --local-dir $WORK_DIR/data/hotpot_qa
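A minimal sketch of the download step wrapped in a guard. It assumes you export WORK_DIR yourself, because this section does not say where it is defined. The helper names dataset_dir and download_dataset are hypothetical. The huggingface-cli call is the one given above, with the target directory quoted and created first.

```bash
# Hypothetical helper: print the dataset directory. It fails with a message
# if WORK_DIR is unset or empty (assumption: you export WORK_DIR yourself).
dataset_dir() {
  printf '%s/data/hotpot_qa\n' "${WORK_DIR:?set WORK_DIR to your working directory first}"
}

# Hypothetical helper: create the directory, then run the documented download.
download_dataset() {
  local dir
  dir="$(dataset_dir)" || return 1
  mkdir -p "$dir"
  huggingface-cli download --repo-type dataset PeterJinGo/nq_hotpotqa_train \
    --local-dir "$dir"
}
```

Usage: export WORK_DIR=/path/to/workdir, then call download_dataset. Failing early on an unset WORK_DIR prevents the data from silently landing in /data/hotpot_qa at the filesystem root.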
Run PPO training
bash train_ppo.sh
Run GRPO training
bash train_grpo.sh
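Training runs are long, so it can help to keep their console output. The following is a minimal sketch with a hypothetical run_training helper and an assumed logs/ directory, neither of which is part of the repository. It checks that the script exists in the current directory and tees its output to a log file. pipefail is set inside a subshell, so the function returns the training script's exit status instead of tee's, without changing the caller's shell options.

```bash
# Hypothetical helper: run a training script from the repository root and
# save a copy of its output to logs/<script name>.log (the path is an assumption).
run_training() {
  local script="$1"
  if [[ ! -f "$script" ]]; then
    echo "error: $script not found; run this from the repository root" >&2
    return 1
  fi
  mkdir -p logs
  local log="logs/${script%.sh}.log"
  # The subshell keeps pipefail local, so the pipeline reports the script's status.
  ( set -o pipefail; bash "$script" 2>&1 | tee "$log" )
}
```

Usage: run_training train_ppo.sh, and likewise for the GRPO script.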
Run evaluation
bash evaluate.sh
Set $BASE_MODEL to the path of the model you want to evaluate.