## Reproduce the paper results

### Download the dataset

```bash
huggingface-cli download --repo-type dataset PeterJinGo/nq_hotpotqa_train --local-dir $WORK_DIR/data/nq_hotpotqa_train
```

### Run PPO training
```bash
bash train_ppo.sh
```


### Run GRPO training
```bash
bash train_ppo.sh
```

### Run evaluation
```bash
bash evaluate.sh
```

You can change ```$BASE_MODEL``` to the path of the model you would like to evaluate.