Search-R1/experiment_log.md at f8ee208db139e430b78cb8b8bf1e37177418a8e0

tangger/Search-R1


PeterGriffinJin f8ee208db1 update readme and add exp logs

2025-04-10 18:45:35 +00:00


Experiment log

Preliminary results

Resources: wandb

The preliminary experiment is conducted only on the Natural Questions (NQ) dataset with PPO, using a small number of training steps.

v0.1

Resources: wandb, docs, scripts

We extend the experiments from NQ to seven datasets, using both PPO and GRPO. These runs still use a small number of training steps and a large learning rate warm-up ratio.
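To make the warm-up ratio concrete, here is a minimal sketch of how such a ratio could translate into a per-step learning rate. It assumes a linear warm-up followed by a constant rate; the function name `lr_at_step` and the schedule shape are illustrative assumptions, not taken from the training scripts.

```python
def lr_at_step(step, total_steps, base_lr, warmup_ratio):
    """Learning rate at a 0-indexed step.

    Assumption: linear warm-up over the first `warmup_ratio` fraction of
    training, then a constant `base_lr`. The real schedule may differ.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly so the last warm-up step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    return base_lr


if __name__ == "__main__":
    # With few total steps, a large ratio keeps most of the run below base_lr.
    for ratio in (0.25, 0.75):
        rates = [lr_at_step(s, 100, 1e-6, ratio) for s in range(100)]
        below = sum(1 for r in rates if r < 1e-6)
        print(f"warmup_ratio={ratio}: {below} of 100 steps below base_lr")
```

Under this assumed schedule, a large ratio combined with a short run means the policy spends most of its updates at a reduced learning rate.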

v0.2

Resources: wandb, docs, scripts

We fix several bugs, including retrieved token masking and GRPO sample indexing. The former substantially improves the stability of RL training. We then adjust the training scripts, increasing the number of training steps and decreasing the learning rate warm-up ratio, to obtain better performance, and conduct experiments on LLMs of different scales (3B, 7B, 14B).
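The idea behind retrieved token masking can be sketched as follows: tokens inserted by the retriever are not produced by the policy, so they are excluded when averaging the per-token loss. This is a simplified illustration on plain Python lists; the marker strings `<doc>` and `</doc>` and both function names are hypothetical, not the repository's actual implementation.

```python
def retrieved_mask(tokens, open_tag="<doc>", close_tag="</doc>"):
    """Flag tokens inside a retrieved span, markers included.

    The tag strings are hypothetical placeholders for whatever delimits
    retrieved passages in the rollout.
    """
    mask = []
    inside = False
    for tok in tokens:
        if tok == open_tag:
            inside = True
        mask.append(inside)
        if tok == close_tag:
            inside = False
    return mask


def masked_mean_loss(token_losses, is_retrieved):
    """Average the loss over policy-generated tokens only."""
    kept = [loss for loss, r in zip(token_losses, is_retrieved) if not r]
    if not kept:
        return 0.0  # nothing generated by the policy in this sequence
    return sum(kept) / len(kept)


if __name__ == "__main__":
    tokens = ["q", "<doc>", "d1", "d2", "</doc>", "a"]
    losses = [1.0, 9.0, 9.0, 9.0, 9.0, 3.0]
    mask = retrieved_mask(tokens)
    print(mask)
    print(masked_mean_loss(losses, mask))
```

Without the mask, the four retrieved positions in this toy sequence would dominate the average, even though the policy has no control over them.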

v0.3

Ongoing, stay tuned!
