PeterGriffinJin
83d10313be
add action status
2025-03-19 22:19:27 +00:00
PeterGriffinJin
9ec2fa9892
fix grpo id bug
2025-03-19 18:59:19 +00:00
PeterGriffinJin
8c7f04ca45
response length include retrieval info
2025-03-19 00:36:21 +00:00
Bowen Jin
50cedb2c00
Merge pull request #21 from xiaobo-yang/yxb/fix-info-mask-bugs
Fix bugs related to loss mask, meta info, and response length
2025-03-18 19:33:50 -05:00
PeterGriffinJin
8501d1cdf7
add citation
2025-03-18 22:27:00 +00:00
PeterGriffinJin
4b3c09451a
fix kl loss issue
2025-03-18 20:07:47 +00:00
PeterGriffinJin
e85506f143
remove unnecessary code
2025-03-17 16:08:33 +00:00
xiaobo-yang
32719b5119
Fix bugs related to loss mask, meta info, and response length
1. Construct the loss mask immediately after obtaining the observation to prevent encoding misalignment when converting back to tokens after text transformation.
2. Follow up on meta info so that the test batch can apply do_sample.
3. Remove the recording of info for response length.
2025-03-14 14:25:40 +08:00
PeterGriffinJin
118c6e7361
fix reward bug
2025-03-13 19:18:56 +00:00
PeterGriffinJin
ff85cb7f1e
fix file name bug
2025-03-13 14:42:21 +00:00
PeterGriffinJin
7ffeeaba0f
modify readme
2025-03-13 14:00:41 +00:00
PeterGriffinJin
66cd336580
fix wandb link bug
2025-03-13 13:59:55 +00:00
PeterGriffinJin
fb9940972c
fix typo
2025-03-13 13:58:23 +00:00
PeterGriffinJin
584ce9deb5
add paper scripts
2025-03-13 13:57:47 +00:00
PeterGriffinJin
0ecaf6da76
add ckpt link
2025-03-12 15:33:00 +00:00
PeterGriffinJin
c4e0269cfc
add grpo script
2025-03-12 15:16:03 +00:00
PeterGriffinJin
1bd9cf1749
add citation
2025-03-04 19:42:53 +00:00
Bowen Jin
0840cd7d5e
Merge pull request #4 from eltociear/patch-1
docs: update README.md
2025-03-04 14:38:00 -05:00
Ikko Eltociear Ashimine
6972cb4449
docs: update README.md
seperately -> separately
2025-03-01 23:26:01 +09:00
PeterGriffinJin
f509c2c565
add gitignore
2025-03-01 04:31:54 +00:00
PeterGriffinJin
5fdfd52ac4
add logger
2025-03-01 04:28:07 +00:00
PeterGriffinJin
f0b4eef7d3
fix logging_utils bug
2025-03-01 04:08:28 +00:00
PeterGriffinJin
5880c6e03c
update readme
2025-02-28 20:53:31 +00:00
PeterGriffinJin
91a452c21a
add twitter thread
2025-02-28 18:41:57 +00:00
PeterGriffinJin
a8770bd014
update wandb link
2025-02-28 18:09:54 +00:00
PeterGriffinJin
068516be64
Initial commit
2025-02-28 15:16:19 +00:00