Commit Graph

5 Commits

Author SHA1 Message Date
xiaobo-yang
32719b5119 Fix bugs related to loss mask, meta info, and response length 2025-03-14 14:25:40 +08:00
1. Construct the loss mask immediately after obtaining the observation, to prevent token misalignment when the transformed text is encoded back into tokens.
2. Carry the meta info over so that the test batch can apply do_sample.
3. Stop recording info for the response length.
PeterGriffinJin
118c6e7361 fix reward bug 2025-03-13 19:18:56 +00:00
PeterGriffinJin
5fdfd52ac4 add logger 2025-03-01 04:28:07 +00:00
PeterGriffinJin
f0b4eef7d3 fix logging_utils bug 2025-03-01 04:08:28 +00:00
PeterGriffinJin
068516be64 Initial commit 2025-02-28 15:16:19 +00:00