Commit Graph

5 Commits

Author | SHA1 | Message | Date
Cadene | fdfb2010fd | black | 2024-02-18 01:24:19 +00:00
Cadene | a5c305a7a4 | offline training + online finetuning converge to 33 reward! | 2024-02-18 01:23:44 +00:00
Cadene | c202c2b3c2 | Online finetuning runs (sometimes crash because of nans) | 2024-02-16 15:13:24 +00:00
Cadene | 228c045674 | Eval reproduced! Train running (but not reproduced) | 2024-02-10 15:46:24 +00:00
Cadene | 5a5b190f70 | Add common, refactor eval with eval_policy | 2024-01-31 13:48:12 +00:00