Align terminal reward with the last trainable token and add ALFWorld Evaluation #31

Merged
zhusq20 merged 1 commit into open-tinker:main from Xuyan923r:fix/reward-mask-alignment
Mar 1, 2026