Major update #48

Open
lintangsutawika wants to merge 34 commits into main from major-update


@lintangsutawika
Collaborator

  1. Add more reward functions (including cosine rewards).
  2. Handle both step-wise and non-step-wise training.
  3. Add a non-thinking mode.
  4. Keep the maximum (rollout) length, but train on a shorter, adjustable training length, so that rollout length and training length are decoupled.
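The PR does not show how the cosine reward is defined. A minimal sketch, assuming it is the cosine length-scaled reward described in "Demystifying Long Chain-of-Thought Reasoning in LLMs" (Yeo et al., 2025): the reward is interpolated along a half cosine from a value at length 0 to a value at the maximum length, with separate endpoints for correct and wrong answers and a fixed penalty for hitting the length limit. The function name `cosine_scaled_reward` and all default values are hypothetical, not taken from this PR.

```python
import math


def cosine_scaled_reward(
    is_correct: bool,
    gen_len: int,
    max_len: int,
    r_correct: tuple = (2.0, 1.0),  # hypothetical (reward at len 0, reward at max_len) for correct answers
    r_wrong: tuple = (-10.0, 0.0),  # hypothetical endpoints for wrong answers
    r_exceed: float = -10.0,        # hypothetical penalty when the rollout hits max_len
) -> float:
    """Length-scaled reward via half-cosine interpolation (sketch, not this PR's code)."""
    if gen_len >= max_len:
        return r_exceed
    start, end = r_correct if is_correct else r_wrong
    progress = gen_len / max_len
    # progress=0 -> cos(0)=1 -> start; progress->1 -> cos(pi)=-1 -> end
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * progress))
```

With these endpoints, shorter correct answers earn more reward, while wrong answers are penalized less as they get longer, which nudges the policy to keep reasoning when it has not yet found the answer.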
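The PR does not describe how the training length is applied. One way to decouple rollout length from training length, sketched under the assumption that rewards are computed on the full rollout and each sequence is then simply capped at `train_len` tokens before the loss, with prompt tokens masked out. The function `truncate_for_training` and its parameters are hypothetical.

```python
from typing import List, Tuple


def truncate_for_training(
    prompt_ids: List[int], response_ids: List[int], train_len: int
) -> Tuple[List[int], List[int]]:
    """Cap prompt+response at train_len tokens (hypothetical helper).

    Returns (input_ids, loss_mask): prompt positions get mask 0,
    response positions get mask 1, so only kept response tokens
    contribute to the policy loss.
    """
    ids = (prompt_ids + response_ids)[:train_len]
    mask = ([0] * len(prompt_ids) + [1] * len(response_ids))[:train_len]
    return ids, mask


# Usage: a rollout generated under the full max length is scored first,
# then only its first train_len tokens are used for the gradient update.
ids, mask = truncate_for_training([1, 2, 3], [4, 5, 6, 7], train_len=5)
```

Because truncation happens after scoring, the rollout length limit can stay large while memory for the training step is bounded by `train_len`.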
