
Collaborator

@jayhenry jayhenry commented Jan 23, 2026

Summary

This PR refactors the loss subsystem to standardize how loss contexts are constructed, how sequence parallelism (SP) is handled, and how batches are built.
The Trainer and Worker now interact only with LossConfig and LossContext objects.
The PR introduces unified builders across losses and removes the legacy input-item types.

Highlights

  • Introduces BaseLossConfig.build and BaseLossContext.build_batches. Removes *ContextInputItem; BaseLossKwargs now includes built-in to() and sp_split() instead.
  • CE loss:
    • Adds CELossConfig.build and CELossContext.build_batches.
  • RL loss:
    • Adds BaseRLLossConfig, BaseRLLossKwargs, BaseRLLossContext.
    • Refactors GRPO/Oreal to the new API.
    • BaseRLLossContext internalizes rollout importance-sampling handling.
  • Worker/Trainer flow:
    • Workers and the trainer now interact only with LossConfig and LossContext directly.
    • Batching is done via LossContext.build_batches.
    • compute_actor_logprobs and compute_ref_logprobs return List[torch.Tensor] for clearer data flow.
    • SP handling is centralized within builders for consistency.
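
The flow described in the highlights can be pictured with a small, self-contained sketch. It is not the code from this PR: the Toy* classes, their fields, the sp_rank/sp_size parameters, and the weighting scheme are all hypothetical stand-ins, and plain Python lists replace tensors. The sketch only illustrates the shape of the pattern, assuming a config that builds a context, kwargs that know how to split themselves for SP, and a class-level build_batches that finalizes a list of contexts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyLossKwargs:
    """Hypothetical stand-in for BaseLossKwargs (lists instead of tensors)."""
    shifted_labels: List[int]
    loss_weight: Optional[List[float]] = None

    def sp_split(self, sp_rank: int, sp_size: int) -> "ToyLossKwargs":
        # Keep only the contiguous chunk of the sequence owned by this SP rank.
        n = len(self.shifted_labels)
        if n % sp_size != 0:
            raise ValueError("sequence length must be divisible by sp_size")
        chunk = n // sp_size
        lo, hi = sp_rank * chunk, (sp_rank + 1) * chunk
        weight = None if self.loss_weight is None else self.loss_weight[lo:hi]
        return ToyLossKwargs(self.shifted_labels[lo:hi], weight)


@dataclass
class ToyLossContext:
    """Hypothetical stand-in for a LossContext."""
    loss_cfg: "ToyLossConfig"
    loss_kwargs: ToyLossKwargs
    num_valid_tokens: int  # counted on the full sequence, before any SP split

    @classmethod
    def build_batches(cls, contexts: List["ToyLossContext"]) -> List["ToyLossContext"]:
        # Assumed behavior: normalize by the valid-token count of the whole batch,
        # so every micro-batch shares the same denominator.
        total = max(sum(c.num_valid_tokens for c in contexts), 1)
        for c in contexts:
            c.loss_kwargs.loss_weight = [
                0.0 if label == c.loss_cfg.ignore_idx else 1.0 / total
                for label in c.loss_kwargs.shifted_labels
            ]
        return contexts


@dataclass
class ToyLossConfig:
    """Hypothetical stand-in for a LossConfig; build() is the single entry point."""
    ignore_idx: int = -100

    def build(self, shifted_labels: List[int], sp_rank: int = 0, sp_size: int = 1) -> ToyLossContext:
        num_valid = sum(label != self.ignore_idx for label in shifted_labels)
        kwargs = ToyLossKwargs(list(shifted_labels))
        if sp_size > 1:
            # SP handling lives inside the builder, not in the caller.
            kwargs = kwargs.sp_split(sp_rank, sp_size)
        return ToyLossContext(self, kwargs, num_valid)


# A caller (trainer/worker in the PR's terms) only touches the config and the context.
cfg = ToyLossConfig()
ctxs = [
    cfg.build([5, -100, 7, 8], sp_rank=0, sp_size=2),
    cfg.build([1, 2, -100, -100], sp_rank=0, sp_size=2),
]
ctxs = ToyLossContext.build_batches(ctxs)
```

In this toy version the caller never slices sequences or computes normalizers itself, which is the property the highlights attribute to the refactor; the real classes may divide these responsibilities differently.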

Compatibility

  • Public training CLI and typical user workflows remain unchanged; only internal wiring and APIs between workers/trainers and loss configs are updated.

Testing

  • Updated unit and integration tests across engines/models/losses to the new API.
  • Verified identical reward curves and correct SP behavior for RL.
[Attached image]

Collaborator

HIT-cwh commented Jan 26, 2026

def _train_one_step_sft(self, data_batch):

_train_one_step_sft also needs to be updated accordingly here.

Collaborator

HIT-cwh commented Jan 26, 2026

Does this mean classes like CELossContextInputItem are no longer needed?

@jayhenry jayhenry force-pushed the refactor_loss_ctx branch 2 times, most recently from 0fcd257 to 3f97f80 Compare January 27, 2026 14:19
Collaborator Author

> Does this mean classes like CELossContextInputItem are no longer needed?

Yes
