
Conversation

@kyleliang919

@kyleliang919 kyleliang919 commented Jun 20, 2025

Cautious NAdamW (JAX)

Submission Information

submission_name: "Cautious_NAdamW"
submission_folder: "submissions/external_tuning/cautious_nadamw"
authors: "Kaizhao Liang"
affiliations: "University of Texas at Austin"
version: "1.0"
ruleset: "self-tuning"
framework: "JAX"
description: "Cautious NAdamW ([Liang, 2025](https://arxiv.org/abs/2411.16085))."
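For context, the cautious rule from the referenced paper masks out coordinates where the base optimizer's proposed update disagrees in sign with the current gradient, then rescales to preserve the mean step size. A minimal NumPy sketch of that masking step (function name and `eps` value are illustrative, not taken from the submission code):

```python
import numpy as np

def cautious_mask(update, grad, eps=1e-8):
    """Zero out coordinates where the optimizer update and the
    gradient disagree in sign, rescaling so the mean step size is
    preserved (the one-line trick described in arXiv:2411.16085)."""
    mask = (update * grad > 0).astype(update.dtype)
    # Dividing by the mask's mean compensates for the zeroed entries.
    return update * mask / (mask.mean() + eps)

# Example: the second coordinate disagrees in sign and is zeroed,
# while the surviving coordinates are scaled up by 1 / mask.mean().
u = np.array([0.5, -0.2, 0.1])
g = np.array([1.0, 0.3, 2.0])
print(cautious_mask(u, g))  # → approximately [0.75, 0.0, 0.15]
```

In a full Cautious NAdamW step this masking would be applied to the NAdamW update before the weight-decay and learning-rate application; the exact placement here is a sketch, not the submission's implementation.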

Evidence for the Submission's Performance

Paper:
https://huggingface.co/papers/2411.16085
Results on RL: https://x.com/KyleLiang5/status/1931344549302927444

Independent verification:
https://huggingface.co/rwightman/timm-optim-caution
https://x.com/_clashluke/status/1935961388553290108

Comments

Fingers crossed

@kyleliang919 kyleliang919 requested a review from a team as a code owner June 20, 2025 22:35
@github-actions

github-actions bot commented Jun 20, 2025

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@kyleliang919
Author

recheck


@fsschneider
Contributor

fsschneider commented Jun 25, 2025

Hi!

Thanks for your submission. We are very interested in benchmarking Cautious optimizers.
Since we are currently focusing our efforts on strengthening the self-tuning leaderboard, would you be interested in also submitting a self-tuning version of Cautious NAdamW? We do have a self-tuning NAdamW baseline you could use as a starting point.

@kyleliang919
Author

recheck

@kyleliang919
Author

kyleliang919 commented Jun 25, 2025

> Hi!
>
> Thanks for your submission. We are very interested in benchmarking Cautious optimizers. Since we are currently focusing our efforts on strengthening the self-tuning leaderboard, would you be interested in also submitting a self-tuning version of Cautious NAdamW? We do have a self-tuning NAdamW baseline you could use as a starting point.

I just copied the implementation over to self-tuning.

@priyakasimbeg
Contributor

@kyleliang919 we have just released v0.6.0 of the AlgoPerf benchmark, which includes moving away from pmap to jit sharding for JAX workloads. We have temporarily halted scoring new submissions so that all new submissions can be scored on >= v0.6.0.
Could you please update your submission to use jit sharding?

@kyleliang919
Author

@priyakasimbeg it looks like the baseline example is still the same as before. Is there a good example of the new approach? I am not really familiar with JAX, so I will probably need some help here.
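For reference, the jit-sharding style mentioned above can be sketched with the public `jax.sharding` API. This is a minimal single-host illustration, not the AlgoPerf v0.6.0 template: data is placed on a device mesh with an explicit sharding, and a single `jax.jit`-compiled function runs across all devices, replacing the older per-device `pmap` pattern.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over whatever devices are available
# (a single device on CPU; on a multi-chip host this shards the batch axis).
mesh = Mesh(np.array(jax.devices()), axis_names=("batch",))
sharding = NamedSharding(mesh, P("batch"))

# Place the batch on the mesh; jit then compiles one program for the
# whole mesh instead of the per-device replicas that pmap produced.
batch = jax.device_put(jnp.arange(8.0), sharding)

@jax.jit
def scale(x):
    return 2.0 * x

out = scale(batch)
```

The actual AlgoPerf baselines would shard real model state and batches this way; the mesh axis name and the toy `scale` function here are placeholders.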

