During optimizer initialization, the weight_decay parameter is passed where lr (the learning rate) is expected.
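To illustrate the kind of mix-up being reported, here is a minimal sketch. It does not use the project's actual code, which is not shown here. `OptimizerStub`, `build_optimizer_buggy`, and `build_optimizer_fixed` are hypothetical stand-ins. With a real library such as PyTorch's `torch.optim` optimizers, `lr` and `weight_decay` are likewise separate keyword arguments, so each value needs to go to its own parameter.

```python
from dataclasses import dataclass


@dataclass
class OptimizerStub:
    """Hypothetical stand-in for an optimizer (e.g. something like torch.optim.SGD)."""
    params: list
    lr: float
    weight_decay: float = 0.0


def build_optimizer_buggy(params, lr, weight_decay):
    # Assumed bug pattern: the weight-decay value lands in the lr slot,
    # so the optimizer trains with the wrong learning rate.
    return OptimizerStub(params, lr=weight_decay)


def build_optimizer_fixed(params, lr, weight_decay):
    # Each hyperparameter is passed to its matching keyword argument.
    return OptimizerStub(params, lr=lr, weight_decay=weight_decay)


params = [0.0]
bad = build_optimizer_buggy(params, lr=1e-3, weight_decay=1e-4)
good = build_optimizer_fixed(params, lr=1e-3, weight_decay=1e-4)
print(bad.lr)   # 0.0001 (weight decay used as the learning rate)
print(good.lr)  # 0.001
```

Passing hyperparameters by keyword rather than by position makes this kind of swap easier to spot in review.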
Thanks for the very helpful project.