During optimizer initialization, the weight_decay parameter is used where the lr parameter should be. Thanks for the very helpful project.
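
To illustrate the kind of mix-up described above, here is a minimal sketch. It is not taken from this project's code: `SketchOptimizer`, `build_optimizer_buggy`, and `build_optimizer_fixed` are hypothetical names, and the stand-in class only stores hyperparameters so the example runs without any framework. The actual call site may look different.

```python
from dataclasses import dataclass


@dataclass
class SketchOptimizer:
    """Hypothetical stand-in for a real optimizer; it only records hyperparameters."""
    lr: float
    weight_decay: float = 0.0


def build_optimizer_buggy(lr: float, weight_decay: float) -> SketchOptimizer:
    # Assumed shape of the bug: weight_decay ends up in the lr slot,
    # so the configured learning rate is silently ignored.
    return SketchOptimizer(lr=weight_decay, weight_decay=weight_decay)


def build_optimizer_fixed(lr: float, weight_decay: float) -> SketchOptimizer:
    # Each hyperparameter is passed by keyword to its own argument.
    return SketchOptimizer(lr=lr, weight_decay=weight_decay)


if __name__ == "__main__":
    print(build_optimizer_buggy(lr=0.1, weight_decay=0.0001))
    print(build_optimizer_fixed(lr=0.1, weight_decay=0.0001))
```

Passing both values by keyword, as in the fixed version, makes this kind of swap easier to spot in review.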