Switching from SGD to AdamW: Significant mIoU boost but concerns about Deformable Offsets stability #270

Description

@lxll-0904

Hello @HuguesTHOMAS,

First of all, thank you for this excellent open-source work. It has been a solid baseline for my research.

I have a question regarding the choice of optimizer and its impact on the Deformable KPConv module.

Observation: I noticed that the original implementation uses SGD with momentum, and specifically applies a lower learning rate to the deformable offsets (controlled by config.deform_lr_factor).
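
For context, my understanding of the baseline setup is roughly the following (just a sketch from my reading of the PyTorch training code, not a verbatim copy; the `'offset'` name filter and the `config.learning_rate` / `config.momentum` / `config.weight_decay` field names are my paraphrase):

```python
import torch

def build_sgd_optimizer(net, config):
    # Split parameters so the deformable offsets get a reduced learning rate.
    deform_params = [v for k, v in net.named_parameters() if 'offset' in k]
    other_params = [v for k, v in net.named_parameters() if 'offset' not in k]
    deform_lr = config.learning_rate * config.deform_lr_factor  # e.g. 0.1x the base LR

    return torch.optim.SGD(
        [{'params': other_params},
         {'params': deform_params, 'lr': deform_lr}],
        lr=config.learning_rate,
        momentum=config.momentum,
        weight_decay=config.weight_decay)
```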

Experiment: I recently attempted to train the model on my custom dataset using the AdamW optimizer instead of SGD. In my implementation, I applied the same learning rate to all parameters (effectively removing the specific deform_lr_factor constraint).
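
Concretely, my change amounts to something like this (again a sketch with the same hypothetical config names; all parameters go into a single group, so deform_lr_factor no longer has any effect):

```python
import torch

def build_adamw_optimizer(net, config):
    # Single parameter group: offsets and regular weights share one learning rate.
    return torch.optim.AdamW(
        net.parameters(),
        lr=config.learning_rate,
        weight_decay=config.weight_decay)
```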

Result: To my surprise, the model converged much faster, and the final mIoU increased by roughly 10 absolute points compared to the baseline SGD training on the same dataset.

Questions:

Is this significant performance gap expected? It seems SGD might be harder to tune for certain datasets, whereas AdamW adapts better.

The main concern: Does switching to AdamW (and removing the explicit low LR constraint for offsets) pose any theoretical risks to the Deformable KPConv mechanism?

For example, could this cause the kernel points to "move too fast" or become unstable (losing their geometric meaning), even if the validation mIoU looks good?

I would appreciate any insights on whether this modification is considered safe or if there are potential downsides I should be aware of.
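
In particular, if the lower learning rate on the offsets is still important, would keeping the two parameter groups but switching the optimizer to AdamW be a reasonable compromise? Something along these lines (again only a sketch, reusing the hypothetical names from above):

```python
import torch

def build_adamw_optimizer_with_offset_lr(net, config):
    # Keep the deform_lr_factor behaviour, but use AdamW instead of SGD.
    deform_params = [v for k, v in net.named_parameters() if 'offset' in k]
    other_params = [v for k, v in net.named_parameters() if 'offset' not in k]

    return torch.optim.AdamW(
        [{'params': other_params},
         {'params': deform_params,
          'lr': config.learning_rate * config.deform_lr_factor}],
        lr=config.learning_rate,
        weight_decay=config.weight_decay)
```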

Thanks again!
