Change epsilon value to 1e-8 for optimizers & adamwfp8 and adamwbf16 …#703

Closed
NBSTpeterhill wants to merge 2 commits into ostris:main from NBSTpeterhill:PAseer-patch-2-Optimizers-Adamwfp8-and-Adamwbf16-&-ep=1e-8

Conversation

@NBSTpeterhill
…optimizer addition

Updated epsilon value for various optimizers to improve numerical stability.

@jaretburkett
Contributor

The epsilon value is set to 1e-6 rather than 1e-8 because of BF16, which most modern models use during training.

BF16 has only about 3 decimal digits of precision (8 exponent bits, 7 mantissa bits), so an epsilon of 1e-8 is effectively rounded away in BF16 arithmetic. That makes 1e-6 necessary for numerical stability and for preventing division by zero.
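
The precision argument can be checked with a small pure-Python sketch. It is not code from this PR: `to_bf16` and `bf16_denominator` are hypothetical helpers that emulate BF16 rounding by truncating the float32 bit pattern with round-to-nearest-even, and the denominator is assumed to have the usual Adam form `sqrt(v) + eps` with every intermediate rounded to BF16. Because BF16 keeps the 8-bit exponent of FP32, 1e-8 on its own is still representable. What the sketch shows is absorption: an epsilon smaller than about half the BF16 spacing around `sqrt(v)` disappears when the sum is rounded.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even)."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 is the top 16 bits of float32. Adding this bias before
    # dropping the low 16 bits rounds to nearest, ties to the even value.
    bias = 0x7FFF + ((bits >> 16) & 1)
    rounded = ((bits + bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", rounded & 0xFFFFFFFF))[0]


def bf16_root(v: float) -> float:
    """sqrt(v) with input and output rounded to bfloat16."""
    return to_bf16(math.sqrt(to_bf16(v)))


def bf16_denominator(v: float, eps: float) -> float:
    """Assumed Adam-style denominator sqrt(v) + eps, rounded to bfloat16."""
    return to_bf16(bf16_root(v) + to_bf16(eps))


if __name__ == "__main__":
    # Illustrative second-moment values only; none of them come from the PR.
    for v in (1e-12, 1e-8, 1e-6):
        for eps in (1e-8, 1e-6):
            root = bf16_root(v)
            denom = bf16_denominator(v, eps)
            print(f"v={v:g} eps={eps:g} sqrt(v)={root:g} "
                  f"denom={denom:g} eps_absorbed={denom == root}")
```

For the illustrative case `v = 1e-8` (so `sqrt(v)` is about 1e-4), an epsilon of 1e-8 leaves the denominator unchanged while 1e-6 still shifts it. For larger `sqrt(v)` both values are absorbed, so the choice of epsilon matters mainly when the second-moment estimate is small.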
