Fix log_softmax fp16 underflow and logcumsumexp fp16 overflow on ANE via stable decomposition #2727
Open
Ashutosh0x wants to merge 1 commit into
…via stable decomposition

log_softmax: The naive log(softmax(x)) produces -inf for non-dominant classes in fp16 because softmax outputs underflow to 0, and log(0) = -inf. The stable form x - max(x) - log(sum(exp(x - max(x)))) avoids computing tiny intermediate probabilities directly.

logcumsumexp: The naive log(cumsum(exp(x))) overflows in fp16 for x > ~11.09, since exp(11.09) exceeds the fp16 max (65,504). The stable form shifts by the global maximum first so all exp() arguments are <= 0, keeping values in (0, 1].

Both fixes follow the same max-shift pattern used in the logsumexp stable decomposition (PR apple#2726) and the softplus stable decomposition (PR apple#2725).

Added regression tests with extreme fp16 inputs for both ops.
This was referenced May 29, 2026
Summary
Fix two fp16 numerical stability bugs in the PyTorch frontend converter that cause silent output corruption on the Apple Neural Engine (ANE):

- `log_softmax`: produces `-inf` for non-dominant classes when softmax probabilities underflow to 0 in fp16.
- `logcumsumexp`: overflows to `inf` for inputs > ~11.09 because `exp()` is applied to the raw input without stabilization.

Both fixes use the standard max-shift decomposition, the same pattern applied in PR #2725 (softplus/mish) and PR #2726 (logsumexp).
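The `log_softmax` failure can be reproduced off-device with a pure-Python sketch that rounds every intermediate to IEEE fp16 through the `struct` module's half-precision `e` format. It assumes the old lowering behaves like a max-shifted softmax whose fp16 output is fed to a separate log, and that sums accumulate in float64 before a single fp16 rounding; real ANE kernels may round differently. The helper names and the three-class input are hypothetical.

```python
import math
import struct


def fp16(v: float) -> float:
    """Round v to the nearest IEEE 754 half-precision value."""
    try:
        return struct.unpack("<e", struct.pack("<e", v))[0]
    except OverflowError:
        # struct rejects finite values beyond the fp16 range; IEEE
        # round-to-nearest would give +/-inf, so return that instead.
        return math.copysign(math.inf, v)


def f16_exp(v: float) -> float:
    try:
        return fp16(math.exp(v))
    except OverflowError:  # math.exp itself overflowed float64
        return math.inf


def f16_log(v: float) -> float:
    # IEEE log(0) is -inf; math.log raises ValueError instead.
    return -math.inf if v == 0.0 else fp16(math.log(v))


def naive_log_softmax_fp16(xs):
    # log(softmax(x)): softmax is max-shifted, but its probabilities are
    # stored in fp16 before the separate log is applied.
    m = max(xs)
    exps = [f16_exp(fp16(x - m)) for x in xs]
    total = fp16(sum(exps))  # summed in float64, rounded once to fp16
    return [f16_log(fp16(e / total)) for e in exps]


def stable_log_softmax_fp16(xs):
    # x - max(x) - log(sum(exp(x - max(x)))): no probability is ever stored.
    m = max(xs)
    shifted = [fp16(x - m) for x in xs]
    lse = f16_log(fp16(sum(f16_exp(s) for s in shifted)))
    return [fp16(s - lse) for s in shifted]


x = [50.0, 0.0, 0.0]  # one dominant class, in the spirit of the x=50 test
print(naive_log_softmax_fp16(x))   # [0.0, -inf, -inf]
print(stable_log_softmax_fp16(x))  # [0.0, -50.0, -50.0]
```

With one class at 50, the naive path stores probabilities of about exp(-50) ≈ 1.9e-22, far below the smallest fp16 subnormal, so they round to 0 and the log returns `-inf`. The stable path never materializes those probabilities and returns the exact log-probabilities.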
Changes
converters/mil/frontend/torch/ops.py

- `log_softmax` (line 5904): replace the naive `log(softmax(x))` with:
  `log_softmax(x) = x - max(x) - log(sum(exp(x - max(x))))`
  This avoids computing tiny intermediate softmax probabilities that underflow to 0 in fp16.
- `logcumsumexp` (line 2230): replace the naive `log(cumsum(exp(x)))` with:
  `logcumsumexp(x) = max(x) + log(cumsum(exp(x - max(x))))`
  Subtracting the global max first makes every `exp()` argument <= 0, keeping every exponential in (0, 1].

  Note on the global max: the global max is used rather than a running (cumulative) max because MIL does not provide a `cummax` op. The global max is >= the running max at every position, so `exp(x_i - global_max) <= 1` for all i, which rules out overflow. The trade-off is extra underflow at early positions when a much larger value appears later: in fp16, `exp(x_i - global_max)` rounds to 0 once `x_i - global_max` drops below about -17.3 (about -9.7 if subnormals are flushed to zero), so a prefix made up only of such values yields `log(0) = -inf` instead of its true value. Those terms are negligible relative to the final sum, but not relative to the early prefix sums. A future optimization could introduce a running max if a `cummax` MIL op becomes available.

converters/mil/frontend/torch/test/test_torch_ops.py

- `test_log_softmax`: standard parametrized test across all shapes/backends/frontends.
- `test_log_softmax_fp16_no_neg_inf`: regression test with a dominant-class input (x=50).
- `test_logcumsumexp_fp16_large_input`: regression test with fp16-overflow inputs (x up to 50).
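The global-max trade-off from the note on `logcumsumexp` can be seen with a similar sketch. It compares the naive lowering, the global-max decomposition, and a hypothetical running-max variant written as a per-element Python loop (not MIL ops; it rescales the accumulated sum whenever the running max grows). Every intermediate is rounded to fp16 via `struct`'s `e` format; the function names and test inputs are illustrative assumptions.

```python
import math
import struct


def fp16(v: float) -> float:
    """Round v to the nearest IEEE 754 half-precision value (inf on overflow)."""
    try:
        return struct.unpack("<e", struct.pack("<e", v))[0]
    except OverflowError:
        return math.copysign(math.inf, v)


def f16_exp(v: float) -> float:
    try:
        return fp16(math.exp(v))
    except OverflowError:
        return math.inf


def f16_log(v: float) -> float:
    return -math.inf if v == 0.0 else fp16(math.log(v))


def cumsum_fp16(vals):
    # Running sum with the accumulator stored in fp16 after every add.
    out, acc = [], 0.0
    for v in vals:
        acc = fp16(acc + v)
        out.append(acc)
    return out


def naive_logcumsumexp_fp16(xs):
    # log(cumsum(exp(x))): exp(x) is already inf for x > ~11.09.
    return [f16_log(c) for c in cumsum_fp16([f16_exp(x) for x in xs])]


def global_max_logcumsumexp_fp16(xs):
    # max(x) + log(cumsum(exp(x - max(x)))): every exp argument is <= 0.
    m = max(xs)
    sums = cumsum_fp16([f16_exp(fp16(x - m)) for x in xs])
    return [fp16(m + f16_log(c)) for c in sums]


def running_max_logcumsumexp_fp16(xs):
    # Hypothetical cummax-style variant: shift each prefix by its own max,
    # rescaling the accumulated sum whenever the running max grows.
    out, m, s = [], -math.inf, 0.0
    for x in xs:
        new_m = max(m, x)
        s = fp16(fp16(s * f16_exp(fp16(m - new_m))) + f16_exp(fp16(x - new_m)))
        m = new_m
        out.append(fp16(m + f16_log(s)))
    return out


x = [0.0, 0.0, 50.0]
print(naive_logcumsumexp_fp16(x))        # [0.0, 0.693359375, inf]
print(global_max_logcumsumexp_fp16(x))   # [-inf, -inf, 50.0]
print(running_max_logcumsumexp_fp16(x))  # [0.0, 0.693359375, 50.0]
```

On this input the naive form overflows at the last position, the global-max form removes the overflow but returns `-inf` for the first two positions because exp(-50) rounds to 0 in fp16, and the running-max loop recovers all three values. When every input lies within about 10 of the global max (for example `[40, 45, 50]`), the global-max form stays close to the float64 reference.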
Pattern
This is part of a systematic effort to fix every `exp()`-based operation in the converter that overflows in fp16 on ANE. The root cause is always the same: `exp(x)` without first bounding `x` overflows at the fp16 maximum (65,504, reached at x ~ 11.09). The fix is always the same: subtract the maximum before computing `exp()`, then add it back after `log()`.

| Op | Stable form |
|---|---|
| softplus (PR #2725) | `max(x, 0) + log(1 + exp(-abs(x)))` |
| logsumexp (PR #2726) | `max + log(sum(exp(x - max)))` |
| log_softmax (this PR) | `x - max - log(sum(exp(x - max)))` |
| logcumsumexp (this PR) | `max + log(cumsum(exp(x - max)))` |
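The softplus and logsumexp forms from PR #2725 and PR #2726 can be spot-checked the same way, along with the ~11.09 threshold itself (ln(65,504) ≈ 11.0899). This is a sketch under the same assumptions: every intermediate is rounded to fp16 via `struct`'s `e` format, sums accumulate in float64 before rounding, and all helper names are hypothetical.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE half-precision value


def fp16(v: float) -> float:
    """Round v to the nearest IEEE 754 half-precision value (inf on overflow)."""
    try:
        return struct.unpack("<e", struct.pack("<e", v))[0]
    except OverflowError:
        return math.copysign(math.inf, v)


def f16_exp(v: float) -> float:
    try:
        return fp16(math.exp(v))
    except OverflowError:
        return math.inf


def f16_log(v: float) -> float:
    return -math.inf if v == 0.0 else fp16(math.log(v))


def naive_softplus_fp16(x: float) -> float:
    # log(1 + exp(x)): exp(x) overflows for x > ~11.09.
    return f16_log(fp16(1.0 + f16_exp(x)))


def stable_softplus_fp16(x: float) -> float:
    # max(x, 0) + log(1 + exp(-abs(x))): the exp argument is always <= 0.
    return fp16(max(x, 0.0) + f16_log(fp16(1.0 + f16_exp(-abs(x)))))


def naive_logsumexp_fp16(xs) -> float:
    return f16_log(fp16(sum(f16_exp(x) for x in xs)))


def stable_logsumexp_fp16(xs) -> float:
    # max + log(sum(exp(x - max))).
    m = max(xs)
    return fp16(m + f16_log(fp16(sum(f16_exp(fp16(x - m)) for x in xs))))


print(round(math.log(FP16_MAX), 4))  # 11.0899
print(naive_softplus_fp16(12.0), stable_softplus_fp16(12.0))  # inf 12.0
print(naive_logsumexp_fp16([12.0, 12.0]), stable_logsumexp_fp16([12.0, 12.0]))  # inf 12.6953125
```

Both naive forms return `inf` as soon as a single input exceeds the threshold, while the shifted forms stay within fp16 rounding of the true values (12.000006 and 12 + ln 2 ≈ 12.6931).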