Fix softplus and mish fp16 overflow on ANE via stable decomposition#2725
Open
Ashutosh0x wants to merge 1 commit into
…pple#2687)
The native softplus MIL op computes log(1 + exp(x)), where exp(x) overflows in fp16 for x > ~10.4 on Apple Neural Engine, causing a hard output collapse to 0. This also affects nn.Mish (x * tanh(softplus(x))).
Replace the native softplus op with the numerically stable equivalent:
softplus(x) = max(x, 0) + log(1 + exp(-|x|))
Since -|x| <= 0, exp(-|x|) is always in (0, 1], so no overflow can occur in any precision. This matches the value_inference formula already used in coremltools' own softplus MIL op definition.
Also apply PyTorch's threshold parameter (default 20), which was previously ignored: for beta*x > threshold, return x directly.
Changes:
- Decompose softplus to stable form in PyTorch converter (ops.py)
- Apply same fix to mish converter which calls softplus internally
- Add test_softplus_fp16_threshold regression test with large inputs
- Update test_softplus to account for new graph structure
This was referenced May 28, 2026
Problem
The native `softplus` MIL op computes `log(1 + exp(x))`, where `exp(x)` overflows in fp16 for x > ~10.4 on Apple Neural Engine, causing a hard, single-step output collapse to 0. This also affects `nn.Mish` (`x * tanh(softplus(x))`). CPU and GPU compute units are unaffected.
Additionally, PyTorch's `threshold` parameter (default 20) was being ignored by the converter.
Discovered while debugging fp16 precision in a KataGo-style network's Mish activations (see #2687).
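The overflow mechanism can be illustrated without an ANE device. The sketch below is an assumption-laden emulation, not the converter's code: the helper names `fp16` and `naive_softplus_fp16` are hypothetical, and each intermediate is rounded to IEEE 754 half precision with Python's `struct` module. In this emulation `exp(x)` overflows to infinity once `x` exceeds roughly ln(65504) ≈ 11.09, so it shows why the naive form breaks in fp16 but does not reproduce the ANE-specific behavior reported above (the ~10.4 cutoff and the collapse to 0).

```python
import math
import struct


def fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # struct refuses finite values beyond the fp16 range;
        # model that as overflow to signed infinity.
        return math.copysign(math.inf, value)


def naive_softplus_fp16(x: float) -> float:
    """log(1 + exp(x)) with every intermediate rounded to fp16."""
    e = fp16(math.exp(fp16(x)))  # overflows to inf for large x
    s = fp16(1.0 + e)
    return fp16(math.log(s))     # log(inf) stays inf


for x in (5.0, 10.0, 11.0, 12.0, 20.0):
    print(x, naive_softplus_fp16(x))
```

Inputs of 12 and above come back as infinity here, while the true softplus value is approximately `x`.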
Solution
Replace the native softplus op with the numerically stable equivalent:
`softplus(x) = max(x, 0) + log(1 + exp(-|x|))`
Since `-|x| <= 0`, `exp(-|x|)` is always in `(0, 1]`, so no overflow can occur in any precision. This formula is already used by coremltools' own softplus MIL op value_inference.
Also apply PyTorch's threshold parameter: for `beta * x > threshold`, return `x` directly, matching PyTorch's exact semantics.
Changes
- `mb.softplus()` is replaced with the stable decomposition (`abs` -> `mul(-1)` -> `exp` -> `add(1)` -> `log`, plus `maximum(x, 0)`, combined with `add`). The same approach is applied for all beta values (unit and non-unit). Threshold handling is added via `mb.select`.
- `test_softplus` is updated to account for the new graph structure. Added a `test_softplus_fp16_threshold` regression test with input values spanning the critical fp16 range: `[-5, 0, 5, 10, 10.4, 11, 15, 20, 25, 50]`.
Testing
- The existing `test_softplus` parametrized test cases remain (shapes, ranks, beta/threshold combinations, deployment targets).
- `test_softplus_fp16_threshold` specifically validates correctness at and beyond the ANE fp16 overflow point.
Fixes #2687
Fixes #2359
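For reference, the stable decomposition and threshold handling described under Solution can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in this PR: the function names are hypothetical, fp16 is emulated by rounding each intermediate with `struct`, and the Mish variant assumes the default `beta` and `threshold`. The inputs are the ones listed for `test_softplus_fp16_threshold`, compared against a float64 reference.

```python
import math
import struct


def fp16(value: float) -> float:
    """Round to the nearest IEEE 754 half-precision value (overflow becomes +/-inf)."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        return math.copysign(math.inf, value)


def stable_softplus_fp16(x: float, beta: float = 1.0, threshold: float = 20.0) -> float:
    """max(z, 0) + log(1 + exp(-|z|)) with z = beta * x, rounded to fp16 at each step."""
    z = fp16(beta * x)
    if z > threshold:
        # PyTorch semantics: return x directly once beta * x exceeds the threshold.
        return fp16(x)
    e = fp16(math.exp(fp16(-abs(z))))  # at most 1; may underflow to 0, never overflows
    log_term = fp16(math.log(fp16(1.0 + e)))
    relu_term = fp16(max(z, 0.0))
    return fp16(fp16(relu_term + log_term) / beta)


def mish_fp16(x: float) -> float:
    """Mish on top of the stable softplus: x * tanh(softplus(x))."""
    return fp16(fp16(x) * fp16(math.tanh(stable_softplus_fp16(x))))


def reference_softplus(x: float, beta: float = 1.0, threshold: float = 20.0) -> float:
    """Float64 reference with the same threshold rule."""
    z = beta * x
    return x if z > threshold else math.log1p(math.exp(z)) / beta


INPUTS = [-5, 0, 5, 10, 10.4, 11, 15, 20, 25, 50]
for value in INPUTS:
    x = float(value)
    print(x, stable_softplus_fp16(x), reference_softplus(x), mish_fp16(x))
```

Every input yields a finite result close to the float64 reference, including values past the point where the naive form overflows.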