Fix logsumexp fp16 overflow on ANE via stable max-shift decomposition by Ashutosh0x · Pull Request #2726 · apple/coremltools

Ashutosh0x · 2026-05-29T00:00:14Z

Problem

The native reduce_log_sum_exp MIL op computes log(sum(exp(x))). On Apple Neural Engine, the sum of exp(x) overflows fp16 (largest finite value 65504) once x > log(65504/C), where C is the number of reduced elements and all of them are close to x. For a typical C=32 channel reduction, this means the output collapses to 0 at x ≈ 7.63, well below where the approximation logsumexp(x) ≈ x + log(C) would kick in. CPU and GPU compute units are unaffected.
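The threshold quoted above can be checked with a few lines of arithmetic. The sketch below assumes that every one of the C reduced elements equals x, so the sum is C * exp(x); `overflow_threshold` is a hypothetical helper name and not part of coremltools. The closed form gives a value a little above 7.62 for C=32, consistent with the reported cliff near 7.63.

```python
import math

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def overflow_threshold(num_channels: int) -> float:
    """Value of x at which num_channels * exp(x) reaches FP16_MAX.

    Assumption for illustration: all reduced elements are equal to x.
    """
    return math.log(FP16_MAX / num_channels)


threshold = overflow_threshold(32)
print(f"C=32 overflow threshold: {threshold:.4f}")

# The regression input 8.0 is past the threshold, 7.5 is not.
print(32 * math.exp(8.0) > FP16_MAX)
print(32 * math.exp(7.5) > FP16_MAX)
```

Larger reductions lower the threshold further, since it shrinks by log(2) each time C doubles.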

Same class of bug as the softplus fp16 cliff in #2687 (fixed in #2725), but a different kernel and a different overflow threshold.

Solution

Replace the native reduce_log_sum_exp op with the numerically stable max-shift decomposition:

```
logsumexp(x) = max(x) + log(Σ exp(x - max(x)))
```

By subtracting max(x) first, all exp() arguments are <= 0, so every exp() value lies in (0, 1] and the sum lies in [1, C]. The sum therefore cannot overflow fp16 as long as C does not exceed 65504. This formula is already used by the value_inference of coremltools' own reduce_log_sum_exp MIL op.
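To compare the two formulations, the following sketch emulates fp16 rounding on the CPU using the half-precision format of Python's `struct` module. It is an illustration under stated assumptions, not the ANE kernel: the helper names are hypothetical, rounding is applied after every scalar operation, and overflow is modeled as inf, whereas the ANE is reported to return 0.

```python
import math
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest fp16 value; overflow becomes inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # struct refuses finite values that do not fit in half precision.
        return math.copysign(math.inf, value)


def naive_logsumexp_fp16(xs):
    """log(sum(exp(x))) with every intermediate rounded to fp16."""
    total = 0.0
    for x in xs:
        total = to_fp16(total + to_fp16(math.exp(x)))
    return to_fp16(math.log(total))


def stable_logsumexp_fp16(xs):
    """max(x) + log(sum(exp(x - max(x)))) with fp16 rounding."""
    m = max(xs)
    total = 0.0
    for x in xs:
        # x - m <= 0, so each exp() term is in (0, 1].
        total = to_fp16(total + to_fp16(math.exp(to_fp16(x - m))))
    return to_fp16(m + to_fp16(math.log(total)))


xs = [8.0] * 32  # C=32 channels, value just past the reported cliff
print("naive :", naive_logsumexp_fp16(xs))
print("stable:", stable_logsumexp_fp16(xs))
print("exact :", 8.0 + math.log(32))
```

With 32 inputs of 8.0, the naive running sum exceeds the fp16 range in this emulation, while the shifted version stays within fp16 rounding error of 8 + log(32).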

Changes

ops.py: Intercept the logsumexp case in the unified reduction converter. Instead of emitting mb.reduce_log_sum_exp(), decompose into reduce_max → sub → exp → reduce_sum → log → add. Handles both keep_dims=True and keep_dims=False cases correctly.
test_torch_ops.py: Added test_logsumexp_fp16_overflow regression test with C=32 channels and input value 8.0 > 7.63 (the critical overflow point).
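The op sequence listed for ops.py can be mirrored in plain Python to make the data flow concrete. This is only a sketch on nested lists that reduces over the last axis; `logsumexp_last_axis` is a hypothetical name, and the code neither uses nor reproduces the coremltools converter. Each comment names the MIL op that the step corresponds to.

```python
import math


def logsumexp_last_axis(rows, keep_dims=False):
    """Max-shift logsumexp over the last axis of a 2-D nested list."""
    out = []
    for row in rows:
        row_max = max(row)                      # reduce_max
        shifted = [x - row_max for x in row]    # sub, every entry <= 0
        exps = [math.exp(s) for s in shifted]   # exp, every entry in (0, 1]
        total = sum(exps)                       # reduce_sum, in [1, len(row)]
        value = row_max + math.log(total)       # log, then add
        # keep_dims=True keeps the reduced axis as a length-1 dimension.
        out.append([value] if keep_dims else value)
    return out


rows = [[8.0] * 32, [1000.0, 1000.0]]
print(logsumexp_last_axis(rows))
print(logsumexp_last_axis(rows, keep_dims=True))
```

The second row shows that even inputs far beyond any exp() range are handled, because only the shifted values are ever exponentiated.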

Testing

All existing test_logsumexp parametrized test cases remain (shapes, dims, frontends, backends)
New test_logsumexp_fp16_overflow specifically validates correctness at the ANE fp16 overflow point

Related

Same pattern as #2725, "Fix softplus and mish fp16 overflow on ANE via stable decomposition" (softplus fp16 stable decomposition)
Same reporter (@ChinChangYang) filed both #2687, "Softplus on Apple Neural Engine has a hard fp16 discontinuity at x ≈ 10.4 (output drops to 0)", and #2690, "Channel-reduce logsumexp on Apple Neural Engine has a hard fp16 overflow at x ≈ 7.63 (output drops to 0)"

Fixes #2690

…apple#2690) The native reduce_log_sum_exp MIL op computes log(sum(exp(x))), where exp(x) overflows in fp16 when x > log(65504/C) (approx 7.63 for C=32 channels) on Apple Neural Engine, causing a hard output collapse to 0. Replace with the numerically stable decomposition: logsumexp(x) = max(x) + log(sum(exp(x - max(x)))). By subtracting max first, all exp() arguments are <= 0, so exp() values are in (0, 1] and no overflow can occur. This matches the value_inference formula already used in coremltools' own reduce_log_sum_exp MIL op definition.

Ashutosh0x wants to merge 1 commit into apple:main from Ashutosh0x:fix/logsumexp-fp16-stable-decomposition-2690
