Fix reciprocal of infinity in turbo loops#568
Open
AshtonSBradley wants to merge 1 commit intoJuliaSIMD:mainfrom
Open
Fix reciprocal of infinity in turbo loops#568AshtonSBradley wants to merge 1 commit intoJuliaSIMD:mainfrom
AshtonSBradley wants to merge 1 commit intoJuliaSIMD:mainfrom
Conversation
559d78b to
7faaa7c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #539 by lowering the literal reciprocal pattern
1 / xto the existinginvcompute path instead of the fast division path.The reported failure happens on AVX-512 when
@turbocompiles1 / Infthroughdiv_fast. That can enter VectorizationBase's pipelined fast division strategy, which mixes division with reciprocal approximation and Newton refinement; the refinement path can turn the mathematically expected zero intoNaNfor infinite denominators.This keeps the change intentionally narrow:
1special case is rerouted@turbo y[i] = 1 / x[i]withx .= InfType stability and allocations
For the regression case, the replacement is still type-stable:
inv(::Float64)returnsFloat64, matching the result type of1 / x[i]. The change happens while constructing the LoopSet, so it should not introduce runtime allocations in the generated loop body. It also reuses the existinginvlowering path rather than adding a new runtime wrapper.Validation
include("test/testsetup.jl"); include("test/simplemisc.jl")passes from a temporary test environment based ontest/Project.tomlAqua.test_all(LoopVectorization, ambiguities=false, piracies=false)passes from the same temporary test environment@turbo y[i] = 1 / x[i]withx .= Infpasses:invand no longer contains:div_fastgit diff --checkpassesI also ran
LOOPVECTORIZATION_TEST=part2 julia --project=/private/tmp/lv_test_env test/runtests.jl. It runs withAquaavailable and the newissue 539regression passes, but the shard exits nonzero due to existingifelsemasks.jlunexpected-pass@test_brokenentries unrelated to this change.