-
-
Notifications
You must be signed in to change notification settings - Fork 715
PERF: Use tuned ImageRangeRegion copy in CastImageFilter #5670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: Use tuned ImageRangeRegion copy in CastImageFilter #5670
Conversation
SimonRit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just typos and a question.
| { | ||
| while (!inputIt.IsAtEndOfLine()) | ||
| const InputPixelType & inputPixel = *inputIt; | ||
| OutputPixelType outputPixel{ *outputIt }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One alternative to this is to replace this by
OutputPixelType outputPixel;
outputPixel = *outputIt;
outside the while loop. Does it improve performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am probably missing some larger context behavior, but why do we need a temporary local variable for outputPixel? It seems that line 164 could be removed, and line 159 could be an alias to the current memory.
OutputPixelType & outputPixel = *outputIt;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I encourage both of you to compile and try experiments with PR #5669
@SimonRit Outside the loop it is a little slower ~2x for vector pixel output. It has odd statements to force a copy.
@hjmjohnson That does not compile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hjmjohnson FYI, for VectorImage, *outputIt returns a proxy to the current output pixel. Hope that clarifies why OutputPixelType & outputPixel = *outputIt won't compile for VectorImage!
Modules/Filtering/ImageFilterBase/include/itkCastImageFilter.hxx
Outdated
Show resolved
Hide resolved
Based on performance testing, across converting between Image of
Vector and VectorImage, this loop was the best performing. Key
features of the improved performane:
- Uses NumericTraits::GetLength over Image::GetNumberOfComponents,
the former may be constant, while the latter is virtual
- Uses const InputPixelType & inputPixel
- OutputPixelType value{ outputIt.Get() } initialized to a reference
in the output image bufffer and does not perform memory allocation
for variable length vectors.
0b6156d to
2e8c70a
Compare
| // and output images to be different dimensions | ||
| typename TInputImage::RegionType inputRegionForThread; | ||
|
|
||
| this->CallCopyOutputRegionToInputRegion(inputRegionForThread, outputRegionForThread); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding, do inputRegionForThread and outputRegionForThread always have exactly the same number of pixels (as returned by ImageRegion::GetNumberOfPixels()), after calling CallCopyOutputRegionToInputRegion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the filter has some "radius" (frequent), then the input region is slightly larger (I believe). Each filter decides how to increase the region during propagation from output to input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at an older version, it looks like it assumes that the output region has the same size as the input region. Because it just checks inputIt.IsAtEnd() and inputIt.IsAtEndOfLine(), not outputIt.IsAtEnd() or outputIt.IsAtEndOfLine():
ITK/Modules/Filtering/ImageFilterBase/include/itkCastImageFilter.hxx
Lines 145 to 165 in 921c5ba
| ImageScanlineConstIterator inputIt(inputPtr, inputRegionForThread); | |
| ImageScanlineIterator outputIt(outputPtr, outputRegionForThread); | |
| while (!inputIt.IsAtEnd()) | |
| { | |
| while (!inputIt.IsAtEndOfLine()) | |
| { | |
| const InputPixelType & inputPixel = inputIt.Get(); | |
| OutputPixelType value{ outputIt.Get() }; | |
| for (unsigned int k = 0; k < componentsPerPixel; ++k) | |
| { | |
| value[k] = static_cast<typename OutputPixelType::ValueType>(inputPixel[k]); | |
| } | |
| outputIt.Set(value); | |
| ++inputIt; | |
| ++outputIt; | |
| } | |
| inputIt.NextLine(); | |
| outputIt.NextLine(); | |
| } |
That's OK to me, as long as that assumption is OK 😃 So then, in the new code, it's also OK to just check inputRange.end() and ignore outputRange.end().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cast image filter does not need a larger input region, so it does not override the default CallCopyOutputRegionToInputRegion implementation, and has the same input and output regions.
| while (inputIt != inputEnd) | ||
| { | ||
| while (!inputIt.IsAtEndOfLine()) | ||
| const InputPixelType & inputPixel = *inputIt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make it a beautiful range-based for loop, if you like 😃:
for (const InputPixelType & inputPixel : inputRange)Then you may remove the variables inputIt and inputEnd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How close are we to being able to do:
for ( auto &[inputPixel, outputPixel] : zip(inputRange, outputRange))
😈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't try, but such a zip might be possible. Very cute 😸 However... zip(inputRange, outputRange) will check both inputRange.end() and outputRange.end(). It may not be necessary to check both, when we assume that input and output region have the same size anyway. So then, zip(inputRange, outputRange) may be unnecessary slow!
Update: The input and output region do indeed have the same number of pixels, as implied by the reply from @dzenanz at https://github.com/InsightSoftwareConsortium/ITK/pull/5670/files/2e8c70a4499a2e87c8e0f65d25a355baa82f3bb9#r2586937371 So then, it is indeed unnecessary to check both inputRange.end() and outputRange.end(). So in this case zip is not necessary. Just a range-based for loop over inputRange should be fine 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the range-based for loop in did not give the same performance is all cases in the test code. 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the range-based for loop in did not give the same performance is all cases in the test code.
Interesting! Do you mean that my range-based for suggestion is in some cases significantly slower than your current PR? If so, do you have a clue why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, the range-based for loop can also be written as:
for (const InputPixelType & inputPixel : ImageRegionRange(*inputPtr, inputRegionForThread))
So then the inputRange variable may be removed as well!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears measurable slower in the performance testing code. I don't know why.
|
Honestly I think we should still fix this:
For a regular image, |
Based on performance testing, across converting between Image of Vector and VectorImage, this loop was the best performing. Key features of the improved performane:
PR Checklist
Refer to the ITK Software Guide for
further development details if necessary.