[SM6.10][HLK] Restructure LinAlg Tests#8323
Conversation
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
alsepkow
left a comment
There was a problem hiding this comment.
LGTM. A few minor comments.
tex3d
left a comment
There was a problem hiding this comment.
There are a lot of changes in this PR.
Some are (mostly) code moves, which I had to do an external comparison on to see changes made during the move (there were some). I wish pure moves and formatting changes could have been made as a separate NFC PR, or at least as a separate commit.
There are also new classes: HLSLNorm_t (and variants), F8E4M3_t, and F8E5M2_t, which are not used anywhere except in isFloatingPointType (which is changed to include these).
I think the new classes should be kept out of this PR, since they are not needed/used by it. Doing so would save some required extra careful review, since these can be tricky.
I reviewed moved code by copy-pasting it into the original locations on a local clone and looking at the diff. This reveals several types of changes:
- Fixed spelling in comment:
constuctor->constructor. inlinerequired for moving to header- Parameter and member name changes:
ValidationType->VType- Only for some overloads, why?
ValidationConfig->Validation- If you wanted to avoid the type name collision, why not go with
Config, which seems more intuitive?
- If you wanted to avoid the type name collision, why not go with
- Deleted comments:
// Tolerance is in ULPs. Convert to(int,int64_t)for the comparison. - Deleted extraneous
; - Modified
HLSLHalf_tconstructor (see comment)
I had some more comments too.
| return static_cast<bool>(Val) != static_cast<bool>(Other.Val); | ||
| } | ||
|
|
||
| bool operator<(const HLSLBool_t &Other) const { return Val < Other.Val; } |
There was a problem hiding this comment.
Whenever we use Val in these operators, shouldn't we cast to bool, since we only ever want to interpret the value as either true or false? Also, where are these comparison operators used?
I realize now that this code has simply been moved. Is there a way to separate moved code changes from other changes?
There was a problem hiding this comment.
Moved code yeah. imo the comment should be addressed separately
There was a problem hiding this comment.
I don't think the LT, GT, LTE, or GTE are used with HLSLBool_t anymore. Can probably remove them. But I can do that in a subsequent PR.
| WEX::Logging::Log::Result(WEX::Logging::TestResults::Skipped); | ||
| compileShader(DxcSupport, LoadStoreShader, Target.c_str(), Args, Verbose); | ||
|
|
||
| if (shouldSkipBecauseSM610Unsupported(Device)) |
There was a problem hiding this comment.
Shouldn't we always end the test here, since if we don't have a device in HLK mode, we still can't proceed, and we must have emitted an error earlier when trying to create the device, right?
There was a problem hiding this comment.
I think its helpful to talk about when we intend to skip. (Specifically, skip, not end)
Right now there is a useful flow through the test that is basically
- try to get a SM6.10 device
- fail because non exist yet
- keep going anyway and use dxc to compile the SM6.10 shader
- if it compiles exit with skip (setup was okay, but we can't run yet)
- if it doesn't, exit with fail (can't compile the SM6.10 shader, that's a bug that shouldn't pass CI)
This function is in service of that flow. If we have a device or if we are in HLK mode we should always just continue the test and let something else be the cause of failure
There was a problem hiding this comment.
Generally, it's odd to proceed with a null Device with any other operations. That's a setup failure, and if we're on HLK, we should abort at that point, otherwise, we should skip at that point. I understand the desire to still try a shader compile at this time because you don't have a device that supports 6.10 at all, and that's fine for now, but this is a test-development-time convenience, and it doesn't justify executing test code beyond this point.
if we are in HLK mode we should always just continue the test and let something else be the cause of failure
I don't think so. I think we should abort the test with a failure if we failed to create the device (which would have logged an error). You return at this point when not HLK because the usefulness of proceeding has ended. The same goes for HLK, but usefulness of proceeding ended even earlier (there's no point in going through with anything else in the HLK if device creation failed).
Any other failure that happens down the line as a result of a null Device will just add noise to the failing test result that needs to be ignored (the real failure was the device creation failure).
I think you want to return from this point if (!Device) whether or not you're in HLK mode. The difference is whether you log a comment and call this a skip or log an error and abort.
|
|
||
| if (EmulateTest) { | ||
| hlsl_test::LogWarningFmt(L"EmulateTest flag set. Tests are NOT REAL"); | ||
| return D3D12SDK->createDevice(&D3DDevice, D3D_SHADER_MODEL_6_8, false); |
There was a problem hiding this comment.
We shouldn't create the device twice when EmulateTest is specified. This will break if the first device create succeeded (device reports SM 6.10 support).
There was a problem hiding this comment.
My thoughts on this were that the EmulateTest flag takes precedent. Even if we successfully create the SM6.10 device the downstream tests are still going to run the emulated shaders. If we don't take this branch then it'll look like the tests pass when they didn't actually run at all. Maybe the fix is to put this branch above the one that attempts to create the SM6.10 device.
Thoughts?
There was a problem hiding this comment.
Yes, you should put it above the other device creation. Otherwise, the non-null D3DDevice CComPtr will cause the second call to fail because the pointer is already initialized.
I would normally recommend you move this into setupClass and call this createDevice() method instead of embedding the logic inside this createDevice() method, but I see that it's really a convenient way for you to keep going when you have no valid device so you can perform compilation before skipping/aborting, and to use the same emulation path when recreating a device in setupMethod. So perhaps it's ok to just move it earlier.
It's unexpected that your createDevice always returns true, even when it fails, unless it's in HLK mode. That's counterintuitive. I get that you return this from setupClass, which means it would fail on device creation failure, but that specific decision probably belongs in setupClass, not inside createDevice. And by extension, in setupMethod when recreating the device. That way it's more obvious you've chosen to proceed with a device creation failure in this code.
Your createDevice seems to invert the SkipUnsupported parameter when calling D3D12SDK->createDevice. If this flag is true, D3D12SDK->createDevice will register a skip. You do not want that in HLK mode, yet you initialize it to FailIfRequirementsNotMet which is true in HLK mode. If you'd rather control test skips separately, you should just pass false here instead.
Also, regarding FailIfRequirementsNotMet, when in HLK mode, your log message says it's failing because this FailIfRequirementsNotMet is set. But this is a minute internal code detail of this particular method, not visible to the testing interface. I think it should suffice to say that it is a device creation failure for the required shader model support.
Also, in setupMethod, you assume/log that the device was lost if you have a null D3DDevice, or it's been removed. But you also explicitly proceed if device creation fails, resulting in a null D3DDevice, just so you can try compilation before aborting. So you're going to get a bunch of these misleading "device lost" messages, unless you update the method to account for this condition.
The vast majority of the code moves were triggered at the 11th hour of a pending deadline to satisfy your requirement of the use of F16 for the smoke test. There wasn't time to split this up into multiple PRs and get the necessary round of reviews and merge by deadlines. Multiple commits may have been possible, but I had a bunch of other pending changes before the last minute request came through. I'll break this up into several PRs now and adjust expectations on when things will land |
I'm not asking you to break it up now. But I do have some feedback about removing unused additions and a few other changes. |
5c10f18 to
70fefce
Compare
|
It turns out it was easiest start from "scratch" on the code moves and just move the exact minimal set of code vs trying to go back and handle each case. You can now review by commit where 3eb0d28 does the moves and 70fefce does the rest. I'm going to pull in a couple quick fixes (align => 128, fix a messy switch I wasn't happy with) but all the changes in the moved code has been reverted. If there are any of those changes you'd like to see kept, please not them and I'll get them included in a separate PR |
| bool Verbose, HLSLHalf_t Tolerance = 0.0f) { | ||
| bool Success = true; | ||
| for (size_t I = 0; I < Count; I++) { | ||
| HLSLHalf_t Diff = Actual[I] - Expected[I]; |
There was a problem hiding this comment.
Why aren't these using comparison methods from HlslTestUtils.h?
| WEX::Logging::Log::Result(WEX::Logging::TestResults::Skipped); | ||
| compileShader(DxcSupport, LoadStoreShader, Target.c_str(), Args, Verbose); | ||
|
|
||
| if (shouldSkipBecauseSM610Unsupported(Device)) |
There was a problem hiding this comment.
Generally, it's odd to proceed with a null Device with any other operations. That's a setup failure, and if we're on HLK, we should abort at that point, otherwise, we should skip at that point. I understand the desire to still try a shader compile at this time because you don't have a device that supports 6.10 at all, and that's fine for now, but this is a test-development-time convenience, and it doesn't justify executing test code beyond this point.
if we are in HLK mode we should always just continue the test and let something else be the cause of failure
I don't think so. I think we should abort the test with a failure if we failed to create the device (which would have logged an error). You return at this point when not HLK because the usefulness of proceeding has ended. The same goes for HLK, but usefulness of proceeding ended even earlier (there's no point in going through with anything else in the HLK if device creation failed).
Any other failure that happens down the line as a result of a null Device will just add noise to the failing test result that needs to be ignored (the real failure was the device creation failure).
I think you want to return from this point if (!Device) whether or not you're in HLK mode. The difference is whether you log a comment and call this a skip or log an error and abort.
|
|
||
| if (EmulateTest) { | ||
| hlsl_test::LogWarningFmt(L"EmulateTest flag set. Tests are NOT REAL"); | ||
| return D3D12SDK->createDevice(&D3DDevice, D3D_SHADER_MODEL_6_8, false); |
There was a problem hiding this comment.
Yes, you should put it above the other device creation. Otherwise, the non-null D3DDevice CComPtr will cause the second call to fail because the pointer is already initialized.
I would normally recommend you move this into setupClass and call this createDevice() method instead of embedding the logic inside this createDevice() method, but I see that it's really a convenient way for you to keep going when you have no valid device so you can perform compilation before skipping/aborting, and to use the same emulation path when recreating a device in setupMethod. So perhaps it's ok to just move it earlier.
It's unexpected that your createDevice always returns true, even when it fails, unless it's in HLK mode. That's counterintuitive. I get that you return this from setupClass, which means it would fail on device creation failure, but that specific decision probably belongs in setupClass, not inside createDevice. And by extension, in setupMethod when recreating the device. That way it's more obvious you've chosen to proceed with a device creation failure in this code.
Your createDevice seems to invert the SkipUnsupported parameter when calling D3D12SDK->createDevice. If this flag is true, D3D12SDK->createDevice will register a skip. You do not want that in HLK mode, yet you initialize it to FailIfRequirementsNotMet which is true in HLK mode. If you'd rather control test skips separately, you should just pass false here instead.
Also, regarding FailIfRequirementsNotMet, when in HLK mode, your log message says it's failing because this FailIfRequirementsNotMet is set. But this is a minute internal code detail of this particular method, not visible to the testing interface. I think it should suffice to say that it is a device creation failure for the required shader model support.
Also, in setupMethod, you assume/log that the device was lost if you have a null D3DDevice, or it's been removed. But you also explicitly proceed if device creation fails, resulting in a null D3DDevice, just so you can try compilation before aborting. So you're going to get a bunch of these misleading "device lost" messages, unless you update the method to account for this condition.
tex3d
left a comment
There was a problem hiding this comment.
Approving assuming some follow-up adjustments are coming to address some concerns in a subsequent PR, since none should be blocking for the merge now.
|
Thanks Tex! Here is the tracking issue! I'll be working on this feedback first thing tomorrow #8332 |
There are several infra improvements in this PR
std::variantto not duplicate every variable for every type in the testCo-authored-by: Alex Sepkowski alexsepkowski@gmail.com