Fix literal promotion #4826
Conversation
CharlieL7 left a comment:
This could affect accuracy. Another way to handle this that could preserve accuracy would be to promote the parameter to the wider format and then convert down the output. It's not clear which way would be more reasonable. Feels like a bug in the input model.
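For illustration, a rough sketch of that alternative: widen the fp16 side, compute at fp32, then convert the result back down. The helper name and argument variables below are assumptions, not code from this PR:

```cpp
#include <migraphx/instruction.hpp>
#include <migraphx/make_op.hpp>
#include <migraphx/module.hpp>
#include <migraphx/shape.hpp>

using namespace migraphx;

// Hypothetical helper: promote the half input to float, run the op at
// the wider precision, then narrow the result back to half.
instruction_ref widen_compute_narrow(module& m,
                                     instruction_ref ins,
                                     instruction_ref half_in,
                                     instruction_ref float_in)
{
    auto widened = m.insert_instruction(
        ins, make_op("convert", {{"target_type", shape::float_type}}), half_in);
    auto out = m.insert_instruction(ins, make_op("dot"), widened, float_in);
    return m.insert_instruction(
        ins, make_op("convert", {{"target_type", shape::half_type}}), out);
}
```

This keeps the intermediate computation at fp32 precision, at the cost of an extra convert on the output.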
```cpp
if(not args[5]->is_undefined() and
   args[5]->get_shape().type() != args[1]->get_shape().type())
{
    args[5] = info.add_instruction(
        make_op("convert", {{"target_type", args[1]->get_shape().type()}}), args[5]);
}
```
Why is there a specific change for GRU?
So, for this specific model, the literal change alone only fixed the `migraphx-driver read` command; compile was still failing with a "types do not match" error. The fix is in parse_gru.cpp because the type mismatch is introduced by the rewrite_rnn pass, which decomposes the GRU op into dot operations using `m.insert_instruction(make_op("dot"), ...)` directly, bypassing `add_common_op` and its type reconciliation. I agree this is odd behaviour; I will send you the model via Teams so you can get a better understanding of it.
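For illustration, a minimal sketch of that contrast (variable names and the exact decomposition step are assumptions, not the actual rewrite_rnn code; see common.hpp for the reconciliation helpers):

```cpp
#include <migraphx/common.hpp>
#include <migraphx/instruction.hpp>
#include <migraphx/make_op.hpp>
#include <migraphx/module.hpp>

using namespace migraphx;

// xt is assumed half-typed, w float-typed.
instruction_ref decompose_dot(module& m,
                              instruction_ref ins,
                              instruction_ref xt,
                              instruction_ref w)
{
    // Direct insertion, as in the rewrite_rnn decomposition, performs no
    // type reconciliation, so the half/float mix later fails shape
    // checking with "types do not match":
    // return m.insert_instruction(ins, make_op("dot"), xt, w);

    // The common-op helper inserts converts to the common type first:
    return insert_common_op(m, ins, make_op("dot"), {xt, w});
}
```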
```cpp
else
{
    auto common = common_shape(to_shapes(inputs));
    if(options.common_type)
```
Add a comment explaining the reasoning for this added code block
Agree. But it should only affect large scalar values for accuracy; all other cases should be fine? On the model bug I also agree, it's a little odd, especially since DML also has a problem with the model. But the ONNX format says it's fine. I can share the model with you via Teams. There are a few models failing with similar errors (type mismatches).
```cpp
if(options.common_type)
{
    auto c_type = compute_common_types(input_shapes);
    std::transform(inputs.begin(), inputs.end(), inputs.begin(), [&](auto input) {
        if(input->get_shape().type() != c_type)
        {
            input = m.insert_instruction(
                ins, make_op("convert", {{"target_type", c_type}}), input);
        }
        return input;
```
Dynamic branch should have the same behavior when handling common_type
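Presumably something along these lines in the dynamic-shape branch, mirroring the static branch quoted above (a sketch only; the surrounding variables are taken from the quoted diff and may differ in the actual code):

```cpp
auto common = common_shape(to_shapes(inputs));
if(options.common_type)
{
    auto c_type = compute_common_types(to_shapes(inputs));
    std::transform(inputs.begin(), inputs.end(), inputs.begin(), [&](auto input) {
        if(input->get_shape().type() != c_type)
        {
            // Insert the same converts as the static-shape branch so both
            // paths agree on the promoted type.
            input = m.insert_instruction(
                ins, make_op("convert", {{"target_type", c_type}}), input);
        }
        return input;
    });
}
```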
I'm also curious about these models because the GRU spec for ONNX doesn't seem to allow mixed precision like this (i.e. I don't see how this could come from a valid ONNX model). It might be worth investigating the source of these mixed inputs before applying a fix like this.
```cpp
        if(tensor_type != common.type())
            common = shape{tensor_type, common.lens()};
    }
}
```
This logic should not be used. The common type should match the logic used in C++. There shouldn't be exceptions for literals, which make it more difficult to reason about the logic.
If the model mixes an fp32 with an fp16 and the result should be fp16 according to the ONNX spec, then common type promotion shouldn't be used there.
Motivation
When parsing ONNX models with half-precision (float16) weights, scalar float32 constants (e.g. epsilon in LayerNorm) were causing unintended type promotion of entire computation paths from float16 to float32. This resulted in a type mismatch error when the promoted float32 tensor reached a downstream operation (e.g. convolution or dot) whose weights remained float16:

```
same_type: convolution: Types do not match
```
Technical Details
Before applying the common type, inputs that are scalar literals (single-element and evaluable) are excluded from the type decision. If the remaining tensor inputs share a type, that type takes precedence over the scalar's type, and the scalar is cast down to match.
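A minimal sketch of that rule, assuming MIGraphX's `shape::elements()` and `instruction::can_eval()` are what detect scalar literals; the helper names below are hypothetical and the surrounding selection logic is simplified:

```cpp
#include <algorithm>
#include <vector>
#include <migraphx/instruction.hpp>
#include <migraphx/shape.hpp>

using namespace migraphx;

// A scalar literal: a single-element input whose value can be evaluated
// at compile time (e.g. the fp32 epsilon constant in LayerNorm).
bool is_scalar_literal(instruction_ref i)
{
    return i->get_shape().elements() == 1 and i->can_eval();
}

// Hypothetical type decision: scalar literals do not vote. If the
// remaining tensor inputs agree on one type, that type wins and the
// scalars are later converted down to it.
shape::type_t pick_common_type(const std::vector<instruction_ref>& inputs,
                               shape::type_t fallback)
{
    std::vector<shape::type_t> tensor_types;
    for(const auto& input : inputs)
        if(not is_scalar_literal(input))
            tensor_types.push_back(input->get_shape().type());
    if(not tensor_types.empty() and
       std::all_of(tensor_types.begin(), tensor_types.end(), [&](auto t) {
           return t == tensor_types.front();
       }))
        return tensor_types.front();
    return fallback; // otherwise use the usual common-type computation
}
```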
Changelog Category
Add a `CHANGELOG.md` entry for any option other than `Not Applicable`.