Conversation
… it fails to serialize the model.
Signed-off-by: Hrishith Thadicherla <hthadicherla@nvidia.com>
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ 4 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/onnx/utils.py`:
- Around lines 654-658: the except block currently catches ValueError and
google.protobuf.message.EncodeError, unconditionally sets
save_as_external_data, and logs "Model exceeds 2GB limit". It should instead
distinguish a true size overflow from schema/initialization failures: when
catching EncodeError (or ValueError), compare the model's serialized size via
model.ByteSize() against the 2GB threshold (or the existing size_threshold
constant), and only set save_as_external_data and log the size message when
ByteSize() > size_threshold; otherwise re-raise the exception so
schema/initialization errors surface to the caller. Keep the logger call in
the size branch and propagate the original exception in the non-size branch.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7da9dc33-34d9-4cc9-aea8-bd0bd2ce85bb
📒 Files selected for processing (1)
modelopt/onnx/utils.py
```python
except (ValueError, google.protobuf.message.EncodeError) as e:
    logger.warning(
        "Model exceeds 2GB limit, switching to external data storage. Error message: [%s]", e
    )
    save_as_external_data = True
```
🧩 Analysis chain
🌐 Web query:
In Python protobuf, under what conditions can google.protobuf.message.EncodeError be raised by SerializeToString(), and are all such failures size-related?
💡 Result:
SerializeToString() raises google.protobuf.message.EncodeError when the message is not “initialized”, i.e. any required (proto2) field is missing, including missing required fields in nested submessages; Python’s implementation explicitly checks IsInitialized() and raises EncodeError if it fails. (beam.apache.org)
These EncodeError failures are not size-related—they’re schema/initialization-related (proto2 required). (beam.apache.org)
Size-related serialization problems (e.g., exceeding protobuf’s ~2 GiB single-message limit) typically surface as other exception types (commonly ValueError/RuntimeError depending on wrapper/library), not EncodeError. (github.com)
Citations:
- 1: https://beam.apache.org/releases/pydoc/2.0.0/_modules/google/protobuf/internal/python_message.html
- 2: https://beam.apache.org/releases/pydoc/2.0.0/_modules/google/protobuf/internal/python_message.html
- 3: [Serve] ValueError: Message ray.serve.ReplicaConfig exceeds maximum protobuf size of 2GB ray-project/ray#32049
🏁 Script executed:

```shell
# First, let's locate and examine the file
fd -t f "utils.py" "modelopt/onnx"
```

Repository: NVIDIA/Model-Optimizer
Length of output: 698

🏁 Script executed:

```shell
# Now let's read the context around lines 654-658
cat -n modelopt/onnx/utils.py | sed -n '640,670p'
```

Length of output: 1593

🏁 Script executed:

```shell
# Let's see the entire save_onnx function to understand the full context
ast-grep --pattern $'def save_onnx($$$) {
  $$$
}'
```

Length of output: 48

🏁 Script executed:

```shell
# Check how save_onnx is called and what state the model is in
rg -B5 "save_onnx\(" modelopt/onnx/ -A2 | head -40
```

Length of output: 2973

🏁 Script executed:

```shell
# Look for any validation of the model before save_onnx is called
rg "model\.IsInitialized\|CheckInitialized" modelopt/onnx/
```

Length of output: 48

🏁 Script executed:

```shell
# Check if there are any proto2 vs proto3 requirements or model validation
rg "required.*field\|proto.*version\|proto2\|proto3" modelopt/onnx/utils.py | head -20
```

Length of output: 48

🏁 Script executed:

```shell
# Get the full save_onnx function
cat -n modelopt/onnx/utils.py | sed -n '643,685p'
```

Length of output: 2179

🏁 Script executed:

```shell
# Check if there's any model validation before save_onnx is called
rg -B10 "save_onnx\(" modelopt/onnx/utils.py | grep -A10 "def.*save_onnx\|ByteSize\|IsInitialized"
```

Length of output: 1250

🏁 Script executed:

```shell
# Look at the actual error handling - check if exception messages distinguish size from schema errors
rg "EncodeError\|ValueError" modelopt/onnx/utils.py -B3 -A3
```

Length of output: 48

🏁 Script executed:

```shell
# Check ONNX protobuf usage patterns to understand if models should be validated before save
rg "CheckInitialized\|IsInitialized" modelopt/ --include="*.py"
```

Length of output: 507

🏁 Script executed:

```shell
# Check what kind of exceptions the nested onnx.save_model call might raise
rg "onnx\.save_model\|onnx\.save" modelopt/onnx/utils.py -A5 -B5
```

Length of output: 48

🏁 Script executed:

```shell
# Let's check what the protobuf version is and understand the actual error behavior
rg "protobuf" modelopt/onnx/ -A1 -B1 | head -30
```

Length of output: 1851

🏁 Script executed:

```shell
# Check if there are any test cases that show what exceptions can be raised
fd -t f "test.*onnx\|.*test.*onnx" modelopt/ | head -10
```

Length of output: 48

🏁 Script executed:

```shell
# Let's look at usage of save_onnx to understand context better
rg "save_onnx" modelopt/onnx/utils.py -B2 -A2
```

Length of output: 1159
Distinguish size-limit errors from schema/initialization failures in exception handling.
At lines 654–658, catching both ValueError and EncodeError and treating all cases as size overflow is incorrect. Per protobuf semantics, EncodeError is raised when a message has uninitialized required fields (proto2 schema violations), not size issues. The current logging message "Model exceeds 2GB limit" misattributes these failures and masks actual data model problems. Only force external-data fallback when the actual message size exceeds the 2GB threshold.
Suggested fix: Before enabling external-data mode in the exception handler, verify that the serialization failure is actually size-related (e.g., by checking model.ByteSize() > size_threshold). If not, re-raise the exception to surface schema validation issues to the caller.
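A sketch of that suggestion, with a stand-in threshold constant (the module's actual `size_threshold` may differ). `ByteSize()` is the standard protobuf `Message` method that computes the serialized size without serializing; the handler here catches only `ValueError` so the sketch has no protobuf dependency, whereas the real code would also catch `google.protobuf.message.EncodeError`:

```python
import logging

logger = logging.getLogger(__name__)

# Protobuf cannot serialize a single message larger than 2 GiB.
SIZE_THRESHOLD = 2**31 - 1  # assumption: stands in for the module's size_threshold

def try_serialize(model):
    """Return (data, needs_external_data). Only treat a serialization
    failure as a size overflow when the message really is over the limit."""
    try:
        return model.SerializeToString(), False
    except ValueError as e:
        if model.ByteSize() > SIZE_THRESHOLD:
            logger.warning(
                "Model exceeds 2GB limit, switching to external data storage. "
                "Error message: [%s]", e,
            )
            return None, True  # caller should save with external data
        raise  # schema/initialization error: surface it to the caller
```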
Codecov Report
❌ Patch coverage is
Additional details and impacted files:

```
@@           Coverage Diff            @@
##             main    #1058    +/-  ##
========================================
+ Coverage   70.30%   70.32%   +0.02%
========================================
  Files         227      227
  Lines       25847    25845       -2
========================================
+ Hits        18172    18176       +4
+ Misses       7675     7669       -6
```

☔ View full report in Codecov by Sentry.
What does this PR do?
Type of change: Bug fix
With newer protobuf versions, serializing a model larger than 2 GB to a string raises an error.
The suggested fix is to force saving the model as external data when this happens.
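The fallback described above can be expressed generically. In this sketch the save callable is injected so the example stays independent of the onnx package; in the real code it would be `onnx.save_model`, which accepts `save_as_external_data=True` to write large tensors to a side file:

```python
def save_with_fallback(model, path, save_fn):
    """Try a plain save; if serialization fails because the model is too
    large, retry with external data storage for the weights."""
    try:
        save_fn(model, path)
        return False  # plain save succeeded, no external data needed
    except ValueError:
        # Retry: weights go to a side file, keeping the proto under 2 GB.
        save_fn(model, path, save_as_external_data=True)
        return True
```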