Conversation
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18247.
Note: links to docs will display an error until the docs builds have completed.

As of commit 1ebb7d7 with merge base eb92cec: ❌ 1 awaiting approval, 1 new failure, 25 pending.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
Updates the LLaMA example LoRA linear implementation to call an nn.Linear module in forward() (instead of torch.nn.functional.linear), presumably to align with module-based patterns.
Changes:
- Replace `torch.nn.functional.linear(x, self.weight, self.bias)` with `self.linear(x)`.
- Introduce `self.linear = nn.Linear(...)` during `LoRALinear` initialization.
Comments suppressed due to low confidence (1)
examples/models/llama/lora.py:36
`linear` and `weight` are now undefined after switching to `self.linear`; `bias = linear.bias ...` and `register_parameter("weight", ...)` will raise at init time. Either derive `bias`/`weight` from `self.linear` (or remove the extra `register_parameter` calls entirely) so construction works.
```python
self.linear = nn.Linear(in_dim, out_dim, bias=use_bias)
bias = linear.bias if self.use_bias else None
self.register_parameter("weight", nn.Parameter(weight))
self.register_parameter(
    "bias", nn.Parameter(bias) if bias is not None else None
)
```
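A minimal sketch of the fix suggested in the comment above, assuming the parameters are simply owned by the wrapped module rather than re-registered (the class name and reduced constructor signature here are illustrative, not the PR's actual code):

```python
import torch.nn as nn


class LoRALinearInit(nn.Module):
    """Illustrative: only the construction under discussion; LoRA pieces omitted."""

    def __init__(self, in_dim: int, out_dim: int, use_bias: bool = False):
        super().__init__()
        self.use_bias = use_bias
        # The wrapped nn.Linear owns weight/bias. With no register_parameter
        # calls referencing the removed local `linear`, __init__ no longer
        # raises NameError.
        self.linear = nn.Linear(in_dim, out_dim, bias=use_bias)
```

Callers that previously read `self.weight` would then go through `self.linear.weight` (or a property, as the later revision does).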
Pull request overview
Updates the Llama LoRA implementation to call an nn.Linear submodule directly (instead of torch.nn.functional.linear) and simplifies quantization filtering accordingly.
Changes:
- Refactors `LoRALinear` to wrap a real `nn.Linear` module and use it in `forward`.
- Adds partial state-dict backward compatibility by remapping the legacy `weight` key.
- Simplifies the 8da*w quantization `filter_fn` to only consider `nn.Linear` modules.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| examples/models/llama/source_transformation/quantize.py | Simplifies the 8da4w/8da8w quantization filter to target nn.Linear modules only. |
| examples/models/llama/lora.py | Refactors LoRALinear to delegate to an internal nn.Linear and updates forward/state-dict behavior. |
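The simplified filter described above can be sketched as follows (the two-argument module/fqn predicate shape follows TorchAO's `filter_fn` convention; the group-size compatibility logic in the real file is omitted):

```python
import torch.nn as nn


def filter_fn(module: nn.Module, fqn: str) -> bool:
    # After the refactor, LoRALinear contains a real nn.Linear submodule, so
    # matching on nn.Linear alone reaches the base projection in both LoRA
    # and non-LoRA models -- no LoRA special-casing needed.
    return isinstance(module, nn.Linear)
```

The wrapped base linear inside a LoRA module is matched; the wrapper module itself is not.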
Pull request overview
Updates the LLaMA LoRA implementation to wrap a real nn.Linear submodule (instead of calling torch.nn.functional.linear directly), enabling TorchAO quantization tooling to recognize and quantize the base linear layer consistently across LoRA and non-LoRA models.
Changes:
- Refactors `LoRALinear` to contain `self.linear: nn.Linear`, adds `weight`/`bias` properties for backward-compatible access, and remaps old checkpoint keys on load.
- Simplifies TorchAO 8da*w quantization filtering to target `nn.Linear` modules only (no special-casing LoRA modules).
- Makes XNNPACK constant serialization keys content-hash-based (removes the tensor-name prefix) and updates the LoRA CI expectation string accordingly.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| examples/models/llama/source_transformation/quantize.py | Simplifies TorchAO quantization filter to match nn.Linear modules and apply group-size compatibility logic. |
| examples/models/llama/lora.py | Refactors LoRALinear to wrap an nn.Linear submodule; adds BC weight/bias accessors and state-dict key remapping. |
| backends/xnnpack/operators/node_visitor.py | Changes named constant key generation to be solely SHA256-based to stabilize dedup/indexing behavior. |
| .ci/scripts/test_lora.sh | Updates expected output prefix text for the quantized LoRA test case. |
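The constant-key change in `node_visitor.py` can be illustrated with a small stdlib-only sketch (the function name here is hypothetical):

```python
import hashlib


def constant_key(tensor_bytes: bytes) -> str:
    # Keying solely on the SHA-256 of the contents means two constants with
    # identical bytes deduplicate to one entry, regardless of tensor name.
    return hashlib.sha256(tensor_bytes).hexdigest()
```

With a tensor-name prefix in the key, otherwise-identical constants hashed to distinct entries, which is the dedup/indexing instability the PR removes.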
The hunk from `examples/models/llama/lora.py`, with the diff-table formatting stripped:

```diff
 self.use_bias = use_bias
 self.dropout = dropout

-linear = nn.Linear(in_dim, out_dim, bias=use_bias)
-weight = linear.weight
-bias = linear.bias if self.use_bias else None
-self.register_parameter("weight", nn.Parameter(weight))
-self.register_parameter(
-    "bias", nn.Parameter(bias) if bias is not None else None
-)
+self.linear = nn.Linear(in_dim, out_dim, bias=use_bias)
 self.dropout = nn.Dropout(p=dropout) if dropout > 0.0 else nn.Identity()
 self.lora_a = nn.Linear(in_features=in_dim, out_features=rank, bias=False)
```
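Putting the pieces together, the refactored forward path can be sketched as below (the `lora_b` projection, `scaling`, and default hyperparameters are illustrative assumptions, not copied from the PR):

```python
import torch
import torch.nn as nn


class LoRALinearSketch(nn.Module):
    """Sketch of a LoRA linear whose base path is a real nn.Linear submodule."""

    def __init__(self, in_dim, out_dim, rank=8, alpha=16.0, dropout=0.0, use_bias=False):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=use_bias)
        self.dropout = nn.Dropout(p=dropout) if dropout > 0.0 else nn.Identity()
        self.lora_a = nn.Linear(in_features=in_dim, out_features=rank, bias=False)
        self.lora_b = nn.Linear(in_features=rank, out_features=out_dim, bias=False)
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The base projection goes through the nn.Linear submodule (not
        # torch.nn.functional.linear), so nn.Linear-based quantization
        # filters can find and transform it.
        out = self.linear(x)
        lora = self.lora_b(self.lora_a(self.dropout(x)))
        return out + self.scaling * lora
```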
Summary
Update the LoRA definition to use `nn.Linear` instead of `torch.nn.functional.linear`.
See `executorch/examples/models/llama/static_attention.py`, line 1154 at eb92cec.
Test plan
CI