ENH: Tie weights for target_modules in Lora (#2864) #2879
Conversation
@BenjaminBossan At a high level …
Thank you
BenjaminBossan
left a comment
Thanks for this draft PR to extend the feature to target_modules. I haven't done a full review yet, as some implementation details have yet to be figured out, but I gave some early feedback. This feature could be a bit more difficult to implement than for modules_to_save, I added some comments on why, please check.
src/peft/tuners/lora/model.py
Outdated
```python
        peft_config.modules_to_tie = missing_keys

    def _add_targets_to_tie(self, peft_config, tied_weight_keys):
        target_modules = set(getattr(peft_config, "target_modules", []) or [])
```
We need to consider the case that target_modules is a string and not a list of strings. If it's a string, we perform a regex match. Honestly, I'm not sure if there is a good solution. So far, I have 3 ideas:
1. We could try to use the `model.targeted_module_names` attribute, which lists all targeted modules after the targets have been resolved. But that would mean that we need to first apply all LoRA layers and only then can we check for tied layers, which is the opposite order of how things are implemented right now.
2. We could try using the string directly and then, for example, do something like `config.target_modules += f"|{missing_key}"`, but this is very brittle and won't work with all regexes, so I would like to avoid this.
3. We could forbid using `ensure_weight_tying=True` and `target_modules = <str>`. Then we'd raise an error and tell users they have to pass a list of str if they want `ensure_weight_tying` (a rough sketch of such a check is below).
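For illustration only, a minimal sketch of what the check in idea 3 could look like; the function name and error message are hypothetical and not code from this PR:

```python
def _check_ensure_weight_tying(peft_config):
    # Hypothetical validation sketch for idea 3 above; names are illustrative.
    if getattr(peft_config, "ensure_weight_tying", False) and isinstance(peft_config.target_modules, str):
        raise ValueError(
            "`ensure_weight_tying=True` is not supported when `target_modules` is a regex string; "
            "please pass `target_modules` as a list of module names instead."
        )
```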
Yes, after going through the code a few more times, I realized this would not work for all the cases. I would go with the 1st approach and move the call to this function after model.targeted_module_names is updated
Sounds good. This has yet to be updated, right?
Yes, I will do this in the next commit
Moving this after `model.targeted_module_names` is populated is tough, as the loop which populates it (https://github.com/huggingface/peft/blob/main/src/peft/tuners/tuners_utils.py#L773-L819) also needs to check for and skip the layers that are tied.
Reversing the order would mean that we may end up adding adapters where they're not required. The subsequent code would become more involved: essentially, we would have to remove the adapters from all tied layers, re-add the adapter on embed_tokens, and then tie the remaining adapters to it. In my opinion, this is an opinionated solution with the least complexity.
We can go with (1) in your original comment and redo a few things, or keep the current flow and go with (3).
I think the above might have become tough to follow 😅, so let me know and I can share some schematics. Will wait for your input.
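For illustration, here is a rough, purely hypothetical outline (comments only, not the PR's code) of what the "reverse order" approach described above might involve:

```python
# Pseudo-code outline of approach (1), "inject first, then fix up tied layers".
# All helper names below are hypothetical.
#
# 1. inject_adapter(model, peft_config)      -> populates model.targeted_module_names
# 2. tied = get_tied_weight_keys(model)      -> e.g. {"lm_head"}
# 3. for every targeted module that is tied:
#        remove the LoRA layer that was just injected there
# 4. make sure a LoRA layer exists on embed_tokens
# 5. re-tie: point the adapters of the removed tied layers at the embed_tokens adapter,
#    so their LoRA weights stay shared
```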
Signed-off-by: romit <romit@ibm.com>
@BenjaminBossan This is now ready for review. I have also updated the logic for tied layers in … I have also added a few tests for the above case, and all of the tests pass. The only thing remaining is how to check for …
BenjaminBossan
left a comment
Thanks for the latest updates. This looks much cleaner now, I think we're approaching the finish line. There were still some issues I had though, so please check my comments.
Signed-off-by: romit <romit@ibm.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Signed-off-by: romit <romit@ibm.com>
@BenjaminBossan I have addressed your comments. PTAL
BenjaminBossan
left a comment
Thanks for the updates. I think there are some yet unaddressed comments from before and I also added a few more, please check.
There is, however, a bit of a blocker right now. Currently, a huge PR in transformers is on the way: huggingface/transformers#41580. It is intended to be released soon with transformers v5. A change that might affect us is that _tied_weights_keys will be converted from a list to a dict (with keys being targets and values sources). It could also affect _get_tied_weight_keys. We're still discussing how this will affect PEFT. Possibly it's going to be fine, but we're not sure yet, the PR is still changing.
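As a purely illustrative, defensive sketch, code that reads `_tied_weights_keys` could handle both the current list format and the proposed dict format roughly like this (the dict shape is an assumption based on the description above, not settled transformers API):

```python
def get_tied_weight_targets(model):
    # _tied_weights_keys is a list[str] in transformers v4; the v5 proposal described
    # above would turn it into a dict mapping target names to source names (assumption).
    tied = getattr(model, "_tied_weights_keys", None)
    if tied is None:
        return []
    if isinstance(tied, dict):
        return list(tied.keys())
    return list(tied)
```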
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
…to enh/tie-target-modules
Signed-off-by: romit <romit@ibm.com>
```python
        tied_weight_keys = set(tied_weight_keys)
        peft_config.target_modules_to_tie = tied_weight_keys

        raw_target_modules = getattr(peft_config, "target_modules", None)
```
@BenjaminBossan Please review this logic. I know this is a bit hacky! I am open to suggestions
Hmm yeah, this is rough. We can't really operate on the string like this, as there are too many possible ways that the regex could be formed. I wonder if we should just leave it be and deal with the tied module edge case in inject_adapter directly. I haven't fully thought this through, perhaps you already tried that and there is a caveat that I'm missing?
It should be possible, it would just make the flow very convoluted.
I redid this a bit. We just need to make sure that `embed_tokens` is present in `target_modules`.
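A minimal sketch of that idea, assuming the embedding module is literally named `embed_tokens` and that `target_modules` is a list of names rather than a regex string (the helper is illustrative, not the PR's code):

```python
def _ensure_embedding_targeted(peft_config):
    # Illustrative only: append "embed_tokens" if the user did not target it explicitly.
    target_modules = list(peft_config.target_modules or [])
    if "embed_tokens" not in target_modules:
        target_modules.append("embed_tokens")
    peft_config.target_modules = target_modules
```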
@BenjaminBossan please review the latest changes now. I believe I have addressed all your comments, but let me know if I missed something. I have added test cases where we are passing …

Regarding the transformers v5 update, since we would be having a version locked in peft, I believe if this PR advances faster than that, we can merge this. I can take up changes too whenever they're needed. However, you are much closer to this, so you can decide and let me know.
BenjaminBossan
left a comment
Thanks for the new updates. I added my comments, as usual :)
> Regarding the transformers v5 update, since we would be having a version locked in peft, I believe if this PR advances faster than that, we can merge this. I can take up changes too whenever they're needed. However, you are much closer to this, so you can decide and let me know.
No, we don't have a version lock; our goal is to ensure that the upcoming PEFT release v0.18.0 is compatible with both transformers v5 and older versions of transformers. As we have a feature freeze for PEFT, this PR will have to wait until after the 0.18.0 release. When it is ready to merge, we should hopefully know the final state of transformers v5 and can then test the PR with it to ensure the tests still pass.
src/peft/tuners/lora/model.py
Outdated
```python
        peft_config.modules_to_tie = tied_weight_keys

        modules_to_save = getattr(peft_config, "modules_to_save", []) or []
        if "embed_tokens" not in modules_to_save:
```
If the embedding layer has a different name, this won't be correct, right? It's probably still fine for now.
Yes, that is true. I am not able to find a way to get the embedding layer name from the model
We could try to find the layer name whose parameter corresponds to `model.get_input_embeddings()`, but I'm fine with assuming the name here. Let's just add a comment.
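One possible shape for such a lookup, as a hedged sketch (not necessarily identical to the helper the PR ended up using):

```python
import torch.nn as nn


def find_input_embedding_module_name(model: nn.Module):
    # Scan named parameters for the input embedding weight. Note that with tied weights
    # this can also match a layer that merely shares the same tensor, which is exactly
    # the caveat raised later in this review.
    embed_weight = model.get_input_embeddings().weight
    for name, param in model.named_parameters():
        if param is embed_weight:
            return name.removesuffix(".weight")
    return None
```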
src/peft/tuners/tuners_utils.py
Outdated
```python
        if getattr(peft_config, "ensure_weight_tying", False):
            if is_embedding_to_save and tied_weight_keys:
                self._add_modules_to_tie(peft_config, tied_weight_keys)
            if (is_embedding_to_save or is_embedding_in_target) and tied_weight_keys:
```
I think this whole block, line 1288-1298, can be replaced with:
```python
if getattr(peft_config, "ensure_weight_tying", False):
    if tied_weight_keys:
        if is_embedding_to_save:
            self._add_modules_to_tie(peft_config, tied_weight_keys)
        elif is_embedding_in_target:
            self._add_targets_to_tie(peft_config, tied_weight_keys)
        else:
            warnings.warn(
                "You have requested `ensure_weight_tying`, but no tied modules are added in either "
                "`modules_to_save` or `target_modules`"
            )
    else:
        warnings.warn("You have requested `ensure_weight_tying`, but no tied modules were found in the model")
```

I think this is cleaner.
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Signed-off-by: romit <romit@ibm.com>
@BenjaminBossan I have made the updates. As of now, we have 3 outstanding updates we need to resolve: …
BenjaminBossan
left a comment
Thanks for the updates. I did only a light pass, since we're still discussing implementation details.
> Moving this after `model.targeted_module_names` is populated is tough, as the loop which populates it (https://github.com/huggingface/peft/blob/main/src/peft/tuners/tuners_utils.py#L773-L819) also needs to check for and skip the layers that are tied.
>
> Reversing the order would mean that we may end up adding adapters where they're not required. The subsequent code would become more involved: essentially, we would have to remove the adapters from all tied layers, re-add the adapter on embed_tokens, and then tie the remaining adapters to it. In my opinion, this is an opinionated solution with the least complexity.
>
> We can go with (1) in your original comment and redo a few things, or keep the current flow and go with (3).
>
> I think the above might have become tough to follow 😅, so let me know and I can share some schematics. Will wait for your input.
So IIUC, we already have code for skipping the tied layers, so that's already taken into account, right? For the rest, maybe you could share high level pseudo code so that I can get an idea?
Adding layers whose weights will be overridden later is, by the way, not something I'm overly concerned with. The number of tied weights is usually quite small in the grand scheme of things, so the overhead should be negligible.
> Release of this PR: Can this PR be merged to main but not in release? Or do we have an ETA for v5 arrival? Some of our internal PRs depend on this, hence asking.
So we decided to go ahead with the PEFT release, as it's unclear when v5 will come. We merged #2902, which should take care of forward compatibility (but it causes the merge conflict now, please take care of it). This means that once this PR is ready, we can merge it starting tomorrow.
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Signed-off-by: romit <romit@ibm.com>
BenjaminBossan
left a comment
Thanks for the latest improvements to the PR. As you can see, there is one test failing because of a warning not matching, could you please take a look?
> Here's a pseudo-code of the complete flow that is currently implemented

Thanks for this. I think the second approach wouldn't actually differ that much when it comes to complexity; as such, I'm fine with keeping the current approach.
Signed-off-by: romit <romit@ibm.com>
@BenjaminBossan, I have fixed the test. I have removed 3 redundant tests which are no longer required.
BenjaminBossan
left a comment
Thanks for your continued work. From my perspective, this LGTM (just a small nit).
Overall, this all turned out to be more complex than I would have hoped, but I'm unsure if there is a better way. @githubnemo it would be great if you could also review, maybe you have some ideas.
@BenjaminBossan Resolved your comment.
Signed-off-by: romit <romit@ibm.com>
@BenjaminBossan I made a small commit. In one of my earlier commits, I had made a change where the target modules were saved as …

Thanks for the update @romitjain. There is a new merge conflict, could you please check?

@BenjaminBossan Done

@BenjaminBossan Let me know if any steps are remaining from my side for the final push.

@romitjain No, thank you, let's wait for @githubnemo's review.

Hi @githubnemo, it would be very helpful if you could review the PR. One of our internal features depends on this :)

Hi @githubnemo, gentle reminder. Would really appreciate an update. Thanks!
githubnemo
left a comment
Hey, sorry for the delayed review.
Thanks for working on this, it's quite the gnarly topic :)
I'm not sure if I understood everything correctly, so some of my comments may just be me misunderstanding something, but I think there are some places where the match is rather probabilistic.
I wonder if we need to normalize layer names at some point so that we only work with fully-qualified names after that point. For example in _add_modules_to_tie we will look at the modules to save set:
```python
modules_to_save = getattr(peft_config, "modules_to_save", []) or []
```
I don't think that we have guaranteed fully-qualified names here as they are still user-supplied. IMO it would be worthwhile to first collect the full names of all values in modules_to_save and then check if they are tied to save us from having various places where we do prefix/suffix/infix/whatever comparisons.
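As an illustrative sketch of that normalization idea (a hypothetical helper, assuming suffix-style matching of user-supplied names):

```python
import torch.nn as nn


def expand_to_fully_qualified(model: nn.Module, modules_to_save: list[str]) -> set[str]:
    # Map user-supplied names (usually suffixes like "embed_tokens") to the
    # fully-qualified module names that actually exist in the model.
    resolved = set()
    for name, _ in model.named_modules():
        if any(name == key or name.endswith(f".{key}") for key in modules_to_save):
            resolved.add(name)
    return resolved
```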
```python
        modules_to_save = getattr(peft_config, "modules_to_save", []) or []

        embed_layer_name = find_parameter_name_by_tensor(self.model, self.model.get_input_embeddings().weight)
```
I think there is no guarantee that this will return the name of the embedding layer. It could also return the name of a layer tied to the embedding layer. It is probably safer to compare module identity instead (even though for transformers <5 this will also be flaky for models like T5).
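A sketch of what comparing module identity could look like (illustrative only; the caveat about transformers <5 and models like T5 still applies):

```python
import torch.nn as nn


def find_input_embedding_name_by_module(model: nn.Module):
    # Compare module identity rather than parameter identity, so a layer that merely
    # shares (ties) the embedding weight does not match by accident.
    input_embeddings = model.get_input_embeddings()
    for name, module in model.named_modules():
        if module is input_embeddings:
            return name
    return None
```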
```python
        # find_parameter_name_by_tensor returns the parameter name, so we need to strip the weight from the name
        embed_layer_name = embed_layer_name.replace(".weight", "").replace("model.", "")
```
Not sure if replacing these strings is a good idea. `encoder_model.embed_tokens` would be turned into `encoder_embed_tokens`. Maybe using a more restricted approach (only one replacement, only if the key is found) would be better? `.weight`, for example, could be dropped by using `.removesuffix`.
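A minimal illustration of the more restricted stripping (assuming the `model.` prefix should only be removed when it really is a prefix):

```python
# ".weight" is only stripped when it is actually a suffix, and "model." only when it is
# a true prefix, so a name like "encoder_model.embed_tokens" would be left intact.
embed_layer_name = "model.embed_tokens.weight"
embed_layer_name = embed_layer_name.removesuffix(".weight").removeprefix("model.")
# -> "embed_tokens"
```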
| """ | ||
| Tied weight keys contains the layers tied to the embedding layer. Add embedding layer and remove rest of the | ||
| tied layers from `module_to_save`. Maintain a separate set for layers to be tied | ||
| Args: | ||
| peft_config (LoraConfig) | ||
| tied_weight_keys (list[str]) | ||
| """ |
Let's remove or document the parameters from the docstring, simply listing them doesn't add value.
Something like this:
| """ | |
| Tied weight keys contains the layers tied to the embedding layer. Add embedding layer and remove rest of the | |
| tied layers from `module_to_save`. Maintain a separate set for layers to be tied | |
| Args: | |
| peft_config (LoraConfig) | |
| tied_weight_keys (list[str]) | |
| """ | |
| """ | |
| Add embedding layer to `modules_to_save` and remove rest of the tied layers from `module_to_save`. | |
| Maintain a separate set for layers to be tied in `peft_config.tied_weights_keys`. | |
| Args: | |
| peft_config (LoraConfig) | |
| tied_weight_keys (list[str]) | |
| Contains the layers tied to the embedding layer. | |
| """ |
But what I'm still missing is an explanation of some of this. Especially:

> Maintain a separate set for layers to be tied in `peft_config.tied_weights_keys`.

Can you add to the docstring to what end this is done?
```python
            if m in modules_to_save:
                modules_to_save.remove(m)
```
I'm not sure how often this will generate a match. If I understand correctly, tied_weight_keys are fully-qualified keys. So this check will only match if the keys in modules_to_save are also fully-qualified. I don't think this happens often. cc @BenjaminBossan
| "This will ensure that the adapters added to the tied layers " | ||
| "are also tied. This is only applicable for layers passed via " | ||
| "`modules_to_save`." | ||
| "`modules_to_save` and and `target_modules`." |
| "`modules_to_save` and and `target_modules`." | |
| "`modules_to_save` and `target_modules`." |
```python
        # Before exporting the parameters we need to make sure
        # all the tensors are contigious. Tensors can become non contigiuous
```
```diff
-        # Before exporting the parameters we need to make sure
-        # all the tensors are contigious. Tensors can become non contigiuous
+        # Before exporting the parameters we need to make sure all the tensors are contigious as saving
+        # non-contiguous parameters is not supported. Tensors can become non contigiuous
```
```python
        arrow_config: ArrowConfig = None,
        qalora_group_size: int = 32,
        inference_mode: bool = False,
        tied_adapters: Optional[dict[str, nn.Parameter]] = None,
```
Is this a misnomer? IIUC it only contains LoRA parameters for one adapter?
```python
    def _add_modules_to_tie(self, peft_config, tied_weight_keys):
        modules_to_save = set(getattr(peft_config, "modules_to_save", []) or [])
        missing_keys = set(tied_weight_keys) - modules_to_save

    def _add_modules_to_tie(self, peft_config: LoraConfig, tied_weight_keys: list[str]):
```
Now that we have _add_target_modules as well I'm wondering if we should refactor this to _add_modules_to_save_to_tie for clarity (it is verbose, yes).
Same goes for the config key modules_to_tie.
```python
        for m in tied_weight_keys:
            if m in target_modules:
                target_modules.remove(m)
```
This will also only occasionally match, right? Only if users supply the fully-qualified module names.
Solves #2864 for `target_modules`.

Enables the `ensure_weight_tying` flag in `LoraConfig` for `target_modules`. For LoRA, if any of the tied layers are added to `target_modules` and `ensure_weight_tying == True`, the adapters added to the layer are shared with all the tied layers.

For example, if a model has tied weights and `target_modules=['embed_tokens']`, then LoRA adapters are added to both `embed_tokens` and `lm_head`. The adapters in `lm_head` share the weights with the adapters added to `embed_tokens`.
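As a usage sketch (the model name is only an example and is assumed to have tied input/output embeddings; `ensure_weight_tying` is the flag this PR adds):

```python
from transformers import AutoModelForCausalLM

from peft import LoraConfig, get_peft_model

# Example checkpoint; assumed to use tied word embeddings (tie_word_embeddings=True).
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")

config = LoraConfig(
    target_modules=["embed_tokens"],
    ensure_weight_tying=True,  # share the adapter between embed_tokens and the tied lm_head
)
peft_model = get_peft_model(model, config)
```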