
Conversation

@romitjain
Contributor

@romitjain romitjain commented Oct 29, 2025

Solves #2864 for target_modules

Enables the ensure_weight_tying flag in LoraConfig for target_modules.

For LoRA, if any of the tied layers is included in target_modules and ensure_weight_tying == True, the adapter added to that layer is shared with all of its tied layers.

For example, if a model has tied weights and target_modules=['embed_tokens'], LoRA adapters are added to both embed_tokens and lm_head, and the adapter on lm_head shares its weights with the adapter added to embed_tokens.
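A minimal usage sketch (illustrative; the checkpoint name is only an example and is assumed to have tied input/output embeddings, i.e. tie_word_embeddings=True):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")  # example checkpoint with tied embeddings

config = LoraConfig(
    target_modules=["embed_tokens"],
    ensure_weight_tying=True,  # the adapter on lm_head shares its weights with the embed_tokens adapter
)
peft_model = get_peft_model(model, config)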

@romitjain
Contributor Author

romitjain commented Oct 29, 2025

@BenjaminBossan
I have added the relevant test cases and implemented the ensure_weight_tying flag for target_modules. The current implementation works only if embed_tokens is added and not if lm_head is added. I will implement that fix and update the PR, but in the meantime I would appreciate your views on the logic and implementation.

At a high level

  1. I have updated BaseTuner._check_tied_modules to check for tied modules in target_modules
  2. I have added a private method BaseTuner._add_targets_to_tie that needs to be implemented by the inheriting classes
  3. I have added a loop in BaseTuner.inject_adapter to tie the adapters. I have implemented this extra loop to ensure that the order in which adapters are added to the target modules does not matter.
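For step 1, a rough illustrative sketch of the detection (this is not the PR's actual code; it assumes transformers' pre-v5 _tied_weights_keys list and an embedding layer named embed_tokens):

def _check_tied_modules(self, model, peft_config):
    # models with tied input/output embeddings expose e.g. ["lm_head.weight"]
    tied_keys = getattr(model, "_tied_weights_keys", None) or []
    # strip the parameter suffix to obtain module names, e.g. {"lm_head"}
    tied_modules = {key.rsplit(".", 1)[0] for key in tied_keys}
    target_modules = set(getattr(peft_config, "target_modules", []) or [])
    if getattr(peft_config, "ensure_weight_tying", False) and "embed_tokens" in target_modules:
        # these modules later receive adapters that share weights with the embed_tokens adapter
        peft_config.target_modules_to_tie = tied_modules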

Thank you

@romitjain changed the title from "Tie weights for target_modules in Lora" to "Tie weights for target_modules in Lora (#2864)" on Oct 29, 2025
Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for this draft PR to extend the feature to target_modules. I haven't done a full review yet, as some implementation details have yet to be figured out, but I gave some early feedback. This feature could be a bit more difficult to implement than for modules_to_save; I added some comments on why, please check.

peft_config.modules_to_tie = missing_keys

def _add_targets_to_tie(self, peft_config, tied_weight_keys):
target_modules = set(getattr(peft_config, "target_modules", []) or [])
Member

We need to consider the case that target_modules is a string and not a list of strings. If it's a string, we perform a regex match. Honestly, I'm not sure if there is a good solution. So far, I have 3 ideas:

  1. We could try to use the model.targeted_module_names attribute, which lists all targeted modules after the targets have been resolved. But that would mean that we need to first apply all LoRA layers and only then can we check for tied layers, which is the opposite order of how things are implemented right now.
  2. We could try using the string directly and then for example do something like: config.target_modules += f"|{missing_key}" but this is very brittle and won't work with all regexes, so I would like to avoid this.
  3. We could forbid using ensure_weight_tying=True and target_modules = <str>. Then we'd raise an error and tell users they have to pass a list of str if they want ensure_weight_tying.
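Option 3 would be a small guard along these lines (sketch):

if getattr(peft_config, "ensure_weight_tying", False) and isinstance(peft_config.target_modules, str):
    raise ValueError(
        "`ensure_weight_tying=True` is not supported when `target_modules` is a regex string; "
        "please pass `target_modules` as a list of module names instead."
    )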

Contributor Author

Yes, after going through the code a few more times, I realized this would not work for all the cases. I would go with the 1st approach and move the call to this function after model.targeted_module_names is updated

Member

Sounds good. This has yet to be updated, right?

Contributor Author

Yes, I will do this in the next commit

Contributor Author

@BenjaminBossan

Moving this after model.targeted_module_names is populated is tough, as the loop which populates it (https://github.com/huggingface/peft/blob/main/src/peft/tuners/tuners_utils.py#L773-L819) also needs to check and skip if the layers are tied.

Reversing the order would mean that we may end up adding adapters where they're not required. The subsequent code would become more involved, but essentially, we would have to remove adapters from all tied layers, re-add one to embed_tokens, and proceed to tie the remaining adapters to it. This is an opinionated solution, but in my opinion it has the least complexity.

We can go with (1) in your original comment and redo a few things, or keep the current flow and go with (3).

I think the above might have become tough to follow 😅, so let me know and I can share some schematics. Will wait for your input.

@romitjain romitjain marked this pull request as ready for review October 31, 2025 12:11
@romitjain
Contributor Author

@BenjaminBossan This is now ready for review. I have also updated the logic for tied layers in modules_to_save so that the lm_head and [embed_tokens, lm_head] cases are supported. Earlier, they would not have worked. The high-level implementation remains the same, but in my opinion it is much better placed than in my earlier commits.

I have also added a few tests for the above case, and all of the tests pass.

The only thing remaining is how to check for target_modules in case it's a string. I will come back to it, but you can go ahead and review the core logic.

@romitjain changed the title from "Tie weights for target_modules in Lora (#2864)" to "ENH: Tie weights for target_modules in Lora (#2864)" on Oct 31, 2025
Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for the latest updates. This looks much cleaner now, I think we're approaching the finish line. There were still some issues I had though, so please check my comments.

romitjain and others added 3 commits November 4, 2025 12:17
Signed-off-by: romit <romit@ibm.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Signed-off-by: romit <romit@ibm.com>
@romitjain
Contributor Author

@BenjaminBossan I have addressed your comments. PTAL

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for the updates. I think there are some yet unaddressed comments from before and I also added a few more, please check.

There is, however, a bit of a blocker right now. Currently, a huge PR in transformers is on the way: huggingface/transformers#41580. It is intended to be released soon with transformers v5. A change that might affect us is that _tied_weights_keys will be converted from a list to a dict (with keys being targets and values sources). It could also affect _get_tied_weight_keys. We're still discussing how this will affect PEFT. Possibly it's going to be fine, but we're not sure yet, the PR is still changing.


romitjain and others added 6 commits November 7, 2025 10:31
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Signed-off-by: romit <romit@ibm.com>
tied_weight_keys = set(tied_weight_keys)
peft_config.target_modules_to_tie = tied_weight_keys

raw_target_modules = getattr(peft_config, "target_modules", None)
Contributor Author

@BenjaminBossan Please review this logic. I know this is a bit hacky! I am open to suggestions

Member

Hmm yeah, this is rough. We can't really operate on the string like this, as there are too many possible ways that the regex could be formed. I wonder if we should just leave it be and deal with the tied module edge case in inject_adapter directly. I haven't fully thought this through, perhaps you already tried that and there is a caveat that I'm missing?

Contributor Author

#2879 (comment)

It should be possible, it would just make the flow very convoluted.

Contributor Author

I redid this a bit. We just need to make sure that embed_tokens is present in the target_modules

@romitjain
Contributor Author

@BenjaminBossan please review the latest changes now. I believe I have addressed all your comments, but let me know if I missed something.

I have added test cases where we pass target_modules as a str, and added a (slightly) hacky solution for that. It's opinionated, to keep the flow simple.

Regarding the transformers v5 update, since we would have a version lock in peft, I believe that if this PR advances faster than that, we can merge it. I can take up changes too whenever they're needed. However, you are much closer to this, so you can decide and let me know.

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for the new updates. I added my comments, as usual :)

Regarding the transformers v5 update, since we would have a version lock in peft, I believe that if this PR advances faster than that, we can merge it. I can take up changes too whenever they're needed. However, you are much closer to this, so you can decide and let me know.

No, we don't have a version lock; our goal is to ensure that the upcoming PEFT release v0.18.0 is compatible with both transformers v5 and older versions of transformers. As we have a feature freeze for PEFT, it means that this PR will have to wait until after the 0.18.0 release. When it is ready to merge, we should hopefully know the final state of transformers v5 and then we can test the PR with it to ensure the tests still pass.

peft_config.modules_to_tie = tied_weight_keys

modules_to_save = getattr(peft_config, "modules_to_save", []) or []
if "embed_tokens" not in modules_to_save:
Member

If the embedding layer has a different name, this won't be correct, right? It's probably still fine for now.

Contributor Author

Yes, that is true. I am not able to find a way to get the embedding layer name from the model

Member

We could try to find the layer name whose parameter corresponds to model.get_input_embeddings(), but I'm fine with assuming the name here. Let's just add a comment.
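For illustration, such a lookup could be as simple as this (sketch; the helper name is hypothetical):

def find_embedding_layer_name(model):
    embed_weight = model.get_input_embeddings().weight
    # keep duplicates so that tied parameters sharing this tensor remain visible
    for name, param in model.named_parameters(remove_duplicate=False):
        if param is embed_weight:
            # strip the trailing ".weight" to get the module name
            return name.rsplit(".", 1)[0]
    return None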

if getattr(peft_config, "ensure_weight_tying", False):
if is_embedding_to_save and tied_weight_keys:
self._add_modules_to_tie(peft_config, tied_weight_keys)
if (is_embedding_to_save or is_embedding_in_target) and tied_weight_keys:
Member

I think this whole block, lines 1288-1298, can be replaced with:

        if getattr(peft_config, "ensure_weight_tying", False):
            if tied_weight_keys:
                if is_embedding_to_save:
                    self._add_modules_to_tie(peft_config, tied_weight_keys)
                elif is_embedding_in_target:
                    self._add_targets_to_tie(peft_config, tied_weight_keys)
                else:
                    warnings.warn(
                        "You have requested `ensure_weight_tying`, but no tied modules are added in either "
                        "`modules_to_save` or `target_modules`"
                    )
            else:
                warnings.warn("You have requested `ensure_weight_tying`, but no tied modules were found in the model")

I think this is cleaner.


romitjain and others added 2 commits November 10, 2025 21:21
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Signed-off-by: romit <romit@ibm.com>
@romitjain
Contributor Author

@BenjaminBossan I have made the updates. As of now, we have 3 outstanding items we need to resolve:

  1. What to do in case target_modules is a str: ENH: Tie weights for target_modules in Lora (#2864) #2879 (comment): I have replied to the comment with a small modification in the logic
  2. Release of this PR: Can this PR be merged to main but not in release? Or do we have an ETA for v5 arrival? Some of our internal PRs depend on this, hence asking.
  3. (Minor) Duplication of a warning message: ENH: Tie weights for target_modules in Lora (#2864) #2879 (comment) I have replied to the comment

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for the updates. I did only a light pass, since we're still discussing implementation details.

Moving this after model.targeted_module_names is populated is tough, as the loop which populates it (https://github.com/huggingface/peft/blob/main/src/peft/tuners/tuners_utils.py#L773-L819) also needs to check and skip if the layers are tied.

Reversing the order would mean that we may end up adding adapters where they're not required. The subsequent code would become more involved, but essentially, we would have to remove adapters from all tied layers, re-add one to embed_tokens, and proceed to tie the remaining adapters to it. This is an opinionated solution, but in my opinion it has the least complexity.

We can go with (1) in your original comment and redo a few things, or keep the current flow and go with (3).

I think the above might have become tough to follow 😅, so let me know and I can share some schematics. Will wait for your input.

So IIUC, we already have code for skipping the tied layers, so that's already taken into account, right? For the rest, maybe you could share high-level pseudo-code so that I can get an idea?

Adding layers whose weights will be overridden later is, by the way, not something I'm overly concerned with. The number of tied weights is usually quite small in the grand scheme of things, so the overhead should be negligible.

Release of this PR: Can this PR be merged to main but not in release? Or do we have an ETA for v5 arrival? Some of our internal PRs depend on this, hence asking.

So we decided to go ahead with the PEFT release, as it's unclear when v5 will come. We merged #2902, which should take care of forward compatibility (but it causes the merge conflict now, please take care of it). This means that once this PR is ready, we can merge it starting tomorrow.


romitjain and others added 3 commits November 13, 2025 10:11
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Signed-off-by: romit <romit@ibm.com>
@romitjain
Contributor Author

  1. ENH: Tie weights for target_modules in Lora (#2864) #2879 (comment): Good suggestion. I have added a new function to find a layer by its reference tensor
  2. I have resolved the merge conflict and addressed your remaining comments
  3. Here's the pseudo-code of the complete flow that is currently implemented
def inject_adapter():
    ...

    # finds tied modules and adds to a set - target_modules_to_tie
    self._check_tied_modules(model, peft_config)

    tied_targets = []

    # loop 1
    for k in keys:
        result = k in target_modules
        to_tie = k in target_modules_to_tie

        if to_tie:
            tied_targets.append(k)
            continue
        if result:
            add_lora(k)

    # loop 2
    for t in tied_targets:
        add_tied_lora(t)

    ...
  4. Here is the flow of the alternative that you suggested
def inject_adapter():
    ...
    # loop 1
    for k in keys:
        result = k in target_modules
        if result:
            add_lora(k)


    # finds tied modules and adds to a set - target_modules_to_tie
    self._check_tied_modules(model, peft_config)

    # this is needed since all the tied adapters reference embedding
    # layer's lora as source
    add_lora(embed_tokens)

    # loop 2
    for t in target_modules_to_tie:
        remove_lora(t) # will remove lora if it exists, this might be a bit complex
        add_tied_lora(t)

    ...

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for the latest improvements to the PR. As you can see, there is one test failing because of a warning not matching, could you please take a look?

Here's the pseudo-code of the complete flow that is currently implemented

Thanks for this. I think the second approach wouldn't actually differ that much when it comes to complexity; as such, I'm fine with keeping the current approach.

Signed-off-by: romit <romit@ibm.com>
@romitjain
Contributor Author

@BenjaminBossan, I have fixed the test. I have removed 3 redundant tests which are no longer required.

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for your continued work. From my perspective, this LGTM (just a small nit).

Overall, this all turned out to be more complex than I would have hoped, but I'm unsure if there is a better way. @githubnemo it would be great if you could also review, maybe you have some ideas.

Signed-off-by: romit <romit@ibm.com>
@romitjain
Contributor Author

@BenjaminBossan Resolved your comment

@romitjain
Contributor Author

@BenjaminBossan I made a small commit - in one of my earlier commits, I had made a change where the target modules were saved as model.embed_tokens instead of embed_tokens. This was causing some downstream issues.
Apologies for the late commit, I know that you will have to trigger the runs again 😅

@BenjaminBossan
Member

Thanks for the update @romitjain. There is a new merge conflict, could you please check?

@romitjain
Contributor Author

@BenjaminBossan Done

@romitjain
Contributor Author

@BenjaminBossan Let me know if any steps remain on my side for the final push.

@BenjaminBossan
Member

@romitjain No, thank you, let's wait for @githubnemo's review.

@romitjain
Contributor Author

Hi @githubnemo, it would be very helpful if you could review the PR. One of our internal features depends on this :)

@romitjain
Contributor Author

Hi @githubnemo, gentle reminder. Would really appreciate an update. Thanks

Collaborator

@githubnemo githubnemo left a comment

Hey, sorry for the delayed review.

Thanks for working on this, it's quite the gnarly topic :)

I'm not sure if I understood everything correctly, so some of my comments may just be me misunderstanding something, but I think that there are some places where the match is rather probabilistic.

I wonder if we need to normalize layer names at some point so that we only work with fully-qualified names after that point. For example in _add_modules_to_tie we will look at the modules to save set:

modules_to_save = getattr(peft_config, "modules_to_save", []) or []

I don't think that we have guaranteed fully-qualified names here as they are still user-supplied. IMO it would be worthwhile to first collect the full names of all values in modules_to_save and then check if they are tied to save us from having various places where we do prefix/suffix/infix/whatever comparisons.
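For example, such a normalization could resolve user-supplied short names to fully-qualified module names once, up front (illustrative sketch, not part of the PR):

def resolve_full_module_names(model, short_names):
    short_names = set(short_names)
    full_names = set()
    for name, _ in model.named_modules():
        # accept an exact match or a suffix match, mirroring how target_modules is matched
        if name in short_names or any(name.endswith("." + short) for short in short_names):
            full_names.add(name)
    return full_names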


modules_to_save = getattr(peft_config, "modules_to_save", []) or []

embed_layer_name = find_parameter_name_by_tensor(self.model, self.model.get_input_embeddings().weight)
Collaborator

I think there is no guarantee that this will return the name of the embedding layer. It could also return the name of a layer tied to the embedding layer. It is probably safer to compare module identity instead (even though for transformers <5 this will also be flaky for models like T5).
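I.e., something along these lines (sketch):

embedding_module = self.model.get_input_embeddings()
embed_layer_name = None
for name, module in self.model.named_modules():
    # compare module identity rather than parameter identity
    if module is embedding_module:
        embed_layer_name = name
        break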

Comment on lines +888 to +889
# find_parameter_name_by_tensor returns the parameter name, so we need to strip the weight from the name
embed_layer_name = embed_layer_name.replace(".weight", "").replace("model.", "")
Collaborator

Not sure if replacing these strings is a good idea. encoder_model.embed_tokens would be turned into encoder_embed_tokens. Maybe using a more restricted approach (only one replacement, only if the key is found) would be better? .weight for example could be dropped by using .removesuffix.
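E.g., for the .weight part:

embed_layer_name = embed_layer_name.removesuffix(".weight")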

Comment on lines +874 to +881
"""
Tied weight keys contains the layers tied to the embedding layer. Add embedding layer and remove rest of the
tied layers from `module_to_save`. Maintain a separate set for layers to be tied
Args:
peft_config (LoraConfig)
tied_weight_keys (list[str])
"""
Collaborator

Let's remove or document the parameters from the docstring, simply listing them doesn't add value.

Something like this:

Suggested change
"""
Tied weight keys contains the layers tied to the embedding layer. Add embedding layer and remove rest of the
tied layers from `module_to_save`. Maintain a separate set for layers to be tied
Args:
peft_config (LoraConfig)
tied_weight_keys (list[str])
"""
"""
Add embedding layer to `modules_to_save` and remove rest of the tied layers from `module_to_save`.
Maintain a separate set for layers to be tied in `peft_config.tied_weights_keys`.
Args:
peft_config (LoraConfig)
tied_weight_keys (list[str])
Contains the layers tied to the embedding layer.
"""

But what I'm still missing is an explanation to some of this. Especially:

"Maintain a separate set for layers to be tied [in peft_config.tied_weights_keys.]"

Can you add to the docstring to what end this is done?
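For illustration, based on how this set is consumed later in inject_adapter (see the pseudo-code earlier in this thread), the added explanation could read roughly: the keys stored here are picked up later in inject_adapter, and the adapters created for these modules do not receive independent weights but share the parameters of the adapter added to the embedding layer.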

Comment on lines +895 to +896
if m in modules_to_save:
modules_to_save.remove(m)
Collaborator

I'm not sure how often this will generate a match. If I understand correctly, tied_weight_keys are fully-qualified keys. So this check will only match if the keys in modules_to_save are also fully-qualified. I don't think this happens often. cc @BenjaminBossan
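For example (illustrative values only):

tied_weight_keys = {"model.lm_head"}   # hypothetical fully-qualified key
modules_to_save = ["lm_head"]          # typical user-supplied short name
assert not set(tied_weight_keys) & set(modules_to_save)  # no overlap, so nothing would be removed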

"This will ensure that the adapters added to the tied layers "
"are also tied. This is only applicable for layers passed via "
"`modules_to_save`."
"`modules_to_save` and and `target_modules`."
Collaborator

Suggested change
"`modules_to_save` and and `target_modules`."
"`modules_to_save` and `target_modules`."

Comment on lines +332 to +333
# Before exporting the parameters we need to make sure
# all the tensors are contigious. Tensors can become non contigiuous
Collaborator

Suggested change
# Before exporting the parameters we need to make sure
# all the tensors are contigious. Tensors can become non contigiuous
# Before exporting the parameters we need to make sure all the tensors are contiguous as saving
# non-contiguous parameters is not supported. Tensors can become non-contiguous

arrow_config: ArrowConfig = None,
qalora_group_size: int = 32,
inference_mode: bool = False,
tied_adapters: Optional[dict[str, nn.Parameter]] = None,
Collaborator

Is this a misnomer? IIUC it only contains LoRA parameters for one adapter?

def _add_modules_to_tie(self, peft_config, tied_weight_keys):
modules_to_save = set(getattr(peft_config, "modules_to_save", []) or [])
missing_keys = set(tied_weight_keys) - modules_to_save
def _add_modules_to_tie(self, peft_config: LoraConfig, tied_weight_keys: list[str]):
Collaborator

Now that we have _add_target_modules as well I'm wondering if we should refactor this to _add_modules_to_save_to_tie for clarity (it is verbose, yes).

Same goes for the config key modules_to_tie.

Comment on lines +928 to +930
for m in tied_weight_keys:
if m in target_modules:
target_modules.remove(m)
Collaborator

This will also only occasionally match, right? Only if users supply the fully-qualified module names.
