Commit 4731379
FEAT Integrate BD-LoRA into PEFT (#2895)
Implements BD-LoRA: Block-Diagonal LoRA for Eliminating Communication
Overhead in Tensor Parallel LoRA Serving
(https://openreview.net/forum?id=1cjLvtFOmL).
With BD-LoRA, the LoRA weights are implemented in a block-diagonal way.
This reduces communication overhead when using tensor parallelism (TP)
and thus enables faster serving.
There is an experimental vLLM PR to support this, but it is not merged
(yet): vllm-project/vllm#28136.
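
For intuition, below is a minimal, self-contained sketch of why block-diagonal LoRA factors keep the low-rank update local to each tensor-parallel shard. This is not the PEFT or vLLM implementation; the `bd_lora_delta` helper, the block count `tp_size`, and all shapes are illustrative assumptions.

```python
import torch

def bd_lora_delta(x, A_blocks, B_blocks):
    """Apply a block-diagonal LoRA update to activations x.

    A_blocks[i] and B_blocks[i] are the LoRA factors conceptually owned by
    tensor-parallel rank i. Because each block only touches its own slice of
    the hidden dimension, every rank can compute its partial output locally,
    so the LoRA path needs no cross-rank communication.
    """
    slices = x.chunk(len(A_blocks), dim=-1)  # split hidden dim across "ranks"
    outs = [s @ A.T @ B.T for s, A, B in zip(slices, A_blocks, B_blocks)]
    return torch.cat(outs, dim=-1)

# Toy example: hidden size 16, rank 2 per block, 4 "tensor-parallel" blocks.
tp_size, hidden, rank = 4, 16, 2
x = torch.randn(1, hidden)
A_blocks = [torch.randn(rank, hidden // tp_size) for _ in range(tp_size)]
B_blocks = [torch.randn(hidden // tp_size, rank) for _ in range(tp_size)]
print(bd_lora_delta(x, A_blocks, B_blocks).shape)  # torch.Size([1, 16])
```

A dense LoRA update would instead mix all hidden-dimension slices, which under TP requires gathering or reducing partial results across ranks; the block-diagonal constraint is what removes that step.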
File tree (16 files changed, +907 -3 lines changed)
- examples/bdlora_finetuning
- method_comparison/MetaMathQA/experiments/lora/llama-3.2-3B-rank14-target-mlp-bdlora
- src/peft
  - tuners
    - lora
- tests