Commit 3ddb86c
Qualcomm AI Engine Direct - Support multimodal(VLM) runner (pytorch#16536)
### Summary:
- Runtime support for models
  - SmolVLM 500M
  - InternVL3 1B
- Add hybrid mode runtime requantization in the multimodal runner
  - Background: In LLMs, `annotate_prefill_kv_output` effectively narrows the output gap between `hybrid` mode and `KV` mode. However, applying the same method to multimodal models does not work (it produces poor results). To achieve decent results in hybrid mode, we dequantize the KV cache right after prefilling and re-quantize it at runtime based on the decoder input cache's quantization parameters.
- CI
  - Refactor VLM test script
  - Add VLM accuracy/performance runtime tests
- Refactor (VLM)
  - Rename embedding forward input for CPU quantization
  - Update VLM vision encoder architecture to align with transformers 5.0 changes
- Documentation
  - Add README for multimodal VLM
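
The hybrid-mode requantization described above can be sketched as follows. This is a minimal illustration, assuming per-tensor affine (scale/zero-point) uint8 encodings and assuming the prefill graph's KV outputs and the KV decoder's cache inputs carry different encodings, so raw integer values cannot simply be copied across. `QuantParams`, `dequantize`, `quantize`, and `requantize_kv` are hypothetical names, not the runner's API; the actual multimodal runner may use per-channel encodings, other bit widths, or a fused integer path.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QuantParams:
    """Hypothetical per-tensor affine quantization parameters."""
    scale: float
    zero_point: int
    qmin: int = 0    # uint8 range assumed for illustration
    qmax: int = 255


def dequantize(q: List[int], p: QuantParams) -> List[float]:
    # real = scale * (q - zero_point)
    return [p.scale * (v - p.zero_point) for v in q]


def quantize(x: List[float], p: QuantParams) -> List[int]:
    # q = clamp(round(x / scale) + zero_point, qmin, qmax)
    return [min(p.qmax, max(p.qmin, round(v / p.scale) + p.zero_point)) for v in x]


def requantize_kv(
    prefill_kv: List[int], prefill_params: QuantParams, decode_params: QuantParams
) -> List[int]:
    """Re-express a KV cache written by the prefill graph in the decoder's input encoding."""
    # Step 1: dequantize right after prefill, using the prefill output encoding.
    real = dequantize(prefill_kv, prefill_params)
    # Step 2: re-quantize with the decoder input cache encoding before decoding starts.
    return quantize(real, decode_params)


if __name__ == "__main__":
    prefill = QuantParams(scale=0.1, zero_point=128)
    decode = QuantParams(scale=0.05, zero_point=128)
    print(requantize_kv([128, 138, 118, 255], prefill, decode))
```

Going through real values keeps the logic easy to follow; the last element shows that values outside the decoder encoding's range are clamped, which is one reason the two encodings need to be reconciled at runtime rather than reused verbatim.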
### Test plan
#### SmolVLM
Perf: ~63 TPS on SM8750
``` bash
python -m backends.qualcomm.tests.test_qnn_delegate TestExampleMultimodalityScript.test_static_vlm --model_name smolvlm_500m_instruct -b build-android --executorch_root . -a . -m SM8750 -s ${SERIAL_NUM}
```
#### InternVL3
Perf: ~17 TPS on SM8750
``` bash
python -m backends.qualcomm.tests.test_qnn_delegate TestExampleMultimodalityScript.test_static_vlm --model_name internvl3_1b -b build-android --executorch_root . -a . -m SM8750 -s ${SERIAL_NUM}
```
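
To put the throughput figures in per-token terms, a small conversion sketch follows. It assumes "TPS" means decode tokens per second and ignores prefill and vision-encoder time; the 256-token response length is an arbitrary illustrative value, not something measured in these tests.

```python
def ms_per_token(tokens_per_second: float) -> float:
    # Average decode latency per generated token, in milliseconds.
    return 1000.0 / tokens_per_second


def decode_seconds(num_tokens: int, tokens_per_second: float) -> float:
    # Time to generate num_tokens at a steady decode rate (prefill excluded).
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    for name, tps in (("SmolVLM 500M", 63.0), ("InternVL3 1B", 17.0)):
        print(f"{name}: {ms_per_token(tps):.1f} ms/token, "
              f"~{decode_seconds(256, tps):.1f} s for 256 tokens")
```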
### Script
#### SmolVLM
``` bash
python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s ${SERIAL_NUM} -m ${SOC_MODEL} --decoder_model smolvlm_500m_instruct --model_mode kv --max_seq_len 1024 --prompt "Can you describe this image?" --image_path "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
```
#### InternVL3
```bash
python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s ${SERIAL_NUM} -m ${SOC_MODEL} --decoder_model internvl3_1b --model_mode kv --max_seq_len 1024 --prompt "Can you describe this image?" --image_path "http://images.cocodataset.org/val2017/000000039769.jpg"
```

- 1 parent: ac0a201
- 35 files changed
- Lines changed: 3846 additions & 241 deletions
File tree:
- backends/qualcomm/tests
- examples/qualcomm/oss_scripts/llama
  - assets/samples/images
  - encoder
  - model
  - runner
    - multimodal_runner