Commit 615e45a
perf(eval): skip unnecessary logit array copies during native sampling
- Introduce the `copy_logits` parameter to `Llama.eval()` to control
whether C-level logits are copied into the Python `self.scores` array.
- Automatically disable `copy_logits` during the generation loop unless
Python-side hooks (`logits_processor`, `stopping_criteria`) or
`logits_all` explicitly require the logits.
- Skip logit copies entirely for intermediate prompt evaluations (e.g.,
before hybrid checkpoints).
- Update logit retrieval to use `get_logits_ith(-1)` to accurately fetch
the final token's logits when copying is required.
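The bullets above describe when the logits copy happens. The following is a minimal, self-contained Python sketch of that decision, not the actual `llama-cpp-python` code. `needs_logit_copy`, `eval_batch` and `FakeContext` are hypothetical names, and the logits are kept in plain Python lists instead of the library's `self.scores` array. The sketch shows the copy being skipped unless a Python-side hook or `logits_all` asks for it, and only the final token's row being fetched through a `get_logits_ith(-1)`-style lookup. It deliberately ignores how `logits_all` would store every row and only models the last-token path.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical hook type for this sketch: takes token ids and one logits row.
Hook = Callable[[Sequence[int], Sequence[float]], object]


def needs_logit_copy(
    logits_processor: Optional[Sequence[Hook]],
    stopping_criteria: Optional[Sequence[Hook]],
    logits_all: bool,
) -> bool:
    """True only when Python-side code must read the logits."""
    return bool(logits_processor) or bool(stopping_criteria) or logits_all


class FakeContext:
    """Toy stand-in for the C context: one logits row per evaluated token."""

    def __init__(self, rows: List[List[float]]) -> None:
        self._rows = rows

    def get_logits_ith(self, i: int) -> List[float]:
        # A negative index counts from the end, so -1 is the last evaluated token.
        return self._rows[i]


def eval_batch(ctx: FakeContext, scores: List[List[float]], copy_logits: bool) -> None:
    """Copy the final token's logits into `scores` only when asked to."""
    if not copy_logits:
        return  # a native sampler would read ctx directly; no Python copy needed
    scores.append(list(ctx.get_logits_ith(-1)))


# Usage: two tokens were evaluated; only the last row matters for sampling.
ctx = FakeContext([[0.1, 0.9], [0.7, 0.3]])
scores: List[List[float]] = []

eval_batch(ctx, scores, copy_logits=needs_logit_copy(None, None, False))
print(scores)  # no hooks and no logits_all, so nothing was copied

eval_batch(ctx, scores, copy_logits=needs_logit_copy([lambda ids, row: row], None, False))
print(scores)  # a logits processor is present, so the last row was copied
```

Keeping the decision in one small predicate makes the default fast path (no Python hooks) easy to see: the copy only happens when some Python code would actually consume the result.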
In a PDF-reading summarization workload, this reduced the end-to-end completion
time from 41.32s to 25.93s, a ~37.2% reduction (about 1.59x faster). The main
generation hot path also improved noticeably:
- `_create_completion`: 41.32s -> 25.93s
- `generate`: 37.82s -> no longer among the top sampled profiler entries
- `eval`: 35.14s -> 21.96s
- logits retrieval/copy path: 29.89s `get_logits()` -> 18.68s `get_logits_ith()`
- `decode`: 3.89s -> 2.25s
- `detokenize`: 2.60s -> 1.33s
- `sample`: 2.35s -> 2.03s
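As a quick check on the figures above, this Python snippet recomputes each reduction as (before - after) / before and the matching speedup as before / after. `generate` is left out because its after-value is not a number, and the helper name `reduction_pct` is only for illustration. For `_create_completion` it reproduces the ~37.2% figure quoted above.

```python
# Timings in seconds copied from the profile above: (before, after).
timings = {
    "_create_completion": (41.32, 25.93),
    "eval": (35.14, 21.96),
    "logits retrieval/copy": (29.89, 18.68),
    "decode": (3.89, 2.25),
    "detokenize": (2.60, 1.33),
    "sample": (2.35, 2.03),
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage of wall time removed: (before - after) / before * 100."""
    return (before - after) / before * 100.0


for name, (before, after) in timings.items():
    print(
        f"{name:>22}: {reduction_pct(before, after):5.1f}% less time, "
        f"{before / after:.2f}x speedup"
    )
```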
This significantly reduces CPU overhead and memory traffic during generation,
because the native `llama.cpp` sampler reads the logits directly from the C
context instead of exposing an `n_vocab`-sized array to Python on every token.
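To give a sense of scale for the per-token copy, here is a back-of-the-envelope Python calculation. It assumes the logits row is `n_vocab` 32-bit floats and uses a hypothetical 128,000-token vocabulary and 1,000 generated tokens; neither number comes from the benchmark above. It counts only the bytes in the copied row itself, not any additional reads or writes the copy may cause.

```python
FLOAT32_BYTES = 4  # assumption: logits are C `float`, 32-bit on common platforms


def logits_bytes_per_token(n_vocab: int) -> int:
    """Size of one logits row: n_vocab floats."""
    return n_vocab * FLOAT32_BYTES


def total_copied_bytes(n_vocab: int, n_generated: int) -> int:
    """Bytes copied if one full row is copied for every generated token."""
    return logits_bytes_per_token(n_vocab) * n_generated


# Hypothetical example values, not taken from the benchmark.
N_VOCAB = 128_000
N_GENERATED = 1_000

per_token = logits_bytes_per_token(N_VOCAB)
total = total_copied_bytes(N_VOCAB, N_GENERATED)
print(f"{per_token / 1024:.0f} KiB per token, "
      f"{total / 1024**2:.1f} MiB for {N_GENERATED} tokens")
```

Skipping the copy removes this per-token cost entirely on the native-sampling path, which is consistent with the drop in the logits retrieval/copy time reported above.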
Signed-off-by: JamePeng <jame_peng@sina.com>
1 parent 7e0cd12
1 file changed
Lines changed: 43 additions & 9 deletions