Thanks for sharing your work, it helps me a lot.
But I have some confusion about the default-order performance and implementation: the code here that handles the default order seems to mismatch the original GPTQ code. When the default order and groupsize are applied, the original GPTQ re-computes the scales and zeros during the calibration steps using the following code:
```python
if groupsize != -1:
    if not static_groups:
        if (i1 + i) % groupsize == 0:
            self.quantizer.find_params(W[:, (i1 + i):(i1 + i + groupsize)], weight=True)
    else:
        idx = i1 + i
        if actorder:
            idx = perm[idx]
        self.quantizer = groups[idx // groupsize]
```
But this is removed in this repository, so the scales are fixed before all calibration steps, making the scales of the later quantization groups/blocks sub-optimal. I wonder why this was removed, since recomputing the scales does not seem to add any inference overhead. If I have misunderstood something, please point it out; I would greatly appreciate it!
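To illustrate what I mean by "sub-optimal", here is a toy sketch (not the repository's code; it assumes a simplified min/max quantizer standing in for `quantizer.find_params`) comparing per-group scale/zero recomputation against a single set of parameters fixed up front, on a weight matrix whose later columns have a wider range:

```python
import numpy as np

def find_params(w, bits=4):
    # Simplified stand-in for GPTQ's quantizer.find_params:
    # derive scale/zero from this block's own min/max range.
    wmin, wmax = float(w.min()), float(w.max())
    scale = (wmax - wmin) / (2 ** bits - 1)
    zero = np.round(-wmin / scale)
    return scale, zero

def quantize_dequantize(w, scale, zero, bits=4):
    # Uniform affine quantization followed by dequantization.
    q = np.clip(np.round(w / scale) + zero, 0, 2 ** bits - 1)
    return scale * (q - zero)

rng = np.random.default_rng(0)
groupsize = 32
# Columns with growing magnitude, so later groups span a wider range.
W = rng.standard_normal((8, 128)) * np.linspace(0.5, 4.0, 128)

# (b) One scale/zero fixed from the whole matrix before any group is processed.
s_fix, z_fix = find_params(W)

err_per_group = 0.0  # (a) recompute params at each group boundary
err_fixed = 0.0      # (b) keep the up-front params for every group

for i in range(0, W.shape[1], groupsize):
    g = W[:, i:i + groupsize]
    s, z = find_params(g)  # fresh params fitted to this group's range
    err_per_group += ((g - quantize_dequantize(g, s, z)) ** 2).sum()
    err_fixed += ((g - quantize_dequantize(g, s_fix, z_fix)) ** 2).sum()

print(f"per-group MSE: {err_per_group:.4f}  fixed MSE: {err_fixed:.4f}")
```

In this sketch the per-group parameters fit each group's range and give a lower reconstruction error, which is why fixing the scales before calibration seems to hurt the later groups.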