Thanks for sharing your work, it helps me a lot.
But I have some confusion about the default-order performance and implementation: the code here that handles the default order seems to mismatch the original GPTQ code. When the default order and groupsize are applied, the original GPTQ re-computes the scales and zeros during the calibration steps using the following code:
```python
if groupsize != -1:
    if not static_groups:
        if (i1 + i) % groupsize == 0:
            self.quantizer.find_params(W[:, (i1 + i):(i1 + i + groupsize)], weight=True)
    else:
        idx = i1 + i
        if actorder:
            idx = perm[idx]
        self.quantizer = groups[idx // groupsize]
```
But this is removed in this repository, so the scales are fixed before all calibration steps, making the scales of the later quantization groups/blocks sub-optimal. I wonder why this was removed, since recomputing the scales does not seem to add any inference overhead. If I have misunderstood something, please point it out; I would greatly appreciate it!
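To illustrate what I mean by "sub-optimal", here is a toy sketch (not the repository's code; it assumes a simplified min/max quantizer standing in for `quantizer.find_params`) comparing per-group scale/zero recomputation against a single set of parameters fixed up front, on a weight matrix whose later columns have a wider range:

```python
import numpy as np

def find_params(w, bits=4):
    # Simplified stand-in for GPTQ's quantizer.find_params:
    # derive scale/zero from this block's own min/max range.
    wmin, wmax = float(w.min()), float(w.max())
    scale = (wmax - wmin) / (2 ** bits - 1)
    zero = np.round(-wmin / scale)
    return scale, zero

def quantize_dequantize(w, scale, zero, bits=4):
    # Uniform affine quantization followed by dequantization.
    q = np.clip(np.round(w / scale) + zero, 0, 2 ** bits - 1)
    return scale * (q - zero)

rng = np.random.default_rng(0)
groupsize = 32
# Columns with growing magnitude, so later groups span a wider range.
W = rng.standard_normal((8, 128)) * np.linspace(0.5, 4.0, 128)

# (b) One scale/zero fixed from the whole matrix before any group is processed.
s_fix, z_fix = find_params(W)

err_per_group = 0.0  # (a) recompute params at each group boundary
err_fixed = 0.0      # (b) keep the up-front params for every group

for i in range(0, W.shape[1], groupsize):
    g = W[:, i:i + groupsize]
    s, z = find_params(g)  # fresh params fitted to this group's range
    err_per_group += ((g - quantize_dequantize(g, s, z)) ** 2).sum()
    err_fixed += ((g - quantize_dequantize(g, s_fix, z_fix)) ** 2).sum()

print(f"per-group MSE: {err_per_group:.4f}  fixed MSE: {err_fixed:.4f}")
```

In this sketch the per-group parameters fit each group's range and give a lower reconstruction error, which is why fixing the scales before calibration seems to hurt the later groups.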