Quantization support for GroupedTensor: MXFP8

Implement quantization support for the GroupedTensor type in the MXFP8 format.
Required modifications to the existing kernel:
 - ignore padding in the allocation
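
A minimal sketch of one possible reading of the padding requirement, not the actual kernel. It assumes the grouped tensor is a flat buffer in which each member may be followed by padding, and that `offsets` and `sizes` (hypothetical names) describe the valid region of each member. It also assumes E4M3 elements and picks, per block of 32 values, the smallest power-of-two scale that brings the block maximum within the E4M3 range; the real kernel may select scales differently.

```python
import math

BLOCK = 32             # MXFP8 shares one power-of-two scale per 32 elements
FP8_E4M3_MAX = 448.0   # largest finite E4M3 magnitude


def block_scale_exponent(block):
    """Smallest exponent e such that amax / 2**e <= FP8_E4M3_MAX (assumed rule)."""
    amax = max((abs(x) for x in block), default=0.0)
    if amax == 0.0:
        return 0
    return math.ceil(math.log2(amax / FP8_E4M3_MAX))


def grouped_scale_exponents(buffer, offsets, sizes):
    """Return one list of block exponents per member, reading only valid data.

    buffer  : flat list holding all members, possibly with padding in between
    offsets : hypothetical start index of each member inside buffer
    sizes   : hypothetical number of valid elements in each member
    """
    result = []
    for start, size in zip(offsets, sizes):
        valid = buffer[start:start + size]  # padding past `size` is never read
        result.append([
            block_scale_exponent(valid[i:i + BLOCK])
            for i in range(0, size, BLOCK)
        ])
    return result
```

For example, with `buf = [1.0]*32 + [1e9]*32 + [448.0]*32 + [896.0]*32`, the call `grouped_scale_exponents(buf, [0, 64], [32, 64])` returns `[[-8], [0, 1]]`: the 32 padding values of `1e9` after the first member do not influence any scale.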