Implement quantization support for the GroupedTensor type in the MXFP8 format.

Required modifications to the existing kernel:
- ignore padding in the allocation
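
As a rough illustration of the padding requirement, here is a minimal pure-Python sketch of per-block scale computation that never reads padding rows. It is not the existing kernel. It assumes the grouped allocation is one flat row-major buffer in which each member tensor occupies a padded number of rows, and that "ignore padding" means those extra rows are skipped. The names `block_scale_exponents`, `rows_per_group`, and `padded_rows_per_group` are hypothetical. The scale formula follows the OCP Microscaling convention (32-element blocks, power-of-two scale, E4M3 elements); the real kernel may round differently, and mapping all-zero blocks to -127 is a choice made for this sketch.

```python
import math

BLOCK = 32      # MXFP8: 32 consecutive elements share one power-of-two scale
E4M3_EMAX = 8   # exponent of the largest E4M3 normal value (448 = 1.75 * 2**8)


def block_scale_exponents(buf, cols, rows_per_group, padded_rows_per_group):
    """Return, per group, one list of unbiased scale exponents per valid row.

    buf is a flat row-major buffer. Group g owns padded_rows_per_group[g]
    rows of the allocation, but only the first rows_per_group[g] hold data.
    """
    assert cols % BLOCK == 0
    out = []
    row_start = 0
    for valid, padded in zip(rows_per_group, padded_rows_per_group):
        group = []
        # Iterate over valid rows only; padding rows are never touched.
        for r in range(row_start, row_start + valid):
            row = buf[r * cols:(r + 1) * cols]
            exps = []
            for b in range(0, cols, BLOCK):
                amax = max(abs(x) for x in row[b:b + BLOCK])
                if amax > 0:
                    exps.append(math.floor(math.log2(amax)) - E4M3_EMAX)
                else:
                    exps.append(-127)  # sketch choice for all-zero blocks
            group.append(exps)
        out.append(group)
        row_start += padded  # advance by the padded size, not the valid size
    return out


# Two groups: the first has 1 valid row padded to 2, the second has 2 rows.
# The padding row is filled with NaN to show that it is never read.
cols = 32
buf = [1.0] * 32 + [float("nan")] * 32 + [448.0] * 32 + [0.0] * 32
print(block_scale_exponents(buf, cols, [1, 2], [2, 2]))
# [[[-8]], [[0], [-127]]]
```

The point of the sketch is the two separate counters: the loop bound uses the valid row count, while the offset into the allocation advances by the padded row count.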