Describe the problem
I'm currently using GLM4 with the latest version of HuggingFace's transformers library in a P-Tuning experiment. While preparing input batches, I encountered the following error:
This happens when I try to use .pad(padding_side="right") — a common approach in HuggingFace to pad a batch of tokenized inputs.
My use case
I'm following the HuggingFace-style batching process for fine-tuning, where .pad() is typically used to ensure consistent input shapes. But the GLM4 tokenizer appears to lack support for padding_side, and perhaps even .pad() behavior in general.
What I’ve tried
- Looked into the tokenizer code: it seems that GLMTokenizer does not inherit the usual pad() method behavior from PreTrainedTokenizerFast.
- Tried manually padding the input sequences, but I'm concerned about whether that matches GLM4's expected behavior, particularly for attention_mask and position_ids.
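For reference, this is roughly the manual padding I tried, as a minimal plain-Python sketch. The pad id of 0, and the choice to repeat the last position id in padded slots, are my own assumptions; whether GLM4 actually expects this is exactly what I'm unsure about:

```python
def pad_batch(sequences, pad_token_id=0):
    """Right-pad a list of token-id lists to the longest length in the batch.

    NOTE: pad_token_id=0 and the position_ids scheme below are assumptions,
    not something confirmed from the GLM4 tokenizer/model code.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask, position_ids = [], [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_token_id] * n_pad)
        # 1 for real tokens, 0 for padding
        attention_mask.append([1] * len(seq) + [0] * n_pad)
        # sequential positions for real tokens; padded slots repeat the
        # last real position (one common convention, possibly not GLM4's)
        position_ids.append(list(range(len(seq))) + [len(seq) - 1] * n_pad)
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "position_ids": position_ids,
    }
```

For example, `pad_batch([[5, 6, 7], [8, 9]])` pads the second sequence to length 3 and masks out its final slot.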
Questions
- What's the recommended way to apply padding when using GLM4 tokenizer?
- Is there a compatible data collator or tokenizer wrapper that supports HuggingFace-style padding?
- Would manually implementing padding + masks be sufficient, or is there a better way to ensure compatibility?
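To make the second question concrete, this is the kind of collator interface I have in mind: a hypothetical minimal stand-in for HuggingFace's DataCollatorWithPadding, written in plain Python (the class name and pad id are mine; a real version would return torch tensors and use the tokenizer's actual pad token):

```python
class SimplePaddingCollator:
    """Hypothetical minimal collator that right-pads input_ids and builds
    attention_mask, mimicking the shape of DataCollatorWithPadding.
    pad_token_id=0 is an assumption."""

    def __init__(self, pad_token_id=0):
        self.pad_token_id = pad_token_id

    def __call__(self, features):
        # features: list of dicts, each with an "input_ids" list
        max_len = max(len(f["input_ids"]) for f in features)
        input_ids, attention_mask = [], []
        for f in features:
            ids = f["input_ids"]
            n_pad = max_len - len(ids)
            input_ids.append(ids + [self.pad_token_id] * n_pad)
            attention_mask.append([1] * len(ids) + [0] * n_pad)
        return {"input_ids": input_ids, "attention_mask": attention_mask}
```

Something like this could presumably be passed as a data_collator during fine-tuning, but I'd rather use a supported GLM4-compatible collator if one exists.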
Environment:
- GLM model: GLM4
- Transformers version: latest
- OS: Ubuntu
Thanks a lot for your help!