Open
Description
In the vast majority of cases, encode and decode should be inverse operations, but with the 12B model's tokenizer they behave as follows:
from transformers import AutoTokenizer

PATH = '/toolchain/LLM/telechat-12b-hf'
tokenizer = AutoTokenizer.from_pretrained(PATH, trust_remote_code=True)
print(tokenizer.encode(tokenizer.decode([2000])))  # [561, 579]
print(tokenizer.decode([579]))   # 'red'
print(tokenizer.encode('red'))   # [2952]
print(tokenizer.decode([2952]))  # 'red'
Could you explain this? @hannawong @ZiYu0427 @liuxz0801 @Unknown-Body @LSX-Sneakerprogrammer
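
To make the report easier to reproduce, here is a minimal sketch of a round-trip check that lists every token id for which `encode(decode([id]))` does not return `[id]`. The `ToyTokenizer` class and the `find_roundtrip_mismatches` helper are hypothetical names introduced only for illustration. The toy vocabulary deliberately maps two ids to the same string, which is one way such a mismatch can arise; whether that is what happens in the 12B tokenizer is an assumption, not something confirmed here.

```python
from typing import Dict, Iterable, List, Tuple


class ToyTokenizer:
    """Hypothetical stand-in tokenizer; NOT the TeleChat tokenizer."""

    def __init__(self, id_to_piece: Dict[int, str]) -> None:
        self.id_to_piece = id_to_piece
        # If two ids share a piece, the later id overwrites the earlier one.
        self.piece_to_id = {piece: i for i, piece in id_to_piece.items()}

    def decode(self, ids: List[int]) -> str:
        return "".join(self.id_to_piece[i] for i in ids)

    def encode(self, text: str, add_special_tokens: bool = False) -> List[int]:
        # Greedy longest-match segmentation over the known pieces.
        ids, pos = [], 0
        while pos < len(text):
            for end in range(len(text), pos, -1):
                piece = text[pos:end]
                if piece in self.piece_to_id:
                    ids.append(self.piece_to_id[piece])
                    pos = end
                    break
            else:
                raise ValueError(f"cannot encode {text[pos:]!r}")
        return ids


def find_roundtrip_mismatches(
    tokenizer, ids: Iterable[int]
) -> List[Tuple[int, str, List[int]]]:
    """Return (id, decoded_text, re_encoded_ids) for ids that do not round-trip."""
    mismatches = []
    for token_id in ids:
        text = tokenizer.decode([token_id])
        re_encoded = tokenizer.encode(text, add_special_tokens=False)
        if re_encoded != [token_id]:
            mismatches.append((token_id, text, re_encoded))
    return mismatches


# Ids 3 and 4 both decode to 'red', but encode can only return one of them.
toy = ToyTokenizer({0: "r", 1: "e", 2: "d", 3: "red", 4: "red"})
mismatches = find_roundtrip_mismatches(toy, range(5))
print(mismatches)  # [(3, 'red', [4])]
```

The same helper could be run against the real tokenizer, for example `find_roundtrip_mismatches(tokenizer, range(tokenizer.vocab_size))`, assuming the remote-code tokenizer exposes `vocab_size` and accepts `add_special_tokens` like the standard Hugging Face tokenizers do.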