System Info
$ hf env
Copy-and-paste the text below in your GitHub issue.
- huggingface_hub version: 0.36.0
- Platform: Linux-4.18.0-553.83.1.1toss.t4.x86_64-x86_64-with-glibc2.28
- Python version: 3.12.11
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Running in Google Colab Enterprise ?: No
- Token path ?: /g/g11/eisenbnt/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers: store
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.9.1
- Jinja2: 3.1.6
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: N/A
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 2.3.5
- pydantic: N/A
- aiohttp: 3.13.2
- hf_xet: 1.2.0
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /g/g11/eisenbnt/.cache/huggingface/hub
- HF_ASSETS_CACHE: /g/g11/eisenbnt/.cache/huggingface/assets
- HF_TOKEN_PATH: /g/g11/eisenbnt/.cache/huggingface/token
- HF_STORED_TOKENS_PATH: /g/g11/eisenbnt/.cache/huggingface/stored_tokens
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_DISABLE_XET: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
((dev) ) eisenbnt@matrix9:~
$ nvidia-smi
Mon Dec 8 17:00:32 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08 Driver Version: 580.105.08 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:4C:00.0 Off | 0 |
| N/A 35C P0 70W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
((dev) ) eisenbnt@matrix9:~
$ pip show triton
Name: triton
Version: 3.5.1
Summary: A language and compiler for custom Deep Learning operations
Home-page: https://github.com/triton-lang/triton/
Author: Philippe Tillet
Author-email: phil@openai.com
License:
Location: /usr/workspace/eisenbnt/.venvman/envs/3.12/dev/lib64/python3.12/site-packages
Requires:
Required-by: torch
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
>>> from transformers.models.gpt_oss.modeling_gpt_oss import GptOssForCausalLM
>>> def get_gpt_oss(device):
...     model = GptOssForCausalLM.from_pretrained(
...         "openai/gpt-oss-20b",
...     )
...     return model.to(device)
...
>>> model = get_gpt_oss("cuda:0")
MXFP4 quantization requires Triton and kernels installed: CUDA requires Triton >= 3.4.0, XPU requires Triton >= 3.5.0, we will default to dequantizing the model to bf16
Loading checkpoint shards: 100%|███████████████████████████████████████████████| 3/3 [00:19<00:00, 6.56s/it]
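For reference, here is a minimal check of the two dependencies the warning names (assuming it refers to the PyPI packages triton and kernels; the kernels name is my reading of the message, not something I have verified):

>>> import importlib.util
>>> for pkg in ("triton", "kernels"):  # the two dependencies named in the warning
...     print(pkg, "found" if importlib.util.find_spec(pkg) else "MISSING")

Triton 3.5.1 is clearly installed per pip show above, so if this prints MISSING for kernels, that would explain the fallback.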
Expected behavior
The model should load with its MXFP4 quantized weights, since Triton 3.5.1 is installed (>= 3.4.0 is required for CUDA), instead of falling back to dequantizing to bf16. Is there anything special I need to do to get this working? Thank you!
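If the missing piece is indeed the separate kernels package (again, my reading of the warning text, not confirmed), a sketch of the workaround I would try:

$ pip install kernels
$ python
>>> from transformers.models.gpt_oss.modeling_gpt_oss import GptOssForCausalLM
>>> model = GptOssForCausalLM.from_pretrained("openai/gpt-oss-20b")
>>> model = model.to("cuda:0")  # expecting the MXFP4 weights to be kept, not dequantized to bf16

With both Triton >= 3.4.0 and kernels importable, my understanding is that from_pretrained should keep the checkpoint's MXFP4 weights rather than dequantizing.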