@neetscience Yes I think this is expected.

Your benchmark suggests that bitsandbytes is respecting the dtype you pass in, but on a T4, FP16 is simply the better option in practice. The T4 is a Turing GPU (compute capability 7.5) with no native BF16 support, so even though BF16 loads and runs fine, its operations fall back to slower code paths and it can also use more memory on this GPU. If someone is using a Tesla T4, I'd recommend starting with bnb_4bit_compute_dtype=torch.float16 and only trying BF16 if they specifically want to benchmark it.
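As a minimal sketch (assuming a PyTorch install; the variable name `compute_dtype` is just illustrative), you can pick the compute dtype based on what the GPU actually supports instead of hard-coding it:

```python
import torch

# On a Tesla T4 (Turing, compute capability 7.5) there is no native BF16
# support, so torch.cuda.is_bf16_supported() returns False and we fall
# back to FP16 for the 4-bit compute dtype.
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    compute_dtype = torch.bfloat16  # Ampere or newer: BF16 is worth benchmarking
else:
    compute_dtype = torch.float16   # T4 and other pre-Ampere GPUs: use FP16

print(compute_dtype)
```

You would then pass this as `bnb_4bit_compute_dtype=compute_dtype` when building your `BitsAndBytesConfig`, so the same script behaves sensibly on both a T4 and newer hardware.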

Answer selected by neetscience