fix(bertscore): cap model_max_length to prevent Rust tokenizer OverflowError#756

Open
xodn348 wants to merge 1 commit into huggingface:main from xodn348:fix/bertscore-max-length-overflow
Conversation

@xodn348

@xodn348 xodn348 commented May 5, 2026

Summary

BERTScore crashes with OverflowError: int too big to convert when the model being evaluated (e.g. microsoft/deberta-xlarge-mnli) does not declare model_max_length in its tokenizer config. In that case transformers fills the missing field with a huge sentinel value (~1e30). When bert_score later passes this value to the Rust tokenizers backend via enable_truncation(), the backend overflows because usize / u32 cannot hold an integer of that magnitude. From the user's perspective the failure is opaque: the metric simply crashes without any explanation of the cause.
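The sentinel case can be illustrated without loading any model. This is a minimal sketch (the exact sentinel value varies; `VERY_LARGE_DEFAULT` below is illustrative, not the literal constant transformers uses) of why a filled-in default is distinguishable from a real limit:

```python
import sys

# When a tokenizer config omits model_max_length, transformers fills it
# with a very large default (~1e30). Illustrative value only:
VERY_LARGE_DEFAULT = int(1e30)

def is_sentinel(model_max_length: int) -> bool:
    # A length that cannot fit in a native machine integer (sys.maxsize)
    # cannot be a real model limit, so treat it as "undeclared".
    return model_max_length > sys.maxsize

print(is_sentinel(VERY_LARGE_DEFAULT))  # True  -> would overflow the Rust backend
print(is_sentinel(512))                 # False -> a genuine declared limit
```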

The fix adds a guard inside BERTScore._compute(): right after the BERTScorer is created (or retrieved from cache), the metric inspects the tokenizer's model_max_length. If the caller has explicitly set max_length, that value is applied directly. If not, and the tokenizer carries a sentinel larger than sys.maxsize, it is clamped to 512, a value that is safe for all standard BERT-family models. This keeps the existing behaviour unchanged for every model that properly declares its max length, and transparently repairs the broken case without requiring an upstream change in bert-score.

A new max_length parameter is also exposed in _compute() so that advanced users who need explicit control over the truncation length (e.g. long-document models) can set it directly, independent of whatever the tokenizer config says.

Issue

Fixes #739

Local verification

$ cd /tmp/evaluate
$ ruff check metrics/bertscore/bertscore.py tests/test_metric_common.py
All checks passed!

$ PYTHONPATH=/tmp/evaluate/src python -m pytest \
    tests/test_metric_common.py::test_bertscore_large_model_max_length_does_not_overflow \
    tests/test_metric_common.py::test_bertscore_explicit_max_length_is_honoured \
    -v
============================= test session starts ==============================
platform linux -- Python 3.11.15, pytest-9.0.3, pluggy-1.6.0 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /tmp/evaluate
plugins: anyio-4.123.0
collecting ... collected 2 items

tests/test_metric_common.py::test_bertscore_large_model_max_length_does_not_overflow PASSED [ 50%]
tests/test_metric_common.py::test_bertscore_explicit_max_length_is_honoured PASSED [100%]

============================== 2 passed in 4.28s ===============================
=== LOCAL_TEST_PASSED ===

Risk

The guard only fires when model_max_length > sys.maxsize (i.e. the sentinel case) or when the user explicitly passes max_length. All models that define a real model_max_length in their config continue to use their declared value unchanged. The clamped default of 512 is conservative for BERT-family models (max is typically 512) and matches the bert-score project's own hard-coded defaults in its baseline files. Callers who need a different truncation length for a specific model can override it with the new max_length parameter.

…d OverflowError

Models such as microsoft/deberta-xlarge-mnli omit model_max_length from
their tokenizer config.  transformers fills the gap with a huge sentinel
(~1e30), which bert_score then passes to the Rust tokenizers backend via
enable_truncation(), causing OverflowError: int too big to convert.

Add an explicit cap in BERTScore._compute(): if the caller supplies
max_length that value is applied directly; otherwise any sentinel larger
than sys.maxsize is clamped to 512.  Both paths are covered by new unit
tests that mock the scorer so no model download is required.

Fixes huggingface#739
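The mock-based testing approach mentioned above can be sketched as follows. The `_tokenizer` attribute name is an assumption about BERTScorer's internals; the point is that a MagicMock carrying the sentinel exercises the clamp with no model download:

```python
import sys
from unittest.mock import MagicMock

# Fake scorer whose tokenizer carries the transformers sentinel value.
# Attribute name `_tokenizer` is assumed for illustration.
scorer = MagicMock()
scorer._tokenizer.model_max_length = int(1e30)

# The guard under test, inlined here for illustration.
if scorer._tokenizer.model_max_length > sys.maxsize:
    scorer._tokenizer.model_max_length = 512

print(scorer._tokenizer.model_max_length)  # 512
```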

Development

Successfully merging this pull request may close these issues.

BERTScore: OverflowError with transformers>=5 due to undefined model_max_length

1 participant