fix(bertscore): cap model_max_length to prevent Rust tokenizer OverflowError#756

Open
xodn348 wants to merge 1 commit into huggingface:main from xodn348:fix/bertscore-max-length-overflow
Conversation

@xodn348

@xodn348 xodn348 commented May 5, 2026

Summary

BERTScore crashes with OverflowError: int too big to convert when the model being evaluated (e.g. microsoft/deberta-xlarge-mnli) does not declare model_max_length in its tokenizer config. In that case transformers fills the missing field with a huge sentinel value (~1e30). When bert_score later passes this value to the Rust tokenizers backend via enable_truncation(), the backend overflows because usize / u32 cannot hold an integer of that magnitude. From the user's perspective the failure is opaque: the metric simply crashes without any explanation of the cause.
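The sentinel case can be illustrated without loading any model. This is a minimal sketch (the exact sentinel value varies; `VERY_LARGE_DEFAULT` below is illustrative, not the literal constant transformers uses) of why a filled-in default is distinguishable from a real limit:

```python
import sys

# When a tokenizer config omits model_max_length, transformers fills it
# with a very large default (~1e30). Illustrative value only:
VERY_LARGE_DEFAULT = int(1e30)

def is_sentinel(model_max_length: int) -> bool:
    # A length that cannot fit in a native machine integer (sys.maxsize)
    # cannot be a real model limit, so treat it as "undeclared".
    return model_max_length > sys.maxsize

print(is_sentinel(VERY_LARGE_DEFAULT))  # True  -> would overflow the Rust backend
print(is_sentinel(512))                 # False -> a genuine declared limit
```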

The fix adds a guard inside BERTScore._compute(): right after the BERTScorer is created (or retrieved from cache), the metric inspects the tokenizer's model_max_length. If the caller has explicitly set max_length, that value is applied directly. If not, and the tokenizer carries a sentinel larger than sys.maxsize, it is clamped to 512, a value that is safe for all standard BERT-family models. This keeps the existing behaviour unchanged for every model that properly declares its max length, and transparently repairs the broken case without requiring an upstream change in bert-score.

A new max_length parameter is also exposed in _compute() so that advanced users who need explicit control over the truncation length (e.g. long-document models) can set it directly, independent of whatever the tokenizer config says.

Issue

Fixes #739

Local verification

$ cd /tmp/evaluate
$ ruff check metrics/bertscore/bertscore.py tests/test_metric_common.py
All checks passed!

$ PYTHONPATH=/tmp/evaluate/src python -m pytest \
    tests/test_metric_common.py::test_bertscore_large_model_max_length_does_not_overflow \
    tests/test_metric_common.py::test_bertscore_explicit_max_length_is_honoured \
    -v
============================= test session starts ==============================
platform linux -- Python 3.11.15, pytest-9.0.3, pluggy-1.6.0 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /tmp/evaluate
plugins: anyio-4.123.0
collecting ... collected 2 items

tests/test_metric_common.py::test_bertscore_large_model_max_length_does_not_overflow PASSED [ 50%]
tests/test_metric_common.py::test_bertscore_explicit_max_length_is_honoured PASSED [100%]

============================== 2 passed in 4.28s ===============================
=== LOCAL_TEST_PASSED ===

Risk

The guard only fires when model_max_length > sys.maxsize (i.e. the sentinel case) or when the user explicitly passes max_length. All models that define a real model_max_length in their config continue to use their declared value unchanged. The clamped default of 512 is conservative for BERT-family models (max is typically 512) and matches the bert-score project's own hard-coded defaults in its baseline files. Callers who need a different truncation length for a specific model can override it with the new max_length parameter.

…d OverflowError

Models such as microsoft/deberta-xlarge-mnli omit model_max_length from
their tokenizer config.  transformers fills the gap with a huge sentinel
(~1e30), which bert_score then passes to the Rust tokenizers backend via
enable_truncation(), causing OverflowError: int too big to convert.

Add an explicit cap in BERTScore._compute(): if the caller supplies
max_length that value is applied directly; otherwise any sentinel larger
than sys.maxsize is clamped to 512.  Both paths are covered by new unit
tests that mock the scorer so no model download is required.

Fixes huggingface#739
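The mock-based testing approach mentioned above can be sketched as follows. The `_tokenizer` attribute name is an assumption about BERTScorer's internals; the point is that a MagicMock carrying the sentinel exercises the clamp with no model download:

```python
import sys
from unittest.mock import MagicMock

# Fake scorer whose tokenizer carries the transformers sentinel value.
# Attribute name `_tokenizer` is assumed for illustration.
scorer = MagicMock()
scorer._tokenizer.model_max_length = int(1e30)

# The guard under test, inlined here for illustration.
if scorer._tokenizer.model_max_length > sys.maxsize:
    scorer._tokenizer.model_max_length = 512

print(scorer._tokenizer.model_max_length)  # 512
```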

Development

Successfully merging this pull request may close these issues.

BERTScore: OverflowError with transformers>=5 due to undefined model_max_length

1 participant