Bug
When `EvaluationModule.compute()` fails, raw sklearn exceptions (`ValueError`, `KeyError`) propagate to the caller with no wrapping. `EvaluationModuleError` exists internally but is not exported from `evaluate/__init__.py`, making it impossible to catch evaluate-specific errors without catching broad `Exception`.
Reproduce (no API key needed, evaluate 0.4.6)
```python
import evaluate

acc = evaluate.load('accuracy')

# Test 1: raw ValueError leaks, no evaluate class in traceback
try:
    acc.compute(predictions=[], references=[])
except Exception as e:
    print(type(e).__name__)  # 'ValueError'
    print(hasattr(evaluate, 'EvaluationModuleError'))  # False

# Test 2: can't catch evaluate errors specifically
try:
    acc.compute(predictions=[], references=[])
except evaluate.EvaluationModuleError:  # AttributeError!
    pass
```
Expected
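`evaluate.EvaluationModuleError` is importable from the top-level package, and failures inside `compute()` are re-raised as `EvaluationModuleError` with the original exception chained, so callers can catch evaluate-specific errors without a broad `except Exception`. This matches the common practice of libraries exposing their own exception types (for example `sklearn.exceptions`).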
Fix
In `evaluate/__init__.py`, add:
```python
from .module import EvaluationModuleError
```
In `evaluate/module.py` (around line 467), wrap the `_compute` call:
```python
# inside EvaluationModule.compute()
try:
    output = self._compute(**inputs, **compute_kwargs)
except EvaluationModuleError:
    raise
except Exception as e:
    raise EvaluationModuleError(f"Metric '{self.name}' failed: {e}") from e
```
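Why the two `except` clauses matter (re-raise an existing `EvaluationModuleError` unchanged, wrap and chain everything else) can be seen in a small standalone sketch. `ToyModule` and `toy_accuracy` are hypothetical stand-ins rather than evaluate's real classes, and `EvaluationModuleError` is defined locally instead of being imported, so this only illustrates the pattern under those assumptions:

```python
# Standalone sketch of the proposed wrapping pattern.
# ToyModule is a hypothetical stand-in for evaluate.EvaluationModule;
# nothing here imports or calls the real evaluate library.


class EvaluationModuleError(Exception):
    """Library-specific error raised when a metric computation fails."""


class ToyModule:
    def __init__(self, name, compute_fn):
        self.name = name
        self._compute_fn = compute_fn  # plays the role of a metric's _compute()

    def _compute(self, **kwargs):
        return self._compute_fn(**kwargs)

    def compute(self, **inputs):
        try:
            return self._compute(**inputs)
        except EvaluationModuleError:
            # Already library-specific: re-raise as-is to avoid double wrapping.
            raise
        except Exception as e:
            # Wrap everything else; `from e` keeps the original exception
            # reachable as __cause__ and visible in the traceback.
            raise EvaluationModuleError(f"Metric '{self.name}' failed: {e}") from e


def toy_accuracy(predictions, references):
    # Hypothetical metric that fails on empty input, like the repro above.
    if not predictions:
        raise ValueError("predictions is empty")
    correct = sum(p == r for p, r in zip(predictions, references))
    return {"accuracy": correct / len(predictions)}


acc = ToyModule("accuracy", toy_accuracy)
print(acc.compute(predictions=[1, 0, 1], references=[1, 1, 1]))

try:
    acc.compute(predictions=[], references=[])
except EvaluationModuleError as err:
    # Callers catch one library-specific type; the root cause is still available.
    print(type(err).__name__, "|", err)
    print("caused by:", type(err.__cause__).__name__)
```

The caller handles a single exception type, while `err.__cause__` still exposes the underlying `ValueError` for debugging.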
Happy to open a PR for this if useful.
— Youssef Ibrahim | github.com/YousefZahran1
(Previously: #753 — added zero_division to F1 metric)