Description
🔴 Required Information
Describe the Bug:
`rubric_based_tool_use_quality_v1` (and `rubric_based_final_response_quality_v1`) cannot be used with per-invocation rubrics alone. `RubricBasedEvaluator.__init__` unconditionally asserts that criterion-level rubrics are non-empty (`assert self._criterion.rubrics`), which prevents the valid use case of defining rubrics only at the invocation or eval-case level in the evalset JSON.
Per-invocation rubric support was added in #3593 (commit 8afb99a0), and the data model (`Invocation.rubrics`, `EvalCase.rubrics`) clearly supports this use case, but the assertion was never relaxed to allow criterion-level rubrics to be empty.
Additionally, once the assertion is fixed, `cli/cli_eval.py` has two unguarded accesses to `metric_result.criterion.rubrics` (lines 217 and 260) in `pretty_print_eval_result` that crash with `AttributeError` when the criterion is a `BaseCriterion` (which lacks a `rubrics` attribute). This path is currently unreachable because the assertion fires first.
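The failing check reduces to the following pattern (stand-in code mirroring the cited assertion, not the actual evaluator class):

```python
# Stand-in for the criterion object; in ADK the real class has more fields.
class RubricsBasedCriterion:
    def __init__(self, rubrics=None):
        self.rubrics = rubrics

# Only per-invocation rubrics exist, so the criterion-level list is None:
criterion = RubricsBasedCriterion(rubrics=None)
try:
    # Mirrors line 332 of rubric_based_evaluator.py.
    assert criterion.rubrics, "Rubrics are required."
except AssertionError as e:
    print(e)  # Rubrics are required.
```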
Steps to Reproduce:
- Create an evalset with per-invocation rubrics (no criterion-level rubrics):

  ```json
  {
    "eval_set_id": "example",
    "name": "example",
    "eval_cases": [
      {
        "eval_id": "test_case_1",
        "conversation": [
          {
            "user_content": {
              "parts": [{ "text": "What's the weather in NYC?" }],
              "role": "user"
            },
            "final_response": {
              "parts": [{ "text": "Let me look that up." }],
              "role": "model"
            },
            "intermediate_data": {},
            "rubrics": [
              {
                "rubric_id": "calls_geocoding",
                "rubric_content": {
                  "text_property": "The agent calls the GeoCoding tool."
                },
                "type": "TOOL_USE_QUALITY"
              }
            ]
          }
        ]
      }
    ]
  }
  ```

- Create an evalconfig with `rubric_based_tool_use_quality_v1` but no criterion-level rubrics:

  ```json
  {
    "criteria": {
      "rubric_based_tool_use_quality_v1": {
        "threshold": 0.8,
        "judge_model_options": {
          "judge_model": "gemini-2.5-flash",
          "num_samples": 3
        }
      }
    }
  }
  ```

- Run `adk eval`.
- Get `AssertionError: Rubrics are required.`
Expected Behavior:
Per-invocation rubrics should be sufficient. The evaluator should allow criterion-level rubrics to be empty when invocation-level (or eval-case-level) rubrics are provided, since `create_effective_rubrics_list` already handles merging them.
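The merging referred to above can be illustrated with a simplified sketch (the name `create_effective_rubrics_list` comes from the ADK source; the function body and data shapes below are assumptions for illustration, not the real implementation):

```python
# Simplified illustration of merging criterion-, eval-case-, and
# invocation-level rubrics into one effective list, deduplicated by
# rubric_id. It shows why an empty criterion-level list should be fine
# as long as another level supplies rubrics.
def effective_rubrics(criterion_rubrics, eval_case_rubrics, invocation_rubrics):
    merged = {}
    for level in (criterion_rubrics or [], eval_case_rubrics or [],
                  invocation_rubrics or []):
        for rubric in level:
            merged.setdefault(rubric["rubric_id"], rubric)
    return list(merged.values())

# Criterion-level rubrics are absent, but invocation-level rubrics exist:
rubrics = effective_rubrics(
    criterion_rubrics=None,
    eval_case_rubrics=[],
    invocation_rubrics=[{"rubric_id": "calls_geocoding"}],
)
print([r["rubric_id"] for r in rubrics])  # ['calls_geocoding']
```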
Observed Behavior:
Two errors occur:
- `RubricBasedEvaluator.__init__` raises `AssertionError: Rubrics are required.` before evaluation begins:

  ```
  File "google/adk/evaluation/rubric_based_evaluator.py", line 332, in __init__
    assert self._criterion.rubrics, "Rubrics are required."
  AssertionError: Rubrics are required.
  ```

- If the assertion is removed, `pretty_print_eval_result` in `cli/cli_eval.py` crashes because `metric_result.criterion` is a `BaseCriterion` (not `RubricsBasedCriterion`) and has no `rubrics` attribute:

  ```
  File "google/adk/cli/cli_eval.py", line 217, in pretty_print_eval_result
    for r in metric_result.criterion.rubrics
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  AttributeError: 'BaseCriterion' object has no attribute 'rubrics'
  ```
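The second traceback is ordinary Python attribute-lookup behavior, which a `getattr` with a default sidesteps. A minimal standalone illustration with stand-in classes (not the real ADK types):

```python
# Stand-ins illustrating the crash: BaseCriterion has no `rubrics`
# attribute, so a direct access raises AttributeError.
class BaseCriterion:
    threshold = 0.8

class RubricsBasedCriterion(BaseCriterion):
    def __init__(self, rubrics):
        self.rubrics = rubrics

plain = BaseCriterion()
try:
    plain.rubrics  # mirrors metric_result.criterion.rubrics
except AttributeError as e:
    print(e)  # 'BaseCriterion' object has no attribute 'rubrics'

# A guarded access handles both criterion types:
print(getattr(plain, "rubrics", None) or [])                          # []
print(getattr(RubricsBasedCriterion(["r1"]), "rubrics", None) or [])  # ['r1']
```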
Environment Details:
- ADK Library Version: 1.27.2 (also confirmed on current `main`)
- Desktop OS: macOS
- Python Version: 3.12
Model Information:
- Are you using LiteLLM: No
- Which model is being used: N/A (bug is in eval framework, not model interaction)
🟡 Optional Information
Regression:
No. The assertion has been present since the feature was introduced. However, per-invocation rubric support added in v1.22.1 (#3593) introduced the data model fields and merging logic that support this use case (`Invocation.rubrics`, "applicable to only this invocation", and `EvalCase.rubrics`, "applicable to all invocations in the conversation"), but the assertion was never relaxed to match.
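For context, the relevant shape of those data model fields can be sketched roughly as follows (a simplified stand-in based only on the field names and docstrings quoted above; the actual ADK classes are Pydantic models with many more fields):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Rubric:
    rubric_id: str  # rubric_content omitted for brevity

@dataclass
class Invocation:
    # "applicable to only this invocation"
    rubrics: Optional[list[Rubric]] = None

@dataclass
class EvalCase:
    eval_id: str = ""
    conversation: list[Invocation] = field(default_factory=list)
    # "applicable to all invocations in the conversation"
    rubrics: Optional[list[Rubric]] = None

# Rubrics defined only at the invocation level, as in the repro evalset:
case = EvalCase(
    eval_id="test_case_1",
    conversation=[Invocation(rubrics=[Rubric("calls_geocoding")])],
)
print(case.conversation[0].rubrics[0].rubric_id)  # calls_geocoding
```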
Minimal Reproduction Code:
The fix is straightforward, in two files:

`src/google/adk/evaluation/rubric_based_evaluator.py`: remove the assertion and handle an empty criterion rubric list:

```python
# Before (lines 332-334):
assert self._criterion.rubrics, "Rubrics are required."
self._rubrics: list[Rubric] = self._criterion.rubrics

# After:
self._rubrics: list[Rubric] = self._criterion.rubrics or []
```

`src/google/adk/cli/cli_eval.py`: guard against a criterion without a `rubrics` attribute (lines 217 and 260):

```python
# Before:
for r in metric_result.criterion.rubrics

# After:
for r in (getattr(metric_result.criterion, 'rubrics', None) or [])
```

How often has this issue occurred?:
- Always (100%)