
rubric_based_tool_use_quality_v1 fails with AssertionError when rubrics are defined per-invocation only #4926

@fparga

Description

🔴 Required Information

Describe the Bug:

rubric_based_tool_use_quality_v1 (and rubric_based_final_response_quality_v1) cannot be used with per-invocation rubrics alone. The RubricBasedEvaluator.__init__ unconditionally asserts that criterion-level rubrics are non-empty (assert self._criterion.rubrics), which prevents the valid use case of defining rubrics only at the invocation or eval-case level in the evalset JSON.

Per-invocation rubric support was added in #3593 (commit 8afb99a0), and the data model (Invocation.rubrics, EvalCase.rubrics) clearly supports this use case, but the assertion was never relaxed to allow criterion-level rubrics to be empty.

Additionally, once the assertion is fixed, cli/cli_eval.py has two unguarded accesses to metric_result.criterion.rubrics (lines 217 and 260) in pretty_print_eval_result that crash with AttributeError when the criterion is a BaseCriterion (which lacks a rubrics attribute). This is currently unreachable because the assertion crashes first.
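For illustration, the failing assertion path can be reproduced with stand-in classes (these are simplified, hypothetical stand-ins, not the actual ADK Pydantic models):

```python
from dataclasses import dataclass, field


@dataclass
class RubricsBasedCriterion:
  """Stand-in for the ADK criterion type; rubrics may legitimately be
  empty when rubrics are defined only at the invocation level."""
  threshold: float
  rubrics: list = field(default_factory=list)


def evaluator_init(criterion):
  # Mimics the unconditional check in RubricBasedEvaluator.__init__.
  assert criterion.rubrics, "Rubrics are required."
  return criterion.rubrics


criterion = RubricsBasedCriterion(threshold=0.8)  # no criterion-level rubrics
try:
  evaluator_init(criterion)
except AssertionError as e:
  # Fails before any invocation-level rubrics are ever consulted.
  print(e)
```

The assertion runs at construction time, so invocation-level rubrics in the evalset never get a chance to satisfy it.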

Steps to Reproduce:

  1. Create an evalset with per-invocation rubrics (no criterion-level rubrics):
{
  "eval_set_id": "example",
  "name": "example",
  "eval_cases": [
    {
      "eval_id": "test_case_1",
      "conversation": [
        {
          "user_content": {
            "parts": [{ "text": "What's the weather in NYC?" }],
            "role": "user"
          },
          "final_response": {
            "parts": [{ "text": "Let me look that up." }],
            "role": "model"
          },
          "intermediate_data": {},
          "rubrics": [
            {
              "rubric_id": "calls_geocoding",
              "rubric_content": {
                "text_property": "The agent calls the GeoCoding tool."
              },
              "type": "TOOL_USE_QUALITY"
            }
          ]
        }
      ]
    }
  ]
}
  2. Create an evalconfig with rubric_based_tool_use_quality_v1 but no criterion-level rubrics:
{
  "criteria": {
    "rubric_based_tool_use_quality_v1": {
      "threshold": 0.8,
      "judge_model_options": {
        "judge_model": "gemini-2.5-flash",
        "num_samples": 3
      }
    }
  }
}
  3. Run adk eval
  4. Get AssertionError: Rubrics are required.

Expected Behavior:

Per-invocation rubrics should be sufficient. The evaluator should allow criterion-level rubrics to be empty when invocation-level (or eval-case-level) rubrics are provided, since create_effective_rubrics_list already handles merging them.
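The expected merge semantics can be sketched as follows (a simplified stand-in illustrating the intent; the actual create_effective_rubrics_list signature may differ):

```python
def effective_rubrics(criterion_rubrics, eval_case_rubrics, invocation_rubrics):
  """Merge rubrics defined at the criterion, eval-case, and invocation
  levels. Any single non-empty level should suffice for evaluation."""
  return (
      list(criterion_rubrics or [])
      + list(eval_case_rubrics or [])
      + list(invocation_rubrics or [])
  )


# Per-invocation rubrics alone produce a non-empty effective list,
# which is why the criterion-level assertion is unnecessarily strict.
merged = effective_rubrics(None, None, [{"rubric_id": "calls_geocoding"}])
```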

Observed Behavior:

Two errors occur:

  1. RubricBasedEvaluator.__init__ raises AssertionError: Rubrics are required. before evaluation begins:
File "google/adk/evaluation/rubric_based_evaluator.py", line 332, in __init__
    assert self._criterion.rubrics, "Rubrics are required."
AssertionError: Rubrics are required.
  2. If the assertion is removed, pretty_print_eval_result in cli/cli_eval.py crashes because metric_result.criterion is a BaseCriterion (not RubricsBasedCriterion) and has no rubrics attribute:
File "google/adk/cli/cli_eval.py", line 217, in pretty_print_eval_result
    for r in metric_result.criterion.rubrics
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'BaseCriterion' object has no attribute 'rubrics'

Environment Details:

  • ADK Library Version: 1.27.2 (also confirmed on current main)
  • Desktop OS: macOS
  • Python Version: 3.12

Model Information:

  • Are you using LiteLLM: No
  • Which model is being used: N/A (bug is in eval framework, not model interaction)

🟡 Optional Information

Regression:

No. The assertion has been present since the feature was introduced. However, per-invocation rubric support added in v1.22.1 (#3593) introduced the data model fields that support this use case: Invocation.rubrics ("applicable to only this invocation") and EvalCase.rubrics ("applicable to all invocations in the conversation"), along with the merging logic. The assertion was never relaxed to match.

Minimal Reproduction Code:

The fix is straightforward and spans two files:

src/google/adk/evaluation/rubric_based_evaluator.py: remove the assertion and handle empty criterion rubrics:

# Before (line 332-334):
assert self._criterion.rubrics, "Rubrics are required."

self._rubrics: list[Rubric] = self._criterion.rubrics

# After:
self._rubrics: list[Rubric] = self._criterion.rubrics or []

src/google/adk/cli/cli_eval.py: guard against criterion without rubrics attribute (lines 217 and 260):

# Before:
for r in metric_result.criterion.rubrics

# After:
for r in (getattr(metric_result.criterion, 'rubrics', None) or [])
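A quick check of the guard's behavior against stand-in criterion classes (names mirror the report, but these are simplified stand-ins, not the real ADK types):

```python
class BaseCriterion:
  """Stand-in: base criterion with no rubrics attribute."""


class RubricsBasedCriterion(BaseCriterion):
  """Stand-in: criterion that carries rubrics."""

  def __init__(self, rubrics):
    self.rubrics = rubrics


# The guarded access yields an iterable for every criterion shape:
assert (getattr(BaseCriterion(), "rubrics", None) or []) == []
assert (getattr(RubricsBasedCriterion(["r1"]), "rubrics", None) or []) == ["r1"]
# It also covers a rubrics attribute explicitly set to None:
assert (getattr(RubricsBasedCriterion(None), "rubrics", None) or []) == []
```

The `or []` is needed in addition to `getattr`'s default because a RubricsBasedCriterion could carry `rubrics=None` once the constructor assertion is relaxed.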

How often has this issue occurred?:

  • Always (100%)

Metadata

Labels: eval ([Component] This issue is related to evaluation)