
Use -passed for result aggregation instead of _result #46833

Draft
Copilot wants to merge 2 commits into main from copilot/isolate-azure-evaluation-changes

Conversation

Contributor

Copilot AI commented May 11, 2026

This PR extracts only the _evaluate.py portion of #46436 into a standalone change targeting main. Scope is intentionally limited to schema-alignment updates in sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py.

  • Scope

    • Single-file change: azure/ai/evaluation/_evaluate/_evaluate.py
    • No changelog/version or cross-file edits
  • Binary aggregation alignment (_result → _passed)

    • Removed EVALUATION_PASS_FAIL_MAPPING import from this module
    • Updated binary-column discovery to aggregate outputs.*_passed columns
    • Updated related comments/docstring wording to reflect passed-state aggregation
  • Pass-count computation update

    • Replaced mapped string counting with direct boolean sum for pass count:
    pass_count = df[col].sum()
  • Metric exclusion update during mean aggregation

    • Updated exclusion set from *_result to *_passed
    • Renamed the local variable accordingly (threshold_and_result_cols → threshold_and_passed_cols)
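The updated aggregation path described above can be sketched as follows. This is an illustrative sketch, not the SDK implementation: the column names and data are made up, but the discovery of `outputs.*_passed` columns and the direct boolean sum mirror the change.

```python
import pandas as pd

# Illustrative dataframe: boolean *_passed columns replace string *_result columns.
df = pd.DataFrame({
    "outputs.relevance.relevance": [4, 5, 2, 4],
    "outputs.relevance.relevance_threshold": [3, 3, 3, 3],
    "outputs.relevance.relevance_passed": [True, True, False, True],
})

# Find all output columns that end with "_passed"
passed_columns = [
    col for col in df.columns
    if col.startswith("outputs.") and col.endswith("_passed")
]

results = {}
for col in passed_columns:
    # Extract the evaluator name (outputs.<evaluator>.<metric>_passed)
    parts = col.split(".")
    evaluator_name = parts[1] if len(parts) >= 3 else None
    total_rows = len(df)
    # Booleans sum directly to the pass count; no string mapping needed
    pass_count = df[col].sum()
    results[evaluator_name] = pass_count / total_rows if total_rows > 0 else 0.0

print(results)  # {'relevance': 0.75}
```

Summing a boolean column avoids the previous dependency on `EVALUATION_PASS_FAIL_MAPPING` to translate pass/fail strings before counting.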
Original prompt

This PR isolates the changes made to sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py in PR #46436 into a standalone PR targeting main.

Goal

Apply exactly the following changes (and only these changes) to the file sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py on a new branch created off main, then open a PR targeting main.

Do not modify any other file in the repository. Do not run formatters, linters, or any tooling that may alter other lines. Do not change the changelog or version files.

Exact diff to apply

Apply this unified diff verbatim to sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py (the file currently on main):

@@ -26,7 +26,6 @@
 
 from .._constants import (
     CONTENT_SAFETY_DEFECT_RATE_THRESHOLD_DEFAULT,
-    EVALUATION_PASS_FAIL_MAPPING,
     EvaluationMetrics,
     DefaultOpenEncoding,
     Prefixes,
@@ -251,7 +250,7 @@ def _aggregation_binary_output(df: pd.DataFrame) -> Dict[str, float]:
     """
     Aggregate binary output results (pass/fail) from evaluation dataframe.
 
-    For each evaluator, calculates the proportion of "pass" results.
+    For each evaluator, calculates the proportion of passed results.
 
     :param df: The dataframe of evaluation results.
     :type df: ~pandas.DataFrame
@@ -260,12 +259,12 @@ def _aggregation_binary_output(df: pd.DataFrame) -> Dict[str, float]:
     """
     results = {}
 
-    # Find all columns that end with "_result"
-    result_columns = [col for col in df.columns if col.startswith("outputs.") and col.endswith("_result")]
+    # Find all columns that end with "_passed"
+    passed_columns = [col for col in df.columns if col.startswith("outputs.") and col.endswith("_passed")]
 
-    for col in result_columns:
+    for col in passed_columns:
         # Extract the evaluator name from the column name
-        # (outputs.<evaluator>.<metric>_result)
+        # (outputs.<evaluator>.<metric>_passed)
         parts = col.split(".")
         evaluator_name = None
         if len(parts) >= 3:
@@ -292,7 +291,7 @@ def _aggregation_binary_output(df: pd.DataFrame) -> Dict[str, float]:
 
             # Calculate the proportion of EVALUATION_PASS_FAIL_MAPPING[True] results
             total_rows = len(df)
-            pass_count = value_counts.get(EVALUATION_PASS_FAIL_MAPPING[True], 0)
+            pass_count = df[col].sum()
             proportion = pass_count / total_rows if total_rows > 0 else 0.0
 
             # Set the result with the evaluator name as the key
@@ -384,10 +383,10 @@ def _aggregate_metrics(df: pd.DataFrame, evaluators: Dict[str, Callable]) -> Dic
     token_count_cols = _get_token_count_columns_to_exclude(df)
     handled_columns.extend(token_count_cols)
 
-    # Exclude threshold and result columns from aggregation
+    # Exclude threshold and passed columns from aggregation
     # These are per-row metadata, not metrics to be averaged
-    threshold_and_result_cols = [col for col in df.columns if col.endswith("_threshold") or col.endswith("_result")]
-    handled_columns.extend(threshold_and_result_cols)
+    threshold_and_passed_cols = [col for col in df.columns if col.endswith("_threshold") or col.endswith("_passed")]
+    handled_columns.extend(threshold_and_passed_cols)
 
     # For rest of metrics, we will calculate mean
     df.drop(columns=handled_columns, inplace=True)
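The final hunk of the diff excludes per-row metadata from mean aggregation. A minimal sketch of that step, with hypothetical column names and values:

```python
import pandas as pd

# Illustrative dataframe with one metric column plus per-row metadata
df = pd.DataFrame({
    "outputs.relevance.relevance": [4.0, 5.0, 3.0],
    "outputs.relevance.relevance_threshold": [3, 3, 3],
    "outputs.relevance.relevance_passed": [True, True, False],
})

# Exclude threshold and passed columns from aggregation:
# they are per-row metadata, not metrics to be averaged
threshold_and_passed_cols = [
    col for col in df.columns
    if col.endswith("_threshold") or col.endswith("_passed")
]
df = df.drop(columns=threshold_and_passed_cols)

# For the remaining metric columns, calculate the mean
metrics = df.mean().to_dict()
print(metrics)  # {'outputs.relevance.relevance': 4.0}
```

Before this change the exclusion matched `*_result`; with boolean `*_passed` columns, excluding them here keeps non-numeric-rate metadata out of the averaged metrics.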

Authoritative reference

The exact content the file should have AFTER applying the diff matches the version currently on the mohessie/standardize_output_schema branch. You can copy the file directly from there to ensure byte-for-byte equivalence:

Recommended approach: fetch the file content from the mohessie/standardize_output_schema branch and commit it to your new branch off main. Verify with git diff origin/main that the only changed file is _evaluate.py and the diff matches the one above exactly.

Branch and PR requirements

Validation

  • The PR's "Files changed" tab must show exactly one file changed: sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py.
  • The diff must match the unified diff above exactly (no extra whitespace, no ref...

This pull request was created from Copilot chat.

Copilot AI changed the title [WIP] Isolate changes from PR #46436 for evaluation module [evaluation] Split _evaluate.py schema-alignment changes from #46436 May 11, 2026
Copilot AI requested a review from m7md7sien May 11, 2026 22:52
@m7md7sien m7md7sien changed the title [evaluation] Split _evaluate.py schema-alignment changes from #46436 USe -passed for result aggregation instead of _result May 11, 2026
@m7md7sien m7md7sien changed the title USe -passed for result aggregation instead of _result Use -passed for result aggregation instead of _result May 11, 2026