Skip to content

Uncomment evaluation sample function definitions, keep only invocations commented#3697

Draft
Copilot wants to merge 4 commits intomainfrom
copilot/support-foundry-observability
Draft

Uncomment evaluation sample function definitions, keep only invocations commented#3697
Copilot wants to merge 4 commits intomainfrom
copilot/support-foundry-observability

Conversation

Copy link
Contributor

Copilot AI commented Feb 5, 2026

Motivation and Context

Evaluation sample functions were entirely commented out (signatures + bodies), making alternative implementations invisible to users. Only function invocations should be commented to allow easy switching between approaches.

Description

Problem: Function definitions like RunComprehensiveRedTeamEvaluation() and RunWithCustomEvaluator() were wrapped in // comments, hiding their implementations.

Solution: Uncommented all function definitions while keeping invocations commented:

// Main flow - only invocations commented
await RunBasicRedTeamEvaluation(aiProjectClient, agent);  // Active
// await RunComprehensiveRedTeamEvaluation(aiProjectClient, agent);  // Uncomment to use

// Function definitions - all visible
static async Task RunBasicRedTeamEvaluation(...) { /* visible */ }

#pragma warning disable CS8321  // Suppress unused warnings
static async Task RunComprehensiveRedTeamEvaluation(...) { /* visible, ready to use */ }
#pragma warning restore CS8321

Changes:

  • Evaluation_Step01_RedTeaming: Uncommented RunComprehensiveRedTeamEvaluation() and RunSimulationOnlyMode()
  • Evaluation_Step02_SelfReflection: Uncommented RunWithCustomEvaluator(), EvaluateGroundedness(), and RunWithAzureEvalService()
  • Added #pragma warning disable CS8321 around alternative implementations to suppress unused function warnings

Users can now see all available implementations and switch by uncommenting a single invocation line.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.
Original prompt

This section details on the original issue you should resolve

<issue_title>.NET: Support for Foundry Observability and Evaluations</issue_title>
<issue_description>Part of #3440

Support for Foundry Observability and Evaluations. Review existing samples and verify everything is working.</issue_description>

<agent_instructions>Follow the instructions in the issue OP and the first comment.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@rogerbarreto # Subplan: #3675 - Support for Foundry Observability and Evaluations

Issue: #3675
Parent Issue: #3440
Scope: .NET only (parity with Python samples)
Deliverable: Sample files demonstrating Observability and Evaluation features


Objective

Create .NET samples demonstrating Foundry Observability and Evaluation capabilities, achieving parity with existing Python samples. This includes verifying existing observability (Step07) works correctly and adding new evaluation samples.


Current State Analysis

Existing .NET Samples

Feature .NET Status Sample
Observability/Telemetry ✅ Exists FoundryAgents_Step07_Observability

Missing from .NET (Exists in Python)

Feature Python Sample Status
Red Teaming Evaluation evaluation/red_teaming/red_team_agent_sample.py ❌ Missing
Self Reflection Evaluation evaluation/self_reflection/self_reflection.py ❌ Missing

Workplan

Phase 1: Verify Existing Observability Sample

  • Task 1.1: Review existing FoundryAgents_Step07_Observability sample
    • Verify it builds and runs successfully
    • Check telemetry data is being generated
    • Document current capabilities
  • Task 1.2: Compare with Python observability patterns
    • Identify any gaps in functionality
    • Check for Azure Monitor integration
  • Task 1.3: Update Step07 if needed
    • Ensure parity with Python approach
    • Add any missing telemetry types (if applicable)

Phase 2: Red Teaming Evaluation Sample

  • Task 2.1: Analyze Python red teaming sample structure
    • Review red_team_agent_sample.py
    • Review README.md for setup requirements
    • Understand Azure AI Evaluation SDK dependencies
  • Task 2.2: Research .NET equivalents
    • Check for Azure.AI.Evaluation NuGet package
    • Identify available .NET evaluation APIs
  • Task 2.3: Create evaluation sample directory structure
    dotnet/samples/GettingStarted/Evaluation/
    ├── Evaluation_Step01_RedTeaming/
    │   ├── Evaluation_Step01_RedTeaming.csproj
    │   ├── Program.cs
    │   └── README.md
    
  • Task 2.4: Implement Red Teaming sample
    • Configure red team evaluator
    • Set up target callback (FoundryAgent integration)
    • Run evaluation against attack prompts
    • Output evaluation results

Phase 3: Self Reflection Evaluation Sample

  • Task 3.1: Analyze Python self-reflection sample
    • Review self_reflection.py
    • Review resources/ folder contents
    • Understand evaluation criteria used
  • Task 3.2: Create Self Reflection sample
    dotnet/samples/GettingStarted/Evaluation/
    └── Evaluation_Step02_SelfReflection/
        ├── Evaluation_Step02_SelfReflection.csproj
        ├── Program.cs
        ├── Resources/
        │   └── (evaluation data files)
        └── README.md
    
  • Task 3.3: Implement Self Reflection sample
    • Create evaluation agent
    • Configure reflection criteria
    • Run self-assessment
    • Generate quality scores

Phase 4: Documentation & Verification

  • Task 4.1: Create Evaluation/README.md
    • Overview of evaluation capabilities
    • Setup instructions (Azure resources needed)
    • Links to Azure AI Evaluation documentation
  • Task 4.2: Update main FoundryAgents/README.md
    • Add links to evaluation samples
    • Document observability integration
  • Task 4.3: Verify all samples
    • Build verification
    • Run with test data
    • Verify output format

Python Sample References

Red Teaming

File Purpose GitHub Link
red_team_agent_sample.py Main sample code Link
README.md Setup instructions Link
.env.example Environment variables [Link](https://github.com/microsoft/agent-framework/blob/main/python/samples/getting_started/evaluation/red_teaming/.env...

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>
Copilot AI changed the title [WIP] Add .NET samples for Foundry observability and evaluations .NET: Add Foundry Evaluation samples for Red Teaming and Self-Reflection Feb 5, 2026
Copilot AI requested a review from rogerbarreto February 5, 2026 13:59
Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>
Copilot AI changed the title .NET: Add Foundry Evaluation samples for Red Teaming and Self-Reflection Refactor evaluation samples: Replace instructional Console.WriteLine with real implementations Feb 5, 2026
…nted

Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>
Copilot AI changed the title Refactor evaluation samples: Replace instructional Console.WriteLine with real implementations Uncomment evaluation sample function definitions, keep only invocations commented Feb 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

.NET: Support for Foundry Observability and Evaluations

2 participants