[feat][evaluation] agent evaluator#421

Open
HearyShen wants to merge 44 commits into main from feat/agent_evaluator

Conversation

@HearyShen
Collaborator

What type of PR is this?

Check the PR title

  • This PR title matches the format: [<type>][<scope>] <description>. For example: [fix][backend] flaky fix
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Add documentation if the current PR requires user awareness at the usage level.
  • This PR is written in English. PRs not in English will not be reviewed.

(Optional) Translate the PR title into Chinese

(Optional) More detailed description for this PR(en: English/zh: Chinese)

en:
zh(optional):

(Optional) Which issue(s) this PR fixes

@codecov

codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 0% with 90 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
...kend/modules/evaluation/domain/entity/evaluator.go | 0.00% | 47 Missing ⚠️
...valuation/domain/entity/evaluator_version_agent.go | 0.00% | 43 Missing ⚠️

❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

@@            Coverage Diff             @@
##             main     #421      +/-   ##
==========================================
- Coverage   71.04%   70.94%   -0.10%     
==========================================
  Files         624      625       +1     
  Lines       61569    61659      +90     
==========================================
+ Hits        43743    43747       +4     
- Misses      14656    14742      +86     
  Partials     3170     3170              

Flag | Coverage Δ
unittests | 70.94% <0.00%> (-0.10%) ⬇️

Files with missing lines | Coverage Δ
backend/modules/evaluation/domain/entity/common.go | 80.00% <ø> (ø)
...dules/evaluation/domain/entity/evaluator_record.go | 100.00% <ø> (ø)
...valuation/domain/entity/evaluator_version_agent.go | 0.00% <0.00%> (ø)
...kend/modules/evaluation/domain/entity/evaluator.go | 78.03% <0.00%> (-14.22%) ⬇️

... and 2 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1693593...6cd39ff.

@HearyShen HearyShen changed the title from "[evaluation] agent evaluator" to "[feat][evaluation] agent evaluator" on Feb 5, 2026
HearyShen and others added 26 commits February 5, 2026 14:40
Implement async run and debug functionality for agent evaluator type, including:
- Add new async methods in evaluator service interface
- Implement async handlers in evaluator service
- Add request/response types for async operations
- Add agent evaluator version conversion logic
Implement methods to fetch async debug evaluator results across service layers. Added new request/response types and implemented the functionality in evaluator source services while maintaining backward compatibility for unsupported evaluator types.
Add evaluator record creation in AsyncRunEvaluator and AsyncDebugEvaluator methods to track async evaluation runs. This provides better observability and audit trail for async operations.
…outes

Remove unused async debug result endpoint and its related request/response structs
Add missing API routes for evaluator record endpoints
Remove unused async debug result code, including the handler, service interface, entity, and mock implementations, to clean up the codebase
The method was not implemented and always returned an error, indicating it was not actually needed for the evaluator services. This cleanup removes dead code from the interface and implementations.
Add new error code for agent evaluator run failures to handle configuration issues
Change-Id: Iab4a819e956ce2ba7521d381d5adcd671c7f5221
Change-Id: Ie837fdef6255afcd2ea292d372594e2bbc2b190b
Change-Id: Ief52edc654e55c162b117307fae75423144e4ad3
Change-Id: Icf87dbf107860aba8c5f24513835897239c8a890
Change-Id: Ibfbe0d2ebcd63d93a19ee2a90ea9782e087b6e49
…ed to current agent and haven't matched with AG-UI message protocol
…ed to current agent and haven't matched with AG-UI message protocol
Change-Id: I8fea5b657938b46d264349d057b87d733e59f431
…into feat/agent_evaluator

Change-Id: Ibc6898290abb64cf78c0c999b42e6b03843446e5
Change-Id: I9f4673c1280102a9b8cefdf33b8ca592ea0290a4
- Add new EvaluatorExtraOutputContent type to store additional output data
- Extend async run methods to return additional extension data
- Merge existing ext data when reporting async results
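
For reviewers, the "merge existing ext data" step in the commit above could look roughly like the sketch below. The function name `mergeExt`, the `map[string]string` type, and the rule that incoming values win on a key collision are all assumptions for illustration; the PR's actual types and precedence may differ.

```go
package main

import "fmt"

// mergeExt is a hypothetical helper: it returns a new map holding the
// existing entries overlaid with the incoming ones. On a key collision the
// incoming value wins (an assumption, not confirmed by the PR).
func mergeExt(existing, incoming map[string]string) map[string]string {
	merged := make(map[string]string, len(existing)+len(incoming))
	for k, v := range existing {
		merged[k] = v
	}
	for k, v := range incoming {
		merged[k] = v
	}
	return merged
}

func main() {
	existing := map[string]string{"trace_id": "t-1", "stage": "queued"}
	incoming := map[string]string{"stage": "done"}
	// Neither input map is mutated; the result is a fresh map.
	fmt.Println(mergeExt(existing, incoming))
}
```

Returning a new map rather than writing into `existing` keeps the stored record untouched if the later update fails.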
Store async evaluation context in repo for both run and debug operations
Implement LTrim functionality to allow trimming lists in Redis, which is needed for maintaining list sizes efficiently.
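
As a reminder of what trimming a list means here, the sketch below reimplements the index rules of the Redis `LTRIM key start stop` command on a plain Go slice: both indices are inclusive, negative indices count from the tail, and an out-of-range window yields an empty list. The function `ltrim` is a hypothetical stand-in, not the wrapper added in this PR, and it does not talk to Redis.

```go
package main

import "fmt"

// ltrim mimics Redis LTRIM index semantics on an in-memory slice.
// It is an illustration only; the PR's LTrim wraps a real Redis client.
func ltrim(list []string, start, stop int) []string {
	n := len(list)
	if start < 0 {
		start += n // negative indices count from the end
		if start < 0 {
			start = 0
		}
	}
	if stop < 0 {
		stop += n
	}
	if stop >= n {
		stop = n - 1 // a stop past the end is clamped to the last element
	}
	if start > stop || start >= n {
		return []string{} // empty window: the list ends up empty
	}
	return list[start : stop+1]
}

func main() {
	// Newest first, as after repeated LPUSH calls.
	events := []string{"e5", "e4", "e3", "e2", "e1"}
	// Keep only the three most recent entries, like LTRIM key 0 2.
	fmt.Println(ltrim(events, 0, 2))
}
```

Pairing a push with a trim to a fixed window is a common way to cap a Redis list at a maximum length.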
Add version field to SkillConfig struct across multiple layers including thrift definition, domain entity, and conversion logic to support skill versioning
…rsions

Update SkillConfig struct to use *int64 for SkillID to maintain consistency with other fields. Modify conversion functions to handle the pointer type directly instead of using helper functions.
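
A minimal sketch of what handling the pointer type directly can mean for the conversion code. `SkillConfig`, `SkillID`, and the version field are named in the commits; the `SkillConfigDTO` type, the `*string` type of `Version`, and the `toDTO` function are assumptions made up for this example.

```go
package main

import "fmt"

// SkillConfig stands in for the domain entity. SkillID is *int64 as the
// commit describes; the type of Version is assumed.
type SkillConfig struct {
	SkillID *int64
	Version *string
}

// SkillConfigDTO is a hypothetical transport-layer counterpart.
type SkillConfigDTO struct {
	SkillID *int64
	Version *string
}

// toDTO copies the optional fields as pointers, so "unset" (nil) survives
// the conversion without any dereference or wrap helper.
func toDTO(do *SkillConfig) *SkillConfigDTO {
	if do == nil {
		return nil
	}
	return &SkillConfigDTO{SkillID: do.SkillID, Version: do.Version}
}

func main() {
	id := int64(42)
	dto := toDTO(&SkillConfig{SkillID: &id})
	fmt.Println(*dto.SkillID, dto.Version == nil)
}
```

With pointers on both sides, an unset ID stays distinguishable from an ID of zero.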
Add a file client parameter to experiment application initialization and implement URI-to-URL conversion for evaluator extra output
Replace full record update with targeted field updates to improve efficiency and reduce unnecessary conversions. Add new method to update only status, score and output data.
Remove redundant evaluator version check and update authorization parameters to use evaluator record data directly. The change aligns authorization with space-level permissions rather than evaluator-level permissions.
Add transformExtraOutputURIToURL method to convert extra output URIs to accessible URLs. This ensures clients can access the output files directly when needed.
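
One possible shape for this conversion, sketched under assumptions: only the name `transformExtraOutputURIToURL` comes from the commit message. The `URLSigner` interface, the `ExtraOutput` struct, its fields, the function signature, and the example host are hypothetical and stand in for whatever file client the PR actually uses.

```go
package main

import "fmt"

// URLSigner is a hypothetical abstraction over the file client.
type URLSigner interface {
	SignURL(uri string) (string, error)
}

// ExtraOutput is a hypothetical stand-in for the evaluator's extra output.
type ExtraOutput struct {
	URI *string
	URL *string
}

// transformExtraOutputURIToURL fills URL from URI when a URI is present.
// Missing output or an empty URI is treated as "nothing to do".
func transformExtraOutputURIToURL(signer URLSigner, out *ExtraOutput) error {
	if out == nil || out.URI == nil || *out.URI == "" {
		return nil
	}
	url, err := signer.SignURL(*out.URI)
	if err != nil {
		return fmt.Errorf("sign extra output uri: %w", err)
	}
	out.URL = &url
	return nil
}

// prefixSigner is a fake signer used only for this example.
type prefixSigner struct{ base string }

func (s prefixSigner) SignURL(uri string) (string, error) {
	return s.base + "/" + uri, nil
}

func main() {
	uri := "outputs/report.json"
	out := &ExtraOutput{URI: &uri}
	if err := transformExtraOutputURIToURL(prefixSigner{base: "https://files.example.com"}, out); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(*out.URL)
}
```

Keeping the signer behind an interface lets tests substitute a fake like `prefixSigner` for the real file client.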
Add conversion logic for agent evaluator version in both DO2PO and PO2DO directions. This includes handling meta info, input schemas, and output schemas similar to other evaluator types.
The transaction wrapper was unnecessary since we're performing a single update operation. Removing it simplifies the code while maintaining the same functionality.
2 participants