[feat][evaluation] agent evaluator by HearyShen · Pull Request #421 · coze-dev/coze-loop

HearyShen · 2026-02-05T04:50:46Z

What type of PR is this?

Check the PR title

This PR title matches the format: [<type>][<scope>] <description>. For example: [fix][backend] flaky fix
The description of this PR title is user-oriented and clear enough for others to understand.
Add documentation if the current PR requires user awareness at the usage level.
This PR is written in English. PRs not in English will not be reviewed.

(Optional) Translate the PR title into Chinese

(Optional) More detailed description for this PR(en: English/zh: Chinese)

en:
zh(optional):

(Optional) Which issue(s) this PR fixes

codecov · 2026-02-05T05:11:35Z

Codecov Report

❌ Patch coverage is 0% with 90 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...kend/modules/evaluation/domain/entity/evaluator.go	0.00%	47 Missing ⚠️
...valuation/domain/entity/evaluator_version_agent.go	0.00%	43 Missing ⚠️

❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

@@            Coverage Diff             @@
##             main     #421      +/-   ##
==========================================
- Coverage   71.04%   70.94%   -0.10%     
==========================================
  Files         624      625       +1     
  Lines       61569    61659      +90     
==========================================
+ Hits        43743    43747       +4     
- Misses      14656    14742      +86     
  Partials     3170     3170

Flag	Coverage Δ
unittests	`70.94% <0.00%> (-0.10%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
backend/modules/evaluation/domain/entity/common.go	`80.00% <ø> (ø)`
...dules/evaluation/domain/entity/evaluator_record.go	`100.00% <ø> (ø)`
...valuation/domain/entity/evaluator_version_agent.go	`0.00% <0.00%> (ø)`
...kend/modules/evaluation/domain/entity/evaluator.go	`78.03% <0.00%> (-14.22%)`	⬇️

... and 2 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1693593...6cd39ff. Read the comment docs.


Implement async run and debug functionality for agent evaluator type, including:
- Add new async methods in evaluator service interface
- Implement async handlers in evaluator service
- Add request/response types for async operations
- Add agent evaluator version conversion logic

Implement methods to fetch async debug evaluator results across service layers. Added new request/response types and implemented the functionality in evaluator source services while maintaining backward compatibility for unsupported evaluator types.

Remove redundant evaluator version check and update authorization parameters to use evaluator record data directly. The change aligns authorization with space-level permissions rather than evaluator-level permissions.


HearyShen added 2 commits February 5, 2026 12:48

init cozeloop idl for agent evaluator

a37b94a

init cozeloop idl for agent evaluator

f60e53b

HearyShen changed the title ~~[evaluation] agent evaluator~~ [feat][evaluation] agent evaluator Feb 5, 2026

HearyShen and others added 26 commits February 5, 2026 14:40

init cozeloop idl for agent evaluator

5e52d26

init cozeloop idl for agent evaluator

6cd39ff

init cozeloop idl for agent evaluator

6f4abab

Merge branch 'main' into feat/agent_evaluator

7c8f305

EvaluatorExtraOutputContent require uri for reporting and url for signed display

1a484d0

remove ext in AgentEvaluatorVersion DO

9b7ba91

Merge branch 'main' into feat/agent_evaluator

5f71dc3

update mockgen and codegen

a3acc66

feat(evaluator): add record creation for async evaluator runs

50be157

Add evaluator record creation in AsyncRunEvaluator and AsyncDebugEvaluator methods to track async evaluation runs. This provides better observability and audit trail for async operations.

refactor(evaluator): remove async debug result endpoint and add api routes

6cec211

Remove unused async debug result endpoint and its related request/response structs. Add missing api routes for evaluator record endpoints.

refactor(evaluator): remove async debug result feature

e69bd7b

remove unused async debug result related code including handler, service interface, entity and mock implementations to clean up the codebase

refactor(evaluator): remove unused GetAsyncRunResult method

c0515ed

The method was not implemented and always returned an error, indicating it was not actually needed for the evaluator services. This cleanup removes dead code from the interface and implementations.

feat(evaluation): add agent evaluator run failed error code

8ae3d69

Add new error code for agent evaluator run failures to handle configuration issues

fix

8a09efd

Change-Id: Iab4a819e956ce2ba7521d381d5adcd671c7f5221

Run the evaluator asynchronously

5b103ed

Change-Id: Ie837fdef6255afcd2ea292d372594e2bbc2b190b

fix

8bce90c

Change-Id: Ief52edc654e55c162b117307fae75423144e4ad3

fix

6878d5b

Change-Id: Icf87dbf107860aba8c5f24513835897239c8a890

fix

ec36a71

Change-Id: Ibfbe0d2ebcd63d93a19ee2a90ea9782e087b6e49

remove ReportEvaluatorInvokeProgress from idl considering its specified to current agent and haven't matched with AG-UI message protocol

0134ed8

remove ReportEvaluatorInvokeProgress from idl considering its specified to current agent and haven't matched with AG-UI message protocol

d77fee6

add (api.js_conv="true") to i64 skill_id

290ccad

fox

fb9ec2f

Change-Id: I8fea5b657938b46d264349d057b87d733e59f431

Merge branch 'feat/agent_evaluator' of github.com:coze-dev/coze-loop into feat/agent_evaluator

1115c55

Change-Id: Ibc6898290abb64cf78c0c999b42e6b03843446e5

fix

372fab0

Change-Id: I9f4673c1280102a9b8cefdf33b8ca592ea0290a4

HearyShen added 16 commits February 11, 2026 20:55

feat(redis): add LRange method to ListCmdable interface

d5d7df3

feat(redis): add LRange method to redis provider

ba76ba8

feat(evaluator): add extra output support and async run extensions

eaecfd8

- Add new EvaluatorExtraOutputContent type to store additional output data
- Extend async run methods to return additional extension data
- Merge existing ext data when reporting async results
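The last point of this commit, merging existing ext data when an async result is reported, could look roughly like the sketch below. `mergeExt` and the key names are hypothetical, and the rule that newly reported values win on a key conflict is an assumption, not something the PR confirms.

```go
package main

import "fmt"

// mergeExt returns a new map holding every key from existing, overlaid with
// the keys from reported. Assumption: reported values win on conflict.
// Neither input map is modified.
func mergeExt(existing, reported map[string]string) map[string]string {
	merged := make(map[string]string, len(existing)+len(reported))
	for k, v := range existing {
		merged[k] = v
	}
	for k, v := range reported {
		merged[k] = v
	}
	return merged
}

func main() {
	existing := map[string]string{"trace_id": "t-1", "stage": "queued"}
	reported := map[string]string{"stage": "done"}
	fmt.Println(mergeExt(existing, reported)) // map[stage:done trace_id:t-1]
}
```

Building a fresh map keeps the stored ext data untouched until the merged result is written back, which makes a failed write easier to reason about.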

feat(evaluator): add async evaluation context tracking

d776ca0

Store async evaluation context in repo for both run and debug operations

feat(redis): add LTrim command to ListCmdable interface

b5ccb22

Implement LTrim functionality to allow trimming lists in Redis, which is needed for maintaining list sizes efficiently.
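For readers unfamiliar with the command, the following sketch reproduces the index rules of Redis `LTRIM` on an in-memory slice: `start` and `stop` are zero-based and inclusive, negative values count from the end of the list, and an out-of-range window yields an empty list. The `ltrim` function is illustrative only; it is not the provider method added in this commit.

```go
package main

import "fmt"

// ltrim mimics the index semantics of the Redis LTRIM command on a slice.
// start and stop are inclusive; negative values are offsets from the end.
func ltrim(list []string, start, stop int) []string {
	n := len(list)
	if start < 0 {
		start += n
		if start < 0 {
			start = 0
		}
	}
	if stop < 0 {
		stop += n
	}
	if stop >= n {
		stop = n - 1
	}
	// An empty or inverted window leaves nothing in the list.
	if start > stop || start >= n {
		return []string{}
	}
	return list[start : stop+1]
}

func main() {
	// Newest entries first, as after repeated LPUSH calls.
	events := []string{"e5", "e4", "e3", "e2", "e1"}
	// Keep only the three newest entries, like LTRIM key 0 2.
	fmt.Println(ltrim(events, 0, 2)) // [e5 e4 e3]
}
```

Pairing a push with a trim to a fixed window is the usual way to cap the length of a Redis list.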

feat(evaluation): add version field to SkillConfig

eff15cd

Add version field to SkillConfig struct across multiple layers including thrift definition, domain entity, and conversion logic to support skill versioning

refactor(evaluation): change SkillID type to pointer and update conversions

d4364c6

Update SkillConfig struct to use *int64 for SkillID to maintain consistency with other fields. Modify conversion functions to handle the pointer type directly instead of using helper functions.
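A minimal sketch of what passing the pointer through directly might look like. The struct and function names here are stand-ins, not the types in the repository, and the `*string` type of `Version` is an assumption.

```go
package main

import "fmt"

// SkillConfigDTO stands in for the thrift-generated type (hypothetical).
type SkillConfigDTO struct {
	SkillID *int64
	Version *string // assumed type
}

// SkillConfig stands in for the domain entity (hypothetical).
type SkillConfig struct {
	SkillID *int64
	Version *string // assumed type
}

// skillConfigDTO2DO copies the optional fields as pointers, so an unset
// SkillID stays nil instead of collapsing to 0.
func skillConfigDTO2DO(dto *SkillConfigDTO) *SkillConfig {
	if dto == nil {
		return nil
	}
	return &SkillConfig{SkillID: dto.SkillID, Version: dto.Version}
}

func main() {
	id := int64(42)
	do := skillConfigDTO2DO(&SkillConfigDTO{SkillID: &id})
	fmt.Println(*do.SkillID, do.Version == nil) // 42 true
}
```

Note that both structs share the same underlying `int64` after the conversion; if either side may mutate the value, copying it first would be safer.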

feat(evaluation): add file client support for evaluator output

95de3f0

add file client parameter to experiment application initialization and implement URI to URL conversion for evaluator extra output

fix loss idl path

80e55dd

fix loss idl path

7ba3193

refactor(evaluator): optimize record update by using direct fields

7270b06

Replace full record update with targeted field updates to improve efficiency and reduce unnecessary conversions. Add new method to update only status, score and output data.
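The idea of a targeted update can be sketched with an in-memory stand-in for the repository. All names below (`record`, `recordRepo`, `updateResult`) are hypothetical; the actual method in the PR talks to the database and its signature is not shown in this page.

```go
package main

import "fmt"

// record is a simplified, hypothetical evaluator record.
type record struct {
	ID         int64
	Status     string
	Score      *float64
	OutputData string
	InputData  string // never touched by updateResult
}

// recordRepo is an in-memory stand-in for the real repository.
type recordRepo struct {
	rows map[int64]*record
}

// updateResult writes only status, score and output data, leaving every
// other field of the stored record as it was.
func (r *recordRepo) updateResult(id int64, status string, score *float64, output string) error {
	row, ok := r.rows[id]
	if !ok {
		return fmt.Errorf("record %d not found", id)
	}
	row.Status, row.Score, row.OutputData = status, score, output
	return nil
}

func main() {
	repo := &recordRepo{rows: map[int64]*record{
		1: {ID: 1, Status: "running", InputData: "question"},
	}}
	score := 0.8
	if err := repo.updateResult(1, "success", &score, "answer"); err != nil {
		fmt.Println("update failed:", err)
		return
	}
	fmt.Println(repo.rows[1].Status, repo.rows[1].InputData) // success question
}
```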

feat(evaluator): add URI to URL transformation for extra output

8adb801

Add transformExtraOutputURIToURL method to convert extra output URIs to accessible URLs. This ensures clients can access the output files directly when needed.
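One way such a transformation could be structured is shown below, assuming a file client that can turn a stored URI into a signed URL. The `urlSigner` interface, `extraOutput` struct, `fakeSigner`, and the example host are all hypothetical; the real file client API is not shown in this page.

```go
package main

import (
	"context"
	"fmt"
)

// urlSigner is a hypothetical abstraction over the file client.
type urlSigner interface {
	SignURL(ctx context.Context, uri string) (string, error)
}

// extraOutput is a simplified stand-in for the extra output content:
// URI is what gets reported and stored, URL is filled in for display.
type extraOutput struct {
	URI string
	URL string
}

// transformURIToURL fills out.URL from out.URI. A nil output or an empty
// URI is left untouched.
func transformURIToURL(ctx context.Context, signer urlSigner, out *extraOutput) error {
	if out == nil || out.URI == "" {
		return nil
	}
	url, err := signer.SignURL(ctx, out.URI)
	if err != nil {
		return fmt.Errorf("sign uri %q: %w", out.URI, err)
	}
	out.URL = url
	return nil
}

// fakeSigner is a test double that builds a predictable URL.
type fakeSigner struct{}

func (fakeSigner) SignURL(_ context.Context, uri string) (string, error) {
	return "https://files.example.com/" + uri + "?sig=demo", nil
}

func main() {
	out := &extraOutput{URI: "reports/run-1.html"}
	if err := transformURIToURL(context.Background(), fakeSigner{}, out); err != nil {
		fmt.Println("transform failed:", err)
		return
	}
	fmt.Println(out.URL) // https://files.example.com/reports/run-1.html?sig=demo
}
```

Keeping the URI as the stored value and deriving the URL on read means expired signatures never end up persisted.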

feat(evaluator): add agent evaluator version support

2716daf

Add conversion logic for agent evaluator version in both DO2PO and PO2DO directions. This includes handling meta info, input schemas, and output schemas similar to other evaluator types.
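The DO2PO and PO2DO pattern can be illustrated with a small round trip. This sketch assumes the schemas are persisted as a JSON blob on the persistence object; that storage choice, and every type and function name below, is hypothetical rather than taken from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// argsSchema is a simplified, hypothetical input/output schema entry.
type argsSchema struct {
	Key          string   `json:"key"`
	ContentTypes []string `json:"content_types"`
}

// agentVersionDO is the domain object (hypothetical shape).
type agentVersionDO struct {
	ID           int64
	Version      string
	InputSchemas []argsSchema
}

// agentVersionPO is the persistence object; schemas are stored as JSON.
type agentVersionPO struct {
	ID          int64
	Version     string
	InputSchema []byte
}

// do2po serializes the schemas for storage.
func do2po(do *agentVersionDO) (*agentVersionPO, error) {
	if do == nil {
		return nil, nil
	}
	raw, err := json.Marshal(do.InputSchemas)
	if err != nil {
		return nil, fmt.Errorf("marshal input schemas: %w", err)
	}
	return &agentVersionPO{ID: do.ID, Version: do.Version, InputSchema: raw}, nil
}

// po2do restores the schemas from their stored JSON form.
func po2do(po *agentVersionPO) (*agentVersionDO, error) {
	if po == nil {
		return nil, nil
	}
	do := &agentVersionDO{ID: po.ID, Version: po.Version}
	if len(po.InputSchema) > 0 {
		if err := json.Unmarshal(po.InputSchema, &do.InputSchemas); err != nil {
			return nil, fmt.Errorf("unmarshal input schemas: %w", err)
		}
	}
	return do, nil
}

func main() {
	po, err := do2po(&agentVersionDO{ID: 1, Version: "0.0.1",
		InputSchemas: []argsSchema{{Key: "input", ContentTypes: []string{"Text"}}}})
	if err != nil {
		fmt.Println(err)
		return
	}
	back, err := po2do(po)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(back.Version, back.InputSchemas[0].Key) // 0.0.1 input
}
```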

fix(evaluator): add missing opt parameter to DeleteEvaluatorTagsByConditions call

5c77efa

refactor(evaluator): remove transaction wrapper for batch delete

1d60f3c

The transaction wrapper was unnecessary since we're performing a single update operation. Removing it simplifies the code while maintaining the same functionality.


[feat][evaluation] agent evaluator #421
HearyShen wants to merge 44 commits into main from feat/agent_evaluator
