Merged
55 commits
bac05b5
feat(finworld): Added AgentScope learning protocol and OpenJudge eval…
TaoShuchang Jan 16, 2026
ba41164
Merge remote-tracking branch 'origin/main' into dev/shuchang
TaoShuchang Jan 16, 2026
c7ca8c7
Precommit fix (#4)
binary-husky Jan 16, 2026
7f2b017
fix test bench import
binary-husky Jan 16, 2026
9dd3c42
refactor(finworld): Replace agent protocol and unify configuration up…
TaoShuchang Jan 17, 2026
757f8a1
feat(finworld): Added FinWorld training environment configuration scr…
TaoShuchang Jan 18, 2026
079e4bd
refactor(utils): Remove unused extract and compute functions `extract…
TaoShuchang Jan 18, 2026
bcce8f0
refactor(finworld): Replace the old model with OpenJudge, update eval…
TaoShuchang Jan 18, 2026
4662d63
feat(task_reader): Support data reading of type jsonl_with_env_service
TaoShuchang Jan 19, 2026
de81c1d
feat(core): add finworld task reader support to framework
TaoShuchang Jan 19, 2026
248acc4
feat(finworld): implement specialized data reader and openjudge-based…
TaoShuchang Jan 19, 2026
9d651fd
refactor(finworld): optimize configuration templates and prompt engin…
TaoShuchang Jan 19, 2026
7475ecc
chore(finworld): update launch scripts and add variant experiment scr…
TaoShuchang Jan 19, 2026
b95d491
Merge remote-tracking branch 'origin/main' into dev/shuchang
TaoShuchang Jan 19, 2026
f20ab91
feat(finworld): Added support for multi-machine, multi-GPU training s…
TaoShuchang Jan 19, 2026
ea87d4b
chore(git): ignore finworld/yaml/*
TaoShuchang Jan 20, 2026
3082bca
fix(metrics): Fix and enhance the compatibility and debugging output …
TaoShuchang Jan 20, 2026
ef44b63
fix(metrics): Remove debug prints and synchronize reward statistics
TaoShuchang Jan 20, 2026
0889483
chore: "Stop tracking existing yaml files in tutorial directory"
TaoShuchang Jan 20, 2026
db7114c
fix(task_runner): Synchronize reward_stats to log_metrics
TaoShuchang Jan 20, 2026
5a25550
refactor(script): Refactored the finworld training script, integratin…
TaoShuchang Jan 20, 2026
623b7d9
Refactor(deep_finance): Replace and remove finworld-related implement…
TaoShuchang Jan 20, 2026
0aaab86
refactor(deepfinance): Rename and unify DeepFinance module and config…
TaoShuchang Jan 20, 2026
04f4959
refactor(tutorial): Optimize dynamic generation logic for configurati…
TaoShuchang Jan 20, 2026
d0ff68b
fix(deep_finance): argparse: with-deepfinance
TaoShuchang Jan 20, 2026
1c356d7
Merge remote-tracking branch 'origin/main' into dev/shuchang
TaoShuchang Jan 20, 2026
37dcbcc
fix(tutorial): Fixed issues with multi-machine training environment v…
TaoShuchang Jan 20, 2026
529ae7e
fix(env): Corrected the assignment logic for reward and info when ret…
TaoShuchang Jan 20, 2026
f4eb231
chore(config): Update example_deep_finance configuration and clean up…
TaoShuchang Jan 20, 2026
1e07515
Refactor(metric): Optimize tool metric calculation and data saving logic
TaoShuchang Jan 20, 2026
08ba184
fix(metric_helper): fix tool cache metric
TaoShuchang Jan 20, 2026
3d55692
fix little bug
TaoShuchang Jan 21, 2026
a478827
fix(utils): Suppress httpx AsyncClient.aclose() exception warnings
TaoShuchang Jan 21, 2026
88be3e4
comments to english
binary-husky Jan 21, 2026
fb41962
feat: support service name prefixes
TaoShuchang Jan 21, 2026
a1f909b
fix: improve MultiAgent message content parsing logic
TaoShuchang Jan 21, 2026
8d2e5d7
fix: improve DeepFinance judgment logic and configuration
TaoShuchang Jan 21, 2026
3c85960
chore(deps): bump agentscope from 1.0.7 to 1.0.8
TaoShuchang Jan 22, 2026
9b541c5
fix(metric_helper): correct trajectory save path and add tool call me…
TaoShuchang Jan 22, 2026
06fda5f
Merge remote-tracking branch 'origin/main' into dev/shuchang
TaoShuchang Jan 22, 2026
63cc682
Merge branch 'main' into dev/shuchang
binary-husky Jan 23, 2026
c9b87ac
revise message parsing
binary-husky Jan 23, 2026
3bd4c7d
fix(metric_helper): update openjudge graders list in reward metric he…
TaoShuchang Jan 25, 2026
8a18d40
feat(deep_finance): replace OpenJudge graders with PresentationQualit…
TaoShuchang Jan 26, 2026
835bdd8
feat(grounding): implement grounding grader for citation compliance e…
TaoShuchang Jan 26, 2026
11ed325
fix(deep_finance_judge): add debug logging for OpenJudge evaluation p…
TaoShuchang Jan 26, 2026
a500e90
feat(deep_finance): enhance reward metadata and zero score debugging
TaoShuchang Jan 27, 2026
d9cbdc0
feat(presentation_quality): upgrade grading to 1/3/5 scoring system w…
TaoShuchang Jan 27, 2026
4538f5a
chore(config): update experiment suffix, prefix and reward weights in…
TaoShuchang Jan 27, 2026
6f0c420
Merge remote-tracking branch 'origin/main' into dev/shuchang_newjudge
TaoShuchang Jan 27, 2026
818a4f7
fix(deep_finance): update environment variables and training launch o…
TaoShuchang Jan 27, 2026
1bb7f60
chore(config): parameterize deep finance training configuration
TaoShuchang Jan 27, 2026
460318f
chore(config): update experiment suffix, prefix, and weight parameters
TaoShuchang Jan 27, 2026
57a3a54
fix(example_deep_finance): update dynamic config file generation path
TaoShuchang Jan 27, 2026
beaa540
refactor(judge): remove deprecated presentation quality script
TaoShuchang Jan 27, 2026
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,9 @@ tutorial/example_deep_finance/yaml/*
tutorial/example_deep_finance/config/*
tutorial/example_deep_finance/scripts/*
flash_attn-2.8.*.whl
tutorial/example_deep_finance/prepare_data/*
tutorial/example_deep_finance/judge/analytical_sufficiency/*

.dockerignore
benchmark_datasets
modelscope_cache
12 changes: 12 additions & 0 deletions ajet/context_tracker/multiagent_tracking.py
Expand Up @@ -82,6 +82,18 @@ def extract_text_content_from_content_dict(self, msg):
# },
# ],
# }
# or tool_result format?? not observed yet:
# msg = {
# "role": "tool",
# "content": [
# {
# "type": "tool_result",
# "id": "call_xxx",
# "output": "tool output content",
# "name": "tool_name"
# },
# ],
# }
Comment on lines +85 to +96

medium

This block of commented-out code should be removed. Keeping commented-out code, especially large blocks, clutters the codebase and can lead to confusion. If this is an example, it should be moved to documentation or a separate example file.
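If the `tool_result` shape ever does show up, a small defensive parser can cover both variants without keeping the example in comments. A sketch only: the `tool_result` field names are taken from the commented-out example above, which the author marks as not yet observed, so treat that shape as an assumption.

```python
from typing import Any, Dict


def extract_text_content(msg: Dict[str, Any]) -> str:
    """Sketch: flatten a message's content into plain text.

    Handles plain-string content, "text" parts, and the hypothetical
    (not yet observed) "tool_result" parts from the comment above.
    """
    content = msg.get("content", "")
    if isinstance(content, str):
        return content
    parts = []
    for item in content:
        if not isinstance(item, dict):
            continue
        if item.get("type") == "text":
            parts.append(item.get("text", ""))
        elif item.get("type") == "tool_result":  # hypothetical shape
            parts.append(str(item.get("output", "")))
    return "".join(parts)


msg = {
    "role": "tool",
    "content": [{"type": "tool_result", "id": "call_xxx",
                 "output": "tool output content", "name": "tool_name"}],
}
print(extract_text_content(msg))  # tool output content
```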



str_content = ""
Expand Down
5 changes: 5 additions & 0 deletions ajet/task_runner/general_runner.py
Expand Up @@ -9,6 +9,7 @@
from ajet.schema.trajectory import Reward
from ajet.task_runner.base_runner import BaseAgentRunner
from ajet.utils.dynamic_import import dynamic_import
from ajet.utils.metric_helper.reward_metric_helper import populate_reward_metadata_from_stats


class GeneralRunner(BaseAgentRunner):
Expand Down Expand Up @@ -73,6 +74,10 @@ def execute(self, workflow_task: WorkflowTask) -> BaseContextTracker:
madness=0,
description="",
)

# Populate reward metadata with deep_finance reward stats if available
if "reward_stats" in workflow_output.metadata:
populate_reward_metadata_from_stats(reward, workflow_output.metadata["reward_stats"])
context_tracker.process_reward(reward)
# generate token before merging
context_tracker.group_merge()
Expand Down
37 changes: 24 additions & 13 deletions ajet/utils/metric_helper/reward_metric_helper.py
Expand Up @@ -11,9 +11,12 @@
- judge_time/ Judge time consumption statistics
"""

from typing import List, Dict, Any
from typing import List, Dict, Any, TYPE_CHECKING
import numpy as np

if TYPE_CHECKING:
from ajet.schema.trajectory import Reward


def extract_reward_stats_from_trajectories(trajectories: List[Any]) -> List[Dict[str, Any]]:
"""
Expand Down Expand Up @@ -72,22 +75,15 @@ def compute_reward_metrics(reward_stats_list: List[Dict[str, Any]], prefix: str
metrics[f"{prefix}rewards/penalty_count"] = len(non_zero_penalties)
metrics[f"{prefix}rewards/penalty_rate"] = len(non_zero_penalties) / n * 100 if n > 0 else 0.0

# ========== Detect OpenJudge Usage ==========
# ========== OpenJudge Metrics (PresentationQualityGrader, GroundingGrader) ==========
openjudge_enabled_count = sum(1 for rs in reward_stats_list if rs.get('openjudge_enabled', False))

if openjudge_enabled_count > 0:
# ========== OpenJudge Metrics ==========

# Dynamically extract OpenJudge grader fields
# Currently supported graders: report_resolution, trajectory_faithfulness,
# rubrics_performance, trajectory_comprehensive, information_gain, action_loop
# OpenJudge graders: presentation_quality, grounding
openjudge_graders = [
"report_resolution",
"trajectory_faithfulness",
"rubrics_performance",
"trajectory_comprehensive",
"information_gain",
"action_loop",
"presentation_quality",
"grounding",
"planning"

medium

The openjudge_graders list includes "planning", but the _create_grader_configs method in deep_finance_judge.py only creates GraderConfig for presentation_quality and grounding. This inconsistency means that metrics for "planning" will be attempted but no actual grader will be run, potentially leading to misleading metric reports or future errors if a "planning" grader is expected.

]

for grader_name in openjudge_graders:
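One way to address the reviewer's concern above is to emit metrics only for graders that actually produced scores, so a listed-but-unrun grader such as "planning" yields no misleading keys. A sketch under the assumption that per-grader scores live in the reward-stats dicts keyed by grader name; the metric key naming here is illustrative, not the project's actual scheme.

```python
from typing import Any, Dict, List


def grader_metrics(reward_stats_list: List[Dict[str, Any]],
                   graders: List[str], prefix: str = "") -> Dict[str, float]:
    # Compute a mean only for graders that actually produced scores,
    # skipping entries (e.g. "planning") for which no grader ran.
    metrics = {}
    for name in graders:
        scores = [rs[name] for rs in reward_stats_list if name in rs]
        if not scores:
            continue
        metrics[f"{prefix}rewards/{name}/mean"] = sum(scores) / len(scores)
    return metrics


stats = [{"presentation_quality": 5, "grounding": 3},
         {"presentation_quality": 3, "grounding": 5}]
m = grader_metrics(stats, ["presentation_quality", "grounding", "planning"])
print(sorted(m))  # no 'planning' key appears
```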
Expand Down Expand Up @@ -151,3 +147,18 @@ def compute_reward_metrics_from_trajectories(trajectories: List[Any], prefix: st
reward_stats_list = extract_reward_stats_from_trajectories(trajectories)
return compute_reward_metrics(reward_stats_list, prefix=prefix)


def populate_reward_metadata_from_stats(reward: "Reward", reward_stats: Dict[str, Any]) -> None:
"""
Populate Reward.metadata with all reward statistics.

Args:
reward: The Reward object to populate
reward_stats: The reward_stats dictionary from judge
"""
if not reward_stats:
return

# Directly copy all reward_stats into metadata
reward.metadata.update(reward_stats)
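The helper above can be exercised in isolation. The `Reward` dataclass below is a simplified stand-in for `ajet.schema.trajectory.Reward` (assumed to expose a dict-valued `metadata` attribute, as the diff implies); the stats values are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Reward:  # stand-in for ajet.schema.trajectory.Reward
    value: float = 0.0
    metadata: Dict[str, Any] = field(default_factory=dict)


def populate_reward_metadata_from_stats(reward: Reward,
                                        reward_stats: Dict[str, Any]) -> None:
    # Mirrors the helper above: copy all judge stats into reward metadata.
    if not reward_stats:
        return
    reward.metadata.update(reward_stats)


reward = Reward(value=0.82)
populate_reward_metadata_from_stats(
    reward,
    {"openjudge_enabled": True, "presentation_quality": 5, "grounding": 3},
)
print(reward.metadata["grounding"])  # 3
```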

1 change: 1 addition & 0 deletions tutorial/example_deep_finance/__init__.py
@@ -0,0 +1 @@
# tutorial/example_deep_finance package
38 changes: 20 additions & 18 deletions tutorial/example_deep_finance/deep_finance.sh
@@ -1,29 +1,29 @@
#!/bin/bash
set -e
set -e
#===============================================================================
# 1. Configuration section - users only need to edit here
#===============================================================================
SUFFIX="deep_finance" # Experiment suffix; affects all log and experiment names
PREFIX="open" # Experiment prefix; affects the folder for logs and experiments
SUFFIX="newjudge" # Experiment suffix; affects all log and experiment names
PREFIX="ajet_newjudge" # Experiment prefix; affects the folder for logs and experiments

# OpenJudge model configuration
OPENJUDGE_LLM='qwen-flash' # OpenJudge scoring model
RM_LLM='qwen-max' # RM Gallery scoring model
JUDGE_CONCURRENCY=10

# Reward weight configuration
RM_WEIGHT=0.4
CITATION_AUDIT_WEIGHT=0.2
REPORT_RESOLUTION_WEIGHT=0.2
TRAJECTORY_FAITHFULNESS_WEIGHT=0.2
RM_WEIGHT=0.5
PRESENTATION_QUALITY_WEIGHT=0.25
GROUNDING_WEIGHT=0.25

# Training parameter configuration
NUM_REPEAT=4 # group size; each query is rolled out NUM_REPEAT times
TRAIN_BATCH_SIZE=32 # training batch size
NUM_STEPS=6 # number of steps per sample
DEEPFINANCE_TOOL_RESULT_MAX_CHARS=10000

# Root directory
# Root directory (must be changed)
export AJET_ROOT="/mnt/data_cpfs/taoshuchang.tsc/deepresearch/AgentJet_new"

NNODES=${WORLD_SIZE}

Expand All @@ -46,7 +46,7 @@ fi
# 2. Dynamically generate the config file (yaml generated from the yaml template)
#===============================================================================
# Changed: config files are now generated dynamically under the yaml directory
CONFIG_TEMPLATE="tutorial/example_deep_finance/yaml_template/deep_finance_template.yaml"
CONFIG_TEMPLATE="tutorial/example_deep_finance/deep_finance.yaml"
CONFIG_FILE="${AJET_ROOT}/tutorial/example_deep_finance/yaml/${SUFFIX}.yaml"
mkdir -p $(dirname ${CONFIG_FILE})

Expand All @@ -55,12 +55,11 @@ sed -e "s|{{SUFFIX}}|${SUFFIX}|g" \
-e "s|{{MODEL_PATH}}|${MODEL_PATH}|g" \
-e "s|{{NNODES}}|${NNODES}|g" \
-e "s|{{RM_WEIGHT}}|${RM_WEIGHT}|g" \
-e "s|{{CITATION_AUDIT_WEIGHT}}|${CITATION_AUDIT_WEIGHT}|g" \
-e "s|{{PRESENTATION_QUALITY_WEIGHT}}|${PRESENTATION_QUALITY_WEIGHT}|g" \
-e "s|{{GROUNDING_WEIGHT}}|${GROUNDING_WEIGHT}|g" \
-e "s|{{OPENJUDGE_LLM}}|${OPENJUDGE_LLM}|g" \
-e "s|{{RM_LLM}}|${RM_LLM}|g" \
-e "s|{{JUDGE_CONCURRENCY}}|${JUDGE_CONCURRENCY}|g" \
-e "s|{{REPORT_RESOLUTION_WEIGHT}}|${REPORT_RESOLUTION_WEIGHT}|g" \
-e "s|{{TRAJECTORY_FAITHFULNESS_WEIGHT}}|${TRAJECTORY_FAITHFULNESS_WEIGHT}|g" \
-e "s|{{NUM_REPEAT}}|${NUM_REPEAT}|g" \
-e "s|{{NUM_STEPS}}|${NUM_STEPS}|g" \
-e "s|{{TRAIN_BATCH_SIZE}}|${TRAIN_BATCH_SIZE}|g" \
Expand All @@ -72,7 +71,7 @@ sed -e "s|{{SUFFIX}}|${SUFFIX}|g" \
${AJET_ROOT}/${CONFIG_TEMPLATE} > ${CONFIG_FILE}

echo "Config file generated: ${CONFIG_FILE}"
echo "Parameter check: RM=${RM_WEIGHT}, Citation=${CITATION_AUDIT_WEIGHT}, OpenJudge=${OPENJUDGE_LLM}, RM_LLM=${RM_LLM}"
echo "Parameter check: RM=${RM_WEIGHT}, PresentationQuality=${PRESENTATION_QUALITY_WEIGHT}, Grounding=${GROUNDING_WEIGHT}, OpenJudge=${OPENJUDGE_LLM}, RM_LLM=${RM_LLM}"
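The `sed` pipeline above replaces `{{PLACEHOLDER}}` tokens in the yaml template with shell variables. A minimal, self-contained sketch of the same pattern (the template contents and values here are illustrative, not the project's real ones):

```shell
# Minimal sketch of the {{PLACEHOLDER}} substitution pattern used above.
cat > /tmp/template_demo.yaml <<'EOF'
experiment_name: "{{SUFFIX}}"
rm_weight: {{RM_WEIGHT}}
EOF
SUFFIX="newjudge"
RM_WEIGHT=0.5
# '|' delimiters avoid escaping slashes when values contain paths
sed -e "s|{{SUFFIX}}|${SUFFIX}|g" \
    -e "s|{{RM_WEIGHT}}|${RM_WEIGHT}|g" \
    /tmp/template_demo.yaml > /tmp/config_demo.yaml
cat /tmp/config_demo.yaml
```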

#===============================================================================
# 3. Environment configuration
Expand Down Expand Up @@ -106,15 +105,15 @@ export DEEPFINANCE_MCP_CONFIG DEEPFINANCE_TOOL_RESULT_MAX_CHARS
# Other service configuration
HF_ENDPOINT="https://hf-mirror.com"
ES_HOSTS="http://11.160.132.46:8200"
export HF_ENDPOINT ES_HOSTS
export HF_ENDPOINT ES_HOSTS

# Log file locations
CURRENT_TIME=$(date "+%Y%m%d_%H%M%S")
LOG_DIR="${AJET_ROOT}/logs/${PREFIX}"
MASTER_IP_FILE="${LOG_DIR}/master-ip_${SUFFIX}.log"
ENV_SERVICE_LOG="${LOG_DIR}/env_service_${SUFFIX}_${CURRENT_TIME}.log"
TRAIN_LOG="${LOG_DIR}/train_${SUFFIX}_${CURRENT_TIME}.log"

env_log_prefix="${SUFFIX}__${CURRENT_TIME}"
# Multi-node training parameter configuration
GPUS_PER_NODE=8
EXPECTED_WORKERS=$WORLD_SIZE
Expand Down Expand Up @@ -156,6 +155,8 @@ export NCCL_ASYNC_ERROR_HANDLING=1

export PYTHONPATH="${AJET_ROOT}:${PYTHONPATH}"
export RAY_CLUSTER_MODE="multi_node"
export DEEPFINANCE_PATH="${ENV_SERVICE_ROOT}" # AgentJet may use this path internally
export DEEPFINANCE_SCRIPT="source /mnt/data/taoshuchang.tsc/anaconda3/etc/profile.d/conda.sh && conda activate finworld_1209 && cd ${ENV_SERVICE_ROOT} && DEEPFINANCE_TOOL_RESULT_MAX_CHARS=${DEEPFINANCE_TOOL_RESULT_MAX_CHARS} DEEPFINANCE_MCP_CONFIG=${DEEPFINANCE_MCP_CONFIG} CACHE_TYPE=${CACHE_TYPE} MONGO_URI=${MONGO_URI} MONGO_DB_NAME=${MONGO_DB_NAME} MONGO_COLLECTION_NAME=${MONGO_COLLECTION_NAME} python -m env_service.env_service --env finworld --portal 0.0.0.0 --port 8080"
Comment on lines +158 to +159

high

The DEEPFINANCE_SCRIPT variable contains a very long and complex shell command with hardcoded paths (/mnt/data/taoshuchang.tsc/anaconda3/etc/profile.d/conda.sh, conda activate finworld_1209). This significantly reduces the portability and maintainability of the script. Consider abstracting parts of this command into functions or making the paths configurable via environment variables or parameters.
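A sketch of the portability fix the reviewer suggests: derive the hardcoded pieces from overridable environment variables with defaults. The default paths below are placeholders, not the project's real locations.

```shell
# Overridable settings; callers can export these before sourcing the script.
CONDA_SH="${CONDA_SH:-/opt/conda/etc/profile.d/conda.sh}"   # placeholder default
CONDA_ENV="${CONDA_ENV:-finworld_1209}"
ENV_SERVICE_ROOT="${ENV_SERVICE_ROOT:-/srv/finworld}"        # placeholder default

# Compose the launch command from the variables instead of hardcoding paths.
DEEPFINANCE_SCRIPT="source ${CONDA_SH} && conda activate ${CONDA_ENV} && cd ${ENV_SERVICE_ROOT} && python -m env_service.env_service --env finworld --portal 0.0.0.0 --port 8080"
export DEEPFINANCE_SCRIPT
echo "$DEEPFINANCE_SCRIPT"
```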



#===============================================================================
Expand Down Expand Up @@ -202,11 +203,12 @@ if [[ $HOSTNAME == *"-master-"* ]]; then

# Launch the training job (the core step)
python ajet/launcher.py \
--with-deepfinance \
--conf ${CONFIG_FILE} \
--backbone="verl" \
--prefix=${SUFFIX} \
--prefix=${env_log_prefix} \
2>&1 | tee ${TRAIN_LOG}


#===============================================================================
# 6.2 Worker node startup flow
Expand All @@ -218,4 +220,4 @@ else
ray stop || true
ray start --address $MASTER_ADDR:6379 --num-gpus 8
while true; do sleep 60; done
fi
fi
35 changes: 16 additions & 19 deletions tutorial/example_deep_finance/deep_finance.yaml
@@ -1,27 +1,26 @@
# ------------------ Main configuration ------------------
ajet:
project_name: ajet_deep_finance
experiment_name: "ajet_deep_finance"
project_name: "{{PREFIX}}"
experiment_name: "{{SUFFIX}}"
# Judge configuration (nested structure; maps to self.config.ajet.judge.*)
judge:
openjudge_llm: qwen-flash # OpenJudge model
rm_llm: qwen-max # RM Gallery model
concurrency: 10 # Judge concurrency
openjudge_llm: {{OPENJUDGE_LLM}} # OpenJudge model
rm_llm: {{RM_LLM}} # RM Gallery model
concurrency: {{JUDGE_CONCURRENCY}} # Judge concurrency
train_ref_ans_path: {{TRAIN_REF_ANS_PATH}} # Training-set reference answer path
val_ref_ans_path: {{VAL_REF_ANS_PATH}} # Validation-set reference answer path
# OpenJudge weight configuration
report_resolution_weight: 0.2 # report quality evaluation
trajectory_faithfulness_weight: 0.2 # factual accuracy evaluation
citation_audit_weight: 0.2 # citation audit evaluation (coverage + authenticity)
rm_weight: 0.4 # RM Gallery weight
presentation_quality_weight: {{PRESENTATION_QUALITY_WEIGHT}} # report presentation quality evaluation
grounding_weight: {{GROUNDING_WEIGHT}} # citation compliance evaluation
rm_weight: {{RM_WEIGHT}} # RM Gallery weight
task_judge:
# Use the local DeepFinanceJudge for evaluation (decoupled from the remote env_service)
judge_protocol: tutorial.example_deep_finance.deep_finance_judge->DeepFinanceJudgeByOpenJudge
model:
# ✨✨✨✨ Set the model to train
path: {{MODEL_PATH}}
trainer_common:
nnodes: 8
nnodes: {{NNODES}}
n_gpus_per_node: 8
val_before_train: True
val_pass_n: 8
Expand All @@ -32,44 +31,42 @@ ajet:
rollout:
# ✨✨✨✨ 编写并选择Agent
user_workflow: tutorial.example_deep_finance.deep_finance->ExampleDeepResearchProtocol
force_disable_toolcalls: True
force_disable_toolcalls: False
enable_oversample: False
tensor_model_parallel_size: 8
num_repeat: 4
num_repeat: {{NUM_REPEAT}}
max_env_worker: 64 # increase environment parallelism
max_num_seqs: 64 # increase vLLM concurrent sequence count
max_response_length_in_one_turn: 8000
max_model_len: 50000
agent_madness_reward: 0.0
compute_madness_checklist: None
multi_turn:
max_steps: 6
max_steps: {{NUM_STEPS}}
interchange_server:
interchange_method: 'tcp' # options: 'tcp' (multi-nodes) or 'ipc' (1 node)
debug:
debug_max_parallel: 1 # increase parallel tasks to fully utilize the GPUs
debug_first_n_tasks: 100 # increase the number of tasks processed
data:
train_batch_size: 32
train_batch_size: {{TRAIN_BATCH_SIZE}}
max_prompt_length: 8000
max_response_length: 41000

task_reader:
type: deep_finance # data is loaded from JSON and assembled into init_messages; tool calls go through env_service
deep_finance:
training:
file_path: {{TRAIN_PATH}}
file_path: {{TRAIN_DATA_PATH}}
validation:
file_path: {{VAL_PATH}}
file_path: {{VAL_DATA_PATH}}
# env_service must still be configured (for tool calls)
env_service:
env_type: "finworld"
env_url: {{ENV_SERVICE_URL}}
env_action_preference: code


trainer:
default_local_dir: {{CKPT_SAVE_PATH}}
default_local_dir: "{{CKPT_SAVE_PATH}}/{{PREFIX}}/{{SUFFIX}}"
# resume_mode: disable # disable auto-resume and train from scratch
actor_rollout_ref:
rollout:
Expand Down