Added all the files required to run task generation agentic pipeline #59
saidul-islam98 wants to merge 1 commit into main
kohankhaki left a comment
Thanks for the new agentic work. I have one question: is this going to replace Stage 3 in the agentic workflow, or does it introduce an alternative to Stage 3?
From the current repo structure, we have two paths:
- The schema-standard base pipeline (src/run_base_pipeline.py + src/base_stages/*, documented in src/schemas/GENERATION_PIPELINE_SCHEMAS.md).
- A legacy agentic debate path (src/agentic_* entrypoints) that uses older custom JSON structures.
For this PR, since the goal is to make the new agentic Stage 3 interchangeable with the current base Stage 3, the output contract needs to match the standardized schema exactly (Stage-3 layout, metadata linkage, hierarchy fields, ID conventions), so that Stage 4/5 can consume it without special handling.
Stage 3 should consume capabilities/<capabilities_tag>/<area_id>/capabilities.json in the standardized format.
To test this Stage-3 implementation, please use schema-standard Stage-2 inputs:
- either generate areas/capabilities via the standard pipeline (stages 0–2), or
- create custom areas/capabilities using the standard dataclasses (Domain, Area, Capability) and save_capabilities.
    )

    # chapter path for output
    chapter_out_path = (

The output path is non-standard (tasks/<tag>/<book>/<chapter>/tasks.json). The standard schema requires tasks/<task_tag>/<area_id>/<capability_id>/tasks.json. Please align the path structure with the schema contract.
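A minimal sketch of the layout requested above, for illustration only. The helper name `stage3_tasks_path` and the `output_root` parameter are assumptions and do not come from the repo; only the path pattern is taken from the comment.

```python
from pathlib import Path


def stage3_tasks_path(
    output_root: Path, task_tag: str, area_id: str, capability_id: str
) -> Path:
    """Build <output_root>/tasks/<task_tag>/<area_id>/<capability_id>/tasks.json.

    Hypothetical helper: the name and signature are illustrative, not the
    repo's API. Only the directory pattern follows the review comment.
    """
    return output_root / "tasks" / task_tag / area_id / capability_id / "tasks.json"
```

Building the path from the capability's own identifiers, rather than from book/chapter names, is what would let Stage 4/5 locate the file without special handling.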
    def make_verifier_agent() -> VerifierAgent:
        return VerifierAgent(name="Verifier", model_client=verifier_client)

    for chapter_idx, chapter_path in enumerate(chapter_files):

Stage-3 standard flow should iterate over Stage-2 capabilities (area/capability hierarchy), not chapter files directly. This currently bypasses standardized Stage-2 -> Stage-3 contract and breaks immediate Stage-4/5 interoperability.
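One way the outer loop could be driven by Stage-2 outputs instead of chapter files is sketched here. It assumes only the directory pattern capabilities/<capabilities_tag>/<area_id>/capabilities.json mentioned in this review; the function name and `output_root` are hypothetical, and loading each file into Capability objects is left to the repo's own loader.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_capability_files(
    output_root: Path, capabilities_tag: str
) -> Iterator[Tuple[str, Path]]:
    """Yield (area_id, path) for every Stage-2 capabilities.json under a tag.

    Hypothetical helper: it only discovers files by directory layout and does
    not assume anything about the JSON structure inside them.
    """
    tag_dir = output_root / "capabilities" / capabilities_tag
    # Sorted so that repeated runs visit areas in a stable order.
    for path in sorted(tag_dir.glob("*/capabilities.json")):
        yield path.parent.name, path
```

Each yielded file would then be loaded into the standard objects, and task generation would run once per capability rather than once per chapter.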
        dedup_cfg.get("embedding_model", "text-embedding-3-small")
    )
    keep_policy = str(dedup_cfg.get("keep_policy", "first"))
    cache_embeddings = bool(dedup_cfg.get("cache_embeddings", True))

input_stage_tag is set to None. For Stage-3 this must reference the Stage-2 capabilities tag for provenance/resume compatibility. Please pass the actual input tag.
    )
    continue

    all_tasks: List[Task] = []

Tasks are being built with placeholder capability/area/domain fields (__placeholder__, *_placeholder).
Schema-compliant Task objects must include real hierarchy values from actual Capability inputs (capability/area/domain identifiers and names), not placeholders.
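A small guard along these lines could catch placeholder values before tasks are saved. This is a sketch under assumptions: the function name and the field names in the usage note are illustrative, and the real Task dataclass may expose its hierarchy differently.

```python
from typing import Dict, List


def find_placeholder_fields(hierarchy: Dict[str, str]) -> List[str]:
    """Return the names of fields that are empty or look like placeholders.

    Hypothetical check: a value counts as a placeholder if it is empty or
    contains the substring "placeholder" (case-insensitive).
    """
    return [
        name
        for name, value in hierarchy.items()
        if not value or "placeholder" in value.lower()
    ]
```

For example, passing `{"capability_id": "cap_000", "area_id": "area_placeholder"}` would flag `area_id`, and the pipeline could refuse to write the task until it carries a real value.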
    meta["chapter_id"] = meta.get("chapter_id") or chapter_id
    t.generation_metadata = meta

    t.task_id = f"{prefix}__task_{i:03d}"

Dedup rewrites task_id to <chapter_id>__task_###, which breaks standardized task ID format and scope expectations.
Please keep/assign schema-standard task_### IDs in capability scope after deduplication as well.
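As a rough illustration of the renumbering asked for here, the surviving tasks of one capability could be given fresh task_### IDs after deduplication. The function name is hypothetical, and the starting index is an assumption (zero-based by default); the schema document should decide the actual convention.

```python
from typing import List


def capability_scoped_task_ids(num_tasks: int, start: int = 0) -> List[str]:
    """Return sequential task_### IDs for the tasks of a single capability.

    Hypothetical helper: no chapter prefix is added, because the capability
    scope is already encoded by the directory the tasks are written to.
    The start index is an assumption; adjust it to match the schema.
    """
    return [f"task_{i:03d}" for i in range(start, start + num_tasks)]
```

Assigning the IDs after dedup, in the order the kept tasks are written, keeps them contiguous within each capability.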
PR Type
Feature
Short Description
Added the required files to run the task generation agentic pipeline. The instructions to run the pipeline can be found under:
Tests Added
None