Hi SkillOpt team, thanks for releasing the paper and code. I wanted to clarify the intended semantics of the slow-update path.
In the paper, Section 3.6 appears to describe slow update as still going through the validation/selection gate after the longitudinal guidance is injected. My reading is that the slow-update candidate should be accepted only if it passes the same held-out selection validation used for normal edits.
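To check that I am reading the paper correctly, here is a minimal sketch of the gated semantics as I understand them. `Skill`, `gated_slow_update`, and `validate` are hypothetical names, not SkillOpt APIs. I am also assuming two things: a rejected slow-update candidate leaves both skills unchanged, and a tie with the current best score counts as passing.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Skill:
    # Hypothetical representation: step-level skill text plus the
    # epoch-level slow-update (longitudinal guidance) field.
    body: str
    slow_update_field: str = ""


def gated_slow_update(
    current: Skill,
    best: Skill,
    best_score: float,
    guidance: str,
    validate: Callable[[Skill], float],
) -> tuple[Skill, Skill, float, str]:
    """My reading of Section 3.6: inject the guidance first, then gate it."""
    candidate = replace(current, slow_update_field=guidance)
    # Same held-out selection validation that normal edits go through.
    score = validate(candidate)
    if score >= best_score:  # assumption: ties are accepted
        return candidate, candidate, score, "accept"
    # Rejected: current and best skills stay exactly as they were.
    return current, best, best_score, "reject"
```

Under this reading, `best_skill` always matches a candidate that was actually scored by the held-out gate.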
In current `main`, however, the implementation appears to force-inject slow-update guidance into both `current_skill` and `best_skill` without a selection gate (`SkillOpt/skillopt/engine/trainer.py`, lines 1580 to 1584 in 75b5c7f):

```python
# Slow update field is force-updated into both
# current_skill and best_skill unconditionally.
# The epoch-level longitudinal guidance should always
# persist — it must not be gated by step-level
# selection scores.
```
The surrounding code records the action as `force_accept` (`SkillOpt/skillopt/engine/trainer.py`, lines 1597 to 1598 in 75b5c7f):

```python
slow_result["action"] = "force_accept"
current_origin = f"slow_update_epoch_{epoch:02d}"
```
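For contrast, here is how I read the released code's behavior, written as a hypothetical sketch with the same illustrative `Skill` type (again, not SkillOpt's actual classes). The guidance is written into both skills and no validator is called. The result is that the exported best skill pairs the best gated body with a slow-update field that was never scored together with it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Skill:
    # Hypothetical representation, same as in the gated sketch.
    body: str
    slow_update_field: str = ""


def force_slow_update(
    current: Skill, best: Skill, guidance: str
) -> tuple[Skill, Skill, str]:
    """My reading of the released code: inject into both skills, no gate."""
    new_current = replace(current, slow_update_field=guidance)
    # best keeps its gated body but receives the latest guidance unscored.
    new_best = replace(best, slow_update_field=guidance)
    return new_current, new_best, "force_accept"
```

If this reading is right, `best_skill` after an epoch is "best gated body + latest slow-update field" rather than any single candidate that passed the held-out gate.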
Could you clarify which behavior is intended for reproducing/reporting SkillOpt results?
- Should slow-update guidance be validation-gated, as described in the paper?
- Or is the force-injection behavior in the released code the intended implementation?
- If force-injection is intended, should the exported `best_skill.md` be interpreted as the best validation-gated skill plus the latest slow-update field, rather than the exact best step candidate selected by the held-out gate?
Thanks!