When a workflow step is rerun, existing .out files with a lower SLURM job ID are moved to log/archive/ before the new step is submitted. This keeps the main log/ directory clean and makes it easy to identify which logs belong to the current run.
Closes #31
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
AdrienLeGuillou
left a comment
Currently, this does not work.
There is an indexing error below, but even with that fixed, it still does not work.
Here are the things I would be very careful about when designing this PR:
In bash, variables are always strings. So when comparing job IDs, one must be sure to compare them as integers and not as strings. Otherwise, lexicographic ordering can give the wrong result (as strings, "999" sorts after "1000").
For non-array jobs, the task ID is random / undefined, so it should not be taken into account.
However, for array jobs, the array can be submitted in multiple pieces where task IDs are disconnected from job IDs (e.g. wf_step1_1234_1:10.out then wf_step1_5678_11:20.out). In this case, the logs to be archived are the ones matching the step AND the task ID, but with a lower job ID.
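The three points above could be handled along the following lines. This is only a sketch, not the code in this PR: should_archive is a hypothetical helper name, and it assumes every log name ends in _<jobid>_<taskid>.out, with the task ID treated as an opaque string.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR).
# Usage: should_archive FILE CURRENT_JOB_ID [TASK_ID]
# Succeeds when FILE belongs to an older job and should be archived.
# Assumes file names end in _<jobid>_<taskid>.out.
should_archive() {
    local base rest job_id task_id
    base="$(basename "$1" .out)"
    task_id="${base##*_}"    # text after the last underscore
    rest="${base%_*}"        # everything before the last underscore
    job_id="${rest##*_}"     # text after the second-to-last underscore

    # Skip files whose job ID field is not purely numeric.
    [[ "$job_id" =~ ^[0-9]+$ ]] || return 1

    # Array jobs: pass TASK_ID so only logs of the same task are archived.
    # Non-array jobs: omit TASK_ID and the task field is ignored.
    if [[ -n "${3:-}" && "$task_id" != "$3" ]]; then
        return 1
    fi

    # Arithmetic comparison. [[ 999 < 1000 ]] would compare strings and fail.
    # The 10# prefix keeps leading zeros from being read as octal.
    (( 10#$job_id < 10#$2 ))
}
```

For a non-array job, calling should_archive "$f" "$SLURM_JOB_ID" ignores the task field; for an array job, the task ID of the piece being submitted would be passed as the third argument.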
    local log_dir="$3"
    local archive_dir="$log_dir/archive"

    for f in "$log_dir"/${step_name}_*.out; do
That is not a valid bash line. You can't index out of a glob.
find "$log_dir" -name "${step_name}_*.out" actually lists them.
The result can be stored in a variable and then iterated over.
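One way to apply this suggestion is to collect the find results in a bash array and loop over that. The sketch below makes some assumptions: list_step_logs and STEP_LOGS are hypothetical names, -maxdepth 1 is added so that files already in log/archive/ are not listed again (it is supported by GNU and BSD find but is not strictly POSIX), and -print0 is used so unusual file names survive intact.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR).
# Usage: list_step_logs LOG_DIR STEP_NAME
# Fills the global array STEP_LOGS with the matching .out files.
list_step_logs() {
    local log_dir="$1" step_name="$2" f
    STEP_LOGS=()
    # Read NUL-delimited paths so spaces or newlines in names are safe.
    while IFS= read -r -d '' f; do
        STEP_LOGS+=("$f")
    done < <(find "$log_dir" -maxdepth 1 -type f \
                  -name "${step_name}_*.out" -print0)
}
```

After calling list_step_logs "$log_dir" "$step_name", the files can be iterated with for f in "${STEP_LOGS[@]}"; the loop body simply does not run when nothing matched.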
Summary
Add archive_old_logs bash function to lib.sh that moves .out files with a SLURM job ID lower than the current one to log/archive/
Call archive_old_logs from controller.sh before submitting each step, so that only logs from the current run remain in log/
archive/ subdirectory is created on-demand only when old files are found
How it works
Step log files follow the pattern <wf_name>_step<N>_<jobid>_<taskid>.out. Before submitting a step, the controller extracts the job ID from each matching file and compares it to its own SLURM_JOB_ID. Since SLURM job IDs are monotonically increasing, any existing step logs will have a lower ID and are moved to log/archive/.
After a rerun, the log directory looks like:
Closes #31
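For illustration, the behaviour described under "How it works" might be sketched as follows. This is not the function from the PR. The name archive_old_logs_sketch is hypothetical, the argument order is an assumption (only log_dir as the third argument is visible in the diff), and the sketch follows the description only, so it does not do the task ID matching for array jobs raised in review.

```shell
#!/usr/bin/env bash
# Illustrative sketch only, not the PR's implementation.
# Usage: archive_old_logs_sketch STEP_NAME CURRENT_JOB_ID LOG_DIR
archive_old_logs_sketch() {
    local step_name="$1" current_id="$2" log_dir="$3"
    local archive_dir="$log_dir/archive"
    local f base
    # Captures <jobid> from a name ending in _<jobid>_<taskid>.out
    local re='_([0-9]+)_[^_]+\.out$'

    while IFS= read -r -d '' f; do
        base="$(basename "$f")"
        # Ignore files that do not follow the expected naming pattern.
        [[ "$base" =~ $re ]] || continue
        # Integer comparison; 10# avoids octal parsing of leading zeros.
        if (( 10#${BASH_REMATCH[1]} < 10#$current_id )); then
            mkdir -p "$archive_dir"    # created only once an old log is found
            mv -- "$f" "$archive_dir/"
        fi
    done < <(find "$log_dir" -maxdepth 1 -type f \
                  -name "${step_name}_*.out" -print0)
}
```

In the controller, a call such as archive_old_logs_sketch "$step_name" "$SLURM_JOB_ID" "$log_dir" would run just before the step is submitted.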
Test plan
log/archive/
🤖 Generated with Claude Code