
Archive old step log files on rerun#32

Open
smjenness wants to merge 1 commit into main from feature/archive-old-logs

Conversation

@smjenness

Summary

  • Adds an archive_old_logs bash function to lib.sh that moves .out files with a SLURM job ID lower than the current one to log/archive/
  • Calls archive_old_logs from controller.sh before submitting each step, so that only logs from the current run remain in log/
  • The archive/ subdirectory is created on-demand only when old files are found

How it works

Step log files follow the pattern <wf_name>_step<N>_<jobid>_<taskid>.out. Before submitting a step, the controller extracts the job ID from each matching file and compares it to its own SLURM_JOB_ID. Since SLURM job IDs are monotonically increasing, any existing step logs will have a lower ID and are moved to log/archive/.
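The archiving step described above can be sketched as a bash function. This is a hypothetical reconstruction based on the PR description, not the actual lib.sh code; the function name matches the summary, but the argument order and parsing details are assumptions.

```shell
# Hypothetical sketch of archive_old_logs (not the actual lib.sh code).
# Assumes log filenames follow <wf_name>_step<N>_<jobid>_<taskid>.out,
# per the PR description.
archive_old_logs() {
  local wf_name="$1"
  local current_job_id="$2"
  local log_dir="$3"
  local archive_dir="$log_dir/archive"

  # find returns empty output when nothing matches, so a first run
  # (no existing logs) is handled without errors
  local old_logs
  old_logs=$(find "$log_dir" -maxdepth 1 -name "${wf_name}_step*_*.out" 2>/dev/null)

  local f base job_id
  for f in $old_logs; do
    base=$(basename "$f" .out)
    # Job ID is the second-to-last underscore-separated field
    job_id=$(echo "$base" | awk -F_ '{print $(NF-1)}')
    # (( )) forces integer comparison, avoiding lexicographic ordering
    if (( job_id < current_job_id )); then
      mkdir -p "$archive_dir"   # created on demand, only when old files exist
      mv "$f" "$archive_dir/"
    fi
  done
}
```

The `for f in $old_logs` loop relies on filenames without whitespace, which SLURM log names satisfy.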

After a rerun, the log directory looks like:

log/
├── archive/
│   ├── hpc_smoke_step1_36981635_4294967294.out
│   └── hpc_smoke_step2_36981637_4294967294.out
├── hpc_smoke_controller.out
├── hpc_smoke_step1_36981789_4294967294.out
└── hpc_smoke_step2_36981791_4294967294.out

Closes #31

Test plan

  • Create a workflow, run it, then rerun a step and verify old logs are moved to log/archive/
  • Verify first run (no existing logs) works without errors
  • Verify array jobs with multiple task IDs are all archived correctly

🤖 Generated with Claude Code

When a workflow step is rerun, existing .out files with a lower SLURM
job ID are moved to log/archive/ before the new step is submitted.
This keeps the main log/ directory clean and makes it easy to identify
which logs belong to the current run.

Closes #31

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Member

@AdrienLeGuillou AdrienLeGuillou left a comment


Currently it does not work.
There is an indexing error below, but even with it fixed, it still does not work.

Here are the things I would be very careful about when designing this PR:

In bash, variables are always strings. So when comparing job IDs, one must be sure to compare them as integers, not strings; otherwise lexicographic ordering could break the comparison.
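This pitfall is easy to reproduce; the snippet below is illustrative, not project code:

```shell
a=9
b=10
# Inside [[ ]], > compares strings lexicographically: "9" > "10"
# because '9' sorts after '1' -- wrong for job IDs
[[ $a > $b ]] && echo "string comparison says 9 > 10"
# (( )) is an arithmetic context: operands are treated as integers
(( a < b )) && echo "integer comparison says 9 < 10"
# With [ ], use -lt/-gt for integers; < and > are string operators
[ "$a" -lt "$b" ] && echo "test -lt also says 9 < 10"
```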

For non-array jobs, the task ID is arbitrary/undefined, so it should not be taken into account.

However, for array jobs, the array can be submitted in multiple pieces, where task IDs are disconnected from job IDs (e.g. wf_step1_1234_1:10.out, then wf_step1_5678_11:20.out). In this case, the logs to be archived are the ones matching the step AND the task ID, but with a lower job ID.
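Under that constraint, the archive decision has to match on both step and task ID, and only then compare job IDs. A hypothetical predicate illustrating the idea (the helper name and argument layout are assumptions, not code from this PR):

```shell
# Hypothetical predicate: should an existing log be archived before
# submitting a new (job_id, task_id) for the same step?
# Filenames follow <wf>_step<N>_<jobid>_<taskid>.out per the PR description.
should_archive() {
  local file="$1" step_name="$2" new_job_id="$3" new_task_id="$4"
  local base job_id task_id
  base=$(basename "$file" .out)
  job_id=$(echo "$base" | awk -F_ '{print $(NF-1)}')
  task_id=$(echo "$base" | awk -F_ '{print $NF}')
  # Same step, same task slot, strictly older job ID (integer compare)
  [[ $base == ${step_name}_* ]] && \
    [ "$task_id" -eq "$new_task_id" ] && \
    [ "$job_id" -lt "$new_job_id" ]
}
```

With this rule, a task from an older piece of the array (different task ID) is left alone, while a genuine rerun of the same task is archived.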

local log_dir="$3"
local archive_dir="$log_dir/archive"

for f in "$log_dir"/${step_name}_*.out; do

That is not a valid bash line: you can't index out of a glob.
find "$log_dir" -name "${step_name}_*.out" actually lists them.
The result can be stored in a variable and then iterated over.
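The suggested pattern could look like the sketch below. The helper name is hypothetical, and the plain `for` loop assumes filenames without whitespace (a `find ... -print0 | while read -d ''` loop is the robust variant):

```shell
# List a step's log files with find; unlike a raw glob, this yields
# empty output (not a literal pattern) when nothing matches.
list_step_logs() {
  local log_dir="$1" step_name="$2"
  find "$log_dir" -maxdepth 1 -name "${step_name}_*.out"
}

# Store the result, then iterate over it:
#   old_logs=$(list_step_logs log hpc_smoke_step1)
#   for f in $old_logs; do mv "$f" log/archive/; done
```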
