
Fix a task watching cancellation bug and a task fingerprinting bug#2740

Open
mkeeler wants to merge 2 commits into go-task:main from mkeeler:watch-rebuild-single

Conversation


@mkeeler mkeeler commented Mar 12, 2026

Task Fingerprinting Bug

The first commit in this PR fixes a bug where two invocations of the same task (such as from a for loop) inadvertently wrote their checksum or timestamp files to the same location, even though the tasks were executed with different arguments and therefore had different sources.

Reproduction

version: 3

tasks:
  copy:
    sources:
      - '**/*.in'
    generates:
      - '**/*.out'
    cmds:
      - for: sources
        task: copy:single
        vars:
          SOURCE: "{{.ITEM}}"
          TARGET: '{{.ITEM | replace ".in" ".out"}}'

  copy:single:
    sources:
      - '{{.SOURCE}}'
    generates:
      - '{{.TARGET}}'
    cmds: 
      - cp "{{.SOURCE}}" "{{.TARGET}}"
1. Run: echo 1 >1.in && echo 2 >2.in
2. Run: task copy
  • This runs copy:single once for each *.in file.
3. Run: echo 2.2 > 2.in
  • Re-running task copy executes copy:single twice again, with neither invocation showing as up to date.

Because only 2.in was changed, I expected one copy:single invocation to show as up to date and the other to re-copy 2.in to 2.out.

Fix

Instead of writing the checksum/timestamp to a single file named after the task, the task invocation is first fingerprinted: a hash is taken of the normalized task name, the task's working directory, and the declared sources/generates, and the checksum is stored in copy-single-<hash> rather than a single copy-single file. This allows each distinct invocation of the sub-task with different arguments to independently track whether it is up to date.

Task Watch Cancellation Bug

The pre-existing task watching code had a bug: once an event occurred, it would spawn goroutines to process all tasks in the background and continue the loop. If another event occurred, it would cancel the context used to run those goroutines and restart everything. In some scenarios this works fine, such as when the generated files do not reside in the directory being watched. When the generated files do reside in the same directory, the first task writing its output triggers an fsnotify event, which then cancels the context. This is racy, but if the tasks are longer running it can cancel an in-flight task, resulting in other sub-tasks not being executed. It doesn't result in an infinite loop because the fingerprint is checked and updated before each task executes, preventing subsequent runs.

The root cause of the tasks not running to completion is that the context is cancelled when it shouldn't be: an fsnotify event arrives for a file that is not one of the sources.

Reproduction

version: 3

tasks:
  copy:
    sources:
      - '**/*.in'
    generates:
      - '**/*.out'
    cmds:
      - for: sources
        task: copy:single
        vars:
          SOURCE: "{{.ITEM}}"
          TARGET: '{{.ITEM | replace ".in" ".out"}}'

  copy:single:
    sources:
      - '{{.SOURCE}}'
    generates:
      - '{{.TARGET}}'
    cmds: 
      - cp "{{.SOURCE}}" "{{.TARGET}}"
      # This is the main difference from the first bug's reproduction YAML:
      # the sleep ensures that tasks are "long running", allowing time for
      # the context cancellation to happen and prevent all the tasks from
      # running.
      - sleep 3
1. Run: echo 1 >1.in && echo 2 >2.in
2. Run: task -w copy
  • This runs copy:single only once; it never gets around to executing the copy for the 2.in file.
3. In another terminal, run: echo 2.2 > 2.in
  • This again runs copy:single only once.

I would have expected step 2 to run copy:single twice, but it doesn't because the context is cancelled while the first copy:single invocation's sleep command is executing.

I would also have expected step 3 to cause copying to take place again. With the fix for the fingerprinting bug included, the first invocation shows as up to date and the second one then runs.

Fix

The fix moves the logic that checks an event against the task's sources out of the goroutines spawned to execute the tasks and into the start of event handling. Because the event's file is checked against the list of sources before the context is cancelled, irrelevant events can be discarded and task processing keeps going.
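The event-handling shape described above can be sketched as below. This is a simplified model under assumptions: the matchesSources helper is hypothetical (Task's real source matching handles doublestar globs and more), and the events are plain strings rather than real fsnotify events.

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"
)

// matchesSources reports whether a changed path matches any declared
// source glob. Hypothetical helper for illustration only.
func matchesSources(path string, sources []string) bool {
	for _, glob := range sources {
		if ok, _ := filepath.Match(glob, filepath.Base(path)); ok {
			return true
		}
	}
	return false
}

func main() {
	sources := []string{"*.in"}
	// Simulated watcher events: the first two are the task's own
	// generated files landing in the watched directory.
	events := []string{"1.out", "2.out", "2.in"}

	ctx, cancel := context.WithCancel(context.Background())
	for _, ev := range events {
		if !matchesSources(ev, sources) {
			// Irrelevant event (e.g. a generated file): discard it and
			// leave the running tasks and their context alone.
			fmt.Println("ignored:", ev)
			continue
		}
		// A source really changed: only now is it safe to cancel the
		// outstanding runs and start over with a fresh context.
		fmt.Println("relevant:", ev)
		cancel()
		ctx, cancel = context.WithCancel(context.Background())
	}
	_ = ctx
	cancel()
}
```

The key point is the ordering: the buggy code cancelled first and filtered later (inside the spawned goroutines), so the task's own writes to *.out files killed its sibling invocations.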

Matt Keeler added 2 commits March 12, 2026 12:38
…he watched directory

Previously if a generated file was placed alongside the source file, the fsnotify event would be seen and trigger cancellation of all outstanding task runs. If the task were executing a for loop over its sources and calling other tasks, this would have the effect of preventing all of the child tasks from being executed.

The fix here is to move the ignoring of fsnotify events earlier, preventing the context cancellation unless we know we need to execute the tasks again anyway.