
Fix a task watching cancellation bug and a task fingerprinting bug#2740

Open
mkeeler wants to merge 2 commits into go-task:main from mkeeler:watch-rebuild-single

Conversation


@mkeeler mkeeler commented Mar 12, 2026

Task Fingerprinting Bug

The first commit in this PR fixes a bug where two invocations of the same task (such as from a for loop) inadvertently wrote their checksum or timestamp files to the same location, even though the tasks were executed with different arguments and therefore had different sources.

Reproduction

version: 3

tasks:
  copy:
    sources:
      - '**/*.in'
    generates:
      - '**/*.out'
    cmds:
      - for: sources
        task: copy:single
        vars:
          SOURCE: "{{.ITEM}}"
          TARGET: '{{.ITEM | replace ".in" ".out"}}'

  copy:single:
    sources:
      - '{{.SOURCE}}'
    generates:
      - '{{.TARGET}}'
    cmds: 
      - cp "{{.SOURCE}}" "{{.TARGET}}"
1. Run: echo 1 >1.in && echo 2 >2.in
2. Run: task copy
  • This runs copy:single once for each *.in file.
3. Run: echo 2.2 > 2.in
  • Re-running task copy executes copy:single twice again, with neither invocation showing as up to date.

Because only 2.in was changed, I expected one copy:single invocation to show as up to date and the other to re-copy 2.in to 2.out.

Fix

Instead of writing the checksum/timestamp to a single file named after the task, the task invocation is first fingerprinted: a hash is taken of the normalized task name, the task's working directory, and the declared sources/generates, and the checksum is stored in copy-single-<hash> rather than a single copy-single file. This allows each distinct invocation of the sub-task with different arguments to independently track whether it is up to date.

Task Watch Cancellation Bug

The pre-existing task watching code had a bug: once an event occurred, it would spawn goroutines to process all tasks in the background and continue the loop. If another event occurred, it would cancel the context used to run those goroutines and restart everything. In some scenarios this works fine, such as when the generated files do not reside in the directory being watched. When the generated files do reside in the same directory, the first task writing its output triggers an fsnotify event, which then cancels the context. This is racy, but if the tasks are longer running it can cancel an in-flight task, resulting in other sub-tasks not being executed. It doesn't result in an infinite loop because the fingerprint is checked and updated before each task executes, preventing subsequent runs.

The root cause of the tasks not running to completion is that the context is cancelled when it shouldn't be: an fsnotify event arrives for a file that is not one of the sources.

Reproduction

version: 3

tasks:
  copy:
    sources:
      - '**/*.in'
    generates:
      - '**/*.out'
    cmds:
      - for: sources
        task: copy:single
        vars:
          SOURCE: "{{.ITEM}}"
          TARGET: '{{.ITEM | replace ".in" ".out"}}'

  copy:single:
    sources:
      - '{{.SOURCE}}'
    generates:
      - '{{.TARGET}}'
    cmds: 
      - cp "{{.SOURCE}}" "{{.TARGET}}"
      # This is the main difference from the first bug's reproduction YAML:
      # the sleep ensures that tasks are "long running", allowing time for
      # the context cancellation to happen and prevent all the tasks from
      # running.
      - sleep 3
1. Run: echo 1 >1.in && echo 2 >2.in
2. Run: task -w copy
  • This runs copy:single only once; it never gets around to executing the copy for the 2.in file.
3. In another terminal, run: echo 2.2 > 2.in
  • This again runs copy:single only once.

I would have expected step 2 to run copy:single twice, but it doesn't because the context is cancelled while the first copy:single invocation's sleep command is executing.

I would also have expected step 3 to cause copying to take place again. With the fix for the fingerprinting bug included, the first invocation shows as up to date and the second one then runs.

Fix

The fix moves the logic that checks an event against the task's sources out of the goroutines spawned to execute the tasks and into the start of event handling. Because the event's file is checked against the list of sources before the context is cancelled, irrelevant events can be discarded and task processing keeps going.
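The event-handling shape described above can be sketched as below. This is a simplified model under assumptions: the matchesSources helper is hypothetical (Task's real source matching handles doublestar globs and more), and the events are plain strings rather than real fsnotify events.

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"
)

// matchesSources reports whether a changed path matches any declared
// source glob. Hypothetical helper for illustration only.
func matchesSources(path string, sources []string) bool {
	for _, glob := range sources {
		if ok, _ := filepath.Match(glob, filepath.Base(path)); ok {
			return true
		}
	}
	return false
}

func main() {
	sources := []string{"*.in"}
	// Simulated watcher events: the first two are the task's own
	// generated files landing in the watched directory.
	events := []string{"1.out", "2.out", "2.in"}

	ctx, cancel := context.WithCancel(context.Background())
	for _, ev := range events {
		if !matchesSources(ev, sources) {
			// Irrelevant event (e.g. a generated file): discard it and
			// leave the running tasks and their context alone.
			fmt.Println("ignored:", ev)
			continue
		}
		// A source really changed: only now is it safe to cancel the
		// outstanding runs and start over with a fresh context.
		fmt.Println("relevant:", ev)
		cancel()
		ctx, cancel = context.WithCancel(context.Background())
	}
	_ = ctx
	cancel()
}
```

The key point is the ordering: the buggy code cancelled first and filtered later (inside the spawned goroutines), so the task's own writes to *.out files killed its sibling invocations.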

Matt Keeler added 2 commits March 12, 2026 12:38
…he watched directory

Previously if a generated file was placed alongside the source file, the fsnotify event would be seen and trigger cancellation of all outstanding task runs. If the task were executing a for loop over its sources and calling other tasks, this would have the effect of preventing all of the child tasks from being executed.

The fix here is to move the ignoring of fsnotify events earlier, preventing the context cancellation unless we know we need to execute the tasks again anyway.