Skip to content

BulkRequestNotFoundException warning appearing for auto-cleared bulk requests create large logs for bulkDomain #7931

@ArturAkh

Description

@ArturAkh

Dear dCache developers,

We at KIT encounter from time to time very large logs of several GB appearing on our productive systems for the bulk domain.

When looking into the logs, we see the following warnings appearing very often:

05 Nov 2025 13:52:29 (bulk) [] checkTerminated, check for cancel on failure 879e460c-64c9-43ad-8098-3a79dd777275: org.dcache.services.bulk.BulkRequestNotFoundException: request 879e460c-64c9-43ad-8098-3a79dd777275 not found.
05 Nov 2025 13:52:31 (bulk) [] checkTerminated, check for cancel on failure a2f74bb4-2506-4c90-abd0-19b42f41fe29: org.dcache.services.bulk.BulkRequestNotFoundException: request a2f74bb4-2506-4c90-abd0-19b42f41fe29 not found.
05 Nov 2025 13:52:31 (bulk) [] checkTerminated, check for cancel on failure 28d5c7bc-5cb4-4500-b1f3-98bf4d0658ad: org.dcache.services.bulk.BulkRequestNotFoundException: request 28d5c7bc-5cb4-4500-b1f3-98bf4d0658ad not found.
05 Nov 2025 13:52:32 (bulk) [] checkTerminated, check for cancel on failure c6c86170-81f4-4d66-a5d6-fa7e91aa93af: org.dcache.services.bulk.BulkRequestNotFoundException: request c6c86170-81f4-4d66-a5d6-fa7e91aa93af not found.
05 Nov 2025 13:52:32 (bulk) [] checkTerminated, check for cancel on failure f1038556-957c-4d85-b575-0f0c86ffa2a9: org.dcache.services.bulk.BulkRequestNotFoundException: request f1038556-957c-4d85-b575-0f0c86ffa2a9 not found.
05 Nov 2025 13:52:33 (bulk) [] checkTerminated, check for cancel on failure da4b17e8-d7ee-4c5f-9c7a-71764c39d585: org.dcache.services.bulk.BulkRequestNotFoundException: request da4b17e8-d7ee-4c5f-9c7a-71764c39d585 not found.
05 Nov 2025 13:52:33 (bulk) [] checkTerminated, check for cancel on failure e5f35938-1f95-40e9-8719-4f0c99a075ec: org.dcache.services.bulk.BulkRequestNotFoundException: request e5f35938-1f95-40e9-8719-4f0c99a075ec not found.
05 Nov 2025 13:52:34 (bulk) [] checkTerminated, check for cancel on failure 19354e02-3681-4b2d-b91a-d306fb330108: org.dcache.services.bulk.BulkRequestNotFoundException: request 19354e02-3681-4b2d-b91a-d306fb330108 not found.

To analyze the situation, I have run my Github Copilot setup on a locally checked-out dCache codebase twice, each time with a different model. See the 2 files attached.

BULK_REQUEST_NOT_FOUND_RACE_CONDITION-by-Anthropic_Claude_Sonnet_4.5.md
bulk-request-warning-analysis_by_OpenAI-GPT5-Codex.md

Both models conclude consistently, that there is a race condition between the check for the request being terminated, and a automatic clearing of bulk requests which have terminated.

They either suggest a hotfix to catch these messages or restructure the code to avoid this race condition. As always, AI-based analyses should be taken with care, but the conclusions sound plausible keeping the load we have in mind.

While this is not a critical issue in the first place, it might cause problems for us in case of heavy staging activity as done often by ATLAS. This gets even trickier on week-ends, where we don't pay full attention on the emails/messages about too big log files. For the particular case of the bulkDomain.log, it was close to filling up our space for logs, which is setup to provide 14GB space, despite a regular compression and rotation. The log file which we have rotated by hand was close to 1 GB. A few numbers on the appearances of the records in that log file:

> zgrep "bulk" /var/log/dcache/bulkDomain.log-20251102_2.gz | wc -l
61156408
> zgrep "checkTerminated" /var/log/dcache/bulkDomain.log-20251102_2.gz | wc -l
30592931
> zgrep "BulkRequestNotFoundException" /var/log/dcache/bulkDomain.log-20251102_2.gz | wc -l
61156244

I must admit though, that reaching 30 Mio entries for this warning is looking suspicious. It is not clear to me, whether all of them were triggered by the race condition encountered.

Additional note from us would be, that we would like to keep the log level at WARNING at KIT, if possible to keep track of other warnings which might be useful to debug problems.

@mksahakyan FYI (we discussed that in the T1 call)

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions