Conversation
…ints, and to forward hints through file stores.
adamnovak
left a comment
There was a problem hiding this comment.
I think the synthetic code suffers from bad style that ought to be fixed.
| may be inefficient; hints are intended for human-navigable | ||
| categorization, not for high-throughput bulk file creation. |
There was a problem hiding this comment.
This sounds like word salad of limited explanatory value.
|
|
||
| ###################################### FILES API ###################################### | ||
|
|
||
| def _claim_hinted_s3_slot(self, sanitized_hints: list[str], basename: str) -> str: |
There was a problem hiding this comment.
Everything here is S3; does it need to be in the name?
| writers on different nodes racing for the same hint+basename will | ||
| each land in a distinct numbered slot. |
There was a problem hiding this comment.
The conditional writes aren't sufficient to provide this by themselves; all they do is avoid clobbering and fail a write instead.
|
|
||
| :param sanitized_hints: Already-sanitized hint path components. | ||
| :param basename: The file basename. | ||
| :returns: The file_id (e.g. ``lions/tigers/bears/0``). |
There was a problem hiding this comment.
This shouldn't use test case content as an example, because the reader will not be familiar with the test case.
| try: | ||
| n = int(parts[0]) | ||
| if n >= next_n: | ||
| next_n = n + 1 | ||
| except (ValueError, IndexError): | ||
| continue |
There was a problem hiding this comment.
This is accomplishing some important stuff around making sure that files with hint lists that are prefixes of each other work properly, which isn't explained in comments.
|
|
||
| # Characters allowed in sanitized hints: these survive quote() unchanged | ||
| # and are safe in filesystem paths and S3 keys. | ||
| HINT_SAFE_RE = re.compile(r"[^a-zA-Z0-9_\-.]") |
There was a problem hiding this comment.
\ is not a great character to have in your paths.
| # +1 for the separator between components | ||
| if total_length + len(sanitized) + (1 if result else 0) > self.MAX_HINTS_PATH_LENGTH: | ||
| break | ||
| total_length += len(sanitized) + (1 if result else 0) |
There was a problem hiding this comment.
This is fiddly and repetitive and the comment does not get a the real point here, which is to take hints until the separator and the next hint no longer fit under the limit, then stop.
| Large numbers of files stored under the same non-empty hints | ||
| may be inefficient, as slot allocation scans existing entries; | ||
| hints are intended for human-navigable categorization, not for | ||
| high-throughput bulk file creation. |
There was a problem hiding this comment.
This gets repeated a lot, so in addition to this part being of dubious conceptual coherence, it might benefit from the parameter description being shorter overall, with the concept of hints explained in more detail elsewhere.
| concurrent writers on a shared filesystem will never collide. | ||
|
|
||
| After deletion, the empty numbered directory persists as a tombstone | ||
| that prevents the slot from being reallocated. This is critical |
There was a problem hiding this comment.
"Critical" is a strong word here. I'd say it's "required".
| def _get_unique_file_path_with_hints(self, fileName, jobStoreID, cleanup, hints): | ||
| """ | ||
| Get a unique file path for a file stored under a human-readable | ||
| hint-derived directory hierarchy. |
There was a problem hiding this comment.
Being "hint-derived" isn't a real concept.
This will fix #5401.
It still needs the hints to actually be plugged in to the WDL/CWL runners.
The code here is mostly synthetic, generated with Anthropic Claude Opus 4.6. It looked right at first glance but I also caught it trying to sneak in race conditions, so it needs thorough review.
It also needs some manual testing to run a real WDL or CWL workflow and look at the file store and agree that's what we want.
Changelog Entry
To be copied to the draft changelog by merger:
Reviewer Checklist
issues/XXXX-fix-the-thingin the Toil repo, or from an external repo.camelCasethat want to be insnake_case.docs/running/{cliOptions,cwl,wdl}.rstMerger Checklist