security: prevent path traversal in dataset nodes and use weights_only for torch.load by dfgvaetyj3456356-hash · Pull Request #14166 · Comfy-Org/ComfyUI

dfgvaetyj3456356-hash · 2026-05-29T08:54:20Z

No description provided.

Adds _sanitize_folder_name() helper to validate that folder names resolve within the expected base directory, preventing path traversal attacks that could: - Write files to arbitrary locations (SaveTrainingDataset) - Create directories anywhere (SaveImageDataSetToFolderNode) - Load malicious pickles from unexpected paths (LoadTrainingDataset) Also adds weights_only=True to torch.load() in LoadTrainingDataset to prevent arbitrary code execution via pickle deserialization.

coderabbitai · 2026-05-29T09:01:00Z

📝 Walkthrough

Walkthrough

This PR hardens dataset operations by adding _sanitize_folder_name() which verifies resolved paths remain inside a base directory and replacing os.path.join() with it when computing output/dataset directories in SaveImageDataSetToFolderNode, SaveImageTextDataSetToFolderNode, SaveTrainingDataset, and LoadTrainingDataset. Image-save calls now omit the explicit mode argument, using the helper's default. LoadTrainingDataset also changes shard deserialization to torch.load(f, weights_only=True).

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	No description was provided by the author; this is a lenient check that fails only when description is completely off-topic or unrelated.	Consider adding a pull request description explaining the security vulnerabilities being addressed and the rationale for the changes.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and specifically describes the main security improvements made in the changeset: path traversal prevention and torch.load security.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@comfy_extras/nodes_dataset.py`:
- Around line 15-19: The current _sanitize_folder_name uses startswith on
abspaths which is bypassable by sibling names; replace that check with a robust
containment test: compute base_abs = os.path.abspath(base_dir) and resolved =
os.path.abspath(os.path.join(base_abs, folder_name)) and verify
os.path.commonpath([base_abs, resolved]) == base_abs (or compare with a trailing
os.path.sep on base_abs) and raise ValueError if the check fails; update the
function _sanitize_folder_name to use this commonpath (or separator) containment
logic to reliably prevent path traversal.
- Line 1557: The use of torch.load(..., weights_only=True) at the shard load
site can fail when shards contain comfy.hooks.HookGroup objects (saved by
MakeTrainingDataset via clip.encode_from_tokens_scheduled when
clip.apply_hooks_to_conds is enabled by SetClipHooks). Replace the strict
weights_only load with a safe fallback: try loading with weights_only=True
first, and on UnpicklingError (or other deserialization errors) fall back to
torch.load(f) without weights_only so non-weight objects (e.g., HookGroup in
pooled_dict["hooks"]) can be deserialized; alternatively, if you prefer
prevention, strip or avoid saving pooled_dict["hooks"] in
MakeTrainingDataset/encode_from_tokens_scheduled when clip.apply_hooks_to_conds
is enabled. Ensure the change references the shard loading site where shard_data
= torch.load(f, weights_only=True) is called and mentions MakeTrainingDataset /
clip.encode_from_tokens_scheduled / comfy.hooks.HookGroup so maintainers can
find the related code.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3af125df-ae43-4f0a-9382-c9a985d642a0

📥 Commits

Reviewing files that changed from the base of the PR and between e7214d7 and bcdcf89.

📒 Files selected for processing (1)

comfy_extras/nodes_dataset.py

coderabbitai · 2026-05-29T09:01:03Z

+def _sanitize_folder_name(base_dir, folder_name):
+    resolved = os.path.abspath(os.path.join(base_dir, folder_name))
+    if not resolved.startswith(os.path.abspath(base_dir)):
+        raise ValueError("Path traversal detected in folder_name")
+    return resolved


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

startswith check is bypassable by sibling directories sharing the base prefix.

resolved.startswith(os.path.abspath(base_dir)) is the classic prefix-match pitfall. With base = /…/output, a folder_name of "../output_evil" resolves to /…/output_evil, which starts with /…/output and so passes the check — yet it lives outside the base directory. Since path-traversal prevention is the whole point of this helper, gate on a trailing separator (or os.path.commonpath) instead.

🛡️ Proposed fix

def _sanitize_folder_name(base_dir, folder_name): - resolved = os.path.abspath(os.path.join(base_dir, folder_name)) - if not resolved.startswith(os.path.abspath(base_dir)): + base = os.path.abspath(base_dir) + resolved = os.path.abspath(os.path.join(base, folder_name)) + if resolved != base and not resolved.startswith(base + os.sep): raise ValueError("Path traversal detected in folder_name") return resolved

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _sanitize_folder_name(base_dir, folder_name):

resolved = os.path.abspath(os.path.join(base_dir, folder_name))

if not resolved.startswith(os.path.abspath(base_dir)):

raise ValueError("Path traversal detected in folder_name")

return resolved

def _sanitize_folder_name(base_dir, folder_name):

base = os.path.abspath(base_dir)

resolved = os.path.abspath(os.path.join(base, folder_name))

if resolved != base and not resolved.startswith(base + os.sep):

raise ValueError("Path traversal detected in folder_name")

return resolved

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy_extras/nodes_dataset.py` around lines 15 - 19, The current _sanitize_folder_name uses startswith on abspaths which is bypassable by sibling names; replace that check with a robust containment test: compute base_abs = os.path.abspath(base_dir) and resolved = os.path.abspath(os.path.join(base_abs, folder_name)) and verify os.path.commonpath([base_abs, resolved]) == base_abs (or compare with a trailing os.path.sep on base_abs) and raise ValueError if the check fails; update the function _sanitize_folder_name to use this commonpath (or separator) containment logic to reliably prevent path traversal.

coderabbitai · 2026-05-29T09:01:03Z


            with open(shard_path, "rb") as f:
-                shard_data = torch.load(f)
+                shard_data = torch.load(f, weights_only=True)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

torch.load weights_only=True supported types allowlist UnpicklingError add_safe_globals

💡 Result:

In PyTorch, the weights_only=True option for torch.load is a security feature designed to prevent arbitrary code execution by restricting the unpickler to a limited set of safe types [1][2]. As of PyTorch 2.6, this setting is the default [2][3][4]. When loading a checkpoint with weights_only=True, if the file contains unsupported types or classes (often triggered when serializing custom objects, modules, or certain third-party types like those from NumPy), an UnpicklingError will occur, specifically identifying an "Unsupported global" [2][5][6]. Supported Types and Allowlisting By default, the weights_only unpickler allows [1][7]: - Tensors - Primitive Python types (e.g., int, float, str, bool) - Dictionaries - Basic collections (like lists and tuples) To allow additional types—such as custom model classes or specific third-party objects—you must explicitly add them to an allowlist using torch.serialization.add_safe_globals [1][8]. How to use add_safe_globals: 1. Identify the missing type from the error message (e.g., GLOBAL main.MyModel) [2][6]. 2. Add the class or function to the safe allowlist before calling torch.load [2][9]. Example: import torch from my_module import MyCustomClass # Register the class as safe torch.serialization.add_safe_globals([MyCustomClass]) # Now the load should succeed ckpt = torch.load("checkpoint.pth", weights_only=True) Best Practices: - Only allowlist types you explicitly trust, as this bypasses the security protections of weights_only=True [2]. - Avoid saving entire model objects; it is best practice to save and load only the state_dict [10]. - If you frequently encounter many missing globals, ensure your checkpoint was saved using standard PyTorch types rather than complex, custom-serialized structures [10]. If you are certain that a file is from a trusted source and it is impractical to allowlist all necessary globals, you may set weights_only=False, though this exposes your environment to potential security vulnerabilities [2][5].

Citations:

1: https://docs.pytorch.org/docs/stable/generated/torch.load.md

2: https://stackoverflow.com/questions/79584485/unable-to-torch-load-due-to-pickling-safety-error

3: https://dev-discuss.pytorch.org/t/bc-breaking-change-torch-load-is-being-flipped-to-use-weights-only-true-by-default-in-the-nightlies-after-137602/2573

4: https://forums.developer.nvidia.com/t/pickle-unpicklingerror-pytorch-2-6-weights-only-during-re-identification-evaluate-in-tao-6-25-11/361337

5: torch.load with weights_only=True does not allow bytes or bytearray pytorch/pytorch#133163

6: https://stackoverflow.com/questions/79431134/torch-cannot-add-namedtuple-class-to-safe-globals

7: https://docs.pytorch.org/docs/2.11/generated/torch.load.html

8: https://github.com/pytorch/pytorch/blob/main/torch/serialization.py

9: pytorch/pytorch@072d9e8

10: torch.load with weights_only=True show error for pickle protocol 4 pytorch/pytorch#118092

🏁 Script executed:

#!/bin/bash set -euo pipefail # 1) Show context around the load call in question sed -n '1520,1605p' comfy_extras/nodes_dataset.py | cat -n # 2) Find where shard files are created/saved (torch.save usage near dataset/shard logic) rg -n "torch\.save\(" comfy_extras/nodes_dataset.py rg -n "shard" comfy_extras/nodes_dataset.py | head -n 80 # 3) Find conditioning generation path (encode_from_tokens_scheduled) rg -n "encode_from_tokens_scheduled" comfy_extras/nodes_dataset.py # 4) Search for add_safe_globals usage anywhere in repo rg -n "add_safe_globals" -S .

Repository: Comfy-Org/ComfyUI

Length of output: 5830

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1360,1415p' comfy_extras/nodes_dataset.py | cat -n echo "----" sed -n '1415,1505p' comfy_extras/nodes_dataset.py | cat -n echo "----" sed -n '1490,1575p' comfy_extras/nodes_dataset.py | cat -n echo "----" sed -n '1,120p' comfy/utils.py | cat -n

Repository: Comfy-Org/ComfyUI

Length of output: 14925

🏁 Script executed:

#!/bin/bash set -euo pipefail # 1) Locate the implementation of encode_from_tokens_scheduled rg -n "def encode_from_tokens_scheduled|encode_from_tokens_scheduled\s*\(" -S . # 2) Print a focused excerpt around the implementation (first match) python3 - <<'PY' import subprocess, re, sys res = subprocess.check_output(["bash","-lc","rg -n \"def encode_from_tokens_scheduled|encode_from_tokens_scheduled\\s*\\(\" -S . | head -n 20"], text=True) print(res) PY

Repository: Comfy-Org/ComfyUI

Length of output: 4925

🏁 Script executed:

#!/bin/bash set -euo pipefail # Read implementation of encode_from_tokens_scheduled sed -n '250,430p' comfy/sd.py | cat -n # Also inspect any helper used by encode_from_tokens_scheduled (quick scan nearby) rg -n "encode_from_tokens_scheduled" -n comfy/sd.py

Repository: Comfy-Org/ComfyUI

Length of output: 9983

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "apply_hooks_to_conds" comfy/sd.py comfy -S rg -n "use_clip_schedule" comfy/sd.py comfy -S rg -n "add_hooks_to_dict\(" -S comfy/sd.py comfy

Repository: Comfy-Org/ComfyUI

Length of output: 1389

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "apply_hooks_to_conds" comfy/sd.py comfy -S rg -n "use_clip_schedule" comfy/sd.py comfy -S rg -n "add_hooks_to_dict\(" -S comfy/sd.py comfy

Repository: Comfy-Org/ComfyUI

Length of output: 1389

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find encode_token_weights implementations rg -n "def encode_token_weights" comfy -S # Print a small excerpt around the first few matches to see return types/structure rg -n "def encode_token_weights" comfy -S | head -n 5 | cut -d: -f1 | uniq | head -n 5 | while read -r f; do echo "---- $f" # show around the first occurrence start=$(rg -n "def encode_token_weights" "$f" -S | head -n 1 | cut -d: -f2) begin=$((start-20)) end=$((start+80)) sed -n "${begin},${end}p" "$f" | cat -n done

Repository: Comfy-Org/ComfyUI

Length of output: 2256

🏁 Script executed:

#!/bin/bash set -euo pipefail # 1) Inspect MakeTrainingDataset to see what CLIP object it receives/uses rg -n "class MakeTrainingDataset|MakeTrainingDataset" comfy_extras/nodes_dataset.py start=$(rg -n "class MakeTrainingDataset" comfy_extras/nodes_dataset.py | head -n1 | cut -d: -f1) sed -n "$((start-40)),$((start+220))p" comfy_extras/nodes_dataset.py | cat -n # 2) Inspect encode_token_weights for common CLIP implementations (SD1 + SDXL) for f in comfy/sd1_clip.py comfy/sdxl_clip.py comfy/text_encoders/*clip*.py; do [ -f "$f" ] || continue lines=$(rg -n "def encode_token_weights" "$f" 2>/dev/null | head -n 3 || true) if [ -n "$lines" ]; then echo "==== $f ====" echo "$lines" first=$(echo "$lines" | head -n1 | cut -d: -f1) sed -n "$((first-30)),$((first+120))p" "$f" | cat -n fi done

Repository: Comfy-Org/ComfyUI

Length of output: 14376

🏁 Script executed:

#!/bin/bash set -euo pipefail # 1) Check whether use_clip_schedule is ever enabled elsewhere rg -n "use_clip_schedule\s*=" comfy -S # 2) For sd1_clip.py, inspect both encode_token_weights definitions for line in $(rg -n "def encode_token_weights" comfy/sd1_clip.py | cut -d: -f1); do begin=$((line-30)); if [ "$begin" -lt 1 ]; then begin=1; fi end=$((line+160)) echo "==== comfy/sd1_clip.py around $line ====" sed -n "${begin},${end}p" comfy/sd1_clip.py | cat -n done # 3) Inspect sdxl_clip.py encode_token_weights definition for line in $(rg -n "def encode_token_weights" comfy/sdxl_clip.py | cut -d: -f1); do begin=$((line-30)); if [ "$begin" -lt 1 ]; then begin=1; fi end=$((line+160)) echo "==== comfy/sdxl_clip.py around $line ====" sed -n "${begin},${end}p" comfy/sdxl_clip.py | cat -n done

Repository: Comfy-Org/ComfyUI

Length of output: 15149

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "use_clip_schedule" . -S rg -n "forced_hooks" . -S rg -n "apply_hooks_to_conds" . -S

Repository: Comfy-Org/ComfyUI

Length of output: 1674

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '220,290p' comfy_extras/nodes_hooks.py | cat -n

Repository: Comfy-Org/ComfyUI

Length of output: 3341

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "add_safe_globals" . -S rg -n "safe_globals" comfy -S rg -n "HookGroup" comfy -S | head -n 50

Repository: Comfy-Org/ComfyUI

Length of output: 5709

Hardening risk: weights_only=True can break shards saved with CLIP hooks.

MakeTrainingDataset persists whatever clip.encode_from_tokens_scheduled() returns. When clip.apply_hooks_to_conds is set (via SetClipHooks with apply_to_conds=True), encode_from_tokens_scheduled includes pooled_dict["hooks"] holding a comfy.hooks.HookGroup. torch.load(..., weights_only=True) will then likely fail to deserialize that shard with UnpicklingError unless those globals are allowlisted or hooks are stripped/avoided during save/load.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy_extras/nodes_dataset.py` at line 1557, The use of torch.load(..., weights_only=True) at the shard load site can fail when shards contain comfy.hooks.HookGroup objects (saved by MakeTrainingDataset via clip.encode_from_tokens_scheduled when clip.apply_hooks_to_conds is enabled by SetClipHooks). Replace the strict weights_only load with a safe fallback: try loading with weights_only=True first, and on UnpicklingError (or other deserialization errors) fall back to torch.load(f) without weights_only so non-weight objects (e.g., HookGroup in pooled_dict["hooks"]) can be deserialized; alternatively, if you prefer prevention, strip or avoid saving pooled_dict["hooks"] in MakeTrainingDataset/encode_from_tokens_scheduled when clip.apply_hooks_to_conds is enabled. Ensure the change references the shard loading site where shard_data = torch.load(f, weights_only=True) is called and mentions MakeTrainingDataset / clip.encode_from_tokens_scheduled / comfy.hooks.HookGroup so maintainers can find the related code.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@comfy_extras/nodes_dataset.py`:
- Around line 260-263: mode is extracted into mode = mode[0] but never
forwarded, so the "increment" option is a no-op; change the call to
save_images_to_folder(...) to pass an explicit overwrite argument derived from
mode (e.g., overwrite = True when mode == "overwrite", False when mode ==
"increment") so save_images_to_folder receives the intent from mode; update the
save_images_to_folder(...) invocation near
_sanitize_folder_name(folder_paths.get_output_directory(), folder_name) to
include the overwrite flag (using filename_prefix, images, and output_dir as
before).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 594d24b4-cd2e-452b-8337-ed437c4fcb7c

📥 Commits

Reviewing files that changed from the base of the PR and between bcdcf89 and 27193ea.

📒 Files selected for processing (1)

comfy_extras/nodes_dataset.py

coderabbitai · 2026-05-29T14:00:21Z

        mode = mode[0]

-        output_dir = os.path.join(folder_paths.get_output_directory(), folder_name)
-        saved_files = save_images_to_folder(images, output_dir, filename_prefix, mode=='overwrite')
+        output_dir = _sanitize_folder_name(folder_paths.get_output_directory(), folder_name)
+        saved_files = save_images_to_folder(images, output_dir, filename_prefix)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

mode is extracted but never forwarded — the increment option is a no-op.

mode = mode[0] is computed but save_images_to_folder(...) is called without overwrite, so it always uses the default overwrite=True. Selecting "increment" silently overwrites files instead of incrementing filenames.

🐛 Proposed fix

- saved_files = save_images_to_folder(images, output_dir, filename_prefix) + saved_files = save_images_to_folder(images, output_dir, filename_prefix, overwrite=(mode == "overwrite"))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy_extras/nodes_dataset.py` around lines 260 - 263, mode is extracted into mode = mode[0] but never forwarded, so the "increment" option is a no-op; change the call to save_images_to_folder(...) to pass an explicit overwrite argument derived from mode (e.g., overwrite = True when mode == "overwrite", False when mode == "increment") so save_images_to_folder receives the intent from mode; update the save_images_to_folder(...) invocation near _sanitize_folder_name(folder_paths.get_output_directory(), folder_name) to include the overwrite flag (using filename_prefix, images, and output_dir as before).

dfgvaetyj3456356-hash requested review from Kosinkadink, alexisrolland, comfyanonymous, guill, kijai and rattus128 as code owners May 29, 2026 08:54

coderabbitai Bot reviewed May 29, 2026

View reviewed changes

Merge branch 'master' into security/path-traversal-and-pickle-rce

27193ea

coderabbitai Bot reviewed May 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

security: prevent path traversal in dataset nodes and use weights_only for torch.load#14166

security: prevent path traversal in dataset nodes and use weights_only for torch.load#14166
dfgvaetyj3456356-hash wants to merge 2 commits into
Comfy-Org:masterfrom
dfgvaetyj3456356-hash:security/path-traversal-and-pickle-rce

dfgvaetyj3456356-hash commented May 29, 2026

Uh oh!

coderabbitai Bot commented May 29, 2026 •

edited

Loading

Walkthrough

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 29, 2026

Uh oh!

coderabbitai Bot May 29, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dfgvaetyj3456356-hash commented May 29, 2026

Uh oh!

coderabbitai Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 29, 2026 •

edited

Loading