Skip to content

security: prevent path traversal in dataset nodes and use weights_only for torch.load#14167

Open
dfgvaetyj3456356-hash wants to merge 2 commits into
Comfy-Org:masterfrom
dfgvaetyj3456356-hash:security/path-traversal-and-pickle-safety
Open

security: prevent path traversal in dataset nodes and use weights_only for torch.load#14167
dfgvaetyj3456356-hash wants to merge 2 commits into
Comfy-Org:masterfrom
dfgvaetyj3456356-hash:security/path-traversal-and-pickle-safety

Conversation

@dfgvaetyj3456356-hash
Copy link
Copy Markdown

This PR addresses path traversal vulnerabilities in dataset loading nodes and switches torch.load to use weights_only=True for safer deserialization.

Adds _sanitize_folder_name() helper to validate that folder names
resolve within the expected base directory, preventing path traversal
attacks that could:
- Write files to arbitrary locations (SaveTrainingDataset)
- Create directories anywhere (SaveImageDataSetToFolderNode)
- Load malicious pickles from unexpected paths (LoadTrainingDataset)

Also adds weights_only=True to torch.load() in LoadTrainingDataset
to prevent arbitrary code execution via pickle deserialization.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bb6c47de-21d4-44db-9469-850630db3f12

📥 Commits

Reviewing files that changed from the base of the PR and between bcdcf89 and 4f54cfa.

📒 Files selected for processing (1)
  • comfy_extras/nodes_dataset.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • comfy_extras/nodes_dataset.py

📝 Walkthrough

Walkthrough

This PR hardens the dataset module against two security issues: path traversal attacks and unsafe PyTorch deserialization. A new _sanitize_folder_name helper validates that user-supplied folder names cannot escape a base directory via traversal sequences. This sanitization is applied to all four dataset operations that construct subdirectories: two for saving image datasets, one for saving training datasets, and one for loading training datasets. Additionally, training dataset shard files are now deserialized with torch.load(..., weights_only=True) to restrict tensor loading to safe types, replacing the previous unrestricted deserialization behavior.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main security changes: path traversal prevention and safer torch.load usage via weights_only parameter.
Description check ✅ Passed The description is directly related to the changeset, explaining the path traversal vulnerability fixes and pickle safety improvements through weights_only=True.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@comfy_extras/nodes_dataset.py`:
- Around line 15-19: The _sanitize_folder_name helper's startswith check is
bypassable (e.g., ../output_evil) so replace the brittle prefix test with a
proper path-boundary check: compute absolute base (base_abs =
os.path.abspath(base_dir)) and resolved path, then ensure
os.path.commonpath([base_abs, resolved]) == base_abs (or compare using path
separator boundary by adding os.path.sep) and raise ValueError if not; update
_sanitize_folder_name to use this commonpath-based validation so sibling
directories that merely share a string prefix cannot escape the base.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e075255f-5d8c-4018-a102-2cb6efcf26f7

📥 Commits

Reviewing files that changed from the base of the PR and between e7214d7 and bcdcf89.

📒 Files selected for processing (1)
  • comfy_extras/nodes_dataset.py

Comment on lines +15 to +19
def _sanitize_folder_name(base_dir, folder_name):
resolved = os.path.abspath(os.path.join(base_dir, folder_name))
if not resolved.startswith(os.path.abspath(base_dir)):
raise ValueError("Path traversal detected in folder_name")
return resolved
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

startswith boundary check is bypassable — escapes to sibling directories.

The prefix check passes for any sibling path that shares the base name as a string prefix. For base_dir = /home/user/output, a folder_name of ../output_evil resolves to /home/user/output_evil, which satisfies startswith("/home/user/output") yet lives outside the base. That re-opens the very traversal this helper is meant to block (writing files / creating dirs in <base>_*).

Anchor the comparison on a path separator boundary (or use os.path.commonpath).

🛡️ Proposed fix
 def _sanitize_folder_name(base_dir, folder_name):
-    resolved = os.path.abspath(os.path.join(base_dir, folder_name))
-    if not resolved.startswith(os.path.abspath(base_dir)):
-        raise ValueError("Path traversal detected in folder_name")
-    return resolved
+    base = os.path.abspath(base_dir)
+    resolved = os.path.abspath(os.path.join(base, folder_name))
+    if resolved != base and not resolved.startswith(base + os.sep):
+        raise ValueError("Path traversal detected in folder_name")
+    return resolved
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy_extras/nodes_dataset.py` around lines 15 - 19, The
_sanitize_folder_name helper's startswith check is bypassable (e.g.,
../output_evil) so replace the brittle prefix test with a proper path-boundary
check: compute absolute base (base_abs = os.path.abspath(base_dir)) and resolved
path, then ensure os.path.commonpath([base_abs, resolved]) == base_abs (or
compare using path separator boundary by adding os.path.sep) and raise
ValueError if not; update _sanitize_folder_name to use this commonpath-based
validation so sibling directories that merely share a string prefix cannot
escape the base.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant