Skip to content

Conversation

@raamana
Copy link

@raamana raamana commented Dec 22, 2025

This PR adds comprehensive path validation to prevent path traversal attacks and unauthorized access to system directories. All user-provided file paths in CLI commands are now validated before use.

Security Improvements

Path Traversal Prevention

  • Blocks relative paths containing .. (parent directory traversal)
  • Validates and resolves all paths before file operations
  • Prevents access to files outside intended directories

System Directory Protection

  • Blocks writes to critical system directories (/etc, /System, C:\Windows, etc.)
  • Platform-specific protection for Linux, macOS, and Windows
  • Allows legitimate temp directories (e.g., /tmp, /private/var/folders on macOS)

User Permission Warnings

  • Detects when running as administrator/root
  • Warns users about security risks of elevated privileges
  • Recommends running as non-admin user

Changes

New Files

  • cdisc_rules_engine/utilities/path_validator.py - Core path validation logic
  • cdisc_rules_engine/exceptions/path_validation_exceptions.py - Custom exceptions for path validation errors
  • tests/unit/test_utilities/test_path_validator.py - Comprehensive unit tests (18 tests)
  • tests/unit/test_core_path_validation.py - CLI integration tests (8 tests)

Modified Files

  • core.py - Integrated path validation for all CLI commands:
    • validate command: validates output, cache, data, dataset, define-xml, and dictionary paths
    • update_cache command: validates cache, custom rules, and custom standard paths
    • Refactored dictionary path handling to use dictionary directly (removed if/elif chain)
  • cdisc_rules_engine/exceptions/__init__.py - Exported new path validation exceptions

Path Validation Coverage

The following user-provided paths are now validated:

validate command:

  • --output / -o - Output file path (write operation)
  • --cache / -c - Cache directory path
  • --data / -d - Data directory path (read)
  • --dataset-path / -dp - Dataset file paths (read)
  • --define-xml-path / -dxp - Define-XML file path (read)
  • --report-template / -rt - Report template path (read)
  • --whodrug, --meddra, --loinc, --medrt, --unii - Dictionary paths (read)

update_cache command:

  • --cache-path / -c - Cache directory path
  • --custom-rules-directory / -crd - Custom rules directory (read)
  • --custom-rule / -cr - Custom rule files (read)
  • --update-custom-rule / -ucr - Custom rule update file (read)
  • --custom-standard / -cs - Custom standard JSON file (read)

Testing

  • 18 path validator unit tests
  • 8 CLI integration tests

Code Quality Improvements

  • Refactored dictionary path handling in core.py to eliminate repetitive if/elif chain
  • Improved maintainability by using dictionary directly for validation and construction
  • Added comprehensive error messages for invalid paths

NOTE: AI was used in helping expedite the process of improving code and docs

raamana and others added 19 commits December 18, 2025 10:12
Extract helper methods to reduce cyclomatic complexity from 14 to under 10:
- _is_macos_temp_path: Check if path is a macOS temp directory
- _is_windows_system_directory: Handle Windows path detection logic
- _normalized_path_matches_system_dir: Normalize and match Windows paths
- _is_unix_system_directory: Handle Unix/Linux path detection

This addresses flake8 C901 complexity warning.
- Move ctypes import to top of file with try/except (fixes E402)
- Reorder imports: logging, os, platform (alphabetical)
- Reorder typing imports: List, Optional (alphabetical)
- Reorder exception imports alphabetically
- Update check_user_permissions to use module-level ctypes
- Remove unused imports from test files (F401)
- Apply black code formatter to all modified files
- Remove trailing whitespace (W293)
- Fix end-of-file newlines (W391)
- Format list items and long lines per black style guide
- Add .editorconfig to enforce consistent code formatting
  - Trim trailing whitespace
  - Insert final newline
  - Set 120 character line length for Python
- Update .pre-commit-config.yaml with additional hooks
  - Add trailing-whitespace hook
  - Add end-of-file-fixer hook
  - Add check-yaml and check-added-large-files hooks
  - Enhance flake8 args to catch W293 and W391 errors
@raamana raamana force-pushed the improved_security_1 branch from a6efe57 to 017ef73 Compare January 6, 2026 02:56
@SFJohnson24 SFJohnson24 mentioned this pull request Jan 8, 2026
@SFJohnson24
Copy link
Collaborator

hi @raamana -- it looks like a few of our unit tests are failing with the new logic from your PR.
FAILED tests/unit/test_usdm_data.py::TestListDatasetMetadata::test_list_dataset_metadata_with_valid_paths - AssertionError: 1 != 0
FAILED tests/unit/test_operations/test_define_variable_metadata.py::test_get_define_variable_metadata_variable_in_domain[PandasDataset] - FileNotFoundError: [Errno 2] No such file or directory
FAILED tests/unit/test_operations/test_define_variable_metadata.py::test_get_define_variable_metadata_variable_in_domain[DaskDataset] - FileNotFoundError: [Errno 2] No such file or directory
FAILED tests/unit/test_operations/test_define_variable_metadata.py::test_get_define_variable_metadata_variable_not_in_domain[PandasDataset] - FileNotFoundError: [Errno 2] No such file or directory
FAILED tests/unit/test_operations/test_define_variable_metadata.py::test_get_define_variable_metadata_variable_not_in_domain[DaskDataset] - FileNotFoundError: [Errno 2] No such file or directory
FAILED tests/unit/test_services/test_data_service/test_local_data_service.py::test_get_variables_metdata[PandasDataset] - FileNotFoundError: [Errno 2] No such file or directory
The logs can be found here: https://github.com/cdisc-org/cdisc-rules-engine/actions/runs/20857758163/job/59943489801?pr=1500 -- could you take a look and resolve them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants