Skip to content

refactor: use pip --dry-run --report for pyproject.toml#406

Merged
ruromero merged 6 commits intoguacsec:mainfrom
ruromero:TC-4087
Apr 13, 2026
Merged

refactor: use pip --dry-run --report for pyproject.toml#406
ruromero merged 6 commits intoguacsec:mainfrom
ruromero:TC-4087

Conversation

@ruromero
Copy link
Copy Markdown
Collaborator

@ruromero ruromero commented Apr 11, 2026

Summary

  • Replace the venv + pip install/freeze/show chain with a single pip install --dry-run --ignore-installed --report command for pyproject.toml dependency resolution
  • Parse pip report JSON to build the full dependency tree directly, eliminating temporary file creation and virtual environment setup
  • Add comprehensive tests for pip report parsing, extras filtering, name canonicalization, and SBOM generation

Jira

TC-4087

Test plan

  • All existing pyproject tests pass (26 tests)
  • New pip report parsing tests pass (7 tests: direct deps, transitive graph, extras filtering, root exclusion, name canonicalization, helper methods)
  • SBOM generation integration tests (3 tests: provideStack, provideComponent, exhortignore — skipped in local env due to pre-existing CycloneDX classpath issue)
  • CI pipeline passes

🤖 Generated with Claude Code

@qodo-code-review
Copy link
Copy Markdown
Contributor

Review Summary by Qodo

Refactor pyproject.toml resolution using pip --dry-run --report

✨ Enhancement 🧪 Tests

Grey Divider

Walkthroughs

Description
• Replace venv + pip install/freeze/show chain with single pip install --dry-run --report command
• Parse pip report JSON to build full dependency tree directly, eliminating temporary files
• Add comprehensive tests for pip report parsing, extras filtering, and name canonicalization
• Support environment variable override for pip report output in testing
Diagram
flowchart LR
  A["pyproject.toml"] -->|"pip install --dry-run --report"| B["pip_report.json"]
  B -->|"parsePipReport"| C["PipReportData"]
  C -->|"addDependencyTree"| D["SBOM with full dependency tree"]
  E["Environment Variable"] -->|"TRUSTIFY_DA_PIP_REPORT"| B
Loading

Grey Divider

File Changes

1. src/main/java/io/github/guacsec/trustifyda/providers/PythonProvider.java ✨ Enhancement +1/-1

Make handleIgnoredDependencies method protected

• Changed handleIgnoredDependencies method visibility from private to protected
• Allows subclasses to access ignored dependency handling logic

src/main/java/io/github/guacsec/trustifyda/providers/PythonProvider.java


2. src/main/java/io/github/guacsec/trustifyda/providers/PythonPyprojectProvider.java ✨ Enhancement +245/-9

Implement pip report-based dependency resolution

• Replaced venv + pip install/freeze/show chain with `pip install --dry-run --ignore-installed
 --report` command
• Implemented parsePipReport method to parse JSON report and build dependency graph with
 transitive dependencies
• Added helper methods for dependency name extraction, canonicalization, and extras filtering
• Implemented provideStack and provideComponent methods using pip report approach
• Added support for TRUSTIFY_DA_PIP_REPORT environment variable for testing
• Created PipPackage and PipReportData inner classes to represent parsed dependency data
• Removed temporary file creation and virtual environment setup logic

src/main/java/io/github/guacsec/trustifyda/providers/PythonPyprojectProvider.java


3. src/test/java/io/github/guacsec/trustifyda/providers/Python_Pyproject_Provider_Test.java 🧪 Tests +189/-17

Add comprehensive pip report parsing tests

• Added 7 new unit tests for pip report parsing functionality
• Tests cover direct dependency identification, transitive graph building, extras filtering, root
 exclusion, and name canonicalization
• Added tests for helper methods hasExtraMarker, extractDepName, and canonicalize
• Added integration tests for provideStack and provideComponent with pip report
• Added test for exhortignore functionality with pip report
• Replaced old venv-based test with environment variable-based approach

src/test/java/io/github/guacsec/trustifyda/providers/Python_Pyproject_Provider_Test.java


View more (2)
4. src/test/resources/tst_manifests/pip/pip_pyproject_toml_no_ignore/pip_report.json 🧪 Tests +144/-0

Add pip report test fixture without ignore

• New test fixture containing pip report JSON output for pyproject.toml without ignore rules
• Includes root package entry with direct dependencies and transitive dependency graph
• Contains 14 packages with metadata including name, version, and requires_dist
• Demonstrates extras filtering with PySocks dependency marked with extra marker

src/test/resources/tst_manifests/pip/pip_pyproject_toml_no_ignore/pip_report.json


5. src/test/resources/tst_manifests/pip/pip_pyproject_toml_ignore/pip_report.json 🧪 Tests +144/-0

Add pip report test fixture with ignore

• New test fixture containing pip report JSON output for pyproject.toml with ignore rules
• Mirrors structure of no_ignore fixture for testing exhortignore functionality
• Used to verify that ignored dependencies are properly excluded from SBOM

src/test/resources/tst_manifests/pip/pip_pyproject_toml_ignore/pip_report.json


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown
Contributor

qodo-code-review bot commented Apr 11, 2026

Code Review by Qodo

🐞 Bugs (1)   📘 Rule violations (0)   📎 Requirement gaps (0)
🐞\ ⛨ Security (1)

Grey Divider


Action required

1. Runs pip on project 🐞
Description
PythonPyprojectProvider executes pip install ... . in the scanned project directory, which can
invoke the project’s build backend/scripts during metadata/dependency resolution. When analyzing
untrusted repositories, this is an RCE risk because the analysis process may execute
attacker-controlled code.
Code

src/main/java/io/github/guacsec/trustifyda/providers/PythonPyprojectProvider.java[R151-155]

+    String pip = findPipBinary();
+    return Operations.runProcessGetOutput(
+        manifestDir,
+        new String[] {pip, "install", "--dry-run", "--ignore-installed", "--report", "-", "."});
+  }
Evidence
The provider shells out to pip with target . (local project) to generate the dependency report,
meaning pip operates on (and may execute build steps from) the untrusted project being analyzed.

src/main/java/io/github/guacsec/trustifyda/providers/PythonPyprojectProvider.java[145-155]
src/main/java/io/github/guacsec/trustifyda/providers/PythonPyprojectProvider.java[77-98]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`PythonPyprojectProvider` generates a pip report by running `pip install ... .`, which targets the local project directory. Pip may execute the project’s build backend/scripts to obtain metadata, creating an RCE vector when scanning untrusted repos.
### Issue Context
The goal is to resolve dependencies for `pyproject.toml` without creating a venv/temp requirements file.
### Fix Focus Areas
- Use pip on declared deps (not `.`): build the pip command from `parseDependencyStrings()` output (direct requirements) so pip resolves/transitively expands without touching the local project build backend.
- If pip needs requirements-file input, reintroduce a minimal temporary requirements file (or add support to pass `-r -` via stdin) but keep the rest of the refactor.
- Consider sandboxing: if analyzing untrusted source is a requirement, ensure the pip execution is containerized/isolated.
### Fix Focus Areas (code references)
- src/main/java/io/github/guacsec/trustifyda/providers/PythonPyprojectProvider.java[77-155]
- src/main/java/io/github/guacsec/trustifyda/providers/PythonPyprojectProvider.java[363-392]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Ignores pip exit code🐞
Description
getPipReportOutput uses Operations.runProcessGetOutput, which does not check the process exit code
and may return stderr as “output”; parsePipReport can then attempt to parse non-JSON text, failing
with confusing IO errors. Additionally, runProcessGetOutput drains stdout and stderr sequentially,
which can deadlock for verbose commands that write heavily to stderr.
Code

src/main/java/io/github/guacsec/trustifyda/providers/PythonPyprojectProvider.java[R152-155]

+    return Operations.runProcessGetOutput(
+        manifestDir,
+        new String[] {pip, "install", "--dry-run", "--ignore-installed", "--report", "-", "."});
+  }
Evidence
PythonPyprojectProvider relies on Operations.runProcessGetOutput to obtain JSON. That helper neither
waits/validates exit status nor safely drains both output streams concurrently; it even contains a
TODO noting failures should throw, meaning this path can mis-handle pip failures or hang under heavy
stderr output.

src/main/java/io/github/guacsec/trustifyda/providers/PythonPyprojectProvider.java[151-155]
src/main/java/io/github/guacsec/trustifyda/tools/Operations.java[162-195]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`getPipReportOutput` uses `Operations.runProcessGetOutput`, which (1) does not check exit codes and can return stderr content as if it were successful stdout, and (2) reads stdout/stderr sequentially, which risks deadlock on stderr-heavy commands.
### Issue Context
`pip install --dry-run --report - ...` is expected to return valid JSON; any non-JSON output or partial output should be treated as a command failure with a clear error message.
### Fix Focus Areas
- Replace `runProcessGetOutput` usage with `runProcessGetFullOutput` and:
- check `exitCode != 0` and throw with stderr
- require non-blank stdout for the JSON report
- If keeping a simpler helper, implement safe concurrent draining (or `redirectErrorStream(true)` where appropriate) and always call `waitFor()` / check exit status.
### Fix Focus Areas (code references)
- src/main/java/io/github/guacsec/trustifyda/providers/PythonPyprojectProvider.java[145-155]
- src/main/java/io/github/guacsec/trustifyda/tools/Operations.java[162-195]
- src/main/java/io/github/guacsec/trustifyda/tools/Operations.java[199-247]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

3. Assumes single dir_info root🐞
Description
parsePipReport treats any install entry with download_info.dir_info as the root and excludes all
such entries from the dependency graph, overwriting rootEntry each time. If the report contains
multiple dir_info entries, non-root directory/path packages will be dropped and direct dependency
extraction may use the wrong root metadata.
Code

src/main/java/io/github/guacsec/trustifyda/providers/PythonPyprojectProvider.java[R176-186]

+    // Find root entry (has dir_info in download_info) and collect non-root packages
+    JsonNode rootEntry = null;
+    List<JsonNode> nonRootPackages = new ArrayList<>();
+    for (JsonNode entry : installArray) {
+      JsonNode downloadInfo = entry.get("download_info");
+      if (downloadInfo != null && downloadInfo.has("dir_info")) {
+        rootEntry = entry;
+      } else {
+        nonRootPackages.add(entry);
+      }
+    }
Evidence
The code assigns rootEntry on every matching entry and never adds those entries to nonRootPackages.
This creates a fragile implicit invariant (“exactly one dir_info entry exists”) that is not enforced
by the parser and would produce an incomplete graph if violated.

src/main/java/io/github/guacsec/trustifyda/providers/PythonPyprojectProvider.java[176-187]
src/main/java/io/github/guacsec/trustifyda/providers/PythonPyprojectProvider.java[188-223]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`parsePipReport` identifies the root entry by `download_info.dir_info` and excludes all such entries from the graph, overwriting `rootEntry` each time. If more than one entry matches, the resulting dependency graph becomes incomplete and root selection becomes unstable.
### Issue Context
The current fixtures only include one `dir_info` entry (the project itself), but the parser should be robust to multiple matches.
### Fix Focus Areas
- Only set `rootEntry` once (e.g., `if (rootEntry == null && ...) rootEntry = entry; else nonRootPackages.add(entry)`), so non-root `dir_info` entries remain in the graph.
- Alternatively, identify root using a stronger condition (e.g., `requested == true` combined with matching `metadata.name` to `getRootComponentName()`).
### Fix Focus Areas (code references)
- src/main/java/io/github/guacsec/trustifyda/providers/PythonPyprojectProvider.java[176-207]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

Comment thread src/main/java/io/github/guacsec/trustifyda/providers/PythonPyprojectProvider.java Outdated
@ruromero
Copy link
Copy Markdown
Collaborator Author

Verification Report for TC-4087 (commit 4803939)

Check Result Details
Review Feedback PASS 2 threads classified — 1 suggestion (RCE risk, dismissed by author), 1 code change request (exit code, already fixed in 4803939). No sub-tasks needed.
Root-Cause Investigation N/A No sub-tasks created
Scope Containment PASS 6 files changed: 3 production Java files (PythonPyprojectProvider, PythonProvider, PythonPipProvider), 1 test file, 2 test fixtures — all in scope for pyproject.toml refactoring
Diff Size PASS +950 / -228 across 6 files. ~290 lines are JSON test fixtures, ~200 lines are new tests. Proportionate to scope.
Commit Traceability WARN None of the 3 commits reference TC-4087 in the message
Sensitive Patterns PASS No secrets detected (false positive on Environment import)
CI Status PASS 38/38 checks pass (build, unit tests, integration tests across ubuntu/macos/windows for all ecosystems)
Acceptance Criteria N/A Jira task description not accessible — skipped
Test Quality PASS 10 new unit tests + 3 SBOM integration tests. No repetitive parameterization candidates.
Verification Commands N/A Jira task description not accessible — skipped

Overall: WARN

Commit messages do not reference TC-4087. All CI checks pass and review feedback is addressed.


This comment was AI-generated by sdlc-workflow/verify-pr v0.5.11.

@ruromero ruromero force-pushed the TC-4087 branch 2 times, most recently from f81ae55 to baa8908 Compare April 13, 2026 13:58
Copy link
Copy Markdown
Contributor

@a-oren a-oren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ruromero ruromero enabled auto-merge (squash) April 13, 2026 14:36
ruromero and others added 6 commits April 13, 2026 16:42
…dency resolution

Replace the venv + pip install/freeze/show chain with a single
pip install --dry-run --ignore-installed --report command that outputs
the full dependency tree as JSON. This eliminates temporary file
creation, virtual environment setup, and multiple subprocess calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Provider

PythonProvider now only holds shared utilities (toPurl, handleIgnoredDependencies,
containsIgnorePattern, root component defaults). The venv-based provideStack/
provideComponent, getPythonController, getExecutable, and related methods are
moved to PythonPipProvider — the only consumer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use runProcessGetFullOutput instead of runProcessGetOutput so that
stdout/stderr are drained in parallel (avoiding deadlock) and a non-zero
exit code throws a clear error instead of silently returning stderr.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Poetry-style [tool.poetry.dependencies] in pyproject.toml is not
supported; provideStack/provideComponent now fail fast with a clear
error message. Also use manifest.toAbsolutePath().getParent() to
prevent null working directory, and log a warning when the pip report
has packages but no root entry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Poetry dependencies are explicitly rejected; remove the remaining
Poetry-specific fallbacks for name/version/license, the Poetry
dependency parsing and version-conversion methods, and all
corresponding tests to avoid false expectations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ruben Romero Montes <rromerom@redhat.com>
@ruromero ruromero merged commit 2102dab into guacsec:main Apr 13, 2026
38 checks passed
@ruromero ruromero deleted the TC-4087 branch April 13, 2026 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants