More maven parsing fix for experiment detector#1724
More maven parsing fix for experiment detector#1724zhenghao104 wants to merge 6 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the experimental MavenWithFallbackDetector to improve static POM parsing behavior by introducing “smart” POM preloading intended to help cross-document property/version resolution during fallback parsing.
Changes:
- Added per-scan state to support preloading all POM
XmlDocuments for variable resolution. - Wired preloading into the fallback path and variable-replacement path to enable cross-document lookups.
src/Microsoft.ComponentDetection.Detectors/maven/MavenWithFallbackDetector.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ComponentDetection.Detectors/maven/MavenWithFallbackDetector.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ComponentDetection.Detectors/maven/MavenWithFallbackDetector.cs
Outdated
Show resolved
Hide resolved
981dd8d to
6f27926
Compare
|
👋 Hi! It looks like you modified some files in the
If none of the above scenarios apply, feel free to ignore this comment 🙂 |
There was a problem hiding this comment.
Pull request overview
This PR updates the Maven detector flow to avoid .mvndeps file cleanup races by deferring cleanup until detector execution completes, and enhances the experimental Maven fallback detector’s static parsing to better handle version variables.
Changes:
- Add orchestrator-level cleanup of Maven
.mvndepsfiles after all detectors finish. - Refactor Maven detectors to remove per-file reader coordination and introduce a two-pass static POM parsing approach (collect variables, then resolve pending components).
- Add orchestrator unit tests covering Maven cleanup behavior.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DetectorProcessingServiceTests.cs | Adds tests validating .mvndeps cleanup behavior depending on whether Maven detectors ran. |
| src/Microsoft.ComponentDetection.Orchestrator/Services/DetectorProcessingService.cs | Adds post-processing cleanup routine for Maven .mvndeps files after detector execution completes. |
| src/Microsoft.ComponentDetection.Detectors/maven/PendingComponent.cs | Introduces a record type to queue unresolved Maven components for second-pass resolution. |
| src/Microsoft.ComponentDetection.Detectors/maven/MvnCliComponentDetector.cs | Removes reader registration/unregistration logic and tweaks debug logging. |
| src/Microsoft.ComponentDetection.Detectors/maven/MavenWithFallbackDetector.cs | Implements two-pass variable collection + pending component resolution and updates telemetry/logging. |
| src/Microsoft.ComponentDetection.Detectors/maven/MavenCommandService.cs | Removes file-reader coordination and deletion logic from the command service. |
| src/Microsoft.ComponentDetection.Detectors/maven/IMavenCommandService.cs | Removes file-reader coordination APIs from the service interface. |
test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DetectorProcessingServiceTests.cs
Show resolved
Hide resolved
test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DetectorProcessingServiceTests.cs
Show resolved
Hide resolved
src/Microsoft.ComponentDetection.Orchestrator/Services/DetectorProcessingService.cs
Show resolved
Hide resolved
src/Microsoft.ComponentDetection.Orchestrator/Services/DetectorProcessingService.cs
Show resolved
Hide resolved
src/Microsoft.ComponentDetection.Detectors/maven/MavenWithFallbackDetector.cs
Show resolved
Hide resolved
src/Microsoft.ComponentDetection.Detectors/maven/MavenWithFallbackDetector.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
This PR updates Maven detector behavior to avoid Maven CLI dependency-file cleanup races, and improves MavenWithFallback static parsing by deferring variable-based version resolution to a second pass.
Changes:
- Add orchestrator-level cleanup of Maven CLI
*.mvndepsfiles after all detectors complete. - Rework
MavenWithFallbackDetectorstatic parsing to collect variables first and resolve variable-based versions in a second pass (viaPendingComponent). - Remove Maven CLI file-reader registration/unregistration coordination APIs from
IMavenCommandService/MavenCommandService, and update detectors accordingly.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DetectorProcessingServiceTests.cs | Adds tests asserting Maven deps files are deleted only when Maven detectors run. |
| src/Microsoft.ComponentDetection.Orchestrator/Services/DetectorProcessingService.cs | Adds post-scan Maven deps file cleanup in the orchestrator. |
| src/Microsoft.ComponentDetection.Detectors/maven/PendingComponent.cs | Introduces a record to hold unresolved Maven components for second-pass processing. |
| src/Microsoft.ComponentDetection.Detectors/maven/MvnCliComponentDetector.cs | Removes per-file reader unregister logic; adds additional debug logging. |
| src/Microsoft.ComponentDetection.Detectors/maven/MavenWithFallbackDetector.cs | Implements two-pass variable collection + resolution for static parsing. |
| src/Microsoft.ComponentDetection.Detectors/maven/MavenCommandService.cs | Removes file-reader reference counting/cleanup logic. |
| src/Microsoft.ComponentDetection.Detectors/maven/IMavenCommandService.cs | Removes file-reader coordination methods from the interface. |
src/Microsoft.ComponentDetection.Detectors/maven/MavenWithFallbackDetector.cs
Show resolved
Hide resolved
test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DetectorProcessingServiceTests.cs
Show resolved
Hide resolved
test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DetectorProcessingServiceTests.cs
Show resolved
Hide resolved
src/Microsoft.ComponentDetection.Orchestrator/Services/DetectorProcessingService.cs
Show resolved
Hide resolved
src/Microsoft.ComponentDetection.Orchestrator/Services/DetectorProcessingService.cs
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
This PR tightens Maven detection behavior by (1) improving MavenWithFallback’s static POM parsing to resolve variables across parent/child hierarchies using multi-pass processing, and (2) moving Maven CLI .mvndeps cleanup to the orchestrator after all detectors complete to avoid cross-detector races.
Changes:
- Add orchestrator-level cleanup of Maven CLI-generated dependency files after detector execution.
- Implement multi-pass variable collection + hierarchy-aware resolution in
MavenWithFallbackDetectorand add aPendingComponentmodel for deferred registrations. - Expand test coverage for variable resolution edge cases and new cleanup behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DetectorProcessingServiceTests.cs | Adds tests validating post-processing cleanup of .mvndeps files depending on which detectors ran. |
| test/Microsoft.ComponentDetection.Detectors.Tests/MavenWithFallbackDetectorTests.cs | Adds many tests covering parent/child/sibling variable resolution, deferred resolution, and traversal robustness. |
| src/Microsoft.ComponentDetection.Orchestrator/Services/DetectorProcessingService.cs | Adds CleanupMavenFilesAsync and invokes it after all detectors finish. |
| src/Microsoft.ComponentDetection.Detectors/maven/PendingComponent.cs | Introduces a record to hold unresolved components for second/third-pass resolution. |
| src/Microsoft.ComponentDetection.Detectors/maven/MvnCliComponentDetector.cs | Removes per-file reader unregister logic and adds debug logging around contribution counts. |
| src/Microsoft.ComponentDetection.Detectors/maven/MavenWithFallbackDetector.cs | Introduces multi-pass parsing with concurrent state structures, hierarchy tracking, and deferred resolution. |
| src/Microsoft.ComponentDetection.Detectors/maven/MavenCommandService.cs | Removes file-reader tracking/cleanup coordination APIs (cleanup is now orchestrator-driven). |
| src/Microsoft.ComponentDetection.Detectors/maven/IMavenCommandService.cs | Removes file-reader tracking/cleanup coordination API from the interface. |
| stopwatch.Stop(); | ||
|
|
||
| // Assert - Should complete quickly (well under 1 second for this simple test) | ||
| stopwatch.ElapsedMilliseconds.Should().BeLessThan(1000, "Smart loop prevention should not significantly impact performance"); | ||
| scanResult.ResultCode.Should().Be(ProcessingResultCode.Success); |
There was a problem hiding this comment.
The performance assertion based on a hard 1-second wall-clock threshold is likely to be flaky on slower/contended CI agents. Prefer an assertion that the code completes (no hang) or validate algorithmic behavior without relying on timing, or at least use a much looser threshold and isolate the test as non-gating if timing is required.
| // Clean up Maven CLI temporary files after all detectors have finished | ||
| await this.CleanupMavenFilesAsync(settings.SourceDirectory, detectors); |
There was a problem hiding this comment.
CleanupMavenFilesAsync is invoked unconditionally after detector execution. This will delete *.mvndeps even when the scan request has CleanupCreatedFiles=false, which breaks the contract implied by ScanSettings/ScanRequest and could remove user files unexpectedly. Gate this call (or the cleanup method) on settings.CleanupCreatedFiles being true.
| // Clean up Maven CLI temporary files after all detectors have finished | |
| await this.CleanupMavenFilesAsync(settings.SourceDirectory, detectors); | |
| // Clean up Maven CLI temporary files after all detectors have finished, | |
| // but only if cleanup of created files is enabled in settings. | |
| if (settings.CleanupCreatedFiles) | |
| { | |
| await this.CleanupMavenFilesAsync(settings.SourceDirectory, detectors); | |
| } |
| var searchPattern = "*.mvndeps"; // Simple file pattern for Directory.GetFiles | ||
|
|
||
| this.logger.LogDebug("Starting Maven CLI cleanup in directory: {SourceDirectory}", sourceDirectory.FullName); | ||
|
|
||
| await Task.Run(() => | ||
| { | ||
| // Search recursively for Maven CLI dependency files | ||
| var filesToClean = Directory.GetFiles(sourceDirectory.FullName, searchPattern, SearchOption.AllDirectories); | ||
|
|
There was a problem hiding this comment.
CleanupMavenFilesAsync deletes all files matching "*.mvndeps" using Directory.GetFiles(..., AllDirectories). In this repo MavenCommandService generates a specific filename ("bcde.mvndeps"), so using a broad glob risks deleting unrelated user files; additionally GetFiles materializes the full list and can be memory-heavy on large repos. Consider restricting to the known generated filename and using EnumerateFiles (optionally with cancellation) to stream deletions.
| @@ -118,6 +130,12 @@ public class MavenWithFallbackDetector : FileComponentDetector, IExperimentalDet | |||
| private readonly ConcurrentQueue<string> mavenCliErrors = []; | |||
| private readonly ConcurrentQueue<string> failedEndpoints = []; | |||
|
|
|||
| /// <summary> | |||
| /// Cache for parent POM lookups to avoid repeated file system operations. | |||
| /// Key: current file path, Value: parent POM path or empty string if not found. | |||
| /// </summary> | |||
| private readonly ConcurrentDictionary<string, string> parentPomCache = new(); | |||
|
|
|||
There was a problem hiding this comment.
MavenWithFallbackDetector is registered as a singleton, but the newly added ConcurrentDictionary/ConcurrentQueue fields (collectedVariables, processedMavenProjects, mavenParentChildRelationships, parentPomCache, etc.) are never cleared between scans. This can cause memory growth and incorrect variable resolution due to stale state leaking across scans. Reset these collections (and related counters/flags) at the start of each scan (e.g., in OnPrepareDetectionAsync) and/or ensure they are per-scan locals.
|
|
||
| this.logger.LogDebug("Starting Maven CLI cleanup in directory: {SourceDirectory}", sourceDirectory.FullName); | ||
|
|
||
| await Task.Run(() => |
There was a problem hiding this comment.
Unless I am missing something, we don't have anything async here. Why is it being wrapped with Task.Run? We will still be blocking a thread while waiting for the output, so it isn't scaling the actual work IIRC.
| await Task.Run(() => | ||
| { | ||
| // Search recursively for Maven CLI dependency files | ||
| var filesToClean = Directory.GetFiles(sourceDirectory.FullName, searchPattern, SearchOption.AllDirectories); |
There was a problem hiding this comment.
Why not leveraging FileSystemEnumerable and Parallel.ForEach which can handle streaming and fewer allocations, see FastDirectoryWalkerFactory.cs. The longer the list of files to explore the slower this will be. It would be good to have a telemetry record to capture how long did we take doing the cleanup.
|
|
||
| try | ||
| { | ||
| var searchPattern = "*.mvndeps"; // Simple file pattern for Directory.GetFiles |
There was a problem hiding this comment.
Let's make this variable a private file-level constant, also we don't expect other file names, only bcde.mvndeps right? Why not only sticking to that pattern and create a public constant variable that can be shared between the detectors that use it, and this place so they are always in sync.
| private async Task CleanupMavenFilesAsync(DirectoryInfo sourceDirectory, IEnumerable<IComponentDetector> detectors) | ||
| { | ||
| // Only clean up if Maven detectors are running | ||
| var mavenDetectorIds = new[] { "MvnCli", "MavenWithFallback" }; |
There was a problem hiding this comment.
Without reflection I think we can make another constant string shared between the detector IDs and here, so we are never out of sync.
No description provided.