Skip to content

More maven parsing fix for experiment detector#1724

Open
zhenghao104 wants to merge 6 commits intomainfrom
users/zhentan/more-maven-fix-v2
Open

More maven parsing fix for experiment detector#1724
zhenghao104 wants to merge 6 commits intomainfrom
users/zhentan/more-maven-fix-v2

Conversation

@zhenghao104
Copy link
Contributor

No description provided.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the experimental MavenWithFallbackDetector to improve static POM parsing behavior by introducing “smart” POM preloading intended to help cross-document property/version resolution during fallback parsing.

Changes:

  • Added per-scan state to support preloading all POM XmlDocuments for variable resolution.
  • Wired preloading into the fallback path and variable-replacement path to enable cross-document lookups.

@zhenghao104 zhenghao104 force-pushed the users/zhentan/more-maven-fix-v2 branch from 981dd8d to 6f27926 Compare March 16, 2026 23:52
@github-actions
Copy link

github-actions bot commented Mar 16, 2026

👋 Hi! It looks like you modified some files in the Detectors folder.
You may need to bump the detector versions if any of the following scenarios apply:

  • The detector detects more or fewer components than before
  • The detector generates different parent/child graph relationships than before
  • The detector generates different devDependencies values than before

If none of the above scenarios apply, feel free to ignore this comment 🙂

Copilot AI review requested due to automatic review settings March 17, 2026 23:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Maven detector flow to avoid .mvndeps file cleanup races by deferring cleanup until detector execution completes, and enhances the experimental Maven fallback detector’s static parsing to better handle version variables.

Changes:

  • Add orchestrator-level cleanup of Maven .mvndeps files after all detectors finish.
  • Refactor Maven detectors to remove per-file reader coordination and introduce a two-pass static POM parsing approach (collect variables, then resolve pending components).
  • Add orchestrator unit tests covering Maven cleanup behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DetectorProcessingServiceTests.cs Adds tests validating .mvndeps cleanup behavior depending on whether Maven detectors ran.
src/Microsoft.ComponentDetection.Orchestrator/Services/DetectorProcessingService.cs Adds post-processing cleanup routine for Maven .mvndeps files after detector execution completes.
src/Microsoft.ComponentDetection.Detectors/maven/PendingComponent.cs Introduces a record type to queue unresolved Maven components for second-pass resolution.
src/Microsoft.ComponentDetection.Detectors/maven/MvnCliComponentDetector.cs Removes reader registration/unregistration logic and tweaks debug logging.
src/Microsoft.ComponentDetection.Detectors/maven/MavenWithFallbackDetector.cs Implements two-pass variable collection + pending component resolution and updates telemetry/logging.
src/Microsoft.ComponentDetection.Detectors/maven/MavenCommandService.cs Removes file-reader coordination and deletion logic from the command service.
src/Microsoft.ComponentDetection.Detectors/maven/IMavenCommandService.cs Removes file-reader coordination APIs from the service interface.

Copilot AI review requested due to automatic review settings March 18, 2026 01:37
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Maven detector behavior to avoid Maven CLI dependency-file cleanup races, and improves MavenWithFallback static parsing by deferring variable-based version resolution to a second pass.

Changes:

  • Add orchestrator-level cleanup of Maven CLI *.mvndeps files after all detectors complete.
  • Rework MavenWithFallbackDetector static parsing to collect variables first and resolve variable-based versions in a second pass (via PendingComponent).
  • Remove Maven CLI file-reader registration/unregistration coordination APIs from IMavenCommandService/MavenCommandService, and update detectors accordingly.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DetectorProcessingServiceTests.cs Adds tests asserting Maven deps files are deleted only when Maven detectors run.
src/Microsoft.ComponentDetection.Orchestrator/Services/DetectorProcessingService.cs Adds post-scan Maven deps file cleanup in the orchestrator.
src/Microsoft.ComponentDetection.Detectors/maven/PendingComponent.cs Introduces a record to hold unresolved Maven components for second-pass processing.
src/Microsoft.ComponentDetection.Detectors/maven/MvnCliComponentDetector.cs Removes per-file reader unregister logic; adds additional debug logging.
src/Microsoft.ComponentDetection.Detectors/maven/MavenWithFallbackDetector.cs Implements two-pass variable collection + resolution for static parsing.
src/Microsoft.ComponentDetection.Detectors/maven/MavenCommandService.cs Removes file-reader reference counting/cleanup logic.
src/Microsoft.ComponentDetection.Detectors/maven/IMavenCommandService.cs Removes file-reader coordination methods from the interface.

Copilot AI review requested due to automatic review settings March 20, 2026 22:52
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens Maven detection behavior by (1) improving MavenWithFallback’s static POM parsing to resolve variables across parent/child hierarchies using multi-pass processing, and (2) moving Maven CLI .mvndeps cleanup to the orchestrator after all detectors complete to avoid cross-detector races.

Changes:

  • Add orchestrator-level cleanup of Maven CLI-generated dependency files after detector execution.
  • Implement multi-pass variable collection + hierarchy-aware resolution in MavenWithFallbackDetector and add a PendingComponent model for deferred registrations.
  • Expand test coverage for variable resolution edge cases and new cleanup behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DetectorProcessingServiceTests.cs Adds tests validating post-processing cleanup of .mvndeps files depending on which detectors ran.
test/Microsoft.ComponentDetection.Detectors.Tests/MavenWithFallbackDetectorTests.cs Adds many tests covering parent/child/sibling variable resolution, deferred resolution, and traversal robustness.
src/Microsoft.ComponentDetection.Orchestrator/Services/DetectorProcessingService.cs Adds CleanupMavenFilesAsync and invokes it after all detectors finish.
src/Microsoft.ComponentDetection.Detectors/maven/PendingComponent.cs Introduces a record to hold unresolved components for second/third-pass resolution.
src/Microsoft.ComponentDetection.Detectors/maven/MvnCliComponentDetector.cs Removes per-file reader unregister logic and adds debug logging around contribution counts.
src/Microsoft.ComponentDetection.Detectors/maven/MavenWithFallbackDetector.cs Introduces multi-pass parsing with concurrent state structures, hierarchy tracking, and deferred resolution.
src/Microsoft.ComponentDetection.Detectors/maven/MavenCommandService.cs Removes file-reader tracking/cleanup coordination APIs (cleanup is now orchestrator-driven).
src/Microsoft.ComponentDetection.Detectors/maven/IMavenCommandService.cs Removes file-reader tracking/cleanup coordination API from the interface.

Comment on lines +2203 to +2207
stopwatch.Stop();

// Assert - Should complete quickly (well under 1 second for this simple test)
stopwatch.ElapsedMilliseconds.Should().BeLessThan(1000, "Smart loop prevention should not significantly impact performance");
scanResult.ResultCode.Should().Be(ProcessingResultCode.Success);
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance assertion based on a hard 1-second wall-clock threshold is likely to be flaky on slower/contended CI agents. Prefer an assertion that the code completes (no hang) or validate algorithmic behavior without relying on timing, or at least use a much looser threshold and isolate the test as non-gating if timing is required.

Copilot uses AI. Check for mistakes.
Comment on lines +163 to +164
// Clean up Maven CLI temporary files after all detectors have finished
await this.CleanupMavenFilesAsync(settings.SourceDirectory, detectors);
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CleanupMavenFilesAsync is invoked unconditionally after detector execution. This will delete *.mvndeps even when the scan request has CleanupCreatedFiles=false, which breaks the contract implied by ScanSettings/ScanRequest and could remove user files unexpectedly. Gate this call (or the cleanup method) on settings.CleanupCreatedFiles being true.

Suggested change
// Clean up Maven CLI temporary files after all detectors have finished
await this.CleanupMavenFilesAsync(settings.SourceDirectory, detectors);
// Clean up Maven CLI temporary files after all detectors have finished,
// but only if cleanup of created files is enabled in settings.
if (settings.CleanupCreatedFiles)
{
await this.CleanupMavenFilesAsync(settings.SourceDirectory, detectors);
}

Copilot uses AI. Check for mistakes.
Comment on lines +446 to +454
var searchPattern = "*.mvndeps"; // Simple file pattern for Directory.GetFiles

this.logger.LogDebug("Starting Maven CLI cleanup in directory: {SourceDirectory}", sourceDirectory.FullName);

await Task.Run(() =>
{
// Search recursively for Maven CLI dependency files
var filesToClean = Directory.GetFiles(sourceDirectory.FullName, searchPattern, SearchOption.AllDirectories);

Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CleanupMavenFilesAsync deletes all files matching "*.mvndeps" using Directory.GetFiles(..., AllDirectories). In this repo MavenCommandService generates a specific filename ("bcde.mvndeps"), so using a broad glob risks deleting unrelated user files; additionally GetFiles materializes the full list and can be memory-heavy on large repos. Consider restricting to the known generated filename and using EnumerateFiles (optionally with cancellation) to stream deletions.

Copilot uses AI. Check for mistakes.
Comment on lines 113 to +138
@@ -118,6 +130,12 @@ public class MavenWithFallbackDetector : FileComponentDetector, IExperimentalDet
private readonly ConcurrentQueue<string> mavenCliErrors = [];
private readonly ConcurrentQueue<string> failedEndpoints = [];

/// <summary>
/// Cache for parent POM lookups to avoid repeated file system operations.
/// Key: current file path, Value: parent POM path or empty string if not found.
/// </summary>
private readonly ConcurrentDictionary<string, string> parentPomCache = new();

Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MavenWithFallbackDetector is registered as a singleton, but the newly added ConcurrentDictionary/ConcurrentQueue fields (collectedVariables, processedMavenProjects, mavenParentChildRelationships, parentPomCache, etc.) are never cleared between scans. This can cause memory growth and incorrect variable resolution due to stale state leaking across scans. Reset these collections (and related counters/flags) at the start of each scan (e.g., in OnPrepareDetectionAsync) and/or ensure they are per-scan locals.

Copilot uses AI. Check for mistakes.

this.logger.LogDebug("Starting Maven CLI cleanup in directory: {SourceDirectory}", sourceDirectory.FullName);

await Task.Run(() =>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am missing something, we don't have anything async here. Why is it being wrapped with Task.Run? We will still be blocking a thread while waiting for the output, so it isn't scaling the actual work IIRC.

await Task.Run(() =>
{
// Search recursively for Maven CLI dependency files
var filesToClean = Directory.GetFiles(sourceDirectory.FullName, searchPattern, SearchOption.AllDirectories);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not leveraging FileSystemEnumerable and Parallel.ForEach which can handle streaming and fewer allocations, see FastDirectoryWalkerFactory.cs. The longer the list of files to explore the slower this will be. It would be good to have a telemetry record to capture how long did we take doing the cleanup.


try
{
var searchPattern = "*.mvndeps"; // Simple file pattern for Directory.GetFiles
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this variable a private file-level constant, also we don't expect other file names, only bcde.mvndeps right? Why not only sticking to that pattern and create a public constant variable that can be shared between the detectors that use it, and this place so they are always in sync.

private async Task CleanupMavenFilesAsync(DirectoryInfo sourceDirectory, IEnumerable<IComponentDetector> detectors)
{
// Only clean up if Maven detectors are running
var mavenDetectorIds = new[] { "MvnCli", "MavenWithFallback" };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without reflection I think we can make another constant string shared between the detector IDs and here, so we are never out of sync.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants