feat: deduplicate multi-artifact package results (PyPI, etc.) #156

Merged
Alexandros Kapravelos (kapravel) merged 1 commit into main from feat/deduplicate-pypi-artifacts
Mar 16, 2026

Conversation

Alexandros Kapravelos (kapravel) commented on Mar 16, 2026

Summary

  • When the Socket API resolves a PyPI PURL like pkg:pypi/numpy@1.26.0, it returns one NDJSON line per artifact (sdist, wheels for manylinux x86_64, macosx arm64, win amd64, etc.). The MCP server was outputting every line, flooding the agent with duplicate results for the same package.
  • This PR adds server-side deduplication that groups results by package identity (type, namespace, name, version) and selects one representative artifact per group using a priority system: source distribution > universal wheel > first artifact.
  • Adds an optional platform parameter (e.g. darwin-arm64, linux-x64) so agents that can detect the user's OS/arch can get the most relevant artifact. When no platform is supplied, as with agents without tool execution (like Claude Web), selection falls back to the default priority above.
  • Also filters out purlError and summary NDJSON lines that were previously processed as if they were artifacts.
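
A minimal sketch of the filtering and grouping described in the Summary, not the code from this PR. The `Artifact` shape, the `filename` field, the `_type` discriminator on non-artifact lines, and the helper names `parseArtifacts`, `rank`, and `dedupe` are all assumptions made for illustration; the actual NDJSON schema and the real `deduplicateArtifacts()` signature may differ.

```typescript
// Hypothetical shape: these field names are assumptions for illustration,
// not the Socket API's actual schema.
interface Artifact {
  type: string;
  namespace?: string;
  name: string;
  version: string;
  filename: string; // assumed: artifact file name, e.g. "numpy-1.26.0.tar.gz"
}

// Parse NDJSON text and keep only lines that look like artifacts.
// Assumes non-artifact lines carry a `_type` of "purlError" or "summary".
function parseArtifacts(ndjson: string): Artifact[] {
  const artifacts: Artifact[] = [];
  for (const raw of ndjson.split("\n")) {
    const line = raw.trim();
    if (line === "") continue; // skip blank lines between records
    const parsed = JSON.parse(line) as Record<string, unknown>;
    if (parsed._type === "purlError" || parsed._type === "summary") continue;
    artifacts.push(parsed as unknown as Artifact);
  }
  return artifacts;
}

// Lower rank wins: source distribution, then universal wheel, then anything else.
function rank(a: Artifact): number {
  if (a.filename.endsWith(".tar.gz") || a.filename.endsWith(".zip")) return 0;
  if (a.filename.endsWith("-none-any.whl")) return 1;
  return 2;
}

// Group by (type, namespace, name, version) and keep one artifact per group.
function dedupe(artifacts: Artifact[]): Artifact[] {
  const best = new Map<string, Artifact>();
  for (const a of artifacts) {
    const key = [a.type, a.namespace ?? "", a.name, a.version].join("\u0000");
    const current = best.get(key);
    // Strict "<" keeps the first artifact seen when ranks tie.
    if (current === undefined || rank(a) < rank(current)) best.set(key, a);
  }
  return Array.from(best.values());
}
```

Under these assumptions, a response containing a numpy wheel, the numpy sdist, and a summary line collapses to a single entry (the sdist), while distinct versions of the same package stay separate because the version is part of the group key.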

Changes

  • New lib/artifacts.ts: deduplicateArtifacts() function with grouping, platform matching (maps Node.js-style os-arch to ecosystem-specific patterns), and default selection logic.
  • Updated index.ts: Added optional platform param to depscore schema, filter non-artifact NDJSON lines, wire in deduplication.
  • New artifacts.test.ts: 18 unit tests covering deduplication, platform matching, edge cases.
  • Updated test.ts: 2 integration tests verifying numpy deduplication with and without platform hint.
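
The platform matching mentioned above could look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of lib/artifacts.ts: the `PYPI_PLATFORM_PATTERNS` table, the `pickForPlatform` helper, and the choice to match on wheel file names are hypothetical, and whether a platform match should outrank the source distribution is also assumed here.

```typescript
// Assumed mapping from Node.js-style "os-arch" strings to patterns found in
// PyPI wheel platform tags. The real mapping in the PR may differ.
const PYPI_PLATFORM_PATTERNS: Record<string, RegExp> = {
  "darwin-arm64": /macosx_.*_(arm64|universal2)/,
  "darwin-x64": /macosx_.*_(x86_64|universal2)/,
  "linux-x64": /(manylinux|musllinux).*_x86_64/,
  "linux-arm64": /(manylinux|musllinux).*_aarch64/,
  "win32-x64": /win_amd64/,
};

// Pick one file name from a group of artifacts for the same package version.
function pickForPlatform(filenames: string[], platform?: string): string | undefined {
  const pattern = platform === undefined ? undefined : PYPI_PLATFORM_PATTERNS[platform];
  if (pattern !== undefined) {
    const match = filenames.find((f) => pattern.test(f));
    if (match !== undefined) return match;
  }
  // No hint, unknown platform, or no matching wheel: fall back to the default
  // priority (source distribution > universal wheel > first artifact).
  return (
    filenames.find((f) => f.endsWith(".tar.gz")) ??
    filenames.find((f) => f.endsWith("-none-any.whl")) ??
    filenames[0]
  );
}
```

Keeping the fallback inside the same function means a missing or unrecognized platform hint never produces an empty result for a non-empty group, which fits agents that cannot detect the user's OS/arch.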

Test plan

  • TypeScript type check passes
  • ESLint passes
  • All 30 unit tests pass (18 new + 12 existing)
  • Integration tests pass with live API (numpy returns 1 result instead of many)

Made with Cursor

PyPI packages like numpy return one NDJSON line per artifact (sdist,
wheels for each platform). This floods the agent with duplicate results
for the same package.

- Add lib/artifacts.ts with deduplicateArtifacts() that groups results
  by (type, namespace, name, version) and selects one representative
  per group (source dist > universal wheel > first artifact)
- Add optional `platform` parameter to depscore tool for agents that
  can detect the user's OS/arch (e.g. 'darwin-arm64', 'linux-x64')
- Filter out purlError/summary NDJSON lines before processing
- Add 18 unit tests for deduplication and platform matching
- Add 2 integration tests for numpy deduplication with/without platform

Made-with: Cursor
Alexandros Kapravelos (kapravel) merged commit 32aae45 into main on Mar 16, 2026
6 checks passed