@davidgamez
Member

Summary:

This was completely missed in the previous implementation. While the license resolver handled multiple license URL formats, it did not handle the SPDX catalog URL (kind of obvious in hindsight... sorry).

From our AI friend

This pull request enhances the license resolution logic by adding support for SPDX catalog URLs (e.g., spdx.org/licenses/<ID>.html) and refines the matching strategies used to resolve license URLs. The main changes include a new SPDX resolver branch, improved ordering and documentation of resolution strategies, and corresponding tests for the new SPDX logic.

License Resolution Logic Improvements:

  • Added a new SPDX catalog URL resolver branch to resolve_license, which parses SPDX IDs from URLs like spdx.org/licenses/ODbL-1.0.html and attempts to match them against the database. If found, returns a high-confidence match; if not, logs a warning and returns no result.
  • Updated the resolution strategy documentation and ordering in resolve_license to include the new SPDX branch and clarify existing heuristics and fuzzy matching.
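
The SPDX branch described above could look roughly like the sketch below. It is not the code from this PR: `resolve_spdx_branch`, the `LicenseMatch` dataclass, and the dict standing in for the database are hypothetical names used only for illustration. The `match_type`, `confidence`, and `matched_source` values mirror the sample response in the testing tips.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class LicenseMatch:
    """Hypothetical result type; the real MatchingLicense class has more fields."""

    license_id: str
    match_type: str
    confidence: float
    matched_source: str


def resolve_spdx_branch(spdx_id: str, known_licenses: dict) -> Optional[LicenseMatch]:
    """Look up an already-extracted SPDX ID in a stand-in for the license DB."""
    # Assumption: the DB is modeled as {lowercased SPDX ID: canonical license ID}.
    license_id = known_licenses.get(spdx_id.lower())
    if license_id is None:
        # DB miss: log a warning and return no result.
        logger.warning("SPDX ID %r not found in the license database", spdx_id)
        return None
    # DB hit: return a high-confidence match tagged with the resolver that found it.
    return LicenseMatch(license_id, "heuristic", 0.98, "spdx-resolver")
```

With `{"odbl-1.0": "ODbL-1.0"}` as the stand-in database, `resolve_spdx_branch("ODbL-1.0", db)` returns a match, while an unknown ID logs a warning and returns `None`.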

Supporting Functions and Tests:

  • Introduced extract_spdx_id_from_url to safely extract SPDX IDs from SPDX catalog URLs, and refactored find_exact_match_license_url for clarity and reuse.
  • Added tests for SPDX catalog URL resolution, covering both DB hit and miss scenarios to ensure correct resolver behavior and logging.
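
For readers who want a feel for the extraction step, here is a minimal sketch of how an SPDX ID might be pulled out of a catalog URL. The function name `extract_spdx_id_sketch`, the accepted hosts, and the regular expression are assumptions made for this example; the actual `extract_spdx_id_from_url` in the PR may behave differently.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Matches paths like /licenses/ODbL-1.0 or /licenses/ODbL-1.0.html
_SPDX_PATH = re.compile(r"^/licenses/([A-Za-z0-9.+-]+?)(?:\.html)?/?$")


def extract_spdx_id_sketch(url: str) -> Optional[str]:
    """Return the SPDX ID from an SPDX catalog URL, or None if the URL does not match."""
    # Tolerate scheme-less input such as "spdx.org/licenses/MIT.html".
    parsed = urlparse(url if "://" in url else f"https://{url}")
    host = (parsed.hostname or "").lower()
    # Assumption: only spdx.org (with or without "www.") counts as the catalog host.
    if host not in ("spdx.org", "www.spdx.org"):
        return None
    match = _SPDX_PATH.match(parsed.path)
    return match.group(1) if match else None
```

Returning `None` for anything that is not clearly an SPDX catalog URL lets the caller fall through to the other resolution strategies.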

Other Changes:

  • Removed a comment placeholder about future license patterns in the MatchingLicense class.
  • Cleaned up duplicated and moved code for find_exact_match_license_url after refactoring.

Expected behavior:

The license URL is properly resolved when it has the SPDX catalog URL format.

Testing tips:

Testers are invited to follow the tips AND to try anything they deem relevant outside the bounds of the testing tips.

  • Populate your local DB
  • Run the public API locally
  • Execute
curl --request POST \
  --url http://localhost:8080/v1/licenses:match \
  --header 'content-type: application/json' \
  --data '{
  "license_url": "https://spdx.org/licenses/ODbL-1.0.html"
}'
  • Expect the following response
[
  {
    "license_id": "ODbL-1.0",
    "license_url": "https://spdx.org/licenses/ODbL-1.0.html",
    "normalized_url": "spdx.org/licenses/odbl-1.0.html",
    "match_type": "heuristic",
    "confidence": 0.98,
    "spdx_id": "odbl-1.0",
    "matched_name": "Open Data Commons Open Database License v1.0",
    "matched_catalog_url": "http://www.opendatacommons.org/licenses/odbl/1.0/",
    "matched_source": "spdx-resolver",
    "notes": null,
    "regional_id": null
  }
]
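
The same check can be scripted. The sketch below builds the request from the curl command above with Python's standard library and inspects a parsed response. The helper names `build_match_request` and `is_spdx_match` are hypothetical, and the base URL assumes the local API from the steps above.

```python
import json
import urllib.request


def build_match_request(
    license_url: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build the POST request for the licenses:match endpoint without sending it."""
    body = json.dumps({"license_url": license_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/licenses:match",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_spdx_match(results: list, expected_id: str) -> bool:
    """True if any result has the expected license ID and came from the SPDX resolver."""
    return any(
        r.get("license_id") == expected_id
        and r.get("matched_source") == "spdx-resolver"
        for r in results
    )


# To run against a local API (requires the server to be up):
#   req = build_match_request("https://spdx.org/licenses/ODbL-1.0.html")
#   with urllib.request.urlopen(req) as resp:
#       print(is_spdx_match(json.load(resp), "ODbL-1.0"))
```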

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with ./scripts/api-tests.sh to make sure you didn't break anything
  • Add or update any needed documentation to the repo
  • Format the title like "feat: [new feature short description]". The title must follow the Conventional Commits specification (https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

@davidgamez changed the title from "feat: add license SPDX ID resolver" to "fix: add license SPDX ID resolver" on Jan 14, 2026
@davidgamez marked this pull request as ready for review on January 14, 2026, 22:27
Contributor

@qcdyx left a comment

good catch! LGTM!

@davidgamez merged commit 25e41ce into main on Jan 15, 2026
2 of 3 checks passed
@davidgamez deleted the fix/resolve_spdx_urls branch on January 15, 2026, 14:28