Adapt Extraction of PyPI Metadata for PEP 639 Support #729
Open
yashkohli88 wants to merge 1 commit into clearlydefined:master from
Description
Summary
This PR adds support for the PEP 639 `license_expression` field in the PyPI metadata extraction logic. PEP 639 introduces a standardized SPDX `license_expression` field, deprecating the legacy free-form `license` text and license classifiers.
Problem
PyPI packages are transitioning to PEP 639, which introduces an `info.license_expression` field containing proper SPDX license expressions. The crawler currently does not recognize this field, so it can miss accurate license data from packages that have already adopted the new standard.
Examples:
`info.license`: free-form text
`info.license_expression`: SPDX expression
Solution
Extends the license extraction logic to prioritize the new `license_expression` field while maintaining full backward compatibility with existing detection.
Changes
`providers/fetch/pypiFetch.js`
Added
`_extractLicenseExpression(registryData)`: new method to extract and validate `info.license_expression` from PyPI registry data. Returns the expression string if valid, otherwise `null`.
Updated
`_extractDeclaredLicense(registryData)`: now checks for `license_expression` first via `_extractLicenseExpression()`, falling back to the existing logic (free-form `info.license` → license classifiers → SPDX normalization) when it is unavailable.
Test Summary
| Method | Scenario | Expected |
| --- | --- | --- |
| `extractLicenseExpression` | `license_expression` is present (`'MIT'`) | `'MIT'` |
| `extractLicenseExpression` | (`'MIT AND Apache-2.0'`) | |
| `extractLicenseExpression` | `WITH` clause | |
| `extractLicenseExpression` | `license_expression` is missing | `null` |
| `extractLicenseExpression` | `license_expression` is `null` | `null` |
| `extractLicenseExpression` | `license_expression` is empty string | `null` |
| `extractLicenseExpression` | `license_expression` is not a string (`123`) | `null` |
| `extractDeclaredLicense` | `license_expression` over `info.license` | `license_expression` |
| `extractDeclaredLicense` | `license_expression` over classifiers | `license_expression` |
| `extractDeclaredLicense` | `info.license` when `license_expression` missing | `info.license` |

Backward Compatibility
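The priority and fallback order is easiest to see in code. The following is a minimal sketch, not the actual `pypiFetch.js` implementation: it uses standalone functions instead of class methods, `extractLegacyLicense` is a hypothetical stand-in for the existing legacy path, and it assumes that "valid" means a non-empty string after trimming whitespace. Whether the real method also checks SPDX syntax is not stated in this PR description.

```javascript
// Sketch only: standalone functions, not the actual pypiFetch.js methods.

// Returns info.license_expression when it is a non-empty string, otherwise null.
// Assumption: whitespace-only values are treated as empty.
function extractLicenseExpression(registryData) {
  const expression = registryData?.info?.license_expression
  if (typeof expression !== 'string' || expression.trim() === '') return null
  return expression.trim()
}

// Hypothetical stand-in for the existing legacy logic
// (free-form info.license, then classifiers, then SPDX normalization).
function extractLegacyLicense(registryData) {
  return registryData?.info?.license || null
}

// Prefer the PEP 639 field; fall back to the legacy path when it is unusable.
function extractDeclaredLicense(registryData) {
  return extractLicenseExpression(registryData) ?? extractLegacyLicense(registryData)
}

const pep639 = { info: { license: 'MIT License', license_expression: 'MIT AND Apache-2.0' } }
const legacy = { info: { license: 'BSD-3-Clause', license_expression: '' } }

console.log(extractDeclaredLicense(pep639)) // MIT AND Apache-2.0
console.log(extractDeclaredLicense(legacy)) // BSD-3-Clause
```

Keeping the validation in its own function means the fallback decision reduces to a single `??`, so packages without a usable `license_expression` take exactly the same path they did before.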
`license_expression` (PEP 639)
`license_expression` directly
`info.license` (legacy)
`license_expression`
`license_expression` is empty/null/invalid
Testing
`_extractLicenseExpression()` with valid SPDX expressions
`_extractLicenseExpression()` with null/empty/invalid values
(`AND`, `OR`, `WITH`)
`_extractDeclaredLicense()`
Related