Add license extraction for pyproject.toml #270

Merged
JustinWonjaePark merged 2 commits into main from dev_pyproject_toml
May 22, 2026

Conversation

Contributor

@JustinWonjaePark JustinWonjaePark commented May 15, 2026

New Features

  • Enabled recognition of pyproject.toml as a manifest file
  • Added automatic license extraction from pyproject.toml configurations, supporting both structured and text-based license declarations

@JustinWonjaePark JustinWonjaePark self-assigned this May 15, 2026
@JustinWonjaePark JustinWonjaePark added the enhancement [PR/Issue] New feature or request label May 15, 2026
coderabbitai Bot commented May 15, 2026

Warning

Rate limit exceeded

@JustinWonjaePark has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 46 minutes and 8 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b7f2710d-30d4-4a83-a1f9-2133ed5e4e68

📥 Commits

Reviewing files that changed from the base of the PR and between 8aa6aee and 18e9054.

📒 Files selected for processing (1)
  • src/fosslight_source/run_manifest_extractor.py
📝 Walkthrough

Adds pyproject.toml license extraction to the manifest scanner. The file is now recognized as a manifest, license identifiers are extracted from its [project].license field using TOML parsing with regex fallback, and the extractor is integrated into the main manifest license dispatcher.

Changes

pyproject.toml License Extraction

| Layer / File(s) | Summary |
|---|---|
| Manifest file recognition (`src/fosslight_source/_scan_item.py`) | Extends the `_manifest_filename` regex pattern to recognize `pyproject.toml` files. |
| License extraction implementation (`src/fosslight_source/run_manifest_extractor.py`) | Introduces `get_licenses_from_pyproject_toml()`, which parses `[project].license` via `tomllib`/`tomli` (supporting both the string form and the table `text` form), includes a regex fallback for various license declaration syntaxes, and returns an empty list when `license.file` is specified. |
| License string helper (`src/fosslight_source/run_manifest_extractor.py`) | Adds the `_license_string_to_list()` helper to normalize a license string into a single-item list or an empty list. |
| Dispatcher integration (`src/fosslight_source/run_manifest_extractor.py`) | Updates `get_manifest_licenses()` to recognize `pyproject.toml` and route it to the new extractor with error handling. |
| Optional dependency (`pyproject.toml`) | Adds the conditional dependency `tomli; python_version < '3.11'` to support TOML parsing on older Python versions. |
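
The extractor's source is not reproduced in this thread, so the following is only a minimal sketch of the TOML-based path the walkthrough describes. The name `extract_project_license` is hypothetical (the PR's function is `get_licenses_from_pyproject_toml()`), and the sketch assumes either Python 3.11+ or an installed `tomli` backport. It omits the regex fallback.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the tomli backport is installed


def extract_project_license(toml_text: str) -> list:
    """Hypothetical helper: return the [project].license value as a list.

    Handles the string form (license = "MIT") and the legacy table form
    (license = { text = "MIT" }); returns [] for license = { file = ... }.
    """
    try:
        data = tomllib.loads(toml_text)
    except tomllib.TOMLDecodeError:
        # The real extractor reportedly falls back to regex; this sketch gives up.
        return []

    project = data.get("project")
    if not isinstance(project, dict):
        return []

    license_value = project.get("license")
    if isinstance(license_value, str):
        text = license_value.strip()
    elif isinstance(license_value, dict):
        raw = license_value.get("text")
        text = raw.strip() if isinstance(raw, str) else ""
    else:
        text = ""

    # Single-item list or empty list, as the helper row above describes.
    return [text] if text else []
```

Calling `extract_project_license('[project]\nlicense = "Apache-2.0"\n')` yields a one-element list, while a file that only sets `[tool.poetry] license` yields an empty list because the sketch reads the `[project]` table exclusively.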

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested reviewers

  • soimkim
  • dd-jy
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: adding license extraction functionality for pyproject.toml files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dev_pyproject_toml

Comment @coderabbitai help to get the list of available commands and usage tips.

@JustinWonjaePark JustinWonjaePark marked this pull request as draft May 15, 2026 10:58

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fosslight_source/run_manifest_extractor.py`:
- Around line 128-156: The get_licenses_from_pyproject_toml function falls back
to importing tomli for Python <3.11 but tomli is missing from dependencies; add
tomli to the project's pyproject.toml dependencies (e.g., add a conditional
dependency "tomli>=1.0.0; python_version < '3.11'") so the fallback import in
get_licenses_from_pyproject_toml works on Python 3.10 and similar runtimes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fd9017c8-eb3c-486d-82cb-5fb2be70bddd

📥 Commits

Reviewing files that changed from the base of the PR and between 27d43e0 and 2503149.

📒 Files selected for processing (2)
  • src/fosslight_source/_scan_item.py
  • src/fosslight_source/run_manifest_extractor.py

Comment thread src/fosslight_source/run_manifest_extractor.py
Signed-off-by: Wonjae Park <j.wonjae.park@gmail.com>
@JustinWonjaePark JustinWonjaePark requested review from dd-jy and soimkim May 15, 2026 11:16
@JustinWonjaePark JustinWonjaePark marked this pull request as ready for review May 15, 2026 11:16
@JustinWonjaePark
Contributor Author

| Case | Type | Text used for detection | Extracted result |
|---|---|---|---|
| 01_spdx_string_single | PEP 639 SPDX string | `license = "Apache-2.0"` | `['Apache-2.0']` |
| 02_spdx_string_or | PEP 639 SPDX string | `license = "MIT OR Apache-2.0"` | `['MIT', 'Apache-2.0']` |
| 03_spdx_string_and | PEP 639 SPDX string | `license = "BSD-3-Clause AND MIT"` | `['BSD-3-Clause', 'MIT']` |
| 04_spdx_string_with_exception | PEP 639 SPDX string | `license = "GPL-2.0-only WITH Classpath-exception-2.0"` | `['GPL-2.0-only WITH Classpath-exception-2.0']` |
| 05_pep621_text_inline | PEP 621 legacy text table | `license = { text = "MIT OR Apache-2.0" }` | `['MIT', 'Apache-2.0']` |
| 06_pep621_file_inline | PEP 621 legacy file table | `license = { file = "LICENSE" }` | `[]` |
| 07_triple_quoted_string | TOML triple-quoted string | `license = """Apache-2.0"""` | `['Apache-2.0']` |
| 08_missing_license | Missing license field | no license field under `[project]` | `[]` |
| 09_tool_poetry_license_only | Non-standard Poetry metadata | `[tool.poetry] license = "MIT"` | `[]` |

Contributor

soimkim commented May 19, 2026

@JustinWonjaePark, the license field can also contain values that are not in SPDX format.
For example, license = "GPL-2.0 or later" would likely be output as [GPL-2.0,later].
Please change the code so that it does not split on OR or AND.
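
The concern can be reproduced with a small sketch. The first revision's code is not shown in this thread, so the case-insensitive split below is an assumption about how such splitting might behave. Both function names are hypothetical.

```python
import re

# Assumption: operators are matched case-insensitively, which is what would
# turn the free-text value "GPL-2.0 or later" into two bogus entries.
_OPERATOR_SPLIT = re.compile(r"\s+(?:OR|AND)\s+", re.IGNORECASE)


def split_on_operators(expression: str) -> list:
    """Hypothetical splitting behavior: break the value at OR/AND."""
    return [part.strip() for part in _OPERATOR_SPLIT.split(expression) if part.strip()]


def keep_whole(expression: str) -> list:
    """Requested behavior: keep the declared value as one list item."""
    text = expression.strip()
    return [text] if text else []


print(split_on_operators("GPL-2.0 or later"))  # ['GPL-2.0', 'later']
print(keep_whole("GPL-2.0 or later"))          # ['GPL-2.0 or later']
```

Keeping the value whole avoids guessing whether a lowercase "or" is an SPDX operator or part of a free-text license name.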

Contributor Author

JustinWonjaePark commented May 20, 2026

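> @JustinWonjaePark, the license field can also contain values that are not in SPDX format. For example, license = "GPL-2.0 or later" would likely be output as [GPL-2.0,later]. Please change the code so that it does not split on OR or AND.

Done.
Because the same code was repeated several times, I added _license_string_to_list() :)
I changed my mind, though... I now simply convert the value to a list inside the return statement.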

| Case | Type | Text used for detection | Extracted result |
|---|---|---|---|
| 01_spdx_string_single | PEP 639 SPDX string | `license = "Apache-2.0"` | `['Apache-2.0']` |
| 02_spdx_string_or | PEP 639 SPDX string | `license = "MIT OR Apache-2.0"` | `['MIT OR Apache-2.0']` |
| 03_spdx_string_and | PEP 639 SPDX string | `license = "BSD-3-Clause AND MIT"` | `['BSD-3-Clause AND MIT']` |
| 04_spdx_string_with_exception | PEP 639 SPDX string | `license = "GPL-2.0-only WITH Classpath-exception-2.0"` | `['GPL-2.0-only WITH Classpath-exception-2.0']` |
| 05_pep621_text_inline | PEP 621 legacy text table | `license = { text = "MIT OR Apache-2.0" }` | `['MIT OR Apache-2.0']` |
| 06_pep621_file_inline | PEP 621 legacy file table | `license = { file = "LICENSE" }` | `[]` |
| 07_triple_quoted_string | TOML triple-quoted string | `license = """Apache-2.0"""` | `['Apache-2.0']` |
| 08_missing_license | Missing license field | no license field under `[project]` | `[]` |
| 09_tool_poetry_license_only | Non-standard Poetry metadata | `[tool.poetry] license = "MIT"` | `[]` |
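
The regex fallback mentioned in the walkthrough is not shown in this thread. As a rough illustration only, a fallback that reproduces the results in the table above might look like the sketch below. The patterns and the name `extract_license_with_regex` are hypothetical, and the sketch covers only the inline syntaxes listed in the table (not, for example, a separate `[project.license]` table).

```python
import re

# Body of the [project] table: everything up to the next line that starts
# with "[" (another table header) or the end of the file.
_PROJECT_SECTION = re.compile(
    r"^\[project\][ \t]*$(.*?)(?=^\[|\Z)", re.MULTILINE | re.DOTALL
)

# license = "..."   |   license = """..."""   |   license = { text = "..." }
# "license = { file = ... }" does not match, so it yields no license.
_LICENSE_VALUE = re.compile(
    r"^[ \t]*license[ \t]*=[ \t]*"
    r"(?:\{[ \t]*text[ \t]*=[ \t]*)?"
    r"(?:\"\"\"(?P<triple>.*?)\"\"\"|\"(?P<basic>[^\"\n]*)\"|'(?P<literal>[^'\n]*)')",
    re.MULTILINE | re.DOTALL,
)


def extract_license_with_regex(toml_text: str) -> list:
    """Hypothetical fallback: find the license declared under [project]."""
    section = _PROJECT_SECTION.search(toml_text)
    if section is None:
        return []
    match = _LICENSE_VALUE.search(section.group(1))
    if match is None:
        return []
    value = match.group("triple") or match.group("basic") or match.group("literal") or ""
    text = value.strip()
    # No splitting on OR/AND: the declared value is kept as a single item.
    return [text] if text else []
```

Restricting the search to the `[project]` section is what keeps a `[tool.poetry]` license from being picked up, as in case 09.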

Signed-off-by: Wonjae Park <j.wonjae.park@gmail.com>
@JustinWonjaePark JustinWonjaePark merged commit 9ae8f45 into main May 22, 2026
8 checks passed
@JustinWonjaePark JustinWonjaePark deleted the dev_pyproject_toml branch May 22, 2026 03:01
Labels

enhancement [PR/Issue] New feature or request

3 participants