Description
I have been analyzing the `vulnerablecode/importers` module to understand how data is fetched.
I noticed that several importers make HTTP requests using the default library User-Agent (e.g., the `requests` or `aiohttp` defaults).
The Problem
Using default User-Agents often triggers bot-protection mechanisms on upstream sources (like GitHub, GitLab, or corporate security feeds). This leads to:
- 403 Forbidden errors (blocking the importer).
- No consistent identification of our traffic, so upstream maintainers cannot tell who is calling their services (good citizenship).
Proposed Solution
I propose standardizing the identity of the scraper across the codebase.
- Define a constant: add a standard `USER_AGENT` string in `vulnerablecode/settings.py` (e.g., `VulnerableCode/1.0 (+https://github.com/aboutcode-org/vulnerablecode)`).
- Update importers: audit the `importers/` directory and ensure that `requests.get()` or `ClientSession()` calls explicitly include this header.
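As a rough sketch of what the standardization could look like, the snippet below defines the constant plus a small helper that importers can call to build request headers. The constant name `USER_AGENT`, the helper name `default_headers`, and the version string are all assumptions for illustration, not existing VulnerableCode code.

```python
# Hypothetical addition to vulnerablecode/settings.py (names are illustrative).
USER_AGENT = "VulnerableCode/1.0 (+https://github.com/aboutcode-org/vulnerablecode)"


def default_headers(extra=None):
    """Build the headers every importer request should carry.

    Always includes the project User-Agent; callers can merge in
    additional headers (e.g., Accept) via `extra`.
    """
    headers = {"User-Agent": USER_AGENT}
    if extra:
        headers.update(extra)
    return headers
```

Importers could then pass these headers explicitly, e.g. `requests.get(url, headers=default_headers())` for synchronous code, or `aiohttp.ClientSession(headers=default_headers())` so every request made through the session carries the identity by default.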
Files Involved
- `vulnerablecode/settings.py`
- `vulnerablecode/importers/*.py` (specifically searching for network calls)
Request
Could you please assign this issue to me? I can perform the audit and standardize the headers to improve the reliability of the scrapers.