[Improvement] Standardize 'User-Agent' headers across Importers to prevent blocking #2122

@Kiran95021

Description

I have been analyzing the vulnerablecode/importers module to understand how data is fetched.
I noticed that several importers make HTTP requests with the HTTP library's default User-Agent (for example, python-requests/<version> for requests, or Python/<version> aiohttp/<version> for aiohttp).

The Problem

Using default User-Agents often triggers bot-protection mechanisms on upstream sources (like GitHub, GitLab, or corporate security feeds). This leads to:

  1. 403 Forbidden errors (blocking the importer).
  2. Upstream maintainers cannot consistently identify our traffic or contact us about it, which is poor citizenship for a scraper.

Proposed Solution

I propose standardizing the identity of the scraper across the codebase.

  1. Define a Constant: Add a standard USER_AGENT string in vulnerablecode/settings.py (e.g., VulnerableCode/1.0 (+https://github.com/aboutcode-org/vulnerablecode)).
  2. Update Importers: Audit the importers/ directory and ensure that requests.get() or ClientSession() calls explicitly include this header.
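
To make the proposal concrete, here is a minimal sketch of what the shared constant and a small helper could look like. The names build_headers and fetch are hypothetical (they are not existing VulnerableCode functions), and the final location of USER_AGENT is up to the maintainers.

```python
# Hypothetical constant; the proposal is to define it in vulnerablecode/settings.py.
USER_AGENT = "VulnerableCode/1.0 (+https://github.com/aboutcode-org/vulnerablecode)"


def build_headers(extra=None):
    """Return request headers that always carry the shared User-Agent.

    Hypothetical helper: caller-supplied headers are merged in, but any
    User-Agent they contain is ignored so the identity stays consistent.
    """
    headers = {"User-Agent": USER_AGENT}
    if extra:
        headers.update(
            {key: value for key, value in extra.items() if key.lower() != "user-agent"}
        )
    return headers


def fetch(url, timeout=30, **kwargs):
    """Hypothetical wrapper around requests.get() that sets the shared header."""
    import requests  # imported lazily so this module loads without requests installed

    headers = build_headers(kwargs.pop("headers", None))
    response = requests.get(url, headers=headers, timeout=timeout, **kwargs)
    response.raise_for_status()
    return response
```

Importers that use aiohttp could pass the same dictionary when creating the session, e.g. aiohttp.ClientSession(headers=build_headers()), so every request made through that session carries the header.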

Files Involved

  • vulnerablecode/settings.py
  • vulnerablecode/importers/*.py (specifically searching for network calls)
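
As a starting point for the audit, a rough sketch like the one below could list candidate call sites. It is a plain text search, not a real parser, so it will miss calls made through aliases or wrappers. The function names and the regular expression are my own assumptions.

```python
import re
from pathlib import Path

# Matches requests.get/post/head(...) and ClientSession(...), with or without
# the "aiohttp." prefix. This is a heuristic and will not catch aliased imports.
CALL_PATTERN = re.compile(
    r"\b(requests\.(?:get|post|head)|(?:aiohttp\.)?ClientSession)\s*\("
)


def find_network_calls(source):
    """Return (line_number, stripped_line) pairs for lines with a matching call."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if CALL_PATTERN.search(line)
    ]


def audit(directory):
    """Print every candidate call site in the .py files directly under directory."""
    for path in sorted(Path(directory).glob("*.py")):
        for number, line in find_network_calls(path.read_text(encoding="utf-8")):
            print(f"{path}:{number}: {line}")
```

Running audit("vulnerablecode/importers") from the repository root would print one line per candidate, which can then be checked by hand for a missing headers argument.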

Request

Could you please assign this issue to me? I can perform the audit and standardize the headers to improve the reliability of the scrapers.
