Skip to content

fix: Handle invalid URLs in RequestList#1803

Merged
Pijukatel merged 2 commits intoapify:masterfrom
Mantisus:handle-invalid-url
Mar 23, 2026
Merged

fix: Handle invalid URLs in RequestList#1803
Pijukatel merged 2 commits intoapify:masterfrom
Mantisus:handle-invalid-url

Conversation

@Mantisus
Copy link
Collaborator

@Mantisus Mantisus commented Mar 19, 2026

Description

  • Handle invalid URLs in RequestList.

Issues

Testing

  • Add new tests for `RequestList

@Mantisus Mantisus changed the title fix: Handle invalid URL in RequestList fix: Handle invalid URLs in RequestList Mar 19, 2026
@Mantisus Mantisus requested review from Pijukatel and janbuchar March 19, 2026 15:06
@Mantisus Mantisus self-assigned this Mar 19, 2026

async def test_handle_invalid_url() -> None:
"""Test that invalid URLs are handled gracefully."""
request_list = RequestList(['invalid-url.com', 'https://valid.placeholder.com'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not running the validation even earlier?

Is there any point why should this RequestList(['invalid-url.com']) be even allowed?

Maybe we could warn/raise already here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RequestList can pull the URLs from any iterable, so validation just before returning "next request" is non-negotiable. We can validate in the constructor as well though, I guess.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a very long list, and if we iterate fully in the constructor just for validation, it feels like an overhead. However, if we wrap it in a generator that performs the validation, that would essentially be the current solution. This is because, with persistence disabled, validation occurs the moment a new URL is received from the iterator. And when persistence is enabled, we iterate through all URLs before saving in the storage.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with this explanation, thanks

Copy link
Collaborator

@janbuchar janbuchar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but what @Pijukatel says is legit

@Pijukatel Pijukatel merged commit 0b2e3fc into apify:master Mar 23, 2026
58 of 59 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RequestList with persistence doesn't handle malformed URLs correctly

4 participants