fix(playwright): filter unsupported context options in persistent browser #1796

Open
sushant-mutnale wants to merge 6 commits into apify:master from sushant-mutnale:fix/playwright-context-options

@sushant-mutnale

This PR fixes issue #1784, where PlaywrightCrawler would crash when passing context options (like storage_state) that are unsupported by Playwright's launch_persistent_context method.

Changes:

Implemented dynamic argument filtering in PlaywrightPersistentBrowser.new_context using inspect.signature.
Added a warning log to guide users when options are filtered out, suggesting the use of incognito pages as an alternative.
Added a unit test in tests/unit/browsers/test_playwright_browser.py to verify the fix and prevent regressions.

Fixes #1784
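
For readers following along, the filtering idea in the first change can be sketched roughly as below. This is only an illustration, not the code in this PR: `launch_persistent_context_stub` is a hypothetical stand-in with a made-up, much smaller signature than Playwright's real method, and `filter_supported_kwargs` is a hypothetical helper name.

```python
import inspect
import logging

logger = logging.getLogger(__name__)


def launch_persistent_context_stub(user_data_dir, *, headless=True, viewport=None):
    """Hypothetical stand-in for Playwright's launch_persistent_context (assumed signature)."""
    return {'user_data_dir': user_data_dir, 'headless': headless, 'viewport': viewport}


def filter_supported_kwargs(func, options):
    """Keep only the options that `func` accepts and log a warning for the rest."""
    supported = set(inspect.signature(func).parameters)
    filtered = {}
    for key, value in options.items():
        if key in supported:
            filtered[key] = value
        else:
            logger.warning('Option "%s" is not supported by %s and will be ignored.', key, func.__name__)
    return filtered


# `storage_state` is not a parameter of the stub, so it is dropped with a warning.
options = {'headless': False, 'storage_state': 'state.json'}
filtered = filter_supported_kwargs(launch_persistent_context_stub, options)
context = launch_persistent_context_stub('/tmp/profile', **filtered)
```

Note that this version treats a misspelled option the same way as an unsupported one: both are silently dropped apart from the log message.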

…wser

This addresses issue apify#1784 by dynamically filtering options passed to launch_persistent_context and providing a warning log for ignored options like storage_state.
@janbuchar janbuchar requested a review from Pijukatel March 16, 2026 09:11
Collaborator

@Pijukatel Pijukatel left a comment

Hello, thanks for the PR. Please see my comments; maybe we can use this approach on a different level.

pyproject.toml Outdated
"scraping",
]
dependencies = [
"apify-fingerprint-datapoints>=0.11.0",
Collaborator

We have all these added dependencies in the optional dependencies group playwright. So please remove them from here.

user_data_dir = tempfile.mkdtemp(prefix=self._TMP_DIR_PREFIX)
self._temp_dir = Path(user_data_dir)

launch_persistent_context_sig = inspect.signature(self._browser_type.launch_persistent_context)
Collaborator

This is a reasonable approach, but it has some drawbacks. If the user simply makes a typo in an otherwise valid argument name, it will only show a warning in the log. The same goes for a completely nonsensical argument. Both cases should raise an error, not just log a warning.

For example, this should raise (typo in headles):

    persist_browser = PlaywrightPersistentBrowser(
        playwright.chromium, browser_launch_options={'headles': True}
    )

Maybe this approach could be adopted one level higher: not in PlaywrightPersistentBrowser, which always just calls launch_persistent_context, but in PlaywrightBrowserController, the class that decides whether to call launch_persistent_context or new_context but feeds them the same arguments.

It should properly raise exceptions for bad arguments, but it could just log a warning, as per your suggestion, for arguments that are at least valid in the other method. It would need three sets of arguments to make that distinction. Something like:

...
    launch_persistent_context_sig = set(inspect.signature(BrowserType.launch_persistent_context).parameters)
    new_context_sig = set(inspect.signature(Browser.new_context).parameters)
    persistent_unique_options = launch_persistent_context_sig - new_context_sig
    new_context_unique_options = new_context_sig - launch_persistent_context_sig
    common_options = launch_persistent_context_sig & new_context_sig
...

And then raise an exception or just log based on the selected mode.
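
A rough sketch of how that distinction might be wired up follows. It is an illustration under assumptions, not the controller code: the two `*_stub` functions are hypothetical stand-ins with invented, reduced signatures, and `check_options` is a hypothetical helper name.

```python
import inspect
import logging

logger = logging.getLogger(__name__)


def launch_persistent_context_stub(user_data_dir, *, viewport=None, headless=True):
    """Hypothetical stand-in for BrowserType.launch_persistent_context."""


def new_context_stub(*, viewport=None, storage_state=None):
    """Hypothetical stand-in for Browser.new_context."""


PERSISTENT = set(inspect.signature(launch_persistent_context_stub).parameters)
INCOGNITO = set(inspect.signature(new_context_stub).parameters)


def check_options(options, *, use_incognito_pages):
    """Keep valid options, warn on options valid only in the other mode, raise on unknown ones."""
    accepted = INCOGNITO if use_incognito_pages else PERSISTENT
    other_only = (PERSISTENT - INCOGNITO) if use_incognito_pages else (INCOGNITO - PERSISTENT)
    result = {}
    for key, value in options.items():
        if key in accepted:
            result[key] = value
        elif key in other_only:
            logger.warning('Option "%s" is only valid in the other context mode and will be ignored.', key)
        else:
            # Typos and nonsensical names end up here.
            raise TypeError(f'"{key}" is not a valid context option.')
    return result


kept = check_options(
    {'viewport': {'width': 800, 'height': 600}, 'storage_state': 'state.json'},
    use_incognito_pages=False,
)
```

With these stand-in signatures, `storage_state` is dropped with a warning in persistent mode, while a typo such as `headles` raises `TypeError` in either mode.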

…owserController

Moving the validation logic from the browser instance to its controller as suggested by the reviewer. This improves user experience by raising TypeError for typos and nonsensical arguments while still providing helpful warnings for valid but incompatible cross-mode options like storage_state in persistent contexts. Also fixed dependency management in pyproject.toml.
@sushant-mutnale
Author

Hello! Thank you for the detailed feedback. I've refactored the validation logic into PlaywrightBrowserController using the suggested three-set approach with cached signatures. I also moved the dependencies back to the optional group in pyproject.toml.

New unit tests cover both the warning and error scenarios. Ready for another look!

pyproject.toml Outdated
Comment on lines +38 to +43
"browserforge>=1.2.4",
"cachetools>=5.5.0",
"colorama>=0.4.0",
"impit>=0.8.0",
"more-itertools>=10.2.0",
"playwright>=1.58.0",
Collaborator

browserforge and playwright should not be part of core dependencies

pyproject.toml Outdated
"playwright>=1.27.0",
"scikit-learn>=1.6.0",
"apify_fingerprint_datapoints>=0.0.3",
"apify_fingerprint_datapoints>=0.11.0",
Collaborator

why?

Comment on lines +33 to +34
_launch_persistent_context_params = set(inspect.signature(PlaywrightBrowserType.launch_persistent_context).parameters)
_new_context_params = set(inspect.signature(Browser.new_context).parameters)
Collaborator

Is it necessary to run these at the import time of the module?

Removed browserforge and playwright from core dependencies in pyproject.toml as they belong in optional dependencies. Refactored Playwright signature cache in _playwright_browser_controller.py to load lazily via lru_cache rather than at module import time, preventing overhead when Playwright is not used.
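
The lazy-loading change described in this commit could look roughly like the sketch below. It is an assumption-laden illustration rather than the actual contents of _playwright_browser_controller.py: the stub functions and their signatures are invented, and `get_context_params` is a hypothetical name. Only the dictionary keys mirror the `params_cache` keys visible in the diff.

```python
import inspect
from functools import lru_cache


def launch_persistent_context_stub(user_data_dir, *, viewport=None, headless=True):
    """Hypothetical stand-in for BrowserType.launch_persistent_context."""


def new_context_stub(*, viewport=None, storage_state=None):
    """Hypothetical stand-in for Browser.new_context."""


@lru_cache(maxsize=1)
def get_context_params():
    """Inspect both signatures on first call only; later calls reuse the cached result."""
    persistent = frozenset(inspect.signature(launch_persistent_context_stub).parameters)
    incognito = frozenset(inspect.signature(new_context_stub).parameters)
    return {
        'common': persistent & incognito,
        'persistent_unique': persistent - incognito,
        'incognito_unique': incognito - persistent,
    }


# Nothing is inspected at import time; the work happens on the first call.
params_cache = get_context_params()
```

Because the function takes no arguments, `lru_cache` effectively turns it into a compute-once accessor, so importing the module costs nothing when the browser controller is never used.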
Collaborator

@Pijukatel Pijukatel left a comment

Thanks for the changes, and apologies for the delayed review. Just a few small comments, and I think it will be ready.

Comment on lines +243 to +265
filtered_options = {}
for key, value in browser_new_context_options.items():
    if self._use_incognito_pages:
        # Incognito mode (new_context)
        if key in params_cache['common'] or key in params_cache['incognito_unique']:
            filtered_options[key] = value
        elif key in params_cache['persistent_unique']:
            logger.warning(
                f'Option "{key}" is only supported in persistent context mode '
                '(use_incognito_pages=False) and will be ignored.'
            )
        else:
            raise TypeError(f'"{key}" is not a valid Playwright context option.')
    elif key in params_cache['common'] or key in params_cache['persistent_unique']:
        # Persistent mode (launch_persistent_context)
        filtered_options[key] = value
    elif key in params_cache['incognito_unique']:
        logger.warning(
            f'Option "{key}" is only supported in incognito context mode '
            '(use_incognito_pages=True) and will be ignored.'
        )
    else:
        raise TypeError(f'"{key}" is not a valid Playwright context option.')
Collaborator

Could you please extract this into a standalone private method with a docstring explaining it? For example:
filtered_options = self._filter_new_context_options(options=browser_new_context_options)

'(use_incognito_pages=False) and will be ignored.'
)
else:
raise TypeError(f'"{key}" is not a valid Playwright context option.')
Collaborator

I do not think we need to raise here; it is better for the Playwright code to raise, so that anyone can see the code where the arguments are defined.

It will be sufficient to filter out the arguments that are valid only in the other mode and warn about those, while letting completely wrong arguments pass through and fail in Playwright. So it can be simplified to something like:

if self._use_incognito_pages and key in params_cache['persistent_unique']:
    logger.warning(
        f'Option "{key}" is only supported in persistent context mode '
        '(use_incognito_pages=False) and will be ignored.'
    )
elif not self._use_incognito_pages and key in params_cache['incognito_unique']:
    logger.warning(
        f'Option "{key}" is only supported in incognito context mode '
        '(use_incognito_pages=True) and will be ignored.'
    )
else:
    filtered_options[key] = value
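
Combining this simplification with the earlier request for a documented standalone helper, one possible shape is sketched below. It is written as a free function so it runs on its own; in the PR it would presumably be a private method such as `_filter_new_context_options`. The `PARAMS` sets are made-up examples, not Playwright's real parameter lists.

```python
import logging

logger = logging.getLogger(__name__)


def filter_new_context_options(options, *, use_incognito_pages, params_cache):
    """Drop options that are valid only in the other context mode.

    Options unique to the other mode are removed with a warning. Everything
    else, including unknown or misspelled names, is passed through unchanged
    so that the underlying call can raise its own error.
    """
    filtered = {}
    for key, value in options.items():
        if use_incognito_pages and key in params_cache['persistent_unique']:
            logger.warning(
                f'Option "{key}" is only supported in persistent context mode '
                '(use_incognito_pages=False) and will be ignored.'
            )
        elif not use_incognito_pages and key in params_cache['incognito_unique']:
            logger.warning(
                f'Option "{key}" is only supported in incognito context mode '
                '(use_incognito_pages=True) and will be ignored.'
            )
        else:
            filtered[key] = value
    return filtered


# Hypothetical parameter sets, for illustration only.
PARAMS = {
    'common': {'viewport'},
    'persistent_unique': {'headless'},
    'incognito_unique': {'storage_state'},
}

result = filter_new_context_options(
    {'storage_state': 'state.json', 'headles': True, 'viewport': None},
    use_incognito_pages=False,
    params_cache=PARAMS,
)
```

Here the misspelled `headles` survives the filter on purpose, so the subsequent Playwright call would be the one to reject it.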

Development

Successfully merging this pull request may close these issues.

PlaywrightCrawler __init__ method browser_new_context_options argument does not function

5 participants