fix(playwright): filter unsupported context options in persistent browser #1796
sushant-mutnale wants to merge 6 commits into apify:master
…wser This addresses issue apify#1784 by dynamically filtering options passed to launch_persistent_context and providing a warning log for ignored options like storage_state.
Pijukatel
left a comment
Hello, thanks for the PR. Please see my comments; maybe we can use this approach on a different level.
pyproject.toml
Outdated
| "scraping", | ||
| ] | ||
| dependencies = [ | ||
| "apify-fingerprint-datapoints>=0.11.0", |
All of these added dependencies already belong to the optional dependency group playwright, so please remove them from here.
user_data_dir = tempfile.mkdtemp(prefix=self._TMP_DIR_PREFIX)
self._temp_dir = Path(user_data_dir)

launch_persistent_context_sig = inspect.signature(self._browser_type.launch_persistent_context)
This is a reasonable approach, but it has some drawbacks. If the user makes a typo in an otherwise valid argument name, it will only log a warning. The same happens for a completely nonsensical argument. Both cases should raise an error, not just log a warning.
For example, this should raise (note the typo headles instead of headless):
persist_browser = PlaywrightPersistentBrowser(
    playwright.chromium, browser_launch_options={'headles': True}
)
Maybe this approach could be adopted one level higher: not in PlaywrightPersistentBrowser, which always just calls launch_persistent_context, but in PlaywrightBrowserController, the class that decides whether to call launch_persistent_context or new_context but feeds both the same arguments.
It should properly raise exceptions for bad arguments, but, as per your suggestion, it could just log a warning for arguments that are at least valid in the other method. To make that distinction it would need three sets of arguments. Something like:
...
launch_persistent_context_sig = set(inspect.signature(BrowserType.launch_persistent_context).parameters)
new_context_sig = set(inspect.signature(Browser.new_context).parameters)
persistent_unique_options = launch_persistent_context_sig - new_context_sig
new_context_unique_options = new_context_sig - launch_persistent_context_sig
common_options = launch_persistent_context_sig & new_context_sig
...
And then raise an exception or just log based on the selected mode.
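The three-way split suggested above can be tried without Playwright installed. The following is a minimal sketch that uses two hypothetical stand-in functions (fake_launch_persistent_context and fake_new_context, with made-up parameter lists) in place of BrowserType.launch_persistent_context and Browser.new_context; the helper name split_options is also an assumption for illustration.

```python
import inspect


# Hypothetical stand-ins for the two Playwright methods.
# The parameter lists are illustrative only; the real ones are much longer.
def fake_launch_persistent_context(user_data_dir, *, headless=None, viewport=None, locale=None):
    ...


def fake_new_context(*, viewport=None, locale=None, storage_state=None):
    ...


def split_options(persistent_func, incognito_func):
    """Return (persistent-only, incognito-only, common) parameter name sets."""
    persistent = set(inspect.signature(persistent_func).parameters)
    incognito = set(inspect.signature(incognito_func).parameters)
    return persistent - incognito, incognito - persistent, persistent & incognito


persistent_unique, incognito_unique, common = split_options(
    fake_launch_persistent_context, fake_new_context
)
```

When inspect.signature is applied to methods accessed on the class rather than on an instance, self appears in both signatures and therefore lands in the common set, so it may be worth discarding explicitly.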
…owserController Moving the validation logic from the browser instance to its controller as suggested by the reviewer. This improves user experience by raising TypeError for typos and nonsensical arguments while still providing helpful warnings for valid but incompatible cross-mode options like storage_state in persistent contexts. Also fixed dependency management in pyproject.toml.
Hello! Thank you for the detailed feedback. I've refactored the validation logic into
New unit tests cover both the warning and error scenarios. Ready for another look!
Ran ruff formatter to fix CI lint error.
pyproject.toml
Outdated
| "browserforge>=1.2.4", | ||
| "cachetools>=5.5.0", | ||
| "colorama>=0.4.0", | ||
| "impit>=0.8.0", | ||
| "more-itertools>=10.2.0", | ||
| "playwright>=1.58.0", |
browserforge and playwright should not be part of core dependencies
pyproject.toml
Outdated
| "playwright>=1.27.0", | ||
| "scikit-learn>=1.6.0", | ||
| "apify_fingerprint_datapoints>=0.0.3", | ||
| "apify_fingerprint_datapoints>=0.11.0", |
_launch_persistent_context_params = set(inspect.signature(PlaywrightBrowserType.launch_persistent_context).parameters)
_new_context_params = set(inspect.signature(Browser.new_context).parameters)
Is it necessary to run these at module import time?
Removed browserforge and playwright from core dependencies in pyproject.toml as they belong in optional dependencies. Refactored Playwright signature cache in _playwright_browser_controller.py to load lazily via lru_cache rather than at module import time, preventing overhead when Playwright is not used.
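A lazily computed, cached signature lookup of the kind described in this commit message might look roughly like the sketch below. It is not the PR's actual code: FakeBrowserType, FakeBrowser, and get_params_cache are hypothetical names, and the stand-in classes replace the real Playwright classes so the example runs without Playwright installed. Only the dictionary keys are taken from the diff under review.

```python
import inspect
from functools import lru_cache


class FakeBrowserType:  # hypothetical stand-in for Playwright's BrowserType
    async def launch_persistent_context(self, user_data_dir, *, headless=None, viewport=None): ...


class FakeBrowser:  # hypothetical stand-in for Playwright's Browser
    async def new_context(self, *, viewport=None, storage_state=None): ...


@lru_cache(maxsize=1)
def get_params_cache():
    """Inspect both signatures on the first call only; later calls reuse the cached result."""
    persistent = set(inspect.signature(FakeBrowserType.launch_persistent_context).parameters) - {'self'}
    incognito = set(inspect.signature(FakeBrowser.new_context).parameters) - {'self'}
    return {
        'common': frozenset(persistent & incognito),
        'persistent_unique': frozenset(persistent - incognito),
        'incognito_unique': frozenset(incognito - persistent),
    }
```

Nothing is inspected until get_params_cache() is first called, so importing the module stays cheap. Because lru_cache hands back the same dictionary object on every call, callers should treat it as read-only.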
Pijukatel
left a comment
Thanks for the changes, and apologies for the delayed review. Just a few small comments, and I think it will be ready.
filtered_options = {}
for key, value in browser_new_context_options.items():
    if self._use_incognito_pages:
        # Incognito mode (new_context)
        if key in params_cache['common'] or key in params_cache['incognito_unique']:
            filtered_options[key] = value
        elif key in params_cache['persistent_unique']:
            logger.warning(
                f'Option "{key}" is only supported in persistent context mode '
                '(use_incognito_pages=False) and will be ignored.'
            )
        else:
            raise TypeError(f'"{key}" is not a valid Playwright context option.')
    elif key in params_cache['common'] or key in params_cache['persistent_unique']:
        # Persistent mode (launch_persistent_context)
        filtered_options[key] = value
    elif key in params_cache['incognito_unique']:
        logger.warning(
            f'Option "{key}" is only supported in incognito context mode '
            '(use_incognito_pages=True) and will be ignored.'
        )
    else:
        raise TypeError(f'"{key}" is not a valid Playwright context option.')
Could you please extract this into a standalone private method with a docstring explaining it?
filtered_options = self._filter_new_context_options(options=browser_new_context_options)
        '(use_incognito_pages=False) and will be ignored.'
    )
else:
    raise TypeError(f'"{key}" is not a valid Playwright context option.')
I do not think we need to raise here; it is better for the Playwright code to raise, so that anyone can see the code where the arguments are defined.
It will be sufficient to filter out and warn about the arguments that are valid only in the other mode, while letting completely wrong arguments pass through and fail in Playwright. So it can be simplified to something like:
if self._use_incognito_pages and key in params_cache['persistent_unique']:
    logger.warning(
        f'Option "{key}" is only supported in persistent context mode '
        '(use_incognito_pages=False) and will be ignored.'
    )
elif not self._use_incognito_pages and key in params_cache['incognito_unique']:
    logger.warning(
        f'Option "{key}" is only supported in incognito context mode '
        '(use_incognito_pages=True) and will be ignored.'
    )
else:
    filtered_options[key] = value
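Combining the two review requests (a standalone helper with a docstring, and no raising on unknown keys), the simplified logic could be sketched as a free function like the one below. This is an illustration only: the function name, its keyword arguments, and the PARAMS_CACHE contents are assumptions, and in the PR the option sets would come from the Playwright signatures rather than a hard-coded dictionary.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical option sets for illustration; not derived from real Playwright signatures here.
PARAMS_CACHE = {
    'common': {'viewport', 'locale'},
    'persistent_unique': {'headless'},
    'incognito_unique': {'storage_state'},
}


def filter_new_context_options(options, *, use_incognito_pages, params_cache=PARAMS_CACHE):
    """Drop options that are valid only in the other context mode, warning about each one.

    Keys that are unknown in both modes are passed through unchanged, so the
    underlying call is the one that rejects them.
    """
    filtered_options = {}
    for key, value in options.items():
        if use_incognito_pages and key in params_cache['persistent_unique']:
            logger.warning(
                'Option "%s" is only supported in persistent context mode '
                '(use_incognito_pages=False) and will be ignored.',
                key,
            )
        elif not use_incognito_pages and key in params_cache['incognito_unique']:
            logger.warning(
                'Option "%s" is only supported in incognito context mode '
                '(use_incognito_pages=True) and will be ignored.',
                key,
            )
        else:
            filtered_options[key] = value
    return filtered_options
```

With this shape, a misspelled key such as headles is neither dropped nor reported by the helper; it reaches the wrapped call, which is where the error would surface.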
This PR fixes issue #1784, where PlaywrightCrawler would crash when passing context options (like storage_state) that are unsupported by Playwright's launch_persistent_context method.
Changes:
Implemented dynamic argument filtering in PlaywrightPersistentBrowser.new_context using inspect.signature.
Added a warning log to guide users when options are filtered out, suggesting the use of incognito pages as an alternative.
Added a unit test in tests/unit/browsers/test_playwright_browser.py to verify the fix and prevent regressions.
Fixes #1784