"""Opens up a browser and do your request based on your chosen options below.
:param url: Target url.
- :param headless: Run the browser in headless/hidden (default), or headful/visible mode.
- :param disable_resources: Drop requests of unnecessary resources for a speed boost. It depends, but it made requests ~25% faster in my tests for some websites.
-     Requests dropped are of type `font`, `image`, `media`, `beacon`, `object`, `imageset`, `texttrack`, `websocket`, `csp_report`, and `stylesheet`.
-     This can help save your proxy usage but be careful with this option as it makes some websites never finish loading.
- :param useragent: Pass a useragent string to be used. Otherwise the fetcher will generate a real Useragent of the same browser and use it.
- :param cookies: Set cookies for the next request.
- :param network_idle: Wait for the page until there are no network connections for at least 500 ms.
- :param load_dom: Enabled by default, wait for all JavaScript on page(s) to fully load and execute.
- :param timeout: The timeout in milliseconds that is used in all operations and waits through the page. The default is 30,000
- :param wait: The time (milliseconds) the fetcher will wait after everything finishes before closing the page and returning the `Response` object.
- :param page_action: Added for automation. A function that takes the `page` object and does the automation you need.
- :param wait_selector: Wait for a specific CSS selector to be in a specific state.
- :param init_script: An absolute path to a JavaScript file to be executed on page creation with this request.
- :param locale: Set the locale for the browser if wanted. The default value is `en-US`.
- :param wait_selector_state: The state to wait for the selector given with `wait_selector`. The default state is `attached`.
- :param stealth: Enables stealth mode, check the documentation to see what stealth mode does currently.
- :param real_chrome: If you have a Chrome browser installed on your device, enable this, and the Fetcher will launch an instance of your browser and use it.
- :param hide_canvas: Add random noise to canvas operations to prevent fingerprinting.
- :param disable_webgl: Disables WebGL and WebGL 2.0 support entirely.
- :param cdp_url: Instead of launching a new browser instance, connect to this CDP URL to control real browsers through CDP.
- :param google_search: Enabled by default, Scrapling will set the referer header to be as if this request came from a Google search of this website's domain name.
- :param extra_headers: A dictionary of extra headers to add to the request. _The referer set by the `google_search` argument takes priority over the referer set here if used together._
- :param proxy: The proxy to be used with requests, it can be a string or a dictionary with the keys 'server', 'username', and 'password' only.
- :param extra_flags: A list of additional browser flags to pass to the browser on launch.
- :param custom_config: A dictionary of custom parser arguments to use with this request. Any argument passed will override any class parameters values.
- :param additional_args: Additional arguments to be passed to Playwright's context as additional settings, and it takes higher priority than Scrapling's settings.
+ - headless: Run the browser in headless/hidden (default), or headful/visible mode.
+ - disable_resources: Drop requests of unnecessary resources for a speed boost.
+ - useragent: Pass a useragent string to be used. Otherwise the fetcher will generate a real Useragent of the same browser and use it.
+ - cookies: Set cookies for the next request.
+ - network_idle: Wait for the page until there are no network connections for at least 500 ms.
+ - load_dom: Enabled by default, wait for all JavaScript on page(s) to fully load and execute.
+ - timeout: The timeout in milliseconds that is used in all operations and waits through the page. The default is 30,000
+ - wait: The time (milliseconds) the fetcher will wait after everything finishes before closing the page and returning the Response object.
+ - page_action: Added for automation. A function that takes the `page` object and does the automation you need.
+ - wait_selector: Wait for a specific CSS selector to be in a specific state.
+ - init_script: An absolute path to a JavaScript file to be executed on page creation with this request.
+ - locale: Set the locale for the browser if wanted. The default value is `en-US`.
+ - wait_selector_state: The state to wait for the selector given with `wait_selector`. The default state is `attached`.
+ - stealth: Enables stealth mode, check the documentation to see what stealth mode does currently.
+ - real_chrome: If you have a Chrome browser installed on your device, enable this, and the Fetcher will launch an instance of your browser and use it.
+ - hide_canvas: Add random noise to canvas operations to prevent fingerprinting.
+ - disable_webgl: Disables WebGL and WebGL 2.0 support entirely.
+ - cdp_url: Instead of launching a new browser instance, connect to this CDP URL to control real browsers through CDP.
+ - google_search: Enabled by default, Scrapling will set the referer header to be as if this request came from a Google search of this website's domain name.
+ - extra_headers: A dictionary of extra headers to add to the request.
+ - proxy: The proxy to be used with requests, it can be a string or a dictionary with the keys 'server', 'username', and 'password' only.
+ - extra_flags: A list of additional browser flags to pass to the browser on launch.
+ - selector_config: The arguments that will be passed in the end while creating the final Selector's class.
+ - additional_args: Additional arguments to be passed to Playwright's context as additional settings.
:return: A `Response` object.
"""
- if not custom_config:
-     custom_config = {}
- elif not isinstance(custom_config, dict):
-     raise ValueError(f"The custom parser config must be of type dictionary, got {cls.__class__}")
+ # Get selector_config from kwargs if provided, otherwise use empty dict
+ selector_config = kwargs.get("selector_config", {})
+ if not isinstance(selector_config, dict):
+     raise TypeError("Argument `selector_config` must be a dictionary.")
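
The change above swaps an explicit `custom_config` argument for a `selector_config` entry read from `kwargs`, and the two checks do not behave identically. The following is a minimal standalone sketch of both branches, not the library's actual code. The function names `old_style_config` and `new_style_config` are hypothetical, and the old error message here reports `type(custom_config)` rather than the `cls.__class__` used in the removed line.

```python
from typing import Any, Dict, Optional


def old_style_config(custom_config: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Mirror of the removed branch: falsy values become {}, other non-dicts raise ValueError."""
    if not custom_config:
        return {}
    if not isinstance(custom_config, dict):
        # Assumption: report the offending type; the removed line formatted cls.__class__.
        raise ValueError(
            f"The custom parser config must be of type dictionary, got {type(custom_config)}"
        )
    return custom_config


def new_style_config(**kwargs: Any) -> Dict[str, Any]:
    """Mirror of the added branch: read from kwargs, any non-dict raises TypeError."""
    selector_config = kwargs.get("selector_config", {})
    if not isinstance(selector_config, dict):
        raise TypeError("Argument `selector_config` must be a dictionary.")
    return selector_config


if __name__ == "__main__":
    print(old_style_config(None))          # None is accepted and replaced by {}
    print(new_style_config())              # a missing key falls back to {}
    try:
        new_style_config(selector_config=None)  # an explicit None is now rejected
    except TypeError as exc:
        print("TypeError:", exc)
```

One consequence worth noting in this sketch: a caller that used to pass `None` explicitly was accepted by the old check, but the new check raises `TypeError` because the `{}` default only applies when the key is absent.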
"""Opens up a browser and do your request based on your chosen options below.
:param url: Target url.
- :param headless: Run the browser in headless/hidden (default), or headful/visible mode.
- :param disable_resources: Drop requests of unnecessary resources for a speed boost. It depends, but it made requests ~25% faster in my tests for some websites.
-     Requests dropped are of type `font`, `image`, `media`, `beacon`, `object`, `imageset`, `texttrack`, `websocket`, `csp_report`, and `stylesheet`.
-     This can help save your proxy usage but be careful with this option as it makes some websites never finish loading.
- :param useragent: Pass a useragent string to be used. Otherwise the fetcher will generate a real Useragent of the same browser and use it.
- :param cookies: Set cookies for the next request.
- :param network_idle: Wait for the page until there are no network connections for at least 500 ms.
- :param load_dom: Enabled by default, wait for all JavaScript on page(s) to fully load and execute.
- :param timeout: The timeout in milliseconds that is used in all operations and waits through the page. The default is 30,000
- :param wait: The time (milliseconds) the fetcher will wait after everything finishes before closing the page and returning the `Response` object.
- :param page_action: Added for automation. A function that takes the `page` object and does the automation you need.
- :param wait_selector: Wait for a specific CSS selector to be in a specific state.
- :param init_script: An absolute path to a JavaScript file to be executed on page creation with this request.
- :param locale: Set the locale for the browser if wanted. The default value is `en-US`.
- :param wait_selector_state: The state to wait for the selector given with `wait_selector`. The default state is `attached`.
- :param stealth: Enables stealth mode, check the documentation to see what stealth mode does currently.
- :param real_chrome: If you have a Chrome browser installed on your device, enable this, and the Fetcher will launch an instance of your browser and use it.
- :param hide_canvas: Add random noise to canvas operations to prevent fingerprinting.
- :param disable_webgl: Disables WebGL and WebGL 2.0 support entirely.
- :param cdp_url: Instead of launching a new browser instance, connect to this CDP URL to control real browsers through CDP.
- :param google_search: Enabled by default, Scrapling will set the referer header to be as if this request came from a Google search of this website's domain name.
- :param extra_headers: A dictionary of extra headers to add to the request. _The referer set by the `google_search` argument takes priority over the referer set here if used together._
- :param proxy: The proxy to be used with requests, it can be a string or a dictionary with the keys 'server', 'username', and 'password' only.
- :param extra_flags: A list of additional browser flags to pass to the browser on launch.
- :param custom_config: A dictionary of custom parser arguments to use with this request. Any argument passed will override any class parameters values.
- :param additional_args: Additional arguments to be passed to Playwright's context as additional settings, and it takes higher priority than Scrapling's settings.
+ - headless: Run the browser in headless/hidden (default), or headful/visible mode.
+ - disable_resources: Drop requests of unnecessary resources for a speed boost.
+ - useragent: Pass a useragent string to be used. Otherwise the fetcher will generate a real Useragent of the same browser and use it.
+ - cookies: Set cookies for the next request.
+ - network_idle: Wait for the page until there are no network connections for at least 500 ms.
+ - load_dom: Enabled by default, wait for all JavaScript on page(s) to fully load and execute.
+ - timeout: The timeout in milliseconds that is used in all operations and waits through the page. The default is 30,000
+ - wait: The time (milliseconds) the fetcher will wait after everything finishes before closing the page and returning the Response object.
+ - page_action: Added for automation. A function that takes the `page` object and does the automation you need.
+ - wait_selector: Wait for a specific CSS selector to be in a specific state.
+ - init_script: An absolute path to a JavaScript file to be executed on page creation with this request.
+ - locale: Set the locale for the browser if wanted. The default value is `en-US`.
+ - wait_selector_state: The state to wait for the selector given with `wait_selector`. The default state is `attached`.
+ - stealth: Enables stealth mode, check the documentation to see what stealth mode does currently.
+ - real_chrome: If you have a Chrome browser installed on your device, enable this, and the Fetcher will launch an instance of your browser and use it.
+ - hide_canvas: Add random noise to canvas operations to prevent fingerprinting.
+ - disable_webgl: Disables WebGL and WebGL 2.0 support entirely.
+ - cdp_url: Instead of launching a new browser instance, connect to this CDP URL to control real browsers through CDP.
+ - google_search: Enabled by default, Scrapling will set the referer header to be as if this request came from a Google search of this website's domain name.
+ - extra_headers: A dictionary of extra headers to add to the request.
+ - proxy: The proxy to be used with requests, it can be a string or a dictionary with the keys 'server', 'username', and 'password' only.
+ - extra_flags: A list of additional browser flags to pass to the browser on launch.
+ - selector_config: The arguments that will be passed in the end while creating the final Selector's class.
+ - additional_args: Additional arguments to be passed to Playwright's context as additional settings.
:return: A `Response` object.
"""
- if not custom_config:
-     custom_config = {}
- elif not isinstance(custom_config, dict):
-     raise ValueError(f"The custom parser config must be of type dictionary, got {cls.__class__}")
+ # Get selector_config from kwargs if provided, otherwise use empty dict
+ selector_config = kwargs.get("selector_config", {})
+ if not isinstance(selector_config, dict):
+     raise TypeError("Argument `selector_config` must be a dictionary.")
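
The docstring in this hunk says `proxy` may be a string or a dictionary with only the keys 'server', 'username', and 'password'. As an illustration of that rule, here is a small sketch of how such an argument could be normalized before use. The helper `normalize_proxy` and the constant `ALLOWED_PROXY_KEYS` are hypothetical and do not appear in the diff, and requiring a 'server' key in the dictionary form is an assumption of this sketch.

```python
from typing import Dict, Union

# Keys named in the docstring; anything else is rejected in this sketch.
ALLOWED_PROXY_KEYS = {"server", "username", "password"}


def normalize_proxy(proxy: Union[str, Dict[str, str]]) -> Dict[str, str]:
    """Return the proxy as a dictionary, validating the documented shape."""
    if isinstance(proxy, str):
        # A bare string is treated as the server address.
        return {"server": proxy}
    if isinstance(proxy, dict):
        unknown = set(proxy) - ALLOWED_PROXY_KEYS
        if unknown:
            raise ValueError(f"Unexpected proxy keys: {sorted(unknown)}")
        if "server" not in proxy:
            # Assumption: a dictionary without a server address is unusable.
            raise ValueError("A proxy dictionary needs a 'server' key.")
        return dict(proxy)  # copy so the caller's dictionary is not mutated
    raise TypeError("Argument `proxy` must be a string or a dictionary.")
```

Returning a copy keeps later changes to the normalized value from leaking back into the dictionary the caller passed in.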