Open
Labels
🐞 Bug (Something isn't working), 🩺 Needs Triage (Needs attention of maintainers)
Description
crawl4ai version
0.8.0
Expected Behavior
The CSS selector "a span.fn" should select the corresponding element.
Current Behavior
The following minimal PoC reproduces the problem:
import asyncio

from crawl4ai import AsyncWebCrawler, JsonCssExtractionStrategy
from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig

schema = {
    "name": "minimal reproducer",
    "baseSelector": "td.change-author",
    "type": "nested_list",
    "fields": [
        {"name": "field1", "selector": "a span", "type": "text"},
        {"name": "field2", "selector": "a span", "type": "attribute", "attribute": "class"},
        {"name": "field3", "selector": "a span.fn", "type": "text"},
    ]
}

async def main():
    browser_config = BrowserConfig()
    crawler_config = CrawlerRunConfig(extraction_strategy=JsonCssExtractionStrategy(schema))
    async with AsyncWebCrawler(config=browser_config) as web_crawler:
        result = await web_crawler.arun(
            url="https://bugzilla.mozilla.org/show_bug.cgi?id=1770266",
            config=crawler_config
        )
        if result.success:
            print(result.extracted_content)

if __name__ == "__main__":
    asyncio.run(main())

Running the above PoC gives the result:
$ python3 minimal_poc.py
[INIT].... → Crawl4AI 0.8.0
[FETCH]... ↓ https://bugzilla.mozilla.org/show_bug.cgi?id=1770266 | ✓ | ⏱: 2.76s
[SCRAPE].. ◆ https://bugzilla.mozilla.org/show_bug.cgi?id=1770266 | ✓ | ⏱: 0.03s
[EXTRACT]. ■ https://bugzilla.mozilla.org/show_bug.cgi?id=1770266 | ✓ | ⏱: 0.03s
[COMPLETE] ● https://bugzilla.mozilla.org/show_bug.cgi?id=1770266 | ✓ | ⏱: 2.83s
[
  {
    "field1": "Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)",
    "field2": [
      "fna"
    ]
  },
  {
    "field1": "Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)",
    "field2": [
      "fna"
    ]
  },
  ...

The CSS selector "a span.fn" extracts nothing: field3 is absent from the output. Inspecting the class attribute of the element matched by "a span" shows the unexpected value "fna" instead of "fn".
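To narrow this down, it may help to check whether "fna" is already present in the HTML that the crawler received, or whether it only appears after extraction. The sketch below is a rough diagnostic that uses only the Python standard library. It collects the class tokens of every span nested inside an a element and tests them the way a CSS class selector does, by whole token rather than by substring. The markup in SAMPLE_HTML is hypothetical and is not copied from the Bugzilla page. The names SpanClassCollector and matches_class are made up for illustration.

```python
from html.parser import HTMLParser


class SpanClassCollector(HTMLParser):
    """Collect the class tokens of every <span> nested inside an <a>."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0   # number of <a> elements currently open
        self.span_classes = []  # one list of class tokens per matching <span>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1
        elif tag == "span" and self.anchor_depth > 0:
            # A missing or valueless class attribute yields an empty token list.
            class_attr = dict(attrs).get("class") or ""
            self.span_classes.append(class_attr.split())

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1


def matches_class(tokens, wanted):
    """Mimic a CSS class selector: match whole tokens, never substrings."""
    return wanted in tokens


# Hypothetical markup for illustration only (an assumption, not the real page).
SAMPLE_HTML = (
    '<td class="change-author">'
    '<a href="#"><span class="fn">Alice</span></a>'
    '<a href="#"><span class="fna">Bob</span></a>'
    '</td>'
)

collector = SpanClassCollector()
collector.feed(SAMPLE_HTML)
for tokens in collector.span_classes:
    print(tokens, "matches span.fn:", matches_class(tokens, "fn"))
```

If the same collector is fed the raw HTML of the crawled page (assuming the crawl result exposes it, for example as result.html) and reports "fna" there as well, then "a span.fn" is correctly matching nothing and the value comes from the page markup. If the raw HTML contains "fn", the value is being changed somewhere between fetching and extraction.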
Is this reproducible?
Yes
OS
macOS
Python version
3.14.3
Browser
Chrome
Browser version
No response
Error logs & Screenshots (if applicable)
No response