Closed
34 commits
a68cbb2
feat(browser): add standalone CDP browser launch and lxml extraction …
unclecode Mar 7, 2025
4aeb7ef
refactor(proxy): consolidate proxy configuration handling
unclecode Mar 7, 2025
c6a605c
feat(filters): add reverse option to URLPatternFilter
unclecode Mar 8, 2025
9d69fce
feat(scraping): add smart table extraction and analysis capabilities
unclecode Mar 9, 2025
9547bad
feat(content): add target_elements parameter for selective content ex…
unclecode Mar 10, 2025
1630fbd
feat(monitor): add real-time crawler monitoring system with memory ma…
unclecode Mar 12, 2025
dc36997
feat(schema): improve HTML preprocessing for schema generation
unclecode Mar 12, 2025
b750542
feat(crawler): optimize single URL handling and add performance compa…
unclecode Mar 13, 2025
6e3c048
feat(api): refactor crawl request handling to streamline single and m…
unclecode Mar 13, 2025
7884a98
feat(crawler): add experimental parameters support and optimize brows…
unclecode Mar 14, 2025
a31d7b8
feat(changelog): update CHANGELOG for version 0.5.0.post5 with new fe…
unclecode Mar 14, 2025
a247999
feat(llm): add additional LLM configuration parameters
unclecode Mar 14, 2025
5358ac0
refactor: clean up imports and improve JSON schema generation instruc…
unclecode Mar 18, 2025
6432ff1
feat(browser): add builtin browser management system
unclecode Mar 20, 2025
ddaa072
feat(ssl-certificate): get ssl certificate support proxy
wakaka6 Mar 5, 2025
5a84854
refactor(ssl_certificate): apply strategy and factory patterns for pr…
wakaka6 Mar 21, 2025
4ab0893
feat(browser): implement modular browser management system
unclecode Mar 21, 2025
0094cac
refactor(browser): improve parallel crawling and browser management
unclecode Mar 23, 2025
6eeb2e4
feat(browser): enhance browser context creation with user data direct…
unclecode Mar 23, 2025
462d576
fix(browser): improve storage state persistence in CDP strategy
unclecode Mar 23, 2025
8c08521
feat(browser): add Docker-based browser automation strategy
unclecode Mar 24, 2025
1107fa1
feat(cli): enhance markdown generation with default content filters
unclecode Mar 25, 2025
bdd9db5
chore(version): bump version to 0.5.0.post6
unclecode Mar 25, 2025
380663f
fix(ssl_certificate): with encode credentials to decode
wakaka6 Mar 25, 2025
163cf29
fix(ssl_ceritificate): fix https proxy not working and ignore ssl ver…
wakaka6 Mar 25, 2025
6405cf0
Merge branch 'vr0.5.0.post5' into next
unclecode Mar 25, 2025
3066ae2
update(ssl_ceritificate): catch developer edgecase
wakaka6 Mar 25, 2025
4a20d7f
feat(cli): add quick JSON extraction and global config management
unclecode Mar 25, 2025
5c88d13
feat(cli): add output file option and integrate LXML web scraping str…
unclecode Mar 25, 2025
d8f38f2
chore(version): bump version to 0.5.0.post7
unclecode Mar 25, 2025
40d4dd3
chore(version): bump version to 0.5.0.post8 and update post-installat…
unclecode Mar 25, 2025
dd73259
update(ssl_certificate): support socks4 and better error handler
wakaka6 Mar 26, 2025
d498847
Merge branch 'next' into feat/support_proxy_for_ssl_certificate
wakaka6 Mar 27, 2025
5939800
fix(merge-next): proxyconfig
wakaka6 Mar 27, 2025
3 changes: 3 additions & 0 deletions .gitignore
@@ -255,3 +255,6 @@ continue_config.json
 
 .llm.env
 .private/
+
+CLAUDE_MONITOR.md
+CLAUDE.md
33 changes: 33 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,39 @@ All notable changes to Crawl4AI will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## Version 0.5.0.post5 (2025-03-14)
+
+### Added
+
+- *(crawler)* Add experimental parameters dictionary to CrawlerRunConfig to support beta features
+- *(tables)* Add comprehensive table detection and extraction functionality with scoring system
+- *(monitor)* Add real-time crawler monitoring system with memory management
+- *(content)* Add target_elements parameter for selective content extraction
+- *(browser)* Add standalone CDP browser launch capability
+- *(schema)* Add preprocess_html_for_schema utility for better HTML cleaning
+- *(api)* Add special handling for single URL requests in Docker API
+
+### Changed
+
+- *(filters)* Add reverse option to URLPatternFilter for inverting filter logic
+- *(browser)* Make CSP nonce headers optional via experimental config
+- *(browser)* Remove default cookie injection from page initialization
+- *(crawler)* Optimize response handling for single-URL processing
+- *(api)* Refactor crawl request handling to streamline processing
+- *(config)* Update default provider to gpt-4o
+- *(cache)* Change default cache_mode from aggressive to bypass in examples
+
+### Fixed
+
+- *(browser)* Clean up browser context creation code
+- *(api)* Improve code formatting in API handler
+
+### Breaking Changes
+
+- WebScrapingStrategy no longer returns 'scraped_html' in its output dictionary
+- Table extraction logic has been modified to better handle thead/tbody structures
+- Default cookie injection has been removed from page initialization
+
 ## Version 0.5.0 (2025-03-02)
 
 ### Added
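The changelog entry about a reverse option "for inverting filter logic" can be illustrated with a small stand-alone sketch. This is not Crawl4AI's `URLPatternFilter`: the class name `GlobURLFilter`, its `apply` method, and the use of glob patterns are assumptions made only for illustration. It shows the general idea that a reversed filter accepts exactly the URLs the normal filter would reject.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class GlobURLFilter:
    """Hypothetical stand-in for a URL pattern filter (not the library's class)."""

    patterns: list[str]
    reverse: bool = False  # assumed semantics: invert the match result

    def apply(self, url: str) -> bool:
        # A URL matches if any glob pattern matches it (case-sensitive).
        matched = any(fnmatchcase(url, pattern) for pattern in self.patterns)
        # With reverse=True the filter keeps only URLs that did NOT match.
        return not matched if self.reverse else matched


include_blog = GlobURLFilter(patterns=["*/blog/*"])
exclude_blog = GlobURLFilter(patterns=["*/blog/*"], reverse=True)
```

Under these assumptions, `include_blog` keeps only blog URLs while `exclude_blog` keeps everything else, so one pattern list can serve as either an allow-list or a deny-list.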
7 changes: 4 additions & 3 deletions crawl4ai/__init__.py
@@ -23,6 +23,7 @@
     CosineStrategy,
     JsonCssExtractionStrategy,
     JsonXPathExtractionStrategy,
+    JsonLxmlExtractionStrategy
 )
 from .chunking_strategy import ChunkingStrategy, RegexChunking
 from .markdown_generation_strategy import DefaultMarkdownGenerator
@@ -32,13 +33,12 @@
     LLMContentFilter,
     RelevantContentFilter,
 )
-from .models import CrawlResult, MarkdownGenerationResult
+from .models import CrawlResult, MarkdownGenerationResult, DisplayMode
+from .components.crawler_monitor import CrawlerMonitor
 from .async_dispatcher import (
     MemoryAdaptiveDispatcher,
     SemaphoreDispatcher,
     RateLimiter,
-    CrawlerMonitor,
-    DisplayMode,
     BaseDispatcher,
 )
 from .docker_client import Crawl4aiDockerClient
@@ -103,6 +103,7 @@
     "CosineStrategy",
     "JsonCssExtractionStrategy",
     "JsonXPathExtractionStrategy",
+    "JsonLxmlExtractionStrategy",
     "ChunkingStrategy",
     "RegexChunking",
     "DefaultMarkdownGenerator",
2 changes: 1 addition & 1 deletion crawl4ai/__version__.py
@@ -1,2 +1,2 @@
 # crawl4ai/_version.py
-__version__ = "0.5.0.post4"
+__version__ = "0.5.0.post8"
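The version strings in this bump use a `.postN` suffix, which follows the post-release form described in PEP 440. As a rough sketch of how such strings order, the helper below parses only the simple `N.N.N[.postM]` shape seen in this diff. The function name `parse_version` is hypothetical, and it assumes both versions have the same number of release components. Real projects would more likely rely on a full PEP 440 parser.

```python
import re


def parse_version(version: str) -> tuple[tuple[int, ...], int]:
    """Parse 'X.Y.Z' or 'X.Y.Z.postN' into a sortable (release, post) pair.

    Illustrative only: handles just the forms seen in this diff, not all of PEP 440.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:\.post(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    # A version without a post segment sorts before any post-release of it.
    post = int(match.group(2)) if match.group(2) is not None else -1
    return release, post


old = parse_version("0.5.0.post4")
new = parse_version("0.5.0.post8")
```

Comparing the resulting tuples with `<` then orders a base release before its post-releases and orders post-releases by their number.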
140 changes: 129 additions & 11 deletions crawl4ai/async_configs.py

Large diffs are not rendered by default.