A web crawler for logo detection using GPT-4o-mini vision. Uses a three-tier fallback system: Clearbit API → Google Favicon → AI-powered crawling.
- ⚡ Clearbit API priority - Instant high-quality logos for established companies (free, ~100ms)
- 🔄 Google Favicon fallback - Good coverage for sites not in Clearbit (~100ms)
- 🤖 AI-powered crawling - GPT-4o-mini vision for complete coverage (slower)
- 🔍 Async web crawling with browser-like headers (avoids 403 blocks)
- 🔄 Meta refresh redirect support (follows `<meta http-equiv="refresh">` redirects)
- 🖼️ SVG to PNG conversion
- 📊 Confidence scores and descriptions
- 💾 Image caching
- 🎯 Header/nav logo prioritization
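Meta refresh redirects live in the page markup rather than in an HTTP `Location` header, so a crawler has to parse them out of the HTML itself. Here is a minimal, stdlib-only sketch of that step. It assumes the target is given as `url=...` inside the tag's `content` attribute. `find_meta_refresh` is a hypothetical helper, not openlogo's API.

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

# Matches the target in content values like "0; url=/home" or "5;URL='https://x.example/'".
_URL_RE = re.compile(r"""url\s*=\s*['"]?([^'"]+)""", re.IGNORECASE)


class _MetaRefreshParser(HTMLParser):
    """Records the content attribute of the first <meta http-equiv="refresh"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.content: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.content is not None:
            return
        # HTMLParser lowercases attribute names; attribute values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").lower() == "refresh":
            self.content = attr_map.get("content", "")


def find_meta_refresh(html: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect target of a meta refresh tag, or None."""
    parser = _MetaRefreshParser()
    parser.feed(html)
    parser.close()
    if not parser.content:
        return None
    match = _URL_RE.search(parser.content)
    if match is None:
        return None  # a refresh without a URL just reloads the current page
    # Relative targets are resolved against the page that declared them.
    return urljoin(base_url, match.group(1).strip())
```

A real crawler would fetch the returned URL and repeat the check, with a hop limit to avoid redirect loops.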
- Clearbit (confidence: 0.95) - Best quality, ~100ms, covers most established companies
- Google Favicon (confidence: 0.75) - Good coverage, ~100ms, 128px icons
- AI Crawler (confidence: varies) - Complete coverage, slower, uses GPT-4o-mini
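The tier order and confidence values above work as a short-circuiting chain. The following sketch is a hypothetical illustration, not openlogo's implementation. Fetchers are injected as plain callables, assumed to return `(image_url, image_bytes)` on a hit and `None` on a miss (such as a Clearbit 404). The 1 KB cutoff mirrors the changelog note that generic Google globe icons are skipped.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class TierResult:
    url: str
    source: str
    confidence: float


# Hypothetical fetcher signature: domain -> (image_url, image_bytes), or None on a miss.
Fetcher = Callable[[str], Optional[Tuple[str, bytes]]]
Crawler = Callable[[str], Optional[TierResult]]

# Favicons smaller than this are assumed to be Google's generic globe placeholder.
MIN_FAVICON_BYTES = 1024


def resolve_logo(
    domain: str,
    fetch_clearbit: Fetcher,
    fetch_favicon: Fetcher,
    crawl: Crawler,
) -> Optional[TierResult]:
    """Try each tier in order and return the first acceptable logo."""
    # Tier 1: Clearbit (fast, high quality when the company is known).
    hit = fetch_clearbit(domain)
    if hit is not None:
        return TierResult(url=hit[0], source="clearbit", confidence=0.95)

    # Tier 2: Google Favicon, rejecting tiny generic placeholder icons.
    hit = fetch_favicon(domain)
    if hit is not None and len(hit[1]) >= MIN_FAVICON_BYTES:
        return TierResult(url=hit[0], source="google_favicon", confidence=0.75)

    # Tier 3: the slow AI-powered crawler, which sets its own confidence.
    return crawl(domain)
```

Injecting the fetchers keeps the ordering logic testable without network access. A real version would wrap async HTTP calls instead.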
```bash
# macOS
brew install cairo tesseract

# Ubuntu/Debian
sudo apt-get install libcairo2-dev tesseract-ocr libmagic1
```

```bash
# From PyPI
pip install openlogo

# Basic install (from source)
pip install -e .

# With AI client (OpenAI)
pip install -e ".[ai]"

# With all optional deps
pip install -e ".[all]"

# For development
pip install -e ".[dev]"
```

```python
import asyncio
import os

from openlogo import LogoCrawler


async def main():
    crawler = LogoCrawler(api_key=os.environ["OPENAI_API_KEY"])
    results = await crawler.crawl_website("https://stripe.com")
    for logo in results:
        print(f"{logo.url} - {logo.confidence:.0f}% confidence")


asyncio.run(main())
```

See `examples/basic_usage.py` for a complete example.
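`crawl_website()` returns a list of candidates, but callers often want a single logo. This hedged sketch assumes each result exposes `url`, `confidence` (on the 0-100 scale printed in the usage example), and `is_header`, as `LogoResult` does. `best_logo` and its `min_confidence` threshold are hypothetical. A stand-in dataclass replaces the real result type so the snippet runs on its own.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Logo:
    """Hypothetical stand-in with the LogoResult fields this helper reads."""

    url: str
    confidence: float  # 0-100, as printed in the usage example
    is_header: bool = False


def best_logo(results: Iterable[Logo], min_confidence: float = 50.0) -> Optional[Logo]:
    """Pick one logo: drop weak candidates, prefer header/nav logos, then confidence."""
    candidates = [r for r in results if r.confidence >= min_confidence]
    if not candidates:
        return None
    # Tuples compare element-wise, so is_header=True wins before confidence is compared.
    return max(candidates, key=lambda r: (r.is_header, r.confidence))


results = [
    Logo("https://example.com/hero.png", 90.0),
    Logo("https://example.com/brand.svg", 70.0, is_header=True),
    Logo("https://example.com/tiny.ico", 40.0, is_header=True),
]
print(best_logo(results).url)  # https://example.com/brand.svg
```

Preferring header/nav placement over raw confidence is a design choice. Swap the key order if confidence should dominate.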
```
openlogo/
├── src/
│   └── openlogo/
│       ├── __init__.py
│       ├── crawler.py      # Main LogoCrawler class
│       └── detection.py    # Logo detection strategies
├── tests/
│   ├── conftest.py
│   └── test_logo_crawler.py
├── examples/
│   └── basic_usage.py
├── pyproject.toml
└── README.md
```
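`detection.py` is described only as holding the logo detection strategies, so its contents are not shown here. This stdlib-only sketch is a hypothetical illustration of one strategy from the features list: header/nav logo prioritization. It ranks `<img>` candidates by whether they sit inside `<header>` or `<nav>`, and then by whether their `src`, `alt`, `class`, or `id` mentions "logo". `LogoCandidateParser` and `rank_logo_candidates` are invented names, not openlogo's API.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class LogoCandidateParser(HTMLParser):
    """Collects <img> sources and whether each sits inside <header> or <nav>."""

    CONTAINERS = {"header", "nav"}

    def __init__(self) -> None:
        super().__init__()
        self.open_containers = 0  # number of header/nav elements currently open
        # Each entry: (src, inside_header_or_nav, mentions_logo)
        self.candidates: List[Tuple[str, bool, bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTAINERS:
            self.open_containers += 1
        elif tag == "img":
            attr_map = {name: (value or "") for name, value in attrs}
            src = attr_map.get("src", "")
            if not src:
                return
            hints = " ".join(attr_map.get(k, "") for k in ("src", "alt", "class", "id"))
            self.candidates.append((src, self.open_containers > 0, "logo" in hints.lower()))

    def handle_endtag(self, tag):
        if tag in self.CONTAINERS and self.open_containers > 0:
            self.open_containers -= 1


def rank_logo_candidates(html: str) -> List[str]:
    """Return image URLs, header/nav images first, then ones that mention 'logo'."""
    parser = LogoCandidateParser()
    parser.feed(html)
    parser.close()
    # sorted() is stable, so ties keep document order; False sorts before True.
    ranked = sorted(parser.candidates, key=lambda c: (not c[1], not c[2]))
    return [src for src, _, _ in ranked]
```

A heuristic ranking like this is cheap. It can narrow the candidates before any slower vision-model call.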
```bash
# Required
export OPENAI_API_KEY="your_api_key"

# Optional: Azure OpenAI
export AZURE_OPENAI_API_KEY="your_api_key"

# Optional: Custom tesseract path
export TESSERACT_CMD="/path/to/tesseract"
```

```python
LogoResult(
    url="https://example.com/logo.png",
    confidence=95.0,
    description="Company logo with blue text",
    page_url="https://example.com",
    image_hash="abc123...",
    timestamp=datetime(...),
    is_header=True,
    rank_score=0.95,
    detection_scores={...}
)
```

- Google Favicon fallback - Added `try_google_favicon()` as a middle tier between Clearbit and the AI crawler
- Three-tier resolution: Clearbit → Google Favicon → AI Crawler
- Added `skip_google_favicon` parameter to `crawl_website()`
- Exported `try_google_favicon()` for direct use
- Skips generic Google globe icons (< 1KB)
- Clearbit API priority - Now tries Clearbit first for instant logos (~100ms, free)
- Falls back to the GPT-4o-mini crawler only when Clearbit returns 404
- Added `skip_clearbit` parameter to `crawl_website()` for forcing crawler mode
- Exported `try_clearbit_logo()` for direct use
- Renamed package from `crawl4logo` to `openlogo`
- Added meta refresh redirect support (handles sites that use `<meta http-equiv="refresh">` instead of HTTP redirects)
- Initial public release
MIT License - see LICENSE