Description
Confirm this is an issue with the Python library and not an underlying OpenAI API
- This is an issue with the Python library
Describe the bug
Bug Report: AI Web Fetch Tools Silently Drop Content Inside Custom HTML Elements
Date: March 2026
Tools affected: claude.ai web_fetch, ChatGPT web fetch
Status: Reproduced on two AI platforms (claude.ai and ChatGPT). Not reproducible with Claude Code's WebFetch tool, curl, or Grok.
Summary
The web fetch tools in claude.ai and ChatGPT both fail to extract text content from pages that use custom HTML elements (web components). Both tools return only the text of the document's <title> element and drop all other page content. The prerendered HTML is valid and fully accessible. It is read correctly by curl, Grok, Google's crawler, and Claude Code's WebFetch. Because both platforms fail in the same way, the bug may be rooted in a common underlying library or parsing approach.
Expected Behavior
All text content nested inside custom elements should be extracted and returned. The page at https://mandmkelly.com uses a structure like:
```html
<app-shell>
  <site-header>...</site-header>
  <page-hero>
    <h1>Headline text</h1>
    <p>Body copy</p>
  </page-hero>
</app-shell>
```
Standard <h1>, <h2>, and <p> elements exist inside the custom elements. A conformant HTML parser places them in the DOM tree as ordinary descendants of those custom elements. Any extractor that walks the full tree should therefore return their text content.
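The sketch below shows that an ordinary tokenizer reaches this content. It feeds the snippet above to Python's standard-library `html.parser.HTMLParser` and records each text run together with its chain of enclosing elements. The class name `TextWithPath` is hypothetical, for illustration only. Nothing here reflects how either platform's fetch tool is actually implemented.

```python
from html.parser import HTMLParser

# Void elements never receive an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TextWithPath(HTMLParser):
    """Record every non-blank text run with the chain of elements enclosing it."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open element names, outermost first
        self.found = []  # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        # Custom element names such as "app-shell" arrive here like any other tag.
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.found.append((" > ".join(self.stack), text))


SNIPPET = """
<app-shell>
  <site-header>...</site-header>
  <page-hero>
    <h1>Headline text</h1>
    <p>Body copy</p>
  </page-hero>
</app-shell>
"""

parser = TextWithPath()
parser.feed(SNIPPET)
parser.close()
for path, text in parser.found:
    print(f"{path}: {text}")
# app-shell > site-header: ...
# app-shell > page-hero > h1: Headline text
# app-shell > page-hero > p: Body copy
```

The headline and body copy both appear with their full ancestry. The custom elements are simply part of the path, not a barrier to it.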
Actual Behavior
Both tools return only the contents of the <title> tag. No other content is returned, and no error or warning is shown. This is consistent with the HTML-to-markdown converter skipping the entire subtree whenever it encounters an unknown (custom) element tag.
Verification
| Tool | Reads content correctly? |
|---|---|
| curl https://mandmkelly.com | ✅ Returns 17KB of full HTML |
| Grok | ✅ Reads and summarizes the full page |
| Claude Code WebFetch | ✅ Reads and summarizes the full page |
| Google crawler | ✅ Indexes the page |
| claude.ai web_fetch | ❌ Returns only the <title> tag |
| ChatGPT web fetch | ❌ Returns only the <title> tag |
Root Cause (likely)
The HTML-to-markdown converter used in these fetch pipelines treats unknown element names as opaque blocks and skips their children rather than recursing into them.
The HTML spec explicitly allows custom elements. Valid custom element names must contain a hyphen, as app-shell and page-hero do. The HTML parsing algorithm handles any unrecognized start tag as an ordinary element whose children are parsed into the tree normally, so the correct behavior is to recurse into those children exactly as a browser would. Skipping the entire subtree departs from how browsers and the spec treat these elements. It also produces silent data loss with no error or warning to the user.
The fact that both claude.ai and ChatGPT exhibit identical behavior suggests a shared upstream dependency, possibly a common open-source HTML-to-markdown or HTML parsing library used by both platforms.
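The suspected failure mode can be modelled with a toy extractor built on Python's standard-library `html.parser`. It keeps text only when every enclosing element is on an allowlist of known tags. The allowlist, the class name `SkippingExtractor`, and the sample page (including its "Example title") are all hypothetical. This is a sketch of the hypothesis, not the code of any real converter.

```python
from html.parser import HTMLParser

# Hypothetical allowlist for this toy model; real converters will differ.
KNOWN = {"html", "head", "title", "body", "main", "section", "div",
         "h1", "h2", "h3", "p", "a", "ul", "ol", "li"}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class SkippingExtractor(HTMLParser):
    """Toy model of the suspected bug: an unknown element hides its whole subtree."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open element names, outermost first
        self.out = []    # text that survives extraction

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        # The modelled defect: drop text if ANY enclosing element is unrecognized.
        if text and all(t in KNOWN for t in self.stack):
            self.out.append(text)


PAGE = """<!doctype html>
<html><head><title>Example title</title></head>
<body>
<app-shell>
  <page-hero>
    <h1>Headline text</h1>
    <p>Body copy</p>
  </page-hero>
</app-shell>
</body></html>"""

extractor = SkippingExtractor()
extractor.feed(PAGE)
extractor.close()
print(extractor.out)  # ['Example title']
```

Only the title survives, which matches the symptom reported for both fetch tools. The headline and body copy are lost because `app-shell` is not on the allowlist.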
Suggested Fix
Treat unrecognized element names as passthrough containers. Recurse into their children and extract text from any standard elements found within them. This matches browser behavior and the HTML parsing specification.
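The suggested fix can be sketched with Python's standard-library `html.parser`. In this small converter, unknown and custom elements are transparent. Formatting is decided only by the innermost standard block element around each text run, and text inside `script`, `style`, or `template` is skipped. The names `PassthroughConverter` and `to_markdown`, the handful of mapped tags, and the sample page are hypothetical. A production converter would handle many more elements.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Markdown prefixes for the few standard block elements this sketch maps.
BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "p": ""}
# Text inside these elements is never page content.
IGNORED = {"script", "style", "template"}


class PassthroughConverter(HTMLParser):
    """Treat unrecognized (custom) elements as plain containers and keep recursing."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open element names, outermost first
        self.blocks = []  # markdown blocks in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text or any(t in IGNORED for t in self.stack):
            return
        # Custom elements are skipped over here, not their subtrees:
        # the nearest known block element decides the markdown prefix.
        prefix = ""
        for tag in reversed(self.stack):
            if tag in BLOCK_PREFIX:
                prefix = BLOCK_PREFIX[tag]
                break
        self.blocks.append(prefix + text)


def to_markdown(html: str) -> str:
    converter = PassthroughConverter()
    converter.feed(html)
    converter.close()
    return "\n\n".join(converter.blocks)


PAGE = """<!doctype html>
<html><head><title>Example title</title></head>
<body>
<app-shell>
  <page-hero>
    <h1>Headline text</h1>
    <p>Body copy</p>
  </page-hero>
</app-shell>
</body></html>"""

print(to_markdown(PAGE))
```

For this sample page the output is the title, then `# Headline text`, then `Body copy`, separated by blank lines. A browser's tree walk would reach the same text.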
Impact
Any site built with web components, Angular, Lit, or other frameworks that place custom elements at the top of the content hierarchy becomes completely unreadable to the claude.ai and ChatGPT fetch tools. This happens even when the content is fully prerendered and accessible to every other crawler and tool tested. Both platforms also attribute the failure to the site rather than to the tool. That confuses users and may wrongly signal to site owners that their implementation is broken when it is not.
To Reproduce
- Open chatgpt.com (or claude.ai) and start a new conversation
- Ask the assistant to fetch or review https://mandmkelly.com, or any modern site built with web component custom elements
- Observe that the response reflects only the page's <title> tag
Code snippets
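The curl check in the verification table can be repeated in Python, independently of any AI tool, to confirm that the headings are present in the prerendered HTML. This sketch uses only the standard library (`urllib.request` and `html.parser`). The function names and the User-Agent string are hypothetical placeholders, not what any fetch tool actually sends.

```python
import urllib.request
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of every h1/h2, whatever elements enclose it."""

    HEADINGS = {"h1", "h2"}

    def __init__(self):
        super().__init__()
        self.current = None  # text parts of the heading being read, if any
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.current = []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self.current is not None:
            self.headings.append(" ".join("".join(self.current).split()))
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)


def headings(html: str) -> list[str]:
    """Return the h1/h2 texts found anywhere in the document."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.headings


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download raw HTML without running any JavaScript, much like curl does."""
    # Placeholder User-Agent; it is not the header any AI fetch tool uses.
    request = urllib.request.Request(url, headers={"User-Agent": "custom-element-check/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage: `print(headings(fetch_html("https://mandmkelly.com")))`. A non-empty list indicates that the headings are delivered in the server-rendered HTML inside the custom elements, so they do not depend on client-side JavaScript.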
OS
macOS
Python version
Python 3.x
Library version
openai v1.0.1