
fix: enable huge_tree for HTMLParser to handle large documents #4306

Open

joaquinhuigomez wants to merge 1 commit into Unstructured-IO:main from joaquinhuigomez:fix/html-huge-tree

@joaquinhuigomez

Pass `huge_tree=True` to `lxml.html.HTMLParser` so that deeply nested or very large HTML documents can be parsed without hitting libxml2's default safety limits, such as the maximum tree depth and the maximum text-node size.

Closes #4289
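A minimal sketch of the parser configuration this PR describes, assuming `lxml` is installed. The names `HUGE_TREE_PARSER`, `parse_large_html`, and `nested_html` are hypothetical and only serve this illustration. They are not functions from unstructured, and the actual change inside unstructured's HTML partitioning code may be wired differently.

```python
from lxml import html

# Hypothetical module-level parser mirroring the PR's change: huge_tree=True
# asks libxml2 to lift its default safety limits (e.g. maximum tree depth and
# maximum text-node size). It also removes protection against maliciously
# crafted input, so enable it only where large documents are expected.
HUGE_TREE_PARSER = html.HTMLParser(huge_tree=True)


def parse_large_html(text: str) -> html.HtmlElement:
    """Parse a full HTML document with the huge_tree-enabled parser."""
    return html.document_fromstring(text, parser=HUGE_TREE_PARSER)


def nested_html(depth: int, leaf_text: str = "deep text") -> str:
    """Build a test document with `depth` nested <div> elements (hypothetical helper)."""
    return (
        "<html><body>"
        + "<div>" * depth
        + leaf_text
        + "</div>" * depth
        + "</body></html>"
    )
```

One way to check the behavior locally is to call `parse_large_html(nested_html(depth))` with increasing `depth` and repeat the run with a plain `html.HTMLParser()`. The exact point at which the default parser stops building the tree depends on the installed libxml2 version. Because `huge_tree=True` turns off libxml2's guards against oversized input, that trade-off is worth weighing for untrusted HTML.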


Linked issue #4289: Large HTML documents cannot be partitioned using the partition_html function.
