@MkDev11
Contributor

@MkDev11 MkDev11 commented Jan 22, 2026

Summary

This PR fixes a bug where passing coordinates=True to partition() causes a TypeError when processing PDFs with the hi_res strategy.

Problem

When users call partition() with coordinates=True, the boolean value flows through kwargs and eventually reaches add_element_metadata(). However, this function already receives computed coordinate data as an explicit parameter. Python then raises:

TypeError: add_element_metadata() got multiple values for keyword argument 'coordinates'

This is confusing because users reasonably expect coordinates=True to enable coordinate output, without realizing that the hi_res strategy already computes and includes coordinates automatically.
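The conflict described above can be reproduced in isolation. The following is a minimal sketch, not the library's code: the signature given to add_element_metadata() here is an assumed stand-in, and build_element() is a hypothetical caller that forwards kwargs unchanged.

```python
# Assumed stand-in for add_element_metadata(); the real signature may differ.
def add_element_metadata(element, coordinates=None, coordinate_system=None, **kwargs):
    return {"element": element, "coordinates": coordinates}


# Hypothetical caller: passes the computed coordinates explicitly and also
# forwards the user's kwargs without filtering them.
def build_element(element, computed_coordinates, **kwargs):
    return add_element_metadata(element, coordinates=computed_coordinates, **kwargs)


error_message = None
try:
    # The user's coordinates=True lands in kwargs and collides with the
    # explicit coordinates= argument inside build_element().
    build_element("Title", ((0, 0), (0, 10), (10, 10), (10, 0)), coordinates=True)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message can vary slightly between Python versions, but it reports multiple values for the keyword argument 'coordinates'.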

Solution

Filter out coordinates and coordinate_system from kwargs before passing them to add_element_metadata(). This prevents the conflict while preserving the internally-computed coordinate data.

The fix is minimal and targeted - just 3 lines of code that filter the problematic kwargs.
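The filtering approach might look roughly like the sketch below. It is illustrative only and is not the actual patch to unstructured/partition/pdf.py: build_element(), RESERVED_KEYS, and the signature of add_element_metadata() are all assumptions made for the example.

```python
# Assumed stand-in for add_element_metadata(); the real signature may differ.
def add_element_metadata(element, coordinates=None, coordinate_system=None, **kwargs):
    return {
        "element": element,
        "coordinates": coordinates,
        "coordinate_system": coordinate_system,
        "extra": kwargs,
    }


# Hypothetical name for the keys that are already passed explicitly.
RESERVED_KEYS = ("coordinates", "coordinate_system")


def build_element(element, computed_coordinates, computed_system, **kwargs):
    # Drop user-supplied values that would collide with the explicit arguments.
    filtered = {k: v for k, v in kwargs.items() if k not in RESERVED_KEYS}
    return add_element_metadata(
        element,
        coordinates=computed_coordinates,
        coordinate_system=computed_system,
        **filtered,
    )


points = ((0, 0), (0, 10), (10, 10), (10, 0))
result = build_element("Title", points, "PixelSpace", coordinates=True, languages=["eng"])
print(result["coordinates"])
```

In this sketch the user's coordinates=True is silently discarded, the computed points are kept, and unrelated kwargs such as languages still pass through.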

Changes

  • unstructured/partition/pdf.py: Added filtering for coordinates and coordinate_system kwargs
  • CHANGELOG.md: Added entry for this fix

Fixes #4126

@MkDev11
Contributor Author

MkDev11 commented Jan 22, 2026

@badGarnet could you please take a look at the PR I mentioned earlier? really appreciate your time and feedback.

@badGarnet
Collaborator

> @badGarnet could you please take a look at the PR I mentioned earlier? really appreciate your time and feedback.

Please add a unit test and remember to bump the version number.

@MkDev11 MkDev11 force-pushed the fix/coordinates-kwarg-4126 branch from f7c3d98 to bd4e320 Compare January 22, 2026 23:33
…cessing

Filter out 'coordinates' and 'coordinate_system' from kwargs before passing
to add_element_metadata() to prevent conflict with explicit parameters.

When users pass coordinates=True to partition_pdf with hi_res strategy,
this boolean value could conflict with the explicit coordinates parameter
which expects tuple data, causing TypeError.

Fixes Unstructured-IO#4126
@MkDev11 MkDev11 force-pushed the fix/coordinates-kwarg-4126 branch from bd4e320 to 97a06cd Compare January 22, 2026 23:43
@MkDev11
Contributor Author

MkDev11 commented Jan 23, 2026

@badGarnet updated again! Could you please have a look at the changes and let me know the result? I appreciate your time!

@badGarnet badGarnet added this pull request to the merge queue Jan 23, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 23, 2026
@MkDev11
Contributor Author

MkDev11 commented Jan 23, 2026

The CI's docker-build step fails because the Dockerfile tries to create a python3 symlink that already exists (the Python 3.12 installation created it).
This is an upstream infrastructure issue: it affects the main Dockerfile, not the code in this PR.
@badGarnet, could you retry the merge?

The libreoffice package brings in python-3.13-base which creates a
/usr/bin/python3 symlink. The previous conditional check '[ -e ... ] || ln -s'
failed because -e returns false for broken symlinks, but ln still fails
because the symlink name exists.

Using ln -sf forces the symlink to be overwritten regardless of its state.
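
The commit message above describes shell behaviour. The same filesystem semantics can be sketched in Python on a POSIX system, where os.path.exists() follows symlinks much like `[ -e ... ]` does. This is only an illustration of the reasoning, not the Dockerfile change itself; the temporary directory and the target path /usr/bin/python3.12 are assumptions made for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, "python3")
    # Create a dangling symlink: its target does not exist.
    os.symlink(os.path.join(tmp, "missing-target"), link)

    # Like `[ -e link ]`: follows the link, so a broken symlink looks absent.
    exists_following_link = os.path.exists(link)
    # lexists() checks the name itself, which is still taken.
    name_is_taken = os.path.lexists(link)

    # Like plain `ln -s`: fails because the name already exists.
    try:
        os.symlink("/usr/bin/python3.12", link)
        plain_symlink_failed = False
    except FileExistsError:
        plain_symlink_failed = True

    # Rough equivalent of `ln -sf`: remove whatever holds the name, then link.
    if os.path.lexists(link):
        os.remove(link)
    os.symlink("/usr/bin/python3.12", link)
    final_target = os.readlink(link)

print(exists_following_link, name_is_taken, plain_symlink_failed, final_target)
```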
Development

Successfully merging this pull request may close these issues.

Bug: coordinates=True kwarg causes TypeError in hi_res PDF processing
