Error grafting OCR'd text back into PDF #336

What I did: 
I'm adding the ability to optionally force OCR when uploading documents by URL with the Python wrapper:
from documentcloud import DocumentCloud
client = DocumentCloud(USERNAME, PASSWORD)  # credentials assumed to be defined elsewhere
urls = ["https://www.chicago.gov/content/dam/city/depts/dcd/tif/24reports/T_063_CanalCongressAR24.pdf", "https://www.chicago.gov/content/dam/city/depts/dcd/tif/24reports/T_072_24thMichiganAR24.pdf"]
uploaded_docs = client.documents.upload_urls(urls, force_ocr=True, ocr_engine="textract")
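To tell which documents in a batch like this end up failing, one option is to poll each one until it leaves the processing state. This is a minimal sketch, not part of the wrapper: `wait_for_processing` is a hypothetical helper, the status is fetched through an injected `fetch_status` callable, and the status strings (`success`, `readable`, `pending`, `error`, `nofile`) are assumed from the DocumentCloud API docs and should be checked against the current API.

```python
import time
from typing import Callable, Dict, Iterable

# Assumed terminal statuses; "pending" and "readable" are treated as still processing.
FINAL_STATUSES = {"success", "error", "nofile"}


def wait_for_processing(
    doc_ids: Iterable[int],
    fetch_status: Callable[[int], str],
    interval: float = 5.0,
    timeout: float = 600.0,
) -> Dict[int, str]:
    """Poll every document until it reaches a final status or the timeout expires.

    Returns the last status seen for each document ID.
    """
    pending = set(doc_ids)
    statuses: Dict[int, str] = {}
    deadline = time.monotonic() + timeout
    while pending:
        for doc_id in list(pending):
            status = fetch_status(doc_id)
            statuses[doc_id] = status
            if status in FINAL_STATUSES:
                pending.discard(doc_id)
        # Stop once everything is done or we have run out of time.
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(interval)
    return statuses
```

With the wrapper, `fetch_status` could be `lambda doc_id: client.documents.get(doc_id).status`, assuming the returned document object exposes `status`; any ID whose final status is `"error"` is one of the failed uploads.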

One document succeeded; the other failed with this error:
https://muckrock.sentry.io/issues/6895388692/?alert_rule_id=1010155&alert_timestamp=1758569111535&alert_type=email&notification_uuid=b002be3c-4651-4b4e-98b1-3e929498be0d&project=2873549

Affected document: https://www.documentcloud.org/documents/26105901-t_063_canalcongressar24/

This left the document in a failed state. The failure could probably be handled more gracefully, for example by still presenting a working version of the PDF.
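A minimal sketch of that kind of fallback, assuming the grafting step can be wrapped as a function that takes the original PDF bytes and returns new PDF bytes. `graft_with_fallback`, `graft_text`, `GraftResult` and the `"grafted"` / `"original_only"` labels are hypothetical and are not DocumentCloud's actual processing code.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger(__name__)


@dataclass
class GraftResult:
    pdf_bytes: bytes
    status: str  # hypothetical labels: "grafted" or "original_only"
    error: Optional[str] = None


def graft_with_fallback(
    original_pdf: bytes, graft_text: Callable[[bytes], bytes]
) -> GraftResult:
    """Try to graft OCR text into the PDF; keep the original PDF if that fails."""
    try:
        grafted = graft_text(original_pdf)
        # Cheap sanity check: every PDF file starts with a "%PDF-" header.
        if not grafted.startswith(b"%PDF-"):
            raise ValueError("grafted output does not start with a PDF header")
    except Exception as exc:
        logger.warning("Grafting OCR text failed; keeping original PDF: %r", exc)
        return GraftResult(original_pdf, "original_only", repr(exc))
    return GraftResult(grafted, "grafted")
```

The idea is that a grafting error degrades the document to "viewable but without the new text layer" instead of leaving it unusable, while the recorded error still lets the failure be reported.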