Description
What I did:
I'm adding the ability to optionally force OCR on documents after uploading them via URL in the Python wrapper.
urls = [
    "https://www.chicago.gov/content/dam/city/depts/dcd/tif/24reports/T_063_CanalCongressAR24.pdf",
    "https://www.chicago.gov/content/dam/city/depts/dcd/tif/24reports/T_072_24thMichiganAR24.pdf",
]
uploaded_docs = client.documents.upload_urls(urls, force_ocr=True, ocr_engine="textract")
One document succeeded; the other encountered this error:
https://muckrock.sentry.io/issues/6895388692/?alert_rule_id=1010155&alert_timestamp=1758569111535&alert_type=email&notification_uuid=b002be3c-4651-4b4e-98b1-3e929498be0d&project=2873549
https://www.documentcloud.org/documents/26105901-t_063_canalcongressar24/
This left the document in a failed state, which could probably be handled better (for example, by still presenting a working version of the PDF).
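Until failures like this are handled on the server side, a caller could at least separate the documents that came back in a bad state from the ones that processed cleanly. The following is only a rough sketch: `split_by_status` is a hypothetical helper, not part of the wrapper, and it assumes each returned document exposes a string `status` attribute where `"success"` means processing finished. The objects below are stand-ins, not real API responses.

```python
from types import SimpleNamespace


def split_by_status(docs, ok_statuses=("success",)):
    """Partition documents into (succeeded, failed) by their `status` attribute.

    Hypothetical helper: assumes each document has a string `status`;
    anything not listed in `ok_statuses` is treated as needing attention.
    """
    succeeded, failed = [], []
    for doc in docs:
        if getattr(doc, "status", None) in ok_statuses:
            succeeded.append(doc)
        else:
            failed.append(doc)
    return succeeded, failed


# Stand-in objects (not real uploads) mimicking one good and one failed document.
docs = [
    SimpleNamespace(title="doc-a", status="success"),
    SimpleNamespace(title="doc-b", status="error"),
]

succeeded, failed = split_by_status(docs)
for doc in failed:
    print(f"needs attention: {doc.title} (status={doc.status})")
```

In real use, `docs` would be the list returned by the upload call, checked once processing has had time to finish.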