Hi Jmuccigr,
Your redo_ocr script looks really interesting, and I would love to use it on all the JSTOR PDFs sitting on my hard drive that I cannot annotate properly because of their poor OCR. Unfortunately, there seems to be a version conflict that I cannot resolve on my own. Could you help me out with some advice?
This is the error I get when I run redo_ocr_PDF.sh:
[philipp@philap pdf]$ ./redo_ocr_PDF.sh file1.pdf
No language was specified. Hit enter to use English or supply the 3-letter language code:
Traceback (most recent call last):
File "/home/philipp/Software/Skripte/pdf/./remove_PDF_text.py", line 17, in <module>
with open(outputname, 'wb') as f:
PermissionError: [Errno 13] Permission denied: '/no_text.pdf'
Traceback (most recent call last):
File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 573, in _build_master
ws.require(__requires__)
File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 891, in require
needed = self.resolve(parse_requirements(requirements))
File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 782, in resolve
raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (pdfminer.six 20220319 (/usr/lib/python3.10/site-packages), Requirement.parse('pdfminer.six!=20200720,<=20211012,>=20191110'), {'ocrmypdf'})
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/bin/ocrmypdf", line 33, in <module>
sys.exit(load_entry_point('ocrmypdf==13.4.0', 'console_scripts', 'ocrmypdf')())
File "/usr/bin/ocrmypdf", line 25, in importlib_load_entry_point
return next(matches).load()
File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
module = import_module(match.group('module'))
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/usr/lib/python3.10/site-packages/ocrmypdf/__init__.py", line 10, in <module>
from ocrmypdf import helpers, hocrtransform, pdfa, pdfinfo
File "/usr/lib/python3.10/site-packages/ocrmypdf/helpers.py", line 22, in <module>
import img2pdf
File "/usr/lib/python3.10/site-packages/img2pdf.py", line 49, in <module>
import pikepdf
File "/usr/lib/python3.10/site-packages/pikepdf/__init__.py", line 19, in <module>
from ._version import __version__
File "/usr/lib/python3.10/site-packages/pikepdf/_version.py", line 7, in <module>
from pkg_resources import DistributionNotFound
File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 3266, in <module>
def _initialize_master_working_set():
File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 3240, in _call_aside
f(*args, **kwargs)
File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 3278, in _initialize_master_working_set
working_set = WorkingSet._build_master()
File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 575, in _build_master
return cls._build_from_requirements(__requires__)
File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 588, in _build_from_requirements
dists = ws.resolve(reqs, Environment())
File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 777, in resolve
raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'pdfminer.six!=20200720,<=20211012,>=20191110' distribution was not found and is required by ocrmypdf
GPL Ghostscript 9.55.0 (2021-09-27)
Copyright (C) 2021 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
GPL Ghostscript 9.55.0: **** Could not open the file /textonly.pdf .
**** Unable to open the initial device, quitting.
qpdf: open /final.pdf: Permission denied
Error: File not found - /final.pdf
0 image files updated
1 files weren't updated due to errors
Error: File not found - /final.pdf
0 image files updated
1 files weren't updated due to errors
mv: cannot stat '/final.pdf': No such file or directory
./redo_ocr_PDF.sh: line 144: terminal-notifier: command not found
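As far as I can tell from the traceback, the conflict comes down to the installed pdfminer.six (20220319) falling outside the range that ocrmypdf 13.4.0 declares. Below is a minimal sketch of that check, in case it helps narrow things down. It is not how pkg_resources resolves requirements: `failed_clauses` is a hypothetical helper, and it assumes that the date-style version numbers can simply be compared as integers.

```python
# Sketch: test a date-style pdfminer.six version against the specifier
# that ocrmypdf 13.4.0 reports in the traceback above.
import operator

# Comparison operators that may appear in the specifier.
OPS = {
    "!=": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def failed_clauses(version: str, specifier: str) -> list[str]:
    """Return the clauses of `specifier` that `version` does not satisfy.

    Assumption: versions are purely numeric, such as 20220319.
    """
    failed = []
    for clause in specifier.split(","):
        clause = clause.strip()
        # Try two-character operators before one-character ones.
        for symbol in sorted(OPS, key=len, reverse=True):
            if clause.startswith(symbol):
                bound = int(clause[len(symbol):])
                if not OPS[symbol](int(version), bound):
                    failed.append(clause)
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return failed


SPEC = "!=20200720,<=20211012,>=20191110"  # copied from the traceback
print(failed_clauses("20220319", SPEC))  # installed version per the traceback
```

Only the `<=20211012` clause fails here, which suggests that the installed pdfminer.six is simply newer than this ocrmypdf release allows.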