redo_ocr_PDF: version conflict #1

@kareliot

Hi Jmuccigr,

Your redo_ocr script looks really interesting, and I would love to use it on all the JSTOR PDFs sitting on my hard drive that I cannot annotate properly because of their poor OCR. Unfortunately, there seems to be a version conflict that I cannot resolve on my own. Could you help me out with some advice?

This is the output I get when I run redo_ocr_PDF.sh:

[philipp@philap pdf]$ ./redo_ocr_PDF.sh file1.pdf
No language was specified. Hit enter to use English or supply the 3-letter language code: 
Traceback (most recent call last):
  File "/home/philipp/Software/Skripte/pdf/./remove_PDF_text.py", line 17, in <module>
    with open(outputname, 'wb') as f:
PermissionError: [Errno 13] Permission denied: '/no_text.pdf'
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 573, in _build_master
    ws.require(__requires__)
  File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 891, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 782, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (pdfminer.six 20220319 (/usr/lib/python3.10/site-packages), Requirement.parse('pdfminer.six!=20200720,<=20211012,>=20191110'), {'ocrmypdf'})

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/ocrmypdf", line 33, in <module>
    sys.exit(load_entry_point('ocrmypdf==13.4.0', 'console_scripts', 'ocrmypdf')())
  File "/usr/bin/ocrmypdf", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/lib/python3.10/site-packages/ocrmypdf/__init__.py", line 10, in <module>
    from ocrmypdf import helpers, hocrtransform, pdfa, pdfinfo
  File "/usr/lib/python3.10/site-packages/ocrmypdf/helpers.py", line 22, in <module>
    import img2pdf
  File "/usr/lib/python3.10/site-packages/img2pdf.py", line 49, in <module>
    import pikepdf
  File "/usr/lib/python3.10/site-packages/pikepdf/__init__.py", line 19, in <module>
    from ._version import __version__
  File "/usr/lib/python3.10/site-packages/pikepdf/_version.py", line 7, in <module>
    from pkg_resources import DistributionNotFound
  File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 3266, in <module>
    def _initialize_master_working_set():
  File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 3240, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 3278, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 575, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 588, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/usr/lib/python3.10/site-packages/pkg_resources/__init__.py", line 777, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'pdfminer.six!=20200720,<=20211012,>=20191110' distribution was not found and is required by ocrmypdf
GPL Ghostscript 9.55.0 (2021-09-27)
Copyright (C) 2021 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
GPL Ghostscript 9.55.0: **** Could not open the file /textonly.pdf .
**** Unable to open the initial device, quitting.
qpdf: open /final.pdf: Permission denied
Error: File not found - /final.pdf
    0 image files updated
    1 files weren't updated due to errors
Error: File not found - /final.pdf
    0 image files updated
    1 files weren't updated due to errors
mv: cannot stat '/final.pdf': No such file or directory
./redo_ocr_PDF.sh: line 144: terminal-notifier: command not found
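The second traceback comes from the requirement check that `pkg_resources` runs when the `ocrmypdf` 13.4.0 entry point loads: the installed pdfminer.six 20220319 is newer than the `<=20211012` upper bound in ocrmypdf's requirement. As a minimal sketch, the check can be reproduced as below. It assumes the date-style versions can be compared as plain integers, which does not hold for version strings in general (the `packaging` library implements the full PEP 440 rules), and the function name `satisfies` is made up for illustration.

```python
"""Sketch: reproduce the pdfminer.six requirement check from the traceback.

Assumption: every version involved is a plain integer (date-style, e.g. 20220319),
so clauses can be compared numerically. This is NOT general PEP 440 parsing.
"""
import operator

# Operators that can appear in a simple comma-separated specifier.
_OPS = {
    "!=": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def satisfies(version: str, specifier: str) -> bool:
    """Return True if `version` meets every comma-separated clause in `specifier`."""
    v = int(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        # Try two-character operators before one-character ones ("<=" before "<").
        for op in ("!=", "<=", ">=", "==", "<", ">"):
            if clause.startswith(op):
                if not _OPS[op](v, int(clause[len(op):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


if __name__ == "__main__":
    required = "!=20200720,<=20211012,>=20191110"  # ocrmypdf 13.4.0 requirement from the log
    installed = "20220319"                          # pdfminer.six found in site-packages
    print(satisfies(installed, required))
```

Running it prints `False`, matching the `ContextualVersionConflict`. Installing a pdfminer.six release inside that range, or an ocrmypdf release whose requirement admits 20220319, would make the check pass.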
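Separately, every file the script tries to write or read sits at the filesystem root (`/no_text.pdf`, `/textonly.pdf`, `/final.pdf`), which suggests the directory part of the output path came out empty. Assuming the script joins a directory variable and a file name the way the hypothetical `naive_output_path` below does, an empty directory yields exactly those root paths; deriving the directory from the absolute path of the input file avoids that. Both helpers are hypothetical illustrations, not code taken from redo_ocr_PDF.sh or remove_PDF_text.py.

```python
import os


def naive_output_path(directory: str, name: str) -> str:
    # Hypothetical: mirrors a shell pattern like "$dir/$name".
    # If `directory` is empty, the result is rooted at "/".
    return directory + "/" + name


def safe_output_path(input_pdf: str, name: str) -> str:
    # Hypothetical: take the directory from the absolute path of the input,
    # so a bare filename like "file1.pdf" resolves to the current directory.
    directory = os.path.dirname(os.path.abspath(input_pdf))
    return os.path.join(directory, name)


if __name__ == "__main__":
    print(naive_output_path("", "no_text.pdf"))
    print(safe_output_path("file1.pdf", "no_text.pdf"))
```

The first call prints `/no_text.pdf`, the same path as in the `PermissionError`; the second prints a path inside the current working directory.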