
troubleshooting

Björn edited this page Apr 13, 2018 · 3 revisions

Here is a list of known error messages and their solutions. Note: the messages are shortened, but searching with Ctrl+F should still find the one you are looking for.


Error: subprocess.CalledProcessError: Command '['hadoop', 'fs', '-ls', '-R', '/gpfs/smartdata/path/to/folder']' returned non-zero exit status 1

Additional observation: running hadoop fs -ls -R /gpfs/smartdata/path/to/folder directly gives ls: `/gpfs/smartdata/path/to/folder': No such file or directory

Problem: Hadoop can't find the folder because it expects the path to start with /smartdata/, not /gpfs/smartdata/

Solution: use /smartdata/ instead of /gpfs/smartdata/
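If the path is passed in from Python code that shells out to hadoop, the prefix can be rewritten before the call. This is a minimal sketch; the helper names `to_hdfs_path` and `list_recursive` are hypothetical and not part of the project.

```python
import subprocess

GPFS_PREFIX = "/gpfs/smartdata/"
HDFS_PREFIX = "/smartdata/"


def to_hdfs_path(path):
    """Rewrite a /gpfs/smartdata/... path into the /smartdata/... form hadoop expects."""
    if path.startswith(GPFS_PREFIX):
        return HDFS_PREFIX + path[len(GPFS_PREFIX):]
    return path  # already in the expected form (or unrelated), leave unchanged


def list_recursive(path):
    """Run 'hadoop fs -ls -R' on the corrected path and return its output."""
    cmd = ["hadoop", "fs", "-ls", "-R", to_hdfs_path(path)]
    # check_output raises CalledProcessError on a non-zero exit status,
    # which is the error described above.
    return subprocess.check_output(cmd, universal_newlines=True)
```

For example, `to_hdfs_path("/gpfs/smartdata/path/to/folder")` yields `/smartdata/path/to/folder`.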


Error: java.lang.ClassNotFoundException: niklasb.sparkhacks.FixedLengthBinaryInputFormat

Additional observation: the target folder does not exist inside the dirhash folder

Problem: the Maven project has not been built

Solution: build the package with cd dirhash; mvn package
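A launcher script could check for the build output before submitting the job. The sketch below assumes Maven's default layout (packaged jars in `dirhash/target/`); the helpers `find_built_jars` and `ensure_built` are hypothetical.

```python
import glob
import os
import subprocess


def find_built_jars(project_dir="dirhash"):
    """Return the jar files in <project_dir>/target, or [] if the project isn't built."""
    return sorted(glob.glob(os.path.join(project_dir, "target", "*.jar")))


def ensure_built(project_dir="dirhash"):
    """Run 'mvn package' inside project_dir if no jar exists yet, then return the jars."""
    jars = find_built_jars(project_dir)
    if not jars:
        # Same as: cd dirhash; mvn package
        subprocess.check_call(["mvn", "package"], cwd=project_dir)
        jars = find_built_jars(project_dir)
    return jars
```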


Error: ImportError: No module named pyblake2

Additional observations:

  • the module should be optional
  • the error is raised only by run.sh, not by run_tests.sh
  • the import itself succeeds; it is the actual hashing that fails

Quick solution: pip uninstall pyblake2. The module is optional to begin with, and Python 3.6+ doesn't have this issue because BLAKE2 is part of hashlib.
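Because the import can succeed while hashing still fails, an optional dependency like this is safer if it is probed with a real hash call, not just imported. A minimal sketch, not the project's code: the helper names are hypothetical, and the SHA-256 fallback is an assumption chosen for illustration.

```python
import hashlib


def _pick_blake2b():
    """Return a working blake2b constructor, or None if none is usable."""
    candidates = []
    if hasattr(hashlib, "blake2b"):  # built into hashlib on Python 3.6+
        candidates.append(hashlib.blake2b)
    try:
        from pyblake2 import blake2b as py_blake2b  # optional third-party module
        candidates.append(py_blake2b)
    except ImportError:
        pass
    for constructor in candidates:
        try:
            constructor(b"probe").hexdigest()  # import can succeed while hashing fails
            return constructor
        except Exception:
            continue
    return None


blake2b = _pick_blake2b()


def digest(data):
    """Hex digest of data, using BLAKE2b when it works."""
    if blake2b is not None:
        return blake2b(data).hexdigest()
    # Hypothetical fallback; the project's real behavior may differ.
    return hashlib.sha256(data).hexdigest()
```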


Error: java.lang.OutOfMemoryError: Java heap space

Additional observations:

  • the job freezes
  • no output appears when stderr is redirected with 2> /dev/null

Solution: add --master yarn --deploy-mode client to spark-submit
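If spark-submit is started from Python, the two flags can be placed in front of the application path; spark-submit options must come before the application, and anything after it is passed to the application itself. A minimal sketch; `spark_submit_cmd` and `job.py` are hypothetical names.

```python
def spark_submit_cmd(app, app_args=()):
    """Build a spark-submit command line that runs the application on YARN in client mode."""
    return [
        "spark-submit",
        "--master", "yarn",           # run on the YARN cluster
        "--deploy-mode", "client",    # keep the driver on the submitting machine
        app,
        *app_args,                    # arguments for the application itself
    ]
```

Usage: `subprocess.check_call(spark_submit_cmd("job.py"))`.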
