troubleshooting

Here is a list of known error messages and their solutions. Note: the messages are shortened, but you should still find what you are looking for with Ctrl+F.

Error: subprocess.CalledProcessError: Command '['hadoop', 'fs', '-ls', '-R', '/gpfs/smartdata/path/to/folder']' returned non-zero exit status 1

Additional observation: hadoop fs -ls -R /gpfs/smartdata/path/to/folder gives "ls: `/gpfs/smartdata/path/to/folder': No such file or directory"

Problem: Hadoop can't find the path because it addresses the storage as /smartdata/, not /gpfs/smartdata/

Solution: use /smartdata/ instead of /gpfs/smartdata/
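If the path is built in Python before being passed to hadoop, the prefix can be rewritten in one place. The following is a minimal sketch, assuming that every affected path starts with /gpfs/smartdata/ and that Hadoop sees the same data under /smartdata/; the helper names to_hdfs_path, hadoop_ls_command and hadoop_ls are hypothetical and not part of this project.

```python
import subprocess

# Assumption: Hadoop exposes the GPFS mount under /smartdata/, so the local
# /gpfs prefix must be dropped before a path is handed to `hadoop fs`.
GPFS_PREFIX = "/gpfs/smartdata/"
HDFS_PREFIX = "/smartdata/"


def to_hdfs_path(path: str) -> str:
    """Rewrite a /gpfs/smartdata/... path into the /smartdata/... form."""
    if path.startswith(GPFS_PREFIX):
        return HDFS_PREFIX + path[len(GPFS_PREFIX):]
    # Paths that already use /smartdata/ (or anything else) pass through unchanged.
    return path


def hadoop_ls_command(path: str) -> list:
    """Build the recursive listing command using the corrected path."""
    return ["hadoop", "fs", "-ls", "-R", to_hdfs_path(path)]


def hadoop_ls(path: str) -> str:
    """Run the listing; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.check_output(hadoop_ls_command(path)).decode()
```

Separating command construction from execution keeps the path rewrite testable on a machine without Hadoop installed.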

Error: java.lang.ClassNotFoundException: niklasb.sparkhacks.FixedLengthBinaryInputFormat

Additional observation: the target folder does not exist inside the dirhash folder

Problem: the Maven project has not been set up (built) yet

Solution: build the package with cd dirhash; mvn package
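Since the missing target folder is the visible symptom, a launcher script could check for it before starting Spark. This is a minimal sketch, assuming the Maven project lives in a folder named dirhash and that mvn package writes its jar(s) into dirhash/target; the function find_dirhash_jars is hypothetical, and the exact jar name is not known here, so it simply collects every *.jar.

```python
import glob
import os


def find_dirhash_jars(project_dir: str = "dirhash") -> list:
    """Return the jars that `mvn package` produced under <project_dir>/target.

    Raises FileNotFoundError when the target folder is missing, which is the
    situation that leads to the ClassNotFoundException at runtime.
    """
    target = os.path.join(project_dir, "target")
    if not os.path.isdir(target):
        raise FileNotFoundError(
            f"{target} does not exist; run: cd {project_dir}; mvn package"
        )
    # Sorted so the result is deterministic across filesystems.
    return sorted(glob.glob(os.path.join(target, "*.jar")))
```

Failing early with the build command in the message is cheaper than waiting for Spark to report the missing class.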

Error: ImportError: No module named pyblake2

Additional observation:

the module should be optional
the error isn't thrown with run_tests.sh, only with run.sh
the import itself succeeds, but the hashing fails

Quick solution: pip uninstall pyblake2 (it is optional to begin with; Python 3.6+ doesn't have this issue, because BLAKE2 is part of hashlib)
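One way to make the dependency truly optional is to prefer the standard library and only fall back to pyblake2 on older interpreters. This is a sketch of that pattern, not the project's actual import code; the helper get_blake2b is hypothetical. On Python 3.6+ it never touches pyblake2, so a broken pyblake2 installation no longer matters there.

```python
import hashlib


def get_blake2b():
    """Return a blake2b constructor, preferring the standard library.

    hashlib.blake2b exists on Python 3.6+, so pyblake2 is only consulted on
    older interpreters. None means no BLAKE2 implementation is available.
    """
    if hasattr(hashlib, "blake2b"):
        return hashlib.blake2b
    try:
        import pyblake2  # optional backport, only relevant before Python 3.6
    except ImportError:
        return None
    return pyblake2.blake2b
```

Usage: blake2b = get_blake2b(); if it is not None, blake2b(data).hexdigest() yields a 128-character hex digest with the default 64-byte digest size.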

Error: java.lang.OutOfMemoryError: Java heap space

Additional observation:

the job freezes
there is no output at all when stderr is redirected (2> /dev/null)

Solution: add --master yarn --deploy-mode client to the spark-submit call
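If jobs are launched from a Python wrapper, the two options can be added when the command is assembled. This is a minimal sketch; the function build_spark_submit and the placeholder application job.py are hypothetical. Note that spark-submit options must come before the application file, since everything after it is passed to the application as arguments.

```python
import shlex


def build_spark_submit(app: str, *app_args: str) -> list:
    """Build a spark-submit command that runs on YARN in client deploy mode."""
    return [
        "spark-submit",
        "--master", "yarn",        # run on the YARN cluster instead of locally
        "--deploy-mode", "client",  # keep the driver in the submitting process
        app,                        # options above must precede the application
        *app_args,
    ]


if __name__ == "__main__":
    # "job.py" and its argument are placeholders, not files from this project.
    cmd = build_spark_submit("job.py", "/smartdata/path/to/folder")
    print(" ".join(shlex.quote(part) for part in cmd))
```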
