Troubleshooting
Here is a list of known error messages and their solutions. Note: the messages are shortened, but you should be able to find what you are looking for with Ctrl+F.
Error: subprocess.CalledProcessError: Command '['hadoop', 'fs', '-ls', '-R', '/gpfs/smartdata/path/to/folder']' returned non-zero exit status 1
Additional observation: running hadoop fs -ls -R /gpfs/smartdata/path/to/folder gives "ls: `/gpfs/smartdata/path/to/folder': No such file or directory"
Problem: hadoop cannot find the path, because it uses /smartdata/ instead of /gpfs/smartdata/
Solution: use /smartdata/ instead of /gpfs/smartdata/
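The prefix swap can also be applied in code before the path is handed to hadoop. The following is a minimal sketch, not code from this project: the helper name to_hadoop_path and the assumption that the local mount prefix is exactly /gpfs are illustrative only.

```python
def to_hadoop_path(path, mount_prefix="/gpfs"):
    """Drop the local mount prefix so that hadoop fs can resolve the path.

    Hypothetical helper; mount_prefix="/gpfs" is an assumption taken from
    the error message above.
    """
    # Only strip the prefix when it is a whole leading path component,
    # so that e.g. "/gpfsx/data" is left untouched.
    if path == mount_prefix or path.startswith(mount_prefix + "/"):
        stripped = path[len(mount_prefix):]
        return stripped or "/"
    return path


print(to_hadoop_path("/gpfs/smartdata/path/to/folder"))
```

Paths that do not start with the prefix are returned unchanged, so the helper is safe to call on paths that are already in the /smartdata/ form.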
Error: java.lang.ClassNotFoundException: niklasb.sparkhacks.FixedLengthBinaryInputFormat
Additional observation: the target folder does not exist inside the dirhash folder
Problem: the maven project has not been built
Solution: build the package with cd dirhash; mvn package
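The missing target folder can be detected before a job is submitted. This is a small sketch under the assumption that the script is started from the directory containing dirhash; the function name maven_target_missing is hypothetical.

```python
import os


def maven_target_missing(project_dir="dirhash"):
    """Return True if the Maven build output folder does not exist yet."""
    return not os.path.isdir(os.path.join(project_dir, "target"))


if maven_target_missing():
    print("target folder missing: run 'cd dirhash; mvn package' first")
```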
Error: ImportError: No module named pyblake2
Additional observation:
- the module should be optional
- it isn't thrown with run_tests.sh, only with run.sh
- the import is actually successful, but the actual hashing is not
Quick solution: pip uninstall pyblake2 (the module is optional anyway; Python 3.6+ does not have this issue, because BLAKE2 is part of hashlib)
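For reference, an optional BLAKE2 import that prefers the standard library could look like the sketch below. It is an assumption about how such a fallback might be written, not the import logic this project actually uses; the helper name blake2_hexdigest is hypothetical.

```python
import hashlib

try:
    # Python 3.6+: BLAKE2 ships with the standard library.
    blake2b = hashlib.blake2b
except AttributeError:
    try:
        # Older interpreters: fall back to the optional pyblake2 package.
        from pyblake2 import blake2b
    except ImportError:
        blake2b = None  # BLAKE2 hashing is unavailable


def blake2_hexdigest(data, digest_size=32):
    """Return the BLAKE2b hex digest of data, or None if BLAKE2 is unavailable."""
    if blake2b is None:
        return None
    return blake2b(data, digest_size=digest_size).hexdigest()


print(blake2_hexdigest(b"example"))
```

With this ordering, pyblake2 is never touched on Python 3.6+, so a broken pyblake2 installation cannot affect hashing there.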
Error: java.lang.OutOfMemoryError: Java heap space
Additional observation:
- the job freezes
- there is no output when stderr is redirected with 2> /dev/null
Solution: add --master yarn --deploy-mode client to the spark-submit command
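If spark-submit is launched from Python, the two options can be placed in the argument list as sketched below. The function name spark_submit_command, the script name job.py, and its argument are placeholders, not names from this project.

```python
def spark_submit_command(script, script_args=()):
    """Build a spark-submit argument list that runs on YARN in client mode."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "client",
        script,  # options must come before the application script
    ] + list(script_args)


# "job.py" and the path argument are hypothetical placeholders.
cmd = spark_submit_command("job.py", ["/smartdata/path/to/folder"])
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.check_call(cmd) on a machine where spark-submit is on the PATH.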