-
Notifications
You must be signed in to change notification settings - Fork 362
Crypto Module Preperation/Tests #6261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Add comprehensive test coverage for the crypto.hash_file function: - 11 unit tests covering algorithms, size parameter, edge cases, errors - 4 functional tests for real-world scenarios (download verification, tampering detection, manifest creation, symlink handling) - Update TEST_SIZE counts in check.py Reference: avocado-framework/aautils#43 Assisted-By: Cursor-Claude-4-Sonnet Signed-off-by: Harvey Lynden <hlynden@redhat.com>
- Add module docstring and enhance function RST docstrings - Add deprecation warning for autils migration - Remove from .pylintrc_utils ignore list to enable pylint checking - Add to autils_migration_announcement workflow - Update spell.ignore with new technical terms Reference: avocado-framework/aautils#43 Assisted-By: Cursor-Claude-4-Sonnet Signed-off-by: Harvey Lynden <hlynden@redhat.com>
WalkthroughThis PR introduces a crypto module migration with expanded documentation, deprecation warnings, and comprehensive test coverage. Changes include updating workflow paths to trigger on crypto.py modifications, reorganizing linter configuration ignore lists, adding two new test modules (unit and functional) covering hash file functionality, updating test counts in the manifest, and expanding the spell ignore list with cryptographic-related terms. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Pre-merge checks❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @harvey0100, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the Highlights
Ignored Files
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request improves the avocado.utils.crypto module by significantly enhancing the docstring for hash_file, adding a module docstring, and introducing a comprehensive suite of unit and functional tests. The new tests are well-structured and cover a wide range of scenarios, including edge cases and real-world use cases. The changes also include enabling pylint for the crypto module and updating test counts.
My main feedback is to change the default hashing algorithm from md5 to a more secure one like sha256, as md5 is cryptographically weak. This would make the utility safer by default for its intended purpose of file verification.
| LOG = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| def hash_file(filename, size=None, algorithm="md5"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default hashing algorithm is md5, which is known to have collision vulnerabilities and is considered cryptographically broken. For a utility intended for file verification, using a weak algorithm by default poses a security risk.
I recommend changing the default to a more secure algorithm like sha256 to ensure safer usage out-of-the-box.
This change will require updating the unit tests that currently assert md5 is the default (e.g., HashFileTest.test_hash_file_md5_default).
| def hash_file(filename, size=None, algorithm="md5"): | |
| def hash_file(filename, size=None, algorithm="sha256"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @avocado/utils/crypto.py:
- Around line 26-55: The example in the hash_file docstring incorrectly shows
the MD5 of an empty string for '/etc/passwd'; update the docstring in the
hash_file function to remove the misleading hash by either (A) using a neutral
placeholder path like '/path/to/file' with a non-specific example digest such as
'abc123...' or (B) replacing the digest with the actual hash for the given
example path if you compute it; ensure the example path and digest are
consistent and not the empty-string MD5.
🧹 Nitpick comments (1)
selftests/functional/utils/crypto.py (1)
96-114: Consider whether to skip this test on Windows for consistency with platform-specific features.The test correctly validates symlink behavior on Unix-like systems. However, note that other symlink tests in the codebase (e.g.,
selftests/unit/datadir.py,selftests/unit/utils/archive.py) do not use skip decorators, and production code (avocado/utils/service.py, asset.py) callsos.symlink()without platform checks. If the test suite intends to gracefully skip platform-unsupported operations, adding a skip decorator would be consistent:Example skip decorator
@unittest.skipIf(sys.platform.startswith('win'), "Symlinks require special privileges on Windows") def test_symlink_follows_to_target(self): # existing test codeNote: This requires adding
import sysat the top of the file.Verify whether the project's testing strategy treats symlink failures as expected behavior or as unsupported operations that should be skipped.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
.github/workflows/autils_migration_announcement.yml.pylintrc_utilsavocado/utils/crypto.pyselftests/check.pyselftests/functional/utils/crypto.pyselftests/unit/utils/crypto.pyspell.ignore
🧰 Additional context used
🧬 Code graph analysis (3)
selftests/functional/utils/crypto.py (1)
avocado/utils/crypto.py (1)
hash_file(25-80)
avocado/utils/crypto.py (1)
avocado/utils/deprecation.py (1)
warning(18-25)
selftests/unit/utils/crypto.py (2)
selftests/functional/utils/crypto.py (2)
setUp(13-15)tearDown(17-19)avocado/utils/crypto.py (1)
hash_file(25-80)
🪛 Ruff (0.14.10)
selftests/unit/utils/crypto.py
33-33: Probable use of insecure hash functions in hashlib: md5
(S324)
51-51: Probable use of insecure hash functions in hashlib: md5
(S324)
59-59: Probable use of insecure hash functions in hashlib: md5
(S324)
68-68: Probable use of insecure hash functions in hashlib: md5
(S324)
77-77: Probable use of insecure hash functions in hashlib: md5
(S324)
85-85: Probable use of insecure hash functions in hashlib: md5
(S324)
94-94: Probable use of insecure hash functions in hashlib: md5
(S324)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (41)
- GitHub Check: rpm-build:fedora-41-aarch64
- GitHub Check: rpm-build:fedora-43-x86_64
- GitHub Check: rpm-build:epel-9-x86_64
- GitHub Check: rpm-build:centos-stream-9-x86_64
- GitHub Check: rpm-build:fedora-41-s390x
- GitHub Check: rpm-build:fedora-rawhide-x86_64
- GitHub Check: rpm-build:fedora-41-ppc64le
- GitHub Check: rpm-build:fedora-42-x86_64
- GitHub Check: rpm-build:fedora-41-s390x
- GitHub Check: rpm-build:fedora-42-x86_64
- GitHub Check: rpm-build:fedora-rawhide-x86_64
- GitHub Check: rpm-build:fedora-41-aarch64
- GitHub Check: rpm-build:epel-9-x86_64
- GitHub Check: rpm-build:centos-stream-9-x86_64
- GitHub Check: rpm-build:fedora-43-x86_64
- GitHub Check: rpm-build:fedora-41-ppc64le
- GitHub Check: rpm-build:fedora-43-x86_64
- GitHub Check: rpm-build:centos-stream-9-x86_64
- GitHub Check: rpm-build:epel-9-x86_64
- GitHub Check: rpm-build:fedora-41-s390x
- GitHub Check: rpm-build:fedora-42-x86_64
- GitHub Check: rpm-build:fedora-41-ppc64le
- GitHub Check: rpm-build:fedora-rawhide-x86_64
- GitHub Check: rpm-build:fedora-41-aarch64
- GitHub Check: Smokecheck on Linux with Python 3.9
- GitHub Check: Smokecheck on Linux with Python 3.10
- GitHub Check: Build Package (wheel/tarball) for Python 3.10
- GitHub Check: Build Package (wheel/tarball) for Python 3.13
- GitHub Check: Build Package (wheel/tarball) for Python 3.11
- GitHub Check: Windows with Python 3.13
- GitHub Check: Build Package (wheel/tarball) for Python 3.12
- GitHub Check: Egg task fedora:40
- GitHub Check: Version task ubuntu:22.04
- GitHub Check: Fedora selftests
- GitHub Check: Version task ubi:8.8
- GitHub Check: Static checks
- GitHub Check: Windows with Python 3.10
- GitHub Check: Windows with Python 3.11
- GitHub Check: Version task debian:12.4
- GitHub Check: macOS with Python 3.11
- GitHub Check: Code Coverage (3.11)
🔇 Additional comments (16)
.pylintrc_utils (1)
10-10: LGTM! Enabling pylint checks for crypto.py.Removing crypto.py from the ignore list enables static analysis, which is appropriate for a module being prepared for migration.
spell.ignore (1)
823-826: LGTM! Appropriate spell check additions.The crypto-related technical terms align with the module enhancements and test coverage in this PR.
selftests/check.py (1)
30-32: LGTM! Test counts match the crypto module additions.The increments (+11 unit, +4 functional-parallel) align precisely with the 11 unit tests and 4 functional tests mentioned in the commit message.
.github/workflows/autils_migration_announcement.yml (1)
9-9: LGTM! Workflow trigger updated for crypto module migration.Adding crypto.py to the path triggers ensures PRs modifying this module receive appropriate autils migration announcements.
avocado/utils/crypto.py (2)
15-16: LGTM! Clear module docstring.The concise module-level documentation effectively describes the module's purpose.
81-86: LGTM! Deprecation warning properly implemented.The deprecation warning is correctly positioned at module load time and uses the appropriate API. The
pylint: disabledirective is justified for this end-of-file import pattern.selftests/unit/utils/crypto.py (6)
1-19: LGTM! Solid test infrastructure.The imports are appropriate, and the setUp/tearDown pattern correctly manages temporary test files with proper cleanup.
21-26: LGTM! Good use of a helper method.The
_create_test_filehelper follows DRY principles and correctly handles binary file creation for crypto tests.
28-43: LGTM! Comprehensive algorithm testing.The tests correctly verify both default (MD5) and explicit (SHA256) algorithm support. The static analysis warnings about MD5 being insecure are false positives in this context—tests must use MD5 to verify MD5 functionality works correctly.
45-71: LGTM! Thorough size parameter coverage.The tests comprehensively cover all size parameter branches: partial hashing (size < file size), oversized requests (size > file size), and falsy values (None, 0). The inline comments effectively document the tested code paths.
73-110: LGTM! Excellent edge case and error handling coverage.The tests thoroughly validate edge cases (empty files, full binary range, multi-chunk reads) and error conditions (invalid algorithm gracefully returns None, missing file raises FileNotFoundError). This aligns perfectly with the documented API behavior.
113-123: LGTM! Good sanity check and proper test runner.The hash uniqueness test provides a valuable sanity check that the hash function produces distinct outputs for different inputs. The main block follows standard unittest conventions.
selftests/functional/utils/crypto.py (4)
1-19: LGTM! Consistent test infrastructure.The setup mirrors the unit test file, maintaining consistency across the test suite.
45-64: LGTM! Excellent security scenario test.The test correctly validates that file tampering is detectable through hash comparison, demonstrating a practical security use case.
66-94: LGTM! Practical manifest creation scenario.The test effectively demonstrates a real-world use case for build systems and package managers, correctly creating a directory structure and verifying checksums.
21-43: Checksums are correct for the test vector.The hardcoded checksums match the test vector "The quick brown fox jumps over the lazy dog". Both MD5 (9e107d9d372bb6826bd81d3542a419d6) and SHA256 (d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592) are accurate. The test appropriately uses a well-known standard test vector for verification.
| """Calculate the hash value of a file. | ||
| If size is not None, limit to first size bytes. | ||
| Throw exception if something is wrong with filename. | ||
| Can be also implemented with bash one-liner (assuming ``size%1024==0``):: | ||
| dd if=filename bs=1024 count=size/1024 | sha1sum - | ||
| Computes a cryptographic hash of the specified file using the given | ||
| algorithm. Optionally limits hashing to the first N bytes of the file, | ||
| which is useful for verifying partial downloads or large files. | ||
| :param filename: Path of the file that will have its hash calculated. | ||
| :param algorithm: Method used to calculate the hash (default is md5). | ||
| :type filename: str | ||
| :param size: If provided, hash only the first size bytes of the file. | ||
| :return: Hash of the file, if something goes wrong, return None. | ||
| If None or 0, the entire file is hashed. If size exceeds the file | ||
| size, the entire file is hashed. | ||
| :type size: int or None | ||
| :param algorithm: Hash algorithm to use. Supported algorithms include | ||
| md5, sha1, sha256, sha512, blake2b, and others available in hashlib. | ||
| :type algorithm: str | ||
| :return: Hexadecimal digest string of the computed hash. Returns None | ||
| if an invalid algorithm is specified. | ||
| :rtype: str or None | ||
| :raises FileNotFoundError: When the specified file does not exist. | ||
| :raises PermissionError: When the file cannot be read due to permissions. | ||
| Example:: | ||
| >>> hash_file('/etc/passwd') | ||
| 'd41d8cd98f00b204e9800998ecf8427e' | ||
| >>> hash_file('/path/to/file', algorithm='sha256') | ||
| 'e3b0c44298fc1c149afbf4c8996fb924...' | ||
| >>> hash_file('/path/to/large_file', size=1024) | ||
| 'abc123...' | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the example hash value.
The documentation improvements are excellent—comprehensive parameter descriptions, type hints, exception documentation, and examples. However, the example hash on line 50 is incorrect.
The hash d41d8cd98f00b204e9800998ecf8427e is the MD5 of an empty string, not /etc/passwd. Please use a realistic example or a clearly placeholder path like /path/to/file.
🔎 Suggested fix
- Example::
-
- >>> hash_file('/etc/passwd')
- 'd41d8cd98f00b204e9800998ecf8427e'
- >>> hash_file('/path/to/file', algorithm='sha256')
+ Example::
+
+ >>> hash_file('/path/to/file')
+ 'abc123def456...'
+ >>> hash_file('/path/to/file', algorithm='sha256')
'e3b0c44298fc1c149afbf4c8996fb924...'
>>> hash_file('/path/to/large_file', size=1024)
'abc123...'📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| """Calculate the hash value of a file. | |
| If size is not None, limit to first size bytes. | |
| Throw exception if something is wrong with filename. | |
| Can be also implemented with bash one-liner (assuming ``size%1024==0``):: | |
| dd if=filename bs=1024 count=size/1024 | sha1sum - | |
| Computes a cryptographic hash of the specified file using the given | |
| algorithm. Optionally limits hashing to the first N bytes of the file, | |
| which is useful for verifying partial downloads or large files. | |
| :param filename: Path of the file that will have its hash calculated. | |
| :param algorithm: Method used to calculate the hash (default is md5). | |
| :type filename: str | |
| :param size: If provided, hash only the first size bytes of the file. | |
| :return: Hash of the file, if something goes wrong, return None. | |
| If None or 0, the entire file is hashed. If size exceeds the file | |
| size, the entire file is hashed. | |
| :type size: int or None | |
| :param algorithm: Hash algorithm to use. Supported algorithms include | |
| md5, sha1, sha256, sha512, blake2b, and others available in hashlib. | |
| :type algorithm: str | |
| :return: Hexadecimal digest string of the computed hash. Returns None | |
| if an invalid algorithm is specified. | |
| :rtype: str or None | |
| :raises FileNotFoundError: When the specified file does not exist. | |
| :raises PermissionError: When the file cannot be read due to permissions. | |
| Example:: | |
| >>> hash_file('/etc/passwd') | |
| 'd41d8cd98f00b204e9800998ecf8427e' | |
| >>> hash_file('/path/to/file', algorithm='sha256') | |
| 'e3b0c44298fc1c149afbf4c8996fb924...' | |
| >>> hash_file('/path/to/large_file', size=1024) | |
| 'abc123...' | |
| """ | |
| """Calculate the hash value of a file. | |
| Computes a cryptographic hash of the specified file using the given | |
| algorithm. Optionally limits hashing to the first N bytes of the file, | |
| which is useful for verifying partial downloads or large files. | |
| :param filename: Path of the file that will have its hash calculated. | |
| :type filename: str | |
| :param size: If provided, hash only the first size bytes of the file. | |
| If None or 0, the entire file is hashed. If size exceeds the file | |
| size, the entire file is hashed. | |
| :type size: int or None | |
| :param algorithm: Hash algorithm to use. Supported algorithms include | |
| md5, sha1, sha256, sha512, blake2b, and others available in hashlib. | |
| :type algorithm: str | |
| :return: Hexadecimal digest string of the computed hash. Returns None | |
| if an invalid algorithm is specified. | |
| :rtype: str or None | |
| :raises FileNotFoundError: When the specified file does not exist. | |
| :raises PermissionError: When the file cannot be read due to permissions. | |
| Example:: | |
| >>> hash_file('/path/to/file') | |
| 'abc123def456...' | |
| >>> hash_file('/path/to/file', algorithm='sha256') | |
| 'e3b0c44298fc1c149afbf4c8996fb924...' | |
| >>> hash_file('/path/to/large_file', size=1024) | |
| 'abc123...' | |
| """ |
🤖 Prompt for AI Agents
In @avocado/utils/crypto.py around lines 26-55, The example in the hash_file
docstring incorrectly shows the MD5 of an empty string for '/etc/passwd'; update
the docstring in the hash_file function to remove the misleading hash by either
(A) using a neutral placeholder path like '/path/to/file' with a non-specific
example digest such as 'abc123...' or (B) replacing the digest with the actual
hash for the given example path if you compute it; ensure the example path and
digest are consistent and not the empty-string MD5.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #6261 +/- ##
==========================================
+ Coverage 73.58% 73.60% +0.02%
==========================================
Files 206 206
Lines 22497 22500 +3
==========================================
+ Hits 16555 16562 +7
+ Misses 5942 5938 -4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Updates to Crypto including functional/unit tests and improvement of RST Docstrings.
Summary by CodeRabbit
Documentation
Chores
✏️ Tip: You can customize this high-level summary in your review settings.