Skip to content

Conversation

@harvey0100
Copy link
Contributor

@harvey0100 harvey0100 commented Jan 5, 2026

Updates to Crypto including functional/unit tests and improvement of RST Docstrings.

Summary by CodeRabbit

  • Documentation

    • Improved hash_file function documentation with enhanced descriptions, type hints, and usage examples.
  • Chores

    • Crypto module now displays a deprecation notice.

✏️ Tip: You can customize this high-level summary in your review settings.

Add comprehensive test coverage for the crypto.hash_file function:

- 11 unit tests covering algorithms, size parameter, edge cases, errors
- 4 functional tests for real-world scenarios (download verification,
  tampering detection, manifest creation, symlink handling)
- Update TEST_SIZE counts in check.py

Reference: avocado-framework/aautils#43
Assisted-By: Cursor-Claude-4-Sonnet
Signed-off-by: Harvey Lynden <hlynden@redhat.com>
- Add module docstring and enhance function RST docstrings
- Add deprecation warning for autils migration
- Remove from .pylintrc_utils ignore list to enable pylint checking
- Add to autils_migration_announcement workflow
- Update spell.ignore with new technical terms

Reference: avocado-framework/aautils#43
Assisted-By: Cursor-Claude-4-Sonnet
Signed-off-by: Harvey Lynden <hlynden@redhat.com>
@harvey0100 harvey0100 requested a review from richtja January 5, 2026 12:49
@harvey0100 harvey0100 self-assigned this Jan 5, 2026
@coderabbitai
Copy link

coderabbitai bot commented Jan 5, 2026

Walkthrough

This PR introduces a crypto module migration with expanded documentation, deprecation warnings, and comprehensive test coverage. Changes include updating workflow paths to trigger on crypto.py modifications, reorganizing linter configuration ignore lists, adding two new test modules (unit and functional) covering hash file functionality, updating test counts in the manifest, and expanding the spell ignore list with cryptographic-related terms.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Crypto Module Preperation/Tests' is partially related to the changeset but contains a typo and is overly broad. While the PR does add tests and preparation, the title uses 'Preperation' (misspelled; should be 'Preparation') and does not clearly convey the comprehensive nature of the changes including docstring improvements, deprecation warnings, and workflow updates. Correct the typo ('Preperation' → 'Preparation') and consider a more specific title such as 'Add crypto module tests and preparation for autils migration' to better reflect all changes.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mr-avocado mr-avocado bot moved this to Review Requested in Default project Jan 5, 2026
@gemini-code-assist
Copy link

Summary of Changes

Hello @harvey0100, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the crypto module, particularly the hash_file function, by adding robust unit and functional tests to ensure its reliability and correctness across various use cases. Concurrently, the documentation for hash_file has been greatly expanded for better clarity and usability. The module has also been brought under pylint scrutiny, and a deprecation warning has been added, signaling future changes to its status.

Highlights

  • Comprehensive Testing: Introduced extensive unit and functional tests for the crypto.hash_file utility, covering various algorithms, size limits, edge cases, and real-world scenarios like download verification and file tampering detection.
  • Improved Docstrings: Enhanced the docstring for the crypto.hash_file function, providing clearer explanations, type hints, details on raised exceptions, and practical examples, adhering to RST format.
  • Pylint Integration: Removed crypto.py from the .pylintrc_utils ignore list, integrating the module into the project's static analysis checks.
  • Module Deprecation: Added a deprecation warning to the crypto module, indicating its planned phase-out or replacement.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/autils_migration_announcement.yml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the avocado.utils.crypto module by significantly enhancing the docstring for hash_file, adding a module docstring, and introducing a comprehensive suite of unit and functional tests. The new tests are well-structured and cover a wide range of scenarios, including edge cases and real-world use cases. The changes also include enabling pylint for the crypto module and updating test counts.

My main feedback is to change the default hashing algorithm from md5 to a more secure one like sha256, as md5 is cryptographically weak. This would make the utility safer by default for its intended purpose of file verification.

LOG = logging.getLogger(__name__)


def hash_file(filename, size=None, algorithm="md5"):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The default hashing algorithm is md5, which is known to have collision vulnerabilities and is considered cryptographically broken. For a utility intended for file verification, using a weak algorithm by default poses a security risk.

I recommend changing the default to a more secure algorithm like sha256 to ensure safer usage out-of-the-box.

This change will require updating the unit tests that currently assert md5 is the default (e.g., HashFileTest.test_hash_file_md5_default).

Suggested change
def hash_file(filename, size=None, algorithm="md5"):
def hash_file(filename, size=None, algorithm="sha256"):

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @avocado/utils/crypto.py:
- Around line 26-55: The example in the hash_file docstring incorrectly shows
the MD5 of an empty string for '/etc/passwd'; update the docstring in the
hash_file function to remove the misleading hash by either (A) using a neutral
placeholder path like '/path/to/file' with a non-specific example digest such as
'abc123...' or (B) replacing the digest with the actual hash for the given
example path if you compute it; ensure the example path and digest are
consistent and not the empty-string MD5.
🧹 Nitpick comments (1)
selftests/functional/utils/crypto.py (1)

96-114: Consider whether to skip this test on Windows for consistency with platform-specific features.

The test correctly validates symlink behavior on Unix-like systems. However, note that other symlink tests in the codebase (e.g., selftests/unit/datadir.py, selftests/unit/utils/archive.py) do not use skip decorators, and production code (avocado/utils/service.py, asset.py) calls os.symlink() without platform checks. If the test suite intends to gracefully skip platform-unsupported operations, adding a skip decorator would be consistent:

Example skip decorator
@unittest.skipIf(sys.platform.startswith('win'), "Symlinks require special privileges on Windows")
def test_symlink_follows_to_target(self):
    # existing test code

Note: This requires adding import sys at the top of the file.

Verify whether the project's testing strategy treats symlink failures as expected behavior or as unsupported operations that should be skipped.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 111ca38 and 47551ca.

📒 Files selected for processing (7)
  • .github/workflows/autils_migration_announcement.yml
  • .pylintrc_utils
  • avocado/utils/crypto.py
  • selftests/check.py
  • selftests/functional/utils/crypto.py
  • selftests/unit/utils/crypto.py
  • spell.ignore
🧰 Additional context used
🧬 Code graph analysis (3)
selftests/functional/utils/crypto.py (1)
avocado/utils/crypto.py (1)
  • hash_file (25-80)
avocado/utils/crypto.py (1)
avocado/utils/deprecation.py (1)
  • warning (18-25)
selftests/unit/utils/crypto.py (2)
selftests/functional/utils/crypto.py (2)
  • setUp (13-15)
  • tearDown (17-19)
avocado/utils/crypto.py (1)
  • hash_file (25-80)
🪛 Ruff (0.14.10)
selftests/unit/utils/crypto.py

33-33: Probable use of insecure hash functions in hashlib: md5

(S324)


51-51: Probable use of insecure hash functions in hashlib: md5

(S324)


59-59: Probable use of insecure hash functions in hashlib: md5

(S324)


68-68: Probable use of insecure hash functions in hashlib: md5

(S324)


77-77: Probable use of insecure hash functions in hashlib: md5

(S324)


85-85: Probable use of insecure hash functions in hashlib: md5

(S324)


94-94: Probable use of insecure hash functions in hashlib: md5

(S324)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (41)
  • GitHub Check: rpm-build:fedora-41-aarch64
  • GitHub Check: rpm-build:fedora-43-x86_64
  • GitHub Check: rpm-build:epel-9-x86_64
  • GitHub Check: rpm-build:centos-stream-9-x86_64
  • GitHub Check: rpm-build:fedora-41-s390x
  • GitHub Check: rpm-build:fedora-rawhide-x86_64
  • GitHub Check: rpm-build:fedora-41-ppc64le
  • GitHub Check: rpm-build:fedora-42-x86_64
  • GitHub Check: rpm-build:fedora-41-s390x
  • GitHub Check: rpm-build:fedora-42-x86_64
  • GitHub Check: rpm-build:fedora-rawhide-x86_64
  • GitHub Check: rpm-build:fedora-41-aarch64
  • GitHub Check: rpm-build:epel-9-x86_64
  • GitHub Check: rpm-build:centos-stream-9-x86_64
  • GitHub Check: rpm-build:fedora-43-x86_64
  • GitHub Check: rpm-build:fedora-41-ppc64le
  • GitHub Check: rpm-build:fedora-43-x86_64
  • GitHub Check: rpm-build:centos-stream-9-x86_64
  • GitHub Check: rpm-build:epel-9-x86_64
  • GitHub Check: rpm-build:fedora-41-s390x
  • GitHub Check: rpm-build:fedora-42-x86_64
  • GitHub Check: rpm-build:fedora-41-ppc64le
  • GitHub Check: rpm-build:fedora-rawhide-x86_64
  • GitHub Check: rpm-build:fedora-41-aarch64
  • GitHub Check: Smokecheck on Linux with Python 3.9
  • GitHub Check: Smokecheck on Linux with Python 3.10
  • GitHub Check: Build Package (wheel/tarball) for Python 3.10
  • GitHub Check: Build Package (wheel/tarball) for Python 3.13
  • GitHub Check: Build Package (wheel/tarball) for Python 3.11
  • GitHub Check: Windows with Python 3.13
  • GitHub Check: Build Package (wheel/tarball) for Python 3.12
  • GitHub Check: Egg task fedora:40
  • GitHub Check: Version task ubuntu:22.04
  • GitHub Check: Fedora selftests
  • GitHub Check: Version task ubi:8.8
  • GitHub Check: Static checks
  • GitHub Check: Windows with Python 3.10
  • GitHub Check: Windows with Python 3.11
  • GitHub Check: Version task debian:12.4
  • GitHub Check: macOS with Python 3.11
  • GitHub Check: Code Coverage (3.11)
🔇 Additional comments (16)
.pylintrc_utils (1)

10-10: LGTM! Enabling pylint checks for crypto.py.

Removing crypto.py from the ignore list enables static analysis, which is appropriate for a module being prepared for migration.

spell.ignore (1)

823-826: LGTM! Appropriate spell check additions.

The crypto-related technical terms align with the module enhancements and test coverage in this PR.

selftests/check.py (1)

30-32: LGTM! Test counts match the crypto module additions.

The increments (+11 unit, +4 functional-parallel) align precisely with the 11 unit tests and 4 functional tests mentioned in the commit message.

.github/workflows/autils_migration_announcement.yml (1)

9-9: LGTM! Workflow trigger updated for crypto module migration.

Adding crypto.py to the path triggers ensures PRs modifying this module receive appropriate autils migration announcements.

avocado/utils/crypto.py (2)

15-16: LGTM! Clear module docstring.

The concise module-level documentation effectively describes the module's purpose.


81-86: LGTM! Deprecation warning properly implemented.

The deprecation warning is correctly positioned at module load time and uses the appropriate API. The pylint: disable directive is justified for this end-of-file import pattern.

selftests/unit/utils/crypto.py (6)

1-19: LGTM! Solid test infrastructure.

The imports are appropriate, and the setUp/tearDown pattern correctly manages temporary test files with proper cleanup.


21-26: LGTM! Good use of a helper method.

The _create_test_file helper follows DRY principles and correctly handles binary file creation for crypto tests.


28-43: LGTM! Comprehensive algorithm testing.

The tests correctly verify both default (MD5) and explicit (SHA256) algorithm support. The static analysis warnings about MD5 being insecure are false positives in this context—tests must use MD5 to verify MD5 functionality works correctly.


45-71: LGTM! Thorough size parameter coverage.

The tests comprehensively cover all size parameter branches: partial hashing (size < file size), oversized requests (size > file size), and falsy values (None, 0). The inline comments effectively document the tested code paths.


73-110: LGTM! Excellent edge case and error handling coverage.

The tests thoroughly validate edge cases (empty files, full binary range, multi-chunk reads) and error conditions (invalid algorithm gracefully returns None, missing file raises FileNotFoundError). This aligns perfectly with the documented API behavior.


113-123: LGTM! Good sanity check and proper test runner.

The hash uniqueness test provides a valuable sanity check that the hash function produces distinct outputs for different inputs. The main block follows standard unittest conventions.

selftests/functional/utils/crypto.py (4)

1-19: LGTM! Consistent test infrastructure.

The setup mirrors the unit test file, maintaining consistency across the test suite.


45-64: LGTM! Excellent security scenario test.

The test correctly validates that file tampering is detectable through hash comparison, demonstrating a practical security use case.


66-94: LGTM! Practical manifest creation scenario.

The test effectively demonstrates a real-world use case for build systems and package managers, correctly creating a directory structure and verifying checksums.


21-43: Checksums are correct for the test vector.

The hardcoded checksums match the test vector "The quick brown fox jumps over the lazy dog". Both MD5 (9e107d9d372bb6826bd81d3542a419d6) and SHA256 (d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592) are accurate. The test appropriately uses a well-known standard test vector for verification.

Comment on lines +26 to 55
"""Calculate the hash value of a file.
If size is not None, limit to first size bytes.
Throw exception if something is wrong with filename.
Can be also implemented with bash one-liner (assuming ``size%1024==0``)::
dd if=filename bs=1024 count=size/1024 | sha1sum -
Computes a cryptographic hash of the specified file using the given
algorithm. Optionally limits hashing to the first N bytes of the file,
which is useful for verifying partial downloads or large files.
:param filename: Path of the file that will have its hash calculated.
:param algorithm: Method used to calculate the hash (default is md5).
:type filename: str
:param size: If provided, hash only the first size bytes of the file.
:return: Hash of the file, if something goes wrong, return None.
If None or 0, the entire file is hashed. If size exceeds the file
size, the entire file is hashed.
:type size: int or None
:param algorithm: Hash algorithm to use. Supported algorithms include
md5, sha1, sha256, sha512, blake2b, and others available in hashlib.
:type algorithm: str
:return: Hexadecimal digest string of the computed hash. Returns None
if an invalid algorithm is specified.
:rtype: str or None
:raises FileNotFoundError: When the specified file does not exist.
:raises PermissionError: When the file cannot be read due to permissions.
Example::
>>> hash_file('/etc/passwd')
'd41d8cd98f00b204e9800998ecf8427e'
>>> hash_file('/path/to/file', algorithm='sha256')
'e3b0c44298fc1c149afbf4c8996fb924...'
>>> hash_file('/path/to/large_file', size=1024)
'abc123...'
"""
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix the example hash value.

The documentation improvements are excellent—comprehensive parameter descriptions, type hints, exception documentation, and examples. However, the example hash on line 50 is incorrect.

The hash d41d8cd98f00b204e9800998ecf8427e is the MD5 of an empty string, not /etc/passwd. Please use a realistic example or a clearly placeholder path like /path/to/file.

🔎 Suggested fix
-    Example::
-
-        >>> hash_file('/etc/passwd')
-        'd41d8cd98f00b204e9800998ecf8427e'
-        >>> hash_file('/path/to/file', algorithm='sha256')
+    Example::
+
+        >>> hash_file('/path/to/file')
+        'abc123def456...'
+        >>> hash_file('/path/to/file', algorithm='sha256')
         'e3b0c44298fc1c149afbf4c8996fb924...'
         >>> hash_file('/path/to/large_file', size=1024)
         'abc123...'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"""Calculate the hash value of a file.
If size is not None, limit to first size bytes.
Throw exception if something is wrong with filename.
Can be also implemented with bash one-liner (assuming ``size%1024==0``)::
dd if=filename bs=1024 count=size/1024 | sha1sum -
Computes a cryptographic hash of the specified file using the given
algorithm. Optionally limits hashing to the first N bytes of the file,
which is useful for verifying partial downloads or large files.
:param filename: Path of the file that will have its hash calculated.
:param algorithm: Method used to calculate the hash (default is md5).
:type filename: str
:param size: If provided, hash only the first size bytes of the file.
:return: Hash of the file, if something goes wrong, return None.
If None or 0, the entire file is hashed. If size exceeds the file
size, the entire file is hashed.
:type size: int or None
:param algorithm: Hash algorithm to use. Supported algorithms include
md5, sha1, sha256, sha512, blake2b, and others available in hashlib.
:type algorithm: str
:return: Hexadecimal digest string of the computed hash. Returns None
if an invalid algorithm is specified.
:rtype: str or None
:raises FileNotFoundError: When the specified file does not exist.
:raises PermissionError: When the file cannot be read due to permissions.
Example::
>>> hash_file('/etc/passwd')
'd41d8cd98f00b204e9800998ecf8427e'
>>> hash_file('/path/to/file', algorithm='sha256')
'e3b0c44298fc1c149afbf4c8996fb924...'
>>> hash_file('/path/to/large_file', size=1024)
'abc123...'
"""
"""Calculate the hash value of a file.
Computes a cryptographic hash of the specified file using the given
algorithm. Optionally limits hashing to the first N bytes of the file,
which is useful for verifying partial downloads or large files.
:param filename: Path of the file that will have its hash calculated.
:type filename: str
:param size: If provided, hash only the first size bytes of the file.
If None or 0, the entire file is hashed. If size exceeds the file
size, the entire file is hashed.
:type size: int or None
:param algorithm: Hash algorithm to use. Supported algorithms include
md5, sha1, sha256, sha512, blake2b, and others available in hashlib.
:type algorithm: str
:return: Hexadecimal digest string of the computed hash. Returns None
if an invalid algorithm is specified.
:rtype: str or None
:raises FileNotFoundError: When the specified file does not exist.
:raises PermissionError: When the file cannot be read due to permissions.
Example::
>>> hash_file('/path/to/file')
'abc123def456...'
>>> hash_file('/path/to/file', algorithm='sha256')
'e3b0c44298fc1c149afbf4c8996fb924...'
>>> hash_file('/path/to/large_file', size=1024)
'abc123...'
"""
🤖 Prompt for AI Agents
In @avocado/utils/crypto.py around lines 26-55, The example in the hash_file
docstring incorrectly shows the MD5 of an empty string for '/etc/passwd'; update
the docstring in the hash_file function to remove the misleading hash by either
(A) using a neutral placeholder path like '/path/to/file' with a non-specific
example digest such as 'abc123...' or (B) replacing the digest with the actual
hash for the given example path if you compute it; ensure the example path and
digest are consistent and not the empty-string MD5.

@codecov
Copy link

codecov bot commented Jan 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.60%. Comparing base (111ca38) to head (47551ca).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6261      +/-   ##
==========================================
+ Coverage   73.58%   73.60%   +0.02%     
==========================================
  Files         206      206              
  Lines       22497    22500       +3     
==========================================
+ Hits        16555    16562       +7     
+ Misses       5942     5938       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@harvey0100 harvey0100 requested a review from clebergnu January 5, 2026 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Review Requested

Development

Successfully merging this pull request may close these issues.

1 participant