Document deterministic behavior and canonicalization controls.
- Sorted file traversal: `codeclone/scanner.py`
- Canonical report construction: `codeclone/report/document/*`
- Deterministic text projection: `codeclone/report/renderers/text.py`
- Baseline hashing: `codeclone/baseline/trust.py`
- Cache signing: `codeclone/cache/integrity.py`
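Sorted file traversal can be illustrated with a minimal sketch. The function name `collect_python_files` is hypothetical and is not taken from `codeclone/scanner.py`; it only shows one way to make discovery order independent of the filesystem.

```python
from pathlib import Path
from typing import List


def collect_python_files(root: Path) -> List[Path]:
    """Return every .py file under root in a stable order (illustrative only)."""
    # rglob yields entries in filesystem order, which can vary by OS and
    # filesystem; sorting on the POSIX-style relative path removes that variance.
    return sorted(
        (p for p in root.rglob("*.py") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
```

Sorting on the relative POSIX path rather than on the `Path` object itself keeps the order the same regardless of where the tree is checked out.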
Deterministic outputs depend on:
- fixed Python tag
- fixed baseline/cache/report schemas
- sorted file traversal
- sorted group keys and item records
- canonical JSON serialization for hashes/signatures
- Canonical JSON report uses deterministic ordering for files, groups, items, and summaries.
- Text/Markdown/SARIF projections are deterministic views over the canonical report.
- Baseline hash is canonical and independent of non-payload metadata fields.
- Cache signature is canonical and independent of JSON whitespace.
Refs:
- `codeclone/report/document/builder.py:build_report_document`
- `codeclone/report/renderers/text.py:render_text_report_document`
- `codeclone/baseline/trust.py:_compute_payload_sha256`
- `codeclone/cache/integrity.py:sign_cache_payload`
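The two independence properties (hash unaffected by non-payload metadata, signature unaffected by whitespace) can be sketched as follows. This is not the body of `_compute_payload_sha256` or `sign_cache_payload`; the helper `payload_sha256` and the field names `clones`, `python_tag`, and `generated_at` are assumptions made for illustration.

```python
import hashlib
import json
from typing import Any, Mapping, Tuple


def canonical_json(payload: Any) -> str:
    """Serialize with sorted keys and compact separators (no whitespace)."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def payload_sha256(document: Mapping[str, Any], payload_keys: Tuple[str, ...]) -> str:
    """Hash only the listed payload fields; other metadata is ignored."""
    # Hypothetical split between payload and metadata fields.
    payload = {key: document[key] for key in payload_keys}
    return hashlib.sha256(canonical_json(payload).encode("utf-8")).hexdigest()


# Same payload, different key order and different metadata.
doc_a = {"clones": ["id1", "id2"], "python_tag": "cp313", "generated_at": "run-1"}
doc_b = {"generated_at": "run-2", "python_tag": "cp313", "clones": ["id1", "id2"]}
same = payload_sha256(doc_a, ("clones", "python_tag")) == payload_sha256(
    doc_b, ("clones", "python_tag")
)
```

Because the digest is computed over a re-serialized form rather than the bytes on disk, reformatting the file or changing metadata such as a timestamp leaves the digest unchanged.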
- `inventory.file_registry.items` is lexicographically sorted.
- finding groups/items and derived hotlists are deterministically ordered.
- baseline clone lists are sorted and unique.
- golden detector fixtures run only on the canonical Python tag from fixture metadata.
Refs:
- `codeclone/report/document/inventory.py:_build_inventory_payload`
- `codeclone/baseline/trust.py:_require_sorted_unique_ids`
- `tests/test_detector_golden.py::test_detector_output_matches_golden_fixture`
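A "sorted and unique" requirement reduces to a single check: each ID must be strictly greater than the one before it. The sketch below is an assumption about how such a validator could look; it is not the implementation of `_require_sorted_unique_ids`, and the function name and error message are hypothetical.

```python
from typing import Sequence


def require_sorted_unique(ids: Sequence[str], field: str) -> None:
    """Raise ValueError unless ids is strictly increasing (illustrative only)."""
    for previous, current in zip(ids, ids[1:]):
        # previous == current means a duplicate; previous > current means unsorted.
        if previous >= current:
            raise ValueError(
                f"{field} must be sorted and unique: {previous!r} precedes {current!r}"
            )
```

Rejecting non-canonical input, instead of silently sorting it, means there is exactly one valid serialization of a given ID set.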
| Condition | Determinism impact |
|---|---|
| Different Python tag | Clone IDs may differ; baseline becomes incompatible |
| Unsorted/non-canonical baseline IDs | Baseline rejected as invalid |
| Cache signature mismatch | Cache ignored and recomputed |
| Different cache provenance state | meta.cache_* differs by design |
Primary canonicalization points:
- canonical JSON with sorted keys and compact separators for baseline/cache hashing
- stable tuple-based sort keys for report arrays and hotlists
Refs:
- `codeclone/baseline/trust.py:_compute_payload_sha256`
- `codeclone/cache/integrity.py:canonical_json`
- `codeclone/report/document/integrity.py:_build_integrity_payload`
- `tests/test_report.py::test_report_json_deterministic_group_order`
- `tests/test_report.py::test_report_json_deterministic_with_shuffled_units`
- `tests/test_report.py::test_text_report_deterministic_group_order`
- `tests/test_baseline.py::test_baseline_hash_canonical_determinism`
- `tests/test_cache.py::test_cache_signature_validation_ignores_json_whitespace`
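
Tuple-based sort keys give a total order that does not depend on input order. The sketch below shows the idea; the record fields (`path`, `start_line`, `end_line`, `qualname`) and the function names are hypothetical and are not taken from the report builder.

```python
from typing import Any, Dict, List, Tuple


def item_sort_key(item: Dict[str, Any]) -> Tuple[str, int, int, str]:
    # Tuples compare element by element, so ties on path fall through
    # to start_line, then end_line, then qualname.
    return (item["path"], item["start_line"], item["end_line"], item["qualname"])


def order_groups(groups: Dict[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Emit groups by sorted key, each with its items in key order."""
    return [
        {"key": key, "items": sorted(groups[key], key=item_sort_key)}
        for key in sorted(groups)
    ]
```

Including enough fields in the key to distinguish any two records matters: if two distinct items compared equal, their relative order would fall back to input order and shuffled inputs could produce different reports.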
- Determinism is not guaranteed across different `python_tag` values.
- Byte-identical reports are not guaranteed across different cache provenance states.