22
33## [ 2.0.0b1]
44
5- CodeClone 2.0 is a major upgrade that expands the project from a structural clone detector into a broader
6- **baseline-aware code-health and CI governance tool** for Python.
5+ CodeClone 2.0 is a major upgrade that evolves the project from a structural clone detector into a **baseline-aware**
6+ code-health and CI governance tool for Python.
77
8- This beta introduces:
8+ This beta focuses on the new architecture, expanded code-health analysis, contract stability, and performance validation
9+ ahead of the final ` 2.0.0 ` release.
910
10- - a new stage-based architecture
11- - unified clone + metrics baseline flow
12- - report schema ` 2.1 ` , cache schema ` 2.1 ` , and richer report provenance
13- - expanded code-health analysis (complexity, coupling, cohesion, dependencies, dead code, health)
14- - improved HTML and CLI reporting surfaces
15- - substantial performance work for faster cold and warm runs
16-
17- Compatibility remains a first-class concern in this release:
18-
19- - baseline schema is bumped to ` 2.0 `
20- - ` fingerprint_version ` remains ` 1 `
21- - backward compatibility for legacy clone-only baselines is preserved
22-
23- This is a beta release intended to validate the new architecture, reporting surface, and performance profile before the
24- final ` 2.0.0 ` release.
25-
26- ### Fixes (feat/2.0.0)
11+ ### Overview
2712
28- - Fixed scanner root-exclude short-circuit: only an explicitly excluded root
29- directory is skipped; excluded segments in parent path no longer suppress
30- valid scans (prevents silent zero-file analysis for roots like ` build/project ` ).
31- - Optimized HTML snippet rendering path:
32- - ` _FileCache ` now caches full file lines once per file and serves
33- line-range slices without repeated full-file scans.
34- - Pygments imports are cached per importer identity to avoid repeated
35- dynamic import overhead in hot snippet loops while preserving testability.
36- - Optimized block explainability AST stats:
37- - added per-file statement index and range lookup via ` bisect ` ,
38- replacing repeated full ` ast.walk() ` scans per range.
39- - Added scanner regression coverage for roots under excluded parent directories.
40- - No baseline/cache/report schema contract changes; detector identity semantics
41- and golden compatibility preserved.
13+ - New stage-based pipeline architecture with unified clone + metrics baseline flow.
14+ - Expanded code-health analysis: complexity, coupling, cohesion, dependencies, dead code, and health.
15+ - Improved HTML and CLI reporting surfaces.
16+ - Significant performance work for faster cold and warm runs.
17+ - Baseline schema ` 2.0 ` , report schema ` 2.1 ` , cache schema ` 2.2 ` ; ` fingerprint_version ` remains ` 1 ` and legacy
18+ clone-only baselines stay compatible.
4219
4320### Architecture
4421
45- - Refactored CLI orchestration into a stage-based pipeline (` codeclone/pipeline.py ` ) to isolate discovery, processing,
46- analysis, report writing, and gating.
22+ - Refactored CLI orchestration into a stage-based pipeline (` codeclone/pipeline.py ` ) that isolates discovery,
23+ processing, analysis, report writing, and gating.
4724- Introduced explicit domain layers:
4825 - ` codeclone/models.py ` — typed core models
4926 - ` codeclone/metrics/ ` — complexity, coupling, cohesion, dependencies, dead code, and health
50- - ` codeclone/report/ ` — merge, explain, serialize, and suggestions
27+ - ` codeclone/report/ ` — merge, explain, serialize, suggestions
5128 - ` codeclone/grouping.py ` — clone grouping domain
52- - Removed temporary legacy ` _report_* ` shim modules after migrating runtime and tests to ` codeclone.report.* ` .
29+ - Removed legacy ` _report_* ` shims after migrating runtime and tests to ` codeclone.report.* ` .
5330
5431### Baseline, Cache, and Report Contracts
5532
5633- Bumped baseline schema to ` 2.0 ` (` BASELINE_SCHEMA_VERSION ` ) while preserving compatibility checks for legacy ` 1.0 `
5734 clone-only payloads.
58- - Added unified baseline flow with optional top-level ` metrics ` stored in the same baseline file as clone keys.
35+ - Added a unified baseline flow with optional top-level ` metrics ` stored alongside clone keys in the same baseline file.
5936- Tracked embedded metrics snapshot integrity via ` meta.metrics_payload_sha256 ` .
6037- Preserved embedded metrics payload and hash when updating clone baseline content.
61- - Bumped cache schema to ` 2.1 ` .
62- - Bumped report schema to ` 2.1 ` .
63- - Consolidated report contract around canonical sections:
64- ` meta ` , ` inventory ` , ` findings ` , ` metrics ` , with ` derived ` and ` integrity `
65- as explicit companion layers.
66- - Structural findings now deduplicate repeated occurrences and use explicit
67- ` file_path ` item layout instead of a sentinel ` file_i=-1 ` .
68- - Tightened ` duplicated_branches ` reporting to suppress trivial single-statement
69- branch boilerplate without structural mass.
38+ - Bumped cache schema to ` 2.2 ` and report schema to ` 2.1 ` .
39+ - Extended cache metrics payload with canonical symbol-usage references:
40+ - ` referenced_qualnames ` in runtime entries
41+ - compact wire key ` rq ` in cache payload
42+ - Added additive cache payload key ` sr ` (segment report projection) to reuse merged
43+ segment suppression output on warm runs without cache schema/version bump.
44+ - Consolidated the report contract around canonical sections:
45+ ` meta ` , ` inventory ` , ` findings ` , ` metrics ` , with ` derived ` and ` integrity ` as companion layers.
46+ - Structural findings now deduplicate repeated occurrences and use an explicit ` file_path ` item layout instead of a
47+ sentinel ` file_i = -1 ` .
48+ - Tightened ` duplicated_branches ` reporting to suppress trivial single-statement boilerplate without structural mass.
49+
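The embedded-metrics integrity check described above can be pictured roughly as follows. This is a minimal sketch, not CodeClone's implementation: the helper names (`metrics_payload_sha256` as a function, `attach_metrics`, `verify_metrics`) and the canonical JSON form (sorted keys, compact separators) are assumptions made for illustration. Only the `metrics` section and the `meta.metrics_payload_sha256` field come from the changelog.

```python
import hashlib
import json


def metrics_payload_sha256(metrics):
    """Hash a metrics payload in a canonical JSON form (assumed canonicalization)."""
    canonical = json.dumps(metrics, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def attach_metrics(baseline, metrics):
    """Return a copy of the baseline with metrics and their hash stored alongside clone keys."""
    updated = dict(baseline)
    updated["metrics"] = metrics
    updated["meta"] = {
        **baseline.get("meta", {}),
        "metrics_payload_sha256": metrics_payload_sha256(metrics),
    }
    return updated


def verify_metrics(baseline):
    """Check the embedded metrics snapshot against the recorded hash."""
    metrics = baseline.get("metrics")
    if metrics is None:
        # Clone-only baseline: there is no metrics snapshot to verify.
        return True
    recorded = baseline.get("meta", {}).get("metrics_payload_sha256")
    return recorded == metrics_payload_sha256(metrics)
```

Because the hash covers only the `metrics` section, updating clone keys in the same file leaves the recorded value valid, which matches the "preserved embedded metrics payload and hash" note.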
50+ ### Contract Stabilization Updates
51+
52+ - Added report-only structural finding families for clone cohort analysis:
53+ - ` clone_guard_exit_divergence `
54+ - ` clone_cohort_drift `
55+ - Added deterministic per-function stable structure facts in extraction/cache payloads and reused them for cohort
56+ structural findings without extra scans.
57+ - Extended cache wire ` u ` row with stable structure columns while preserving deterministic decode defaults for legacy
58+ rows.
59+ - Expanded ` tests/fixtures/golden_v2 ` contracts:
60+ - analysis snapshots now lock ` stable_structure ` and ` cohort_structural_findings `
61+ - CLI snapshots now lock structural group id/kind projections
62+ - Strengthened branch/invariant coverage for structural/report layers; coverage gate remains ` >=99% ` .
63+ - Synchronized contract docs with implemented code paths
64+ (` README ` , architecture, cache/report schema appendices, testing book).
7065
7166### Configuration and CLI UX
7267
73- - Added project config loading from ` pyproject.toml ` under ` [tool.codeclone] ` with strict key and type validation.
68+ - Added project configuration loading from ` pyproject.toml ` under ` [tool.codeclone] ` with strict key and type
69+ validation.
7470- Made precedence explicit: ` CLI (explicit flags) > pyproject.toml > parser/runtime defaults ` .
7571- Added a Python 3.10-compatible TOML loading path (` tomli ` fallback when ` tomllib ` is unavailable).
76- - Added optional-value report flags with deterministic defaults when passed without a path:
77- - ` --html ` -> ` .cache/codeclone/report.html `
78- - ` --json ` -> ` .cache/codeclone/report.json `
79- - ` --md ` -> ` .cache/codeclone/report.md `
80- - ` --sarif ` -> ` .cache/codeclone/report.sarif `
81- - ` --text ` -> ` .cache/codeclone/report.txt `
72+ - Added optional-value report flags with deterministic default paths when passed without a value:
73+ - ` --html ` → ` .cache/codeclone/report.html `
74+ - ` --json ` → ` .cache/codeclone/report.json `
75+ - ` --md ` → ` .cache/codeclone/report.md `
76+ - ` --sarif ` → ` .cache/codeclone/report.sarif `
77+ - ` --text ` → ` .cache/codeclone/report.txt `
8278- Added optional-value path flags for default-path intent:
8379 - ` --baseline `
8480 - ` --metrics-baseline `
@@ -87,41 +83,44 @@ final `2.0.0` release.
8783- Replaced confusing argparse-generated double-negation aliases with explicit flag pairs:
8884 - ` --no-progress ` / ` --progress `
8985 - ` --no-color ` / ` --color `
90- - Clarified CLI runtime footer wording: ` Pipeline done in X.XXs ` .
91- Reported time is pipeline time, not full process wall-clock including launcher or interpreter startup.
92- - Refreshed the terminal UI for both normal and ` --ci ` modes:
86+ - Clarified the CLI runtime footer wording: ` Pipeline done in X.XXs ` (pipeline time only, not full process wall-clock).
87+ - Refreshed the terminal UI for normal and ` --ci ` modes:
9388 - clearer run header with scan-root context
9489 - structured analysis summary and quality-metrics panels
9590 - explicit cache, clone, and baseline counters
96- - report path and pipeline-time footer integrated into the summary surface
97- - Fixed ` pyproject.toml ` override handling for ` metrics_baseline ` : a configured non-default metrics baseline path is now
98- respected even when ` --metrics-baseline ` is not passed explicitly.
99-
100- ### Documentation
101-
102- - Updated the root ` README.md ` to reflect CodeClone 2.0 as a structural clone detector, baseline-aware governance tool,
103- and code-health gate.
104- - Added a dedicated ` pyproject.toml ` configuration section (` [tool.codeclone] ` ) to the README.
105- - Documented default-path behavior for bare report flags (` --html ` , ` --json ` , ` --text ` ).
106- - Moved the long JSON report shape example under a collapsible ` <details> ` block for readability.
107- - Added conservative performance guidance in the README with local run numbers and a 100k LOC extrapolation.
108- - Updated contract docs in ` docs/book/* ` to reference ` codeclone/report/* ` directly instead of legacy shim paths.
109- - Documented CLI timing semantics in ` docs/book/09-cli.md ` .
91+ - report path and pipeline-time footer integrated into the summary
92+ - Fixed ` pyproject.toml ` override handling for ` metrics_baseline ` : a configured non-default path is now respected even
93+ when ` --metrics-baseline ` is not passed explicitly.
11094
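As a rough illustration of the optional-value flags and the `CLI > pyproject.toml > defaults` precedence listed above, here is a small `argparse` sketch. It is an assumption about how such behaviour can be built, not the project's actual parser: `build_parser`, `resolve_settings`, and the `DEFAULTS` table are hypothetical names, and the `pyproject_values` argument stands in for an already-parsed `[tool.codeclone]` table.

```python
import argparse

# Flag names and default paths are taken from the changelog; everything else is illustrative.
REPORT_DEFAULT_PATHS = {
    "html": ".cache/codeclone/report.html",
    "json": ".cache/codeclone/report.json",
}
DEFAULTS = {"html": None, "json": None}  # hypothetical parser/runtime defaults


def build_parser():
    parser = argparse.ArgumentParser(prog="codeclone-sketch")
    for name, default_path in REPORT_DEFAULT_PATHS.items():
        parser.add_argument(
            f"--{name}",
            nargs="?",                  # the value is optional
            const=default_path,         # used when the flag is passed bare
            default=argparse.SUPPRESS,  # attribute is absent unless the flag is passed
        )
    return parser


def resolve_settings(argv, pyproject_values):
    """Merge settings so that explicit CLI flags beat pyproject values, which beat defaults."""
    explicit = vars(build_parser().parse_args(argv))
    merged = dict(DEFAULTS)
    merged.update(pyproject_values)
    merged.update(explicit)
    return merged
```

Using `argparse.SUPPRESS` as the default is one way to tell "flag not passed" apart from "flag passed with its default", which is what makes the precedence rule enforceable.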
11195### Report Provenance and UI
11296
113- - Added scan identity fields to report metadata:
97+ - Added scan identity fields to report metadata:
11498 - ` project_name `
11599 - ` scan_root `
116100- Rendered ` Project ` and ` Scan root ` in the HTML provenance panel.
117101- Added ` Project name ` and ` Scan root ` to TXT report metadata.
118102- Propagated the same fields into JSON report ` meta ` via the shared report metadata builder.
119- - Fixed baseline provenance after ` --update-baseline ` : report metadata now reflects the freshly saved clone baseline
120- hash (` baseline_payload_sha256 ` ) and verification state in the same run.
103+ - Fixed baseline provenance after ` --update-baseline ` : report metadata now reflects the freshly saved baseline hash
104+ (` baseline_payload_sha256 ` ) and verification state in the same run.
121105- Simplified dependency SVG rendering internals by removing unreachable guard branches while preserving deterministic
122106 output.
123107- Made suggestions table headers consistently render glossary help badges through a single deterministic template path.
124108
109+ ### Detection Quality
110+
111+ - Made the dead-code detector more conservative for non-actionable runtime patterns:
112+ - skips test paths and test entrypoint names
113+ - skips dunder methods
114+ - skips dynamic visitor methods (` visit_* ` ) and setup/teardown hooks
115+ - skips ` Protocol ` methods and stub-like callables (` @overload ` , ` @abstractmethod ` )
116+ - Reduced false positives without changing clone detection semantics.
117+ - Dead-code liveness now ignores references originating from test files, including cached test-file references, so
118+ production symbols used only in tests are still reported as dead-code candidates.
119+ - Dead-code liveness now uses exact canonical qualname references (including import-alias and module-alias usage)
120+ before fallback local-name checks, reducing false positives on re-export and alias wiring.
121+ - Refactored ` scanner.iter_py_files ` into deterministic helpers without semantic changes, reducing method complexity and
122+ keeping metrics-gate parity with the baseline.
123+
125124### Performance
126125
127126- Added adaptive multiprocessing thresholds so small batches stay sequential instead of paying process-pool overhead.
@@ -136,23 +135,25 @@ final `2.0.0` release.
136135- Improved warm-run responsiveness substantially while preserving deterministic behavior and output contracts.
137136- Deferred HTML renderer import in CLI so non-HTML runs do not pay template/render startup cost.
138137- Disabled transient status spinner contexts when ` --no-progress ` is active to reduce terminal I/O overhead.
139- - Added canonical cache-entry fast-path for already validated runtime entries while preserving fallback validation for
140- raw
141- or externally mutated payloads.
138+ - Added a canonical cache-entry fast path for already validated runtime entries while preserving fallback validation for
139+ raw or externally mutated payloads.
142140- Reused a shared parsed baseline payload when clone and metrics baselines point to the same file to avoid duplicate
143141 JSON reads/parses in one run.
144142
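The adaptive multiprocessing threshold from the first Performance item could look something like the decision function below. This is a sketch under stated assumptions: `plan_workers` and both threshold values are hypothetical and are not taken from CodeClone.

```python
def plan_workers(file_count, cpu_count, min_parallel_files=64, files_per_worker=32):
    """Pick a worker count; 1 means "stay sequential" (thresholds are illustrative)."""
    if file_count < min_parallel_files or cpu_count <= 1:
        # Small batch or single core: process-pool startup would cost more than it saves.
        return 1
    # Scale workers with the batch size, capped at the number of available cores.
    return max(1, min(cpu_count, file_count // files_per_worker))
```

A caller would run the batch inline when the result is 1 and hand it to a process pool otherwise.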
145- ### Detection Quality
143+ ### Fixes
146144
147- - Made the dead-code detector more conservative for non-actionable runtime patterns:
148- - skips test paths and test entrypoint names
149- - skips dunder methods
150- - skips dynamic visitor methods (` visit_* ` ) and setup/teardown hooks
151- - Reduced false positives without changing clone detection semantics.
152- - Dead-code liveness now ignores references originating from test files, including cached test-file references, so
153- production symbols used only in tests are still reported as dead-code candidates.
154- - Refactored ` scanner.iter_py_files ` into deterministic helpers without semantic changes, reducing method complexity to
155- keep metrics-gate parity with baseline.
145+ - Fixed scanner root-exclude short-circuit: only an explicitly excluded root directory is skipped; excluded segments in
146+ a parent path no longer suppress valid scans, preventing silent zero-file analysis for roots like ` build/project ` .
147+ - Optimized HTML snippet rendering path:
148+ - ` _FileCache ` now caches full file lines once per file and serves line-range slices without repeated full-file
149+ scans
150+ - Pygments imports are cached per importer identity to avoid repeated dynamic import overhead while preserving
151+ testability
152+ - Optimized block explainability AST stats:
153+ - added per-file statement index and range lookup via ` bisect ` , replacing repeated full ` ast.walk() ` scans per range
154+ - Added scanner regression coverage for roots under excluded parent directories.
155+ - No baseline/cache/report schema contract changes in this branch; detector identity semantics and golden compatibility
156+ are preserved.
156157
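The `bisect`-based range lookup mentioned in the Fixes list can be sketched as below. The idea is to walk the AST once per file, keep a sorted list of statement start lines, and answer each range query with two binary searches. The function names are hypothetical, and the real index may well store more than start lines.

```python
import ast
from bisect import bisect_left, bisect_right


def build_statement_index(source):
    """Walk the AST once and return the sorted start lines of all statement nodes."""
    tree = ast.parse(source)
    return sorted(
        node.lineno for node in ast.walk(tree) if isinstance(node, ast.stmt)
    )


def count_statements(index, start_line, end_line):
    """Count statements that start within [start_line, end_line], inclusive."""
    return bisect_right(index, end_line) - bisect_left(index, start_line)
```

Each query then costs two binary searches over the index instead of a fresh `ast.walk()` over the whole tree.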
157158### Tests and Tooling
158159