Merged
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.yml
Original file line number Diff line number Diff line change
@@ -62,4 +62,4 @@ body:
     id: notes
     attributes:
       label: Additional context
-      description: CFG structure, HTML screenshots, logs, etc.
+      description: CFG structure, HTML screenshots, logs, etc.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/cfg_semantics.yml
@@ -43,4 +43,4 @@ body:
     attributes:
       label: Desired CFG behavior
     validations:
-      required: true
+      required: true
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/false_positive.yml
@@ -43,4 +43,4 @@ body:
     attributes:
       label: CFG-related?
       options:
-        - label: Control flow structure differs meaningfully
+        - label: Control flow structure differs meaningfully
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/feature_request.yml
@@ -43,4 +43,4 @@ body:
   - type: textarea
     id: alternatives
     attributes:
-      label: Alternatives considered
+      label: Alternatives considered
2 changes: 1 addition & 1 deletion .github/actions/codeclone/README.md
@@ -8,4 +8,4 @@ Runs CodeClone to detect architectural code duplication in Python projects.
 - uses: orenlab/codeclone/.github/actions/codeclone@v1
   with:
     path: .
-    fail-on-new: true
+    fail-on-new: true
4 changes: 3 additions & 1 deletion .gitignore
@@ -32,4 +32,6 @@ htmlcov/
.DS_Store

# Logs
-*.log
+*.log
+
+.claude
38 changes: 31 additions & 7 deletions .pre-commit-config.yaml
@@ -1,31 +1,55 @@
+default_install_hook_types: [ pre-commit, pre-push ]

 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v6.0.0
     hooks:
       - id: check-merge-conflict
       - id: end-of-file-fixer
       - id: trailing-whitespace
       - id: check-added-large-files
       - id: check-toml
       - id: check-yaml

   - repo: local
     hooks:
-      - id: ruff-check
-        name: Ruff (lint)
-        entry: ruff check .
+      - id: ruff-format
+        name: Ruff (format)
+        entry: ruff format .
         language: system
         pass_filenames: false
         types: [ python ]
+        stages: [ pre-commit ]

-      - id: ruff-format
-        name: Ruff (format)
-        entry: ruff format .
+      - id: ruff-check
+        name: Ruff (lint)
+        entry: ruff check .
         language: system
         pass_filenames: false
         types: [ python ]
+        stages: [ pre-commit ]

+      - id: mypy
+        name: Mypy
+        entry: mypy .
+        language: system
+        pass_filenames: false
+        types: [ python ]
+        stages: [ pre-commit ]

       - id: codeclone
         name: CodeClone
         entry: codeclone
         language: system
         pass_filenames: false
         args: [ ".", "--ci" ]
-        types: [ python ]
+        types: [ python ]
+        stages: [ pre-commit ]

+      - id: pytest
+        name: Pytest
+        entry: pytest -q
+        language: system
+        pass_filenames: false
+        types: [ python ]
+        stages: [ pre-push ]
55 changes: 39 additions & 16 deletions CHANGELOG.md
@@ -1,5 +1,28 @@
# Changelog

+## [1.4.4] - 2026-03-14
+
+### Performance
+
+- Optimized HTML snippet rendering hot path:
+  - file snippets now reuse cached full-file lines and slice ranges without
+    repeated full-file scans
+  - Pygments modules are loaded once per importer identity instead of
+    re-importing for each snippet
+- Optimized block explainability range stats:
+  - replaced repeated full `ast.walk()` scans per range with a per-file
+    statement index + `bisect` window lookup
+
+### Tests
+
+- Preserved existing golden/contract behavior for `1.4.x` and kept report output
+  semantics unchanged while improving runtime overhead.
+
+### Contract Notes
+
+- No baseline/cache/report schema changes.
+- No clone detection or fingerprint semantic changes.
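
The statement-index optimization named in the 1.4.4 entry can be sketched roughly as follows. This is an illustration under stated assumptions, not CodeClone's actual code: the names `build_statement_index` and `count_statements_in_range` are hypothetical, and the real index may store more than statement start lines.

```python
import ast
from bisect import bisect_left, bisect_right


def build_statement_index(source: str) -> list[int]:
    """Walk the AST once and return the sorted start lines of every statement.

    Hypothetical helper: the name and return shape are assumptions.
    """
    tree = ast.parse(source)
    return sorted(
        node.lineno for node in ast.walk(tree) if isinstance(node, ast.stmt)
    )


def count_statements_in_range(index: list[int], start: int, end: int) -> int:
    """Count statements starting in [start, end] using two binary searches."""
    return bisect_right(index, end) - bisect_left(index, start)


SOURCE = """\
def f(x):
    if x:
        return 1
    y = x + 1
    return y
"""

# Built once per file, then reused for every queried line range.
index = build_statement_index(SOURCE)
print(count_statements_in_range(index, 2, 4))
```

The design point is that the single `ast.walk()` pass costs O(n) per file, after which each range query is O(log n) instead of another full tree walk.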

## [1.4.3] - 2026-03-03

### Cache Contract
@@ -328,57 +351,57 @@ codeclone . --update-baseline

### Overview

-This release focuses on security hardening, robustness, and long-term maintainability.
+This release focuses on security hardening, robustness, and long-term maintainability.
No breaking API changes were introduced.

The goal of this release is to provide users with a safe, deterministic, and CI-friendly
tool suitable for security-sensitive and large-scale environments.

### Security & Robustness

-- **Path Traversal Protection**
+- **Path Traversal Protection**
Implemented strict path validation to prevent scanning outside the project root or
accessing sensitive system directories, including macOS `/private` paths.

-- **Cache Integrity Protection**
+- **Cache Integrity Protection**
Added HMAC-SHA256 signing for cache files to prevent cache poisoning and detect tampering.

-- **Parser Safety Limits**
+- **Parser Safety Limits**
Introduced AST parsing time limits to mitigate risks from pathological or adversarial inputs.

-- **Resource Exhaustion Protection**
+- **Resource Exhaustion Protection**
Enforced a maximum file size limit (10MB) and a maximum file count per scan to prevent
excessive memory or CPU usage.

-- **Structured Error Handling**
+- **Structured Error Handling**
Introduced a dedicated exception hierarchy (`ParseError`, `CacheError`, etc.) and replaced
broad exception handling with graceful, user-friendly failure reporting.
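
As an illustration of the cache-signing idea in the list above, here is a minimal sketch of HMAC-SHA256 signing and verification using only the standard library. The function names, the JSON serialization, and the key handling are assumptions made for the example; the changelog does not describe CodeClone's actual cache format.

```python
import hashlib
import hmac
import json


def sign_payload(payload: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over a canonical JSON encoding.

    Hypothetical helper: sorting keys makes the byte encoding deterministic.
    """
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_payload(payload: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign_payload(payload, key)
    return hmac.compare_digest(expected, signature)


key = b"example-secret"  # assumption: a real tool would load this securely
cache = {"version": 1, "files": {"a.py": "abc123"}}
signature = sign_payload(cache, key)

print(verify_payload(cache, signature, key))  # untouched payload
cache["files"]["a.py"] = "tampered"
print(verify_payload(cache, signature, key))  # modified payload
```

Using `hmac.compare_digest` rather than `==` avoids leaking timing information during the comparison.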

### Performance Improvements

-- **Optimized AST Normalization**
+- **Optimized AST Normalization**
Replaced expensive `deepcopy` operations with in-place AST normalization, significantly
reducing CPU and memory overhead.

-- **Improved Memory Efficiency**
+- **Improved Memory Efficiency**
Added an LRU cache for file reading and optimized string concatenation during fingerprint
generation.

-- **HTML Report Memory Bounds**
+- **HTML Report Memory Bounds**
HTML reports now read only the required line ranges instead of entire files, reducing peak
memory usage on large codebases.
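
The two memory-related items above (an LRU cache for file reads, and reading only the required line ranges) could look roughly like the sketch below. The function names and the cache size are hypothetical; this shows the general technique, not CodeClone's implementation.

```python
from functools import lru_cache
from itertools import islice
from pathlib import Path


@lru_cache(maxsize=128)  # assumption: the real cache size is not stated
def read_lines(path: str) -> tuple[str, ...]:
    """Read a file once; repeated calls for the same path hit the cache."""
    return tuple(Path(path).read_text(encoding="utf-8").splitlines())


def read_line_range(path: str, start: int, end: int) -> list[str]:
    """Return lines start..end (1-based, inclusive).

    islice stops reading after `end`, so the tail of a large file is
    never loaded into memory.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, start - 1, end)]
```

One caveat with the cached variant: entries go stale if a file changes during the run, so it fits a single scan over an unchanging tree better than a long-lived process.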

### Architecture & Maintainability

-- **Strict Type Safety**
+- **Strict Type Safety**
Migrated all optional typing to Python 3.10+ `| None` syntax and achieved 100% `mypy` strict
compliance.

-- **Modular CFG Design**
+- **Modular CFG Design**
Split CFG data structures and builder logic into separate modules (`cfg_model.py` and
`cfg.py`) for improved clarity and extensibility.

-- **Template Extraction**
+- **Template Extraction**
Extracted HTML templates into a dedicated `templates.py` module.

- Added a `py.typed` marker for downstream type checkers.
@@ -420,13 +443,13 @@ support for Python 3.10–3.14 across the test matrix.

### Fixed

-- **CFG Exception Handling**
+- **CFG Exception Handling**
Fixed incorrect control-flow linking for `try`/`except` blocks.

-- **Pattern Matching Support**
+- **Pattern Matching Support**
Added missing structural handling for `match`/`case` statements in the CFG.

-- **Block Detection Scaling**
+- **Block Detection Scaling**
Made `MIN_LINE_DISTANCE` dynamic based on block size to improve clone detection accuracy
across differently sized functions.

@@ -436,7 +459,7 @@ support for Python 3.10–3.14 across the test matrix.

### BREAKING CHANGES

-- **CLI Arguments**
+- **CLI Arguments**
Renamed output flags for brevity and consistency:
- `--json-out` → `--json`
- `--text-out` → `--text`
2 changes: 1 addition & 1 deletion LICENSE
@@ -18,4 +18,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
+SOFTWARE.
8 changes: 4 additions & 4 deletions README.md
@@ -8,7 +8,7 @@
![Baseline](https://img.shields.io/badge/baseline-versioned-green?style=flat-square)
[![License](https://img.shields.io/pypi/l/codeclone.svg?style=flat-square)](LICENSE)

-**CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
+**CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
It discovers architectural duplication and prevents new copy-paste from entering your codebase via CI.

---
@@ -34,13 +34,13 @@ Unlike token-based tools, CodeClone compares **structure and control flow**, mak

**Three Detection Levels:**

-1. **Function clones (CFG fingerprint)**
+1. **Function clones (CFG fingerprint)**
Strong structural signal for cross-layer duplication

-2. **Block clones (statement windows)**
+2. **Block clones (statement windows)**
Detects repeated local logic patterns

-3. **Segment clones (report-only)**
+3. **Segment clones (report-only)**
Internal function repetition for explainability; not used for baseline gating
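
To make the "statement windows" idea in the list above concrete, here is one way such a detector could be sketched. This is an illustrative assumption, not CodeClone's algorithm: the function name, the window size, and the use of `ast.dump` as a stand-in for normalization are all hypothetical.

```python
import ast
import hashlib
from collections import defaultdict


def window_fingerprints(source: str, size: int = 3) -> dict[str, list[int]]:
    """Group start lines of repeated runs of `size` consecutive statements.

    Hypothetical sketch: ast.dump() omits line numbers by default, so two
    windows with identical structure and names hash to the same digest.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        body = getattr(node, "body", None)
        if not isinstance(body, list):  # skips lambdas and conditional expressions
            continue
        for i in range(len(body) - size + 1):
            window = body[i : i + size]
            text = "\n".join(ast.dump(stmt) for stmt in window)
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            groups[digest].append(window[0].lineno)
    # Keep only fingerprints that occur more than once.
    return {key: lines for key, lines in groups.items() if len(lines) > 1}


SOURCE = """\
def a(x):
    y = x + 1
    z = y * 2
    return z

def b(x):
    y = x + 1
    z = y * 2
    return z
"""

for lines in window_fingerprints(SOURCE).values():
    print("repeated block starting at lines", lines)
```

A real detector would also normalize identifiers and literals before hashing, so that renamed copies still match; this sketch only catches exact structural repeats.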

**CI-Ready Features:**
2 changes: 1 addition & 1 deletion SECURITY.md
@@ -64,7 +64,7 @@ If you believe you have discovered a security vulnerability, **do not open a pub

Please report it privately via email:

-**Email:** `pytelemonbot@mail.ru`
+**Email:** `pytelemonbot@mail.ru`
**Subject:** `Security issue in CodeClone`

When reporting a vulnerability, please include: