split-commits: Add skill for non-interactive commit splitting#5
split-commits: Add skill for non-interactive commit splitting#5cgwalters wants to merge 3 commits intobootc-dev:mainfrom
Conversation
Add a simple Justfile with a 'check' target that runs Python syntax validation on all scripts in the repository. Signed-off-by: Colin Walters <walters@verbum.org>
Signed-off-by: Colin Walters <walters@verbum.org>
Add a new skill for splitting mixed working tree changes into clean, logical commits. This is designed for AI agents that don't have TTY access for interactive tools like `git add -i`. The tool uses a session-based workflow: prepare topics, review hunks, assign each hunk to a topic, then commit topics in logical order. It validates staleness via blob hashes and uses `git apply --cached` with 3-way merge fallback. Ported from devaipod contrib/. Assisted-by: OpenCode (Claude claude-opus-4-5@20251101) Signed-off-by: Colin Walters <walters@verbum.org>
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable new skill for non-interactive commit splitting, which is well-documented and thoughtfully designed for AI agents. The core logic is solid, particularly the use of blob hashes for staleness checks. I've identified a few areas for improvement, including a couple of high-severity bugs related to parsing diffs and handling command-line arguments that could lead to incorrect behavior or crashes. I've also noted some opportunities to enhance code style and maintainability. My detailed feedback and suggestions are in the comments below.
| changed_files = git("diff", "--name-only").stdout.strip().split("\n") | ||
| file_blobs = {} | ||
| for filepath in changed_files: | ||
| if filepath: | ||
| file_blobs[filepath] = { | ||
| "index": get_index_blob(filepath), | ||
| "worktree": hash_object(filepath), | ||
| } |
There was a problem hiding this comment.
The script does not correctly handle files that have been deleted from the working tree. git diff --name-only will list deleted files, but the subsequent call to hash_object(filepath) on line 120 will fail because the file no longer exists. This prevents the session from being prepared and makes it impossible to include file deletions in commits using this tool.
You should consider using git diff --name-status to detect file statuses and handle deleted files appropriately (e.g., by skipping the hash_object call for the worktree version of deleted files).
| match = re.match(r"@@ -(\d+)", hunk_header) | ||
| if match: | ||
| old_start = int(match.group(1)) | ||
| match = re.match(r"@@ -\d+(?:,\d+)? \+(\d+)", hunk_header) | ||
| if match: | ||
| new_start = int(match.group(1)) |
There was a problem hiding this comment.
The regular expression logic to parse the new_start line number from the hunk header is incorrect. The re.match on line 183 will always fail because it tries to match from the beginning of the string, while the + part of the hunk header is not at the start. This results in new_start always being its default value of 0, which is a bug that breaks hunk parsing, especially for new files.
You should use a single, more robust regular expression with re.search to capture both old_start and new_start line numbers.
| match = re.match(r"@@ -(\d+)", hunk_header) | |
| if match: | |
| old_start = int(match.group(1)) | |
| match = re.match(r"@@ -\d+(?:,\d+)? \+(\d+)", hunk_header) | |
| if match: | |
| new_start = int(match.group(1)) | |
| match = re.search(r"@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)?", hunk_header) | |
| if match: | |
| old_start = int(match.group(1)) | |
| new_start = int(match.group(2)) |
| elif cmd == "next": | ||
| hunk_id = int(sys.argv[2]) if len(sys.argv) > 2 else None | ||
| session.next(hunk_id) | ||
| elif cmd == "assign": | ||
| if len(sys.argv) < 4: | ||
| die("usage: git-split-commits assign HUNK_ID TOPIC") | ||
| session.assign(int(sys.argv[2]), sys.argv[3]) |
There was a problem hiding this comment.
The script does not handle cases where a non-integer value is provided for HUNK_ID in the next and assign commands. This will cause an unhandled ValueError and crash the script with a traceback.
You should wrap the int() conversions in a try...except ValueError block and call die() with a user-friendly error message to make the command-line interface more robust.
|
|
||
| At the current time, these skills are not intended for use outside | ||
| of the organization. | ||
| This is intentially split into its own git repository to make it |
| """ | ||
|
|
||
| import json | ||
| import os |
| old_start = new_start = 0 | ||
| if hunk_header.startswith("@@"): | ||
| # Format: @@ -old_start,old_count +new_start,new_count @@ | ||
| import re |
| if first_context and first_change: | ||
| break | ||
|
|
||
| hunk_id = len(list(self.hunks_dir.glob("*.patch"))) |
There was a problem hiding this comment.
The method for generating hunk_id by counting files on disk with self.hunks_dir.glob("*.patch") is inefficient and brittle. It performs a filesystem operation for every single hunk, and could be unreliable if the state directory is manipulated externally.
A more robust and efficient approach would be to use a simple integer counter that is initialized in the prepare method and incremented for each hunk.
| if topic in topic_counts: | ||
| topic_counts[topic] += 1 | ||
| else: | ||
| topic_counts[topic] = 1 |
There was a problem hiding this comment.
Add a new skill for splitting mixed working tree changes into clean,
logical commits. This is designed for AI agents that don't have TTY
access for interactive tools like
git add -i.The tool uses a session-based workflow: prepare topics, review hunks,
assign each hunk to a topic, then commit topics in logical order.
It validates staleness via blob hashes and uses
git apply --cachedwith 3-way merge fallback.
Assisted-by: OpenCode (Claude claude-opus-4-5@20251101)
Signed-off-by: Colin Walters walters@verbum.org