Fix set_batch_keys() return value handling and metadata isolation #38
Addresses critical bugs from the PR #36 code review, where `set_batch_keys()` return values were ignored and metadata dicts were shared between `CodeChunk` instances.

## Core Issues

- **Return value ignored:** `CodeChunk.set_batch_keys()` returns a new immutable instance, but 5 call sites discarded the return value, so batch keys were never set (see the sketch after this list).
- **Type mismatch:** The method expects `BatchKeys(id, idx, sparse)` but was passed a raw `UUID7`.
- **Shared metadata:** The shallow copy caused metadata dict references to be shared between instances, enabling race conditions.
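For context, here is a minimal, self-contained sketch of the first two issues and their fix. The `CodeChunk`, `BatchKeys`, and `set_batch_keys()` definitions below are toy stand-ins with assumed field shapes, not the project's actual code:

```python
# Toy models illustrating the bug: set_batch_keys() on a frozen pydantic
# model returns a NEW instance, so discarding the return value is a no-op.
from uuid import UUID, uuid4

from pydantic import BaseModel


class BatchKeys(BaseModel):
    # Assumed shape per the PR text: BatchKeys(id, idx, sparse).
    id: UUID
    idx: int
    sparse: bool = False


class CodeChunk(BaseModel, frozen=True):
    text: str
    batch_keys: BatchKeys | None = None

    def set_batch_keys(self, keys: BatchKeys) -> "CodeChunk":
        # Immutable update: the caller must keep the returned instance.
        return self.model_copy(update={"batch_keys": keys})


chunks = [CodeChunk(text="a"), CodeChunk(text="b")]
batch_id = uuid4()

# Buggy call-site pattern: return value discarded, keys never set.
for chunk in chunks:
    chunk.set_batch_keys(BatchKeys(id=batch_id, idx=0))
assert all(c.batch_keys is None for c in chunks)

# Fixed pattern: capture the new instances, building one BatchKeys per
# enumerated index instead of passing the raw batch UUID.
chunks = [
    chunk.set_batch_keys(BatchKeys(id=batch_id, idx=idx))
    for idx, chunk in enumerate(chunks)
]
assert all(c.batch_keys is not None for c in chunks)
```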
## Changes

- `semantic.py`: Fixed return value capture at 5 call sites and added `BatchKeys` creation with enumerated indices
- `chunks.py`: Added an explicit metadata dict copy in `set_batch_keys()` to prevent shared references (sketched below)
- `semantic.py`: Removed a redundant iteration in `_finalize_chunks()` (O(2n) → O(n))
- `semantic.py`: Tightened the type annotation `batch_id: Any` → `batch_id: UUID7`
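A sketch of the `chunks.py` side, reusing the toy models above. The field names and exact `update` payload are assumptions; what follows the PR description is the shallow `model_copy()` combined with an explicit copy of the metadata dict:

```python
# Metadata isolation: copy the top-level metadata dict on every update so
# mutations on one chunk can no longer leak into sibling chunks. The copy
# stays shallow, which keeps unpicklable nested values (e.g. ast-grep
# nodes) out of any deep-copy/pickle path, as noted in PR #36.
from typing import Any


class CodeChunkWithMeta(CodeChunk, frozen=True):
    metadata: dict[str, Any] = {}

    def set_batch_keys(self, keys: BatchKeys) -> "CodeChunkWithMeta":
        return self.model_copy(
            update={
                "batch_keys": keys,
                "metadata": dict(self.metadata),  # new dict per instance
            },
            deep=False,
        )


a = CodeChunkWithMeta(text="a", metadata={"lang": "py"})
b = a.set_batch_keys(BatchKeys(id=uuid4(), idx=0))
b.metadata["seen"] = True
assert "seen" not in a.metadata  # isolated: no shared top-level dict
```

Only the top-level mapping is copied; nested values remain shared references, which is what lets the shallow copy avoid the `SgNode` pickling problem described in PR #36.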
## Impact

Batch keys are now actually set on chunks (previously they never were), and metadata mutations are isolated per instance (previously shared).
## Original prompt
This section details the original issue to resolve.
<issue_title>@claude please address the issues you identified in your review and make the suggested changes. Your review:</issue_title>
<issue_description>@claude please address the issues you identified in your review and make the suggested changes. Your review:
Claude Code Report
🚀 System Initialization
Available Tools: 20 tools loaded
I'll review PR #36 for bugs, security issues, and performance problems. Let me first fetch the pull request details.
Token usage: 18450 input, 3 output
🔧 Bash
Parameters:
{ "command": "gh pr view 36 --json title,body,files,commits", "description": "Fetch PR knitli/codeweaver-mcp#36 details" }Result:
{"body":"Tests on `003-our-aim-to` branch were failing due to fixture configuration and model serialization issues. Fixed validation errors and pickle incompatibilities blocking test execution.\n\n## Changes\n\n### Test Fixtures\n- **Removed duplicate `governor` fixture** in `test_delimiter_basic.py` - now uses `chunk_governor` from `conftest.py` with proper `EmbeddingModelCapabilities` \n- **Added `CodeChunk.model_rebuild()`** in `conftest.py` with `EmbeddingBatchInfo` import to resolve forward references\n\n### Method Name Updates\n- **Replaced `set_batch_id()` → `set_batch_keys()`** in `semantic.py` (5 locations) - aligns with current `CodeChunk` API\n\n### Pickle Compatibility\n- **Changed `model_copy(deep=True)` → `model_copy(deep=False)`** in `CodeChunk.set_batch_keys()` - ast-grep `SgNode` objects in metadata cannot be pickled\n\n```python\n# Before: deep copy attempted to pickle unpicklable SgNode\nreturn self.model_copy(update={...}, deep=True)\n\n# After: shallow copy avoids pickling metadata\nreturn self.model_copy(update={...}, deep=False)\n```\n\n## Results\n- **37/43 unit tests passing** (was 0/43 before fixes)\n- All delimiter chunker tests passing (18/18)\n- All semantic error tests passing (11/11)\n- 5 remaining failures are edge cases unrelated to these fixes\n\n## Outstanding Issues\n- Integration tests timeout during collection (heavy imports suspected)\n- 5 edge case test failures need separate investigation\n\n> [!WARNING]\n>\n> <details>\n> <summary>Firewall rules blocked me from connecting to one or more addresses (expand for details)</summary>\n>\n> #### I tried to connect to the following addresses, but was blocked by firewall rules:\n>\n> - `astral.sh`\n> - Triggering command: `curl -LsSf REDACTED` (dns block)\n>\n> If you need me to access, download, or install something from one of these locations, you can either:\n>\n> - Configure [Actions setup steps](https://gh.io/copilot/actions-setup-steps) to set up my environment, which run before the firewall is enabled\n> - Add the appropriate URLs or hosts to the custom allowlist in this repository's [Copilot coding agent settings](https://github.com/knitli/codeweaver-mcp/settings/copilot/coding_agent) (admins only)\n>\n> </details>\n\n<!-- START COPILOT CODING AGENT SUFFIX -->\n\n\n\n<details>\n\n<summary>Original prompt</summary>\n\n> Repo: knitli/codeweaver-mcp\n> Use Branch: 003-our-aim-to\n> \n> We merged in the last of the specs for this branch (specs/003-our-aim-to/spec.md) last night but the tests haven't ran. I want you to run the tests and assess the situation if it seems to much to resolve, then I want you to audit the implementation and provide an assessment of the situation and remaining issues or missing pieces, and of course document your test observations. Otherwise, try to resolve failed tests.\n\n\n</details>\n\n\n\n<!-- START COPILOT CODING AGENT TIPS -->\n---\n\n💡 You can make Copilot smarter by setting up custom instructions, customizing its develo...Token usage: 18450 input, 3 output
Now let me examine the actual code changes in the PR:
Token usage: 20781 input, 1 output
🔧 Bash
Parameters:
{ "command": "gh pr diff 36", "description": "Get diff for PR knitli/codeweaver-mcp#36" }Result: