Rewrite CLAUDE.md and add Claude Code rules by tlee732 · Pull Request #154 · indextables/tantivy4java

tlee732 · 2026-04-04T22:30:35Z

Summary

Rewrote CLAUDE.md from 1,690 lines to 75 lines — replaced cumulative changelog/feature diary with a concise project guide
Added 5 Claude Code rules in .claude/rules/ for consistent contributor behavior

Methodology

We ran the /claude-md-management:claude-md-improver audit skill, which:

Discovered all CLAUDE.md files in the repo
Evaluated each against a quality rubric (commands, architecture clarity, non-obvious patterns, conciseness, currency, actionability)
Produced a scored quality report (see table below)
Recommended targeted updates based on gaps found

The before/after scores below come directly from that audit. The rewrite was then reviewed interactively — we discussed what to keep from the old version (known limitations, debug env var, Python parity context), what to extract into rules (memory constants, JNI boundary), and what to drop entirely (changelog entries, stale benchmarks, local machine paths).

CLAUDE.md Quality Assessment

Criterion	Before (F: 25/100)	After (B+: 85/100)
Commands/workflows	5/20 — no build/test commands	18/20 — all make targets documented
Architecture clarity	8/20 — buried under "BREAKTHROUGH" headers	17/20 — directory map, API package table
Non-obvious patterns	2/15 — hard to find in 1,690 lines	13/15 — known limitations, debug env var
Conciseness	0/15 — 1,690 lines of feature announcements	14/15 — 75 lines, information-dense
Currency	5/15 — dates spanning Aug 2025 - Jan 2026, never pruned	13/15 — no stale status tracking
Actionability	5/15 — no "how to build/test/contribute"	10/15 — copy-paste ready commands

What was removed

Cumulative feature announcements with emoji headers ("REVOLUTIONARY BREAKTHROUGH", "MISSION ACCOMPLISHED")
Hardcoded test counts and coverage percentages that go stale
Azure OAuth setup guide (belongs in docs/)
Performance benchmark numbers (go stale)
References to local machine paths (file:/Users/schenksj/...)

What was added

Build & test commands (from Makefile)
Project structure directory map
Java API package reference table
Known limitations section (RangeAggregation not implemented, MillionRecordBulkRetrievalTest issue)
5 rules in .claude/rules/:
- memory-constants.md — always use Index.Memory constants, never hardcode heap sizes
- jni-boundary.md — Java handles I/O, Rust handles search/indexing
- auto-review.md — automatic code review + test coverage after every coding task
- research-before-code.md — research and propose options before writing code
- test-coverage-workflow.md — structured test planning with severity-prioritized table

Rationale

CLAUDE.md is re-read on every Claude Code turn. A 1,690-line changelog wastes context window on every message and makes it harder for Claude to find actionable information. The rewrite optimizes for "what do I need to know to work in this repo" rather than "what features have been shipped."

🤖 Generated with Claude Code

Replace 1,690-line cumulative changelog with 75-line project guide covering build/test commands, directory structure, API packages, known limitations, and architecture notes. Add 5 Claude Code rules: memory constants, JNI boundary, auto-review, research-before-code, and test coverage workflow. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

schenksj

Looks great! Please address the conflicts and we can merge!

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

schenksj

Overall

Big improvement in signal-to-noise. The old CLAUDE.md is ~1,690 lines of cumulative "REVOLUTIONARY BREAKTHROUGH" changelog entries dated across Aug 2025 – Jan 2026 — dense with things that are either stale, duplicated in docs/, or feature announcements rather than actionable guidance. Trimming to ~75 lines is the right call, and since this file is re-read every turn the context savings compound.

The new structure (Build & Test → Project Structure → Java API Packages → Key Concepts → Known Limitations → Architecture Notes) is the right skeleton, and the rule files are a good extraction point for reusable guidance.

Verified accurate

Makefile targets — compile, test, test-cloud, test-all, package, clean, setup all exist.
Rust directory map — matches reality.
Most Java API packages — core, split, query, aggregation, batch, iceberg, delta, parquet, memory, config all exist.
Memory constants (MIN=15MB, DEFAULT=50MB, LARGE=128MB, XL=256MB) — match the codebase.
Debug env var TANTIVY4JAVA_DEBUG=1 — correct.
Package name io.indextables.tantivy4java — correct.

Issues to fix before merge

1. (Blocking) The RangeAggregation "known limitation" is factually wrong

The new CLAUDE.md says:

RangeAggregation — not implemented in the native Rust layer. Returns empty results.

This is incorrect. RangeAggregation is fully implemented:

native/src/split_searcher/aggregation_arrow_ffi.rs:412 — range_to_record_batch produces a real Arrow RecordBatch with key/doc_count/from/to columns plus sub-agg metrics, driven by BucketResult::Range { buckets } from Tantivy.
src/test/java/io/indextables/tantivy4java/RangeAggregationTest.java has hard assertions that would fail if results were empty — e.g. assertEquals(4L, rangeCounts.get("cheap")), assertEquals(3L, rangeCounts.get("budget")), total-doc-count checks, etc.

There is a real nearby limitation: extract_inner_buckets (aggregation_arrow_ffi.rs:603) returns Vec::new() for BucketResult::Range { .. }, meaning Range cannot be used as a nested inner bucket under another bucket aggregation for flattening purposes (the comment says "Terms only for now"). If this is what was meant, the correct phrasing is something like:

RangeAggregation as a nested sub-bucket — flattening Range buckets inside a parent bucket aggregation is not supported (see extract_inner_buckets in aggregation_arrow_ffi.rs). Top-level RangeAggregation works correctly.

Publishing "Returns empty results" is worse than not listing it at all — it will mislead contributors into avoiding a working path, and it directly contradicts a passing test suite.

2. (Minor) Missing packages in the Java API table

The table lists 10 packages; the actual layout also has:

result — SearchResult, Hit, aggregation results (user-facing, should be listed)
util — utilities
filter — filter APIs
split/merge — merge subpackage

examples is fine to omit. I'd add result at minimum since it's part of the user-facing API surface, and filter if it's non-trivial.

On the rules files

All five read well and are the right size (each <30 lines).

memory-constants.md — exact, grounded in a real historical incident. Good.
jni-boundary.md — captures a non-obvious decision rule. Good.
auto-review.md — This is an opt-in behavior change: every coding task will now auto-spawn a pr-review-toolkit:code-reviewer agent and run the test-coverage workflow. That's a meaningful tool/context cost on every task, not a free convention. Worth confirming the repo owner actually wants that default. The "When NOT to trigger" list is reasonable but doesn't include "small single-line fixes," which may be the highest-frequency case.
research-before-code.md — Strong rule. "NEVER start coding immediately" + "Propose 2-3 approaches with pros/cons" means a one-line helper fix turns into a research memo. Consider softening to "For non-trivial changes…" and leaning on the existing trivial-fix exception at the bottom.
test-coverage-workflow.md — The severity-prioritized table format is genuinely useful. Minor: "Review coverage once — don't write tests piecemeal" will bite in interactive debugging loops where the test set naturally evolves.

schenksj

please see comments in the conversation

schenksj requested changes Apr 5, 2026

View reviewed changes

Merge main and resolve CLAUDE.md conflict (keep rewrite)

53f80f1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

tlee732 had a problem deploying to test-runners April 5, 2026 13:17 — with GitHub Actions Failure

schenksj requested changes Apr 11, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite CLAUDE.md and add Claude Code rules#154

Rewrite CLAUDE.md and add Claude Code rules#154
tlee732 wants to merge 2 commits into
indextables:mainfrom
tlee732:cleanup/claude-md

tlee732 commented Apr 4, 2026 •

edited

Loading

Uh oh!

schenksj left a comment

Uh oh!

schenksj left a comment •

edited

Loading

Uh oh!

schenksj left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tlee732 commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Methodology

CLAUDE.md Quality Assessment

What was removed

What was added

Rationale

Uh oh!

schenksj left a comment

Choose a reason for hiding this comment

Uh oh!

schenksj left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Overall

Verified accurate

Issues to fix before merge

1. (Blocking) The RangeAggregation "known limitation" is factually wrong

2. (Minor) Missing packages in the Java API table

On the rules files

Uh oh!

schenksj left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tlee732 commented Apr 4, 2026 •

edited

Loading

schenksj left a comment •

edited

Loading