feat(cli): add --max-file-size flag to index/sync/init (#369) #481
Open
eddieran wants to merge 1 commit into
CI pipelines indexing only source code can now override the compile-time 1 MiB ceiling without patching the codebase. The previous `MAX_FILE_SIZE` constant is renamed to `DEFAULT_MAX_FILE_SIZE` and exported; both `ExtractionOrchestrator` and the public `CodeGraph.indexAll({ maxFileSize })` / `sync({ maxFileSize })` surfaces now accept an override that falls through to the default when omitted, so behaviour is unchanged for callers that don't set the flag. The CLI accepts raw byte counts (`1048576`) and human-readable sizes with binary multipliers (`500kb`, `2 MB`, `1.5GB`, `700KiB`). Both decimal and IEC suffixes resolve to the binary base (×1024), which matches `du`/`ls -lh` and the 1 MiB default the codebase has always used. Invalid values exit with a clear error rather than silently coercing. Tests cover the parser in isolation, the library plumbing (smaller limit → skip, default → keep), and the CLI flag end-to-end (invalid → exit 1, valid → file dropped from index). Closes colbymchenry#369.
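The parsing rules above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's `parseFileSize()`: the name `parseFileSizeSketch`, the regular expression, the rounding of fractional results, and the rejection of zero are all assumptions. Only the accepted forms, the ×1024 base, and the malformed cases named in the test plan come from the description.

```typescript
// Hypothetical sketch; the real parser in src/utils.ts may differ.
// Decimal-looking (kb/mb/gb) and IEC (kib/mib/gib) suffixes share the binary base.
const UNIT_MULTIPLIERS = new Map<string, number>([
  ["", 1], // plain byte count
  ["kb", 1024],
  ["kib", 1024],
  ["mb", 1024 ** 2],
  ["mib", 1024 ** 2],
  ["gb", 1024 ** 3],
  ["gib", 1024 ** 3],
]);

function parseFileSizeSketch(input: string): number | null {
  // One unsigned number (at most one decimal point), optional whitespace,
  // then an optional alphabetic suffix, matched case-insensitively.
  const match = /^(\d+(?:\.\d+)?)\s*([a-z]*)$/i.exec(input.trim());
  if (match === null) return null; // empty, negative, repeated dots, ...

  const multiplier = UNIT_MULTIPLIERS.get(match[2].toLowerCase());
  if (multiplier === undefined) return null; // unknown suffix

  // Assumption: fractional byte counts round down and zero is rejected.
  const bytes = Math.floor(Number(match[1]) * multiplier);
  return bytes > 0 ? bytes : null;
}
```

A caller such as the CLI would treat a `null` result as a usage error and exit non-zero instead of falling back to the default.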
Summary
Closes #369.
CI pipelines that only want source-code symbols (and would rather skip multi-MB media/vendor blobs) currently have to patch the codebase: the 1 MiB threshold lives as a compile-time constant in `src/extraction/index.ts`. This PR makes that threshold configurable per invocation, with the same default so existing workflows are unaffected.

New surface
CLI: three commands (`index`, `sync`, `init`) grow an identical `--max-file-size <size>` flag. `<size>` accepts:
- `1048576`
- `500kb` / `500 KB`
- `2mb` / `2 MB`
- `1.5gb`
- `700KiB` / `1MiB`

Both decimal and IEC suffixes resolve to the binary base (×1024). That matches what `du -h` / `ls -lh` report and keeps the 1 MiB default the codebase has always used. Invalid values fail fast.

Library: `IndexOptions.maxFileSize?: number` is added to both `CodeGraph.indexAll(opts)` and `CodeGraph.sync(opts)`. Omitting it keeps the existing 1 MiB default. `DEFAULT_MAX_FILE_SIZE` (the renamed `MAX_FILE_SIZE`) is also exported from `src/extraction/index.ts` for consumers that want to clamp or report against it.

Implementation notes
- The `indexAll`, `sync`, `indexFile`, and `indexFileWithContent` methods now take an optional `maxFileSize` argument with the default constant as fallback. The change is additive and non-breaking on the existing positional-arg shape.
- `parseFileSize()` lives in `src/utils.ts` (it's small and pure, and may be useful elsewhere in the future). It returns `null` on bad input so the CLI can format a clean error and exit 1; silently coercing to the default would have been worse than failing fast for a CI use case.

Test plan
New `__tests__/max-file-size.test.ts` (8 tests) covers:
- `parseFileSize`: plain bytes, kb/mb/gb (with and without spaces, case-insensitive), IEC kib/mib/gib, and explicit `null` for malformed inputs (empty, negative, unknown suffix, repeated dots).
- `CodeGraph.indexAll({ maxFileSize })`: a tightened limit drops a ~5 KiB file with `code: 'size_exceeded'` while keeping the small one; the default keeps both.
- `--max-file-size 1kb --quiet` against the built binary drops the larger file from the index while keeping the smaller one.

Full suite green (+8 new tests over `main`; no regressions).
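For reviewers, the library behaviour the second test case exercises can be sketched in isolation. This is a minimal illustration under stated assumptions, not the orchestrator's code: `partitionBySize`, `IndexOptionsSketch`, `FileEntry`, and `SkippedFile` are hypothetical names, and whether a file exactly at the limit is kept or skipped is a guess. Only the `maxFileSize` option, the 1 MiB default, and the `size_exceeded` code come from this PR.

```typescript
// Mirrors the 1 MiB default named in this PR; the real constant is
// exported from src/extraction/index.ts.
const DEFAULT_MAX_FILE_SIZE = 1024 * 1024;

// Hypothetical shapes for illustration only.
interface IndexOptionsSketch {
  maxFileSize?: number;
}
interface FileEntry {
  path: string;
  sizeBytes: number;
}
interface SkippedFile {
  path: string;
  code: "size_exceeded";
}

function partitionBySize(
  files: FileEntry[],
  opts: IndexOptionsSketch = {},
): { kept: string[]; skipped: SkippedFile[] } {
  // An omitted option falls through to the default, so existing callers
  // see no change in behaviour.
  const limit = opts.maxFileSize ?? DEFAULT_MAX_FILE_SIZE;
  const kept: string[] = [];
  const skipped: SkippedFile[] = [];
  for (const file of files) {
    // Assumption: a file exactly at the limit is still indexed.
    if (file.sizeBytes > limit) {
      skipped.push({ path: file.path, code: "size_exceeded" });
    } else {
      kept.push(file.path);
    }
  }
  return { kept, skipped };
}

// A small file and a ~5 KiB file, as in the test plan.
const sampleFiles: FileEntry[] = [
  { path: "small.ts", sizeBytes: 200 },
  { path: "large.ts", sizeBytes: 5 * 1024 },
];
const withDefault = partitionBySize(sampleFiles);
const tightened = partitionBySize(sampleFiles, { maxFileSize: 1024 });
```

With the default limit both sample files are kept; with the 1 KiB limit only `small.ts` survives and `large.ts` is reported as skipped.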