1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]

### Added
- **`codegraph index --max-file-size <size>` (also on `init -i` and `sync`) lets CI override the 1 MiB skip threshold (#369).** The previous compile-time `MAX_FILE_SIZE = 1 MiB` constant is now `DEFAULT_MAX_FILE_SIZE` and still applies when no override is given, so existing workflows behave identically. The flag accepts raw byte counts (`1048576`) and human-readable sizes (`500kb`, `2 MB`, `1.5GB`, `700KiB`); SI-style (`kb`/`mb`/`gb`) and IEC (`KiB`/`MiB`/`GiB`) suffixes both resolve to the binary base (×1024), matching what `du`/`ls -lh` report and the 1 MiB default the codebase has always used. An invalid value makes the CLI exit with a clear error (`Invalid --max-file-size value: …`) rather than being silently coerced. Library consumers get the same control via `CodeGraph.indexAll({ maxFileSize })` / `sync({ maxFileSize })`. Closes #369.
- **Enterprise Spring / MyBatis flow now traces end-to-end (#389).** Three gaps that previously forced agents back to grep on large Spring/MyBatis codebases are closed:
- **MyBatis XML mapper indexing + Java↔XML bridge.** `*.xml` files containing `<mapper namespace="...">` are now first-class: each `<select|insert|update|delete id="X">` and `<sql id="X">` becomes a method-shaped node qualified as `<namespace>::<id>`, and a new synthesizer (`mybatis-java-xml`) links the matching Java mapper interface method → its XML statement with a `calls` edge. `<include refid="...">` to a `<sql>` fragment in the same mapper also resolves. Non-mapper XML (`pom.xml`, `web.xml`, `log4j.xml`, etc.) emits just a file node — no symbol noise. Validated on macrozheng/mall-tiny: all 6 custom-SQL Java mapper methods reach their XML counterparts; `trace(UmsRoleController.listResource, UmsResourceMapper::getResourceListByRoleId-xml)` connects in 4 hops across controller → service-iface → impl → mapper-iface → XML.
- **Spring `@Value`/`@ConfigurationProperties` config-key linkage.** `application.{yml,yaml,properties}` (+ profile variants `application-dev.yml`, `bootstrap.yml`, etc.) is parsed during indexing, with one `constant` node per leaf key qualified by its dotted path (`app.cache.name.user-token`). `@Value("${app.cache.name.user-token}")` and `@ConfigurationProperties(prefix = "app.cache")` references in Java/Kotlin emit binding nodes that resolve to the matching key (or, for `@ConfigurationProperties`, a key under the prefix). Spring's **relaxed binding** applies (kebab `cache-list` ↔ camel `cacheList` ↔ snake `cache_list` ↔ `CACHE_LIST`), so a Java `@Value("${app.retryCount}")` finds `app.retry-count` in `application.properties`. `${key:default}` form is supported; the default is stripped before lookup.
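
The relaxed-binding lookup and `${key:default}` handling described in the entry above can be illustrated with a small sketch. It is not CodeGraph's resolver, which this diff does not show: `canonicalKey`, `placeholderKey`, and `resolveKey` are hypothetical names, and the sketch assumes that comparing keys segment by segment, ignoring case and `-`/`_` separators, is a good enough approximation of Spring's rules.

```typescript
// Hypothetical helper: reduce each dotted segment to a canonical form so that
// kebab-case, camelCase, snake_case and UPPER_CASE spellings compare equal.
function canonicalKey(key: string): string {
  return key
    .split('.')
    .map((segment) => segment.replace(/[-_]/g, '').toLowerCase())
    .join('.');
}

// Hypothetical helper: extract the key from a "${key}" or "${key:default}"
// placeholder, dropping the default. Returns null for anything else.
// Nested placeholders inside the default are deliberately not handled.
function placeholderKey(expr: string): string | null {
  const match = /^\$\{([^}:]+)(?::[^}]*)?\}$/.exec(expr.trim());
  const key = match ? match[1] : undefined;
  if (key === undefined) return null;
  return key.trim();
}

// Find the configuration key (as written in the config file) that a
// placeholder refers to, or null when nothing matches.
function resolveKey(expr: string, knownKeys: string[]): string | null {
  const key = placeholderKey(expr);
  if (key === null) return null;
  const wanted = canonicalKey(key);
  for (const candidate of knownKeys) {
    if (canonicalKey(candidate) === wanted) return candidate;
  }
  return null;
}
```

With `knownKeys = ['app.retry-count']`, both `resolveKey('${app.retryCount}', knownKeys)` and `resolveKey('${app.retry_count:3}', knownKeys)` return `'app.retry-count'`, because every spelling collapses to `app.retrycount` before comparison.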
162 changes: 162 additions & 0 deletions __tests__/max-file-size.test.ts
@@ -0,0 +1,162 @@
/**
* Tests for the configurable max-file-size limit (#369).
*
* Three layers under test:
* 1. `parseFileSize` — pure unit conversion of human-readable sizes.
* 2. `CodeGraph.indexAll({ maxFileSize })` — the library plumbing that
* controls which files the orchestrator skips.
* 3. `codegraph index --max-file-size` — the CLI flag that surfaces it,
* driven through the built binary end-to-end so the rejection path
* (invalid suffix → exit 1) and the happy path both stay covered.
*/

import { describe, it, expect, beforeAll, beforeEach, afterEach } from 'vitest';
import { execFileSync } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
import { CodeGraph } from '../src';
import { parseFileSize } from '../src/utils';

const BIN = path.resolve(__dirname, '../dist/bin/codegraph.js');

function createTempDir(): string {
return fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-max-size-'));
}

describe('parseFileSize()', () => {
it('accepts plain byte counts', () => {
expect(parseFileSize('0')).toBe(0);
expect(parseFileSize('1024')).toBe(1024);
expect(parseFileSize('1048576')).toBe(1024 * 1024);
});

it('accepts kb/mb/gb suffixes (case-insensitive, with or without spaces)', () => {
expect(parseFileSize('1kb')).toBe(1024);
expect(parseFileSize('500KB')).toBe(500 * 1024);
expect(parseFileSize('2 mb')).toBe(2 * 1024 * 1024);
expect(parseFileSize('1.5GB')).toBe(Math.floor(1.5 * 1024 * 1024 * 1024));
});

it('accepts IEC binary suffixes (kib/mib/gib)', () => {
expect(parseFileSize('1kib')).toBe(1024);
expect(parseFileSize('1 MiB')).toBe(1024 * 1024);
expect(parseFileSize('2GiB')).toBe(2 * 1024 * 1024 * 1024);
});

it('returns null for malformed inputs', () => {
expect(parseFileSize('')).toBeNull();
expect(parseFileSize(' ')).toBeNull();
expect(parseFileSize('abc')).toBeNull();
expect(parseFileSize('-1mb')).toBeNull();
expect(parseFileSize('1.5xb')).toBeNull(); // unknown unit
expect(parseFileSize('1.2.3mb')).toBeNull();
});
});

describe('CodeGraph.indexAll({ maxFileSize })', () => {
let tempDir: string;

beforeEach(() => { tempDir = createTempDir(); });
afterEach(() => { fs.rmSync(tempDir, { recursive: true, force: true }); });

it('skips files larger than a tightened maxFileSize', async () => {
// A small file that any reasonable limit will accept...
fs.writeFileSync(path.join(tempDir, 'small.ts'), 'export const x = 1;\n');
// ...and a larger file we want the override to exclude. ~5 KiB of source
// is well below the 1 MiB default, so the only thing that can drop it is
// a custom maxFileSize.
const bigContent = `export const items = [\n${' "x",\n'.repeat(700)}];\n`;
fs.writeFileSync(path.join(tempDir, 'big.ts'), bigContent);

const cg = CodeGraph.initSync(tempDir);
const result = await cg.indexAll({ maxFileSize: 1024 }); // 1 KiB

expect(result.success).toBe(true);
const sizeSkipped = result.errors.filter((e) => e.code === 'size_exceeded');
expect(sizeSkipped.map((e) => e.filePath)).toContain('big.ts');
expect(sizeSkipped.map((e) => e.filePath)).not.toContain('small.ts');

cg.close();
});

it('falls back to the 1 MiB default when no override is supplied', async () => {
// ~5 KiB — comfortably under the default. Both files must index.
fs.writeFileSync(path.join(tempDir, 'small.ts'), 'export const x = 1;\n');
const medium = `export const items = [\n${' "x",\n'.repeat(700)}];\n`;
fs.writeFileSync(path.join(tempDir, 'medium.ts'), medium);

const cg = CodeGraph.initSync(tempDir);
const result = await cg.indexAll();

expect(result.errors.filter((e) => e.code === 'size_exceeded')).toEqual([]);
expect(result.filesIndexed).toBeGreaterThanOrEqual(2);

cg.close();
});
});

describe('codegraph index --max-file-size CLI flag', () => {
let tempDir: string;

beforeAll(() => {
if (!fs.existsSync(BIN)) {
throw new Error(`dist/ not built — run \`npm run build\` before this test. Missing: ${BIN}`);
}
});

beforeEach(() => { tempDir = createTempDir(); });
afterEach(() => { fs.rmSync(tempDir, { recursive: true, force: true }); });

it('rejects an invalid size string with a clear error and exit code 1', () => {
// Initialize so we're past the "not initialized" guard and the flag is
// exercised on its own merits.
const cg = CodeGraph.initSync(tempDir);
cg.close();

let stderr = '';
let exitCode = 0;
try {
execFileSync(process.execPath, [BIN, 'index', '--max-file-size', 'banana', '--quiet'], {
cwd: tempDir,
encoding: 'utf-8',
env: { ...process.env, CODEGRAPH_ALLOW_UNSAFE_NODE: '1' },
stdio: ['ignore', 'pipe', 'pipe'],
});
} catch (err: unknown) {
const e = err as { status?: number; stderr?: string };
exitCode = e.status ?? 0;
stderr = e.stderr ?? '';
}
expect(exitCode).toBe(1);
expect(stderr).toMatch(/Invalid --max-file-size value/);
expect(stderr).toContain('banana');
});

it('accepts a human-readable size and applies it', () => {
fs.writeFileSync(path.join(tempDir, 'small.ts'), 'export const x = 1;\n');
// ~5 KiB file, deliberately above our 1 KiB CLI override.
const big = `export const items = [\n${' "x",\n'.repeat(700)}];\n`;
fs.writeFileSync(path.join(tempDir, 'big.ts'), big);

const cg = CodeGraph.initSync(tempDir);
cg.close();

execFileSync(process.execPath, [BIN, 'index', '--max-file-size', '1kb', '--quiet'], {
cwd: tempDir,
encoding: 'utf-8',
env: { ...process.env, CODEGRAPH_ALLOW_UNSAFE_NODE: '1' },
stdio: ['ignore', 'ignore', 'ignore'],
});

// Re-open the now-built index and assert big.ts dropped while small.ts kept.
const reopened = CodeGraph.openSync(tempDir);
try {
const files = reopened.getFiles().map((f) => f.path);
expect(files).toContain('small.ts');
expect(files).not.toContain('big.ts');
} finally {
reopened.close();
}
});
});
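
The parsing rules pinned down by the `parseFileSize()` tests above can be sketched as follows. This is a minimal sketch written from the test expectations alone: the real `parseFileSize` in `src/utils` is not part of this diff, and `parseFileSizeSketch` and `unitMultiplier` are hypothetical names.

```typescript
// Map a unit suffix to its multiplier. SI-style and IEC suffixes both use the
// binary base (x1024), as the changelog entry describes. Unknown units yield null.
function unitMultiplier(unit: string): number | null {
  switch (unit.toLowerCase()) {
    case '':
      return 1; // plain byte count
    case 'kb':
    case 'kib':
      return 1024;
    case 'mb':
    case 'mib':
      return 1024 * 1024;
    case 'gb':
    case 'gib':
      return 1024 * 1024 * 1024;
    default:
      return null;
  }
}

// Parse strings such as "1048576", "500kb", "2 MB" or "1.5GiB" into bytes.
// Returns null for empty, negative, or otherwise malformed input.
function parseFileSizeSketch(input: string): number | null {
  // A non-negative decimal number, optional whitespace, then an optional unit.
  const match = /^(\d+(?:\.\d+)?)\s*([a-z]*)$/i.exec(input.trim());
  if (!match) return null;

  const value = Number(match[1] ?? '');
  const multiplier = unitMultiplier(match[2] ?? '');
  if (multiplier === null) return null;

  // Fractional sizes are floored to a whole number of bytes.
  const bytes = Math.floor(value * multiplier);
  return isFinite(bytes) ? bytes : null;
}
```

Anchoring the pattern at both ends is what rejects `-1mb` and `1.2.3mb`, and returning `null` rather than a default leaves the caller free to decide how loudly to fail.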
41 changes: 36 additions & 5 deletions src/bin/codegraph.ts
@@ -32,6 +32,7 @@ import { getGlyphs } from '../ui/glyphs';

import { buildNode25BlockBanner, buildNodeTooOldBanner, MIN_NODE_MAJOR } from './node-version-check';
import { relaunchWithWasmRuntimeFlagsIfNeeded } from '../extraction/wasm-runtime-flags';
import { parseFileSize } from '../utils';

// Lazy-load heavy modules (CodeGraph, runInstaller) to keep CLI startup fast.
async function loadCodeGraph(): Promise<typeof import('../index')> {
@@ -407,6 +408,25 @@ function writeErrorLog(projectPath: string, errors: Array<{ message: string; fil
fs.writeFileSync(logPath, lines.join('\n') + '\n');
}

/**
* Translate the `--max-file-size <size>` CLI option into a byte count for
* `IndexOptions.maxFileSize`. Empty / undefined returns `undefined` so the
* library default (1 MiB) applies. Invalid sizes exit the process with a
* clear error rather than silently coercing to the default.
*/
function resolveMaxFileSize(input: string | undefined): number | undefined {
if (input === undefined || input === '') return undefined;
const bytes = parseFileSize(input);
if (bytes === null) {
error(
`Invalid --max-file-size value: "${input}". ` +
`Use a non-negative number with an optional unit suffix (e.g. "500kb", "2mb", "1.5gb", "1048576").`,
);
process.exit(1);
}
return bytes;
}

// =============================================================================
// Commands
// =============================================================================
@@ -419,7 +439,8 @@ program
.description('Initialize CodeGraph in a project directory')
.option('-i, --index', 'Run initial indexing after initialization')
.option('-v, --verbose', 'Show detailed worker lifecycle and memory info')
- .action(async (pathArg: string | undefined, options: { index?: boolean; verbose?: boolean }) => {
+ .option('--max-file-size <size>', 'Skip files larger than this (e.g. "500kb", "2mb", "1.5gb", or bytes). Default: 1mb')
+ .action(async (pathArg: string | undefined, options: { index?: boolean; verbose?: boolean; maxFileSize?: string }) => {
const projectPath = path.resolve(pathArg || process.cwd());
const clack = await importESM('@clack/prompts');

@@ -466,17 +487,20 @@ program

if (options.index) {
let result: IndexResult;
const maxFileSize = resolveMaxFileSize(options.maxFileSize);

if (options.verbose) {
result = await cg.indexAll({
onProgress: createVerboseProgress(),
verbose: true,
maxFileSize,
});
} else {
process.stdout.write(`${colors.dim}${getGlyphs().rail}${colors.reset}\n`);
const progress = createShimmerProgress();
result = await cg.indexAll({
onProgress: progress.onProgress,
maxFileSize,
});
await progress.stop();
}
@@ -562,7 +586,8 @@ program
.option('-f, --force', 'Force full re-index even if already indexed')
.option('-q, --quiet', 'Suppress progress output')
.option('-v, --verbose', 'Show detailed worker lifecycle and memory info')
- .action(async (pathArg: string | undefined, options: { force?: boolean; quiet?: boolean; verbose?: boolean }) => {
+ .option('--max-file-size <size>', 'Skip files larger than this (e.g. "500kb", "2mb", "1.5gb", or bytes). Default: 1mb')
+ .action(async (pathArg: string | undefined, options: { force?: boolean; quiet?: boolean; verbose?: boolean; maxFileSize?: string }) => {
const projectPath = resolveProjectPath(pathArg);

try {
@@ -574,11 +599,12 @@ program

const { default: CodeGraph } = await loadCodeGraph();
const cg = await CodeGraph.open(projectPath);
const maxFileSize = resolveMaxFileSize(options.maxFileSize);

if (options.quiet) {
// Quiet mode: no UI, just run
if (options.force) cg.clear();
- const result = await cg.indexAll();
+ const result = await cg.indexAll({ maxFileSize });
if (!result.success) process.exit(1);
cg.destroy();
return;
@@ -598,12 +624,14 @@ program
result = await cg.indexAll({
onProgress: createVerboseProgress(),
verbose: true,
maxFileSize,
});
} else {
process.stdout.write(`${colors.dim}${getGlyphs().rail}${colors.reset}\n`);
const progress = createShimmerProgress();
result = await cg.indexAll({
onProgress: progress.onProgress,
maxFileSize,
});
await progress.stop();
}
@@ -629,7 +657,8 @@ program
.command('sync [path]')
.description('Sync changes since last index')
.option('-q, --quiet', 'Suppress output (for git hooks)')
- .action(async (pathArg: string | undefined, options: { quiet?: boolean }) => {
+ .option('--max-file-size <size>', 'Skip files larger than this (e.g. "500kb", "2mb", "1.5gb", or bytes). Default: 1mb')
+ .action(async (pathArg: string | undefined, options: { quiet?: boolean; maxFileSize?: string }) => {
const projectPath = resolveProjectPath(pathArg);

try {
@@ -642,9 +671,10 @@ program

const { default: CodeGraph } = await loadCodeGraph();
const cg = await CodeGraph.open(projectPath);
const maxFileSize = resolveMaxFileSize(options.maxFileSize);

if (options.quiet) {
- await cg.sync();
+ await cg.sync({ maxFileSize });
cg.destroy();
return;
}
@@ -657,6 +687,7 @@ program

const result = await cg.sync({
onProgress: progress.onProgress,
maxFileSize,
});

await progress.stop();
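
Every command above passes `maxFileSize` through as `number | undefined`, leaving the fallback to the library. How that value might then decide which files are skipped can be sketched as below. The orchestrator code is not shown in this diff, so `FileEntry`, `SizeSkip`, `Partition`, and `partitionBySize` are hypothetical; only `DEFAULT_MAX_FILE_SIZE`, the `size_exceeded` code, and the `filePath` field come from the changelog and tests. The sketch also assumes that a file exactly at the limit is still indexed, following the "larger than" wording of the option help.

```typescript
// 1 MiB, the default the changelog entry names DEFAULT_MAX_FILE_SIZE.
const DEFAULT_MAX_FILE_SIZE = 1024 * 1024;

// Hypothetical shapes for illustration only.
interface FileEntry {
  path: string;
  sizeBytes: number;
}

interface SizeSkip {
  filePath: string;
  code: 'size_exceeded';
}

interface Partition {
  indexed: FileEntry[];
  skipped: SizeSkip[];
}

// Split files into those to index and those skipped for exceeding the limit.
function partitionBySize(files: FileEntry[], maxFileSize?: number): Partition {
  // `??` rather than `||`: an explicit 0 is a valid limit and must not be
  // replaced by the default, whereas undefined means "no override given".
  const limit = maxFileSize ?? DEFAULT_MAX_FILE_SIZE;
  const result: Partition = { indexed: [], skipped: [] };
  for (const file of files) {
    if (file.sizeBytes > limit) {
      result.skipped.push({ filePath: file.path, code: 'size_exceeded' });
    } else {
      result.indexed.push(file);
    }
  }
  return result;
}
```

Returning `undefined` from `resolveMaxFileSize` when the flag is absent, instead of substituting the default in the CLI, keeps the default defined in one place.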