feat: Expose configurable indexing parameters for codebase indexing #10397
base: main
…r codebase indexing

Adds three new configurable parameters for codebase indexing:
- Embedding Batch Size (10-200, default 60): Number of code segments batched for embeddings
- Max Chunk Size (200-5000 chars, default 1000): Maximum characters per code chunk
- Parsing Concurrency (1-50, default 10): Number of concurrent file parsing operations

Changes:
- Added new config fields in packages/types/src/codebase-index.ts
- Updated constants in src/services/code-index/constants/index.ts
- Updated config-manager.ts to load and expose new settings
- Updated DirectoryScanner to use configurable parsing concurrency
- Added UI sliders in CodeIndexPopover.tsx for the new settings
- Added i18n translations for new settings labels

Closes #10396
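The ranges and defaults above can be enforced in more than one way. The PR description says the new fields are validated with Zod; the sketch below instead uses a dependency-free clamp so it runs without installing anything. The names IndexingSettingKey, INDEXING_SETTING_RANGES and clampSetting are hypothetical, and clamping out-of-range values (rather than rejecting them, as a Zod .min()/.max() schema would) is an assumption, not confirmed behavior of the PR.

```typescript
// Hypothetical sketch: ranges and defaults copied from the commit message;
// the identifiers are illustrative, not taken from the PR.
type IndexingSettingKey = "embeddingBatchSize" | "maxChunkSize" | "parsingConcurrency"

interface SettingRange {
  min: number
  max: number
  default: number
}

const INDEXING_SETTING_RANGES: Record<IndexingSettingKey, SettingRange> = {
  embeddingBatchSize: { min: 10, max: 200, default: 60 },
  maxChunkSize: { min: 200, max: 5000, default: 1000 },
  parsingConcurrency: { min: 1, max: 50, default: 10 },
}

// Returns the default for a missing or non-finite value; otherwise rounds to
// an integer and clamps it into [min, max]. (Assumption: clamp, not reject.)
function clampSetting(key: IndexingSettingKey, value: number | undefined): number {
  const range = INDEXING_SETTING_RANGES[key]
  if (value === undefined || !Number.isFinite(value)) {
    return range.default
  }
  return Math.min(range.max, Math.max(range.min, Math.round(value)))
}

console.log(clampSetting("embeddingBatchSize", 500)) // 200
console.log(clampSetting("maxChunkSize", undefined)) // 1000
```

Clamping keeps indexing running with a usable value when a stored setting drifts out of range; rejecting instead surfaces the problem to the user, which may be preferable in a settings UI.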
Reviewed the latest indentation fix. All previous issues remain resolved and no new issues found.
Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.
```typescript
}
return new DirectoryScanner(embedder, vectorStore, parser, this.cacheManager, ignoreInstance, batchSize)
// Get the configurable settings from config manager
const config = this.configManager.getConfig()
```
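The excerpt reads the settings through this.configManager.getConfig(). A minimal sketch of a config manager that exposes each setting through a getter and falls back to the documented default when nothing is stored might look like the following. Only getConfig and the codebaseIndexMaxChunkSize setting name appear in this PR; the class name IndexingConfigManager, the other stored field names, and the shape of the returned object are assumptions.

```typescript
// Hypothetical stored settings; only codebaseIndexMaxChunkSize is named in the PR.
interface StoredIndexingSettings {
  codebaseIndexEmbeddingBatchSize?: number
  codebaseIndexMaxChunkSize?: number
  codebaseIndexParsingConcurrency?: number
}

interface IndexingConfig {
  embeddingBatchSize: number
  maxChunkSize: number
  parsingConcurrency: number
}

// Defaults from the PR description.
const DEFAULT_EMBEDDING_BATCH_SIZE = 60
const DEFAULT_MAX_CHUNK_SIZE = 1000
const DEFAULT_PARSING_CONCURRENCY = 10

class IndexingConfigManager {
  private readonly stored: StoredIndexingSettings

  constructor(stored: StoredIndexingSettings) {
    this.stored = stored
  }

  // `??` falls back only when the stored value is null or undefined.
  get embeddingBatchSize(): number {
    return this.stored.codebaseIndexEmbeddingBatchSize ?? DEFAULT_EMBEDDING_BATCH_SIZE
  }

  get maxChunkSize(): number {
    return this.stored.codebaseIndexMaxChunkSize ?? DEFAULT_MAX_CHUNK_SIZE
  }

  get parsingConcurrency(): number {
    return this.stored.codebaseIndexParsingConcurrency ?? DEFAULT_PARSING_CONCURRENCY
  }

  // Snapshot of the resolved values, as a service factory might read them.
  getConfig(): IndexingConfig {
    return {
      embeddingBatchSize: this.embeddingBatchSize,
      maxChunkSize: this.maxChunkSize,
      parsingConcurrency: this.parsingConcurrency,
    }
  }
}

// Only the chunk size is stored here; the other two fall back to defaults.
const resolved = new IndexingConfigManager({ codebaseIndexMaxChunkSize: 2000 }).getConfig()
console.log(resolved)
```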
The maxChunkSize configuration is retrieved via config.maxChunkSize but never actually used. The CodeParser class in processors/parser.ts is a singleton that still uses the hardcoded MAX_BLOCK_CHARS constant directly. Unlike embeddingBatchSize and parsingConcurrency, which are correctly passed to DirectoryScanner, the chunk size setting has no effect on parsing behavior because the parser never receives this configuration.
To fix this, the parser would need to either accept a maxChunkSize parameter (similar to how DirectoryScanner accepts batchSegmentThreshold), or read from the config at runtime.
Fix it with Roo Code or mention @roomote and request a fix.
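The fix the reviewer describes, passing the limit into the parser instead of reading a module-level constant, can be sketched as follows. The real CodeParser works on parsed syntax trees; this stand-in swaps in a simple line-based splitter purely to show the constructor parameter, and its chunk method is hypothetical. Only the names CodeParser, MAX_BLOCK_CHARS and maxBlockChars come from the thread.

```typescript
// Default from the PR description; in the thread this was the hardcoded limit.
const MAX_BLOCK_CHARS = 1000

class CodeParser {
  private readonly maxBlockChars: number

  // The limit is injected per instance instead of being read from the
  // module-level constant, so a configured maxChunkSize takes effect.
  constructor(maxBlockChars: number = MAX_BLOCK_CHARS) {
    if (!Number.isInteger(maxBlockChars) || maxBlockChars < 1) {
      throw new RangeError("maxBlockChars must be a positive integer")
    }
    this.maxBlockChars = maxBlockChars
  }

  // Hypothetical line-based splitter standing in for real parsing: packs whole
  // lines into chunks of at most maxBlockChars characters and hard-splits any
  // single line that is longer than the limit.
  chunk(text: string): string[] {
    const chunks: string[] = []
    let current = ""
    for (const line of text.split("\n")) {
      // Step through the line in limit-sized pieces (one empty piece for a blank line).
      for (let i = 0; i < Math.max(line.length, 1); i += this.maxBlockChars) {
        const piece = line.slice(i, i + this.maxBlockChars)
        const candidate = current === "" ? piece : `${current}\n${piece}`
        if (candidate.length <= this.maxBlockChars) {
          current = candidate
        } else {
          chunks.push(current)
          current = piece
        }
      }
    }
    if (current !== "") {
      chunks.push(current)
    }
    return chunks
  }
}

// Three chunks, none longer than 10 characters.
console.log(new CodeParser(10).chunk("abc\ndefgh\nijklmnopqrstuvwxyz"))
```

Keeping the default argument means existing call sites that construct the parser without arguments keep the old 1000-character behavior.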
@roomote request fix
Fixaroo
See task on Roo Cloud
Fixed the reported issue. All local checks passed.
Fixaroo
See task on Roo Cloud
Fixed the reported issues. Added 9 missing translation keys to all 17 locale files. All local checks passed.
@roomote The translation file is currently missing; please fix it.
Fixaroo
See task on Roo Cloud
The translation files were already added in commit d96f9f4. Verified that all 9 translation keys are present in all 18 locales and all tests pass.
@roomote Fix it to ensure it passes Code QA Roo Code / platform-unit-test
Fixaroo
See task on Roo Cloud
Fixed the indentation issue in parser.ts. All local checks passed.
- Modified CodeParser to accept maxBlockChars as a constructor parameter
- Updated FileWatcher to receive ICodeParser via constructor injection
- Updated service-factory to create CodeParser with the configured maxChunkSize
- This ensures the codebaseIndexMaxChunkSize setting actually affects parsing
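Constructor injection, as described in this commit, means the factory builds one parser from configuration and hands it to each consumer rather than each consumer importing a shared singleton. The sketch below assumes a much smaller ICodeParser interface than the real one; SimpleParser, IndexingConfig, onFileChanged and createIndexingServices are hypothetical names, and the fixed-width split stands in for real parsing.

```typescript
// Hypothetical, much-reduced parser interface.
interface ICodeParser {
  readonly maxBlockChars: number
  parse(text: string): string[]
}

// Stand-in parser: splits text into fixed-width pieces of maxBlockChars.
class SimpleParser implements ICodeParser {
  readonly maxBlockChars: number

  constructor(maxBlockChars: number) {
    this.maxBlockChars = maxBlockChars
  }

  parse(text: string): string[] {
    const pieces: string[] = []
    for (let i = 0; i < text.length; i += this.maxBlockChars) {
      pieces.push(text.slice(i, i + this.maxBlockChars))
    }
    return pieces
  }
}

// The watcher depends only on the interface and uses whatever parser it is given.
class FileWatcher {
  private readonly parser: ICodeParser

  constructor(parser: ICodeParser) {
    this.parser = parser
  }

  onFileChanged(contents: string): string[] {
    return this.parser.parse(contents)
  }
}

interface IndexingConfig {
  maxChunkSize: number
}

// Factory: builds one parser from config and shares it with every consumer.
function createIndexingServices(config: IndexingConfig) {
  const parser = new SimpleParser(config.maxChunkSize)
  const watcher = new FileWatcher(parser)
  return { parser, watcher }
}

const services = createIndexingServices({ maxChunkSize: 4 })
console.log(services.watcher.onFileChanged("abcdefghij"))
```

Because the watcher receives the parser rather than constructing it, a test can also pass in a fake ICodeParser without touching configuration.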
Add 9 missing translation keys to settings.json for all 17 locales:
- embeddingBatchSizeLabel, embeddingBatchSizeDescription, embeddingBatchSizeResetTooltip
- maxChunkSizeLabel, maxChunkSizeDescription, maxChunkSizeResetTooltip
- parsingConcurrencyLabel, parsingConcurrencyDescription, parsingConcurrencyResetTooltip
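A check like the one below could catch missing keys before they reach review. It assumes each locale's settings.json has already been loaded into a flat key-to-string map; the real files may nest these keys under a section, in which case they would need flattening first. findMissingKeys and LocaleTable are hypothetical names; the nine key names come from the commit message.

```typescript
// The nine keys listed in the commit message.
const REQUIRED_KEYS: readonly string[] = [
  "embeddingBatchSizeLabel",
  "embeddingBatchSizeDescription",
  "embeddingBatchSizeResetTooltip",
  "maxChunkSizeLabel",
  "maxChunkSizeDescription",
  "maxChunkSizeResetTooltip",
  "parsingConcurrencyLabel",
  "parsingConcurrencyDescription",
  "parsingConcurrencyResetTooltip",
]

// Assumed shape: locale code -> flat map of translation key -> string.
type LocaleTable = Record<string, Record<string, string>>

// Returns, for each locale, the required keys that are absent or empty.
function findMissingKeys(locales: LocaleTable, required: readonly string[]): Record<string, string[]> {
  const missing: Record<string, string[]> = {}
  for (const [locale, strings] of Object.entries(locales)) {
    const absent = required.filter((key) => !strings[key])
    if (absent.length > 0) {
      missing[locale] = absent
    }
  }
  return missing
}

// Example: "de" has only one of the nine keys, so eight are reported.
const english: Record<string, string> = {}
for (const key of REQUIRED_KEYS) {
  english[key] = key
}
console.log(findMissingKeys({ en: english, de: { maxChunkSizeLabel: "Maximale Chunk-Größe" } }, REQUIRED_KEYS))
```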
Summary
This PR attempts to address Issue #10396 by exposing configurable chunking and batch processing parameters for codebase indexing.
Changes
Adds three new configurable parameters for codebase indexing:
Embedding Batch Size (10-200, default 60): Number of code segments batched together for embeddings. Higher values can speed up indexing on powerful hardware. Lower values reduce memory usage.
Max Chunk Size (200-5000 chars, default 1000): Maximum characters per code chunk. Larger chunks provide more context but may reduce search precision. Smaller chunks enable finer-grained search results.
Parsing Concurrency (1-50, default 10): Number of files to parse concurrently during indexing. Higher values speed up indexing but use more CPU and memory.
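The batch size and parsing concurrency settings control two different things: how many segments go into one embedding request, and how many files are parsed at the same time. The sketch below shows both under simple assumptions; toBatches and mapWithConcurrency are hypothetical helpers, not functions from the PR, and a package such as p-limit provides a tested concurrency limiter of the same kind.

```typescript
// Splits items into consecutive batches of at most batchSize elements,
// as an embedding step might group code segments per request.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer")
  }
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize))
  }
  return batches
}

// Runs worker over every item with at most `limit` calls in flight at once
// (a limit below 1 is treated as 1). Results keep the input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let next = 0

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++
      results[index] = await worker(items[index] as T)
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length))
  await Promise.all(Array.from({ length: runnerCount }, () => runner()))
  return results
}

console.log(toBatches([1, 2, 3, 4, 5], 2)) // [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
mapWithConcurrency(["a.ts", "bb.ts", "ccc.ts"], 2, async (file) => file.length).then((lengths) =>
  console.log(lengths),
)
```

Setting the limit to 1 serializes parsing, while a larger value keeps more files in flight at once, which is the CPU and memory trade-off described above.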
Implementation Details
- Added new config fields in packages/types/src/codebase-index.ts with Zod validation
- Updated src/services/code-index/constants/index.ts to use defaults from the types package
- Updated config-manager.ts to load and expose new settings with getters
- Updated DirectoryScanner to use configurable parsing concurrency
- Added UI sliders in CodeIndexPopover.tsx for the new settings in the Advanced Settings section
Testing
Feedback and guidance are welcome!