roomote bot commented Dec 30, 2025

Summary

This PR attempts to address Issue #10396 by exposing configurable chunking and batch processing parameters for codebase indexing.

Changes

Adds three new configurable parameters for codebase indexing:

  • Embedding Batch Size (10-200, default 60): Number of code segments batched together for embeddings. Higher values can speed up indexing on powerful hardware. Lower values reduce memory usage.

  • Max Chunk Size (200-5000 chars, default 1000): Maximum characters per code chunk. Larger chunks provide more context but may reduce search precision. Smaller chunks enable finer-grained search results.

  • Parsing Concurrency (1-50, default 10): Number of files to parse concurrently during indexing. Higher values speed up indexing but use more CPU and memory.
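The three ranges and defaults above can be captured in a small resolver. This is a minimal sketch that assumes a hypothetical flat settings object. `resolveSetting`, `resolveIndexingSettings` and the field names are illustrative, not the names used in packages/types/src/codebase-index.ts. It clamps out-of-range values instead of rejecting them. A Zod schema with range checks would typically reject them during validation.

```typescript
// Hypothetical flat settings shape; the real field names in
// packages/types/src/codebase-index.ts may differ.
interface IndexingSettings {
  embeddingBatchSize: number
  maxChunkSize: number
  parsingConcurrency: number
}

// Ranges and defaults as listed in this PR description.
const LIMITS = {
  embeddingBatchSize: { min: 10, max: 200, default: 60 },
  maxChunkSize: { min: 200, max: 5000, default: 1000 },
  parsingConcurrency: { min: 1, max: 50, default: 10 },
} as const

type SettingKey = keyof typeof LIMITS

// Missing or non-finite values fall back to the default; anything else is
// rounded to an integer and clamped into [min, max].
function resolveSetting(key: SettingKey, value: number | undefined): number {
  const { min, max, default: fallback } = LIMITS[key]
  if (value === undefined || !Number.isFinite(value)) return fallback
  return Math.min(max, Math.max(min, Math.round(value)))
}

function resolveIndexingSettings(input: Partial<IndexingSettings>): IndexingSettings {
  return {
    embeddingBatchSize: resolveSetting("embeddingBatchSize", input.embeddingBatchSize),
    maxChunkSize: resolveSetting("maxChunkSize", input.maxChunkSize),
    parsingConcurrency: resolveSetting("parsingConcurrency", input.parsingConcurrency),
  }
}

// embeddingBatchSize is clamped to 200, maxChunkSize falls back to 1000,
// and parsingConcurrency keeps the requested value of 4.
const settings = resolveIndexingSettings({ embeddingBatchSize: 500, parsingConcurrency: 4 })
console.log(settings)
```

Clamping keeps indexing running with a usable value when a stored setting is out of range. Rejecting surfaces the mistake to the user instead.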

Implementation Details

  • Added new config fields in packages/types/src/codebase-index.ts with Zod validation
  • Updated constants in src/services/code-index/constants/index.ts to use defaults from the types package
  • Updated config-manager.ts to load and expose new settings with getters
  • Updated DirectoryScanner to use configurable parsing concurrency
  • Added UI sliders in CodeIndexPopover.tsx for the new settings in the Advanced Settings section
  • Added i18n translations for new settings labels
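A common way to bound concurrent parsing is a small worker pool. The sketch below makes no claim about how DirectoryScanner actually enforces its limit, which may rely on a library. `mapWithConcurrency` and `fakeParse` are hypothetical names used only for illustration.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as `items`.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer")
  }
  const results = new Array<R>(items.length)
  let next = 0

  // Each runner claims the next unprocessed index until none remain. The
  // claim (next++) happens synchronously, so no index is processed twice.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++
      results[index] = await worker(items[index])
    }
  }

  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, () => runner()))
  return results
}

// Hypothetical stand-in for parsing one file: it records how many calls
// overlap and returns the path length.
let inFlight = 0
let peak = 0
async function fakeParse(path: string): Promise<number> {
  inFlight++
  peak = Math.max(peak, inFlight)
  await new Promise<void>((resolve) => setTimeout(resolve, 5))
  inFlight--
  return path.length
}

const files = Array.from({ length: 25 }, (_, i) => `src/file${i}.ts`)
mapWithConcurrency(files, 10, fakeParse).then((lengths) => {
  console.log(lengths.length, peak) // 25 10
})
```

Raising the limit lets more files be parsed at once at the cost of CPU and memory. This matches the trade-off stated for the Parsing Concurrency setting.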

Testing

  • All existing tests pass
  • Linting passes with no warnings

Feedback and guidance are welcome!

…r codebase indexing

Adds three new configurable parameters for codebase indexing:
- Embedding Batch Size (10-200, default 60): Number of code segments batched for embeddings
- Max Chunk Size (200-5000 chars, default 1000): Maximum characters per code chunk
- Parsing Concurrency (1-50, default 10): Number of concurrent file parsing operations

Changes:
- Added new config fields in packages/types/src/codebase-index.ts
- Updated constants in src/services/code-index/constants/index.ts
- Updated config-manager.ts to load and expose new settings
- Updated DirectoryScanner to use configurable parsing concurrency
- Added UI sliders in CodeIndexPopover.tsx for the new settings
- Added i18n translations for new settings labels

Closes #10396
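As a rough illustration of what the embedding batch size controls, the sketch below splits parsed segments into consecutive batches of at most N items, with one embedding request per batch. It is an assumption that batching works this simply. The review comment below notes that DirectoryScanner takes a batchSegmentThreshold, and the real scanner may flush batches differently. `CodeSegment` and `toBatches` are hypothetical.

```typescript
// Hypothetical code segment produced by the parser.
interface CodeSegment {
  filePath: string
  content: string
}

// Split items into consecutive batches of at most `batchSize`, preserving
// order; each batch would be sent as one embedding request.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer")
  }
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize))
  }
  return batches
}

const segments: CodeSegment[] = Array.from({ length: 130 }, (_, i) => ({
  filePath: `src/file${i % 7}.ts`,
  content: `// segment ${i}`,
}))

// With the default embedding batch size of 60, 130 segments become 60 + 60 + 10.
const batches = toBatches(segments, 60)
console.log(batches.map((b) => b.length)) // [ 60, 60, 10 ]
```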
roomote bot (Contributor Author) commented Dec 30, 2025

Rooviewer: See task on Roo Cloud

Reviewed the latest indentation fix. All previous issues remain resolved, and no new issues were found.

  • maxChunkSize configuration not applied to parser: Fixed. The CodeParser class now accepts a maxBlockChars parameter via its constructor, the factory creates instances with config.maxChunkSize, and FileWatcher uses the injected parser instead of the singleton.

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

}
return new DirectoryScanner(embedder, vectorStore, parser, this.cacheManager, ignoreInstance, batchSize)
// Get the configurable settings from config manager
const config = this.configManager.getConfig()
Review comment from roomote (Contributor Author):

The maxChunkSize configuration is retrieved via config.maxChunkSize but never actually used. The CodeParser class in processors/parser.ts is a singleton that still uses the hardcoded MAX_BLOCK_CHARS constant directly. Unlike embeddingBatchSize and parsingConcurrency, which are correctly passed to DirectoryScanner, the chunk size setting has no effect on parsing behavior because the parser never receives this configuration.

To fix this, the parser would need to either accept a maxChunkSize parameter (similar to how DirectoryScanner accepts batchSegmentThreshold), or read from the config at runtime.
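The first option, passing the limit in through the constructor, might look roughly like the sketch below. The real CodeParser is more involved. This hypothetical `LineChunker` just packs whole lines greedily, so that the injection point for `maxBlockChars` is easy to see.

```typescript
// Hypothetical chunk shape used only for this sketch.
interface Chunk {
  startLine: number // 1-based, inclusive
  endLine: number // 1-based, inclusive
  content: string
}

class LineChunker {
  private readonly maxBlockChars: number

  // The limit is injected per instance instead of being read from a
  // module-level constant such as MAX_BLOCK_CHARS.
  constructor(maxBlockChars: number) {
    if (!Number.isInteger(maxBlockChars) || maxBlockChars < 1) {
      throw new RangeError("maxBlockChars must be a positive integer")
    }
    this.maxBlockChars = maxBlockChars
  }

  // Greedily packs whole lines into chunks of at most maxBlockChars characters.
  // A single line longer than the limit still becomes its own (oversized) chunk.
  chunk(text: string): Chunk[] {
    const lines = text.split("\n")
    const chunks: Chunk[] = []
    let buffer: string[] = []
    let bufferChars = 0 // always equals buffer.join("\n").length
    let startLine = 1

    lines.forEach((line, index) => {
      const lineNumber = index + 1
      // Appending to a non-empty buffer also costs one "\n".
      const cost = buffer.length === 0 ? line.length : line.length + 1
      if (buffer.length > 0 && bufferChars + cost > this.maxBlockChars) {
        chunks.push({ startLine, endLine: lineNumber - 1, content: buffer.join("\n") })
        buffer = []
        bufferChars = 0
        startLine = lineNumber
      }
      bufferChars += buffer.length === 0 ? line.length : line.length + 1
      buffer.push(line)
    })

    if (buffer.length > 0) {
      chunks.push({ startLine, endLine: lines.length, content: buffer.join("\n") })
    }
    return chunks
  }
}

const source = ["function a() {", "  return 1", "}", "", "function b() {", "  return 2", "}"].join("\n")
for (const c of new LineChunker(30).chunk(source)) {
  console.log(c.startLine, c.endLine, c.content.length)
}
// 1 4 28
// 5 7 27
```

With a limit of 1000, the same source stays in a single chunk. A smaller configured size therefore yields more, finer-grained chunks.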

Fix it with Roo Code or mention @roomote and request a fix.


@roomote request fix

roomote bot (Contributor Author), Dec 30, 2025

Fixaroo: See task on Roo Cloud

Fixed the reported issue. All local checks passed.


roomote bot (Contributor Author), Dec 30, 2025

Fixaroo: See task on Roo Cloud

Fixed the reported issues. Added 9 missing translation keys to all 17 locale files. All local checks passed.



@roomote The translation file is currently missing; please fix it.

roomote bot (Contributor Author), Dec 30, 2025

Fixaroo: See task on Roo Cloud

The translation files were already added in commit d96f9f4. Verified that all 9 translation keys are present in all 18 locales and all tests pass.



@roomote Fix it so that it passes the Code QA Roo Code / platform-unit-test check.

roomote bot (Contributor Author), Dec 30, 2025

Fixaroo: See task on Roo Cloud

Fixed the indentation issue in parser.ts. All local checks passed.


hannesrudolph added the "Issue/PR - Triage" label (New issue. Needs quick review to confirm validity and assign labels.) on Dec 30, 2025
roomote bot (Contributor Author) commented Dec 30, 2025

Fixaroo: See task on Roo Cloud

Fixed the reported issue. The maxChunkSize configuration is now properly applied to CodeParser. All local checks passed.


- Modified CodeParser to accept maxBlockChars as constructor parameter
- Updated FileWatcher to receive ICodeParser via constructor injection
- Updated service-factory to create CodeParser with configured maxChunkSize
- This ensures the codebaseIndexMaxChunkSize setting actually affects parsing
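A simplified sketch of the wiring those bullets describe follows. The factory reads maxChunkSize from config, builds the parser, and hands it to FileWatcher through its constructor. The interfaces and method names here (`parse`, `handleChange`, `createParser`, `createFileWatcher`, `ServiceFactory`) are hypothetical stand-ins. The fixed-width splitting inside this `CodeParser` is a toy, not the project's parsing logic.

```typescript
// Hypothetical, heavily simplified shapes; the real ICodeParser, CodeParser,
// FileWatcher and service factory have different methods and dependencies.
interface ICodeParser {
  parse(filePath: string, content: string): string[]
}

class CodeParser implements ICodeParser {
  private readonly maxBlockChars: number

  constructor(maxBlockChars: number) {
    if (!Number.isInteger(maxBlockChars) || maxBlockChars < 1) {
      throw new RangeError("maxBlockChars must be a positive integer")
    }
    this.maxBlockChars = maxBlockChars
  }

  // Toy splitting into fixed-width slices, just to make the limit observable.
  parse(_filePath: string, content: string): string[] {
    const blocks: string[] = []
    for (let i = 0; i < content.length; i += this.maxBlockChars) {
      blocks.push(content.slice(i, i + this.maxBlockChars))
    }
    return blocks
  }
}

class FileWatcher {
  private readonly parser: ICodeParser

  // The parser is injected instead of imported as a module-level singleton,
  // so it carries whatever limit the factory configured.
  constructor(parser: ICodeParser) {
    this.parser = parser
  }

  // Returns how many blocks a changed file produced.
  handleChange(filePath: string, content: string): number {
    return this.parser.parse(filePath, content).length
  }
}

interface IndexConfig {
  maxChunkSize: number
}

class ServiceFactory {
  private readonly getConfig: () => IndexConfig

  constructor(getConfig: () => IndexConfig) {
    this.getConfig = getConfig
  }

  createParser(): ICodeParser {
    return new CodeParser(this.getConfig().maxChunkSize)
  }

  createFileWatcher(parser: ICodeParser): FileWatcher {
    return new FileWatcher(parser)
  }
}

const factory = new ServiceFactory(() => ({ maxChunkSize: 200 }))
const watcher = factory.createFileWatcher(factory.createParser())
console.log(watcher.handleChange("src/example.ts", "x".repeat(450))) // 3
```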

Add 9 missing translation keys to settings.json for all 17 locales:
- embeddingBatchSizeLabel, embeddingBatchSizeDescription, embeddingBatchSizeResetTooltip
- maxChunkSizeLabel, maxChunkSizeDescription, maxChunkSizeResetTooltip
- parsingConcurrencyLabel, parsingConcurrencyDescription, parsingConcurrencyResetTooltip
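A check like the one behind "all 9 translation keys are present in all locales" could be sketched as follows. It assumes each locale's strings are available as a flat key-to-string map. How the keys are actually nested inside settings.json is not shown in this thread. `findMissingKeys` is a hypothetical helper.

```typescript
// The nine keys listed in the commit message above.
const REQUIRED_KEYS = [
  "embeddingBatchSizeLabel",
  "embeddingBatchSizeDescription",
  "embeddingBatchSizeResetTooltip",
  "maxChunkSizeLabel",
  "maxChunkSizeDescription",
  "maxChunkSizeResetTooltip",
  "parsingConcurrencyLabel",
  "parsingConcurrencyDescription",
  "parsingConcurrencyResetTooltip",
] as const

// Assumption: each locale is a flat key -> translated string map.
type LocaleStrings = Record<string, string>

// For every locale, list the required keys that are missing or empty.
// Locales with nothing missing are left out of the result.
function findMissingKeys(
  locales: Record<string, LocaleStrings>,
  required: readonly string[],
): Record<string, string[]> {
  const missing: Record<string, string[]> = {}
  for (const [locale, strings] of Object.entries(locales)) {
    const absent = required.filter((key) => !strings[key])
    if (absent.length > 0) missing[locale] = absent
  }
  return missing
}

const locales: Record<string, LocaleStrings> = {
  en: Object.fromEntries(REQUIRED_KEYS.map((key) => [key, `${key} (English text)`])),
  de: { embeddingBatchSizeLabel: "(translated label)" },
}
const report = findMissingKeys(locales, REQUIRED_KEYS)
console.log(Object.keys(report), report.de.length) // [ 'de' ] 8
```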

Labels: Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.)

Projects: Status: Triage


4 participants