Description
Pre-submission Checklist
- I have read the README.md
- I am using the latest version of Agent Docstrings
- I have searched for existing issues to see if this feature has been requested before
- This feature request is not a bug report (use Bug Report template for bugs)
Feature Category
Developer/API enhancement
Feature Summary
The main goal is not just to check that the tool works, but to deliberately find its weak spots. We need to choose files that will test:
- Correctness on real-world, non-“laboratory” code.
- Reliability when faced with complex and non-standard syntax.
- Performance on large files.
- Resilience to regressions as the code evolves.
Selection Criteria
I propose picking files according to the following four criteria:
1. Language Variety and Parser Quality
We need to test both categories of parsers:
- AST-based parsers (Python, Go): Here, the goal is to ensure correct handling of all syntactic constructs supported by the AST. We should look for files with complex function declarations, decorators, generics (in Go), etc.
- Regex-based parsers (C++, C#, JS, TS, Java, etc.): This group is critical. We need files that are likely to “break” our regex parser. We need examples where simple bracket counting and regex matching can fail.
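To illustrate why this group is fragile, here is a minimal sketch (hypothetical, not the tool's actual parser) of naive brace counting, which miscounts as soon as a brace appears inside a string literal:

```python
def naive_brace_depth(source: str) -> int:
    """Count net brace depth with no awareness of strings or comments."""
    return source.count("{") - source.count("}")


# A brace inside a string literal throws the count off: the naive
# parser believes a block was opened that is never closed.
print(naive_brace_depth('var s = "{key: value";'))   # 1 (wrong: no real block)
print(naive_brace_depth("if (x) { return; }"))       # 0 (balanced, as expected)
```

Fixture files that trigger exactly this kind of miscount are the ones we want in the test set.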
2. Syntax Complexity and Diversity
For each language, we should find files containing:
- Multi-line declarations: Functions or classes whose signatures span several lines.
- Advanced language features:
  - C++: Templates (`template`), operator overloading, nested namespaces.
  - C# / Java: Generics, attributes/annotations on separate lines, nested and anonymous classes.
  - JS / TS: Arrow functions, `async`/`await`, decorators, default and named exports in the same file.
  - Python: Decorators, complex type annotations, functions with `*args` and `**kwargs`.
- Unusual formatting: Syntactically valid code formatted in odd ways (for example, a brace on its own line, weird indentation).
- Regex stress cases:
  - Braces inside strings or comments: e.g. `var s = "{\"key\": \"value\"}";` or `// See function {Foo}.` These are stress tests for our regex parsers.
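As an illustration (not a proposed fixture, just a sketch of the shape we are looking for), a single Python file can combine several of these cases at once: a decorator, a multi-line signature, complex annotations, `*args`/`**kwargs`, and a brace inside a docstring:

```python
# Illustrative stress case combining several criteria from the list above.
import functools
from typing import Callable, Dict, List, Optional


def log_calls(func: Callable) -> Callable:
    """A decorator, so the parser must skip past it to the real signature."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@log_calls
def merge_configs(
    base: Dict[str, List[int]],
    *overrides: Dict[str, List[int]],
    strict: Optional[bool] = None,
    **extra: str,
) -> Dict[str, List[int]]:
    """Docstring containing a brace: {not a real block}."""
    merged = dict(base)
    for override in overrides:
        merged.update(override)
    return merged
```

A good fixture packs many such traps into one file so a single snapshot covers them all.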
3. File Size and Structure
We should assemble a balanced set:
- Small files (up to 100 lines): For quick “smoke” tests.
- Medium files (300–1,000 lines): Representative of a typical project file.
- Large files (2 000+ lines): To check performance and behavior on code with many functions and classes.
- Mixed-structure files: For example, a file with multiple classes, nested classes, and top-level functions.
- Files without classes/functions: To see how the tool handles an empty file or one containing only variable declarations.
4. License and Repository Popularity
- License: It’s crucial to only take files from repositories under permissive licenses (MIT, Apache 2.0, BSD) that allow us to use their code. Avoid GPL/LGPL to prevent licensing conflicts in our codebase.
- Popularity: Prefer well-known, actively maintained projects (for example, `requests`, `pandas`, Docker, React, VS Code). Their code is typically high-quality and reflects modern language practices.
Problem Statement
The project lacks testing on real-world data; all tests currently in the project were generated by AI.
Proposed Solution
1. Select repositories: Identify 2–3 popular repositories for each key language group (AST-based and regex-based).
2. Pick files: Within those repos, deliberately find 3–5 files that match the complexity and edge-case criteria above.
3. Create a fixture set: Copy the selected files into a dedicated directory in our project, e.g. `tests/integration_fixtures/`. IMPORTANT: prepend each copied file with a comment noting its original source URL on GitHub and its license.
4. Write tests: For each fixture file, write a test that:
   a. Copies the fixture into a temporary directory.
   b. Runs our generator on it.
   c. Compares the output to a pre-saved "golden" snapshot.
- Snapshot testing is ideal here. A test only fails if the generated header changes, making it easy to catch regressions. For Python, we can use the `pytest-snapshot` library.
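Steps (a)–(c) boil down to the golden-file pattern that `pytest-snapshot` wraps. A minimal sketch, assuming a hypothetical `generate_header()` entry point (the tool's real API may differ):

```python
# Sketch of the fixture test flow: copy fixture, run generator, compare
# against a pre-saved golden snapshot. generate_header is a placeholder
# for the tool's real entry point.
import shutil
from pathlib import Path


def run_snapshot_test(fixture: Path, golden: Path, tmp_dir: Path,
                      generate_header) -> bool:
    # a. Copy the fixture into a temporary directory.
    work_copy = tmp_dir / fixture.name
    shutil.copy(fixture, work_copy)
    # b. Run the generator on the copy.
    actual = generate_header(work_copy)
    # c. Compare the output to the pre-saved "golden" snapshot.
    return actual == golden.read_text()
```

With `pytest-snapshot`, step (c) becomes a `snapshot.assert_match(...)` call and the golden files are managed by the plugin, so a regression shows up as a single failing diff.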
Priority Level
🤔 Medium - Would be nice to have
Implementation Complexity
- 🤔 Simple - Minor change or addition
- 😑 Moderate - Requires new parsing logic
- 🛠️ Complex - Major feature requiring significant development
- 😎 I don't know
Contribution
- I would like to implement this feature myself
- I can help with testing
- I can provide sample code files for testing