This document outlines planned features, improvements, and milestones for the code2llm project, with a focus on LLM integration.
✅ Completed:
- Core analysis engine with caching and parallel processing
- NLP Processing Pipeline (Query Normalization, Intent Matching, Entity Resolution)
- Multilingual support (EN/PL)
- Comprehensive test suite
- CLI with multiple output formats
- PyPI publication ready
- TOON v2 Format: health-first diagnostics (`analysis.toon`)
- Format Taxonomy (v0.3.0): 4 purpose-built output formats
  - `project.map`: structural map (modules, imports, signatures, types)
  - `analysis.toon`: health diagnostics (HEALTH, REFACTOR, COUPLING, LAYERS)
  - `flow.toon`: data-flow analysis (PIPELINES, TRANSFORMS, CONTRACTS, DATA_TYPES)
  - `context.md`: LLM narrative (architecture, patterns, API surface)
  - CLI: `--format map,toon,flow,context,all`
- AST-based type inference + side-effect detection (v0.3.1):
  - `TypeInferenceEngine`: parses return annotations, argument types, name-based fallback
  - `SideEffectDetector`: AST scan for IO, cache, mutation, pure classification
  - Enhanced CONTRACTS: IN/OUT types, SIDE-EFFECT, INVARIANT, SMELL markers
  - Enhanced DATA_TYPES: source counts, hub-type split recommendations
- networkx-based pipeline detection (v0.3.2):
  - `PipelineDetector`: DiGraph call graph, longest-path detection, cycle-safe
  - Domain classification: NLP, Analysis, Export, Refactor, Core, IO
  - Entry/exit labeling, purity aggregation, bottleneck identification
- Format quality benchmark + rename (v0.3.3):
  - `benchmark_format_quality.py`: ground-truth project, 8 problems, 4-axis scoring
  - 24 format quality tests (`test_format_quality.py`)
  - `llm_exporter` → `context_exporter` rename with backward compat
- Rename + structural cleanup (v0.4.0):
  - `code2flow` → `code2llm`: full package rename (folder, imports, CLI, docs)
  - CLI: all 7 exporters connected (Toon, Map, Flow, Context, YAML, JSON, Mermaid)
  - Removed dead code: `optimization/` (1590L), `visualizers/` (150L)
  - Moved root-level generators to `generators/` subpackage
  - Renamed sprint-based tests to feature-based names
  - Updated all documentation references
- Bug fixes + EvolutionExporter (v0.5.0):
  - Fixed MermaidExporter: 3 distinct outputs (flow.mmd, calls.mmd, compact_flow.mmd)
  - Fixed SideEffectDetector: `dict.get()` false positive as IO
  - Fixed coupling matrix: candidate-based callee disambiguation
  - Fixed pipeline detection: safe ambiguous callee handling
  - New `EvolutionExporter` → `evolution.toon`: ranked refactoring queue
  - CLI: `--format evolution` (8 total output formats)
Status: Not Started | Priority: High | Effort: Medium
- Integrate sentence transformers for semantic embeddings
- Build vector index of codebase for similarity search
- Add `semantic_search()` method to ProjectAnalyzer
- Support queries like "find code that handles authentication"
Technical Notes:
- Use `sentence-transformers` library
- Store embeddings in cache alongside AST
- Consider HNSW for fast approximate search
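
The flow described in the notes above (embed each code unit, index the vectors, rank by similarity to the query) can be sketched with the standard library alone. This is only an illustration under stated assumptions: a toy bag-of-words vector stands in for real sentence-transformer embeddings, a brute-force scan stands in for an HNSW index, and `embed`, `build_index`, and the free function `semantic_search` are hypothetical names rather than the project's API.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(snippets: Dict[str, str]) -> Dict[str, Counter]:
    """Embed every described code unit once so queries only embed the query."""
    return {name: embed(text) for name, text in snippets.items()}


def semantic_search(
    index: Dict[str, Counter], query: str, top_k: int = 3
) -> List[Tuple[str, float]]:
    """Brute-force ranking; an approximate index would replace this scan."""
    q = embed(query)
    scored = [(name, cosine(q, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(name, score) for name, score in scored[:top_k] if score > 0]


# Hypothetical code units described by short summaries (e.g. docstrings).
index = build_index({
    "auth.login": "verify user password and handle authentication session",
    "export.to_json": "serialize analysis result to json file",
    "cache.load": "load cached ast from disk",
})
results = semantic_search(index, "find code that handles authentication")
```

With real embeddings, `embed` would return dense vectors and the ranking logic would stay the same; only the similarity backend would change.
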
Status: Not Started | Priority: High | Effort: Medium
- Factory pattern detection
- Singleton pattern detection
- Observer pattern detection
- Strategy pattern detection
- Template method pattern detection
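
To make one of the detectors above concrete, here is a minimal sketch of singleton detection over raw source using Python's `ast` module. The heuristic is an assumption chosen for illustration (a class-level `_instance` attribute plus an overridden `__new__`), and `detect_singletons` is a hypothetical helper, not the planned `PatternDetector` API.

```python
import ast
from typing import List


def detect_singletons(source: str) -> List[str]:
    """Return names of classes that look like singletons.

    Heuristic (an assumption): the class assigns a class-level
    ``_instance`` attribute and overrides ``__new__``.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        has_instance_attr = any(
            isinstance(stmt, ast.Assign)
            and any(
                isinstance(target, ast.Name) and target.id == "_instance"
                for target in stmt.targets
            )
            for stmt in node.body
        )
        overrides_new = any(
            isinstance(stmt, ast.FunctionDef) and stmt.name == "__new__"
            for stmt in node.body
        )
        if has_instance_attr and overrides_new:
            found.append(node.name)
    return found


SAMPLE = '''
class Config:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

class Plain:
    def run(self):
        return 1
'''

singletons = detect_singletons(SAMPLE)
```

Other singleton styles (module-level instances, metaclasses, annotated `_instance: ... = None` assignments) would need additional rules; the sketch only covers the one shape named above.
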
Implementation:

    from typing import List

    # ClassInfo and Pattern are project types; method bodies are still to be written.
    class PatternDetector:
        def detect_factory(self, classes: List[ClassInfo]) -> List[Pattern]: ...
        def detect_singleton(self, classes: List[ClassInfo]) -> List[Pattern]: ...
        def detect_observer(self, classes: List[ClassInfo]) -> List[Pattern]: ...

Status: Not Started | Priority: High | Effort: Large
- Streamlit-based web interface
- Upload and analyze projects via browser
- Interactive graph visualization (D3.js/Plotly)
- Natural language query interface
- Export to PNG/SVG/PDF
Components:
- File upload with drag-and-drop
- Project browser with tree view
- Search interface with filters
- Graph visualization with zoom/pan
Status: Not Started | Priority: Medium | Effort: Large
- Extension manifest and activation
- Sidebar panel for project structure
- Code lens for function call graphs
- Hover information with call statistics
- Command palette integration
- Settings synchronization
Features:
- Right-click "Show Call Graph"
- Inline hints for recursive functions
- Status bar with project metrics
Status: Not Started | Priority: Medium | Effort: Medium
- File watching with watchdog
- Incremental analysis (only changed files)
- Background analysis daemon
- WebSocket updates for UI
Performance Targets:
- < 100ms for incremental updates
- < 5s for full project re-analysis
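
The incremental part of this plan (re-analyze only changed files) could work roughly as follows. This is a sketch under assumptions: an in-memory `path -> source` mapping stands in for file-watcher events, change detection uses content hashes, and `IncrementalAnalyzer` with its `update` method is a hypothetical name.

```python
import hashlib
from typing import Callable, Dict, List


class IncrementalAnalyzer:
    """Re-analyzes only files whose content hash changed since the last run."""

    def __init__(self, analyze: Callable[[str, str], object]) -> None:
        self._analyze = analyze
        self._hashes: Dict[str, str] = {}
        self.results: Dict[str, object] = {}

    def update(self, files: Dict[str, str]) -> List[str]:
        """Analyze new or changed files, drop deleted ones, return re-analyzed paths."""
        changed = []
        for path, source in files.items():
            digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
            if self._hashes.get(path) != digest:
                self._hashes[path] = digest
                self.results[path] = self._analyze(path, source)
                changed.append(path)
        # Forget files that no longer exist in the project snapshot.
        for path in set(self._hashes) - set(files):
            del self._hashes[path]
            self.results.pop(path, None)
        return changed


# The "analysis" here is a placeholder that counts function definitions.
analyzer = IncrementalAnalyzer(lambda path, src: src.count("def "))
first = analyzer.update({"a.py": "def f(): pass\n", "b.py": "x = 1\n"})
second = analyzer.update({"a.py": "def f(): pass\ndef g(): pass\n", "b.py": "x = 1\n"})
```

The first call analyzes both files; the second touches only `a.py`, which is the property the incremental-update latency target depends on.
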
Status: Not Started | Priority: Medium | Effort: Medium
- Diff analysis between commits
- Show changed functions/classes
- Impact analysis for PRs
- Code churn visualization
- Contributor statistics
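
One way to "show changed functions" between two revisions is to compare per-function AST fingerprints rather than text lines, so that formatting-only edits are ignored. The sketch below assumes the two versions of a file are already available as strings (for example via `git show <rev>:<path>`); `function_fingerprints` and `diff_functions` are hypothetical helper names.

```python
import ast
from typing import Dict, List


def function_fingerprints(source: str) -> Dict[str, str]:
    """Map each function name to a dump of its AST (no line/column info)."""
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def diff_functions(old_source: str, new_source: str) -> Dict[str, List[str]]:
    """Classify functions as added, removed, or changed between two versions."""
    old = function_fingerprints(old_source)
    new = function_fingerprints(new_source)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }


OLD = (
    "def load(p):\n    return open(p).read()\n\n"
    "def parse(t):\n    return t.split()\n"
)
NEW = (
    "def load(p):\n    return open(p).read()\n\n"
    "def parse(t):\n    return t.strip().split()\n\n"
    "def render(x):\n    return str(x)\n"
)
report = diff_functions(OLD, NEW)
```

Names are not qualified here, so two methods with the same name in different classes would collide; a fuller version would key on the qualified name.
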
Status: Not Started | Priority: Medium | Effort: Large
- JS/TS AST parsing with Babel
- CommonJS and ES module resolution
- TypeScript type information extraction
- JSX/TSX component analysis
Architecture:
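
    from abc import ABC, abstractmethod
    from typing import List

    # AST and FunctionInfo are project types.
    class LanguageAnalyzer(ABC):
        @abstractmethod
        def parse_file(self, path: str) -> AST: ...

        @abstractmethod
        def extract_functions(self, ast: AST) -> List[FunctionInfo]: ...

Status: Not Started | Priority: Low | Effort: Large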
- Go AST parsing
- Goroutine and channel analysis
- Interface implementation tracking
- Package dependency analysis
Status: Not Started | Priority: Low | Effort: Large
- Rust AST parsing
- Trait analysis
- Lifetime visualization
- Macro expansion tracking
Status: Not Started | Priority: Low | Effort: Medium
- Detect common vulnerabilities
- Hardcoded secret detection
- SQL injection pattern detection
- XSS vulnerability patterns
- Integration with CVE databases
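
As an illustration of the hardcoded-secret item, the following sketch flags assignments where a secret-looking variable name receives a non-empty string literal. The name pattern and the `find_hardcoded_secrets` helper are assumptions made for this example; a real detector would need entropy checks, allowlists, and broader syntax coverage.

```python
import ast
import re
from typing import List, Tuple

# Assumed naming heuristic; a real rule set would be configurable.
SECRET_NAME = re.compile(r"(password|passwd|secret|token|api_?key)", re.IGNORECASE)


def find_hardcoded_secrets(source: str) -> List[Tuple[int, str]]:
    """Return (line, variable) pairs where a secret-like name gets a string literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        is_literal = (
            isinstance(value, ast.Constant)
            and isinstance(value.value, str)
            and value.value != ""
        )
        if not is_literal:
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and SECRET_NAME.search(target.id):
                findings.append((node.lineno, target.id))
    return sorted(findings)


SAMPLE = (
    "import os\n"
    "API_KEY = 'sk-test-123'\n"
    "db_password = os.environ['DB_PASSWORD']\n"
    "timeout = '30'\n"
)
findings = find_hardcoded_secrets(SAMPLE)
```

Only the literal assignment on line 2 is reported; the value read from the environment and the unrelated `timeout` string are left alone.
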
Status: Not Started | Priority: Low | Effort: Medium
- Import cProfile/py-spy results
- Annotate call graph with timing
- Hot path identification
- Memory usage visualization
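
For the cProfile side of this item, hot-path identification can start from the standard `pstats` module. The sketch below profiles a small function in-process and ranks functions by cumulative time; `hot_paths`, `busy_work`, and `entry_point` are hypothetical names, and py-spy output is not covered here.

```python
import cProfile
import pstats
from typing import List, Tuple


def hot_paths(stats: pstats.Stats, top_n: int = 5) -> List[Tuple[str, float]]:
    """Return the top_n (function name, cumulative seconds) pairs."""
    rows = [
        (func_name, cumulative)
        for (_file, _line, func_name), (_cc, _nc, _tt, cumulative, _callers)
        in stats.stats.items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top_n]


def busy_work(n: int) -> int:
    """Placeholder workload so the profile has something to measure."""
    return sum(i * i for i in range(n))


def entry_point() -> int:
    return busy_work(50_000)


profiler = cProfile.Profile()
profiler.enable()
entry_point()
profiler.disable()

ranked = hot_paths(pstats.Stats(profiler), top_n=10)
hot_names = [name for name, _ in ranked]
```

A saved profile can be loaded the same way by passing its filename to `pstats.Stats`; the ranked names could then be joined against call-graph nodes to annotate them with timing.
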
Status: Not Started | Priority: Low | Effort: Large
- YAML-based pattern definitions
- Pattern marketplace/registry
- User-contributed patterns
- Pattern testing framework
Example Pattern Definition:

    name: database_repository
    pattern:
      class:
        inherits: BaseRepository
        methods:
          - name: get_
            return_type: Model
          - name: save_
            args: [Model]

Status: Not Started | Priority: High | Effort: Medium
- Stable public API guarantee
- Backward compatibility policy
- Deprecation warnings
- Migration guides
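
A deprecation-warning mechanism along these lines could be as small as a decorator built on the standard `warnings` module. This is a sketch, not the project's policy: the `deprecated` decorator, the `export_llm` / `export_context` function names, and the `"1.0.0"` removal version are placeholders chosen for illustration.

```python
import functools
import warnings
from typing import Callable


def deprecated(replacement: str, removed_in: str) -> Callable:
    """Mark a callable as deprecated and point callers at its replacement."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead "
                f"(scheduled for removal in {removed_in})",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


def export_context(data: dict) -> str:
    """Hypothetical new entry point."""
    return f"context:{sorted(data)}"


@deprecated(replacement="export_context", removed_in="1.0.0")
def export_llm(data: dict) -> str:
    """Hypothetical old name kept as a thin alias."""
    return export_context(data)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    output = export_llm({"b": 1, "a": 2})
```

The old name keeps working and returns the same result, while each call emits a `DeprecationWarning` that a migration guide can reference.
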
Status: Not Started | Priority: Low | Effort: Large
- SAML/SSO authentication
- Role-based access control
- Audit logging
- On-premises deployment
- SCIM user provisioning
Status: Not Started | Priority: Medium | Effort: Ongoing
- Video tutorial series
- Interactive documentation
- Community Discord/Slack
- Monthly community calls
- Case study library
- Achieve 90%+ test coverage
- Add property-based testing (Hypothesis)
- Implement fuzzing for parsers
- Add mutation testing
- Profile and optimize hot paths
- Implement LRU cache for analysis results
- Add memory usage benchmarks
- Optimize graph layout algorithms
- API reference with auto-generation
- Architecture decision records (ADRs)
- Contributing guidelines
- Code of conduct
- Add more language stopwords to NLP config
- Improve error messages in CLI
- Add more output format examples
- Create tutorial notebooks
- Implement new pattern detector
- Add support for additional export formats
- Improve parallel processing stability
- Add fuzzy matching algorithms
- Implement semantic search
- Add new language support
- Build web UI components
- Create IDE plugins
- LLM Integration
  - Code explanation generation
  - Automatic refactoring suggestions
  - Documentation generation
- Machine Learning
  - Bug prediction models
  - Code smell detection
  - Performance bottleneck prediction
- Visualization
  - 3D code structure visualization
  - VR/AR code exploration
  - Haptic feedback for code review
- Collaboration
  - Real-time collaborative analysis
  - Comment and annotation system
  - Code review integration
| Version | Target Date | Focus | Status |
|---|---|---|---|
| v0.2.5 | Mar 2026 | TOON v2 format implementation | ✅ Done |
| v0.3.0 | Mar 2026 | Format taxonomy (map, toon, flow, context) | ✅ Done |
| v0.3.1 | Mar 2026 | CONTRACTS + DATA_TYPES enhancement (AST type inference, side-effect detection) | ✅ Done |
| v0.3.2 | Mar 2026 | networkx pipeline detection, domain grouping, entry/exit labeling | ✅ Done |
| v0.3.3 | Mar 2026 | Format quality benchmark, llm_exporter → context_exporter rename | ✅ Done |
| v0.4.0 | Mar 2026 | Rename code2flow → code2llm, structural cleanup, dead code removal | ✅ Done |
| v0.5.0 | Mar 2026 | Bug fixes, EvolutionExporter, format quality | ✅ Done |
| v0.5.1 | Mar 2026 | Structural refactoring (9 function splits, CC̄ 5.1→4.8), examples, auto-benchmark | ✅ Done |
| v0.6.0 | Q3 2026 | IDE integration, semantic code search | 📋 Planned |
| v0.7.0 | Q4 2026 | JS/TS support | 📋 Planned |
| v0.8.0 | Q1 2027 | Enterprise features | 📋 Planned |
| v1.0.0 | Q2 2027 | Stable API, mature platform | 📋 Planned |
- Pick an issue from the roadmap
- Discuss approach in GitHub issue
- Implement with tests
- Submit PR with documentation
- Iterate based on review
Have ideas or suggestions? Open a GitHub issue with the roadmap label.