This demo showcases a complete autonomous code refactoring workflow using Claude Code with comprehensive validation and iterative improvement.
code2llm → Claude Code → vallm → Runtime Tests → code2llm (loop)
↓ ↓ ↓ ↓ ↓
Analysis Refactoring Validation Functionality Re-analysis
- code2llm Analysis - Deep structural analysis and code smell detection
- Claude Code - Advanced refactoring with architectural improvements
- vallm Validation - Multi-criteria quality assessment (syntax, security, performance, semantics)
- Runtime Tests - Automated functionality validation with pytest
- code2llm Re-analysis - Verify improvements and detect remaining issues
- Iterative Loop - Continue until perfect code achieved (max 5 iterations)
- Code Structure: Functions, classes, modules, entry points
- Code Smells: Complexity, duplication, dead code, coupling issues
- Security: Vulnerabilities, hardcoded credentials, injection risks
- Performance: Inefficient algorithms, memory usage patterns
- Architecture: SOLID violations, tight coupling, separation of concerns
- Advanced Reasoning: Superior code understanding and refactoring
- Architectural Patterns: Design patterns, clean architecture principles
- Security Focus: Proactive vulnerability detection and fixing
- Performance Optimization: Algorithmic improvements and best practices
- vallm: Syntax, imports, complexity, security, semantics, performance
- Runtime Tests: Functional validation with auto-generated test suites
- Quality Metrics: Code quality scoring and improvement tracking
- Feedback Integration: Validation failures guide next iteration
- Progressive Improvement: Each iteration builds on previous results
- Best Version Tracking: Automatically saves the best refactored version
- Termination Criteria: Stops when perfect code achieved or max iterations reached
- Docker and Docker Compose
- ANTHROPIC_API_KEY environment variable
- Local Ollama (optional, for comparison)
# Set your Anthropic API key
export ANTHROPIC_API_KEY='your-anthropic-api-key'
# Run the demo
./run.sh11_claude_code_autonomous/
├── claude_autonomous_demo.py # Main autonomous workflow
├── legacy_code/
│ └── data_processor.py # Legacy code with multiple issues
├── requirements.txt # Python dependencies
├── Dockerfile # Container configuration
├── docker-entrypoint.sh # Container startup script
├── run.sh # Demo runner script
└── README.md # This file
The demo refactors a complex data processing system with:
- SQL Injection: Direct string concatenation in queries
- Command Injection: Unsafe subprocess calls
- Pickle Deserialization: Unsafe object serialization
- Hardcoded Credentials: API keys in source code
- eval() Usage: Dangerous code execution
- SRP Violations: DataProcessor has too many responsibilities
- Tight Coupling: Direct access to internal state
- Global Variables: Shared state management issues
- No Error Handling: Missing exception management
- Deep Nesting: Complex conditional structures
- Duplicate Code: Repeated validation functions
- Dead Code: Unused functions and variables
- Long Parameter Lists: Maintainability issues
- Magic Numbers: Hardcoded values
- Inefficient Algorithms: O(n²) nested loops
- Memory Leaks: Unbounded cache growth
- Poor Data Structures: Suboptimal choices
- Syntax: Python syntax correctness
- Imports: Module import validation
- Complexity: Cyclomatic complexity analysis
- Security: Vulnerability detection
- Semantics: Code meaning and intent
- Performance: Efficiency analysis
- Functionality: Behavioral correctness
- Edge Cases: Boundary condition testing
- Error Handling: Exception management
- Integration: Component interaction
- Code Smells: Reduction in detected issues
- Coupling: Improved module independence
- Cohesion: Better relatedness of code elements
- Maintainability: Enhanced readability and structure
After autonomous refactoring, the code should have:
- Parameterized queries to prevent SQL injection
- Safe subprocess usage with proper validation
- Secure serialization methods
- Environment-based configuration
- Input sanitization and validation
- Single Responsibility Principle compliance
- Dependency injection implementation
- Proper abstraction layers
- Interface segregation
- Loose coupling design
- Eliminated code smells
- Reduced complexity
- Improved naming conventions
- Better error handling
- Comprehensive documentation
- Optimized algorithms
- Efficient data structures
- Proper memory management
- Reduced computational complexity
The demo generates several output files:
- best_refactored.py - Final optimized version
- refactored_v{1-5}.py - Iteration snapshots
- claude_autonomous.log - Detailed workflow logs
- claude-autonomous-output.log - Console output
- test_*.py - Auto-generated test files
Each iteration includes:
- Analysis: code2llm examines current state
- Refactoring: Claude Code generates improvements
- Validation: vallm assesses code quality
- Testing: Runtime validation of functionality
- Scoring: Overall quality assessment
- Decision: Continue if not perfect
The loop continues until:
- All validation criteria pass ✅
- All runtime tests pass ✅
- No code smells detected ✅
- Maximum iterations reached (5)
⚠️
- Validation errors guide specific fixes
- Test failures inform corrections
- Code analysis targets remaining issues
- Progressive improvement strategy
- Iteration-by-iteration scoring
- Best version automatic selection
- Improvement metrics visualization
- Comparative analysis
- Maximum iterations limit
- Validation weight adjustments
- Test coverage requirements
- Quality thresholds
This autonomous workflow is ideal for:
- Legacy Modernization: Updating old codebases
- Security Hardening: Vulnerability remediation
- Performance Optimization: Algorithmic improvements
- Code Quality: Maintaining high standards
- Technical Debt: Systematic reduction
Running this demo demonstrates:
- Autonomous Refactoring: AI-driven code improvement
- Multi-Tool Integration: Combining specialized tools
- Quality Assurance: Comprehensive validation strategies
- Iterative Development: Progressive enhancement approach
- Best Practices: Modern software engineering principles
Potential improvements to the workflow:
- Multi-Model Support: Integration with other LLMs
- Custom Validation: Domain-specific rules
- Performance Benchmarks: Quantitative measurements
- Code Review Integration: Team collaboration features
- CI/CD Pipeline: Automated deployment integration
For questions or issues with this demo:
- Check the generated log files
- Verify ANTHROPIC_API_KEY is set correctly
- Ensure Docker is running properly
- Review the troubleshooting section in logs
Enjoy the autonomous refactoring experience! 🚀