- Build classifier for common failure types:
  - Over-optimization (slowdown)
  - Correctness regression
  - Missed semantic opportunity
  - Added complexity without gain
  - Numerical instability
  - Race conditions (in parallel patterns)
- Generate automated failure reports and statistics per category/model.
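
One possible shape for such a classifier is a small rule-based pass over per-run measurements, followed by a per-model tally. The sketch below is only illustrative: the `RunResult` fields, the threshold constants, and the function names `classify` and `failure_report` are all assumptions made for this example, not part of any existing tool, and a real classifier would likely need richer signals (profiles, diffs, sanitizer output).

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Optional


class FailureType(Enum):
    OVER_OPTIMIZATION = "over-optimization (slowdown)"
    CORRECTNESS_REGRESSION = "correctness regression"
    MISSED_SEMANTIC_OPPORTUNITY = "missed semantic opportunity"
    ADDED_COMPLEXITY = "added complexity without gain"
    NUMERICAL_INSTABILITY = "numerical instability"
    RACE_CONDITION = "race condition"


# Hypothetical thresholds; tune them for the benchmark at hand.
SLOWDOWN_MARGIN = 0.05   # speedup below 0.95x counts as a slowdown
MIN_GAIN = 0.05          # speedup below 1.05x counts as "no gain"
ERROR_TOLERANCE = 1e-6   # max absolute deviation from the reference output
MISSED_FRACTION = 0.5    # below half of a known-achievable speedup


@dataclass
class RunResult:
    """Hypothetical record of one optimization attempt by one model."""
    model: str
    speedup: float                            # baseline time / optimized time
    tests_passed: bool
    max_abs_error: float = 0.0
    nondeterministic: bool = False            # output varies across parallel runs
    complexity_delta: int = 0                 # change in a complexity metric
    reference_speedup: Optional[float] = None # known-achievable speedup, if any


def classify(r: RunResult) -> List[FailureType]:
    """Return every failure category that applies to a single run."""
    labels: List[FailureType] = []
    if r.nondeterministic:
        labels.append(FailureType.RACE_CONDITION)
    if not r.tests_passed:
        labels.append(FailureType.CORRECTNESS_REGRESSION)
    elif r.max_abs_error > ERROR_TOLERANCE:
        labels.append(FailureType.NUMERICAL_INSTABILITY)
    if r.speedup < 1.0 - SLOWDOWN_MARGIN:
        labels.append(FailureType.OVER_OPTIMIZATION)
    elif r.complexity_delta > 0 and r.speedup < 1.0 + MIN_GAIN:
        labels.append(FailureType.ADDED_COMPLEXITY)
    if (r.reference_speedup is not None
            and r.speedup < MISSED_FRACTION * r.reference_speedup):
        labels.append(FailureType.MISSED_SEMANTIC_OPPORTUNITY)
    return labels


def failure_report(results: Iterable[RunResult]) -> Dict[str, Counter]:
    """Count failures per model and per category."""
    report: Dict[str, Counter] = {}
    for r in results:
        report.setdefault(r.model, Counter()).update(classify(r))
    return report


if __name__ == "__main__":
    runs = [
        RunResult("model-a", speedup=0.8, tests_passed=True),
        RunResult("model-a", speedup=1.5, tests_passed=False),
        RunResult("model-b", speedup=1.02, tests_passed=True, complexity_delta=3),
    ]
    for model, counts in failure_report(runs).items():
        for failure, n in counts.items():
            print(f"{model}: {failure.value} x{n}")
```

A run can carry several labels at once (for example, a racy kernel that is also slower), which keeps the per-category statistics independent of one another. Whether labels should instead be mutually exclusive is a design choice left open here.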