**Problem**: There is no proper way to measure experiments.

**Task**: Build a small web UI for validation.

**What to implement**

- Show candidate pairs
- Ask a human to mark each pair: correct / wrong
- Export the gold standard
- Compute precision / recall / F1 (if any gold labels exist)
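The export and metric steps could be sketched roughly as follows. This is a minimal illustration, not a prescribed design: the pair representation (tuples of string IDs), the `gold` mapping from pair to a boolean label, the CSV column names, and the function names `normalize`, `export_gold`, and `evaluate` are all assumptions. Only pairs that a human has marked are scored, so recall here is relative to the known gold positives, not to all true matches in the data.

```python
import csv
from typing import Dict, Iterable, Optional, TextIO, Tuple

Pair = Tuple[str, str]  # assumed representation: (left_id, right_id)


def normalize(pair: Pair) -> Pair:
    """Treat (a, b) and (b, a) as the same pair (assumes pairs are unordered)."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def export_gold(gold: Dict[Pair, bool], out: TextIO) -> None:
    """Write human labels as CSV rows; the column names are hypothetical."""
    writer = csv.writer(out)
    writer.writerow(["left_id", "right_id", "label"])
    for (a, b), is_correct in sorted(gold.items()):
        writer.writerow([a, b, "correct" if is_correct else "wrong"])


def evaluate(predicted: Iterable[Pair],
             gold: Dict[Pair, bool]) -> Optional[Dict[str, float]]:
    """Return precision, recall and F1, or None when there is no gold at all."""
    if not gold:
        return None
    gold_norm = {normalize(p): label for p, label in gold.items()}
    predicted_set = {normalize(p) for p in predicted}

    # Only predicted pairs that a human has judged can be counted as TP or FP.
    judged = {p for p in predicted_set if p in gold_norm}
    tp = sum(1 for p in judged if gold_norm[p])
    fp = len(judged) - tp
    # Gold-positive pairs that the experiment failed to propose.
    fn = sum(1 for p, label in gold_norm.items()
             if label and p not in predicted_set)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

In a web UI, `evaluate` would be called with the current experiment's candidate pairs and the labels collected so far, and `export_gold` with an open file or response stream. Returning `None` for an empty gold set lets the UI hide the metrics panel until at least one pair has been marked.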