Add Configurable Distractor / Irrelevant Code Generation to Variant Generator #6

@wlu03

Goal

Enhance the variant generator to inject controlled amounts of irrelevant (distractor) code around the core performance pattern. This tests the LLM's ability to identify and localize the actual bottleneck in a noisy, more realistic codebase, a skill that matters in real-world optimization work. The current patterns are relatively clean; adding distractors should make the benchmark harder and more representative of repository-level optimization tasks.

Requirements

  • Add a new parameter noise_level (none | low | medium | high) to the variant generation system.
  • Support both single-file and multi-file distractor injection.
  • Distractors must not affect functional correctness or the ground-truth performance measurement of the target hotspot.
  • All variants must remain fully reproducible via seeds.
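
A minimal sketch of how noise_level and seeding could fit together, written in Python as an assumption about the generator's language. The names NoiseLevel, DISTRACTOR_BUDGET, DISTRACTOR_KINDS, and plan_distractors are hypothetical, and the per-level counts are placeholders since this issue does not specify them.

```python
import random
from enum import Enum
from typing import List


class NoiseLevel(Enum):
    NONE = "none"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical budgets: how many distractors each level injects.
# The issue does not define these numbers; tune them per benchmark.
DISTRACTOR_BUDGET = {
    NoiseLevel.NONE: 0,
    NoiseLevel.LOW: 2,
    NoiseLevel.MEDIUM: 5,
    NoiseLevel.HIGH: 10,
}

# Hypothetical labels for the distractor types listed in this issue.
DISTRACTOR_KINDS = [
    "dead_code",
    "boilerplate",
    "similar_low_impact",
    "unrelated_helper",
    "red_herring",
    "comments_and_imports",
]


def plan_distractors(noise_level: str, seed: int) -> List[str]:
    """Return the distractor kinds to inject for one variant."""
    level = NoiseLevel(noise_level)  # raises ValueError on an unknown level
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return [rng.choice(DISTRACTOR_KINDS) for _ in range(DISTRACTOR_BUDGET[level])]
```

Using a dedicated random.Random(seed) instance rather than the module-level functions keeps the plan reproducible even if other parts of the generator also draw random numbers.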

Types of Distractors to Include

  • Dead / unused functions and variables
  • Boilerplate code (argument parsing, config loading, logging, error handling)
  • Semantically similar but low-impact code (e.g., another loop on small data)
  • Unrelated helper classes or utilities
  • Red herring functions that look optimizable but have negligible runtime impact
  • Comments, documentation, and unused imports
  • For multi-file variants: spread distractors across headers, utils, and main files
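
For the multi-file case, one possible approach is to assign each planned distractor to a file role with a seeded RNG. The sketch below is illustrative only: FILE_ROLES and spread_distractors are hypothetical names, and the three roles simply mirror the headers, utils, and main files mentioned above.

```python
import random
from typing import Dict, List

# Hypothetical file roles, taken from the wording of this issue.
FILE_ROLES = ["header", "utils", "main"]


def spread_distractors(kinds: List[str], seed: int) -> Dict[str, List[str]]:
    """Assign each distractor kind to a file role, reproducibly."""
    rng = random.Random(seed)  # local RNG keeps placement deterministic per seed
    placement: Dict[str, List[str]] = {role: [] for role in FILE_ROLES}
    for kind in kinds:
        placement[rng.choice(FILE_ROLES)].append(kind)
    return placement
```

Every distractor lands in exactly one file, so the total injected count matches the plan regardless of how the seed distributes them.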
