@kaiming-cheng (Contributor)
This PR introduces OptimizationWorker from opt_worker.py. The OptimizationWorker class integrates the modular components from opt_worker_components, demonstrating end-to-end usage of the optimization pipeline.

Changes

opt_worker.py introduces OptimizationWorker, a hardware-aware optimization worker that orchestrates the full optimization pipeline.

bottleneck_analyzer.py adds a new class that interfaces with the modular components in opt_worker_component/diagnose_prompt, wrapping the Judge LLM workflow for dual-bottleneck analysis.
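A rough sketch of what such a wrapper could look like (the class shape, method names, and response format below are illustrative placeholders, not the actual interface in this PR):

```python
from dataclasses import dataclass

@dataclass
class BottleneckReport:
    """Illustrative result shape for a dual-bottleneck analysis."""
    primary: str      # e.g. "memory-bound"
    secondary: str    # e.g. "latency-bound"
    evidence: str     # the judge's reasoning, kept for logging

class BottleneckAnalyzer:
    """Hypothetical wrapper around a Judge LLM for bottleneck diagnosis."""

    def __init__(self, judge):
        # `judge` is any callable mapping a prompt string to a response string,
        # so the LLM backend can be swapped or faked in tests.
        self.judge = judge

    def analyze(self, ncu_metrics: dict) -> BottleneckReport:
        # Build a diagnosis prompt from the NCU profile and query the judge.
        prompt = (
            "Classify the primary and secondary performance bottleneck "
            "from these NCU metrics:\n" + str(ncu_metrics)
        )
        response = self.judge(prompt)
        # A real implementation would parse a structured (e.g. JSON) reply;
        # here we assume a simple "primary|secondary|evidence" string.
        primary, secondary, evidence = response.split("|", 2)
        return BottleneckReport(primary, secondary, evidence)
```

With a fake judge this runs standalone, which is also how the wrapper could be unit-tested without an LLM call.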

worker_util.py extracts shared utility functions used by both VerificationWorker and OptimizationWorker.

Test

worker = OptimizationWorker(
    worker_id=0,
    workdir=workdir,
    log_dir=log_dir,
    max_rounds=5,
    openai_model="gpt-5",
    high_reasoning_effort=True,
    # Hardware-aware parameters
    gpu_name=None,  # Auto-detect GPU
    enable_ncu_profiling=True,
    bottleneck_id=1,  # Focus on primary bottleneck
    # Benchmarking parameters
    benchmark_warmup=25,
    benchmark_repeat=100,
    # Performance safeguards
    divergence_threshold=50.0,  # Revert if 50% worse
    target_platform="cuda",
)

success, best_kernel, metrics = worker.optimize_kernel(
    kernel_code=kernel_code,
    problem_file=problem_file,
    test_code=test_code,
)
2026-01-18 13:00:06,083 - opt_worker_0 - INFO - [1] Profiling current kernel with NCU...
2026-01-18 13:00:27,331 - opt_worker_0 - INFO - ✅ NCU profiling completed for round 1
2026-01-18 13:00:27,331 - opt_worker_0 - INFO - [1] Analyzing bottleneck...
2026-01-18 13:04:27,073 - opt_worker_0 - INFO - [1] Bottleneck analysis complete: primary=memory-bound
2026-01-18 13:04:27,073 - opt_worker_0 - INFO - [1] Generating optimized kernel...
2026-01-18 13:06:47,056 - opt_worker_0 - INFO - [1] Verifying correctness...
2026-01-18 13:08:34,780 - opt_worker_0 - INFO - [1] ✅ Correctness check passed
2026-01-18 13:08:38,346 - opt_worker_0 - INFO - [1] 🎉 NEW BEST! 0.2761 ms (speedup: 1.09x, improvement: 7.9%)
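The round structure visible in the log above — profile, analyze, generate, verify, benchmark, then keep or discard — can be sketched roughly as follows. All helper names here are placeholders passed in via a `steps` dict, not the real API; the point is the keep-or-revert logic around `divergence_threshold`:

```python
def optimize_loop(kernel_code, max_rounds, divergence_threshold, steps):
    """Sketch of the per-round optimization loop (illustrative only).

    `steps` bundles placeholder callables: profile, analyze, generate,
    verify (bool), and benchmark (returns time in ms).
    """
    best_kernel = kernel_code
    best_time = steps["benchmark"](kernel_code)  # baseline timing
    for round_idx in range(1, max_rounds + 1):
        profile = steps["profile"](best_kernel)            # NCU profiling
        bottleneck = steps["analyze"](profile)             # Judge LLM diagnosis
        candidate = steps["generate"](best_kernel, bottleneck)
        if not steps["verify"](candidate):                 # correctness gate
            continue
        t = steps["benchmark"](candidate)
        # Performance safeguard: discard candidates that regress more than
        # divergence_threshold percent past the current best.
        if t > best_time * (1 + divergence_threshold / 100):
            continue
        if t < best_time:                                  # new best kernel
            best_kernel, best_time = candidate, t
    return best_kernel, best_time
```

With `divergence_threshold=50.0`, a candidate 50% slower than the current best is dropped rather than adopted, matching the "Revert if 50% worse" comment in the configuration above.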

Kaiming Cheng added 30 commits January 15, 2026 11:44
Consolidates the previous kernel_benchmark.py and pytorch_benchmark.py into a
streamlined 3-file architecture with clear separation of concerns:

Architecture:
- benchmark.py (299 lines): Main Benchmark class with simplified API
  - benchmark_kernel(): Always uses subprocess for crash protection
  - benchmark_pytorch(): Always uses direct mode for stable code
  - BenchmarkLockManager: GPU lock management for multi-worker scenarios

- timing.py (437 lines): Complete timing infrastructure
  - Timing: time_with_cuda_events(), time_with_triton_do_bench()
  - Loading: prepare_pytorch_model(), load_kernel_function()
  - Stats: compute_timing_stats() with essential metrics (mean/std/min/max)

- kernel_subprocess.py (442 lines): Subprocess runner for kernel isolation
  - Crash protection for potentially buggy kernels
  - Clean CUDA state between runs
  - Timeout handling
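The crash-protection idea behind benchmark_kernel — run the candidate kernel in a child process so a segfault or hang cannot take down the worker — can be sketched with the standard library. This is a simplified placeholder (the child script here does no real timing); the actual kernel_subprocess.py additionally manages CUDA state and richer result reporting:

```python
import json
import subprocess
import sys

def run_isolated(kernel_file: str, timeout_s: float = 120.0):
    """Run a (possibly buggy) kernel benchmark in a child process.

    Returns a dict with either {'time_ms': ...} or {'error': ...}; a crash,
    OOM, or hang in the kernel never kills the parent worker process.
    """
    # Placeholder child script: a real runner would load `kernel_file`,
    # time it with CUDA events, and print its results as JSON on stdout.
    child_code = (
        "import json\n"
        "result = {'time_ms': 0.0}  # placeholder timing\n"
        "print(json.dumps(result))\n"
    )
    try:
        proc = subprocess.run(
            [sys.executable, "-c", child_code, kernel_file],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"error": "timeout"}            # hung kernel: child is killed
    if proc.returncode != 0:                   # segfault, import error, ...
        return {"error": proc.stderr.strip() or f"exit {proc.returncode}"}
    return json.loads(proc.stdout)
```

Each run gets a fresh interpreter (and hence a fresh CUDA context), which is what gives the "clean CUDA state between runs" property listed above.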

Key improvements:
- Eliminated string code generation (was generating Python as strings)
- Removed unnecessary statistics (median, p25/p75/p95/p99)
- Removed confusing use_subprocess parameter (behavior now deterministic)
- Fixed dtype bug causing incorrect speedup measurements
- Reduced from 5 files to 3 files with clearer naming
- Code reduction: ~1,400 lines → 1,178 lines

Simple API:
  bench = Benchmark(logger, temp_dir, lock, worker_id)
  pytorch_result = bench.benchmark_pytorch(problem_file)
  kernel_result = bench.benchmark_kernel(kernel_file, problem_file)
  speedup = pytorch_result['stats']['mean'] / kernel_result['time_ms']
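Since the retained statistics are just mean/std/min/max, compute_timing_stats can be small; a minimal version (the dict shape is assumed from the API example above, and the real function may differ) might be:

```python
import statistics

def compute_timing_stats(times_ms):
    """Reduce raw per-iteration timings (ms) to the retained metrics.

    Assumed stats shape: {'mean', 'std', 'min', 'max'} — the percentile
    statistics (median, p25/p75/p95/p99) were deliberately dropped.
    """
    return {
        "mean": statistics.fmean(times_ms),
        "std": statistics.stdev(times_ms) if len(times_ms) > 1 else 0.0,
        "min": min(times_ms),
        "max": max(times_ms),
    }
```

Under this assumed shape, the speedup line in the API example divides the PyTorch mean time by the kernel's measured time.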
@meta-cla bot added the "CLA Signed" label on Jan 18, 2026
@kaiming-cheng kaiming-cheng changed the title [Optimization 6/n] Add Optimization worker [Optimization 6/n] Introduce Optimization Worker Jan 18, 2026