Run CPUFJ bursts at the root #1179
Conversation
/ok to test 64ba333
📝 Walkthrough
Introduces CPUFJ (CPU-based feasibility-jump heuristic) task execution into the root-cut processing loop via new structured task types, refactored initialization and solver extensions supporting work-unit limits, a factory API for task creation, and OpenMP task-based orchestration within branch-and-bound with explicit lifecycle management via scope guards.
Changes
CPU Feasibility-Jump Root-Cut Integration
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
⚠️ Warning: Review ran into problems.
🔥 Problems: Git: Failed to clone repository. Please run the
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu (1)
1768-1775: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Enforce `work_unit_limit` on the actual consumed work.
`work_units_elapsed` is only accrued and checked every 100 iterations, and the last partial batch is never flushed before exit. For the short root-cut bursts this PR adds, that means a small `work_unit_limit` can be overshot by a wide margin while still underreporting the final work consumed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu` around lines 1768 - 1775, The code only updates fj_cpu.work_units_elapsed (via fj_cpu.memory_aggregator.collect() and applying fj_cpu.work_unit_bias) every 100 iterations, causing the final partial batch to be unaccounted and allowing work_unit_limit to be overshot; modify the loop so that you flush and apply the remaining memory statistics on every iteration boundary that may exit: call auto [loads, stores] = fj_cpu.memory_aggregator.collect(), compute biased_work = (loads + stores) * fj_cpu.work_unit_bias / 1e10, add it to fj_cpu.work_units_elapsed, call fj_cpu.producer_sync->notify_progress() if non-null, and then check fj_cpu.work_units_elapsed >= work_unit_limit to break—either by moving this logic out of the 100-iteration guard into the common path or by ensuring you perform one final collect/update/notify/check right before breaking/returning so the true consumed work is enforced.
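For reference, here is a self-contained sketch of the flush-on-exit accounting this comment asks for. The member names (memory_aggregator, work_unit_bias, work_units_elapsed, preemption_flag) and the 1e10 scaling are taken from the review text; the mock types below are placeholders, not the actual fj_cpu_t from fj_cpu.cuh.

```cpp
#include <atomic>
#include <cstdint>
#include <utility>

// Placeholder stand-ins for the fj_cpu_t members named in the review; the real
// types in fj_cpu.cuh will differ.
struct memory_aggregator_t {
  std::atomic<uint64_t> loads{0}, stores{0};
  std::pair<uint64_t, uint64_t> collect() { return {loads.exchange(0), stores.exchange(0)}; }
};

struct fj_cpu_mock_t {
  memory_aggregator_t memory_aggregator;
  double work_unit_bias     = 1.0;
  double work_units_elapsed = 0.0;
  bool halted               = false;
  std::atomic<bool> preemption_flag{false};
};

// Drain the aggregator and convert loads/stores into work units (scaled as in
// the review prompt), so the caller always sees the work actually consumed.
inline void flush_work_units(fj_cpu_mock_t& fj_cpu)
{
  auto [loads, stores] = fj_cpu.memory_aggregator.collect();
  fj_cpu.work_units_elapsed +=
    static_cast<double>(loads + stores) * fj_cpu.work_unit_bias / 1e10;
}

void solve_loop_sketch(fj_cpu_mock_t& fj_cpu, double work_unit_limit)
{
  for (long iter = 0; !fj_cpu.halted && !fj_cpu.preemption_flag.load(); ++iter) {
    // Placeholder for one feasibility-jump iteration; the real loop records
    // memory traffic into the aggregator as a side effect.
    fj_cpu.memory_aggregator.loads += 1;
    fj_cpu.memory_aggregator.stores += 1;

    if (iter % 100 == 99) {  // periodic accounting, as in the current code
      flush_work_units(fj_cpu);
      if (fj_cpu.work_units_elapsed >= work_unit_limit) break;
    }
  }
  // Final flush: without this, the last partial batch is never reported and a
  // small work_unit_limit can be overshot while underreporting consumed work.
  flush_work_units(fj_cpu);
}
```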
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/src/branch_and_bound/branch_and_bound.cpp`:
- Around line 2231-2237: The callback root_cut_cpufj_improvement_callback is
decoding the host-LP assignment against the live original_lp_ (via
uncrush_primal_solution) which can be concurrently mutated by
add_cuts/remove_cuts; capture and use the exact LP snapshot used to build the
CPUFJ task (or change the task to emit user-space assignments directly) so
uncrush_primal_solution runs against an immutable snapshot, and add
synchronization (or assert/guideline comment) around shared state mutation to
flag the missing concurrency protection for original_problem_/original_lp_
before calling set_new_solution().
- Around line 2254-2256: These return paths call set_solution_at_root(...) and
return immediately, which bypasses the existing async clique cleanup; insert a
call to finish_clique_thread() immediately before each early return so the async
clique task is stopped and joined. Specifically, add finish_clique_thread() just
prior to the set_solution_at_root(solution, cut_info); return
mip_status_t::OPTIMAL; sequence shown, and make the same change at the other
analogous exit that calls set_solution_at_root and returns (the second location
referenced in the comment).
In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu`:
- Around line 1881-1886: The task restart fails because run_fj_cpu_task only
clears fj_cpu.halted but not fj_cpu.preemption_flag, so after stop_fj_cpu_task
sets preemption_flag = true subsequent runs immediately exit; update
run_fj_cpu_task (and the duplicate block around the other overload at the
1890–1896 region) to reset fj_cpu.preemption_flag to false before calling
cpufj_solve_loop (i.e., clear the atomic preemption flag on the fj_cpu within
the fj_cpu_task_t<i_t,f_t> instance so the while (!fj_cpu.halted &&
!fj_cpu.preemption_flag.load()) loop can run again).
---
Outside diff comments:
In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu`:
- Around line 1768-1775: The code only updates fj_cpu.work_units_elapsed (via
fj_cpu.memory_aggregator.collect() and applying fj_cpu.work_unit_bias) every 100
iterations, causing the final partial batch to be unaccounted and allowing
work_unit_limit to be overshot; modify the loop so that you flush and apply the
remaining memory statistics on every iteration boundary that may exit: call auto
[loads, stores] = fj_cpu.memory_aggregator.collect(), compute biased_work =
(loads + stores) * fj_cpu.work_unit_bias / 1e10, add it to
fj_cpu.work_units_elapsed, call fj_cpu.producer_sync->notify_progress() if
non-null, and then check fj_cpu.work_units_elapsed >= work_unit_limit to
break—either by moving this logic out of the 100-iteration guard into the common
path or by ensuring you perform one final collect/update/notify/check right
before breaking/returning so the true consumed work is enforced.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 86405377-b8f0-4c1e-888b-d909dd15851b
📒 Files selected for processing (5)
cpp/src/branch_and_bound/branch_and_bound.cpp
cpp/src/mip_heuristics/feasibility_jump/cpu_fj_thread.cuh
cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu
cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cuh
cpp/src/mip_heuristics/local_search/local_search.cu
auto root_cut_cpufj_improvement_callback =
  [this](f_t obj, const std::vector<f_t>& assignment, double) {
    std::vector<f_t> user_assignment;
    uncrush_primal_solution(original_problem_, original_lp_, assignment, user_assignment);
    settings_.log.debug("Root cut CPUFJ found solution with objective %.16e\n", obj);
    set_new_solution(user_assignment);
  };
Uncrush against the task’s LP snapshot, not the live root LP.
assignment comes from the host-LP snapshot used to build the CPUFJ task, but this callback decodes it against original_lp_, which the concurrent cut pass is still mutating. That can remap cut/slack columns incorrectly or race with add_cuts/remove_cuts, so a valid CPUFJ incumbent can be corrupted before set_new_solution() sees it. Please either capture the exact LP snapshot used for task creation or make the CPUFJ callback emit user-space assignments directly.
As per coding guidelines, flag missing synchronization for shared state in concurrent code.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 2231 - 2237, The
callback root_cut_cpufj_improvement_callback is decoding the host-LP assignment
against the live original_lp_ (via uncrush_primal_solution) which can be
concurrently mutated by add_cuts/remove_cuts; capture and use the exact LP
snapshot used to build the CPUFJ task (or change the task to emit user-space
assignments directly) so uncrush_primal_solution runs against an immutable
snapshot, and add synchronization (or assert/guideline comment) around shared
state mutation to flag the missing concurrency protection for
original_problem_/original_lp_ before calling set_new_solution().
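One way to address this is sketched below: capture shared, immutable snapshots when the CPUFJ task is built and let the callback uncrush against those instead of the live original_lp_. The types here (problem_snapshot_t, lp_snapshot_t) and the stand-in uncrush_primal_solution are illustrative placeholders, not the actual cuOpt API.

```cpp
#include <functional>
#include <memory>
#include <vector>

// Illustrative placeholder types; the real problem/LP types live in cuOpt.
struct problem_snapshot_t { /* immutable copy of the presolved problem */ };
struct lp_snapshot_t { /* immutable copy of the LP the CPUFJ task was built from */ };

// Placeholder for the real uncrush routine; only the "decode against a fixed
// snapshot" aspect matters here.
void uncrush_primal_solution(const problem_snapshot_t&,
                             const lp_snapshot_t&,
                             const std::vector<double>& assignment,
                             std::vector<double>& user_assignment)
{
  user_assignment = assignment;  // stand-in for the real column remapping
}

using improvement_callback_t =
  std::function<void(double obj, const std::vector<double>& assignment)>;

// Build the callback against snapshots captured when the CPUFJ task is created,
// so concurrent add_cuts/remove_cuts on the live root LP cannot remap columns
// underneath the callback.
improvement_callback_t make_root_cut_cpufj_callback(
  std::shared_ptr<const problem_snapshot_t> problem_snapshot,
  std::shared_ptr<const lp_snapshot_t> lp_snapshot,
  std::function<void(const std::vector<double>&)> set_new_solution)
{
  return [problem_snapshot, lp_snapshot, set_new_solution](
           double /*obj*/, const std::vector<double>& assignment) {
    std::vector<double> user_assignment;
    uncrush_primal_solution(*problem_snapshot, *lp_snapshot, assignment, user_assignment);
    set_new_solution(user_assignment);
  };
}
```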
if (num_fractional == 0) {
  set_solution_at_root(solution, cut_info);
  return mip_status_t::OPTIMAL;
Run the existing clique-thread cleanup on these new root-cut exits.
These return paths bypass finish_clique_thread(), unlike the earlier root-level exits in this function. That leaves the async clique task running past solve() on these paths.
Proposed cleanup
for (i_t cut_pass = 0; cut_pass < settings_.max_cut_passes; cut_pass++) {
if (num_fractional == 0) {
set_solution_at_root(solution, cut_info);
+ finish_clique_thread();
return mip_status_t::OPTIMAL;
}
@@
- if (cut_pass_result.action == cut_pass_action_t::RETURN) { return cut_pass_result.status; }
+ if (cut_pass_result.action == cut_pass_action_t::RETURN) {
+ finish_clique_thread();
+ return cut_pass_result.status;
+ }
Also applies to: 2540-2541
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 2254 - 2256,
These return paths call set_solution_at_root(...) and return immediately, which
bypasses the existing async clique cleanup; insert a call to
finish_clique_thread() immediately before each early return so the async clique
task is stopped and joined. Specifically, add finish_clique_thread() just prior
to the set_solution_at_root(solution, cut_info); return mip_status_t::OPTIMAL;
sequence shown, and make the same change at the other analogous exit that calls
set_solution_at_root and returns (the second location referenced in the
comment).
void run_fj_cpu_task(fj_cpu_task_t<i_t, f_t>& task, f_t time_limit, double work_unit_limit)
{
  cuopt_assert(task.fj_cpu != nullptr, "CPUFJ task has no climber");
  auto& fj_cpu = *task.fj_cpu;
  fj_cpu.halted = false;
  cpufj_solve_loop(fj_cpu, time_limit, work_unit_limit);
Keep fj_cpu_task_t restartable across bursts.
stop_fj_cpu_task() sets preemption_flag = true, but run_fj_cpu_task() only clears halted. If the same task is reused for multiple root-cut bursts, every run after the first stop will hit while (!fj_cpu.halted && !fj_cpu.preemption_flag.load()) and return immediately.
Suggested fix
template <typename i_t, typename f_t>
void run_fj_cpu_task(fj_cpu_task_t<i_t, f_t>& task, f_t time_limit, double work_unit_limit)
{
cuopt_assert(task.fj_cpu != nullptr, "CPUFJ task has no climber");
auto& fj_cpu = *task.fj_cpu;
+ task.preemption_flag.store(false);
fj_cpu.halted = false;
cpufj_solve_loop(fj_cpu, time_limit, work_unit_limit);
}
template <typename i_t, typename f_t>
void stop_fj_cpu_task(fj_cpu_task_t<i_t, f_t>& task)
{
if (task.fj_cpu) {
auto& fj_cpu = *task.fj_cpu;
- fj_cpu.preemption_flag = true;
fj_cpu.halted = true;
}
}
Also applies to: 1890-1896
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu` around lines 1881 - 1886,
The task restart fails because run_fj_cpu_task only clears fj_cpu.halted but not
fj_cpu.preemption_flag, so after stop_fj_cpu_task sets preemption_flag = true
subsequent runs immediately exit; update run_fj_cpu_task (and the duplicate
block around the other overload at the 1890–1896 region) to reset
fj_cpu.preemption_flag to false before calling cpufj_solve_loop (i.e., clear the
atomic preemption flag on the fj_cpu within the fj_cpu_task_t<i_t,f_t> instance
so the while (!fj_cpu.halted && !fj_cpu.preemption_flag.load()) loop can run
again).
| "[RootCut CPUFJ] "); | ||
| settings_.log.debug("Root cut CPUFJ final problem build time: %.6f seconds\n", | ||
| toc(root_cut_cpufj_build_start_time)); | ||
| f_t fj_time_limit = settings_.deterministic |
Why does the non-deterministic path get 1s?
namespace cuopt::linear_programming::detail {

template <typename f_t>
static f_t clamp_value(f_t value, f_t lower, f_t upper)
I think there is std::clamp available
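Assuming clamp_value only clamps value into [lower, upper], std::clamp from <algorithm> (C++17) covers it, e.g.:

```cpp
#include <algorithm>  // std::clamp (C++17)

template <typename f_t>
static f_t clamp_value(f_t value, f_t lower, f_t upper)
{
  // Equivalent only if the existing helper assumes lower <= upper and needs no
  // special NaN handling; std::clamp has undefined behavior when lower > upper.
  return std::clamp(value, lower, upper);
}
```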
}

template <typename i_t, typename f_t>
static void rebuild_reverse_matrix(i_t n_variables,
We have a transpose function available in sparse_matrix.cpp; even though its parameters are a bit different, we should reuse it.
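For context, here is a generic sketch of the operation rebuild_reverse_matrix appears to perform, namely building the transposed (CSC-like) pattern of a CSR matrix. This is an assumption about its intent; the existing transpose in sparse_matrix.cpp should be preferred where its interface fits, and this standalone version is only illustrative.

```cpp
#include <vector>

// Generic CSR -> transposed-CSR rebuild via counting sort over columns.
template <typename i_t, typename f_t>
void transpose_csr(i_t n_rows,
                   i_t n_cols,
                   const std::vector<i_t>& row_offsets,  // size n_rows + 1
                   const std::vector<i_t>& col_indices,
                   const std::vector<f_t>& values,
                   std::vector<i_t>& t_row_offsets,  // size n_cols + 1
                   std::vector<i_t>& t_col_indices,
                   std::vector<f_t>& t_values)
{
  const i_t nnz = row_offsets[n_rows];
  t_row_offsets.assign(n_cols + 1, 0);
  t_col_indices.resize(nnz);
  t_values.resize(nnz);

  // Count entries per column of the original matrix, then prefix-sum.
  for (i_t k = 0; k < nnz; ++k) { ++t_row_offsets[col_indices[k] + 1]; }
  for (i_t c = 0; c < n_cols; ++c) { t_row_offsets[c + 1] += t_row_offsets[c]; }

  // Scatter entries; next[] tracks the insertion cursor per transposed row.
  std::vector<i_t> next(t_row_offsets.begin(), t_row_offsets.end() - 1);
  for (i_t r = 0; r < n_rows; ++r) {
    for (i_t k = row_offsets[r]; k < row_offsets[r + 1]; ++k) {
      const i_t pos      = next[col_indices[k]]++;
      t_col_indices[pos] = r;
      t_values[pos]      = values[k];
    }
  }
}
```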
cuopt::scope_guard root_cut_cpufj_guard([&]() { stop_root_cut_cpufj(); });

enum class cut_pass_action_t { CONTINUE, BREAK, RETURN };
struct cut_pass_result_t {
I am not sure I understand the need/logic for this. Can't we just launch a new cpufj task after each cut pass, wait until the next pass finishes, and launch another cpufj task while staying in the cut-pass loop?
Some primal heuristics, like CPUFJ, may benefit in some cases from operating on the problem with root cuts.
This PR adds support for this by opportunistically running bursts of a few CPUFJ iterations per cut pass on threads that would otherwise remain idle, and once more after the root cut passes are completed.
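A minimal, self-contained sketch of that orchestration shape is below. All names here (cpufj_task_t, run_cpufj_burst, do_cut_pass, ...) are placeholders rather than the actual cuOpt functions; the real code also wires in time limits, improvement callbacks, and scope-guard cleanup.

```cpp
#include <atomic>
#include <omp.h>

// Hypothetical stand-ins for the real CPUFJ task and cut-pass routines; only
// the "CPUFJ burst per cut pass on an idle thread" task shape is the point.
struct cpufj_task_t {
  std::atomic<bool> preemption_flag{false};
};

void run_cpufj_burst(cpufj_task_t& task, double work_unit_limit)
{
  double work = 0.0;
  while (!task.preemption_flag.load() && work < work_unit_limit) {
    work += 1.0;  // placeholder for feasibility-jump iterations
  }
}

void stop_cpufj_burst(cpufj_task_t& task) { task.preemption_flag.store(true); }

void do_cut_pass(int /*pass*/) { /* placeholder for separation + LP resolve */ }

void root_cut_loop_sketch(int max_cut_passes, double work_unit_limit)
{
  cpufj_task_t task;
#pragma omp parallel
#pragma omp single
  {
    for (int pass = 0; pass < max_cut_passes; ++pass) {
      task.preemption_flag.store(false);
// Launch the CPUFJ burst on an otherwise idle thread while this thread
// runs the cut pass.
#pragma omp task shared(task) firstprivate(work_unit_limit)
      run_cpufj_burst(task, work_unit_limit);

      do_cut_pass(pass);

      stop_cpufj_burst(task);  // ask the burst to stop before the next pass
#pragma omp taskwait             // join the burst so the task state can be reused
    }
  }
}
```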
In most cases, this provides no benefit, but some instances like `tutaki` find their first integer incumbent much faster. Additionally, `awhea` is now reliably feasibilized and solved to a gap of <5% on 10-minute runs.
Description
Issue
Checklist