Batch PDLP #791

base: main
Conversation
…ther finish with optimal
…and change initial to 128
📝 Walkthrough

This PR introduces batch processing capabilities to PDLP solvers and refactors hyper-parameters from static globals to runtime-configurable structures. Key changes include multi-climber support, new batch-aware data structures, integration with dual simplex branching, and extended termination/convergence tracking for multiple solutions.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 18
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
cpp/src/linear_programming/saddle_point.cu (1)
95-106: `copy()` only copies one batch.

With batched buffers, copying only `primal_size_`/`dual_size_` leaves the rest uninitialized. Update to copy the full buffer and assert batch compatibility.

✅ Suggested fix

```diff
   EXE_CUOPT_EXPECTS(this->primal_size_ == other.get_primal_size(),
                     "Size of primal solution must be the same in order to copy");
   EXE_CUOPT_EXPECTS(this->dual_size_ == other.get_dual_size(),
                     "Size of dual solution must be the same in order to copy");
+  EXE_CUOPT_EXPECTS(this->primal_solution_.size() == other.get_primal_solution().size(),
+                    "Batch size mismatch for primal solution");
+  EXE_CUOPT_EXPECTS(this->dual_solution_.size() == other.get_dual_solution().size(),
+                    "Batch size mismatch for dual solution");
-  raft::copy(
-    this->primal_solution_.data(), other.get_primal_solution().data(), this->primal_size_, stream);
-  raft::copy(
-    this->dual_solution_.data(), other.get_dual_solution().data(), this->dual_size_, stream);
+  raft::copy(this->primal_solution_.data(),
+             other.get_primal_solution().data(),
+             this->primal_solution_.size(),
+             stream);
+  raft::copy(this->dual_solution_.data(),
+             other.get_dual_solution().data(),
+             this->dual_solution_.size(),
+             stream);
```

cpp/src/linear_programming/solver_solution.cu (1)
25-33: Initialize `termination_stats_` in single-solution constructors.

These constructors now set `termination_status_` but leave `termination_stats_` empty; any call to `get_solve_time()` or per-id accessors will read out-of-bounds. Initialize `termination_stats_` with a single default entry so the non-batch invariants hold.

🐛 Proposed fix

```diff
 optimization_problem_solution_t<i_t, f_t>::optimization_problem_solution_t(
   pdlp_termination_status_t termination_status, rmm::cuda_stream_view stream_view)
   : primal_solution_{0, stream_view},
     dual_solution_{0, stream_view},
     reduced_cost_{0, stream_view},
+    termination_stats_(1),
     termination_status_{termination_status},
     error_status_(cuopt::logic_error("", cuopt::error_type_t::Success))
 {
 }

 optimization_problem_solution_t<i_t, f_t>::optimization_problem_solution_t(
   cuopt::logic_error error_status_, rmm::cuda_stream_view stream_view)
   : primal_solution_{0, stream_view},
     dual_solution_{0, stream_view},
     reduced_cost_{0, stream_view},
+    termination_stats_(1),
     termination_status_{pdlp_termination_status_t::NoTermination},
     error_status_(error_status_)
 {
 }
```

Also applies to: 37-44
cpp/src/linear_programming/termination_strategy/termination_strategy.cu (1)
318-352: Guard `per_constraint_residual` in batch (or store per-climber l∞ residuals).

The per-constraint path uses scalar `*relative_l_inf_primal_residual` / `*relative_l_inf_dual_residual` for every `idx`. In batch mode, climbers can diverge, so this can incorrectly mark all climbers as optimal/feasible based on a single residual. Consider disallowing `per_constraint_residual` when `batch_size > 1`, or store per-climber l∞ residuals and index by `idx`.

🐛 Proposed guard

```diff
 void pdlp_termination_strategy_t<i_t, f_t>::evaluate_termination_criteria(
   pdhg_solver_t<i_t, f_t>& current_pdhg_solver,
   rmm::device_uvector<f_t>& primal_iterate,
   rmm::device_uvector<f_t>& dual_iterate,
   const rmm::device_uvector<f_t>& dual_slack,
   rmm::device_uvector<f_t>& delta_primal_iterate,
   rmm::device_uvector<f_t>& delta_dual_iterate,
   i_t total_pdlp_iterations,
   const rmm::device_uvector<f_t>& combined_bounds,
   const rmm::device_uvector<f_t>& objective_coefficients)
 {
+  cuopt_expects(!(settings_.per_constraint_residual && climber_strategies_.size() > 1),
+                error_type_t::ValidationError,
+                "per_constraint_residual is not supported in batch mode");
   raft::common::nvtx::range fun_scope("Evaluate termination criteria");
```
🤖 Fix all issues with AI agents
In `@cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp`:
- Around line 243-244: The per-id accessors (get_objective_value,
get_dual_objective_value, get_additional_termination_information) currently
check bounds against termination_status_.size() but index termination_stats_, so
add a size invariant: in the constructors that set/move termination_stats_ and
termination_status_ (the class ctor(s) that populate these vectors) assert or
throw if termination_stats_.size() != termination_status_.size() to guarantee
synchronization, and update the accessors to check the size of the vector they
actually index (use termination_stats_.size() when accessing termination_stats_)
as a secondary safety; reference termination_stats_, termination_status_,
get_objective_value, get_dual_objective_value, and
get_additional_termination_information when making these changes.
In `@cpp/include/cuopt/linear_programming/solve.hpp`:
- Around line 17-18: The public header currently includes the internal header
and exposes dual_simplex::user_problem_t<i_t, f_t> in the batch_pdlp_solve()
signature; remove the internal include and stop exposing that internal type in
the public API by either (a) introducing a public-facing problem abstraction
(e.g., a public problem_t or ProblemView) and changing batch_pdlp_solve() to
accept that, or (b) forward-declaring a minimal public interface and adapting
the implementation to convert from the public type to
dual_simplex::user_problem_t inside the cpp implementation; update all
references to dual_simplex::user_problem_t in the header to use the new public
type or opaque forward declaration and move any internal conversions into the
source file.
In `@cpp/src/dual_simplex/simplex_solver_settings.hpp`:
- Around line 158-159: The field mip_batch_pdlp_strong_branching in class
SimplexSolverSettings is not initialized in the constructor initializer list;
add it (set to 0 by default) to the constructor's initializer list alongside
other i_t members (e.g., random_seed, inside_mip, num_bfs_workers) so
mip_batch_pdlp_strong_branching is deterministically initialized when a
SimplexSolverSettings instance is created.
In `@cpp/src/linear_programming/cusparse_view.cu`:
- Around line 238-264: The helper my_cusparsespmm_preprocess currently forwards
cusparseSpMM_preprocess return status without checking it and all its call sites
ignore that return, so preprocessing failures are silent; fix by either (A)
adding CUSPARSE_CHECK around cusparseSpMM_preprocess inside
my_cusparsespmm_preprocess so it logs/throws on error and then return
CUSPARSE_STATUS_SUCCESS, or (B) leave the function returning cusparseStatus_t
and wrap every call site of my_cusparsespmm_preprocess with RAFT_CUSPARSE_TRY
(consistent with other CUSPARSE ops in this file) to propagate errors; update
symbols: my_cusparsespmm_preprocess and each call site that invokes it so
failures are not ignored.
In
`@cpp/src/linear_programming/optimal_batch_size_handler/optimal_batch_size_handler.cu`:
- Around line 77-78: Local rmm::device_buffer variables buffer_transpose_batch
and buffer_non_transpose_batch declared in the constructor shadow the class
member buffers and are destroyed on constructor exit, causing the CUDA graph to
reference freed memory; fix by removing the local declarations and
assigning/constructing the buffers into the existing member variables (the
member buffer_transpose_batch and buffer_non_transpose_batch) so the memory
remains valid for the SpMM preprocessing, CUDA graph capture in the constructor,
and later launch() calls.
- Around line 240-250: left_node can become 0 when current_batch_size is 1,
which would pass an invalid batch size to evaluate_node; change the
computation/use of left_node in the optimal_batch_size_handler so it's never
zero (for example assign left_node = std::max(1, current_batch_size / 2) or skip
calling evaluate_node when left_node == 0) and use that non-zero value when
calling evaluate_node<i_t,f_t>(...). Ensure you update any related logic that
assumes left_node > 0 (references: left_node, current_batch_size,
evaluate_node).
In `@cpp/src/linear_programming/pdhg.cu`:
- Around line 97-109: When new_bounds is provided, validate that
new_bounds.size() equals climber_strategies.size() to avoid out‑of‑bounds
accesses in the refine kernels; inside the constructor or the place where
new_bounds is consumed (the block that fills idx/lower/upper and copies into
new_bounds_idx_, new_bounds_lower_, new_bounds_upper_), add a check that throws
or returns an error if sizes differ (e.g., if (new_bounds.size() !=
climber_strategies.size()) { /* error */ }), and only proceed to build the
temporary vectors and call raft::copy when the sizes match; reference
new_bounds, climber_strategies, new_bounds_idx_, new_bounds_lower_, and
new_bounds_upper_ when implementing this validation.
- Around line 765-773: The three CUDA kernel launches
(refine_initial_primal_projection_kernel,
refine_primal_projection_major_batch_kernel,
refine_primal_projection_batch_kernel) must be followed by an immediate
RAFT_CUDA_TRY(cudaPeekAtLastError()); to surface silent GPU failures; update
each launch site (the calls that pass stream_view_ and device_span<> arguments
and return to host) to insert RAFT_CUDA_TRY(cudaPeekAtLastError()); right after
the <<<...>>> call so any launch/async errors are detected before continuing.
In `@cpp/src/linear_programming/pdlp_constants.hpp`:
- Around line 17-22: The function kernel_config_from_batch_size must guard
against batch_size == 0 to avoid computing block_size == 0 and calling
cuda::ceil_div with a zero divisor; add an early check at the top of
kernel_config_from_batch_size (or an assert) that returns a safe pair (e.g.,
std::make_pair(0u, 0u)) or triggers a clear precondition failure when batch_size
is 0, and only compute block_size and grid_size via std::min and cuda::ceil_div
when batch_size > 0 so no divide-by-zero occurs.
In
`@cpp/src/linear_programming/restart_strategy/localized_duality_gap_container.cu`:
- Around line 107-115: The resize_context method in
localized_duality_gap_container_t currently rejects new_size equal to batch_size
even though the message says "less than or equal"; update the assertion in
localized_duality_gap_container_t::resize_context to allow no-op resizes by
changing the check to require new_size <= batch_size (and ensure the associated
error message remains accurate), so callers that pass an unchanged size are
accepted.
In `@cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu`:
- Around line 1038-1054: The resize_context method in pdlp_restart_strategy_t
currently asserts new_size < batch_size which rejects equality despite the
message; change the assertion to allow equality (new_size <= batch_size) so a
no-op resize (new_size == batch_size) is permitted, and update the assertion
message if desired to match the condition; modify the cuopt_assert that
references batch_size in resize_context to use <= instead of < to fix the
behavior.
- Around line 689-785: In pdlp_restart_strategy_t<i_t,
f_t>::should_cupdlpx_restart the final std::copy currently overwrites the
current fixed_point_error_ with last_trial_fixed_point_error_, losing fresh
errors; reverse the copy to update last_trial_fixed_point_error_ from the
current fixed_point_error_ (i.e. copy from fixed_point_error_.begin()/end() into
last_trial_fixed_point_error_.begin()), and ensure the destination vector size
matches the source before copying.
In `@cpp/src/linear_programming/solve.cu`:
- Around line 707-724: The vector info of type
optimization_problem_solution_t<i_t, f_t>::additional_termination_information_t
is default-constructed and then accessed at info[0], causing out-of-bounds
access; fix by sizing or populating info before use (e.g., call info.resize(1)
or info.emplace_back()) and then set info[0].primal_objective,
info[0].number_of_steps_taken and info[0].solve_time, so the std::move(info)
passed into the optimization_problem_solution_t constructor is valid; adjust
around symbols info, additional_termination_information_t, vertex_solution, and
start_solver in the crossover/termination block.
- Around line 556-566: The row-sense handling is inverted: for
user_problem.row_sense == 'G' (>=) you must set constraint_lower[i] =
user_problem.rhs[i] and constraint_upper[i] = +infinity, and for 'L' (<=) set
constraint_lower[i] = -infinity and constraint_upper[i] = user_problem.rhs[i];
update the loop that writes into constraint_lower and constraint_upper (the
block using user_problem.row_sense, m, rhs, and f_t/
std::numeric_limits<f_t>::infinity()) to swap the assignments for 'G' and 'L'
accordingly while leaving the equality branch unchanged.
- Around line 750-783: The warm-start buffers/values are typed as double but
must match the template precision; change initial_primal and initial_dual from
rmm::device_uvector<double> to rmm::device_uvector<f_t2> (use the existing using
f_t2 = typename type_2<f_t>::type), and change initial_step_size and
initial_primal_weight from double to f_t2 so they match
original_solution.get_pdlp_warm_start_data() and the set_initial_* APIs (update
their NaN initializers to std::numeric_limits<f_t2>::signaling_NaN()). Ensure
the assignments that construct the device_uvectors from
original_solution.get_primal_solution()/get_dual_solution() remain but now
target the f_t2-typed buffers.
In `@cpp/src/linear_programming/swap_and_resize_helper.cuh`:
- Around line 48-84: The current in-place swap in matrix_swap uses
cub::DeviceTransform::Transform with aliased input/output iterators (in_zip and
out_zip), which is unsafe; replace that call with thrust::transform to perform
the element-wise swap in-place (thrust documents input/output may coincide).
Locate matrix_swap and the transform invocation that uses
cub::DeviceTransform::Transform, and call thrust::transform with the same
in_zip, a counting end (in_zip + total_items) and out_zip, passing the existing
lambda (or equivalent functor) that swaps the tuple elements; ensure headers no
longer require cub for this operation and keep the matrix_swap_index_functor and
swap_pair_t usage unchanged.
In `@cpp/src/linear_programming/termination_strategy/convergence_information.cu`:
- Around line 249-256: The cuopt_assert in convergence_information_t<i_t,
f_t>::resize_context incorrectly uses new_size < batch_size which forbids
new_size == batch_size despite the error message; change the assertion to allow
equality (use new_size <= batch_size) so callers can perform a no-op resize, and
keep the existing message (or update it to match) while referencing
primal_objective_.size() and cuopt_assert for locating the check.
In `@cpp/src/mip/solve.cu`:
- Around line 61-65: You cloned settings.hyper_params and changed two flags but
never propagated that modified copy into the solver, so PDLP (consuming
settings_.hyper_params in mip_solver_t -> PDLP) still sees the original values;
either (A) update the settings object itself before constructing mip_solver_t
(assign the modified hyper_params back to settings.hyper_params so
mip_solver_t/PDLP use the new flags), or (B) if these flags are truly only for
the scaling strategy, add a comment and ensure only the scaling code receives
the local hyper_params (and do not expect PDLP to observe them). Locate the code
paths around the modified hyper_params, the callsite that constructs
mip_solver_t, and PDLP usage (pdlp.cu references) to implement option A (assign
back) or B (document and keep local), ensuring the flags are consistently
applied where consumed.
🟡 Minor comments (8)
cpp/src/linear_programming/saddle_point.cu-77-84 (1)
77-84: `resize_context` rejects a no-op resize.

The assert says "less than or equal" but enforces `<`. This will fail if callers pass the current size. Either allow equality or update the message. I'd allow equality.

✅ Suggested fix

```diff
-  cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size");
+  cuopt_assert(new_size <= batch_size, "New size must be less than or equal to batch size");
```

cpp/src/linear_programming/saddle_point.cu-20-37 (1)
20-37: Validate `batch_size` in the constructor.

`batch_size` can be zero here, creating an invalid state that later asserts in swap/resize. Add an explicit guard up front. As per coding guidelines, validate problem sizes before expensive allocations.

✅ Suggested fix

```diff
   EXE_CUOPT_EXPECTS(primal_size > 0, "Size of the primal problem must be larger than 0");
   EXE_CUOPT_EXPECTS(dual_size > 0, "Size of the dual problem must be larger than 0");
+  EXE_CUOPT_EXPECTS(batch_size > 0, "Batch size must be greater than 0");
```

cpp/include/cuopt/linear_programming/utilities/segmented_sum_handler.cuh-30-31 (1)
30-31: Uninitialized member `byte_needed_` in default constructor.

The default constructor leaves `byte_needed_` uninitialized, which could lead to undefined behavior if `segmented_sum_storage_.resize(byte_needed_, ...)` is called before `byte_needed_` is set by a cub query. While the two-phase pattern should always query first, defensive initialization is safer.

Proposed fix

```diff
   // Empty constructor for when used in non batch mode
-  segmented_sum_handler_t() {}
+  segmented_sum_handler_t() : byte_needed_(0) {}
```

cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.hpp-59-60 (1)
59-60: Document the lifetime requirement for reference members.

The reference members `climber_strategies_` and `hyper_params_` (lines 129-130) are part of an intentional architectural pattern used consistently across the solver (also in `pdhg_solver_t`, `cusparse_view_t`, `termination_strategy`). These references are passed through the component hierarchy and managed by the parent `pdhg_solver_t`, which orchestrates the lifetime of all sub-components.

However, this pattern relies on an implicit caller contract: the referenced objects must remain valid for the duration of the solver's execution. Add a comment or docstring in the class documenting this lifetime requirement to prevent misuse; a sketch follows.
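A minimal sketch of such a docstring, using the member names cited in this comment; exact wording and placement are up to the author:

```cpp
// Sketch only: documentation for the non-owning reference members.
/**
 * @note Lifetime contract: climber_strategies_ and hyper_params_ are
 *       non-owning references into state owned by the parent pdhg_solver_t.
 *       The referenced objects must outlive this strategy object for the
 *       entire solve; do not construct this class from temporaries.
 */
// const std::vector<pdlp_climber_strategy_t>& climber_strategies_;
// pdlp_hyper_params_t& hyper_params_;
```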
cpp/src/mip/diversity/recombiners/sub_mip.cuh-73-84 (1)
73-84: Resolve TODO and confirm hyper-params consistency for Sub-MIP scaling.

Lines 73-84: you now pass `context.settings.hyper_params` into scaling, but the TODO suggests uncertainty. Please confirm this is intended (and consistent with the MIP path's hyper-param overrides) and remove/track the TODO before merge.

If you want, I can suggest a small wrapper to align Sub-MIP hyper-params with the MIP path.
benchmarks/linear_programming/cuopt/run_pdlp.cu-49-53 (1)
49-53: Update CLI help text for new solver modes.

Line 50: the help string still lists only Stable3/Methodical1/Fast1, but Stable2 and Stable1 are now valid choices.

🔧 Suggested fix

```diff
-    .help("Solver mode for PDLP. Possible values: Stable3 (default), Methodical1, Fast1")
+    .help("Solver mode for PDLP. Possible values: Stable3 (default), Stable2, Stable1, Methodical1, Fast1")
```

cpp/tests/linear_programming/pdlp_test.cu-1011-1019 (1)
1011-1019: Potential stream synchronization issue in `extract_subvector`.

The helper copies data on `vector.stream()` and returns immediately. If the caller accesses the returned `subvector` on a different stream or host-side without synchronization, there could be a race condition.

Consider either:
- Synchronizing before return, or
- Documenting that the caller must synchronize.
🔧 Option 1: Add synchronization
```diff
 template <typename T>
 rmm::device_uvector<T> extract_subvector(const rmm::device_uvector<T>& vector,
                                          size_t start,
                                          size_t length)
 {
   rmm::device_uvector<T> subvector(length, vector.stream());
   raft::copy(subvector.data(), vector.data() + start, length, vector.stream());
+  RAFT_CUDA_TRY(cudaStreamSynchronize(vector.stream()));
   return subvector;
 }
```

cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu-160-160 (1)
160: Assertion message inconsistent with actual check.

The assertion message says "New size must be less than or equal to batch size" but the code checks `new_size < batch_size` (strictly less than). Either the check or the message should be corrected.

Suggested fix

```diff
-  cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size");
+  cuopt_assert(new_size <= batch_size, "New size must be less than or equal to batch size");
```

Or if the strict check is intentional:

```diff
-  cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size");
+  cuopt_assert(new_size < batch_size, "New size must be strictly less than batch size");
```
🧹 Nitpick comments (24)
cpp/src/linear_programming/utilities/ping_pong_graph.cuh (1)
97-105: Consider value-initializing CUDA handles.

The `cudaGraph_t` and `cudaGraphExec_t` members are uninitialized, which is safe given the `*_initialized` guards. However, value-initialization to `nullptr` would make the code more defensive against accidental misuse and aligns with defensive coding practices.

🔧 Optional improvement

```diff
  private:
-  cudaGraph_t even_graph;
-  cudaGraph_t odd_graph;
-  cudaGraphExec_t even_instance;
-  cudaGraphExec_t odd_instance;
+  cudaGraph_t even_graph{nullptr};
+  cudaGraph_t odd_graph{nullptr};
+  cudaGraphExec_t even_instance{nullptr};
+  cudaGraphExec_t odd_instance{nullptr};
```

cpp/include/cuopt/linear_programming/utilities/segmented_sum_handler.cuh (2)
39-51: Inconsistent stream parameter usage between methods.

`segmented_sum_helper` passes `stream_view_` directly (lines 40, 42, 50), while `segmented_reduce_helper` passes `stream_view_.value()` (lines 69, 71, 81). While both should work due to implicit conversions, this inconsistency could mask issues when `stream_view_` is default-constructed (from the parameterless constructor).

Consider using `stream_view_.value()` consistently in both methods, or adding a check that `stream_view_` is valid before use; a sketch follows.

Also applies to: 61-82
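A minimal sketch of the uniform pattern, assuming the member and helper names from this comment; `rmm::cuda_stream_view::value()` returns the wrapped `cudaStream_t`:

```cpp
#include <rmm/cuda_stream_view.hpp>

// Sketch only: one accessor that both helpers use, so stream handling is
// uniform whether the handler was built with a stream or default-constructed.
struct stream_holder_sketch {
  rmm::cuda_stream_view stream_view_{};

  cudaStream_t stream() const { return stream_view_.value(); }
};

// Inside segmented_sum_helper / segmented_reduce_helper, pass stream() as the
// final argument of the cub calls instead of mixing stream_view_ and
// stream_view_.value().
```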
84-86: Consider making `byte_needed_` private or documenting its usage.

The member `byte_needed_` is public but appears to be internal state managed by the helper methods. Either make it private with an accessor if needed, or add a brief comment explaining that it's updated by the helper methods; a sketch follows.
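A small sketch of the encapsulated form; the accessor name is illustrative, not from the PR:

```cpp
#include <cstddef>

// Sketch only: keep the cub scratch-size bookkeeping private.
class segmented_sum_handler_t {
 public:
  size_t scratch_bytes() const { return byte_needed_; }  // illustrative name

 private:
  size_t byte_needed_{0};  // updated by the cub size-query pass in the helpers
};
```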
cpp/src/linear_programming/optimal_batch_size_handler/optimal_batch_size_handler.cu (4)

32-33: Unused member variables `x_descr` and `y_descr`.

Local variables `x_descr` and `y_descr` (lines 32-33) are used in the constructor but never assigned to the member variables of the same names (lines 144-145). The members appear to be unused dead code.

Consider removing unused members

```diff
-  cusparse_dn_mat_descr_wrapper_t<f_t> x_descr;
-  cusparse_dn_mat_descr_wrapper_t<f_t> y_descr;
   rmm::device_uvector<f_t> x;
   rmm::device_uvector<f_t> y;
```

Also applies to: 144-145
166-176: Consider adding CUDA error checking after event operations.

The event recording and synchronization lack explicit error checking. While `event_handler_t` may handle this internally, explicit checks (or `RAFT_CUDA_TRY` wrappers) would ensure benchmark failures are detected rather than silently producing incorrect timings; a sketch follows.

As per coding guidelines, every CUDA operation should have error checking.
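A minimal sketch of wrapped event timing under these guidelines; `RAFT_CUDA_TRY` is the macro already used in this codebase, while the function and event names here are illustrative:

```cpp
#include <cuda_runtime_api.h>
// RAFT_CUDA_TRY is RAFT's CUDA error-check macro (header location varies by
// RAFT version; this file already has it in scope via its raft includes).

// Sketch only: every CUDA event call is checked so a failure surfaces
// immediately instead of silently corrupting the measured time.
float timed_region_ms(cudaStream_t stream)
{
  cudaEvent_t start{}, stop{};  // illustrative names
  RAFT_CUDA_TRY(cudaEventCreate(&start));
  RAFT_CUDA_TRY(cudaEventCreate(&stop));

  RAFT_CUDA_TRY(cudaEventRecord(start, stream));
  // ... work being benchmarked ...
  RAFT_CUDA_TRY(cudaEventRecord(stop, stream));
  RAFT_CUDA_TRY(cudaEventSynchronize(stop));

  float ms = 0.f;
  RAFT_CUDA_TRY(cudaEventElapsedTime(&ms, start, stop));
  RAFT_CUDA_TRY(cudaEventDestroy(start));
  RAFT_CUDA_TRY(cudaEventDestroy(stop));
  return ms;
}
```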
312-318: Clarify the middle node calculation.

The formula `((current_batch_size * 2) + current_batch_size) / 2` equals `1.5 * current_batch_size`, which is the midpoint between `current_batch_size` and `current_batch_size * 2`. This makes sense algorithmically, but a clearer expression or comment would improve readability.

Suggested clarification

```diff
   // Testing one last time between the two
-  const int middle_node = ((current_batch_size * 2) + current_batch_size) / 2;
+  // Midpoint between current_batch_size and optimal_batch_size (which is current_batch_size * 2)
+  const int middle_node = (current_batch_size + optimal_batch_size) / 2;
```
422-423: Unreachable code after exhaustive if-else.

The `cuopt_assert(false, ...)` and `return 0` at lines 422-423 are unreachable since the preceding if/else-if/else chain is exhaustive. While this serves as a defensive assertion, consider removing it or replacing it with a comment to avoid static analysis warnings about unreachable code.

cpp/src/linear_programming/utils.cuh (1)
194-221: TODO: Use `i_t` for `problem_size_` in iterator wrappers.

Line 219 has a TODO comment about using `i_t` instead of `int` for `problem_size_`. This should be addressed to maintain type consistency with the rest of the codebase and avoid potential truncation issues on large problems.

Proposed fix for both iterators

```diff
-template <typename f_t>
-struct batch_wrapped_iterator {
-  batch_wrapped_iterator(const f_t* problem_input, int problem_size)
+template <typename i_t, typename f_t>
+struct batch_wrapped_iterator {
+  batch_wrapped_iterator(const f_t* problem_input, i_t problem_size)
     : problem_input_(problem_input), problem_size_(problem_size)
   {
   }

-  HDI f_t operator()(int id) { return problem_input_[id / problem_size_]; }
+  HDI f_t operator()(i_t id) { return problem_input_[id / problem_size_]; }

   const f_t* problem_input_;
-  int problem_size_;
+  i_t problem_size_;
 };

-template <typename f_t>
+template <typename i_t, typename f_t>
 struct problem_wrapped_iterator {
-  problem_wrapped_iterator(const f_t* problem_input, int problem_size)
+  problem_wrapped_iterator(const f_t* problem_input, i_t problem_size)
     : problem_input_(problem_input), problem_size_(problem_size)
   {
   }

-  HDI f_t operator()(int id) { return problem_input_[id % problem_size_]; }
+  HDI f_t operator()(i_t id) { return problem_input_[id % problem_size_]; }

   const f_t* problem_input_;
-  // TODO use i_t
-  int problem_size_;
+  i_t problem_size_;
 };
```

cpp/src/linear_programming/pdlp_climber_strategy.hpp (1)
19-21: Consider providing default initialization for `original_index`.

The struct lacks a default initializer for `original_index`. If instances are default-constructed (e.g., in a `std::vector` resize), the field will be uninitialized, which could lead to undefined behavior if read before being set.

Suggested initialization

```diff
 struct pdlp_climber_strategy_t {
-  int original_index;
+  int original_index{-1};  // -1 indicates uninitialized/invalid
 };
```

cpp/src/linear_programming/pdlp_constants.hpp (1)
35-39: Avoid mutable header-level flags for batch behavior.

Lines 35-39: `static bool` in a header yields one copy per TU and encourages global mutable state. Prefer wiring these through settings/hyper-params, or make them `inline constexpr` if they are compile-time constants (see the sketch below).

As per coding guidelines, avoid thread-unsafe global/static state.
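A minimal sketch of the two alternatives, with an illustrative flag name (the actual flags in pdlp_constants.hpp are not quoted in this comment):

```cpp
// Before (sketch): every translation unit including the header gets its own
// mutable copy, so a write in one TU is invisible to the others.
// static bool use_batch_fast_path = true;  // illustrative name

// Option 1: a true compile-time constant, one entity program-wide (C++17).
inline constexpr bool use_batch_fast_path = true;  // illustrative name

// Option 2: if the flag must change at runtime, it belongs in the settings
// struct already threaded through the solver (pdlp_hyper_params_t in this
// PR), not in a header-level global.
```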
cpp/include/cuopt/linear_programming/mip/solver_settings.hpp (1)
86-87: Add Doxygen documentation for new public member.

As per coding guidelines for public headers, new public members should have documentation comments. The `mip_batch_pdlp_strong_branching` member lacks documentation explaining its purpose, valid values, and effect on solver behavior.

```cpp
/**
 * @brief Enable batch PDLP for strong branching in MIP solver.
 * @note 0 = disabled (use Dual Simplex), 1 = enabled (use Batch PDLP)
 */
i_t mip_batch_pdlp_strong_branching = 0;
```

cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu (1)
63-68: Consider adding size validation for weight vector.

The implementation assumes `weight` has at least one element (accessed via `weight.data()` at lines 81, 88, 92-93). If an empty vector is passed, this would cause undefined behavior when dereferencing.

Consider adding a debug assertion or precondition check:

```cpp
cuopt_assert(weight.size() > 0, "weight vector must not be empty");
```

benchmarks/linear_programming/cuopt/benchmark_helper.hpp (1)
62-71: Good refactor: runtime hyper-params instead of static globals.

The signature change to accept `pdlp_hyper_params_t& params` improves modularity and eliminates reliance on global state. One minor suggestion: consider using `std::exit(EXIT_FAILURE)` instead of `exit(-1)` for portability and clarity.

♻️ Minor improvement

```diff
-    exit(-1);
+    std::exit(EXIT_FAILURE);
```

cpp/src/linear_programming/swap_and_resize_helper.cuh (1)
86-96: Consider type consistency for index parameters.

The function uses `int` for parameters while the template might be used with different index types. Consider using a template parameter or `size_t` for consistency.

♻️ Suggested type improvement

```diff
 template <typename host_vector_t>
-void host_vector_swap(host_vector_t& host_vector, int left_swap_index, int right_swap_index)
+void host_vector_swap(host_vector_t& host_vector,
+                      typename host_vector_t::size_type left_swap_index,
+                      typename host_vector_t::size_type right_swap_index)
 {
-  cuopt_assert(left_swap_index < host_vector.size(), "Left swap index is out of bounds");
-  cuopt_assert(right_swap_index < host_vector.size(), "Right swap index is out of bounds");
+  cuopt_assert(left_swap_index < host_vector.size(), "Left swap index out of bounds");
+  cuopt_assert(right_swap_index < host_vector.size(), "Right swap index out of bounds");
   cuopt_assert(left_swap_index < right_swap_index,
                "Left swap index must be less than right swap index");
```

cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cuh (1)
115-116: Note: reference member requires lifetime awareness.

The `hyper_params_` reference member means the caller must ensure the referenced `pdlp_hyper_params_t` outlives the `pdlp_initial_scaling_strategy_t` instance. This is typical for performance-critical code but worth documenting if not already done elsewhere.

cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp (1)
120-133: Address TODO comments before merge.

Multiple `TODO batch mode: tmp` comments indicate these additions may be temporary or need refinement. Consider documenting the intended long-term design for `set_initial_step_size` and `set_initial_primal_weight` or removing the TODO markers if the implementation is final.

cpp/src/linear_programming/termination_strategy/infeasibility_information.cu (1)
213-217: Duplicate helper function: `finite_or_zero` exists in convergence_information.cu.

This `finite_or_zero` helper is identical to the one defined in `cpp/src/linear_programming/termination_strategy/convergence_information.cu` (lines 472-475 per the relevant snippets). Consider extracting it to a shared utility header to avoid duplication. As per coding guidelines, duplicate code should be refactored into shared utilities.

♻️ Suggestion: Extract to shared utility

Move `finite_or_zero` and potentially `max_abs_t` to a shared header like `linear_programming/utils.cuh`:

```cpp
// In linear_programming/utils.cuh
template <typename f_t>
HDI f_t finite_or_zero(f_t in)
{
  return isfinite(in) ? in : f_t(0.0);
}

template <typename f_t>
struct max_abs_t {
  HD f_t operator()(f_t a, f_t b) { return cuda::std::max(cuda::std::abs(a), cuda::std::abs(b)); }
};
```

cpp/src/dual_simplex/pseudo_costs.cpp (2)
162-210: Review batch PDLP strong branching implementation.

The batch path correctly:

- Converts `root_soln` to original problem space via `uncrush_primal_solution`
- Extracts fractional values for the batch solver
- Interprets results with `obj_down` at index `k` and `obj_up` at index `k + fractional.size()`

However, there's no error handling if `batch_pdlp_solve` fails or returns unexpected results. Consider adding validation:

♻️ Suggested improvement: Add result validation

```diff
   std::vector<f_t> primal_solutions = batch_pdlp_solve(original_problem, fractional, fraction_values);
+
+  if (primal_solutions.size() != fractional.size() * 2) {
+    settings.log.printf("Warning: batch_pdlp_solve returned unexpected result size %ld, expected %ld\n",
+                        primal_solutions.size(), fractional.size() * 2);
+    // Fall back to dual simplex path or handle error appropriately
+  }
```
175-180: Minor: Loop can use range-based iteration or reserve capacity.

The loop building `fraction_values` could benefit from reserving capacity upfront to avoid reallocations.

♻️ Suggested improvement

```diff
   std::vector<f_t> fraction_values;
+  fraction_values.reserve(fractional.size());
   for (i_t k = 0; k < fractional.size(); k++) {
     const i_t j = fractional[k];
     fraction_values.push_back(original_root_soln_x[j]);
   }
```

cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp (2)
207: Unnamed parameter reduces readability.

The parameter `i_t = 0` lacks a name, making the API less clear. Consider naming it for consistency with other per-id accessors.

Suggested improvement

```diff
-  f_t get_objective_value(i_t = 0) const;
+  f_t get_objective_value(i_t id = 0) const;
```

213: Same unnamed parameter issue.

Suggested improvement

```diff
-  f_t get_dual_objective_value(i_t = 0) const;
+  f_t get_dual_objective_value(i_t id = 0) const;
```

cpp/src/linear_programming/pdhg.hpp (1)
29: Clarify the distinction between the `is_legacy_batch_mode` parameter and the `batch_mode_` member.

The constructor takes `is_legacy_batch_mode` as a parameter but stores `batch_mode_` as a member. The relationship between these is unclear from the header alone. Consider documenting whether `batch_mode_` is derived from `is_legacy_batch_mode` or from `climber_strategies.size() > 1`; a sketch of such a doc comment follows.

Also applies to: 89-89
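A sketch of the kind of clarifying comment the header could carry; whether `batch_mode_` is actually derived from `climber_strategies.size() > 1` is exactly the open question here, so the wording must be verified against pdhg.cu before adopting:

```cpp
// Sketch only: hypothetical doc comment; verify the derivation against the
// constructor in pdhg.cu before adopting.
/**
 * @param is_legacy_batch_mode  Selects the legacy batched code path only.
 * @note  batch_mode_ is not a stored copy of this flag; if it is derived
 *        from climber_strategies.size() > 1, say so here explicitly.
 */
```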
cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu (1)
411: TODO comments indicate incomplete batch-per-climber restart logic.

The comments "TODO later batch mode: handle if not all restart" and "TODO later batch mode: remove this once you want to do per climber restart" indicate this is intentionally deferred functionality. Consider tracking these in an issue.
Would you like me to open an issue to track implementing per-climber restart logic in batch mode?
Also applies to: 427-427, 484-484
cpp/src/linear_programming/termination_strategy/infeasibility_information.hpp (1)
30-42: Constructor signature significantly expanded for cuPDLPx infeasibility detection.

The constructor now accepts scaled problem references, scaling strategy, and batch configuration. The comment annotations clarify which parameters are specifically for cuPDLPx infeasibility detection. However, the parameter list is becoming lengthy.
Consider grouping the cuPDLPx-specific parameters into a configuration struct to improve readability and maintainability:
```cpp
struct cupdlpx_infeasibility_config_t {
  const problem_t<i_t, f_t>& op_problem_scaled;
  const cusparse_view_t<i_t, f_t>& scaled_cusparse_view;
  const pdlp_initial_scaling_strategy_t<i_t, f_t>& scaling_strategy;
};
```
```cpp
pdlp_termination_status_t get_termination_status(i_t id = 0) const;
std::vector<pdlp_termination_status_t>& get_terminations_status();
```
Add bounds validation to ensure `termination_stats_` and `termination_status_` are the same size, or check bounds against the correct vector.

The per-id accessors check bounds against `termination_status_.size()` but access `termination_stats_[id]`:

- `get_objective_value(i_t id)` (lines 320-324): checks `termination_status_.size()` but accesses `termination_stats_[id]`
- `get_dual_objective_value(i_t id)` (lines 326-330): checks `termination_status_.size()` but accesses `termination_stats_[id]`
- `get_additional_termination_information(i_t id)` (lines 387-391): checks `termination_status_.size()` but accesses `termination_stats_[id]`

The constructors move both vectors together but don't validate they remain synchronized. Since `cuopt_assert` is disabled by default in production builds, add a size invariant check in constructors or adjust bounds checking to validate the vector being accessed; a sketch follows.
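A minimal sketch of the invariant check, assuming the constructor shape implied above; `cuopt_expects` is the always-on runtime check used elsewhere in cuopt, unlike `cuopt_assert`, which is compiled out of production builds:

```cpp
// Sketch only: enforce the invariant once, in the constructor that receives
// both vectors, so every per-id accessor can rely on it afterwards.
cuopt_expects(termination_stats_.size() == termination_status_.size(),
              error_type_t::ValidationError,
              "termination_stats_ and termination_status_ must have the same size");

// Secondary safety in each accessor: bound-check the vector actually indexed.
// cuopt_expects(static_cast<std::size_t>(id) < termination_stats_.size(), ...);
```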
🤖 Prompt for AI Agents
In `@cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp` around lines
243 - 244, The per-id accessors (get_objective_value, get_dual_objective_value,
get_additional_termination_information) currently check bounds against
termination_status_.size() but index termination_stats_, so add a size
invariant: in the constructors that set/move termination_stats_ and
termination_status_ (the class ctor(s) that populate these vectors) assert or
throw if termination_stats_.size() != termination_status_.size() to guarantee
synchronization, and update the accessors to check the size of the vector they
actually index (use termination_stats_.size() when accessing termination_stats_)
as a secondary safety; reference termination_stats_, termination_status_,
get_objective_value, get_dual_objective_value, and
get_additional_termination_information when making these changes.
```cpp
#include <dual_simplex/user_problem.hpp>
```
Remove internal header inclusion or provide an abstraction for `dual_simplex::user_problem_t` in the public API.

The public header includes `dual_simplex/user_problem.hpp` from the internal source tree and exposes `dual_simplex::user_problem_t<i_t, f_t>` in public function signatures (e.g., `batch_pdlp_solve()`). Users should not depend on internal implementation types. Either move the internal type to a public interface, or refactor to accept a public-facing problem representation instead (a sketch of the forward-declaration route follows).
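A minimal sketch of the forward-declaration route; the enclosing namespace and class-vs-struct keyword must mirror the real declaration in dual_simplex/user_problem.hpp, which is not quoted here:

```cpp
// In the public solve.hpp (sketch only): no internal include, just an opaque
// declaration that must match the real one exactly.
namespace dual_simplex {
template <typename i_t, typename f_t>
struct user_problem_t;
}  // namespace dual_simplex

// batch_pdlp_solve() can then take the type by const reference; all member
// access and conversions move into solve.cu, which includes the internal
// header as before.
```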
🤖 Prompt for AI Agents
In `@cpp/include/cuopt/linear_programming/solve.hpp` around lines 17 - 18, The
public header currently includes the internal header and exposes
dual_simplex::user_problem_t<i_t, f_t> in the batch_pdlp_solve() signature;
remove the internal include and stop exposing that internal type in the public
API by either (a) introducing a public-facing problem abstraction (e.g., a
public problem_t or ProblemView) and changing batch_pdlp_solve() to accept that,
or (b) forward-declaring a minimal public interface and adapting the
implementation to convert from the public type to dual_simplex::user_problem_t
inside the cpp implementation; update all references to
dual_simplex::user_problem_t in the header to use the new public type or opaque
forward declaration and move any internal conversions into the source file.
```cpp
i_t mip_batch_pdlp_strong_branching;  // 0 if not using batch PDLP for strong branching, 1 if
                                      // using batch PDLP for strong branching
```
Missing initialization for `mip_batch_pdlp_strong_branching`.

The new field is not initialized in the constructor's initializer list (lines 42-98), leaving it with an indeterminate value. Other `i_t` members like `random_seed`, `inside_mip`, and `num_bfs_workers` are all explicitly initialized.

🐛 Proposed fix: Initialize to 0 (disabled by default)

Add to the constructor initializer list around line 91:

```diff
   num_bfs_workers(std::max(num_threads / 4, 1)),
+  mip_batch_pdlp_strong_branching(0),
   random_seed(0),
```
In `@cpp/src/dual_simplex/simplex_solver_settings.hpp` around lines 158 - 159, The
field mip_batch_pdlp_strong_branching in class SimplexSolverSettings is not
initialized in the constructor initializer list; add it (set to 0 by default) to
the constructor's initializer list alongside other i_t members (e.g.,
random_seed, inside_mip, num_bfs_workers) so mip_batch_pdlp_strong_branching is
deterministically initialized when a SimplexSolverSettings instance is created.
```cpp
// TODO add proper checking
#if CUDA_VER_12_4_UP
template <typename T,
          typename std::enable_if_t<std::is_same_v<T, float> || std::is_same_v<T, double>>*>
cusparseStatus_t my_cusparsespmm_preprocess(cusparseHandle_t handle,
                                            cusparseOperation_t opA,
                                            cusparseOperation_t opB,
                                            const T* alpha,
                                            const cusparseSpMatDescr_t matA,
                                            const cusparseDnMatDescr_t matB,
                                            const T* beta,
                                            const cusparseDnMatDescr_t matC,
                                            cusparseSpMMAlg_t alg,
                                            void* externalBuffer,
                                            cudaStream_t stream)
{
  auto constexpr float_type = []() constexpr {
    if constexpr (std::is_same_v<T, float>) {
      return CUDA_R_32F;
    } else if constexpr (std::is_same_v<T, double>) {
      return CUDA_R_64F;
    }
  }();
  CUSPARSE_CHECK(cusparseSetStream(handle, stream));
  return cusparseSpMM_preprocess(
    handle, opA, opB, alpha, matA, matB, beta, matC, float_type, alg, externalBuffer);
}
```
Check `cusparseSpMM_preprocess` errors instead of dropping them.

`my_cusparsespmm_preprocess` returns `cusparseStatus_t`, but all call sites (lines 536, 548, 560, 572, 777, 789) ignore this return value. The function itself also fails to check the status of `cusparseSpMM_preprocess` before returning it. A failing preprocess will be silent. Wrap all call sites with `RAFT_CUSPARSE_TRY` (consistent with the pattern used throughout this file for similar CUSPARSE operations), or add `CUSPARSE_CHECK` inside the function and return success.
🐛 Example fix for call site at lines 536-547

```diff
-  my_cusparsespmm_preprocess(handle_ptr_->get_cusparse_handle(),
-                             CUSPARSE_OPERATION_NON_TRANSPOSE,
-                             CUSPARSE_OPERATION_NON_TRANSPOSE,
-                             alpha.data(),
-                             A_T,
-                             batch_delta_dual_solutions,
-                             beta.data(),
-                             batch_tmp_primals,
-                             CUSPARSE_SPMM_CSR_ALG3,
-                             buffer_transpose_batch.data(),
-                             handle_ptr->get_stream());
+  RAFT_CUSPARSE_TRY(my_cusparsespmm_preprocess(handle_ptr_->get_cusparse_handle(),
+                                               CUSPARSE_OPERATION_NON_TRANSPOSE,
+                                               CUSPARSE_OPERATION_NON_TRANSPOSE,
+                                               alpha.data(),
+                                               A_T,
+                                               batch_delta_dual_solutions,
+                                               beta.data(),
+                                               batch_tmp_primals,
+                                               CUSPARSE_SPMM_CSR_ALG3,
+                                               buffer_transpose_batch.data(),
+                                               handle_ptr->get_stream()));
```

Applies to: 536-547, 548-559, 560-571, 572-583, 777-788, 789-800.
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cusparse_view.cu` around lines 238 - 264, The
helper my_cusparsespmm_preprocess currently forwards cusparseSpMM_preprocess
return status without checking it and all its call sites ignore that return, so
preprocessing failures are silent; fix by either (A) adding CUSPARSE_CHECK
around cusparseSpMM_preprocess inside my_cusparsespmm_preprocess so it
logs/throws on error and then return CUSPARSE_STATUS_SUCCESS, or (B) leave the
function returning cusparseStatus_t and wrap every call site of
my_cusparsespmm_preprocess with RAFT_CUSPARSE_TRY (consistent with other
CUSPARSE ops in this file) to propagate errors; update symbols:
my_cusparsespmm_preprocess and each call site that invokes it so failures are
not ignored.
```cpp
rmm::device_buffer buffer_transpose_batch(buffer_size_transpose_batch, stream_view);
rmm::device_buffer buffer_non_transpose_batch(buffer_size_non_transpose_batch, stream_view);
```
Critical: Local variables shadow member variables, causing use-after-free.

Lines 77-78 declare local `rmm::device_buffer` variables that shadow the member variables declared at lines 148-149. The local buffers are used for SpMM preprocessing and the CUDA graph capture (lines 82-136), but they are destroyed when the constructor exits. The captured graph will reference freed memory, leading to undefined behavior when `launch()` is called.

🐛 Proposed fix: assign to members instead of declaring locals

```diff
-  rmm::device_buffer buffer_transpose_batch(buffer_size_transpose_batch, stream_view);
-  rmm::device_buffer buffer_non_transpose_batch(buffer_size_non_transpose_batch, stream_view);
+  buffer_transpose_batch.resize(buffer_size_transpose_batch, stream_view);
+  buffer_non_transpose_batch.resize(buffer_size_non_transpose_batch, stream_view);
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| rmm::device_buffer buffer_transpose_batch(buffer_size_transpose_batch, stream_view); | |
| rmm::device_buffer buffer_non_transpose_batch(buffer_size_non_transpose_batch, stream_view); | |
| buffer_transpose_batch.resize(buffer_size_transpose_batch, stream_view); | |
| buffer_non_transpose_batch.resize(buffer_size_non_transpose_batch, stream_view); |
🤖 Prompt for AI Agents
In
`@cpp/src/linear_programming/optimal_batch_size_handler/optimal_batch_size_handler.cu`
around lines 77 - 78, Local rmm::device_buffer variables buffer_transpose_batch
and buffer_non_transpose_batch declared in the constructor shadow the class
member buffers and are destroyed on constructor exit, causing the CUDA graph to
reference freed memory; fix by removing the local declarations and
assigning/constructing the buffers into the existing member variables (the
member buffer_transpose_batch and buffer_non_transpose_batch) so the memory
remains valid for the SpMM preprocessing, CUDA graph capture in the constructor,
and later launch() calls.
```diff
   // Should be filled with more information from dual simplex
-  typename optimization_problem_solution_t<i_t, f_t>::additional_termination_information_t info;
-  info.primal_objective      = vertex_solution.user_objective;
-  info.number_of_steps_taken = vertex_solution.iterations;
+  std::vector<
+    typename optimization_problem_solution_t<i_t, f_t>::additional_termination_information_t>
+    info;
+  info[0].primal_objective      = vertex_solution.user_objective;
+  info[0].number_of_steps_taken = vertex_solution.iterations;
   auto crossover_end = std::chrono::high_resolution_clock::now();
   auto crossover_duration =
     std::chrono::duration_cast<std::chrono::milliseconds>(crossover_end - start_solver);
-  info.solve_time = crossover_duration.count() / 1000.0;
+  info[0].solve_time = crossover_duration.count() / 1000.0;
   auto sol_crossover = optimization_problem_solution_t<i_t, f_t>(final_primal_solution,
                                                                  final_dual_solution,
                                                                  final_reduced_cost,
                                                                  problem.objective_name,
                                                                  problem.var_names,
                                                                  problem.row_names,
-                                                                 info,
-                                                                 termination_status);
+                                                                 std::move(info),
+                                                                 {termination_status});
```
Crossover termination info vector is never sized.

`info` is default-constructed and then indexed at `[0]`, which is out of bounds and will crash.

🐛 Suggested fix

```diff
-  std::vector<
-    typename optimization_problem_solution_t<i_t, f_t>::additional_termination_information_t>
-    info;
+  std::vector<
+    typename optimization_problem_solution_t<i_t, f_t>::additional_termination_information_t>
+    info(1);
```
In `@cpp/src/linear_programming/solve.cu` around lines 707 - 724, The vector info
of type optimization_problem_solution_t<i_t,
f_t>::additional_termination_information_t is default-constructed and then
accessed at info[0], causing out-of-bounds access; fix by sizing or populating
info before use (e.g., call info.resize(1) or info.emplace_back()) and then set
info[0].primal_objective, info[0].number_of_steps_taken and info[0].solve_time,
so the std::move(info) passed into the optimization_problem_solution_t
constructor is valid; adjust around symbols info,
additional_termination_information_t, vertex_solution, and start_solver in the
crossover/termination block.
```cpp
rmm::device_uvector<double> initial_primal(0, stream);
rmm::device_uvector<double> initial_dual(0, stream);
double initial_step_size     = std::numeric_limits<f_t>::signaling_NaN();
double initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN();

cuopt_assert(settings.new_bounds.size() > 0, "Batch size should be greater than 0");
const int max_batch_size = settings.new_bounds.size();
int optimal_batch_size   = use_optimal_batch_size
                             ? detail::optimal_batch_size_handler(problem, max_batch_size)
                             : max_batch_size;
cuopt_assert(optimal_batch_size != 0 && optimal_batch_size <= max_batch_size,
             "Optimal batch size should be between 1 and max batch size");
using f_t2 = typename type_2<f_t>::type;

// If need warm start, solve the LP alone
if (primal_dual_init || primal_weight_init) {
  pdlp_solver_settings_t<i_t, f_t> warm_start_settings = settings;
  warm_start_settings.new_bounds.clear();
  warm_start_settings.method               = cuopt::linear_programming::method_t::PDLP;
  warm_start_settings.presolve             = false;
  warm_start_settings.pdlp_solver_mode     = pdlp_solver_mode_t::Stable3;
  warm_start_settings.detect_infeasibility = false;
  optimization_problem_solution_t<i_t, f_t> original_solution =
    solve_lp(problem, warm_start_settings);
  if (primal_dual_init) {
    initial_primal = rmm::device_uvector<double>(
      original_solution.get_primal_solution(), original_solution.get_primal_solution().stream());
    initial_dual   = rmm::device_uvector<double>(original_solution.get_dual_solution(),
                                                 original_solution.get_dual_solution().stream());
    initial_step_size = original_solution.get_pdlp_warm_start_data().initial_step_size_;
  }
  if (primal_weight_init) {
    initial_primal_weight = original_solution.get_pdlp_warm_start_data().initial_primal_weight_;
  }
```
Type mismatch in batch warm‑start buffers.
initial_primal/initial_dual are double even when f_t is float, which won’t compile for the float instantiation and can’t bind to set_initial_* expecting f_t*.
🐛 Suggested fix
- rmm::device_uvector<double> initial_primal(0, stream);
- rmm::device_uvector<double> initial_dual(0, stream);
- double initial_step_size = std::numeric_limits<f_t>::signaling_NaN();
- double initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN();
+ rmm::device_uvector<f_t> initial_primal(0, stream);
+ rmm::device_uvector<f_t> initial_dual(0, stream);
+ f_t initial_step_size = std::numeric_limits<f_t>::signaling_NaN();
+ f_t initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN();
@@
- initial_primal = rmm::device_uvector<double>(
+ initial_primal = rmm::device_uvector<f_t>(
original_solution.get_primal_solution(), original_solution.get_primal_solution().stream());
- initial_dual = rmm::device_uvector<double>(original_solution.get_dual_solution(),
- original_solution.get_dual_solution().stream());
+ initial_dual = rmm::device_uvector<f_t>(original_solution.get_dual_solution(),
+ original_solution.get_dual_solution().stream());📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| rmm::device_uvector<double> initial_primal(0, stream); | |
| rmm::device_uvector<double> initial_dual(0, stream); | |
| double initial_step_size = std::numeric_limits<f_t>::signaling_NaN(); | |
| double initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN(); | |
| cuopt_assert(settings.new_bounds.size() > 0, "Batch size should be greater than 0"); | |
| const int max_batch_size = settings.new_bounds.size(); | |
| int optimal_batch_size = use_optimal_batch_size | |
| ? detail::optimal_batch_size_handler(problem, max_batch_size) | |
| : max_batch_size; | |
| cuopt_assert(optimal_batch_size != 0 && optimal_batch_size <= max_batch_size, | |
| "Optimal batch size should be between 1 and max batch size"); | |
| using f_t2 = typename type_2<f_t>::type; | |
| // If need warm start, solve the LP alone | |
| if (primal_dual_init || primal_weight_init) { | |
| pdlp_solver_settings_t<i_t, f_t> warm_start_settings = settings; | |
| warm_start_settings.new_bounds.clear(); | |
| warm_start_settings.method = cuopt::linear_programming::method_t::PDLP; | |
| warm_start_settings.presolve = false; | |
| warm_start_settings.pdlp_solver_mode = pdlp_solver_mode_t::Stable3; | |
| warm_start_settings.detect_infeasibility = false; | |
| optimization_problem_solution_t<i_t, f_t> original_solution = | |
| solve_lp(problem, warm_start_settings); | |
| if (primal_dual_init) { | |
| initial_primal = rmm::device_uvector<double>( | |
| original_solution.get_primal_solution(), original_solution.get_primal_solution().stream()); | |
| initial_dual = rmm::device_uvector<double>(original_solution.get_dual_solution(), | |
| original_solution.get_dual_solution().stream()); | |
| initial_step_size = original_solution.get_pdlp_warm_start_data().initial_step_size_; | |
| } | |
| if (primal_weight_init) { | |
| initial_primal_weight = original_solution.get_pdlp_warm_start_data().initial_primal_weight_; | |
| } | |
| rmm::device_uvector<f_t> initial_primal(0, stream); | |
| rmm::device_uvector<f_t> initial_dual(0, stream); | |
| f_t initial_step_size = std::numeric_limits<f_t>::signaling_NaN(); | |
| f_t initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN(); | |
| cuopt_assert(settings.new_bounds.size() > 0, "Batch size should be greater than 0"); | |
| const int max_batch_size = settings.new_bounds.size(); | |
| int optimal_batch_size = use_optimal_batch_size | |
| ? detail::optimal_batch_size_handler(problem, max_batch_size) | |
| : max_batch_size; | |
| cuopt_assert(optimal_batch_size != 0 && optimal_batch_size <= max_batch_size, | |
| "Optimal batch size should be between 1 and max batch size"); | |
| using f_t2 = typename type_2<f_t>::type; | |
| // If need warm start, solve the LP alone | |
| if (primal_dual_init || primal_weight_init) { | |
| pdlp_solver_settings_t<i_t, f_t> warm_start_settings = settings; | |
| warm_start_settings.new_bounds.clear(); | |
| warm_start_settings.method = cuopt::linear_programming::method_t::PDLP; | |
| warm_start_settings.presolve = false; | |
| warm_start_settings.pdlp_solver_mode = pdlp_solver_mode_t::Stable3; | |
| warm_start_settings.detect_infeasibility = false; | |
| optimization_problem_solution_t<i_t, f_t> original_solution = | |
| solve_lp(problem, warm_start_settings); | |
| if (primal_dual_init) { | |
| initial_primal = rmm::device_uvector<f_t>( | |
| original_solution.get_primal_solution(), original_solution.get_primal_solution().stream()); | |
| initial_dual = rmm::device_uvector<f_t>(original_solution.get_dual_solution(), | |
| original_solution.get_dual_solution().stream()); | |
| initial_step_size = original_solution.get_pdlp_warm_start_data().initial_step_size_; | |
| } | |
| if (primal_weight_init) { | |
| initial_primal_weight = original_solution.get_pdlp_warm_start_data().initial_primal_weight_; | |
| } |
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/solve.cu` around lines 750 - 783, The warm-start
buffers/values are typed as double but must match the template precision; change
initial_primal and initial_dual from rmm::device_uvector<double> to
rmm::device_uvector<f_t>, and change initial_step_size and initial_primal_weight
from double to f_t so they match original_solution.get_pdlp_warm_start_data()
and the set_initial_* APIs (update their NaN initializers to
std::numeric_limits<f_t>::signaling_NaN()). Ensure the assignments that
construct the device_uvectors from
original_solution.get_primal_solution()/get_dual_solution() remain but now
target the f_t-typed buffers.
template <typename i_t, typename f_t>
void matrix_swap(rmm::device_uvector<f_t>& matrix,
                 i_t vector_size,
                 const thrust::universal_host_pinned_vector<swap_pair_t<i_t>>& swap_pairs)
{
  if (swap_pairs.empty()) { return; }

  const i_t batch_size = matrix.size() / vector_size;
  cuopt_assert(vector_size > 0, "Vector size must be greater than 0");
  cuopt_assert(batch_size > 0, "Batch size must be greater than 0");

  const size_t swap_count  = swap_pairs.size();
  const size_t total_items = swap_count * static_cast<size_t>(vector_size);

  auto counting   = thrust::make_counting_iterator<size_t>(0);
  auto left_index = thrust::make_transform_iterator(
    counting,
    matrix_swap_index_functor<i_t>{thrust::raw_pointer_cast(swap_pairs.data()), vector_size, true});
  auto right_index = thrust::make_transform_iterator(
    counting,
    matrix_swap_index_functor<i_t>{
      thrust::raw_pointer_cast(swap_pairs.data()), vector_size, false});

  auto left_perm  = thrust::make_permutation_iterator(matrix.data(), left_index);
  auto right_perm = thrust::make_permutation_iterator(matrix.data(), right_index);
  auto in_zip     = thrust::make_zip_iterator(left_perm, right_perm);
  auto out_zip    = thrust::make_zip_iterator(left_perm, right_perm);

  cub::DeviceTransform::Transform(
    in_zip,
    out_zip,
    total_items,
    [] HD(thrust::tuple<f_t, f_t> values) -> thrust::tuple<f_t, f_t> {
      return thrust::make_tuple(thrust::get<1>(values), thrust::get<0>(values));
    },
    matrix.stream());
}
🧩 Analysis chain
🌐 Web query:
cub::DeviceTransform in-place transformation same input output iterators documentation
💡 Result:
CUB’s cub::DeviceTransform::Transform does not document whether the output iterator is allowed to alias (be the same as) any input iterator (i.e., in-place transform). The public header/API listing shows the overloads and parameters but includes no “aliasing/overlap” guarantee or prohibition. [2]
If you need a documented guarantee for in-place transformation, thrust::transform explicitly states: “The input and output sequences may coincide, resulting in an in-place transformation.” [1]
Practical takeaway:
- Need guaranteed in-place: use thrust::transform (documented). [1]
- Using cub::DeviceTransform in-place: it may work, but it's not specified in the docs, so don't rely on it without confirmation/testing (or use a separate output buffer). [2]
Sources: [1] [2]
Use thrust::transform instead of cub::DeviceTransform for in-place transformations.
cub::DeviceTransform::Transform does not document support for in-place operations (input and output iterators aliasing the same memory). Replace with thrust::transform, which explicitly documents: "The input and output sequences may coincide, resulting in an in-place transformation." This also aligns with the guideline to prefer standard library utilities (Thrust) over non-standard alternatives.
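A sketch of the swap launch rewritten with thrust::transform, reusing in_zip/out_zip/total_items from matrix_swap above (the lambda is unchanged; only the dispatch differs):

// Also #include <thrust/transform.h> and <thrust/execution_policy.h>.
// thrust::transform documents that input and output may coincide, so the
// aliased permutation iterators are safe here.
thrust::transform(thrust::cuda::par.on(matrix.stream()),
                  in_zip,
                  in_zip + total_items,
                  out_zip,
                  [] __device__(thrust::tuple<f_t, f_t> values) {
                    return thrust::make_tuple(thrust::get<1>(values),
                                              thrust::get<0>(values));
                  });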
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/swap_and_resize_helper.cuh` around lines 48 - 84,
The current in-place swap in matrix_swap uses cub::DeviceTransform::Transform
with aliased input/output iterators (in_zip and out_zip), which is unsafe;
replace that call with thrust::transform to perform the element-wise swap
in-place (thrust documents input/output may coincide). Locate matrix_swap and
the transform invocation that uses cub::DeviceTransform::Transform, and call
thrust::transform with the same in_zip, a counting end (in_zip + total_items)
and out_zip, passing the existing lambda (or equivalent functor) that swaps the
tuple elements; ensure headers no longer require cub for this operation and keep
the matrix_swap_index_functor and swap_pair_t usage unchanged.
template <typename i_t, typename f_t>
void convergence_information_t<i_t, f_t>::resize_context(i_t new_size)
{
  [[maybe_unused]] const auto batch_size = static_cast<i_t>(primal_objective_.size());
  cuopt_assert(batch_size > 0, "Batch size must be greater than 0");
  cuopt_assert(new_size > 0, "New size must be greater than 0");
  cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size");
Allow no-op resize when new_size == batch_size.
The assert currently rejects equality despite the message, which can fail callers that attempt a no-op resize.
🔧 Suggested fix
- cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size");
+ cuopt_assert(new_size <= batch_size, "New size must be less than or equal to batch size");🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/termination_strategy/convergence_information.cu`
around lines 249 - 256, The cuopt_assert in convergence_information_t<i_t,
f_t>::resize_context incorrectly uses new_size < batch_size which forbids
new_size == batch_size despite the error message; change the assertion to allow
equality (use new_size <= batch_size) so callers can perform a no-op resize, and
keep the existing message (or update it to match) while referencing
primal_objective_.size() and cuopt_assert for locating the check.
// TODO ask Akif and Alice how was this passed down?
auto hyper_params = settings.hyper_params;
hyper_params.update_primal_weight_on_initial_solution = false;
hyper_params.update_step_size_on_initial_solution = true;
// if the input problem is empty: early exit
🧩 Analysis chain
🏁 Script executed:
rg -n -C2 'update_primal_weight_on_initial_solution|update_step_size_on_initial_solution' -g'*.cu' -g'*.cpp' -g'*.cuh' -g'*.hpp'
🏁 Script executed:
# First, let's see the file structure
git ls-files cpp/src/mip/ | head -20
🏁 Script executed:
# Read the context around lines 61-65 in solve.cu with more lines for context
sed -n '40,120p' cpp/src/mip/solve.cu
🏁 Script executed:
rg -n 'update_primal_weight_on_initial_solution' --type cpp --type cu --type cuh --type hpp
🏁 Script executed:
rg -n 'update_step_size_on_initial_solution' --type cpp --type cu --type cuh --type hpp
🏁 Script executed:
# Check the scaling strategy implementation to see if it reads these flags
fd -e cu -e cpp -e cuh -e hpp | xargs rg -l 'pdlp_initial_scaling_strategy' | head -5
🏁 Script executed:
# Look for where initial_solutions are used in the solver
rg -n 'initial_solutions' cpp/src/mip/solve.cu -B2 -A2
🏁 Script executed:
rg -n 'update_primal_weight_on_initial_solution' -g '*.cpp' -g '*.cu' -g '*.cuh' -g '*.hpp'
🏁 Script executed:
rg -n 'update_step_size_on_initial_solution' -g '*.cpp' -g '*.cu' -g '*.cuh' -g '*.hpp'
🏁 Script executed:
# Search for pdlp_initial_scaling_strategy and read its constructor to see if it uses these flags
rg -n 'pdlp_initial_scaling_strategy' -g '*.cu' -g '*.cpp' -g '*.cuh' -g '*.hpp' | head -10
🏁 Script executed:
# Look at where these flags are consumed in pdlp.cu
sed -n '1200,1210p' cpp/src/linear_programming/pdlp.cu
🏁 Script executed:
sed -n '1295,1305p' cpp/src/linear_programming/pdlp.cu
🏁 Script executed:
# Check what solver_context.cuh shows about how settings flow
sed -n '25,60p' cpp/src/mip/solver_context.cuh
🏁 Script executed:
# Check how the solver is instantiated in solve.cu after scaling is created
sed -n '110,160p' cpp/src/mip/solve.cu
🏁 Script executed:
# Check how mip_solver_t is instantiated and what it does with settings
sed -n '1,80p' cpp/src/mip/solver.cu
🏁 Script executed:
# Verify how the PDLP solver receives settings from solver context
rg -n 'settings_.*hyper_params' cpp/src/mip/solver.cu cpp/src/mip/solver.cuh | head -20
🏁 Script executed:
# Check if there's any way the modified hyper_params in solve.cu flows to PDLP
rg -n 'mip_solver_t.*settings' cpp/src/mip/solve.cu -B2 -A2
🏁 Script executed:
# Check if pdlp_initial_scaling_strategy_t constructor stores or uses these flags
rg -n 'pdlp_initial_scaling_strategy_t.*hyper_params' cpp/src/linear_programming/ -B2 -A10 | head -60
🏁 Script executed:
# Search for the constructor implementation
fd -name '*.cuh' -o -name '*.cu' | xargs rg -l 'class pdlp_initial_scaling_strategy_t' 2>/dev/null
🏁 Script executed:
# Let's check the declaration and see what parameters it stores
rg -n 'class pdlp_initial_scaling_strategy_t|struct pdlp_initial_scaling_strategy_t' -A20
🏁 Script executed:
# Read the full pdlp_initial_scaling_strategy_t class definition
sed -n '23,150p' cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cuh
🏁 Script executed:
# Check the constructor of pdlp_initial_scaling_strategy_t
rg -n 'pdlp_initial_scaling_strategy_t.*(' cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cuh -A15 | head -80
🏁 Script executed:
# Check if pdlp_initial_scaling_strategy_t stores hyper_params
rg -n 'hyper_params' cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cuh
🏁 Script executed:
# Now check if PDLP gets hyper_params from settings or from somewhere else
sed -n '1295,1310p' cpp/src/linear_programming/pdlp.cu
🏁 Script executed:
# Let's check how the PDLP solver is instantiated and where it gets its settings
rg -n 'pdlp_solver_t.*settings' cpp/src/mip/ -B2 -A2
🏁 Script executed:
# Check if settings are ever modified or passed differently to PDLP in run_solver
sed -n '200,300p' cpp/src/mip/solver.cu
🏁 Script executed:
# Verify if hyper_params from scaling strategy is used anywhere
rg -n 'hyper_params_' cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu | head -20
🏁 Script executed:
# Final verification: Check how PDLP is instantiated in the MIP solver and what settings it receives
rg -n 'pdlp_solver_t' cpp/src/mip/ -B2 -A2
🏁 Script executed:
# Check if there's any path where modified hyper_params could reach PDLP from scaling strategy
rg -n 'get_scaled_op_problem\|scaling\.hyper_params' cpp/src/mip/ -B1 -A1
Clarify and fix scope of MIP-specific hyper‑parameter overrides.
Lines 61–65: You clone settings.hyper_params and flip two flags (update_primal_weight_on_initial_solution, update_step_size_on_initial_solution), but this local copy is only passed to the scaling strategy. The original settings object (unmodified) is what gets passed to mip_solver_t and eventually to PDLP. These flags are consumed by PDLP (pdlp.cu:1203, 1298) from settings_.hyper_params, so your local modifications won't propagate. Either thread the modified hyper_params through to the solver, or confirm these flags are MIP-only and only intended for the scaling phase. The TODO comment suggests this scope was already unclear to the team.
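A minimal sketch of the first option (threading the overrides through by assigning them back), under the names used above; the surrounding code is elided:

// Apply the MIP-specific overrides to the settings object itself so both the
// scaling strategy and the PDLP solver constructed downstream observe them.
auto hyper_params = settings.hyper_params;
hyper_params.update_primal_weight_on_initial_solution = false;
hyper_params.update_step_size_on_initial_solution     = true;
settings.hyper_params = hyper_params;  // assign back; a local copy alone is invisible to PDLP
// ... construct the scaling strategy and mip_solver_t from `settings` as before ...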
🤖 Prompt for AI Agents
In `@cpp/src/mip/solve.cu` around lines 61 - 65, You cloned settings.hyper_params
and changed two flags but never propagated that modified copy into the solver,
so PDLP (consuming settings_.hyper_params in mip_solver_t -> PDLP) still sees
the original values; either (A) update the settings object itself before
constructing mip_solver_t (assign the modified hyper_params back to
settings.hyper_params so mip_solver_t/PDLP use the new flags), or (B) if these
flags are truly only for the scaling strategy, add a comment and ensure only the
scaling code receives the local hyper_params (and do not expect PDLP to observe
them). Locate the code paths around the modified hyper_params, the callsite that
constructs mip_solver_t, and PDLP usage (pdlp.cu references) to implement option
A (assign back) or B (document and keep local), ensuring the flags are
consistently applied where consumed.
📝 Walkthrough
This pull request refactors cuOpt's linear programming solver to transition from static/global hyperparameters to instance-based configurations and introduces batch processing capabilities. Changes include replacing global parameter declarations with a struct-based approach, adding multi-climber strategy support for parallel PDLP solving, threading parameters through solver components, and implementing batch context management utilities.
Changes
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 17
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (9)
cpp/src/linear_programming/utilities/problem_checking.cu (1)
68-81: Do not globally disable initial primal bounds validation.Commenting out the check affects all solver paths, not just batch PDLP warm starts. That can let non-batch warm starts violate bounds, risking solver instability or incorrect outcomes. Please gate this check behind an explicit “allow out-of-bounds initial primal” flag (or batch-PDLP mode) and keep validation for all other cases. As per coding guidelines, initialization of bounds and solver state must be validated before solve.
cpp/src/mip/diversity/diversity_manager.cu (1)
90-101: Incomplete configuration:config_idread from environment but never used.The code reads
CUOPT_CONFIG_IDfrom the environment variable and logs it, butconfig_idis never passed to any configuration logic. Theget_local_search_and_lm_from_configfunction inmulti_armed_bandit.cuhaccepts aconfig_idparameter to configure local search behavior, but inrun_local_search(line 113) it's called withls_mab_optioninstead. Either complete the implementation to apply the configuration, or remove this unused code block.cpp/src/dual_simplex/simplex_solver_settings.hpp (1)
42-96: Initialize mip_batch_pdlp_strong_branching.The new field is not initialized in the constructor, leaving it indeterminate. Default it to 0 to avoid unpredictable strong-branching selection.
🐛 Proposed fix
num_threads(omp_get_max_threads() - 1), num_bfs_workers(std::max(num_threads / 4, 1)), + mip_batch_pdlp_strong_branching(0), random_seed(0), inside_mip(0),Also applies to: 158-159
cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu (1)
64-95: Add CUDA error checking to kernel launch.The call site in
pdlp_restart_strategy.cuexplicitly asserts thatweight.size() == 1, so the weight vector is constrained to a single element by design—batch-mode indexing is not a concern. However, theadd_weight_sumskernel launch on line 92 is missing error checking. Per the coding guidelines, all CUDA kernel launches must have error checking withCUDA_CHECKor equivalent (RAFT_CUDA_TRY).Suggested fix
- add_weight_sums<<<1, 1, 0, stream_view_>>>(weight.data(), - weight.data(), - sum_primal_solution_weights_.data(), - sum_dual_solution_weights_.data()); + RAFT_CUDA_TRY(add_weight_sums<<<1, 1, 0, stream_view_>>>(weight.data(), + weight.data(), + sum_primal_solution_weights_.data(), + sum_dual_solution_weights_.data()));cpp/include/cuopt/linear_programming/mip/solver_settings.hpp (1)
86-101: Public struct layout change can break ABI for binary clients.Adding
mip_batch_pdlp_strong_branching(Line 86) andhyper_params(Line 101) changes the size/offsets ofmip_solver_settings_t. If ABI stability is expected, consider a versioned struct or PIMPL/accessors, or bump the ABI/SONAME and document the change. As per coding guidelines, public API changes should be handled explicitly for compatibility.cpp/src/linear_programming/saddle_point.cu (1)
95-106: Copy should cover all batches.The buffers are now batched, but
copy()only copiesprimal_size_/dual_size_. That leaves other batches stale.🔧 Proposed fix
- raft::copy( - this->primal_solution_.data(), other.get_primal_solution().data(), this->primal_size_, stream); - raft::copy( - this->dual_solution_.data(), other.get_dual_solution().data(), this->dual_size_, stream); + EXE_CUOPT_EXPECTS(this->primal_solution_.size() == other.get_primal_solution().size(), + "Primal solution sizes must match for batched copy"); + EXE_CUOPT_EXPECTS(this->dual_solution_.size() == other.get_dual_solution().size(), + "Dual solution sizes must match for batched copy"); + raft::copy(this->primal_solution_.data(), + other.get_primal_solution().data(), + this->primal_solution_.size(), + stream); + raft::copy(this->dual_solution_.data(), + other.get_dual_solution().data(), + this->dual_solution_.size(), + stream);cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp (1)
120-184: Public TODOs in the API surface.The new setters/getters are still marked “TODO batch mode: tmp.” Please finalize semantics or track via an issue before release. Happy to help.
Also applies to: 249-254
cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu (1)
439-543: Add error checking aftercub::DeviceTransform::Transformcalls using the post-call pattern.The
cub::DeviceTransform::Transformlaunches lack error checking. SinceTransformreturnsvoid(unlikeDeviceReduceoperations that returncudaError_t), error checking must be done viaRAFT_CUDA_TRY(cudaPeekAtLastError())immediately after each call, following the same pattern used for kernel launches in this file.🐛 Correct pattern
cub::DeviceTransform::Transform( cuda::std::make_tuple(op_problem_scaled_.objective_coefficients.data(), problem_wrap_container(cummulative_variable_scaling_)), op_problem_scaled_.objective_coefficients.data(), op_problem_scaled_.objective_coefficients.size(), cuda::std::multiplies<f_t>{}, stream_view_); + RAFT_CUDA_TRY(cudaPeekAtLastError());Per CUDA safety guidelines, all device operations must have error checking. This applies to all Transform calls at lines 439–543, 577–648, and 697–768.
cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu (1)
438-449: Add CUDA error checking after kernel launch.The
kernel_compute_kkt_scorekernel launch is missing error checking. Per coding guidelines, all CUDA kernel launches must be followed by error checking. The codebase consistently usesRAFT_CUDA_TRY(cudaPeekAtLastError())for this purpose—see other similar launches in the same file for the standard pattern.🔧 Proposed fix
kernel_compute_kkt_score<f_t><<<1, 1, 0, stream_view_>>>(l2_primal_residual.data(), l2_dual_residual.data(), gap.data(), primal_weight.data(), tmp_kkt_score_.data()); + RAFT_CUDA_TRY(cudaPeekAtLastError()); return tmp_kkt_score_.value(stream_view_);
🤖 Fix all issues with AI agents
In `@cpp/src/dual_simplex/pseudo_costs.cpp`:
- Around line 162-210: The batch PDLP path incorrectly uses
uncrush_primal_solution and original_root_soln_x while fractional[] is in
LP-space; remove the call to uncrush_primal_solution and build fraction_values
from the LP-space root_soln (use root_soln[j] for j = fractional[k]) so the
indices align with fractional, and pass those LP-space values into
batch_pdlp_solve (keep batch_pdlp_solve usage unchanged); update any uses of
original_root_soln_x later in this block to use root_soln instead.
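A sketch of the index-space fix, under the names used in the prompt (root_soln and fractional are both in LP space):

// Build fraction_values directly from the LP-space root solution; no uncrush,
// so the indices stay aligned with fractional[].
std::vector<f_t> fraction_values;
fraction_values.reserve(fractional.size());
for (size_t k = 0; k < fractional.size(); ++k) {
  fraction_values.push_back(root_soln[fractional[k]]);
}
// pass the LP-space values to batch_pdlp_solve as before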
In `@cpp/src/linear_programming/cusparse_view.cu`:
- Around line 122-129: The move-assignment for
cusparse_dn_mat_descr_wrapper_t<f_t>::operator= fails to transfer ownership
correctly: after assigning descr_ from other.descr_ the need_destruction_ flag
is not updated, risking leaks or double-destruction; update this operator to set
this->need_destruction_ = other.need_destruction_ (then set
other.need_destruction_ = false) so ownership of the descriptor is transferred
atomically (and optionally handle self-assignment if desired).
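A sketch of the ownership transfer being asked for; releasing any descriptor already held is an added assumption beyond the prompt:

cusparse_dn_mat_descr_wrapper_t& operator=(cusparse_dn_mat_descr_wrapper_t&& other) noexcept
{
  if (this != &other) {                                       // self-assignment guard
    if (need_destruction_) { cusparseDestroyDnMat(descr_); }  // release current descriptor
    descr_                  = other.descr_;
    need_destruction_       = other.need_destruction_;        // take over ownership
    other.need_destruction_ = false;                          // moved-from side must not destroy
  }
  return *this;
}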
- Around line 238-264: The my_cusparsespmm_preprocess function must mirror the
SpMV dynamic-loading pattern: use dynamic_load_runtime::function to load the
symbol "cusparseSpMM_preprocess" via dlsym, check the returned wrapper's
has_value() before calling it, and if the symbol is missing gracefully skip the
preprocess step (return CUSPARSE_STATUS_SUCCESS or another agreed error code)
instead of calling the symbol directly; retain the existing float_type
computation, CUSPARSE_CHECK(cusparseSetStream(...)), and call the loaded
function through the wrapper when available.
In
`@cpp/src/linear_programming/optimal_batch_size_handler/optimal_batch_size_handler.cu`:
- Around line 186-209: Add a precondition guard in optimal_batch_size_handler to
prevent non-positive max_batch_size from reaching the log2/pow2 computation: if
max_batch_size <= 1 return 1 (same behavior as the existing ==1 early return) or
clamp the value before computing current_batch_size by computing int cap =
std::max(1, std::min(initial_batch_size, max_batch_size)); then use cap in the
std::log2/std::pow expression to ensure current_batch_size is never 0 and avoid
divide-by-zero in evaluate_node.
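A sketch of the guard, using the names from the prompt (initial_batch_size is the handler's starting value):

#include <algorithm>
#include <cmath>

// Clamp before the log2/pow2 rounding so the result is always in [1, max_batch_size].
int clamped_pow2_batch_size(int initial_batch_size, int max_batch_size)
{
  if (max_batch_size <= 1) { return 1; }  // covers the non-positive case too
  const int cap = std::max(1, std::min(initial_batch_size, max_batch_size));
  return static_cast<int>(std::pow(2.0, std::floor(std::log2(static_cast<double>(cap)))));
}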
- Around line 17-140: The constructor SpMM_benchmarks_context_t creates local
variables buffer_transpose_batch, buffer_non_transpose_batch, x_descr and
y_descr that shadow class members, causing the captured CUDA graph to reference
destroyed objects (UAF). Fix by removing the local shadowing: initialize and use
the member descriptors and member rmm::device_buffer fields (the class members
named buffer_transpose_batch, buffer_non_transpose_batch, x_descr, y_descr)
instead of creating local ones, allocate the member buffers with buffer_size_*
before preprocessing and use member_buffer.data() and the member descriptors in
my_cusparsespmm_preprocess and cusparsespmm calls so the objects remain alive
for the graph capture and launch.
In `@cpp/src/linear_programming/pdhg.cu`:
- Around line 756-773: The new kernel launches (e.g., inside
refine_initial_primal_projection calling refine_initial_primal_projection_kernel
with stream_view_) currently omit CUDA error checks; add an explicit post-launch
check (cudaPeekAtLastError() or cudaGetLastError wrapped in your project’s
CUDA_CHECK/RAFT_CUDA_TRY macro) immediately after the <<<>>> launch for
non-graph paths, and for graph-captured execution route use the appropriate
graph-launch error check path, so that any kernel launch failure is
logged/propagated consistently; apply the same fix to the other refine_* launch
sites mentioned (the launches around the other refine_initial... blocks).
- Around line 97-109: The code copies new_bounds into device buffers without
validating size or indices, risking OOB in the refine kernels; in the
constructor validate that new_bounds.size() equals the expected batch size,
iterate new_bounds to ensure each tuple's index (std::get<0>) is within [0,
num_variables) and that lower <= upper (and finite) for std::get<1>/std::get<2>,
and fail-fast (throw or set error) before calling raft::copy into
new_bounds_idx_, new_bounds_lower_, new_bounds_upper_; perform these checks
using the same types (i_t, f_t) and the members
new_bounds_idx_/new_bounds_lower_/new_bounds_upper_ so any mismatch is caught
prior to launching kernels.
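A sketch of the fail-fast check, assuming new_bounds holds (index, lower, upper) tuples as the std::get<0..2> references imply:

#include <cmath>
#include <stdexcept>
#include <tuple>
#include <vector>

template <typename i_t, typename f_t>
void validate_new_bounds(const std::vector<std::tuple<i_t, f_t, f_t>>& new_bounds,
                         i_t num_variables)
{
  for (const auto& [idx, lower, upper] : new_bounds) {
    if (idx < 0 || idx >= num_variables)
      throw std::invalid_argument("new_bounds: variable index out of range");
    if (!std::isfinite(lower) || !std::isfinite(upper) || lower > upper)
      throw std::invalid_argument("new_bounds: invalid lower/upper pair");
  }
}
// Call before the raft::copy into new_bounds_idx_/new_bounds_lower_/new_bounds_upper_.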
In `@cpp/src/linear_programming/pdlp_constants.hpp`:
- Around line 37-39: The two variables deterministic_batch_pdlp and
enable_batch_resizing are declared as static non-const globals in a header,
which creates per-translation-unit copies; change their definition to a single
shared symbol: either (A) move the definitions into a single .cpp and expose
extern bool deterministic_batch_pdlp; extern bool enable_batch_resizing; in the
header, or (B) if they are truly compile-time constants, replace them with
inline constexpr bool deterministic_batch_pdlp = true; inline constexpr bool
enable_batch_resizing = true; — pick the approach that matches intended
mutability and update any code that sets/reads these flags accordingly (or pass
them via a settings struct/singleton used by the solver).
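Both options side by side, as a sketch against pdlp_constants.hpp:

// Option B: truly compile-time constants — one ODR-safe definition in the header.
inline constexpr bool deterministic_batch_pdlp = true;
inline constexpr bool enable_batch_resizing    = true;

// Option A: runtime-mutable flags — declare in the header ...
// extern bool deterministic_batch_pdlp;
// extern bool enable_batch_resizing;
// ... and define exactly once in a single .cpp:
// bool deterministic_batch_pdlp = true;
// bool enable_batch_resizing    = true;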
- Around line 17-22: kernel_config_from_batch_size currently allows
block_size==0 when batch_size==0 (causing cuda::ceil_div(0,0)) and shadows the
namespace-scope block_size constant; fix by ensuring the local block size is at
least 1 (e.g., size_t local_block_size = std::min(static_cast<size_t>(256),
std::max(static_cast<size_t>(1), batch_size))) so ceil_div never divides by
zero, compute grid_size = cuda::ceil_div(batch_size, local_block_size) and
return {grid_size, local_block_size}, and rename the local variable (e.g.,
local_block_size) to avoid shadowing the namespace-scope block_size constant
used elsewhere.
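A sketch of the corrected helper; the {grid, block} pair layout and the 256 cap follow the original:

#include <cuda/cmath>  // cuda::ceil_div
#include <algorithm>
#include <cstddef>
#include <utility>

inline std::pair<std::size_t, std::size_t> kernel_config_from_batch_size(std::size_t batch_size)
{
  // Renamed to avoid shadowing the namespace-scope block_size constant.
  const std::size_t local_block_size =
    std::min<std::size_t>(256, std::max<std::size_t>(1, batch_size));  // never 0
  const std::size_t grid_size = cuda::ceil_div(batch_size, local_block_size);
  return {grid_size, local_block_size};
}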
In `@cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu`:
- Around line 782-784: The copy is reversed: it currently writes stale
last_trial_fixed_point_error_ into fixed_point_error_ so the history never
advances; change the copy to update last_trial_fixed_point_error_ from the
current fixed_point_error_ by copying from fixed_point_error_.begin()/end() into
last_trial_fixed_point_error_.begin(), ensuring the history array
(last_trial_fixed_point_error_) receives the newest values.
In `@cpp/src/linear_programming/solve.cu`:
- Around line 708-716: The code constructs an empty std::vector named info of
type optimization_problem_solution_t<i_t,
f_t>::additional_termination_information_t and then accesses info[0], causing
UB; fix by initializing or appending an element before writing fields—e.g.,
resize info to size 1 (info.resize(1)) or construct a local
additional_termination_information_t, fill its members (primal_objective,
number_of_steps_taken, solve_time using vertex_solution and start_solver), and
push_back/emplace_back it into info so info[0] is valid when accessed.
- Around line 750-783: The warm-start batch path declares initial_primal and
initial_dual as rmm::device_uvector<double>, which mismatches templated f_t
instantiations; change both to rmm::device_uvector<f_t> (or the appropriate
typedef alias used for the solver element type) and construct them from
original_solution.get_primal_solution()/get_dual_solution() so the
device_uvector element type matches f_t; keep initial_step_size and
initial_primal_weight as f_t and preserve the use of the original solution
streams when constructing the device_uvectors.
- Around line 556-563: The row-sense to bound mapping is inverted: in the loop
over i (using user_problem.row_sense) assign constraint_lower[i] =
user_problem.rhs[i] and constraint_upper[i] =
std::numeric_limits<f_t>::infinity() when row_sense == 'G' (>=), and assign
constraint_lower[i] = -std::numeric_limits<f_t>::infinity() and
constraint_upper[i] = user_problem.rhs[i] when row_sense == 'L' (<=); update the
block around the for-loop that sets constraint_lower/constraint_upper so 'G'
maps to a lower bound and 'L' maps to an upper bound (use the existing symbols
user_problem, constraint_lower, constraint_upper, f_t, and row_sense).
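A fragment of the corrected mapping under the prompt's names (n_rows and the 'E' branch are assumptions):

#include <limits>

for (i_t i = 0; i < n_rows; ++i) {
  if (user_problem.row_sense[i] == 'G') {        // a >= row: rhs is a lower bound
    constraint_lower[i] = user_problem.rhs[i];
    constraint_upper[i] = std::numeric_limits<f_t>::infinity();
  } else if (user_problem.row_sense[i] == 'L') { // a <= row: rhs is an upper bound
    constraint_lower[i] = -std::numeric_limits<f_t>::infinity();
    constraint_upper[i] = user_problem.rhs[i];
  } else {                                       // 'E': equality row
    constraint_lower[i] = constraint_upper[i] = user_problem.rhs[i];
  }
}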
In
`@cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu`:
- Around line 537-545: The kernel launch in
adaptive_step_size_strategy_t::get_primal_and_dual_stepsizes can OOB if
primal_step_size or dual_step_size are smaller than climber_strategies_.size();
add explicit size validation before launching compute_actual_stepsizes: compare
primal_step_size.size() and dual_step_size.size() against
climber_strategies_.size() and either resize the rmm::device_uvector buffers or
return/throw a clear error (with context) if they are too small, so the kernel
only writes within bounds using the existing stream_view_ and
kernel_config_from_batch_size logic.
- Around line 71-107: The three cub::DeviceSegmentedReduce::Sum calls in the
batch-mode block (the ones using tuple_multiplies<f_t>, power_two_func_t<f_t> on
norm_squared_delta_primal_ and norm_squared_delta_dual_) must be wrapped with
RAFT_CUDA_TRY so their returned cudaError_t is checked (this includes the
nullptr sizing calls); replace direct calls with
RAFT_CUDA_TRY(cub::DeviceSegmentedReduce::Sum(...)) for each invocation and
likewise wrap the other three CUB Sum calls elsewhere (the other location noted
in the review) so failures propagate; keep existing local variables
(dot_product_bytes, dot_product_storage.resize, stream_view_, primal_size_,
dual_size_, climber_strategies_.size()) unchanged.
In `@cpp/src/linear_programming/swap_and_resize_helper.cuh`:
- Around line 49-66: In matrix_swap, avoid dividing matrix.size() by vector_size
before validating vector_size and ensure the matrix length is an exact multiple
of vector_size; first assert or return if vector_size == 0 (check vector_size >
0 before computing batch_size), compute batch_size only after that check (use
matrix.size() / vector_size into batch_size), and add a guard that matrix.size()
% vector_size == 0 (or log/throw) to prevent mismatched sizing; update the
function matrix_swap and any use of batch_size to rely on these validated
values.
- Around line 76-83: The CUB transform call using
cub::DeviceTransform::Transform (with in_zip, out_zip, total_items, lambda, and
matrix.stream()) is missing CUDA error checking; wrap the call with
RAFT_CUDA_TRY so the returned cudaError_t is checked and propagated (e.g.,
RAFT_CUDA_TRY(cub::DeviceTransform::Transform(...))); ensure you reference the
same parameters (in_zip, out_zip, total_items, matrix.stream()) and keep the
lambda as-is while adding the RAFT_CUDA_TRY wrapper to satisfy the coding
guideline requiring checks on all CUDA operations.
🟡 Minor comments (16)
cpp/src/mip/diversity/recombiners/sub_mip.cuh-72-74 (1)
72-74: Resolve or track the TODO before merge.The TODO signals uncertainty in this recombination path. Please confirm the intended behavior and either remove the TODO with a clarified comment or open a tracked issue.
cpp/src/mip/solve.cu-61-64 (1)
61-64: TODO comment should be resolved before merge.The TODO comment asks about how hyper-parameters were previously passed down. This should be resolved or removed before merging. The current implementation correctly copies
settings.hyper_paramsand applies MIP-specific overrides.Suggested action
Either:
- Remove the TODO after confirming the implementation is correct with Akif and Alice
- Add a brief comment explaining the MIP-specific overrides:
- // TODO ask Akif and Alice how was this passed down? - auto hyper_params = settings.hyper_params; + // Copy hyper_params from settings and apply MIP-specific overrides + auto hyper_params = settings.hyper_params; hyper_params.update_primal_weight_on_initial_solution = false; hyper_params.update_step_size_on_initial_solution = true;cpp/include/cuopt/linear_programming/utilities/segmented_sum_handler.cuh-28-31 (1)
28-31: Default constructor leavesstream_view_in an invalid state.The default constructor on line 31 leaves
stream_view_uninitialized. If any method is called on an object constructed with the default constructor, it will use an invalid stream, leading to undefined behavior. Consider either:
- Removing the default constructor if "non-batch mode" should not use this class
- Initializing
stream_view_to a safe default (e.g.,rmm::cuda_stream_view{}for the default stream)Suggested fix
// Empty constructor for when used in non batch mode - segmented_sum_handler_t() {} + segmented_sum_handler_t() : stream_view_(rmm::cuda_stream_default) {}cpp/include/cuopt/linear_programming/constants.h-55-55 (1)
55-55: Document the new public parameter string.Line 55 adds a new public parameter (
CUOPT_MIP_BATCH_PDLP_STRONG_BRANCHING); please update user-facing docs/parameter tables and any CLI/config references accordingly. As per coding guidelines, ensure public API changes are documented.benchmarks/linear_programming/cuopt/run_pdlp.cu-49-52 (1)
49-52: CLI help text is out of sync with the allowed modes.Line 50 still lists only Stable3/Methodical1/Fast1 even though Stable1/Stable2 are now accepted in Line 52.
✅ Suggested update
- program.add_argument("--pdlp-solver-mode") - .help("Solver mode for PDLP. Possible values: Stable3 (default), Methodical1, Fast1") + program.add_argument("--pdlp-solver-mode") + .help("Solver mode for PDLP. Possible values: Stable3 (default), Stable2, Stable1, Methodical1, Fast1")cpp/src/linear_programming/saddle_point.hpp-64-67 (1)
64-67: Document the new batch_size parameter and context APIs.The constructor doc omits
batch_size, and the newswap_context/resize_contextmethods lack brief Doxygen notes. A small doc update would prevent ambiguity.📝 Documentation touch-up
* `@param` primal_size The size of the primal problem * `@param` dual_size The size of the dual problem + * `@param` batch_size Number of batch contexts to allocate * * `@throws` cuopt::logic_error if the problem sizes are not larger than 0. */ saddle_point_state_t(raft::handle_t const* handle_ptr, i_t primal_size, i_t dual_size, size_t batch_size); + /** + * `@brief` Swap internal buffers between batch contexts. + * `@param` swap_pairs Pairs of context indices to swap. + */ void swap_context(const thrust::universal_host_pinned_vector<swap_pair_t<i_t>>& swap_pairs); + /** + * `@brief` Resize internal buffers to a new batch size. + * `@param` new_size New batch size. + */ void resize_context(i_t new_size);Also applies to: 94-95
cpp/include/cuopt/linear_programming/solve.hpp-77-104 (1)
77-104: Tighten public docs (typos + output clarity).The new Doxygen block contains typos (“accross”, “thie”), and the output ordering for the returned vector isn’t described. Also consider a brief note on thread-safety / stream expectations for this public API.
✏️ Doc typo fixes
- * The only difference accross climbers will be one variable bound change. + * The only difference across climbers is a single variable bound change. @@ - * Let the optimal objective value of thie problem be obj_down + * Let the optimal objective value of this problem be obj_downAs per coding guidelines, public headers should document API behavior, including thread-safety and parameter/return expectations.
Also applies to: 112-117
cpp/src/linear_programming/solver_settings.cu-61-71 (1)
61-71: Validate optional initial step/primal weight inputs.Without guarding, negative or zero values can silently flow into the solver. Consider rejecting non-positive values on write.
✅ Add basic input validation
void pdlp_solver_settings_t<i_t, f_t>::set_initial_step_size(f_t initial_step_size) { + cuopt_expects(initial_step_size > 0, + error_type_t::ValidationError, + "initial_step_size must be > 0"); initial_step_size_ = std::make_optional(initial_step_size); } void pdlp_solver_settings_t<i_t, f_t>::set_initial_primal_weight(f_t initial_primal_weight) { + cuopt_expects(initial_primal_weight > 0, + error_type_t::ValidationError, + "initial_primal_weight must be > 0"); initial_primal_weight_ = std::make_optional(initial_primal_weight); }cpp/src/math_optimization/solver_settings.cu-74-75 (1)
74-75: Infeasibility tolerance defaults may be too strict for float builds.Line 74–75 set defaults to
1e-10, which is below float precision and can make infeasibility detection overly strict whenf_t=float. Consider making the default precision-aware (or documenting that these defaults assume double).🛠️ Precision-aware default suggestion
+ constexpr f_t infeas_tol_default = + std::is_same_v<f_t, float> ? static_cast<f_t>(1e-6) : static_cast<f_t>(1e-10); float_parameters = { ... - {CUOPT_PRIMAL_INFEASIBLE_TOLERANCE, &pdlp_settings.tolerances.primal_infeasible_tolerance, 0.0, 1e-1, 1e-10}, - {CUOPT_DUAL_INFEASIBLE_TOLERANCE, &pdlp_settings.tolerances.dual_infeasible_tolerance, 0.0, 1e-1, 1e-10} + {CUOPT_PRIMAL_INFEASIBLE_TOLERANCE, &pdlp_settings.tolerances.primal_infeasible_tolerance, 0.0, 1e-1, infeas_tol_default}, + {CUOPT_DUAL_INFEASIBLE_TOLERANCE, &pdlp_settings.tolerances.dual_infeasible_tolerance, 0.0, 1e-1, infeas_tol_default} };Based on learnings, guard against overly strict tolerances on edge cases.
cpp/src/linear_programming/saddle_point.cu-77-92 (1)
77-92: Allow no‑op resize or fix the assertion message.The assertion uses
<but the message says “less than or equal.” If no‑op resize is allowed, switch to<=; otherwise update the message.🔧 Proposed fix (if no‑op resize is OK)
- cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size"); + cuopt_assert(new_size <= batch_size, "New size must be less than or equal to batch size");cpp/src/linear_programming/swap_and_resize_helper.cuh-86-95 (1)
86-95: Validate non‑negative swap indices.Negative indices can pass the
< size()checks and still index OOB.🔧 Proposed fix
- cuopt_assert(left_swap_index < host_vector.size(), "Left swap index is out of bounds"); - cuopt_assert(right_swap_index < host_vector.size(), "Right swap index is out of bounds"); + cuopt_assert(left_swap_index >= 0 && right_swap_index >= 0, "Swap indices must be non-negative"); + cuopt_assert(left_swap_index < host_vector.size(), "Left swap index is out of bounds"); + cuopt_assert(right_swap_index < host_vector.size(), "Right swap index is out of bounds");cpp/src/linear_programming/saddle_point.cu-20-34 (1)
20-34: Validatebatch_sizein the constructor.Downstream code assumes a positive batch size; fail fast if it’s zero.
🔧 Proposed fix
EXE_CUOPT_EXPECTS(primal_size > 0, "Size of the primal problem must be larger than 0"); EXE_CUOPT_EXPECTS(dual_size > 0, "Size of the dual problem must be larger than 0"); + EXE_CUOPT_EXPECTS(batch_size > 0, "Batch size must be larger than 0");cpp/src/linear_programming/restart_strategy/localized_duality_gap_container.cu-110-115 (1)
110-115: Assertion rejects resizing to current size.The message says “less than or equal,” but the check uses
<. Allow equality to avoid spurious asserts when the size is unchanged.🐛 Suggested fix
- cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size"); + cuopt_assert(new_size <= batch_size, "New size must be less than or equal to batch size");cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu-154-161 (1)
154-161: Allow resize to the same size (<=)The assert currently rejects
new_size == batch_size, even though the message says “less than or equal.” Allowing equality avoids unnecessary failures on no‑op resizes.🐛 Fix inequality
- cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size"); + cuopt_assert(new_size <= batch_size, "New size must be less than or equal to batch size");cpp/src/linear_programming/termination_strategy/infeasibility_information.hpp-30-42 (1)
30-42: Ensure referenced scaled objects outlive this instanceThe class now stores references to
op_problem_scaled_,scaled_cusparse_view_,scaling_strategy_,climber_strategies_, andhyper_params_. Please document the lifetime expectations (or take ownership) to avoid dangling references.Also applies to: 95-137
cpp/src/linear_programming/utils.cuh-236-264 (1)
236-264: Handle zero‑constraint batches before moduloIn batch mode,
combined_bounds.size() % op_problem.n_constraintsis undefined whenn_constraints == 0. Please short‑circuit to avoid modulo‑by‑zero on empty problems.🐛 Guard for empty constraint sets
} else { + if (op_problem.n_constraints == 0) { + combined_bounds.resize(0, op_problem.handle_ptr->get_stream()); + return; + } // In batch mode we use combined_constraint_bounds in convergeance_information to fill the // primal residual which will be bigger
🧹 Nitpick comments (16)
cpp/src/utilities/copy_helpers.hpp (1)
325-329: Consider adding a const overload for consistency.The existing
make_spanforrmm::device_uvectorprovides both mutable and const overloads (lines 319-335). This new overload only provides the mutable version, which may limit its use in const contexts.♻️ Suggested const overload
template <typename T> raft::device_span<const T> make_span(thrust::universal_host_pinned_vector<T> const& container) { return raft::device_span<const T>(thrust::raw_pointer_cast(container.data()), container.size()); }cpp/include/cuopt/linear_programming/pdlp/pdlp_warm_start_data.hpp (1)
70-71: Minor style inconsistency between copy constructor and move assignment operator.The copy constructor on line 70 uses the fully qualified template type
pdlp_warm_start_data_t<i_t, f_t>, while the move assignment operator on line 71 uses the injected class namepdlp_warm_start_data_t. Both are valid C++, but for consistency, consider using the same style for both declarations.Suggested consistency fix
Either simplify the copy constructor to match:
- pdlp_warm_start_data_t(const pdlp_warm_start_data_t<i_t, f_t>& other); + pdlp_warm_start_data_t(const pdlp_warm_start_data_t& other);Or revert the move assignment to match the existing style (though the current change is fine).
cpp/include/cuopt/linear_programming/utilities/segmented_sum_handler.cuh (2)
39-51: Inconsistent stream parameter passing to CUB APIs.In
segmented_sum_helper,stream_view_is passed directly (relying on implicit conversion), while insegmented_reduce_helper,stream_view_.value()is used explicitly. For consistency and clarity, use.value()in both methods since CUB APIs expectcudaStream_t.Suggested fix for consistency
cub::DeviceSegmentedReduce::Sum( - nullptr, byte_needed_, input, output, batch_size, problem_size, stream_view_); + nullptr, byte_needed_, input, output, batch_size, problem_size, stream_view_.value()); - segmented_sum_storage_.resize(byte_needed_, stream_view_); + segmented_sum_storage_.resize(byte_needed_, stream_view_.value()); cub::DeviceSegmentedReduce::Sum(segmented_sum_storage_.data(), byte_needed_, input, output, batch_size, problem_size, - stream_view_); + stream_view_.value());
84-86: Consider making member variables private.The member variables
byte_needed_,segmented_sum_storage_, andstream_view_are public but appear to be implementation details. Additionally,byte_needed_is uninitialized which could cause issues ifresizeis called with an uninitialized size.Suggested encapsulation
+private: size_t byte_needed_; rmm::device_buffer segmented_sum_storage_; rmm::cuda_stream_view stream_view_;Or initialize
byte_needed_to 0:- size_t byte_needed_; + size_t byte_needed_{0};cpp/src/linear_programming/pdlp_climber_strategy.hpp (1)
1-21: Consider namespacing the new strategy struct to avoid global pollution.Since this type is part of the cuOpt LP stack, wrapping it in
cuopt::linear_programmingkeeps the API surface consistent and avoids global namespace collisions.♻️ Suggested refactor
-struct pdlp_climber_strategy_t { - int original_index; -}; +namespace cuopt::linear_programming { +struct pdlp_climber_strategy_t { + int original_index; +}; +} // namespace cuopt::linear_programmingcpp/include/cuopt/linear_programming/mip/solver_settings.hpp (1)
100-101: Resolve TODO before merge or convert it to a tracked issue.Line 100-101 indicates unresolved design work; please replace with an issue reference or remove once resolved.
If you want, I can draft the issue text or propose the resolution path.
cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp (1)
199-200: Document tighter infeasibility tolerances.Defaults changed to 1e‑10; update API docs/release notes so users know about the new numerical expectations. As per coding guidelines.
cpp/tests/linear_programming/pdlp_test.cu (1)
1011-1762: Prefer tolerant comparisons for floating batch outputs.Many assertions use
EXPECT_EQon floating objectives/residuals/solution values across batch entries. GPU reductions can be non‑bitwise deterministic across runs, so these can flake. ConsiderEXPECT_NEARwithtolerance(or guard with a deterministic-mode flag) for these comparisons.♻️ Example adjustment (representative)
- EXPECT_EQ(ref_primal, solution.get_additional_termination_information(i).primal_objective); + EXPECT_NEAR(ref_primal, + solution.get_additional_termination_information(i).primal_objective, + tolerance);cpp/src/linear_programming/utils.cuh (2)
190-229: Use size_t/i_t for wrapped iterator indices to avoid overflow
inttruncatesin.size()for large batches; usesize_t(or template oni_t) and build the counting iterator with the same type to avoid overflow.♻️ Safer index types
- batch_wrapped_iterator(const f_t* problem_input, int problem_size) + batch_wrapped_iterator(const f_t* problem_input, size_t problem_size) : problem_input_(problem_input), problem_size_(problem_size) { } - HDI f_t operator()(int id) { return problem_input_[id / problem_size_]; } + HDI f_t operator()(size_t id) { return problem_input_[id / problem_size_]; } const f_t* problem_input_; - int problem_size_; + size_t problem_size_;- problem_wrapped_iterator(const f_t* problem_input, int problem_size) + problem_wrapped_iterator(const f_t* problem_input, size_t problem_size) : problem_input_(problem_input), problem_size_(problem_size) { } - HDI f_t operator()(int id) { return problem_input_[id % problem_size_]; } + HDI f_t operator()(size_t id) { return problem_input_[id % problem_size_]; } const f_t* problem_input_; - int problem_size_; + size_t problem_size_;- return thrust::make_transform_iterator(thrust::make_counting_iterator(0), + return thrust::make_transform_iterator(thrust::make_counting_iterator<size_t>(0), problem_wrapped_iterator<f_t>(in.data(), in.size()));
543-565: Enforce size‑1 output for cublas nrm2 overload
cublasNrm2produces a single scalar. The device_uvector overload should guardresult.size() == 1(or use device_scalar) to prevent silent misuse.✅ Size guard
void inline my_l2_norm(const rmm::device_uvector<f_t>& input_vector, rmm::device_uvector<f_t>& result, raft::handle_t const* handle_ptr) { + cuopt_assert(result.size() == 1, "my_l2_norm expects a single-value output buffer"); my_l2_norm<i_t, f_t>(input_vector.data(), result.data(), input_vector.size(), handle_ptr); }cpp/include/cuopt/linear_programming/pdlp/pdlp_hyper_params.cuh (2)
12-56: Add Doxygen docs for new public hyper‑params structThis is a new public type under
cpp/include; please add a brief Doxygen comment for the struct (and ideally clarify units/meaning of key fields) for API users. As per coding guidelines, public headers should document new public entities.
58-59: Track the TODO with an issue so it doesn’t get lostIf you want, I can help open an issue to track the planned removal of
pdlp_solver_mode.cpp/src/linear_programming/termination_strategy/convergence_information.hpp (1)
146-147: Consider makingbatch_mode_non-const for potential future flexibility.The
batch_mode_flag is initialized as aconst boolwith a default value offalse. While this works for the current design, if there's ever a need to toggle batch mode after construction (e.g., for problem recycling), this would require refactoring.If batch mode is truly immutable for the lifetime of the object, the current design is fine. However, consider whether the initialization value should come from a constructor parameter rather than being hardcoded to
false.cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cuh (3)
303-305: Publicbatch_mode_flag is exposed but may warrant encapsulation.The
batch_mode_member is declared in apublic:section. While this provides convenient access, consider whether a getter method would be more appropriate for encapsulation, especially since the member isconstand cannot be modified externally anyway.
366-368: Storage for batched dot product operations added.The
dot_product_storagebuffer anddot_product_bytesfollow the pattern used inconvergence_information.hppfor temporary workspace needed by batch operations.Minor style note: consider using trailing underscore (
dot_product_storage_,dot_product_bytes_) for consistency with other private members in the class.♻️ Suggested naming consistency fix
- rmm::device_buffer dot_product_storage; - size_t dot_product_bytes{0}; + rmm::device_buffer dot_product_storage_; + size_t dot_product_bytes_{0};
392-412: Template parameters should either be removed or documented for consistency

The functions `is_trust_region_restart`, `is_kkt_restart`, and `is_cupdlpx_restart` maintain clean, type-safe implementations with `const` reference parameters. However, the template parameters `i_t` and `f_t` are not actually used in the function bodies; they only access `hyper_params.restart_strategy` and cast to `int`.

While the current design maintains consistency with template propagation patterns across the codebase (all callers pass explicit template arguments), consider whether these helpers could be simplified to non-template functions or, alternatively, add a comment explaining why the template parameters are required for consistency with the calling context.
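A minimal sketch of the non-template alternative; `pdlp_hyper_params_t` matches the struct name implied by the header above, but the enum type and value names are placeholders, not the repo's actual identifiers:

```cpp
// Hypothetical simplification: nothing here depends on i_t or f_t, so the
// helper can take the hyper-parameter struct directly.
// `restart_strategy_t` and `KKTRestart` are illustrative names.
inline bool is_kkt_restart(const pdlp_hyper_params_t& hyper_params)
{
  return static_cast<int>(hyper_params.restart_strategy) ==
         static_cast<int>(restart_strategy_t::KKTRestart);
}
```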
```cpp
if (settings.mip_batch_pdlp_strong_branching) {
  settings.log.printf("Batch PDLP strong branching enabled\n");

  std::chrono::steady_clock::time_point start_batch = std::chrono::steady_clock::now();

  // Use original_problem to create the BatchLP problem
  csr_matrix_t<i_t, f_t> A_row(original_problem.A.m, original_problem.A.n, 0);
  original_problem.A.to_compressed_row(A_row);

  // Convert the root_soln to the original problem space
  std::vector<f_t> original_root_soln_x;
  uncrush_primal_solution(original_problem, original_lp, root_soln, original_root_soln_x);

  std::vector<f_t> fraction_values;

  for (i_t k = 0; k < fractional.size(); k++) {
    const i_t j = fractional[k];
    fraction_values.push_back(original_root_soln_x[j]);
  }

  std::vector<f_t> primal_solutions =
    batch_pdlp_solve(original_problem, fractional, fraction_values);
  std::chrono::steady_clock::time_point end_batch = std::chrono::steady_clock::now();
  std::chrono::duration<f_t> duration = end_batch - start_batch;
  settings.log.printf("Batch PDLP strong branching took %.2f seconds\n", duration.count());

  for (i_t k = 0; k < fractional.size(); k++) {
    const i_t j = fractional[k];

    // Call BatchLP solver. Solve 2*fractional.size() subproblems.
    // Let j = fractional[k]. We want to solve the two trial branching problems.
    // Branch down:
    //   minimize   c^T x
    //   subject to lb <= A*x <= ub
    //              x_j <= floor(root_soln[j])
    //              l <= x <= u
    // Let the optimal objective value of this problem be obj_down.
    f_t obj_down = primal_solutions[k];

    // Branch up:
    //   minimize   c^T x
    //   subject to lb <= A*x <= ub
    //              x_j >= ceil(root_soln[j])
    // Let the optimal objective value of this problem be obj_up.
    f_t obj_up = primal_solutions[k + fractional.size()];

    pc.strong_branch_down[k] = obj_down - root_obj;
    pc.strong_branch_up[k] = obj_up - root_obj;
  }
```
🧩 Analysis chain
🏁 Scripts executed:

```shell
# Locate and examine batch_pdlp_solve implementation
fd -t f "\.cpp|\.hpp|\.h" | xargs rg -n "batch_pdlp_solve" -A 30 | head -100

# Find the definition of batch_pdlp_solve function
rg -n "^[a-zA-Z_<>\*\s:]+batch_pdlp_solve" --type cpp --type hpp --type h -A 20

# Examine uncrush_primal_solution to understand the contract
rg -n "uncrush_primal_solution" --type cpp --type hpp --type h -B 2 -A 15

# Check where fractional indices are populated - understand their source space
rg -n "fractional\[" cpp/src/dual_simplex/pseudo_costs.cpp -B 5 -A 2

# Find root_obj definition and its context
rg -n "root_obj" cpp/src/dual_simplex/pseudo_costs.cpp -B 3 -A 1

# Search for batch_pdlp_solve definition more broadly
rg -n "batch_pdlp_solve" --type c --type cpp -B 2 -A 20 | head -150

# Look for where fractional vector is populated/created to understand its context
rg -n "fractional\s*=" cpp/src/dual_simplex/ --type cpp -B 3 -A 1 | head -80

# Find uncrush_primal_solution definition
rg -n "uncrush_primal_solution" --type cpp --type c -B 2 -A 15 | head -100

# Check function signature of the function containing batch PDLP code
sed -n '140,180p' cpp/src/dual_simplex/pseudo_costs.cpp

# Look at the beginning of the file to understand includes and context
head -50 cpp/src/dual_simplex/pseudo_costs.cpp

# Find fractional_variables function definition
rg -n "fractional_variables" --type cpp -B 2 -A 20 | head -150

# Check the problem transformation context - what is original_lp vs original_problem
rg -n "struct user_problem_t|struct lp_problem_t" --type cpp --type h -A 8 | head -100

# Trace where root_soln comes from - is it in LP space or original space?
rg -n "root_soln\|root_relax_soln_" cpp/src/dual_simplex/branch_and_bound.cpp -B 2 -A 2 | head -100
```
Fix index space mismatch in batch PDLP strong branching.

The `fractional` indices are derived from the LP solution space (via `fractional_variables` operating on `root_relax_soln_.x`), but the batch PDLP path converts `root_soln` to the original problem space and then incorrectly indexes it with LP-space indices (lines 178–179). This fails when the LP has more variables than the original problem (typical with presolve slack variables), causing wrong variable values to be passed to `batch_pdlp_solve`.

The sequential branching path at line 42 correctly uses `fractional[k]` directly with `original_lp`, confirming both are in LP space. The batch path should either:

- Keep `root_soln` in LP space and not call `uncrush_primal_solution`, or
- Map the `fractional` indices to the original problem space when accessing `original_root_soln_x`.
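A minimal sketch of the first option (variable names follow the quoted snippet; whether `batch_pdlp_solve` accepts LP-space values alongside `original_problem` should be verified before adopting this):

```cpp
// Sketch: build fraction_values directly from the LP-space root solution so the
// indices in `fractional` and the values they select live in the same space.
std::vector<f_t> fraction_values;
fraction_values.reserve(fractional.size());
for (i_t k = 0; k < static_cast<i_t>(fractional.size()); k++) {
  fraction_values.push_back(root_soln[fractional[k]]);  // no uncrush step
}
std::vector<f_t> primal_solutions =
  batch_pdlp_solve(original_problem, fractional, fraction_values);
```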
🤖 Prompt for AI Agents
In `@cpp/src/dual_simplex/pseudo_costs.cpp` around lines 162 - 210, The batch PDLP
path incorrectly uses uncrush_primal_solution and original_root_soln_x while
fractional[] is in LP-space; remove the call to uncrush_primal_solution and
build fraction_values from the LP-space root_soln (use root_soln[j] for j =
fractional[k]) so the indices align with fractional, and pass those LP-space
values into batch_pdlp_solve (keep batch_pdlp_solve usage unchanged); update any
uses of original_root_soln_x later in this block to use root_soln instead.
```cpp
template <typename f_t>
cusparse_dn_mat_descr_wrapper_t<f_t>& cusparse_dn_mat_descr_wrapper_t<f_t>::operator=(
  cusparse_dn_mat_descr_wrapper_t<f_t>&& other)
{
  if (need_destruction_) { RAFT_CUSPARSE_TRY(cusparseDestroyDnMat(descr_)); }
  descr_ = other.descr_;
  other.need_destruction_ = false;
  return *this;
```
Fix ownership flag in `cusparse_dn_mat_descr_wrapper_t` move-assignment.

`need_destruction_` is not updated from the moved-from object, so the descriptor can leak (or be left non-owning) after move-assigning into a default-constructed wrapper.

🔧 Suggested fix

```diff
 cusparse_dn_mat_descr_wrapper_t<f_t>& cusparse_dn_mat_descr_wrapper_t<f_t>::operator=(
   cusparse_dn_mat_descr_wrapper_t<f_t>&& other)
 {
   if (need_destruction_) { RAFT_CUSPARSE_TRY(cusparseDestroyDnMat(descr_)); }
-  descr_ = other.descr_;
-  other.need_destruction_ = false;
+  descr_            = other.descr_;
+  need_destruction_ = other.need_destruction_;
+  other.need_destruction_ = false;
   return *this;
 }
```
}As per coding guidelines, ensure resource lifecycles are correctly transferred and released.
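For completeness, a self-contained sketch that also tolerates self-move; the self-assignment guard is an addition beyond the diff above (the review only mentions it as optional), while the member names and `cusparseDestroyDnMat` come from the quoted code:

```cpp
// Sketch: transfer ownership atomically and handle self-assignment.
cusparse_dn_mat_descr_wrapper_t<f_t>& operator=(cusparse_dn_mat_descr_wrapper_t<f_t>&& other)
{
  if (this != &other) {
    if (need_destruction_) { RAFT_CUSPARSE_TRY(cusparseDestroyDnMat(descr_)); }
    descr_                  = other.descr_;
    need_destruction_       = other.need_destruction_;
    other.need_destruction_ = false;  // moved-from wrapper no longer owns the descriptor
  }
  return *this;
}
```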
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cusparse_view.cu` around lines 122 - 129, The
move-assignment for cusparse_dn_mat_descr_wrapper_t<f_t>::operator= fails to
transfer ownership correctly: after assigning descr_ from other.descr_ the
need_destruction_ flag is not updated, risking leaks or double-destruction;
update this operator to set this->need_destruction_ = other.need_destruction_
(then set other.need_destruction_ = false) so ownership of the descriptor is
transferred atomically (and optionally handle self-assignment if desired).
```cpp
// TODO add proper checking
#if CUDA_VER_12_4_UP
template <typename T,
          typename std::enable_if_t<std::is_same_v<T, float> || std::is_same_v<T, double>>*>
cusparseStatus_t my_cusparsespmm_preprocess(cusparseHandle_t handle,
                                            cusparseOperation_t opA,
                                            cusparseOperation_t opB,
                                            const T* alpha,
                                            const cusparseSpMatDescr_t matA,
                                            const cusparseDnMatDescr_t matB,
                                            const T* beta,
                                            const cusparseDnMatDescr_t matC,
                                            cusparseSpMMAlg_t alg,
                                            void* externalBuffer,
                                            cudaStream_t stream)
{
  auto constexpr float_type = []() constexpr {
    if constexpr (std::is_same_v<T, float>) {
      return CUDA_R_32F;
    } else if constexpr (std::is_same_v<T, double>) {
      return CUDA_R_64F;
    }
  }();
  CUSPARSE_CHECK(cusparseSetStream(handle, stream));
  return cusparseSpMM_preprocess(
    handle, opA, opB, alpha, matA, matB, beta, matC, float_type, alg, externalBuffer);
}
```
🧩 Analysis chain
🏁 Scripts executed:

```shell
# First, get the file size and explore the structure
wc -l cpp/src/linear_programming/cusparse_view.cu

# Look at the area around lines 238-264 to see the context
sed -n '230,270p' cpp/src/linear_programming/cusparse_view.cu | cat -n

# Search for cusparseSpMV_preprocess to find the SpMV pattern mentioned
rg "cusparseSpMV_preprocess" cpp/src/linear_programming/cusparse_view.cu -A 5 -B 5

# Search for dlsym usage in the file
rg "dlsym" cpp/src/linear_programming/cusparse_view.cu -B 5 -A 5

# Find the full SpMV preprocess function implementation
rg -A 30 "template.*cusparseSpMV_preprocess" cpp/src/linear_programming/cusparse_view.cu

# Get the full context around lines 238-264 including broader context
sed -n '200,280p' cpp/src/linear_programming/cusparse_view.cu | cat -n

# Search for any fallback handling when dlsym fails
rg "has_value|cusparseSpMM_preprocess" cpp/src/linear_programming/cusparse_view.cu -B 3 -A 3
```
Mirror the SpMV dynamic loading pattern for `cusparseSpMM_preprocess`.

The SpMV path correctly uses `dynamic_load_runtime::function` with `dlsym` to handle runtime CUDA version mismatches (the symbol may not exist if the runtime cuSPARSE predates 12.4). The SpMM path directly calls `cusparseSpMM_preprocess` without any dynamic loading, which will fail at load time if the runtime cuSPARSE library predates CUDA 12.4, even though the code compiled with CUDA 12.4+. The `#if CUDA_VER_12_4_UP` guard only controls compile-time availability, not runtime availability.

Refactor `my_cusparsespmm_preprocess` to use the same pattern: load the symbol via `dlsym`, check `has_value()`, and gracefully skip the preprocess step if unavailable (or return an error code).
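A sketch of the runtime-resolution idea using plain `dlsym`; the repo's `dynamic_load_runtime` wrapper should be preferred in practice, and the function-pointer signature below is transcribed from the cuSPARSE 12.4 documentation, so verify it against the headers actually in use:

```cpp
#include <dlfcn.h>

using spmm_preprocess_fn = cusparseStatus_t (*)(cusparseHandle_t,
                                                cusparseOperation_t,
                                                cusparseOperation_t,
                                                const void*,
                                                cusparseSpMatDescr_t,
                                                cusparseDnMatDescr_t,
                                                const void*,
                                                cusparseDnMatDescr_t,
                                                cudaDataType,
                                                cusparseSpMMAlg_t,
                                                void*);

// Resolve once; nullptr means the loaded cuSPARSE is too old for preprocess.
static auto* spmm_preprocess =
  reinterpret_cast<spmm_preprocess_fn>(dlsym(RTLD_DEFAULT, "cusparseSpMM_preprocess"));

if (spmm_preprocess == nullptr) {
  return CUSPARSE_STATUS_SUCCESS;  // preprocess is an optimization; skip gracefully
}
CUSPARSE_CHECK(cusparseSetStream(handle, stream));
return spmm_preprocess(
  handle, opA, opB, alpha, matA, matB, beta, matC, float_type, alg, externalBuffer);
```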
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cusparse_view.cu` around lines 238 - 264, The
my_cusparsespmm_preprocess function must mirror the SpMV dynamic-loading
pattern: use dynamic_load_runtime::function to load the symbol
"cusparseSpMM_preprocess" via dlsym, check the returned wrapper's has_value()
before calling it, and if the symbol is missing gracefully skip the preprocess
step (return CUSPARSE_STATUS_SUCCESS or another agreed error code) instead of
calling the symbol directly; retain the existing float_type computation,
CUSPARSE_CHECK(cusparseSetStream(...)), and call the loaded function through the
wrapper when available.
```cpp
template <typename i_t, typename f_t>
struct SpMM_benchmarks_context_t {
  SpMM_benchmarks_context_t(cusparse_sp_mat_descr_wrapper_t<i_t, f_t>& A,
                            cusparse_sp_mat_descr_wrapper_t<i_t, f_t>& A_T,
                            int primal_size,
                            int dual_size,
                            size_t current_batch_size,
                            raft::handle_t const* handle_ptr)
    : x(primal_size * current_batch_size, handle_ptr->get_stream()),
      y(dual_size * current_batch_size, handle_ptr->get_stream()),
      buffer_non_transpose_batch(0, handle_ptr->get_stream()),
      buffer_transpose_batch(0, handle_ptr->get_stream()),
      ping_pong_graph(handle_ptr->get_stream())
  {
    auto stream_view = handle_ptr->get_stream();
    cusparse_dn_mat_descr_wrapper_t<f_t> x_descr;
    cusparse_dn_mat_descr_wrapper_t<f_t> y_descr;

    int rows_primal = primal_size;
    int col_primal  = current_batch_size;
    int ld_primal   = current_batch_size;

    int rows_dual = dual_size;
    int col_dual  = current_batch_size;
    int ld_dual   = current_batch_size;

    x_descr.create(rows_primal, col_primal, ld_primal, x.data(), CUSPARSE_ORDER_ROW);
    y_descr.create(rows_dual, col_dual, ld_dual, y.data(), CUSPARSE_ORDER_ROW);

    // Init buffers for SpMMs
    const rmm::device_scalar<f_t> alpha{1, stream_view};
    const rmm::device_scalar<f_t> beta{0, stream_view};
    size_t buffer_size_non_transpose_batch = 0;
    RAFT_CUSPARSE_TRY(raft::sparse::detail::cusparsespmm_bufferSize(
      handle_ptr->get_cusparse_handle(),
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      alpha.data(),
      A,
      x_descr,
      beta.data(),
      y_descr,
      (deterministic_batch_pdlp) ? CUSPARSE_SPMM_CSR_ALG3 : CUSPARSE_SPMM_CSR_ALG2,
      &buffer_size_non_transpose_batch,
      stream_view));

    size_t buffer_size_transpose_batch = 0;
    RAFT_CUSPARSE_TRY(raft::sparse::detail::cusparsespmm_bufferSize(
      handle_ptr->get_cusparse_handle(),
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      alpha.data(),
      A_T,
      y_descr,
      beta.data(),
      x_descr,
      (deterministic_batch_pdlp) ? CUSPARSE_SPMM_CSR_ALG3 : CUSPARSE_SPMM_CSR_ALG2,
      &buffer_size_transpose_batch,
      stream_view));

    rmm::device_buffer buffer_transpose_batch(buffer_size_transpose_batch, stream_view);
    rmm::device_buffer buffer_non_transpose_batch(buffer_size_non_transpose_batch, stream_view);

#if CUDART_VERSION >= 12040
    // Preprocess buffers for SpMMs
    my_cusparsespmm_preprocess<f_t>(
      handle_ptr->get_cusparse_handle(),
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      alpha.data(),
      A_T,
      y_descr,
      beta.data(),
      x_descr,
      (deterministic_batch_pdlp) ? CUSPARSE_SPMM_CSR_ALG3 : CUSPARSE_SPMM_CSR_ALG2,
      buffer_transpose_batch.data(),
      stream_view);

    my_cusparsespmm_preprocess<f_t>(
      handle_ptr->get_cusparse_handle(),
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      alpha.data(),
      A,
      x_descr,
      beta.data(),
      y_descr,
      (deterministic_batch_pdlp) ? CUSPARSE_SPMM_CSR_ALG3 : CUSPARSE_SPMM_CSR_ALG2,
      buffer_non_transpose_batch.data(),
      stream_view);
#endif

    // First empty run for warm up and put it in a CUDA Graph
    ping_pong_graph.start_capture(0);
    RAFT_CUSPARSE_TRY(raft::sparse::detail::cusparsespmm(
      handle_ptr->get_cusparse_handle(),
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      alpha.data(),
      A,
      x_descr,
      beta.data(),
      y_descr,
      (deterministic_batch_pdlp) ? CUSPARSE_SPMM_CSR_ALG3 : CUSPARSE_SPMM_CSR_ALG2,
      (f_t*)buffer_non_transpose_batch.data(),
      stream_view));

    RAFT_CUSPARSE_TRY(raft::sparse::detail::cusparsespmm(
      handle_ptr->get_cusparse_handle(),
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      alpha.data(),
      A_T,
      y_descr,
      beta.data(),
      x_descr,
      (deterministic_batch_pdlp) ? CUSPARSE_SPMM_CSR_ALG3 : CUSPARSE_SPMM_CSR_ALG2,
      (f_t*)buffer_transpose_batch.data(),
      stream_view));

    ping_pong_graph.end_capture(0);
    // Just for the warm up
    ping_pong_graph.launch(0);
  }
```
Fix buffer/descriptor shadowing to avoid UAF in captured graphs.

Local `buffer_*` and `x_descr`/`y_descr` variables shadow the members, so the captured CUDA graph references memory and descriptors that are destroyed at the end of the constructor. Use the member fields to keep them alive for the graph lifetime.

🐛 Suggested fix

```diff
-    cusparse_dn_mat_descr_wrapper_t<f_t> x_descr;
-    cusparse_dn_mat_descr_wrapper_t<f_t> y_descr;
+    // Use member descriptors to keep them alive for graph capture
@@
-    rmm::device_buffer buffer_transpose_batch(buffer_size_transpose_batch, stream_view);
-    rmm::device_buffer buffer_non_transpose_batch(buffer_size_non_transpose_batch, stream_view);
+    buffer_transpose_batch     = rmm::device_buffer(buffer_size_transpose_batch, stream_view);
+    buffer_non_transpose_batch = rmm::device_buffer(buffer_size_non_transpose_batch, stream_view);
```
+ buffer_non_transpose_batch = rmm::device_buffer(buffer_size_non_transpose_batch, stream_view);🤖 Prompt for AI Agents
In
`@cpp/src/linear_programming/optimal_batch_size_handler/optimal_batch_size_handler.cu`
around lines 17 - 140, The constructor SpMM_benchmarks_context_t creates local
variables buffer_transpose_batch, buffer_non_transpose_batch, x_descr and
y_descr that shadow class members, causing the captured CUDA graph to reference
destroyed objects (UAF). Fix by removing the local shadowing: initialize and use
the member descriptors and member rmm::device_buffer fields (the class members
named buffer_transpose_batch, buffer_non_transpose_batch, x_descr, y_descr)
instead of creating local ones, allocate the member buffers with buffer_size_*
before preprocessing and use member_buffer.data() and the member descriptors in
my_cusparsespmm_preprocess and cusparsespmm calls so the objects remain alive
for the graph capture and launch.
```cpp
template <typename i_t, typename f_t>
int optimal_batch_size_handler(const optimization_problem_t<i_t, f_t>& op_problem,
                               int max_batch_size)
{
  if (max_batch_size == 1) return 1;

  // Try to quickly find the optimal batch size for the problem.
  // We run the two most frequently executed SpMMs for both A and A_T and compute
  // "time / batch_size". The batch size with the best ratio is optimal (it solves
  // the most work in the least time). To get something representative we run each
  // SpMM 5 times and take the average. We do it for both A and A_T and take the sum,
  // since both will be run for each batch size.

  // We start with batch size 128 (powers of two are better for cuSPARSE) and try to
  // improve by either multiplying or dividing by 2 each time. At most we take 5 steps
  // of search.

  constexpr int max_steps = 4;  // 4 because we already do one step for direction
  constexpr int initial_batch_size = 128;
  constexpr int benchmark_runs = 5;
  // Take the floor power of two.
  // This ensures that we always start with a batch size that is a power of two or
  // initial_batch_size.
  int current_batch_size =
    std::pow(2, std::floor(std::log2(std::min(initial_batch_size, max_batch_size))));
  int optimal_batch_size = current_batch_size;
```
Guard against non-positive `max_batch_size`.

`log2(min(initial, max_batch_size))` is undefined for values <= 0, and `current_batch_size` can become 0, which later divides by zero in `evaluate_node`. Add a precondition check.

🐛 Suggested fix

```diff
-  if (max_batch_size == 1) return 1;
+  cuopt_assert(max_batch_size > 0, "max_batch_size must be > 0");
+  if (max_batch_size == 1) return 1;
```
In
`@cpp/src/linear_programming/optimal_batch_size_handler/optimal_batch_size_handler.cu`
around lines 186 - 209, Add a precondition guard in optimal_batch_size_handler
to prevent non-positive max_batch_size from reaching the log2/pow2 computation:
if max_batch_size <= 1 return 1 (same behavior as the existing ==1 early return)
or clamp the value before computing current_batch_size by computing int cap =
std::max(1, std::min(initial_batch_size, max_batch_size)); then use cap in the
std::log2/std::pow expression to ensure current_batch_size is never 0 and avoid
divide-by-zero in evaluate_node.
```cpp
  rmm::device_uvector<double> initial_primal(0, stream);
  rmm::device_uvector<double> initial_dual(0, stream);
  double initial_step_size = std::numeric_limits<f_t>::signaling_NaN();
  double initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN();

  cuopt_assert(settings.new_bounds.size() > 0, "Batch size should be greater than 0");
  const int max_batch_size = settings.new_bounds.size();
  int optimal_batch_size = use_optimal_batch_size
                             ? detail::optimal_batch_size_handler(problem, max_batch_size)
                             : max_batch_size;
  cuopt_assert(optimal_batch_size != 0 && optimal_batch_size <= max_batch_size,
               "Optimal batch size should be between 1 and max batch size");
  using f_t2 = typename type_2<f_t>::type;

  // If need warm start, solve the LP alone
  if (primal_dual_init || primal_weight_init) {
    pdlp_solver_settings_t<i_t, f_t> warm_start_settings = settings;
    warm_start_settings.new_bounds.clear();
    warm_start_settings.method = cuopt::linear_programming::method_t::PDLP;
    warm_start_settings.presolve = false;
    warm_start_settings.pdlp_solver_mode = pdlp_solver_mode_t::Stable3;
    warm_start_settings.detect_infeasibility = false;
    optimization_problem_solution_t<i_t, f_t> original_solution =
      solve_lp(problem, warm_start_settings);
    if (primal_dual_init) {
      initial_primal = rmm::device_uvector<double>(
        original_solution.get_primal_solution(), original_solution.get_primal_solution().stream());
      initial_dual = rmm::device_uvector<double>(original_solution.get_dual_solution(),
                                                 original_solution.get_dual_solution().stream());
      initial_step_size = original_solution.get_pdlp_warm_start_data().initial_step_size_;
    }
    if (primal_weight_init) {
      initial_primal_weight = original_solution.get_pdlp_warm_start_data().initial_primal_weight_;
    }
```
Use `f_t`-typed warm-start buffers in the templated batch path.

The batch warm-start path hardcodes `double` buffers. This breaks `float` instantiations (pointer/type mismatch) and can fail to compile or silently convert. Use `f_t` consistently.

🔧 Proposed fix

```diff
-  rmm::device_uvector<double> initial_primal(0, stream);
-  rmm::device_uvector<double> initial_dual(0, stream);
-  double initial_step_size = std::numeric_limits<f_t>::signaling_NaN();
-  double initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN();
+  rmm::device_uvector<f_t> initial_primal(0, stream);
+  rmm::device_uvector<f_t> initial_dual(0, stream);
+  f_t initial_step_size = std::numeric_limits<f_t>::signaling_NaN();
+  f_t initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN();
 ...
-      initial_primal = rmm::device_uvector<double>(
+      initial_primal = rmm::device_uvector<f_t>(
         original_solution.get_primal_solution(), original_solution.get_primal_solution().stream());
-      initial_dual = rmm::device_uvector<double>(original_solution.get_dual_solution(),
+      initial_dual = rmm::device_uvector<f_t>(original_solution.get_dual_solution(),
                                                 original_solution.get_dual_solution().stream());
```
In `@cpp/src/linear_programming/solve.cu` around lines 750 - 783, The warm-start
batch path declares initial_primal and initial_dual as
rmm::device_uvector<double>, which mismatches templated f_t instantiations;
change both to rmm::device_uvector<f_t> (or the appropriate typedef alias used
for the solver element type) and construct them from
original_solution.get_primal_solution()/get_dual_solution() so the
device_uvector element type matches f_t; keep initial_step_size and
initial_primal_weight as f_t and preserve the use of the original solution
streams when constructing the device_uvectors.
```cpp
  if (batch_mode_) {
    // Pass down any input pointer of the right type, actual pointer does not matter
    size_t byte_needed = 0;
    cub::DeviceSegmentedReduce::Sum(
      nullptr,
      byte_needed,
      thrust::make_transform_iterator(thrust::make_zip_iterator(norm_squared_delta_primal_.data(),
                                                                norm_squared_delta_primal_.data()),
                                      tuple_multiplies<f_t>{}),
      interaction_.data(),
      climber_strategies_.size(),
      primal_size_,
      stream_view_);
    dot_product_bytes = std::max(dot_product_bytes, byte_needed);

    cub::DeviceSegmentedReduce::Sum(
      nullptr,
      byte_needed,
      thrust::make_transform_iterator(norm_squared_delta_primal_.data(), power_two_func_t<f_t>{}),
      norm_squared_delta_primal_.data(),
      climber_strategies_.size(),
      primal_size_,
      stream_view_);
    dot_product_bytes = std::max(dot_product_bytes, byte_needed);

    cub::DeviceSegmentedReduce::Sum(
      nullptr,
      byte_needed,
      thrust::make_transform_iterator(norm_squared_delta_dual_.data(), power_two_func_t<f_t>{}),
      norm_squared_delta_dual_.data(),
      climber_strategies_.size(),
      dual_size_,
      stream_view_);
    dot_product_bytes = std::max(dot_product_bytes, byte_needed);

    dot_product_storage.resize(dot_product_bytes, stream_view_);
  }
```
🧩 Analysis chain
🏁 Scripts executed:

```shell
# First, let's see the full context around lines 71-107
head -n 120 cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu | tail -n 60

# Also check lines 485-516
wc -l cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu

# Let's see if RAFT_CUDA_TRY is used elsewhere in this file
rg -n "RAFT_CUDA_TRY\|cub::Device" cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu

# Check CUB DeviceSegmentedReduce documentation or look for return type in header
rg -A5 "DeviceSegmentedReduce::Sum" cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu | head -40

# Check how other CUDA operations are handled in this file
rg -n "cudaMalloc|cudaMemcpy|cudaFree|CUDA_CHECK|RAFT_CUDA_TRY" cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu

# Look at the CUB include and understand the return type
find . -name "*.cuh" -o -name "*.h" | xargs rg "DeviceSegmentedReduce" | grep -E "(return|cudaError)" | head -5

# Check the exact lines 485-516 to see the actual compute calls
sed -n '485,516p' cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu

# Search for how CUB is used elsewhere in the codebase
rg -l "cub::Device" --type cu --type cuh | head -5 | xargs -I {} sh -c 'echo "=== {} ===" && rg -B2 -A2 "cub::Device" {}'

# Check if CUB APIs return cudaError_t or void
rg "RAFT_CUDA_TRY\(cub::" --type cu --type cuh

# Look at RAFT_CUDA_TRY definition to understand what it expects
rg -B3 -A5 "define RAFT_CUDA_TRY\|RAFT_CUDA_TRY.*=" | head -30

# Check the stream_view_ usage pattern - is it async?
rg "stream_view_\)" cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu | head -10

# Search for CUB usage patterns across codebase without type filtering
find . -name "*.cu" -o -name "*.cuh" -o -name "*.cpp" -o -name "*.hpp" | xargs rg -l "cub::Device" 2>/dev/null | head -5

# Look at one of these files to see how CUB is typically used
find . -name "*.cu" -o -name "*.cuh" -o -name "*.cpp" -o -name "*.hpp" | xargs rg -B2 -A2 "cub::Device" 2>/dev/null | head -50

# Check if RAFT has examples of wrapping CUB operations
find . -path ./build -prune -o -type f \( -name "*.cu" -o -name "*.cuh" -o -name "*.cpp" -o -name "*.hpp" \) -print | xargs rg "RAFT_CUDA_TRY.*cub" 2>/dev/null | head -10

# Look for the RAFT wrapper/stream class definition to understand stream_view_
rg -l "stream_view" --max-count 5 | xargs grep -l "class.*stream\|struct.*stream" 2>/dev/null | head -3

# Check CUB header for return type information
find . -name "*.cuh" -path "*cub*" 2>/dev/null | head -3 | xargs cat 2>/dev/null | grep -A10 "DeviceSegmentedReduce" | head -20

# Examine the barrier.cu pattern more closely
rg -B3 -A10 "RAFT_CUDA_TRY.*cub::DeviceSelect" cpp/src/dual_simplex/barrier.cu

# Check if there are stream/async implications we need to understand
rg -B5 -A5 "stream_view_\|cudaStream" cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu | head -40

# Verify cudaPeekAtLastError usage pattern in file - is it capturing CUB errors?
rg -B10 -A2 "cudaPeekAtLastError" cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
```
Wrap CUB segmented-reduce calls with RAFT_CUDA_TRY for error propagation

The `cub::DeviceSegmentedReduce::Sum` calls at lines 71–107 and 485–516 don't check return codes. All CUB device operations return `cudaError_t` and must be wrapped with `RAFT_CUDA_TRY` so CUDA failures propagate to the caller, including the buffer-sizing calls with `nullptr`. This follows the established pattern in cpp/src/dual_simplex/barrier.cu and aligns with the requirement that all CUDA memory operations have error checking.

Apply to all six CUB calls

```diff
-    cub::DeviceSegmentedReduce::Sum(
+    RAFT_CUDA_TRY(cub::DeviceSegmentedReduce::Sum(
       nullptr,
       byte_needed,
       thrust::make_transform_iterator(...),
       interaction_.data(),
       climber_strategies_.size(),
       primal_size_,
-      stream_view_);
+      stream_view_));
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
  if (batch_mode_) {
    // Pass down any input pointer of the right type, actual pointer does not matter
    size_t byte_needed = 0;
    RAFT_CUDA_TRY(cub::DeviceSegmentedReduce::Sum(
      nullptr,
      byte_needed,
      thrust::make_transform_iterator(thrust::make_zip_iterator(norm_squared_delta_primal_.data(),
                                                                norm_squared_delta_primal_.data()),
                                      tuple_multiplies<f_t>{}),
      interaction_.data(),
      climber_strategies_.size(),
      primal_size_,
      stream_view_));
    dot_product_bytes = std::max(dot_product_bytes, byte_needed);
    RAFT_CUDA_TRY(cub::DeviceSegmentedReduce::Sum(
      nullptr,
      byte_needed,
      thrust::make_transform_iterator(norm_squared_delta_primal_.data(), power_two_func_t<f_t>{}),
      norm_squared_delta_primal_.data(),
      climber_strategies_.size(),
      primal_size_,
      stream_view_));
    dot_product_bytes = std::max(dot_product_bytes, byte_needed);
    RAFT_CUDA_TRY(cub::DeviceSegmentedReduce::Sum(
      nullptr,
      byte_needed,
      thrust::make_transform_iterator(norm_squared_delta_dual_.data(), power_two_func_t<f_t>{}),
      norm_squared_delta_dual_.data(),
      climber_strategies_.size(),
      dual_size_,
      stream_view_));
    dot_product_bytes = std::max(dot_product_bytes, byte_needed);
    dot_product_storage.resize(dot_product_bytes, stream_view_);
  }
```
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu`
around lines 71 - 107, The three cub::DeviceSegmentedReduce::Sum calls in the
batch-mode block (the ones using tuple_multiplies<f_t>, power_two_func_t<f_t> on
norm_squared_delta_primal_ and norm_squared_delta_dual_) must be wrapped with
RAFT_CUDA_TRY so their returned cudaError_t is checked (this includes the
nullptr sizing calls); replace direct calls with
RAFT_CUDA_TRY(cub::DeviceSegmentedReduce::Sum(...)) for each invocation and
likewise wrap the other three CUB Sum calls elsewhere (the other location noted
in the review) so failures propagate; keep existing local variables
(dot_product_bytes, dot_product_storage.resize, stream_view_, primal_size_,
dual_size_, climber_strategies_.size()) unchanged.
```diff
 void adaptive_step_size_strategy_t<i_t, f_t>::get_primal_and_dual_stepsizes(
-  rmm::device_scalar<f_t>& primal_step_size, rmm::device_scalar<f_t>& dual_step_size)
+  rmm::device_uvector<f_t>& primal_step_size, rmm::device_uvector<f_t>& dual_step_size)
 {
+  const auto [grid_size, block_size] = kernel_config_from_batch_size(climber_strategies_.size());
   compute_actual_stepsizes<i_t, f_t>
-    <<<1, 1, 0, stream_view_>>>(this->view(), primal_step_size.data(), dual_step_size.data());
+    <<<grid_size, block_size, 0, stream_view_>>>(this->view(),
+                                                 make_span(primal_step_size),
+                                                 make_span(dual_step_size),
+                                                 climber_strategies_.size());
```
Validate output buffer sizes before kernel write

`compute_actual_stepsizes` writes `climber_strategies_.size()` entries; if the provided buffers are smaller, this is an out-of-bounds write. Add size checks before launching.

✅ Size guards

```diff
-  const auto [grid_size, block_size] = kernel_config_from_batch_size(climber_strategies_.size());
+  const auto batch_size = static_cast<i_t>(climber_strategies_.size());
+  cuopt_assert(primal_step_size.size() >= static_cast<size_t>(batch_size),
+               "primal_step_size too small for batch size");
+  cuopt_assert(dual_step_size.size() >= static_cast<size_t>(batch_size),
+               "dual_step_size too small for batch size");
+  const auto [grid_size, block_size] = kernel_config_from_batch_size(batch_size);
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
void adaptive_step_size_strategy_t<i_t, f_t>::get_primal_and_dual_stepsizes(
  rmm::device_uvector<f_t>& primal_step_size, rmm::device_uvector<f_t>& dual_step_size)
{
  const auto batch_size = static_cast<i_t>(climber_strategies_.size());
  cuopt_assert(primal_step_size.size() >= static_cast<size_t>(batch_size),
               "primal_step_size too small for batch size");
  cuopt_assert(dual_step_size.size() >= static_cast<size_t>(batch_size),
               "dual_step_size too small for batch size");
  const auto [grid_size, block_size] = kernel_config_from_batch_size(batch_size);
  compute_actual_stepsizes<i_t, f_t>
    <<<grid_size, block_size, 0, stream_view_>>>(this->view(),
                                                 make_span(primal_step_size),
                                                 make_span(dual_step_size),
                                                 climber_strategies_.size());
```
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu`
around lines 537 - 545, The kernel launch in
adaptive_step_size_strategy_t::get_primal_and_dual_stepsizes can OOB if
primal_step_size or dual_step_size are smaller than climber_strategies_.size();
add explicit size validation before launching compute_actual_stepsizes: compare
primal_step_size.size() and dual_step_size.size() against
climber_strategies_.size() and either resize the rmm::device_uvector buffers or
return/throw a clear error (with context) if they are too small, so the kernel
only writes within bounds using the existing stream_view_ and
kernel_config_from_batch_size logic.
```cpp
void matrix_swap(rmm::device_uvector<f_t>& matrix,
                 i_t vector_size,
                 const thrust::universal_host_pinned_vector<swap_pair_t<i_t>>& swap_pairs)
{
  if (swap_pairs.empty()) { return; }

  const i_t batch_size = matrix.size() / vector_size;
  cuopt_assert(vector_size > 0, "Vector size must be greater than 0");
  cuopt_assert(batch_size > 0, "Batch size must be greater than 0");

  const size_t swap_count = swap_pairs.size();
  const size_t total_items = swap_count * static_cast<size_t>(vector_size);

  auto counting = thrust::make_counting_iterator<size_t>(0);
  auto left_index = thrust::make_transform_iterator(
    counting,
    matrix_swap_index_functor<i_t>{thrust::raw_pointer_cast(swap_pairs.data()), vector_size, true});
  auto right_index = thrust::make_transform_iterator(
```
Guard against division-by-zero and mismatched matrix sizing.

`matrix.size() / vector_size` happens before validating `vector_size`; if it is 0, this is undefined behavior. Also, ensure the matrix size is a clean multiple of `vector_size` to avoid silent truncation of `batch_size`.

🔧 Proposed fix

```diff
-  const i_t batch_size = matrix.size() / vector_size;
-  cuopt_assert(vector_size > 0, "Vector size must be greater than 0");
-  cuopt_assert(batch_size > 0, "Batch size must be greater than 0");
+  cuopt_assert(vector_size > 0, "Vector size must be greater than 0");
+  cuopt_assert(matrix.size() % static_cast<size_t>(vector_size) == 0,
+               "Matrix size must be a multiple of vector size");
+  const i_t batch_size = matrix.size() / vector_size;
+  cuopt_assert(batch_size > 0, "Batch size must be greater than 0");
```
+ cuopt_assert(batch_size > 0, "Batch size must be greater than 0");🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/swap_and_resize_helper.cuh` around lines 49 - 66,
In matrix_swap, avoid dividing matrix.size() by vector_size before validating
vector_size and ensure the matrix length is an exact multiple of vector_size;
first assert or return if vector_size == 0 (check vector_size > 0 before
computing batch_size), compute batch_size only after that check (use
matrix.size() / vector_size into batch_size), and add a guard that matrix.size()
% vector_size == 0 (or log/throw) to prevent mismatched sizing; update the
function matrix_swap and any use of batch_size to rely on these validated
values.
```cpp
  cub::DeviceTransform::Transform(
    in_zip,
    out_zip,
    total_items,
    [] HD(thrust::tuple<f_t, f_t> values) -> thrust::tuple<f_t, f_t> {
      return thrust::make_tuple(thrust::get<1>(values), thrust::get<0>(values));
    },
    matrix.stream());
```
🧩 Analysis chain
🏁 Script executed:

```shell
rg -n "DeviceTransform::Transform" -C2
```
🌐 Web query:
CUB DeviceTransform Transform cudaError_t return status
💡 Result:
`cub::DeviceTransform::Transform(...)` returns a `cudaError_t` that indicates whether the CUB call successfully set up and launched the transform work (and it may also surface a previously pending/asynchronous CUDA error, like many CUDA runtime calls). (nvidia.github.io)

In current CCCL/CUB, the return status comes from two places: (nvidia.github.io)

- Offset-range precheck: CUB selects a signed offset type from your `NumItemsT` and calls `choose_signed_offset<NumItemsT>::is_exceeding_offset_type(num_items)`. If `num_items` is too large for that chosen signed offset type, `Transform` returns that error immediately. (nvidia.github.io)
- Dispatch/launch: otherwise it returns whatever `detail::transform::dispatch_t<...>::dispatch(...)` returns (typically `cudaSuccess` if the launch succeeded; otherwise an appropriate CUDA error such as invalid configuration/arguments, etc.). (nvidia.github.io)

Also note the "temp storage" compatibility overload behavior: if you call the overload with `d_temp_storage == nullptr`, it sets `temp_storage_bytes = 1` and returns `cudaSuccess` without doing work. (nvidia.github.io)

To catch errors that occur during execution (asynchronous kernel failures), check after synchronizing (e.g., `cudaStreamSynchronize`) and/or query with `cudaGetLastError()` / `cudaPeekAtLastError()`. (docs.nvidia.com)

Citations:

- 1: https://nvidia.github.io/cccl/cub/api/program_listing_file_cub_device_device_transform.cuh.html
- 2: https://nvidia.github.io/cccl/cub/api/program_listing_file_cub_device_device_transform.cuh.html
- 3: https://nvidia.github.io/cccl/cub/api/program_listing_file_cub_device_device_transform.cuh.html
- 4: https://nvidia.github.io/cccl/cub/api/program_listing_file_cub_device_device_transform.cuh.html
- 5: https://nvidia.github.io/cccl/cub/api/program_listing_file_cub_device_device_transform.cuh.html
- 6: https://docs.nvidia.com/cuda/archive/13.1.0/cuda-programming-guide/02-basics/asynchronous-execution.html
🌐 Web query:
RAFT_CUDA_TRY error checking macro RAPIDS
💡 Result:
`RAFT_CUDA_TRY` is RAFT/RAPIDS' CUDA-runtime error-checking helper macro: you wrap CUDA API calls with it (e.g., `RAFT_CUDA_TRY(cudaMemcpyAsync(...))`), and it checks the returned status and throws an exception if the call failed. RAFT provides analogous macros for other CUDA libraries (`RAFT_CUBLAS_TRY`, `RAFT_CUSOLVER_TRY`). If you're in a context where throwing is undesirable (commonly destructors), use the `*_NO_THROW` variants (`RAFT_CUDA_TRY_NO_THROW`, etc.), which log the error but do not throw. [1] [2]
Add CUDA error checking for the CUB transform call.

The `cub::DeviceTransform::Transform` call returns a `cudaError_t` status indicating setup/launch success or failure (e.g., an invalid `num_items` size, or launch errors). Per the coding guideline requiring error checking on all CUDA operations, wrap this call with `RAFT_CUDA_TRY` to catch and propagate errors appropriately.

🔧 Proposed fix

```diff
-  cub::DeviceTransform::Transform(
-    in_zip,
-    out_zip,
-    total_items,
-    [] HD(thrust::tuple<f_t, f_t> values) -> thrust::tuple<f_t, f_t> {
-      return thrust::make_tuple(thrust::get<1>(values), thrust::get<0>(values));
-    },
-    matrix.stream());
+  RAFT_CUDA_TRY(cub::DeviceTransform::Transform(
+    in_zip,
+    out_zip,
+    total_items,
+    [] HD(thrust::tuple<f_t, f_t> values) -> thrust::tuple<f_t, f_t> {
+      return thrust::make_tuple(thrust::get<1>(values), thrust::get<0>(values));
+    },
+    matrix.stream()));
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
  RAFT_CUDA_TRY(cub::DeviceTransform::Transform(
    in_zip,
    out_zip,
    total_items,
    [] HD(thrust::tuple<f_t, f_t> values) -> thrust::tuple<f_t, f_t> {
      return thrust::make_tuple(thrust::get<1>(values), thrust::get<0>(values));
    },
    matrix.stream()));
```
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/swap_and_resize_helper.cuh` around lines 76 - 83,
The CUB transform call using cub::DeviceTransform::Transform (with in_zip,
out_zip, total_items, lambda, and matrix.stream()) is missing CUDA error
checking; wrap the call with RAFT_CUDA_TRY so the returned cudaError_t is
checked and propagated (e.g.,
RAFT_CUDA_TRY(cub::DeviceTransform::Transform(...))); ensure you reference the
same parameters (in_zip, out_zip, total_items, matrix.stream()) and keep the
lambda as-is while adding the RAFT_CUDA_TRY wrapper to satisfy the coding
guideline requiring checks on all CUDA operations.
|
@Kh4ster Is this PR meant for 26.02? If so, please change the merge branch to release/26.02.
This PR allows PDLP to solve a batch of problems. Currently the only supported difference among problems is one variable bound being different per climber.
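For context, a rough sketch of how the batched entry point appears to be driven, inferred from the `settings.new_bounds` usage in the quoted `solve.cu` excerpt; the element type of a `new_bounds` entry is an assumption here, not confirmed by the PR:

```cpp
// Hypothetical driver: one new_bounds entry per climber, each overriding a
// single variable bound relative to the shared base problem.
pdlp_solver_settings_t<int, double> settings;
settings.method = cuopt::linear_programming::method_t::PDLP;

// Illustrative only: the fields of a new_bounds entry are assumed.
// settings.new_bounds.push_back({/*var*/ j, /*lb*/ lb_j, /*ub*/ std::floor(x_j)});  // branch down
// settings.new_bounds.push_back({/*var*/ j, /*lb*/ std::ceil(x_j), /*ub*/ ub_j});   // branch up

auto solution = solve_lp(problem, settings);  // all climbers solved in one batched run
```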
This PR also includes:
This PR is not ready to be merged:
Summary by CodeRabbit
New Features
Documentation