Batch PDLP #791

base: main
Conversation
…ther finish with optimal
…and change initial to 128
📝 Walkthrough

This PR introduces batch processing capabilities to PDLP solvers and refactors hyper-parameters from static globals to runtime-configurable structures. Key changes include multi-climber support, new batch-aware data structures, integration with dual simplex branching, and extended termination/convergence tracking for multiple solutions.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 18
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
cpp/src/linear_programming/saddle_point.cu (1)
95-106: `copy()` only copies one batch.

With batched buffers, copying only `primal_size_`/`dual_size_` leaves the rest uninitialized. Update to copy the full buffer and assert batch compatibility.

✅ Suggested fix

```diff
   EXE_CUOPT_EXPECTS(this->primal_size_ == other.get_primal_size(),
                     "Size of primal solution must be the same in order to copy");
   EXE_CUOPT_EXPECTS(this->dual_size_ == other.get_dual_size(),
                     "Size of dual solution must be the same in order to copy");
+  EXE_CUOPT_EXPECTS(this->primal_solution_.size() == other.get_primal_solution().size(),
+                    "Batch size mismatch for primal solution");
+  EXE_CUOPT_EXPECTS(this->dual_solution_.size() == other.get_dual_solution().size(),
+                    "Batch size mismatch for dual solution");
-  raft::copy(
-    this->primal_solution_.data(), other.get_primal_solution().data(), this->primal_size_, stream);
-  raft::copy(
-    this->dual_solution_.data(), other.get_dual_solution().data(), this->dual_size_, stream);
+  raft::copy(this->primal_solution_.data(),
+             other.get_primal_solution().data(),
+             this->primal_solution_.size(),
+             stream);
+  raft::copy(this->dual_solution_.data(),
+             other.get_dual_solution().data(),
+             this->dual_solution_.size(),
+             stream);
```

cpp/src/linear_programming/solver_solution.cu (1)
25-33: Initialize `termination_stats_` in single-solution constructors.

These constructors now set `termination_status_` but leave `termination_stats_` empty; any call to `get_solve_time()` or per-id accessors will read out-of-bounds. Initialize `termination_stats_` with a single default entry so the non-batch invariants hold.

🐛 Proposed fix

```diff
 optimization_problem_solution_t<i_t, f_t>::optimization_problem_solution_t(
   pdlp_termination_status_t termination_status, rmm::cuda_stream_view stream_view)
   : primal_solution_{0, stream_view},
     dual_solution_{0, stream_view},
     reduced_cost_{0, stream_view},
+    termination_stats_(1),
     termination_status_{termination_status},
     error_status_(cuopt::logic_error("", cuopt::error_type_t::Success))
 {
 }

 optimization_problem_solution_t<i_t, f_t>::optimization_problem_solution_t(
   cuopt::logic_error error_status_, rmm::cuda_stream_view stream_view)
   : primal_solution_{0, stream_view},
     dual_solution_{0, stream_view},
     reduced_cost_{0, stream_view},
+    termination_stats_(1),
     termination_status_{pdlp_termination_status_t::NoTermination},
     error_status_(error_status_)
 {
 }
```

Also applies to: 37-44
cpp/src/linear_programming/termination_strategy/termination_strategy.cu (1)
318-352: Guard `per_constraint_residual` in batch (or store per-climber l∞ residuals).

The per-constraint path uses scalar `*relative_l_inf_primal_residual` / `*relative_l_inf_dual_residual` for every `idx`. In batch mode, climbers can diverge, so this can incorrectly mark all climbers as optimal/feasible based on a single residual. Consider disallowing `per_constraint_residual` when `batch_size > 1`, or store per-climber l∞ residuals and index by `idx`.

🐛 Proposed guard

```diff
 void pdlp_termination_strategy_t<i_t, f_t>::evaluate_termination_criteria(
   pdhg_solver_t<i_t, f_t>& current_pdhg_solver,
   rmm::device_uvector<f_t>& primal_iterate,
   rmm::device_uvector<f_t>& dual_iterate,
   const rmm::device_uvector<f_t>& dual_slack,
   rmm::device_uvector<f_t>& delta_primal_iterate,
   rmm::device_uvector<f_t>& delta_dual_iterate,
   i_t total_pdlp_iterations,
   const rmm::device_uvector<f_t>& combined_bounds,
   const rmm::device_uvector<f_t>& objective_coefficients)
 {
+  cuopt_expects(!(settings_.per_constraint_residual && climber_strategies_.size() > 1),
+                error_type_t::ValidationError,
+                "per_constraint_residual is not supported in batch mode");
   raft::common::nvtx::range fun_scope("Evaluate termination criteria");
```
🤖 Fix all issues with AI agents
In `@cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp`:
- Around line 243-244: The per-id accessors (get_objective_value,
get_dual_objective_value, get_additional_termination_information) currently
check bounds against termination_status_.size() but index termination_stats_, so
add a size invariant: in the constructors that set/move termination_stats_ and
termination_status_ (the class ctor(s) that populate these vectors) assert or
throw if termination_stats_.size() != termination_status_.size() to guarantee
synchronization, and update the accessors to check the size of the vector they
actually index (use termination_stats_.size() when accessing termination_stats_)
as a secondary safety; reference termination_stats_, termination_status_,
get_objective_value, get_dual_objective_value, and
get_additional_termination_information when making these changes.
In `@cpp/include/cuopt/linear_programming/solve.hpp`:
- Around line 17-18: The public header currently includes the internal header
and exposes dual_simplex::user_problem_t<i_t, f_t> in the batch_pdlp_solve()
signature; remove the internal include and stop exposing that internal type in
the public API by either (a) introducing a public-facing problem abstraction
(e.g., a public problem_t or ProblemView) and changing batch_pdlp_solve() to
accept that, or (b) forward-declaring a minimal public interface and adapting
the implementation to convert from the public type to
dual_simplex::user_problem_t inside the cpp implementation; update all
references to dual_simplex::user_problem_t in the header to use the new public
type or opaque forward declaration and move any internal conversions into the
source file.
In `@cpp/src/dual_simplex/simplex_solver_settings.hpp`:
- Around line 158-159: The field mip_batch_pdlp_strong_branching in class
SimplexSolverSettings is not initialized in the constructor initializer list;
add it (set to 0 by default) to the constructor's initializer list alongside
other i_t members (e.g., random_seed, inside_mip, num_bfs_workers) so
mip_batch_pdlp_strong_branching is deterministically initialized when a
SimplexSolverSettings instance is created.
In `@cpp/src/linear_programming/cusparse_view.cu`:
- Around line 238-264: The helper my_cusparsespmm_preprocess currently forwards
cusparseSpMM_preprocess return status without checking it and all its call sites
ignore that return, so preprocessing failures are silent; fix by either (A)
adding CUSPARSE_CHECK around cusparseSpMM_preprocess inside
my_cusparsespmm_preprocess so it logs/throws on error and then return
CUSPARSE_STATUS_SUCCESS, or (B) leave the function returning cusparseStatus_t
and wrap every call site of my_cusparsespmm_preprocess with RAFT_CUSPARSE_TRY
(consistent with other CUSPARSE ops in this file) to propagate errors; update
symbols: my_cusparsespmm_preprocess and each call site that invokes it so
failures are not ignored.
In
`@cpp/src/linear_programming/optimal_batch_size_handler/optimal_batch_size_handler.cu`:
- Around line 77-78: Local rmm::device_buffer variables buffer_transpose_batch
and buffer_non_transpose_batch declared in the constructor shadow the class
member buffers and are destroyed on constructor exit, causing the CUDA graph to
reference freed memory; fix by removing the local declarations and
assigning/constructing the buffers into the existing member variables (the
member buffer_transpose_batch and buffer_non_transpose_batch) so the memory
remains valid for the SpMM preprocessing, CUDA graph capture in the constructor,
and later launch() calls.
- Around line 240-250: left_node can become 0 when current_batch_size is 1,
which would pass an invalid batch size to evaluate_node; change the
computation/use of left_node in the optimal_batch_size_handler so it's never
zero (for example assign left_node = std::max(1, current_batch_size / 2) or skip
calling evaluate_node when left_node == 0) and use that non-zero value when
calling evaluate_node<i_t,f_t>(...). Ensure you update any related logic that
assumes left_node > 0 (references: left_node, current_batch_size,
evaluate_node).
In `@cpp/src/linear_programming/pdhg.cu`:
- Around line 97-109: When new_bounds is provided, validate that
new_bounds.size() equals climber_strategies.size() to avoid out‑of‑bounds
accesses in the refine kernels; inside the constructor or the place where
new_bounds is consumed (the block that fills idx/lower/upper and copies into
new_bounds_idx_, new_bounds_lower_, new_bounds_upper_), add a check that throws
or returns an error if sizes differ (e.g., if (new_bounds.size() !=
climber_strategies.size()) { /* error */ }), and only proceed to build the
temporary vectors and call raft::copy when the sizes match; reference
new_bounds, climber_strategies, new_bounds_idx_, new_bounds_lower_, and
new_bounds_upper_ when implementing this validation.
- Around line 765-773: The three CUDA kernel launches
(refine_initial_primal_projection_kernel,
refine_primal_projection_major_batch_kernel,
refine_primal_projection_batch_kernel) must be followed by an immediate
RAFT_CUDA_TRY(cudaPeekAtLastError()); to surface silent GPU failures; update
each launch site (the calls that pass stream_view_ and device_span<> arguments
and return to host) to insert RAFT_CUDA_TRY(cudaPeekAtLastError()); right after
the <<<...>>> call so any launch/async errors are detected before continuing.
In `@cpp/src/linear_programming/pdlp_constants.hpp`:
- Around line 17-22: The function kernel_config_from_batch_size must guard
against batch_size == 0 to avoid computing block_size == 0 and calling
cuda::ceil_div with a zero divisor; add an early check at the top of
kernel_config_from_batch_size (or an assert) that returns a safe pair (e.g.,
std::make_pair(0u, 0u)) or triggers a clear precondition failure when batch_size
is 0, and only compute block_size and grid_size via std::min and cuda::ceil_div
when batch_size > 0 so no divide-by-zero occurs.
In
`@cpp/src/linear_programming/restart_strategy/localized_duality_gap_container.cu`:
- Around line 107-115: The resize_context method in
localized_duality_gap_container_t currently rejects new_size equal to batch_size
even though the message says "less than or equal"; update the assertion in
localized_duality_gap_container_t::resize_context to allow no-op resizes by
changing the check to require new_size <= batch_size (and ensure the associated
error message remains accurate), so callers that pass an unchanged size are
accepted.
In `@cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu`:
- Around line 1038-1054: The resize_context method in pdlp_restart_strategy_t
currently asserts new_size < batch_size which rejects equality despite the
message; change the assertion to allow equality (new_size <= batch_size) so a
no-op resize (new_size == batch_size) is permitted, and update the assertion
message if desired to match the condition; modify the cuopt_assert that
references batch_size in resize_context to use <= instead of < to fix the
behavior.
- Around line 689-785: In pdlp_restart_strategy_t<i_t,
f_t>::should_cupdlpx_restart the final std::copy currently overwrites the
current fixed_point_error_ with last_trial_fixed_point_error_, losing fresh
errors; reverse the copy to update last_trial_fixed_point_error_ from the
current fixed_point_error_ (i.e. copy from fixed_point_error_.begin()/end() into
last_trial_fixed_point_error_.begin()), and ensure the destination vector size
matches the source before copying.
In `@cpp/src/linear_programming/solve.cu`:
- Around line 707-724: The vector info of type
optimization_problem_solution_t<i_t, f_t>::additional_termination_information_t
is default-constructed and then accessed at info[0], causing out-of-bounds
access; fix by sizing or populating info before use (e.g., call info.resize(1)
or info.emplace_back()) and then set info[0].primal_objective,
info[0].number_of_steps_taken and info[0].solve_time, so the std::move(info)
passed into the optimization_problem_solution_t constructor is valid; adjust
around symbols info, additional_termination_information_t, vertex_solution, and
start_solver in the crossover/termination block.
- Around line 556-566: The row-sense handling is inverted: for
user_problem.row_sense == 'G' (>=) you must set constraint_lower[i] =
user_problem.rhs[i] and constraint_upper[i] = +infinity, and for 'L' (<=) set
constraint_lower[i] = -infinity and constraint_upper[i] = user_problem.rhs[i];
update the loop that writes into constraint_lower and constraint_upper (the
block using user_problem.row_sense, m, rhs, and f_t/
std::numeric_limits<f_t>::infinity()) to swap the assignments for 'G' and 'L'
accordingly while leaving the equality branch unchanged.
- Around line 750-783: The warm-start buffers/values are typed as double but
must match the template precision; change initial_primal and initial_dual from
rmm::device_uvector<double> to rmm::device_uvector<f_t2> (use the existing using
f_t2 = typename type_2<f_t>::type), and change initial_step_size and
initial_primal_weight from double to f_t2 so they match
original_solution.get_pdlp_warm_start_data() and the set_initial_* APIs (update
their NaN initializers to std::numeric_limits<f_t2>::signaling_NaN()). Ensure
the assignments that construct the device_uvectors from
original_solution.get_primal_solution()/get_dual_solution() remain but now
target the f_t2-typed buffers.
In `@cpp/src/linear_programming/swap_and_resize_helper.cuh`:
- Around line 48-84: The current in-place swap in matrix_swap uses
cub::DeviceTransform::Transform with aliased input/output iterators (in_zip and
out_zip), which is unsafe; replace that call with thrust::transform to perform
the element-wise swap in-place (thrust documents input/output may coincide).
Locate matrix_swap and the transform invocation that uses
cub::DeviceTransform::Transform, and call thrust::transform with the same
in_zip, a counting end (in_zip + total_items) and out_zip, passing the existing
lambda (or equivalent functor) that swaps the tuple elements; ensure headers no
longer require cub for this operation and keep the matrix_swap_index_functor and
swap_pair_t usage unchanged.
In `@cpp/src/linear_programming/termination_strategy/convergence_information.cu`:
- Around line 249-256: The cuopt_assert in convergence_information_t<i_t,
f_t>::resize_context incorrectly uses new_size < batch_size which forbids
new_size == batch_size despite the error message; change the assertion to allow
equality (use new_size <= batch_size) so callers can perform a no-op resize, and
keep the existing message (or update it to match) while referencing
primal_objective_.size() and cuopt_assert for locating the check.
In `@cpp/src/mip/solve.cu`:
- Around line 61-65: You cloned settings.hyper_params and changed two flags but
never propagated that modified copy into the solver, so PDLP (consuming
settings_.hyper_params in mip_solver_t -> PDLP) still sees the original values;
either (A) update the settings object itself before constructing mip_solver_t
(assign the modified hyper_params back to settings.hyper_params so
mip_solver_t/PDLP use the new flags), or (B) if these flags are truly only for
the scaling strategy, add a comment and ensure only the scaling code receives
the local hyper_params (and do not expect PDLP to observe them). Locate the code
paths around the modified hyper_params, the callsite that constructs
mip_solver_t, and PDLP usage (pdlp.cu references) to implement option A (assign
back) or B (document and keep local), ensuring the flags are consistently
applied where consumed.
🟡 Minor comments (8)
cpp/src/linear_programming/saddle_point.cu-77-84 (1)
77-84: `resize_context` rejects a no-op resize.

The assert says "less than or equal" but enforces `<`. This will fail if callers pass the current size. Either allow equality or update the message. I'd allow equality.

✅ Suggested fix

```diff
-  cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size");
+  cuopt_assert(new_size <= batch_size, "New size must be less than or equal to batch size");
```

cpp/src/linear_programming/saddle_point.cu-20-37 (1)
20-37: Validate `batch_size` in the constructor.

`batch_size` can be zero here, creating an invalid state that later asserts in swap/resize. Add an explicit guard up front. As per coding guidelines, validate problem sizes before expensive allocations.

✅ Suggested fix

```diff
   EXE_CUOPT_EXPECTS(primal_size > 0, "Size of the primal problem must be larger than 0");
   EXE_CUOPT_EXPECTS(dual_size > 0, "Size of the dual problem must be larger than 0");
+  EXE_CUOPT_EXPECTS(batch_size > 0, "Batch size must be greater than 0");
```

cpp/include/cuopt/linear_programming/utilities/segmented_sum_handler.cuh-30-31 (1)
30-31: Uninitialized member `byte_needed_` in default constructor.

The default constructor leaves `byte_needed_` uninitialized, which could lead to undefined behavior if `segmented_sum_storage_.resize(byte_needed_, ...)` is called before `byte_needed_` is set by a cub query. While the two-phase pattern should always query first, defensive initialization is safer.

Proposed fix

```diff
   // Empty constructor for when used in non batch mode
-  segmented_sum_handler_t() {}
+  segmented_sum_handler_t() : byte_needed_(0) {}
```

cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.hpp-59-60 (1)
59-60: Document the lifetime requirement for reference members.

The reference members `climber_strategies_` and `hyper_params_` (lines 129-130) are part of an intentional architectural pattern used consistently across the solver (also in `pdhg_solver_t`, `cusparse_view_t`, `termination_strategy`). These references are passed through the component hierarchy and managed by the parent `pdhg_solver_t`, which orchestrates the lifetime of all sub-components.

However, this pattern relies on an implicit caller contract: the referenced objects must remain valid for the duration of the solver's execution. Add a comment or docstring in the class documenting this lifetime requirement to prevent misuse; a sketch follows.
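A minimal sketch of such a docstring, using the member names cited in this comment; exact wording and placement are up to the author:

```cpp
// Sketch only: documentation for the non-owning reference members.
/**
 * @note Lifetime contract: climber_strategies_ and hyper_params_ are
 *       non-owning references into state owned by the parent pdhg_solver_t.
 *       The referenced objects must outlive this strategy object for the
 *       entire solve; do not construct this class from temporaries.
 */
// const std::vector<pdlp_climber_strategy_t>& climber_strategies_;
// pdlp_hyper_params_t& hyper_params_;
```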
cpp/src/mip/diversity/recombiners/sub_mip.cuh-73-84 (1)
73-84: Resolve TODO and confirm hyper-params consistency for Sub-MIP scaling.

Lines 73-84: you now pass `context.settings.hyper_params` into scaling, but the TODO suggests uncertainty. Please confirm this is intended (and consistent with the MIP path's hyper-param overrides) and remove/track the TODO before merge.

If you want, I can suggest a small wrapper to align Sub-MIP hyper-params with the MIP path.
benchmarks/linear_programming/cuopt/run_pdlp.cu-49-53 (1)
49-53: Update CLI help text for new solver modes.

Line 50: the help string still lists only Stable3/Methodical1/Fast1, but Stable2 and Stable1 are now valid choices.

🔧 Suggested fix

```diff
-    .help("Solver mode for PDLP. Possible values: Stable3 (default), Methodical1, Fast1")
+    .help("Solver mode for PDLP. Possible values: Stable3 (default), Stable2, Stable1, Methodical1, Fast1")
```

cpp/tests/linear_programming/pdlp_test.cu-1011-1019 (1)
1011-1019: Potential stream synchronization issue in `extract_subvector`.

The helper copies data on `vector.stream()` and returns immediately. If the caller accesses the returned `subvector` on a different stream or host-side without synchronization, there could be a race condition.

Consider either:
- Synchronizing before return, or
- Documenting that the caller must synchronize.
🔧 Option 1: Add synchronization
```diff
 template <typename T>
 rmm::device_uvector<T> extract_subvector(const rmm::device_uvector<T>& vector,
                                          size_t start,
                                          size_t length)
 {
   rmm::device_uvector<T> subvector(length, vector.stream());
   raft::copy(subvector.data(), vector.data() + start, length, vector.stream());
+  RAFT_CUDA_TRY(cudaStreamSynchronize(vector.stream()));
   return subvector;
 }
```

cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu-160-160 (1)
160: Assertion message inconsistent with actual check.

The assertion message says "New size must be less than or equal to batch size" but the code checks `new_size < batch_size` (strictly less than). Either the check or the message should be corrected.

Suggested fix

```diff
-  cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size");
+  cuopt_assert(new_size <= batch_size, "New size must be less than or equal to batch size");
```

Or if the strict check is intentional:

```diff
-  cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size");
+  cuopt_assert(new_size < batch_size, "New size must be strictly less than batch size");
```
🧹 Nitpick comments (24)
cpp/src/linear_programming/utilities/ping_pong_graph.cuh (1)
97-105: Consider value-initializing CUDA handles.

The `cudaGraph_t` and `cudaGraphExec_t` members are uninitialized, which is safe given the `*_initialized` guards. However, value-initialization to `nullptr` would make the code more defensive against accidental misuse and aligns with defensive coding practices.

🔧 Optional improvement

```diff
  private:
-  cudaGraph_t even_graph;
-  cudaGraph_t odd_graph;
-  cudaGraphExec_t even_instance;
-  cudaGraphExec_t odd_instance;
+  cudaGraph_t even_graph{nullptr};
+  cudaGraph_t odd_graph{nullptr};
+  cudaGraphExec_t even_instance{nullptr};
+  cudaGraphExec_t odd_instance{nullptr};
```

cpp/include/cuopt/linear_programming/utilities/segmented_sum_handler.cuh (2)
39-51: Inconsistent stream parameter usage between methods.

`segmented_sum_helper` passes `stream_view_` directly (lines 40, 42, 50), while `segmented_reduce_helper` passes `stream_view_.value()` (lines 69, 71, 81). While both should work due to implicit conversions, this inconsistency could mask issues when `stream_view_` is default-constructed (from the parameterless constructor).

Consider using `stream_view_.value()` consistently in both methods, or adding a check that `stream_view_` is valid before use; a sketch follows.

Also applies to: 61-82
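A minimal sketch of the uniform pattern, assuming the member and helper names from this comment; `rmm::cuda_stream_view::value()` returns the wrapped `cudaStream_t`:

```cpp
#include <rmm/cuda_stream_view.hpp>

// Sketch only: one accessor that both helpers use, so stream handling is
// uniform whether the handler was built with a stream or default-constructed.
struct stream_holder_sketch {
  rmm::cuda_stream_view stream_view_{};

  cudaStream_t stream() const { return stream_view_.value(); }
};

// Inside segmented_sum_helper / segmented_reduce_helper, pass stream() as the
// final argument of the cub calls instead of mixing stream_view_ and
// stream_view_.value().
```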
84-86: Consider making `byte_needed_` private or documenting its usage.

The member `byte_needed_` is public but appears to be internal state managed by the helper methods. Either make it private with an accessor if needed, or add a brief comment explaining that it's updated by the helper methods; a sketch follows.
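A small sketch of the encapsulated form; the accessor name is illustrative, not from the PR:

```cpp
#include <cstddef>

// Sketch only: keep the cub scratch-size bookkeeping private.
class segmented_sum_handler_t {
 public:
  size_t scratch_bytes() const { return byte_needed_; }  // illustrative name

 private:
  size_t byte_needed_{0};  // updated by the cub size-query pass in the helpers
};
```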
cpp/src/linear_programming/optimal_batch_size_handler/optimal_batch_size_handler.cu (4)

32-33: Unused member variables `x_descr` and `y_descr`.

Local variables `x_descr` and `y_descr` (lines 32-33) are used in the constructor but never assigned to the member variables of the same names (lines 144-145). The members appear to be unused dead code.

Consider removing unused members

```diff
-  cusparse_dn_mat_descr_wrapper_t<f_t> x_descr;
-  cusparse_dn_mat_descr_wrapper_t<f_t> y_descr;
   rmm::device_uvector<f_t> x;
   rmm::device_uvector<f_t> y;
```

Also applies to: 144-145
166-176: Consider adding CUDA error checking after event operations.

The event recording and synchronization lack explicit error checking. While `event_handler_t` may handle this internally, explicit checks (or `RAFT_CUDA_TRY` wrappers) would ensure benchmark failures are detected rather than silently producing incorrect timings; a sketch follows.

As per coding guidelines, every CUDA operation should have error checking.
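A minimal sketch of wrapped event timing under these guidelines; `RAFT_CUDA_TRY` is the macro already used in this codebase, while the function and event names here are illustrative:

```cpp
#include <cuda_runtime_api.h>
// RAFT_CUDA_TRY is RAFT's CUDA error-check macro (header location varies by
// RAFT version; this file already has it in scope via its raft includes).

// Sketch only: every CUDA event call is checked so a failure surfaces
// immediately instead of silently corrupting the measured time.
float timed_region_ms(cudaStream_t stream)
{
  cudaEvent_t start{}, stop{};  // illustrative names
  RAFT_CUDA_TRY(cudaEventCreate(&start));
  RAFT_CUDA_TRY(cudaEventCreate(&stop));

  RAFT_CUDA_TRY(cudaEventRecord(start, stream));
  // ... work being benchmarked ...
  RAFT_CUDA_TRY(cudaEventRecord(stop, stream));
  RAFT_CUDA_TRY(cudaEventSynchronize(stop));

  float ms = 0.f;
  RAFT_CUDA_TRY(cudaEventElapsedTime(&ms, start, stop));
  RAFT_CUDA_TRY(cudaEventDestroy(start));
  RAFT_CUDA_TRY(cudaEventDestroy(stop));
  return ms;
}
```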
312-318: Clarify the middle node calculation.

The formula `((current_batch_size * 2) + current_batch_size) / 2` equals `1.5 * current_batch_size`, which is the midpoint between `current_batch_size` and `current_batch_size * 2`. This makes sense algorithmically, but a clearer expression or comment would improve readability.

Suggested clarification

```diff
   // Testing one last time between the two
-  const int middle_node = ((current_batch_size * 2) + current_batch_size) / 2;
+  // Midpoint between current_batch_size and optimal_batch_size (which is current_batch_size * 2)
+  const int middle_node = (current_batch_size + optimal_batch_size) / 2;
```
422-423: Unreachable code after exhaustive if-else.

The `cuopt_assert(false, ...)` and `return 0` at lines 422-423 are unreachable since the preceding if/else-if/else chain is exhaustive. While this serves as a defensive assertion, consider removing it or replacing it with a comment to avoid static analysis warnings about unreachable code.

cpp/src/linear_programming/utils.cuh (1)
194-221: TODO: Use `i_t` for `problem_size_` in iterator wrappers.

Line 219 has a TODO comment about using `i_t` instead of `int` for `problem_size_`. This should be addressed to maintain type consistency with the rest of the codebase and avoid potential truncation issues on large problems.

Proposed fix for both iterators

```diff
-template <typename f_t>
-struct batch_wrapped_iterator {
-  batch_wrapped_iterator(const f_t* problem_input, int problem_size)
+template <typename i_t, typename f_t>
+struct batch_wrapped_iterator {
+  batch_wrapped_iterator(const f_t* problem_input, i_t problem_size)
     : problem_input_(problem_input), problem_size_(problem_size)
   {
   }

-  HDI f_t operator()(int id) { return problem_input_[id / problem_size_]; }
+  HDI f_t operator()(i_t id) { return problem_input_[id / problem_size_]; }

   const f_t* problem_input_;
-  int problem_size_;
+  i_t problem_size_;
 };

-template <typename f_t>
+template <typename i_t, typename f_t>
 struct problem_wrapped_iterator {
-  problem_wrapped_iterator(const f_t* problem_input, int problem_size)
+  problem_wrapped_iterator(const f_t* problem_input, i_t problem_size)
     : problem_input_(problem_input), problem_size_(problem_size)
   {
   }

-  HDI f_t operator()(int id) { return problem_input_[id % problem_size_]; }
+  HDI f_t operator()(i_t id) { return problem_input_[id % problem_size_]; }

   const f_t* problem_input_;
-  // TODO use i_t
-  int problem_size_;
+  i_t problem_size_;
 };
```

cpp/src/linear_programming/pdlp_climber_strategy.hpp (1)
19-21: Consider providing default initialization for `original_index`.

The struct lacks a default initializer for `original_index`. If instances are default-constructed (e.g., in a `std::vector` resize), the field will be uninitialized, which could lead to undefined behavior if read before being set.

Suggested initialization

```diff
 struct pdlp_climber_strategy_t {
-  int original_index;
+  int original_index{-1};  // -1 indicates uninitialized/invalid
 };
```

cpp/src/linear_programming/pdlp_constants.hpp (1)
35-39: Avoid mutable header-level flags for batch behavior.

Lines 35-39: `static bool` in a header yields one copy per TU and encourages global mutable state. Prefer wiring these through settings/hyper-params, or make them `inline constexpr` if they are compile-time constants (see the sketch below).

As per coding guidelines, avoid thread-unsafe global/static state.
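A minimal sketch of the two alternatives, with an illustrative flag name (the actual flags in pdlp_constants.hpp are not quoted in this comment):

```cpp
// Before (sketch): every translation unit including the header gets its own
// mutable copy, so a write in one TU is invisible to the others.
// static bool use_batch_fast_path = true;  // illustrative name

// Option 1: a true compile-time constant, one entity program-wide (C++17).
inline constexpr bool use_batch_fast_path = true;  // illustrative name

// Option 2: if the flag must change at runtime, it belongs in the settings
// struct already threaded through the solver (pdlp_hyper_params_t in this
// PR), not in a header-level global.
```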
cpp/include/cuopt/linear_programming/mip/solver_settings.hpp (1)
86-87: Add Doxygen documentation for new public member.

As per coding guidelines for public headers, new public members should have documentation comments. The `mip_batch_pdlp_strong_branching` member lacks documentation explaining its purpose, valid values, and effect on solver behavior.

```cpp
/**
 * @brief Enable batch PDLP for strong branching in MIP solver.
 * @note 0 = disabled (use Dual Simplex), 1 = enabled (use Batch PDLP)
 */
i_t mip_batch_pdlp_strong_branching = 0;
```

cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu (1)
63-68: Consider adding size validation for weight vector.

The implementation assumes `weight` has at least one element (accessed via `weight.data()` at lines 81, 88, 92-93). If an empty vector is passed, this would cause undefined behavior when dereferencing.

Consider adding a debug assertion or precondition check:

```cpp
cuopt_assert(weight.size() > 0, "weight vector must not be empty");
```

benchmarks/linear_programming/cuopt/benchmark_helper.hpp (1)
62-71: Good refactor: runtime hyper-params instead of static globals.

The signature change to accept `pdlp_hyper_params_t& params` improves modularity and eliminates reliance on global state. One minor suggestion: consider using `std::exit(EXIT_FAILURE)` instead of `exit(-1)` for portability and clarity.

♻️ Minor improvement

```diff
-    exit(-1);
+    std::exit(EXIT_FAILURE);
```

cpp/src/linear_programming/swap_and_resize_helper.cuh (1)
86-96: Consider type consistency for index parameters.

The function uses `int` for parameters while the template might be used with different index types. Consider using a template parameter or `size_t` for consistency.

♻️ Suggested type improvement

```diff
 template <typename host_vector_t>
-void host_vector_swap(host_vector_t& host_vector, int left_swap_index, int right_swap_index)
+void host_vector_swap(host_vector_t& host_vector,
+                      typename host_vector_t::size_type left_swap_index,
+                      typename host_vector_t::size_type right_swap_index)
 {
-  cuopt_assert(left_swap_index < host_vector.size(), "Left swap index is out of bounds");
-  cuopt_assert(right_swap_index < host_vector.size(), "Right swap index is out of bounds");
+  cuopt_assert(left_swap_index < host_vector.size(), "Left swap index out of bounds");
+  cuopt_assert(right_swap_index < host_vector.size(), "Right swap index out of bounds");
   cuopt_assert(left_swap_index < right_swap_index,
                "Left swap index must be less than right swap index");
```

cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cuh (1)
115-116: Note: reference member requires lifetime awareness.

The `hyper_params_` reference member means the caller must ensure the referenced `pdlp_hyper_params_t` outlives the `pdlp_initial_scaling_strategy_t` instance. This is typical for performance-critical code but worth documenting if not already done elsewhere.

cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp (1)
120-133: Address TODO comments before merge.

Multiple `TODO batch mode: tmp` comments indicate these additions may be temporary or need refinement. Consider documenting the intended long-term design for `set_initial_step_size` and `set_initial_primal_weight` or removing the TODO markers if the implementation is final.

cpp/src/linear_programming/termination_strategy/infeasibility_information.cu (1)
213-217: Duplicate helper function: `finite_or_zero` exists in convergence_information.cu.

This `finite_or_zero` helper is identical to the one defined in `cpp/src/linear_programming/termination_strategy/convergence_information.cu` (lines 472-475 per the relevant snippets). Consider extracting it to a shared utility header to avoid duplication. As per coding guidelines, duplicate code should be refactored into shared utilities.

♻️ Suggestion: Extract to shared utility

Move `finite_or_zero` and potentially `max_abs_t` to a shared header like `linear_programming/utils.cuh`:

```cpp
// In linear_programming/utils.cuh
template <typename f_t>
HDI f_t finite_or_zero(f_t in)
{
  return isfinite(in) ? in : f_t(0.0);
}

template <typename f_t>
struct max_abs_t {
  HD f_t operator()(f_t a, f_t b) { return cuda::std::max(cuda::std::abs(a), cuda::std::abs(b)); }
};
```

cpp/src/dual_simplex/pseudo_costs.cpp (2)
162-210: Review batch PDLP strong branching implementation.

The batch path correctly:

- Converts `root_soln` to original problem space via `uncrush_primal_solution`
- Extracts fractional values for the batch solver
- Interprets results with `obj_down` at index `k` and `obj_up` at index `k + fractional.size()`

However, there's no error handling if `batch_pdlp_solve` fails or returns unexpected results. Consider adding validation:

♻️ Suggested improvement: Add result validation

```diff
   std::vector<f_t> primal_solutions = batch_pdlp_solve(original_problem, fractional, fraction_values);
+
+  if (primal_solutions.size() != fractional.size() * 2) {
+    settings.log.printf("Warning: batch_pdlp_solve returned unexpected result size %ld, expected %ld\n",
+                        primal_solutions.size(), fractional.size() * 2);
+    // Fall back to dual simplex path or handle error appropriately
+  }
```
175-180: Minor: Loop can use range-based iteration or reserve capacity.

The loop building `fraction_values` could benefit from reserving capacity upfront to avoid reallocations.

♻️ Suggested improvement

```diff
   std::vector<f_t> fraction_values;
+  fraction_values.reserve(fractional.size());
   for (i_t k = 0; k < fractional.size(); k++) {
     const i_t j = fractional[k];
     fraction_values.push_back(original_root_soln_x[j]);
   }
```

cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp (2)
207: Unnamed parameter reduces readability.

The parameter `i_t = 0` lacks a name, making the API less clear. Consider naming it for consistency with other per-id accessors.

Suggested improvement

```diff
-  f_t get_objective_value(i_t = 0) const;
+  f_t get_objective_value(i_t id = 0) const;
```

213: Same unnamed parameter issue.

Suggested improvement

```diff
-  f_t get_dual_objective_value(i_t = 0) const;
+  f_t get_dual_objective_value(i_t id = 0) const;
```

cpp/src/linear_programming/pdhg.hpp (1)
29: Clarify the distinction between the `is_legacy_batch_mode` parameter and the `batch_mode_` member.

The constructor takes `is_legacy_batch_mode` as a parameter but stores `batch_mode_` as a member. The relationship between these is unclear from the header alone. Consider documenting whether `batch_mode_` is derived from `is_legacy_batch_mode` or from `climber_strategies.size() > 1`; a sketch of such a doc comment follows.

Also applies to: 89-89
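A sketch of the kind of clarifying comment the header could carry; whether `batch_mode_` is actually derived from `climber_strategies.size() > 1` is exactly the open question here, so the wording must be verified against pdhg.cu before adopting:

```cpp
// Sketch only: hypothetical doc comment; verify the derivation against the
// constructor in pdhg.cu before adopting.
/**
 * @param is_legacy_batch_mode  Selects the legacy batched code path only.
 * @note  batch_mode_ is not a stored copy of this flag; if it is derived
 *        from climber_strategies.size() > 1, say so here explicitly.
 */
```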
cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu (1)
411: TODO comments indicate incomplete batch-per-climber restart logic.

The comments "TODO later batch mode: handle if not all restart" and "TODO later batch mode: remove this once you want to do per climber restart" indicate this is intentionally deferred functionality. Consider tracking these in an issue.
Would you like me to open an issue to track implementing per-climber restart logic in batch mode?
Also applies to: 427-427, 484-484
cpp/src/linear_programming/termination_strategy/infeasibility_information.hpp (1)
30-42: Constructor signature significantly expanded for cuPDLPx infeasibility detection.

The constructor now accepts scaled problem references, scaling strategy, and batch configuration. The comment annotations clarify which parameters are specifically for cuPDLPx infeasibility detection. However, the parameter list is becoming lengthy.
Consider grouping the cuPDLPx-specific parameters into a configuration struct to improve readability and maintainability:
```cpp
struct cupdlpx_infeasibility_config_t {
  const problem_t<i_t, f_t>& op_problem_scaled;
  const cusparse_view_t<i_t, f_t>& scaled_cusparse_view;
  const pdlp_initial_scaling_strategy_t<i_t, f_t>& scaling_strategy;
};
```
```cpp
pdlp_termination_status_t get_termination_status(i_t id = 0) const;
std::vector<pdlp_termination_status_t>& get_terminations_status();
```
Add bounds validation to ensure `termination_stats_` and `termination_status_` are the same size, or check bounds against the correct vector.

The per-id accessors check bounds against `termination_status_.size()` but access `termination_stats_[id]`:

- `get_objective_value(i_t id)` (lines 320-324): checks `termination_status_.size()` but accesses `termination_stats_[id]`
- `get_dual_objective_value(i_t id)` (lines 326-330): checks `termination_status_.size()` but accesses `termination_stats_[id]`
- `get_additional_termination_information(i_t id)` (lines 387-391): checks `termination_status_.size()` but accesses `termination_stats_[id]`

The constructors move both vectors together but don't validate they remain synchronized. Since `cuopt_assert` is disabled by default in production builds, add a size invariant check in constructors or adjust bounds checking to validate the vector being accessed; a sketch follows.
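A minimal sketch of the invariant check, assuming the constructor shape implied above; `cuopt_expects` is the always-on runtime check used elsewhere in cuopt, unlike `cuopt_assert`, which is compiled out of production builds:

```cpp
// Sketch only: enforce the invariant once, in the constructor that receives
// both vectors, so every per-id accessor can rely on it afterwards.
cuopt_expects(termination_stats_.size() == termination_status_.size(),
              error_type_t::ValidationError,
              "termination_stats_ and termination_status_ must have the same size");

// Secondary safety in each accessor: bound-check the vector actually indexed.
// cuopt_expects(static_cast<std::size_t>(id) < termination_stats_.size(), ...);
```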
🤖 Prompt for AI Agents
In `@cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp` around lines
243 - 244, The per-id accessors (get_objective_value, get_dual_objective_value,
get_additional_termination_information) currently check bounds against
termination_status_.size() but index termination_stats_, so add a size
invariant: in the constructors that set/move termination_stats_ and
termination_status_ (the class ctor(s) that populate these vectors) assert or
throw if termination_stats_.size() != termination_status_.size() to guarantee
synchronization, and update the accessors to check the size of the vector they
actually index (use termination_stats_.size() when accessing termination_stats_)
as a secondary safety; reference termination_stats_, termination_status_,
get_objective_value, get_dual_objective_value, and
get_additional_termination_information when making these changes.
```cpp
#include <dual_simplex/user_problem.hpp>
```
Remove internal header inclusion or provide an abstraction for `dual_simplex::user_problem_t` in the public API.

The public header includes `dual_simplex/user_problem.hpp` from the internal source tree and exposes `dual_simplex::user_problem_t<i_t, f_t>` in public function signatures (e.g., `batch_pdlp_solve()`). Users should not depend on internal implementation types. Either move the internal type to a public interface, or refactor to accept a public-facing problem representation instead (a sketch of the forward-declaration route follows).
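A minimal sketch of the forward-declaration route; the enclosing namespace and class-vs-struct keyword must mirror the real declaration in dual_simplex/user_problem.hpp, which is not quoted here:

```cpp
// In the public solve.hpp (sketch only): no internal include, just an opaque
// declaration that must match the real one exactly.
namespace dual_simplex {
template <typename i_t, typename f_t>
struct user_problem_t;
}  // namespace dual_simplex

// batch_pdlp_solve() can then take the type by const reference; all member
// access and conversions move into solve.cu, which includes the internal
// header as before.
```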
🤖 Prompt for AI Agents
In `@cpp/include/cuopt/linear_programming/solve.hpp` around lines 17 - 18, The
public header currently includes the internal header and exposes
dual_simplex::user_problem_t<i_t, f_t> in the batch_pdlp_solve() signature;
remove the internal include and stop exposing that internal type in the public
API by either (a) introducing a public-facing problem abstraction (e.g., a
public problem_t or ProblemView) and changing batch_pdlp_solve() to accept that,
or (b) forward-declaring a minimal public interface and adapting the
implementation to convert from the public type to dual_simplex::user_problem_t
inside the cpp implementation; update all references to
dual_simplex::user_problem_t in the header to use the new public type or opaque
forward declaration and move any internal conversions into the source file.
```cpp
i_t mip_batch_pdlp_strong_branching;  // 0 if not using batch PDLP for strong branching, 1 if
                                      // using batch PDLP for strong branching
```
Missing initialization for `mip_batch_pdlp_strong_branching`.

The new field is not initialized in the constructor's initializer list (lines 42-98), leaving it with an indeterminate value. Other `i_t` members like `random_seed`, `inside_mip`, and `num_bfs_workers` are all explicitly initialized.

🐛 Proposed fix: Initialize to 0 (disabled by default)

Add to the constructor initializer list around line 91:

```diff
   num_bfs_workers(std::max(num_threads / 4, 1)),
+  mip_batch_pdlp_strong_branching(0),
   random_seed(0),
```
In `@cpp/src/dual_simplex/simplex_solver_settings.hpp` around lines 158 - 159, The
field mip_batch_pdlp_strong_branching in class SimplexSolverSettings is not
initialized in the constructor initializer list; add it (set to 0 by default) to
the constructor's initializer list alongside other i_t members (e.g.,
random_seed, inside_mip, num_bfs_workers) so mip_batch_pdlp_strong_branching is
deterministically initialized when a SimplexSolverSettings instance is created.
```cpp
// TODO add proper checking
#if CUDA_VER_12_4_UP
template <typename T,
          typename std::enable_if_t<std::is_same_v<T, float> || std::is_same_v<T, double>>*>
cusparseStatus_t my_cusparsespmm_preprocess(cusparseHandle_t handle,
                                            cusparseOperation_t opA,
                                            cusparseOperation_t opB,
                                            const T* alpha,
                                            const cusparseSpMatDescr_t matA,
                                            const cusparseDnMatDescr_t matB,
                                            const T* beta,
                                            const cusparseDnMatDescr_t matC,
                                            cusparseSpMMAlg_t alg,
                                            void* externalBuffer,
                                            cudaStream_t stream)
{
  auto constexpr float_type = []() constexpr {
    if constexpr (std::is_same_v<T, float>) {
      return CUDA_R_32F;
    } else if constexpr (std::is_same_v<T, double>) {
      return CUDA_R_64F;
    }
  }();
  CUSPARSE_CHECK(cusparseSetStream(handle, stream));
  return cusparseSpMM_preprocess(
    handle, opA, opB, alpha, matA, matB, beta, matC, float_type, alg, externalBuffer);
}
```
Check `cusparseSpMM_preprocess` errors instead of dropping them.

`my_cusparsespmm_preprocess` returns `cusparseStatus_t`, but all call sites (lines 536, 548, 560, 572, 777, 789) ignore this return value. The function itself also fails to check the status of `cusparseSpMM_preprocess` before returning it. A failing preprocess will be silent. Wrap all call sites with `RAFT_CUSPARSE_TRY` (consistent with the pattern used throughout this file for similar CUSPARSE operations), or add `CUSPARSE_CHECK` inside the function and return success.
🐛 Example fix for call site at lines 536-547

```diff
-  my_cusparsespmm_preprocess(handle_ptr_->get_cusparse_handle(),
-                             CUSPARSE_OPERATION_NON_TRANSPOSE,
-                             CUSPARSE_OPERATION_NON_TRANSPOSE,
-                             alpha.data(),
-                             A_T,
-                             batch_delta_dual_solutions,
-                             beta.data(),
-                             batch_tmp_primals,
-                             CUSPARSE_SPMM_CSR_ALG3,
-                             buffer_transpose_batch.data(),
-                             handle_ptr->get_stream());
+  RAFT_CUSPARSE_TRY(my_cusparsespmm_preprocess(handle_ptr_->get_cusparse_handle(),
+                                               CUSPARSE_OPERATION_NON_TRANSPOSE,
+                                               CUSPARSE_OPERATION_NON_TRANSPOSE,
+                                               alpha.data(),
+                                               A_T,
+                                               batch_delta_dual_solutions,
+                                               beta.data(),
+                                               batch_tmp_primals,
+                                               CUSPARSE_SPMM_CSR_ALG3,
+                                               buffer_transpose_batch.data(),
+                                               handle_ptr->get_stream()));
```

Applies to: 536-547, 548-559, 560-571, 572-583, 777-788, 789-800.
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cusparse_view.cu` around lines 238 - 264, The
helper my_cusparsespmm_preprocess currently forwards cusparseSpMM_preprocess
return status without checking it and all its call sites ignore that return, so
preprocessing failures are silent; fix by either (A) adding CUSPARSE_CHECK
around cusparseSpMM_preprocess inside my_cusparsespmm_preprocess so it
logs/throws on error and then return CUSPARSE_STATUS_SUCCESS, or (B) leave the
function returning cusparseStatus_t and wrap every call site of
my_cusparsespmm_preprocess with RAFT_CUSPARSE_TRY (consistent with other
CUSPARSE ops in this file) to propagate errors; update symbols:
my_cusparsespmm_preprocess and each call site that invokes it so failures are
not ignored.
```cpp
rmm::device_buffer buffer_transpose_batch(buffer_size_transpose_batch, stream_view);
rmm::device_buffer buffer_non_transpose_batch(buffer_size_non_transpose_batch, stream_view);
```
Critical: Local variables shadow member variables, causing use-after-free.

Lines 77-78 declare local `rmm::device_buffer` variables that shadow the member variables declared at lines 148-149. The local buffers are used for SpMM preprocessing and the CUDA graph capture (lines 82-136), but they are destroyed when the constructor exits. The captured graph will reference freed memory, leading to undefined behavior when `launch()` is called.

🐛 Proposed fix: assign to members instead of declaring locals

```diff
-  rmm::device_buffer buffer_transpose_batch(buffer_size_transpose_batch, stream_view);
-  rmm::device_buffer buffer_non_transpose_batch(buffer_size_non_transpose_batch, stream_view);
+  buffer_transpose_batch.resize(buffer_size_transpose_batch, stream_view);
+  buffer_non_transpose_batch.resize(buffer_size_non_transpose_batch, stream_view);
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| rmm::device_buffer buffer_transpose_batch(buffer_size_transpose_batch, stream_view); | |
| rmm::device_buffer buffer_non_transpose_batch(buffer_size_non_transpose_batch, stream_view); | |
| buffer_transpose_batch.resize(buffer_size_transpose_batch, stream_view); | |
| buffer_non_transpose_batch.resize(buffer_size_non_transpose_batch, stream_view); |
🤖 Prompt for AI Agents
In
`@cpp/src/linear_programming/optimal_batch_size_handler/optimal_batch_size_handler.cu`
around lines 77 - 78, Local rmm::device_buffer variables buffer_transpose_batch
and buffer_non_transpose_batch declared in the constructor shadow the class
member buffers and are destroyed on constructor exit, causing the CUDA graph to
reference freed memory; fix by removing the local declarations and
assigning/constructing the buffers into the existing member variables (the
member buffer_transpose_batch and buffer_non_transpose_batch) so the memory
remains valid for the SpMM preprocessing, CUDA graph capture in the constructor,
and later launch() calls.
```diff
   // Should be filled with more information from dual simplex
-  typename optimization_problem_solution_t<i_t, f_t>::additional_termination_information_t info;
-  info.primal_objective      = vertex_solution.user_objective;
-  info.number_of_steps_taken = vertex_solution.iterations;
+  std::vector<
+    typename optimization_problem_solution_t<i_t, f_t>::additional_termination_information_t>
+    info;
+  info[0].primal_objective      = vertex_solution.user_objective;
+  info[0].number_of_steps_taken = vertex_solution.iterations;
   auto crossover_end = std::chrono::high_resolution_clock::now();
   auto crossover_duration =
     std::chrono::duration_cast<std::chrono::milliseconds>(crossover_end - start_solver);
-  info.solve_time = crossover_duration.count() / 1000.0;
+  info[0].solve_time = crossover_duration.count() / 1000.0;
   auto sol_crossover = optimization_problem_solution_t<i_t, f_t>(final_primal_solution,
                                                                  final_dual_solution,
                                                                  final_reduced_cost,
                                                                  problem.objective_name,
                                                                  problem.var_names,
                                                                  problem.row_names,
-                                                                 info,
-                                                                 termination_status);
+                                                                 std::move(info),
+                                                                 {termination_status});
```
Crossover termination info vector is never sized.

`info` is default-constructed and then indexed at `[0]`, which is out of bounds and will crash.

🐛 Suggested fix

```diff
-  std::vector<
-    typename optimization_problem_solution_t<i_t, f_t>::additional_termination_information_t>
-    info;
+  std::vector<
+    typename optimization_problem_solution_t<i_t, f_t>::additional_termination_information_t>
+    info(1);
```
In `@cpp/src/linear_programming/solve.cu` around lines 707 - 724, The vector info
of type optimization_problem_solution_t<i_t,
f_t>::additional_termination_information_t is default-constructed and then
accessed at info[0], causing out-of-bounds access; fix by sizing or populating
info before use (e.g., call info.resize(1) or info.emplace_back()) and then set
info[0].primal_objective, info[0].number_of_steps_taken and info[0].solve_time,
so the std::move(info) passed into the optimization_problem_solution_t
constructor is valid; adjust around symbols info,
additional_termination_information_t, vertex_solution, and start_solver in the
crossover/termination block.
```cpp
rmm::device_uvector<double> initial_primal(0, stream);
rmm::device_uvector<double> initial_dual(0, stream);
double initial_step_size     = std::numeric_limits<f_t>::signaling_NaN();
double initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN();

cuopt_assert(settings.new_bounds.size() > 0, "Batch size should be greater than 0");
const int max_batch_size = settings.new_bounds.size();
int optimal_batch_size   = use_optimal_batch_size
                             ? detail::optimal_batch_size_handler(problem, max_batch_size)
                             : max_batch_size;
cuopt_assert(optimal_batch_size != 0 && optimal_batch_size <= max_batch_size,
             "Optimal batch size should be between 1 and max batch size");
using f_t2 = typename type_2<f_t>::type;

// If need warm start, solve the LP alone
if (primal_dual_init || primal_weight_init) {
  pdlp_solver_settings_t<i_t, f_t> warm_start_settings = settings;
  warm_start_settings.new_bounds.clear();
  warm_start_settings.method               = cuopt::linear_programming::method_t::PDLP;
  warm_start_settings.presolve             = false;
  warm_start_settings.pdlp_solver_mode     = pdlp_solver_mode_t::Stable3;
  warm_start_settings.detect_infeasibility = false;
  optimization_problem_solution_t<i_t, f_t> original_solution =
    solve_lp(problem, warm_start_settings);
  if (primal_dual_init) {
    initial_primal = rmm::device_uvector<double>(
      original_solution.get_primal_solution(), original_solution.get_primal_solution().stream());
    initial_dual   = rmm::device_uvector<double>(original_solution.get_dual_solution(),
                                                 original_solution.get_dual_solution().stream());
    initial_step_size = original_solution.get_pdlp_warm_start_data().initial_step_size_;
  }
  if (primal_weight_init) {
    initial_primal_weight = original_solution.get_pdlp_warm_start_data().initial_primal_weight_;
  }
```
Type mismatch in batch warm‑start buffers.
initial_primal/initial_dual are double even when f_t is float, which won’t compile for the float instantiation and can’t bind to set_initial_* expecting f_t*.
🐛 Suggested fix
- rmm::device_uvector<double> initial_primal(0, stream);
- rmm::device_uvector<double> initial_dual(0, stream);
- double initial_step_size = std::numeric_limits<f_t>::signaling_NaN();
- double initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN();
+ rmm::device_uvector<f_t> initial_primal(0, stream);
+ rmm::device_uvector<f_t> initial_dual(0, stream);
+ f_t initial_step_size = std::numeric_limits<f_t>::signaling_NaN();
+ f_t initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN();
@@
- initial_primal = rmm::device_uvector<double>(
+ initial_primal = rmm::device_uvector<f_t>(
original_solution.get_primal_solution(), original_solution.get_primal_solution().stream());
- initial_dual = rmm::device_uvector<double>(original_solution.get_dual_solution(),
- original_solution.get_dual_solution().stream());
+ initial_dual = rmm::device_uvector<f_t>(original_solution.get_dual_solution(),
+ original_solution.get_dual_solution().stream());📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| rmm::device_uvector<double> initial_primal(0, stream); | |
| rmm::device_uvector<double> initial_dual(0, stream); | |
| double initial_step_size = std::numeric_limits<f_t>::signaling_NaN(); | |
| double initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN(); | |
| cuopt_assert(settings.new_bounds.size() > 0, "Batch size should be greater than 0"); | |
| const int max_batch_size = settings.new_bounds.size(); | |
| int optimal_batch_size = use_optimal_batch_size | |
| ? detail::optimal_batch_size_handler(problem, max_batch_size) | |
| : max_batch_size; | |
| cuopt_assert(optimal_batch_size != 0 && optimal_batch_size <= max_batch_size, | |
| "Optimal batch size should be between 1 and max batch size"); | |
| using f_t2 = typename type_2<f_t>::type; | |
| // If need warm start, solve the LP alone | |
| if (primal_dual_init || primal_weight_init) { | |
| pdlp_solver_settings_t<i_t, f_t> warm_start_settings = settings; | |
| warm_start_settings.new_bounds.clear(); | |
| warm_start_settings.method = cuopt::linear_programming::method_t::PDLP; | |
| warm_start_settings.presolve = false; | |
| warm_start_settings.pdlp_solver_mode = pdlp_solver_mode_t::Stable3; | |
| warm_start_settings.detect_infeasibility = false; | |
| optimization_problem_solution_t<i_t, f_t> original_solution = | |
| solve_lp(problem, warm_start_settings); | |
| if (primal_dual_init) { | |
| initial_primal = rmm::device_uvector<double>( | |
| original_solution.get_primal_solution(), original_solution.get_primal_solution().stream()); | |
| initial_dual = rmm::device_uvector<double>(original_solution.get_dual_solution(), | |
| original_solution.get_dual_solution().stream()); | |
| initial_step_size = original_solution.get_pdlp_warm_start_data().initial_step_size_; | |
| } | |
| if (primal_weight_init) { | |
| initial_primal_weight = original_solution.get_pdlp_warm_start_data().initial_primal_weight_; | |
| } | |
| rmm::device_uvector<f_t> initial_primal(0, stream); | |
| rmm::device_uvector<f_t> initial_dual(0, stream); | |
| f_t initial_step_size = std::numeric_limits<f_t>::signaling_NaN(); | |
| f_t initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN(); | |
| cuopt_assert(settings.new_bounds.size() > 0, "Batch size should be greater than 0"); | |
| const int max_batch_size = settings.new_bounds.size(); | |
| int optimal_batch_size = use_optimal_batch_size | |
| ? detail::optimal_batch_size_handler(problem, max_batch_size) | |
| : max_batch_size; | |
| cuopt_assert(optimal_batch_size != 0 && optimal_batch_size <= max_batch_size, | |
| "Optimal batch size should be between 1 and max batch size"); | |
| using f_t2 = typename type_2<f_t>::type; | |
| // If need warm start, solve the LP alone | |
| if (primal_dual_init || primal_weight_init) { | |
| pdlp_solver_settings_t<i_t, f_t> warm_start_settings = settings; | |
| warm_start_settings.new_bounds.clear(); | |
| warm_start_settings.method = cuopt::linear_programming::method_t::PDLP; | |
| warm_start_settings.presolve = false; | |
| warm_start_settings.pdlp_solver_mode = pdlp_solver_mode_t::Stable3; | |
| warm_start_settings.detect_infeasibility = false; | |
| optimization_problem_solution_t<i_t, f_t> original_solution = | |
| solve_lp(problem, warm_start_settings); | |
| if (primal_dual_init) { | |
| initial_primal = rmm::device_uvector<f_t>( | |
| original_solution.get_primal_solution(), original_solution.get_primal_solution().stream()); | |
| initial_dual = rmm::device_uvector<f_t>(original_solution.get_dual_solution(), | |
| original_solution.get_dual_solution().stream()); | |
| initial_step_size = original_solution.get_pdlp_warm_start_data().initial_step_size_; | |
| } | |
| if (primal_weight_init) { | |
| initial_primal_weight = original_solution.get_pdlp_warm_start_data().initial_primal_weight_; | |
| } |
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/solve.cu` around lines 750 - 783, The warm-start
buffers/values are typed as double but must match the template precision; change
initial_primal and initial_dual from rmm::device_uvector<double> to
rmm::device_uvector<f_t>, and change initial_step_size and initial_primal_weight
from double to f_t so they match original_solution.get_pdlp_warm_start_data()
and the set_initial_* APIs (update their NaN initializers to
std::numeric_limits<f_t>::signaling_NaN()). Ensure the assignments that
construct the device_uvectors from
original_solution.get_primal_solution()/get_dual_solution() remain but now
target the f_t-typed buffers.
template <typename i_t, typename f_t>
void matrix_swap(rmm::device_uvector<f_t>& matrix,
                 i_t vector_size,
                 const thrust::universal_host_pinned_vector<swap_pair_t<i_t>>& swap_pairs)
{
  if (swap_pairs.empty()) { return; }

  const i_t batch_size = matrix.size() / vector_size;
  cuopt_assert(vector_size > 0, "Vector size must be greater than 0");
  cuopt_assert(batch_size > 0, "Batch size must be greater than 0");

  const size_t swap_count  = swap_pairs.size();
  const size_t total_items = swap_count * static_cast<size_t>(vector_size);

  auto counting   = thrust::make_counting_iterator<size_t>(0);
  auto left_index = thrust::make_transform_iterator(
    counting,
    matrix_swap_index_functor<i_t>{thrust::raw_pointer_cast(swap_pairs.data()), vector_size, true});
  auto right_index = thrust::make_transform_iterator(
    counting,
    matrix_swap_index_functor<i_t>{
      thrust::raw_pointer_cast(swap_pairs.data()), vector_size, false});

  auto left_perm  = thrust::make_permutation_iterator(matrix.data(), left_index);
  auto right_perm = thrust::make_permutation_iterator(matrix.data(), right_index);
  auto in_zip     = thrust::make_zip_iterator(left_perm, right_perm);
  auto out_zip    = thrust::make_zip_iterator(left_perm, right_perm);

  cub::DeviceTransform::Transform(
    in_zip,
    out_zip,
    total_items,
    [] HD(thrust::tuple<f_t, f_t> values) -> thrust::tuple<f_t, f_t> {
      return thrust::make_tuple(thrust::get<1>(values), thrust::get<0>(values));
    },
    matrix.stream());
}
🧩 Analysis chain
🌐 Web query:
cub::DeviceTransform in-place transformation same input output iterators documentation
💡 Result:
CUB’s cub::DeviceTransform::Transform does not document whether the output iterator is allowed to alias (be the same as) any input iterator (i.e., in-place transform). The public header/API listing shows the overloads and parameters but includes no “aliasing/overlap” guarantee or prohibition. [2]
If you need a documented guarantee for in-place transformation, thrust::transform explicitly states: “The input and output sequences may coincide, resulting in an in-place transformation.” [1]
Practical takeaway:
- Need guaranteed in-place: use thrust::transform (documented). [1]
- Using cub::DeviceTransform in-place: it may work, but it's not specified in the docs, so don't rely on it without confirmation/testing (or use a separate output buffer). [2]
Sources: [1] [2]
Use thrust::transform instead of cub::DeviceTransform for in-place transformations.
cub::DeviceTransform::Transform does not document support for in-place operations (input and output iterators aliasing the same memory). Replace with thrust::transform, which explicitly documents: "The input and output sequences may coincide, resulting in an in-place transformation." This also aligns with the guideline to prefer standard library utilities (Thrust) over non-standard alternatives.
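A sketch of the swap launch rewritten with thrust::transform, reusing in_zip/out_zip/total_items from matrix_swap above (the lambda is unchanged; only the dispatch differs):

// Also #include <thrust/transform.h> and <thrust/execution_policy.h>.
// thrust::transform documents that input and output may coincide, so the
// aliased permutation iterators are safe here.
thrust::transform(thrust::cuda::par.on(matrix.stream()),
                  in_zip,
                  in_zip + total_items,
                  out_zip,
                  [] __device__(thrust::tuple<f_t, f_t> values) {
                    return thrust::make_tuple(thrust::get<1>(values),
                                              thrust::get<0>(values));
                  });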
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/swap_and_resize_helper.cuh` around lines 48 - 84,
The current in-place swap in matrix_swap uses cub::DeviceTransform::Transform
with aliased input/output iterators (in_zip and out_zip), which is unsafe;
replace that call with thrust::transform to perform the element-wise swap
in-place (thrust documents input/output may coincide). Locate matrix_swap and
the transform invocation that uses cub::DeviceTransform::Transform, and call
thrust::transform with the same in_zip, a counting end (in_zip + total_items)
and out_zip, passing the existing lambda (or equivalent functor) that swaps the
tuple elements; ensure headers no longer require cub for this operation and keep
the matrix_swap_index_functor and swap_pair_t usage unchanged.
template <typename i_t, typename f_t>
void convergence_information_t<i_t, f_t>::resize_context(i_t new_size)
{
  [[maybe_unused]] const auto batch_size = static_cast<i_t>(primal_objective_.size());
  cuopt_assert(batch_size > 0, "Batch size must be greater than 0");
  cuopt_assert(new_size > 0, "New size must be greater than 0");
  cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size");
Allow no-op resize when new_size == batch_size.
The assert currently rejects equality despite the message, which can fail callers that attempt a no-op resize.
🔧 Suggested fix
- cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size");
+ cuopt_assert(new_size <= batch_size, "New size must be less than or equal to batch size");🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/termination_strategy/convergence_information.cu`
around lines 249 - 256, The cuopt_assert in convergence_information_t<i_t,
f_t>::resize_context incorrectly uses new_size < batch_size which forbids
new_size == batch_size despite the error message; change the assertion to allow
equality (use new_size <= batch_size) so callers can perform a no-op resize, and
keep the existing message (or update it to match) while referencing
primal_objective_.size() and cuopt_assert for locating the check.
// TODO ask Akif and Alice how was this passed down?
auto hyper_params = settings.hyper_params;
hyper_params.update_primal_weight_on_initial_solution = false;
hyper_params.update_step_size_on_initial_solution = true;
// if the input problem is empty: early exit
🧩 Analysis chain
🏁 Script executed:
rg -n -C2 'update_primal_weight_on_initial_solution|update_step_size_on_initial_solution' -g'*.cu' -g'*.cpp' -g'*.cuh' -g'*.hpp'
🏁 Script executed:
# First, let's see the file structure
git ls-files cpp/src/mip/ | head -20
🏁 Script executed:
# Read the context around lines 61-65 in solve.cu with more lines for context
sed -n '40,120p' cpp/src/mip/solve.cu
🏁 Script executed:
rg -n 'update_primal_weight_on_initial_solution' --type cpp --type cu --type cuh --type hpp
🏁 Script executed:
rg -n 'update_step_size_on_initial_solution' --type cpp --type cu --type cuh --type hpp
🏁 Script executed:
# Check the scaling strategy implementation to see if it reads these flags
fd -e cu -e cpp -e cuh -e hpp | xargs rg -l 'pdlp_initial_scaling_strategy' | head -5
🏁 Script executed:
# Look for where initial_solutions are used in the solver
rg -n 'initial_solutions' cpp/src/mip/solve.cu -B2 -A2
🏁 Script executed:
rg -n 'update_primal_weight_on_initial_solution' -g '*.cpp' -g '*.cu' -g '*.cuh' -g '*.hpp'
🏁 Script executed:
rg -n 'update_step_size_on_initial_solution' -g '*.cpp' -g '*.cu' -g '*.cuh' -g '*.hpp'
🏁 Script executed:
# Search for pdlp_initial_scaling_strategy and read its constructor to see if it uses these flags
rg -n 'pdlp_initial_scaling_strategy' -g '*.cu' -g '*.cpp' -g '*.cuh' -g '*.hpp' | head -10
🏁 Script executed:
# Look at where these flags are consumed in pdlp.cu
sed -n '1200,1210p' cpp/src/linear_programming/pdlp.cu
🏁 Script executed:
sed -n '1295,1305p' cpp/src/linear_programming/pdlp.cu
🏁 Script executed:
# Check what solver_context.cuh shows about how settings flow
sed -n '25,60p' cpp/src/mip/solver_context.cuh
🏁 Script executed:
# Check how the solver is instantiated in solve.cu after scaling is created
sed -n '110,160p' cpp/src/mip/solve.cu
🏁 Script executed:
# Check how mip_solver_t is instantiated and what it does with settings
sed -n '1,80p' cpp/src/mip/solver.cu
🏁 Script executed:
# Verify how the PDLP solver receives settings from solver context
rg -n 'settings_.*hyper_params' cpp/src/mip/solver.cu cpp/src/mip/solver.cuh | head -20
🏁 Script executed:
# Check if there's any way the modified hyper_params in solve.cu flows to PDLP
rg -n 'mip_solver_t.*settings' cpp/src/mip/solve.cu -B2 -A2
🏁 Script executed:
# Check if pdlp_initial_scaling_strategy_t constructor stores or uses these flags
rg -n 'pdlp_initial_scaling_strategy_t.*hyper_params' cpp/src/linear_programming/ -B2 -A10 | head -60
🏁 Script executed:
# Search for the constructor implementation
fd -name '*.cuh' -o -name '*.cu' | xargs rg -l 'class pdlp_initial_scaling_strategy_t' 2>/dev/null
🏁 Script executed:
# Let's check the declaration and see what parameters it stores
rg -n 'class pdlp_initial_scaling_strategy_t|struct pdlp_initial_scaling_strategy_t' -A20
🏁 Script executed:
# Read the full pdlp_initial_scaling_strategy_t class definition
sed -n '23,150p' cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cuh
🏁 Script executed:
# Check the constructor of pdlp_initial_scaling_strategy_t
rg -n 'pdlp_initial_scaling_strategy_t.*(' cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cuh -A15 | head -80
🏁 Script executed:
# Check if pdlp_initial_scaling_strategy_t stores hyper_params
rg -n 'hyper_params' cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cuh
🏁 Script executed:
# Now check if PDLP gets hyper_params from settings or from somewhere else
sed -n '1295,1310p' cpp/src/linear_programming/pdlp.cu
🏁 Script executed:
# Let's check how the PDLP solver is instantiated and where it gets its settings
rg -n 'pdlp_solver_t.*settings' cpp/src/mip/ -B2 -A2
🏁 Script executed:
# Check if settings are ever modified or passed differently to PDLP in run_solver
sed -n '200,300p' cpp/src/mip/solver.cu
🏁 Script executed:
# Verify if hyper_params from scaling strategy is used anywhere
rg -n 'hyper_params_' cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu | head -20
🏁 Script executed:
# Final verification: Check how PDLP is instantiated in the MIP solver and what settings it receives
rg -n 'pdlp_solver_t' cpp/src/mip/ -B2 -A2
🏁 Script executed:
# Check if there's any path where modified hyper_params could reach PDLP from scaling strategy
rg -n 'get_scaled_op_problem\|scaling\.hyper_params' cpp/src/mip/ -B1 -A1
Clarify and fix scope of MIP-specific hyper‑parameter overrides.
Lines 61–65: You clone settings.hyper_params and flip two flags (update_primal_weight_on_initial_solution, update_step_size_on_initial_solution), but this local copy is only passed to the scaling strategy. The original settings object (unmodified) is what gets passed to mip_solver_t and eventually to PDLP. These flags are consumed by PDLP (pdlp.cu:1203, 1298) from settings_.hyper_params, so your local modifications won't propagate. Either thread the modified hyper_params through to the solver, or confirm these flags are MIP-only and only intended for the scaling phase. The TODO comment suggests this scope was already unclear to the team.
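A minimal sketch of the first option (threading the overrides through by assigning them back), under the names used above; the surrounding code is elided:

// Apply the MIP-specific overrides to the settings object itself so both the
// scaling strategy and the PDLP solver constructed downstream observe them.
auto hyper_params = settings.hyper_params;
hyper_params.update_primal_weight_on_initial_solution = false;
hyper_params.update_step_size_on_initial_solution     = true;
settings.hyper_params = hyper_params;  // assign back; a local copy alone is invisible to PDLP
// ... construct the scaling strategy and mip_solver_t from `settings` as before ...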
🤖 Prompt for AI Agents
In `@cpp/src/mip/solve.cu` around lines 61 - 65, You cloned settings.hyper_params
and changed two flags but never propagated that modified copy into the solver,
so PDLP (consuming settings_.hyper_params in mip_solver_t -> PDLP) still sees
the original values; either (A) update the settings object itself before
constructing mip_solver_t (assign the modified hyper_params back to
settings.hyper_params so mip_solver_t/PDLP use the new flags), or (B) if these
flags are truly only for the scaling strategy, add a comment and ensure only the
scaling code receives the local hyper_params (and do not expect PDLP to observe
them). Locate the code paths around the modified hyper_params, the callsite that
constructs mip_solver_t, and PDLP usage (pdlp.cu references) to implement option
A (assign back) or B (document and keep local), ensuring the flags are
consistently applied where consumed.
📝 Walkthrough
This pull request refactors cuOpt's linear programming solver to transition from static/global hyperparameters to instance-based configurations and introduces batch processing capabilities. Changes include replacing global parameter declarations with a struct-based approach, adding multi-climber strategy support for parallel PDLP solving, threading parameters through solver components, and implementing batch context management utilities.
Changes
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 17
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (9)
cpp/src/linear_programming/utilities/problem_checking.cu (1)
68-81: Do not globally disable initial primal bounds validation.Commenting out the check affects all solver paths, not just batch PDLP warm starts. That can let non-batch warm starts violate bounds, risking solver instability or incorrect outcomes. Please gate this check behind an explicit “allow out-of-bounds initial primal” flag (or batch-PDLP mode) and keep validation for all other cases. As per coding guidelines, initialization of bounds and solver state must be validated before solve.
cpp/src/mip/diversity/diversity_manager.cu (1)
90-101: Incomplete configuration:config_idread from environment but never used.The code reads
CUOPT_CONFIG_IDfrom the environment variable and logs it, butconfig_idis never passed to any configuration logic. Theget_local_search_and_lm_from_configfunction inmulti_armed_bandit.cuhaccepts aconfig_idparameter to configure local search behavior, but inrun_local_search(line 113) it's called withls_mab_optioninstead. Either complete the implementation to apply the configuration, or remove this unused code block.cpp/src/dual_simplex/simplex_solver_settings.hpp (1)
42-96: Initialize mip_batch_pdlp_strong_branching.The new field is not initialized in the constructor, leaving it indeterminate. Default it to 0 to avoid unpredictable strong-branching selection.
🐛 Proposed fix
num_threads(omp_get_max_threads() - 1), num_bfs_workers(std::max(num_threads / 4, 1)), + mip_batch_pdlp_strong_branching(0), random_seed(0), inside_mip(0),Also applies to: 158-159
cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu (1)
64-95: Add CUDA error checking to kernel launch.The call site in
pdlp_restart_strategy.cuexplicitly asserts thatweight.size() == 1, so the weight vector is constrained to a single element by design—batch-mode indexing is not a concern. However, theadd_weight_sumskernel launch on line 92 is missing error checking. Per the coding guidelines, all CUDA kernel launches must have error checking withCUDA_CHECKor equivalent (RAFT_CUDA_TRY).Suggested fix
- add_weight_sums<<<1, 1, 0, stream_view_>>>(weight.data(), - weight.data(), - sum_primal_solution_weights_.data(), - sum_dual_solution_weights_.data()); + RAFT_CUDA_TRY(add_weight_sums<<<1, 1, 0, stream_view_>>>(weight.data(), + weight.data(), + sum_primal_solution_weights_.data(), + sum_dual_solution_weights_.data()));cpp/include/cuopt/linear_programming/mip/solver_settings.hpp (1)
86-101: Public struct layout change can break ABI for binary clients.Adding
mip_batch_pdlp_strong_branching(Line 86) andhyper_params(Line 101) changes the size/offsets ofmip_solver_settings_t. If ABI stability is expected, consider a versioned struct or PIMPL/accessors, or bump the ABI/SONAME and document the change. As per coding guidelines, public API changes should be handled explicitly for compatibility.cpp/src/linear_programming/saddle_point.cu (1)
95-106: Copy should cover all batches.The buffers are now batched, but
copy()only copiesprimal_size_/dual_size_. That leaves other batches stale.🔧 Proposed fix
- raft::copy( - this->primal_solution_.data(), other.get_primal_solution().data(), this->primal_size_, stream); - raft::copy( - this->dual_solution_.data(), other.get_dual_solution().data(), this->dual_size_, stream); + EXE_CUOPT_EXPECTS(this->primal_solution_.size() == other.get_primal_solution().size(), + "Primal solution sizes must match for batched copy"); + EXE_CUOPT_EXPECTS(this->dual_solution_.size() == other.get_dual_solution().size(), + "Dual solution sizes must match for batched copy"); + raft::copy(this->primal_solution_.data(), + other.get_primal_solution().data(), + this->primal_solution_.size(), + stream); + raft::copy(this->dual_solution_.data(), + other.get_dual_solution().data(), + this->dual_solution_.size(), + stream);cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp (1)
120-184: Public TODOs in the API surface.The new setters/getters are still marked “TODO batch mode: tmp.” Please finalize semantics or track via an issue before release. Happy to help.
Also applies to: 249-254
cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu (1)
439-543: Add error checking aftercub::DeviceTransform::Transformcalls using the post-call pattern.The
cub::DeviceTransform::Transformlaunches lack error checking. SinceTransformreturnsvoid(unlikeDeviceReduceoperations that returncudaError_t), error checking must be done viaRAFT_CUDA_TRY(cudaPeekAtLastError())immediately after each call, following the same pattern used for kernel launches in this file.🐛 Correct pattern
cub::DeviceTransform::Transform( cuda::std::make_tuple(op_problem_scaled_.objective_coefficients.data(), problem_wrap_container(cummulative_variable_scaling_)), op_problem_scaled_.objective_coefficients.data(), op_problem_scaled_.objective_coefficients.size(), cuda::std::multiplies<f_t>{}, stream_view_); + RAFT_CUDA_TRY(cudaPeekAtLastError());Per CUDA safety guidelines, all device operations must have error checking. This applies to all Transform calls at lines 439–543, 577–648, and 697–768.
cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu (1)
438-449: Add CUDA error checking after kernel launch.The
kernel_compute_kkt_scorekernel launch is missing error checking. Per coding guidelines, all CUDA kernel launches must be followed by error checking. The codebase consistently usesRAFT_CUDA_TRY(cudaPeekAtLastError())for this purpose—see other similar launches in the same file for the standard pattern.🔧 Proposed fix
kernel_compute_kkt_score<f_t><<<1, 1, 0, stream_view_>>>(l2_primal_residual.data(), l2_dual_residual.data(), gap.data(), primal_weight.data(), tmp_kkt_score_.data()); + RAFT_CUDA_TRY(cudaPeekAtLastError()); return tmp_kkt_score_.value(stream_view_);
🤖 Fix all issues with AI agents
In `@cpp/src/dual_simplex/pseudo_costs.cpp`:
- Around line 162-210: The batch PDLP path incorrectly uses
uncrush_primal_solution and original_root_soln_x while fractional[] is in
LP-space; remove the call to uncrush_primal_solution and build fraction_values
from the LP-space root_soln (use root_soln[j] for j = fractional[k]) so the
indices align with fractional, and pass those LP-space values into
batch_pdlp_solve (keep batch_pdlp_solve usage unchanged); update any uses of
original_root_soln_x later in this block to use root_soln instead.
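A sketch of the index-space fix, under the names used in the prompt (root_soln and fractional are both in LP space):

// Build fraction_values directly from the LP-space root solution; no uncrush,
// so the indices stay aligned with fractional[].
std::vector<f_t> fraction_values;
fraction_values.reserve(fractional.size());
for (size_t k = 0; k < fractional.size(); ++k) {
  fraction_values.push_back(root_soln[fractional[k]]);
}
// pass the LP-space values to batch_pdlp_solve as before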
In `@cpp/src/linear_programming/cusparse_view.cu`:
- Around line 122-129: The move-assignment for
cusparse_dn_mat_descr_wrapper_t<f_t>::operator= fails to transfer ownership
correctly: after assigning descr_ from other.descr_ the need_destruction_ flag
is not updated, risking leaks or double-destruction; update this operator to set
this->need_destruction_ = other.need_destruction_ (then set
other.need_destruction_ = false) so ownership of the descriptor is transferred
atomically (and optionally handle self-assignment if desired).
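A sketch of the ownership transfer being asked for; releasing any descriptor already held is an added assumption beyond the prompt:

cusparse_dn_mat_descr_wrapper_t& operator=(cusparse_dn_mat_descr_wrapper_t&& other) noexcept
{
  if (this != &other) {                                       // self-assignment guard
    if (need_destruction_) { cusparseDestroyDnMat(descr_); }  // release current descriptor
    descr_                  = other.descr_;
    need_destruction_       = other.need_destruction_;        // take over ownership
    other.need_destruction_ = false;                          // moved-from side must not destroy
  }
  return *this;
}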
- Around line 238-264: The my_cusparsespmm_preprocess function must mirror the
SpMV dynamic-loading pattern: use dynamic_load_runtime::function to load the
symbol "cusparseSpMM_preprocess" via dlsym, check the returned wrapper's
has_value() before calling it, and if the symbol is missing gracefully skip the
preprocess step (return CUSPARSE_STATUS_SUCCESS or another agreed error code)
instead of calling the symbol directly; retain the existing float_type
computation, CUSPARSE_CHECK(cusparseSetStream(...)), and call the loaded
function through the wrapper when available.
In
`@cpp/src/linear_programming/optimal_batch_size_handler/optimal_batch_size_handler.cu`:
- Around line 186-209: Add a precondition guard in optimal_batch_size_handler to
prevent non-positive max_batch_size from reaching the log2/pow2 computation: if
max_batch_size <= 1 return 1 (same behavior as the existing ==1 early return) or
clamp the value before computing current_batch_size by computing int cap =
std::max(1, std::min(initial_batch_size, max_batch_size)); then use cap in the
std::log2/std::pow expression to ensure current_batch_size is never 0 and avoid
divide-by-zero in evaluate_node.
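A sketch of the guard, using the names from the prompt (initial_batch_size is the handler's starting value):

#include <algorithm>
#include <cmath>

// Clamp before the log2/pow2 rounding so the result is always in [1, max_batch_size].
int clamped_pow2_batch_size(int initial_batch_size, int max_batch_size)
{
  if (max_batch_size <= 1) { return 1; }  // covers the non-positive case too
  const int cap = std::max(1, std::min(initial_batch_size, max_batch_size));
  return static_cast<int>(std::pow(2.0, std::floor(std::log2(static_cast<double>(cap)))));
}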
- Around line 17-140: The constructor SpMM_benchmarks_context_t creates local
variables buffer_transpose_batch, buffer_non_transpose_batch, x_descr and
y_descr that shadow class members, causing the captured CUDA graph to reference
destroyed objects (UAF). Fix by removing the local shadowing: initialize and use
the member descriptors and member rmm::device_buffer fields (the class members
named buffer_transpose_batch, buffer_non_transpose_batch, x_descr, y_descr)
instead of creating local ones, allocate the member buffers with buffer_size_*
before preprocessing and use member_buffer.data() and the member descriptors in
my_cusparsespmm_preprocess and cusparsespmm calls so the objects remain alive
for the graph capture and launch.
In `@cpp/src/linear_programming/pdhg.cu`:
- Around line 756-773: The new kernel launches (e.g., inside
refine_initial_primal_projection calling refine_initial_primal_projection_kernel
with stream_view_) currently omit CUDA error checks; add an explicit post-launch
check (cudaPeekAtLastError() or cudaGetLastError wrapped in your project’s
CUDA_CHECK/RAFT_CUDA_TRY macro) immediately after the <<<>>> launch for
non-graph paths, and for graph-captured execution route use the appropriate
graph-launch error check path, so that any kernel launch failure is
logged/propagated consistently; apply the same fix to the other refine_* launch
sites mentioned (the launches around the other refine_initial... blocks).
- Around line 97-109: The code copies new_bounds into device buffers without
validating size or indices, risking OOB in the refine kernels; in the
constructor validate that new_bounds.size() equals the expected batch size,
iterate new_bounds to ensure each tuple's index (std::get<0>) is within [0,
num_variables) and that lower <= upper (and finite) for std::get<1>/std::get<2>,
and fail-fast (throw or set error) before calling raft::copy into
new_bounds_idx_, new_bounds_lower_, new_bounds_upper_; perform these checks
using the same types (i_t, f_t) and the members
new_bounds_idx_/new_bounds_lower_/new_bounds_upper_ so any mismatch is caught
prior to launching kernels.
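A sketch of the fail-fast check, assuming new_bounds holds (index, lower, upper) tuples as the std::get<0..2> references imply:

#include <cmath>
#include <stdexcept>
#include <tuple>
#include <vector>

template <typename i_t, typename f_t>
void validate_new_bounds(const std::vector<std::tuple<i_t, f_t, f_t>>& new_bounds,
                         i_t num_variables)
{
  for (const auto& [idx, lower, upper] : new_bounds) {
    if (idx < 0 || idx >= num_variables)
      throw std::invalid_argument("new_bounds: variable index out of range");
    if (!std::isfinite(lower) || !std::isfinite(upper) || lower > upper)
      throw std::invalid_argument("new_bounds: invalid lower/upper pair");
  }
}
// Call before the raft::copy into new_bounds_idx_/new_bounds_lower_/new_bounds_upper_.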
In `@cpp/src/linear_programming/pdlp_constants.hpp`:
- Around line 37-39: The two variables deterministic_batch_pdlp and
enable_batch_resizing are declared as static non-const globals in a header,
which creates per-translation-unit copies; change their definition to a single
shared symbol: either (A) move the definitions into a single .cpp and expose
extern bool deterministic_batch_pdlp; extern bool enable_batch_resizing; in the
header, or (B) if they are truly compile-time constants, replace them with
inline constexpr bool deterministic_batch_pdlp = true; inline constexpr bool
enable_batch_resizing = true; — pick the approach that matches intended
mutability and update any code that sets/reads these flags accordingly (or pass
them via a settings struct/singleton used by the solver).
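Both options side by side, as a sketch against pdlp_constants.hpp:

// Option B: truly compile-time constants — one ODR-safe definition in the header.
inline constexpr bool deterministic_batch_pdlp = true;
inline constexpr bool enable_batch_resizing    = true;

// Option A: runtime-mutable flags — declare in the header ...
// extern bool deterministic_batch_pdlp;
// extern bool enable_batch_resizing;
// ... and define exactly once in a single .cpp:
// bool deterministic_batch_pdlp = true;
// bool enable_batch_resizing    = true;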
- Around line 17-22: kernel_config_from_batch_size currently allows
block_size==0 when batch_size==0 (causing cuda::ceil_div(0,0)) and shadows the
namespace-scope block_size constant; fix by ensuring the local block size is at
least 1 (e.g., size_t local_block_size = std::min(static_cast<size_t>(256),
std::max(static_cast<size_t>(1), batch_size))) so ceil_div never divides by
zero, compute grid_size = cuda::ceil_div(batch_size, local_block_size) and
return {grid_size, local_block_size}, and rename the local variable (e.g.,
local_block_size) to avoid shadowing the namespace-scope block_size constant
used elsewhere.
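A sketch of the corrected helper; the {grid, block} pair layout and the 256 cap follow the original:

#include <cuda/cmath>  // cuda::ceil_div
#include <algorithm>
#include <cstddef>
#include <utility>

inline std::pair<std::size_t, std::size_t> kernel_config_from_batch_size(std::size_t batch_size)
{
  // Renamed to avoid shadowing the namespace-scope block_size constant.
  const std::size_t local_block_size =
    std::min<std::size_t>(256, std::max<std::size_t>(1, batch_size));  // never 0
  const std::size_t grid_size = cuda::ceil_div(batch_size, local_block_size);
  return {grid_size, local_block_size};
}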
In `@cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu`:
- Around line 782-784: The copy is reversed: it currently writes stale
last_trial_fixed_point_error_ into fixed_point_error_ so the history never
advances; change the copy to update last_trial_fixed_point_error_ from the
current fixed_point_error_ by copying from fixed_point_error_.begin()/end() into
last_trial_fixed_point_error_.begin(), ensuring the history array
(last_trial_fixed_point_error_) receives the newest values.
In `@cpp/src/linear_programming/solve.cu`:
- Around line 708-716: The code constructs an empty std::vector named info of
type optimization_problem_solution_t<i_t,
f_t>::additional_termination_information_t and then accesses info[0], causing
UB; fix by initializing or appending an element before writing fields—e.g.,
resize info to size 1 (info.resize(1)) or construct a local
additional_termination_information_t, fill its members (primal_objective,
number_of_steps_taken, solve_time using vertex_solution and start_solver), and
push_back/emplace_back it into info so info[0] is valid when accessed.
- Around line 750-783: The warm-start batch path declares initial_primal and
initial_dual as rmm::device_uvector<double>, which mismatches templated f_t
instantiations; change both to rmm::device_uvector<f_t> (or the appropriate
typedef alias used for the solver element type) and construct them from
original_solution.get_primal_solution()/get_dual_solution() so the
device_uvector element type matches f_t; keep initial_step_size and
initial_primal_weight as f_t and preserve the use of the original solution
streams when constructing the device_uvectors.
- Around line 556-563: The row-sense to bound mapping is inverted: in the loop
over i (using user_problem.row_sense) assign constraint_lower[i] =
user_problem.rhs[i] and constraint_upper[i] =
std::numeric_limits<f_t>::infinity() when row_sense == 'G' (>=), and assign
constraint_lower[i] = -std::numeric_limits<f_t>::infinity() and
constraint_upper[i] = user_problem.rhs[i] when row_sense == 'L' (<=); update the
block around the for-loop that sets constraint_lower/constraint_upper so 'G'
maps to a lower bound and 'L' maps to an upper bound (use the existing symbols
user_problem, constraint_lower, constraint_upper, f_t, and row_sense).
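A fragment of the corrected mapping under the prompt's names (n_rows and the 'E' branch are assumptions):

#include <limits>

for (i_t i = 0; i < n_rows; ++i) {
  if (user_problem.row_sense[i] == 'G') {        // a >= row: rhs is a lower bound
    constraint_lower[i] = user_problem.rhs[i];
    constraint_upper[i] = std::numeric_limits<f_t>::infinity();
  } else if (user_problem.row_sense[i] == 'L') { // a <= row: rhs is an upper bound
    constraint_lower[i] = -std::numeric_limits<f_t>::infinity();
    constraint_upper[i] = user_problem.rhs[i];
  } else {                                       // 'E': equality row
    constraint_lower[i] = constraint_upper[i] = user_problem.rhs[i];
  }
}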
In
`@cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu`:
- Around line 537-545: The kernel launch in
adaptive_step_size_strategy_t::get_primal_and_dual_stepsizes can OOB if
primal_step_size or dual_step_size are smaller than climber_strategies_.size();
add explicit size validation before launching compute_actual_stepsizes: compare
primal_step_size.size() and dual_step_size.size() against
climber_strategies_.size() and either resize the rmm::device_uvector buffers or
return/throw a clear error (with context) if they are too small, so the kernel
only writes within bounds using the existing stream_view_ and
kernel_config_from_batch_size logic.
- Around line 71-107: The three cub::DeviceSegmentedReduce::Sum calls in the
batch-mode block (the ones using tuple_multiplies<f_t>, power_two_func_t<f_t> on
norm_squared_delta_primal_ and norm_squared_delta_dual_) must be wrapped with
RAFT_CUDA_TRY so their returned cudaError_t is checked (this includes the
nullptr sizing calls); replace direct calls with
RAFT_CUDA_TRY(cub::DeviceSegmentedReduce::Sum(...)) for each invocation and
likewise wrap the other three CUB Sum calls elsewhere (the other location noted
in the review) so failures propagate; keep existing local variables
(dot_product_bytes, dot_product_storage.resize, stream_view_, primal_size_,
dual_size_, climber_strategies_.size()) unchanged.
In `@cpp/src/linear_programming/swap_and_resize_helper.cuh`:
- Around line 49-66: In matrix_swap, avoid dividing matrix.size() by vector_size
before validating vector_size and ensure the matrix length is an exact multiple
of vector_size; first assert or return if vector_size == 0 (check vector_size >
0 before computing batch_size), compute batch_size only after that check (use
matrix.size() / vector_size into batch_size), and add a guard that matrix.size()
% vector_size == 0 (or log/throw) to prevent mismatched sizing; update the
function matrix_swap and any use of batch_size to rely on these validated
values.
- Around line 76-83: The CUB transform call using
cub::DeviceTransform::Transform (with in_zip, out_zip, total_items, lambda, and
matrix.stream()) is missing CUDA error checking; wrap the call with
RAFT_CUDA_TRY so the returned cudaError_t is checked and propagated (e.g.,
RAFT_CUDA_TRY(cub::DeviceTransform::Transform(...))); ensure you reference the
same parameters (in_zip, out_zip, total_items, matrix.stream()) and keep the
lambda as-is while adding the RAFT_CUDA_TRY wrapper to satisfy the coding
guideline requiring checks on all CUDA operations.
🟡 Minor comments (16)
cpp/src/mip/diversity/recombiners/sub_mip.cuh-72-74 (1)
72-74: Resolve or track the TODO before merge.The TODO signals uncertainty in this recombination path. Please confirm the intended behavior and either remove the TODO with a clarified comment or open a tracked issue.
cpp/src/mip/solve.cu-61-64 (1)
61-64: TODO comment should be resolved before merge.The TODO comment asks about how hyper-parameters were previously passed down. This should be resolved or removed before merging. The current implementation correctly copies
settings.hyper_paramsand applies MIP-specific overrides.Suggested action
Either:
- Remove the TODO after confirming the implementation is correct with Akif and Alice
- Add a brief comment explaining the MIP-specific overrides:
- // TODO ask Akif and Alice how was this passed down? - auto hyper_params = settings.hyper_params; + // Copy hyper_params from settings and apply MIP-specific overrides + auto hyper_params = settings.hyper_params; hyper_params.update_primal_weight_on_initial_solution = false; hyper_params.update_step_size_on_initial_solution = true;cpp/include/cuopt/linear_programming/utilities/segmented_sum_handler.cuh-28-31 (1)
28-31: Default constructor leavesstream_view_in an invalid state.The default constructor on line 31 leaves
stream_view_uninitialized. If any method is called on an object constructed with the default constructor, it will use an invalid stream, leading to undefined behavior. Consider either:
- Removing the default constructor if "non-batch mode" should not use this class
- Initializing
stream_view_to a safe default (e.g.,rmm::cuda_stream_view{}for the default stream)Suggested fix
// Empty constructor for when used in non batch mode - segmented_sum_handler_t() {} + segmented_sum_handler_t() : stream_view_(rmm::cuda_stream_default) {}cpp/include/cuopt/linear_programming/constants.h-55-55 (1)
55-55: Document the new public parameter string.Line 55 adds a new public parameter (
CUOPT_MIP_BATCH_PDLP_STRONG_BRANCHING); please update user-facing docs/parameter tables and any CLI/config references accordingly. As per coding guidelines, ensure public API changes are documented.benchmarks/linear_programming/cuopt/run_pdlp.cu-49-52 (1)
49-52: CLI help text is out of sync with the allowed modes.Line 50 still lists only Stable3/Methodical1/Fast1 even though Stable1/Stable2 are now accepted in Line 52.
✅ Suggested update
- program.add_argument("--pdlp-solver-mode") - .help("Solver mode for PDLP. Possible values: Stable3 (default), Methodical1, Fast1") + program.add_argument("--pdlp-solver-mode") + .help("Solver mode for PDLP. Possible values: Stable3 (default), Stable2, Stable1, Methodical1, Fast1")cpp/src/linear_programming/saddle_point.hpp-64-67 (1)
64-67: Document the new batch_size parameter and context APIs.The constructor doc omits
batch_size, and the newswap_context/resize_contextmethods lack brief Doxygen notes. A small doc update would prevent ambiguity.📝 Documentation touch-up
* `@param` primal_size The size of the primal problem * `@param` dual_size The size of the dual problem + * `@param` batch_size Number of batch contexts to allocate * * `@throws` cuopt::logic_error if the problem sizes are not larger than 0. */ saddle_point_state_t(raft::handle_t const* handle_ptr, i_t primal_size, i_t dual_size, size_t batch_size); + /** + * `@brief` Swap internal buffers between batch contexts. + * `@param` swap_pairs Pairs of context indices to swap. + */ void swap_context(const thrust::universal_host_pinned_vector<swap_pair_t<i_t>>& swap_pairs); + /** + * `@brief` Resize internal buffers to a new batch size. + * `@param` new_size New batch size. + */ void resize_context(i_t new_size);Also applies to: 94-95
cpp/include/cuopt/linear_programming/solve.hpp-77-104 (1)
77-104: Tighten public docs (typos + output clarity).The new Doxygen block contains typos (“accross”, “thie”), and the output ordering for the returned vector isn’t described. Also consider a brief note on thread-safety / stream expectations for this public API.
✏️ Doc typo fixes
- * The only difference accross climbers will be one variable bound change. + * The only difference across climbers is a single variable bound change. @@ - * Let the optimal objective value of thie problem be obj_down + * Let the optimal objective value of this problem be obj_downAs per coding guidelines, public headers should document API behavior, including thread-safety and parameter/return expectations.
Also applies to: 112-117
cpp/src/linear_programming/solver_settings.cu-61-71 (1)
61-71: Validate optional initial step/primal weight inputs.Without guarding, negative or zero values can silently flow into the solver. Consider rejecting non-positive values on write.
✅ Add basic input validation
void pdlp_solver_settings_t<i_t, f_t>::set_initial_step_size(f_t initial_step_size) { + cuopt_expects(initial_step_size > 0, + error_type_t::ValidationError, + "initial_step_size must be > 0"); initial_step_size_ = std::make_optional(initial_step_size); } void pdlp_solver_settings_t<i_t, f_t>::set_initial_primal_weight(f_t initial_primal_weight) { + cuopt_expects(initial_primal_weight > 0, + error_type_t::ValidationError, + "initial_primal_weight must be > 0"); initial_primal_weight_ = std::make_optional(initial_primal_weight); }cpp/src/math_optimization/solver_settings.cu-74-75 (1)
74-75: Infeasibility tolerance defaults may be too strict for float builds.Line 74–75 set defaults to
1e-10, which is below float precision and can make infeasibility detection overly strict whenf_t=float. Consider making the default precision-aware (or documenting that these defaults assume double).🛠️ Precision-aware default suggestion
+ constexpr f_t infeas_tol_default = + std::is_same_v<f_t, float> ? static_cast<f_t>(1e-6) : static_cast<f_t>(1e-10); float_parameters = { ... - {CUOPT_PRIMAL_INFEASIBLE_TOLERANCE, &pdlp_settings.tolerances.primal_infeasible_tolerance, 0.0, 1e-1, 1e-10}, - {CUOPT_DUAL_INFEASIBLE_TOLERANCE, &pdlp_settings.tolerances.dual_infeasible_tolerance, 0.0, 1e-1, 1e-10} + {CUOPT_PRIMAL_INFEASIBLE_TOLERANCE, &pdlp_settings.tolerances.primal_infeasible_tolerance, 0.0, 1e-1, infeas_tol_default}, + {CUOPT_DUAL_INFEASIBLE_TOLERANCE, &pdlp_settings.tolerances.dual_infeasible_tolerance, 0.0, 1e-1, infeas_tol_default} };Based on learnings, guard against overly strict tolerances on edge cases.
cpp/src/linear_programming/saddle_point.cu-77-92 (1)
77-92: Allow no‑op resize or fix the assertion message.The assertion uses
<but the message says “less than or equal.” If no‑op resize is allowed, switch to<=; otherwise update the message.🔧 Proposed fix (if no‑op resize is OK)
- cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size"); + cuopt_assert(new_size <= batch_size, "New size must be less than or equal to batch size");cpp/src/linear_programming/swap_and_resize_helper.cuh-86-95 (1)
86-95: Validate non‑negative swap indices.Negative indices can pass the
< size()checks and still index OOB.🔧 Proposed fix
- cuopt_assert(left_swap_index < host_vector.size(), "Left swap index is out of bounds"); - cuopt_assert(right_swap_index < host_vector.size(), "Right swap index is out of bounds"); + cuopt_assert(left_swap_index >= 0 && right_swap_index >= 0, "Swap indices must be non-negative"); + cuopt_assert(left_swap_index < host_vector.size(), "Left swap index is out of bounds"); + cuopt_assert(right_swap_index < host_vector.size(), "Right swap index is out of bounds");cpp/src/linear_programming/saddle_point.cu-20-34 (1)
20-34: Validatebatch_sizein the constructor.Downstream code assumes a positive batch size; fail fast if it’s zero.
🔧 Proposed fix
EXE_CUOPT_EXPECTS(primal_size > 0, "Size of the primal problem must be larger than 0"); EXE_CUOPT_EXPECTS(dual_size > 0, "Size of the dual problem must be larger than 0"); + EXE_CUOPT_EXPECTS(batch_size > 0, "Batch size must be larger than 0");cpp/src/linear_programming/restart_strategy/localized_duality_gap_container.cu-110-115 (1)
110-115: Assertion rejects resizing to current size.The message says “less than or equal,” but the check uses
<. Allow equality to avoid spurious asserts when the size is unchanged.🐛 Suggested fix
- cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size"); + cuopt_assert(new_size <= batch_size, "New size must be less than or equal to batch size");cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu-154-161 (1)
154-161: Allow resize to the same size (<=)The assert currently rejects
new_size == batch_size, even though the message says “less than or equal.” Allowing equality avoids unnecessary failures on no‑op resizes.🐛 Fix inequality
- cuopt_assert(new_size < batch_size, "New size must be less than or equal to batch size"); + cuopt_assert(new_size <= batch_size, "New size must be less than or equal to batch size");cpp/src/linear_programming/termination_strategy/infeasibility_information.hpp-30-42 (1)
30-42: Ensure referenced scaled objects outlive this instanceThe class now stores references to
op_problem_scaled_,scaled_cusparse_view_,scaling_strategy_,climber_strategies_, andhyper_params_. Please document the lifetime expectations (or take ownership) to avoid dangling references.Also applies to: 95-137
cpp/src/linear_programming/utils.cuh-236-264 (1)
236-264: Handle zero‑constraint batches before moduloIn batch mode,
combined_bounds.size() % op_problem.n_constraintsis undefined whenn_constraints == 0. Please short‑circuit to avoid modulo‑by‑zero on empty problems.🐛 Guard for empty constraint sets
} else { + if (op_problem.n_constraints == 0) { + combined_bounds.resize(0, op_problem.handle_ptr->get_stream()); + return; + } // In batch mode we use combined_constraint_bounds in convergeance_information to fill the // primal residual which will be bigger
🧹 Nitpick comments (16)
cpp/src/utilities/copy_helpers.hpp (1)
325-329: Consider adding a const overload for consistency.The existing
make_spanforrmm::device_uvectorprovides both mutable and const overloads (lines 319-335). This new overload only provides the mutable version, which may limit its use in const contexts.♻️ Suggested const overload
template <typename T> raft::device_span<const T> make_span(thrust::universal_host_pinned_vector<T> const& container) { return raft::device_span<const T>(thrust::raw_pointer_cast(container.data()), container.size()); }cpp/include/cuopt/linear_programming/pdlp/pdlp_warm_start_data.hpp (1)
70-71: Minor style inconsistency between copy constructor and move assignment operator.The copy constructor on line 70 uses the fully qualified template type
pdlp_warm_start_data_t<i_t, f_t>, while the move assignment operator on line 71 uses the injected class namepdlp_warm_start_data_t. Both are valid C++, but for consistency, consider using the same style for both declarations.Suggested consistency fix
Either simplify the copy constructor to match:
- pdlp_warm_start_data_t(const pdlp_warm_start_data_t<i_t, f_t>& other); + pdlp_warm_start_data_t(const pdlp_warm_start_data_t& other);Or revert the move assignment to match the existing style (though the current change is fine).
cpp/include/cuopt/linear_programming/utilities/segmented_sum_handler.cuh (2)
39-51: Inconsistent stream parameter passing to CUB APIs.In
segmented_sum_helper,stream_view_is passed directly (relying on implicit conversion), while insegmented_reduce_helper,stream_view_.value()is used explicitly. For consistency and clarity, use.value()in both methods since CUB APIs expectcudaStream_t.Suggested fix for consistency
cub::DeviceSegmentedReduce::Sum( - nullptr, byte_needed_, input, output, batch_size, problem_size, stream_view_); + nullptr, byte_needed_, input, output, batch_size, problem_size, stream_view_.value()); - segmented_sum_storage_.resize(byte_needed_, stream_view_); + segmented_sum_storage_.resize(byte_needed_, stream_view_.value()); cub::DeviceSegmentedReduce::Sum(segmented_sum_storage_.data(), byte_needed_, input, output, batch_size, problem_size, - stream_view_); + stream_view_.value());
84-86: Consider making member variables private.The member variables
byte_needed_,segmented_sum_storage_, andstream_view_are public but appear to be implementation details. Additionally,byte_needed_is uninitialized which could cause issues ifresizeis called with an uninitialized size.Suggested encapsulation
+private: size_t byte_needed_; rmm::device_buffer segmented_sum_storage_; rmm::cuda_stream_view stream_view_;Or initialize
byte_needed_to 0:- size_t byte_needed_; + size_t byte_needed_{0};cpp/src/linear_programming/pdlp_climber_strategy.hpp (1)
1-21: Consider namespacing the new strategy struct to avoid global pollution.Since this type is part of the cuOpt LP stack, wrapping it in
cuopt::linear_programmingkeeps the API surface consistent and avoids global namespace collisions.♻️ Suggested refactor
-struct pdlp_climber_strategy_t { - int original_index; -}; +namespace cuopt::linear_programming { +struct pdlp_climber_strategy_t { + int original_index; +}; +} // namespace cuopt::linear_programmingcpp/include/cuopt/linear_programming/mip/solver_settings.hpp (1)
100-101: Resolve TODO before merge or convert it to a tracked issue.Line 100-101 indicates unresolved design work; please replace with an issue reference or remove once resolved.
If you want, I can draft the issue text or propose the resolution path.
cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp (1)
199-200: Document tighter infeasibility tolerances.Defaults changed to 1e‑10; update API docs/release notes so users know about the new numerical expectations. As per coding guidelines.
cpp/tests/linear_programming/pdlp_test.cu (1)
1011-1762: Prefer tolerant comparisons for floating batch outputs.Many assertions use
EXPECT_EQon floating objectives/residuals/solution values across batch entries. GPU reductions can be non‑bitwise deterministic across runs, so these can flake. ConsiderEXPECT_NEARwithtolerance(or guard with a deterministic-mode flag) for these comparisons.♻️ Example adjustment (representative)
- EXPECT_EQ(ref_primal, solution.get_additional_termination_information(i).primal_objective); + EXPECT_NEAR(ref_primal, + solution.get_additional_termination_information(i).primal_objective, + tolerance);cpp/src/linear_programming/utils.cuh (2)
190-229: Use size_t/i_t for wrapped iterator indices to avoid overflow
inttruncatesin.size()for large batches; usesize_t(or template oni_t) and build the counting iterator with the same type to avoid overflow.♻️ Safer index types
- batch_wrapped_iterator(const f_t* problem_input, int problem_size) + batch_wrapped_iterator(const f_t* problem_input, size_t problem_size) : problem_input_(problem_input), problem_size_(problem_size) { } - HDI f_t operator()(int id) { return problem_input_[id / problem_size_]; } + HDI f_t operator()(size_t id) { return problem_input_[id / problem_size_]; } const f_t* problem_input_; - int problem_size_; + size_t problem_size_;- problem_wrapped_iterator(const f_t* problem_input, int problem_size) + problem_wrapped_iterator(const f_t* problem_input, size_t problem_size) : problem_input_(problem_input), problem_size_(problem_size) { } - HDI f_t operator()(int id) { return problem_input_[id % problem_size_]; } + HDI f_t operator()(size_t id) { return problem_input_[id % problem_size_]; } const f_t* problem_input_; - int problem_size_; + size_t problem_size_;- return thrust::make_transform_iterator(thrust::make_counting_iterator(0), + return thrust::make_transform_iterator(thrust::make_counting_iterator<size_t>(0), problem_wrapped_iterator<f_t>(in.data(), in.size()));
543-565: Enforce size‑1 output for cublas nrm2 overload
cublasNrm2produces a single scalar. The device_uvector overload should guardresult.size() == 1(or use device_scalar) to prevent silent misuse.✅ Size guard
void inline my_l2_norm(const rmm::device_uvector<f_t>& input_vector, rmm::device_uvector<f_t>& result, raft::handle_t const* handle_ptr) { + cuopt_assert(result.size() == 1, "my_l2_norm expects a single-value output buffer"); my_l2_norm<i_t, f_t>(input_vector.data(), result.data(), input_vector.size(), handle_ptr); }cpp/include/cuopt/linear_programming/pdlp/pdlp_hyper_params.cuh (2)
12-56: Add Doxygen docs for new public hyper‑params structThis is a new public type under
cpp/include; please add a brief Doxygen comment for the struct (and ideally clarify units/meaning of key fields) for API users. As per coding guidelines, public headers should document new public entities.
58-59: Track the TODO with an issue so it doesn’t get lostIf you want, I can help open an issue to track the planned removal of
pdlp_solver_mode.cpp/src/linear_programming/termination_strategy/convergence_information.hpp (1)
146-147: Consider makingbatch_mode_non-const for potential future flexibility.The
batch_mode_flag is initialized as aconst boolwith a default value offalse. While this works for the current design, if there's ever a need to toggle batch mode after construction (e.g., for problem recycling), this would require refactoring.If batch mode is truly immutable for the lifetime of the object, the current design is fine. However, consider whether the initialization value should come from a constructor parameter rather than being hardcoded to
false.cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cuh (3)
303-305: Publicbatch_mode_flag is exposed but may warrant encapsulation.The
batch_mode_member is declared in apublic:section. While this provides convenient access, consider whether a getter method would be more appropriate for encapsulation, especially since the member isconstand cannot be modified externally anyway.
366-368: Storage for batched dot product operations added.The
dot_product_storagebuffer anddot_product_bytesfollow the pattern used inconvergence_information.hppfor temporary workspace needed by batch operations.Minor style note: consider using trailing underscore (
dot_product_storage_,dot_product_bytes_) for consistency with other private members in the class.♻️ Suggested naming consistency fix
- rmm::device_buffer dot_product_storage; - size_t dot_product_bytes{0}; + rmm::device_buffer dot_product_storage_; + size_t dot_product_bytes_{0};
392-412: Template parameters should either be removed or documented for consistency

The functions `is_trust_region_restart`, `is_kkt_restart`, and `is_cupdlpx_restart` maintain clean, type-safe implementations with `const` reference parameters. However, the template parameters `i_t` and `f_t` are not actually used in the function bodies; they only access `hyper_params.restart_strategy` and cast to `int`.

While the current design maintains consistency with template propagation patterns across the codebase (all callers pass explicit template arguments), consider whether these helpers could be simplified to non-template functions or, alternatively, add a comment explaining why the template parameters are required for consistency with the calling context.
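A minimal sketch of the non-template alternative; `pdlp_hyper_params_t` matches the struct name implied by the header above, but the enum type and value names are placeholders, not the repo's actual identifiers:

```cpp
// Hypothetical simplification: nothing here depends on i_t or f_t, so the
// helper can take the hyper-parameter struct directly.
// `restart_strategy_t` and `KKTRestart` are illustrative names.
inline bool is_kkt_restart(const pdlp_hyper_params_t& hyper_params)
{
  return static_cast<int>(hyper_params.restart_strategy) ==
         static_cast<int>(restart_strategy_t::KKTRestart);
}
```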
```cpp
if (settings.mip_batch_pdlp_strong_branching) {
  settings.log.printf("Batch PDLP strong branching enabled\n");

  std::chrono::steady_clock::time_point start_batch = std::chrono::steady_clock::now();

  // Use original_problem to create the BatchLP problem
  csr_matrix_t<i_t, f_t> A_row(original_problem.A.m, original_problem.A.n, 0);
  original_problem.A.to_compressed_row(A_row);

  // Convert the root_soln to the original problem space
  std::vector<f_t> original_root_soln_x;
  uncrush_primal_solution(original_problem, original_lp, root_soln, original_root_soln_x);

  std::vector<f_t> fraction_values;

  for (i_t k = 0; k < fractional.size(); k++) {
    const i_t j = fractional[k];
    fraction_values.push_back(original_root_soln_x[j]);
  }

  std::vector<f_t> primal_solutions =
    batch_pdlp_solve(original_problem, fractional, fraction_values);
  std::chrono::steady_clock::time_point end_batch = std::chrono::steady_clock::now();
  std::chrono::duration<f_t> duration = end_batch - start_batch;
  settings.log.printf("Batch PDLP strong branching took %.2f seconds\n", duration.count());

  for (i_t k = 0; k < fractional.size(); k++) {
    const i_t j = fractional[k];

    // Call BatchLP solver. Solve 2*fractional.size() subproblems.
    // Let j = fractional[k]. We want to solve the two trial branching problems.
    // Branch down:
    //   minimize   c^T x
    //   subject to lb <= A*x <= ub
    //              x_j <= floor(root_soln[j])
    //              l <= x <= u
    // Let the optimal objective value of this problem be obj_down.
    f_t obj_down = primal_solutions[k];

    // Branch up:
    //   minimize   c^T x
    //   subject to lb <= A*x <= ub
    //              x_j >= ceil(root_soln[j])
    // Let the optimal objective value of this problem be obj_up.
    f_t obj_up = primal_solutions[k + fractional.size()];

    pc.strong_branch_down[k] = obj_down - root_obj;
    pc.strong_branch_up[k] = obj_up - root_obj;
  }
```
🧩 Analysis chain
🏁 Scripts executed:

```shell
# Locate and examine batch_pdlp_solve implementation
fd -t f "\.cpp|\.hpp|\.h" | xargs rg -n "batch_pdlp_solve" -A 30 | head -100

# Find the definition of batch_pdlp_solve function
rg -n "^[a-zA-Z_<>\*\s:]+batch_pdlp_solve" --type cpp --type hpp --type h -A 20

# Examine uncrush_primal_solution to understand the contract
rg -n "uncrush_primal_solution" --type cpp --type hpp --type h -B 2 -A 15

# Check where fractional indices are populated - understand their source space
rg -n "fractional\[" cpp/src/dual_simplex/pseudo_costs.cpp -B 5 -A 2

# Find root_obj definition and its context
rg -n "root_obj" cpp/src/dual_simplex/pseudo_costs.cpp -B 3 -A 1

# Search for batch_pdlp_solve definition more broadly
rg -n "batch_pdlp_solve" --type c --type cpp -B 2 -A 20 | head -150

# Look for where fractional vector is populated/created to understand its context
rg -n "fractional\s*=" cpp/src/dual_simplex/ --type cpp -B 3 -A 1 | head -80

# Find uncrush_primal_solution definition
rg -n "uncrush_primal_solution" --type cpp --type c -B 2 -A 15 | head -100

# Check function signature of the function containing batch PDLP code
sed -n '140,180p' cpp/src/dual_simplex/pseudo_costs.cpp

# Look at the beginning of the file to understand includes and context
head -50 cpp/src/dual_simplex/pseudo_costs.cpp

# Find fractional_variables function definition
rg -n "fractional_variables" --type cpp -B 2 -A 20 | head -150

# Check the problem transformation context - what is original_lp vs original_problem
rg -n "struct user_problem_t|struct lp_problem_t" --type cpp --type h -A 8 | head -100

# Trace where root_soln comes from - is it in LP space or original space?
rg -n "root_soln\|root_relax_soln_" cpp/src/dual_simplex/branch_and_bound.cpp -B 2 -A 2 | head -100
```
Fix index space mismatch in batch PDLP strong branching.

The `fractional` indices are derived from the LP solution space (via `fractional_variables` operating on `root_relax_soln_.x`), but the batch PDLP path converts `root_soln` to the original problem space and then incorrectly indexes it with LP-space indices (lines 178–179). This fails when the LP has more variables than the original problem (typical with presolve slack variables), causing wrong variable values to be passed to `batch_pdlp_solve`.

The sequential branching path at line 42 correctly uses `fractional[k]` directly with `original_lp`, confirming both are in LP space. The batch path should either:

- Keep `root_soln` in LP space and not call `uncrush_primal_solution`, or
- Map the `fractional` indices to the original problem space when accessing `original_root_soln_x`.
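A minimal sketch of the first option (variable names follow the quoted snippet; whether `batch_pdlp_solve` accepts LP-space values alongside `original_problem` should be verified before adopting this):

```cpp
// Sketch: build fraction_values directly from the LP-space root solution so the
// indices in `fractional` and the values they select live in the same space.
std::vector<f_t> fraction_values;
fraction_values.reserve(fractional.size());
for (i_t k = 0; k < static_cast<i_t>(fractional.size()); k++) {
  fraction_values.push_back(root_soln[fractional[k]]);  // no uncrush step
}
std::vector<f_t> primal_solutions =
  batch_pdlp_solve(original_problem, fractional, fraction_values);
```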
🤖 Prompt for AI Agents
In `@cpp/src/dual_simplex/pseudo_costs.cpp` around lines 162 - 210, The batch PDLP
path incorrectly uses uncrush_primal_solution and original_root_soln_x while
fractional[] is in LP-space; remove the call to uncrush_primal_solution and
build fraction_values from the LP-space root_soln (use root_soln[j] for j =
fractional[k]) so the indices align with fractional, and pass those LP-space
values into batch_pdlp_solve (keep batch_pdlp_solve usage unchanged); update any
uses of original_root_soln_x later in this block to use root_soln instead.
```cpp
template <typename f_t>
cusparse_dn_mat_descr_wrapper_t<f_t>& cusparse_dn_mat_descr_wrapper_t<f_t>::operator=(
  cusparse_dn_mat_descr_wrapper_t<f_t>&& other)
{
  if (need_destruction_) { RAFT_CUSPARSE_TRY(cusparseDestroyDnMat(descr_)); }
  descr_ = other.descr_;
  other.need_destruction_ = false;
  return *this;
```
Fix ownership flag in `cusparse_dn_mat_descr_wrapper_t` move-assignment.

`need_destruction_` is not updated from the moved-from object, so the descriptor can leak (or be left non-owning) after move-assigning into a default-constructed wrapper.

🔧 Suggested fix

```diff
 cusparse_dn_mat_descr_wrapper_t<f_t>& cusparse_dn_mat_descr_wrapper_t<f_t>::operator=(
   cusparse_dn_mat_descr_wrapper_t<f_t>&& other)
 {
   if (need_destruction_) { RAFT_CUSPARSE_TRY(cusparseDestroyDnMat(descr_)); }
-  descr_ = other.descr_;
-  other.need_destruction_ = false;
+  descr_            = other.descr_;
+  need_destruction_ = other.need_destruction_;
+  other.need_destruction_ = false;
   return *this;
 }
```
}As per coding guidelines, ensure resource lifecycles are correctly transferred and released.
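For completeness, a self-contained sketch that also tolerates self-move; the self-assignment guard is an addition beyond the diff above (the review only mentions it as optional), while the member names and `cusparseDestroyDnMat` come from the quoted code:

```cpp
// Sketch: transfer ownership atomically and handle self-assignment.
cusparse_dn_mat_descr_wrapper_t<f_t>& operator=(cusparse_dn_mat_descr_wrapper_t<f_t>&& other)
{
  if (this != &other) {
    if (need_destruction_) { RAFT_CUSPARSE_TRY(cusparseDestroyDnMat(descr_)); }
    descr_                  = other.descr_;
    need_destruction_       = other.need_destruction_;
    other.need_destruction_ = false;  // moved-from wrapper no longer owns the descriptor
  }
  return *this;
}
```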
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cusparse_view.cu` around lines 122 - 129, The
move-assignment for cusparse_dn_mat_descr_wrapper_t<f_t>::operator= fails to
transfer ownership correctly: after assigning descr_ from other.descr_ the
need_destruction_ flag is not updated, risking leaks or double-destruction;
update this operator to set this->need_destruction_ = other.need_destruction_
(then set other.need_destruction_ = false) so ownership of the descriptor is
transferred atomically (and optionally handle self-assignment if desired).
```cpp
// TODO add proper checking
#if CUDA_VER_12_4_UP
template <typename T,
          typename std::enable_if_t<std::is_same_v<T, float> || std::is_same_v<T, double>>*>
cusparseStatus_t my_cusparsespmm_preprocess(cusparseHandle_t handle,
                                            cusparseOperation_t opA,
                                            cusparseOperation_t opB,
                                            const T* alpha,
                                            const cusparseSpMatDescr_t matA,
                                            const cusparseDnMatDescr_t matB,
                                            const T* beta,
                                            const cusparseDnMatDescr_t matC,
                                            cusparseSpMMAlg_t alg,
                                            void* externalBuffer,
                                            cudaStream_t stream)
{
  auto constexpr float_type = []() constexpr {
    if constexpr (std::is_same_v<T, float>) {
      return CUDA_R_32F;
    } else if constexpr (std::is_same_v<T, double>) {
      return CUDA_R_64F;
    }
  }();
  CUSPARSE_CHECK(cusparseSetStream(handle, stream));
  return cusparseSpMM_preprocess(
    handle, opA, opB, alpha, matA, matB, beta, matC, float_type, alg, externalBuffer);
}
```
🧩 Analysis chain
🏁 Scripts executed:

```shell
# First, get the file size and explore the structure
wc -l cpp/src/linear_programming/cusparse_view.cu

# Look at the area around lines 238-264 to see the context
sed -n '230,270p' cpp/src/linear_programming/cusparse_view.cu | cat -n

# Search for cusparseSpMV_preprocess to find the SpMV pattern mentioned
rg "cusparseSpMV_preprocess" cpp/src/linear_programming/cusparse_view.cu -A 5 -B 5

# Search for dlsym usage in the file
rg "dlsym" cpp/src/linear_programming/cusparse_view.cu -B 5 -A 5

# Find the full SpMV preprocess function implementation
rg -A 30 "template.*cusparseSpMV_preprocess" cpp/src/linear_programming/cusparse_view.cu

# Get the full context around lines 238-264 including broader context
sed -n '200,280p' cpp/src/linear_programming/cusparse_view.cu | cat -n

# Search for any fallback handling when dlsym fails
rg "has_value|cusparseSpMM_preprocess" cpp/src/linear_programming/cusparse_view.cu -B 3 -A 3
```
Mirror the SpMV dynamic loading pattern for `cusparseSpMM_preprocess`.

The SpMV path correctly uses `dynamic_load_runtime::function` with `dlsym` to handle runtime CUDA version mismatches (the symbol may not exist if the runtime cuSPARSE predates 12.4). The SpMM path directly calls `cusparseSpMM_preprocess` without any dynamic loading, which will fail at load time if the runtime cuSPARSE library predates CUDA 12.4, even though the code compiled with CUDA 12.4+. The `#if CUDA_VER_12_4_UP` guard only controls compile-time availability, not runtime availability.

Refactor `my_cusparsespmm_preprocess` to use the same pattern: load the symbol via `dlsym`, check `has_value()`, and gracefully skip the preprocess step if unavailable (or return an error code).
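A sketch of the runtime-resolution idea using plain `dlsym`; the repo's `dynamic_load_runtime` wrapper should be preferred in practice, and the function-pointer signature below is transcribed from the cuSPARSE 12.4 documentation, so verify it against the headers actually in use:

```cpp
#include <dlfcn.h>

using spmm_preprocess_fn = cusparseStatus_t (*)(cusparseHandle_t,
                                                cusparseOperation_t,
                                                cusparseOperation_t,
                                                const void*,
                                                cusparseSpMatDescr_t,
                                                cusparseDnMatDescr_t,
                                                const void*,
                                                cusparseDnMatDescr_t,
                                                cudaDataType,
                                                cusparseSpMMAlg_t,
                                                void*);

// Resolve once; nullptr means the loaded cuSPARSE is too old for preprocess.
static auto* spmm_preprocess =
  reinterpret_cast<spmm_preprocess_fn>(dlsym(RTLD_DEFAULT, "cusparseSpMM_preprocess"));

if (spmm_preprocess == nullptr) {
  return CUSPARSE_STATUS_SUCCESS;  // preprocess is an optimization; skip gracefully
}
CUSPARSE_CHECK(cusparseSetStream(handle, stream));
return spmm_preprocess(
  handle, opA, opB, alpha, matA, matB, beta, matC, float_type, alg, externalBuffer);
```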
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cusparse_view.cu` around lines 238 - 264, The
my_cusparsespmm_preprocess function must mirror the SpMV dynamic-loading
pattern: use dynamic_load_runtime::function to load the symbol
"cusparseSpMM_preprocess" via dlsym, check the returned wrapper's has_value()
before calling it, and if the symbol is missing gracefully skip the preprocess
step (return CUSPARSE_STATUS_SUCCESS or another agreed error code) instead of
calling the symbol directly; retain the existing float_type computation,
CUSPARSE_CHECK(cusparseSetStream(...)), and call the loaded function through the
wrapper when available.
```cpp
template <typename i_t, typename f_t>
struct SpMM_benchmarks_context_t {
  SpMM_benchmarks_context_t(cusparse_sp_mat_descr_wrapper_t<i_t, f_t>& A,
                            cusparse_sp_mat_descr_wrapper_t<i_t, f_t>& A_T,
                            int primal_size,
                            int dual_size,
                            size_t current_batch_size,
                            raft::handle_t const* handle_ptr)
    : x(primal_size * current_batch_size, handle_ptr->get_stream()),
      y(dual_size * current_batch_size, handle_ptr->get_stream()),
      buffer_non_transpose_batch(0, handle_ptr->get_stream()),
      buffer_transpose_batch(0, handle_ptr->get_stream()),
      ping_pong_graph(handle_ptr->get_stream())
  {
    auto stream_view = handle_ptr->get_stream();
    cusparse_dn_mat_descr_wrapper_t<f_t> x_descr;
    cusparse_dn_mat_descr_wrapper_t<f_t> y_descr;

    int rows_primal = primal_size;
    int col_primal  = current_batch_size;
    int ld_primal   = current_batch_size;

    int rows_dual = dual_size;
    int col_dual  = current_batch_size;
    int ld_dual   = current_batch_size;

    x_descr.create(rows_primal, col_primal, ld_primal, x.data(), CUSPARSE_ORDER_ROW);
    y_descr.create(rows_dual, col_dual, ld_dual, y.data(), CUSPARSE_ORDER_ROW);

    // Init buffers for SpMMs
    const rmm::device_scalar<f_t> alpha{1, stream_view};
    const rmm::device_scalar<f_t> beta{0, stream_view};
    size_t buffer_size_non_transpose_batch = 0;
    RAFT_CUSPARSE_TRY(raft::sparse::detail::cusparsespmm_bufferSize(
      handle_ptr->get_cusparse_handle(),
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      alpha.data(),
      A,
      x_descr,
      beta.data(),
      y_descr,
      (deterministic_batch_pdlp) ? CUSPARSE_SPMM_CSR_ALG3 : CUSPARSE_SPMM_CSR_ALG2,
      &buffer_size_non_transpose_batch,
      stream_view));

    size_t buffer_size_transpose_batch = 0;
    RAFT_CUSPARSE_TRY(raft::sparse::detail::cusparsespmm_bufferSize(
      handle_ptr->get_cusparse_handle(),
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      alpha.data(),
      A_T,
      y_descr,
      beta.data(),
      x_descr,
      (deterministic_batch_pdlp) ? CUSPARSE_SPMM_CSR_ALG3 : CUSPARSE_SPMM_CSR_ALG2,
      &buffer_size_transpose_batch,
      stream_view));

    rmm::device_buffer buffer_transpose_batch(buffer_size_transpose_batch, stream_view);
    rmm::device_buffer buffer_non_transpose_batch(buffer_size_non_transpose_batch, stream_view);

#if CUDART_VERSION >= 12040
    // Preprocess buffers for SpMMs
    my_cusparsespmm_preprocess<f_t>(
      handle_ptr->get_cusparse_handle(),
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      alpha.data(),
      A_T,
      y_descr,
      beta.data(),
      x_descr,
      (deterministic_batch_pdlp) ? CUSPARSE_SPMM_CSR_ALG3 : CUSPARSE_SPMM_CSR_ALG2,
      buffer_transpose_batch.data(),
      stream_view);

    my_cusparsespmm_preprocess<f_t>(
      handle_ptr->get_cusparse_handle(),
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      alpha.data(),
      A,
      x_descr,
      beta.data(),
      y_descr,
      (deterministic_batch_pdlp) ? CUSPARSE_SPMM_CSR_ALG3 : CUSPARSE_SPMM_CSR_ALG2,
      buffer_non_transpose_batch.data(),
      stream_view);
#endif

    // First empty run for warm up and put it in a CUDA Graph
    ping_pong_graph.start_capture(0);
    RAFT_CUSPARSE_TRY(raft::sparse::detail::cusparsespmm(
      handle_ptr->get_cusparse_handle(),
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      alpha.data(),
      A,
      x_descr,
      beta.data(),
      y_descr,
      (deterministic_batch_pdlp) ? CUSPARSE_SPMM_CSR_ALG3 : CUSPARSE_SPMM_CSR_ALG2,
      (f_t*)buffer_non_transpose_batch.data(),
      stream_view));

    RAFT_CUSPARSE_TRY(raft::sparse::detail::cusparsespmm(
      handle_ptr->get_cusparse_handle(),
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      alpha.data(),
      A_T,
      y_descr,
      beta.data(),
      x_descr,
      (deterministic_batch_pdlp) ? CUSPARSE_SPMM_CSR_ALG3 : CUSPARSE_SPMM_CSR_ALG2,
      (f_t*)buffer_transpose_batch.data(),
      stream_view));

    ping_pong_graph.end_capture(0);
    // Just for the warm up
    ping_pong_graph.launch(0);
  }
```
Fix buffer/descriptor shadowing to avoid UAF in captured graphs.

Local `buffer_*` and `x_descr`/`y_descr` variables shadow the members, so the captured CUDA graph references memory and descriptors that are destroyed at the end of the constructor. Use the member fields to keep them alive for the graph lifetime.

🐛 Suggested fix

```diff
-    cusparse_dn_mat_descr_wrapper_t<f_t> x_descr;
-    cusparse_dn_mat_descr_wrapper_t<f_t> y_descr;
+    // Use member descriptors to keep them alive for graph capture
@@
-    rmm::device_buffer buffer_transpose_batch(buffer_size_transpose_batch, stream_view);
-    rmm::device_buffer buffer_non_transpose_batch(buffer_size_non_transpose_batch, stream_view);
+    buffer_transpose_batch     = rmm::device_buffer(buffer_size_transpose_batch, stream_view);
+    buffer_non_transpose_batch = rmm::device_buffer(buffer_size_non_transpose_batch, stream_view);
```
+ buffer_non_transpose_batch = rmm::device_buffer(buffer_size_non_transpose_batch, stream_view);🤖 Prompt for AI Agents
In
`@cpp/src/linear_programming/optimal_batch_size_handler/optimal_batch_size_handler.cu`
around lines 17 - 140, The constructor SpMM_benchmarks_context_t creates local
variables buffer_transpose_batch, buffer_non_transpose_batch, x_descr and
y_descr that shadow class members, causing the captured CUDA graph to reference
destroyed objects (UAF). Fix by removing the local shadowing: initialize and use
the member descriptors and member rmm::device_buffer fields (the class members
named buffer_transpose_batch, buffer_non_transpose_batch, x_descr, y_descr)
instead of creating local ones, allocate the member buffers with buffer_size_*
before preprocessing and use member_buffer.data() and the member descriptors in
my_cusparsespmm_preprocess and cusparsespmm calls so the objects remain alive
for the graph capture and launch.
```cpp
template <typename i_t, typename f_t>
int optimal_batch_size_handler(const optimization_problem_t<i_t, f_t>& op_problem,
                               int max_batch_size)
{
  if (max_batch_size == 1) return 1;

  // Try to quickly find the optimal batch size for the problem.
  // We run the two most frequently executed SpMMs for both A and A_T and compute
  // "time / batch_size". The batch size with the best ratio is optimal (it solves
  // the most work in the least time). To get something representative we run each
  // SpMM 5 times and take the average. We do it for both A and A_T and take the sum,
  // since both will be run for each batch size.

  // We start with batch size 128 (powers of two are better for cuSPARSE) and try to
  // improve by either multiplying or dividing by 2 each time. At most we take 5 steps
  // of search.

  constexpr int max_steps = 4;  // 4 because we already do one step for direction
  constexpr int initial_batch_size = 128;
  constexpr int benchmark_runs = 5;
  // Take the floor power of two.
  // This ensures that we always start with a batch size that is a power of two or
  // initial_batch_size.
  int current_batch_size =
    std::pow(2, std::floor(std::log2(std::min(initial_batch_size, max_batch_size))));
  int optimal_batch_size = current_batch_size;
```
Guard against non-positive `max_batch_size`.

`log2(min(initial, max_batch_size))` is undefined for values <= 0, and `current_batch_size` can become 0, which later divides by zero in `evaluate_node`. Add a precondition check.

🐛 Suggested fix

```diff
-  if (max_batch_size == 1) return 1;
+  cuopt_assert(max_batch_size > 0, "max_batch_size must be > 0");
+  if (max_batch_size == 1) return 1;
```
In
`@cpp/src/linear_programming/optimal_batch_size_handler/optimal_batch_size_handler.cu`
around lines 186 - 209, Add a precondition guard in optimal_batch_size_handler
to prevent non-positive max_batch_size from reaching the log2/pow2 computation:
if max_batch_size <= 1 return 1 (same behavior as the existing ==1 early return)
or clamp the value before computing current_batch_size by computing int cap =
std::max(1, std::min(initial_batch_size, max_batch_size)); then use cap in the
std::log2/std::pow expression to ensure current_batch_size is never 0 and avoid
divide-by-zero in evaluate_node.
```cpp
  rmm::device_uvector<double> initial_primal(0, stream);
  rmm::device_uvector<double> initial_dual(0, stream);
  double initial_step_size = std::numeric_limits<f_t>::signaling_NaN();
  double initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN();

  cuopt_assert(settings.new_bounds.size() > 0, "Batch size should be greater than 0");
  const int max_batch_size = settings.new_bounds.size();
  int optimal_batch_size = use_optimal_batch_size
                             ? detail::optimal_batch_size_handler(problem, max_batch_size)
                             : max_batch_size;
  cuopt_assert(optimal_batch_size != 0 && optimal_batch_size <= max_batch_size,
               "Optimal batch size should be between 1 and max batch size");
  using f_t2 = typename type_2<f_t>::type;

  // If need warm start, solve the LP alone
  if (primal_dual_init || primal_weight_init) {
    pdlp_solver_settings_t<i_t, f_t> warm_start_settings = settings;
    warm_start_settings.new_bounds.clear();
    warm_start_settings.method = cuopt::linear_programming::method_t::PDLP;
    warm_start_settings.presolve = false;
    warm_start_settings.pdlp_solver_mode = pdlp_solver_mode_t::Stable3;
    warm_start_settings.detect_infeasibility = false;
    optimization_problem_solution_t<i_t, f_t> original_solution =
      solve_lp(problem, warm_start_settings);
    if (primal_dual_init) {
      initial_primal = rmm::device_uvector<double>(
        original_solution.get_primal_solution(), original_solution.get_primal_solution().stream());
      initial_dual = rmm::device_uvector<double>(original_solution.get_dual_solution(),
                                                 original_solution.get_dual_solution().stream());
      initial_step_size = original_solution.get_pdlp_warm_start_data().initial_step_size_;
    }
    if (primal_weight_init) {
      initial_primal_weight = original_solution.get_pdlp_warm_start_data().initial_primal_weight_;
    }
```
Use `f_t`-typed warm-start buffers in the templated batch path.

The batch warm-start path hardcodes `double` buffers. This breaks `float` instantiations (pointer/type mismatch) and can fail to compile or silently convert. Use `f_t` consistently.

🔧 Proposed fix

```diff
-  rmm::device_uvector<double> initial_primal(0, stream);
-  rmm::device_uvector<double> initial_dual(0, stream);
-  double initial_step_size = std::numeric_limits<f_t>::signaling_NaN();
-  double initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN();
+  rmm::device_uvector<f_t> initial_primal(0, stream);
+  rmm::device_uvector<f_t> initial_dual(0, stream);
+  f_t initial_step_size = std::numeric_limits<f_t>::signaling_NaN();
+  f_t initial_primal_weight = std::numeric_limits<f_t>::signaling_NaN();
 ...
-      initial_primal = rmm::device_uvector<double>(
+      initial_primal = rmm::device_uvector<f_t>(
         original_solution.get_primal_solution(), original_solution.get_primal_solution().stream());
-      initial_dual = rmm::device_uvector<double>(original_solution.get_dual_solution(),
+      initial_dual = rmm::device_uvector<f_t>(original_solution.get_dual_solution(),
                                                 original_solution.get_dual_solution().stream());
```
In `@cpp/src/linear_programming/solve.cu` around lines 750 - 783, The warm-start
batch path declares initial_primal and initial_dual as
rmm::device_uvector<double>, which mismatches templated f_t instantiations;
change both to rmm::device_uvector<f_t> (or the appropriate typedef alias used
for the solver element type) and construct them from
original_solution.get_primal_solution()/get_dual_solution() so the
device_uvector element type matches f_t; keep initial_step_size and
initial_primal_weight as f_t and preserve the use of the original solution
streams when constructing the device_uvectors.
```cpp
  if (batch_mode_) {
    // Pass down any input pointer of the right type, actual pointer does not matter
    size_t byte_needed = 0;
    cub::DeviceSegmentedReduce::Sum(
      nullptr,
      byte_needed,
      thrust::make_transform_iterator(thrust::make_zip_iterator(norm_squared_delta_primal_.data(),
                                                                norm_squared_delta_primal_.data()),
                                      tuple_multiplies<f_t>{}),
      interaction_.data(),
      climber_strategies_.size(),
      primal_size_,
      stream_view_);
    dot_product_bytes = std::max(dot_product_bytes, byte_needed);

    cub::DeviceSegmentedReduce::Sum(
      nullptr,
      byte_needed,
      thrust::make_transform_iterator(norm_squared_delta_primal_.data(), power_two_func_t<f_t>{}),
      norm_squared_delta_primal_.data(),
      climber_strategies_.size(),
      primal_size_,
      stream_view_);
    dot_product_bytes = std::max(dot_product_bytes, byte_needed);

    cub::DeviceSegmentedReduce::Sum(
      nullptr,
      byte_needed,
      thrust::make_transform_iterator(norm_squared_delta_dual_.data(), power_two_func_t<f_t>{}),
      norm_squared_delta_dual_.data(),
      climber_strategies_.size(),
      dual_size_,
      stream_view_);
    dot_product_bytes = std::max(dot_product_bytes, byte_needed);

    dot_product_storage.resize(dot_product_bytes, stream_view_);
  }
```
🧩 Analysis chain
🏁 Scripts executed:

```shell
# First, let's see the full context around lines 71-107
head -n 120 cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu | tail -n 60

# Also check lines 485-516
wc -l cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu

# Let's see if RAFT_CUDA_TRY is used elsewhere in this file
rg -n "RAFT_CUDA_TRY\|cub::Device" cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu

# Check CUB DeviceSegmentedReduce documentation or look for return type in header
rg -A5 "DeviceSegmentedReduce::Sum" cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu | head -40

# Check how other CUDA operations are handled in this file
rg -n "cudaMalloc|cudaMemcpy|cudaFree|CUDA_CHECK|RAFT_CUDA_TRY" cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu

# Look at the CUB include and understand the return type
find . -name "*.cuh" -o -name "*.h" | xargs rg "DeviceSegmentedReduce" | grep -E "(return|cudaError)" | head -5

# Check the exact lines 485-516 to see the actual compute calls
sed -n '485,516p' cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu

# Search for how CUB is used elsewhere in the codebase
rg -l "cub::Device" --type cu --type cuh | head -5 | xargs -I {} sh -c 'echo "=== {} ===" && rg -B2 -A2 "cub::Device" {}'

# Check if CUB APIs return cudaError_t or void
rg "RAFT_CUDA_TRY\(cub::" --type cu --type cuh

# Look at RAFT_CUDA_TRY definition to understand what it expects
rg -B3 -A5 "define RAFT_CUDA_TRY\|RAFT_CUDA_TRY.*=" | head -30

# Check the stream_view_ usage pattern - is it async?
rg "stream_view_\)" cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu | head -10

# Search for CUB usage patterns across codebase without type filtering
find . -name "*.cu" -o -name "*.cuh" -o -name "*.cpp" -o -name "*.hpp" | xargs rg -l "cub::Device" 2>/dev/null | head -5

# Look at one of these files to see how CUB is typically used
find . -name "*.cu" -o -name "*.cuh" -o -name "*.cpp" -o -name "*.hpp" | xargs rg -B2 -A2 "cub::Device" 2>/dev/null | head -50

# Check if RAFT has examples of wrapping CUB operations
find . -path ./build -prune -o -type f \( -name "*.cu" -o -name "*.cuh" -o -name "*.cpp" -o -name "*.hpp" \) -print | xargs rg "RAFT_CUDA_TRY.*cub" 2>/dev/null | head -10

# Look for the RAFT wrapper/stream class definition to understand stream_view_
rg -l "stream_view" --max-count 5 | xargs grep -l "class.*stream\|struct.*stream" 2>/dev/null | head -3

# Check CUB header for return type information
find . -name "*.cuh" -path "*cub*" 2>/dev/null | head -3 | xargs cat 2>/dev/null | grep -A10 "DeviceSegmentedReduce" | head -20

# Examine the barrier.cu pattern more closely
rg -B3 -A10 "RAFT_CUDA_TRY.*cub::DeviceSelect" cpp/src/dual_simplex/barrier.cu

# Check if there are stream/async implications we need to understand
rg -B5 -A5 "stream_view_\|cudaStream" cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu | head -40

# Verify cudaPeekAtLastError usage pattern in file - is it capturing CUB errors?
rg -B10 -A2 "cudaPeekAtLastError" cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
```
Wrap CUB segmented-reduce calls with RAFT_CUDA_TRY for error propagation

The `cub::DeviceSegmentedReduce::Sum` calls at lines 71–107 and 485–516 don't check return codes. All CUB device operations return `cudaError_t` and must be wrapped with `RAFT_CUDA_TRY` so CUDA failures propagate to the caller, including the buffer-sizing calls with `nullptr`. This follows the established pattern in cpp/src/dual_simplex/barrier.cu and aligns with the requirement that all CUDA memory operations have error checking.

Apply to all six CUB calls

```diff
-    cub::DeviceSegmentedReduce::Sum(
+    RAFT_CUDA_TRY(cub::DeviceSegmentedReduce::Sum(
       nullptr,
       byte_needed,
       thrust::make_transform_iterator(...),
       interaction_.data(),
       climber_strategies_.size(),
       primal_size_,
-      stream_view_);
+      stream_view_));
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
  if (batch_mode_) {
    // Pass down any input pointer of the right type, actual pointer does not matter
    size_t byte_needed = 0;
    RAFT_CUDA_TRY(cub::DeviceSegmentedReduce::Sum(
      nullptr,
      byte_needed,
      thrust::make_transform_iterator(thrust::make_zip_iterator(norm_squared_delta_primal_.data(),
                                                                norm_squared_delta_primal_.data()),
                                      tuple_multiplies<f_t>{}),
      interaction_.data(),
      climber_strategies_.size(),
      primal_size_,
      stream_view_));
    dot_product_bytes = std::max(dot_product_bytes, byte_needed);
    RAFT_CUDA_TRY(cub::DeviceSegmentedReduce::Sum(
      nullptr,
      byte_needed,
      thrust::make_transform_iterator(norm_squared_delta_primal_.data(), power_two_func_t<f_t>{}),
      norm_squared_delta_primal_.data(),
      climber_strategies_.size(),
      primal_size_,
      stream_view_));
    dot_product_bytes = std::max(dot_product_bytes, byte_needed);
    RAFT_CUDA_TRY(cub::DeviceSegmentedReduce::Sum(
      nullptr,
      byte_needed,
      thrust::make_transform_iterator(norm_squared_delta_dual_.data(), power_two_func_t<f_t>{}),
      norm_squared_delta_dual_.data(),
      climber_strategies_.size(),
      dual_size_,
      stream_view_));
    dot_product_bytes = std::max(dot_product_bytes, byte_needed);
    dot_product_storage.resize(dot_product_bytes, stream_view_);
  }
```
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu`
around lines 71 - 107, The three cub::DeviceSegmentedReduce::Sum calls in the
batch-mode block (the ones using tuple_multiplies<f_t>, power_two_func_t<f_t> on
norm_squared_delta_primal_ and norm_squared_delta_dual_) must be wrapped with
RAFT_CUDA_TRY so their returned cudaError_t is checked (this includes the
nullptr sizing calls); replace direct calls with
RAFT_CUDA_TRY(cub::DeviceSegmentedReduce::Sum(...)) for each invocation and
likewise wrap the other three CUB Sum calls elsewhere (the other location noted
in the review) so failures propagate; keep existing local variables
(dot_product_bytes, dot_product_storage.resize, stream_view_, primal_size_,
dual_size_, climber_strategies_.size()) unchanged.
```diff
 void adaptive_step_size_strategy_t<i_t, f_t>::get_primal_and_dual_stepsizes(
-  rmm::device_scalar<f_t>& primal_step_size, rmm::device_scalar<f_t>& dual_step_size)
+  rmm::device_uvector<f_t>& primal_step_size, rmm::device_uvector<f_t>& dual_step_size)
 {
+  const auto [grid_size, block_size] = kernel_config_from_batch_size(climber_strategies_.size());
   compute_actual_stepsizes<i_t, f_t>
-    <<<1, 1, 0, stream_view_>>>(this->view(), primal_step_size.data(), dual_step_size.data());
+    <<<grid_size, block_size, 0, stream_view_>>>(this->view(),
+                                                 make_span(primal_step_size),
+                                                 make_span(dual_step_size),
+                                                 climber_strategies_.size());
```
Validate output buffer sizes before kernel write

`compute_actual_stepsizes` writes `climber_strategies_.size()` entries; if the provided buffers are smaller, this is an out-of-bounds write. Add size checks before launching.

✅ Size guards

```diff
-  const auto [grid_size, block_size] = kernel_config_from_batch_size(climber_strategies_.size());
+  const auto batch_size = static_cast<i_t>(climber_strategies_.size());
+  cuopt_assert(primal_step_size.size() >= static_cast<size_t>(batch_size),
+               "primal_step_size too small for batch size");
+  cuopt_assert(dual_step_size.size() >= static_cast<size_t>(batch_size),
+               "dual_step_size too small for batch size");
+  const auto [grid_size, block_size] = kernel_config_from_batch_size(batch_size);
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
void adaptive_step_size_strategy_t<i_t, f_t>::get_primal_and_dual_stepsizes(
  rmm::device_uvector<f_t>& primal_step_size, rmm::device_uvector<f_t>& dual_step_size)
{
  const auto batch_size = static_cast<i_t>(climber_strategies_.size());
  cuopt_assert(primal_step_size.size() >= static_cast<size_t>(batch_size),
               "primal_step_size too small for batch size");
  cuopt_assert(dual_step_size.size() >= static_cast<size_t>(batch_size),
               "dual_step_size too small for batch size");
  const auto [grid_size, block_size] = kernel_config_from_batch_size(batch_size);
  compute_actual_stepsizes<i_t, f_t>
    <<<grid_size, block_size, 0, stream_view_>>>(this->view(),
                                                 make_span(primal_step_size),
                                                 make_span(dual_step_size),
                                                 climber_strategies_.size());
```
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu`
around lines 537 - 545, The kernel launch in
adaptive_step_size_strategy_t::get_primal_and_dual_stepsizes can OOB if
primal_step_size or dual_step_size are smaller than climber_strategies_.size();
add explicit size validation before launching compute_actual_stepsizes: compare
primal_step_size.size() and dual_step_size.size() against
climber_strategies_.size() and either resize the rmm::device_uvector buffers or
return/throw a clear error (with context) if they are too small, so the kernel
only writes within bounds using the existing stream_view_ and
kernel_config_from_batch_size logic.
```cpp
void matrix_swap(rmm::device_uvector<f_t>& matrix,
                 i_t vector_size,
                 const thrust::universal_host_pinned_vector<swap_pair_t<i_t>>& swap_pairs)
{
  if (swap_pairs.empty()) { return; }

  const i_t batch_size = matrix.size() / vector_size;
  cuopt_assert(vector_size > 0, "Vector size must be greater than 0");
  cuopt_assert(batch_size > 0, "Batch size must be greater than 0");

  const size_t swap_count = swap_pairs.size();
  const size_t total_items = swap_count * static_cast<size_t>(vector_size);

  auto counting = thrust::make_counting_iterator<size_t>(0);
  auto left_index = thrust::make_transform_iterator(
    counting,
    matrix_swap_index_functor<i_t>{thrust::raw_pointer_cast(swap_pairs.data()), vector_size, true});
  auto right_index = thrust::make_transform_iterator(
```
Guard against division-by-zero and mismatched matrix sizing.

`matrix.size() / vector_size` happens before validating `vector_size`; if it is 0, this is undefined behavior. Also, ensure the matrix size is a clean multiple of `vector_size` to avoid silent truncation of `batch_size`.

🔧 Proposed fix

```diff
-  const i_t batch_size = matrix.size() / vector_size;
-  cuopt_assert(vector_size > 0, "Vector size must be greater than 0");
-  cuopt_assert(batch_size > 0, "Batch size must be greater than 0");
+  cuopt_assert(vector_size > 0, "Vector size must be greater than 0");
+  cuopt_assert(matrix.size() % static_cast<size_t>(vector_size) == 0,
+               "Matrix size must be a multiple of vector size");
+  const i_t batch_size = matrix.size() / vector_size;
+  cuopt_assert(batch_size > 0, "Batch size must be greater than 0");
```
+ cuopt_assert(batch_size > 0, "Batch size must be greater than 0");🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/swap_and_resize_helper.cuh` around lines 49 - 66,
In matrix_swap, avoid dividing matrix.size() by vector_size before validating
vector_size and ensure the matrix length is an exact multiple of vector_size;
first assert or return if vector_size == 0 (check vector_size > 0 before
computing batch_size), compute batch_size only after that check (use
matrix.size() / vector_size into batch_size), and add a guard that matrix.size()
% vector_size == 0 (or log/throw) to prevent mismatched sizing; update the
function matrix_swap and any use of batch_size to rely on these validated
values.
```cpp
  cub::DeviceTransform::Transform(
    in_zip,
    out_zip,
    total_items,
    [] HD(thrust::tuple<f_t, f_t> values) -> thrust::tuple<f_t, f_t> {
      return thrust::make_tuple(thrust::get<1>(values), thrust::get<0>(values));
    },
    matrix.stream());
```
🧩 Analysis chain
🏁 Script executed:

```shell
rg -n "DeviceTransform::Transform" -C2
```
🌐 Web query:
CUB DeviceTransform Transform cudaError_t return status
💡 Result:
`cub::DeviceTransform::Transform(...)` returns a `cudaError_t` that indicates whether the CUB call successfully set up and launched the transform work (and it may also surface a previously pending/asynchronous CUDA error, like many CUDA runtime calls). (nvidia.github.io)

In current CCCL/CUB, the return status comes from two places: (nvidia.github.io)

- Offset-range precheck: CUB selects a signed offset type from your `NumItemsT` and calls `choose_signed_offset<NumItemsT>::is_exceeding_offset_type(num_items)`. If `num_items` is too large for that chosen signed offset type, `Transform` returns that error immediately. (nvidia.github.io)
- Dispatch/launch: otherwise it returns whatever `detail::transform::dispatch_t<...>::dispatch(...)` returns (typically `cudaSuccess` if the launch succeeded; otherwise an appropriate CUDA error such as invalid configuration/arguments, etc.). (nvidia.github.io)

Also note the "temp storage" compatibility overload behavior: if you call the overload with `d_temp_storage == nullptr`, it sets `temp_storage_bytes = 1` and returns `cudaSuccess` without doing work. (nvidia.github.io)

To catch errors that occur during execution (asynchronous kernel failures), check after synchronizing (e.g., `cudaStreamSynchronize`) and/or query with `cudaGetLastError()` / `cudaPeekAtLastError()`. (docs.nvidia.com)

Citations:

- 1: https://nvidia.github.io/cccl/cub/api/program_listing_file_cub_device_device_transform.cuh.html
- 2: https://nvidia.github.io/cccl/cub/api/program_listing_file_cub_device_device_transform.cuh.html
- 3: https://nvidia.github.io/cccl/cub/api/program_listing_file_cub_device_device_transform.cuh.html
- 4: https://nvidia.github.io/cccl/cub/api/program_listing_file_cub_device_device_transform.cuh.html
- 5: https://nvidia.github.io/cccl/cub/api/program_listing_file_cub_device_device_transform.cuh.html
- 6: https://docs.nvidia.com/cuda/archive/13.1.0/cuda-programming-guide/02-basics/asynchronous-execution.html
🌐 Web query:
RAFT_CUDA_TRY error checking macro RAPIDS
💡 Result:
`RAFT_CUDA_TRY` is RAFT/RAPIDS' CUDA-runtime error-checking helper macro: you wrap CUDA API calls with it (e.g., `RAFT_CUDA_TRY(cudaMemcpyAsync(...))`), and it checks the returned status and throws an exception if the call failed. RAFT provides analogous macros for other CUDA libraries (`RAFT_CUBLAS_TRY`, `RAFT_CUSOLVER_TRY`). If you're in a context where throwing is undesirable (commonly destructors), use the `*_NO_THROW` variants (`RAFT_CUDA_TRY_NO_THROW`, etc.), which log the error but do not throw. [1] [2]
Add CUDA error checking for the CUB transform call.

The `cub::DeviceTransform::Transform` call returns a `cudaError_t` status indicating setup/launch success or failure (e.g., an invalid `num_items` size, or launch errors). Per the coding guideline requiring error checking on all CUDA operations, wrap this call with `RAFT_CUDA_TRY` to catch and propagate errors appropriately.

🔧 Proposed fix

```diff
-  cub::DeviceTransform::Transform(
-    in_zip,
-    out_zip,
-    total_items,
-    [] HD(thrust::tuple<f_t, f_t> values) -> thrust::tuple<f_t, f_t> {
-      return thrust::make_tuple(thrust::get<1>(values), thrust::get<0>(values));
-    },
-    matrix.stream());
+  RAFT_CUDA_TRY(cub::DeviceTransform::Transform(
+    in_zip,
+    out_zip,
+    total_items,
+    [] HD(thrust::tuple<f_t, f_t> values) -> thrust::tuple<f_t, f_t> {
+      return thrust::make_tuple(thrust::get<1>(values), thrust::get<0>(values));
+    },
+    matrix.stream()));
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
  RAFT_CUDA_TRY(cub::DeviceTransform::Transform(
    in_zip,
    out_zip,
    total_items,
    [] HD(thrust::tuple<f_t, f_t> values) -> thrust::tuple<f_t, f_t> {
      return thrust::make_tuple(thrust::get<1>(values), thrust::get<0>(values));
    },
    matrix.stream()));
```
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/swap_and_resize_helper.cuh` around lines 76 - 83,
The CUB transform call using cub::DeviceTransform::Transform (with in_zip,
out_zip, total_items, lambda, and matrix.stream()) is missing CUDA error
checking; wrap the call with RAFT_CUDA_TRY so the returned cudaError_t is
checked and propagated (e.g.,
RAFT_CUDA_TRY(cub::DeviceTransform::Transform(...))); ensure you reference the
same parameters (in_zip, out_zip, total_items, matrix.stream()) and keep the
lambda as-is while adding the RAFT_CUDA_TRY wrapper to satisfy the coding
guideline requiring checks on all CUDA operations.
|
@Kh4ster Is this PR meant for 26.02? If so, please change the merge branch to release/26.02.
This PR allows PDLP to solve a batch of problems. Currently the only supported difference among problems is one variable bound being different per climber.
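For context, a rough sketch of how the batched entry point appears to be driven, inferred from the `settings.new_bounds` usage in the quoted `solve.cu` excerpt; the element type of a `new_bounds` entry is an assumption here, not confirmed by the PR:

```cpp
// Hypothetical driver: one new_bounds entry per climber, each overriding a
// single variable bound relative to the shared base problem.
pdlp_solver_settings_t<int, double> settings;
settings.method = cuopt::linear_programming::method_t::PDLP;

// Illustrative only: the fields of a new_bounds entry are assumed.
// settings.new_bounds.push_back({/*var*/ j, /*lb*/ lb_j, /*ub*/ std::floor(x_j)});  // branch down
// settings.new_bounds.push_back({/*var*/ j, /*lb*/ std::ceil(x_j), /*ub*/ ub_j});   // branch up

auto solution = solve_lp(problem, settings);  // all climbers solved in one batched run
```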
This PR also includes:
This PR is not ready to be merged:
Summary by CodeRabbit
New Features
Documentation