Add Schur complement and its mat-free mode by zitongzhan · Pull Request #35 · pypose/bae

zitongzhan · 2026-05-24T03:07:51Z

This pull request introduces significant improvements to the optimizer infrastructure, focusing on enhanced memory profiling, a new Schur complement optimizer, and better support for matrix-free operations.

Optimizer Enhancements

Added a new Schur optimizer class in bae.optim.optimizer, implementing the Schur complement method with support for both standard and matrix-free normal equations, block Jacobi preconditioning, and efficient memory usage.
Updated the LM optimizer to support a matrix_free_normal mode, allowing for more efficient computation and memory usage in large-scale problems.
Added a custom TrustRegion class that supports Warp, intended mainly for use with the Schur optimizer.
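
For readers unfamiliar with the method named above, here is a minimal sketch of Schur complement elimination on a toy system. It is not the code in this PR: it assumes a 2x2 camera block `B`, a diagonal point block `C`, and a coupling block `E`, and plain Python lists stand in for the sparse tensors the optimizer works with. All names (`schur_solve`, `B`, `E`, `C_diag`, `g_c`, `g_p`) are hypothetical.

```python
def schur_solve(B, E, C_diag, g_c, g_p):
    """Solve [[B, E], [E^T, C]] [x_c; x_p] = [g_c; g_p] with C diagonal.

    B is 2x2, E is 2xn, and C_diag holds the n diagonal entries of C.
    """
    n = len(C_diag)
    # Invert the diagonal point block entry by entry.
    C_inv = [1.0 / c for c in C_diag]

    # Reduced camera system: S = B - E C^{-1} E^T, r = g_c - E C^{-1} g_p.
    S = [[B[i][j] - sum(E[i][k] * C_inv[k] * E[j][k] for k in range(n))
          for j in range(2)] for i in range(2)]
    r = [g_c[i] - sum(E[i][k] * C_inv[k] * g_p[k] for k in range(n))
         for i in range(2)]

    # Solve the 2x2 reduced system in closed form (Cramer's rule).
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    x_c = [(r[0] * S[1][1] - S[0][1] * r[1]) / det,
           (S[0][0] * r[1] - S[1][0] * r[0]) / det]

    # Back-substitute for the point unknowns: x_p = C^{-1} (g_p - E^T x_c).
    x_p = [C_inv[k] * (g_p[k] - E[0][k] * x_c[0] - E[1][k] * x_c[1])
           for k in range(n)]
    return x_c, x_p


# Toy data (assumed values, chosen only so the system is well conditioned).
B = [[4.0, 1.0], [1.0, 3.0]]
E = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
C_diag = [2.0, 2.0, 4.0]
g_c = [1.0, 2.0]
g_p = [0.5, -1.0, 0.25]
x_c, x_p = schur_solve(B, E, C_diag, g_c, g_p)

# Residual of the full (unreduced) system; it should be near zero.
res_c = [sum(B[i][j] * x_c[j] for j in range(2))
         + sum(E[i][k] * x_p[k] for k in range(3)) - g_c[i] for i in range(2)]
res_p = [E[0][k] * x_c[0] + E[1][k] * x_c[1] + C_diag[k] * x_p[k] - g_p[k]
         for k in range(3)]
```

The point of the elimination is that only the small reduced system `S` has to be solved directly; the point unknowns follow from a cheap back-substitution because `C` is trivially invertible.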

Sparse Matrix and PyOps Improvements

Improved sparse matrix operations, including fixes to inv_op for correct tensor creation and a new test block in py_ops.py for diagonal operations on CUDA.

Added a README section on future plans, including a new backend for a distributed solver.

gemini-code-assist

Code Review

This pull request introduces high-performance Triton kernels for sparse BSR operations, including matrix-vector multiplication, matrix-matrix multiplication, and transposition. It also implements a matrix-free NormalMatVec operator and a new Schur complement-based optimizer to improve the efficiency of bundle adjustment tasks. The bundle adjustment example was updated with CUDA memory snapshotting and Warp mempool reporting. Review feedback highlights a critical issue where in-place diagonal modifications in the LM and Schur optimizers cause damping factors to accumulate incorrectly during step rejections. Additionally, the reviewer recommends removing performance-hindering torch.cuda.empty_cache() calls, addressing potential divisions by zero in the Conjugate Gradient solver, and cleaning up redundant or commented-out code.

gemini-code-assist · 2026-05-24T03:16:21Z

+                    diag_scale *= 1.0 + pg['damping']
+                    A.set_damping(diag_scale - 1.0)
+                else:
+                    diagonal_op_(A, op=partial(torch.mul, other=1+pg['damping']))


The diagonal_op_ function performs an in-place multiplication on the matrix A. Since this is inside the while loop, if a step is rejected and the loop repeats, the damping will be applied cumulatively (e.g., $(1+\lambda_1)(1+\lambda_2)...$) instead of being applied to the original $J^T J$ diagonal. This deviates from the standard Levenberg-Marquardt algorithm and can lead to excessively aggressive damping. Consider cloning the matrix or resetting the diagonal before applying damping in each iteration.
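
One way to follow this suggestion is to keep an untouched copy of the diagonal and rebuild the damped diagonal from it on every retry. The sketch below illustrates the difference, with a plain Python list standing in for the diagonal of $J^T J$. The helper names are hypothetical, and the sketch does not use the `diagonal_op_` API from this PR.

```python
def damp_in_place(diag, damping):
    """Problematic pattern: scales the already-damped diagonal again."""
    for i in range(len(diag)):
        diag[i] *= 1.0 + damping
    return diag


def damp_from_original(diag0, damping):
    """Suggested pattern: always scale the saved, undamped diagonal."""
    return [d * (1.0 + damping) for d in diag0]


diag0 = [2.0, 4.0]      # undamped diagonal of J^T J
dampings = [0.5, 1.0]   # damping values tried across two attempts

# In-place damping compounds: diag0 * (1 + 0.5) * (1 + 1.0).
cumulative = list(diag0)
for lam in dampings:
    damp_in_place(cumulative, lam)

# Damping from the saved copy uses only the latest value: diag0 * (1 + 1.0).
for lam in dampings:
    fresh = damp_from_original(diag0, lam)
```

After the second attempt, `cumulative` has been scaled by 3.0 while `fresh` has been scaled by 2.0, which is the value standard Levenberg-Marquardt would use.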

gemini-code-assist · 2026-05-24T03:16:21Z

+                R = R.tensor()
+            else:
+                R = R.detach()
+            torch.cuda.empty_cache()


Calling torch.cuda.empty_cache() inside the optimization step is generally discouraged as it triggers a GPU synchronization and can significantly degrade performance. If memory management is a concern, it's better to optimize tensor lifecycles or use a dedicated memory pool. If this was added for debugging memory usage, it should be removed before merging.

gemini-code-assist · 2026-05-24T03:16:21Z

+
+        Ap = matvec(p)
+        Ap_flat = Ap.reshape(-1)
+        alpha = (rz / torch.dot(p.reshape(-1), Ap_flat)).item()


Potential division by zero if torch.dot(p.reshape(-1), Ap_flat) is zero (e.g., if the matrix is singular or not positive definite). While $J^T J$ is positive semi-definite, numerical issues or zero curvature directions could cause this to be zero. Consider adding a small epsilon or a check for numerical stability.

gemini-code-assist · 2026-05-24T03:16:21Z

+
+        rz_new = torch.dot(r_flat, z_flat)
+        beta = (rz_new / rz).item()
+        p.mul_(beta).add_(z)


Potential division by zero if rz is zero. Although the convergence check at line 692 should ideally terminate the loop if the residual is zero, a safety check for rz before division is recommended to prevent NaN values in case of numerical instability.
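
The two guards suggested in the comments above could look like the following sketch of a preconditioned conjugate gradient loop. It is an illustration under assumptions, not the solver in this PR: `matvec` applies a damped normal operator $J^T J + \lambda I$ without forming it, the preconditioner is a plain Jacobi (diagonal) one, lists replace tensors, and the names `pcg`, `normal_matvec`, and `eps` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normal_matvec(J, v, damping):
    """Return (J^T J + damping * I) v without forming J^T J."""
    Jv = [dot(row, v) for row in J]
    JtJv = [sum(J[i][j] * Jv[i] for i in range(len(J))) for j in range(len(v))]
    return [a + damping * b for a, b in zip(JtJv, v)]


def pcg(matvec, b, diag, tol=1e-10, max_iter=50, eps=1e-30):
    """Jacobi-preconditioned CG with guarded divisions."""
    x = [0.0] * len(b)
    r = list(b)
    z = [ri / di for ri, di in zip(r, diag)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        Ap = matvec(p)
        pAp = dot(p, Ap)
        if pAp <= eps:      # zero or negative curvature: stop instead of dividing
            break
        alpha = rz / pAp
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        if rz <= eps:       # avoid dividing by a vanished rz
            break
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Toy problem (assumed values): J^T J + 0.1 * I = [[2.1, 1.0], [1.0, 5.1]].
J = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
b = [1.0, 2.0]
diag = [2.1, 5.1]
x = pcg(lambda v: normal_matvec(J, v, 0.1), b, diag)
residual = [a - bi for a, bi in zip(normal_matvec(J, x, 0.1), b)]

# A degenerate operator with zero curvature exits cleanly instead of raising.
x_singular = pcg(lambda v: [0.0, 0.0], [1.0, 1.0], [1.0, 1.0])
```

An absolute threshold such as `eps` is the simplest choice for a sketch; a threshold scaled by the norm of the right-hand side may be more robust for badly scaled problems.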

zitongzhan and others added 25 commits December 15, 2025 03:12

add normal matvec and memory profiler

9490ff8

print peak cuda allocation

9c90aca

add warp memory pool report

6256e79

use A._get_Jt when matrix_free_normal

3a5ce9b

add back schur by warp's matmul

0064146

safely import cudss

acd1b3c

Add future plans section to README

91c8ade

Added a section for future plans including a new backend for distributed solver.

add normal matvec and memory profiler

19774c3

print peak cuda allocation

4ca9c86

add warp memory pool report

b71f1a3

use A._get_Jt when matrix_free_normal

d678867

add back schur by warp's matmul

d127b88

Merge branch 'schur-matmul' of github.com:zitongzhan/bae_private into…

fa9ab70

… schur-matmul

Merge remote-tracking branch 'upstream/release' into schur-matmul

6619808

Preventing TrustRegion from accepting diverging steps

3e4761d

fix(optimizer/LM): Remove redundant solver calls so matrix_free_norma…

5d9e2b2

…l path runs

feat(optim/Schur): Add Matrix-Free path and matrix_free_normal branch

e34bea2

Resolving conflict with release branch in README

5f4f093

Version up to 0.2.1

f64d00b

Fix deprecated function in Warp

40798f1

Replace Warp with Triton kernels and adjust corresponding codes

165104d

Remove codes relevant to Chunk

b305f81

Merge branch 'release' into memory-issue-swp

3a97f9e

Remove ba_helpers.py

a0b4b8b

Fix a conflict in ba_example.py

f46fb74

github-code-quality Bot found potential problems May 24, 2026

Comment thread bae/sparse/warp_wrappers.py Fixed

Comment thread bae/optim/optimizer.py Fixed

Comment thread bae/sparse/py_ops.py Fixed

gemini-code-assist Bot reviewed May 24, 2026

zitongzhan and others added 3 commits May 23, 2026 20:35

Potential fix for pull request finding 'Variable defined multiple times'

48ad787

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Potential fix for pull request finding 'Unused local variable'

8cc6eb3

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

minimize diff

074b931

zitongzhan added 4 commits May 24, 2026 05:04

restore pysolvers

4746522

revert import shuffle

7f3ea3d

restore LM

d3e24d9

fix import order ba example

04908d9

zitongzhan wants to merge 32 commits into release from memory-issue-swp

2 participants
