Speed and memory optimizations: rewrites, inplace ops, and gradient graphs

Collecting ideas for performance work. ptgp relies on PyTensor's rewrite system to choose efficient algorithms, but we haven't done a thorough audit of what's actually happening at compile time.

The ptgp-local rewrites have already made a significant difference. For the unapproximated GP MLL, the joint forward+gradient graph compiles down to exactly 1 Cholesky and 0 MatrixInverse ops (the naive graph before rewrites has multiple of each). The rewrites that achieve this include PSD-aware slogdet lowering, Cholesky reuse across the graph, and a composite-merge pass that prevents the fusion optimizer from duplicating cubic ops.

Rewrites that are generally useful should be upstreamed into PyTensor. Rewrites that are GP-specific can live locally in ptgp, or live there until they're mature enough to upstream.

## Areas to investigate

- **Memory profiling and inplace operations** — are Cholesky, solve, and other cubic ops getting inplace flags where they should? I haven't looked closely at whether PyTensor is handling this well. There may be missed inplace opportunities, especially when tensors pass through assumption annotations.

- **PyTensor rewrites** — which rewrites are firing on our graphs and which aren't? Are there rewrites we should be registering ourselves (e.g. Woodbury for low-rank-plus-diagonal structure)?

- **Gradient graph quality** — PyTensor generates gradient graphs via autodiff, but for known kernels (Matern, ExpQuad, etc.) there are analytic gradient forms. It would be worth comparing the autodiff graphs to the analytic forms to see if there are unnecessary intermediate allocations or missed simplifications.

- **Structured matrix operations** — Kronecker, Toeplitz, banded, and block-diagonal solvers all exploit specific matrix structure to reduce the cost of Cholesky / solve below the dense O(N^3) baseline. Which of these structures show up in ptgp's graphs, and could PyTensor dispatch to them?

## Advanced / subjective rewrites

PyTensor can arbitrarily manipulate graphs, and some potential rewrites are subjective and situation-specific. The simplest example is the Woodbury lemma: whether to apply it depends on the relative shapes of the matrices involved. A more ambitious example is choosing between Cholesky and iterative solvers (CG, Lanczos) for `solve` and `slogdet` (see #11). PyTensor can inspect the incoming matrix shapes at compile time (if the user specifies them) and could potentially choose the algorithm automatically.

Could this be done automatically, or should users choose the approach? Both?

Other libraries handle this at runtime rather than compile time:
- **GPyTorch** uses a `LinearOperator` class where structured matrix types (`DiagLinearOperator`, `LowRankRootLinearOperator`, etc.) dispatch to specialized algorithms at runtime
- **GPJax** uses [CoLA](https://github.com/wilson-labs/cola) (Compositional Linear Algebra), which combines a linear operator abstraction with over 70 dispatch rules that select algorithms based on detected structure (e.g. `solve[Sum[LowRank, Diagonal]]` dispatches to Woodbury automatically)

ptgp's approach could be the compile-time version of the same idea: use shape info and assumption tags in the symbolic graph to choose algorithms after the initial PyTensor graph is created. How should this be implemented?

## Open questions

- What tools or approaches are people using to profile PyTensor graph execution?
- Are there other optimization opportunities we should be looking at?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed and memory optimizations: rewrites, inplace ops, and gradient graphs #9

Areas to investigate

Advanced / subjective rewrites

Open questions

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Speed and memory optimizations: rewrites, inplace ops, and gradient graphs #9

Description

Areas to investigate

Advanced / subjective rewrites

Open questions

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions