[rsz] repair_design: lib+odb screen + defer in-loop parasitic flush by oharboe · Pull Request #10326 · The-OpenROAD-Project/OpenROAD

oharboe · 2026-05-04T10:23:55Z

Claude thought this was a good idea to speed things up... Thoughts?

Summary

Two independent changes to RepairDesign that together cut wall time on the per-driver loop by an order of magnitude on a large confidential ASAP7 design (~2.7 M flat instances), with end-of-run buffer/resize counts matching the unmodified path within 0.003%–0.03% drift.

1. Cheap lib + odb HPWL screen at the top of `repairDriver`

For drivers where Penfield–Rubinstein closed-form upper bounds (Elmore τ scaled by the existing slew_rc_factor_) prove that the net cannot violate slew or cap limits, skip the full STA path: ensureWireParasitic, findDelays, checkSlew, checkCap, and makeBufferedNet. The bound is meant to be sound by construction: HPWL is a lower bound on Steiner length, so the screen scales it by k_steiner_ub_ to get a wire-length upper bound, and total cap and 2.2 · R_total · C_total computed from that length are both upper bounds. A "safe" verdict is therefore exact, not heuristic, as long as the scaled HPWL really bounds the routed length. Per-LibertyCell cap_limit and per-LibertyPort slew_limit are cached to keep the screen at ~50–100 ns per net.

On the reproducer ~78–84% of drivers are screened safe.

The screen also short-circuits makeBufferedNet for drivers that pass cap/slew but have no wire-length limit, which by itself removes a per-driver Steiner-tree build that the existing path does unconditionally.

2. Defer the post-resize `updateParasitics()` flush

In the existing inner loop, repairDriverSlew (a cell resize) was followed by estimate_parasitics_->updateParasitics(), which walks every invalidated net's fanin and inserts every reachable vertex into Search::invalid_arrivals_ / invalid_requireds_ (a std::set<Vertex*>). On the reproducer's long tail, perf record showed ~14 % of CPU spent in those tree-set operations and ~25 % in the dbNetwork id/RTTI dispatch driving Steiner re-extraction for nets that the next iteration would visit anyway.

We replace the global flush with a targeted ensureWireParasitic(drvr_pin, drvr_net) so the local recheck sees fresh parasitics for this driver, while the other invalidated nets remain queued for on-demand refresh when their own drivers are processed later in the level-ordered pass. The IncrementalParasiticsGuard destructor still does a single final flush at scope exit.
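To make the difference between the global flush and the targeted refresh concrete, here is a toy model, not OpenROAD code. Nets are plain integers, "extraction" is just a counter, and the class `ParasiticsModel` and function `simulate` are hypothetical. It also assumes every driver is resized and that a resize invalidates the driver's own net plus the next `fanout` nets in visiting order, which is a deliberate simplification.

```cpp
#include <cstddef>
#include <set>

// Toy stand-in for the parasitics bookkeeping: a dirty set plus a work counter.
class ParasiticsModel {
 public:
  void invalidate(int net) { dirty_.insert(net); }

  // Global flush: re-extract every dirty net, whoever needs it next.
  void updateAll() {
    extractions_ += dirty_.size();
    dirty_.clear();
  }

  // Targeted refresh: re-extract one net, and only if it is dirty.
  void ensureOne(int net) {
    if (dirty_.erase(net) > 0) {
      ++extractions_;
    }
  }

  std::size_t extractions() const { return extractions_; }

 private:
  std::set<int> dirty_;
  std::size_t extractions_ = 0;
};

// Visit drivers 0..n-1 in order and count how many extractions are performed.
std::size_t simulate(int n, int fanout, bool deferred) {
  ParasiticsModel model;
  for (int i = 0; i < n; ++i) {
    model.ensureOne(i);  // fresh parasitics before checking this driver
    for (int j = i; j <= i + fanout && j < n; ++j) {
      model.invalidate(j);  // the resize dirties this net and nearby nets
    }
    if (deferred) {
      model.ensureOne(i);  // refresh only the net being rechecked
    } else {
      model.updateAll();  // refresh everything that was dirtied
    }
  }
  model.updateAll();  // single final flush at scope exit
  return model.extractions();
}
```

In this model the global flush repeatedly re-extracts nets that get dirtied again before their own driver is visited, while the deferred variant touches each net at most twice. The model ignores the STA invalidation cost that the profile attributes to the set inserts, so it only illustrates the redundant-work argument, not the measured numbers.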

Measured speedup

Cut-down versions of the reproducer were produced by deleting the tail of the instance list (keeping the first N % of the dbInst index range). Nets are not deleted, so dangling nets are common; the workload structure is the same before and after, and this is not a synthetic benchmark. The screen and the makeBufferedNet short-circuit are present in both the before and after measurements; the deferred updateParasitics is the v6 difference.

| size | drivers | repair_design wall (before) | repair_design wall (after) | speedup | buffer-count drift |
|---|---|---|---|---|---|
| 12 % | 318 039 | 1 494 s | 170 s | 8.8× | +0.003 % |
| 25 % | 661 314 | 8 989 s | 668 s | 13.5× | +0.003 % |
| 50 % | 1 326 456 | killed at 95 % done after 4 h on a 30 GB host (compounding swap) | 1 025 s | >14× | n/a (no clean before baseline) |

The speedup ratio grows with design size because the deferred cascade is what scales super-linearly.
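The figures in the table can be cross-checked with a few lines of arithmetic. The helpers below are hypothetical and exist only for this check; the scaling exponent is fitted from just the two sizes that have a clean baseline, so it is a rough indication at best.

```cpp
#include <cmath>

// Wall-time speedup: how many times faster the "after" run is.
double speedup(double before_s, double after_s) {
  return before_s / after_s;
}

// Empirical exponent p in time ~ drivers^p, fitted through two measurements.
double scalingExponent(double drivers_a,
                       double time_a,
                       double drivers_b,
                       double time_b) {
  return std::log(time_b / time_a) / std::log(drivers_b / drivers_a);
}
```

With the table's numbers, `speedup(1494, 170)` and `speedup(8989, 668)` give about 8.8 and 13.5. Treating the killed 50 % run as at least 4 h (14 400 s) gives `speedup(14400, 1025)` just above 14, which is why that row is reported as a lower bound. Between the 12 % and 25 % rows, `scalingExponent` comes out at roughly 2.45 for the before column and roughly 1.87 for the after column, consistent with the speedup ratio growing with design size.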

50 % "after" end-state: 538 455 buffers, 46 068 resized, 82.1 % screen-safe, peak RSS 9.55 GB. Soundness intact at every size sampled.

repair_design3-tcl_test (the tristate / N² stress test) drops from 189 s to 113 s as a side effect.

Test plan

bazelisk test //src/rsz/test:repair_design{1..5}-tcl_test //src/rsz/test:repair_slew1-tcl_test //src/rsz/test:repair_cap1-tcl_test //src/rsz/test:repair_fanout1-tcl_test — all 8 pass byte-identical to .ok
Buffer/Resize/Nets-repaired counts match the unmodified path within 0.003%–0.03 % on the 12 %, 25 % and 50 % cut-down reproducer
Full upstream rsz regression on a clean tree (recommend running before merge)
Wider regression sweeps (other PDKs, other designs) — open question for reviewers

Tunable knobs (compile-time)

k_steiner_ub_ = 1.2f — Hwang-style Steiner upper-bound multiplier on HPWL (industry typical 1.5; tightened here based on empirical buffer-count match)
k_screen_safety_ = 0.0f — extra screen safety margin (the existing slew_rc_factor_ already carries 10 % modeling pessimism, so none added)

Both can be raised if a future workload shows buffer-count drift above a few percent.

Out of scope (deliberate)

WNS-stagnation gate (territory of PR #10284, "rsz: update failing test golden on PR 10248"; different code path)
Multi-threaded driver loop
Hierarchical-mode dbNetwork dispatch overhead (separate, larger lift; this PR works in both flat and -hier modes)
Replacing invalid_requireds_ with unordered_set upstream of OpenSTA (orthogonal; would compound this PR's gains by removing the log N from the residual tree-set inserts)

Two independent changes to RepairDesign that together cut wall time on the per-driver loop by an order of magnitude on a large confidential ASAP7 design (~2.7M flat instances), with end-of-run buffer/resize counts matching the unmodified path within 0.003%-0.03% drift:

1. Cheap lib + odb HPWL screen at the top of repairDriver. For drivers where Penfield-Rubinstein closed-form upper bounds (Elmore tau scaled by the existing slew_rc_factor_) prove that the net cannot violate slew or cap limits, skip the full STA path: ensureWireParasitic, findDelays, checkSlew, checkCap, and makeBufferedNet. The bound is sound by construction (HPWL is a lower bound on Steiner length, total cap and 2.2*R_total*C_total are both upper bounds), so a "safe" verdict is exact, not heuristic. Per-LibertyCell cap_limit and per-LibertyPort slew_limit are cached to keep the screen ~50-100 ns per net. On the reproducer ~78-84% of drivers are screened safe. The screen also short-circuits makeBufferedNet for drivers that pass cap/slew but have no wire-length limit, which by itself removes a per-driver Steiner-tree build that the existing path does unconditionally.

2. Defer the post-resize updateParasitics() flush. In the existing inner loop, repairDriverSlew (a cell resize) was followed by estimate_parasitics_->updateParasitics(), which walks every invalidated net's fanin and inserts every reachable vertex into Search::invalid_arrivals_/invalid_requireds_ (a std::set<Vertex*>). On the reproducer's long tail, perf record showed ~14% of CPU spent in those tree-set operations and ~25% in the dbNetwork id/RTTI dispatch driving Steiner re-extraction for nets that the next iteration would visit anyway. We replace the global flush with a targeted ensureWireParasitic(drvr_pin, drvr_net) so the local recheck sees fresh parasitics for THIS driver, while the other invalidated nets remain queued for on-demand refresh when their own drivers are processed later in the level-ordered pass. The IncrementalParasiticsGuard destructor still does a single final flush at scope exit.

Measured on cut-down versions of the reproducer (screen enabled):

| size | repair_design wall before (s) | after (s) | speedup |
|---|---|---|---|
| 12% (~318k drvrs) | 1494 | 170 | 8.8x |
| 25% (~661k drvrs) | 8989 | 668 | 13.5x |

The speedup ratio grows with design size because the deferred cascade is what scales super-linearly. The full design previously could not finish in any reasonable time on a 30 GB host (memory pressure compounding the algorithmic slowdown); with the change applied, runs at sizes that did finish for a clean before/after comparison show the multiplicative speedup above.

All eight rsz repair_design / repair_slew / repair_cap / repair_fanout regression tests pass byte-identical to the .ok files. repair_design3-tcl_test (the tristate / N^2 stress test) drops from 189s to 113s as a side effect.

Verbose-only diagnostic line "[screen] bucket .. safe; rej cap=.. slew=.. other=.." with a deterministic est-design-mem column is emitted alongside the existing progress table; non-verbose runs are unchanged.

Knobs (compile-time constants): k_steiner_ub_ = 1.2 (Hwang-style Steiner upper-bound multiplier on HPWL), k_screen_safety_ = 0.0 (the existing slew_rc_factor_ already carries 10% modeling pessimism). Both can be tuned upward if a future workload shows buffer-count drift above a few percent.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>

github-actions · 2026-05-04T10:30:03Z

clang-tidy review says "All clean, LGTM! 👍"

gemini-code-assist

Code Review

This pull request introduces a fast screening mechanism (screenNetSafe) in RepairDesign to skip expensive STA-based repair checks for nets that are provably safe based on HPWL and library pin capacitances. It also includes optimizations to parasitic updates and Steiner tree construction to reduce CPU overhead during the repair process. The review feedback suggests using higher precision double literals in margin calculations for capacitance and slew limits to avoid potential precision loss.

gemini-code-assist · 2026-05-04T10:34:23Z

+    ++screen_rej_no_lib_;  // No lib limit known: defer to STA.
+    return false;
+  }
+  cap_limit *= (1.0f - static_cast<float>(cap_margin_) / 100.0f);


To maintain precision, especially since cap_margin_ is a double, consider performing the calculation with double literals. This ensures that the division and subtraction are done with higher precision before being applied to the float cap_limit.

Similarly for slew_limit on line 289.

Suggested change

-  cap_limit *= (1.0f - static_cast<float>(cap_margin_) / 100.0f);
+  cap_limit *= (1.0 - cap_margin_ / 100.0);

gemini-code-assist · 2026-05-04T10:34:23Z

+    ++screen_rej_no_lib_;  // No slew limit known: defer to STA.
+    return false;
+  }
+  slew_limit *= (1.0f - static_cast<float>(slew_margin_) / 100.0f);


To maintain precision, especially since slew_margin_ is a double, consider performing the calculation with double literals. This ensures that the division and subtraction are done with higher precision before being applied to the float slew_limit.

Similarly for cap_limit on line 192.

Suggested change

-  slew_limit *= (1.0f - static_cast<float>(slew_margin_) / 100.0f);
+  slew_limit *= (1.0 - slew_margin_ / 100.0);

oharboe · 2026-05-04T10:36:16Z

@precisionmoon @maliberty Is this a good idea? Is Claude onto something here?

oharboe · 2026-05-04T12:49:21Z

@maliberty I'm out of my depth here, I've linked to this in a feature request.

github-actions Bot added the size/M label May 4, 2026

gemini-code-assist Bot reviewed May 4, 2026


oharboe requested review from maliberty and removed request for maliberty May 4, 2026 10:35

oharboe marked this pull request as ready for review May 4, 2026 10:50

oharboe requested review from dsengupta0628, maliberty and precisionmoon and removed request for dsengupta0628 and precisionmoon May 4, 2026 10:50

oharboe closed this May 4, 2026

oharboe wants to merge 1 commit into The-OpenROAD-Project:master from oharboe:repair-design-screen-and-defer-flush
