rsz: bail repair_timing on obviously-futile designs by oharboe · Pull Request #10248 · The-OpenROAD-Project/OpenROAD

oharboe · 2026-04-24T08:32:51Z

Adds a deterministic WNS-stagnation gate to RepairSetup::terminateProgress(). Best-so-far WNS is sampled every pass into a 200-pass ring buffer; if the best observed value has not improved by max(1 ps, 0.5% * |initial_wns|) over a full window, the gate returns true. Combined with the existing two-consecutive-termination rule this aborts the phase after ~1200 passes of no WNS movement on designs where tiny TNS twitches would otherwise keep the optimizer running forever.

Motivated by automated architectural-exploration flows where the .sdc is held at an ambitious target while RTL evolves. Without this, a WNS gap of hundreds of ps grinds repair_timing for hours producing no useful movement and forcing users to guess SETUP_SLACK_MARGIN values to stop it. The gate fires only when no reasonable user would disagree that further effort is futile (1 ps absolute floor keeps tape-out grind untouched; the TNS fix-rate gate still owns termination near closure).

On trip, a single loud INFO log (RSZ-0234/235/236) names the best-effort WNS and notes this is probably an exploration run. No new Tcl flag, no new ORFS env var - hardcoded conservative defaults.

New test test/orfs/hopeless/ synthesizes 8 parallel 8-deep arithmetic pipelines on asap7 with a 50 ps clock and asserts both that the flow completes and that the gate log message appears (sh_test + grep). A check_same idempotent test guards the determinism property.
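A minimal sketch of the gate as the description above words it, assuming a plain circular buffer of best-so-far values. The class name `WnsStagnationGate`, its members, and the use of `double` seconds are illustrative assumptions, not the code in RepairSetup.cc. The warmup period and the two-consecutive-termination rule are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the gate described above; not the PR's code.
// Slack is in seconds; a larger (less negative) value is better.
class WnsStagnationGate
{
 public:
  explicit WnsStagnationGate(double initial_wns, std::size_t window = 200)
      : tol_(std::max(1.0e-12, 0.005 * std::fabs(initial_wns))),
        best_(initial_wns),
        history_(window, 0.0)
  {
    assert(window > 0);
  }

  // Record one pass. Returns true once a full window has elapsed and the
  // best-so-far WNS gained less than the tolerance across that window.
  bool sample(double wns)
  {
    best_ = std::max(best_, wns);
    const std::size_t slot = count_ % history_.size();
    const bool full = count_ >= history_.size();
    const double oldest = history_[slot];  // best-so-far one window ago
    history_[slot] = best_;
    ++count_;
    return full && (best_ - oldest) < tol_;
  }

  double tolerance() const { return tol_; }

 private:
  double tol_;                   // max(1 ps, 0.5% of |initial WNS|)
  double best_;                  // best WNS seen so far
  std::vector<double> history_;  // ring buffer of best-so-far values
  std::size_t count_ = 0;        // number of passes sampled
};
```

With an initial WNS of -400 ps the tolerance works out to 2 ps, so a run that never gains 2 ps across one window would trip the sketch, while a run that gains a few ps per pass would not.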

github-actions

clang-tidy made some suggestions

gemini-code-assist

Code Review

This pull request introduces a WNS-stagnation gate to the timing repair logic to abort optimization phases when Worst Negative Slack fails to improve over a set number of iterations. The implementation includes a ring buffer for WNS history, methods for tracking and reporting stagnation, and a new regression test for infeasible designs. Review feedback identifies a unit mismatch in the stagnation tolerance constant and suggests replacing a magic number in the iteration logic with a named constant.

- RepairSetup.hh: include <cstddef> for size_t and <string> for std::string (include-cleaner warnings from clang-tidy CI bot).
- RepairSetup.hh / RepairSetup.cc: name the stagnation-gate warmup threshold wns_stagnation_warmup_iterations_ instead of the bare 1000 literal, matching the style of the other tunables in this class.

Not applied: gemini-code-assist's suggestion to change wns_stagnation_abs_tol_ from 1.0e-12f to 1.0e-3f. sta::Slack is measured in seconds (see src/sta/include/sta/Units.hh:73 "Sta internal units are always seconds, ..."), so 1.0e-12 s is 1 ps, which is the intended value. 1.0e-3 s would be 1 ms and would disable the gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>

github-actions · 2026-04-24T09:41:28Z

clang-tidy review says "All clean, LGTM! 👍"

oharboe · 2026-04-24T09:45:56Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a WNS-stagnation gate to the repair_timing process to prevent excessive iterations on infeasible designs where TNS might fluctuate but WNS remains stuck. The implementation uses a ring buffer to track WNS history and terminates progress if improvements fall below a deterministic threshold. A new regression test, "hopeless," is added to verify this behavior. Feedback includes replacing a non-ASCII character in log messages for better terminal compatibility and using std::max_element to improve code readability.
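The review summary does not show the loop that std::max_element was meant to replace. One plausible shape, assuming the WNS history is a contiguous container of slack values, is sketched below; `bestWns` and its parameter are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns the best (largest, i.e. least negative)
// slack in a non-empty history, replacing a hand-written max loop.
inline double bestWns(const std::vector<double>& history)
{
  assert(!history.empty());  // max_element on an empty range returns end()
  return *std::max_element(history.begin(), history.end());
}
```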

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>

github-actions · 2026-04-24T10:18:55Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions

clang-tidy made some suggestions

Wrap long sta::Slack initialization line that clang-format flagged on PR The-OpenROAD-Project#10248 after the gemini-code-assist suggestion was applied via the GitHub UI. Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>

github-actions

clang-tidy made some suggestions

The sliding-window check (window_best - window_oldest over 200 passes) mis-classified plateaus as futility: any real design whose WNS drops dramatically early then flat-lines at a topology-bound floor while TNS keeps improving (aes, clone_flat, repair_fanout*) tripped the gate and aborted repair_timing too early, leaving worse max_cap/max_slew slack and different .ok-file output.

Replace with best_wns_ever vs initial_wns_: only fire when WNS has effectively never moved from its starting value, which is the real signature of an obviously-futile run.

Also bump rel_tol 0.5% -> 5%. Empirically aes improves WNS by ~50% of initial, clone_flat by ~95%, repair_fanout by ~90%; the hopeless.v synthetic only moves WNS by ~2% because buffer insertion chips something off even a grossly-over-clocked design. 5% sits in the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
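The revised check from the commit message above can be sketched as a single comparison. This is an illustration only: `wnsNeverMoved` is a hypothetical free function, the 1 ps floor is assumed to carry over unchanged from the original description, and the warmup period is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical sketch of the "never moved from the starting value" test.
// Both slacks are in seconds; parameter names mirror the commit message.
inline bool wnsNeverMoved(double initial_wns, double best_wns_ever)
{
  assert(best_wns_ever >= initial_wns);  // best-so-far can only improve
  constexpr double abs_tol = 1.0e-12;    // assumed 1 ps floor
  constexpr double rel_tol = 0.05;       // 5% of |initial WNS|
  const double tol = std::max(abs_tol, rel_tol * std::fabs(initial_wns));
  return (best_wns_ever - initial_wns) < tol;
}
```

Taking a made-up initial WNS of -1 ns, an improvement of about 2% (the hopeless.v figure) stays under the 5% tolerance and trips the sketch, while improvements of about 50% or 95% (the aes and clone_flat figures) do not.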

github-actions · 2026-04-24T15:19:13Z

clang-tidy review says "All clean, LGTM! 👍"

oharboe · 2026-04-24T16:51:14Z

Use case: design-space exploration (DSE) should terminate quickly on lost causes without a human in the loop.

oharboe · 2026-04-25T05:14:29Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a WNS-stagnation gate to the repair_timing process to prevent excessive iterations on designs where timing closure is infeasible. The gate monitors Worst Negative Slack (WNS) and terminates the repair phase if improvement falls below a defined threshold after a warmup period. The changes include new tracking variables, a reset mechanism for each repair phase, and detailed logging when stagnation is detected. Additionally, a new regression test named "hopeless" has been added to verify that the flow correctly identifies and bails on pathological designs. I have no feedback to provide.

maliberty · 2026-04-26T00:02:11Z

It might be better to wait for #10223 to land; @jhkim-pii, opinion?

jhkim-pii · 2026-04-26T01:18:13Z

I think merging this PR first would be better. Otherwise the author would have to rebuild it on top of the new Resizer architecture, which looks more difficult.

jhkim-pii · 2026-04-26T01:19:22Z

I am curious about the ORFS design QoR impact of this PR.

oharboe · 2026-04-26T02:51:24Z

> I am curious about the ORFS design QoR impact of this PR.

I think none, it passes. There are no obviously futile design configurations in ORFS designs.

I had to add an obviously futile configuration in this PR to test.

oharboe · 2026-04-27T07:33:17Z

From my point of view it would be nice to have this merged, even if it is rewritten in the future. It captures a use-case that is important to us.

Presumably the test is what I really care about; the current code is just a stop-gap that reduces the urgency, and the test case is what protects the use-case.

maliberty · 2026-04-27T20:18:12Z

There is no need for a full orfs run to just test rsz. This should be a unit test of a repair_setup command.

oharboe · 2026-04-27T21:10:22Z

> There is no need for a full orfs run to just test rsz. This should be a unit test of a repair_setup command.

How do I write that so it isn't a "working as implemented" test?

oharboe · 2026-04-28T07:07:01Z

@jhkim-pii Please give me some guidance on writing a good test here. I'll pass it along to claude 🤦

jhkim-pii · 2026-04-28T08:00:40Z

@oharboe OK. I'll try to make the test case and share it. I think sharing the test case would be easier than explaining how to write such a test.

oharboe · 2026-04-28T08:03:37Z

> @oharboe OK. I'll try to make the test case and share it. I think sharing the test case would be easier than explaining how to write such a test.

I asked Claude to make something... Should be done soon...

Tiny TCL unit test exercising the WNS-stagnation gate added in this PR without scraping log strings.

It builds a flat block of 1200 register-to-register pairs (DFF_X2 driver -> DFF_X2 sink, no logic in between), holds the cell library to _X1 / _X2 sizes, and sets a 1 ps clock period so that no SizeUpMove / BufferMove / pin-swap / clone / split-load / unbuffer move can improve WNS. The endpoint count is chosen so the inner-loop counter passes the gate's warmup before the legacy phase has visited every endpoint.

Without the gate the same .tcl runs to iteration 1200 (one futile pass per endpoint); with the gate it exits at iteration 1002 and the .ok diverges, so the test fails without the fix and passes with it.

Replaces the orfs-based test/orfs/hopeless gate-coverage check that maliberty asked to convert into a repair_setup-only unit test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com> # Conflicts: # src/rsz/test/CMakeLists.txt

oharboe · 2026-04-28T08:20:09Z

@jhkim-pii I have no idea what Claude did there :-) It is short and it doesn't use verilog as input...

github-actions · 2026-04-28T08:24:07Z

clang-tidy review says "All clean, LGTM! 👍"

oharboe · 2026-04-28T11:57:35Z

@maliberty @jhkim-pii I don't know what's failing here, but I don't think it is related to this PR. I just merged with origin/master.

jhkim-pii · 2026-04-28T13:05:16Z

> @jhkim-pii I have no idea what Claude did there :-) It is short and it doesn't use verilog as input...

Looks good.
I generated a test case that is similar to yours.

jhkim-pii · 2026-04-28T13:20:03Z

> @maliberty @jhkim-pii I don't know what's failing here, but I don't think it is related to this PR. I just merged with origin/master.

I added the failing test case repair_hold_multi_output_load.tcl in another PR #10236.

Currently, repair_hold_multi_output_load.tcl is failing in the HEAD of OR master.

I found that merging PR #10141 changed the QoR slightly.

I'll open a new PR to rebase the failing test repair_hold_multi_output_load.tcl.

oharboe · 2026-04-28T13:22:22Z

@jhkim-pii Thank you!

> I'll open a new PR to rebase the failing test repair_hold_multi_output_load.tcl.

Please close this PR when you open a new one. Thank you for nursing it through 😌

jhkim-pii · 2026-04-28T14:10:32Z

@oharboe OK. By the way, test/orfs/hopeless test case should be removed because it is redundant (you added a new rsz test case), right?

I wonder if you wish to keep the orfs/hopeless test case for any reason.

oharboe · 2026-04-28T14:55:53Z

> @oharboe OK. By the way, test/orfs/hopeless test case should be removed because it is redundant (you added a new rsz test case), right?
>
> I wonder if you wish to keep the orfs/hopeless test case for any reason.

No particular reason, please remove it. I just think that this feature should have a test that protects the use case :-)

jhkim-pii · 2026-04-29T01:04:11Z

@oharboe

PR description: Adds a deterministic WNS-stagnation gate to RepairSetup::terminateProgress(). Best-so-far WNS is sampled every pass into a 200-pass ring buffer; if the best observed value has not improved by max(1 ps, 0.5% * |initial_wns|) over a full window, the gate returns true.

--> I cannot find any ring buffer in the code. It looks like the code implementation is different from what you intended.

jhkim-pii · 2026-04-29T01:07:40Z

Matt rebased the repair_hold_multi_output_load.tcl regression, so there will be no issue if you pull origin/master again.

jhkim-pii · 2026-04-29T01:09:36Z

+      const std::string wns_msg = wnsStagnationReport(opto_iteration);
+      if (!wns_msg.empty()) {
+        logger_->info(RSZ, 236, "{}", wns_msg);
+      }


This puts a logger_->info(...) call at every call site.
It would be better to move the logger_->info(...) call into wnsStagnationReport().
Then wnsStagnationReport() would need neither the argument nor the string return value.
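A rough sketch of the suggested shape, under stated assumptions: `FakeLogger` is a stand-in so the example compiles on its own (it is not the project's logger), `RepairSetupSketch` and `markStagnated` are hypothetical, and the message text is a placeholder. Only the `wnsStagnationReport` name and message ID 236 come from the snippet above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in logger for illustration only; records lines instead of printing.
struct FakeLogger
{
  std::vector<std::string> lines;
  void info(int id, const std::string& msg)
  {
    lines.push_back(std::to_string(id) + ": " + msg);
  }
};

class RepairSetupSketch
{
 public:
  explicit RepairSetupSketch(FakeLogger* logger) : logger_(logger)
  {
    assert(logger != nullptr);
  }

  // Hypothetical hook: the gate records its own state when it trips.
  void markStagnated(int iteration)
  {
    stagnated_ = true;
    iteration_ = iteration;
  }

  // Suggested shape: no argument, no string return. The function logs
  // by itself, so call sites shrink to a single call.
  void wnsStagnationReport()
  {
    if (!stagnated_) {
      return;
    }
    logger_->info(236,
                  "placeholder: WNS stagnation at iteration "
                      + std::to_string(iteration_));
  }

 private:
  FakeLogger* logger_;
  bool stagnated_ = false;
  int iteration_ = 0;
};
```

Each call site then reduces to `wnsStagnationReport();` with no local string and no emptiness check.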

jhkim-pii · 2026-04-29T01:39:27Z

@oharboe I implemented the ring buffer in a new PR that includes this PR:

rsz: update failing test golden on PR 10248 #10284

oharboe · 2026-04-29T04:34:09Z

> @oharboe I implemented the ring buffer in a new PR that includes this PR:
>
> rsz: update failing test golden on PR 10248 #10284

Thank you!

github-actions Bot added the size/M label Apr 24, 2026

github-actions Bot reviewed Apr 24, 2026

Comment thread src/rsz/src/RepairSetup.hh

Comment thread src/rsz/src/RepairSetup.hh Outdated

gemini-code-assist Bot reviewed Apr 24, 2026

Comment thread src/rsz/src/RepairSetup.hh

Comment thread src/rsz/src/RepairSetup.cc Outdated

gemini-code-assist Bot reviewed Apr 24, 2026

Comment thread src/rsz/src/RepairSetup.cc Outdated

Comment thread src/rsz/src/RepairSetup.cc Outdated

oharboe and others added 2 commits April 24, 2026 12:13

Update src/rsz/src/RepairSetup.cc

41913a6

Update src/rsz/src/RepairSetup.cc

8e5b2f2

github-actions Bot reviewed Apr 24, 2026

Comment thread src/rsz/src/RepairSetup.cc Outdated

rsz: clang-format RepairSetup.cc

6466ae9

github-actions Bot reviewed Apr 24, 2026

Comment thread src/rsz/src/RepairSetup.cc Outdated

oharboe requested a review from maliberty April 24, 2026 16:50

gemini-code-assist Bot reviewed Apr 25, 2026

maliberty requested a review from jhkim-pii April 26, 2026 00:02

oharboe and others added 2 commits April 28, 2026 10:04

Merge remote-tracking branch 'origin/master' into end-hopeless-repair

2d61012

openroad-ci mentioned this pull request Apr 28, 2026

rsz: update failing test golden on PR 10248 #10284

Open

jhkim-pii reviewed Apr 29, 2026

oharboe closed this Apr 29, 2026
