
rsz: update failing test golden on PR 10248#10284

Open
openroad-ci wants to merge 18 commits into The-OpenROAD-Project:master from The-OpenROAD-Project-staging:secure-pr-10248-add-unit-test

Conversation

@openroad-ci
Collaborator

This PR is based on the latest head of the original public PR:

The only additional change on top of that PR is the failing-test rebaseline commit:

  • 1dfe516 rsz: update repair hold multi-output golden

This updates only src/rsz/test/repair_hold_multi_output_load.ok for the current timing output. The previously local repair_setup_stagnation test commit was intentionally excluded because PR #10248 already added a similar repair_setup_hopeless unit test.

Verification:

bazel test --cache_test_results=no --test_output=errors //src/rsz/test:repair_hold_multi_output_load-tcl_test

Result: PASSED

oharboe and others added 9 commits April 24, 2026 10:31
Adds a deterministic WNS-stagnation gate to RepairSetup::terminateProgress().
Best-so-far WNS is sampled every pass into a 200-pass ring buffer; if the
best observed value has not improved by max(1 ps, 0.5% * |initial_wns|)
over a full window, the gate returns true. Combined with the existing
two-consecutive-termination rule this aborts the phase after ~1200 passes
of no WNS movement on designs where tiny TNS twitches would otherwise keep
the optimizer running forever.
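
The gate described in this commit message can be sketched as a small standalone model (illustrative only; the class name, member names, and constants mirror the description above, not the actual RepairSetup source):

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Minimal model of the sliding-window WNS-stagnation gate.
// Slack values are in seconds, matching STA's internal units.
class StagnationGate
{
 public:
  explicit StagnationGate(float initial_wns) : initial_wns_(initial_wns) {}

  // Record the best-so-far WNS for this pass. Returns true once a full
  // 200-pass window shows no improvement beyond the tolerance.
  bool sample(float best_wns)
  {
    window_[count_ % kWindow] = best_wns;
    ++count_;
    if (count_ < kWarmup + kWindow) {
      return false;  // still warming up
    }
    // The slot we would overwrite next holds the value from 200 passes ago.
    const float oldest = window_[count_ % kWindow];
    // Tolerance: max(1 ps, 0.5% of |initial WNS|); 1e-12 s is 1 ps.
    const float tol = std::max(1.0e-12f, 0.005f * std::abs(initial_wns_));
    // WNS is negative; improvement moves it upward toward zero.
    return (best_wns - oldest) < tol;
  }

 private:
  static constexpr std::size_t kWindow = 200;   // ring-buffer length (passes)
  static constexpr std::size_t kWarmup = 1000;  // passes before gate may fire
  std::array<float, kWindow> window_{};
  std::size_t count_ = 0;
  float initial_wns_;
};
```

With an initial WNS of -1 ns, feeding the same value for 1200 passes trips the gate, while a steady improvement of 1 ps per pass keeps it quiet.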

Motivated by automated architectural-exploration flows where the .sdc is
held at an ambitious target while RTL evolves. Without this, a WNS gap of
hundreds of ps grinds repair_timing for hours producing no useful movement
and forcing users to guess SETUP_SLACK_MARGIN values to stop it. The gate
fires only when no reasonable user would disagree that further effort is
futile (1 ps absolute floor keeps tape-out grind untouched; the TNS
fix-rate gate still owns termination near closure).

On trip, a single loud INFO log (RSZ-0234/235/236) names the best-effort
WNS and notes this is probably an exploration run. No new Tcl flag, no
new ORFS env var - hardcoded conservative defaults.

New test test/orfs/hopeless/ synthesizes 8 parallel 8-deep arithmetic
pipelines on asap7 with a 50 ps clock and asserts both that the flow
completes and that the gate log message appears (sh_test + grep). A
check_same idempotent test guards the determinism property.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
- RepairSetup.hh: include <cstddef> for size_t and <string> for
  std::string (include-cleaner warnings from clang-tidy CI bot).
- RepairSetup.hh / RepairSetup.cc: name the stagnation-gate warmup
  threshold wns_stagnation_warmup_iterations_ instead of the bare 1000
  literal, matching the style of the other tunables in this class.

Not applied: gemini-code-assist's suggestion to change
wns_stagnation_abs_tol_ from 1.0e-12f to 1.0e-3f. sta::Slack is
measured in seconds (see src/sta/include/sta/Units.hh:73 "Sta internal
units are always seconds, ..."), so 1.0e-12 s is 1 ps, which is the
intended value. 1.0e-3 s would be 1 ms and would disable the gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Wrap long sta::Slack initialization line that clang-format flagged on
PR The-OpenROAD-Project#10248 after the gemini-code-assist suggestion was applied via the
GitHub UI.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
The sliding-window check (window_best - window_oldest over 200 passes)
mis-classified plateaus as futility: any real design whose WNS drops
dramatically early then flat-lines at a topology-bound floor while TNS
keeps improving (aes, clone_flat, repair_fanout*) tripped the gate and
aborted repair_timing too early, leaving worse max_cap/max_slew slack
and different .ok-file output.

Replace with best_wns_ever vs initial_wns_: only fire when WNS has
effectively never moved from its starting value, which is the real
signature of an obviously-futile run.

Also bump rel_tol 0.5% -> 5%. Empirically, aes improves WNS by ~50% of
initial, clone_flat by ~95%, and repair_fanout* by ~90%; the hopeless.v
synthetic only moves WNS by ~2%, because buffer insertion chips
something off even on a grossly over-clocked design. 5% sits in the gap.
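
The revised check described in this commit might look roughly like the sketch below (illustrative; the struct and member names follow the commit-message wording, not the exact diff):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Sketch of the revised check: compare the best WNS ever seen against the
// initial WNS instead of a sliding window, so a design that improves early
// and then plateaus never trips the gate. Slack values are in seconds.
struct WnsStagnationCheck
{
  float initial_wns;
  float best_wns_ever;

  // Track the best (largest, i.e. closest to zero) WNS observed so far.
  void observe(float wns) { best_wns_ever = std::max(best_wns_ever, wns); }

  // Fires only when WNS has effectively never moved from its start:
  // tolerance is max(1 ps, 5% of |initial WNS|) per the rel_tol bump.
  bool stagnant() const
  {
    const float tol = std::max(1.0e-12f, 0.05f * std::abs(initial_wns));
    return (best_wns_ever - initial_wns) < tol;
  }
};
```

A run that has improved WNS by only 1% of a -1 ns initial value reads as stagnant; a 10% improvement does not.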

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Tiny TCL unit test exercising the WNS-stagnation gate added in this PR
without scraping log strings.  It builds a flat block of 1200
register-to-register pairs (DFF_X2 driver -> DFF_X2 sink, no logic in
between), holds the cell library to _X1 / _X2 sizes, and sets a 1 ps
clock period so that no SizeUpMove / BufferMove / pin-swap / clone /
split-load / unbuffer move can improve WNS.  The endpoint count is
chosen so the inner-loop counter passes the gate's warmup before the
legacy phase has visited every endpoint.

Without the gate the same .tcl runs to iteration 1200 (one futile pass
per endpoint); with the gate it exits at iteration 1002 and the .ok
diverges, so the test fails without the fix and passes with it.

Replaces the orfs-based test/orfs/hopeless gate-coverage check that
maliberty asked to convert into a repair_setup-only unit test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>

# Conflicts:
#	src/rsz/test/CMakeLists.txt
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a 'WNS-stagnation' gate to the repair_timing flow in RepairSetup. This mechanism detects and aborts 'obviously futile' optimization runs where the Worst Negative Slack (WNS) fails to improve significantly after a warmup period, preventing the tool from grinding indefinitely on infeasible designs. The changes include new tracking methods in RepairSetup, updated logging, and a new regression test suite to verify the gate's behavior. I have no further feedback to provide.

This reverts commit 1dfe516.

Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
…ate/OpenROAD into secure-pr-10248-add-unit-test

Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

The window-improvement and threshold values are pure functions of
already-stored state (best_wns_, ring buffer, initial_wns_). Compute
them locally in terminateProgress() and recompute on demand in
wnsStagnationReport() instead of mirroring them on the class.

Removes 2 members and the "result-cached-for-display" smell.

Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@jhkim-pii
Contributor

jhkim-pii commented Apr 29, 2026

@oharboe
There are two issues.

1. WNS impact

  • The current WNS-stagnation threshold of 200 passes affects setup WNS (some designs showed better WNS while others showed worse WNS in the dashboard).
  • Is the 200-pass threshold insufficient? Needed more investigation.
    --> Found the reason:
    • Repair setup is used at multiple stages (floorplan, placement, CTS, ...).
    • A different optimization result in an early stage (e.g., floorplan) leads to different placement and a different finish QoR.

2. TNS degradation

  • The WNS stagnation checker does not consider TNS improvement, so it causes TNS degradation.
  • I think the WNS stagnation checker should not be enabled for the LEGACY and LAST_GASP phases because it hurts TNS.

Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
…-unit-test

Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>

# Conflicts:
#	src/rsz/test/CMakeLists.txt
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@jhkim-pii
Contributor

jhkim-pii commented Apr 29, 2026

@oharboe I enabled your WNS stagnation checker for the WNS phase only. If you do not care about TNS, you can run repair_timing with the WNS-phase configuration. If that does not satisfy your needs, please share your thoughts.

@oharboe
Collaborator

oharboe commented Apr 29, 2026

@oharboe I enabled your WNS stagnation checker for the WNS phase only. If you do not care about TNS, you can run repair_timing with the WNS-phase configuration. If that does not satisfy your needs, please share your thoughts.

Yes, I only have this problem when WNS < 0.

@jhkim-pii
Contributor

@oharboe
It looks like my explanation was insufficient.

  1. The current WNS stagnation checker only monitors WNS. It terminates the optimization if there is less than 0.5% WNS improvement over 200 iterations.
  2. Originally, the WNS stagnation checker was enabled for the LEGACY, LAST_GASP, and WNS phases.
  3. The LEGACY & LAST_GASP phases should improve both WNS & TNS.
  4. But early termination driven by WNS monitoring can hurt TNS in some designs. Some designs still have room to improve TNS further, but the WNS stagnation checker eliminates that TNS-improvement opportunity.
  5. So I applied the WNS stagnation checker only to the WNS phase, which means it is disabled under the LEGACY & LAST_GASP policies.
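
The phase restriction in point 5 amounts to a one-line predicate (a sketch; the enum and function names here are assumptions drawn from the discussion, not the actual repair_setup source):

```cpp
#include <cassert>

// Hypothetical phase names as discussed in this thread.
enum class Phase { LEGACY, WNS, LAST_GASP };

// The stagnation check runs only in the WNS phase, so the LEGACY and
// LAST_GASP phases can keep working on TNS.
bool stagnationGateEnabled(Phase phase)
{
  return phase == Phase::WNS;
}
```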

I want to clarify these things.
A. In your experiment, are you interested in WNS only (i.e., you do not care how bad the TNS is)?
If so, I think you can use the WNS phase instead of LEGACY to ignore TNS.

B. If you care about TNS too, then the TNS-monitoring logic needs enhancement.

So I wonder if you are OK with the WNS phase (repair_timing -setup -phases "WNS"). If so, the current version in this PR would be sufficient.

@oharboe
Collaborator

oharboe commented Apr 30, 2026

When it is clearly hopeless to close timing (WNS = 0), then I want the quick wins in WNS and TNS and then to terminate. Terminating quickly is more important than the exact improvements in WNS and TNS; we just want to fix things that are quick to fix (such as pathological fanout).

@jhkim-pii
Contributor

I see. If repair_timing -setup -phases "WNS" with this PR cannot resolve your issue, please let us know.
We can find another solution (e.g., different TNS-stagnation checking logic, or multi-threading for faster runtime at the expense of a small QoR loss, ...).

@oharboe
Collaborator

oharboe commented May 1, 2026

I see. If repair_timing -setup -phases "WNS" with this PR cannot resolve your issue, please let us know. We can find another solution (e.g., different TNS-stagnation checking logic, or multi-threading for faster runtime at the expense of a small QoR loss, ...).

Do I have to configure something or change options?

The change I made originally would do this without configuration, which is important for automated design-space exploration and, I think, a better user experience overall.

