Rearchitect resizer repair setup#10223
Rearchitect resizer repair setup#10223openroad-ci wants to merge 62 commits intoThe-OpenROAD-Project:masterfrom
Conversation
- Details in `src/rsz/doc/architecture.md` Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
SetupLegacyPolicy::tryRepairTarget unconditionally returned after trySizeDownBatch, so a failed size-down silently skipped the remaining moves (size-up, swap-pins, buffering, cloning) for that target. Master's RepairSetup and SetupLegacyMtPolicy both fall through on failure; restore that behavior. SizeUpMtCandidate::apply committed the replacement without revalidating max-cap. Candidates are screened on a pre-batch snapshot, so two accepted size-ups on different sinks of the same fanin net could each pass independently yet combine past the driver's max-cap limit. Re-check replacementPreservesMaxCap at apply time so the second commit rejects cleanly instead of creating a silent violation. Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
There was a problem hiding this comment.
Code Review
This pull request re-architects the repair_setup optimization flow into a policy-driven framework, replacing monolithic move classes with MoveGenerator and MoveCandidate pairs. It introduces a central Optimizer and various OptPolicy implementations, including experimental multi-threaded versions. The changes also include new stages for cache warming and data preparation to ensure thread safety. Feedback was provided regarding outdated documentation in Resizer.hh that incorrectly attributes move accounting to journaling methods instead of the new MoveCommitter class.
| // Jounalling support for checkpointing and backing out changes | ||
| // during repair timing. | ||
| // - Update ECO move counts |
There was a problem hiding this comment.
The comment here seems to be outdated. The journal* methods in Resizer.cc now just call the eco* methods and do not handle move counting themselves. The move accounting logic has been moved to the MoveCommitter class. To avoid confusion, this comment should be updated to reflect that these are wrappers and that move counting is handled elsewhere.
References
- When analyzing code within a loop, consider the entire loop structure. Operations at the beginning of the loop (e.g., journalBegin) can affect the logic within the loop body, potentially making similar calls inside the loop redundant.
erendn
left a comment
There was a problem hiding this comment.
I made some initial comments. There can also be the CostDeltas struct that @QuantamHD had mentioned. It can help set the scoring criteria based on the current policy. We need to have some heuristics for each move to estimate delay/power/area impact.
| | MT support | Added common `utl::ThreadPool` and MT-capable policies/generators. | | ||
| | Prewarm stage | Added policy-level Liberty-cell and driver-cache warmup before target preparation. Required to complete lazy updates and ensure thread safety. | | ||
| | Prepare stage | Added per-target `ArcDelayState` caching for expensive STA-derived data before MT generation/estimation. Required to reduce redundant computations by multi-threads. | | ||
| | Scope | The new architecture targets `repair_setup` only. It can be extended to `repair_design`, `repair_hold`, and `recover_power` later. | |
There was a problem hiding this comment.
I agree. We can score candidates for hold/power/area and have separate policies for those.
| | Prepare stage | Added per-target `ArcDelayState` caching for expensive STA-derived data before MT generation/estimation. Required to reduce redundant computations by multi-threads. | | ||
| | Scope | The new architecture targets `repair_setup` only. It can be extended to `repair_design`, `repair_hold`, and `recover_power` later. | | ||
| | Compatibility | `SetupLegacyPolicy` is the default path and is kept close to legacy repair behavior. The QoR goal is ORFS design parity. | | ||
| | Target model | `Target` exposes path-driver and instance view bits so generators can share one target representation. New views will be added later. | |
There was a problem hiding this comment.
It makes sense to have this Target expose endpoint and driver paths.
|
|
||
|
|
||
| [OptPolicy::iterate()] | ||
|
|
There was a problem hiding this comment.
I think we can also have policies like:
CommonViolatorsPolicy: Collect violators based on how frequently they are encountered along violating paths and try to repair them.
BurstUpdatePolicy: Try multiple moves at a time and update STA together. If the target metric improves, commit all of them. Otherwise, revert and try those moves one by one.
GreedyUpsizePolicy: Do a greedy upsizing on violating gates, possibly pre-placement.
There was a problem hiding this comment.
Yeah. Sounds good.
We can make new policies easily by adding new OptPolicy classes.
In the legacy resizer, it already has multiple phases (e.g., WNS, TNS, STARTPOINT_FANOUT, ENDPOINT_FANIN, LAST_GASP, ...), which is conceptually the same as policies.
I think we can split the legacy phases out of the big SetupLegacyPolicy class and provide flexible mixing of old and new policies.
| MoveGenerator / MoveCandidate | ||
| Own move-local legality and ECO plan | ||
| Decides how to optimize | ||
| MoveGenerator.isApplicable(): Check if the move is applicable quickly |
There was a problem hiding this comment.
Why doesn't MoveCandidate have isApplicable() instead?
There was a problem hiding this comment.
This is the current logic.
MoveGeneratorhasisApplicable()method. Necessary filtering should be done at the generate stage.MoveGeneratorshould create applicableMoveCandidates. But some of theMoveCandidates can be found to be illegal (e.g., DRC violation, ...) afterestimate().
There was a problem hiding this comment.
I didn't understand what isApplicable() does exactly. Does it check if the target has a certain view mask the move requires? Can we move that function to that move's candidate class?
There was a problem hiding this comment.
isApplicable() is the place to apply various filters.
- Target view validation check is one of them.
- If a certain move supports post-CTS stage only, we can add such stage filter in that API.
- If we move
isApplicable()toMoveCandidate, multipleMoveCandidates may do the same filtering. So having it inMoveGeneratorlooks better to me. Please let me know if you have a case thatMoveCandidate::isApplicable()is better thanMoveGenerator::isApplicable(). Having the both APIs will make users confusing.
|
|
||
| protected: | ||
| void buildMoveGenerators(...); | ||
| void prepareTargets(std::vector<Target>& targets) const; |
There was a problem hiding this comment.
What does it mean to "prepare targets"?
There was a problem hiding this comment.
Prepare is a stage to cache necessary data to reduce redundant computation during estimation by multiple worker threads.
prepareTargets() annotates additional cached data for the efficient move candidate estimation.
Different move types may require different data to be cached.
But basically, this is burden caused by inconvenience of the process of retrieving state/data retrieval from STA in multi-threading environment.
If STA provides MT-friendly APIs for the data retrieval, Prepare stage can be eliminated in the future.
src/rsz/doc/architecture.md has an explanation about Prepare stage.
|
|
||
| | Role | Class | Responsibility | | ||
| |---|---|---| | ||
| | Generator | `MoveGenerator` derived class | Decide applicability and create candidate objects for one target. | |
There was a problem hiding this comment.
Do we really need a separate MoveGenerator for each move type?
There was a problem hiding this comment.
I don't understand what you mean.
Do you think that a single MoveGenerator should know how to make all MoveCandidates of different move types?
In the current architecture, MoveGenerator and MoveCandidate are tightly-coupled.
I think this makes adding a new move more difficult because two new tightly-coupled classes should be created.
Currently, there are many generator and candidate classes for multiple move types.
The number of moves can be reduced by
- Once a new move supporting MT is developed, the old ST (single-thread) move should be retired.
- Merging similar moves. E.g.,
VtSwap,SizeUpMoveandSizeUpMatchMove->SizeMove(option control to enable Vt swap up-sizing & down-sizing the target)SizeDownMove->ReduceLoadMove// Because it reduces loads of the target instead of the target itself.BufferMoveandSplitLoadMove(subset ofBufferMove) ->BufferMove
There was a problem hiding this comment.
My question is also related to MoveGenerator::isApplicable(). What if we have only one MoveGenerator class and MoveCandidate classes have the isApplicable() function instead? It felt a bit redundant to have two classes for each move type.
And I agree that we should rename SizeDownMove to ReduceLoadMove and then we can implement a SizeDownMove that actually only downsizes the target.
There was a problem hiding this comment.
It felt a bit redundant to have two classes for each move type.
Yeah. I don't like the two mandatory classes for each move type.
But let's imagine that one big MoveGenerator generates MoveCandidates for all move types. I think it will significantly hurt the code readability. Having multiple MoveGenerators would provide better readability.
There was a problem hiding this comment.
This split is directly from @QuantamHD original proposal. I think separating generation from result doesn't imply redundancy as they two classes have different responsibilities.
| b. Develop a score metric for fair comparison among different moves | ||
| c. Ensure thread safety of APIs | ||
| d. Better delay estimation | ||
| e. Better optimization policy |
There was a problem hiding this comment.
We can add new move types:
RelocateMove: Move a gate closer to its violating fanout gates
DecomposeMove: Break a complex logic gate into smaller gates
InvBufferMove: Replace a buffer with two inverters
VTSwapLeakageMove: Reduce VT of a non-violating gate
There was a problem hiding this comment.
VTSwapLeakageMove: Reduce VT of a non-violating gate
I think a sizing move for power recovery can include this.
|
|
||
| 1. Architecture | ||
| a. Apply review feedback | ||
| b. Consider moving `MoveTracker` out of `MoveCommitter` for detailed tracking |
There was a problem hiding this comment.
I think we can also have different violator collection modes:
PathMode: Collect the gates along the violating path
FaninMode: Collect the fanin cone of an endpoint
FanoutMode: Collect the fanout cone of a startpoint
IntersectionMode: Collect the common violators that intersect multiple violating paths
Definitely, we need it. e.g., struct Estimate {
bool legal{false}; // Feasible and DRC clean
float score; // Output of a scoring equation (configurable) w/ the following PPA metrics.
// Used for MoveCandidate sorting
// Per-move-type adjustment may be required for fair cross-move comparison
// Timing
float delay_impr{0.0f}; // Delay improvement of the focused arc/path. Manage the rise/fall sense data or use sum/avg?
float slack_impr{0.0f}; // Slack improvement may not be the same as slack improvement.
float side_path_delay_impr{0.0f}; // Sum of delay deltas of the side-paths
float side_path_slack_impr{0.0f}; // Sum of slack deltas of the side-paths
// Power
float total_pwr_impr{0.0f}; // Total power improvement
float leak_pwr_impr{0.0f}; // Leakage power improvement
float dyn_pwr_impr{0.0f}; // Dynamic power improvement.
// Area
float area_impr{0.0f};
}The attributes can be more (mainly for analysis) or less than this.
|
|
For scoring I think we should separate estimation of fields (like area, power, timing) from the combining to a single score. I think the policy should do the combining as different policies might have different goals (e.g. power recover vs setup fixing). The policy can tell the move estimator what fields it needs to avoid unnecessary computation. |
| } | ||
|
|
||
| const odb::Point drvr_loc = resizer_.dbNetwork()->location(drvr_pin); | ||
| candidates.push_back(std::make_unique<SplitLoadCandidate>( |
There was a problem hiding this comment.
We can build multiple SplitLoadCandidate here where each one has a different split (not necessarily 50/50).
There was a problem hiding this comment.
Yes. It is an easy way to make the SplitLoadMove* support MT.
BTW, it looks like SplitLoad move is conceptually a subset of Buffer move. Then it may be better to make BufferGenerator generates load-splitting candidates together and remove the SplitLoad move because it is a subset of buffering.
There was a problem hiding this comment.
I think they have different goals. Split load is about separating off non-critical loads where buffering is about speeding up critical loads.
There was a problem hiding this comment.
@maliberty one of the ways generic buffering speeds up critical loads is by separating off non-critical loads so I agree with Jaehyun's "it looks like SplitLoad move is conceptually a subset of Buffer move."
To some degree the buffering code today already separates off loads (in what it calls "topology rewriting") and it might be smarter about it than SplitLoad, i.e. it does not use a fixed 50/50 split and considers placement. We can test if keeping SplitLoad is providing a benefit, perhaps we can retire it
There was a problem hiding this comment.
So long as the unified approach can achieve the same transformations / QOR that's fine.
| } | ||
|
|
||
| const float arrival_impr = context.current_delay - candidate_delay; | ||
| return {.legal = arrival_impr > 0.0f, |
There was a problem hiding this comment.
SetupLegacyPolicy allows worsening timing to get out of local minima. Maybe we should leave the "legality" to be decided by each policy based on estimated metrics.
There was a problem hiding this comment.
Right.
The current code is an initial draft and needs further improvement through research.
As a fundamental infra for optimization, more feature enhancements are needed especially for the delay/slack estimation to predict the timing impact more accurately over a wider range.
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Remove the defensive candidate_cell nullptr guard from estimateWindow. The ArcDelayState estimation path already requires generated candidates to provide a valid Liberty cell, and estimateWindow relies on prepared valid state before candidate-specific checks. Treating a null candidate as kMissingCandidatePort hides a caller contract violation. Keep the existing missing scene-cell and candidate-port checks for real candidate compatibility failures. Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Cache the current total delay in ArcDelayState when the path-stage window is built. This avoids re-walking the stage window for every candidate estimate and keeps candidate scoring on prepared state. Also remove a small reporter-only containsPin helper by using std::find directly. Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Merge origin/master into the rsz2 architecture branch and resolve setup-repair policy/generator conflicts. Port the new repair_timing fixes into the rsz2 move structure, keep retired legacy move files deleted, and merge the updated rsz/dbSta regression registrations. Adjust ODB hierarchical moditerm handling so escaped Verilog port names containing the hierarchy delimiter pass the new master tests. Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Apply clang-format to the measured VT swap candidate reservation code so the branch passes the formatter used by CI. Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Describe the current repair_setup phase pipeline driven by repair_timing -phases. Document the DelayEstimator main-thread prepare and worker-safe estimate split, plus the DelayEstimatorReporter diagnostic flow. Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Reduce the synthetic fanout sizes for repair_design3, repair_design3_verbose, and repair_fanout6 so they can run in the default rsz CTest suite. Rebase affected golden logs, including repair_fanout6_multi because it sources repair_fanout6.tcl. Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Limit measured VT swap and arithmetic replacement regressions to smaller but still meaningful coverage points. This keeps the tests focused on the intended behavior while reducing CI wall time. Use a compact pre-placed DEF fixture for replace_arith_modules3 so the test no longer runs global placement. Rebase the affected golden logs after the runtime changes. Testing: ctest -R 'rsz\.measured_vt_swap\.tcl$'; ctest -R 'rsz\.replace_arith_modules[123]\.tcl$'; git diff --check. Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Mark the LEGACY_MT, MT1, and MEASURED_VT_SWAP setup policies as experimental and emit a warning when one is selected. The warning explicitly states that experimental policies should not be used for production. Rebase the affected rsz golden logs to include the new warning. Testing: cmake --build build --target openroad -j 16; ctest -R 'rsz\.(measured_vt_swap|repair_setup_legacy_mt|repair_setup_mt1|repair_setup_legacy_mt_tracker)\.tcl$'; git diff --check. Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Rename the setup optimization policy base class, files, and policy configuration type to use the full OptimizationPolicy name. This keeps policy ownership and behavior unchanged while making the class role clearer in the rearchitected repair setup flow. Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Resolve the resizer move architecture conflicts after the master merge. Port the resistance-aware reroute work into the generator/candidate architecture with a REROUTE setup phase policy. Keep the master swap-pin feedthrough net fix in Resizer::swapPins and add the corresponding C++ regression. Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Merge origin/master at 35443bc into secure-rsz2. Keep the generator/candidate REROUTE architecture while accepting the non-conflicting upstream EstimateParasitics and Tcl updates. Verified with openroad build and rsz reroute regressions. Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Apply the critical VT swap phase directly to collected instances instead of routing each instance through VtSwapGenerator. This matches the legacy master path more closely and avoids per-instance output slack scans during the hot apply loop. Keep move tracking support by resolving an output pin only when tracker detail logging needs it. AES TAT now matches master while preserving identical move counts, QoR, and final instance master maps. Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Restore the legacy repair_setup phase contract for custom phase lists. Legacy-compatible phase pipelines now append an implicit CRIT_VT_SWAP tail unless the user already requested that phase explicitly. Also restore the default phase string to LEGACY LAST_GASP so the public interface matches the pre-split behavior while keeping CRIT_VT_SWAP available as an explicit phase token. Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
|
@erendn @maliberty I updated the followings. Changes
|
Summary
repair_setuparoundOptimizer, setup policies, move generators, and move candidates.MoveCommitter, repair target collection, delay estimation helpers, and sharedutl::ThreadPoolsupport.legacy_mt,mt1,measured_vt_swap).src/rsz/doc/architecture.md.TODO
a. Apply review feedback
b. Consider moving
MoveTrackerout ofMoveCommitterfor detailed trackinga. Envar or command option
b. Merge optimization policies (new) and phases (old)
a. Enhance VtSwap & SizeUp moves w/ MT
b. Develop a score metric for fair comparison among different moves
c. Ensure thread safety of APIs
d. Better delay estimation
e. Better optimization policy
Testing
legacy_mtpolicy regression:ctest -R legacy_mtmt1policy regression:ctest -R mt1cmake --build build --target TestResizerMt -j8ctest --test-dir build --output-on-failure -j8100% tests passed, 0 tests failed out of 7721rsztests (for fasterrsztesting)rsz.repair_design3.tclrsz.repair_design3_verbose.tclrsz.repair_fanout6.tcl