litepci asap7: hold structurally skew-bound (~2.2 ns CTS skew, ~11k violations) #155

## Context

From PR #154. With real wd_in-fixed FakeRAM and the corrected single-clock + generated-clock-feedthrough SDC, litepci-asap7 **closes setup** (reg-to-reg MET +0.758 ns at 3.6 ns) but **does not close hold**: ~2.2 ns CTS clock skew, ~11,000 hold violations.

## Root cause

The fixed LiteX RTL clocks **20 DMA/TLP FakeRAM macros** from a single small clock source — the `pcie_us/user_clk` blackbox-macro pin. Those 20 sink macros are **59% (util 60) – 74% (util 75)** of the core area. CTS must snake a deep buffer chain across the die around macro blockages:

- launch clock → one FakeRAM: 3 buffers, 0.21 ns
- capture clock → another FakeRAM: ~25 buffers, 2.94 ns
- Result: ~2.2–2.7 ns launch/capture skew (2.94 − 0.21 = 2.73 ns for the pair above), which drives the ~11k hold violations
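
To make the link between the latencies above and the hold failures concrete, here is a minimal sketch of a same-edge hold check. The two clock latencies are the figures quoted in the list; the data-path delay and hold requirement are hypothetical values chosen only for illustration, not numbers from the run.

```python
# Clock arrival times quoted in the list above (ns).
LAUNCH_LATENCY_NS = 0.21   # launch clock -> one FakeRAM, 3 buffers
CAPTURE_LATENCY_NS = 2.94  # capture clock -> another FakeRAM, ~25 buffers


def clock_skew(launch_ns, capture_ns):
    """Capture-minus-launch clock arrival difference."""
    return capture_ns - launch_ns


def hold_slack(launch_ns, capture_ns, data_delay_ns, hold_req_ns):
    """Same-edge hold check: data arrival time minus required time."""
    data_arrival = launch_ns + data_delay_ns
    required = capture_ns + hold_req_ns
    return data_arrival - required


skew = clock_skew(LAUNCH_LATENCY_NS, CAPTURE_LATENCY_NS)

# ASSUMPTION: 0.10 ns data delay and 0.02 ns hold requirement are
# hypothetical placeholders, not measured values.
slack = hold_slack(LAUNCH_LATENCY_NS, CAPTURE_LATENCY_NS, 0.10, 0.02)

print(f"skew = {skew:.2f} ns, hold slack = {slack:.2f} ns")
```

Under these assumptions the slack is negative by almost the full skew, so the data path would need roughly the skew's worth of extra delay before the check could pass.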

## In-scope levers tried (all fail)

- **Utilization** 60→75: die area −20% and setup TNS −75%, but skew only improves 2.7→2.2 ns and the hold-violation count stays flat, so skew is roughly die-size-independent.
- **Macro placement** — one tight block (overflows die height), balanced perimeter ring (overflows usable edge length), two-band + central clock stripe (bands consume the full die, no stripe). All fail the same area/perimeter wall: the macros are too dense to co-locate compactly with the single clock-source pin *and* leave room for a balanced central clock spine at any util meeting the area goal.
- **SDC** — single-clock + generated-clock-feedthrough fixes removed the spurious −2 ns WNS but do not touch the physical skew.
- `SKIP_INCREMENTAL_REPAIR=1` (ODB-1200 workaround) also disables post-GRT hold repair, but data-side hold buffering cannot absorb a ~2.2 ns structural skew anyway.
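
The utilization figures in this issue can be cross-checked with simple scaling. The sketch below assumes the total placed area (cells plus macros) is fixed and that core area is that total divided by the utilization target; that model is an assumption made here for illustration, not something taken from the flow's reports.

```python
def core_area_ratio(util_from, util_to):
    """Core area at util_to relative to util_from, for a fixed design area."""
    return util_from / util_to


def macro_share(share_at_from, util_from, util_to):
    """Macro share of core area after the utilization change.

    Macro area is constant, so its share scales inversely with core area.
    """
    return share_at_from / core_area_ratio(util_from, util_to)


# Figures quoted in this issue: util 60 -> 75, macros at 59% of core at util 60.
area_ratio = core_area_ratio(60, 75)
share_at_75 = macro_share(0.59, 60, 75)

print(f"core area ratio: {area_ratio:.2f}")
print(f"macro share at util 75: {share_at_75:.1%}")
```

Under that model the core shrinks to 0.80 of its original area and the macro share rises from 59% to about 74%, which matches the −20% die change and the 59%–74% range quoted in this issue.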

## Why not just fix it

Closing hold would require **modifying the RTL** (the clock buffer/tree a real SoC integrator would add around the PCIe hard-IP user clock), which the HighTide benchmark charter forbids (RTL is a fixed input). So asap7's current deliverable is: setup closes, hold characterized as structurally skew-bound — a legitimate benchmark result.

## Possible future directions (need discussion)

1. **Useful-skew / CTS clustering knobs** — investigate whether ORFS CTS can be told to build a balanced multi-root tree to the FakeRAM clk pins (e.g. `CTS_*` clustering, per-sink-group roots). Lower confidence given the physical spread, but not yet exhausted.
2. **Macro-pin clock source modeling** — the clock root being a blackbox-macro output pin may prevent CTS from rooting/buffering optimally; explore an SDC/`PRE_CTS_TCL` approach that gives CTS a better insertion point.
3. **Charter exception** — if a single inserted clock buffer/tree at the `pcie_us/user_clk` boundary is considered integration glue (not core RTL), a minimal netlist-level CTS hint could be permitted.
4. **Accept + document** (current state) — keep the characterization; revisit if OpenROAD CTS gains better macro-aware balancing upstream.

See `designs/src/litepci/DECISIONS.md` ("Per-platform PPA + the structural hold-skew limit") for the full analysis.
