litepci asap7: hold structurally skew-bound (~2.2 ns CTS skew, ~11k violations) #155

@mguthaus


Context

From PR #154. With real wd_in-fixed FakeRAM and the corrected single-clock + generated-clock-feedthrough SDC, litepci-asap7 closes setup (reg-to-reg MET +0.758 ns at 3.6 ns) but does not close hold: ~2.2 ns CTS clock skew, ~11,000 hold violations.

Root cause

The fixed LiteX RTL clocks 20 DMA/TLP FakeRAM macros from a single small clock source: the pcie_us/user_clk blackbox-macro pin. Those 20 sink macros occupy 59% of the core area at util 60 and 74% at util 75. CTS must snake a deep buffer chain across the die around the macro blockages:

  • launch clock → one FakeRAM: 3 buffers, 0.21 ns
  • capture clock → another FakeRAM: ~25 buffers, 2.94 ns
  • Result: ~2.2–2.7 ns launch/capture skew, which produces the ~11k hold violations
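
As a rough check on the two clock paths listed above, the sketch below computes the skew and a same-edge hold slack. The latencies and buffer counts are the ones quoted in this issue; the 0.30 ns data-path delay, the zero hold time, and the `ClockPath` helper are hypothetical values chosen only for illustration, not taken from any timing report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockPath:
    """Clock-tree branch from the clock root pin to one sink macro."""
    buffers: int        # buffer count on the branch
    latency_ns: float   # insertion delay from root to the sink clock pin


def skew_ns(launch: ClockPath, capture: ClockPath) -> float:
    """Capture latency minus launch latency; a positive value hurts hold."""
    return capture.latency_ns - launch.latency_ns


def hold_slack_ns(launch: ClockPath, capture: ClockPath,
                  data_delay_ns: float, hold_time_ns: float = 0.0) -> float:
    """Same-edge hold check: data arrival time minus required time."""
    arrival = launch.latency_ns + data_delay_ns
    required = capture.latency_ns + hold_time_ns
    return arrival - required


# Values quoted in the issue text.
launch = ClockPath(buffers=3, latency_ns=0.21)
capture = ClockPath(buffers=25, latency_ns=2.94)

skew = skew_ns(launch, capture)

# HYPOTHETICAL: 0.30 ns of clk-to-q plus data-path delay, zero hold time.
slack = hold_slack_ns(launch, capture, data_delay_ns=0.30)

print(f"skew       = {skew:.2f} ns")
print(f"hold slack = {slack:.2f} ns")
```

Under these assumptions, a path between the two macros only meets hold once its data-side delay is at least as large as the skew itself, which is why a short macro-to-macro path fails by roughly the full skew.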

In-scope levers tried (all fail)

  • Utilization 60→75: die −20%, setup TNS −75%, but skew only drops 2.7→2.2 ns and the hold-violation count stays flat, so skew is roughly independent of die size.
  • Macro placement: three arrangements tried. One tight block overflows the die height; a balanced perimeter ring overflows the usable edge length; two bands plus a central clock stripe leaves no room for the stripe because the bands consume the full die. All hit the same area/perimeter wall: at any util meeting the area goal, the macros are too dense to co-locate compactly with the single clock-source pin and still leave room for a balanced central clock spine.
  • SDC — single-clock + generated-clock-feedthrough fixes removed the spurious −2 ns WNS but do not touch the physical skew.
  • SKIP_INCREMENTAL_REPAIR=1 (ODB-1200 workaround) also disables post-GRT hold repair, but data-side hold buffering cannot absorb a ~2.2 ns structural skew anyway.
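
The numbers in the first and last bullets above can be sanity-checked with some back-of-envelope arithmetic. This sketch assumes the placed (cell plus macro) area stays fixed when utilization changes, so core area scales as 1/util, and it uses a hypothetical 20 ps delay per hold buffer; neither assumption comes from the flow logs, and the function names are illustrative only.

```python
def core_area_ratio(util_old: float, util_new: float) -> float:
    """New/old core-area ratio, ASSUMING the placed area is unchanged."""
    return util_old / util_new


def macro_fraction_after(frac_old: float, util_old: float,
                         util_new: float) -> float:
    """Macro share of core area after the core shrinks (macro area fixed)."""
    return frac_old / core_area_ratio(util_old, util_new)


def padding_buffers(skew_ps: int, buffer_delay_ps: int) -> int:
    """Delay buffers needed on one data path to cover the skew (ceiling)."""
    return -(-skew_ps // buffer_delay_ps)


# Utilization 60 -> 75, macro share 59% at util 60 (values from the issue).
ratio = core_area_ratio(0.60, 0.75)
frac = macro_fraction_after(0.59, 0.60, 0.75)

# HYPOTHETICAL: 20 ps of delay per inserted hold buffer.
BUFFER_DELAY_PS = 20
per_path = padding_buffers(skew_ps=2200, buffer_delay_ps=BUFFER_DELAY_PS)

print(f"core area ratio        = {ratio:.2f}")
print(f"macro share at util 75 = {frac:.3f}")
print(f"buffers per worst path = {per_path}")
```

Under the fixed-area assumption the ratio matches the reported −20% die change and the macro share lands close to the reported 74%. The per-path buffer count depends entirely on the assumed buffer delay, so it indicates only the order of magnitude of padding a worst-case path would need.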

Why not just fix it

Closing hold would require modifying the RTL (the clock buffer/tree a real SoC integrator would add around the PCIe hard-IP user clock), which the HighTide benchmark charter forbids (RTL is a fixed input). So asap7's current deliverable is: setup closes, hold characterized as structurally skew-bound — a legitimate benchmark result.

Possible future directions (need discussion)

  1. Useful-skew / CTS clustering knobs — investigate whether ORFS CTS can be told to build a balanced multi-root tree to the FakeRAM clk pins (e.g. CTS_* clustering, per-sink-group roots). Lower confidence given the physical spread, but not yet exhausted.
  2. Macro-pin clock source modeling — the clock root being a blackbox-macro output pin may prevent CTS from rooting/buffering optimally; explore an SDC/PRE_CTS_TCL approach that gives CTS a better insertion point.
  3. Charter exception — if a single inserted clock buffer/tree at the pcie_us/user_clk boundary is considered integration glue (not core RTL), a minimal netlist-level CTS hint could be permitted.
  4. Accept + document (current state) — keep the characterization; revisit if OpenROAD CTS gains better macro-aware balancing upstream.

See designs/src/litepci/DECISIONS.md ("Per-platform PPA + the structural hold-skew limit") for the full analysis.
