- Overview
- Architecture
- Pipeline Stages
- Hazard Detection Unit
- Forwarding Unit
- Supported Instructions
- Bubble Sort Case Study
- Testbench
- FPGA Implementation Results
A fully functional 5-stage pipelined RISC-V RV32I processor implemented in Verilog and synthesized on a Xilinx Artix-7 FPGA. The core implements the classic IF → ID → EX → MEM → WB pipeline with:
- Full data forwarding — EX/MEM and MEM/WB forwarding paths eliminate most data hazards with zero stall cycles
- Load-use hazard detection — automatic 1-cycle stall insertion for load-use RAW hazards
- Register file write-forwarding — same-cycle bypass closes the WB→ID hazard class entirely
- In-EX branch resolution — branches evaluated in the Execute stage with a 2-instruction flush on taken branches
- Complete RV32I subset — ADDI, ADD, SUB, SLT, LW, SW, BEQ, BNE with proper sign extension for all immediate types
The design was validated end-to-end using an in-place bubble sort benchmark ([5,3,8,1,9,2] → [1,2,3,5,8,9]) exercising every hazard class simultaneously — and optimized from an initial 47-instruction NOP-heavy program down to a 25-instruction zero-NOP inner loop.
5-stage pipeline with forwarding (blue) and hazard detection (blue) control paths
The architecture follows the classic Patterson & Hennessy 5-stage RISC pipeline. Forwarding paths (shown in blue) bypass data from the EX/MEM and MEM/WB pipeline registers back to the ALU input muxes in the EX stage. The Hazard Detection Unit monitors load-use and branch-taken conditions, generating PCWrite, IF_ID_Write, ID_EX_Flush, and IF_ID_Flush control signals.
%%{init: {"flowchart": {"curve": "step"}}}%%
flowchart LR
%% Branch MUX and PC
Target([branch_target]) --> MUX{MUX}
Adder[PC + 4] --> MUX
MUX --> PC[Program Counter]
%% Memory and Math
PC --> Mem[(Instruction Memory)]
PC --> Adder
%% IF/ID Pipeline Register
Mem --> IFID[IF / ID Register]
Adder --> IFID
PC --> IFID
%% To Next Stage
IFID --> ID_Stage([To ID Stage])
%% Clean Styling
style MUX fill:#aecee,stroke:#2c3e50
style PC fill:#8f4f8,stroke:#2980b9
style Mem fill:#8f8f5,stroke:#16a085
style Adder fill:#ef9e7,stroke:#39c12
style IFID fill:#8f4f8,stroke:#2980b9
The PC register (PC_hazard) is gated by PCWrite from the Hazard Detection Unit. A 2:1 mux selects between PC+4 (normal) and branch_target (on branch_taken from EX stage). The IF/ID pipeline register additionally carries the raw PC value forward for branch target computation, and can be independently flushed (IF_ID_Flush) on a taken branch.
Key module: IF_ID_hazard — supports three independent control inputs:
IF_ID_Write— freeze register on load-use stallIF_ID_Flush— zero register on branch taken (kills wrong-path instruction)reset— asynchronous clear
flowchart LR
IFID([IF/ID Register]) --> RF[Register File]
IFID --> IMM[Immediate Generator]
IFID --> CU[Control Unit]
IFID --> IDEX[ID/EX Register]
WB([WB Stage]) --> RF
RF --> IDEX
IMM --> IDEX
CU --> IDEX
HDU([Hazard Detection Unit]) --> IDEX
IDEX --> EX([To EX Stage])
The decode stage hosts the 32×32 register file (2 async read ports, 1 sync write port), the Immediate Generator (handles I, S, B, J type sign extension), and the Control Unit (generates 8-bit control word from 7-bit opcode).
The ID/EX register (ID_EX_hazard) carries: rs1_data, rs2_data, rs1, rs2, rd, imm, funct3, funct7, all control signals, Branch, and PC — 14 fields total, all zeroed on flush or reset for clean bubble insertion.
flowchart LR
IDEX([ ID/EX Register]):::blue --> EX
subgraph EX [" EX Stage "]
direction TB
FU(Forwarding Unit):::teal
ALU(ALU):::navy
BL(Branch Logic):::purple
end
FWD([ EX/MEM & MEM/WB]):::blue --> FU
FU -->|ForwardA / ForwardB| ALU
IDEX -->|rs1 rs2 imm\nfunct3 funct7\nALUOp ALUSrc| ALU
ALU -->|zero flag| BL
IDEX -->|Branch PC| BL
EX --> EXMEM([ EX/MEM Register]):::green
BL -->|branch_taken\nbranch_target| IF([ Back to IF Stage]):::orange
classDef blue fill:#980b9,color:#fff,stroke:#1a5276
classDef teal fill:#48f77,color:#fff,stroke:#0e6655
classDef navy fill:#b2a4a,color:#fff,stroke:#2c3e50
classDef purple fill:#d3c98,color:#fff,stroke:#6c3483
classDef green fill:#e8449,color:#fff,stroke:#196f3d
classDef orange fill:#35400,color:#fff,stroke:#ba4a00
The Execute stage is the most complex stage. It contains:
- Forwarding Unit — 2-bit
ForwardA/ForwardBselect which value feeds the ALU (see Forwarding Unit) - ALU Control — decodes
ALUOp[1:0]+funct7/funct3into 4-bit ALU opcode - ALU — performs ADD, SUB, AND, OR, SLT (signed) with zero flag for branch comparison
- Branch Logic — evaluates
Branch & (zero ^ funct3[0])to producebranch_taken - Branch target — combinationally computed as
ID_EX_pc + ID_EX_imm
Branch resolution in EX means only 2 instructions must be flushed on a taken branch (the instructions in IF and ID that followed the wrong path).
ALU Operations:
alu_ctrl |
Operation | Used by |
|---|---|---|
0000 |
AND | R-type AND |
0001 |
OR | R-type OR |
0010 |
ADD | R-type ADD, LW/SW address, ADDI |
0110 |
SUB | R-type SUB, BEQ comparison |
0111 |
SLT | R-type SLT (signed less-than) |
flowchart LR
EXMEM([ EX/MEM Register]):::blue --> MEM
subgraph MEM [" MEM Stage "]
direction TB
DMEM[(Data Memory)]:::navy
end
EXMEM -->|alu_result as address| DMEM
EXMEM -->|rs2_data as write data| DMEM
MEM --> MEMWB([MEM/WB Register]):::green
EXMEM -->|alu_result rd RegWrite MemtoReg| MEMWB
DMEM -->|mem_data_out| MEMWB
MEMWB --> WB([ WB Stage]):::orange
classDef blue fill:#980b9,color:#fff,stroke:#1a5276
classDef navy fill:#b2a4a,color:#fff,stroke:#2c3e50
classDef green fill:#e8449,color:#fff,stroke:#196f3d
classDef orange fill:#35400,color:#fff,stroke:#ba4a00
A 64-word word-addressed data memory (Data_Memory). Writes are synchronous (on posedge clk when MemWrite=1). Reads are combinational (MemRead gated). Address decoding uses read_address[7:2] — i.e., byte addresses are shifted right by 2 to get word addresses, naturally supporting the RISC-V word-aligned LW/SW convention.
Both instruction and data memories are implemented as distributed LUT RAM (RAMS64E primitives), consuming no BRAM tiles.
flowchart LR
MEMWB([ MEM/WB Register]):::blue
subgraph WB [" WB Stage "]
direction TB
MUX{Writeback MUX}:::purple
end
MEMWB -->|mem_data| MUX
MEMWB -->|alu_result| MUX
MEMWB -->|MemtoReg select| MUX
MUX -->|WB_WriteData| RF([ Register File]):::orange
MEMWB -->|rd RegWrite| RF
classDef blue fill:#980b9,color:#fff,stroke:#1a5276
classDef purple fill:#d3c98,color:#fff,stroke:#6c3483
classDef orange fill:#35400,color:#fff,stroke:#ba4a00
A simple 2:1 mux selects between memory read data (for LW) and ALU result (for R-type/ADDI). The result feeds back to the register file write port in the ID stage, and simultaneously to the forwarding unit as MEM_WB_WriteData.
The Hazard Detection Unit (Hazard_Detection_Unit) is a purely combinational module that monitors two hazard conditions each cycle and generates the appropriate pipeline control signals.
A load-use hazard occurs when a LW instruction in the EX stage writes a register that the very next instruction (currently in ID) needs to read. Since the data from memory isn't available until the end of the MEM stage, the pipeline must stall for one cycle.
Detection condition:
if (ID_EX_MemRead &&
((ID_EX_rd == IF_ID_rs1) || (ID_EX_rd == IF_ID_rs2)) &&
(ID_EX_rd != 0))Response — 3 simultaneous actions:
| Signal | Value | Effect |
|---|---|---|
PCWrite |
0 | Freeze PC — re-fetch same instruction next cycle |
IF_ID_Write |
0 | Freeze IF/ID — hold the instruction being decoded |
ID_EX_Flush |
1 | Insert bubble — zero all control signals in ID/EX |
This creates one pipeline bubble between the LW and its dependent instruction, giving the load result time to appear in MEM_WB_WriteData. The MEM/WB forwarding path (ForwardA/B = 01) then delivers the correct value to the ALU in the next cycle.
| Cycle | IF | ID | EX | MEM | WB |
|---|---|---|---|---|---|
| N | INSTR2 | INSTR1 | LW x7 |
— | — |
| N+1 |
INSTR2 🔒 | INSTR1 🔒 | BUBBLE |
LW x7 |
— |
| N+2 | INSTR3 | INSTR2 | INSTR1 | BUBBLE |
LW x7 ✅ |
🔒 Frozen —
PCWrite = 0,IF_ID_Write = 0
⚠️ Stall inserted —ID_EX_Flush = 1injects a bubble into EX✅ MEM/WB Forwarding fires —
ForwardA = 2'b01delivers load result to INSTR1 in EX
When branch_taken is asserted from the EX stage, two instructions fetched along the wrong path (one in IF/ID, one in ID/EX) must be killed.
Detection condition:
if (branch_taken) begin
IF_ID_Flush = 1'b1; // kill wrong-path instruction in IF/ID
ID_EX_Flush = 1'b1; // kill wrong-path instruction in ID/EX
endPCWrite and IF_ID_Write remain 1 so the correct branch target is fetched immediately. The 2-cycle branch penalty is unavoidable with EX-stage resolution but is far better than the 3-cycle penalty of MEM-stage resolution.
The Forwarding Unit (Forwarding_Unit) eliminates most RAW data hazards without stalls by routing results from later pipeline stages back to the ALU inputs. It generates two 2-bit control signals each cycle.
| Value | Source | Meaning |
|---|---|---|
2'b00 |
ID_EX_rs1/rs2_data |
No forwarding — use register file value |
2'b10 |
EX_MEM_alu_result |
EX hazard — forward from ALU output (1 cycle old) |
2'b01 |
MEM_WB_WriteData |
MEM hazard — forward from writeback (2 cycles old) |
EX forwarding takes priority over MEM forwarding — if the same register is being written by both EX/MEM and MEM/WB (rare but possible in a correctly-running program), the most recent value (EX/MEM) wins:
// EX hazard — highest priority
if (EX_MEM_RegWrite && (EX_MEM_rd != 0) && (EX_MEM_rd == ID_EX_rs1))
ForwardA = 2'b10;
// MEM hazard — only if EX is not also writing this register
if (MEM_WB_RegWrite && (MEM_WB_rd != 0) &&
!(EX_MEM_RegWrite && EX_MEM_rd != 0 && EX_MEM_rd == ID_EX_rs1) &&
(MEM_WB_rd == ID_EX_rs1))
ForwardA = 2'b01;The same logic applies symmetrically for ForwardB / ID_EX_rs2.
| Hazard | Example | Handled by |
|---|---|---|
| ALU → ALU (1 cycle apart) | ADD x1,x2,x3 then ADD x4,x1,x5 |
ForwardA/B = 10 |
| ALU → ALU (2 cycles apart) | ADD x1,... then NOP then ADD x4,x1,... |
ForwardA/B = 01 |
| LW → ALU (1 cycle apart) | LW x7,4(x5) then SLT x9,x7,x6 |
HDU stall + ForwardA = 01 |
| ALU → Branch | SUB x12,x20,x3 then BEQ x4,x12,... |
ForwardB = 10 |
| WB write → ID read (same cycle) | LW x6 in WB, SLT reading x6 in ID |
Register file write-forwarding |
The standard EX/MEM and MEM/WB forwarding paths cover hazards that are visible in the EX stage. But there is a fourth hazard class: when WB is writing a register at the same cycle that ID is reading it. Because the register file uses a non-blocking write (<=), the combinational read ports see the old value within the same timestep.
By the time the dependent instruction reaches EX (next cycle), the MEM_WB register has already advanced to the next instruction, so the forwarding unit has no record of the write. The fix is a combinational bypass inside Register_Bank:
// Same-cycle write-forwarding
assign read_data1 = (rs1 == 5'b0) ? 32'b0 :
(RegWrite && rd==rs1) ? Write_data :
Registers[rs1];
assign read_data2 = (rs2 == 5'b0) ? 32'b0 :
(RegWrite && rd==rs2) ? Write_data :
Registers[rs2];| Instruction | Opcode | ALUSrc | MemtoReg | RegWrite | MemRead | MemWrite | Branch | ALUOp |
|---|---|---|---|---|---|---|---|---|
| R-type (ADD/SUB/SLT/AND/OR) | 0110011 |
0 | 0 | 1 | 0 | 0 | 0 | 10 |
| LW | 0000011 |
1 | 1 | 1 | 1 | 0 | 0 | 00 |
| SW | 0100011 |
1 | — | 0 | 0 | 1 | 0 | 00 |
| BEQ/BNE | 1100011 |
0 | — | 0 | 0 | 0 | 1 | 01 |
| ADDI | 0010011 |
1 | 0 | 1 | 0 | 0 | 0 | 10 |
| Type | Format | Sign Extension |
|---|---|---|
| I-type (LW, ADDI) | inst[31:20] |
{20{inst[31]}, inst[31:20]} |
| S-type (SW) | inst[31:25] | inst[11:7] |
{20{inst[31]}, inst[31:25], inst[11:7]} |
| B-type (BEQ, BNE) | inst[31]|inst[7]|inst[30:25]|inst[11:8]|0 |
13-bit signed, bit 0 always 0 |
| J-type (JAL) | inst[31]|inst[19:12]|inst[20]|inst[30:21]|0 |
21-bit signed, bit 0 always 0 |
The bubble sort benchmark was chosen specifically because it exercises every hazard class simultaneously in a tight inner loop. Sorting [5, 3, 8, 1, 9, 2] → [1, 2, 3, 5, 8, 9].
| Register | Role |
|---|---|
x1 |
Array base address (0) |
x2 |
Array length n = 6 |
x3 |
Outer loop counter i |
x4 |
Inner loop counter j |
x5 |
Byte address of arr[j] = base + j×4 |
x6 |
arr[j] (loaded each iteration) |
x7 |
arr[j+1] (loaded each iteration) |
x8 |
Temporary register for swap |
x9 |
SLT result: 1 if swap needed, 0 otherwise |
x11 |
Word size constant (4) |
x12 |
Inner loop limit = outer_limit - i |
x20 |
Outer loop limit = n-1 = 5 |
; ── Initialization ──────────────────────────────────────────────────────
0x00: ADDI x1, x0, 0 ; base = 0 (array starts at mem[0])
0x04: ADDI x2, x0, 6 ; n = 6
0x08: ADDI x11, x0, 4 ; word_size = 4
0x0C: ADDI x20, x0, 5 ; outer_limit = n-1 = 5
0x10: ADDI x3, x0, 0 ; i = 0
; ── Outer Loop ──────────────────────────────────────────────────────────
OUTER:
0x14: BEQ x3, x20, DONE ; if i == 5, sorting complete
0x18: ADDI x4, x0, 0 ; j = 0
0x1C: SUB x12, x20, x3 ; inner_limit = 5 - i (shrinks each pass)
; ── Inner Loop ──────────────────────────────────────────────────────────
INNER:
0x20: BEQ x4, x12, OUTER_INC ; if j == inner_limit, next outer pass
; Address calculation: x5 = base + j*4
0x24: ADD x5, x0, x0 ; x5 = 0
0x28: ADD x5, x4, x4 ; x5 = j + j = 2j
0x2C: ADD x5, x5, x5 ; x5 = 4j
0x30: ADD x5, x5, x1 ; x5 = base + 4j → &arr[j]
; Load and compare
0x34: LW x6, 0(x5) ; x6 = arr[j]
0x38: LW x7, 4(x5) ; x7 = arr[j+1]
0x3C: SLT x9, x7, x6 ; x9 = (arr[j+1] < arr[j]) ? 1 : 0
; Conditional swap
0x40: BEQ x9, x0, NO_SWAP ; if x9==0, already in order
0x44: ADD x8, x6, x0 ; temp = arr[j]
0x48: SW x7, 0(x5) ; arr[j] = arr[j+1]
0x4C: SW x8, 4(x5) ; arr[j+1] = temp
NO_SWAP:
0x50: ADDI x4, x4, 1 ; j++
0x54: BEQ x0, x0, INNER ; unconditional branch back
OUTER_INC:
0x58: ADDI x3, x3, 1 ; i++
0x5C: BEQ x0, x0, OUTER ; unconditional branch back
DONE:
0x60: BEQ x0, x0, DONE ; halt (infinite self-loop)
0x64-0x74: NOP × 5 ; pipeline drainRISC-V has no multiply instruction in the base subset used here, so j×4 is computed using shifts-via-addition:
j×4 = j + j = 2j → 2j + 2j = 4j
This takes 3 ADD instructions (x5 = j+j, x5 = x5+x5, x5 = x5+base) but is entirely correct and data-hazard free since each ADD writes a different-enough register that forwarding handles all dependencies.
Every type of hazard the pipeline can encounter fires in the inner loop:
1. SUB → BEQ (ForwardB = 10, EX hazard)
0x1C: SUB x12, x20, x3 ← writes x12
0x20: BEQ x4, x12, ... ← reads x12 one cycle later
EX_MEM_rd = x12 matches ID_EX_rs2 = x12 → ForwardB = 2'b10 → BEQ uses EX_MEM_alu_result (the subtraction result) directly. Zero stall cycles.
2. LW x7 → SLT (HDU stall + MEM/WB forwarding)
0x38: LW x7, 4(x5) ← writes x7 (load — result from memory)
0x3C: SLT x9, x7, x6 ← reads x7 (rs1) immediately after
The Hazard Detection Unit fires: ID_EX_MemRead=1, ID_EX_rd=x7, IF_ID_rs1=x7 → 1 bubble inserted. After the stall, MEM_WB_rd=x7 and ForwardA=2'b01 delivers the loaded value from WB_WriteData. One stall cycle, correctly handled.
3. LW x6 → SLT (Register file write-forwarding)
0x34: LW x6, 0(x5) ← writes x6 (two instructions before SLT)
0x3C: SLT x9, x7, x6 ← reads x6 (rs2)
By the time SLT reaches ID, LW x6 is in WB writing Registers[x6]. The non-blocking write makes the combinational read return the old value. The standard forwarding unit cannot help because by the time SLT is in EX, MEM_WB_rd has moved on to x7. The register file write-forwarding bypass is the only mechanism that saves this case. Without it: SLT(arr[j+1]=3, arr[j]=0)=0 (wrong). With it: SLT(arr[j+1]=3, arr[j]=5)=1 (correct).
4. BEQ x0, x0 (unconditional branches — no data hazard, 2-cycle flush)
0x54: BEQ x0, x0, INNER ← always taken
0x58: BEQ x0, x0, OUTER ← always taken
No data hazard (x0 is hardwired 0), but the 2-instruction flush applies — the two instructions fetched after each branch are squashed when branch_taken=1.
| Version | Instructions | NOPs | Status |
|---|---|---|---|
| Initial working (NOP-padded) | 47 | 21 | Correct but inefficient |
| After forwarding analysis | 31 | 1 | 20 NOPs removed |
| After register file fix | 30 | 0 | Final — all hazards handled in hardware |
Each NOP removal was justified by tracing the exact pipeline timing and confirming that hardware forwarding or the HDU handles the hazard. The one NOP that remained between the two LW instructions was only removable after fixing the register file write-forwarding — a bug that only manifested on the very first loop iteration when registers held their reset value of 0.
bubble_sort_tb.v is a self-checking testbench with three independent check groups and a live execution monitor.
1. Assert reset for 4 cycles
2. Write DMEM directly: D_Memory[0..5] = {5, 3, 8, 1, 9, 2}
3. Release reset
4. Run for 2000 clock cycles (generous for 6-element sort)
5. Run check groups
6. Print summary and $finish
Watchdog: $finish after 500,000 ns to catch infinite loops
CHECK 1 — Sorted Array in Data Memory
Verifies all 6 memory words match expected sorted output:
PASS | mem[0] = 1 (exp 1) | arr[0]=1 (was 5)
PASS | mem[4] = 2 (exp 2) | arr[1]=2 (was 3)
PASS | mem[8] = 3 (exp 3) | arr[2]=3 (was 8)
PASS | mem[12] = 5 (exp 5) | arr[3]=5 (was 1)
PASS | mem[16] = 8 (exp 8) | arr[4]=8 (was 9)
PASS | mem[20] = 9 (exp 9) | arr[5]=9 (was 2)
CHECK 2 — Control Registers
Verifies loop termination state — confirms both loops ran the correct number of iterations:
PASS | x1 = 0 (exp 0) | array base, unchanged
PASS | x2 = 6 (exp 6) | n
PASS | x3 = 5 (exp 5) | i at loop exit = n-1
PASS | x11 = 4 (exp 4) | word size constant
PASS | x20 = 5 (exp 5) | outer limit, unchanged
CHECK 3 — SLT Result Validity
Confirms x9 (the SLT result used for swap decisions) is a clean 0 or 1, not X/Z. This would catch a broken forwarding path producing unknown values.
============================================================
RESULT: 12 PASSED 0 FAILED
>>> ALL PASSED — Bubble sort verified! <<<
============================================================
Target: Xilinx Artix-7 35T (xc7a35tcpg236-2) | Tool: Vivado 2025.1 | State: Fully Routed
| Parameter | Value | Notes |
|---|---|---|
| Clock Constraint | 100 MHz (10 ns) | User-defined |
| Timing Status | ✅ ALL MET | 0 failing paths |
| Setup Slack (WNS) | +0.906 ns | No setup violations |
| Hold Slack (WHS) | +0.092 ns | No hold violations |
| Failing Endpoints | 0 / 2,651 | Complete pass |
| Max Achievable Frequency | ~110 MHz | 1000 / (10 − 0.906) ns |
| Critical Path Delay | 9.094 ns | 12 logic levels |
Critical Path Breakdown:
The worst-case timing path runs through the branch resolution chain — the longest combinational path in any EX-stage-resolved branch design:
ID_EX_reg (rs2_out) → Forwarding Unit (ForwardB) → ALU input mux
→ ALU (32-bit subtraction + carry chain) → zero flag
→ Branch_Logic (branch_taken) → HDU (ID_EX_Flush)
→ ID_EX_reg (rs1_data_out D-input)
Logic delay: 2.608 ns (28.8%)
Routing delay: 6.446 ns (71.2%)
Total: 9.054 ns
The routing-dominated delay profile (71% routing vs 29% logic) is typical of a small, well-synthesized design on a sparsely-utilized Artix-7, where placement spread increases wire lengths.
| Resource | Used | Available | Utilization |
|---|---|---|---|
| Slice LUTs | 1,003 | 20,800 | 4.82% |
| LUT as Logic | 971 | 20,800 | 4.67% |
| LUT as Dist. RAM | 32 | 9,600 | 0.33% |
| Flip-Flops | 1,301 | 41,600 | 3.13% |
| FDCE (async reset) | 1,252 | — | — |
| FDRE (sync reset) | 49 | — | — |
| F7/F8 Muxes | 384 | 24,450 | 1.57% |
| CARRY4 | 27 | 8,150 | 0.33% |
| Block RAM | 0 | 50 | 0% |
| DSP Blocks | 0 | 90 | 0% |
| IOBs | 72 | 106 | 67.92% |
| BUFG (clock) | 1 | 32 | 3.13% |
Key observations:
- Only 5% of the Artix-7 35T is used — significant room for adding peripherals, extending the ISA, or instantiating multiple cores
- Zero BRAM — both instruction and data memories synthesize to distributed LUT RAM (
RAMS64Eprimitives). The 32RAMS64Ecells are the register file (one 64×1 bit cell per register bit, read via 256 MUXF7 + 128 MUXF8 cascade) - Zero DSP — all 32-bit arithmetic (add, subtract, SLT) synthesizes to LUT6 + CARRY4 chains
- 1,252 FDCEs are the pipeline registers — IF/ID, ID/EX, EX/MEM, MEM/WB all use asynchronous reset (FDCE), while the PC uses synchronous reset (FDRE)
| Component | Power (W) | % of Dynamic |
|---|---|---|
| Total On-Chip | 0.130 W | — |
| Dynamic | 0.059 W | 100% |
| I/O | 0.037 W | 63% |
| Signals | 0.012 W | 20% |
| Slice Logic | 0.006 W | 10% |
| Clocks | 0.005 W | 8% |
| Static (leakage) | 0.070 W | — |
Per-stage dynamic power:
| Stage | Power (W) | Dominant Consumer |
|---|---|---|
| ID_STAGE | 0.009 | Register file reads (0.008 W) |
| EX_STAGE | 0.005 | ALU + forwarding muxes |
| WB_STAGE | 0.004 | Writeback mux driving regfile |
| IF_STAGE | 0.003 | PC + IMEM + IF/ID register |
| MEM_STAGE | 0.001 | Distributed RAM access |
============================================================
Bubble Sort Testbench
Input: [5, 3, 8, 1, 9, 2]
Expected: [1, 2, 3, 5, 8, 9]
============================================================
[WB ] t=195ns x1 <= 0
[WB ] t=205ns x2 <= 6
[WB ] t=215ns x11 <= 4
[WB ] t=225ns x20 <= 5
[WB ] t=235ns x3 <= 0
[BRN] t=295ns branch_taken target=0x00000018
...
[BRN] t=... branch_taken target=0x00000060 ← halt loop
------------------------------------------------------------
CHECK 1: Data Memory (sorted array)
------------------------------------------------------------
PASS | mem[0] = 1 (exp 1) | arr[0]=1 (was 5)
PASS | mem[4] = 2 (exp 2) | arr[1]=2 (was 3)
PASS | mem[8] = 3 (exp 3) | arr[2]=3 (was 8)
PASS | mem[12] = 5 (exp 5) | arr[3]=5 (was 1)
PASS | mem[16] = 8 (exp 8) | arr[4]=8 (was 9)
PASS | mem[20] = 9 (exp 9) | arr[5]=9 (was 2)
------------------------------------------------------------
CHECK 2: Control Registers
------------------------------------------------------------
PASS | x1 = 0 (exp 0) | x1 = 0 (array base, unchanged)
PASS | x2 = 6 (exp 6) | x2 = 6 (n)
PASS | x3 = 5 (exp 5) | x3 = 5 (i at loop exit = n-1)
PASS | x11 = 4 (exp 4) | x11 = 4 (word size constant)
PASS | x20 = 5 (exp 5) | x20 = 5 (outer limit, unchanged)
------------------------------------------------------------
CHECK 3: x9 (SLT result) is 0 or 1
------------------------------------------------------------
PASS | x9 = 0 (valid)
============================================================
RESULT: 12 PASSED 0 FAILED
>>> ALL PASSED — Bubble sort verified! <<<
============================================================

