This repository contains an optimized hardware implementation of a Signed 8x8-bit Multiplier using Radix-4 Booth Encoding. The design features Sign Extension Elimination to reduce hardware complexity and includes input/output pipelining to ensure high-speed timing closure.
The design has been synthesized and verified in Xilinx Vivado, achieving a clock frequency of 100 MHz with significant timing slack (supporting theoretical speeds up to ~180 MHz).
- Radix-4 Booth Encoding: Reduces the partial product count from 8 to 4, halving the number of rows the adder tree has to sum.
- Signed Arithmetic: Natively handles 2's complement numbers.
- Sign Extension Elimination: Uses the "Inverse Sign" ($\bar{S}$) and "Hot 1" matrix trick to avoid full 16-bit sign extension for every row, saving logic resources.
- Pipelined Architecture: A 2-stage pipeline (Input Register $\to$ Logic $\to$ Output Register) ensures short critical paths.
- Verified Performance: Clean timing reports with zero violations.
The design is modular, separating the encoding logic from the summation logic.
Block Hierarchy:
- `Top`: The wrapper module handling clocking, reset, and I/O registration.
- `DFF`: Input pipeline stage to buffer the `A` and `B` operands.
- `Partial_Products`: Instantiates 4 parallel Booth Encoders to generate partial product rows.
- `Partial_Products_Adder`: An adder tree that sums them using the Sign Extension Elimination method.
Figure 1: Synthesis Hierarchy showing the modular structure.
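Instead of processing one bit at a time, the encoder examines 3 bits of the multiplier at a time (overlapping groups that advance by 2 bits) and selects one operation on the multiplicand per group: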
| Multiplier Bits | Operation | Description |
|---|---|---|
| 000, 111 | Zero | $0$ |
| 001, 010 | Add Multiplicand | $+M$ |
| 011 | Shift Left (x2) | $+2M$ |
| 100 | Shift Left, Invert, Add 1 | $-2M$ |
| 101, 110 | Invert, Add 1 | $-M$ |
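
The table can be exercised with a small behavioral model. This is a Python sketch for illustration only, not the repository's HDL; the function names are hypothetical, and it assumes the usual convention of an implicit 0 appended below the multiplier's least significant bit.

```python
# Radix-4 Booth recoding table: 3-bit group -> multiple of the multiplicand.
BOOTH_TABLE = {
    0b000: 0, 0b111: 0,     # Zero
    0b001: 1, 0b010: 1,     # Add multiplicand
    0b011: 2,               # Shift left (x2)
    0b100: -2,              # Shift left, invert, add 1
    0b101: -1, 0b110: -1,   # Invert, add 1
}


def booth_radix4_digits(multiplier, bits=8):
    """Return the Booth digits (each in -2..2), least significant first."""
    pattern = multiplier & ((1 << bits) - 1)   # two's-complement bit pattern
    padded = pattern << 1                      # implicit 0 below bit 0
    # Group i covers multiplier bits (2i+1, 2i, 2i-1); groups overlap by one bit.
    return [BOOTH_TABLE[(padded >> (2 * i)) & 0b111] for i in range(bits // 2)]


def booth_multiply(multiplicand, multiplier, bits=8):
    """Sum the partial products digit * multiplicand * 4**i."""
    digits = booth_radix4_digits(multiplier, bits)
    return sum(d * multiplicand * 4 ** i for i, d in enumerate(digits))
```

For example, `booth_radix4_digits(-2)` gives `[-2, 0, 0, 0]`, so `booth_multiply(10, -2)` needs only one non-zero partial product and returns `-20`.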
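Standard multiplication requires sign-extending every partial product to the final width (16 bits), which wastes adder bits. This design instead uses the optimization constants (the inverted sign bit $\bar{S}$ and the "Hot 1" bits) so that each row carries only a few extra bits rather than a full sign extension.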
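
One common formulation of this optimization can be checked numerically. The Python sketch below is an assumption about the general technique, not a transcription of this repository's matrix: each row is kept as a 9-bit pattern whose sign bit is replaced by its inverse, the "+1" of a negated row is added as a separate bit, and a single precomputed constant cancels the substitution. All names are hypothetical.

```python
MASK16 = 0xFFFF
BOOTH = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
         0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

# Replacing sign bit s by (1 - s) adds 2**8 to each row, shifted by 4**i.
# The correction subtracts that total once, modulo 2**16.
CORRECTION = (-(0x100 * (1 + 4 + 16 + 64))) & MASK16


def multiply_no_sign_extension(a, b):
    """Signed 8x8 multiply; returns the 16-bit two's-complement bit pattern."""
    padded = (b & 0xFF) << 1                  # implicit 0 below bit 0
    total = 0
    for i in range(4):
        digit = BOOTH[(padded >> (2 * i)) & 0b111]
        row = (abs(digit) * a) & 0x1FF        # 9-bit pattern of 0, M or 2M
        hot_one = 1 if digit < 0 else 0
        if hot_one:
            row ^= 0x1FF                      # invert only; the +1 is a separate bit
        sign = (row >> 8) & 1
        compact = (row & 0xFF) | ((sign ^ 1) << 8)   # inverted sign, no extension
        total += (compact + hot_one) << (2 * i)
    return (total + CORRECTION) & MASK16
```

In this sketch the correction constant works out to `0xAB00`; in hardware, such a constant is typically folded into fixed 1s in the dot diagram rather than added as a separate operand.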
Figure 2: Radix-4 Dot Diagram illustrating the partial product alignment and sign extension strategy.
The design was verified using a self-checking testbench covering corner cases (Max Positive, Max Negative, Zero).
Waveform Analysis:
The simulation confirms correct signed multiplication with a 2-cycle latency (Input Reg $\to$ Logic $\to$ Output Reg).
Figure 3: Post-Synthesis Verification. Note the correct handling of negative numbers (e.g., $10 \times -2 = -20$).
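
The 2-cycle latency can be reproduced with a cycle-level model. This Python sketch is illustrative only: the class and method names are hypothetical, reset is omitted, and the combinational stage is modeled with a plain multiplication.

```python
class PipelinedMultiplier:
    """Input register -> combinational multiply -> output register."""

    def __init__(self):
        self.a_reg = 0   # input register for operand A
        self.b_reg = 0   # input register for operand B
        self.p_reg = 0   # output register for the product

    def clock(self, a, b):
        """Apply one rising edge; return the registered output after the edge."""
        # Both stages update at the same edge from their pre-edge values.
        next_p = self.a_reg * self.b_reg
        self.a_reg, self.b_reg = a, b
        self.p_reg = next_p
        return self.p_reg


def run(stimulus):
    """Feed (a, b) pairs on successive edges and collect the outputs."""
    dut = PipelinedMultiplier()
    return [dut.clock(a, b) for a, b in stimulus]
```

Driving `run([(10, -2), (4, 5), (0, 0), (0, 0)])` yields `[0, -20, 20, 0]`: each product appears on the second clock edge after its operands are applied.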
Test Cases:
- $4 \times 5 = 20$
- $10 \times -2 = -20$ (Mixed Sign)
- $-10 \times -10 = 100$ (Negative $\times$ Negative)
- $127 \times 127 = 16129$ (Max Positive)
- $-128 \times 1 = -128$ (Max Negative)
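
When comparing these cases against a waveform displayed in hexadecimal, it helps to have the expected 16-bit two's-complement words at hand. The helper below is a Python sketch with hypothetical names; the hex radix is an assumption about how the waveform is viewed, not something stated in the report.

```python
# The five test cases listed above, as (A, B) operand pairs.
TEST_VECTORS = [(4, 5), (10, -2), (-10, -10), (127, 127), (-128, 1)]


def to_twos_complement_hex(value, bits=16):
    """Format a signed integer as a fixed-width two's-complement hex word."""
    return format(value & ((1 << bits) - 1), "0{}X".format(bits // 4))


def expected_outputs(vectors=TEST_VECTORS):
    """Return (a, b, signed product, 16-bit hex word) for each vector."""
    return [(a, b, a * b, to_twos_complement_hex(a * b)) for a, b in vectors]


if __name__ == "__main__":
    for a, b, product, word in expected_outputs():
        print("{:5d} x {:5d} = {:6d}  (0x{})".format(a, b, product, word))
```

For instance, the mixed-sign case $10 \times -2$ should appear on the 16-bit output as `FFEC`, and the max-negative case as `FF80`.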
The design is lightweight and timing-optimized.
The design meets the 100 MHz timing constraints with positive slack on every check.
- WNS (Worst Negative Slack): +4.565 ns
- WHS (Worst Hold Slack): +0.194 ns
- WPWS (Worst Pulse Width Slack): +4.500 ns
Figure 4: Vivado Timing Summary confirming zero violations.
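
The theoretical maximum frequency follows from the setup slack: the shortest period that still meets timing is the constrained period minus the WNS. The Python sketch below shows the arithmetic, assuming the 10 ns period implied by the 100 MHz constraint; the function name is hypothetical, and the result is a first-order estimate rather than a re-run of timing analysis.

```python
def max_frequency_mhz(clock_period_ns, wns_ns):
    """Estimate the maximum clock frequency from the constraint and the setup slack."""
    min_period_ns = clock_period_ns - wns_ns   # critical-path delay incl. setup
    if min_period_ns <= 0:
        raise ValueError("slack must be smaller than the clock period")
    return 1000.0 / min_period_ns              # period in ns -> frequency in MHz


if __name__ == "__main__":
    # 100 MHz constraint (10 ns) with WNS = +4.565 ns
    print(round(max_frequency_mhz(10.0, 4.565), 2))
```

With the reported values the minimum period is 5.435 ns, which corresponds to roughly 184 MHz and is consistent with the ~180 MHz figure quoted earlier.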
Estimated On-Chip Power is minimal (~0.087 W), making it suitable for low-power applications.
Figure 5: Power Estimation Report.