

# **Design of High-Speed Multiplier using Parallel Prefix Adder**

N Venkateswara Rao<sup>1</sup>, B Navya Surya Ratnam<sup>2</sup>, T Divya<sup>3</sup>, V Vinay Chandra<sup>4</sup>

<sup>1</sup>MTech(PHD), Associate Professor, Malla Reddy institute of tech and sci, Telangana, India <sup>2,3,4</sup>Electronics & communication engineering, Malla Reddy institute of tech and sci, Telangana, India

**Abstract** - A multiplier is one of the key hardware blocks in most digital signal processing (DSP) systems. Circuit complexity depends mainly on the number of multiplications the method requires. As circuit complexity increases, the system delay also increases, which reduces performance. A parallel array multiplier is one way to satisfy high-speed requirements. A traditional Braun multiplier comprises an array of 16 AND gates and 12 Full Adders of 28 transistors each. A new Braun multiplier replaces the final stage of Full Adders with a Kogge-Stone adder (KSA) for faster multiplication. Three KSA designs are proposed using 12T, 14T, and 22T XOR gates. The traditional Braun multiplier and the Braun multiplier with KSA are designed in the Mentor Graphics tool for 180nm and 130nm technologies with supply voltages of 1.8V and 1.3V respectively.

Key Words: Digital Signal Processing (DSP), Full Adder, Braun Multiplier, Parallel Prefix Adder, Kogge-Stone Adder (KSA).

## **1. INTRODUCTION**

Advances in VLSI technology, in terms of both speed and size, have made it feasible to implement parallel multipliers in hardware. Technology development further ensures better performance characteristics and widespread use in DSP systems. A DSP carries out operations such as accumulating the sum of multiple products much faster than a common microprocessor. The DSP architecture is designed to perform parallel operations, which reduces computational complexity and increases the speed required for repetitive signal processing in such applications[1]. These features are designed for higher speed and throughput in the programmable DSP. There are a large number of programmable DSPs to choose from for a given application, based on factors such as speed, throughput, arithmetic capability, precision, scale, cost and power consumption[2]. The introduction of single-chip multipliers and their incorporation into microprocessor architecture is the most important reason why commercial VLSI chips capable of DSP functions are available[3]. Parallel prefix adders are established as among the most efficient binary adder circuits. Their regular structure and fast performance make them especially attractive for VLSI implementation[4]. With a hardware parallel multiplier, generating a product takes a single processor cycle. Popular alternative multiplication schemes are a software-based shift-and-add algorithm or a micro-coded controller that implements the same algorithm in hardware; both of these options need several processor cycles to complete a multiplication. The Kogge-Stone adder employs XOR and AND logic gates to build a parallel prefix adder[5][6]. In this work, a Braun multiplier with a Kogge-Stone adder is used to decrease the delay.

## 2. CONVENTIONAL BRAUN MULTIPLIER

A CMOS Braun multiplier consists of an array of AND gates and Full Adders. The 4 x 4 design uses 16 AND gates and 12 Full Adders (FA1 to FA12) with 28 transistors each.



Fig -1: Circuit diagram of 4 x 4 multiplier

Consider the multiplication of two unsigned numbers A and B. Let A be represented using 'm' bits ($A_{m-1} A_{m-2} \ldots A_0$) as in (1) and B using 'n' bits ($B_{n-1} B_{n-2} \ldots B_0$) as in (2). With multiplicand A and multiplier B, the product P is given by (3)

$$A = \sum_{i=0}^{m-1} A_i 2^i \qquad (1)$$

$$B = \sum_{j=0}^{n-1} B_j 2^j \qquad (2)$$

$$P = \sum_{i=0}^{m-1} \left[ \sum_{j=0}^{n-1} A_i B_j 2^{i+j} \right] \qquad (3)$$

The product P can have a maximum of (m+n) bits.
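As a quick check of the product expression, the following minimal Python sketch evaluates the double sum of bitwise AND terms for unsigned operands. The function names `to_bits` and `product_from_partial_products` are illustrative and do not come from the paper; each AND term corresponds to one AND gate of the array.

```python
def to_bits(value, width):
    """Return the bits of value, least significant bit first."""
    return [(value >> k) & 1 for k in range(width)]


def product_from_partial_products(a, b, m=4, n=4):
    """Evaluate P = sum_i sum_j (A_i AND B_j) * 2**(i + j) for unsigned a, b."""
    a_bits = to_bits(a, m)
    b_bits = to_bits(b, n)
    total = 0
    for i in range(m):
        for j in range(n):
            # One AND gate per (i, j) pair, weighted by 2**(i + j).
            total += (a_bits[i] & b_bits[j]) << (i + j)
    return total


if __name__ == "__main__":
    # Largest 4 x 4 case: the result still fits in m + n = 8 bits.
    p = product_from_partial_products(15, 15)
    print(p, p.bit_length())
```

For m = n = 4 the loop visits 16 (i, j) pairs, matching the 16 AND gates of the 4 x 4 array.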

e-ISSN: 2395-0056 p-ISSN: 2395-0072

## 3. SIMULATION RESULT OF CONVENTIONAL 4-BIT BRAUN MULTIPLIER

Fig. 2 shows the schematic of the 4-bit Braun multiplier with 4x4 input pins and 8 output pins. The test bench for the multiplier is shown in Fig. 3. For the inputs $(X_0 - X_3, Y_0 - Y_3)$, V<sub>pulse</sub> is set to 1.8V for 180nm technology and 1.3V for 130nm technology. The output pins are P<sub>0</sub> to P<sub>7</sub>. Fig. 4 shows the results of the traditional Braun multiplier. The traditional Braun multiplier consists of 432 transistors and has an overall delay of 535.72 ps.



Fig -2: Schematic of Braun multiplier





Fig -4: Output waveforms of conventional Braun multiplier

Table 1 lists the pulse width and time period applied to each input of the multiplier in the test bench.

Table -1: Inputs for Conventional Braun Multiplier

| Inputs | Pulse width (ns) | Time period (ns) |
|--------|--------------------|--------------------|
| X0     | 50                 | 20                 |
| X1     | 100                | 40                 |
| X2     | 150                | 60                 |
| X3     | 200                | 80                 |
| Y0     | 250                | 100                |
| Y1     | 300                | 120                |
| Y2     | 350                | 140                |
| Y3     | 400                | 160                |

## 4. KOGGE-STONE ADDER

The Kogge-Stone adder (KSA) is a parallel prefix adder. Its carry generation time is O(log n), which makes carry computation fast. The Braun multiplier, which is used in most modern DSPs, is redesigned here with a Kogge-Stone adder to reduce area and increase speed compared to the conventional Braun multiplier. In [6], a 4-bit Braun multiplier with KSA is implemented on an FPGA in Verilog HDL, which reduces delay. The last stage of the conventional design behaves like a ripple carry adder (RCA), and the delay of the whole multiplier depends mostly on the delay of the Full Adder array. The delay produced by the RCA can be reduced by using a parallel prefix adder such as the Kogge-Stone adder[4].

### A. Illustration of KSA using a 3-bit example

The inputs are A = 011 and B = 100.

a) I Step: Pre-processing

Computation of generate and propagate signals: Pi = Ai **XOR** Bi and Gi = Ai **AND** Bi

P3 = 0 XOR 1 = 1;

P2 = 1 XOR 0 = 1;

P1 = 1 XOR 0 = 1;

G3 = 0 AND 1 = 0;

G2 = 1 AND 0 = 0;

G1 = 1 AND 0 = 0;

b) II Step: Carry look-ahead network

Computation of carries corresponding to each bit: Pi:j = Pi:k+1 **AND** Pk:j and Gi:j = Gi:k+1 **OR** (Pi:k+1 **AND** Gk:j)

P3:2 = P3 AND P2; 1 AND 1 = 1;

G3:2 = G3 OR (P3 AND G2); 0 OR (1 AND 0) = 0;

P2:1 = P2 AND P1; 1 AND 1 = 1;

G2:1 = G2 OR (P2 AND G1); 0 OR (1 AND 0) = 0;

P3:1 = P3:2 AND P1; 1 AND 1= 1;

G3:1 = G3:2 OR (P3:2 AND G1); 0 OR (1 AND 0) = 0;

c) III Step: Post-processing

Computation of sum bits: Si = Pi **XOR** Ci-1

S1 = P1 XOR 0; 1 XOR 0 = 1;

S2 = P2 XOR G1; 1 XOR 0 = 1;

S3 = P3 XOR G2:1; 1 XOR 0 = 1;

Cout = G3:1 = 0;

**Sum Bits** : Cout S3 S2 S1 : 0111;
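The three steps above can be sketched in Python as follows. This is a behavioural illustration under stated assumptions rather than the authors' circuit: the function name `kogge_stone_add` is hypothetical, bits are indexed from 0 (so the paper's P1 is `p[0]`), and the carry-in is fixed at 0.

```python
def kogge_stone_add(a, b, width=3):
    """Add two unsigned integers with a Kogge-Stone prefix network (carry-in 0)."""
    # Step I: pre-processing, bitwise propagate and generate (index 0 = LSB).
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]

    # Step II: prefix network. After the level with span d, (gp[i], pp[i])
    # describe the bit group from i down to max(0, i - 2*d + 1).
    gp, pp = g[:], p[:]
    d = 1
    while d < width:
        new_g, new_p = gp[:], pp[:]
        for i in range(d, width):
            new_g[i] = gp[i] | (pp[i] & gp[i - d])
            new_p[i] = pp[i] & pp[i - d]
        gp, pp = new_g, new_p
        d *= 2

    # Step III: post-processing. The carry into bit i is the group
    # generate of bits i-1..0; the carry into bit 0 is 0.
    carries = [0] + gp[:-1]
    sum_bits = [p[i] ^ carries[i] for i in range(width)]
    cout = gp[-1]
    return sum(bit << i for i, bit in enumerate(sum_bits)) | (cout << width)


if __name__ == "__main__":
    # The worked example: A = 011, B = 100.
    print(format(kogge_stone_add(0b011, 0b100), "04b"))
```

The number of prefix levels grows with log2 of the word width, which is where the O(log n) carry time comes from.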

### B. 3-Bit Kogge-Stone Adder with 12T XOR gates

Fig. 5 shows a 3-bit CMOS Kogge-Stone adder. It is made of 6 XOR gates (6\*12 transistors), 10 AND gates and 5 OR gates, for a total of 162 transistors, as listed in Table 2.



Fig -5: Schematic of 3-bit KSA





Fig -6: Schematic of 12T XOR





### C. Kogge-Stone Adder with 14T XOR gates

Fig. 5 shows the 3-bit KSA used. Fig. 8 shows the 14T XOR gate design, and the KSA output results are shown in Fig. 9. The 14T KSA requires 6 XOR gates (6\*14 transistors), 10 AND gates and 5 OR gates.



Fig -9: Waveforms of KSA with 14T XOR gate

### D. Kogge-Stone Adder with 22T XOR gates

Fig. 10 shows the schematic of 22T XOR gate used in KSA. Fig. 11 shows the output waveforms of the 22T XOR gate KSA.






Fig -10: Schematic of 22T XOR gate



Fig -11: Waveforms of KSA with 22T XOR gate

### E. Comparison of Results of KSA

For the KSA with the 14T XOR design, the power requirement and the transistor count are lower than for the 22T XOR KSA design. Table 2 compares the 12T, 14T, and 22T KSA designs.

Table -2: Comparison Results of KSA

| KSA     | Delay (180nm) | Delay (130nm) | Transistor count |
|---------|---------------|---------------|------------------|
| 22T-XOR | 311.50ps      | 101.09ps      | 222              |
| 14T-XOR | 21.285ns      | 152.91ps      | 174              |
| 12T-XOR | 21.363ns      | 21.03ns       | 162              |
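The transistor counts in Table 2 can be reproduced from the gate counts given for the 3-bit KSA (6 XOR, 10 AND, 5 OR). The sketch below assumes 6 transistors per AND gate, as stated in the conclusion, and 6 transistors per OR gate, which is an assumption not stated in the paper (a static CMOS NOR followed by an inverter). The function name is illustrative.

```python
AND_TRANSISTORS = 6  # stated in the paper's conclusion
OR_TRANSISTORS = 6   # assumption: static CMOS OR (NOR plus inverter)


def ksa_transistor_count(xor_transistors, n_xor=6, n_and=10, n_or=5):
    """Total transistors of the 3-bit KSA for a given XOR gate size."""
    return (n_xor * xor_transistors
            + n_and * AND_TRANSISTORS
            + n_or * OR_TRANSISTORS)


if __name__ == "__main__":
    for name, xor_t in (("22T-XOR", 22), ("14T-XOR", 14), ("12T-XOR", 12)):
        print(name, ksa_transistor_count(xor_t))
```

Under these assumptions the AND and OR gates contribute a fixed 90 transistors, so the three designs differ only in the XOR contribution.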

## 5. SIMULATION RESULTS OF BRAUN MULTIPLIER WITH KOGGE-STONE ADDER

The proposed multiplier uses a 3-bit KSA in the 4th stage of the Braun multiplier. It is implemented using Mentor Graphics 180nm technology. In general, an n-bit Braun multiplier needs an (n-1)-bit KSA, so the 4-bit design needs a 3-bit KSA. The KSA is designed with 22T XOR, 14T XOR and 12T XOR gates. The circuit shown in Fig. 12 is the modified Braun multiplier with $4 \times 4$ input pins and 8 output pins. Of the 8 output pins, P<sub>4</sub>-P<sub>7</sub> contribute the most delay. Because the delay is mainly due to these bits, this stage is replaced with the KSA. Other stages could also be designed using KSA, but the area increases and system defects create problems for a couple of stages.
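A bit-level behavioural model helps show where the 3-bit KSA sits. The sketch below is an illustration, not the authors' schematic: it assumes the common textbook wiring of a Braun array (three carry-save rows of three Full Adders each, 9 in total), which may differ in detail from Fig. 12, and all function names are hypothetical. The final stage adds the last row's sums and carries with a 3-bit Kogge-Stone adder to produce P4 to P7.

```python
def full_adder(x, y, cin):
    """One-bit full adder; returns (sum, carry_out)."""
    return x ^ y ^ cin, (x & y) | (cin & (x ^ y))


def ksa3(x, y):
    """3-bit Kogge-Stone add of bit lists (LSB first); returns (sum_bits, cout)."""
    p = [xi ^ yi for xi, yi in zip(x, y)]
    g = [xi & yi for xi, yi in zip(x, y)]
    # Prefix level 1 (span 1).
    g1 = [g[0], g[1] | (p[1] & g[0]), g[2] | (p[2] & g[1])]
    p1 = [p[0], p[1] & p[0], p[2] & p[1]]
    # Prefix level 2 (span 2): only the top bit changes.
    g2_top = g1[2] | (p1[2] & g1[0])
    carries = [0, g1[0], g1[1]]
    return [p[i] ^ carries[i] for i in range(3)], g2_top


def braun_4x4_with_ksa(a, b):
    """Multiply two 4-bit unsigned integers with a Braun array plus final KSA."""
    A = [(a >> k) & 1 for k in range(4)]
    B = [(b >> k) & 1 for k in range(4)]

    def pp(i, j):
        return A[i] & B[j]  # one of the 16 AND gates

    product = [pp(0, 0)]                    # P0
    sums = [pp(1, 0), pp(2, 0), pp(3, 0)]   # inputs to the first adder row
    carries = [0, 0, 0]
    for row in (1, 2, 3):                   # three carry-save rows, 9 adders
        new_sums, new_carries = [], []
        for k in range(3):
            s, c = full_adder(sums[k], pp(k, row), carries[k])
            new_sums.append(s)
            new_carries.append(c)
        product.append(new_sums[0])         # P1, P2, P3
        sums = [new_sums[1], new_sums[2], pp(3, row)]
        carries = new_carries

    # Final stage: the 3-bit KSA replaces the ripple row of Full Adders.
    top_bits, cout = ksa3(sums, carries)
    product.extend(top_bits + [cout])       # P4, P5, P6, P7
    return sum(bit << k for k, bit in enumerate(product))


if __name__ == "__main__":
    print(braun_4x4_with_ksa(13, 11))
```

In this model only the final addition differs from the conventional design, which is why the change affects the delay of the upper product bits.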

The circuit shown in Fig. 13 is the test bench for the proposed Braun multiplier made using CMOS technology. The output of the Braun multiplier with KSA is shown in Fig. 14. Table 3 compares the conventional Braun multiplier with the new design.



Fig -12: Schematic of proposed Braun multiplier



Fig -13: Test bench of proposed Braun multiplier



Fig -14: Output of Proposed Braun multiplier with KSA of 12T XOR

| 4-bit Braun<br>multiplier                 | Delay<br>(180nm) | Delay (130nm) |
|-------------------------------------------|------------------|---------------|
| Conventional<br>Design                    | 535.72ps         | 221.32ps      |
| Proposed design<br>with KSA of 22T<br>XOR | 134.72ps         | 124.48ps      |
| Proposed design<br>with KSA of 14T<br>XOR | 226.63ps         | 124.48ps      |
| Proposed design<br>with KSA of 12T<br>XOR | 236.71ps         | 124.48ps      |

Table -3: Comparison of Conventional and Proposed designs of Braun Multiplier
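The delay savings implied by Table 3 can be tabulated with a few lines of Python. The delays below are copied from the table; the dictionary layout and function names are illustrative choices, and the percentage figure is a derived quantity not reported in the paper.

```python
# Delays in picoseconds, copied from Table 3: (180nm, 130nm).
DELAYS_PS = {
    "Conventional": (535.72, 221.32),
    "KSA 22T XOR": (134.72, 124.48),
    "KSA 14T XOR": (226.63, 124.48),
    "KSA 12T XOR": (236.71, 124.48),
}


def delay_reduction_ps(design, node_index):
    """Absolute delay saved versus the conventional design, in ps."""
    base = DELAYS_PS["Conventional"][node_index]
    return round(base - DELAYS_PS[design][node_index], 2)


def delay_reduction_percent(design, node_index):
    """Delay saved as a percentage of the conventional delay."""
    base = DELAYS_PS["Conventional"][node_index]
    return 100.0 * (base - DELAYS_PS[design][node_index]) / base


if __name__ == "__main__":
    for design in ("KSA 22T XOR", "KSA 14T XOR", "KSA 12T XOR"):
        for idx, node in enumerate(("180nm", "130nm")):
            print(design, node,
                  delay_reduction_ps(design, idx), "ps",
                  f"{delay_reduction_percent(design, idx):.1f}%")
```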

## **6. CONCLUSION**

A conventional Braun multiplier and a Braun multiplier with the new design are designed and compared in terms of delay. The conventional Braun multiplier is built from Full Adders of 28 transistors each and AND gates of 6 transistors each. The new design consists of 16 AND gates, 9 Full Adders, and one Kogge-Stone adder (KSA). The KSA is designed in three variants, with 22T, 14T, and 12T XOR gates. For 180nm technology, the delay reduction from the conventional Braun multiplier to the new design is 401ps with the 22T KSA, 309.09ps with the 14T KSA, and 299.01ps with the 12T KSA; for 130nm technology it is 96.84ps.

## REFERENCES

- [1] P. M. Kogge and H. S. Stone, "A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations," IEEE Transactions on Computers, Vol. C-22, No. 8, August 1973.
- [2] Avatar Singh, S. Srinivasan, "Digital signal processing", HANDBOOK, 2004 First edition.
- [3] Y. Choi, "Parallel Prefix Adder Design," Proc. 17th IEEE Symposium on Computer Arithmetic, pp. 90-98, 27th June, 2005.
- [4] K. Vitoroulis and A. J. Al-Khalili, "Performance of Parallel Prefix Adders Implemented with FPGA technology," IEEE Northeast Workshop on Circuits and Systems, pp. 498-501, Aug. 2007.
- [5] A. Raju and S. K. Sa, "Design and performance analysis of multipliers using Kogge Stone Adder," 2017 3rd International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT), Tumkur, 2017, pp. 94-99.
- [6] Ms. Madhu Thakur, Prof. Javed Ashraf, "Design of Braun Multiplier with Kogge Stone Adder & It's Implementation on FPGA", International Journal of Scientific & Engineering Research, Volume 3, Issue 10, October-2012.
- [7] V. Nafeez, M. V. Nikitha and M. P. Sunil, "A novel ultralow power and PDP 8T full adder design using bias voltage," 2017 2nd International Conference for Convergence in Technology (I2CT), Mumbai, 2017, pp. 1069-1073.
- [8] Dhyanendra Singh Chandel, Sachin Bandewar, Anand Kumar Singh, "Low Power 10T XOR based 1 Bit Full Adder," International Journal of Computer Applications (0975 – 8887), Volume 121 – No.1, July 2015.
- [9] S. Shaik, K. S. R. Krishna and R. Vaddi, "Tunnel transistors with circuit co-design in designing reliable logic gates for energy efficient computing," 2015 IEEE Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PrimeAsia), Hyderabad, 2015, pp. 83-88.