Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm with Parallel Self-Time Adder

Sareddy. Sindhuja Reddy¹, CH. Bhanu Prakash²

¹Student, Master of Technology, Dept. of Electronics and Communication Engineering, Malla Reddy Engineering College (Autonomous), Telangana, India.
²Associate Professor, Dept. of Electronics and Communication Engineering, Malla Reddy Engineering College (Autonomous), Telangana, India.

Abstract- The multiplier is one of the most important components in DSP processors, Fast Fourier Transform units and arithmetic logic units. In this paper, a novel multiplier and accumulator (MAC) design based on the parallel self-timed adder (PASTA) is proposed. The modified Booth algorithm produces less delay than a regular multiplication process and also reduces the number of partial products. The main design goals are to reduce circuit complexity and power consumption without loss of information. We also present a CSA design from the conventional system (modified Booth algorithm), which exhibits high performance in terms of computation, power consumption, and area. The area, delay and power complexities of the resulting design are reported. The proposed MAC design with PASTA performs better than the conventional method and has the advantages of reduced area overhead and critical path delay. The designs are simulated and synthesized using Xilinx ISE.

Keywords: Multiplier and accumulator (MAC), modified Booth algorithm (MBA), carry-save adder (CSA).

I. INTRODUCTION

Rapid advances in microelectronics make it possible to use input energy efficiently to encode information and to transfer it at high speed. Many of these capabilities are developed with low power consumption in mind in order to meet the needs of popular applications. The multiplier is a fundamental arithmetic unit and is used widely in circuits. Convolution, filtering and inner products are the key operations of digital signal processing (DSP) that use the MAC. The discrete wavelet transform and the discrete cosine transform are widely used DSP methods. They are carried out mainly by repeated addition and multiplication, so the speed of these two operations determines the performance and speed of the entire calculation. The modified Booth algorithm (MBA) is commonly used for high-speed multiplication.

Power dissipation is recognized as a critical parameter in modern VLSI design. To keep pace with Moore's law and to deliver consumer electronics with longer battery backup and less weight, low-power VLSI design is essential.

Fast multipliers are essential parts of digital signal processing systems. The speed of the multiply operation is of great importance in digital signal processing as well as in today's general-purpose processors, especially since media processing took off. In the past, multiplication was generally implemented as a sequence of addition, subtraction, and shift operations. Multiplication can be considered as a series of repeated additions. The number to be added is the multiplicand, the number of times it is added is the multiplier, and the result is the product. Each step of addition generates a partial product. In most computers, the operands usually contain the same number of bits. When the operands are interpreted as integers, the product is generally twice the length of the operands in order to preserve the information content. The repeated-addition method suggested by the arithmetic definition is so slow that it is almost always replaced by an algorithm that makes use of positional representation. A multiplier can be decomposed into two parts. The first part is dedicated to the generation of partial products, and the second one collects and adds them.

The basic multiplication principle is twofold: evaluation of the partial products and accumulation of the shifted partial products. It is performed by successive additions of the columns of the shifted partial-product matrix. The multiplier is successively shifted and gates the appropriate bit of the multiplicand. The delayed, gated instances of the multiplicand must all be in the same column of the shifted partial-product matrix. They are then added to form the product bit for that column. Multiplication is therefore a multi-operand operation. To extend multiplication to both signed and unsigned numbers, a convenient number system is the two's complement representation.
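
The two parts described above (partial-product generation, then accumulation) can be illustrated with a minimal behavioural sketch. It is not the hardware design of this paper; the function name `shift_add_multiply` and the default 8-bit width are assumptions made only for the example, and the operands are treated as unsigned.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, width: int = 8) -> int:
    """Multiply two unsigned `width`-bit integers by summing shifted partial products."""
    if not (0 <= multiplicand < (1 << width) and 0 <= multiplier < (1 << width)):
        raise ValueError("operands must fit in `width` unsigned bits")
    product = 0
    for i in range(width):
        # The i-th multiplier bit gates the multiplicand.
        bit = (multiplier >> i) & 1
        # A gated multiplicand shifted by i positions is one partial product.
        partial_product = (multiplicand << i) if bit else 0
        # Accumulation of the shifted partial products.
        product += partial_product
    return product


if __name__ == "__main__":
    print(shift_add_multiply(13, 11))
```

With 8-bit operands the loop forms eight partial products, and the result needs up to 16 bits, which matches the statement that the product is twice the length of the operands.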

The MAC (multiplier and accumulator) unit is used for image processing and digital signal processing (DSP) in a DSP processor.

II. LITERATURE SURVEY

In an attempt to improve the speed of signal-processing VLSI systems, Fayed and Bayoumi proposed a new architecture for high-speed multiply-accumulate units. The structural design is based on binary trees constructed using 4-2 compressor circuits. The speed of operation is increased by taking advantage of the free input lines of the 4-2 compressors, which result from the parallelogram shape of the generated partial products, and using the bits of the accumulated value to fill these gaps. This results in merging the accumulation operation into the multiplication process. An 8-bit multiplier-accumulator circuit using the proposed architecture was prototyped in 0.35-micron double-metal CMOS technology and simulated using HSPICE. Simulation results at 3.3 V show that the design has a delay of 4.26 ns with a 16.8% delay saving. At a 150 MHz operating frequency, the power consumption is 324 milliwatts, a 23.04% power saving compared to other architectures that do not use the merging technique. (Ayman A. Fayed and Magdy A. Bayoumi, The Center for Advanced Computer Studies, University of Louisiana at Lafayette, 70504-4330, USA.)

Adders are the main parts of processing circuits and play a vital role in arithmetic operations such as subtraction, multiplication and division. The carry look-ahead adder (CLA) is one of the fastest adder structures and is widely used in processing circuits. Karami and Horestani proposed a new adder structure with a much smaller on-chip area and delay as well as lower power consumption. Using this structure, a 64-bit adder was designed and results were presented. The circuit was designed in TSMC 0.18 μm CMOS technology with a 1.8 V power supply and simulated with HSPICE. (Karami H. Fatemeh, Isfahan University of Technology, Isfahan 84156-83111, Iran; Ali K. Horestani, School of Electrical and Electronic Engineering, The University of Adelaide, Adelaide, SA 5005, Australia.)

III. CONVENTIONAL SYSTEM

Modified Booth Algorithm:

It is a powerful algorithm for signed-number multiplication, which treats both positive and negative numbers uniformly. Multiplication consists of three steps: 1) the first step generates the partial products; 2) the second step adds the generated partial products until only the last two rows remain; 3) the third step computes the final multiplication result by adding the last two rows.

The number of partial products is significantly reduced in the first step. We use the modified Booth encoding (MBE) scheme, which is known as the most efficient Booth encoding and decoding scheme. To multiply X by Y using the modified Booth algorithm, Y is first divided into overlapping groups of three bits, and each group is encoded into one of {-2, -1, 0, 1, 2}. Table I shows the rules for generating the encoded signals with the MBE scheme, and Fig. 1(a) shows the corresponding logic diagram. The Booth decoder generates the partial products from the encoded signals, as shown in Fig. 1(b).
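
As a rough illustration of the grouping step, the sketch below recodes a two's complement multiplier into radix-4 Booth digits. It uses the standard rule digit = -2·y(2i+1) + y(2i) + y(2i-1) with y(-1) = 0; the function name `booth_radix4_digits` and the 8-bit default width are assumptions for the example, and the code models only the digit values, not the encoder signals of Table I.

```python
from typing import List


def booth_radix4_digits(y: int, width: int = 8) -> List[int]:
    """Recode a two's complement `width`-bit multiplier into radix-4 Booth digits (LSB first)."""
    if width % 2 != 0:
        raise ValueError("width must be even")
    if not -(1 << (width - 1)) <= y < (1 << (width - 1)):
        raise ValueError("y must fit in `width` signed bits")
    y_bits = y & ((1 << width) - 1)  # two's complement bit pattern
    digits = []
    prev = 0  # implicit bit y(-1) to the right of the LSB
    for i in range(0, width, 2):
        b0 = (y_bits >> i) & 1
        b1 = (y_bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)  # always in {-2, -1, 0, 1, 2}
        prev = b1  # groups overlap by one bit
    return digits


if __name__ == "__main__":
    print(booth_radix4_digits(6))
    print(booth_radix4_digits(-3))
```

An 8-bit multiplier yields four digits, so only four partial products are needed instead of eight.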

Table I: TRUTH TABLE OF MBE SCHEME

<table>
<thead>
<tr>
<th>h1a</th>
<th>h1b</th>
<th>h2a</th>
<th>h2b</th>
<th>value</th>
<th>X1</th>
<th>X2</th>
<th>Z</th>
<th>Neg</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>-2</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>-1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>-1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

The new MBE recoder [2] is designed according to the following analysis. Table I gives the truth table of the new encoding scheme.

Fig. 1. The encoder and decoder for the new MBE scheme: (a) simple encoder, (b) decoder.

Fig. 2 shows the generated partial products and the sign-extension scheme of the 8-bit modified Booth multiplier. The partial products generated by the modified Booth algorithm are added in parallel using a Wallace tree until only the last two rows remain. The final multiplication result is generated by adding the last two rows.
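
The partial-product generation described here can be sketched behaviourally as follows. This is a simplified model under stated assumptions: the function name `booth_multiply` is hypothetical, Python's unbounded signed integers stand in for the sign-extension scheme, and a plain sum replaces the Wallace tree and the final two-row addition.

```python
from typing import List, Tuple


def booth_multiply(x: int, y: int, width: int = 8) -> Tuple[int, List[int]]:
    """Multiply signed x by signed `width`-bit y using radix-4 Booth partial products."""
    if width % 2 != 0:
        raise ValueError("width must be even")
    if not -(1 << (width - 1)) <= y < (1 << (width - 1)):
        raise ValueError("y must fit in `width` signed bits")
    y_bits = y & ((1 << width) - 1)
    partial_products = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, width, 2):
        b0 = (y_bits >> i) & 1
        b1 = (y_bits >> (i + 1)) & 1
        digit = -2 * b1 + b0 + prev  # Booth digit in {-2, -1, 0, 1, 2}
        # Each partial product is 0, +/-x or +/-2x, weighted by 4**(i/2).
        partial_products.append((digit * x) << i)
        prev = b1
    # Plain summation stands in for the Wallace tree and the final adder.
    return sum(partial_products), partial_products


if __name__ == "__main__":
    product, rows = booth_multiply(-7, 13)
    print(product, rows)
```

For 8-bit operands this produces four partial-product rows, half the number of a bit-by-bit multiplier.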

CSA DESIGN:

Carry Save adder.

In a carry-save adder (CSA), three bits are added in parallel at a time. In this scheme, the carry is not propagated through the stages. Instead, the carry is stored in the current stage and passed as an addend to the next stage. Hence, the delay due to carry propagation is reduced. The architecture of the CSA is shown in Fig. 3.
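
A minimal bit-level sketch of the carry-save idea is given below, assuming non-negative integer operands. The names `carry_save_add` and `csa_reduce` are illustrative only; the sketch shows that three operands are reduced to a sum word and a carry word with no carry propagation, and that propagation happens only once in the final addition.

```python
from typing import Iterable, Tuple


def carry_save_add(a: int, b: int, c: int) -> Tuple[int, int]:
    """Reduce three non-negative operands to a (sum word, carry word) pair."""
    sum_word = a ^ b ^ c  # per-bit sum, no carry propagation
    carry_word = ((a & b) | (b & c) | (a & c)) << 1  # per-bit carry, saved for the next stage
    return sum_word, carry_word


def csa_reduce(operands: Iterable[int]) -> int:
    """Add many operands: CSA stages until two rows remain, then one ordinary addition."""
    rows = list(operands)
    while len(rows) > 2:
        s, c = carry_save_add(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]  # three rows become two
    return sum(rows)  # the only carry-propagating addition


if __name__ == "__main__":
    print(carry_save_add(5, 6, 7))
    print(csa_reduce([5, 6, 7, 9, 12]))
```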

IV. PROPOSED SYSTEM

DESIGN OF PASTA

In this section, the architecture and theory behind PASTA are presented. The adder first accepts two input operands and performs a half-addition for each bit. Subsequently, it iterates using the previously generated carries and sums, performing half-additions repeatedly until all carry bits are consumed and settled at zero level.
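
The recursion described above can be sketched in a few lines, assuming non-negative integer operands. The function name `pasta_add` is an assumption for the example, and the model covers only the arithmetic recursion; the multiplexers, handshake signals and completion detection of the asynchronous circuit are not represented.

```python
from typing import Tuple


def pasta_add(a: int, b: int) -> Tuple[int, int]:
    """Add two non-negative integers by repeated half-additions.

    Returns (sum, number of iterations after the initial half-addition).
    """
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    # Initial phase: one half-addition per bit position.
    sum_bits = a ^ b
    carry_bits = (a & b) << 1  # each carry feeds the next higher bit
    iterations = 0
    # Iterative phase: half-add the sums and carries until every carry is zero.
    while carry_bits != 0:
        sum_bits, carry_bits = sum_bits ^ carry_bits, (sum_bits & carry_bits) << 1
        iterations += 1
    return sum_bits, iterations


if __name__ == "__main__":
    print(pasta_add(4, 3))  # no carries are generated
    print(pasta_add(5, 3))  # carries ripple through several iterations
```

The iteration count depends on the operands, which reflects the data-dependent completion time of a self-timed adder.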

A. Architecture of PASTA
The general architecture of the adder is shown in Fig. 7. The selection input of the two-input multiplexers corresponds to the Req handshake signal and is a single 0-to-1 transition denoted by SEL. The multiplexers initially select the actual operands while SEL = 0 and switch to the feedback/carry paths for subsequent iterations when SEL = 1. The feedback path from the half adders (HAs) lets the iterations continue until completion, when all carry signals are zero.

B. State Diagrams
In Fig. 8, two state diagrams are drawn for the initial phase and the iterative phase of the proposed architecture. Each state is represented by a (Ci+1 Si) pair, where Ci+1 and Si represent the carry-out and sum values, respectively, from the ith bit adder block. During the initial phase, the circuit works simply as a combinational half adder (HA) operating in fundamental mode. Because HAs are used instead of full adders (FAs), state (11) cannot appear.
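
The claim that state (11) cannot appear in the initial phase can be checked by enumerating a half adder's outputs. The short sketch below does this; the names `half_adder` and `reachable_states` are chosen only for the example.

```python
from typing import Set, Tuple


def half_adder(a: int, b: int) -> Tuple[int, int]:
    """Return the (carry-out, sum) pair of a one-bit half adder."""
    return a & b, a ^ b


def reachable_states() -> Set[Tuple[int, int]]:
    """Enumerate every (carry-out, sum) state a half adder can produce."""
    return {half_adder(a, b) for a in (0, 1) for b in (0, 1)}


if __name__ == "__main__":
    print(sorted(reachable_states()))
```

A carry-out of 1 requires both inputs to be 1, which forces the sum to 0, so the pair (1, 1) is never produced.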

V. SYNTHESIS AND SIMULATION RESULTS
The multiplier and accumulator is designed using the parallel self-timed adder. This section presents the synthesis and simulation of the MAC using the modified Booth algorithm with the parallel self-timed adder. The designs are implemented in Verilog HDL using Xilinx ISE 14.7. The RTL schematics and simulation results of the proposed design are shown below.
A comparison of the CSA and PASTA results is shown in Table III. The PASTA-based design uses 42 slices and 83 LUTs, compared with 95 slices and 170 LUTs for the CSA-based design, so PASTA gives the better result.

Table III Results

<table>
<thead>
<tr>
<th rowspan="2"></th>
<th colspan="2">AREA</th>
<th rowspan="2">DELAY (ns)</th>
</tr>
<tr>
<th>SLICES</th>
<th>LUTS</th>
</tr>
</thead>
<tbody>
<tr>
<td>CSA</td>
<td>95</td>
<td>170</td>
<td></td>
</tr>
<tr>
<td>PASTA</td>
<td>42</td>
<td>83</td>
<td></td>
</tr>
</tbody>
</table>

VI. CONCLUSION

In this paper, a new MAC design is proposed. The proposed PASTA-based design has a lower combinational path delay than the existing system. With PASTA, the area is reduced and the delay is also decreased, which indicates high performance. The proposed MAC design was simulated and synthesized using the Xilinx ISE tool. The design can be used effectively where high-speed operation is required, for example in DSP. The CSA and PASTA results were compared, and PASTA performs better.

REFERENCES


