A SPECULATIVE APPROXIMATE ADDER FOR ERROR RECOVERY UNIT

S.Sireesha1, Dr.T. Lalith Kumar2, T.Vijaya Nirmala3

1PG student, Dept. of ECE, Annamacharya Institute of Technology and Sciences, Kadapa, Andhra Pradesh, India
2Professor, Dept. of ECE, Annamacharya Institute of Technology and Sciences, Kadapa, Andhra Pradesh, India
3Asst. Professor, Dept. of ECE, Annamacharya Institute of Technology and Sciences, Kadapa, Andhra Pradesh, India

Abstract - In this paper, a low-delay block-based carry speculative approximate adder is proposed. Its design is based on partitioning the adder into non-overlapped summation blocks whose designs may be selected from both carry propagate and parallel-prefix adders. Here, the carry output of each block is speculated based on the input operands of the block itself and those of the next block. In this adder, the length of the carry chain is reduced to two blocks in the worst case, whereas in most cases only one block is employed to calculate the carry output, leading to a lower average delay. In addition, an error detection and recovery mechanism is used to reduce the output error rate while keeping the delay low.

Key Words: Approximate computing, Low delay, Speculative Adder.

I. INTRODUCTION

In current digital systems, one of the design constraints is the thermal design power (TDP), which could limit the performance of digital systems. One approach that may help to gain the most out of this constraint is the use of approximate computing. It may be used in application areas such as multimedia and image processing, digital signal processing, wireless communication, machine learning, and data mining, which are naturally error-resilient. The approach may be used to achieve higher energy reduction and/or performance at the cost of some accuracy loss. In recent years, different approximate computing approaches at various software and hardware levels have been proposed. Examples include thread fusion and tunable kernels, approximate accelerators, imprecise logic/arithmetic units, and approximate instruction set architectures (ISAs). In this work, we deal with approximate adders, which are utilized as the main operator in performing arithmetic operations such as subtraction, multiplication, and division. Approximate adders have received much attention from designers. In the state-of-the-art approximate adders, most of which are based on carry propagate designs, the energy and speed gains have been attained by hardware manipulation, logic simplification, and voltage overscaling. While some of the adders have a configurable output accuracy, others have a fixed accuracy level. The accuracy configurability imposes some overheads in terms of delay, area, and power, which could limit its usage in applications where such reconfigurability is not required. In this paper, we propose a high-performance, low-delay block-based carry speculative approximate adder structure, which is called the BCSA adder. In this design, the adder is partitioned into non-overlapped parallel blocks where, in the worst case, the carry output of a block depends on the carry output of the preceding block.
To reduce the critical path further, we suggest a technique to predict the carry output of a block based on its own input signals as well as those of the succeeding block. The structure has a low hardware complexity, leading to a low delay (on average, about one block), and a rather high quality. To reach a lower output error rate, an error detection and recovery mechanism is introduced. The effectiveness of this adder is compared with that of some state-of-the-art approximate adders. Finally, the efficiency of the adder is studied using two image processing applications. The rest of the paper is organized as follows. Section II briefly reviews related works. Section III describes the internal design of the proposed adder. Section IV presents the results and discussion, and Section V concludes the paper.

II. RELATED WORKS

In this section, some of the earlier works in the area of approximate adders are briefly reviewed.

A High Accuracy Block-based Approximate adder (HABA) was suggested in [9]. This adder proposed an error correction unit, which used the generate signal to decrease the accuracy loss. Although the power consumption and error rate were low, the critical path delay was still high due to its ripple carry propagation pattern. A reconfigurable approximate carry look-ahead adder (RAP-CLA), which is an approximate adder based on the exact carry look-ahead adder, was proposed in [6]. This adder was able to switch between the exact and approximate operating modes at runtime. In RAP-CLA, fixed-size overlapping sub-blocks (windows) were used to calculate the carry output and sum bits. In [13], an accuracy gracefully degrading adder (GDA) was proposed, which used \(\lceil n/2 \rceil\)-bit blocks to calculate the output. GDA utilized multiplexers between sub-blocks to select between the correct and approximate input carries. The proposed error correction unit is a sequential circuit which, depending on the number of sub-adders, takes a varying number of cycles to correct the outputs. In [13], an error-tolerant adder (ETA-I) was suggested, in which the add operation was divided into two independent parts such that the most significant sum bits were calculated by exact full adders (FAs) while the least significant sum bits were generated by XOR gates. For the least significant part, modified XOR gates were used. Although this adder reduced the power consumption, the error rate/distance was large.

III. INTERNAL DESIGN OF THE PROPOSED ADDER

The general design of an n-bit speculative approximate adder improved by a carry predictor unit is depicted in Fig. 1. The add operation is executed by \(\lceil n/l \rceil\) summation blocks working in parallel, where \(l\) is the bit length of each summation block. Each summation block comprises an \(l\)-bit sub-adder, a Carry Predictor unit, and a Select unit. In this structure, the carry input of the \(i\)-th sub-adder is chosen by the \((i-1)\)-th Select unit from the carry signal generated by the \((i-1)\)-th Carry Predictor unit and the one generated by the \((i-1)\)-th sub-adder. Choosing the carry output of the Carry Predictor unit leads to a shorter critical path and lower energy consumption. In this case, the carry chain between the blocks is cut at the cost of some accuracy loss. Thus, the accuracy of the add operation depends on the accuracy of the Carry Predictor unit and on the selection scheme of the carry output signal. In our proposed structure, the worst-case length of a carry chain is equal to two blocks (i.e., \(2l\) bits). In most of the state-of-the-art approximate adders, the carry input of each block is selected only on the basis of the input signals of the preceding blocks (see, e.g., [10][11]). In this work, however, we introduce a speculative approximate adder in which the carry output of the sub-adder of the \(i\)-th summation block is computed assuming that the carry input of the block is zero. Thus, in the worst case, the carry is propagated through two blocks (generated in the first bit position of the \(i\)-th block and propagated through the \((i+1)\)-th block). We propose to determine the select signal \((\text{Sel})\), the predicted carry \((C_{\text{prdt}})\), and the sub-adder carry \((C_{\text{add}})\) using

\[
\text{Sel}^i = K^{i+1} + G_i \quad \text{(8)}
\]

\[
C_{\text{prdt}} = G_i \quad \text{(9)}
\]

\[
C_{\text{add}} = G^{i}_{l-1} + P^{i}_{l-1}G^{i}_{l-2} + \ldots + P^{i}_{l-1}P^{i}_{l-2}\cdots P^{i}_{1}G^{i}_{0} \quad \text{(10)}
\]

where \(K^{i+1}\) is the kill signal of the first bit position of the \((i+1)\)-th block (i.e., \(\overline{a}\,\overline{b}\) at that bit position), \(G_i\) is the generate signal of the last bit position of the \(i\)-th block (i.e., \(a \cdot b\) at that bit position, so that \(G_i = G^{i}_{l-1}\)), and \(P^{i}_{j}\) and \(G^{i}_{j}\) are the propagate (\(a \oplus b\)) and generate (\(a \cdot b\)) signals of the \(j\)-th bit position of the \(i\)-th block. Based on (8)-(10), the design of the proposed structure is depicted in Fig. 2, where carry propagate and parallel-prefix adder designs (e.g., RCA and CLA) could be used for the sub-adders. Based on (8), the carry output of the \(i\)-th block is determined under four cases where, for each of them, either \(C_{\text{prdt}}\) or \(C_{\text{add}}\) is chosen. In the proposed approach, we focus on reducing the error as much as possible by selecting an accurate carry input for the next block. Therefore, in each of these cases, the carry output is selected with the highest possible accuracy. In the following, we discuss the idea behind the carry output selection for the four cases of \(K^{i+1}\) and \(G_i\).

- In the first case \((K^{i+1} = 0 \text{ and } G_i = 0)\), since \(G_i = 0\), the select unit chooses \(C_{\text{add}}\), whose error probability is smaller than that of \(C_{\text{prdt}}\), to reduce the probability of error propagation in the proposed adder.

Fig 2: The structure of the proposed adder with Error Recovery Unit (ERU).

- In the second case \((K^{i+1} = 0 \text{ and } G_i = 1)\), because \(G_i = 1\), the speculative carry \((C_{\text{prdt}})\) is correct. Therefore, in this case, \(C_{\text{prdt}}\) is selected as the carry output of the \(i\)-th block.

- In the third case \((K^{i+1} = 1 \text{ and } G_i = 0)\), the carry input of the \((i+1)\)-th block is killed at its first bit position. Hence, independent of the accuracy of this carry input, we suggest selecting \(C_{\text{prdt}}\) as the carry output of the \(i\)-th block to shorten the critical path.

- In the fourth case \((K^{i+1} = 1 \text{ and } G_i = 1)\), similar to the second case, since \(G_i = 1\), the predicted carry is correct. Therefore, in the proposed approach, \(C_{\text{prdt}}\) is selected as the carry output of the \(i\)-th block.
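The four cases above can be summarised as a small truth table. The following Python sketch is illustrative only: the function names `select_carry` and `carry_source` are hypothetical, and it assumes that the select signal is the logical OR of the kill and generate signals, as in (8), with the predicted carry equal to the generate signal, as in (9).

```python
def carry_source(k_next, g_last):
    """Name the signal chosen by the select unit (hypothetical helper).

    k_next: kill signal of the first bit of the next block (0 or 1)
    g_last: generate signal of the last bit of the current block (0 or 1)
    """
    sel = k_next | g_last            # Eq. (8): Sel = K + G
    return "C_prdt" if sel else "C_add"


def select_carry(k_next, g_last, c_add):
    """Return the carry passed on to the next block.

    c_add is the carry output of the sub-adder; c_prdt follows Eq. (9).
    """
    c_prdt = g_last                  # Eq. (9): predicted carry is the generate bit
    return c_prdt if (k_next | g_last) else c_add


if __name__ == "__main__":
    # Enumerate the four (K, G) cases discussed in the text.
    for k in (0, 1):
        for g in (0, 1):
            print(f"K={k} G={g} -> {carry_source(k, g)}")
```

Only the case with both signals at zero falls back to the sub-adder carry, which is why the carry chain spans two blocks in that case alone.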

Among these cases, only in the first case is the carry propagated through two blocks. Therefore, on average, the length of the carry propagation is close to one block.
In the third case, although the carry input of the $(i+1)^{th}$ block is killed and is not propagated, it is still employed to determine the first summation bit of that block. Therefore, if the carry input in this case is wrong, it affects the output accuracy of the summation. Hence, to improve the accuracy of the proposed adder, we suggest an error recovery unit which generates the first summation bit of the $(i+1)^{th}$ block ($S_0^{i+1}$) by

$$S_0^{i+1} = (K_0^{i+1} \cdot C_{add}^i) + (P_0^{i+1} \oplus C_{in}^{i+1}) \hspace{1cm} (11)$$

where $K_0^{i+1}$ and $P_0^{i+1}$ are the kill and propagate signals of the first bit position of the $(i+1)^{th}$ block. Note that $P_0^{i+1} \oplus C_{in}^{i+1}$ is the approximate summation output in the first bit of the $(i+1)^{th}$ block, denoted as $AS_0^{i+1}$ in Fig. 2. Since the ERU is not on the critical path of the adder, using the Error Recovery Unit (ERU) improves the accuracy without a significant increase in the delay of the proposed adder structure. Fig. 3 shows the functionality of the proposed speculative approximate adder with and without the ERU. The error is reduced by the error recovery unit. The ERU imposes overheads of only about 3% in delay and 2% in area.
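To make the block structure and the role of the ERU concrete, a bit-level behavioural sketch in Python is given here. It is not the synthesized design: the function name `bcsa_add` is hypothetical, the sub-adder carry $C_{add}$ is assumed to be computed with a zero carry input, the first sum bit of every block after the first is assumed to follow (11) with the kill, propagate, and XOR operators as usually defined, and the final carry-out is assumed to come directly from the last sub-adder.

```python
def bcsa_add(a, b, n=8, l=2, eru=True):
    """Behavioural sketch of an n-bit block-based carry speculative adder.

    Returns an (n+1)-bit integer: n sum bits plus the final carry-out.
    Assumptions (not confirmed by the paper): C_add uses a zero carry
    input, and the last block's carry-out comes from its sub-adder.
    """
    if n % l != 0:
        raise ValueError("this sketch assumes n is a multiple of l")
    mask = (1 << l) - 1
    num_blocks = n // l
    result = 0
    c_in = 0         # carry input of the current block (block 0 gets 0)
    c_add_prev = 0   # C_add of the previous block, used by the ERU
    for i in range(num_blocks):
        a_blk = (a >> (i * l)) & mask
        b_blk = (b >> (i * l)) & mask
        total = a_blk + b_blk + c_in
        s_blk = total & mask                      # approximate sum bits
        if eru and i > 0:
            k0 = 1 - ((a_blk | b_blk) & 1)        # kill signal of first bit
            s0 = (k0 & c_add_prev) | (s_blk & 1)  # Eq. (11)
            s_blk = (s_blk & ~1) | s0
        result |= s_blk << (i * l)
        if i == num_blocks - 1:
            result |= (total >> l) << n           # final carry-out
            break
        g_last = (a_blk >> (l - 1)) & (b_blk >> (l - 1)) & 1
        c_add = (a_blk + b_blk) >> l              # carry-out with zero carry-in
        a_nxt = (a >> ((i + 1) * l)) & mask
        b_nxt = (b >> ((i + 1) * l)) & mask
        k_next = 1 - ((a_nxt | b_nxt) & 1)
        sel = k_next | g_last                     # Eq. (8)
        c_in = g_last if sel else c_add           # Eq. (9) or sub-adder carry
        c_add_prev = c_add
    return result


if __name__ == "__main__":
    # 3 + 1 falls in the third case at the first block boundary.
    print(bcsa_add(3, 1, eru=False), bcsa_add(3, 1, eru=True), 3 + 1)
```

For the operands 3 and 1 with 2-bit blocks, the carry out of the first block is dropped by the select unit because the next block kills it, and the ERU restores the affected sum bit. Looping the function over all operand pairs gives a simple way to count mismatches against exact addition under these assumptions.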

IV. RESULTS AND DISCUSSION

A. Error Metrics Evaluation

As discussed earlier, the Select and Carry Predictor units determine the accuracy of the approximate adder. In this section, the accuracy of the proposed adder is evaluated in comparison with recent state-of-the-art approximate adders. The considered adders include RAP-CLA, HABA, HABA equipped with the ERU (HABAERU), and the BCSA. Note that the ERU circuit in HABAERU is different from the one we suggested for BCSAERU. For this study, three error metrics have been considered: Error Rate (ER), Normalized Mean Error Distance (NMED), and Mean Relative Error Distance (MRED). The NMED and MRED for an $n$-bit adder are obtained by

$$NMED = \frac{1}{[N]} \sum_{i=1}^{[N]} \frac{|S_i - S'_i|}{2^{n}} \hspace{1cm} (12)$$

$$MRED = \frac{1}{[N]} \sum_{i=1}^{[N]} \frac{|S_i - S'_i|}{S_i} \hspace{1cm} (13)$$

where $[N]$ is the number of input data samples, and $S_i$ ($S'_i$) is the exact (approximate) output of the add operation for the $i$-th sample. The ER, NMED, and MRED of the considered adders under various block sizes and input operand widths are reported in TABLE I. While only the error metrics for block sizes of 2, 4, and 8 are reported in this table, we have performed this study for both 8-bit and 16-bit adders.
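As an illustration of how the three metrics can be computed from paired exact and approximate outputs, a short Python sketch follows. The function name `error_metrics` is hypothetical. The sketch assumes that NMED is normalized by $2^n$ and that samples whose exact sum is zero are skipped in MRED to avoid division by zero; neither choice is stated in the text.

```python
def error_metrics(exact, approx, n):
    """Return (ER, NMED, MRED) for paired exact and approximate sums.

    exact, approx: equal-length sequences of non-negative integers
    n: operand width in bits (NMED is normalized by 2**n, an assumption)
    """
    if len(exact) != len(approx) or len(exact) == 0:
        raise ValueError("exact and approx must be non-empty and equal in length")
    count = len(exact)
    errors = [abs(s - sp) for s, sp in zip(exact, approx)]
    # ER: fraction of samples whose approximate output differs from the exact one.
    er = sum(1 for e in errors if e != 0) / count
    # NMED: mean error distance divided by the normalization constant.
    nmed = sum(errors) / (count * 2 ** n)
    # MRED: mean of |S - S'| / S, skipping samples with S == 0 (assumption).
    rel = [e / s for e, s in zip(errors, exact) if s != 0]
    mred = sum(rel) / len(rel) if rel else 0.0
    return er, nmed, mred


if __name__ == "__main__":
    er, nmed, mred = error_metrics([4, 8, 2, 6], [4, 6, 2, 5], n=3)
    print(er, nmed, mred)
```

The toy inputs in the usage example are made up for illustration and are not taken from the reported results.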

These metrics have been extracted by applying 65,536 (10 million) random input patterns in the case of the 8-bit (16-bit) adders. As the results show, in the case of the 8-bit adder, the ER, NMED, and MRED of the BCSAERU are 0, meaning that BCSAERU is error-free in this case. Among the considered adders, BCSAERU and HABA have the lowest ER. On average, the ER of the BCSA is about 80% larger than that of HABA. On the other hand, the NMED and MRED of the BCSAERU (BCSA) are, on average, about 87% (52%) and 86% (40%) smaller than those of the other studied adders. In addition, for all the adders, the accuracy increases as the block size increases.

B. Design Parameters Evaluation

The hardware of BCSA, BCSAERU, HABA, and RAP-CLA was described in Verilog HDL and synthesized with Synopsys Design Compiler. All the studies in this work have been performed using the typical process corner of the 15nm FinFET NanGate technology with an operating voltage of 0.8V and a temperature of 25°C. The design parameters of each adder under different block sizes have been reported in these figures, where the related block sizes are provided inside the circles. To extract the power consumption (and from it the energy consumption), up to 10M random stimuli were applied to the inputs of the netlists of the synthesized adders, and the activity of their internal nodes was logged in the VCD format.

<table>
<thead>
<tr>
<th>ADDER TYPE</th>
<th>BLOCK SIZE</th>
<th>T(n)</th>
<th>T(16-bit)</th>
</tr>
</thead>
<tbody>
<tr>
<td>XARA</td>
<td>2</td>
<td>1.37</td>
<td>1.36</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>1.37</td>
<td>1.36</td>
</tr>
<tr>
<td>RAP-CLA</td>
<td>2</td>
<td>2.34</td>
<td>2.46</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>2.34</td>
<td>2.46</td>
</tr>
<tr>
<td>BCSA</td>
<td>2</td>
<td>2.74</td>
<td>2.86</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>2.74</td>
<td>2.86</td>
</tr>
<tr>
<td>BCSA-ARU</td>
<td>2</td>
<td>3.14</td>
<td>3.26</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>3.14</td>
<td>3.26</td>
</tr>
</tbody>
</table>

TABLE I: Comparison of accuracy results for the 8-bit and 16-bit adders.

V. CONCLUSIONS

In this paper, we proposed a block-based carry speculative approximate adder (BCSA), which was based on dividing an exact adder into non-overlapped blocks operating in parallel. Each block may be composed of any desired type of adder. In this adder, the length of the carry chain was reduced to two blocks in the worst case, where in most cases only one block was utilized to estimate the carry. A select logic was suggested to determine the carry input of each block based on some input operand bits of the current and next blocks. In addition, to reduce the accuracy loss, an error detection and recovery mechanism was suggested. Based on the results, BCSAERU achieved lower NMED and MRED than the other studied approximate adders.

REFERENCES