

**International Journal of Engineering & Technology** 

Website: www.sciencepubco.com/index.php/IJET

**Research Paper**



# Design of high speed low power optimized square root BK adder

Ranjith B Gowda<sup>1</sup>\*, R M Banakar<sup>1</sup>

<sup>1</sup> Department of Electronics & Communication Engineering, B.V.B. College of Engineering & Technology, Hubli, India. \*Corresponding author E-mail: ranjithgowda789@gmail.com

## Abstract

The adder is a basic building block in almost all digital circuits in use today. Adders are used for address calculation, increment operations, table index calculation and many other operations in digital processors. These operations require fast adders at a reasonable design cost. The Ripple Carry Adder (RCA) is the cheapest and most straightforward design but takes more computation time. For high speed applications the Carry Look-ahead Adder (CLA) is preferred, but it increases the total area of the design. The Carry Select Adder (CSA) is a compromise between these two with respect to area and power. Parallel prefix adders are used to obtain quick results. In this work, a new approach to the Modified Square Root Brent Kung adder (MSR-BK-A) is proposed to design an optimized adder, and performance parameters such as area, power and delay are calculated for square root adder designs. By optimizing the structure of the Binary-to-Excess-1 converter (BEC) and using it in the Square Root BK adder, the power and delay can be reduced with a trade-off in area. The simulation results show that the MSR-BK-A with the modified BEC gives better performance in terms of power and delay. These designs have been simulated, verified and synthesized using the Xilinx ISE 14.7 tool.

Keywords: Regular Linear Brent Kung Adder (RL-BK-A); Modified Linear Brent Kung Adder (M-RL-BK-A); Regular Square Root Brent Kung Adder (RSR-BK-A); Modified Square Root Brent Kung Adder (MSR-BK-A); New Optimized Square Root Brent Kung Adder (N-OSR-BK-A).

# 1. Introduction

Addition of binary numbers is one of the most commonly used operations in digital logic circuits. An adder is a combinational digital logic circuit designed to perform addition. In modern digital VLSI design, reducing the computation time of arithmetic operations is an area of interest for new digital signal processors and general purpose processors. Adders are not only part of the ALU; they are also used to generate table indices, calculate memory addresses and perform many similar operations. Designing a low power, high performance adder is a challenging task in VLSI system design. DSP processors execute operations such as filtering, signal processing, vector calculations and matrix reduction. These operations require intensive FFT computations, in which millions or billions of operations have to be performed per second using adders. Hence, the performance of the adder determines the overall performance of the architecture.

Designing a highly efficient adder is an active field of investigation. When high performance is necessary, parallel prefix adders are the best choice [1]. Due to their regular structure, they are commonly preferred for efficient adder designs in VLSI implementations. The delay of these adders is directly related to the number of stages used in the design. They are efficient binary adders and remain suitable for wide input data. Compared with RCA and CLA, parallel prefix adders use less silicon area and have a shorter path delay, which makes them suitable for many DSP applications where operational speed is the main concern. A CSA has been designed with parallel prefix adders to achieve a high performance adder design. In a CSA the result of the addition is computed in advance for both possible values of the input carry. Two separate adders perform the addition independently, one with an input carry of zero and the other with an input carry of one. When the actual input carry is known from the previous stage, one of the two precomputed results is selected using a multiplexer unit. Conventional CSAs [2] are designed using RCAs. In the regular linear BK design, a BK adder performs the addition for input carry = 0 and an RCA performs the addition for input carry = 1. As this architecture consumes more area, a BEC is introduced to take over the role of the RCA in the next design, called the M-RL-BK adder. Variable length BK adders with RCAs are then designed to increase performance [3] by reducing the delay and power consumption, giving RSR-BK adders. Replacing the variable length RCAs with BECs yields a new structure called the MSR-BK adder. In the proposed architecture, a modified BEC structure is used to further improve delay and power utilization with a small area trade-off.

The paper is organized as follows: Section 2 gives the basics of parallel prefix adders, Section 3 discusses Brent Kung adders and their variants, Section 4 describes the design of the proposed architecture, Section 5 gives the simulation and comparison results of square root adders, and Section 6 concludes.

Copyright © 2018 Ranjith B Gowda, R M Banakar. This is an open access article distributed under the <u>Creative Commons Attribution License</u>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

# 2. Concepts of parallel prefix adders

Parallel prefix adders [4] are the designer's choice for high speed addition as they have a flexible structure. There are many ways to obtain the structure of prefix adders, either using CLA [5] or using a tree structure [6], to increase arithmetic speed. In parallel prefix adders the addition is expressed in terms of the carry signal ( $c_i$ ), the carry generate signal ( $g_i$ ) and the carry propagate signal ( $p_i$ ) at each bit position [7]. These signals are related by the following equations:

 $g_i = a_i \text{ and } b_i$ 

 $p_i = a_i \text{ xor } b_i$ 

 $c_i = g_i \text{ or } (p_i \text{ and } c_{i-1})$ 

 $s_i = p_i \text{ xor } c_{i-1}$ 
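As a quick check of these relations, the following minimal Python sketch evaluates them one bit at a time. The function name `add_bitwise` and its parameters are illustrative and not part of the original work; the serial loop is only a reference model of the equations, not a parallel prefix implementation.

```python
from typing import Tuple


def add_bitwise(a: int, b: int, width: int, c_in: int = 0) -> Tuple[int, int]:
    """Evaluate g_i, p_i, c_i and s_i bit by bit; return (sum, carry_out)."""
    carry = c_in & 1          # plays the role of c_{i-1} for bit 0
    total = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        g_i = a_i & b_i                  # g_i = a_i and b_i
        p_i = a_i ^ b_i                  # p_i = a_i xor b_i
        s_i = p_i ^ carry                # s_i = p_i xor c_{i-1}
        carry = g_i | (p_i & carry)      # c_i = g_i or (p_i and c_{i-1})
        total |= s_i << i
    return total, carry
```

For example, `add_bitwise(9, 7, 4)` models a 4-bit addition whose result overflows into the carry output.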

Parallel prefix adders use three basic units:

- 1) Pre-processing unit
- 2) Carry generation unit
- 3) Post-processing unit

A parallel prefix adder computes the sum using these three units, as shown in Fig. 1.1.



Fig. 1.1: Structure of Parallel Prefix Adder.

#### 2.1. Pre-processing unit

In this first unit, the generate ( g<sub>i</sub> ) and propagate ( p<sub>i</sub> ) signals are evaluated using the following equations:

 $g_i = a_i \text{ and } b_i$ 

 $p_i = a_i \text{ xor } b_i$ 

#### 2.2. Carry generation unit

Here the carry corresponding to each bit is generated in parallel. This stage uses the previously computed generate and propagate signals to generate the carry signals.

#### 2.3. Post-processing unit

This is the last stage of the parallel prefix adder. It computes the final sum and carry bits of the result, which are given by

 $c_i = (p_i \text{ and } c_{i-1}) \text{ or } g_i$ 

 $s_i = p_i \text{ xor } c_{i-1}$ 
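The three units can be sketched in Python as follows. This is a behavioural illustration only: the carry generation unit is modelled with a Brent-Kung style up-sweep and down-sweep as one possible prefix network, the name `brent_kung_add` is hypothetical, and `width` is assumed to be a power of two.

```python
from typing import Tuple


def brent_kung_add(a: int, b: int, width: int = 8, c_in: int = 0) -> Tuple[int, int]:
    """Three-unit prefix adder model; width is assumed to be a power of two."""
    # Pre-processing unit: per-bit generate and propagate signals.
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]          # group signals, updated in place

    def combine(i: int, j: int) -> None:
        # Prefix operator: merge the group ending at i with the group ending at j.
        G[i] = G[i] | (P[i] & G[j])
        P[i] = P[i] & P[j]

    # Carry generation unit, up-sweep: spans of 2, 4, 8, ... bits.
    d = 1
    while d < width:
        for i in range(2 * d - 1, width, 2 * d):
            combine(i, i - d)
        d *= 2
    # Carry generation unit, down-sweep: fill in the remaining prefixes.
    d = width // 4
    while d >= 1:
        for i in range(3 * d - 1, width, 2 * d):
            combine(i, i - d)
        d //= 2
    c = [G[i] | (P[i] & c_in) for i in range(width)]   # c[i] = carry out of bit i

    # Post-processing unit: s_i = p_i xor c_{i-1}, with c_{-1} = c_in.
    total = 0
    for i in range(width):
        c_prev = c_in if i == 0 else c[i - 1]
        total |= (p[i] ^ c_prev) << i
    return total, c[width - 1]
```

The sum bits use the original per-bit `p` list, while the group lists `G` and `P` are overwritten by the prefix network, which is why the two are kept separate.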

# 3. Review of Brent-Kung adder

The Brent Kung adder is a commonly used adder when high speed operation is required. It achieves an optimal number of stages by using a logarithmic architecture. Because of this logarithmic architecture, the load on the intermediate stages is asymmetric. The BK adder uses a prefix tree structure in which the computation is based on the generate ( g<sub>i</sub> ) and propagate ( p<sub>i</sub> ) signals. The implementation cost and routing complexity of prefix adders are low. Since the logic depth of the BK adder is $O(\log_2 n)$ [9], its speed of operation is lower than that of CLA adders. There are many variants of BK adders; this paper discusses some of the commonly used BK adder architectures.

#### 3.1. Existing BK adders

A commonly used CSA consists of RCAs and a multiplexer stage that selects the desired output. BK adders are comparatively faster [10] than RCAs. The Regular Linear BK adder (RL-BK-A) has been proposed [11], in which BK adders add the input bits for  $c_{in}=0$  and RCAs add 1 to this result for  $c_{in}=1$ . The use of RCAs in this architecture has the disadvantage of increasing the total circuit area and delay. A modified architecture, called the Modified Linear BK adder (ML-BK-A), overcomes the disadvantages of the RL-BK-A by using a Binary-to-Excess-1 converter (BEC) instead of the RCA. With the ML-BK-A the delay and the total area of the adder structure can be reduced compared with the RL-BK-A. The linear BK adder structure still consumes more area and has a larger delay. To reduce area and minimize power, a new design is deployed that uses a square root structure for the BK adders.

#### 3.2. Existing BEC structure for modified BK adders

The Binary-to-Excess-1 converter is a combinational digital circuit that adds binary 1 to its input bits. The BEC performs the same operation as the RCA in this role but requires fewer logic gates, thereby reducing the total silicon area, power consumption and operational delay. It is also easier to implement than an RCA. Using this BEC in the linear structure or in the square root structure gives the ML-BK adder or the MSR-BK adder respectively. Fig. 1.2 shows the structure of a 4-bit BEC.



Fig. 1.2: Structural Representation of 4-Bit BE-1 Converter.
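The add-one behaviour of a 4-bit BEC can be sketched in Python as below. The Boolean expressions are those of the conventional BEC found in the carry select adder literature and are assumed here, since the gate-level figure is not reproduced in this text; the name `bec4` is illustrative.

```python
def bec4(b: int) -> int:
    """4-bit Binary-to-Excess-1 converter: returns (b + 1) modulo 16."""
    b0, b1, b2, b3 = [(b >> i) & 1 for i in range(4)]
    x0 = b0 ^ 1                  # NOT b0
    x1 = b0 ^ b1                 # b1 toggles when b0 is 1
    x2 = b2 ^ (b0 & b1)          # b2 toggles when b0 and b1 are both 1
    x3 = b3 ^ (b0 & b1 & b2)     # b3 toggles when b0, b1 and b2 are all 1
    return x0 | (x1 << 1) | (x2 << 2) | (x3 << 3)
```

Each output bit needs only one XOR (or NOT) and a small AND term, which is why the BEC uses fewer gates than a ripple carry adder of the same width. In a carry select group the converter is usually one bit wider than the group so that the carry out is incremented together with the sum bits.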

# 4. A new optimized square root Brent-Kung carry select adder

In the proposed approach, the BEC structure is modified to minimize the combinational path delay and power utilization. Instead of the XOR gates used in the existing BEC structure, a combination of XNOR, OR and NOT gates is used. The structure of the optimized BEC is shown in Fig. 1.3.



Fig. 1.3: Proposed Structure of BEC using XOR, OR and NOT gates.

Even though the proposed BEC structure contains more logic gates, it makes it possible to eliminate the multiplexer unit completely. In the existing architectures the input bits are added twice, once assuming an input carry of zero and once assuming an input carry of one. A BK adder adds the inputs for a carry input of zero, and a BEC produces the result for a carry input of one. Once the actual carry from the previous stage is known, the multiplexer unit selects the output of the BK adder or the output of the BEC, depending on whether the actual carry input is zero or one respectively. In these designs the multiplexer unit is required to select the desired output depending on the carry generated by the previous adder unit. The carry therefore propagates through each multiplexer unit, which increases the overall delay.
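The multiplexer-based selection used in the existing architectures can be illustrated with a small behavioural sketch. Python integer addition stands in for the BK adder and for the BEC here, and the function name `csa_group_with_mux` is hypothetical.

```python
from typing import Tuple


def csa_group_with_mux(a: int, b: int, width: int, c_in: int) -> Tuple[int, int]:
    """One carry select group of the existing style; returns (sum, carry_out)."""
    mask = (1 << width) - 1
    # BK adder result for an assumed carry input of 0 (sum bits plus carry bit).
    r0 = (a & mask) + (b & mask)
    # BEC result for an assumed carry input of 1: r0 incremented by one.
    r1 = r0 + 1
    # Multiplexer: the actual carry from the previous group selects the result.
    selected = r1 if c_in else r0
    return selected & mask, selected >> width
```

Both candidate results are available before `c_in` arrives, but the carry out of every group still has to pass through that group's multiplexer before it can drive the next one.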

In the proposed architecture, the modified BEC is designed to add 1 or 0 to the result of the BK adder when the carry input (the carry out of the previous stage) is 1 or 0 respectively, thereby eliminating the multiplexer stage. The carry propagates from one BEC unit to the next without passing through a multiplexer stage, which reduces the overall delay. Eliminating the multiplexer unit is also observed to reduce the power consumption of the final circuit. Since the proposed BEC structure needs more gates, the total area may increase. The new modified BEC structure therefore minimizes the overall delay and power utilization at the cost of a small area trade-off.

In the proposed architecture, the square root structure of the BK adder remains the same. A BK adder adds the two input operands for  $c_{in}=0$  and the modified BEC adds 1 to the result of the BK adder when  $c_{in}=1$ . In this work a 16-bit New Optimized Square Root BK adder (N-OSR-BK) is proposed. This architecture uses five groups. The first group contains only a 2-bit BK adder, and each of the remaining four groups consists of a BK adder and a modified BEC. With this new BEC structure the multiplexer stage can be removed, giving a high speed, low power structure.

The block structure of the 16-bit New Optimized Square Root BK adder is shown in Fig. 1.4.



Fig. 1.4: Block Representation of 16-Bit Optimized Square Root BK Adder.
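A behavioural sketch of the 16-bit structure may help to follow the data flow. The group widths (2, 2, 3, 4, 5) are an assumption, chosen as a common square root partition of 16 bits; the text only states that there are five groups and that the first is a 2-bit BK adder. The helper names are hypothetical, and the modified BEC is modelled by its function (add the incoming carry) and not by its XNOR/OR/NOT gate structure.

```python
from typing import Tuple

GROUP_WIDTHS = (2, 2, 3, 4, 5)   # assumed partition; the widths sum to 16


def bk_add_zero_carry(a: int, b: int, width: int) -> int:
    """Stand-in for a BK adder with c_in = 0; returns sum bits plus carry bit."""
    mask = (1 << width) - 1
    return (a & mask) + (b & mask)


def modified_bec(r0: int, c_in: int) -> int:
    """Behavioural model of the modified BEC: adds the incoming carry (0 or 1)."""
    return r0 + (c_in & 1)


def n_osr_bk_add(a: int, b: int, c_in: int = 0) -> Tuple[int, int]:
    """16-bit model of the multiplexer-free structure; returns (sum, carry_out)."""
    total, shift, carry = 0, 0, c_in & 1
    for index, width in enumerate(GROUP_WIDTHS):
        mask = (1 << width) - 1
        a_g = (a >> shift) & mask
        b_g = (b >> shift) & mask
        if index == 0:
            r = a_g + b_g + carry          # first group: 2-bit BK adder only
        else:
            r = modified_bec(bk_add_zero_carry(a_g, b_g, width), carry)
        total |= (r & mask) << shift
        carry = r >> width                 # carry goes straight to the next group
        shift += width
    return total, carry
```

In this model the carry passes from one modified BEC to the next without a selection step, which mirrors the carry path described for the proposed architecture.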

#### 4.1. High speed analysis

The main goal of this design is to enhance the operational speed, which requires the total delay of the circuit to be as small as possible. For combinational circuits such as adders, the maximum combinational path delay determines the speed of the circuit. The combinational path delay in turn depends on the propagation delay, or gate delay. The propagation delay of a circuit is the time between the input becoming stable and the output becoming stable and valid. Reducing the gate delay reduces the overall delay of the circuit and hence increases performance. Let  $t_{hl}$  be the time required for the signal to go from high to low and  $t_{lh}$  the time required for the signal to go from low to high. The propagation delay is then the average of these two, i.e.,

 $t_p = (t_{hl} + t_{lh})/2$ 

The terms  $t_{hl}$  and  $t_{lh}$  always refer to the output transition. When a path contains several logic gates, the propagation delays of all the gates on the path are added together to obtain the path delay. Note that different logic gates have different propagation delays. The maximum propagation delay is the largest path delay between a change at the input and the resulting change at the output. This path is commonly called the critical path, and its delay the critical path delay. The critical path delay limits the maximum operating speed of the circuit, so it must be reduced to enhance the execution speed of a digital circuit.
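The delay definitions above can be expressed as a short Python sketch. The gate timings are hypothetical placeholders chosen only to make the arithmetic easy to follow; they are not measured values from this work, and the function names are illustrative.

```python
from typing import Iterable, Sequence, Tuple

Gate = Tuple[float, float]   # (t_hl, t_lh) of one gate, in ns


def propagation_delay(t_hl: float, t_lh: float) -> float:
    """t_p = (t_hl + t_lh) / 2 for a single gate."""
    return (t_hl + t_lh) / 2.0


def path_delay(gates: Iterable[Gate]) -> float:
    """Sum of the propagation delays of all gates along one path."""
    return sum(propagation_delay(t_hl, t_lh) for t_hl, t_lh in gates)


def critical_path_delay(paths: Sequence[Sequence[Gate]]) -> float:
    """Largest path delay over all input-to-output paths."""
    return max(path_delay(path) for path in paths)


# Hypothetical gate timings (ns), for illustration only.
xor_gate: Gate = (0.50, 0.25)
and_gate: Gate = (0.25, 0.25)
example_paths = [
    [and_gate, xor_gate],
    [xor_gate, xor_gate, and_gate],
]
```

Calling `critical_path_delay(example_paths)` returns the delay of the slower of the two example paths, which is the quantity that bounds the operating speed.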

The critical path delay of the Modified Square Root BK adder is given by

 $T_{MSR-BK} = t_{BK1} + t_{BEC1} + t_{m1} + t_{m2} + t_{m3} + t_{m4}$ 

where  $t_{BK1}$  is the delay of the first BK adder,  $t_{BEC1}$  is the delay of the first BEC and  $t_{m1}$  to  $t_{m4}$  are the delays of multiplexers 1 to 4.

The critical path delay of the New Optimized Square Root BK adder is given by

 $T_{OSR-BK} = t_{BK1} + t_{MBEC1} + t_{MBEC2} + t_{MBEC3} + t_{MBEC4}$ 

where  $t_{BK1}$  is the delay of the first BK adder and  $t_{MBEC1}$  to  $t_{MBEC4}$  are the delays of modified BECs 1 to 4.

In the proposed architecture the multiplexer unit has been removed to reduce the critical path length and hence the combinational delay of the circuit. The existing BEC architecture requires 7 LUTs between the input and output buffers and has a total delay of 1.421 ns for the 16-bit adder. In the proposed BEC architecture there are 6 LUTs between the input and output buffers and the delay is 1.222 ns, a reduction of almost 0.2 ns for the 16-bit adder. As the number of stages increases, the critical path length increases for wider adders. It is concluded that the proposed method eliminates the multiplexer unit from the design and shortens the critical path, thereby improving the performance of the adder.

#### 4.2. Low power analysis

Designing low power circuits is another challenging task in VLSI systems. Transistor scaling increases the power density beyond the expected level. The power consumption of a VLSI circuit depends on the load capacitance (CL), the applied voltage and the switching frequency. Among the components of power dissipation, dynamic power dissipation is the main source. As the switching frequency or the capacitive load increases, the power consumption increases. Reducing the number of logic units or gates reduces the switching activity, so a circuit designed with a minimum number of logic units or gates can minimize the total circuit power.
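The dependence of dynamic power on load capacitance, applied voltage and switching frequency described in the low power analysis is commonly written with the standard first-order CMOS expression below. The symbols $\alpha$ (switching activity factor) and $V_{DD}$ (supply voltage) are notation introduced here for illustration and do not appear in the original text.

```latex
P_{dyn} = \alpha \, C_L \, V_{DD}^{2} \, f
```

Under this model, reducing the number of switching nodes lowers the effective $\alpha \, C_L$ product, which is consistent with the argument that a design with fewer active logic units consumes less dynamic power.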

# 5. Discussions on simulation results and comparisons

Various adder designs, namely the Regular Linear (RL) BK adder, Modified Linear (ML) BK adder, Regular Square Root BK adder, Modified Square Root BK adder and New Optimized Square Root BK adder, are studied, and the square root architectures are implemented. The ML-BK adder, MSR-BK adder and New OSR-BK adder are designed, simulated, synthesized and verified on a Spartan 6 device (XC6SLX45, CSG324 package) using the Xilinx ISE 14.7 tool. The main VLSI design constraints, area, power consumption and delay, are calculated for the square root architectures. The comparison shows that the proposed architecture has a higher speed and lower power consumption than the other architectures. The results are tabulated in Table 1.1.

Table 1.1: Simulated Results for Three Square Root Adders

| Adders                             | Delay (ns) | Area (LUTs) | Power (mW) |
|------------------------------------|------------|-------------|------------|
| Regular Square Root BK Adder       | 11.538     | 34          | 89         |
| Modified Square Root BK Adder      | 11.378     | 37          | 89         |
| New Optimized Square Root BK Adder | 10.802     | 37          | 87         |
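The relative improvements implied by Table 1.1 can be computed with a short Python sketch. The numbers are copied from the table; the dictionary keys and the helper `reduction_pct` are illustrative names.

```python
# (delay in ns, area in LUTs, power in mW), copied from Table 1.1
RESULTS = {
    "RSR-BK": (11.538, 34, 89),
    "MSR-BK": (11.378, 37, 89),
    "N-OSR-BK": (10.802, 37, 87),
}


def reduction_pct(baseline: float, new: float) -> float:
    """Percentage reduction of `new` relative to `baseline` (negative = increase)."""
    return 100.0 * (baseline - new) / baseline


if __name__ == "__main__":
    new_delay, new_area, new_power = RESULTS["N-OSR-BK"]
    for name in ("RSR-BK", "MSR-BK"):
        delay, area, power = RESULTS[name]
        print(
            f"vs {name}: delay {reduction_pct(delay, new_delay):.2f} %, "
            f"area {reduction_pct(area, new_area):.2f} %, "
            f"power {reduction_pct(power, new_power):.2f} %"
        )
```

Relative to the MSR-BK adder, the proposed adder reduces delay by about 5.1 % and power by about 2.2 % at the same LUT count; relative to the RSR-BK adder, the delay reduction is about 6.4 % with three additional LUTs.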

The adder is the basic block of all designs in digital signal processing applications. Any optimization made to this basic block carries over as a performance benefit to the computational blocks built from it.

# 6. Conclusion

In this work, a new approach to the Optimized Square Root BK adder is proposed. It has been designed using a combination of a BK adder and a modified Binary-to-Excess-1 converter instead of an RCA or the existing BEC structure. A parallel prefix adder, the BK tree adder, is used because adders of this type have low delay and reduced power. The RSR-BK adder, the MSR-BK adder and the proposed architecture are designed for 16-bit input data. VLSI design constraints such as delay, area and power consumption are calculated for the square root adder structures. From these results, it is concluded that the proposed architecture is better suited than the other architectures when power and delay are the main design constraints for a given application, at the cost of a small increase in area. The proposed adder architecture can further be extended to wider input data.

# References

- [1] G. Sivannarayana, R. babu Maddasani, and P. Ch, "Design and implementation of carry tree adders using low power fpgas," International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), vol. 1, no. 7, pp. Pp–295, 2012.
- [2] S. Parmar and K. P. Singh, "Design of high speed hybrid carry select adder," in Advance Computing Conference (IACC), 2013 IEEE 3rd International, pp. 1656–1663, IEEE, 2013.
- [3] Y. He, C.-H. Chang, and J. Gu, "An area efficient 64-bit square root carry-select adder for low power applications," in Circuits and Systems, 2005. ISCAS 2005. IEEE International Symposium on, pp. 4082–4085, IEEE, 2005.
- [4] M. Snir, "Depth-size trade-offs for parallel prefix computation," Journal of Algorithms, vol. 7, no. 2, pp. 185–201, 1986.
- [5] D. J. Jackson and S. J. Hannah, "Modelling and comparison of adder designs with verilog hdl," in System Theory, 1993. Proceedings SSST'93. Twenty-Fifth Southeastern Symposium on, pp. 406–410, IEEE, 1993.
- [6] B. W. Y. Wei and C. D. Thompson, "Area-time optimal adder design," IEEE transactions on Computers, vol. 39, no. 5, pp. 666–675, 1990.

- [7] H. Zhu, C.-K. Cheng, and R. Graham, "Constructing zero-deficiency parallel prefix adder of minimum depth," in Design Automation Conference, 2005. Proceedings of the ASP-DAC 2005. Asia and South Pacific, vol. 2, pp. 883–888, IEEE, 2005.
- [8] V. Dave, E. Oruklu, and J. Saniie, "Performance evaluation of flagged prefix adders for constant addition," in Electro/information Technology, 2006 IEEE International Conference on, pp. 415–420, IEEE, 2006.
- [9] R. P. Brent and H.-T. Kung, "A regular layout for parallel adders," IEEE transactions on Computers, no. 3, pp. 260–264, 1982.
- [10] A. Siliveru and M. Bharathi, "Design of kogge-stone and brent-kung adders using degenerate pass transistor logic," International Journal of Emerging Science and Engineering, vol. 1, no. 4, pp. 38–41, 2013.
- [11] P. Saxena, "Design of low power and high speed carry select adder using brent kung adder," in 2015 International Conference on VLSI Systems, Architecture, Technology and Applications (VLSI-SATA), 2015.