

**International Journal of Engineering & Technology** 

Website: www.sciencepubco.com/index.php/IJET

Research paper



# DA Based Systematic Approach Using Speculative Addition for High Speed DSP Applications

G. Reddy Hemantha<sup>1\*</sup>, S. Varadarajan<sup>2</sup>, M.N. Giriprasad<sup>3</sup>

<sup>1</sup>Research Scholar, <sup>2,3</sup>Professor,

<sup>1</sup>ECE Department, Jawaharlal Nehru Technological University, Ananthapuram

<sup>2</sup>ECE Department, S V U College of Engineering, S V University, Tirupati

<sup>3</sup>ECE Department, JNTU College of Engineering, JNTUA, Ananthapuram

\*Corresponding Author E-mail: <u>hemanthag75@gmail.com</u>

#### Abstract

In recent years, parallel-prefix topologies have emerged as a high-speed solution for many DSP applications. In this paper, carry approximation is introduced to incorporate speculation into the Han-Carlson prefix method, and the overall latency is considerably reduced by using a single Brent-Kung addition as both the pre-processing and post-processing unit. To improve reliability, an error detection network is combined with the approximate adder; it asserts the error correction unit whenever speculation fails while carries propagate from the LSB segment to the MSB unit. The proposed speculative adder, based on the Han-Carlson parallel-prefix topology, attains a better latency reduction than the variable-latency Kogge-Stone topology. Finally, a multiplier-accumulation (MAC) unit is designed using serial shift-based accumulation, in which the proposed speculative adder is used iteratively for partial-product addition. The performance merits and latency reduction of the proposed adder unit are demonstrated through FPGA hardware synthesis. The results show that the proposed MAC unit outperforms both previously proposed speculative architectures and the other high-speed multiplication methods considered.

Keywords: Parallel-prefix adders; Speculation; MAC; FPGA design.

# 1. Introduction

Binary addition is widely used in many DSP and wireless applications. It is also a basic building block of arithmetic operations such as multiplication and division. Many works have investigated ways to reduce the delay and arithmetic complexity of the adder in various contexts [1]. In most cases, significant performance degradation arises from the bit width of the adder. Speculative techniques [3-4] are widely used as a countermeasure: they can achieve sub-logarithmic delay by shortening the critical path of the active input operands. Separately, parallel-prefix architectures have been used [5] for high-speed accumulation in many digital devices. Numerous such topologies have been proposed, including Brent-Kung [6], Kogge-Stone [7], Sklansky [8], Han-Carlson [9] and Ladner-Fischer [10]. In this paper, speculative prefix structures are presented for Ling carry computation to achieve both simplicity and high-speed accumulation. The hardware overhead due to the inclusion of error recovery is negligible compared with critical-path reduction techniques based on several pipelined architectures, and it does not cause any latency issues. This paper presents, for the first time, speculation with variable latency applied to the Ling equations via parallel-prefix computation. This work also allows a high-speed MAC hardware structure to be designed using a single adder unit, making it suitable for DSP applications. To demonstrate the efficiency of the proposed multiplier unit, it is compared with state-of-the-art methods such as high-speed Vedic and high-radix Booth multipliers.

Moreover, this methodology has several attractive features, such as low complexity and high performance. In addition, the distributed arithmetic (DA) technique is used to obtain high performance in FIR filter design, where all bits of one tap unit are processed within a bounded delay.
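As a rough illustration of the DA idea mentioned here, the following Python sketch computes one FIR output sample bit-serially from a look-up table of coefficient sums. It is a minimal sketch under stated assumptions, not the hardware described in this paper: the function names `build_da_lut` and `da_fir_output`, the unsigned 8-bit samples and the example coefficients are all hypothetical.

```python
def build_da_lut(coeffs):
    """Precompute every possible sum of coefficients.

    Bit k of the address selects whether coeffs[k] is included.
    """
    taps = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << taps)
    ]


def da_fir_output(samples, lut, bits=8):
    """One FIR output sample via distributed arithmetic.

    Assumes unsigned samples of `bits` width (an assumption of this sketch).
    For each bit position, the same bit of every tap forms a LUT address;
    the LUT value is weighted by shifting and accumulated.
    """
    acc = 0
    for b in range(bits):
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        acc += lut[addr] << b  # shift-and-accumulate, no multiplier needed
    return acc


# Hypothetical 4-tap example checked against the direct sum of products
coeffs = [3, -1, 4, 2]
samples = [10, 200, 7, 255]
lut = build_da_lut(coeffs)
assert da_fir_output(samples, lut) == sum(c * x for c, x in zip(coeffs, samples))
```

The multiplications are replaced by one table look-up and one shifted addition per bit position, which is why the speed of the adder dominates the delay of a DA-based filter.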

# 2. Hybrid Parallel-Prefix Using Ling Computation

To obtain a high-speed parallel architecture with a reduced critical path, the design employs a Han-Carlson prefix topology in which carries are generated using a single Brent-Kung stage as both the pre-processing and post-processing unit.

#### A. Speculative Prefix-Processing

Speculative computation accounts for the major differences in both latency and accumulation speed compared with conventional prefix architectures. During speculative computation, instead of waiting for carries to propagate from the least significant regions, approximated values are generated and propagated to the most significant sub-block regions. In post-processing, if the approximation does not match the actual carries obtained from the LSB block side, error correction is asserted without causing any significant increase in critical path delay, as shown in Fig. 2.





Fig. 1: Proposed speculative parallel architecture
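The carry approximation described in this subsection can be illustrated with a short behavioural model in Python. This is only a sketch under an assumed speculation rule (each carry looks back over a fixed window of `k` bits and assumes that no carry enters the window); the paper does not give its exact approximation, and the function names are hypothetical.

```python
def exact_carries(a, b, n):
    """Carry into each bit position 0..n, computed by plain ripple (c[0] = 0)."""
    c = [0]
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        c.append((ai & bi) | ((ai ^ bi) & c[i]))
    return c


def speculative_carries(a, b, n, k):
    """Approximated carries: each carry depends only on the k bits below it.

    Assumption of this sketch: no carry enters the k-bit window.
    """
    c = [0]
    for i in range(1, n + 1):
        carry = 0
        for j in range(max(0, i - k), i):
            aj, bj = (a >> j) & 1, (b >> j) & 1
            carry = (aj & bj) | ((aj ^ bj) & carry)
        c.append(carry)
    return c


def sum_from_carries(a, b, carries, n):
    """Sum bits are a_i XOR b_i XOR (carry into bit i)."""
    s = 0
    for i in range(n):
        s |= (((a >> i) ^ (b >> i) ^ carries[i]) & 1) << i
    return s


# A window as wide as the operand reproduces the exact carries
assert speculative_carries(0b1111, 0b0001, 8, 8) == exact_carries(0b1111, 0b0001, 8)
# A 2-bit window misses the long propagate chain in 0b1111 + 0b0001
assert speculative_carries(0b1111, 0b0001, 8, 2) != exact_carries(0b1111, 0b0001, 8)
```

The approximation is wrong only when a carry has to travel through more than `k` consecutive propagate positions, which is the case the error detection network has to catch.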

#### B. Error Correction

The error correction unit is asserted when the approximation fails to match the exact propagated carry signals, an event called a misprediction. The error correction unit is constructed from hierarchical levels of XOR gates to obtain the exact addition output. Fig. 1 shows the error correction unit of the proposed speculative parallel architecture; error correction for any parallel-prefix method can be obtained in the same way. A larger delay is unavoidable when the error correction unit is combined with the basic cell structure of the speculative units: a misprediction costs one additional clock cycle of computation compared with speculative results whose approximated carry values are correct.
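The variable-latency behaviour described above can be modelled in a few lines of Python. The sketch below is illustrative only: the block size `k`, the name `variable_latency_add` and the rule that each block speculates its carry-in from the preceding block alone are assumptions, and the exact carries computed here merely stand in for the detection and correction hardware.

```python
def variable_latency_add(a, b, n=16, k=4):
    """Return (sum mod 2**n, cycles) for a block-speculative adder model.

    Each k-bit block guesses its carry-in from the block below it,
    assuming that block itself receives no carry (assumption of this sketch).
    """
    mask = (1 << k) - 1
    blocks = n // k
    spec_cin, exact_cin = [0], [0]
    for i in range(1, blocks):
        pa = (a >> ((i - 1) * k)) & mask
        pb = (b >> ((i - 1) * k)) & mask
        spec_cin.append((pa + pb) >> k)                     # speculated carry-in
        exact_cin.append((pa + pb + exact_cin[i - 1]) >> k)  # true carry-in
    # Error detection: XOR of speculated and exact carries flags a misprediction
    error = any(s ^ e for s, e in zip(spec_cin, exact_cin))
    cin = exact_cin if error else spec_cin
    total = 0
    for i in range(blocks):
        blk = ((a >> (i * k)) & mask) + ((b >> (i * k)) & mask) + cin[i]
        total |= (blk & mask) << (i * k)
    # One cycle when speculation holds, one extra cycle for correction
    return total, (2 if error else 1)


assert variable_latency_add(0x1234, 0x1111) == (0x2345, 1)  # no long carry chain
assert variable_latency_add(0x00FF, 0x0001) == (0x0100, 2)  # misprediction, corrected
```

The result is always exact; only the number of cycles varies, and the extra cycle is paid only on a misprediction.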

#### C. Post-Processing

The post-processing unit is the same for both speculative and non-speculative results, which are computed from the carries generated by the pre-computation unit, as shown in Fig. 2. This post-processing unit shares the characteristics of the conventional parallel-prefix computational unit and consists only of XOR gates. The prefix architecture uses the following definitions: each carry $c_i$ is equal to $G_{i:0}$, and the prefix operator $\circ$ combines two generate/propagate pairs as $(G_a, P_a) \circ (G_r, P_r)$. Applied along the Brent-Kung level hierarchy, this generalization allows the group terms $(G_{i:j}, P_{i:j})$ to be derived in an overlapping manner. Representing the operator $\circ$ as a processing node and the operand pairs $(G_{i:j}, P_{i:j})$ as the edges of a graph, the prefix carry-propagation unit can be drawn as a directed graph leading from one stage to the next. Fig. 2 presents the 8-bit prefix topology and the logic-level implementation of its basic cells, while the nodes are used to form the error assertion unit. The last module of each column is forwarded through Brent-Kung level 2, since only $G_{i:0}$ is needed for the final accumulation results.



Fig. 2: Proposed speculative parallel architecture
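The prefix definitions used in this subsection can be checked with a small Python model. The sketch below uses the standard prefix operator, $(G_a, P_a) \circ (G_r, P_r) = (G_a + P_a G_r,\; P_a P_r)$, and a plain Brent-Kung network; it is not the hybrid Han-Carlson structure of Fig. 2, and the function names are hypothetical.

```python
def op(left, right):
    """Standard prefix operator on (G, P) pairs."""
    ga, pa = left
    gr, pr = right
    return (ga | (pa & gr), pa & pr)


def brent_kung_carries(a, b, n=8):
    """Return [G_{0:0}, G_{1:0}, ..., G_{n-1:0}]; n must be a power of two."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    # Up-sweep: build group terms over spans of 2, 4, 8, ... bits
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            gp[i] = op(gp[i], gp[i - d])
        d *= 2
    # Down-sweep: fill in the remaining prefixes from completed ones
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            gp[i] = op(gp[i], gp[i - d])
        d //= 2
    return [g for g, _ in gp]


def prefix_add(a, b, n=8):
    """Post-processing: sum bit i is p_i XOR c_{i-1}, with c_i = G_{i:0}."""
    carries = brent_kung_carries(a, b, n)
    p = a ^ b
    cin, s = 0, 0
    for i in range(n):
        s |= (((p >> i) & 1) ^ cin) << i
        cin = carries[i]
    return s  # result modulo 2**n


assert prefix_add(0x5A, 0x3C) == (0x5A + 0x3C) & 0xFF
```

The post-processing loop contains only XOR operations, in line with the description of the post-processing unit above.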

# 3. Experimental Results

All adder units were described in Verilog HDL and mapped onto an ALTERA Cyclone FPGA using the QUARTUS II EDA Design Compiler tool. Each adder unit was designed recursively to optimize the critical path, with the aim of obtaining the minimum possible delay, as shown in Fig. 3. The generated netlists (*.edif* files), as shown in Fig. 4, were forwarded to the timing analyzer tool to compute the speed after post-routing of the design. All speed constraints (operating conditions), such as temperature, global clock paths and the scaling of positive to negative edges, were held constant for all adder architectures.

#### A. Speed and Area Trade Off Performance

Both speed and hardware utilization depend on the number of bits in the input operands, the reconfiguration rate, and the key sizes used in each stage of the hierarchical matching steps. The performance of the various categories of accumulation unit is shown in Table 1, and the performance metrics evaluated through shift-based accumulation for multiplication are shown in Table 2.



Fig. 3: Simulated output

Table 1: Performance Comparison of Various Adder Units.

| Adder type                 | Area (LEs used) | Fmax (operating frequency) |
|----------------------------|-----------------|----------------------------|
| Carry select adder         | 81              | 196.39 MHz                 |
| Carry look ahead adder     | 133             | 241.95 MHz                 |
| Kogge-Stone adder          | 79              | 306.0 MHz                  |
| Proposed speculative adder | 88              | 1111.11 MHz                |

Table 2: State-of-the-Art Comparison of Multiplier Units

| Area (LEs used) | Fmax (operating frequency) |
|-----------------|----------------------------|
| 182             | 239.35 MHz                 |
| 213             | 111.99 MHz                 |
| 143             | 737.46 MHz                 |

# 4. State of the Art Comparison

This paper is the first attempt to include speculation in prefix computation for modified carry propagation, in place of the traditional definition of the carry equations. Although the exact error correction unit reduces overall performance, the proposed method still attains higher speed than the other methods, with negligible hardware overhead. Moreover, multiplication through the speculative adder exhibits better quality metrics than the adder alone. The reason is as follows: in speculative computation all carries are computed identically and independently of each other, whereas in the other parallel-prefix computation methods only 50% of the carries are computed through the hierarchical tree-based architecture.

Functionality was verified through exhaustive test bench simulation, as shown in Fig. 3. Hardware synthesis was carried out using the QUARTUS II FPGA synthesizer; the RTL schematic and the maximum operating clock speed are shown in Fig. 4 and Fig. 5. The proposed speculative approach runs faster than the conventional prefix-based architecture: Table 1 reports an Fmax of 1111.11 MHz for the proposed adder against 306.0 MHz for the Kogge-Stone adder. Moreover, the hardware complexity overhead is also reduced considerably compared with the most recent advanced prefix implementations.



Fig. 4: RTL view



Fig. 5: Fmax report

The key merit of the proposed methodology, from a hardware complexity point of view, is that the multiplied output is generated using a single adder unit, so that it consumes less hardware than the alternatives. Existing methods such as the Vedic and Booth radix-8 algorithms do not offer as good a balance between resource utilization and operating frequency, as shown in Fig. 5. Finally, considering both complexity and design performance, the proposed methodology is superior to the state-of-the-art methods compared. In addition, the proposed work does not alter the functionality of the basic MAC unit, whereas existing methods require significant architectural modification to reduce latency.
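The single-adder, shift-based accumulation discussed in this section can be sketched behaviourally in Python. This is a minimal sketch, not the synthesized design: `add` is a hypothetical stand-in for the single adder unit (any n-bit adder could take its place), and the operand and accumulator widths are assumed values.

```python
def add(x, y, width):
    """Stand-in for the single adder unit; wraps at `width` bits."""
    return (x + y) & ((1 << width) - 1)


def shift_add_multiply(a, b, n=8):
    """Serial multiplication: one pass through the adder per set multiplier bit."""
    acc = 0
    for i in range(n):
        if (b >> i) & 1:
            # Partial product is the multiplicand shifted by the bit position
            acc = add(acc, a << i, 2 * n)
    return acc


def mac(acc, a, b, n=8, acc_width=24):
    """Multiply-accumulate that reuses the same adder for the final accumulation."""
    return add(acc, shift_add_multiply(a, b, n), acc_width)


assert shift_add_multiply(13, 11) == 143
assert mac(1000, 13, 11) == 1143
```

Because every partial-product addition passes through the same adder, the delay of that adder sets the cycle time of the whole MAC unit, which is why a faster adder translates directly into a faster multiplier.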

# 5. Conclusion

In this paper, variable-latency speculative addition is proposed for high-speed DSP applications. A hybrid error detection network is incorporated to assert the error signal and minimize the error probability. A MAC unit built with the speculative technique is also proposed, and the results show that it outperforms the other state-of-the-art multiplication methods compared. Compared with traditional speculative methods, the proposed method shows appreciable improvements in overall latency reduction. The proposed MAC unit preserves the benefits of high-speed arithmetic units while providing optimized low latency. Hence, high-speed DSP applications can readily be realized by adopting the proposed adder architecture.

# References

- [1] Koren. Computer Arithmetic Algorithms. Prentice-Hall Inc., New Jersey, 1993.
- [2] Koren, Israel. Computer arithmetic algorithms. Universities Press, 2002.
- [3] Lu, Shih-Lien. "Speeding up processing with approximation circuits." Computer 37.3 (2004): 67-73.
- [4] Chakraborty, Rajat Subhra, and Swarup Bhunia. "HARPOON: an obfuscation-based SoC design methodology for hardware protection." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 28.10 (2009): 1493-1502.
- [5] I. Koren, Computer Arithmetic Algorithms. Natick, MA, USA: A K Peters, 2002.
- [6] Brent, Richard P., and H. T. Kung. "A regular layout for parallel adders." IEEE Transactions on Computers 3 (1982): 260-264.
- [7] Kogge, Peter M., and Harold S. Stone. "A parallel algorithm for the efficient solution of a general class of recurrence equations." IEEE transactions on computers 100.8 (1973): 786-793.
- [8] Sklansky, Jack. "Conditional-sum addition logic." IRE Transactions on Electronic computers 2 (1960): 226-231.
- [9] Han, Tackdon, and David A. Carlson. "Fast area-efficient VLSI adders." Computer Arithmetic (ARITH), 1987 IEEE 8th Symposium on. IEEE, 1987.
- [10] Ladner, Richard E., and Michael J. Fischer. "Parallel prefix computation." Journal of the ACM (JACM) 27.4 (1980): 831-838.
- [11] Liu, Tong, and Shih-Lien Lu. "Performance improvement with circuit-level speculation." Microarchitecture, 2000. MICRO-33. Proceedings. 33rd Annual IEEE/ACM International Symposium on. IEEE, 2000.
- [12] T. Padmapriya and V. Saminadan, "Priority based fair resource allocation and Admission Control Technique for Multi-user Multiclass downlink Traffic in LTE-Advanced Networks", International Journal of Advanced Research, vol.5, no.1, pp.1633-1641, January 2017.
- [13] S. V. Manikanthan and K. Srividhya, "An Android based secure access control using ARM and cloud computing", Electronics and Communication Systems (ICECS), 2015 2nd International Conference on, 26-27 Feb. 2015, IEEE, DOI: 10.1109/ECS.2015.7124833.