

**International Journal of Engineering & Technology** 

Website: www.sciencepubco.com/index.php/IJET

**Research Paper** 



# An Efficient MAC Architecture using Multiplier for DSP and DIP Operations

P. Rahul Reddy<sup>1</sup>, Pandya Vyomal N<sup>2</sup>, Abhishek Choubey<sup>3</sup>

<sup>1</sup>Associate Professor, Dept. of ECE, Sreenidhi Institute of Science & Technology, Ghatkesar, India. <sup>2</sup>Associate Professor, Dept. of ECE, Sreenidhi Institute of Science & Technology, Ghatkesar, India. <sup>3</sup>Associate Professor, Dept. of ECE, Sreenidhi Institute of Science & Technology, Ghatkesar, India.

#### Abstract:

DSP operations are an important part of engineering as well as medical disciplines. Multiplication plays an important role in the design of signal processing operations, and the multiplier is one of the critical components in digital signal processing and hearing aids. The objective is therefore to design an efficient MAC hardware architecture using a multiplier assisted by compressors, with reduced area, power and delay. In this paper, an efficient hardware architecture of a MAC using a modified Wallace tree multiplier is proposed. The proposed MAC uses a multiplier with novel compressor designs and adders as primitive building blocks. The 8-bit MAC architecture is coded in Verilog HDL and implemented on an FPGA using the Xilinx ISE 14.4 synthesis tool on a Virtex-7 kit. The proposed compressor and adder based architecture is applied to the MAC unit and compared with a previous MAC design, verifying that the proposed architecture reduces area, delay and power. The high performance is obtained by using a new hierarchical structure of adders, called compressors. These compressors make the multipliers faster than the conventional designs used in engineering, science and technology as well as medical disciplines.

Keywords: MAC, DSP, Adders, LP VLSI, Data path.

# 1. Introduction

The growing need for high-speed signal processing has driven researchers to seek faster processors. The multiplier and the multiplier-and-accumulator (MAC) [1] are building blocks of the processor and strongly affect its speed. The MAC is a key part of digital signal processing and image/sound processing methods such as filtering, convolution and inner products, so high speed is required for real-time processing. Many researchers have attempted to design MACs for high computational performance and low power consumption.

The architecture of a MAC with a power consumption reduction method is shown in Figure.1. The main unit of the low power MAC is the control unit, which generates control signals for the low power multiplier and adder according to the operating conditions. The MAC unit is mostly used in kernel-based methods, which require a large number of repetitive computational operations on a constant window. The repetitive operations can also be performed using parallel processing, which is expected to lower the complexity and enhance the performance.

Images in video sequences are frequently processed in raster scan order, so neighboring pixels almost always have equal values or very small deviations. A DSP processor is designed to support fast execution of the repetitive, numerically intensive computations characteristic of digital signal processing algorithms [2-3].



Figure.1. Architecture of low power MAC unit

Copyright © 2018 Authors. This is an open access article distributed under the <u>Creative Commons Attribution License</u>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## 2. Literature Survey

Multiplier architectures typically fall into two classes: "tree" multipliers and "array" multipliers. Tree multipliers add as many partial products in parallel as practical and, as a consequence, are very high performance architectures. Unfortunately, tree multipliers are very irregular, hard to design and consequently large. Array multipliers, on the other hand, are very regular and small in size, but suffer in latency and propagation delay.
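The latency gap between the two classes can be illustrated with a simple counting argument. The sketch below is not taken from the paper: it assumes an idealized row-wise reduction in which each level replaces every full group of rows with the compressor's outputs, and the function name `reduction_levels` is hypothetical.

```python
def reduction_levels(rows: int, inputs: int, outputs: int) -> int:
    """Count compressor levels needed to reduce `rows` operand rows to 2.

    Idealized model (an assumption, not the paper's design): each level
    groups the rows into sets of `inputs` and replaces every full group
    with `outputs` rows; leftover rows pass through unchanged.
    """
    levels = 0
    while rows > 2:
        groups, leftover = divmod(rows, inputs)
        if groups == 0:
            # Fewer rows than one compressor takes (e.g. 3 rows with 4:2
            # cells): a single smaller 3:2 compressor finishes the job.
            rows = 2
        else:
            rows = groups * outputs + leftover
        levels += 1
    return levels


# An 8-bit multiplier produces 8 partial product rows.
levels_3_2 = reduction_levels(8, 3, 2)  # tree of 3:2 compressors
levels_4_2 = reduction_levels(8, 4, 2)  # tree of 4:2 compressors
levels_array = 8 - 1                    # linear array: one adder row per extra operand
```

Under these assumptions a tree needs far fewer sequential levels than the linear array, and 4:2 compressors halve the level count again relative to 3:2 cells, which is the motivation for compressor-based reduction.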

An efficient compressor architecture is proposed in this paper to balance the area, delay and power usage of the MAC architecture, which contains a large number of compressors. The compressors are addressed at the circuit design stage, while the datapath optimizations are dealt with at the MAC level for DSP applications. In the MAC, the carry propagate additions of the multiply and accumulate stages are consolidated into the adders and compressors of the MAC architecture. FPGA implementations have been used as design illustrations.

Earlier work reduced the power of a MAC VLSI architecture by using a low power compressor and by improving the datapaths at the MAC level for DSP applications. Notwithstanding the limitations of the methodology used, it was an important study for the wider use of compressors at the MAC level.

## 3. Proposed System

#### A. Compressor

The usual implementation of the multiplier uses carry save array multiplication, which requires more computation, leading to delayed output and extra power consumption. To improve the efficiency of the multiplier, compressors are used, which perform the reduction of partial products in parallel. This improves efficiency by reducing the interconnect delays and the glitches associated with logic transitions, which leads to lower power consumption.

Compressors are used to minimize delay and area, which increases the performance of the overall system. A number of compressor architectures have been discussed in the past [5-6]. Compressors may be implemented using two full adders, basic tree cells, or the standard compressor cell of the library. Figures 2(a) and 2(b) show the gate level structure of two compressor architectures. As reported in the Introduction, design specific architectures provide higher efficiency, whereas the conventional compressor architecture is of limited use in low power constrained designs.





Figure. 2: (a) Neil and Harris compressor architecture [7]. (b) Compressor architecture using full adder [6].

The compressor cell shown above contains many interconnects, which increases interconnect delay and glitches. On the other hand, the structure may be faster; however, it is unsuited to low power applications, where a trade-off in delay can be accepted. Moreover, to reduce leakage power the cells should have a larger number of transistors in the stack, and in this architecture there are only two (for the XOR and OR gates).

The conventional compressor architecture also contains inverters (e.g., embedded in the AND and OR logic) in the critical path, which cause logic transitions and increase the power consumption. Consequently, the conventional compressor architecture of Figure.2(a) is unsuited for low power applications. Figure.2(b) shows the compressor architecture constructed with full adders. This architecture has fewer interconnects, but the sum and carry paths are shared and it requires more drive strength to drive the signal, which results in high power consumption. Such cells suit timing constrained designs, because the larger drive strengths improve the timing performance of the cell.
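The full adder based 4:2 compressor can be modelled behaviourally as two cascaded full adders. The following is a minimal sketch of that textbook construction, not the gate level netlist of Figure 2(b); the function and signal names are assumptions chosen for illustration.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: returns (sum, carry) for three input bits."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """4:2 compressor built from two cascaded full adders.

    Returns (sum, carry, cout). cout depends only on x1..x3, so it can
    feed the neighbouring column without waiting for cin.
    """
    s1, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


def check_all_inputs() -> bool:
    """Exhaustively verify x1+x2+x3+x4+cin == sum + 2*(carry + cout)."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True
```

The exhaustive check covers all 32 input combinations, which is the property any alternative 4:2 cell, including a higher fan-in design, would also have to satisfy.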

Figure.3 shows the proposed compressor structure [11], which uses higher fan-in gates that provide higher drive capability. The larger fan-in gates absorb a large fraction of the logic and thus reduce the number of gates needed to implement the architecture. With a lower gate count, the area utilization and the interconnect delays decrease, which in turn yields a low power, energy efficient compressor architecture.



Figure. 3: Proposed compressor Cell

The updated compressor architecture also enables design specific and constraint specific architectures, which can be exploited especially for low power operation.

#### B. Multiply-Accumulate Unit

The MAC is the basic and most commonly used structure in most DSP applications; it performs convolution, filtering and similar operations to accelerate FIR or FFT computations in communications [7].
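As an illustration of how a MAC unit serves filtering, the sketch below models one multiply-accumulate step and uses it to compute a single FIR output sample. It is a behavioural model only: the unsigned 8-bit operands follow the 8-bit MAC of this paper, while the 20-bit accumulator width and the names `mac` and `fir_output` are assumptions made for the example.

```python
def mac(acc: int, a: int, b: int, acc_bits: int = 20) -> int:
    """One multiply-accumulate step: acc + a*b.

    Operands are truncated to unsigned 8 bits; the accumulator wraps at
    `acc_bits` bits (a hypothetical width, not specified in the paper).
    """
    mask = (1 << acc_bits) - 1
    return (acc + (a & 0xFF) * (b & 0xFF)) & mask


def fir_output(samples: list[int], coeffs: list[int]) -> int:
    """One FIR output sample as a chain of MAC operations."""
    acc = 0
    for x, h in zip(samples, coeffs):
        acc = mac(acc, x, h)
    return acc


# Three-tap example: 1*4 + 2*5 + 3*6
y = fir_output([1, 2, 3], [4, 5, 6])
```

Each tap costs one MAC operation, so the delay of the MAC unit directly bounds the sample rate of the filter.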



Figure. 4: Typical MAC architecture

To work efficiently, multipliers are broadly divided into 3 stages as follows:

- a) Partial product generation stage
- b) Partial product reduction stage
- c) Carry propagate addition stage
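The three stages can be sketched in software as follows. This is a simplified model under stated assumptions: operands are unsigned, the reduction stage uses only 3:2 compressors (full adders) and lets leftover bits pass through, and all function names are hypothetical. The architecture discussed in this paper additionally uses 4:2 compressors and half adders in the reduction stage.

```python
def partial_products(a: int, b: int, n: int = 8) -> list[list[int]]:
    """Stage a: cols[k] holds every bit a_i AND b_j with i + j == k."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append((a >> i) & (b >> j) & 1)
    return cols


def reduce_columns(cols: list[list[int]]) -> list[list[int]]:
    """Stage b: apply 3:2 compressors until every column has at most 2 bits."""
    while any(len(col) > 2 for col in cols):
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, col in enumerate(cols):
            i = 0
            while len(col) - i >= 3:
                x, y, z = col[i:i + 3]
                nxt[k].append(x ^ y ^ z)                        # sum stays in column k
                nxt[k + 1].append((x & y) | (y & z) | (x & z))  # carry moves to column k+1
                i += 3
            nxt[k].extend(col[i:])  # leftover bits pass through unchanged
        cols = nxt
    return cols


def carry_propagate_add(cols: list[list[int]]) -> int:
    """Stage c: add the two remaining rows with an ordinary adder."""
    row0 = sum(col[0] << k for k, col in enumerate(cols) if len(col) > 0)
    row1 = sum(col[1] << k for k, col in enumerate(cols) if len(col) > 1)
    return row0 + row1


def multiply(a: int, b: int, n: int = 8) -> int:
    """Unsigned n-bit multiplication through the three stages."""
    return carry_propagate_add(reduce_columns(partial_products(a, b, n)))
```

Only stage c contains a carry chain; stage b is carry free within each level, which is why reducing the number of levels with compressors shortens the critical path.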

The method in [8] uses a Wallace tree for the reduction of partial products, whereas some conventional architectures use both full adders and half adders in the partial product stages. Incorporating efficient compressors in the multiplier reduces the gate count, which yields fewer interconnects. With fewer interconnects the system is less prone to faults, which improves the efficiency of the MAC unit.

In summary, the proposed compressor and adder configuration improves area, delay and power, and such configurations are widely used in DSP applications [9-10]. To illustrate the effect of the compressor and adder architecture, a MAC unit structure that includes a larger number of compressors is considered.

Figure.5 depicts the MAC architecture, which employs 4:2 and 3:2 compressors and half adders (HAs) in the partial product reduction stage; the reduced partial products are accumulated in the later accumulation stage of the multiplier.



Figure. 5: Updated MAC architecture [8]

# 4. Results and Analysis

In this section, both the typical and the updated architectures at the compressor and MAC unit level are designed and verified.

Table 1 shows that the updated architecture gives better results than the conventional architectures on the FPGA. It lists the delay, area and power of the existing and updated compressor architectures. As reported in Section 2, the results in Table 1 show that the design specific architectures are more efficient than the generic architectures; that is, the proposed compressor architecture leaks much less power than the existing compressor architectures.

Lower interconnect delay and fewer associated glitches decrease the delay and the dynamic power consumption. As the number of compressors within the design increases, the benefit of the proposed compressor architecture also increases.

Table 1: Existing and proposed compressor architecture results

| 8-Bit MAC Design | Delay (ns) | Area | Power (mW) |
|------------------|------------|------|------------|
| MAC using full adder based 4:2 compressor | 9.728 | Slices: 237; LUTs: 230; FFs: 32 | T<sub>p</sub>: 597.18; S<sub>p</sub>: 547.35; D<sub>p</sub>: 49.83 |
| MAC using conventional 4:2 compressor | 8.858 | Slices: 182; LUTs: 181; FFs: 32 | T<sub>p</sub>: 596.13; S<sub>p</sub>: 546.84; D<sub>p</sub>: 49.29 |
| Proposed | 5.913 | Slices: 32; LUTs: 149; FFs: 32 | T<sub>p</sub>: 142.92; S<sub>p</sub>: 142.92; D<sub>p</sub>: 0.00 |
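The relative improvement implied by Table 1 can be worked out directly from the tabulated values. The short calculation below compares the proposed design with the MAC using the conventional 4:2 compressor; the helper `percent_reduction` and the dictionary keys are names introduced here for illustration, and the total power figure is the T<sub>p</sub> entry of the table.

```python
def percent_reduction(reference: float, proposed: float) -> float:
    """Reduction of `proposed` relative to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


# Values copied from Table 1 (conventional 4:2 compressor vs. proposed).
conventional = {"delay_ns": 8.858, "luts": 181, "total_power_mw": 596.13}
proposed = {"delay_ns": 5.913, "luts": 149, "total_power_mw": 142.92}

savings = {
    key: round(percent_reduction(conventional[key], proposed[key]), 2)
    for key in conventional
}
```

With these figures the proposed design is roughly 33% faster, uses about 18% fewer LUTs and consumes about 76% less total power than the conventional 4:2 compressor based MAC.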





Figure. 6: proposed 4:2 Compressor output

Figure.6 shows the simulation of the proposed compressor; the MAC built with it has a low delay of 5.913 ns, as tabulated in Table.1.



Figure. 7: MAC Output

Figure.7 shows the simulation of the MAC unit.



Figure. 8: MAC RTL Schematic

Figure.8 shows the MAC RTL schematic, which contains LUTs, FFs and slices and has lower dynamic power than the conventional designs listed in Table.1.

# 5. Conclusion

An efficient MAC architecture using a multiplier has been presented in this work. We propose an area efficient, low power and high speed MAC architecture that can replace the existing architecture by substituting the proposed 4:2 compressor for the conventional 4:2 compressor. The proposed architecture yields better results in terms of area, delay and power in the FPGA domain.

## References

- [1] Omondi A R. "Computer Arithmetic Systems". Englewood Cliffs, NJ: Prentice-Hall, 1994.
- [2] Ping-hua C, Juan Z. "High-speed Parallel 32×32-b Multiplier Using a Radix-16 Booth Encoder," Intelligent Information Technology Application Workshops, 2009. IITAW '09. Third International Symposium on, 2009, pp. 406-409.
- [3] Kiwon Choi; Minkyu Song, "Design of a high performance 32×32-bit multiplier with a novel sign select Booth encoder," Circuits and Systems, 2001. ISCAS 2001. The 2001 IEEE International Symposium on, vol. 2, pp. 701-704, 6-9 May 2001.
- [4] S. D. Pezaris, "A 40-ns 17-bit by 17-bit array multiplier," IEEE Transactions on Computers, vol. 20, pp. 442-447, April 1971.
- [5] P. Aliparast, Z. D. Koozehkanani, and F. Nazari, "An ultra-high speed digital 4-2 compressor in 65-nm CMOS," International Journal of Computer Theory and Engineering, vol. 5, no. 4, pp. 593-597, 2013.
- [6] N. H. E. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, Pearson Education, 2008.
- [7] Tung Thanh Hoang; Sjalander, M.; Larsson-Edefors, P., "A High-Speed, Energy-Efficient Two-Cycle Multiply Accumulate (MAC) Architecture and Its Application to a Double-Throughput MAC Unit," Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 57, no. 12, pp. 3073-3081, Dec. 2010.
- [8] Rajput, R.P.; Swamy, M.N.S., "High Speed Modified Booth Encoder Multiplier for Signed and Unsigned Numbers," Computer Modelling and Simulation (UKSim), 2012 UKSim 14th International Conference on, pp. 649-654, 28-30 March 2012.
- [9] Begum JT, Naidu HS, Vaishnavi N, Sakana G, Prabhakaran N. Design and Implementation of Reconfigurable ALU for Signal Processing Applications. Indian Journal of Science and Technology. 2016 Jan; 9(2):1-6. DOI: 10.17485/ijst/2016/v9i2/86343.
- [10] Nandal A, Vigneswaran T, Rana AT. Booth multiplier using reversible logic with low power and reduced logical complexity. Indian Journal of Science and Technology. 2014 Jan; 7(4):525-29. DOI: 10.17485/ijst/2014/v7i4/48644
- [11] C. P. Narendra; K. M. Ravi Kumar, "Low power compressor based MAC architecture for DSP applications," International Conference on Circuits, Communication, Control and Computing, 2014.