

**International Journal of Engineering & Technology** 

Website: www.sciencepubco.com/index.php/IJET

Research paper



# FPGA based Implementation and Verification of H.264/AVC Encoder

D.Raja Ramesh<sup>1\*</sup>, P.H.S.T.Murthy<sup>2</sup>, Maheshwarappa B<sup>3</sup>

<sup>1</sup> Department of Electronics and Communication Engineering, MVGR College of Engineering(A), Vizianagaram, India. <sup>2,3</sup> Department of Electronics and Communication Engineering, STJIT, Ranebennur, India \*Corresponding Author E-mail: rajaramesh09@gmail.com

### Abstract

FPGA prototyping is essential in video processing because it verifies the functionality of a design on real hardware. The proposed H.264/AVC (advanced video coding) encoder architecture for motion estimation is simulated and synthesized with Xilinx Vivado for the Nexys4 DDR board, which carries the XC7A100TCSG324-2 Artix-7 field-programmable gate array (FPGA). The implemented architecture is also compared with an implementation on the Xilinx Zynq-7000 system-on-chip (SoC). The design runs at a clock frequency of 100 MHz on the Artix-7 FPGA with DDR3 memory, which is sufficient for real-time high-definition television (HDTV) applications: it encodes 720p video at up to 60 frames per second with a PSNR of around 34 dB.

Keywords: FPGA Prototyping, Architecture, H.264/AVC, motion estimation, HDTV.

# 1. Introduction

H.264/AVC is a widely used standard for video compression and encoding. Advanced video coding is one of the most frequently used codecs for recording, compressing and distributing high-definition video. FPGAs play a vital role in video compression. If a product is late to market, it may not be adopted, and the company loses the capital invested in it. An FPGA design is ready for production sooner, whereas a standard-cell ASIC takes longer than an FPGA to reach production. In this work the encoder is implemented on an FPGA board/kit. The FPGA development board is used to implement more complex functions and to compare performance at various levels of abstraction. It provides a variety of proven peripherals with standard industry interfaces. The rest of the paper is organized as follows: the basic block diagram of the H.264/AVC encoder, a top-level overview of the encoder, the architecture of the proposed FPGA prototype, the hardware block of the encoder and its operation, simulation and prototyping of the H.264 encoder, and finally the results and conclusions.

# 2. Basic Block Diagram of H.264/AVC Encoder



Fig.1: Trade-off between the flexibility and performance of various verification methods.

Fig. 1 shows that an FPGA hardware board offers a good balance between flexibility and performance. For either a system-on-chip or a general-purpose design, FPGA prototyping is therefore the most efficient verification method among those compared. FPGAs are used to implement complex designs, whether general-purpose or system-on-chip. FPGA-based development boards are widely used because they are fast and flexible. The simulated encoder is then implemented practically on an FPGA board.



Fig.2: Demonstration of Basic Block Diagram

The setup consists of an uncompressed video play-out device that streams uncompressed video over an SDI link. SDI (serial digital interface) carries uncompressed, unencrypted digital video. The device is connected to the Artix-7 demonstration board through an SDI link provided on an FMC-based serial I/O daughter card. These links interface to the Artix board to drive the encoder input and to capture the decoder output. A Windows PC is required to program the FPGAs with the evaluation bit files over the JTAG interface and to configure the encoder through an Ethernet link.

Copyright © 2018 Authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The real-time play-out hardware transmits the uncompressed streams to the board over the SDI interface. The configuration software runs on the Windows PC, which sends the configuration data to the encoder over the Ethernet link. The Artix-7 board encodes the uncompressed input stream and transmits the encoded output stream over a GbE link to a second board. This second Artix-7 board may be used to decode the compressed stream and store the decoded data in external memory. A display controller then reads the decoded images out of memory and drives the HDMI interface.

The H.264/AVC standard was jointly developed by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG). The current frame Fn is divided into macroblocks, each of which is encoded in either intra or inter prediction mode. In both modes a prediction P is formed from reconstructed samples. In intra mode, P is formed from samples of the current frame that have already been encoded and reconstructed but not yet filtered (uF'n). In inter mode, P is formed by motion compensation (MC) from a previously encoded reference frame F'n-1. The prediction P is subtracted from the current macroblock to produce a residual block Dn, which is transformed with a discrete cosine transform (or discrete wavelet transform) and then quantized. The result is a set of quantized transform coefficients X, which are entropy encoded and packed by the NAL encoder for transmission or storage.

The coefficients X are also passed through inverse quantization and the inverse transform (inverse discrete cosine transform or inverse discrete wavelet transform) to produce a reconstructed difference block D'n. The prediction P is added to D'n to create the unfiltered reconstructed block uF'n, which is then filtered by the de-blocking filter to reduce blocking distortion, giving the reconstructed reference frame F'n. Motion estimation and motion compensation are the two techniques used in inter prediction.
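The encoding loop described above can be sketched in software for a single 4×4 block. This is a minimal illustration under stated assumptions, not the authors' hardware design: it uses a floating-point orthonormal DCT as a stand-in for the transform, a uniform quantizer with a hypothetical step size `qstep`, and a full-search block-matching motion estimator based on the sum of absolute differences (SAD). All function names are illustrative.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (n x n); a stand-in for the codec transform."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def full_search(current_block, reference, top, left, search_range):
    """Motion estimation: return the (dy, dx) offset with the lowest SAD."""
    n = current_block.shape[0]
    best_mv, best_sad = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + n > reference.shape[0] or x + n > reference.shape[1]:
                continue
            candidate = reference[y:y + n, x:x + n].astype(int)
            sad = int(np.abs(current_block.astype(int) - candidate).sum())
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    return best_mv

def encode_block(current, predicted, qstep):
    """Residual Dn -> transform -> quantization, giving the levels X."""
    c = dct_matrix(current.shape[0])
    residual = current.astype(float) - predicted.astype(float)  # Dn
    coeffs = c @ residual @ c.T
    return np.rint(coeffs / qstep).astype(int)  # X

def reconstruct_block(levels, predicted, qstep):
    """Inverse quantization and inverse transform give D'n; adding P gives uF'n."""
    c = dct_matrix(levels.shape[0])
    residual_rec = c.T @ (levels * qstep) @ c  # D'n
    block = np.rint(predicted.astype(float) + residual_rec)
    return np.clip(block, 0, 255).astype(np.uint8)  # uF'n (before de-blocking)
```

The motion vector returned by `full_search` selects the reference block used as P in inter mode; the de-blocking filter and entropy coder are omitted from this sketch.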

# 3. Top Level Overview of an Encoder



Fig.3: Top level Overview of Encoder

Fig. 3 shows the top-level overview of the encoder. It contains a basic pipelined architecture along with an ADC/DAC and a video filter. The FPGA and the TI DSP have NAND and NOR flash and DDR3 RAM. The FPGA and the TI DSP communicate through an I/O interface via an Ethernet MAC. Both H.264 and JPEG encoders are used. The encoder shall accept DVI inputs that meet the minimum requirements of the 'receiver' specification of the DDWG DVI 1.0 standard. The encoder shall support embedding a picture-in-picture (PiP) still image captured from the input video. The PiP size will be 1/9 of the video frame. The PiP location (x, y coordinates) is arbitrary and is set by the control PC. JPEG encoders are also used for high-quality lossless compression of images. A single JPEG decoder is used in the TI DSP chip.
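As a small worked example of the PiP requirement, the sketch below assumes that "1/9 of the video frame" means 1/9 of the frame area with the aspect ratio preserved, that is, one third of the width and one third of the height. The clamping of the PiP position to the frame boundary is also an assumption added for illustration; the function names are hypothetical.

```python
def pip_size(frame_width, frame_height):
    """PiP dimensions for 1/9 of the frame area (assumed: 1/3 width x 1/3 height)."""
    return frame_width // 3, frame_height // 3

def clamp_pip_position(x, y, frame_width, frame_height):
    """Keep an arbitrary (x, y) request from the control PC inside the frame (assumed policy)."""
    pip_w, pip_h = pip_size(frame_width, frame_height)
    x = max(0, min(x, frame_width - pip_w))
    y = max(0, min(y, frame_height - pip_h))
    return x, y

if __name__ == "__main__":
    print(pip_size(1280, 720))                      # PiP size for a 720p frame
    print(clamp_pip_position(2000, -5, 1280, 720))  # out-of-range request pulled back in
```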

# 4. Architecture of Proposed FPGA



Fig.4: The Architecture of Proposed FPGA Prototype

Fig. 4 gives a detailed view of the architecture of the FPGA platform used in this paper. The main CPU configures the FPGA hardware board by loading the bit pattern (bitstream) of the user FPGAs. The architecture is optimized for the user FPGA, the XC7A100TCSG324-2. The Linux system communicates with the PC through a USB, Ethernet or PCIe interface, and the Linux kernel provides the basic services and device drivers used on the CPU. The Xilinx Vivado 2014.2 design tool is used to simulate, synthesize and debug the design on the hardware board. The external memory of the FPGA platform is DDR4, which stores the data read and written by the H.264 encoder, including the encoded data. A compatible JTAG cable connects the PC to the FPGA and provides configuration and debug access through the JTAG chain.

# 5. Hardware Block of an Encoder and its Operation

This section presents the hardware block of the encoder in detail. The hardware block combines an FPGA and a TI DSP chip. The I/O peripherals are connected to the respective ports provided by the processors. Flash memory, DDR3 and EEPROM are connected to the processor over bidirectional interfaces. The JTAG port provides the configuration and debugging interface through the JTAG chain. RS232 is a serial port over which data is sent from the TI DSP. DDR4 and SPI flash are interfaced with the TI DSP chip. The DC power jack, LEDs, switch, audio line, GA-IN, video input and SDI input are the other I/O ports linked to these processors.



Fig.5: Hardware Block Diagram of Encoder

# 6. Simulation and Prototyping of H.264 Encoder



Fig.6: The proposed model and debugging of H.264 encoder



Fig.7: FPGA hardware prototype based on Xc7a100tcsg324-2 platform

Shared and distributed memories are used within the bus architecture between the data-transfer stages, so that the flow of the pipeline is data-driven. The increased use of cache helps to obtain better timing performance. Controller modules provide the various signals, which depend on the coding behavior, needed to implement the H.264 encoder. The direct memory access controller (DMAC) module also plays a key role in interfacing with the external DDR4 SDRAM.

The figures above show the proposed model and the debugging of the H.264 encoder. The proposed procedure is divided into three steps. First, the system controller directs the arbiter to select the read/write data path to the DDR4 memory. Second, the controller switches to the encoder path, and the encoder architecture interacts with the DDR4 memory while the coding is carried out on that path. Third, the encoded bit stream in the DDR4 is transferred to the result monitor through the arbiter. An H264-to-DDR4 module and a top-level-architecture-to-DDR4 module are designed to match the DDR4 protocol. The DDR4 module is verified on the Xilinx Nexys4 DDR (XC7A100TCSG324-2), and the DDR4 controller with its physical layer is generated with the Xilinx System Generator tools provided by Xilinx [5]. Fig. 6 shows the proposed model, which is mapped onto the FPGA hardware board based on the XC7A100TCSG324-2 platform [7] developed by Xilinx. The platform can be connected to a PC through a USB, Ethernet or PCIe interface. The "Xilinx USB Cable" is used to configure and debug the FPGAs via the JTAG chain.
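The three-step procedure can be modelled as a simple arbiter that grants the memory port to one master per phase. The sketch below is only an illustration under assumptions: the paper does not specify the arbitration policy, and the phase names, master names and the `MemoryArbiter` class are hypothetical.

```python
from enum import Enum

class Phase(Enum):
    LOAD_INPUT = 1   # system controller writes the raw input data
    ENCODE = 2       # encoder reads input data and writes the bit stream
    READ_RESULT = 3  # result monitor reads the encoded bit stream

# Assumed mapping from phase to the master that owns the memory port.
OWNER = {
    Phase.LOAD_INPUT: "system_controller",
    Phase.ENCODE: "encoder",
    Phase.READ_RESULT: "result_monitor",
}

class MemoryArbiter:
    """Grants exclusive memory access to the master of the current phase."""

    def __init__(self):
        self.phase = Phase.LOAD_INPUT
        self.memory = {}

    def _check(self, master):
        if OWNER[self.phase] != master:
            raise PermissionError(f"{master} has no grant during {self.phase.name}")

    def write(self, master, address, data):
        self._check(master)
        self.memory[address] = data

    def read(self, master, address):
        self._check(master)
        return self.memory[address]

    def next_phase(self):
        """Advance to the following phase; stay in the last phase once reached."""
        order = list(Phase)
        index = order.index(self.phase)
        if index + 1 < len(order):
            self.phase = order[index + 1]
```

A real arbiter would interleave requests at burst granularity; the strictly sequential phases here only mirror the order of the three steps in the text.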

# 7. Results

Before the FPGA-based hardware implementation, functional verification and synthesis were carried out according to the FPGA design flow with the Xilinx Vivado 2014.2 design suite, based on the proposed architecture in Fig.5. The simulation analysis is shown in Fig.8 (a), and a detailed view of the analysis is presented in Fig.8 (b). Both the simulation and implementation results provide good evidence for the correctness of the encoder; the synthesis report compares the Artix-7 series and the Zynq-7000 series.

### 7.1. Simulation Results



Fig.8: (a) Simulation waveform with Xilinx; (b) Debug waveform

The overall system is implemented on a Xilinx Artix-7 series XC7A100TCSG324-2 FPGA with a 100 MHz clock frequency, which is sufficient for real-time high-definition television applications. Table 1 gives the detailed synthesis utilization summary for the Artix-7 hardware model, and Table 2 gives the corresponding synthesis report for the Zynq-7000 hardware model. The performance of the proposed implementation is also compared with that of Refs. [9] and [10].
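The real-time claim can be put in terms of a cycle budget per macroblock. The sketch below assumes 16×16 macroblocks and a single pipeline that processes one macroblock at a time at the stated 100 MHz clock; the actual degree of parallelism in the design is not given in the text, so this is a rough bound rather than a measured figure.

```python
def macroblocks_per_frame(width, height, mb_size=16):
    """Number of macroblocks in a frame, rounding partial blocks up."""
    cols = -(-width // mb_size)
    rows = -(-height // mb_size)
    return cols * rows

def cycles_per_macroblock(clock_hz, width, height, fps):
    """Clock cycles available per macroblock for real-time encoding."""
    return clock_hz / (macroblocks_per_frame(width, height) * fps)

if __name__ == "__main__":
    budget = cycles_per_macroblock(100e6, 1280, 720, 60)
    print(f"720p60 at 100 MHz: about {budget:.0f} cycles per macroblock")
```

For 720p at 60 frames per second this gives 3600 macroblocks per frame and a budget of roughly 463 cycles per macroblock.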

### 7.2. Synthesis Report

| S.No | Slice Logic Utilization | Xc7a100tcsg324-2 (Used/Available) | Utilization |
|------|-------------------------|-----------------------------------|-------------|
| 1    | Slice Registers         | 77,646/301,440                    | 25%         |
| 2    | Slice LUTs              | 92,109/150,720                    | 61%         |
| 3    | Occupied Slices         | 33,718/37,680                     | 89%         |
| 4    | RAMB36E1/FIFO36E1s      | 92/416                            | 22%         |
| 5    | DSP48E1s                | 28/768                            | 3.6%        |
| 6    | Bonded IOBs             | 183/600                           | 30%         |

(a)

| S.No | Slice Logic Utilization | ZED Board (Used/Available) | Utilization |
|------|-------------------------|----------------------------|-------------|
| 1    | Slice Registers         | 65,374/301,440             | 21%         |
| 2    | Slice LUTs              | 70,134/150,720             | 46%         |
| 3    | Occupied Slices         | 32,528/37,680              | 86%         |
| 4    | RAMB36E1/FIFO36E1s      | 78/416                     | 18%         |
| 5    | DSP48E1s                | 14/768                     | 1.8%        |
| 6    | Bonded IOBs             | 164/600                    | 27%         |

(b)

**Fig.9**: (a) Synthesis results for the Artix-7 series Nexys4 DDR3 FPGA; (b) synthesis results using the ZED board SoC.
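The utilization column can be recomputed directly from the used/available counts reported above. The short sketch below does this for the Artix-7 figures; the dictionary and function names are illustrative. For example, 28 of 768 DSP48E1 blocks works out to roughly 3.6%.

```python
def utilization_percent(used, available):
    """Resource utilization as a percentage of the available count."""
    return 100.0 * used / available

# Used/available counts as reported in the synthesis results for the Artix-7 board.
ARTIX7_COUNTS = {
    "Slice Registers": (77646, 301440),
    "Slice LUTs": (92109, 150720),
    "Occupied Slices": (33718, 37680),
    "RAMB36E1/FIFO36E1s": (92, 416),
    "DSP48E1s": (28, 768),
    "Bonded IOBs": (183, 600),
}

if __name__ == "__main__":
    for name, (used, available) in ARTIX7_COUNTS.items():
        print(f"{name}: {utilization_percent(used, available):.1f}%")
```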

# 8. Conclusion

In this paper, a verification and implementation system for an H.264/AVC encoder on an FPGA platform is proposed, simulated and synthesized with the Xilinx Vivado 2014.2 Design Suite for the Artix-7 series Nexys4 DDR FPGA hardware, and compared with the Zynq (ZED board) platform. Comparing the results for the two Xilinx FPGA families shows that the ZED board implementation is more optimized, using fewer resources than the Artix-7 implementation at a fixed clock frequency of 100 MHz, which also satisfies real-time encoding for HDTV applications.

### References

- [1] Wang, Y., Wang, Y.: China's IC Industry Development from the country of consumption to the power industry, p. 241. Science Press, Beijing (2008)
- [2] Huang, W., Wang, X., et al.: Implementation of high-speed verification platform based on emulator for reDSP & reMAP. In: IEEE 8th International Conference on AISC, pp. 682–685 (2009)
- [3] Wiegand, T., Sullivan, G.J., et al.: Overview of the H264/AVC Video Coding Standard. IEEE Trans. Circuits Syst. Video Technol. 13(7), 560–576 (2003)
- [4] Puri, A., Chen, X., et al.: Video Coding Using the H.264/MPEG-4 AVC Compression Standard. Signal Processing: Image Communication 19, 793–849 (2004)
- [5] Xilinx Artix-7 FPGA Memory Interface Solutions User Guide V3.91, http://www.xilinx.com/support/documentation/ip\_documentation/mig/v3\_91/ug406.pdf
- [6] Babionitakis, K., Lentaris, G., et al.: An Efficient H.264 VLSI Advanced Video Encoder. In: 13th IEEE International Conference on Electronics, Circuits and Systems, pp. 545–548 (2006)
- [7] Chen, T.C., Chien, S.Y., et al.: Analysis and architecture design of an HDTV720p 30frames/s H. 264/AVC encoder. IEEE Trans. Circuits Syst. Video Technol. 16(6), 673–688 (2006)
- [8] Wang, Y., Wang, Y.: China's IC Industry Development from the country of consumption to the power industry, p. 241. Science Press, Beijing (2008)
- [9] Huang, W., Wang, X., et al.: Implementation of high-speed verification platform based on emulator for reDSP & reMAP. In: IEEE 8th International Conference on AISC, pp. 682–685 (2009)
- [10] Wiegand, T., Sullivan, G.J., et al.: Overview of the H264/AVC Video Coding Standard. IEEE Trans. Circuits Syst. Video Technol. 13(7), 560–576 (2003)
- [11] Puri, A., Chen, X., et al.: Video Coding Using the H.264/MPEG-4 AVC Compression Standard. Signal Processing: Image Communication 19, 793–849 (2004)
- [12] DN-DualV6-PCIe-4 User Manual, http://www.dinigroup.com/new/DN-DualV6-PCIe-4.php
- [13] Xilinx Virtex-6 FPGA Memory Interface Solutions User Guide V3.91, http://www.xilinx.com/support/documentation/ip\_documentation/mig/v3\_91/ug406.pdf
- [14] Babionitakis, K., Lentaris, G., et al.: An Efficient H.264 VLSI Advanced Video Encoder. In: 13th IEEE International Conference on Electronics, Circuits and Systems, pp. 545–548 (2006)
- [15] Chen, T.C., Chien, S.Y., et al.: Analysis and architecture design of an HDTV720p 30frames/s H. 264/AVC encoder. IEEE Trans. Circuits Syst. Video Technol. 16(6), 673–688 (2006)