Design & Simulation Of 64-Bit Hybrid Processor Instruction Set Using Verilog

As part of my ongoing research on implementing a multi-core hybrid processor on FPGA, I have developed data flow designs for 20 of the most widely used processor instructions. I produced the digital design, wrote the code in Verilog HDL and simulated all 20 instructions using Xilinx ISE 14.5. The data flow designs, symbolic representations and simulation results are explained in detail in this technical paper. This is a partial implementation of the hybrid processor; the implementation of the other sub-modules on Xilinx FPGA will be published in my subsequent technical paper.


Introduction
As discussed in my previous paper [7], the FPGA is emerging as a rapid prototyping platform because of the several advantages listed there. Many researchers have undertaken partial implementations of RISC processors on FPGA. Processor design can follow any one of the following approaches.
• Architecture based (top-down approach): The basic core architecture is kept as the reference and the instructions are derived from it. The better the architecture design, the more powerful the instructions.
• Instruction based (bottom-up approach): The most used instructions of popular processors are listed and implemented one by one. Putting these instruction implementations together lets the architecture emerge at the top level, so incremental growth of the instruction set leads to a mega processor architecture.
• State machine approach: Each instruction is treated as a fixed state machine, so different instructions lead to different fixed state machines. The processor states can be, for example, FETCH, DECODE, EXECUTE, READ, WRITE, INTERRUPT, SEND, RECEIVE, ROTATE, IN and OUT.
• Fixed-function special processors: This approach is used to design custom processors for very specific, dedicated applications. Examples are video processors, audio processors, mobile processors, data acquisition, and instrumentation and measurement. All these applications use processors only for high-speed data processing with dedicated fixed functions or instructions.
All general-purpose processors have several feature sets that are rarely used: only about 10% of the instructions are used 90% of the time. This leads to considerable overhead in logic, features, architecture, area, cost and delay, which degrades the performance of the overall processor application. Hence there is a strong need for application-specific, customized processors that provide high performance and fixed functionality for specific applications or fixed tasks. Present-day processors also need flexibility, that is, the ability to add or remove features, instructions or functionality, together with a plug-and-play or modular architecture. My proposed "Customized and Scalable Hybrid Processor design" is an innovative fifth approach.
The authors of [6] have proposed an 8-bit single-cycle processor with a 10-bit address bus and a four-stage pipelined data flow (instruction fetch, instruction decode and operand fetch, execute, and write back). They present simulation results for addition, subtraction and multiplication with RTL schematics. Saraswthi P, M K Chandrasen et al. [7] report implementing a 32-bit CISC CPU architecture on FPGA with an arithmetic logic unit, accumulator, 32-bit memory unit, 32-bit MUX, instruction register and program counter, and present simulation screenshots. Wojciech Wójcik, Jacek Długopolski et al. [8] have attempted to implement a multi-core processor on FPGA using parallel processing characteristics. They have also examined how the number of parallel processors affects the overall speed of processor operations, and characterized problem size versus processor efficiency. Vijay R. Wadhankar, Vaishali Tehre et al. [9] have attempted to implement a RISC processor on FPGA and suggested a new architecture and a specific design for the instruction and control unit. They show simulation results of the control unit for memory read and write operations.

A. Key findings of survey
After going through several technical papers, my observation is that there is no clear-cut answer on whether a fixed architecture leads to a powerful instruction set implementation (top-down approach) or a set of powerful and useful instructions leads to an open-ended architecture (bottom-up approach). This is a major dilemma for processor system designers. My exhaustive survey of various types of processors, covering their functionality, feature set, instruction set, interrupts, associated special features, design approach and implementation platforms, is reported in my three published survey papers [10] [11] [12]. Among the processor implementation attempts discussed above, I did not come across any complete processor architecture that supports a contemporary instruction set implemented on FPGA using a popular HDL (Verilog/VHDL). Also, none of the related research works explains the processor design implementation on FPGA at the micro level, the data flow level or the instruction implementation level. Hence there is a strong need to attempt the design, simulation, implementation and prototype testing of a scalable general-purpose processor architecture that supports the required instruction set on FPGA.

Proposed Methodology
Both RISC and CISC processor architectures have their own merits and demerits, and neither a standalone RISC nor a standalone CISC processor offers a complete solution to present-day computational needs; hence there is a strong need for a hybrid processor. Our proposed architecture will utilize the best features of both RISC and CISC. The table also indicates the operation of each instruction, along with the respective opcodes and instruction decoder outputs. As indicated in Figure 1, for each instruction the respective opcode is applied to the instruction decoder. Depending on the opcode, the instruction decoder enables only 1 of its 32 outputs, which in turn enables the dataflow logic of the respective instruction in hardware.
Instruction Decoder: The above figure indicates an instruction decoder indigenously designed to handle 2^5 = 32 instructions. Based on the 5-bit command or opcode, exactly one of the 32 decoder outputs is enabled, i.e., for each opcode applied as a command to the instruction decoder, one particular output goes high (enabled) and the remaining 31 outputs are held low (disabled). The whole scheme works as per the command and enable assignments in the table.
The single enable for each command in turn enables the logic required to execute the corresponding instruction, as per assignment Table I. The instruction decoder design is fully scalable and can support hundreds of instructions by increasing the number of command bits: an n-bit command gives 2^n decoder outputs. The number of instructions equals the number of decoder outputs.
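
The one-hot behaviour of the decoder can be illustrated with a small behavioural model. The sketch below is written in Python rather than Verilog and is not the design simulated in this paper; the function name `decode_opcode` and the parameter `opcode_bits` are assumptions introduced only for illustration, and no opcode-to-instruction assignment from Table I is implied.

```python
def decode_opcode(opcode: int, opcode_bits: int = 5) -> list[int]:
    """Return a one-hot list of enable lines for the given opcode.

    Hypothetical behavioural model: index i of the result stands for
    decoder output i, which is 1 only when opcode == i.
    """
    n_outputs = 1 << opcode_bits  # 2**5 = 32 outputs for a 5-bit opcode
    if not 0 <= opcode < n_outputs:
        raise ValueError(f"opcode {opcode} does not fit in {opcode_bits} bits")
    return [1 if i == opcode else 0 for i in range(n_outputs)]


enables = decode_opcode(0b00011)
print(sum(enables), enables.index(1))  # one line high, at position 3
```

Calling `decode_opcode(opcode, opcode_bits=6)` yields 64 enable lines, which mirrors the scaling property: each extra command bit doubles the number of decoder outputs.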

AND ACC, RegA (accumulator ← accumulator AND regA):
• As shown in the waveform, between 0 and 300 ns, reset is applied to command, datain1

OR ACC, RegA (accumulator ← accumulator | regA):
• As shown in the figure, between 0 and 300 ns, reset is applied to command, datain1
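
As a reference when reading the AND and OR waveforms, the two accumulator operations can be modelled in a few lines. This is a minimal behavioural sketch in Python, not the Verilog data flow design; the names `acc_and`, `acc_or` and `MASK64` are assumptions made for illustration, and reset behaviour is deliberately left out because it is not fully specified here.

```python
MASK64 = (1 << 64) - 1  # keep results within the 64-bit data width


def acc_and(acc: int, reg_a: int) -> int:
    """AND ACC, RegA: new accumulator = accumulator AND regA."""
    return (acc & reg_a) & MASK64


def acc_or(acc: int, reg_a: int) -> int:
    """OR ACC, RegA: new accumulator = accumulator OR regA."""
    return (acc | reg_a) & MASK64


# Illustrative operand values only; they are not taken from the simulation.
acc = 0xFFFF_0000_FFFF_0000
reg_a = 0x0F0F_0F0F_0F0F_0F0F
print(hex(acc_and(acc, reg_a)))
print(hex(acc_or(acc, reg_a)))
```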

ADDC ACC, RegA (accumulator ← accumulator + regA + carry):
The figure below shows the simulation waveform of the add-with-carry instruction.
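
The arithmetic of add-with-carry on a 64-bit accumulator can be checked against a short behavioural model. The Python sketch below is illustrative only and is not the Verilog implementation; the function name `addc` and the returned carry-out value are assumptions, since the text specifies only accumulator ← accumulator + regA + carry.

```python
MASK64 = (1 << 64) - 1  # 64-bit data width


def addc(acc: int, reg_a: int, carry_in: int) -> tuple[int, int]:
    """ADDC ACC, RegA: return (new accumulator, carry out).

    The carry out is assumed to be bit 64 of the full-width sum.
    """
    total = (acc & MASK64) + (reg_a & MASK64) + (carry_in & 1)
    return total & MASK64, total >> 64


# All-ones accumulator plus carry-in wraps to zero and sets the carry out.
result, carry_out = addc(MASK64, 0, 1)
print(hex(result), carry_out)  # 0x0 1
```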

Conclusion
There are several popular approaches to designing and developing a contemporary processor with several useful instructions. In this paper, I have developed a fully scalable and open-ended processor architecture for basic popular instructions that can be extended into a complex next-generation multi-core processor. My approach to processor design is instruction leading to architecture. I have taken a subset of 20 popular instructions, along with a newly developed command-driven instruction decoder to fetch and execute each instruction. As part of my experimentation with implementing the hybrid processor on FPGA, I have achieved further results on interfacing hardware modules with my processor core using special interrupt-driven instructions (additional instructions and hardware interfacing). Those results will be published in my next research paper, as part of my ongoing research on implementation on the proposed Xilinx FPGA target device.