

# DESIGN OF 16 BIT RISC PROCESSOR FOR ARITHMETIC AND LOGICAL OPERATIONS IN XILINX VIVADO

## K.G.Venkata Krishna<sup>1</sup>, N.K.KamalaDevi<sup>2</sup>, K.Vishnu Varma<sup>3</sup>, A.Yedukondalu<sup>4</sup>

 <sup>1</sup> Assistant Professor, Department of Electronics and Communication Engineering, Krishna University College of Engineering and Technology Krishna University, Machilipatnam Andhra Pradesh, India.
 <sup>2</sup>UG Student, Dept. of ECE, Krishna University College of Eng. &Tech, Machilipatnam, A.P, India.
 <sup>3</sup>UG Student, Dept. of ECE, Krishna University College of Eng. &Tech, Machilipatnam, A.P, India.
 <sup>4</sup>UG Student, Dept. of ECE, Krishna University College of Eng. &Tech, Machilipatnam, A.P, India.


Abstract - A RISC (Reduced Instruction Set Computer) design favors a compact, straightforward set of instructions that all execute in the same amount of time. This processor economizes on functional units without compromising performance. The design uses a Harvard architecture, with separate data and instruction memories. Each instruction word is 24 bits long. The CPU supports sixteen instructions and three addressing modes. It contains sixteen general-purpose registers, each of which stores 16 bits of data. The processor performs 11 arithmetic and logical operations. Every module is developed and tested separately at each stage of implementation before being integrated into the top-level module. Design entry and synthesis are carried out in Xilinx Vivado 2023.1, and the simulation results are verified using the same tool.

#### Key Words: RISC, 16-bit, VLSI, Verilog

#### **1. INTRODUCTION**

When the performance of CISC processors fell short of expectations and controller design grew more challenging, designers started to consider alternative approaches. It was found that speed is lost whenever the CPU interfaces with memory. Reducing the complexity of the instruction set was seen as the way to improve CPI (cycles per instruction). The simplification concerns the design more than the functionality. As a result, in a typical RISC architecture only a few instructions, usually just load and store, need to access memory. Pipelining increased performance further. A few additional registers can provide a new level of performance by lowering CPI and increasing throughput. Consequently, an instruction can be executed in one clock cycle. It is a common misconception that the term "Reduced Instruction Set Computer" simply means that instructions were removed to create a smaller instruction set. In fact, RISC instruction sets have grown over time, and several of them now include more instructions than many CISC CPUs.

#### 2. LITERATURE REVIEW

#### 2.1 S. Lad and V. S. Bendre, "Design and Comparison of Multiplier using Vedic Sutras," 2019 5th International Conference On Computing, Communication, Control And Automation (ICCUBEA), Pune, India, 2019, pp. 1-5

Fast processing units are necessary for many real-time applications in the modern computerized era. The basic building blocks of these units are the ALU and the MAC, which are necessary for quick and effective execution. Multipliers are the primary component of digital signal processors. ALU and MAC performance can be improved by modifying the registers, multiplier, and adder so that correctness is retained while execution speeds up. Because delay constraints keep tightening, the design of faster multipliers is a priority for processor implementations.

#### 2.2 Balpande Vishwas V, Abhishek B. Pande, Meeta J. Walke, Bhavna D. Choudhari and Kiran R. Bagade. "Design and Implementation of 16 Bit Processor on FPGA." (2015).

This project involves the design of a 16-bit RISC processor and the Verilog HDL modeling of its constituent parts. The processor is based on the Harvard architecture. The extreme simplicity of the instruction set indicates the kind of hardware needed to execute it correctly. In addition to sequential and combinational building blocks such as adders and registers, more sophisticated blocks such as an ALU and memory have been built and simulated. The project completes a comprehensive structural model of the ALU, starting from half adders. Finally, a semicustom layout was created for the ALU only.

#### 2.3 Seung Pyo Jung, Jingzhe Xu, Donghoon Lee, Ju Sung Park, Kang-joo Kim and Koon-shik Cho, "Design & verification of 16 bit RISC processor," 2008 International SoC Design Conference, Busan, 2008, pp. III-13-III-14

This article presents the design and verification of a 16-bit RISC processor. The proposed processor has a Harvard architecture, internal debug logic, a 5-stage instruction pipeline, and a 24-bit address. Implemented on an FPGA, the processor successfully executes the SOLA algorithm and an ADPCM vocoder. Personal digital assistants (PDAs) and portable multimedia players (PMPs) have become commonplace devices. SoC-level ASICs (Application Specific Integrated Circuits) are therefore used to create compact, low-power processors. The 8051 and ARM7 processors are the most widely used SoC-level ASIC processors.

#### 2.4 Chandni N. Naik, Vaishnavi M. Velvani, Pooja J. Patel, Khushbu G. Parekh, "VLSI Based 16 Bit ALU with Interfacing Circuit", International Journal of Innovative and Emerging Research in Engineering Volume 2, Issue 3, 2015.

This project uses VHDL to construct a 16-bit ALU that is interfaced with ROM and RAM. The ALU is one of the most crucial CPU modules, since it is involved in the execution of most instructions. Improving ALU operation is therefore an important task. The ALU is created first, RAM and ROM are then interfaced with it, and the design is implemented with Xilinx tools. Waveforms for each result are displayed in the Xilinx software. The project aims to speed up the CPU.

#### 2.5 Pushpalata Verma, K. K. Mehta, "Implementation of an Efficient Multiplier based on Vedic Mathematics Using EDA Tool", International Journal of Engineering and Advance Technology, Vol.1, no. 5, June, 2012

This project builds an optimized, area-efficient multiplier. Because of the growing level of integration, many significant signal processing systems are designed on VLSI platforms. Signal processing systems and applications need a lot of processing power, which means they consume a lot of energy. Performance and area are two crucial design factors for VLSI systems. In general, the performance of the multiplier determines the performance of the system.

## **3. PROPOSED SYSTEM**

The processor's job is to efficiently carry out every instruction it is given in machine language. The Arithmetic and Logic Unit (ALU) is a combinational circuit. It is designed to perform different operations on integer operands, as selected by the instruction. The instruction (machine word) that the ALU receives from the processor consists of an operation code (opcode) plus a few operands.
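As a rough illustration of how an opcode and operands can share one 24-bit instruction word, the sketch below splits a word into fields. The field layout (4-bit opcode, 4-bit destination register, 16-bit operand) and the names `Instruction`, `decode`, and `encode` are assumptions made for this example; the paper does not specify its encoding.

```python
from typing import NamedTuple

WORD_BITS = 24  # instruction word width stated in the abstract


class Instruction(NamedTuple):
    opcode: int   # 4 bits: enough to select one of sixteen instructions
    rd: int       # 4 bits: enough to select one of sixteen registers
    operand: int  # 16 bits: immediate value or address (assumed layout)


def decode(word: int) -> Instruction:
    """Split a 24-bit instruction word into its assumed fields."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("instruction word must fit in 24 bits")
    opcode = (word >> 20) & 0xF
    rd = (word >> 16) & 0xF
    operand = word & 0xFFFF
    return Instruction(opcode, rd, operand)


def encode(opcode: int, rd: int, operand: int) -> int:
    """Inverse of decode, useful for building small test programs."""
    return ((opcode & 0xF) << 20) | ((rd & 0xF) << 16) | (operand & 0xFFFF)
```

With this assumed layout, `decode(encode(3, 10, 0x1234))` returns the same three field values that were packed.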

Once the opcode tells the ALU which operation to carry out, the operands are used in that operation. The Register Bank is a small collection of data storage locations. The ALU stores the result of the operation in an accumulator, from which it is written to a storage register, and it reports the outcome of the operation through status bits held in a status register, such as the Z-flag.

The processor's job is to run programs and to operate efficiently on the data kept in memory. A set of instructions is all that a processor needs to carry out a task. The instruction to be carried out is held in the control unit. The address register, data register, and instruction register (IR) are among the registers found in a CPU. The CPU's function is to fetch, decode, and execute instructions from memory with the help of these registers. The IR's task includes decoding the opcode, identifying the instruction, determining which operands are in memory, fetching those operands, and directing the processor to carry out the instruction. This is accomplished with the aid of the control unit, which produces the timing signals needed to coordinate the processing components involved in executing the instruction.
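The interaction between opcode, operands, result, and status flag can be sketched as a small behavioral model. This is not the authors' Verilog: the opcode values in `OPS` are hypothetical, only six operations are shown, and the Z-flag is assumed to follow the conventional meaning of "result is zero".

```python
MASK16 = 0xFFFF  # registers and ALU results are 16 bits wide

# Hypothetical opcode assignments; the paper does not list its encodings.
OPS = {
    0x0: lambda a, b: a + b,  # ADD
    0x1: lambda a, b: a - b,  # SUB
    0x2: lambda a, b: a & b,  # AND
    0x3: lambda a, b: a | b,  # OR
    0x4: lambda a, b: a ^ b,  # XOR
    0x5: lambda a, b: ~a,     # NOT (second operand ignored)
}


def alu(opcode: int, a: int, b: int) -> tuple[int, bool]:
    """Return (16-bit result, zero flag) for the selected operation."""
    if opcode not in OPS:
        raise ValueError(f"unsupported opcode {opcode:#x}")
    # Truncate to 16 bits, as a 16-bit datapath would.
    result = OPS[opcode](a & MASK16, b & MASK16) & MASK16
    z_flag = result == 0  # assumed meaning: set when the result is zero
    return result, z_flag
```

Because the model is purely combinational, the same inputs always yield the same result and flag, which makes it convenient as a reference when checking simulation waveforms.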







Fig -2: Control unit

The control unit is the part of a computer's central processing unit (CPU) that directs how the processor operates. It was introduced as a component of the von Neumann architecture proposed by John von Neumann. The control unit is in charge of instructing the computer's memory, input and output devices, and arithmetic/logic unit on how to respond to the instructions given to the processor.





Fig -3: Block diagram of Register file

A basic register file consists of a decoder and a collection of registers. Writing to it requires an address and a data input. This simple register file is insufficient for a modern processor architecture, however, because there are cycles in which no register should be written, so a write-enable control is needed. In addition, in a single cycle we usually want to read two values and write back one value simultaneously.
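A minimal behavioral sketch of such a register file, with two read ports, one write port, and a write enable, is given below. The class and method names are illustrative assumptions; the sixteen 16-bit registers follow the figures given in the abstract.

```python
class RegisterFile:
    """Sixteen 16-bit registers with two read ports and one write port."""

    def __init__(self, num_regs: int = 16, width: int = 16) -> None:
        self._mask = (1 << width) - 1
        self._regs = [0] * num_regs

    def read(self, addr_a: int, addr_b: int) -> tuple[int, int]:
        """Read two registers in the same cycle."""
        return self._regs[addr_a], self._regs[addr_b]

    def clock(self, write_enable: bool, addr_w: int, data: int) -> None:
        """Model one rising clock edge; write only when enabled."""
        if write_enable:
            self._regs[addr_w] = data & self._mask
```

Calling `clock` with `write_enable=False` leaves every register unchanged, which is the behavior the plain decoder-plus-registers design lacks.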



Fig -4: Arithmetic and logic Signals

## **3. VERILOG HDL AND XILINX VIVADO**

## **3.1 FLOW OF VLSI DESIGN**

The formal specification of a VLSI chip is the first step in the VLSI design cycle, which proceeds through several stages to generate a packaged chip.





#### 4. RESULTS



Fig -6: Schematic View

The following 16-bit multiplier simulation result shows the output of a Vedic multiplier that uses the Urdhva Tiryagbhyam sutra. The MAC operations are carried out on two inputs, a = 252 and b = 846, producing the output c = 213192.
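The "vertically and crosswise" idea behind this sutra can be sketched at bit level: each result column collects all cross products whose bit positions add up to that column, plus the carry from the previous column. The function below is a software model written for illustration, with a hypothetical name, and is not the authors' hardware structure.

```python
def urdhva_multiply(a: int, b: int, width: int = 16) -> int:
    """Multiply two unsigned integers column by column (vertically and crosswise)."""
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in the given width")
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]

    product = 0
    carry = 0
    # Column k collects every partial product a_i * b_j with i + j == k.
    for k in range(2 * width - 1):
        lo = max(0, k - width + 1)
        hi = min(k, width - 1)
        column = carry + sum(a_bits[i] * b_bits[k - i] for i in range(lo, hi + 1))
        product |= (column & 1) << k  # low bit of the column is result bit k
        carry = column >> 1           # the rest carries into the next column
    product |= carry << (2 * width - 1)
    return product
```

For the operands used in the simulation, `urdhva_multiply(252, 846)` gives 213192, matching the reported output.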

Compared to other multipliers, the gate delay and area grow relatively slowly as the number of bits increases. As a result, the processor is efficient in terms of speed, power, and timing.

*[Screenshot: Vivado utilization report, hierarchy view. Resource columns include Slice LUTs, Slice Registers, LUT as Logic, DSPs, Bonded IOB, and BUFGCTRL. The legible top-level row for the processor reads 570 slice LUTs and 160 slice registers; the remaining entries are not legible.]*

Fig -6: System Utilization

Transcript window results for ALU are displayed in the following figure, which includes all matched conditions. The processor module's integrated multiplier operation is executed. The data from the transcript is compared and the multiplication process is confirmed.

*[Screenshot: Vivado timing report, unconstrained setup paths. Ten paths are listed, most of them running from `v6/R1_reg[7]/C` to bits of `v6/aluout_reg`. The breakdown into logic and net delay is not reliably legible; the total delays are listed below.]*

| Path | Total delay (ns) |
|------|------------------|
| 1    | 38.237           |
| 2    | 36.797           |
| 3    | 33.810           |
| 4    | 32.095           |
| 5    | 29.781           |
| 6    | 27.200           |
| 7    | 24.566           |
| 8    | 22.765           |
| 9    | 20.560           |
| 10   | 18.012           |





Power analysis from implemented netlist. Activity derived from constraints files, simulation files or vectorless analysis.

| Summary item                       | Value            |
|------------------------------------|------------------|
| Total On-Chip Power                | 19.202 W         |
| Design Power Budget                | Not Specified    |
| Process                            | typical          |
| Power Budget Margin                | N/A              |
| Junction Temperature               | 60.9 °C          |
| Thermal Margin                     | 24.1 °C (12.7 W) |
| Ambient Temperature                | 25.0 °C          |
| Effective θJA                      | 1.9 °C/W         |
| Power supplied to off-chip devices | 0 W              |
| Confidence level                   | Low              |

| Utilization details    | Power    |
|------------------------|----------|
| Hierarchical           | 18.947 W |
| Signals                | 4.723 W  |
| Signals: Data          | 4.692 W  |
| Signals: Clock Enable  | 0.03 W   |
| Signals: Set/Reset     | 0 W      |
| Logic                  | 4.18 W   |
| DSP                    | 0.979 W  |
| I/O                    | 9.066 W  |

Fig -8: Power Report



Fig -9: Simulation Results

Verification is the process of demonstrating that a design is functionally correct, that is, of making sure that the logic design complies with the specified requirements.

Verification relies on the test bench, whose goal is to determine whether the design under test (DUT) behaves correctly. The Vedic multiplier design is integrated with the 16-bit processing module, and the output is confirmed for the specified instruction set. The verification results for the whole processing unit with the ALU integrated are shown.
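The role of a test bench can be sketched in software as a loop that drives the same stimulus into the DUT and into a trusted reference, then records any disagreement. The helper names `run_testbench` and `make_vectors` are hypothetical, and the sketch stands in for an HDL test bench rather than reproducing the one used by the authors.

```python
import random
from typing import Callable, Iterable

BinaryOp = Callable[[int, int], int]


def run_testbench(dut: BinaryOp, reference: BinaryOp,
                  vectors: Iterable[tuple[int, int]]) -> list[tuple[int, int, int, int]]:
    """Apply each vector to the DUT and collect (a, b, got, expected) mismatches."""
    failures = []
    for a, b in vectors:
        got, expected = dut(a, b), reference(a, b)
        if got != expected:
            failures.append((a, b, got, expected))
    return failures


def make_vectors(count: int, width: int = 16, seed: int = 0) -> list[tuple[int, int]]:
    """Corner cases plus reproducible random operands."""
    rng = random.Random(seed)
    top = (1 << width) - 1
    corners = [(0, 0), (0, top), (top, 0), (top, top), (252, 846)]
    randoms = [(rng.randint(0, top), rng.randint(0, top)) for _ in range(count)]
    return corners + randoms
```

An empty list from `run_testbench` means the DUT agreed with the reference on every vector. It does not prove correctness for inputs that were never applied, which is why corner cases are included alongside random operands.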

# **5. CONCLUSIONS**

The 16-bit RISC processor is built with a minimum of functional units. The design is based on the Harvard architecture. Design entry and synthesis are carried out with the Xilinx Vivado 2023.1 Design Suite. According to the synthesis report, the design can achieve a minimum clock period of 14.95 nanoseconds. The same tool is used for simulation. Functional correctness is assessed by comparing the simulation output with the expected results. The design can be improved in several ways: further elements could be added to build a more advanced design, and the number of instructions the CPU can process could be increased.
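Taking the reported minimum clock period at face value, the corresponding maximum clock frequency follows directly. This is a worked conversion added for illustration, not a figure taken from the synthesis report:

$$
f_{\max} = \frac{1}{T_{\min}} = \frac{1}{14.95\ \text{ns}} \approx 66.9\ \text{MHz}
$$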

# REFERENCES

- [1] S. Lad and V. S. Bendre, "Design and Comparison of Multiplier using Vedic Sutras," 2019 5th International Conference On Computing, Communication, Control And Automation (ICCUBEA), Pune, India, 2019, pp. 1-5
- [2] Balpande Vishwas V, Abhishek B. Pande, Meeta J. Walke, Bhavna D. Choudhari and Kiran R. Bagade. "Design and Implementation of 16 Bit Processor on FPGA." (2015).
- [3] F. Adamec and T. Fryza, "Design Time configurable processor basic structure," 13th IEEE Symposium on Design and Diagnostics of Electronic Circuits and Systems, Vienna, 2010, pp. 119-120, doi: 10.1109/DDECS.2010.5491804.
- [4] Mr. Nishant G. Deshpande, Prof. Rashmi Mahajan, "Ancient Indian Vedic Mathematics based Multiplier Design for High Speed and Low Power Processor", IJAREEIE, Pune, 2014

# BIOGRAPHIES



K.G.VENKATA KRISHNA,

Assistant Professor, Krishna University College of Engineering and Technology, Krishna University, Machilipatnam, A.P, India.



N.K.KAMALA DEVI, Student of Department of Electronics and Communication Engineering, Krishna University, Machilipatnam, A.P. India.



K.VISHNU VARMA, Student of Department of Electronics and Communication Engineering, Krishna University, Machilipatnam, A.P, India.





A.YEDUKONDALU, Student of Department of Electronics and Communication Engineering, Krishna University, Machilipatnam, A.P, India.