# Design of 4x4 NoC Architecture using Random Arbiter with Performance Improvement

Dr. N M Mahesh Gowda<sup>1</sup>, Dr. R Manjunatha<sup>2</sup>, Kumar N Krishnamurthy<sup>3</sup>

<sup>1,2</sup> Associate Professor, Dept. of ECE, P.E.S College of Engineering, Mandya-570401, India

<sup>3</sup>Assistant Professor, Dept. of ECE, P.E.S College of Engineering, Mandya-571401, India

## ABSTRACT

Network-on-Chip (NoC) is a new approach to designing the communication subsystem among IP cores in a System-on-Chip (SoC). NoC applies networking theory and related methods to on-chip communication and offers notable improvements over conventional bus and crossbar interconnections, particularly for the scalability, productivity, power efficiency and signal integrity challenges of complex SoC design. In a NoC, communication among nodes is achieved by routing packets through a pre-designed network according to a routing algorithm. The architecture and its routing algorithm therefore play an important role in the overall performance of a NoC. The technique presently used in nodes is priority-based packet routing, which leads to packet stacking, which in turn degrades performance. This paper proposes a modified random arbiter combined with the deterministic XY routing algorithm for use in the NoC router. In this method the router contains a random arbiter along with a priority encoder, which gives a fast way to transfer packets along a specific path between the nodes of the network without stacking. This in turn reduces the packet storage area and avoids collisions, because the node arbiter services packets randomly without any priority. In addition, the method ensures that a packet always reaches its destination through the shortest possible path without deadlock or livelock. The 4x4 NoC router is designed using Xilinx ISE 14.7 and simulated using ModelSim 6.3f. The architecture is implemented on an FPGA kit to evaluate performance and area optimization.

Keywords: Arbiter, NoC, SoC, Deterministic Routing, round robin Arbiter

## I. INTRODUCTION

As technology improves and the number of Intellectual Property (IP) modules in Systems-on-Chip (SoCs) increases, bus-based interconnection architectures may prevent these systems from meeting the performance required by many applications. For systems with intensive parallel communication requirements, buses may not provide the required bandwidth, latency, and power consumption. A solution to this communication bottleneck is the use of an embedded switching network, called Network-on-Chip (NoC), to interconnect the IP modules in SoCs.

A network-on-chip is composed of three main building blocks. The first and most important is the set of links that physically connect the nodes and actually implement the communication. The second block is the router, which implements the communication protocol (the decentralized logic behind the communication protocol). The router receives packets from the shared links and, according to the address carried in each packet, forwards the packet to the core attached to it or to another shared link. The last building block is the network adapter (NA) or network interface (NI). This block makes the logical connection between the IP cores and the network, since each IP may have a distinct interface protocol with respect to the network.

The NoC uses different routing algorithms and flow control techniques to overcome the networking problems faced in previous on-chip network architectures. These problems leave room for on-chip network performance optimization; handling the merging of traffic as more packets enter the network is one of the challenges.

The NoC router provides a high-speed and cost-effective network. If many packets arrive from different inputs and compete for the same output at the same instant, deadlock and livelock situations arise. Such situations can be overcome with the help of an intelligent arbiter in the router, which plays an important role in improving the performance of the NoC.

NoC architecture is preferred over traditional bus-based SoC because of its performance, reusability and scalability. Packet congestion [3] on the network can be solved by proper design of the arbiter on the NoC. The arbiter generates the grants with high priority and without any deadlock or livelock. For this reason, analysis of arbiter performance is meaningful in network-on-chip design.

Many arbitration techniques are presently used in NoC routers, notably round robin and packet priority assignment. Owing to the scheduling nature of the round robin arbiter, if a packet is not fully processed within its scheduled time, the arbiter stops processing that packet midway and moves on to the next incoming packet. The previous packet is then left incomplete, which may lead to livelock on the network, while the packet priority technique faces the deadlock problem. This paper addresses these problems: the proposed method is a random arbitration technique combined with the deterministic x-y routing algorithm for use in the NoC router [4].

In this paper, an area-optimized 4x4 NoC architecture using a random arbiter, free from deadlock and livelock, is designed and implemented on FPGA. The shortfalls of the different existing methods and of the present technology are also investigated.

## II. RELATED WORK

NoC uses different routing algorithms and flow control techniques to overcome the networking problems faced in previous on-chip network architectures. These problems leave room for on-chip network performance optimization [1] as more packets merge into the network traffic. Packet congestion [2] on the network can be solved by proper design of the arbiter on the NoC. The arbiter generates the grants with high priority and without any deadlock or livelock. For this reason, analysis of arbiter performance is meaningful in network-on-chip design.

The router plays an important role in a NoC. It decides the path a packet travels from source to destination and makes the service decisions on the network. The control logic in the router is responsible for channel arbitration and for making the routing decision [3]. According to the grants issued by the control logic, input packets move towards the next router through a crossbar switch; this process continues until the packet reaches its destination. The conceptual diagram is shown in Figure-1.



Figure-1. Block Diagram of existing NoC Router

The arbiter uses a round robin architecture. It extracts the source and destination addresses from the received packets and generates the grant signal for sending the input data from the source side to the output port. Arbitration of the ports is controlled by the arbiter, which resolves the contention problem [4-5]. It holds the updated status of all the ports and hence knows which ports are communicating with each other and which are free. Packets with the same destination and the same priority for the same output port are scheduled by a round robin arbiter. If more than one input port requests the same output port in a given period of time, the arbiter processes the requests according to input priority. Once the last packet finishes transmission from the router, the arbiter releases the next input packet connected to the crossbar switch. The other waiting packets are likewise served through arbitration.

In round robin arbiter operation, a request that has just been serviced has the lowest priority in the next round of arbitration [6-10]. The arbiter grant signal decides the select line of the multiplexer-based crossbar and the read or write signals of the FIFO buffers. Owing to the scheduling nature of the round robin arbiter, if a packet is not fully processed within its scheduled time, the arbiter stops processing that packet midway and moves on to the next incoming packet. The previous packet is then left incomplete, which may lead to livelock on the network. This problem is addressed in the proposed system.
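The rotating-priority rule described above can be illustrated with a minimal Python sketch. This is not the RTL of any cited design; the function name `round_robin_grant` and the `last_granted` argument are hypothetical and chosen only for illustration. The search starts just after the port served last, so that port ends up with the lowest priority.

```python
def round_robin_grant(requests, last_granted):
    """Return the index of the port to grant, or None if nothing is requested.

    requests: list of booleans, one per input port.
    last_granted: index of the port served in the previous round (-1 if none).
    """
    n = len(requests)
    # Start searching just after the port served last, so that port is
    # examined last and therefore has the lowest priority this round.
    for offset in range(1, n + 1):
        port = (last_granted + offset) % n
        if requests[port]:
            return port
    return None


# Ports 0, 2 and 3 keep requesting the same output; port 0 was served last.
requests = [True, False, True, True, False]
first = round_robin_grant(requests, last_granted=0)
second = round_robin_grant(requests, last_granted=first)
third = round_robin_grant(requests, last_granted=second)
print(first, second, third)
```

With persistent requests on ports 0, 2 and 3, the grant rotates through the requesting ports in index order.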

## III. PROPOSED DESIGN

### A. Proposed NoC router design with Random Arbiter



Figure-2. Proposed NoC router design with Random Arbiter

The proposed NoC router contains three stages: the priority encoder, the random arbiter and the router.

#### I. Priority Encoder

The priority encoder has five inputs, as shown in Figure-2. These inputs come from the node's own processing element or from the east, south, north or west side of the node. The priority encoder selects one of these inputs according to the select line information, which is generated by the random arbiter. The priority encoder output is connected to the router, which routes the packet to the respective node. The packet may be destined for the same node or for the west, south, north or east neighbouring node. If the packet belongs to the same node, the packet size is 32 bits, because no further x-y direction information is required. Otherwise, the packet size is 48 bits, because the packet carries further x-y routing information for the network.

#### II. Random Arbiter

The random arbiter plays a vital role in the router in deciding which packet to service. Packets from different directions of the network are serviced randomly, without any packet stacking. The proposed arbiter has five independent request inputs, req\_0 to req\_4. It processes these requests and generates the grants GNT\_0 to GNT\_4 (service) for the incoming packets' requests in a random order. The random arbiter generates the grants without any priority, and only one grant is high at a time because only one request can be served at a time. The grant information is used by the priority encoder to select one of the router input packets arriving from different nodes on the network.

The random arbiter selects packets randomly according to the packet density on the network, so as to balance the packet traffic without any priority. This directly avoids stacking of packets on one side, reduces the packet storage area and avoids packet loss. If packet requests arrive from all directions, the arbiter generates a grant for only one of the five requests, which removes ambiguity in packet selection and directly reduces deadlock and livelock situations. In previous work, authors addressed deadlock and livelock with round robin and packet priority assignment techniques, at the cost of delay and packet stacking. In the proposed random arbiter technique, this problem is resolved without added delay or stacking.

In the proposed method there is no packet stacking or loss because packets are serviced alternately. For example, suppose the arbiter has already issued GNT\_0 five times and GNT\_2 twice. The number of grants issued is recorded in the counters GNT\_0\_i and GNT\_2\_i respectively, each of which is incremented by one whenever the corresponding grant is issued. If req\_0 and req\_2 then request service at the same time, the random arbiter asserts GNT\_2, since that port has been serviced only twice whereas the GNT\_0 port has already been serviced five times. This technique shares the grants equally among all network directions. In the proposed system, the servicing sequence changes in every iteration according to the packet density on the network. Figure-3 shows the random arbiter flow chart, and Figure-4 shows the waveform of the random arbiter as implemented in Xilinx.
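The worked example above can be reproduced with a small behavioural sketch in Python. It is only one plausible reading of the text, not the authors' hardware description: the class name `LeastServedRandomArbiter` is hypothetical, and the sketch assumes that the requester with the fewest past grants wins and that ties are broken at random.

```python
import random


class LeastServedRandomArbiter:
    """Grant the requester with the fewest past grants; break ties randomly."""

    def __init__(self, num_ports=5, seed=None):
        # One counter per port, playing the role of GNT_x_i in the text.
        self.grant_counts = [0] * num_ports
        self._rng = random.Random(seed)

    def arbitrate(self, requests):
        """Return a one-hot grant list (at most one entry is 1)."""
        grants = [0] * len(requests)
        active = [i for i, req in enumerate(requests) if req]
        if not active:
            return grants
        fewest = min(self.grant_counts[i] for i in active)
        candidates = [i for i in active if self.grant_counts[i] == fewest]
        winner = self._rng.choice(candidates)  # assumed random tie-break
        self.grant_counts[winner] += 1
        grants[winner] = 1
        return grants


arbiter = LeastServedRandomArbiter(seed=0)
arbiter.grant_counts = [5, 0, 2, 0, 0]       # GNT_0 issued 5 times, GNT_2 twice
grants = arbiter.arbitrate([1, 0, 1, 0, 0])  # req_0 and req_2 compete
print(grants)
```

In this scenario only port 2 has the minimum count among the requesters, so the grant goes to GNT\_2 regardless of the random seed.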



Figure-3. Random Arbiter Flow Chart



Figure-4. Waveform of Random Arbiter

#### III. Router

The router receives packets from the shared links and, according to the address carried in each packet, forwards the packet to the core attached to it or to another shared link. The protocol itself consists of a set of policies defined during the design and implemented within the router to handle common situations during the transmission of a packet, such as two or more packets arriving at the same time or disputing the same channel, avoiding deadlock and livelock situations, reducing the communication latency and increasing the throughput. The design and implementation of a router therefore requires the definition of a set of policies to deal with packet collision, the routing itself, and so on. A NoC router is composed of a number of input ports (connected to shared NoC channels), a number of output ports (connected to possibly other shared channels), a switching matrix connecting the input ports to the output ports, and a local port to access the IP core connected to this router.

Figure-5 shows an example of a router. Herein, we use the terms router and switch as synonyms, but the term switch can also mean the internal switch matrix that actually connects the router inputs to its outputs. In addition to this physical connection infrastructure, the router also contains a logic block that implements the flow control policies (routing, arbiter, etc.) and defines the overall strategy for moving data through the NoC.



Figure-5. An example of router

The packet is formed with 48 bits. The first 32 bits of the packet are assigned to data\_in, the next ten bits, i.e., from 32 to 42, are unused, the next two bits contain the destination address Y, the next two bits contain the destination address X and the last, $48^{\text{th}}$, bit is assigned to the select line input.
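The packet layout can be sketched as bit-field packing in Python. The exact bit positions are an assumption: the text gives the unused field both as ten bits and as bits 32 to 42, so this sketch assumes zero-based bit numbering with data in bits 0-31, bits 32-42 unused, destination Y in bits 43-44, destination X in bits 45-46 and the select bit at position 47, which makes the fields total 48 bits. The function names `pack_packet` and `unpack_packet` are hypothetical.

```python
# Assumed zero-based field positions (see the lead-in for the caveat).
DATA_BITS = 32
Y_SHIFT = 43     # destination Y, 2 bits
X_SHIFT = 45     # destination X, 2 bits
SEL_SHIFT = 47   # select line, 1 bit (the 48th bit)


def pack_packet(data, dest_x, dest_y, select):
    """Pack the fields into a 48-bit integer; unused bits stay zero."""
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data must fit in 32 bits")
    if not (0 <= dest_x < 4 and 0 <= dest_y < 4):
        raise ValueError("coordinates must fit in 2 bits (4x4 mesh)")
    if select not in (0, 1):
        raise ValueError("select must be 0 or 1")
    return (data
            | (dest_y << Y_SHIFT)
            | (dest_x << X_SHIFT)
            | (select << SEL_SHIFT))


def unpack_packet(packet):
    """Split a 48-bit packet back into its fields."""
    return {
        "data": packet & 0xFFFFFFFF,
        "dest_y": (packet >> Y_SHIFT) & 0b11,
        "dest_x": (packet >> X_SHIFT) & 0b11,
        "select": (packet >> SEL_SHIFT) & 0b1,
    }


packet = pack_packet(0xDEADBEEF, dest_x=3, dest_y=1, select=1)
fields = unpack_packet(packet)
print(f"{packet:012x}", fields)
```

Two address bits per coordinate are enough to name any node in a 4x4 mesh, which is consistent with the two-bit X and Y fields described above.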

### B. X-Y Algorithm

The x-y routing algorithm is a distributed deterministic routing algorithm that uses coordinates to determine the destination and deliver the packet through the network. A packet is first routed horizontally along the x-coordinate until it reaches the destination column and then vertically along the y-coordinate until it reaches the destination [11, 12, 13]. This routing is widely preferred for mesh networks and is deadlock free. However, it often concentrates load in the middle of the network, making the traffic very irregular. Figure-6 describes the basic XY routing algorithm flow chart.



Figure-6. X-Y Routing Flow Chart
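The x-then-y rule can be sketched in a few lines of Python. The port names and the axis orientation (x increasing towards east, y increasing towards north) are assumptions made for illustration; the function names `xy_next_port` and `xy_route` are hypothetical.

```python
def xy_next_port(current, destination):
    """Return the output port chosen by XY routing at the current node."""
    cx, cy = current
    dx, dy = destination
    # Correct the x-coordinate first ...
    if dx > cx:
        return "east"
    if dx < cx:
        return "west"
    # ... and only then the y-coordinate.
    if dy > cy:
        return "north"
    if dy < cy:
        return "south"
    return "local"  # packet has arrived; deliver to the attached core


def xy_route(source, destination):
    """Return the list of nodes visited from source to destination."""
    steps = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}
    node = source
    path = [node]
    while True:
        port = xy_next_port(node, destination)
        if port == "local":
            return path
        step_x, step_y = steps[port]
        node = (node[0] + step_x, node[1] + step_y)
        path.append(node)


path = xy_route((0, 0), (2, 3))
print(path)
```

The number of hops equals the Manhattan distance between source and destination, which is the shortest path available in a mesh.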

### C. Proposed system used in 4x4 Network

The generalized 4x4 NoC diagram is shown in Figure-7; it consists of processing elements, network interfaces and routers [14]. In this network, the router plays an important role: it decides the path that packets travel from source to destination.



Figure-7. General diagram of 4x4 Network on Chip

## IV. RESULT

The 2x2 and 4x4 NoCs are designed using the Xilinx ISE 14.7 tool, simulated with ModelSim 6.3f and implemented on a Spartan 3 FPGA kit. Table-1 compares the utilization percentages of the FPGA resources required to design the $2\times2$ NoC with those of the design given in Ref. [15]. Figure-8 shows the simulated results for the 2x2 NoC.



Figure-8. Waveform of the simulated 2x2 NoC

Table-1. The resources utilization percentages of the designed 2×2 network

|                           | PROPOSED DESIGN |           |             | EXISTING DESIGN |           |             |
|---------------------------|-----------------|-----------|-------------|-----------------|-----------|-------------|
| Logic<br>Utilization      | Used            | Available | Utilization | Used            | Available | Utilization |
| No of Slices              | 963             | 4656      | 20%         | 1005            | 4656      | 21%         |
| No of Slice<br>Flip Flops | 964             | 9312      | 10%         | 969             | 9312      | 10%         |
| No of input<br>LUTs       | 1708            | 9312      | 18%         | 1488            | 9312      | 15%         |
| No of bonded<br>IOBs      | 322             | 232       | 138%        | 322             | 232       | 138%        |
| No of GCLKs               | 1               | 24        | 4%          | 1               | 24        | 4%          |
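The utilization column can be recomputed from the used and available counts. The short Python check below assumes that the percentages are truncated rather than rounded, which is what the tabulated values suggest (for example, 1488 of 9312 LUTs is listed as 15%); the helper name `utilization_percent` is hypothetical.

```python
def utilization_percent(used, available):
    """Return utilization as a whole percentage, truncated (not rounded)."""
    return (used * 100) // available


# (used, available) pairs for the proposed 2x2 design, copied from Table-1.
proposed_2x2 = {
    "slices": (963, 4656),
    "slice_flip_flops": (964, 9312),
    "input_luts": (1708, 9312),
    "bonded_iobs": (322, 232),
    "gclks": (1, 24),
}

percentages = {name: utilization_percent(used, available)
               for name, (used, available) in proposed_2x2.items()}
print(percentages)
```

A value above 100%, as for the bonded IOBs, means the design needs more of that resource than the target device provides.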

The simulation result of the 4x4 NoC, designed using the Xilinx ISE 14.7 tool and simulated with ModelSim 6.3f, is shown in Figure-9.



Figure-9. Waveform of proposed 4x4 NoC

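Table-2 compares the resource utilization of the proposed 4x4 NoC with that of the design given in Ref. [8]. The proposed router reduces the number of slice flip-flops (4729 against 4857), while its slice, LUT and bonded-IOB usage is higher than that of the existing design.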

Table-2. The resources utilization percentages of the designed 4x4 NoC and existing design [8]

|                           | PROPOSED DESIGN |           |             | EXISTING DESIGN |           |             |
|---------------------------|-----------------|-----------|-------------|-----------------|-----------|-------------|
| Logic<br>Utilization      | Used            | Available | Utilization | Used            | Available | Utilization |
| No of Slices              | 4764            | 4656      | 102%        | 4733            | 4656      | 101%        |
| No of Slice<br>Flip Flops | 4729            | 9312      | 50%         | 4857            | 9312      | 52%         |
| No of input<br>LUTs       | 8788            | 9312      | 94%         | 8401            | 9312      | 90%         |
| No of bonded<br>IOBs      | 1282            | 232       | 552%        | 898             | 232       | 387%        |
| No of GCLKs               | 1               | 24        | 4%          | 1               | 24        | 4%          |

## V. CONCLUSIONS AND FUTURE WORK

In this paper, a random arbiter with the XY routing algorithm is implemented. The router in the proposed design services all incoming packets randomly without deadlock or livelock. Hence the proposed random arbiter enhances performance and reduces the packet storage area by reducing packet stacking, which makes it suitable for NoC design. For the $2\times 2$ NoC, the proposed router requires 963 slices against 1005 for the conventional router (see Ref. [15]), as reported in Table-1. In future work, the proposed system will be implemented on a 4x4 agent-based Network on Chip.

## VI. REFERENCES

[1] Kalwad, Havisha, Neeharika, Sompally, Divya, Vinodhini, M. Murty, "Merged switch allocation and transversal with dual layer adaptive error control for Network-on-Chip switches", IEEE International Conference on VLSI-SATA, 2015, pp 1-5.

[2] Se-Joong Lee, Kangmin Lee, Seong-Jun Song, Hoi-Jun Yoo, "Packet switched on-chip interconnection network for system-on-chip applications", IEEE Transactions, 2014, DOI: 10.1109/TCSII.2005.848972

[3] Bo Zhao, Youtao Zhang, Jun Yang, "A speculative arbiter design to enable high-frequency many-VC router in NoCs", Seventh IEEE/ACM International Symposium on Networks on Chip (NoCS), 2013, pp 1-8.

[4] Abdelrasoul, M. Ragab, M. Goulart, "Evaluation of the Scalability of Round Robin Arbiters for NoC Routers on FPGA", IEEE 7<sup>th</sup> International Symposium on Embedded Multicore Socs (MCSoC), 2013, pp 61-66.

[5] Abdelrasoul, M. Ragab, M. Goulart, "Impact of Round Robin Arbiters on router's performance for NoCs on FPGAs", IEEE International Conference on Circuits and Systems (ICCAS), 2013, pp 59-64.

[6] Fischer E, Fettweis G.P, "An accurate and scalable analytic model for round-robin arbitration in network-on-chip", Seventh IEEE/ACM International Symposium on Networks on Chip (NoCS), 2013, pp 1-8.

[7] Abdelrasoul. M, Ragab. M, Goulart. V, "Impact of Round Robin Arbiters on router's performance for NoCs on FPGAs", IEEE International Conference on Circuits and Systems (ICCAS), 2013, pp 59-64.

[8] Kendaganna Swamy S, Anil N, Anand Jatti, Uma B V, "Random Arbiter and Platform level Design for Improving the Performance on 4x4 NoC", International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), 2016, pp 16-19.

[9] Fischer.E, Fettweis.G.P, "An accurate and scalable analytic model for round-robin arbitration in network-on-chip", Seventh IEEE/ACM International Symposium on Networks on Chip (NoCS), 2013, pp 1-8.

[10] Stojanovic I.Z, Jovanovic M.D, Djordjevic G L, "Low-cost port allocation scheme for minimizing deflections in bufferless on-chip networks", 21st IEEE international conference on Telecommunications Forum (TELFOR), 2013, pp 357-360.

[11] Gwangsun Kim, Lee M.M.-J, Kim J, Lee J.W, Abts D, Marty M, "Low-Overhead Network-on-Chip Support for Location-Oblivious Task Placement", IEEE Transactions, DOI: 10.1109/TC.2012.241, 2014

[12] Singh J.K, Swain A.K, Reddy T.N.K, Mahapatra K.K, "Performance evalulation of different routing algorithms in Network on Chip", IEEE Asia Pacific Conference on Postgraduate Research on Microelectronics and Electronics, DOI: 10.1109/PrimeAsia.2013.6731201, 2013

[13] Ye Lu, Changlin Chen, McCanny J, Sezer S, "Design of interlock-free combined allocators for Networks-on-Chip", IEEE International conference on SOC Conference (SOCC), pp 358-363, 2012.

[14] Wang Zhang, Ligang Hou, Jinhui Wang, Shuqin Geng, Wuchen Wu, "Comparison Research between XY and Odd-Even Routing Algorithm of a 2-Dimension 3X3 Mesh Topology Network-on-Chip", WRI Global Congress on Intelligent Systems, DOI: 10.1109/GCIS.2009.110, 2009

[15] Kangmin Lee, Se-Joong Lee, Hoi-Jun Yoo, "Low-power network-on-chip for high-performance SoC design", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, DOI: 10.1109/TVLSI.2005.863753, 2006