Terminal and broadcast reliability analysis of direct 2-D symmetric torus network

Sharma, Abhilasha; Sangeetha, R. G.

doi:10.1007/s11227-020-03311-0

Published: 20 May 2020

Volume 77, pages 1517–1536, (2021)

The Journal of Supercomputing


Abstract

Reliability analysis is one of the crucial issues for any scalable optical interconnection network. Torus is a highly scalable optical interconnect for data centre networks. The traditional torus network uses the XY routing algorithm; we have proposed a novel optimised routing algorithm. This paper focuses on the time-dependent and time-independent analysis of both terminal and broadcast reliabilities of the torus network using the XY and optimised routing algorithms for various network sizes ($N\times N$ where $N=8, 16, 32, 64$). The results are evaluated in MATLAB and compared, considering node failures.


1 Introduction

The reliability of a data centre (DC) depends primarily on the interconnection networks within its data centre network (DCN). DCNs need to be agile, dynamic and scalable to handle increased traffic and application demands, and to ensure high availability while providing internet and cloud-based services across different domains. The rapid growth of the IoT industry and the migration of technical stacks to the cloud are among the major contributors to this increased demand. It is also critical for industries to ensure a reliable DCN that can handle unexpected traffic surges caused by, for example, natural disasters or pandemics (COVID-19). Failures in DCN components (for example, links, switches and servers) can affect the services provided by data centres and thus threaten business continuity. Owing to these factors, there is a need to analyse and address the probability of server/node failures in a DCN to ensure reliable communication and high availability. This study helps to analyse the fault tolerance behaviour of the DCN so that such failures can be avoided.

Examples of DCN topologies are: direct network [19], server-centric [11, 13, 16, 17, 31], tree-based [2] and hybrid network [15, 30]. Torus is a direct network topology in which every node serves as an input link, an output link and a switching node of the network [14]. DCN architectures based on the torus topology are highly scalable. The torus topology (as shown in Fig. 1) is a suitable candidate for avoiding network congestion because its larger number of equidistant links reduces the delay throughout the network. Its high path diversity provides high throughput under full-load traffic. DCNs based on the torus topology have already been implemented as high-performance computing networks in [1, 3, 10, 12]. Because of its symmetric architecture, this topology optimises the granularity of clusters and the scalability of the network [14].

1.1 XY routing algorithm (XYRA)

In [21], the authors implemented the XY routing algorithm (XYRA) in a bi-directional torus topology in which each node has two output ports ($X+$ and $Y+$) and two input ports ($X-$ and $Y-$), as shown in Fig. 2. The $Y-$ port is used to provide buffering and is idle when both output ports are free. In this algorithm, only one packet is processed in a single time slot in a particular switching element (SE). XYRA gives the $X+$ port high priority for routing a packet; if the $X+$ port is busy, the $Y+$ port is used instead. In this scenario the $Y-$ port is also free, but it stays idle and provides inherent buffering [21], with the help of a deflection routing scheme, only when both the $X+$ and $Y+$ ports are busy. For the sake of explanation, we take a $4\times 4$ network with (0,1) as the source node and (3,2) as the destination node, as shown in Fig. 2. From the source node, the packet is routed by default through the $X+$ port (to node (0,2)); if the $X+$ port is not free, the packet is routed through the $Y+$ port (to node (1,1)). In both scenarios the packet reaches the destination node in four hops, whereas if $Y-$ is used to route the packet, it reaches the destination in two hops using the wraparound path. This is achieved with the optimised routing algorithm (ORA) using binary signalling, as explained in the next section.
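The hop counts quoted above can be checked with a minimal sketch. It only counts hops on an $n\times n$ grid with and without wraparound links; it does not model port priorities, buffering or deflection, and the function name `torus_hops` is a hypothetical helper introduced here for illustration.

```python
def torus_hops(src, dst, n, wraparound=True):
    """Minimum hop count between two (row, col) nodes of an n x n torus.

    Simplifying assumption: without wraparound the distance is the plain
    grid distance; with wraparound the shorter way round each ring is used.
    """
    hops = 0
    for s, d in zip(src, dst):
        delta = abs(s - d)
        hops += min(delta, n - delta) if wraparound else delta
    return hops


# Source (0,1) and destination (3,2) in the 4 x 4 example of Fig. 2.
print(torus_hops((0, 1), (3, 2), 4, wraparound=False))  # 4
print(torus_hops((0, 1), (3, 2), 4, wraparound=True))   # 2
```

The difference between the two results comes entirely from the row dimension, where the wraparound link turns a three-hop traversal into a single hop.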

1.2 Optimised routing algorithm (ORA)

In [25], we proposed an optimised routing algorithm (ORA) that provides lower latency and higher throughput than the traditional XYRA. In this algorithm, each node processes one packet in a single time slot. In ORA, each node address is an n-digit radix-D Gray code (where ‘D’ is the number of nodes in one dimension), chosen for the cyclic and unit-distance properties of Gray codes. Using these properties, ORA routes packets over the wraparound paths when they are idle. Each node is connected to neighbouring nodes whose addresses differ from its own in one bit. The first set of neighbouring nodes is denoted by $H_{S}(x, r)$, the Hamming shell of radius r centred at node x, where $r=1$. The secondary and further neighbouring nodes form a Hamming ball of radius r, denoted by $H_{B}(x;\, r)$, where $r\ge 2$.

To route a packet from source node (x) to destination node (y), the Hamming distance between the node addresses is calculated. The bit positions at which the node addresses differ give the priority ports for routing the packet. Of the two selected nodes connected to the port with high priority, the node with the minimum Hamming weight is given first priority to route the packet. Therefore, in ORA three neighbouring nodes are available to route the packet without compromising the buffering capacity [21], whereas in XYRA only two nodes are available. For the sake of analysis, we take a $4\times 4$ network and the same source–destination pair as for XYRA, with (0001) as the source and (1011) as the destination node, as shown in Fig. 3. The packet is routed to the destination in a minimum of two hops (as shown in Fig. 3). The torus topology can thus be implemented as a data centre network using ORA, and the analysis of its reliability becomes a major concern in evaluating the performance of the entire network.
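The address arithmetic in this example can be sketched as follows. The mapping from (row, column) to a 4-bit address, with a 2-bit binary-reflected Gray code per coordinate and the row in the high bits, is an assumption made here because it reproduces the addresses quoted above; the helper names are hypothetical and the sketch does not implement the port-priority rules of ORA.

```python
def gray(i):
    """Binary-reflected Gray code of a non-negative integer."""
    return i ^ (i >> 1)


def node_address(row, col, bits=2):
    """Assumed address layout: Gray-coded row in the high bits, column in the low bits."""
    return (gray(row) << bits) | gray(col)


def differing_bits(a, b):
    """Bit positions (0 = least significant) at which two addresses differ."""
    diff = a ^ b
    return [pos for pos in range(diff.bit_length()) if (diff >> pos) & 1]


src = node_address(0, 1)   # node (0,1)
dst = node_address(3, 2)   # node (3,2)
print(format(src, "04b"), format(dst, "04b"))  # 0001 1011
print(differing_bits(src, dst))                # [1, 3]
print(len(differing_bits(src, dst)))           # Hamming distance: 2
```

A Hamming distance of two between the addresses corresponds to the two-hop minimum route mentioned above, one hop per differing bit.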

The reliability of the data centre depends primarily on its interconnection networks. In [28], the authors proposed the augmented data vortex (ADV), based on all-optical packet switching, which has low latency, high capacity and high throughput. They calculated the terminal reliability for a single node pair as 0.986, assuming the ADV is symmetric, and further extended the node-pair reliability to the entire network [29]. A similar study was performed in [23], where the authors proposed a novel high-performance optical multistage interconnection network (OMIN) based on all-optical packet switching. In that work, the terminal reliability of the proposed bi-directional data vortex (BDV) network was calculated using the fault tree analysis method based on component reliability. The terminal reliability between a particular source–destination pair was calculated and extended to all source–destination pairs because of the symmetric property of the multistage interconnection network.

A new topology named scalable crossbar network (SCN) was proposed in [8] to solve the issues of scalability and blocking in typical crossbar networks. The proposed network outperformed multistage crossbar networks and multistage interconnection networks in terms of terminal reliability, mean time to failure and system failure rate. Two novel network designs of chained K-ary architecture for data centre networks were proposed in [24], exploiting the symmetric property to improve fault tolerance and blocking probability. The literature above shows improvements in terminal reliability for different network architectures. Yet, the following aspects of reliability still need to be analysed:

time-dependent terminal reliability,
time-independent terminal reliability,
time-dependent broadcast reliability and
time-independent broadcast reliability.

Reliability analysis gives the fault tolerance behaviour of the system in spite of node failures [22]. Terminal reliability is the probability of finding at least a single non-faulty path between a source and a destination node [6]. Broadcast reliability is the probability of finding at least a single non-faulty path from one source node to all the destinations [18]. The time-independent analysis gives an overview of how the network reacts to faulty nodes for various values of node reliability, whereas the time-dependent analysis describes how the network responds when the nodes fail at a constant rate without recovery. The reliability block diagram (RBD) method [26] is used to evaluate the aforementioned aspects of reliability.

In [26], we analysed and compared the time-independent reliability of the direct torus topology (using XYRA) and the Benes network for sizes $8\times 8$ and $16\times 16$. The comparison and analysis of the time-dependent reliability of the Benes and torus networks (using XYRA) is presented in [27]. We observed that the torus network has higher reliability than the Benes network. In this paper, the time-dependent and time-independent aspects of the terminal and broadcast reliabilities of the torus network are analysed. The analysis considers a single source–destination pair (terminal reliability analysis) or a single source with several destinations (broadcast reliability analysis) for the two-dimensional torus network of size $N \times N$ (where N = 4, 8, 16, 32, 64) using the XY routing algorithm (XYRA) and the optimised routing algorithm (ORA). The entire analysis is carried out considering the symmetric torus topology as a direct network in which each node serves simultaneously as an input terminal, an output terminal and a switching node of the network [14]. The terminal and broadcast reliabilities are evaluated considering node failures. The results are analysed, evaluated and compared in MATLAB. Since ORA has redundant paths, the reliability of ORA has been estimated in this paper. The rest of the paper is organised as follows: Sect. 2 describes the reliability block diagram method, Sects. 3 and 4 present the terminal and broadcast reliability analyses, respectively, using XYRA and ORA, and Sect. 5 presents the concluding remarks.

2 Reliability block diagram (RBD) method

In any interconnection network, the reliability can be calculated using the reliability block diagram method [20]. An RBD provides a graphical illustration of how the nodes of a network are connected from the reliability point of view [18]. The network comprises nodes and links, which are connected in series, in parallel or in a series–parallel combination. It is assumed that a node is either faulty or in a working state [6].

2.1 Series RBD

In series RBD, the nodes are connected in series. Hence, failure of any single node causes the network failure [6]. In Fig. 4, the series RBD between a given source–destination is depicted with a total of n nodes connected in series. Considering $P_{i}$ as the reliability of ith component, the series system reliability ($P_{S}$) is determined by Eq. (1).

$$\begin{aligned} {P_{S} =\prod _{i=1}^{n} P_{i}} \end{aligned}$$

(1)

2.2 Parallel RBD

The arrangement of n nodes in a parallel network is presented in Fig. 5. Here, as long as at least one node is working, the packet can be transmitted from the source to the destination through that node [6]. Therefore, this arrangement can tolerate $n-1$ node failures.

The system unreliability $Q_{S}$ is calculated by Eq. (2), where $Q_{i}$ represents the unreliability of ith component.

$$\begin{aligned} Q_{S}=\prod _{i=1}^{n} Q_{i} \end{aligned}$$

(2)

So, the reliability for parallel system ($P_{P}$) is calculated by Eq. (3).

$$\begin{aligned} {P_{P}}& = 1-Q_{S} \nonumber \\& = 1-\prod _{i=1}^{n} Q_{i} \nonumber \\& = 1-\prod _{i=1}^{n} (1-P_{i}) \end{aligned}$$

(3)

2.3 Series–Parallel RBD

A combination of nodes connected in series as well as in parallel forms a complex network. The reliability of such a network is analysed by decomposing it into homogeneous individual units (purely series or purely parallel). After the reliability of each unit has been obtained, the network reliability is computed by substituting the units back, in series–parallel order, as a single network [26]. An example of the RBD structure between a given source–destination pair of such a network is represented in Fig. 6.

The network reliability ($P_{SP}$) for series–parallel combination is calculated by Eq. (4)

$$\begin{aligned} {P_{SP}} &= P_{1}[1-(1-P_{2})(1-P_{3})][1-((1-(1-(1-P_{4})(1-P_{5})))\nonumber \\&\quad(1-(1-(1-P_{6})(1-P_{7}))))] \end{aligned}$$

(4)
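Equations (1), (3) and (4) can be sketched in a few lines of code. This is a minimal illustration rather than the MATLAB implementation used for the results; the helper names `series` and `parallel` and the uniform component reliability of 0.9 are assumptions made here for the example.

```python
from math import prod


def series(reliabilities):
    """Eq. (1): every block must work."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Eq. (3): the group fails only if every block fails."""
    return 1.0 - prod(1.0 - p for p in reliabilities)


def series_parallel(p):
    """Eq. (4) for component reliabilities p[1]..p[7] (p[0] is unused)."""
    return series([
        p[1],
        parallel([p[2], p[3]]),
        parallel([parallel([p[4], p[5]]), parallel([p[6], p[7]])]),
    ])


p = [None] + [0.9] * 7  # illustrative value, identical for all seven blocks
print(round(series([0.9] * 3), 6))    # three nodes in series
print(round(parallel([0.9] * 3), 6))  # three nodes in parallel
print(round(series_parallel(p), 6))   # structure of Eq. (4)
```

Nesting `parallel` inside `series` mirrors the substitution procedure described above: each homogeneous unit is reduced to a single equivalent block before the next level is evaluated.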

3 Terminal reliability analysis

In the torus topology, the nodes along a path are connected in series and the paths are connected in parallel. If any node in a series connection fails, the path through it can no longer carry packets, whereas when nodes are connected in parallel, the packet can be transmitted over an alternate path. Depending on the routing algorithm (XYRA or ORA), the series–parallel combination may change. For the terminal reliability analysis, a specific source–destination pair is considered; identical conditions apply to any other source–destination pair. The analysis is done in terms of time-dependent and time-independent terminal reliability. The time-dependent terminal reliability is defined as the probability of finding at least a single non-faulty path between a specific source–destination pair considering a constant node failure rate [9]. This analysis is carried out with a node failure rate of $\lambda = 10^{-6}$ per hour [7, 9, 22, 26]. When all the nodes in a network have a constant failure rate, the reliability of a single node at time t is assumed to follow the exponential law $e^{-\lambda t}$ [5]. The time-independent terminal reliability is defined as the probability of finding at least a single non-faulty path between a specific source–destination pair considering various values of node/switch reliability r (high = 0.99, medium = 0.95 and low = 0.9) [4].
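As a quick worked example of the exponential model (the operating times $10^{4}$ and $10^{5}$ hours are chosen here purely for illustration), a node with $\lambda = 10^{-6}$ per hour has reliability

$$\begin{aligned} e^{-\lambda t}\big|_{t=10^{4}\,\mathrm{h}}&= e^{-0.01}\approx 0.9900, \nonumber \\ e^{-\lambda t}\big|_{t=10^{5}\,\mathrm{h}}&= e^{-0.1}\approx 0.9048, \end{aligned}$$

so over this range of operating times the time-dependent node reliability covers roughly the same interval as the high (0.99) and low (0.9) switch reliabilities used in the time-independent analysis.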

3.1 Terminal reliability analysis using XYRA

As explained in Sect. 1.1, the RBD between a given source–destination pair of the torus topology using XYRA is shown in Fig. 7. Since the torus is symmetrical, the same RBD can be used for any source–destination pair to derive Eqs. (5)–(8). This analysis can be extended to an $N \times N$ network (where $N=8, 16, 32, 64$). In an edge-symmetric 2D torus topology, the number of nodes in each dimension is equal [14]. Because of this property, the same number of non-overlapping paths is available between any source–destination pair, and the paths follow a similar pattern, which leads to identical RBDs. The repeating pattern in the RBD is indicated by the dotted line in Fig. 7 (from SE2), where each SE (switching element), such as SE1 and SE2, designates a node with four ports.

3.1.1 Time-independent terminal reliability analysis using XYRA

According to the XYRA (as explained in Sect. 1.1), a complex series–parallel terminal reliability RBD between a given source and destination node for $4\times 4$ torus network is presented in Fig. 7. Consequently, regarding Fig. 7, the time-independent terminal reliability for $4\times 4$ torus network, denoted by $TXY_{4\times 4}$, is calculated by Eq. (5). The time-independent terminal reliability for $N \times N$ network (where $N=8, 16, 32, 64$) is calculated by Eq. (6). Each node is considered as a $2\times 2$ switch and assumed to have the switch reliability ‘r’.

$$\begin{aligned} TXY_{4\times 4}= r(1-(1-(r(1-(1-(r(1-(1-({r}(1-(1-r)^{2})))^{2})))^{2})))^{2}) \end{aligned}$$

(5)

where r is switch reliability which varies from 0.99 to 0.9.

$$\begin{aligned} TXY_{N\times N}&=r(1-(1-(r(1-(1-(r(1-(1-(r\nonumber \\&\quad (1-(1-TXY_{\frac{N}{2}\times \frac{N}{2}})^{2})))^{2})))^{2})))^{2}) \end{aligned}$$

(6)
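The nested form of Eqs. (5) and (6) can be sketched as a short recursion. This is an illustrative Python transcription, not the MATLAB code behind the reported figures; the function names are hypothetical, and the sketch assumes that each application of Eq. (6) wraps the half-size result in four further series–parallel stages, with the $4\times 4$ case starting from a single node of reliability r.

```python
def xy_stage(x, r):
    """One stage: a node of reliability r in series with two parallel branches of reliability x."""
    return r * (1.0 - (1.0 - x) ** 2)


def txy(n, r):
    """Time-independent terminal reliability under XYRA, following Eqs. (5) and (6)."""
    if n < 4 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 4")
    # Eq. (5) nests the stage four times around r; Eq. (6) nests it around TXY of the half-size network.
    x = r if n == 4 else txy(n // 2, r)
    for _ in range(4):
        x = xy_stage(x, r)
    return x


for r in (0.99, 0.95, 0.9):
    print(r, [round(txy(n, r), 6) for n in (4, 8, 16, 32, 64)])
```

Printing the values per network size makes it easy to compare the trend against Fig. 9 for each switch reliability.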

3.1.2 Time-dependent terminal reliability analysis using XYRA

According to Fig. 7, the time-dependent terminal reliability considering node failures, denoted by $TXY_{4\times 4}(t)$, is calculated by Eq. (7). The time-dependent terminal reliability for the $N \times N$ network (where $N=8,16,32,64$) is calculated by Eq. (8).

$$\begin{aligned}TXY_{4\times 4}(t)&= e^{-\lambda t}(1-(1-(e^{-\lambda t}(1-(1-(e^{-\lambda t}(1-(1-({e^{-\lambda t}}\nonumber \\&\quad (1-(1-e^{-\lambda t})^{2})))^{2})))^{2})))^{2}) \end{aligned}$$

(7)

$$\begin{aligned}TXY_{N\times N}(t)&=e^{-\lambda t}(1-(1-(e^{-\lambda t}(1-(1-(e^{-\lambda t}(1-(1-(e^{-\lambda t}\nonumber \\&\quad (1-(1-TXY_{\frac{N}{2} \times \frac{N}{2}}(t))^{2})))^{2})))^{2})))^{2}) \end{aligned}$$

(8)

3.2 Terminal reliability analysis using ORA

The RBD between a given source and destination node for the $4\times 4$ torus network using ORA (as described in Sect. 1.2) is shown in Fig. 8. For the terminal reliability analysis, a distinct source–destination pair is considered (as shown in Fig. 3). The analysis can be extended to an $N \times N$ network (where $N=8, 16, 32, 64$). The routing scheme of ORA is explained in [25]. The repeating pattern in the RBD is indicated by the dotted line in Fig. 8 (from SE2, SE3, SE1.2, SE1.3, SE1.1.2, SE1.1.3, SE1.1.1b, SE1.1.1c), where each block (e.g. SE, SE1, SE2, SE3, etc.) represents a node with four ports.

3.2.1 Time-independent terminal reliability analysis using ORA

Considering Fig. 8, all interconnected nodes between source and destination can be divided into groups. The reliability of the sub-group shown in the red block is calculated under a k-out-of-n redundancy condition (where $k=2$ and $n=3$), for which the reliability is $\sum _{i=k}^{n} \left( {\begin{array}{c}n\\ i\end{array}}\right) r^{i}(1-r)^{n-i}$. For $k=2$ and $n=3$, this simplifies to $3r^2-2r^3$. The remaining groups are divided into individual sub-groups, whose reliabilities are calculated in the same way. The overall reliability is computed by substituting these sub-groups back in series–parallel combination. The time-independent terminal reliability for the $4\times 4$ torus network, denoted by $TORA_{4\times 4}$, is derived using this approach and is given by Eq. (9). The time-independent terminal reliability for the $N \times N$ network (where $N=8,16,32,64$) is calculated by Eq. (10). Each node is viewed as a $2\times 2$ switch and assumed to have the switch reliability ‘r’.
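For reference, the 2-out-of-3 expression used for the sub-group expands as follows (the numerical value $r=0.9$ is taken only as an illustration):

$$\begin{aligned} \sum _{i=2}^{3} \left( {\begin{array}{c}3\\ i\end{array}}\right) r^{i}(1-r)^{3-i}&= 3r^{2}(1-r)+r^{3} \nonumber \\&= 3r^{2}-2r^{3}, \end{aligned}$$

which for $r=0.9$ gives $3(0.81)-2(0.729)=0.972$.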

$$\begin{aligned} TORA_{4\times 4} = 3B^{2}-2B^{3} \end{aligned}$$

(9)

where $ B= r(3(r(3(r(3r^{2}-2r^{3}))^{2}-2(r(3r^{2}-2r^{3}))^{3}))^{2}-2(r(3(r(3r^{2}-2r^{3}))^{2}-2(r(3r^{2}-2r^{3}))^{3}))^{3})$

$$\begin{aligned} TORA_{N \times N} = 3B^{2}-2B^{3}. \end{aligned}$$

(10)

where $B=r(3(r(3(r(3TORA_{\frac{N}{2}\times \frac{N}{2}}^{2}-2TORA_{\frac{N}{2}\times \frac{N}{2}}^{3}))^{2}-2(r(3TORA_{\frac{N}{2}\times \frac{N}{2}}^{2}-2TORA_{\frac{N}{2}\times \frac{N}{2}}^{3}))^{3}))^{2}-2(r(3(r(3TORA_{\frac{N}{2}\times \frac{N}{2}}^{2}-2TORA_{\frac{N}{2}\times \frac{N}{2}}^{3}))^{2}-2(r(3TORA_{\frac{N}{2}\times \frac{N}{2}}^{2}-2TORA_{\frac{N}{2}\times \frac{N}{2}}^{3}))^{3}))^{3})$
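Equations (9) and (10) can be evaluated with a short recursion. This is an illustrative Python sketch and not the MATLAB code behind the reported results; the function names are hypothetical, and it assumes that B is built by three nested stages of a node in series with a 2-out-of-3 group, starting from r for the $4\times 4$ network and from the half-size result otherwise.

```python
def two_of_three(x):
    """Reliability of a 2-out-of-3 group of identical blocks with reliability x."""
    return 3.0 * x ** 2 - 2.0 * x ** 3


def tora(n, r):
    """Time-independent terminal reliability under ORA, following Eqs. (9) and (10)."""
    if n < 4 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 4")
    b = r if n == 4 else tora(n // 2, r)
    for _ in range(3):
        b = r * two_of_three(b)   # node in series with a 2-out-of-3 group
    return two_of_three(b)        # outermost 3B^2 - 2B^3


for r in (0.99, 0.95, 0.9):
    print(r, [round(tora(n, r), 6) for n in (4, 8, 16, 32, 64)])
```

Writing the 2-out-of-3 term once as a function keeps the code aligned with the structure of B, which is otherwise hard to read in fully expanded form.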

3.2.2 Time-dependent terminal reliability analysis using ORA

For ORA, the terminal RBD between a given source and destination node for the torus topology ($4\times 4$) is shown in Fig. 8. According to this figure, the time-dependent terminal reliability for the torus topology ($4\times 4$) using ORA, denoted by $TORA_{4\times 4}(t)$, is calculated considering node failures and is given by Eq. (11). The time-dependent terminal reliability for the $N \times N$ network (where $N=8, 16, 32, 64$) is calculated using Eq. (12).

$$\begin{aligned} TORA_{4\times 4}(t)=3B(t)^{2}-2B(t)^{3} \end{aligned}$$

(11)

where $B(t)=e^{-\lambda t}(3(e^{-\lambda t}(3(e^{-\lambda t}(3e^{-2\lambda t}-2e^{-3\lambda t}))^{2}-2(e^{-\lambda t}(3e^{-2\lambda t}-2e^{-3\lambda t}))^{3}))^{2}-2(e^{-\lambda t}(3(e^{-\lambda t}(3e^{-2\lambda t}-2e^{-3\lambda t}))^{2}-2(e^{-\lambda t}(3e^{-2\lambda t}-2e^{-3\lambda t}))^{3}))^{3})$

$$\begin{aligned} TORA_{N \times N}(t)=3B^{2}(t)-2B^{3}(t). \end{aligned}$$

(12)

where $B(t)=e^{-\lambda t}(3(e^{-\lambda t}(3(e^{-\lambda t}(3TORA_{\frac{N}{2} \times \frac{N}{2}}^{2}(t)-2TORA_{\frac{N}{2} \times \frac{N}{2}}^{3}(t)))^{2}-2(e^{-\lambda t}(3TORA_{\frac{N}{2} \times \frac{N}{2}}^{2}(t)-2TORA_{\frac{N}{2} \times \frac{N}{2}}^{3}(t)))^{3}))^{2}-2(e^{-\lambda t}(3(e^{-\lambda t}(3TORA_{\frac{N}{2} \times \frac{N}{2}}^{2}(t)-2TORA_{\frac{N}{2} \times \frac{N}{2}}^{3}(t)))^{2}-2(e^{-\lambda t}(3TORA_{\frac{N}{2} \times \frac{N}{2}}^{2}(t)-2TORA_{\frac{N}{2} \times \frac{N}{2}}^{3}(t)))^{3}))^{3})$ and t denotes time.

3.3 Comparison of terminal reliability

The time-independent terminal reliability for XYRA and ORA with respect to network size for various switch reliabilities ($r = 0.99, 0.95, 0.9$) is shown in Figs. 9 and 10, respectively. To illustrate the reliability analysis of the torus network using XYRA and ORA, we have chosen the network size of $64\times 64$; the analysis can be extended to any $N \times N$ network using Eqs. (6), (8), (10) and (12). The analysis shows that the time-independent terminal reliability of the $64\times 64$ torus network is higher with ORA than with XYRA for all the switch reliabilities considered, as shown in Table 1. The time-dependent terminal reliability of the torus network using XYRA and ORA ($64\times 64$) is depicted in Fig. 11. This analysis is carried out with a constant switch failure rate ($\lambda = 10^{-6}$ per hour). As can be seen from Fig. 11, the time-dependent terminal reliability of the torus network ($64\times 64$) after 100,000 hours is 0.952521 using ORA and 0.894823 using XYRA. This is because only two links are available for routing in XYRA, whereas three links are available in ORA. ORA utilises the property of path diversity efficiently without compromising the inherent buffering capacity and therefore provides better reliability than XYRA. Hence, during node failures, the probability of sustaining at least a single path between a dedicated source–destination pair is higher with ORA than with XYRA.
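The time-dependent comparison can be sketched by substituting $e^{-\lambda t}$ for r in the recursions of Eqs. (8) and (12). The code below is an illustrative Python transcription under that reading, not the MATLAB code used for Fig. 11, and all function names are hypothetical. Evaluated at $t = 10^{5}$ hours it should land close to the values quoted above for the $64\times 64$ network.

```python
from math import exp

FAILURE_RATE = 1e-6  # per hour, as stated in the text


def txy(n, r):
    """Terminal reliability under XYRA: four series-parallel stages per size doubling."""
    x = r if n == 4 else txy(n // 2, r)
    for _ in range(4):
        x = r * (1.0 - (1.0 - x) ** 2)
    return x


def tora(n, r):
    """Terminal reliability under ORA: three 2-out-of-3 stages, then the outer 3B^2 - 2B^3."""
    b = r if n == 4 else tora(n // 2, r)
    for _ in range(3):
        b = r * (3.0 * b ** 2 - 2.0 * b ** 3)
    return 3.0 * b ** 2 - 2.0 * b ** 3


def node_reliability(t_hours, rate=FAILURE_RATE):
    """Exponential node reliability e^(-lambda * t)."""
    return exp(-rate * t_hours)


r_t = node_reliability(100_000)
print("XYRA 64x64:", round(txy(64, r_t), 6))
print("ORA  64x64:", round(tora(64, r_t), 6))
```

Sweeping `t_hours` over a range of values gives a curve of the kind plotted in Fig. 11.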

Table 1 Comparison of time-independent terminal reliability for XYRA and ORA ($64\times 64$) with various switch reliabilities ($r = 0.99, 0.95, 0.9$)


4 Broadcast reliability analysis

The broadcast reliability analysis is carried out in terms of time-dependent and time-independent broadcast reliability. The time-dependent broadcast reliability is defined as the probability of finding at least a single non-faulty path from a specific source to multiple destinations considering a constant node failure rate of $10^{-6}$ per hour [6].

The time-independent broadcast reliability is defined as the probability of finding at least a single non-faulty path between a specific source and multiple destinations considering various values of node/switch reliability r (high = 0.99, medium = 0.95 and low = 0.9) [6]. For the analysis, we have considered $\frac{N}{2}$ destination nodes from a single source for the network size of $N\times N$ (where $N= 4, 8, 16, 32, 64$). The conditions for the broadcast reliability analysis remain the same for any source and its destinations.

4.1 Broadcast reliability analysis using XYRA

The broadcast reliability block diagram between a given source and its destinations for XYRA is shown in Fig. 12. For the broadcast reliability analysis, a specific source and set of destinations are assumed. The analysis can be extended to an $N \times N$ network (where N = 8, 16, 32, 64). For the sake of explanation, we take a $4\times 4$ network with (0,1) as the source and (3,1) and (3,2) as the destinations, as depicted in Fig. 2. The repeating pattern in the RBD is indicated by the dotted line in Fig. 12 (from SE2).

4.1.1 Time-independent broadcast reliability analysis using XYRA

The XYRA broadcast RBD between a given source and its destinations (for the $4\times 4$ torus network) is shown in Fig. 12; it is a complex series–parallel RBD. Considering Fig. 12, the time-independent broadcast reliability for the $4\times 4$ torus network, denoted by $XY_{4\times 4}$, is calculated by Eq. (13). Equation (14) is used to calculate the reliability between a given source and its destinations for the $N \times N$ network (where $N=8,16,32,64$). Each node is considered as a $2\times 2$ switch and assumed to have the switch reliability ‘r’.

$$\begin{aligned} XY_{4\times 4}& = r(1-(1-(r(1-(1-(r(1-(1-({r^{2}}\nonumber \\&(1-(1-r)^{2})))^{2})))^{2})))^{2}) \end{aligned}$$

(13)

$$\begin{aligned} XY_{N\times N}& = r(1-(1-(r(1-(1-(r(1-(1\nonumber \\&-\left(r^{\frac{N}{2}}\left(1-\left(1-XY_{\frac{N}{2}\times \frac{N}{2}}\right)^{2}\right)\right))^{2})))^{2})))^{2}) \end{aligned}$$

(14)
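Equations (13) and (14) can be sketched as a recursion in the same way as the terminal case. The code below is an illustrative Python transcription, not the MATLAB implementation; the function name `bxy` is hypothetical, and it assumes that the $\frac{N}{2}$ destination nodes enter as a series factor $r^{N/2}$ at the innermost level, followed by three series–parallel stages.

```python
def bxy(n, r):
    """Time-independent broadcast reliability under XYRA, following Eqs. (13) and (14)."""
    if n < 4 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 4")
    inner = r if n == 4 else bxy(n // 2, r)
    # N/2 destination nodes in series, fed by two parallel branches.
    x = r ** (n // 2) * (1.0 - (1.0 - inner) ** 2)
    for _ in range(3):
        x = r * (1.0 - (1.0 - x) ** 2)
    return x


for r in (0.99, 0.95, 0.9):
    print(r, [round(bxy(n, r), 6) for n in (4, 8, 16, 32, 64)])
```

Because the series factor $r^{N/2}$ shrinks as N grows, the computed broadcast reliability falls with network size, most visibly at the lowest switch reliability.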

4.1.2 Time-dependent broadcast reliability analysis using XYRA

The broadcast RBD between a given source and its destinations for the torus topology ($4\times 4$) using XYRA is presented in Fig. 12. According to this figure, the time-dependent broadcast reliability for the torus topology ($4\times 4$), denoted by $XY_{4\times 4}(t)$, considering node failures, is given by Eq. (15). The time-dependent broadcast reliability for the $N \times N$ network (where N = 8, 16, 32, 64) is calculated by Eq. (16).

$$\begin{aligned} XY_{4\times 4}(t)& = e^{-\lambda t}(1-(1-(e^{-\lambda t}(1-(1-(e^{-\lambda t}(1-(1-({e^{-2\lambda t}}\nonumber \\&(1-(1-e^{-\lambda t})^{2})))^{2})))^{2})))^{2}) \end{aligned}$$

(15)

$$\begin{aligned} XY_{N\times N}(t)& = e^{-\lambda t}(1-(1-(e^{-\lambda t}(1-(1-(e^{-\lambda t}(1-(1-({e^{-\frac{N}{2}\lambda t}}\nonumber \\&\left(1-\left(1-XY_{\frac{N}{2}\times \frac{N}{2}}(t)\right)^{2}\right)))^{2})))^{2})))^{2}) \end{aligned}$$

(16)

4.2 Broadcast reliability analysis using ORA

According to ORA, as explained in Sect. 1.2, the RBD between a given source and its destinations is depicted in Fig. 13. For the broadcast reliability analysis, a distinct source and $\frac{N}{2}$ destinations are considered. Identical conditions apply to any source and its destinations. The analysis can be extended to an $N \times N$ network (where $N= 8, 16, 32, 64$). To explain the analysis, we take a $4\times 4$ network with (0001) as the source and (1001) and (1011) as the destinations, as depicted in Fig. 3. The repeating pattern in the RBD is indicated by the dotted line in Fig. 13 (from SE2, SE3, SE1.2, SE1.3, SE1.1.2, SE1.1.3, SE1.1.1b, SE1.1.1c), where SE1, SE2 and SE3 represent neighbouring nodes.

4.2.1 Time-independent broadcast reliability analysis using ORA

Considering Fig. 13, the entire block diagram can be divided into groups (designated G1–G7), and the reliability of each node is denoted by ‘r’. The G1 group represents two destination nodes connected in series, so the reliability of this group is $r^{2}$. In the sub-group of G2 shown in the red block in Fig. 13, three nodes (represented by SE) are connected in parallel with each other and, at the same time, in series with the destination nodes. The reliability of this sub-group is therefore calculated considering 1-out-of-3 redundancy (using Eq. (3)) and is $r^{2}(1-(1-r)^{3})$. Similarly, the reliability of the remaining groups is deduced by calculating the reliability of each sub-group and substituting it back in series–parallel order. In this manner, Eq. (17) is formulated.

Hence, regarding Fig. 13, the time-independent broadcast reliability for the $4\times 4$ torus network, denoted by $ORA_{4\times 4}$, is calculated by Eq. (17). Equation (18) is used to calculate the reliability for the $N \times N$ network (where $N= 8, 16, 32, 64$). Each node is considered as a $2\times 2$ switch and assumed to have the switch reliability ‘r’.

$$\begin{aligned} ORA_{4\times 4}& = r(1-(1-(r(1-(1-(r(1-(1-({r^{2}}(1-(1-r){^{3}}))){^{3}}))){^{3}}))){^{3}}) \end{aligned}$$

(17)

$$\begin{aligned} ORA_{N \times N}& =r(1-(1-(r(1-(1-(r(1-(1-(r^{\frac{N}{2}}\left(1-\left(1-ORA_{\frac{N}{2} \times\frac{N}{2}}\right){{^{3}}}\right))){^{3}}))){^{3}}))){^{3}}) \end{aligned}$$

(18)
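Equations (17) and (18) have the same nesting as Eqs. (13) and (14) with the exponent 2 replaced by 3, so a single illustrative routine with a `branches` parameter can cover both. This is a sketch under that reading, written in Python rather than MATLAB, with hypothetical names and an illustrative switch reliability of 0.9.

```python
def broadcast(n, r, branches):
    """Time-independent broadcast reliability.

    branches=2 follows the XYRA form of Eqs. (13)-(14);
    branches=3 follows the ORA form of Eqs. (17)-(18).
    """
    if n < 4 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 4")
    inner = r if n == 4 else broadcast(n // 2, r, branches)
    # N/2 destination nodes in series, fed by `branches` parallel blocks.
    x = r ** (n // 2) * (1.0 - (1.0 - inner) ** branches)
    for _ in range(3):
        x = r * (1.0 - (1.0 - x) ** branches)
    return x


r = 0.9  # low switch reliability, used here only as an example
for n in (4, 8, 16, 32, 64):
    print(n, round(broadcast(n, r, 2), 6), round(broadcast(n, r, 3), 6))
```

Keeping the number of parallel branches as a parameter makes the only structural difference between the two routing algorithms explicit in this model.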

4.2.2 Time-dependent broadcast reliability analysis using ORA

Taking the working of ORA into account, the RBD between a given source and its destinations is shown in Fig. 13. Following the approach explained in Sect. 4.2.1 for Fig. 13, the time-dependent broadcast reliability for the torus topology ($4\times 4$) using ORA, denoted by $ORA_{4\times 4}(t)$, is calculated considering node failures and is given by Eq. (19). The time-dependent broadcast reliability for the $N \times N$ network (where $N= 8, 16, 32, 64$) is calculated using Eq. (20).

$$\begin{aligned} ORA_{4\times 4}(t)& = e^{-\lambda t}(1-(1-(e^{-\lambda t}(1-(1-(e^{-\lambda t}(1-(1-({e^{-2\lambda t}}(1-(1-e^{-\lambda t}){^{3}}))){^{3}}))){^{3}}))){^{3}}) \end{aligned}$$

(19)

$$\begin{aligned} ORA_{N \times N}(t)& = e^{-\lambda t}(1-(1-(e^{-\lambda t}(1-(1-(e^{-\lambda t}(1-(1-(e^{-\frac{N}{2}\lambda t}\left(1-\left(1-ORA_{\frac{N}{2}\times \frac{N}{2}}(t)\right){{^{3}}}\right))){^{3}}))){^{3}}))){^{3}}) \end{aligned}$$

(20)

4.3 Comparison of broadcast reliability

The time-independent broadcast reliability for XYRA and ORA with respect to network size ($N\times N$ where $N=8, 16, 32, 64$) for various switch reliabilities ($r = 0.99, 0.95, 0.9$) is shown in Figs. 14 and 15, respectively. In this analysis, $\frac{N}{2}$ destination nodes are considered with a single source node. In ORA, three routing links are available from each node. The number of free non-overlapping paths decreases as the network size increases, but ORA still has more non-overlapping paths than XYRA. Apart from this, node reliability also plays a crucial role in both terminal and broadcast reliability. For high node reliability (0.99), failures are less likely and more paths are available, whereas for low node reliability (0.9), failures are more likely and the number of available paths is reduced. Therefore, a decline is observed for low node reliability in Figs. 14 and 15. In contrast, Figs. 9 and 10 show the variation of terminal reliability with the network size $N\times N$ (where $N=8,16,32,64$) using XYRA and ORA. For that analysis, a specific source–destination pair is considered, which does not affect the number of available non-overlapping paths. Hence, no significant decline is observed in the time-independent terminal reliability.

To illustrate the reliability analysis of XYRA and ORA, we have chosen the network size of $64\times 64$; the analysis can be extended to any $N \times N$ network using Eqs. (14), (16), (18) and (20). The analysis shows that the time-independent broadcast reliability of the $64\times 64$ torus network is higher with ORA than with XYRA for all the switch reliabilities considered, as shown in Table 2. The time-dependent broadcast reliability of the torus network using XYRA and ORA for the network size of $64\times 64$ is shown in Fig. 16. This analysis is done with a constant switch failure rate ($\lambda = 10^{-6}$ per hour). As can be seen from Fig. 16, the time-dependent broadcast reliability of the torus network ($64\times 64$) after 100,000 hours is 0.535884 using ORA and 0.192896 using XYRA. This is because, while broadcasting from the source node in ORA, every packet can use three output links at each node, compared with two output links in XYRA. Therefore, the broadcast reliability of ORA is better than that of XYRA.
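The time-dependent broadcast figures can be approximated by substituting $r = e^{-\lambda t}$ into the recursions of Eqs. (16) and (20), since $e^{-\frac{N}{2}\lambda t} = r^{N/2}$. The following Python sketch does this under that assumption; it is not the MATLAB code behind Fig. 16, and the function names are hypothetical. At $t = 10^{5}$ hours it should come out close to the values quoted above.

```python
from math import exp


def broadcast_t(n, t_hours, branches, rate=1e-6):
    """Time-dependent broadcast reliability; branches=2 for XYRA, 3 for ORA."""
    r = exp(-rate * t_hours)  # node reliability at time t

    def level(size):
        inner = r if size == 4 else level(size // 2)
        # size/2 destination nodes in series, fed by `branches` parallel blocks.
        x = r ** (size // 2) * (1.0 - (1.0 - inner) ** branches)
        for _ in range(3):
            x = r * (1.0 - (1.0 - x) ** branches)
        return x

    return level(n)


t = 100_000  # hours
print("XYRA 64x64:", round(broadcast_t(64, t, branches=2), 6))
print("ORA  64x64:", round(broadcast_t(64, t, branches=3), 6))
```

Evaluating `broadcast_t` over a grid of times gives curves comparable in shape to those of Fig. 16.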

Table 2 Comparison of time-independent broadcast reliability for XYRA and ORA ($64\times 64$) with various switch reliabilities (r = 0.99, 0.95, 0.9)

5 Conclusion

In this paper, we outlined the various aspects of reliability for a highly scalable torus optical interconnect using ORA for data centre networks (DCNs). The reliability of the network of size $N\times N$ (where $N = 4, 8, 16, 32, 64$) using the XY routing algorithm and the optimised routing algorithm was analysed. Time-dependent and time-independent analyses of terminal and broadcast reliability were performed using an accurate analytical method. For the time-independent analysis, confidence levels of 0.99 (high switch reliability), 0.95 (medium switch reliability) and 0.9 (low switch reliability) were considered. The time-dependent analysis was performed with a constant switch failure rate of $10^{-6}$ per hour. The overall network reliability was evaluated considering node failures. The results were evaluated in MATLAB. The analysis shows that the reliability of the torus network employing ORA is better than that with the XY algorithm. This is because ORA efficiently utilises the property of path diversity without compromising the inherent buffering capacity. Comparing ORA against XYRA for a torus network of size $64\times 64$, an improvement of $64\%$ is observed in time-independent broadcast reliability and an improvement of $6\%$ in time-independent terminal reliability for a node reliability of 0.9, whereas an improvement of $70.5\%$ is observed in time-dependent broadcast reliability and an improvement of $6.05\%$ in time-dependent terminal reliability over a duration of 100,000 h.

Author information

Authors and Affiliations

School of Electronics Engineering, VIT University Chennai Campus, Chennai, India
Abhilasha Sharma & R. G. Sangeetha

Corresponding author

Correspondence to R. G. Sangeetha.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sharma, A., Sangeetha, R.G. Terminal and broadcast reliability analysis of direct 2-D symmetric torus network. J Supercomput 77, 1517–1536 (2021). https://doi.org/10.1007/s11227-020-03311-0

Published: 20 May 2020
Issue Date: February 2021
DOI: https://doi.org/10.1007/s11227-020-03311-0
