Abstract

In this paper, we model the causes of power-related network outages in Ghana using discrete-time Markov chains. We used data on 2,756 small-scale carrier telecommunications outages in Ghana, with accompanying root causes, recorded over a period of 5 years and 8 months, from August 2015 to April 2021. The results indicate that the majority (n = 1,404) of the network outages were caused by generators, while the fewest (n = 18) were caused by communication equipment. However, the longest network outages were caused by fuel issues, with an average outage time of 1,027.82 min over the study period. The transition probability matrix obtained from the data revealed that, regardless of the present cause of the network outage, the probability that the next network outage will be caused by generators is higher than the probability that it will be attributable to any other cause. The steady-state distribution indicates that in the long run (n ≥ 16), 51% of the network outages will be caused by “Generators” while 10.8% will be caused by “Batteries.” We also simulated, over 12 steps, the probabilities of a network outage being caused by each of the 12 possible root causes. The simulations suggest that generators are the most likely cause of network outages from Step 1 up to Step 7, irrespective of the initial cause of the network outage. With these findings, players in the telecommunications industry can plan better to reduce future network outages.

1. Introduction

To meet the expectations of mobile network subscribers, network reliability must be ensured, as demand for telecommunication data traffic has increased tremendously in recent years. Cutting-edge technologies such as 5G and their applications, including the internet of things (IoT), machine-to-machine (M2M), and device-to-device (D2D) communication, are crucial services that cannot tolerate outages [1]. Leased circuit capacity must always be dependable for banking, internet service providers (ISPs), governmental organizations, and broadband services [2]. Network outages are of two types: failures of the communication equipment (active) and failures of the passive system. Ninety-five (95) percent of all outages originate from base transceiver stations (BTS), which can have a problem with either passive or active equipment [3].

Rectifiers, batteries, and failed direct current (DC) fuses or circuit breakers are a few examples of passive issues [4]. Other examples include generators, AC circuit breakers, commercial air conditioning, an AC transfer switch, alarm systems, and environmental systems.

Microwave radio issues, fiber cuts, blocked time slots, radio indoor interface issues, and signaling issues are examples of communication equipment (active) failures. These failures can affect one or more cell sites, causing substantial outages that have a big impact on subscribers. As a result, mobile network operators (MNOs) have a greater need for passive infrastructure operations and maintenance. Tower companies (TCs) must put proper procedures in place to monitor the health of the passive telecommunication elements and their capacity to support continuous network service. In today’s highly competitive corporate world, it is essential to execute operations with little to no outages. In the telecom industry, harsh penalties are enforced for subpar quality of service. The service level agreement (SLA) determines the cost of outages to the passive maintenance contractor [5].

MNOs and TCs must be interested in the root causes of network outages if they are to increase network availability, quality of service (QoS), and customer satisfaction. This will also help MNOs to avoid sanctions by regulatory bodies such as the National Communication Authority (NCA), in the case of Ghana.

In 2018, the NCA sanctioned several telecommunication companies in Ghana for failing to meet the standards governing coverage, data, voice, and speech quality due to network outages. MTN-Ghana, AirtelTigo-Ghana, Vodafone-Ghana, and Glo-Ghana were asked to pay a penalty of GHC1.8 million, GHC11.6 million, GHC8.9 million, and GHC4.5 million, respectively, by the NCA [6].

Network outages result in revenue loss and poor QoS in the telecommunication industry, especially in 5G deployments. The cost of outages and poor QoS caused by power quality has been estimated to be as high as 12% of the annual turnover of mobile networks and TCs. In the event of power failure, the network equipment relies on lead-acid or lithium-ion batteries as its energy source to eliminate call drops, reduce mean time to repair, and increase service quality and revenue [5].

In a study by Tollar and Bennett [7], a network outage impact measure was taken into consideration to properly reflect the significance of large and major outages on the modern telecommunications network. A system was created to evaluate the severity of individual outages and the network’s performance over a specified time frame [7].

A study by Rauf et al. [8] proposed three different frameworks (configurations) to minimize network outages, operational costs, and environmental pollution and to improve network reliability and profitability. These three configurations are as follows: (1) utility grid and backup battery; (2) utility grid, backup battery, and diesel generator; and (3) utility grid, backup battery, and solar. After putting the frameworks through a linear optimization process, the results indicate that configuration (2) has the potential to give the highest level of dependability among all configurations [8].

It is important that the processes for service disruption or network outage be understood, the risk evaluated, and practical improvement programs outlined. Since telecommunications systems are complicated, dispersed entities that provide essential services to society, it is critical that these things be understood [9].

The Network Reliability Steering Committee (NRSC) was established with the assistance of an industry association for the purpose of analyzing reports concerning facility outages, local switch outages, common channel signaling outages, tandem switch outages, digital cross-connect system (DCS) outages, and the central office outage. Also, the NRSC examines outages regarding the length of the outage, the number of consumers affected, the number of blocked calls, and the frequency of outages [4, 10, 11].

In 1993, the NRSC carried out technical research on reliability and recommended that the industry conduct further research to better understand the proposed recommendation and devise industry-wide best practices for the method of implementation to cut down on the number of times that outages occurred [12].

The Nippon Telegraph and Telephone (NTT) created a technology to predict the impact of network failure, specify network reliability in terms of the effects on the user, and construct a network per the reliability specification to achieve high reliability in a telecommunications network [13].

A study by Luis and Moncayo [14] presents a model of planned outages for the telecommunication industry that aims to maintain the uptime standard of 99.999%, or roughly 5.26 min of outage per year. The model makes outage planning easier and helps operators understand how to minimize planned outages to improve network availability and QoS [14].
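The link between an availability target and a yearly downtime budget is simple arithmetic, which the following minimal sketch makes explicit. The function name and the 365.25-day year are assumptions made here for illustration and are not taken from [14].

```python
# Assumption: an average year of 365.25 days; a 365-day year gives a slightly smaller budget.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(availability, period_minutes=MINUTES_PER_YEAR):
    """Return the downtime budget (in minutes) implied by an availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_minutes


# "Five nines" availability, i.e. 99.999% uptime.
five_nines_budget = allowed_downtime_minutes(0.99999)
print(f"{five_nines_budget:.2f} min of outage per year")
```

The same function can be reused for other targets, for example `allowed_downtime_minutes(0.999)` for a "three nines" service level.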

A study by Raman and Chebrolu [15] found that the primary reason for communication network failure in rural India was the poor quality of the power supply. According to the findings, 93 out of 95 faults resulted from power interruptions [15].

An efficient and dependable telecommunications solution that combines renewable and “conventional” energy sources to reduce outages is a hybrid system [16]. This includes solar batteries, generator batteries, commercial AC electricity, and batteries that combine solar, battery, and generator power [17].

Effective and efficient backup batteries automatically supply DC power to the communication equipment when the generator at a cell site breaks down, until the problem is fixed [8, 18].

Wind and solar energy were offered as alternative power sources to achieve a dependable power supply in the Tanzanian telecommunications sector to reduce power outages, improve network dependability, and increase profitability [17]. The report also recommended adding a network power management system to the telecommunication network system to increase network service availability and reduce operational costs resulting from broken network components [17].

Alternating current (AC) is usually converted to DC by rectifier modules with an output of 24–27 or 48–57 V DC. The DC power charges the batteries while also supplying the equipment on the cell sites [11].

A study by Samuels et al. [5] revealed that outages in telecommunication networks arise when the generator fails to serve as a backup because of a seized piston, engine overheating, an AC alternator fault, or fuel pump and injection pump faults. The study also stated that backup batteries deployed on telecommunications networks could supply power for at least 8 hr in case of a generator fault [5].

In a different study by Spragins et al. [19], the duration of failures and single-line availabilities’ probability distribution functions are provided. Along with computer simulation data confirming the model’s correctness, a heuristic approach for calculating availabilities for more complicated systems is offered. The findings allow for more accurate availability forecasts than could previously be computed for typical forms of the network [19].

Modeling of telecommunication network outage was performed by Oduro-Gyimah et al. [20] using the autoregressive integrated moving average (ARIMA) model. The outcome of the investigation showed that the ARIMA (2,0,2) model was the best among all five models explored for predicting telecommunication outage duration. The best model was selected using the root-mean-square error, mean absolute error, and mean absolute percentage error [20].

Statistical models were used in a study by Chayanam [11] to assess power outages in the power sector, including a frequency distribution, which was used to determine the total number of customers affected by the outage and was calculated using the Best fit software [11].

To statistically determine whether a trend exists in a set of time series data, an outage dataset was put through a Laplace trend test [21]. To explore periodicity in the power outage data, the study used Fourier analysis to reveal that no discernible spike could be seen. Thus, it can be concluded that the outage data exhibits either very little or no seasonality or regularity [21].

To assess the reliability relationship of outage data and identify the explanatory variables for the outage data, Poisson regression and Mac ANOVA (Macintosh Analysis of Variance) were utilized [22].

In a study by Snow and Weiss [23], the sudden changes in power outages were examined using a piecewise linear model. The model segmented the data into intervals by a presumed statistically significant breakpoint identified by the Poisson regression [23].

The power law model, often known as the Weibull reliability growth model, was used by Steven [24] on outage data to determine whether the system is worsening or improving. The study demonstrates that when the shape parameter (β) is greater than 1, the intensity function increases, indicating that failures tend to occur more frequently. Conversely, when β is less than 1, the intensity function decreases and the system is improving [24].

In a study by Iddrisu and Gedel [25], discrete-time Markov chains (DTMCs) were used to model the downtime severity of telecommunication networks in Ghana. Their results indicate that the majority (n = 905) of the daily network downtime recorded was negligible, while only 25 of the outages were severe. The transition probability matrix revealed that when the present network downtime severity is negligible, there is an 81% chance that the next network downtime severity will still be negligible, a 12% chance that it will be minimal, a 4% chance that it will be significant, a 2% chance that it will be serious, and a 1% chance that it will be severe.

Although a lot has been written about telecommunication network outages, only a few studies have concentrated on the causes of telecommunication network outages. Furthermore, the few studies that focused on the causes of network outages only considered a few causes mainly on the active side. This study, therefore, contributes to the literature on telecommunication network outages by employing DTMCs to model the causes of all power-related telecommunication network outages in Ghana. The advantages of using DTMCs over other statistical models are simplicity and out-of-sample forecasting accuracy.

2. Methodology

In this section, we provide a description of the data used for the study and the statistical model used for data analysis.

2.1. Data

The data used in this empirical study were obtained from the National Communication Authority of Ghana and consist of 2,756 small-scale carrier telecommunications outages occurring in Ghana, with accompanying root causes, over a period of 5 years and 8 months, from August 2015 to April 2021.

The data contain the incident start time, escalated time, battery time, and outage time. In addition, the data contain the number of affected cells, the number of physical and logical sites that are affected, and the root cause of the outage. The data are extracted from the records of the network-monitoring center of the various MNOs and tower companies in Ghana.

2.2. Discrete-Time Markov Chain Model

Markov chains are stochastic models used mainly for the analysis of stochastic processes [26]. There are basically two types of Markov chains: continuous-time Markov chains and DTMCs. The choice between them largely depends on the nature of the time series data involved. The DTMC is used in this application since the data consist of a sequence of network outages in Ghana, each with a discrete cause, so that each successive outage is one step of the chain.

Mathematically, a DTMC is defined as a sequence of random variables $X_1, X_2, \ldots, X_n$, which is characterized by the Markov property. The Markov property, also known as the memoryless property, states that the distribution of the next variable ($X_{n+1}$) depends only on the value of the current variable ($X_n$) and not on any of the previous variables ($X_1, X_2, \ldots, X_{n-1}$). This definition is presented in Equation (1) as follows:

$$P(X_{n+1} = x_{n+1} \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = P(X_{n+1} = x_{n+1} \mid X_n = x_n). \tag{1}$$

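The state space of the Markov chain is the set of all possible states of $X_n$, which can be finite or countably infinite. In this study, the state space consists of the possible power-related causes of network outages identified in Ghana (Equation (2)):

$$S = \{s_1, s_2, \ldots, s_{12}\} = \{\text{AC circuit breakers}, \text{AC transfer switch}, \text{batteries}, \text{commercial AC}, \text{communication equipment}, \text{DC fuse/CB}, \text{environmental systems}, \text{fuel issues}, \text{generators}, \text{hybrid failure}, \text{rectifiers}, \text{temperature}\}. \tag{2}$$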

The Markov chain transitions from one state (say $s_i$) to another state (say $s_j$) with probability $p_{ij}$ in one step, known as the transition probability (Equation (3)):

$$p_{ij} = P(X_{k+1} = s_j \mid X_k = s_i). \tag{3}$$

The probability of transitioning from state $s_i$ to $s_j$ in $n$ steps is shown in Equation (4) as follows:

$$p_{ij}^{(n)} = P(X_{k+n} = s_j \mid X_k = s_i). \tag{4}$$

When no change in the underlying transition probabilities is observed even as time changes, the Markov chain is said to be time-homogeneous. A DTMC exhibits temporal homogeneity if Equation (5) holds for every $k > 1$:

$$P(X_{k+1} = s_j \mid X_k = s_i) = P(X_k = s_j \mid X_{k-1} = s_i). \tag{5}$$

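If the DTMC exhibits temporal homogeneity, then the one-step and $n$-step transition probabilities do not depend on the time index and are respectively given as $p_{ij} = P(X_{k+1} = s_j \mid X_k = s_i)$ and $p_{ij}^{(n)} = P(X_{k+n} = s_j \mid X_k = s_i)$, where $k > 0$.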
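For a time-homogeneous chain, the $n$-step probabilities need not be estimated separately. A standard result, stated here as background rather than as something derived from the outage data, is that they are the entries of the $n$-th power of the one-step transition matrix. Here $P$ denotes the matrix whose $(i, j)$ entry is the one-step probability $p_{ij}$ and $S$ denotes the state space:

```latex
P^{(n)} = P^{n},
\qquad
p_{ij}^{(n)} = \sum_{s_k \in S} p_{ik}^{(n-1)} \, p_{kj},
\quad n \ge 2 .
```

In words, the probability of going from $s_i$ to $s_j$ in $n$ steps is obtained by summing over every intermediate state reached after $n - 1$ steps.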

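Each element, $p_{ij}$, of the transition probability matrix is computed using Equation (6), where $n_{ij}$ represents the observed frequency of one-step transitions from state $s_i$ to state $s_j$ in the historical data and the sum in the denominator runs over all states $s_u$:

$$p_{ij} = \frac{n_{ij}}{\sum_{u} n_{iu}}. \tag{6}$$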
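The estimate described above amounts to counting consecutive pairs in the observed sequence of causes and normalizing each row of counts. The following is a minimal sketch of that calculation. The function name and the short sequence of causes are hypothetical and are used only for illustration; they are not the study data.

```python
from collections import Counter


def estimate_transition_matrix(sequence):
    """Estimate p_ij as (count of i -> j transitions) / (count of transitions out of i)."""
    states = sorted(set(sequence))
    # Count one-step transitions between consecutive observations.
    counts = Counter(zip(sequence, sequence[1:]))
    matrix = {}
    for i in states:
        row_total = sum(counts[(i, j)] for j in states)
        if row_total == 0:
            # The state was only seen as the last observation: no outgoing transitions.
            matrix[i] = {j: 0.0 for j in states}
        else:
            matrix[i] = {j: counts[(i, j)] / row_total for j in states}
    return states, matrix


# Hypothetical toy sequence of outage causes (not the study data).
causes = [
    "generators", "generators", "batteries", "generators", "rectifiers",
    "generators", "generators", "batteries", "batteries", "generators",
]
states, P = estimate_transition_matrix(causes)
for state in states:
    print(state, {k: round(v, 3) for k, v in P[state].items()})
```

Each row of the resulting matrix sums to 1 whenever at least one transition out of that state was observed.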

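To check whether the sequence of events in the given data follows the Markov property with $m$ states, we use the Chi-square test statistic with $(m-1)^2$ degrees of freedom, as shown in Equation (7):

$$\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{m} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \tag{7}$$

where $O_{ij}$ and $E_{ij}$ are the observed and expected transition frequencies, respectively [26, 27]. The expected transition frequency ($E_{ij}$) is computed using Equation (8):

$$E_{ij} = \frac{\left(\sum_{v=1}^{m} O_{iv}\right)\left(\sum_{u=1}^{m} O_{uj}\right)}{\sum_{u=1}^{m} \sum_{v=1}^{m} O_{uv}}. \tag{8}$$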
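A Chi-square statistic of this kind can be sketched in a few lines. The sketch below assumes that the expected frequencies come from the row and column totals of a square table of transition counts, and that the degrees of freedom are the usual contingency-table value; the exact variant of the test used in this study may differ. The function name and the 2 × 2 table are hypothetical.

```python
def chi_square_transition_test(observed):
    """Return (statistic, degrees of freedom) for a square table of transition counts.

    Assumption: expected counts are (row total * column total) / grand total,
    as in a standard contingency-table test.
    """
    m = len(observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(m)) for j in range(m)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i in range(m):
        for j in range(m):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:  # cells with zero expected count contribute nothing
                statistic += (observed[i][j] - expected) ** 2 / expected
    dof = (m - 1) ** 2
    return statistic, dof


# Hypothetical 2-state table of observed transition counts (not the study data).
observed = [[30, 10],
            [10, 30]]
stat, dof = chi_square_transition_test(observed)
print(f"chi-square = {stat:.2f} with {dof} degree(s) of freedom")
```

The p-value would then be read from the upper tail of a Chi-square distribution with `dof` degrees of freedom, for example with a statistics library.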

To investigate the long-term behavior of a Markov chain, we use the stationary distribution. In this study, the stationary (steady-state) distribution gives the long-run proportion of network outages attributable to each cause.

Suppose $P$ is the transition probability matrix of the Markov chain, with each row summing to 1. Then the steady-state distribution is calculated as follows:
(1) Find any left eigenvector $v$ of $P$ with eigenvalue 1 by solving $vP = v$, that is, $v(P - I) = 0$.
(2) Divide $v$ by the sum of the entries of $v$ to obtain a normalized vector $w$ whose entries sum to 1.
(3) For an irreducible chain, this vector has positive entries and is the unique normalized steady-state distribution of the Markov chain.
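As an illustration of the steady-state calculation, the sketch below uses power iteration, that is, repeated multiplication of a probability vector by the transition matrix, instead of an explicit eigenvector solve. For a chain that converges, both approaches give the same normalized vector. The function name and the 2-state matrix are hypothetical and assume a row-stochastic matrix.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate the steady-state vector of a row-stochastic matrix by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the chain: new_pi[j] = sum_i pi[i] * P[i][j].
        new_pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    raise RuntimeError("power iteration did not converge")


# Hypothetical 2-state chain (not the study's transition matrix).
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
print([round(x, 4) for x in pi])
```

For this toy matrix the balance condition 0.1 × π₁ = 0.5 × π₂ gives the exact answer (5/6, 1/6), which the iteration approaches.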

3. Results and Discussion

In this section, we provide the results of the data analysis and a discussion of the findings.

3.1. Descriptive Statistics

A detailed description of the data used for this study is presented in Tables 1 and 2. The measures of central tendency are contained in Table 1, while the measures of dispersion are presented in Table 2. Table 1 shows that the majority (n = 1,404) of the network outages were caused by generators, while the fewest (n = 18) were caused by communication equipment. However, the longest network outages were caused by fuel issues, with an average outage of 1,027.82 min over the study period. Batteries were responsible for 304 of the network outages, with an average outage of 48.6 min. AC transfer switches caused 190 network outages over the study period, with an average outage of 66.95 min. A total of 186 of the network outages were caused by AC circuit breakers, with an average outage of 67.04 min. The shortest network outages were caused by temperature, with an average outage of 35.19 min over the study period. The outage data were, however, not normally distributed, as suggested by the large differences among the measures of central tendency (mean, median, and 5% trimmed mean).

Table 2 contains the measures of dispersion for the network outage data used in this study. The large standard deviation (SD) and mean absolute deviation (MAD) values indicate that there is a lot of variation in the observed network outage data around the mean. This therefore means that the observed network outage data are quite spread out.
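The summary measures named above can be computed as in the following minimal sketch. It assumes that the 5% trimmed mean drops 5% of the observations from each tail and that MAD is the mean absolute deviation about the mean; both are assumptions about the definitions, and the outage durations are hypothetical values rather than the study data.

```python
import statistics


def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of the observations from each tail (assumed definition)."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept)


def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    centre = statistics.fmean(values)
    return statistics.fmean([abs(v - centre) for v in values])


# Hypothetical outage durations in minutes (not the study data).
outage_minutes = [12, 15, 18, 20, 22, 25, 30, 35, 40, 45,
                  50, 55, 60, 70, 80, 90, 120, 150, 300, 2400]

mean_value = statistics.fmean(outage_minutes)
median_value = statistics.median(outage_minutes)
trimmed_value = trimmed_mean(outage_minutes)
sd_value = statistics.stdev(outage_minutes)
mad_value = mean_absolute_deviation(outage_minutes)

print(f"mean={mean_value:.2f} median={median_value:.2f} trimmed={trimmed_value:.2f}")
print(f"SD={sd_value:.2f} MAD={mad_value:.2f}")
```

In this toy sample a single very long outage pulls the mean far above the median and the trimmed mean, which is the kind of gap that points to a skewed, non-normal distribution.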

3.2. Discrete-Time Markov Chain Model

We first checked whether the sequence of causes of network outages in our data followed the Markov property. Table 3 shows the χ2 test results on a series of contingency tables derived from the sequence of events (causes of network outages). Large p-values indicate that the null hypothesis of the sequence following the Markov property should not be rejected. Since the p-value is greater than 0.05 (Table 3), we fail to reject the null hypothesis that our data on causes of network outages follow the Markov property. Hence, we can proceed to perform a Markov chain analysis on our data.

The next step in DTMC modeling, after testing the Markov property, is to generate the transition probability matrix. The state transition probability matrix presented in Table 4 gives the probabilities of transitioning from one cause of network outage to another in a single time unit. Several interesting revelations are presented in the transition probability matrix. First, the probabilities in Table 4 reveal that, regardless of the present cause of the network outage, the probability that the next network outage will be caused by generators is higher than the probability that it will be attributable to any other cause. In some cases, the probability is as high as 0.72. For instance, if the present cause of network outage is temperature, then there is a 72% chance that the next cause of network outage will be generators. Second, if the present cause of network outage is communication equipment, then, based on the observed data (estimated probability = 0), the next cause of network outage will not be AC transfer switch, batteries, DC fuse/CB, environmental systems, fuel issues, rectifiers, or temperature. Another interesting revelation from Table 4 is that, if the present cause of network outage is generators, then there is a 57% chance that the next cause of network outage will still be generators. However, if the present cause of network outage is temperature, then there is only a 3% chance (very unlikely) that the next network outage will also be caused by temperature.

For easy understanding of the transition probability matrix in Table 4, the transition matrix, which gives the probabilities of transitioning from one cause of network outage to another, is presented diagrammatically in Figure 1. The circular arrows indicate the probability of transitioning from one cause to itself, while the directional arrows give the probability of transitioning from one cause of network outage to the other.

The steady-state distribution for the causes of network outage Markov chain is presented in Table 5. Also known as the stationary distribution, the steady-state distribution is a probability distribution that remains unchanged in the Markov chain as time progresses. This indicates that in the long run (n ≥ 16), 51% of the network outages will be caused by “Generators” while 10.8% of the network outages will be caused by “Batteries.” In fact, the top five causes of network outages in the long run will be, in order, “Generators,” “Batteries,” “AC Circuit Breakers,” “AC Transfer Switch,” and “Rectifiers.” On the other hand, the bottom five causes of network outages in the long run will be, in order, “Communication Equip.,” “Environmental Systems,” “Temperature,” “DC Fuse/CB,” and “Hybrid Failure” (Table 5).

The DTMC model developed in this study also allows us to visualize how the probabilities change as the number of steps increases, for comparison with the expected number of steps. Thus, we computed the probabilities of a network outage being caused by each of the 12 possible root causes over 12 steps. The four subplots of Figure 2 suggest that generators are the most likely cause of network outages from Step 1 up to Step 7, irrespective of whether the initial cause of the network outage is AC circuit breakers, AC transfer switch, batteries, or commercial AC. However, the likelihood decays to almost zero for all 12 possible causes of network outage after Step 7.

In addition, from the four subplots of Figure 3, it is again clear that generators are the most likely cause of network outages irrespective of whether the initial cause of the network outage is communication equipment, DC fuse/CB, environmental systems, or fuel issues.

Furthermore, the top-left, top-right, and bottom-left subplots of Figure 4 all indicate that generators are the most likely cause of network outages from Step 1 up to Step 7 when the initial cause of network outage is generators, hybrid failure, or rectifiers. The bottom-right subplot also shows that generators are the most likely cause of network outages from Step 1 all the way to Step 12 when the initial cause of network outage is temperature.
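One way to obtain step-by-step probabilities of the kind discussed for Figures 2 to 4 is to propagate an initial cause through the transition matrix one step at a time. The sketch below assumes that the quantities of interest are ordinary n-step transition probabilities, which is an assumption about the figures. The function name, the three state labels, and the matrix are hypothetical and are not the values in Table 4.

```python
def n_step_distributions(P, start, steps):
    """Return the state distributions after 1, 2, ..., `steps` transitions from `start`."""
    n = len(P)
    dist = [1.0 if i == start else 0.0 for i in range(n)]  # start in a known state
    history = []
    for _ in range(steps):
        # Advance one step: dist_next[j] = sum_i dist[i] * P[i][j].
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        history.append(dist)
    return history


# Hypothetical 3-state chain (not the study's transition matrix).
labels = ["generators", "batteries", "other"]
P = [[0.6, 0.2, 0.2],
     [0.5, 0.3, 0.2],
     [0.7, 0.1, 0.2]]

history = n_step_distributions(P, start=labels.index("batteries"), steps=12)
for step, dist in enumerate(history, start=1):
    most_likely = labels[dist.index(max(dist))]
    print(step, [round(p, 3) for p in dist], most_likely)
```

In this toy chain, "generators" is the most likely state at every step, and the distributions settle quickly toward a fixed vector.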

4. Conclusions

In this study, we applied a DTMC to model and study the causes of power-related telecommunications network outages in Ghana. The power-related causes of network outages were grouped into 12 categories based on Chayanam [4]. These were AC circuit breakers, AC transfer switches, batteries, commercial AC, communication equipment, DC fuse/CB, environmental systems, fuel issues, generators, hybrid failure, rectifiers, and temperature. The results of the descriptive statistics indicate that the majority (n = 1,404) of the network outages were caused by generators, while the fewest (n = 18) were caused by communication equipment. However, the longest network outages were caused by fuel issues, with an average outage time of 1,027.82 min over the study period. The transition probability matrix obtained from the data revealed that, regardless of the present cause of the network outage, the probability that the next network outage will be caused by generators is higher than the probability that it will be attributable to any other cause. The steady-state distribution indicates that in the long run (n ≥ 16), 51% of the network outages will be caused by “Generators” while 10.8% will be caused by “Batteries.” We also simulated, over 12 steps, the probabilities of a network outage being caused by each of the 12 possible root causes. The simulations suggest that generators are the most likely cause of network outages from Step 1 up to Step 7, irrespective of the initial cause of the network outage. These findings can help telecommunications companies plan early and be ready to deal with network outages promptly when they occur [28].

Data Availability

All data used for this study are available upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The research was fully funded by the authors.