Reliability Assessment for a Safety-Related Digital Reactor Protection System Using Event-Tree/Fault-Tree (ET/FT) Method

Liang, Qingzhu; Liu, Mingxing; Xiao, Peng; Guo, Yun; Xiao, Jun; Peng, Changhong

doi:https://doi.org/10.1155/2020/8839399

Science and Technology of Nuclear Installations


Research Article | Open Access

Volume 2020 | Article ID 8839399 | https://doi.org/10.1155/2020/8839399


Qingzhu Liang,¹ Mingxing Liu,² Peng Xiao,² Yun Guo,¹ Jun Xiao,³ and Changhong Peng¹

Academic Editor: Massimo Zucchetti

Received 22 Sept 2020

Accepted 17 Nov 2020

Published 30 Nov 2020

Abstract

The aim of this study is to verify whether the reliability of a digital four-channel reactor protection system (RPS) in the design phase satisfies the specified target and to identify weaknesses in the system design and potential solutions for improving system reliability. The event-tree/fault-tree (ET/FT) method, which is used in the current probabilistic safety assessment (PSA) framework of nuclear power plants (NPPs), was adopted to develop reliability models for the RPS, with the Top Events defined as the system failing to generate a reactor trip signal and the system generating a spurious trip signal. The evaluation results indicate that the probability of system failure on demand and the frequency of spurious trip signal generation are 1.47 × 10⁻⁶ with a 95% upper bound of 4.63 × 10⁻⁶ and 7.94 × 10⁻⁴/year with a 95% upper bound of 2.50 × 10⁻³/year, respectively. Importance and sensitivity analyses were conducted, and it was found that undetected unsafe common cause failures (CCFs) of signal conditioning modules (SCMs) dominate the system reliability. Two preliminary optimization schemes, reducing the periodic test interval and adopting two kinds of diverse SCMs, were proposed. The quantitative evaluation of the schemes shows that neither of them could definitively improve the system reliability to the target level. More detailed optimization analysis will be required in the future to determine a feasible system design optimization scheme.

1. Introduction

The reactor protection system (RPS) is one of the most important safety-related systems in nuclear power plants (NPPs). It protects the integrity of the safety barriers of NPPs by generating signals to scram the reactor or to actuate engineered safety features when necessary. The reliability of the RPS therefore has an important impact on plant safety and should be demonstrated to satisfy a specified level. With the rapid development of computer technology, digital technologies, which offer the potential to improve system reliability through special features such as online self-diagnosis, are gradually being adopted in the RPS [1]. It is necessary to develop reliability models for the digital RPS and to integrate the system model into the probabilistic safety assessment (PSA) of NPPs.

So far there is no consensus on methods for reliability modeling of digital systems in NPPs. Even though some dynamic methods with great potential, for example, the dynamic flow-graph methodology, have been proposed, they are still in the trial phase [2, 3]. Furthermore, applying a dynamic method needs substantial effort, and such methods are generally incompatible with the existing PSA framework. From this viewpoint, the ET/FT method, which has a mature theory and is easy to use, has received much attention and has been used in reliability assessments of digital systems in NPPs with satisfactory results [4–7].

In this paper, the ET/FT method was used to perform a reliability assessment of a digital four-channel RPS in the design phase. The main contributors to system risk were identified by importance and sensitivity analyses, and two preliminary schemes for system design optimization were proposed and quantitatively evaluated.

2. Target System Description

The present paper estimates the reliability of a digital four-channel RPS during the design phase, with the intention of validating whether it satisfies the specified reliability goal and of obtaining meaningful risk information about the system for design improvement. The reliability goal for the RPS specified by the system requirement specification is as follows:
(i) The probability of failure to generate a reactor trip signal should be equal to or less than 10⁻⁷ per demand.
(ii) The frequency of generation of a spurious trip signal should be equal to or less than 0.1/year.

The schematic diagram of the four-channel RPS is provided in Figure 1. The system includes four channels (i.e., IP, IIP, IIIP, and IVP). Each channel consists of two subchannels (i.e., subchannel-1 and subchannel-2) with functional diversity, and the eight subchannels constitute subsystem-1 and subsystem-2. Each subchannel (see Figure 2) contains three types of signal conditioning modules (SCMs), that is, analog signal conditioning modules (ACM), digital signal conditioning modules (DCM), and thermocouple signal conditioning modules (TCM); two types of input modules, that is, analog input modules (AI) and digital input modules (DI); input/output extended modules (EXT); digital output modules (DO); processor modules (CPU); and communication modules (COM). Among them, CPU and COM have hot-standby redundant configurations. Conditioning modules are used to condition, isolate, and distribute signals from sensors. AI and DI convert the signals into numerical format and then transmit them to the CPUs through EXT. In a subchannel, the CPU compares the signals with the predefined setpoint value and generates a local coincidence signal (LCS) if the threshold value is reached. Using the threshold judgment results of the other three subchannels transmitted by COM, the CPU performs two-out-of-four voting logic and generates a trip signal when there are two or more LCSs. The output signals of subchannel-1 and subchannel-2 of each channel are combined through an “OR” gate and then open one pair of reactor trip breakers. If two out of four pairs of reactor trip breakers open, the reactor will be shut down.
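The voting chain described above can be illustrated with a minimal Python sketch. It is a simplified reading of the text, not the system's actual logic: the function names and the `ok_sub1`/`ok_sub2` health flags are hypothetical, and it assumes every healthy subchannel in a subsystem sees the same four LCSs.

```python
from typing import Sequence


def two_out_of_four(flags: Sequence[bool]) -> bool:
    """Return True when at least two of the four inputs are set."""
    if len(flags) != 4:
        raise ValueError("expected exactly four inputs")
    return sum(bool(f) for f in flags) >= 2


def reactor_trip(
    lcs_sub1: Sequence[bool],
    lcs_sub2: Sequence[bool],
    ok_sub1: Sequence[bool] = (True, True, True, True),
    ok_sub2: Sequence[bool] = (True, True, True, True),
) -> bool:
    """Simplified trip logic for four channels with two subchannels each.

    lcs_sub1[ch] / lcs_sub2[ch]: LCS raised by subchannel-1 / subchannel-2
    of channel ch. ok_sub1[ch] / ok_sub2[ch]: hypothetical flags stating
    whether that subchannel is able to output its voting result.
    """
    # Each subsystem votes two-out-of-four on its own LCSs.
    vote1 = two_out_of_four(lcs_sub1)
    vote2 = two_out_of_four(lcs_sub2)
    # Per channel, the two subchannel outputs are ORed to open a breaker pair.
    breaker_pair_open = [
        (ok_sub1[ch] and vote1) or (ok_sub2[ch] and vote2) for ch in range(4)
    ]
    # The reactor is shut down when two of the four breaker pairs open.
    return two_out_of_four(breaker_pair_open)
```

For instance, two LCSs in subsystem-1 are enough to trip when all subchannels are healthy, whereas the same inputs do not trip in this sketch if three of the four subchannel-1 outputs are unavailable and subsystem-2 sees no LCS.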

For most design basis accidents, two diverse kinds of sensor signals are used to generate shutdown signals; signals without diversity are transmitted to both subchannels through the SCMs.

3. Fault-Tree Analysis

3.1. Model Development

The present paper focuses on the safety function of the RPS to generate a reactor trip signal. Two failure modes of the system are considered, that is, failing to generate a reactor trip signal and generating a spurious trip signal.

In order to envelop situations with different numbers of acquired signals and to obtain conservative calculation results, the Top Events are defined as follows, based on the principle for function allocation of the system:
(i) Failing to generate a reactor trip signal on demand under three sensor signals without diversity (RT 3IN FD).
(ii) Generating a spurious trip signal under one sensor signal with diversity (RT 1IN ST).

Since the component configurations used to generate a trip signal from different types of measurement signal differ only in the conditioning and input modules, the analog signal was selected to develop a case model; only simple modifications are needed for digital or thermocouple signals. The analysis is based on the following assumptions:
(i) The analysis focuses on the digital system itself; failures of sensors, reactor trip breakers, and their associated relays are not considered.
(ii) Loss of the power supply to functional modules would make them unavailable and thus impair the preset safety functions of the system. However, there was not enough information about the power supply system when this analysis was performed, and its probability of complete failure is believed to be very low because the power supply of a cabinet generally has a triple-redundant configuration. The power supply system is therefore excluded from the model in the present paper.
(iii) LCS signals used for voting to generate trip signals are transmitted among the channels of the RPS through the data communication network. A failure of this network, which may result from a hardware or software failure of a communication module or from faults in the transmission medium of the communication cable, would lead to loss of communication of LCS signals, and its effect on the reliability of the RPS should not be ignored. Nonetheless, there was not enough information about the data communication network when this study was conducted, and its reliability analysis is excluded.
(iv) Faults in different software modules of the digital system may result in different failures. For modeling convenience, software failures can be divided into two categories depending on whether the effect is the failure of a single module or the simultaneous failure of multiple modules, the latter being equivalent to a CCF. Examples of the failure categories may be faults in application software and faults in the software functional requirements specification. Because the applicability of current quantitative software reliability methods is still debated and data and information about the system software are lacking, software modeling was not included in the current study.
(v) Human errors are assumed to have no effect on the generation of the automatic signal, and human reliability analysis is out of scope.
(vi) To be conservative in terms of reliability, it is assumed that once the failure of a module is detected, repair activity starts and makes the module unavailable.
(vii) According to the maintainability and availability requirements of the RPS specified in the system requirement specification, the mean time to repair (MTTR) and the periodic test interval (TI) for modules are assumed to be four hours and six months, respectively.

The basic events used in the FT models for the Top Events are defined based on the failures of the modules. Failures of a module are classified according to their detectability and their effects on module function, as follows [8]:
(i) Detected failure (D): the failure is detected and the repair leads to the unavailability of the module.
(ii) Undetected safe failure (US): the failure is undetected and results in an increase in the probability of spurious action.
(iii) Undetected unsafe failure (UU): the failure is undetected and the function of the module is completely lost.

The FT models for a Top Event were constructed based on the following principles:
(i) Find all combinations of failed signals that will result in the Top Event.
(ii) For each signal in a combination, find all combinations of module failure modes and input signals that will lead to that signal.

The top-level logic of the FT models for RT 3IN FD and RT 1IN ST is shown in Figures 3 and 4, respectively.

3.2. Quantitative Analysis

The reliability models used for basic events in the quantitative analysis include a repairable component for detected failures and a periodically tested component for undetected failures. The unavailability Q(t) of the repairable component is modeled by

$$Q(t) = \frac{\lambda}{\lambda + \mu}\left(1 - e^{-(\lambda + \mu)t}\right),$$

where λ and μ (= 1/MTTR) are the failure rate and repair rate of the component, respectively. The long-term unavailability of the component is Q = λ/(λ + μ). The unavailability Q(t) of the periodically tested component is modeled by

$$Q(t) = 1 - e^{-\lambda (t - n\,\mathrm{TI})}, \quad n\,\mathrm{TI} \le t < (n+1)\,\mathrm{TI},$$

where λ and TI are the failure rate and test interval of the component, respectively, and n is the number of tests completed before time t. The mean unavailability of the component is Q = 1 − (1 − e^(−λTI))/(λTI).

The failure data of the modules constituting the system was derived from results of failure modes, effects, and diagnostic analysis (FMEDA) of the modules. Parameters mainly include detected failure rate (λ_D), undetected safe failure rate (λ_US), and undetected unsafe failure rate (λ_UU) of modules, as shown in Table 1.

Two types of CCFs of the modules were considered: (1) CCFs of modules with hot-standby configuration in the same subchannel and (2) CCFs of identical modules of four channels in the same subsystem. They are modeled by Beta model and Multiple Greek Letter model, respectively [9]. The parameters of CCFs models used in this analysis are shown in Table 2.
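The two CCF models can be sketched in Python as follows, using the standard non-staggered form of the Multiple Greek Letter (MGL) model for a group of four. The parameter values in the usage note are hypothetical; the values actually used in the analysis are those of Table 2.

```python
from math import comb
from typing import Dict


def beta_factor(q_total: float, beta: float) -> Dict[str, float]:
    """Beta factor model for a redundant pair (e.g., hot-standby modules)."""
    return {
        "independent": (1.0 - beta) * q_total,  # single-module failure
        "ccf": beta * q_total,                  # both modules fail together
    }


def mgl_group_of_four(
    q_total: float, beta: float, gamma: float, delta: float
) -> Dict[int, float]:
    """MGL model: probability Q_k of a specific basic event failing k of 4 modules.

    Q_k = prod(rho_1..rho_k) * (1 - rho_{k+1}) * Q_t / C(m-1, k-1),
    with rho_1 = 1, rho_2 = beta, rho_3 = gamma, rho_4 = delta, rho_5 = 0.
    """
    m = 4
    rho = [1.0, beta, gamma, delta, 0.0]
    q = {}
    for k in range(1, m + 1):
        product = 1.0
        for i in range(k):
            product *= rho[i]
        q[k] = product * (1.0 - rho[k]) * q_total / comb(m - 1, k - 1)
    return q
```

For example, `mgl_group_of_four(1e-4, 0.05, 0.5, 0.5)` splits a hypothetical total module unavailability of 10⁻⁴ into contributions for one, two, three, and four failed modules; weighting each Q_k by C(3, k−1) and summing recovers the total, which is a quick consistency check on the parameters.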

Parameter uncertainty was considered in the analysis. Given the recognized weaknesses in the data, large error factors (EFs) were assumed for the parameters, that is, 5 for failure rates and 3 for β, γ, and δ [10]. In addition, the parameters were assumed to be lognormally distributed. The propagation of the parameter uncertainties to the system failure probability was evaluated.
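A minimal Monte Carlo sketch of this kind of uncertainty propagation is given below. It assumes the usual convention that the error factor is the ratio of the 95th percentile to the median of the lognormal distribution. The toy top-event expression (a single CCF term, β·λ·TI/2) and all medians are hypothetical stand-ins for the real FT model and the data of Tables 1 and 2.

```python
import math
import random
from typing import List, Tuple

Z_95 = 1.645  # 95th percentile of the standard normal distribution


def lognormal_sigma(error_factor: float) -> float:
    """Log-space standard deviation for EF = 95th percentile / median."""
    return math.log(error_factor) / Z_95


def sample(median: float, error_factor: float, rng: random.Random) -> float:
    """Draw one lognormal sample with the given median and error factor."""
    return rng.lognormvariate(math.log(median), lognormal_sigma(error_factor))


def percentile(values: List[float], p: float) -> float:
    """Simple empirical percentile (no interpolation)."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(p * len(ordered)))
    return ordered[index]


def propagate(n_samples: int = 20000, seed: int = 1) -> Tuple[float, float]:
    """Return (median, 95th percentile) of a toy top-event probability."""
    rng = random.Random(seed)
    ti_hours = 4380.0  # six months, assuming 730 h per month
    tops = []
    for _ in range(n_samples):
        lam_uu = sample(1.0e-7, 5.0, rng)          # hypothetical median, EF = 5
        beta = min(1.0, sample(0.05, 3.0, rng))    # hypothetical median, EF = 3
        tops.append(beta * lam_uu * ti_hours / 2.0)
    return percentile(tops, 0.5), percentile(tops, 0.95)
```

Calling `propagate()` returns a median and a 95% upper bound for the toy expression; in the actual analysis the same sampling would be pushed through the minimal cut sets of the FT models.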

The calculation results for the three types of signals are shown in Tables 3 and 4. The results indicate that, when the input scram parameters are thermocouple signals and CCFs are considered, the probability of the RPS failing to generate a trip signal on demand is 1.47 × 10⁻⁶ with a 95% upper bound of 4.63 × 10⁻⁶, which is larger than for the other two types of signals. If the contributions of CCFs are ignored, this value is 2.12 × 10⁻¹¹ with a 95% upper bound of 3.83 × 10⁻¹⁰. For the same signal type, the frequency of the system generating a spurious trip signal is 7.94 × 10⁻⁴/year with a 95% upper bound of 5.71 × 10⁻³/year when the FT model includes CCFs, which is also larger than for the other two types of signals. When CCFs are excluded from the system reliability model, the frequency is 2.70 × 10⁻⁵/year with a 95% upper bound of 1.41 × 10⁻⁴/year. Taking CCFs into account, the system reliability does not fulfill the specified reliability goal (see Section 2) with regard to the probability of failure on demand of the system function. The results make it clear that CCFs of modules are the main contributors to system failure; this is consistent with the consensus that safety-critical protection systems with redundant multiple channels are strongly affected by CCFs [4, 11].

4. Importance and Sensitivity Analysis

From the perspective of safety, the probability of the system failing on demand to generate a trip signal is of greater concern in PSA. Importance and sensitivity analyses were therefore performed to identify the significant factors that contribute to the failure on demand of the RPS (with the analog signal selected as the case study). The factors include individual basic events (BEs), input parameters (e.g., failure rate), and components (modules of the RPS). Commonly used importance measures include Fussell–Vesely (FV), risk decrease factor (RDF), and risk increase factor (RIF). The FV of factor i (related to an individual BE or to multiple BEs constituting a group) represents the contribution of the factor to the system risk and is defined as

$$\mathrm{FV}_i = \frac{Q_{\mathrm{Top},i}}{Q_{\mathrm{Top}}},$$

where Q_Top is the probability of the Top Event and Q_Top,i is the probability of the Top Event calculated from only those minimal cut sets that include BEs related to factor i.

The RDF of factor i is a measure that indicates the decrease of system risk assuming the nonoccurrence of the BEs related to the factor. Mathematically, it is calculated as

$$\mathrm{RDF}_i = \frac{Q_{\mathrm{Top}}}{Q_{\mathrm{Top},\,p(i)=0}},$$

where Q_Top,p(i)=0 is the probability of the Top Event assuming that the probabilities of the BEs related to factor i are zero.

The RIF is the opposite of the RDF, that is, it expresses the increase of system risk assuming that the BEs related to the factor certainly occur. It is expressed as

$$\mathrm{RIF}_i = \frac{Q_{\mathrm{Top},\,p(i)=1}}{Q_{\mathrm{Top}}},$$

where Q_Top,p(i)=1 is the probability of the Top Event assuming that the probabilities of the BEs related to factor i are one.

The sensitivity of the probability of the Top Event to factor i, related to an individual BE or to multiple BEs, is defined as

$$S_i = \frac{Q_{\mathrm{Top},U}}{Q_{\mathrm{Top},L}},$$

where Q_Top,U and Q_Top,L are the probabilities of the Top Event with the probabilities of the BEs related to factor i multiplied by a sensitivity factor (SF) and divided by SF, respectively. When the analysis object is an input parameter, the two quantities represent the probabilities of the Top Event when the parameter is multiplied and divided by SF, respectively. In this analysis, SF is set to 10.
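The four measures defined above can be computed from a list of minimal cut sets, as in the following Python sketch. It uses the min-cut upper bound to quantify the Top Event, which is an assumption of this sketch rather than a statement about the tool used in the study; the cut sets and probabilities in the usage note are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List


def top_probability(cut_sets: List[FrozenSet[str]], p: Dict[str, float]) -> float:
    """Min-cut upper bound: 1 - prod over cut sets of (1 - prod of BE probabilities)."""
    survive = 1.0
    for cut_set in cut_sets:
        q = 1.0
        for be in cut_set:
            q *= p[be]
        survive *= 1.0 - q
    return 1.0 - survive


def importance_measures(
    cut_sets: List[FrozenSet[str]], p: Dict[str, float], be: str, sf: float = 10.0
) -> Dict[str, float]:
    """FV, RDF, RIF, and sensitivity S of a single basic event."""
    q_top = top_probability(cut_sets, p)
    # FV: only the cut sets that contain the basic event.
    q_i = top_probability([cs for cs in cut_sets if be in cs], p)
    # RDF / RIF: basic event probability forced to 0 / 1.
    q_zero = top_probability(cut_sets, {**p, be: 0.0})
    q_one = top_probability(cut_sets, {**p, be: 1.0})
    # Sensitivity: probability multiplied and divided by SF (capped at 1).
    q_upper = top_probability(cut_sets, {**p, be: min(1.0, p[be] * sf)})
    q_lower = top_probability(cut_sets, {**p, be: p[be] / sf})
    return {
        "FV": q_i / q_top,
        "RDF": q_top / q_zero if q_zero > 0.0 else math.inf,
        "RIF": q_one / q_top,
        "S": q_upper / q_lower,
    }
```

As a usage example, take the hypothetical cut sets `[frozenset({"CCF"}), frozenset({"A", "B"})]` with probabilities `{"CCF": 1e-6, "A": 1e-3, "B": 1e-3}`. The single-event cut set and the double-event cut set then contribute equally, so `importance_measures(..., "CCF")` gives an FV near 0.5 and an RDF near 2, while the RIF is very large because the event alone fails the system.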

The importance and sensitivity calculation results for the selected BEs, parameters, and components are shown in Tables 5–7. It is shown that undetected unsafe CCFs of ACMs have significant effects on system reliability. TI and λ_UU of the ACM, which determine the probabilities of UU of ACMs, are decisive parameters for the system risk. The results show that ACMs are the critical component of the system.

Schemes for system design optimization should focus on reducing the unavailability of ACMs caused by CCFs, which is determined by TI, λ_UU of the ACM, and the CCF parameters. From the perspective of feasibility, reducing TI might be more appropriate. In addition, enhancing the capability of the ACM to defend against CCFs, for example by applying diversity, is also an effective approach.

5. Preliminary Optimization Schemes for the System

Based on the insights from the importance and sensitivity analyses, two preliminary optimization schemes were explored: increasing the test frequency and adopting two kinds of diverse SCMs. Quantitative evaluations of the improvements were conducted as well.

The probability of the system failing to generate a trip signal on demand was calculated for shorter values of TI as follows:
(i) Case 1: TI for modules is reduced to three months.
(ii) Case 2: TI for modules is reduced to one month.

The calculation results are shown in Table 8. The probability of system failure on demand decreases significantly as TI is reduced. However, the reliability requirement of the system is still not definitively fulfilled. Considering the increased maintenance costs associated with more frequent periodic tests, this approach is not very promising.
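A rough first-order check of this conclusion can be sketched in Python. It assumes that the Top Event is dominated by undetected failures whose mean unavailability is approximately λ·TI/2, so that the failure probability scales linearly with TI. The reference value of 1.47 × 10⁻⁶ at TI = 6 months is taken from the quantitative analysis above; the scaled values are estimates under this assumption and are not the results of Table 8.

```python
TARGET_PFD = 1.0e-7          # reliability goal per demand (Section 2)
Q_REFERENCE = 1.47e-6        # reported failure probability at TI = 6 months
TI_REFERENCE_MONTHS = 6.0


def scale_with_test_interval(ti_new_months: float) -> float:
    """First-order estimate: failure probability proportional to TI."""
    return Q_REFERENCE * ti_new_months / TI_REFERENCE_MONTHS


def meets_target(q: float) -> bool:
    """Compare an estimated failure probability with the reliability goal."""
    return q <= TARGET_PFD
```

Under this linear assumption, `scale_with_test_interval(3.0)` and `scale_with_test_interval(1.0)` give roughly 7.4 × 10⁻⁷ and 2.5 × 10⁻⁷, both still above the 10⁻⁷ goal, which is in line with the statement that shortening TI alone is not sufficient.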

Another potential approach is the use of two kinds of diverse SCMs to improve the capability of the SCMs to defend against CCFs. It should be recognized that, although diverse modules usually achieve the same function through different principles, materials, and so forth, it is inappropriate to assume that diverse modules are completely free of CCFs, because they use small electronic elements manufactured in a globally standardized environment. A more appropriate treatment is to assume that the CCF probability of diverse modules decreases to a certain extent. Calculations for the following three cases were performed:
(i) Case 1: the CCF probability of diverse SCMs decreases by 50%.
(ii) Case 2: the CCF probability of diverse SCMs decreases by 75%.
(iii) Case 3: the CCF probability of diverse SCMs decreases by 90%.

The calculation results are shown in Table 9. They indicate that the use of diverse SCMs would markedly improve system reliability; however, even assuming that the CCF probability is reduced to an almost ideal level, the system reliability still does not definitively meet the target.
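The order of magnitude of this effect can be checked with a short Python sketch. It assumes that the whole difference between the reported failure probabilities with CCFs (1.47 × 10⁻⁶) and without CCFs (2.12 × 10⁻¹¹) is attributable to SCM CCFs and scales linearly with the CCF probability. This is a simplification introduced here for illustration; the values it produces are not those of Table 9.

```python
TARGET_PFD = 1.0e-7        # reliability goal per demand (Section 2)
Q_WITH_CCF = 1.47e-6       # reported failure probability, CCFs included
Q_WITHOUT_CCF = 2.12e-11   # reported failure probability, CCFs ignored


def top_with_reduced_ccf(reduction: float) -> float:
    """Estimate the failure probability when the CCF part is cut by `reduction`."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must lie between 0 and 1")
    ccf_part = Q_WITH_CCF - Q_WITHOUT_CCF
    return Q_WITHOUT_CCF + (1.0 - reduction) * ccf_part


def cases() -> dict:
    """Evaluate the three reduction cases (50%, 75%, 90%)."""
    return {r: top_with_reduced_ccf(r) for r in (0.50, 0.75, 0.90)}
```

Under these assumptions the three cases give roughly 7.4 × 10⁻⁷, 3.7 × 10⁻⁷, and 1.5 × 10⁻⁷, so even a 90% reduction of the CCF probability leaves the estimate slightly above the 10⁻⁷ goal.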

The analysis results show that the system reliability requirement cannot be fulfilled by shortening TI or by adopting diverse SCMs alone. More detailed optimization analysis is needed to determine the final system design optimization scheme, for example, a combination of the above schemes or a change of system architecture.

6. Conclusions

In this paper, a safety-related digital four-channel RPS in the design phase was assessed by the ET/FT method to verify whether the system reliability meets the specified requirements regarding the function to generate a reactor trip signal and to obtain important risk information for design feedback.

The results of the quantitative analysis indicate that the probability of failure on demand of the system to generate a trip signal is 1.47 × 10⁻⁶ with a 95% upper bound of 4.63 × 10⁻⁶ and that the frequency of the system generating a spurious trip signal is 7.94 × 10⁻⁴/year with a 95% upper bound of 2.50 × 10⁻³/year. The reliability of the system function regarding generating a trip signal on demand does not fulfill the reliability target of the system, that is, a probability of failure on demand of at most 10⁻⁷.

The importance and sensitivity analyses were performed to identify critical factors which have significant impacts on system reliability and to determine improvement direction. It is found that undetected unsafe CCFs of SCMs dominate the probability of the system failure on demand and TI and λ of the SCMs have very high sensitivity.

Quantitative evaluations of two preliminary optimization schemes, increasing the periodic test frequency and using diverse SCMs, were conducted. The analysis results show that neither of them could definitively improve the system reliability to the target level. In the future, more detailed optimization analysis will be performed to determine a feasible system design optimization scheme, for example, a combination of the above schemes or a change of system architecture.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank the Science and Technology on Reactor System Design Technology Laboratory of Nuclear Power Institute of China for financial support of this work.

References

[1] K. Korsah, R. Wetherington, R. Wood et al., Emerging Technologies in Instrumentation and Controls: An Update, Nuclear Regulatory Commission, Washington, DC, USA, 2006, NUREG/CR-6888 ORNL/TM-2005/75.
[2] T. Aldemir, D. W. Miller, M. P. Stovsky et al., Current State of Reliability Modeling Methodologies for Digital Systems and Their Acceptance Criteria for Nuclear Power Plant Assessments, Nuclear Regulatory Commission, Washington, DC, USA, 2006, NUREG/CR-6901.
[3] T. Aldemir, M. P. Stovsky, J. Kirschenbaum et al., Dynamic Reliability Modeling of Digital Instrumentation and Control Systems for Nuclear Reactor Probabilistic Risk Assessments, Nuclear Regulatory Commission, Washington, DC, USA, 2007, NUREG/CR-6942.
[4] H. G. Kang and S.-C. Jang, “A quantitative study on risk issues in safety feature control system design in digitalized nuclear power plant,” Journal of Nuclear Science and Technology, vol. 45, no. 8, pp. 850–858, 2008.
[5] S. J. Lee, W. Jung, and J. E. Yang, “PSA model with consideration of the effect of fault-tolerant techniques in digital I&C systems,” Annals of Nuclear Energy, vol. 87, no. Part 2, pp. 375–384, 2008.
[6] S. H. Lee, K. S. Son, W. Jung, and H. G. Kang, “Risk assessment of safety data link and network communication in digital safety feature control system of nuclear power plant,” Annals of Nuclear Energy, vol. 108, pp. 394–405, 2017.
[7] J. H. Bickel, “Risk implications of digital reactor protection system operating experience,” Reliability Engineering & System Safety, vol. 93, no. 1, pp. 107–124, 2008.
[8] M. Jockenhövel-Barttfeld, S. Karg, C. Hessler et al., “Reliability analyses of digital I&C systems within the verification and validation process,” in Proceedings of the 14th International Probabilistic Safety Assessment & Management Conference (PSAM 14), Los Angeles, CA, USA, September 2018.
[9] A. Mosleh, D. M. Rasmuson, and F. M. Marshall, Guidelines in Modeling Common Cause Failure in Probabilistic Risk Assessment, Nuclear Regulatory Commission, Washington, DC, USA, 1998, NUREG/CR-5485 INEEL/EXT-97-01327.
[10] T. L. Chu, M. Yue, G. Martinez et al., Modeling a Digital Feedwater Control System Using Traditional Probabilistic Risk Assessment Methods, Nuclear Regulatory Commission, Washington, DC, USA, 2009, NUREG/CR-6997 BNL-NUREG-90315-2009.
[11] H. G. Kang and T. Sung, “An analysis of safety-critical digital systems for risk-informed design,” Reliability Engineering & System Safety, vol. 78, no. 3, pp. 307–314, 2002.

Copyright

Copyright © 2020 Qingzhu Liang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
