1 Introduction

Distributed detection is an application of wireless sensor networks (WSNs) to decide whether an event of interest is happening in a monitored area [1, 2]. Basically, it consists of a set of sensor nodes and a fusion center (FC). Sensor nodes are deployed over this area to observe the event and make local decisions. The sensor nodes send these decisions to the FC over wireless channels. The FC collects the local decisions and applies a fusion rule to compute a final decision about the existence of the event. As a result, designing a distributed detection system will concern about the operations at sensor nodes and the operations at the FC. A basic structure of distributed detection is shown in Fig. 1. Unlike classical detection [3], distributed detection does not involve only detection methods but also communication techniques [4,5,6], communication protocols [7,8,9], transceiver-receiver structures [10,11,12,13], transmission channels [14,15,16], network structures [17,18,19] etc. An ultimate goal of distributed detection is to derive an optimal fusion rule that is aware of this information. The design of fusion rules is known as a cross-layer design problem [20, 21].

Designing distributed detection also encounters a resource-constrained problem. In a large sensor network, the FC might not be able to collect local decisions from all sensor nodes because of limited bandwidth or/and limited time (delay). Therefore, transmission strategies between the FC and sensor nodes play an important role to achieve a desired performance under this resource constraint. When the time/banwidth does not allow the FC to collect local decisions from all sensor nodes, the FC might collect only reliable local decisions. This strategy is known as sensor censoring [15, 22, 23]. To minimize the collection time or to optimize the performance (given a limited time), distributed detection can be designed such that local decisions are sent to the FC in descending order of their reliability known as ordered transmissions [8, 24,25,26] and reliability-based splitting algorithms [9].

A suitable medium access control (MAC) protocol is a key to make the transmission strategies above possible in limited bandwidth and time constraints. A problem of applying these transmission strategies is that the transmission scheduling cannot be performed in advance since decision reliability is not known yet. A random access protocol is a method applicable in this scenario. Distributed detection using slotted ALOHA has been studied in many papers [7,8,9, 20], where sensor nodes randomly choose time slots to send their decisions. However, if two or more decisions are sent at the same time slot, a collision time slot happens. The distributed detection schemes in these papers neglect the collision time slots since the local decisions on these time slots cannot be recovered. Therefore, their performance drops as the number of collisions increases.

Fig. 1
figure 1

A basic structure of distributed detection. Distributed detection generally consists of the following basic components: event model, sensor-node processing, channel model, and fusion-center processing

1.1 Contributions

In this paper, we study a distributed detection system with a large number of sensor nodes N. A time duration T time slots (assume \(T \ll N\)) over a single channel is provided to collect local decisions. If two or more sensor nodes send their local decisions at the same time slot, a packet collision happens and the FC cannot decode the transmitted local decisions inside this time slot. Since T is less than N, the FC can collect only some local decisions but not all of them. For example, in a distributed detection system using the time division multiple access (TDMA) as its MAC protocol, the FC will be able to collect only up to T local decisions. Therefore, the collection time T will limit the performance of this distributed detection.

We are interested in jointly designing a transmission strategy, a MAC protocol, and fusion rules at the FC to improve the performance of the distributed detection with a limited collection time over a single channel. A key design is that, with properly revising a transmission strategy and a random access protocol such that the packet collisions will be from the same local decisions, a packet collision indicates two or more sensor nodes have the same decision. As a result, the packet collisions are informative [27,28,29,30]. The main contributions and results of this paper can be summarized as follows.

We propose a transmission protocol (i.e., a transmission strategy and a MAC protocol) based on a population-splitting algorithm called population-splitting-based random access (PSRA), which is a modification of slotted ALOHA. We use the term “population splitting” because the sensor nodes (i.e., population) are split into groups based on their observation values. By using this algorithm, the observation range is divided into censored regions (unreliable observations) and M uncensored regions (reliable observations). Only groups of sensor nodes whose observations are in the uncensored regions will send their data to the FC. We also divide the collection time T into M frames, where each frame contains K time slots. The sensor nodes whose observations are in the mth uncensored region will send their data in the mth frame. Since the frame itself indicates the observation region (i.e., the local data found in the mth frame corresponds to the observation in the mth uncensored region), we assume that the sensor nodes send a data bit 1 (i.e., no need to make a local decision). However, because we do not know which sensor nodes will send their data bits in the mth frame, a fixed transmission scheduling such as TDMA cannot be applied. A slotted ALOHA protocol is exploited. A sensor node whose observation is in the mth region will send the data bit at a time slot in the mth frame with a probability \(\frac{\rho _m}{K}\), where \(\rho _m\) is called transmission probability at the mth frame. The parameter \(\rho _m\) indicates the probability that a sensor node whose observation is in the mth region will send its data in the mth frame. The scaling \(\frac{1}{K}\) is a normalized factor (per frame). By using a slotted ALOHA protocol, collision time slots, when two or more sensor nodes send their data bits at the same time slot, will happen. However, unlike [7,8,9, 20], we design and propose collision-aware distributed detection whose FC is aware of collisions and utilizes them in making a final decision.

The performance of the collision-aware distributed detection is affected by the transmission probabilities \(\varvec{\rho } = ( \rho _1, \rho _2, \ldots , \rho _M )\), where \(\rho _m\) controls the number of sensor nodes sending the data bits in the mth frame (i.e., network traffic). A higher value of \(\rho _m\) induces a larger number of collision time slots in that frame. We, then, propose a method to approximate the optimal transmission probabilities, which minimize the probability of error. We can show that, for the collision-aware distributed detection scheme, unlike data networks [31], the transmission probabilities maximizing the throughput are not optimal. On the other hand, the numerical results show that the transmission probabilities inducing a lot of collision time slots are optimal. This is because, in the collision-aware distributed detection, the collision time slots are informative.

1.2 Related work

Transmission strategies have been broadly exploited to improve the performance of distributed detection in a resource-constrained scenario. In addition to sensor censoring [15, 22, 23, 32], ordered transmissions [8, 24,25,26], and reliability-based splitting algorithms [9], a transmission strategy called type-based multiple access (TBMA) has been proposed [33, 34] over a multiple access channel. In the TBMA scheme, the observation is quantized into M levels and M orthogonal waveforms \(\{ \phi _m(t) \}\), for \(m = 1, 2, \ldots , M\), are provided. Sensor nodes whose observations are in the mth level will send a waveform \(\phi _m(t)\) to the FC. The received waveforms are then combined to each other. The amplitude of the waveform \(\phi _m(t)\) detected at the FC indicates the number of sensor nodes in the mth level, which will be used further in making a final decision. As an extension of the TBMA, a transmission strategy called type-based random access (TBRA) over a multiple access channel has been proposed in [35]. The term “random” here emphasizes that the number of sensor nodes involving in transmissions is random. Our proposed PSRA is different from the TBMA and TBRA in the following aspects. First, we study a problem of distributed detection when only a single collision channel is provided. Second, in the PSRA, the sensor nodes whose observations are in the mth level will send their data bits in the mth frame using slotted ALOHA. As a result, the FC observes the states of time slots (i.e., idle, successful, collision time slots) in making a final decision.

Random-access protocols have been applied in many distributed detection schemes especially when a proper transmission scheduling is not allowed [7,8,9, 20]. However, packet collisions, as an intrinsic property of random access, deteriorate the performance of the schemes in these papers since the packet collisions are neglected. With properly revising a transmission strategy such that the packet collisions will be from the same data, a packet collision indicates two or more sensor nodes have the same data. As a result, the packet collisions can be used in making a final decision [27,28,29,30]. Similarly, in the proposed PSRA, since the data bits sent in the same frame will be from the observations in the same level, the collision time slots in each frame are meaningful and will be used in making a final decision.

Transmission protocols based on population-splitting algorithms for distributed detection/estimation have been studied in [29, 36, 37], where the sensor nodes will share a collision channel to send their decisions. By using these protocols, the observation range is divided into M levels and the collect time is divided into M frames. The sensor nodes whose observations are in the mth level will send their decisions in the mth frame. The FC observes the time-slot states to make a final decision or compute an estimate. The transmission protocol proposed in this paper is different from those in [29, 36, 37] as follows. Here, we apply the sensor-censoring strategy to a population-splitting algorithm, where the observation range is divided into censored regions (unreliable observations) and uncensored regions (reliable observations). The uncensored regions are further divided into M levels. Only the sensor nodes whose observations are in the uncensored regions will send their data bits in the corresponding frames.

1.3 Organization

The remainder of this paper is organized as follows. The system model is introduced in Sect. 2. Collision-aware distributed detection with a population-splitting algorithm is proposed in Sect. 3. Approximations of the optimal transmission probabilities are determined in Sect. 4. The numerical results are evaluated and discussed in Sect. 5. Finally, conclusions are provided in Sect. 6.

2 System model

2.1 Centralized fusion system

We consider a distributed detection system with N sensor nodes deployed in an area to monitor an event of interest. To start the local-decision collection process, the FC will broadcast an inquiry about the existence of this event. Each sensor node will draw an observation, compute a data bit, and send it to the FC via a single-hop and shared collision channel by using the transmission protocol proposed in Sect. 3.2.

2.2 Transmission channel

The sensor nodes will share a collision channel to send their data bits to the FC. This collision channel is divided into time slots, with the FC and sensor nodes knowing when a time slot begins and ends (i.e., synchronous time slot). In a collision-channel model, a time slot is classified into the following time-slot states [31]:

  • a time slot is called as an idle time slot if no data packets are sent at this time slot,

  • a time slot is called as a successful time slot if only one data packet is sent at this time slot,

  • a time slot is called as a collision time slot if two or more data packets are sent at this time slot.

We assume that the collisions are solely from the transmissions of the sensor nodes in the considered network. The length of each time slot is equal to the packet containing a data bit.

2.3 Binary hypothesis testing model

The noisy observation x at a sensor node is governed by the following binary hypothesis model:

$$\begin{aligned} H_0: \,\, x \sim f_X(x|H_0) \quad \text {and} \quad H_1: \,\, x \sim f_X(x|H_1), \end{aligned}$$
(1)

where \(f_X(x|H_i)\) is the conditional probability density function (PDF) of x. The observations are assumed to be independent and identically distributed (IID) given \(H_i\), for \(i = 0, 1\), and among the sensor nodes and time slots. The prior probability that \(H_0\) happens, \(\text {Pr}(H_0)\), is equal to \(P_0\), and, the prior probability that \(H_1\) happens, \(\text {Pr}(H_1)\), is equal to \(P_1 = 1-P_0\).

3 Methods

3.1 Overview of the proposed method

We consider a distributed detection system with a shared collision channel and a limited collection time. There are N sensor nodes deployed in the area to monitor whether the event \(H_0\) or the event \(H_1\) happens. These sensor nodes will send their decisions over a shared collision channel to the FC. The FC is allowed to collect local decisions within a time duration equal to T time slots, which is less than the number of nodes N. As a result, the FC cannot collect all local decisions from N sensor nodes. To handle this issue and improve the performance of this distributed detection system, we propose collision-aware distributed detection with a population-splitting algorithm.

The proposed method consists of two parts: a transmission protocol called population-splitting-based random access (PSRA) and a detection strategy called collision-aware distributed detection. As shown in Fig. 2, each sensor node will apply the PSRA to send its decision over a shared channel to the FC. By using the PSRA, the collection time (whose length is equal to T time slots) is organized into M frames (each frame consists of K time slots) and the observation range is divided into censored regions and M uncensored regions. The sensor nodes whose observations are within a censored region will decide not to send their decisions to the FC. On the other hand, the sensor nodes whose observations are within the mth uncensored region will send their binary bits \(b=1\) at a time slot in the mth frame by using a slotted Aloha protocol. At the end of the collection time T, the FC will observe the time-slot states (idle, successful, and collision) in each frame and use them to estimate the number of sensor nodes (\({\hat{n}}_m\), for \(1 \le m \le M\)) who send their binary bits in that frame by using the population estimator. Finally, the FC applies these estimated node numbers \({\hat{n}}_1\), \({\hat{n}}_2\), ..., \({\hat{n}}_M\) into the population-based fusion rule to make a final decision whether the event \(H_0\) or the event \(H_1\) happens. The details of the transmission protocol PSRA and the collision-aware distributed detection are explained in Sects. 3.2 and 3.3, respectively.

Fig. 2
figure 2

A diagram to display the proposed method. There are N sensor nodes in the area to monitor the status of the event of interest. These sensor nodes send their decisions to the FC over a shared collision channel by using the PSRA protocol. At the end of the collection time T time slots, the FC observes the time-slot states and make a final decision by using the collision-aware distributed detection

3.2 Population-splitting-based random access

The PSRA is shown in Algorithm 1. The censored regions and uncensored regions are chosen according to the observation reliability [9]. The censored regions cover the ranges of those unreliable observations, which are unlikely to help in detection. On the other hand, the uncensored regions consist of the ranges of reliable observation, which would be useful in detection. Only the sensor nodes whose observations are in the uncensored regions might send their data bits \(b=1\) to the FC.

figure a

The algorithm’s specific details are explained and modeled as follows. Let \(n_m\), for \(1 \le m \le M\), be the number of sensor nodes whose observation \(x \in {\mathcal {U}}_m\). Since the sensor nodes draw an observation at the beginning of the collection time, the variables \(n_1, n_2, \ldots , n_M\) can be modeled as multinomial random variables. The probability mass function (PMF) of \({\mathbf {n}} = (n_1, n_2, \ldots , n_M)\) given \(H_i\) is expressed as

$$\begin{aligned} \text {Pr}({\mathbf {n}} | H_i) = \frac{N!}{n_1! n_2! \cdots n_{M+1}!} q_{1|i}^{n_1} q_{2|i}^{n_2} \cdots q_{M+1|i}^{n_{M+1}}, \end{aligned}$$
(3)

where \(0 \le n_m \le N\), \(q_{m|i} = \int _{\tau _{m}^{L}}^{\tau _{m}^{U}} f_X(x|H_i)\,dx\), \(n_{M+1} = N - \sum _{m=1}^M n_m\), and \(q_{M+1|i} = 1 - \sum _{m=1}^M q_{m|i}\). The set \({\mathbf {n}} = (n_1, n_2, \ldots , n_M)\) can be used to differentiate whether the event \(H_0\) or \(H_1\) is happening. Note that \(q_{m|i}\) is the probability that an observation value will be in the mth uncensored region and \(q_{M+1|i}\) is the probability that an observation value will be in the censored regions.

Each frame is used to indicate an uncensored region \({\mathcal {U}}_m\). The sensor nodes who have their observation \(x \in {\mathcal {U}}_m\) will send their data bits \(b=1\) in the mth frame with a probability \(\rho _m\) (the rest will keep silent). However, these sensor nodes are unknown. Transmission scheduling cannot be arranged in advance. We, then, apply slotted ALOHA to handle the multiple access problem. In each time slot, these sensor nodes will decide to send their data bit with the probability \(\frac{1}{K}\). Similar to [27,28,29], a collision time slot is meaningful and recognized since a collision time slot indicates that there are two or more sensor nodes whose observation \(x \in {\mathcal {U}}_m\) (Fig. 3).

Fig. 3
figure 3

The structure of the collection time. The collection time (T time slots) is divided into M frames, where each frame consists of K time slots

At the end of each time slot, the FC observes the state of that time slot. Let \(d_{k,m}\) be the time-slot state of the kth time slot in the mth frame. We have \(d_{k,m} \in \{ 0,\, \text {S},\, \text {C} \}\), where \(d_{k,m} = 0\), \(d_{k,m} = \text {S}\), and \(d_{k,m} = \text {C}\) indicate the idle time slot, the successful time slot, and the collision time slot, respectively. Therefore, a time slot will be an idle, successful, or collision time slot with the following conditional probabilities:

$$\begin{aligned} \begin{aligned}\text {Pr} &(d_{k,m}=0 | n_m) = p_{0,m} = \left( 1 - \frac{\rho _m}{K} \right) ^{n_m}, \\ \text {Pr} &(d_{k,m}=\text {S} | n_m) = p_{\text {S},m} = n_m \left( \frac{\rho _m}{K} \right) \left( 1-\frac{\rho _m}{K} \right) ^{n_m-1}, \\\text {Pr}& (d_{k,m}=\text {C} | n_m) = p_{\text {C},m} = 1 - p_{0,m} - p_{\text {S},m}. \end{aligned} \end{aligned}$$
(4)

The probabilities \(p_{0,m}\), \(p_{\text {S},m}\), and \(p_{\text {C},m}\) are called as the probability of no transmission, the probability of successful transmission, and the probability of collisions, respectively. As a result, the conditional PMF of \(d_{k,m}\) given \(n_m\) can be expressed as

$$\begin{aligned} {\text {Pr}}(d_{k,m} | n_m) &= ( p_{0,m} )^{{\mathbb{1}}_{\{d_{k,m}=0\}}} ( p_{{\rm S},m} )^{{\mathbb{1}}_{\{d_{k,m}=\text {S}\}}} ( p_{\text {C},m} )^{{\mathbb{1}}_{\{d_{k,m}={\text {C}}\}}}, \end{aligned}$$
(5)

where \({\mathbb{1}}_{\{\cdot \}}\) is the indicator function.

At the end of the mth frame, the FC observes the following time-slot states \(d_{1,m}, d_{2,m}, \ldots , d_{K,m}\), whose joint conditional PMF is

$$\begin{aligned} \text {Pr} ({\mathbf {d}}_m | n_m) = \prod _{k=1}^K \text {Pr}(d_{k,m} | n_m), \end{aligned}$$
(6)

where \({\mathbf {d}}_m = ( d_{1,m}, d_{2,m}, \ldots , d_{K,m} )\). In addition, the FC observes that there are \(z_{0,m}\) idle time slots (i.e., \(d_{k,m} = 0\)), \(z_{\text {S},m}\) successful time slots (i.e., \(d_{k,m} = \text {S}\)), and \(z_{\text {C},m}\) collision time slots (\(d_{k,m} = \text {C}\)) in the mth frame. Therefore, the joint PMF (6) is equivalent to, when the FC observes \(z_{0,m}\), \(z_{\text {S},m}\), and \(z_{\text {C},m}\):

$$\begin{aligned} \text {Pr} ({\mathbf {z}}_m | n_m) = \frac{K!}{ z_{0,m}! z_{\text {S},m}! z_{\text {C},m}! } p_{0,m}^{z_{0,m}} p_{\text {S},m}^{z_{\text {S},m}} p_{\text {C},m}^{z_{\text {C},m}}, \end{aligned}$$
(7)

where \({\mathbf {z}}_m = ( z_{0,m}, z_{\text {S},m}, z_{\text {C},m} )\). Note that \(z_{0,m}+z_{\text {S},m}+z_{\text {C},m} = K\). In addition, we can write the joint conditional PMF of \(z_{0,m}\), \(z_{\text {S},m}\), and \(z_{\text {C},m}\) given \(H_i\) as

$$\begin{aligned} \text {Pr} ({\mathbf {z}}_m|H_i) = \sum _{n_m=0}^{N} \text {Pr} ({\mathbf {z}}_m | n_m) \text {Pr} (n_m | H_i). \end{aligned}$$
(8)

At the end of the collection time, the FC has observed the time-slot states \({\mathbf {z}}_1, {\mathbf {z}}_2, \ldots ,.. {\mathbf {z}}_M\). The joint conditional PMF of \({\mathbf {z}} = ( {\mathbf {z}}_1, {\mathbf {z}}_2, \ldots ,.. {\mathbf {z}}_M )\) given \(H_i\) can be expressed as

$$\begin{aligned} \text {Pr} ({\mathbf {z}} | H_i)&= \prod _{m=1}^M \text {Pr} ({\mathbf {z}}_m | H_i) \\&= \prod _{m=1}^M \sum _{n_m=0}^{N} \text {Pr} ({\mathbf {z}}_m | n_m) \text {Pr} (n_m | H_i). \end{aligned}$$
(9)

Therefore, we devise distributed detection that is aware of these collision time slots in addition to successful time slots and idle time slots as shown in Sect. 3.3, where the FC will exploit the time-slot states \({\mathbf {z}}\) to decide whether \(H_0\) or \(H_1\) is happening.

The transmission probability \(\rho _m\) is a key parameter since it controls the network traffic (the number of transmitting nodes) in the mth frame. As a result, numbers of idle, successful, and collision time slots depend on \(\rho _m\). A large value of \(\rho _m\) induces a large number of transmitting nodes and a large number of collision time slots in the mth frame. There will be a set of transmission probabilities \(\varvec{\rho ^*} = ( \rho _1^*, \rho _2^*, \ldots , \rho _M^* )\) maximizing the performance of the proposed distributed detection, which will be studied in Sects. 4 and 5. We would like to note that finding the optimal thresholds \(\varvec{\nu ^*} = ( \nu _{1}^{L*}, \nu _{2}^{U*}, \ldots , \nu _{I}^{L*}, \nu _{I}^{U*} )\) and \(\varvec{\tau ^*} = ( \tau _{1}^{L*}, \tau _{1}^{U*}, \ldots , \tau _{M}^{L*}, \tau _{M}^{U*} )\), which is also known as quantizer design,Footnote 1 is beyond the scope of this paper.

Fig. 4
figure 4

An example illustrates the idea of the PSRA protocol, where the range of the observation x is divided in to one censored region (\({\mathcal {C}}_1\)) and four uncensored regions (\({\mathcal {U}}_1\), \({\mathcal {U}}_2\), \({\mathcal {U}}_3\), \({\mathcal {U}}_4\)). The collection time (T) is equal to 16 and divided into \(M=4\) and \(K=4\). Assume that there are \(n_m\) sensor nodes with \(x \in {\mathcal {U}}_m\), for \(m=1, 2, 3, 4\). They will send their data bits at a time slot in the mth frame with a probability \(\frac{\rho _m}{K}\). As a result, idle time slots (I), successful time slots (S), and collision time slots (C) will happen

An example showing the main idea of the PSRA is provided in Fig. 4. The range of the observation is divided into a censored region (\({\mathcal {C}}_1\)) and four uncensored regions (\({\mathcal {U}}_1\), \({\mathcal {U}}_2\), \({\mathcal {U}}_3\), and \({\mathcal {U}}_4\)). Assume that the event \(H_i\) is happening. The distribution of observations is \(f_X(x|H_i)\). There is \(n_m\) sensor nodes whose observations \(x \in {\mathcal {U}}_m\). According to the PSRA protocol, these sensor nodes will decide to send their data bits at each time slot in the mth frame with the probability \(\frac{\rho _m}{K}\). As a result, the FC observes the time-slot state \({\mathbf {z}}_m = ( z_{0,m}, z_{\text {S},m}, z_{\text {C},m} )\) in the mth frame, where \(z_{0,m}\), \(z_{\text {S},m}\), and \(z_{\text {C},m}\) are the numbers of the idle, successful, and collision time slots, respectively. The FC will exploit these time-slot states \({\mathbf {z}}_m\), for \(m = 1, 2, \ldots , M\), in making a final decision. Note that if the parallel access channels (PACs)Footnote 2 are assumed, the FC would clearly see the number of sensor nodes \(n_m\), for \(m = 1, 2, \ldots , M\), and would directly use them in making a final decision instead.

3.3 Collision-aware distributed detection: estimate-then-fuse approach

The optimal fusion rule of the proposed distributed detection can be directly derived from \(\frac{\text {Pr} ({\mathbf {z}} | H_1)}{\text {Pr} ({\mathbf {z}} | H_0)}\), where \(\text {Pr} ({\mathbf {z}} | H_i)\) is from (9). However, this optimal fusion rule is complicated and untraceable in analysis, which no insightful meaning can be discovered or proved. Similar to [1, 4,5,6, 10,11,12,13,14,15,16], we aim to devise suboptimal fusion that allows us to derive some meaningful properties.

Fig. 5
figure 5

Diagrams to display the structure of the proposed collision-aware distributed detection

In this section, we consider that the FC compute a final decision based on the estimate-then-fuse approach [29, 36, 37] as shown in Fig. 5a, which consists of two stages: a population estimator of the node numbers and a population-based fusion rule. First, the FC will use the estimator to compute the estimates of \(n_1\), \(n_2\), \(\ldots\), and \(n_M\), which are denoted as \({\hat{n}}_1\), \({\hat{n}}_2\), \(\ldots\), and \({\hat{n}}_M\), respectively, from the observed time-slot states \({\mathbf {z}} = ( {\mathbf {z}}_1, {\mathbf {z}}_2, \ldots , {\mathbf {z}}_M )\). Thereafter, the FC exploits these estimates \({\hat{\mathbf {n}}} = ( {\hat{n}}_1, {\hat{n}}_2, \ldots , {\hat{n}}_M )\) in the proposed population-based fusion rule. The population estimator and the population-based fusion rule are described next.

In the first stage, the FC will estimate the number of sensor nodes \(n_m\) in each frame by exploiting the population estimator. The details are shown in the proposition below. Note that, from now on, we omit the fact that \(n_m\) is an integer and, then, consider \(n_m\) as a real number instead. Therefore, we can derive and prove some meaningful properties. We would like to note that Proposition 1 and Corollary 1 have been shown in [36, 37].

Proposition 1

(Population estimate) The maximum-likelihood (ML) estimate of the number of sensor nodes whose \(x \in {\mathcal {U}}_m\) based on the observed time-slot states \({\mathbf {z}}_m\) is

$${\hat{n}}_m = \underset{n_m \in [0,\,N]}{\arg \max } \, \Big [ z_{0,m} \log (p_{0,m}) + z_{\text {S},m} \log (p_{\text {S},m}) + z_{\text {C},m} \log (p_{\text {C},m}) \Big ].$$
(10)

For a large K, the PDF of \({\hat{n}}_m\) can be asymptotically expressed as a Gaussian PDF: \({\hat{n}}_m \overset{a}{\sim } {\mathcal {N}} \big (n_m,\, \upsilon _m^2 \big )\), where \(\overset{a}{\sim }\) denotes “asymptotically distributed according to” and, for a large \(n_m\) and \(n_m \gg K\),

$$\begin{aligned} \upsilon _m^2 \approx \Bigg \{ K \Big ( \frac{1-p_{\text {C},m}}{p_{\text {C},m}} \Big ) \bigg [ \log \Big ( 1-\frac{\rho _m}{K} \Big ) \bigg ]^2 \Bigg \}^{-1}. \end{aligned}$$
(11)

Proof

An outline of the proof has been shown in [36, 37]. The full derivation is shown in Appendix A. □

Note that the variance \(\upsilon _m^2\) will affect the quality of the proposed distributed detection, which will be discussed in the next section. In addition, we can show that \(\log \text {Pr} ({\mathbf {z}}_m | n_m)\) is a concave function of \(n_m\), and, then, we have an alternative form of the population estimate \({\hat{n}}_m\) as shown in Corollary 1 below.

Corollary 1

(Equivalent population estimate of \(n_m\)) For a large \(n_m\), the ML estimate \({\hat{n}}_m\) obtained from (10) is equal to the value \(n_m\) satisfying

$$\begin{aligned} \frac{1-p_{\text {C},m}}{p_{\text {C},m}} = \frac{K-z_{\text {C},m}}{z_{\text {C},m}}. \end{aligned}$$
(12)

Note that \(n_m\) is inside \(p_{\text {C},m}\).

Proof

An outline of the proof has been shown in [36, 37]. The full derivation is shown in Appendix B. □

From Corollary 1, we see that only the number of collision time slots \(z_{\text {C},m}\) is needed to estimate \(n_m\). As a result, the input to the diagram of the proposed distributed detection can be revised as shown in Fig. 5b.

In the second stage, the FC will apply the estimates \({\hat{\mathbf {n}}} = ( {\hat{n}}_1, {\hat{n}}_2, \ldots , {\hat{n}}_M )\) into the population-based fusion rule to compute a final decision. The population-based fusion rule is described in the proposition below.

Proposition 2

(Population-based fusion rule given \({\hat{\mathbf {n}}}\)) Given a set of estimated numbers of sensor nodes, \({\hat{\mathbf {n}}} = ( {\hat{n}}_1, {\hat{n}}_2, \ldots , {\hat{n}}_M )\). The population-based fusion rule is expressed as

(13)

where the decision threshold \(\gamma\) is adjusted to achieve the desired performance.

Proof

The test statistic \(\Lambda\) is directly obtained from the following log-likelihood ratio \(\Lambda = \frac{\text {Pr}( {\mathbf {n}} | H_1)}{\text {Pr}( {\mathbf {n}} | H_0)} \bigg {|}_{{\mathbf {n}} = {\hat{\mathbf {n}}}}\), where \({\mathbf {n}} = ( n_1, n_2, \ldots , n_M )\) will be substituted by the estimates \({\hat{\mathbf {n}}} = ( {\hat{n}}_1, {\hat{n}}_2, \ldots , {\hat{n}}_M )\). From (3), we can show that

$$ \frac{\text {Pr}( {\mathbf {n}} | H_1)}{\text {Pr}( {\mathbf {n}} | H_0)} = N \log \Big ( \frac{q_{M+1|1}}{q_{M+1|0}} \Big ) + \sum _{m=1}^M n_m \log \bigg [ \Big ( \frac{q_{m|1}}{q_{M+1|1}} \Big ) - \log \Big ( \frac{q_{m|0}}{q_{M+1|0}} \Big ) \bigg ]. $$
(14)

Since the first term on the right-hand side of the equation above is a constant, we can write the test statistic \(\Lambda\) as shown in (13) after replacing \({\mathbf {n}}\) with \({\hat{\mathbf {n}}}\). □

Recall that the collision-aware distributed detection here is parameterized by the number of sensor nodes N, the number of frames M, and the number of time slots in a frame K. It is worth to mention the computational complexity of the proposed distributed detection according to these parameters. In the first stage, the FC will estimate the number of sensor nodes \({\hat{n}}_m\) in each frame according to (10). To do this, the FC will observe and count the numbers of time slot states in each frame (i.e., \(z_{0,m}\), \(z_{\text {S},m}\), and \(z_{\text {C},m}\)), compute the probabilities in (4), and, then, check all \(n_m \in [0, N]\) by using (10). As a result, for M frames, the first stage incurs the complexity \({\mathcal {O}}(MK+MN)\). In the second stage, to make a final decision, given the estimate \({\hat{n}}_m\) from the first stage, the FC applies the fusion rule (13) which requires the complexity \({\mathcal {O}}(M)\). Finally, the overall computational complexity of the proposed distributed detection is of order \({\mathcal {O}}(MK+MN+M)\).

4 Approximations of optimal transmission probabilities

The performance of the proposed distributed detection can be shown via the probability of detection (\(P_\text {D}\)) and the probability of false alarm (\(P_\text {F}\)). These probabilities are affected by a set of transmission probabilities \(\varvec{\rho } = ( \rho _1, \rho _2, \ldots , \rho _M )\). A small transmission probability \(\rho _m\) will allow only a few sensor nodes to send their data bits in the mth frame, and, then, induce a large number of idle time slots in the mth frame. On the other hand, a large transmission probability \(\rho _m\) will allow many sensor nodes to send their data bits in the mth frame, and, then, induce a large number of collision time slots in the mth frame. The transmission probabilities \(\varvec{\rho } = ( \rho _1, \rho _2, \ldots , \rho _M )\) will impact on the estimation’s quality of the ML estimator (shown in Proposition 1), and, then, the final decision’s quality of the population-based fusion rule (shown in Proposition 3). Therefore, there will be a set of transmission probabilities \(\varvec{\rho ^*} = ( \rho _1^*, \rho _2^*, \ldots , \rho _M^* )\) optimizing the performance of the proposed distributed detection. However, finding the optimal transmission probabilities \(\varvec{\rho ^*}\) numerically will be cumbersome. An analytical way is needed. In what follows, we will derive approximations of the transmission probabilities minimizing the proposed distributed detection’s error probability.

We need to derive the conditional PDF of \(\Lambda\) given \(H_i\), denoted by \(\text {Pr}(\Lambda |H_i)\). However, the exact form of \(\text {Pr}(\Lambda |H_i)\) is not yet found. Instead, we will derive an asymptotic approximation of \(\text {Pr}(\Lambda |H_i)\), denoted by \(\tilde{\text {Pr}} (\Lambda | H_i)\), as shown in the proposition below.

Proposition 3

(Asymptotic PDF approximation) An approximation of the conditional PDF of \(\Lambda\) given \(H_i\), denoted by \(\tilde{\text {Pr}} (\Lambda | H_i)\), can be expressed as a Gaussian PDF \({\mathcal {N}} \big (\mu _i, \sigma ^2_i \big )\), for \(i = 0, 1\), whose

$$\begin{aligned}\mu _i = N \sum _{m=1}^M q_{m|i} \bigg [ \log \Big ( \frac{q_{m|1}}{q_{M+1|1}} \Big ) - \log \Big ( \frac{q_{m|0}}{q_{M+1|0}} \Big ) \bigg ], \end{aligned}$$
(15)
$$\begin{aligned} \sigma _i^2 = \frac{1}{K} \sum _{m=1}^M \frac{ \Big [ \log \big ( \frac{q_{m|1}}{q_{M+1|1}} \big ) - \log \big ( \frac{q_{m|0}}{q_{M+1|0}} \big ) \Big ]^2 }{ \big ( \frac{1-{\bar{p}}_{\text {C},m|i}}{{\bar{p}}_{\text {C},m|i}} \big ) \Big [ \log \big ( 1-\frac{\rho _m}{K} \big ) \Big ]^2 }, \end{aligned}$$
(16)

where \({\bar{n}}_{m|i} = N q_{m|i}\), \({\bar{p}}_{\text {C},m|i} = 1 - {\bar{p}}_{0,m|i} - {\bar{p}}_{\text {S},m|i}\), \({\bar{p}}_{0,m|i} = (1-\frac{\rho _m}{K} )^{{\bar{n}}_{m|i}}\), and \({\bar{p}}_{\text {S},m|i} = {\bar{n}}_{m|i} ( \frac{\rho _m}{K} ) ( 1-\frac{\rho _m}{K} )^{{\bar{n}}_{m|i}-1}\).

Proof

Please see Appendix C. □

The value \({\bar{n}}_{m|i}\) is the average number of sensor nodes whose \(x \in {\mathcal {U}}_m\) (i.e., in the mth frame) given \(H_i\). Correspondingly, the probabilities \({\bar{p}}_{0,m|i}\), \({\bar{p}}_{\text {S},m|i}\), and \({\bar{p}}_{\text {C},m|i}\) are computed at \({\bar{n}}_{m|i}\). We see that the variance \(\sigma _i^2\) from (16) is a weighted sum of the variance \(\upsilon _m^2\) from (11). Therefore, the quality of the estimates \(\varvec{{\hat{n}}}\) from the population estimator will directly affect the quality of the decision making obtained from the population-based fusion rule.

Subsequently, we will approximate a set of transmission probabilities \(\varvec{\rho ^{\star }} = ( \rho _1^{\star }, \rho _2^{\star }, \ldots , \rho _M^{\star } )\) minimizing the probability of error \(P_{\text {E}} = P_0 P_{\text {F}} + P_1 P_{\text {M}}\), where \(P_{\text {M}}\) is the probability of miss and equal to \(1 - P_{\text {D}}\). From Proposition 3, we have the following approximations of the probability of detection \(P_{\text {D}}\) and the probability of false alarm \(P_{\text {F}}\):

$$\begin{aligned} P_{\text {D}}&= \tilde{\text {Pr}} (\Lambda > \gamma | H_1) = Q \Big ( \frac{\gamma -\mu _1}{\sigma _1} \Big ), \end{aligned}$$
(17)
$$\begin{aligned} P_{\text {F}}&= \tilde{\text {Pr}} (\Lambda > \gamma | H_0) = Q \Big ( \frac{\gamma -\mu _0}{\sigma _0} \Big ), \end{aligned}$$
(18)

where the conditional probability \(\tilde{\text {Pr}} (\Lambda | H_i)\), for \(i=0, 1\), is defined in Proposition 3, and \(Q(\cdot )\) is the Q-function. In order to find \(\varvec{\rho ^{\star }}\), we have the following optimization problem:

$$\begin{aligned} \varvec{\rho ^{\star }} = \underset{\varvec{\rho } \in [0,\,1]^M}{\arg \,\min }\,\, \big ( P_0 P_{\text {F}} + P_1 P_{\text {M}} \big ). \end{aligned}$$
(19)

The necessary conditions of \(\varvec{\rho ^{\star }}\) in (19) are derived in the proposition below.

Proposition 4

(Necessary conditions of \(\varvec{\rho ^{\star }}\)) The transmission probabilities \(\varvec{\rho ^{\star }} = ( \rho _1^{\star },\, \rho _2^{\star }, \ldots ,\, \rho _M^{\star } )\) in (19) must satisfy the following conditions:

  1. (i)

    Let \(g_m(\rho _m)\) be a function of \(\rho _m\) defined as

    $$\begin{aligned} \frac{ \frac{ (1-{\bar{p}}_{\text {C},m|0})^2 }{ 2 {\bar{p}}_{\text {C},m|0} (1-{\bar{p}}_{\text {C},m|0}) - (1-{\bar{n}}_{m|0}) {\bar{p}}_{\text {S},m|0} \big [ \log \big ( 1 - \frac{\rho _m}{K} \big ) \big ] } }{ \frac{ (1-{\bar{p}}_{\text {C},m|1})^2 }{ 2 {\bar{p}}_{\text {C},m|1} (1-{\bar{p}}_{\text {C},m|1}) - (1-{\bar{n}}_{m|1}) {\bar{p}}_{\text {S},m|1} \big [ \log \big ( 1 - \frac{\rho _m}{K} \big ) \big ] } }, \end{aligned}$$
    (20)

    for all m, where \({\bar{n}}_{m|i}\), \({\bar{p}}_{\text {S},m|i}\), and \({\bar{p}}_{\text {C},m|i}\) are defined in Proposition 3. We have

    $$\begin{aligned} g_1(\rho _1^{\star }) = g_2(\rho _2^{\star }) = \ldots = g_M(\rho _M^{\star }). \end{aligned}$$
    (21)
  2. (ii)

    The function \(g_m(\rho _m^{\star })\), for all m, must be equal to

    $$\begin{aligned} \Big ( \frac{\sigma _1^{\star }}{\sigma _0^{\star }} \Big )^3 \Bigg [ \frac{ P_0 \varphi \big ( \frac{\gamma -\mu _0}{\sigma _0^{\star }} \big ) (\gamma -\mu _0) }{ P_1 \varphi \big ( \frac{\gamma -\mu _1}{\sigma _1^{\star }} \big ) (\gamma -\mu _1) } \Bigg ], \end{aligned}$$
    (22)

    where \(\varphi (\cdot )\) is the standard Gaussian PDF and \(\sigma _i^{\star }\) is the standard deviation \(\sigma _i\) in (16) with substituting \(\varvec{\rho ^{\star }} = ( \rho _1^{\star }, \rho _2^{\star }, \ldots , \rho _M^{\star } )\).

  3. (iii)

    With properly choosing \(\gamma\) such that \(\mu _0 \le \gamma \le \mu _1\), the value \(g_m(\rho _m^{\star })\) is less than or equal to zero for all m.

Proof

Please see Appendix D. □

The necessary conditions in Proposition 4 help us in searching for the transmission probabilities \(\varvec{\rho ^{\star }}\). First, we can find a range of feasible values of \(\rho _m^{\star }\) by using the necessary conditions (i) and (iii). Thereafter, we limit to specific candidates of \(\rho _m^*\) by using the necessary condition (ii). An example of finding the transmission probabilities \(\varvec{\rho ^{\star }} = ( \rho _1^{\star }, \rho _2^{\star }, \ldots , \rho _M^{\star } )\) is explained below.

Fig. 6
figure 6

Illustrations of Examples 1 and 2

Example 1

Consider the observation x governed by the following binary hypothesis testing model

$$\begin{aligned} H_0: \,\, x \sim {\mathcal {N}}(0,5) \quad \text {and} \quad H_1: \,\, x \sim {\mathcal {N}}(1,5), \end{aligned}$$
(23)

where \({\mathcal {N}}(a,b)\) is the Gaussian PDF whose mean and variance are equal to a and b, respectively. Assume further that \(P_0 = P_1 = \frac{1}{2}\) and \(N=400\). We consider the proposed distributed detection when \(T=80\), \(M=4\), \(K=20\), and \(\gamma = \frac{1}{2}\). The observation range is divided into one censored region:

$$\begin{aligned} {\mathcal {C}}_1 = \{x: 0 < x \le 1 \}, \end{aligned}$$
(24)

and four uncensored regions:

$$\begin{aligned}&{\mathcal {U}}_1 = \{x: -\infty< x \le -1.385 \}, \\ &{\mathcal {U}}_2 = \{x: -1.385< x \le 0 \}, \\&{\mathcal {U}}_3 = \{x: 1< x \le 2.385 \}, \\&{\mathcal {U}}_4 = \{x: 2.385< x < \infty \}.\end{aligned}$$
(25)

Note that the thresholds in the uncensored regions are selected such that the probability \(q_{m|0} P_0 + q_{m|1} P_1\), for \(m=1, 2, 3, 4\), are identical (specifically, \(q_{m|0} P_0 + q_{m|1} P_1 = \frac{1-(P_0 q_{M+1|0}+P_1 q_{M+1|1})}{M}\)). The function \(g_m(\rho _m)\) defined in (20), for \(m=1, 2, 3, 4\), are shown in Fig. 6a. Only the negative values are displayed because of the necessary condition (iii). From (21) and the fact that \(g_m(\rho _m^*) \le 0\), for all m, Fig. 6b shows that the feasible values of \(\rho _1^{\star }\), \(\rho _2^{\star }\), \(\rho _3^{\star }\), and \(\rho _4^{\star }\) are

$$\begin{aligned}& 0.261 \le \rho _1^{\star } \le 0.486, \quad 0.301 \le \rho _2^{\star } \le 0.379,\\ &0.301 \le \rho _3^{\star } \le 0.379, \quad 0.261 \le \rho _4^{\star } \le 0.486. \end{aligned}$$

To find the transmission probabilities \(\rho _1^{\star }\), \(\rho _2^{\star }\), \(\rho _3^{\star }\), and \(\rho _4^{\star }\), we vary \(\rho _1\), \(\rho _2\), \(\rho _3\), and \(\rho _4\) within the feasible ranges identified above such that the necessary condition (ii) is true. The corresponding transmission probabilities \(\varvec{\rho ^{\star }} = ( \rho _1^{\star }, \rho _2^{\star }, \rho _3^{\star }, \rho _4^{\star } )\) minimize the probability of error in (19). As a result, we obtain \(\rho _1^{\star } = 0.282\), \(\rho _2^{\star } = 0.324\), \(\rho _3^{\star } = 0.324\), and \(\rho _4^{\star } = 0.282\). □

A feasible region of \(\rho _m^{\star }\) as shown in the example above can be determined in the following way.

Proposition 5

(A feasible region of \(\rho _m^{\star }\)) The optimal transmission probabilities \(\rho _m^{\star }\) is between \(\rho _{m|0}^{\star }\) and \(\rho _{m|1}^{\star }\), where \(\rho _{m|i}^{\star }\) is obtained from

$$\begin{aligned} \frac{(1-{\bar{p}}_{\text {C},m|i}^{\star }){\bar{p}}_{\text {C},m|i}^{\star }}{{\bar{p}}_{\text {S},m|i}^{\star }} = \frac{(1-{\bar{n}}_{m|i})}{2} \bigg [ \log \Big ( 1- \frac{\rho _{m|i}^{\star }}{K} \Big ) \bigg ], \end{aligned}$$
(26)

\({\bar{p}}_{\text {C},m|i}^{\star } = 1 - {\bar{p}}_{0,m|i}^{\star } - {\bar{p}}_{\text {S},m|i}^{\star }\), \({\bar{p}}_{0,m|i}^{\star } = \big ( 1-\frac{\rho _{m|i}^{\star }}{K} \big )^{{\bar{n}}_{m|i}}\), and \({\bar{p}}_{\text {S},m|i}^{\star } = {\bar{n}}_{m|i} \big ( \frac{\rho _{m|i}^{\star }}{K} \big ) \big ( 1-\frac{\rho _{m|i}^{\star }}{K} \big )^{{\bar{n}}_{m|i}-1}\). The transmission probabilities \(\rho _{1|0}^{\star }\), \(\rho _{2|0}^{\star }\), \(\ldots\), \(\rho _{M|0}^{\star }\) minimize the probability of false alarm \(P_{\text {F}}\). The transmission probabilities \(\rho _{1|1}^{\star }\), \(\rho _{2|1}^{\star }\), \(\ldots\), \(\rho _{M|1}^{\star }\) minimize the probability of miss \(P_{\text {M}}\).

Proof

Please see Appendix E. □

Consider Example 1. We find the transmission probabilities \(\rho _{1|0}^{\star }\) and \(\rho _{1|1}^{\star }\) by using (26). As a result, we have \(\rho _{1|0}^{\star } = 0.261\) and \(\rho _{1|1}^{\star } = 0.486\). From Proposition 5, the feasible region of \(\rho _1^{\star }\) is \(0.261 \le \rho _1^{\star } \le 0.486\), which is similar to that specified in Example 1.

The transmission probabilities \(\varvec{\rho ^{\star }} = ( \rho _1^{\star }, \rho _2^{\star }, \ldots , \rho _M^{\star } )\) are easily found when the following conditions are true.

Corollary 2

(Special case) Assume M is even, the threshold \(\gamma = \frac{1}{2} (\mu _0 + \mu _1)\), and \(q_{m|0} = q_{M+1-m|1}\) for all m. The transmission probabilities \(\varvec{\rho ^{\star }} = ( \rho _1^{\star },\, \rho _2^{\star }, \ldots ,\, \rho _M^{\star } )\) in (19) must satisfy the following conditions:

  1. (i)

    The optimal transmission probabilities satisfy

    $$\begin{aligned} \rho _m^{\star } = \rho _{M+1-m}^{\star }, \quad \forall m. \end{aligned}$$
    (27)
  2. (ii)

    The function \(g_m(\rho _m^{\star })\) defined in (20) must be equal to \(-\frac{P_0}{P_1}\), for all m.

Proof

First, we can prove the condition (i) as follows. From the given assumption that \(q_{m|0} = q_{M+1-m|1}\), we have \({\bar{n}}_{m|0} = {\bar{n}}_{M+1-m|1}\), where \({\bar{n}}_{m|i} = N q_{m|i}\). Recall that the function \(g_{m} (\rho _m)\) defined in (20) is a function of \({\bar{n}}_{m|0}\), \({\bar{n}}_{m|1}\), and \(\rho _m\). Since \({\bar{n}}_{m|0} = {\bar{n}}_{M+1-m|1}\) and \(g_{m} (\rho _m^{\star }) = g_{M+1-m} (\rho _{M+1-m}^{\star })\) from (21), we have \(\rho _m^{\star } = \rho _{M+1-m}^{\star }\) for all m.

Second, we can prove the condition (ii) as follows. Since M is even, \(q_{m|0} = q_{M+1-m|1}\), \({\bar{n}}_{m|i} = N q_{m|i}\), and \(\rho _m^{\star } = \rho _{M+1-m}^{\star }\), we have \(\mu _0 = -\mu _1\) and \(\sigma _0^{\star } = \sigma _1^{\star }\), where \(\mu _i\) and \(\sigma _i^2\) are defined in (15) and (16), respectively. Recall that \(\sigma _i^{\star }\) is the standard deviation \(\sigma _i\) in (16) with substituting \(\varvec{\rho ^{\star }} = ( \rho _1^{\star }, \rho _2^{\star }, \ldots , \rho _M^{\star } )\). In addition, from the given assumption that \(\gamma = \frac{1}{2} (\mu _0 + \mu _1)\), we have \(\gamma = 0\). As considering (22), we have \((\gamma -\mu _0) = -(\gamma -\mu _1)\) and \(\varphi \big ( \frac{\gamma -\mu _0}{\sigma _0^{\star }} \big ) = \varphi \big ( \frac{\gamma -\mu _1}{\sigma _1^{\star }} \big )\). As a result, (22) is reduced to \(-\frac{P_0}{P_1}\), which is the value of \(g_m(\rho _m^{\star })\). □

The example below shows how easy to find the optimal transmission probabilities \(\varvec{\rho ^{\star }}\) when the assumptions in Corollary 2 are true.

Example 2

Reconsider Example 1. We can see that the parameter setup in Example 1 follows the assumptions in Corollary 2. From Corollary 2, we will have \(\rho _1^{\star } = \rho _4^{\star }\), \(\rho _2^{\star } = \rho _3^{\star }\), and \(g_m(\rho _m^{\star }) = -1\), for all m. Considering Fig. 6b, we obtain \(\rho _1^{\star } = 0.282\), \(\rho _2^{\star } = 0.324\), \(\rho _3^{\star } = 0.324\), and \(\rho _4^{\star } = 0.282\). □

5 Results and discussion

In this section, we will evaluate and show the performance of the collision-aware distributed detection with the PSRA. Throughout this section, we assume that the observation x is governed by the model shown in (23) in Example 1 whose observation signal-to-noise ratio (SNR) is equal to \(-7\) dB. As a result, according to [9], the reliability of the observation will be equal to \(|x - 0.5|\). The value of x that is further away from 0.5 is more reliable. Therefore, as seen later, our censored regions (cover unreliable observations) will be around \(x=0.5\). Note that the probabilities of error \(P_{\text {E}}\) shown in this section are obtained from simulation and the probabilities \(P_0\) and \(P_1\) are equal to \(\frac{1}{2}\).

Fig. 7
figure 7

Simulation results showing the role of the transmission probability on the proposed distributed detection’s network traffic and probability of error \(P_{\text {E}}\). The parameters are set up as follows: \(N=500\), \(T=30\), \(M=2\), and \(K=15\). The range of the observation is divided into a censored region \({\mathcal {C}}_1 = \{x: -1 < x \le 2\}\) and two uncensored regions \({\mathcal {U}}_1 = \{x: -\infty < x \le -1\}\) and \({\mathcal {U}}_2 = \{x: 2< x < \infty \}\)

5.1 Network traffic and optimal transmission probabilities

In a data network using slotted ALOHA, the optimal transmission probability will be the probability that maximizing the network throughput (i.e., the probability of successful transmission) [31]. However, in this paper, our optimal transmission probabilities \(\varvec{\rho ^*} = (\rho _1^*, \rho _2^*, \ldots , \rho _M^*)\) are to minimize the proposed distributed detection’s error probability \(P_{\text {E}}\). We study the role of the transmission probabilities on the probability of error \(P_{\text {E}}\) in the following scenario. We consider a distributed detection system with 500 sensor nodes. The collection time T is equal to 30 time slots, which is divided into two frames. Each frame consists of 15 time slots. The observation range is divided into one censored region \({\mathcal {C}}_1 = \{x: -1 < x \le 2\}\) and two uncensored regions \({\mathcal {U}}_1 = \{x: -\infty < x \le -1\}\) and \({\mathcal {U}}_2 = \{x: 2< x < \infty \}\). We have \(q_{1|0} = q_{2|1}\) and \(q_{1|1} = q_{2|0}\). The probability that the event \(H_i\) (i.e., \(P_i\)) happens is equal to \(\frac{1}{2}\). As a result, we have \(P_0 q_{1|0} + P_1 q_{1|1} = P_0 q_{2|0} + P_1 q_{2|1}\).

According to the parameter setup above, the transmission probability \(\rho _1^*\) will be equal to the transmission probability \(\rho _2^*\). Therefore, we vary the transmission probability \(\rho _m\) such that \(\rho _1 = \rho _2 = \rho\). From simulations, the probability of error \(P_{\text {E}}\) versus \(\rho\) is shown in Fig. 7. In addition, we show the network traffic (which is measured by \(p_{0,m|i}\), \(p_{\text {S}, m|i}\), and \(p_{\text {C}, m|i}\)) versus \(\rho\) for \(m=1, 2\) and \(i = 0, 1\) in Fig. 7a–d. We can see that the transmission probabilities minimizing \(P_{\text {E}}\) are \(\rho _1^* = \rho _2^* \approx 0.33\). We further notice that the optimal transmission probabilities \(\varvec{\rho ^*} = (\rho _1^*, \rho _2^*, \ldots , \rho _M^*)\) do not maximize the network throughput (i.e., \(p_{\text {S},m|i}\)). However, by using the optimal transmission probabilities \(\varvec{\rho ^*} = (\rho _1^*, \rho _2^*, \ldots , \rho _M^*)\), a lot of collisions occur (i.e., high \(p_{\text {C},m|i}\)) in each frame. A reason is that, unlike the sensor networks whose collisions are non-informative and, then, neglected [7,8,9, 20, 31], in the proposed distributed detection, the collision time slots can be exploited in making a final decision.

Fig. 8
figure 8

A comparison between the optimal transmission probability \(\rho _1^*\) obtained from simulations and the approximation \(\rho _1^{\star }\) obtained from Proposition 4. The parameters are set up as follows: \(T=30\), \(M=2\), and \(K=15\). The range of the observation is divided into a censored region \({\mathcal {C}}_1 = \{x: -1 < x \le 2\}\) and two uncensored regions \({\mathcal {U}}_1 = \{x: -\infty < x \le -1\}\) and \({\mathcal {U}}_2 = \{x: 2< x < \infty \}\)

Next, we show the optimal transmission probabilities \(\varvec{\rho ^*} = (\rho _1^*, \rho _2^*, \ldots , \rho _M^*)\) (minimizing the probability \(P_{\text {E}}\)) when increasing N in Fig. 8. The same scenario described above is assumed except the value of N. Since \(\rho _1^*\) is equal to \(\rho _2^*\), only \(\rho _1^*\) is shown. Increasing N means that there are more sensor nodes in each uncensored region \({\mathcal {R}}_m\). Recall that these sensor nodes will send their data bits in the mth frame with the transmission probability \(\rho _m\). The number N and the probability \(\rho _m\) will affect to the numbers of time-slot states \((z_{0,m}, z_{\text {S},m}, z_{\text {C},m})\). For a fixed \(\rho _m\), increasing N will result in higher collision time slots \(z_{\text {C},m}\) and lower idle time slots \(z_{0,m}\). The proposed scheme exploits the numbers of time-slot states \((z_{0,m}, z_{\text {S},m}, z_{\text {C},m})\) to differentiate between the event \(H_0\) and the event \(H_1\). As a result, increasing N lowers the optimal transmission probability \(\rho _1^*\) such that the corresponding time-slot states \((z_{0,m}, z_{\text {S},m}, z_{\text {C},m})\) are best used in detection.

In addition, we also show the transmission probability \(\rho _1^{\star }\), which is an approximation of the optimal transmission probability \(\rho _1^*\), versus N in Fig. 8. The probability \(\rho _1^{\star }\) is easily obtained from Proposition 4. We see that there is an acceptable gap between \(\rho _1^*\) and \(\rho _1^{\star }\), which is smaller when increasing N.

5.2 Performance comparison

In this section, we demonstrate

  • the estimated number of sensor nodes \({\hat{n}}_m\) obtained from the population estimator (10),

  • the effect of N on the probability of error \(P_{\text {E}}\) of the proposed distributed detection scheme,

  • the effect of censored regions on the probability of error \(P_{\text {E}}\) of the proposed distributed detection scheme.

However, finding the optimal transmission probabilities \(\varvec{\rho ^*} = (\rho _1^*, \rho _2^*, \ldots , \rho _M^*)\) is cumbersome. Therefore, we will find the transmission probabilities \(\varvec{\rho ^{\star }} = ( \rho _1^{\star }, \rho _2^{\star }, \ldots , \rho _M^{\star } )\), which can be obtained from Proposition 4, and use them instead.

Figure 9 compares the estimated number of sensor nodes \({\hat{n}}_m\) obtained from (10) with the actual number of sensor nodes \(n_m\) in the mth frame for 50 trials when \(M=2\), \(T=60\), \(K=30\), and \(N=500\). We assume that the event \(H_1\) is happening. The other parameters are specified in the figure’s caption. We see that the estimate \({\hat{n}}_m\) has fluctuated around the actual number \(n_m\).

Fig. 9
figure 9

A comparison between the estimated number of sensor nodes \({\hat{n}}_m\) and the actual number of sensor nodes \(n_m\) in the mth frame for 50 trials when \(M=2\), \(T=60\), \(K=30\), and \(N=500\). We assume that the event \(H_1\) is happening. The range of the observation is divided into a censored region \({\mathcal {C}}_1 = \{x: -1 < x \le 2\}\) and two uncensored regions \({\mathcal {U}}_1 = \{x: -\infty < x \le -1\}\) and \({\mathcal {U}}_2 = \{x: 2< x < \infty \}\). As a result, we have \(q_{m|0} P_0 + q_{m|1} P_1\), for \(1 \le m \le 2\), are identical. The approximations of the optimal transmission probabilities \(\varvec{\rho ^{\star }} = (\rho _1^{\star }, \rho _2^{\star })\) are obtained from Proposition 4, where we have \(\rho _1^{\star } = \rho _2^{\star } = 0.28\)

Figure 10 shows the effect of N on the proposed distributed detection (specified as PSRA) when \(M=2\) and \(M=6\). The parameter setup is specified in the figure’s caption. Recall that for a given K (the number of time slots in each frame), there will be a set of \(z_{0,m}\), \(z_{\text {S},m}\), and \(z_{\text {C},m}\) that are best used in differentiating between the event \(H_0\) and the event \(H_1\). Increasing N will introduce higher number of sensor nodes in each observation interval. The transmission probabilities \(\varvec{\rho } = (\rho _1, \rho _2, \ldots , \rho _M)\) are used to adjust the number \(z_{0,m}\), \(z_{\text {S},m}\), and \(z_{\text {C},m}\), which will be further exploited by the FC to make a final decision. By using the transmission probabilities \(\varvec{\rho ^{\star }} = (\rho _1^{\star }, \rho _2^{\star }, \ldots , \rho _M^{\star })\), a set of suitable \(z_{0,m}\), \(z_{\text {S},m}\), and \(z_{\text {C},m}\) will be seen by the fusion center. This is roughly independent on N. Therefore, increasing N slightly affects the probability \(P_{\text {E}}\).

Fig. 10
figure 10

Simulation results showing the probability of error \(P_{\text {E}}\) of the proposed distributed detection versus N. In addition, we compare the probability of error \(P_{\text {E}}\) of the proposed distributed detection (specified as PSRA) to the distributed detection using TDMA (specified as TDMA). The parameters are set up as follows. The collection time T is equal to 60 time slots. The number of frames (M) is specified in the figure. The approximations of the optimal transmission probabilities \(\varvec{\rho ^{\star }} = (\rho _1^{\star }, \rho _2^{\star }, \ldots , \rho _M^{\star })\) from Proposition 4 are applied. The range of the observation is divided into a censored region \({\mathcal {C}}_1 = \{x: -1 < x \le 2\}\) and M uncensored regions \({\mathcal {U}}_1\), \({\mathcal {U}}_2\), \(\ldots\), \({\mathcal {U}}_M\). The thresholds of \({\mathcal {U}}_m\) which are \(\tau _m^L\) and \(\tau _m^U\), are selected such that \(q_{m|0} P_0 + q_{m|1} P_1\), for all m, are identical

In Fig. 10, we also show the probability of error \(P_{\text {E}}\) of the distributed detection using a time division multiple access (TDMA) protocol. In the TDMA protocol, each sensor node is assigned a specific time slot to send its data in advance to avoid packet collisions. In our scenario, since \(T < N\), a group of 60 sensor nodes are randomly selected and assigned exclusive time slots to send their local decisions. The local decision is computed as follows: if the observation x is larger than 0.5 (i.e., the local threshold obtained from our scenario assumption), the local decision is one; otherwise, the local decision is zero. The FC receives 60 local decisions without collisions and sums them together to compute its test statistic. If the test statistic is larger than zero, the FC announces the event \(H_1\); otherwise, the FC announces the event \(H_0\). As shown in the figure, the corresponding probability \(P_{\text {E}}\) of the TDMA-based distributed detection is about 0.0647, which is higher than that of the proposed distributed detection.

Fig. 11
figure 11

Simulation results showing the probability of error \(P_{\text {E}}\) of the proposed distributed detection versus the censored area (which is equal to \(P_0 q_{M+1|0} + P_1 q_{M+1|1}\)). In addition, we compare the probability of error \(P_{\text {E}}\) of the proposed distributed detection (specified as PSRA) to the distributed detection using TDMA (specified as TDMA). The parameters are set up as follows: \(N=800\) and \(T=60\). The number of frames (M) is specified in the figure. The approximations of the optimal transmission probabilities \(\varvec{\rho ^{\star }} = (\rho _1^{\star }, \rho _2^{\star }, \ldots , \rho _M^{\star })\) from Proposition 4 are applied. The range of the observation is divided into a censored region \({\mathcal {C}}_1 = \{x:< \nu _1^L < x \le \nu _1^U \}\) and M uncensored regions \({\mathcal {U}}_1\), \({\mathcal {U}}_2\), \(\ldots\), \({\mathcal {U}}_M\). The thresholds of \({\mathcal {C}}_1\), which are \(\nu _1^L\) and \(\nu _1^U\), are adjusted to get the desired censored area. The thresholds of \({\mathcal {U}}_m\) which are \(\tau _m^L\) and \(\tau _m^U\), are selected such that \(q_{m|0} P_0 + q_{m|1} P_1\), for all m, are identical

Figure 11 shows the effect of the censored area on the proposed distributed detection (specified as PSRA) when \(M=2\) and \(M=6\). Increasing the censored area indicates that only data bits obtained from higher reliable observations will be sent to the FC (and, as a result, lowers the number of sensor nodes who will send their data bits to the FC). The censored area is computed from \(P_0 q_{M+1|0} + P_1 q_{M+1|1}\). In this figure, we consider that the range of the observation is divided into a censored region \({\mathcal {C}}_1 = \{x: \nu _1^L < x \le \nu _1^U \}\) and M uncensored regions. The value of \(q_{M+1|i}\) is obtained from \(\int _{\nu _1^L}^{\nu _1^U} f_X(x|H_i)\,dx\). We set the thresholds \(\nu _1^L = 0.5 - a\) and \(\nu _1^U = 0.5 + a\), for \(a > 0\), and vary a to get the desired censored area. The other parameters are set up as shown in the figure’s caption. We see that increasing the censored area such that only data bits obtained from reliable observations are sent to the FC significantly helps improving the probability of error \(P_{\text {E}}\). We also show the probability \(P_{\text {E}}\) of the TDMA-based distributed detection for a comparison. The proposed distributed detection outperforms the TDMA-based distributed detection when the censored area is larger than 0.42 and 0.32 for \(M=2\) and \(M=6\), respectively.

6 Conclusions

We have proposed a collision-aware distributed detection scheme with the PSRA for a single-hop WSN whose collection time is limited and only a collision channel between sensor nodes and the FC is provided. We have shown that, unlike many other random-access distributed detection schemes, the proposed distributed detection favors a large number of collision time slots since the collisions are useful and applicable in making a final decisions. The following are possible extensions of the proposed distributed detection such that its performance is improved and/or it is applied in other scenarios.

  1. i.

    Composite Hypothesis Testing: In this paper, we assume that the distributions of the observation given \(H_0\) and \(H_1\) are known and we, then, formulate the problem as simple binary hypothesis testing. However, in many scenarios, for example, the distribution of the observation given \(H_1\) generally is unknown since its distribution would depend on the location and the strength of the event. Study, analysis, and evaluation of the proposed distributed detection in the composite hypothesis testing is an interesting extension.

  2. ii.

    Energy Detection: In this paper, we design and analyze the proposed distributed detection by assuming that the FC is able to detect the time slot states: idle, successful, and collision time slots. A method to identify these time slot states are open. Specifically, what we study here is from the MAC layer’s point of view. We can extend the proposed distributed detection towards the physical layer’s point of view by including an energy detection into the proposed distributed detection to detect the time slot states. Furthermore, we, then, are able to investigate the effect of the fading channels on the proposed distributed detection.