Abstract

Indoor robots, in particular AI-enhanced robots, are enabling a wide range of beneficial applications. However, severe cyber or physical damage could result if the robots’ vulnerabilities are exploited for malicious purposes. Therefore, continuous active tracking of multiple robots’ positions is necessary. From the perspective of wireless communication, indoor robots are treated as radio sources. Existing radio tracking methods are sensitive to indoor multipath effects, error-prone, and costly. Against this backdrop, this paper presents an indoor radio source tracking algorithm. First, an RSSI (received signal strength indicator) map is constructed based on interpolation theory. Second, a YOLO v3 (You Only Look Once Version 3) detector is applied to the map to identify and locate multiple radio sources. By combining a source’s locations at different times, we can reconstruct its moving path and track its movement. Experimental results show that, under typical parameter settings with 6 radio sources, our algorithm’s average positioning error is lower than 0.39 m and its average identification precision is higher than 93.18%.

1. Introduction

1.1. Motivation and Background

Indoor robots are becoming increasingly popular in the market in view of their beneficial applications, ranging from navigation and sweeping to health care. In combination with artificial intelligence (AI), they are anticipated to drastically change people’s daily lives. Despite these advantages, indoor robots still face severe cyber and physical threats due to their hardware and software vulnerabilities [1]. In extreme cases, even the simplest cleaning robots could be manipulated to launch aggressive physical attacks, such as the assassination revealed in [2] and the eavesdropping on private conversations in [39]. Therefore, it is essential to track nonstationary robots for security, in addition to the traditional efficiency concerns.

From a literature review, we identify three types of alternative techniques that could be utilized for tracking indoor robots. First, a robot could be tracked using indoor navigation techniques, in which multiple anchors transmit signals and the robot derives its own position from the signals it receives. Note that to perform the tracking task, the robot must report its position [10]. This process could easily be spoofed or corrupted by selfish or malicious robots. Second, with SLAM (simultaneous localization and mapping) techniques, a robot may obtain and report its position through diverse sensors [11]. Obviously, this is only feasible for ‘honest’ robots. Third, image or video recognition could be adopted to identify and distinguish different robots if cameras can be installed on the ceiling [12, 13]. However, this alternative is very sensitive to lighting conditions and obstacles. In summary, existing efforts cannot fulfill the requirements of tracking malfunctioning robots because of two challenges. On the one hand, they usually require cooperation from hijacked or malicious robots. On the other hand, they are sensitive to indoor multipath effects (for radio-based techniques) or illumination conditions (for image-based solutions).

1.2. Related Work

For ease of presentation, the state-of-the-art efforts in indoor localization are summarized in Table 1, where they are classified by method, environment settings, and whether one-time object identification is considered. Most methods do not take the dynamic environment into account, even though some traits, such as the wireless channel, change in a dynamic environment. In addition, as Table 1 shows, no existing effort considers identifying the localized objects at the same time. To fill this research gap, this paper proposes a one-time indoor object localization and identification scheme.

1.3. Main Work and Contribution

Against this backdrop, this paper aims to develop an active but nonintrusive tracking algorithm for indoor robots. Our tracking process contains two steps, i.e., an identification step and a localization step, and each indoor robot is treated as a wireless radio source. In the identification step, a YOLO v3 network [16] is deployed to identify and distinguish different robots’ signals based on the constructed RSSI maps. Then, in the localization step, each robot’s position is derived from the identification results. Our main contributions are threefold:
(1) An RSSI map construction scheme is proposed in the identification step, which achieves a satisfactory localization resolution at low cost by deploying only a small number of monitors.
(2) After the RSSI maps are established, a network based on YOLO v3 is trained and applied to recognize multiple robots; then, the bounding box information of YOLO v3 [16] is further applied to the recognition results to refine the localization accuracy. To the best of our knowledge, this is the first paper that treats indoor robot tracking from the viewpoint of object recognition.
(3) A series of experiments is conducted on a collected dataset to evaluate the effectiveness of our algorithm. The results show that, in typical indoor environments with 6 radio sources, our proposal’s positioning error is less than 0.39 m and its recognition precision is higher than 93.18%.

The remainder of this paper is organized as follows. Section 2 introduces the system model. Section 3 details the identification and localization algorithm for indoor radio sources. Section 4 presents evaluation experiments that verify the correctness and effectiveness of the proposed algorithm. Finally, Section 5 provides a brief conclusion.

2. System Model

The scenario considered here is that several indoor robots work on one floor of a building, named the task area, which contains several classrooms, meeting rooms, and corridors. To realize efficient control and navigation, a robot usually has a communication module and can be treated as an indoor radio source. Therefore, we can track different robots if we can recognize and distinguish the radios they carry. For ease of description, each radio source refers to a robot in the following analysis.

2.1. Task Area Model

A Cartesian coordinate system is established for the task area to assist radio source localization or tracking, in which an origin, two perpendicular axes, and their positive directions are determined based on the floor plane. In our setting, the origin is located at the northeast corner of the floor, and the east and north directions are the positive directions of the horizontal and vertical axes, respectively. The set of to-be-tracked radio sources, i.e., indoor robots, is , where denotes the -th radio source, and is the number of radio sources. All radio sources’ positions at time are , where and are the -th radio source’s abscissa and ordinate, respectively.

A number of monitors are deployed to collect RSSI values for tracking purposes. To determine their respective positions, the task area is divided into rectangular areas of the same size, and a monitor is placed at each vertex of every rectangle. In this way, a total of monitors are placed. Denote the set of monitors as , where refers to the monitor deployed in the -th column and the -th row. Each monitor periodically collects its received RSSI value. ’s monitored RSSI value at time is denoted as . Then, all monitors’ collected RSSI values at time (i.e., 1 of Figure 1(a)) could be recorded as in (1), where , , and are all matrices with rows and columns, and are ’s abscissa and ordinate, respectively, and represents the maximum RSSI value from the collected RSSI values by at time , and .
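
The monitor layout and the per-monitor peak RSSI described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the collection software used in the experiments: the function names `monitor_grid` and `peak_rssi`, the argument names, and the convention that the grid starts at the coordinate origin are all hypothetical.

```python
def monitor_grid(area_width, area_height, cols, rows):
    """Place one monitor at every vertex of a cols x rows grid of equal rectangles.

    Returns a (rows + 1) x (cols + 1) nested list of (x, y) coordinates.
    Assumption: the grid starts at the origin and spans the whole task area.
    """
    dx = area_width / cols
    dy = area_height / rows
    return [[(c * dx, r * dy) for c in range(cols + 1)] for r in range(rows + 1)]


def peak_rssi(readings):
    """Keep the maximum RSSI value that each monitor collected in one time step.

    `readings[r][c]` is the list of RSSI samples (in dBm) seen by the monitor
    in row r and column c during that time step.
    """
    return [[max(samples) for samples in row] for row in readings]
```

With 3 x 2 rectangles, for example, the grid has 4 x 3 = 12 vertices and therefore 12 monitors.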

2.2. Radio Propagation Model

Building the radio propagation model for the task area is difficult and time-consuming because of the challenges brought by multipath effects. Moreover, changes in the indoor environment may easily bias an established empirical model [14]. To avoid this problem, an RSSI map is built directly from the RSSI values collected at the monitors, without any analytical model, to extract the radiation trait, which is the spatial distribution of the power within the environment [13] (i.e., the image selected by the bounding box in Figure 1(c)). To be specific, the collected raw dataset is transformed into an RSSI map, in which each RSSI value is represented by a rectangular shape, as 1–2 in Figure 1(a). Let the set of extracted radiation traits be , where is the -th radio source’s radiation trait.

2.3. Tracking Model

To track multiple radio sources or robots, we have to continuously derive their respective positions at different time snaps. Therefore, in each time snap , we have to firstly recognize different radio sources based on the established RSSI map and and then derive each radio source’s position.
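
As a minimal sketch of how positions derived at successive time snaps can be chained into one moving path per radio source, the helper below groups per-snapshot estimates by source label. The name `build_tracks` and the input layout (a list of `(time, {label: (x, y)})` pairs) are assumptions made for illustration only.

```python
from collections import defaultdict


def build_tracks(snapshots):
    """Group per-snapshot position estimates into one time-ordered path per source.

    `snapshots` is a list of (time, {label: (x, y)}) pairs, in any order.
    Returns {label: [(time, x, y), ...]} sorted by time.
    """
    tracks = defaultdict(list)
    for time, positions in sorted(snapshots, key=lambda item: item[0]):
        for label, (x, y) in positions.items():
            tracks[label].append((time, x, y))
    return dict(tracks)
```

A source that is not recognized in some snapshot simply has no entry for that time, so gaps in a path remain visible rather than being filled in.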

For the recognition purpose, we need to derive a bounding box for each radio source through treating the RSSI map as an image. Then, bounding boxes could be obtained for radio sources; the set of bounding boxes is denoted as , where represents the bounding box for the -th radio source and contains four elements: the abscissa and ordinates of the bounding box’s top left vertex and the length and the width of the bounding box in order.

The next move is to get all recognized radio sources’ positions based on in combination with the coordinate system embedded in the RSSI map and thus its derived image. Let be the set of radio sources’ estimated positions, where and are the abscissa and ordinate of the -th radio source at time .
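
The conversion from a bounding box in the image to a position in the task area can be sketched as follows. This is only an illustration under several assumptions that the text does not state: the image is taken to cover the task area exactly, pixel columns grow along the horizontal axis, pixel rows grow opposite to the vertical axis (the usual image convention), and the box's "length" is its horizontal extent. The names `box_center` and `pixel_to_area` are hypothetical.

```python
def box_center(box):
    """Center of a box given as (left, top, length, width) in pixels."""
    left, top, length, width = box
    return (left + length / 2.0, top + width / 2.0)


def pixel_to_area(px, py, image_size, area_size):
    """Map a pixel position to task-area coordinates.

    Assumptions: the image spans the whole task area, and the image row
    index increases downward while the vertical axis increases upward.
    """
    image_w, image_h = image_size
    area_w, area_h = area_size
    x = px / image_w * area_w
    y = (image_h - py) / image_h * area_h
    return (x, y)
```

If the coordinate system embedded in the RSSI map uses a different origin or axis orientation, only the two lines computing `x` and `y` need to change.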

2.4. Problem Formulation

To facilitate the problem formulation, we define as the identification function which can derive based on ; as the bounding function to derive based on ; and as the localization function to get the estimated positions based on :

When radio sources exist in the task area, and radios are identified successfully from the radio sources at time , this paper aims to jointly minimize the identification error in the identification stage (stage 1) and the positioning error in the localization stage (stage 2):

3. YOLO v3-Based Radio Source Tracking

3.1. Basic Idea

After obtaining the interpolated RSSI map and its derived image at time , we first have to identify each radio source and determine its bounding box. From the perspective of object recognition in an image, the recognition task can be completed if we can capture the characteristics of each radio source’s radiation trait, which represents a radio source’s radiation range and intensity in an RSSI map [17]. Capturing a radio source’s radiation trait is equivalent to deriving and . There are many ways to solve (2) and (3). For example, the empirical model for indoor object detection and localization based on RSSI is widely used because of its simplicity and low cost. However, its positioning accuracy is unsatisfactory since the measured RSSI values are sensitive to indoor multipath effects. Moreover, it is impractical to distinguish different radio sources based solely on RSSI data.

Against this backdrop, this paper adopts deep neural networks to tackle object detection in the image, i.e., radio source recognition, in view of their great success in image recognition. To be specific, YOLO v3 is adopted because of its capability to capture the different radio sources’ radiation traits and their differences [16]. By running YOLO v3 on the image of an RSSI map, we can identify each radio source and obtain its bounding box at the same time. In other words, YOLO v3 is adopted to solve (2) and (3) simultaneously. Compared with empirical model-based solutions, deep neural network-based methods are more robust to indoor multipath effects.

To tackle the localization problem shown in (4), a straightforward idea is to treat the center of each radio source’s bounding box as its position. However, this introduces extra localization error if the bounding box derived by YOLO is biased. Taking this into consideration, the position of the pixel with the largest RSSI value within the -th radio source’s bounding box is chosen as the radio source’s location. If multiple pixels share the same maximum RSSI value, the center position is adopted as the localization result.

3.2. Indoor Radio Sources Identification and Localization

As shown in Figure 1, our indoor radio sources tracking method contains three steps: images preparation, radio sources recognition, and radio sources localization. In the first step, raw RSSI values at the monitors are collected to build RSSI maps, which will further be transformed into images using the interpolation theory. Then, a YOLO v3 detector is trained offline on the images for identifying and distinguishing different radio sources, and the trained YOLO v3 network is utilized for online radio sources’ recognition. Finally, all radio sources’ positions are determined based on their respective bounding boxes and the largest RSSI values of the pixels within the boxes.

3.2.1. Images Preparation

Adopting directly to construct the RSSI map results in two defects. On the one hand, the positioning granularity of the RSSI map is determined by the monitors’ deployment density. Sparsely deployed monitors lead to low tracking accuracy, while high-density deployment incurs a high deployment cost. On the other hand, biased or even erroneous monitored data are common because of malfunctioning monitors or indoor multipath effects. Therefore, to achieve low-cost monitoring while improving the tracking accuracy, the second-order Bernstein-Bézier interpolation theory [18] is utilized to refine the raw RSSI map, and is expanded to .

As shown in Figure 2, , , , and are four deployed monitors, and their positions are , , , and , respectively. Points and are the centers of gravity of triangle and triangle , and their positions are and . The purpose of applying the second-order Bernstein-Bézier interpolation theory [14] is to derive the RSSI values at and without deploying extra monitors.

Next, according to the second-order Bernstein-Bézier polynomial theory, we have the following interpolation formula:

Here, is the RSSI value of the interpolated point; and are the interpolated point’s abscissa and ordinate; and is called the Bézier ordinates of [14]. Let the dataset in the task area after applying the second-order Bernstein-Bézier interpolation be , where , , and are all matrices with rows and columns, and they contain the abscissas, ordinates, and RSSI values of the points after interpolation, respectively. is called the Bézier ordinates of and needs to be determined.
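
For illustration, a second-order (quadratic) Bernstein-Bézier patch over one triangle can be evaluated as in the sketch below. It uses the standard barycentric form of the quadratic Bernstein basis, which may be parameterized differently from the formula used in this paper. The function names, the dictionary keys for the six Bézier ordinates, and the choice to pass the ordinates in as already-known values are all assumptions; how the ordinates are determined is not covered here.

```python
def barycentric(point, a, b, c):
    """Barycentric coordinates (u, v, w) of `point` in triangle (a, b, c)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = point, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    u = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    v = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return u, v, 1.0 - u - v


def bezier_quadratic(point, triangle, ordinates):
    """Evaluate a quadratic Bernstein-Bezier patch at `point`.

    `ordinates` maps the index triples "200", "020", "002" (corners) and
    "110", "101", "011" (edges) to Bezier ordinates, e.g. RSSI values in dBm.
    """
    u, v, w = barycentric(point, *triangle)
    b = ordinates
    return (b["200"] * u * u + b["020"] * v * v + b["002"] * w * w
            + 2 * b["110"] * u * v + 2 * b["101"] * u * w + 2 * b["011"] * v * w)
```

A useful sanity check is that when each edge ordinate is the mean of its two corner ordinates, the patch reduces to plain linear interpolation, so the value at the center of gravity is the mean of the three corner values.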

Then, is adopted to establish the refined RSSI map (i.e., the interpolated RSSI map), as 1–3 of Figure 1(a) shows. Finally, the image augmentation method presented in [19] is applied to increase the number of images in the training dataset, as step 4 in Figure 1(a) shows.

3.2.2. Radio Sources Recognition

As shown in the radio sources recognition step in Figure 1, a YOLO v3 network contains three parts: a feature extraction network, a first detection head, and a second detection head. The neural network consists of 70 layers, and 78 connection tables and 58 learnable tables are used to connect the different layers [16].

In the offline training stage, a number of images obtained in the images preparation step are adopted as the training dataset, and each image is labelled with the class and bounding box information of all radio sources. The outputs of the network are images in which the radio sources are recognized with their respective bounding boxes. To refine the weights in the network, backpropagation is adopted with cross entropy as the loss function [16]. The learning rate, the number of training epochs, the number of warm-up periods, and the regularization factor are set to 0.001, 3500, 1000, and 0.0005, respectively. In the online recognition stage, the penalty, confidence, and overlap thresholds are all set to 0.5.
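
To make the role of the confidence and overlap thresholds concrete, the following sketch filters raw detections with a confidence cut followed by per-class non-maximum suppression based on intersection over union (IoU). It is a generic illustration, not the detector's internal code: the function names, the detection dictionary layout, and the choice to suppress only boxes of the same class are assumptions, and the penalty threshold is not modelled.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (left, top, length, width)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf_threshold=0.5, overlap_threshold=0.5):
    """Drop low-confidence boxes, then suppress overlapping boxes of the same class.

    Each detection is a dict with keys "label", "score", and "box".
    Returns the kept detections, highest score first.
    """
    candidates = sorted(
        (d for d in detections if d["score"] >= conf_threshold),
        key=lambda d: d["score"],
        reverse=True,
    )
    kept = []
    for det in candidates:
        clashes = any(
            k["label"] == det["label"] and iou(k["box"], det["box"]) > overlap_threshold
            for k in kept
        )
        if not clashes:
            kept.append(det)
    return kept
```

Raising the overlap threshold keeps more near-duplicate boxes for the same source, while raising the confidence threshold trades recall for precision.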

3.2.3. Radio Sources Localization

To tackle the potential error introduced by a biased bounding box derived by the YOLO v3 network, the position of the pixel with the largest RSSI value within a recognized radio source’s bounding box is chosen as the source’s location, as shown in the radio source localization step in Figure 1. Therefore, for an identified radio source, (4) can be converted to (8), where , , and is the number of the selected pixels in the -th bounding box.
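
The localization rule can be sketched as below: scan the pixels inside a bounding box, keep those with the largest RSSI value, and return their position. Two points are assumptions made for this illustration: when several pixels tie for the maximum, the sketch returns the mean position of the tied pixels as the "center", and the box is given as integer `(left, top, length, width)` in pixel units with `length` along the columns. The name `locate_source` is hypothetical.

```python
def locate_source(rssi_map, box):
    """Return the (column, row) position of the strongest pixel inside `box`.

    `rssi_map[row][col]` holds the RSSI value of one pixel. If several
    pixels share the maximum, their mean position is returned (assumption).
    """
    left, top, length, width = box
    best = None
    peaks = []
    for row in range(top, top + width):
        for col in range(left, left + length):
            value = rssi_map[row][col]
            if best is None or value > best:
                best = value
                peaks = [(col, row)]
            elif value == best:
                peaks.append((col, row))
    if not peaks:
        raise ValueError("bounding box contains no pixels")
    count = len(peaks)
    return (sum(c for c, _ in peaks) / count, sum(r for _, r in peaks) / count)
```

The returned pixel position still has to be mapped into the task-area coordinate system before it can be compared with a robot's real position.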

3.2.4. Algorithm Description

The indoor radio sources tracking algorithm is illustrated in Algorithm 1. Step 1 determines the dataset to construct the RSSI map after interpolation. Steps 2 and 3 train the YOLO v3 network, and step 4 returns the trained network. Step 5 derives the image to be recognized, and steps 6–9 use the trained YOLO v3 detector to obtain the identification and localization results.

Input: Data recorded by deployed monitors every interval within ,
Output: Identification results , localization results
Offline stage
(1) Derive based on according to (6) and (7)
(2) Configure network parameters (e.g., training epochs)
(3) Determine and by training the network
(4) Return the trained YOLO v3 network
Online stage
(5) Derive the RSSI map according to
(6) Input the established RSSI map to the network
(7) Derive identification results as based on step 6
(8) Derive localization results as based on (8)
(9) Return ,

4. Experimental Results

4.1. Experimental Settings

All experiments are conducted in Room 701, Communication Hall, Army Engineering University. The floor plan of the room is shown in Figure 3; the size of the room is ; and the vertical and horizontal distances between two neighboring monitors are both [17]. The size of the established RSSI map is pixels, and the output image’s size is normalized to pixels to accelerate the training process. The software used for collecting RSSI data is WiFi NetSpot. The tracking period lasts for 30 minutes and is divided into time intervals of 30 seconds each. Six radio sources are investigated, i.e., MECHERVO (), Huawei MatePad (), Thinkpad T580 (), Xiaomi mix2 (), HUAWEI P40 (), and Thinkpad T480 (). These portable devices are carried by TurtleBot robots. We make all the collected raw RSSI values available to the research community (https://github.com/tracking-data/tracking-project/releases/tag/v1.0).

4.2. Performance Metrics

4.2.1. Average Identification Precision

Assume times of experiments are conducted in total. In the -th experiment, the -th radio source emerges times, and it is detected times while being recognized correctly by Algorithm 1 for times. Then, the identification rate of the -th radio source in the -th experiment will be and the recall rate is derived to measure the false alarm performance. Then, after times of experiments, the average identification precision and the average recall rate of the -th radio source will be
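
Since the formulas for these two metrics did not survive in this text, the sketch below uses the conventional definitions as an assumption: per experiment, precision is the number of correct recognitions divided by the number of detections, and recall is the number of correct recognitions divided by the number of times the source emerged; both are then averaged over the experiments. The function name and the tuple layout are hypothetical.

```python
def mean_precision_recall(counts):
    """Average per-experiment precision and recall for one radio source.

    `counts` is a list of (emerged, detected, correct) tuples, one per
    experiment. Experiments with a zero denominator are skipped.
    """
    precisions = [correct / detected for _, detected, correct in counts if detected > 0]
    recalls = [correct / emerged for emerged, _, correct in counts if emerged > 0]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(precisions), mean(recalls)
```

Averaging the per-experiment ratios, as done here, weights every experiment equally; pooling the counts before dividing would instead weight experiments by how often the source appeared.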

4.2.2. Average Positioning Error

For the -th identified radio source, its location can be estimated by Algorithm 1 in the -th experiment, and its real position is known in advance. Over the time period , the average positioning error for the -th radio source is defined as
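
A minimal sketch of this metric, assuming that the positioning error at each time step is the Euclidean distance between the estimated and real positions and that the average is taken over all time steps, is given below. The function name and the list-of-pairs input format are hypothetical.

```python
import math


def average_positioning_error(estimated, actual):
    """Mean Euclidean distance between paired estimated and real positions.

    Both arguments are equal-length lists of (x, y) tuples, one per time step.
    """
    if not estimated or len(estimated) != len(actual):
        raise ValueError("position lists must be non-empty and of equal length")
    total = sum(
        math.hypot(ex - ax, ey - ay)
        for (ex, ey), (ax, ay) in zip(estimated, actual)
    )
    return total / len(estimated)
```

Time steps in which the source was not identified have no estimated position and therefore have to be excluded before calling the function.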

4.3. Results and Analysis

4.3.1. Radio Sources Recognition Results

A straightforward presentation of the radio source recognition results in one single timeslot is shown in the images output layer in Figure 1(b), where 6 radio sources (ranging from 1 to 6) are identified with 0.9986, 0.9999, 0.995, 0.9999, 0.9998, and 0.9999 confidence scores, respectively.

Figure 4 shows the 6 radios’ identification precision and localization errors. As shown in Figure 4(a), the average identification precisions of the 6 radios are 0.9648, 0.9562, 0.9469, 0.9318, 0.9404, and 0.9440, respectively. The differences in identification precision arise because different radio sources have different radiation traits. Generally speaking, the more pronounced a radio’s radiation trait (i.e., the larger its transmitting power), the higher its identification precision; the radiation traits depend on many factors, such as the transmitting power, the degree of usage, and the position. Figure 4(b) presents the CDF of the 6 radios’ average positioning errors. As shown in Figure 4(b), each radio’s average positioning error is less than 0.39 m with a probability higher than 90%.
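
A statement of the form "the error is below a threshold with a given probability" is read off an empirical CDF. As a minimal sketch, the helper below returns the fraction of observed positioning errors that do not exceed a threshold; the function name is hypothetical, and any error values passed to it are the caller's own measurements.

```python
def empirical_cdf(errors, threshold):
    """Fraction of positioning errors that are less than or equal to `threshold`."""
    if not errors:
        raise ValueError("errors must not be empty")
    return sum(1 for error in errors if error <= threshold) / len(errors)
```

Evaluating the function over a range of thresholds and plotting the results gives a CDF curve of the kind shown in Figure 4(b).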

Figure 5 shows the 6 radio sources’ real and estimated traces in 60 timeslots. As can be seen, the difference between the real and the estimated traces is less than 0.4 m, which is better than the 1.54 m achieved by the fingerprinting approach in [20]. In addition, the higher the identification precision of a radio source, the smaller its positioning error. A video based on the tracking results is publicly available (https://github.com/tracking-data/tracking-project/releases/tag/v1.0).

4.3.2. Generality of the Trained Network

To evaluate the generality of the trained network, we tested it in Room 705 and Room 725, located in the same building, with different numbers of robots from to ; the testing results are shown in Table 2. Table 2 shows that the trained network still achieves high recognition () and localization () accuracy in different rooms. Moreover, the results validate that our algorithm works regardless of the number of radios present.

5. Conclusion

In this paper, we proposed an algorithm to identify and localize indoor robots (radio sources) in real time. Experiments show that, under typical parameter settings, the proposed algorithm identifies indoor radio sources with 93.18% average identification precision and localizes them with 0.39 m average positioning error. In the future, it would be interesting to extend the proposed algorithm to support incremental tracking of a variable number of unknown robots.

Data Availability

All data are available within this paper.

Conflicts of Interest

The authors declare that they have no conflicts of interest or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61671471).