Abstract

In this paper, we propose a system that estimates the density of users in a closed space using high frequencies emitted by a speaker and received by the microphones of smart devices. The server of the density estimation system sends high frequencies into the closed space through its speaker, and smart devices located in the space detect them through their built-in microphones. Each smart device that detects the high frequencies sends a report to the server, and the server counts these reports to estimate the number of users. To evaluate the performance of the proposed system, we conducted experiments with the density estimation system and 20 smart devices. In these experiments, the proposed system achieved 96.5% accuracy, which indicates that it is useful for density estimation. The system can therefore estimate user density in a closed space precisely, and it could be a useful technology for estimating the density of space users and measuring space usage in indoor spaces.

1. Introduction

User density estimation counts the number of people in a closed space. It supports not only contents concentration statistics, space usage analysis, and space usage frequency measurement, but also technologies that provide location information for priority rescue, reducing harm to human life at accident sites such as a collapsed building or a sinking vessel. We therefore need user density estimation technologies for closed spaces such as small meeting rooms in a building, guest rooms in large hotels, and stores in outlet malls. Many researchers have proposed different approaches and methods for user density estimation.

Existing technologies for user density estimation fall into two classes. The first class counts people by analysing faces in closed-circuit television (CCTV) video images or estimates user density by detecting the motion vectors of people [1–4]. Video image analysis methods are used in wide open spaces such as public places, and various techniques, such as Haar wavelets and Histogram of Oriented Gradients (HoG), have been proposed to detect faces and count people [5–7]. However, methods that rely on a single video image cannot count the exact number of people, and they cannot be used in smoky or dark spaces. Furthermore, video images raise serious personal privacy concerns, which makes video image analysis difficult to apply to user density estimation.

The other class uses radio devices, such as Radio Frequency Identification (RFID) tags, smart devices, and sensor nodes, instead of video images, and estimates user density by counting these devices [8–11]. The RFID-based method can count the number of people in a closed space in real time [12, 13], and the smart-device-based method can estimate user density without additional equipment because it gathers information from the Wi-Fi and Bluetooth modules and the Global Positioning System (GPS) receiver [14]. However, the RFID approach requires every user to carry an extra RFID tag, and the smart device approach cannot be used indoors because GPS signals are poor there and devices switch between Wi-Fi signals. Because Wi-Fi Direct supports data transmission and data sharing, it could also be used to count smart devices. However, because the coverage of Wi-Fi Direct is about 100 m, it sometimes produces misdetections caused by interference from strong signals in nearby closed spaces. Although these technologies are suitable for outdoor and public spaces such as subways, soccer stadiums, and baseball fields, they cannot be applied to certain closed spaces.

Therefore, in this paper, we propose a new user density estimation system in which a server system transmits high frequencies through its speaker to the microphones of smart devices. The microphone of a smart device can capture the audible frequency range from 20 Hz to 22 kHz, so an application on the device can detect specific high frequencies in the received sound [15]. We use a widely available, simple speaker for the server system and a pair of high frequencies between 18 kHz and 22 kHz. These high frequencies are commonly used in high-frequency studies, such as smart information service applications and data transmission, because people in an indoor space cannot hear them [16, 17].

In our system, smart devices located in the same indoor space receive the sound specific to that space, analyse it, and send a message to the server when they detect the specified high frequencies. Because the server gathers these messages and counts the reporting devices, the proposed system can estimate user density in a specific closed space.

To evaluate the performance of the proposed approach, we developed a high-frequency detection application for smart devices and a user density estimation server system, and we conducted a user density estimation experiment with 10 smart devices. The results show that the accuracy of the proposed application and server system is at least 95%. Next, to verify the advantage of the proposed method, we compared it with a hybrid CCTV & RFID method and with a Wi-Fi Direct method using an additional 10 smart devices. In this experiment, the accuracy of the proposed system was 3 percentage points higher than that of the hybrid method, and the proposed system produced no erroneous (over-counted) detections. Moreover, we conducted a user density estimation experiment with 20 smart devices in two separate closed spaces at the same time. In this experiment, the proposed system achieved 96.5% accuracy, which shows that it is useful as a user density estimation technology for specific closed spaces. Because the proposed system is a new user density estimation technology based on inaudible high frequencies and the microphones of smart devices, it can be useful for measuring space usage frequency, compiling contents concentration statistics, and supporting location information for priority rescue to reduce harm to human life at accident sites.

This paper is organized as follows. In Section 2, we review existing research on crowd density estimation and on wireless communication technologies that use inaudible high frequencies. In Section 3, we describe the proposed smart-device application and server system for user density estimation and explain the architecture of the proposed system. In Section 4, we present experiments with the proposed application and server system and discuss their performance. Finally, in Section 5, we summarize the results and suggest future research directions.

2. Previous Work

In this section, we review existing research on crowd density estimation and existing studies that use inaudible high frequencies for data communication. Early crowd density estimation methods counted human heads or faces, because the human face has distinctive features such as eyes, a nose, and a mouth, and computers and smart devices can detect faces effectively in CCTV video images or smart device camera images.

Yin proposed a crowd density estimation method that detects objects contrasting with the background colour of a video image [18]. This method focuses on counting faces when only a few people are present, rather than tracing people’s movement in a specific small space. Later, many researchers shifted their focus from counting people to measuring their movement using motion vectors. Kim and Song et al. proposed such methods, which analyse the movement speed and pattern brightness obtained from the motion vectors of video images. They used optical flow, which traces the movement of an object based on its colour distribution, texture, outline, and so on, and estimates variable values from the object’s motion [19, 20].

To overcome the inefficiency of methods based on CCTV video images, various methods using radio devices such as RFID, smart devices, Wi-Fi, and Bluetooth have recently been proposed. Jens proposed a method that uses the Bluetooth and GPS information of smart devices to estimate crowd density in public places such as soccer stadiums and baseball fields [21]. Park et al. suggested a hybrid crowd density estimation method for the subway that combines a video analysis technique with RFID [22]. However, these methods do not aim for high accuracy, because they are designed for outdoor and public places where approximate statistical values are sufficient.

Another line of research uses high frequencies within the audio range (20 Hz∼22 kHz). These methods use high frequencies (18 kHz∼22 kHz) that people cannot hear to transmit user data or motion-control commands between smart devices through their speakers and microphones. Bihler et al. proposed the Smart Guide system, which transmits information to visitors’ smartphones in a museum using inaudible high frequencies [23]. The system runs on the Android operating system: the microphones built into visitors’ smartphones receive high frequencies emitted by speakers installed in the museum, and the phones then retrieve the corresponding information from a web server. The method uses frequency-shift keying (FSK) with 20 kHz and 22 kHz as the signalling frequencies. Kim et al. proposed a method for authenticating smartphone users with inaudible high frequencies (15.8 kHz∼20 kHz), generating one frequency on each stereo channel [24]. The method recognizes a challenge only after receiving a bit, encoded as a combination of the two frequencies, four times, and it takes 8 seconds to recognize a 2-byte challenge.
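To make the FSK signalling described above concrete, the following is a minimal sketch of binary FSK with 20 kHz and 22 kHz tones. The 48 kHz sample rate, the 10 ms bit duration, and the mapping of 20 kHz to bit 0 and 22 kHz to bit 1 are assumptions for illustration; the Smart Guide system’s actual parameters are not given here. The demodulator uses the Goertzel algorithm, which is our choice for measuring the power at a single frequency and is not necessarily what Smart Guide uses.

```python
import math

# Assumptions for illustration only: 48 kHz sampling, 10 ms per bit,
# and the bit mapping 20 kHz -> 0, 22 kHz -> 1.
SAMPLE_RATE = 48_000
FREQ_ZERO, FREQ_ONE = 20_000, 22_000
SAMPLES_PER_BIT = 480  # 10 ms; both tones complete a whole number of cycles per bit


def fsk_modulate(bits):
    """Binary FSK: emit one tone per bit, switching between the two frequencies."""
    samples = []
    for i, bit in enumerate(bits):
        freq = FREQ_ONE if bit else FREQ_ZERO
        for j in range(SAMPLES_PER_BIT):
            n = i * SAMPLES_PER_BIT + j
            samples.append(math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
    return samples


def goertzel_power(block, freq):
    """Signal power at a single frequency, computed with the Goertzel algorithm."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in block:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def fsk_demodulate(samples):
    """Decide each bit by comparing the power at the two signalling frequencies."""
    bits = []
    for start in range(0, len(samples) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        block = samples[start:start + SAMPLES_PER_BIT]
        bits.append(int(goertzel_power(block, FREQ_ONE) > goertzel_power(block, FREQ_ZERO)))
    return bits


bits = [1, 0, 1, 1, 0, 0, 1, 0]
print(fsk_demodulate(fsk_modulate(bits)) == bits)  # True
```

Because 480 samples at 48 kHz span exactly 200 cycles of 20 kHz and 220 cycles of 22 kHz, each tone falls on its own DFT bin and contributes essentially no power at the other frequency, which keeps the bit decision clean in this noise-free example.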

Chung proposed a method for sharing data among multiple smart devices within close range using high frequencies and Wi-Fi, improving on existing methods that use inaudible frequencies [25]. The system sends three frequencies at the same time by combining one 19.6 kHz signal (a low-latency key) with two base signals at 19 kHz and 22 kHz. The three frequencies carry little data; instead, the devices treat them as a predetermined trigger signal to start data transmission among themselves. Thus, existing methods have used inaudible high frequencies as specific trigger signals: smart devices receive a signal and then perform the process assigned to it, such as checking the device status, sending the device’s GPS information to a server system, or displaying an advertisement on the device.

3. Density Estimation System of Space Users Using High Frequencies

In this section, we describe the proposed smart-device application and server system for user density estimation in a specific closed space. The overall flow of the proposed system is shown in Figure 1.

In Figure 1, the server system instructs the speaker to generate a specific pair of high frequencies (over 18 kHz) (①); the speaker emits the pair into a specific closed space for a fixed number of seconds, and each smart device in the space records nearby sound through its microphone (②). Each device converts the recorded sound into the frequency domain using the fast Fourier transform (FFT) [26], and when it detects the specific pair of high frequencies over 18 kHz, it sends the pair to the server system (③). The server gathers the pairs reported by the smart devices and counts the number of devices located in the specific closed space at the same time. Finally, the server system displays on its monitor the number of smart devices and the location where the pair of high frequencies was generated and received.
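The FFT step (②–③) can be sketched as follows: window one audio frame, take its magnitude spectrum, and keep the two strongest, well-separated tones at or above 18 kHz. This is a minimal sketch using NumPy, not the authors’ application code; the 44.1 kHz capture rate, the Hann window, the 0.01 amplitude threshold (for audio scaled to [−1, 1]), and the function name `detect_high_pair` are all assumptions for illustration.

```python
import numpy as np

# Assumptions for illustration: 44.1 kHz capture, audio scaled to [-1, 1],
# and a tone counts as present if its estimated amplitude is at least 0.01.
SAMPLE_RATE = 44_100


def detect_high_pair(frame, sample_rate=SAMPLE_RATE, min_freq=18_000.0,
                     min_gap=600.0, min_amplitude=0.01):
    """Return the two strongest tones at or above min_freq, rounded to 100 Hz.

    Returns (low_hz, high_hz), or None when fewer than two sufficiently strong,
    well-separated tones are present in the frame.
    """
    frame = np.asarray(frame, dtype=float)
    window = np.hanning(len(frame))
    # Scale the magnitude spectrum so that a sine of amplitude A peaks at roughly A.
    amplitude = 2.0 * np.abs(np.fft.rfft(frame * window)) / window.sum()
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    band = freqs >= min_freq
    band_freqs, band_amp = freqs[band], amplitude[band]

    peaks = []
    for idx in np.argsort(band_amp)[::-1]:   # strongest bins first
        if band_amp[idx] < min_amplitude:
            break                            # every remaining bin is weaker
        f = float(band_freqs[idx])
        # Skip bins that belong to the main lobe of a tone already picked.
        if all(abs(f - p) > min_gap for p in peaks):
            peaks.append(f)
            if len(peaks) == 2:
                break
    if len(peaks) < 2:
        return None
    low, high = sorted(peaks)
    return int(round(low / 100.0)) * 100, int(round(high / 100.0)) * 100


t = np.arange(4096) / SAMPLE_RATE
frame = 0.5 * np.sin(2 * np.pi * 18_100 * t) + 0.5 * np.sin(2 * np.pi * 19_000 * t)
print(detect_high_pair(frame))  # (18100, 19000)
```

With a 4096-sample frame at 44.1 kHz, the bin spacing is about 10.8 Hz, so rounding the peak bins to the nearest 100 Hz recovers the grid frequencies used by the proposed system.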

Each pair is composed of two frequencies chosen on a 100 Hz grid between 18 kHz and 22 kHz (41 candidate frequencies in total). To avoid interference from other high frequencies, the two frequencies of a pair are more than 600 Hz apart, that is, at least 700 Hz apart on the 100 Hz grid. This yields 595 possible pairs, such as 18.0 kHz–18.7 kHz, 18.0 kHz–18.8 kHz, and 21.3 kHz–22.0 kHz. The speaker of the server system emits the selected pair n times over k seconds, where k is the duration of the pair and n is the number of repetitions. An example of an emitted pair is shown in Figure 2.
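The count of 595 pairs can be checked by enumerating the grid. The sketch below assumes, as the example pairs indicate, that “more than 600 Hz” is a strict inequality; with a 100 Hz grid this means a spacing of at least 700 Hz. The function names are hypothetical.

```python
from itertools import combinations


def candidate_frequencies(low_hz=18_000, high_hz=22_000, step_hz=100):
    """All tone frequencies on the 100 Hz grid from 18.0 kHz to 22.0 kHz."""
    return list(range(low_hz, high_hz + 1, step_hz))


def candidate_pairs(min_gap_hz=600, **grid):
    """Pairs (f1, f2) with f1 < f2 whose spacing is strictly greater than min_gap_hz."""
    freqs = candidate_frequencies(**grid)
    return [(f1, f2) for f1, f2 in combinations(freqs, 2) if f2 - f1 > min_gap_hz]


pairs = candidate_pairs()
print(len(candidate_frequencies()), len(pairs))  # 41 595
print(pairs[0], pairs[1], pairs[-1])  # (18000, 18700) (18000, 18800) (21300, 22000)
```

In closed form, 41 grid points with a minimum spacing of 7 steps give (41 − 7)(41 − 7 + 1)/2 = 34 · 35/2 = 595 pairs; allowing exactly 600 Hz would instead give 35 · 36/2 = 630 pairs.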

In Figure 2, the pair of high frequencies is 19.0 kHz and 20.0 kHz, k is 5 seconds, and n is 2. While the speaker of the server system emits the pair, each smart device in the specific closed space checks whether the pair is consistently present. When a smart device detects the pair, it waits for a fixed m seconds and then detects the pair again to confirm that it is the same pair. Figure 3 shows pseudocode for the detection process on the smart device.
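On the server side, the emitted signal is simply the sum of two sine waves. The following sketch writes a k-second two-tone burst, such as the 19.0 kHz and 20.0 kHz pair of Figure 2, to a 16-bit mono WAV file using only the Python standard library. The 44.1 kHz output rate, the 0.4 amplitude per tone, and the short raised-cosine fade at each end (intended to avoid audible clicks when the tone starts and stops) are assumptions, not details given in the paper; `two_tone_wav` is a hypothetical name.

```python
import io
import math
import struct
import wave


def two_tone_wav(f1_hz, f2_hz, duration_s, sample_rate=44_100,
                 amplitude=0.4, fade_s=0.05):
    """Return 16-bit mono WAV bytes containing f1 + f2 for duration_s seconds."""
    n = int(round(duration_s * sample_rate))
    fade_n = max(1, int(fade_s * sample_rate))
    frames = bytearray()
    for i in range(n):
        t = i / sample_rate
        s = amplitude * (math.sin(2 * math.pi * f1_hz * t)
                         + math.sin(2 * math.pi * f2_hz * t))
        # Raised-cosine fade-in/out so the burst does not start or stop abruptly.
        edge = min(i, n - 1 - i)
        if edge < fade_n:
            s *= 0.5 * (1 - math.cos(math.pi * edge / fade_n))
        frames += struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))

    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(bytes(frames))
    return buf.getvalue()


# Figure 2 example: 19.0 kHz + 20.0 kHz for k = 5 seconds.
wav_bytes = two_tone_wav(19_000, 20_000, 5.0)
```

Playing the returned bytes n times would correspond to the n repetitions described above; how the original server schedules the repetitions is not specified, so that part is left to the caller.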

In Figure 3, a and b are variables that store the two high frequencies detected by the smart device. A_t is the audio data captured by the device’s built-in microphone during time t, and the FFT decomposes A_t into its frequency components F_t. If F_t contains two high frequencies over 18 kHz, a stores the first and b stores the second. The device then waits for the fixed m seconds and captures the next audio data, A_{t+1}, during time t + 1, and the FFT decomposes A_{t+1} into F_{t+1}. If F_{t+1} contains two high frequencies over 18 kHz, the device compares a with the first high frequency of F_{t+1} and b with the second. If both pairs are the same, the smart device sends the pair’s frequency values to the server system.
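The two-window confirmation in Figure 3 can be written as a short function. In this sketch, `read_peaks` is a hypothetical hook standing for “record A_t, apply the FFT, and return the peak frequencies F_t in Hz on the 100 Hz grid”; injecting it (and the `wait` function) keeps the confirmation logic testable without a microphone. Sorting the high frequencies before taking a and b is our assumption about what “first” and “second” mean.

```python
import time
from typing import Callable, Optional, Sequence, Tuple

Pair = Tuple[int, int]


def high_pair(peaks_hz: Sequence[int], min_hz: int = 18_000) -> Optional[Pair]:
    """Take the first two peaks at or above min_hz (a and b in Figure 3)."""
    high = sorted(f for f in peaks_hz if f >= min_hz)
    return (high[0], high[1]) if len(high) >= 2 else None


def confirm_pair(read_peaks: Callable[[], Sequence[int]],
                 wait: Callable[[float], None] = time.sleep,
                 m_seconds: float = 1.0) -> Optional[Pair]:
    """Report a pair only if the same pair is seen twice, m seconds apart."""
    first = high_pair(read_peaks())           # (a, b) from F_t
    if first is None:
        return None
    wait(m_seconds)                           # wait the fixed m seconds
    second = high_pair(read_peaks())          # pair from F_{t+1}
    return first if second == first else None  # caller sends a confirmed pair


# Simulated peak lists instead of live microphone input:
frames = iter([[1_000, 18_100, 19_000], [19_000, 18_100]])
print(confirm_pair(lambda: next(frames), wait=lambda s: None))  # (18100, 19000)
```

Requiring the same pair in two windows separated by m seconds trades a little latency for robustness against a single spurious high-frequency peak.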

Next, the server system receives the pair values from the smart devices in the same space and estimates user density through the process illustrated in Figure 4.

In Figure 4, each message received from a smart device contains one pair of high frequencies, a GPS value, and the time at which the high-frequency signal was received. The server system compares the pair in the received data with the pair that the server generated (①). If they differ, the server drops the data and does not process it further. If they match, the server counts the received data (②) and stores it in the database (③). When this process is complete, the server displays the number of smart devices and their locations on its monitor, and the server manager can estimate user density in the closed space.
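The three server steps can be sketched with an in-memory SQLite database standing in for the MySQL database used in the experiments. The `round_id` (one emission of a pair) and `device_id` fields are hypothetical additions: the paper’s message contains only the pair, GPS value, and reception time, and a device identifier is assumed here so that a repeated report from the same device is not counted twice.

```python
import sqlite3
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Report:
    device_id: str              # hypothetical: lets the server ignore duplicate reports
    pair_hz: Tuple[int, int]    # detected (first, second) high frequency
    gps: Tuple[float, float]    # (latitude, longitude) reported by the device
    received_at: float          # UNIX time at which the signal was received


class DensityServer:
    def __init__(self, db_path: str = ":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS reports ("
            " round_id INTEGER, device_id TEXT, f1 INTEGER, f2 INTEGER,"
            " lat REAL, lon REAL, received_at REAL,"
            " PRIMARY KEY (round_id, device_id))"
        )

    def handle(self, round_id: int, emitted: Tuple[int, int], report: Report) -> bool:
        """Step 1: compare with the emitted pair; steps 2-3: count and store.

        Returns False if the report was dropped because its pair did not match.
        """
        if sorted(report.pair_hz) != sorted(emitted):
            return False  # e.g. a device in a neighbouring room using another pair
        self.db.execute(
            "INSERT OR IGNORE INTO reports VALUES (?, ?, ?, ?, ?, ?, ?)",
            (round_id, report.device_id, *sorted(report.pair_hz),
             *report.gps, report.received_at),
        )
        self.db.commit()
        return True

    def count(self, round_id: int) -> int:
        """Number of distinct devices that confirmed the pair in this round."""
        (n,) = self.db.execute(
            "SELECT COUNT(*) FROM reports WHERE round_id = ?", (round_id,)
        ).fetchone()
        return n
```

For example, ten matching reports plus one report carrying the right-hand room’s pair (18.5 kHz, 19.2 kHz) would yield a count of 10 for a round that emitted 18.1 kHz and 19.0 kHz.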

4. Experiments and Evaluations

This section describes the proposed smart-device application for user density estimation, the experiments we conducted with the application and server system, and an analysis of their results. The screen layout of the proposed application is shown in Figure 5.

In Figure 5, the graph on the left shows the FFT bin values of the high frequencies in the collected sound data (①); 18.1 kHz and 19.5 kHz clearly stand out from the other high frequencies. The text on the right shows the smart device’s GPS information, the detected pair of high frequencies, the current duration time (k seconds), and a slider for setting k (②). In Figure 5, n is 2 and m is 1 second, so when the slider is set to 3, k is 3 seconds: the application detects the first pair during (k − 1)/2 seconds, waits for 1 second, and detects the second pair during another (k − 1)/2 seconds. For example, with k = 3 seconds, the application checks for the first pair for 1 second, waits for 1 second, and then checks for the second pair for another 1 second.
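The timing rule above can be expressed as a small helper that splits the k-second period into listening windows. The text gives the rule only for n = 2 and m = 1, that is, windows of (k − 1)/2 seconds; the general form (k − (n − 1)m)/n used below is our assumption, and `detection_windows` is a hypothetical name.

```python
def detection_windows(k: float, m: float = 1.0, n: int = 2):
    """Split a k-second period into n listening windows separated by m-second waits.

    For n = 2 and m = 1 this gives windows of (k - 1) / 2 seconds, as described
    for Figure 5; the general formula (k - (n - 1) * m) / n is an assumption.
    Returns a list of (start, end) times in seconds.
    """
    window = (k - (n - 1) * m) / n
    if window <= 0:
        raise ValueError("k is too short for n windows separated by m-second waits")
    schedule, start = [], 0.0
    for _ in range(n):
        schedule.append((start, start + window))
        start += window + m
    return schedule


print(detection_windows(3.0))  # [(0.0, 1.0), (2.0, 3.0)]
```

Under this reading, the k-values swept in the experiment that follows (2 to 5 seconds) correspond to listening windows of 0.5 to 2 seconds each.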

We first conducted a test to determine an appropriate value of k, because we expected the detection accuracy of the proposed application to vary with k. The test space was a 7 × 4 m laboratory, and the speaker of the server system was placed in the top corner of the laboratory, as shown in Figure 6.

In Figure 6, the space contained a table, a hanger, four desks, and four chairs. A Harman Kardon Omni 20+ speaker was placed at the top left corner of the space diagram. The smart device was an iPhone 8; the server hardware was an Intel® Core™ i5 750 CPU with 8 GB RAM, and the server software environment was Apache 2.2.14, PHP 5.2.12, and MySQL 5.1.39. The smart device ran the proposed application in background mode and was placed on a desk at the bottom right of the laboratory diagram. We randomly selected one pair of high frequencies from the 595 possible pairs, and the server emitted the pair 100 times for k seconds each, with k increasing from 2 seconds to 5 seconds in 0.2-second steps. The value of m was 1.0 second. Figure 7 shows the detection results of the proposed application for each value of k.

Figure 7 shows that the application detected the pair 98∼100 times when k was 2.8 seconds or more, whereas for k from 2 seconds to 2.6 seconds it detected the pair less than 95% of the time on average. We therefore used k = 3.0 seconds in our user density estimation experiments, because the detection accuracy exceeded 98% from k = 2.8 seconds onward.

Next, we carried out a user density estimation experiment with the proposed application and the server system. We used 10 smart devices of various models, such as the iPhone 6, iPhone 7, Galaxy S7, and Galaxy S8. The experiment used the same space and the same server system as the previous experiment. Each smart device ran the proposed application in background mode and was placed at a different position in the laboratory, such as on a desk, on a chair, in the inside pocket of a jacket hung on the hanger, in front of a computer monitor, or on the floor. The speaker of the server system emitted 18.1 kHz and 19.0 kHz as the pair of high frequencies, and we repeated the test 100 times at one-minute intervals. Figure 8 shows the detection results for each smart device.

In Figure 8, i7 denotes the iPhone 7, i6 the iPhone 6, i6s the iPhone 6s, G7 the Galaxy S7, and G8 the Galaxy S8. The count is not the number of times the pair was detected; it is the number of times each smart device sent a signal to the server system after detecting the pair. Most smart devices detected the pair more than 95 times. The fourth device (i6) detected the pair 95 times, and the seventh device (G7) detected it 96 times. Both devices were in the inside pocket of the jacket, a position where we expected detection to be more difficult than elsewhere.

After this experiment, we developed a web-based server-side program that compares the pairs of high frequencies collected from the 10 smart devices with the generated pair, as illustrated in Figure 9.

In Figure 9, the server system that generates the high frequencies is configured through the HTTP GET method. In the Uniform Resource Locator (URL), hf1 and hf2 are the first and second high frequencies, and td is the k-value. For example, the URL parameters hf1=18.1&hf2=19&td=3 in Figure 9 set the first high frequency to 18.1 kHz, the second to 19.0 kHz, and k to 3.0 seconds. When the system manager clicks the “Send Signal” button to estimate user density, the connected speaker emits the pair of high frequencies. When the server system receives data from the smart devices, it compares the received values with the generated high frequencies; if they match, it displays the number of smart devices in the room as a number in a blue circle, as shown in Figure 9.
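A server receiving such a request would need to parse and validate the three parameters before driving the speaker. The sketch below uses Python’s `urllib.parse`; the host name and path in the example URL are hypothetical (only the hf1, hf2, and td parameters come from Figure 9), and the checks simply enforce the pair rules stated in Section 3. Frequencies are converted to integer hertz so that values such as 18.1 kHz compare exactly.

```python
from urllib.parse import parse_qs, urlsplit


def parse_signal_request(url, min_gap_hz=600):
    """Read hf1/hf2 (kHz) and td (the k-value, seconds) from a Figure 9 style URL.

    Returns (f1_hz, f2_hz, k_seconds) with frequencies as integer Hz.
    """
    query = parse_qs(urlsplit(url).query)
    try:
        hf1, hf2, td = (float(query[name][0]) for name in ("hf1", "hf2", "td"))
    except (KeyError, ValueError) as exc:
        raise ValueError(f"malformed signal request: {url}") from exc

    # Work in integer Hz so that 18.1 kHz compares exactly as 18100 Hz.
    f1_hz, f2_hz = int(round(hf1 * 1000)), int(round(hf2 * 1000))
    for f in (f1_hz, f2_hz):
        if not 18_000 <= f <= 22_000 or f % 100:
            raise ValueError(f"{f} Hz is not on the 100 Hz grid between 18 and 22 kHz")
    if abs(f2_hz - f1_hz) <= min_gap_hz:
        raise ValueError("the two frequencies must be more than 600 Hz apart")
    if td <= 0:
        raise ValueError("td (the k-value) must be positive")
    return f1_hz, f2_hz, td


# Hypothetical host and path; the query string matches the Figure 9 example.
print(parse_signal_request("http://server.example/hf.php?hf1=18.1&hf2=19&td=3"))
# (18100, 19000, 3.0)
```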

We conducted a second experiment with the same smart devices in the same place as the first experiment. The server system emitted the paired high frequencies 100 times in the laboratory, and the proposed application and server system achieved 95% accuracy. We suspect that the 5% error arose because a participant occasionally put his smart device in his pocket, where it could not detect the pair of high frequencies.

Next, we compared the proposed method with the existing hybrid CCTV & RFID method. We reproduced the conditions of Park’s crowd density estimation method for the subway. However, instead of separate RFID tags we used the built-in RFID function of the smart devices, and we used only 11 devices (5 Galaxy S7 and 6 Galaxy S8) because the iPhone 6, iPhone 6s, and iPhone 7 do not support the RFID function. The experiment lasted 10 minutes with measurements at one-minute intervals, and there were 11 participants in total. During the 10 minutes, participants entered and left the space at one-minute intervals so that 10 people were always in the closed space. The CCTV & RFID server system was built to the same specifications as the proposed system. We then conducted a further experiment with Wi-Fi Direct and the server system under the same conditions in the same space. A Wi-Fi Direct device must be able to act both as a P2P Client and as an AP, called the Group Owner (P2P GO), and all group members communicate with each other by exchanging messages relayed by the GO. We therefore used a D-Link N-150 USB adapter (DWA-125) connected to the server system as the P2P GO for Wi-Fi Direct data transmission. Finally, we tested the proposed application and server system under the same conditions in the same space as the two previous experiments. Figure 10 shows the user density results of the CCTV & RFID, Wi-Fi Direct, and proposed methods.

As shown in Figure 10, the CCTV & RFID method correctly detected 10 devices four times and erroneously detected 11 devices twice, at the 2- and 6-minute marks. At the 3-, 4-, 5-, and 9-minute marks, it undercounted, detecting only 9, 8, 9, and 9 devices, respectively. The Wi-Fi Direct method correctly detected 10 devices five times and erroneously detected 11 devices five times. The proposed method detected the correct 10 devices most often, missing accurate detection only twice, at the 3- and 8-minute marks. Accordingly, the CCTV & RFID method had a 95% accuracy rate and two erroneous detections, probably caused by a participant waiting near the entrance when leaving the space. The Wi-Fi Direct method also had a 95% accuracy rate but five erroneous detections, which we attribute to the wide coverage range of this approach (∼100 m) and to a participant who did not turn off the Wi-Fi Direct function when leaving. In contrast, the proposed method had a 98% accuracy rate and no erroneous detections. We think that the missing 2% resulted from a participant occasionally putting his smart device in his pocket, where it could not detect the pair of high frequencies. These results indicate that, in this setting, the proposed method is more accurate than both the CCTV & RFID and the Wi-Fi Direct methods.

In the next experiment, we performed user density estimation simultaneously in two separate spaces using different pairs of high frequencies and Wi-Fi Direct. Figure 11 shows the floor plan of the two spaces, which are located side by side. In Figure 11, both spaces contain the same furniture in nearly the same arrangement, and the door and the built-in speaker are in the same positions. We used the same server system and speaker as in the previous experiment. There were 20 participants in total, 10 in each space; they did not move between the spaces and stayed in their own space until the end of the experiment. We used 20 smart devices (iPhone 6, iPhone 6s, iPhone 8, Galaxy S7, and Galaxy S8), and the server system emitted the pairs of high frequencies for user density estimation at one-minute intervals for 10 minutes.

The pair of high frequencies for the left space was 18.1 kHz and 19.0 kHz, and the pair for the right space was 18.5 kHz and 19.2 kHz. Figure 12 shows the result of user density estimation using the proposed method.

Figure 12 shows a 90% accuracy rate for user density estimation in each space, that is, the count was correct in 9 of the 10 rounds: one device was missed at the 2-minute mark in the left space, and one device was missed at the 9-minute mark in the right space. We believe these misses occurred because a participant put his smart device in his pocket or covered its microphone. We then repeated the experiment in the same spaces under the same conditions, emitting the paired high frequencies 100 times during a span of 10 minutes in one-minute intervals. The result is shown in Figure 13.

Count average, on the left side of Figure 13, is the total number of reports that the server system received from the smart devices: the 10 smart devices in the left space sent 983 reports (98.3%), and the 10 smart devices in the right space sent 980 reports (98%). Density accuracy, on the right side of Figure 13, is the average amount of data that the server system received from the 10 devices at the same time. In Figure 13, the accuracy for the left room is 96%, because one participant kept his device in his pocket for approximately four minutes, during which it could not detect the pair of high frequencies; another participant also occasionally put his device in his pocket, but only for one or two minutes. Overall, in this experiment the proposed method achieved a 96.5% accuracy rate in two separate spaces using two pairs of high frequencies and the same server system. Based on these experiments, we believe that the proposed application and server system can be a useful technology for user density estimation in specific closed spaces such as small meeting rooms in a building, guest rooms in large hotels, and stores in outlet malls.

5. Conclusions

In this paper, we have proposed a new user density estimation system based on pairs of high frequencies emitted by a server system and received by the microphones of smart devices. In the experiments, the server system emitted pairs of high frequencies in specific closed spaces, and various smart devices detected the pairs and sent the frequency data to the server system. From these reports, the server system counted the smart devices located in the same space at the same time and thereby estimated user density in that closed space. The proposed application and server system can therefore be useful for estimating user density in closed spaces, and they could serve as a basic technology for measuring space usage frequency, compiling contents concentration statistics, and supporting location information for priority rescue to reduce harm to human life at accident sites. Furthermore, if the proposed method were deployed commercially, the application could be preinstalled on new devices and delivered to existing smartphones through firmware updates, and it could run in background mode with the user’s consent. The application would then require no manual launch and could work without user intervention whenever the device is switched on.

In future research, we will study user density estimation for four or more closed spaces in the same building and develop a server-side visualization of the density results from multiple closed spaces. We will also study how to improve the accuracy of the proposed application and server system. Because we used only 20 smart devices in these experiments, additional experiments with more than 100 smart devices and several separate spaces are needed to determine whether the proposed method works well under those conditions.

Data Availability

No data were used to support this study.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported in part by the Ministry of Education under the Basic Science Research Program (NRF-2016R1C1B2007930 and NRF-2019R1H1A1079104).