Abstract

The stochastic, Internet of Things (IoT)-based communication behavior of today’s world is having a tremendous impact on social networks. The growth of social networks helps to quantify this effect on the Social Internet of Things (SIoT). Repeated cooccurrence of two persons at several geographical locations in different time frames hints at a social connection between them. We investigate the extent to which social ties between people can be inferred by critically reviewing social networks. Our study uses anonymized caller data records (CDRs) from a Chinese telecommunication operator and two openly available location-based social network data sets, Brightkite and Gowalla. Our research identifies social ties from mobile communication data and further explores communication reasons based on geographical location. This paper presents an inference framework, based on a pipe and filter architecture, that predicts missing ties as suspicious social connections. It highlights secret relationships between users that do not appear in the observed data. The proposed framework consists of two major parts. First, users’ cooccurrence at a mutual location within a specific time frame is computed and inferred as social ties; the results are investigated with respect to the cooccurrence count, the gap time threshold values, and the mutual friend count values. Second, details about direct connections are collected and cross-related to the inferred results using Precision and Recall evaluation measures. In the later part of the research, we methodically examine the false-positive results by studying human cooccurrence patterns to identify hidden relationships using a social activity. The outcomes indicate that the proposed approach achieves comprehensive results that further support the theory of suspicious ties.

1. Introduction

A social network is a web of social ties among individuals. Social ties are one-to-one communication links among humans or among entities in the Social Internet of Things [1, 2]. The formation of social ties depends on many attributes, such as place of residence, personality, age, gender, workplace, and activities [3, 4]. These ties are built on particular needs or relationships. People use many mediums for communication, such as calls, chatting on social networking websites, reading and writing comments to one another, and reviewing and recommending purchases through mobile applications [5]. How humans and machines behave in, react to, and participate in social networks has remained a center of attention for researchers [2, 6]. Social network analysis is a field that quantifies, evaluates, and analyzes human behavior [7, 8].

The concept of social communication links was proposed by Granovetter [9]. According to his findings, the communication link between two people is considered a social tie. The strength of each link was further classified as strong or weak depending on the frequency of communication, the number of interactions, emotional attachment, the number of mutual ties, relationship actions, or a combination of these parameters [5]. Community structure was also found to be one of the main determinants of tie strength [11]: people from the same community have stronger ties than people from different communities [5]. Equation (1) quantifies the strength of a social tie as a weight, where a higher weight indicates a stronger tie and vice versa [10]. In it, $w_{ij}$ represents the weight of the social tie between node $i$ and node $j$, $k_i$ and $k_j$ represent the degrees of node $i$ and node $j$, and $m_{ij}$ is the number of mutual nodes between $i$ and $j$.
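The expression for Equation (1) does not survive in this version of the text. As an illustration only, the sketch below assumes the topological overlap form used in the weighted-network literature, $w_{ij} = m_{ij} / ((k_i - 1) + (k_j - 1) - m_{ij})$, which combines exactly the quantities defined above (the degrees $k_i$, $k_j$ and the number of mutual nodes $m_{ij}$); the exact expression in [10] may differ. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def overlap_weight(adj, i, j):
    """Assumed form of Equation (1): mutual neighbours of the tie (i, j)
    divided by all other neighbours of i and j (topological overlap)."""
    m_ij = len((adj[i] & adj[j]) - {i, j})   # number of mutual nodes
    k_i, k_j = len(adj[i]), len(adj[j])      # node degrees
    denom = (k_i - 1) + (k_j - 1) - m_ij     # excludes the tie (i, j) itself
    return m_ij / denom if denom > 0 else 0.0

adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")])
print(overlap_weight(adj, "A", "B"))  # 0.5: one mutual node (C) out of two others (C, D)
print(overlap_weight(adj, "B", "D"))  # 0.0: no mutual nodes
```

Under this assumed form, a tie whose endpoints share a larger fraction of their other contacts receives a higher weight, which matches the reading that a higher weight indicates a stronger tie.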

Various models and techniques have been developed to infer social networks from incomplete information [7, 12]. One category of such inference determines cooccurrence based on time and location. Despite the many measures proposed, precise and accurate inference remains difficult. In our research, we consider several threshold parameters to obtain more precise inferences. We also develop a framework that infers both existing social ties and hidden relationships in a social network.

Initially, we infer social ties among people by correlating their physical presence at several sites with their direct connections. We define a social connection between two individuals X and Y if they cooccur in a cell within a given time frame, i.e., X calls a person while connected to a base station, and within the same time frame Y calls a person from the same base station. We then count the number of cooccurrences of X and Y. First, we find social ties based on the number of direct calls between two people; to ensure a correct social connection, we require the count of direct calls to exceed a threshold. Second, we evaluate the social relationship between two people by counting the calls made by X and Y from the same base station in a specific time frame. Figure 1 gives an example of this quantification procedure. Each hexagon represents the area of a single base station. X and Y are together six times at various base stations, with varying gaps between their calls. In the first part of the research, we use the CDR data set provided by a telecommunication company and two openly available location-based social network data sets, i.e., Brightkite and Gowalla [13]. All data sets resemble the example in Figure 1. We count the number of cooccurrences for multiple gap time thresholds and mutual friend counts. Furthermore, we correlate the results with the social connections given by direct calls using Precision and Recall evaluation measures.
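The cooccurrence definition above can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the `Call` record, the use of minutes as the time unit, and the function name are assumptions, and the result is the plain (unnormalized) number of matching call pairs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    caller: str
    callee: str
    time_min: float     # call time stamp in minutes (assumed unit)
    base_station: str   # cell the caller was connected to

def naive_cooccurrence(calls, x, y, gap_min):
    """Count pairs (call by x, call by y) placed from the same base station
    within gap_min minutes of each other. No normalization is applied."""
    xs = [c for c in calls if c.caller == x]
    ys = [c for c in calls if c.caller == y]
    return sum(
        1
        for a in xs
        for b in ys
        if a.base_station == b.base_station and abs(a.time_min - b.time_min) <= gap_min
    )

calls = [
    Call("X", "P", 0, "B1"), Call("Y", "Q", 20, "B1"), Call("Y", "R", 45, "B1"),
    Call("X", "S", 100, "B2"), Call("Y", "T", 200, "B2"),
]
print(naive_cooccurrence(calls, "X", "Y", 30))   # 1
print(naive_cooccurrence(calls, "X", "Y", 60))   # 2
```

Widening the gap admits more pairs. The second result also shows one call by X being matched to two calls by Y, the kind of overcounting that the normalization in Section 3.3 is meant to address.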

In the second phase of the research, we explore the false-positive results produced by the CDR-based social tie inference model. We define a missing tie between two people as a suspicious tie if they have no direct calls but are found together numerous times and share a certain number of mutual friends. In the literature, missing ties are classified as either nonresponse or absent ties [12]. A tie is considered nonresponse when an actor gives no information about it [14], whereas an absent tie is one for which the actor gives no indication that the tie exists. A survey was conducted to record the liking patterns among boys and girls; the data were binary, with 1 representing a tie and 0 representing no tie. Figure 2 shows a visual representation (block modeling) of the adjacency matrix built from the survey data [14]. In Figure 2, green-filled slots represent an existing tie, regardless of its strength, while white slots indicate either absent or nonresponse ties. In our research, we explore a subset of missing ties and classify them as suspect ties. To support this concept, we conduct a social activity and a simulation that generate data sets with the same structure as the CDR data. We then correlate the false-positive results of the CDR-based social tie inference model with the activity and simulation results.

The contributions of this paper are as follows:
(1) We developed an inference model and a classifier that identify location-based social ties. The inference model is tested on the CDR-based social network, Brightkite, and Gowalla using Precision and Recall measures.
(2) We identify a class of suspect ties by examining the false-positive results of the social tie inference model.
(3) We conducted an activity-based survey and a simulation that demonstrate and evaluate suspect ties.

The rest of this article is organized as follows: Section 2 reviews the literature. Section 3 describes the data sets, the evaluation measures, and the cooccurrence count normalization. Section 4 presents the inference algorithm and the social tie inference results. Section 5 describes the concept of suspicious ties and its formulation. Section 6 presents the proposed suspect inference framework with its algorithm, followed by the results and analysis. Finally, Section 7 concludes the article.

2. Literature Review

A physical-world social network is represented as a graph in which nodes are people and edges are the social ties between them [15]. In the literature, the edge weight represents the strength of a particular social tie [10, 16]. A social network such as Twitter forms a directed graph: for example, a fan follows a celebrity, but the celebrity rarely follows back. Directed graphs are used to investigate influence networks and the most influential people [17–19]. Recommendation and targeted marketing are among the essential objectives of exploring social ties [20, 21]. A theme-based model adopts dynamic programming to explore critical factors (themes) [22]. The coupling of social ties and the prediction of user mobility have been studied through physical and network properties (geosocial properties) [23, 24]. An effective prediction technique was proposed to find the typical patterns of two users by comparing their check-in details [24, 25]. Area significance is measured with a weight-assigning method that incorporates two users’ cooccurrence in a specific area, so that some areas count as more significant than ordinary places [26]. Scoring mechanisms help in categorizing and labeling social ties [10]. Inference about any social network is incomplete if associated features are neglected. The basic unit of any social network is the single social connection between two people. In the state of the art, social ties are generally categorized as (1) strong ties, (2) weak ties, and (3) absent ties [9, 17], whereas the strength of a tie depends on (a) the amount of time, (b) emotional intensity, (c) intimacy, and (d) social distance [3, 10]. The repeated presence of two individuals at a specific geographical location within a limited time also suggests a social connection [7], and the strength of the tie grows with the frequency of such cooccurrence events.

IoT has emerged as one of the most powerful and impressive technological research domains [2]. IoT presents a novel connectivity concept in which machines, through actuators and sensors, can collaborate with humans as equals [27, 28]. One study forecasts that the market for smart devices such as electronic medical kits and smart watches will reach USD 160 billion by the end of 2026 [28]. The communication network between smart devices and humans forms the Social Internet of Things (SIoT) and opens up new research challenges. Data scalability, velocity, and variety are among the emerging issues in SIoT [29]. Understanding social ties among human-to-human, human-to-machine, and machine-to-machine pairs helps to quantify network performance issues [30].

Social ties are the backbone of any social network [31]. The formation and dissolution of social ties affect communities in a network [32]. Besides tie strength, factors such as location, emotion, situation, age, gender, religion, and personality have a substantial impact on social connections [10, 33]. Granovetter highlighted the strong link between weak social ties and finding jobs [34]. In the literature, the data sources commonly used for social analysis are call logs [35, 36], emails, and social networking websites [5]. Extensive challenges associated with integrating visible and invisible networks have also been highlighted [37]. Investigating criminal social networks from limited clues is one of the emerging research areas of social network analysis (SNA) [38, 39].

Cooccurring events are often linked by hidden or visible associated parameters, and social network analysis is performed to uncover such knowledge. In the physical world, social network analysis is used in job searching [4], studying the psychology of urban life, investigating guilt by association [12], finding communities [40], tracking the spread of news [41], and identifying influential networks [18, 42]. In the current era of information technology, massive logs are generated for each person, e.g., call records, bank transactions, online purchase records, daily emails, and CCTV footage [7, 43, 44]. Compared with physical-world observation alone, these sources can sharpen the accuracy of results by exposing such associated features. Despite these numerous data sources, there is no optimal procedure to quantify stochastic human behavior and the evolution of social networks [45].

A grouping method identifies hidden social groups and further explores friend circles and social foci even under strict privacy settings [46]. Another study explores hidden social ties using respondent sampling [47]. In the literature, "hidden" refers to a population that is hard to access, i.e., a population that tries to hide from the social network [47]. In our research, suspect ties are connections between actors who are present and accessible in the social network but who try to hide those connections. The second part of our research therefore explores suspicious ties within the existing network rather than a hidden population.

3. Data Set Characteristics and Evaluation Measure

3.1. Data Set Descriptions

In our research, we incorporated three large location-based data sets, i.e., CDR, Brightkite, and Gowalla [13]. The large CDR data set used in this study was provided by a Chinese mobile telecommunication operator. It covers 202,000 subscribers along with their demographic information. The calling records span six months (June 2014 to December 2014) and contain 221,451,169 records for these 202,000 subscribers. Each record of the data set is represented in the following format.

Brightkite and Gowalla are openly available location-based social network data sets [13, 48]. Both were gathered from online social networking websites that maintain user check-in data by fetching the GPS location of users’ mobile devices. These services help people find nearby users and build social connections. Brightkite contains 58,228 nodes and 214,078 edges, and Gowalla contains 196,591 nodes and 950,327 edges. Besides the check-in data, both data sets also contain direct social tie (friendship) data.

3.2. Abbreviations and Evaluation Measures

Figure 3 shows an example social network containing suspect actors and their hidden ties. Actors who share several mutual friends but have no direct connection may have a secret connection, which helps in identifying them as a suspect tie. A social network evolves, and new connections expand its scope. A social network can be viewed as a combination of multiple social networks involving different individuals [31]. A social network can be sliced by start and end times; for example, a network that developed over one year can be divided into monthly subnetworks [49, 50]. Social network slicing helps our research explore the missing ties in friend-of-a-friend relationships.
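Slicing by time and then looking for friend-of-a-friend gaps can be sketched as below. The time-stamped edge list, the monthly granularity, and both function names are illustrative assumptions; within one slice, the sketch simply lists pairs that share a neighbor but have no edge between them.

```python
from collections import defaultdict
from datetime import datetime

def slice_by_month(timed_edges):
    """Group (u, v, timestamp) edges into monthly sub-networks keyed by (year, month)."""
    slices = defaultdict(set)
    for u, v, ts in timed_edges:
        slices[(ts.year, ts.month)].add(frozenset((u, v)))
    return dict(slices)

def missing_fof_ties(edges):
    """Pairs that share at least one neighbour but are not directly connected."""
    adj = defaultdict(set)
    for edge in edges:
        u, v = tuple(edge)          # assumes no self-loops
        adj[u].add(v)
        adj[v].add(u)
    missing = set()
    for nbrs in list(adj.values()):
        for a in nbrs:
            for b in nbrs:
                if a < b and b not in adj[a]:
                    missing.add((a, b))
    return missing

edges = [
    ("A", "B", datetime(2014, 6, 3)),
    ("B", "C", datetime(2014, 6, 20)),
    ("A", "D", datetime(2014, 7, 1)),
]
monthly = slice_by_month(edges)
print(sorted(monthly))                        # [(2014, 6), (2014, 7)]
print(missing_fof_ties(monthly[(2014, 6)]))   # {('A', 'C')}
```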

The following abbreviations are used to quantify Precision and Recall and appear in several parts of the paper:
(1) Calling records: the actual number of direct calls between two users. This value is counted to identify the social tie between two users.
(2) Time cooccurrence: the presence of two users in the range of a common base station. We count a cooccurrence when two users were connected to a common base station and each called any other user.
(3) Time-frame gap value (GV): the time interval between two users’ calls while connected to a specific base station. For example, if user X calls someone at time $t_X$ and user Y calls someone else at time $t_Y$, the gap between the calls is $|t_X - t_Y|$. To quantify the results and evaluate Precision and Recall, we experimented with gap values of 30 minutes, 1 hour, 2 hours, 6 hours, 12 hours, and 24 hours.
(4) Threshold for direct calls (SNK): the set of threshold values for assessing direct calls between two users. We evaluated Precision and Recall curves for the threshold values 2, 5, 10, and 15.
(5) Threshold for cooccurrence: the set of threshold values for two users’ presence in a specific base station. We tested performance for threshold values ranging from 1 to 40.
(6) Threshold for mutual friends (M): the set of threshold values for the number of mutual friends of two users. We tested performance for threshold values ranging from 1 to 100.
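The Precision and Recall sweep over a threshold can be sketched as follows. It treats the direct ties as ground truth and the pairs whose cooccurrence count reaches the threshold as inferred ties; the function names, the toy counts, and the convention of reporting 0.0 when a set is empty are assumptions.

```python
def precision_recall(inferred, actual):
    """Precision = TP / |inferred|, Recall = TP / |actual| over unordered pairs."""
    inferred = {frozenset(p) for p in inferred}
    actual = {frozenset(p) for p in actual}
    tp = len(inferred & actual)
    precision = tp / len(inferred) if inferred else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall

def sweep(counts, actual, thresholds):
    """Infer a tie when a pair's cooccurrence count reaches each threshold."""
    return {
        t: precision_recall([p for p, c in counts.items() if c >= t], actual)
        for t in thresholds
    }

counts = {("A", "B"): 12, ("A", "C"): 3, ("B", "C"): 25, ("C", "D"): 8}
direct = {("A", "B"), ("B", "C"), ("B", "D")}
for t, (p, r) in sweep(counts, direct, [5, 10, 30]).items():
    print(t, round(p, 2), round(r, 2))
```

In this toy case, raising the threshold from 5 to 10 removes a false positive (C, D) and lifts Precision without lowering Recall, while a threshold of 30 leaves no inferred ties at all.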

3.3. Cooccurrence Count Normalization Measure

The cooccurrence count value indicates the presence of two users in the region of one base station. An issue related to counting is explained and resolved using an example for two users X and Y, shown in Figure 4. We count a cooccurrence when two users were connected to a common base station and each called any other user within a specific time frame. Figure 4 shows the call log details of users X and Y gathered in one such time frame.

Let and , thenwhere , ,

In Figure 4, matching each call to the closest call of the other user gives a count value CV of 2. However, such counting may lead to a wrong inference: if one person calls once and another person calls n times within the same time frame, the count value becomes n, which overstates how often the two were together. To resolve this issue, we propose a normalization equation that periodically decreases the count value, and we introduce β (beta) as a periodic normalizing factor.

Let $X = \{x_1, x_2, \dots, x_m\}$ denote the set of calls by user X and $Y = \{y_1, y_2, \dots, y_n\}$ denote the set of calls by user Y.

According to the example stated in Figure 4, we assumed for set

For the first match, β = 1.

For the second match values of  = .

Likewise, for the nth match value of  = ,

In equation (9), one term refers to the total number of calls made by user X to Y, and the other refers to the total number of calls made by user Y to X. Equation (9) yields an intermediate value for CV, i.e., 4.33, instead of the maximum of 6 or the minimum of 3.
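Equations (6) to (9) are not legible in this version, so the sketch below is only one reading that reproduces the stated numbers. It assumes that the nth call of one user matched to the same call of the other user is weighted by β = 1/n, and that six matching calls are distributed 3, 2, and 1 across the other user’s three calls. The decay, the call times, and the function names are all hypothetical.

```python
from fractions import Fraction

def matches_per_call(x_times, y_times, gap):
    """For each call of X, how many calls of Y fall within `gap` of it."""
    return [sum(1 for ty in y_times if abs(tx - ty) <= gap) for tx in x_times]

def normalized_count(matches):
    """Weight the 1st, 2nd, ..., nth match of a call by 1, 1/2, ..., 1/n (assumed beta)."""
    total = Fraction(0)
    for n in matches:                       # n = number of matches for one call
        total += sum(Fraction(1, k) for k in range(1, n + 1))
    return total

x_times = [0, 100, 200]                     # three calls by X (hypothetical)
y_times = [-5, 5, 10, 95, 105, 200]         # six calls by Y (hypothetical)
m = matches_per_call(x_times, y_times, gap=10)
print(m)                                    # [3, 2, 1]
print(sum(m), len(m))                       # 6 3  (naive maximum and minimum)
print(round(float(normalized_count(m)), 2)) # 4.33
```

Under these assumptions the count lands between the naive maximum (every match counted fully) and the minimum (one match per call), as the text describes.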

4. Social Tie Inference

We first investigated the direct social ties formed by the CDR data set and compared them with the indirect social ties formed by common location, using Algorithm 1. By direct ties, we mean calls or other direct connections; for example, person A calling person B constitutes a direct tie between A and B. Algorithm 1 takes the time gap threshold GV, the direct call threshold SNK, and the CDR data set (social network SN) as inputs. The algorithm has two parts: first, it finds the direct ties between two individuals depending on the threshold value; second, it counts the cooccurrences of two individuals based on several parameters. The Calculate Cooccurrence Count() function finds the number of cooccurrences using equation (9), explained in the previous section. The Infer Social Ties() function finds the social connections depending on CTK, DT, and SNK and infers them as social ties.

Require:
  SN = Social Network
  GV = Time Gap Threshold Value
  SNK = Direct Call Threshold Value (2, 5, 10, 15)
Ensure:
  DT = Direct Social Ties
  IST = Inferred Social Ties
(1) while unprocessed user pairs remain in SN do
(2)   DT = Direct Ties (SNK)
(3) end while
(4) while unprocessed user pairs remain do
(5)   CTK = Calculate Cooccurrence Count (DT, GV)
(6)   IST = Infer Social Ties (CTK, DT, SNK)
(7) end while

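Algorithm 1 can be read as the following runnable sketch. It is a simplified illustration under stated assumptions: calls are (caller, callee, time, base station) tuples, cooccurrences are counted naively rather than with equation (9), and `min_cooccur` is a hypothetical parameter standing in for the cooccurrence threshold applied to CTK.

```python
from collections import Counter, defaultdict
from itertools import combinations

def direct_ties(calls, snk):
    """DT: unordered pairs with at least `snk` direct calls in either direction."""
    counts = Counter(frozenset((a, b)) for a, b, _t, _s in calls if a != b)
    return {pair for pair, n in counts.items() if n >= snk}

def cooccurrence_counts(calls, gv):
    """Naive CTK: pairs of calls by two different callers placed from the same
    base station within `gv` time units of each other."""
    by_station = defaultdict(list)
    for caller, _callee, t, station in calls:
        by_station[station].append((caller, t))
    counts = Counter()
    for events in by_station.values():
        for (u, tu), (v, tv) in combinations(events, 2):
            if u != v and abs(tu - tv) <= gv:
                counts[frozenset((u, v))] += 1
    return counts

def infer_social_ties(calls, gv, snk, min_cooccur):
    """Return (DT, IST): direct ties and location-inferred ties."""
    dt = direct_ties(calls, snk)
    ctk = cooccurrence_counts(calls, gv)
    ist = {pair for pair, n in ctk.items() if n >= min_cooccur}
    return dt, ist

calls = [
    ("A", "B", 0, "S1"), ("A", "B", 5, "S1"), ("B", "A", 50, "S2"),  # direct A-B calls
    ("A", "X", 100, "S1"), ("B", "Y", 110, "S1"),                    # together at S1
    ("A", "X", 300, "S2"), ("B", "Y", 320, "S2"),                    # together at S2
    ("C", "Z", 400, "S3"), ("D", "Z", 900, "S3"),                    # too far apart
]
dt, ist = infer_social_ties(calls, gv=30, snk=3, min_cooccur=2)
print(dt == ist == {frozenset({"A", "B"})})   # True
```

Comparing DT (treated as ground truth) with IST under Precision and Recall corresponds to the evaluation reported in Section 4.1.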
4.1. CDR-Based Social Tie Inference

A social tie is inferred between two persons if they are found together at several sites numerous times. The inference algorithm produces two sets of results, i.e., direct social ties and inferred social ties. For cross-validation, we correlate the direct tie results with the inferred ones and examine them using Precision and Recall. We tested all records against three parameters: SNK, the threshold on direct calls; CTK, the number of cooccurrences; and GV, the time-frame gap value. Because the direct call count reflects the degree of friendship, a higher SNK indicates a stronger friendship. Figure 5 shows the Precision results in four panels, Figures 5(a)–5(d); the whole data set is examined for each value of SNK across the values of CTK and GV.

In Figure 5(a), SNK is 15, which selects users with 15 or more direct calls between each other. CTK is the number of cooccurrences of two users. Precision is comparatively low for CTK between 0 and 10, whereas it rises sharply for CTK between 10 and 30. A higher CTK indicates more frequent cooccurrence of two users, and a positive correlation can be observed between CTK and Precision. This suggests that cooccurrence is a significant attribute that helps identify social ties. Each graph in Figure 5 has six lines, one per time gap value. The 30-minute gap line shows higher Precision than the 1-, 2-, 6-, 12-, and 24-hour lines, which suggests that tie strength has a specific effect on Precision: users with strong social connections are, most of the time, found together in certain areas. This pattern is clearly visible for CTK between 20 and 40 with GV = 30 minutes. A further positive correlation is found between the degree of friendship and physical presence at a specific place.

To see the effect of friendship strength, we evaluated results for four values of SNK, i.e., 2, 5, 10, and 15. A common pattern appears in all graphs in Figure 5: Precision is low for people whose mutual presence at different sites is low. It also appears that people with strong social ties spend less than one hour together at a specific location. Reconciling these results with the actual direct social ties, the positive correlation suggests that people with strong social connections often visit places together.

Figure 6(a) shows the Recall results. We evaluated Recall using the same parameters as Precision, i.e., direct calls, cooccurrence, and gap time frame. Figure 6(a) shows that Recall is highest when fewer common places are considered. An inverse trend is observed between Recall and CTK, specifically for CTK between 0 and 3. As with Precision, Recall is evaluated for six different time gap values.

This part of the research finds people’s cooccurrence based on connectivity to the same base station within a specific time frame and infers social ties from it. It then cross-validates the inferred ties against the direct call results.

4.2. Brightkite- and Gowalla-Based Social Tie Inference

The Brightkite and Gowalla data sets contain direct social ties as well as the check-in information of each user. We investigated both data sets along several dimensions and found several notable patterns. In particular, we observed positive correlations between mutual friend count values and user check-in details: a mutual friend count M ranging from 6 to 40 in Brightkite and from 25 to 90 in Gowalla shows a positive correlation with Precision. We also measured the effect of the gap time value on Precision and Recall, evaluating results for several gap time values.

The line graphs in Figures 7(a) and 7(c) show the relation between mutual friend count values and Precision, while Figures 7(b) and 7(d) show the relation between mutual friend count values and Recall. The results in Figure 7 show that people with a certain number of mutual friends tend to visit places together or within a short interval of each other.

The social tie inference framework infers some absent ties as social ties, which appear as false-positive results. To extract the actual meaning of such apparently incorrect inferences, we conducted a social activity and a simulation. The false-positive results of the first part of the research thus serve as the foundation for the second part: an activity modeled on the first-part data set was conducted, and the false-positive results were examined by studying human cooccurrence patterns, as described in the next section, Suspicious Ties. This stage of the research pointed us toward a further category of missing ties.

5. Suspicious Ties

An absent tie can be inferred as a suspect tie if it satisfies the following properties:
(1) M, the number of mutual friends, meets the mutual friend threshold.
(2) C, the number of cooccurrences at different sites, meets the cooccurrence threshold.
In the CDR data set, each cell is treated as a single base station cell. Our model adopts the following four features for evaluation:
(1) Base station
(2) User ID
(3) Gap time threshold
(4) Call time stamp
The following mathematical model explains the problem and its formulation. Let $B = \{b_1, b_2, \dots, b_p\}$ denote a set of points, where each $b_i$ is a distinct base station cell. Let $U = \{u_1, u_2, \dots, u_q\}$ denote a set of points, where each $u_i$ is a distinct user. Let $T$ denote a set of points, where each element is a threshold value for the time frame. Let $K$ denote a set of points, where each element is the call information of one call. From these sets, the model derives a set $S$, whose elements identify distinct callers based on connectivity to the same base station and a definite number of calls in a specific time frame.
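The two properties can be checked with a short predicate. In this sketch the social graph, the precomputed cooccurrence counts, and the thresholds `m_min` and `c_min` are hypothetical inputs; a pair qualifies only if it has no direct edge, at least `m_min` mutual friends, and at least `c_min` cooccurrences.

```python
from collections import defaultdict

def suspect_ties(edges, cooccur, m_min, c_min):
    """Absent ties (no direct edge) with >= m_min mutual friends and
    >= c_min cooccurrences at different sites."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    suspects = set()
    for pair, c in cooccur.items():
        u, v = tuple(pair)
        if v in adj[u]:
            continue                      # a direct tie exists, so not a missing tie
        mutual = len(adj[u] & adj[v])     # M: number of mutual friends
        if mutual >= m_min and c >= c_min:
            suspects.add(pair)
    return suspects

edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("A", "E")]
cooccur = {
    frozenset({"A", "B"}): 6,   # not connected, two mutual friends (C, D)
    frozenset({"A", "C"}): 9,   # directly connected
    frozenset({"C", "D"}): 1,   # two mutual friends, but rarely together
    frozenset({"B", "E"}): 5,   # together often, but no mutual friends
}
print([sorted(p) for p in suspect_ties(edges, cooccur, m_min=2, c_min=5)])   # [['A', 'B']]
```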

6. Suspect Inference Framework

We studied the pattern of exceptional cases in the false-positive set and defined a subset of the false-positive set as suspect ties. A physical activity was designed and conducted to investigate the formation of suspect social ties. The activity involved 50 people on a basketball court, and a data set was generated in about 4–5 hours. Nine circles were drawn on the court, each representing a base station cell, and nine of the 50 people were directed to act as base stations. The boundary of each circle was considered the range of that base station cell. The remaining 41 people were directed to perform the following two steps.

6.1. Selection Step

Initially, each of the 41 people was asked to choose two sets of friends: a set of obvious friends and a set of hidden friends, such that the hidden friend set is at most one-fifth the size of the obvious friend set. For example, a person with five obvious friends could have no more than one hidden friend. After each individual selected both sets, the information was shared with one of our representatives.

6.2. Operation Step

In this phase, each person was directed to follow these rules:
(1) You should not call your hidden friend.
(2) You should call each of your obvious friends at least once.
(3) You should make the most calls to your closest obvious friend, the second most to your second-closest obvious friend, and so on down to the least close friend.
(4) You must try to meet your hidden friend physically as often as possible.
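The simulation described later in this section followed the same rules. The sketch below is one possible generator, not the authors’ code: the number of obvious friends, the number of rounds, the linear call-count schedule, and the "walk to a free hidden friend" placement are all assumptions. It emits records with the five fields recorded in the activity.

```python
import random

def simulate_activity(people, stations, n_obvious=5, rounds=3, seed=0):
    """Return (obvious, hidden, records); each record is
    (caller, callee, time, from_station, to_station)."""
    rng = random.Random(seed)
    obvious, hidden = {}, {}
    for p in people:
        others = [q for q in people if q != p]
        obvious[p] = rng.sample(others, n_obvious)          # index 0 = closest friend
        rest = [q for q in others if q not in obvious[p]]
        hidden[p] = rng.sample(rest, n_obvious // 5)         # at most 1/5 of obvious set
    records, t = [], 0
    for _ in range(rounds):
        location = {}
        order = list(people)
        rng.shuffle(order)
        for p in order:
            if p not in location:
                location[p] = rng.choice(stations)           # walk to a random cell
            for q in hidden[p]:
                location.setdefault(q, location[p])          # Rule 4: meet a free hidden friend
        for p in people:
            # Rules 2 and 3: every obvious friend is called, closer friends more often.
            for rank, friend in enumerate(obvious[p]):
                for _ in range(n_obvious - rank):
                    t += 1
                    records.append((p, friend, t, location[p], location[friend]))
            # Rule 1: hidden friends are never called.
    return obvious, hidden, records

people = [f"P{i}" for i in range(41)]
stations = [f"B{i}" for i in range(1, 10)]
obvious, hidden, records = simulate_activity(people, stations)
print(len(records))   # 41 people x 3 rounds x (5 + 4 + 3 + 2 + 1) calls = 1845
```

Because friend selection and placement are random, the calling and meeting patterns are uniform rather than shaped by human hiding behavior, which is consistent with the difference between simulation and activity results discussed in Section 6.3.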

To place a call, for example from person A to person B at base station B1, person A had to go to base station B1 and register the call with the person acting as, and standing in, base station B1. That person then recorded an entry with five parameters, i.e., Caller Name, Callee Name, Time, From Base Station Name, and To Base Station Name. The data set was gathered in the following format. An example is given below.

To understand the variations and patterns, the same activity was also designed as a simulation. The simulation followed the same conditions, and another data set was generated using a random function. Based on the activity, a framework was designed to separate a class of suspect ties. The proposed inference framework is designed and implemented using a pipe and filter architecture, shown in Figure 8, and Algorithm 2 shows its implementation. The framework takes the social network matrix, the cooccurrence count threshold, the gap time value, and the mutual friend count as inputs and filters the results. First, the Calculate Levels() function finds five levels of depth for each distinct user: if A calls B, B calls C, C calls D, and D calls E, then A, B, C, D, and E are at levels 0, 1, 2, 3, and 4, respectively. This step ensures that all levels contain distinct users. Second, the Find Suspects() function selects only those pairs of users from level 1 and level 3 that have no direct calls between them but share mutual friends. Next, the Calculate Subsocial Network() function generates subgraphs from the level 1 and level 3 details depending on the gap time value; results are filtered on the basis of the gap time value, i.e., two users must have called some other user while connected to the same base station within the given time frame, as explained in the previous section. Finally, the Infer Suspect Ties() function applies the normalization method of equation (9) to compute the cooccurrence count, and all results are filtered by the cooccurrence count threshold CV and the mutual friend threshold M. Based on these parameters and thresholds, Algorithm 2 identifies a subclass of missing ties as suspicious ties.
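The level computation and the level 1 / level 3 pairing described above can be sketched as a breadth-first search over the call graph from one root user. This is an illustrative reading of the description, not the authors’ implementation: the function names and the choice of a single root are assumptions, and the mutual friend check is omitted here.

```python
from collections import defaultdict, deque

def calculate_levels(call_edges, root, max_level=4):
    """BFS levels 0..max_level from `root` over directed (caller, callee) edges.
    Each user keeps the first (lowest) level at which it is reached."""
    out = defaultdict(set)
    for caller, callee in call_edges:
        out[caller].add(callee)
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if level[u] == max_level:
            continue
        for v in sorted(out[u]):
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level

def uncalled_level_pairs(level, call_edges, a=1, b=3):
    """Pairs (level a user, level b user) with no direct call in either direction."""
    called = {frozenset(e) for e in call_edges}
    first = sorted(u for u, lv in level.items() if lv == a)
    second = sorted(u for u, lv in level.items() if lv == b)
    return [(u, v) for u in first for v in second if frozenset((u, v)) not in called]

edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
levels = calculate_levels(edges, "A")
print(levels)                               # {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4}
print(uncalled_level_pairs(levels, edges))  # [('B', 'D')]
```

Users beyond level 4 (F here) are not assigned a level, which mirrors the five-level limit in the text.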

Require:
   SN = Social Network
   GV = Time Gap Threshold Value
   CV = Cooccurrence Count Threshold Value
   M = Mutual Friend Count
Ensure:
   ST = Inferred Social Suspect Ties
(1) while unprocessed users remain in SN do
(2)   Levels [5, n] = Calculate Levels ()
(3)   Suspects = Find Suspects (Level 1, Level 3)
(4)   SG = Calculate Subsocial Network (Suspects, GV)
(5)   ST = Infer Suspect Ties (SG, CV, M) using equation (9)
(6) end while
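The pipe and filter structure behind Algorithm 2 can be illustrated by chaining generator-based filters, each consuming the stream produced by the previous one. The filter names, the toy scores, and the order of stages are hypothetical; the point is only that each threshold (no direct call, mutual friends M, cooccurrence count CV) lives in its own independent filter.

```python
from functools import reduce

def pipeline(*filters):
    """Compose filters left to right: each receives the previous filter's output stream."""
    return lambda pairs: reduce(lambda stream, f: f(stream), filters, iter(pairs))

def no_direct_call(direct):
    """Filter: drop pairs that already have a direct tie."""
    return lambda pairs: (p for p in pairs if frozenset(p) not in direct)

def at_least(score, threshold):
    """Filter: keep pairs whose score (e.g. mutual friends, cooccurrences) reaches threshold."""
    return lambda pairs: (p for p in pairs if score.get(frozenset(p), 0) >= threshold)

direct = {frozenset(("A", "C"))}
mutual = {frozenset(("A", "B")): 3, frozenset(("A", "C")): 4, frozenset(("B", "D")): 1}
cooc = {frozenset(("A", "B")): 7, frozenset(("A", "C")): 9, frozenset(("B", "D")): 8}

find_suspects = pipeline(no_direct_call(direct), at_least(mutual, 2), at_least(cooc, 5))
print(list(find_suspects([("A", "B"), ("A", "C"), ("B", "D")])))   # [('A', 'B')]
```

Because each stage is lazy, candidate pairs stream through without intermediate lists, and one stage (for example, the counting method behind the cooccurrence scores) can be replaced without touching the others.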

The results of the activity and the simulation are computed and evaluated using Precision and Recall, and the evaluation results are shown in Table 1. Precision, Recall, and F1 Score are used to evaluate the framework under different settings of the cooccurrence count CV, the mutual friend count M, and the gap time GV. The F1 Score is calculated using equation (5). Definitions of the related parameters are given as follows.
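Equation (5) is not reproduced in this section. Assuming it follows the standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN), the three measures are:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```

For example, Precision = 0.5 and Recall = 1.0 give F1 = 1.0 / 1.5 ≈ 0.67, so a low Precision pulls F1 down even when Recall is perfect.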

6.3. Results and Discussion

We tested and evaluated all records based on the cooccurrence count threshold CV, the mutual friend threshold M, and the gap time value GV. Figures 9(a)–9(d) show the evaluation results for the activity for three values of the cooccurrence count, i.e., CV = 5, 10, and 15, and two values of the mutual friend threshold M. Likewise, Figures 10(a)–10(d) show the evaluation results for the simulation data set for the same CV and M values. Results were generated for four gap time values, i.e., 30, 30, 20, and 10 minutes.

In Figures 9(a) and 9(b), Recall is at its maximum and Precision is low for GV = 30, CV = 5, and M = 2: the system retrieves the largest number of relevant hidden ties, but also many false positives. These settings mean that the gap between the two calls is 30 minutes or more, the count of cooccurring events is at least five, and the mutual friend count is two or less. The system’s performance drops when the gap time is reduced to 30, 20, and 10 minutes; although the number of true positives falls, the number of false positives drops markedly. Results become more concise as CV and M vary. The limited data set collected through the activity highlights the occurrence of hidden ties. The whole activity and simulation were designed to produce the same data fields as the CDR data set so that the proposed framework is compatible with it.

The results in Figures 9 and 10 exhibit the existence of hidden relationships. Parameters such as the gap time value GV help identify which time frame gives optimal performance. Figures 10(a) to 10(d) show the Precision and Recall curves for the simulation data set, obtained with the same threshold values of GV, CV, and M. Figures 10(a) and 10(d) give the lowest Precision and Recall, which indicates that tightening the thresholds reduces the number of relevant results retrieved. Across all results, Precision and Recall for both the simulation and activity data sets peak at one particular setting of GV, CV, and M. During the evaluation, we found notable differences between the simulation and activity results: the simulation does not achieve Recall and Precision as high as the activity data. We attribute this to the simulation data being generated by a random function, whereas the activity data reflect human hiding patterns; these results may also help to infer the psychology behind building hidden ties. The simulation data set generator works under the same constraints as the rules of the activity. The key difference between the simulation and the activity lies in the selection of friends and hidden friends and in the calling pattern. Because the simulation makes these selections at random, only slight variation was observed in its results. According to our findings, both the simulation and the activity exhibit patterns of hidden ties, but the activity results are more pronounced and identify hidden relationships more clearly.

It is essential to highlight some critical questions and dependent variables that help to find hidden social ties between two people. For example, why do two people hide their social ties? Is this a deliberate or an unintentional action? What if they are deliberately hiding their social tie for a purpose? In such a scenario, extracting the social tie between two people is a difficult problem. It is essentially an investigative process that searches for clues to establish a relationship between two people. Investigation suggests that two people posting a picture on social media are more cautious about not revealing their social tie, whereas during private activities they are less careful; for example, they are more guarded when posting a picture on Facebook or Flickr than when calling each other from a specific location. Extracting hidden social ties does, however, raise various privacy issues. We designed an activity and a simulation that generate data sets following the structure of the caller data record (CDR) data set. We thoroughly investigated the patterns of connectivity and established a framework to infer social ties. This research opens a new direction for further exploring connectivity in the Social Internet of Things (SIoT), specifically direct machine-to-machine communication and hidden machine-to-human or human-to-machine relationships. In many cases, several machines work together yet rarely have a direct connection.

7. Conclusion and Future Work

In this research, we have examined the development of social connection patterns based on physical gatherings. In the first part of our research, we explored the correlation between direct communication and the physical presence of two individuals. To evaluate the system's performance, we assessed all results using the Precision and Recall evaluation measures. We also presented a periodic normalization equation for the cooccurrence count. In the second phase, we proposed the suspect tie inference framework; the false-positive results of the first part of the research form the basis of this second part. The proposed framework adopts a pipe-and-filter architecture in which each filter is controlled by a threshold value. Its fundamental objective is to take a data set such as a CDR (caller data record) data set and infer suspect social ties based on the specified threshold values. By critically analyzing the results, we propose a theory that identifies suspect social ties. In addition, for comparison and evaluation, we conducted a real-time human-based activity and a simulation; the whole activity was designed and evaluated with the structure of the actual CDR data set in mind. In contrast to existing work, our research focuses on hidden ties rather than hidden actors. In future work, we aim to explore the homophilic nature of suspect ties within the Social Internet of Things (SIoT).
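The pipe-and-filter design can be sketched as a chain of threshold-controlled filters, each consuming the stream of candidate pairs produced by the previous one. In this minimal Python sketch, the `CandidatePair` fields, the filter functions, and the toy records are all hypothetical. The threshold values (at least five cooccurrences, a gap of 30 min or more, at most two mutual friends) follow the setting discussed for Figure 9, and treating the gap time as a per-pair minimum gap in minutes is an assumption. A final filter keeps only pairs with no direct contact, so the surviving pairs are the suspect ties.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class CandidatePair:
    """Hypothetical per-pair summary derived from CDR-like records."""
    user_a: str
    user_b: str
    cooccurrences: int      # same location within the same time frame
    min_gap_minutes: float  # ASSUMPTION: one reading of the "gap time" parameter
    mutual_friends: int
    direct_contact: bool    # True if a direct call between the two exists


Filter = Callable[[Iterable[CandidatePair]], Iterator[CandidatePair]]


def min_cooccurrence(k: int) -> Filter:
    return lambda pairs: (p for p in pairs if p.cooccurrences >= k)


def min_gap(minutes: float) -> Filter:
    return lambda pairs: (p for p in pairs if p.min_gap_minutes >= minutes)


def max_mutual_friends(m: int) -> Filter:
    return lambda pairs: (p for p in pairs if p.mutual_friends <= m)


def no_direct_contact() -> Filter:
    # A cooccurring pair without any direct connection is flagged as a suspect tie.
    return lambda pairs: (p for p in pairs if not p.direct_contact)


def run_pipeline(pairs: Iterable[CandidatePair], filters: List[Filter]) -> List[CandidatePair]:
    stream: Iterable[CandidatePair] = pairs
    for f in filters:  # pipe: each filter's output becomes the next filter's input
        stream = f(stream)
    return list(stream)


pairs = [
    CandidatePair("u1", "u2", 8, 45.0, 1, False),
    CandidatePair("u1", "u3", 8, 45.0, 1, True),   # direct contact: known tie
    CandidatePair("u2", "u4", 3, 60.0, 0, False),  # too few cooccurrences
    CandidatePair("u3", "u5", 6, 10.0, 2, False),  # gap below threshold
    CandidatePair("u4", "u6", 7, 35.0, 5, False),  # too many mutual friends
]
pipeline = [min_cooccurrence(5), min_gap(30), max_mutual_friends(2), no_direct_contact()]
suspects = run_pipeline(pairs, pipeline)
print([(p.user_a, p.user_b) for p in suspects])  # [('u1', 'u2')]
```

Because each filter depends only on the stream it receives, filters can be removed or retuned independently, which keeps threshold experiments such as those in Figures 9 and 10 easy to run.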

Data Availability

The data used can be found at http://snap.stanford.edu/data/index.html#locnet.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this work.

Acknowledgments

This work was supported by King Saud University, Saudi Arabia, through research supporting project number RSP-2020/184. Nauman Ali Khan acknowledges the support of the Chinese Government and the Chinese Scholarship Council (CSC) for his PhD studies at the University of Science and Technology, China. This research work was partially supported by the Key Program of the National Natural Science Foundation of China (grant number 61631018).