Abstract

With the development of the mobile Internet, more and more individuals and institutions express their views on certain things (such as software and music) on social platforms. In some online social network services, users can label users with similar interests as “trust” to get the information they want and label users with opposite interests as “distrust” to avoid content they do not want to see. Networks containing such trust and distrust relationships are called signed social networks (SSNs), and some real-world complex systems can also be modeled as signed networks. However, sparse social relationships seriously hinder the expansion of users’ social circles in social networks. To address this problem, researchers have studied link prediction extensively. Although these studies have proved effective in unsigned social networks, the prediction of trust and distrust in SSNs has not achieved good results. In addition, existing link prediction research does not consider users’ need for privacy protection, so most methods include no privacy protection measures. To solve these problems, we propose a trust-based missing link prediction method (TMLP). First, we use the simhash method to create a hash index for each user. Then, we calculate the Hamming distance between two users to determine whether they can establish a new social relationship. Finally, we use a fuzzy computing model to determine the type of the new social relationship (i.e., trust or distrust). We explain our method step by step through a case study and demonstrate its feasibility.

1. Introduction

With the development of the Internet, more and more individuals and organizations communicate and interact on network platforms. Through social platforms, people can not only share their feelings about different products but also express their views on others, which greatly enriches people’s social activities. However, the rapid development of social platforms has filled them with useless or false information and accounts. To browse content of interest quickly and easily, users usually add users who share their interests to a “trust list.” At the same time, to avoid content they are not interested in, users usually add users with opposite interests to a “distrust list.” For example, suppose one user shares interests with a second user but conflicts with a third user in a certain area. On Twitter, the first user may follow the second and add the third to a blacklist. By capturing the trust and distrust relationships between users, we can build a signed social network.

Through the trust/distrust relationships in the signed social network, we can know not only which users the target user has social relationships with but also what attitude the target user adopts towards these users. However, the social relationships in a social network are too sparse, which seriously hinders the expansion of the user’s social circle and the further development of the social platform. Therefore, it has become necessary to help users discover more new friends or trusted users. Fortunately, users have left a large amount of historical behavior data (e.g., ratings and comments) on social platforms, which provides favorable conditions for evaluating the trust relationships and the similarity of preferences between different users.

However, the approach of finding potential friends through shared interests still faces many challenges. First, existing methods focus on predicting the trust relationships between users and ignore the role of distrust relationships in the social network. Second, existing methods do not consider how to protect the user’s private information. Rating information is very important private information for users. Once it is disclosed, users may be affected by targeted marketing. For example, criminals who learn a user’s rating information may fabricate or disseminate data, infringing on the user’s privacy. In response to these challenges, we propose a trust-based missing link prediction method to find new trust/distrust social relationships for users in social networks.

In general, this paper makes two contributions: (1) We use simhash technology to find users who may establish a social relationship with the target user. This technology is not sensitive to the historical data of user behavior, which effectively protects the user’s private information and greatly reduces the calculation range. (2) We use the fuzzy computing model to predict the types of social relationships that may be established between users (that is, to form a signed social network).

The rest of this paper is organized as follows. Related work is introduced in Section 2. Section 3 presents the research motivation. Section 4 describes the proposed simhash-based link prediction method in detail. Section 5 presents a case study to show that our method is feasible. Finally, we summarize this paper in Section 6.

2. Related Work

2.1. Link Prediction

Link prediction is an effective way to address network sparsity, and many studies [1] have used it to predict missing edges in networks. For example, Qi et al. [2] proposed a web API recommendation method to generate links between compatible web APIs. Naturally, link prediction methods are also used to solve problems in social networks. Zhang et al. [3] used the network structure and user information to efficiently predict future friendships between users, thereby improving customer loyalty and user experience. Kutty et al. [4] focused on predicting new social relationships between two different sets of users. For example, given two user sets, teachers and students, their method predicts a new social relationship between a teacher and a student. Wang et al. [5] integrated the cyber, physical, and social spaces and proposed a distributed method with incremental computation for big data in cyber-physical-social systems, and then used this big data to compute tensors and optimize the model. Yang et al. [6] proposed an online social network recommendation system based on Bayesian inference, which attempts to help users establish social relationships with users who give similar ratings. In addition, Zhou et al. [7] proposed a coarse-to-fine feature matching scheme using both global and local CNN features for real-time near-duplicate image detection. Their method offers some inspiration for finding the social relationship between users through feature matching.

2.2. Signed Social Network

Although many researchers have studied how to solve the sparsity problem in social networks, most focus only on unsigned social networks. Fortunately, a growing number of researchers have realized that social relationships between users are signed and have studied trust/distrust in social networks. Xu et al. [8] applied trust relationships to edge computing of social networks. Beigi et al. [9] distinguished unsigned social networks from signed social networks and used three social science theories to study the problem of predicting social relationships in SSNs. Wen et al. [10] studied the differences in people’s behaviors when they tend to believe or disbelieve information and confirmed their impact on the spread of information on social media. Xu et al. [11] used trust relationships for a vehicle internet video monitoring offloading service. In addition, Li et al. [12] studied the community diversified influence maximization (CDIM) problem and solved a series of computing challenges in social networks.

2.3. Privacy Protection

Privacy protection is also a research hotspot in related fields and has attracted many scholars. Liu et al. [13] proposed an outsource real-time route planning (or2p) scheme, which can protect user trajectory data in route planning. Zhong et al. [14] proposed a multidimensional quality ensemble-driven recommendation approach named RecLSHTOPSIS, based on LSH and TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) techniques, to protect multidimensional user QoS privacy data in mobile edge computing. Chi et al. [15] proposed LSH-based recommender systems with privacy protection. Xu et al. [16] proposed a blockchain-powered crowdsourcing method considering privacy preservation in a mobile environment. Qi et al. [17] proposed an approach based on the classic Locality-Sensitive Hashing (LSH) technique to protect privacy data in a smart city. In addition, Zhou et al. [18] proposed a novel coverless steganographic approach that transmits a secret color image without any modification. Although the above research is very effective, there is not much research on protecting users’ private information in social networks; nevertheless, these methods offer some inspiration for doing so.

3. Research Motivation

As shown in Figure 1, each node represents a user in the social network, and each edge represents a social relationship between users. One node represents the target user, and the other nodes represent users associated with the target user. In the social network shown in Figure 1(a), the black lines represent the social relationships that exist between users, but it is not known whether each relationship is trust or distrust. When the social relationships carry the information of whether one user trusts another, they constitute the signed social network shown in Figure 1(b). The blue lines represent trust relationships, the red lines represent distrust relationships, the solid lines represent existing social relationships, and the dashed lines represent possible social relationships.

As can be seen from Figure 1, the existing social relationships between users are relatively sparse, but there are still many potential social relationships waiting to be explored in the social network. To increase the number of edges in a social network, we need to calculate the possibility of establishing a social relationship between users and calculate the trust/distrust value between them. However, we face many challenges in this process. First, there are tens of thousands of users in social networks, and it takes a lot of computing power to calculate the trust/distrust relationships between every pair of users. This can place a huge burden on the server, prolong computing time, and ultimately create a bad user experience. Second, users generally care about whether their private information will be disclosed by social platforms. The user’s rating information can accurately reflect the user’s interests and hobbies, but obtaining it without permission is usually considered offensive and may expose users to harassment by marketing advertisements. Moreover, in the process of calculating whether users can establish new social relationships, users’ private data is accessed frequently, which can easily lead to privacy disclosure. Therefore, we need to design a method that can effectively protect the user’s private information and significantly reduce the amount of calculation while predicting the social relationships between users.

There are both trust and distrust relationships in social networks. Most existing link prediction methods in the field of social networks predict only the trust relationships between users and ignore the distrust relationships. In fact, distrust relationships are also crucial in social networks, so we propose a new link prediction method named the trust-based missing link prediction method (TMLP), which can predict both trust and distrust relationships. In addition, considering the user’s demand for privacy protection, the TMLP method can also effectively protect the user’s private information from being disclosed. The principle of the TMLP method is explained in detail below.

Step 1. Build a hash index for each user.

Hashing is a common verification [19] and mapping technique, and simhash is a hashing technique that preserves similarity. The principle of the simhash method [20] is that the more similar the items that two users interact with, the more similar their simhash values are. Hence, to find users who may have a social relationship with the target user, we only need to compare their simhash values. This subsection explains how to create a simhash index for each user based on their behavior history.

Figure 2 illustrates the process for a target user who needs to establish social relationships. The item collection contains all items that users in the social network interact with, and the interaction history of the target user is the collection of items that the target user has interacted with. Simhash technology maps this interaction history into a one-dimensional vector, which serves as the target user’s simhash index.

First, we assign to each item in the item collection a random fixed-dimension vector consisting only of “0” and “1,” where the dimension is an integer obtained by a rounding operation. According to Formula (1), we arrange the vectors of the items the user has interacted with into a matrix, with one row per item.

Next, as shown in Step (1) in Figure 2, we delete the null values in this matrix and replace each “0” in it with “-1” to get a new matrix. Then, we take the sum of each column of the new matrix and obtain a vector of the same dimension as the item vectors, as shown in Step (2) in Figure 2. Finally, as shown in Step (3) in Figure 2, we set the values greater than “0” in this vector to “1” and the values less than or equal to “0” to “0” to obtain the user’s simhash index. According to the simhash theory [21], the resulting binary vector can be regarded as the index of the user. In this way, we can create a simhash index for each user.
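The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors’ implementation: the function name `simhash_index`, the item identifiers, and the 4-bit item vectors are all hypothetical, and the item vectors are fixed by hand here instead of being drawn at random.

```python
from typing import Dict, List, Sequence


def simhash_index(item_hashes: Dict[str, Sequence[int]],
                  interacted: Sequence[str]) -> List[int]:
    """Build a binary simhash index from the vectors of the interacted items."""
    if not interacted:
        raise ValueError("the user has no interaction history")
    # Keep only the rows of items the user interacted with (null rows are dropped).
    rows = [item_hashes[item] for item in interacted]
    # Step (1): replace every 0 with -1.
    signed_rows = [[1 if bit == 1 else -1 for bit in row] for row in rows]
    # Step (2): sum each column of the signed matrix.
    column_sums = [sum(column) for column in zip(*signed_rows)]
    # Step (3): positive sums become 1, zero or negative sums become 0.
    return [1 if total > 0 else 0 for total in column_sums]


# Hypothetical item vectors (assumed for illustration only).
item_hashes = {
    "i1": [1, 0, 1, 1],
    "i2": [0, 0, 1, 0],
    "i3": [1, 1, 0, 0],
}
index = simhash_index(item_hashes, ["i1", "i2", "i3"])
print(index)  # [1, 0, 1, 0]
```

Here the column sums are (1, -1, 1, -1), so only the first and third positions are set in the resulting index.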

Step 2. Build the set of users who may establish social relationships with the target user.

In the previous step, we created a simhash index for each user based on their behavior history. Next, we need to determine which users might have social relationships with the target user.

We first calculate the Hamming distance between the simhash index of the target user and the simhash index of another user. Specifically, assume that both indices are binary vectors of the same dimension. Then, the Hamming distance can be calculated by Formula (2) as the sum of Boolean values, one per dimension, each of which is calculated by Formula (3). Here, the sign “⊕” refers to the XOR operation, which is applied to the corresponding bits of the two vectors.
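For reference, the Hamming distance described above can be written in the following standard form. The symbols $H_t$ and $H_i$ (the simhash indices of the target user and of another user), $h_{t,k}$ and $h_{i,k}$ (their $k$-th bits), $b_k$, and $r$ (the dimension) are notation assumed here for illustration and may differ from the notation used in Formulas (2) and (3).

```latex
D(H_t, H_i) = \sum_{k=1}^{r} b_k,
\qquad
b_k = h_{t,k} \oplus h_{i,k}, \quad k = 1, \dots, r
```

For example, with $H_t = (1, 0, 1, 0)$ and $H_i = (1, 1, 1, 1)$, the bits differ in the second and fourth positions, so $D(H_t, H_i) = 2$.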

The more similar the items that two users interact with, the smaller the Hamming distance, and the more likely they are to establish a social relationship. In Formula (4), if the Hamming distance between the target user and another user is smaller than a given threshold, that user can be regarded as a possible linked user of the target user and is put into the possible linked users (PLU) set of the target user. The pseudocode used to build the PLU set of the target user is specified in Algorithm 1. With the simhash method, we build an index for each user that is insensitive to their historical data, so the simhash method can also help protect user privacy. In short, on the premise of protecting users’ privacy, we have found the users who can establish social relationships with the target user.

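Require: the simhash index of the target user and
 the simhash index of each user
Ensure: the PLU set of the target user
Let PLU = Φ
for each user other than the target user do
   compute the Hamming distance between the simhash index of the user
    and the simhash index of the target user (Formulas (2) and (3))
   if the Hamming distance is smaller than the threshold then
    enqueue the user into PLU
    update PLU
   end if
end for
Algorithm 1. Building the PLU set of the target user.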
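A runnable sketch of this step is given below, assuming that each simhash index is stored as a list of bits. The names `hamming_distance` and `possible_linked_users`, the user identifiers, and the example indices are hypothetical; the threshold is a parameter that the caller must choose.

```python
from typing import Dict, List, Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("simhash indices must have the same dimension")
    # XOR of two bits is 1 exactly when the bits differ.
    return sum(x ^ y for x, y in zip(a, b))


def possible_linked_users(target: str,
                          indexes: Dict[str, Sequence[int]],
                          threshold: int) -> List[str]:
    """Return users whose Hamming distance to the target is below the threshold."""
    target_index = indexes[target]
    plu = []
    for user, index in indexes.items():
        if user == target:
            continue  # the target user is never linked to itself
        if hamming_distance(target_index, index) < threshold:
            plu.append(user)
    return plu


# Hypothetical simhash indices (assumed for illustration only).
indexes = {
    "u1": [1, 0, 1, 0],
    "u2": [1, 0, 1, 1],
    "u3": [0, 1, 0, 1],
    "u4": [1, 1, 0, 0],
}
plu = possible_linked_users("u1", indexes, threshold=3)
print(plu)  # ['u2', 'u4']
```

Only the compact indices are compared in this step, so the raw rating records do not need to be read while the candidate set is built.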

Step 3. Calculate whether social relationships are positive or negative.

Knowing which users might have social relationships with the target user is not enough. If user A and user B have seen the same four movies, but user A likes movies 1 and 2 while user B likes movies 3 and 4, we cannot conclude that user A and user B trust each other. Therefore, we next need to predict the types of these social relationships (new links). Based on the methods of [22, 23], we propose in this section a trust-distrust fuzzy computing method based on user preference similarity.

Calculating preference similarity from users’ ratings of items amounts to constructing a weighted social network [24], in which the signs and sizes of the weights on social relationships are used to judge whether users have a trust relationship or a distrust relationship. Users usually score items in the range of 1-5. For scores of 3-5, we can assume that the user likes the item, while for scores of 1-2, we can assume that the user does not like the item. If two users give similar ratings to an item, we can assume that their preferences are similar. Therefore, the user’s rating of an item can be regarded as a fuzzy variable.

Accordingly, we adopt the half triangular membership function [25] defined in Formula (5). This half triangular membership function represents the continuity of the fuzzy set from the minimum value (min) to the maximum value (max), and its argument is the rating that the user gives to the item.

Based on this fuzzy set, the items are classified into two types, helpful items (H) and unhelpful items (NH), as defined in Formulas (6) and (7).
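The membership and classification step can be sketched as follows. This is an illustration under stated assumptions, not a transcription of Formulas (5) to (7): the linear ramp between min and max is the usual shape of a half triangular membership function, and the cutoff of 0.5 is an assumption chosen so that ratings of 3-5 count as liked and ratings of 1-2 as disliked, in line with the text above. The function names and movie identifiers are hypothetical, and a rating of 0 is treated as “not rated.”

```python
from typing import Dict, Set, Tuple


def half_triangular_membership(rating: float,
                               min_rating: float = 1.0,
                               max_rating: float = 5.0) -> float:
    """Map a rating linearly onto [0, 1] between min_rating and max_rating."""
    if max_rating <= min_rating:
        raise ValueError("max_rating must be greater than min_rating")
    if rating <= min_rating:
        return 0.0
    if rating >= max_rating:
        return 1.0
    return (rating - min_rating) / (max_rating - min_rating)


def classify_items(ratings: Dict[str, int],
                   cutoff: float = 0.5) -> Tuple[Set[str], Set[str]]:
    """Split rated items into helpful (H) and unhelpful (NH) sets."""
    helpful, unhelpful = set(), set()
    for item, rating in ratings.items():
        if rating == 0:
            continue  # assumption: 0 means the user has not rated the item
        # Assumed rule: membership at or above the cutoff means "helpful".
        if half_triangular_membership(rating) >= cutoff:
            helpful.add(item)
        else:
            unhelpful.add(item)
    return helpful, unhelpful


# Hypothetical ratings (assumed for illustration only).
helpful, unhelpful = classify_items({"m1": 5, "m2": 3, "m3": 2, "m4": 0})
print(sorted(helpful), sorted(unhelpful))  # ['m1', 'm2'] ['m3']
```

With min = 1 and max = 5, a rating of 3 has membership 0.5 and a rating of 2 has membership 0.25, which is why the assumed cutoff separates the two groups described in the text.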

The preference similarity between the target user and a user in the PLU set is represented by two values, which are expressed by Formulas (8) and (9), respectively.

The trust and distrust relations between users are represented by a trust value and a distrust value, respectively. Whether the social relationship between two users is a trust relationship or a distrust relationship depends on the relative size of these two values. If the trust value is greater than the distrust value, the two users are in a trust relationship; otherwise, they are in a distrust relationship. Both values are calculated from the preference similarity.

The trust values in Figure 3 are fuzzified into three normal fuzzy sets. The fuzzy sets of trust include CT (complete trust), AT (almost trust), and NT (not trust). Similarly, the fuzzy sets of distrust include CD (complete distrust), AD (almost distrust), and ND (not distrust). Please note that in a social network, the social relationship between two users is crisp. If the trust value between the two users is greater than the distrust value, the two users are considered to trust each other; otherwise, they are regarded as distrusting each other. The pseudocode for the TMLP method is specified in Algorithm 2.

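Require: the target user and the PLU set of the target user
Ensure: the type of each new link and the new signed social network
for each user in PLU do
   compute the trust value and the distrust value between the target user and the user
  if the trust value is greater than the distrust value then
    label the new link as trust
  else
    label the new link as distrust
  end if
end for
Algorithm 2. TMLP.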
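The decision rule of Algorithm 2 can be illustrated with the sketch below. The trust and distrust values here are a simple stand-in, not Formulas (8) to (11): trust is assumed to be the share of commonly rated items on which two users agree (both like or both dislike), and distrust the share on which they disagree. All function names and user identifiers are hypothetical, and each user’s liked and disliked sets are assumed to be disjoint.

```python
from typing import Dict, List, Set, Tuple


def trust_and_distrust(likes_a: Set[str], dislikes_a: Set[str],
                       likes_b: Set[str], dislikes_b: Set[str]) -> Tuple[float, float]:
    """Stand-in scores: agreement and disagreement ratios on commonly rated items."""
    common = (likes_a | dislikes_a) & (likes_b | dislikes_b)
    if not common:
        return 0.0, 0.0
    agree = len(likes_a & likes_b) + len(dislikes_a & dislikes_b)
    disagree = len(likes_a & dislikes_b) + len(dislikes_a & likes_b)
    return agree / len(common), disagree / len(common)


def label_links(target: str, plu: List[str],
                likes: Dict[str, Set[str]],
                dislikes: Dict[str, Set[str]]) -> Dict[str, str]:
    """Label each candidate link as trust or distrust, as in Algorithm 2."""
    labels = {}
    for user in plu:
        trust, distrust = trust_and_distrust(
            likes[target], dislikes[target], likes[user], dislikes[user])
        # Trust only when the trust value is strictly greater; otherwise distrust.
        labels[user] = "trust" if trust > distrust else "distrust"
    return labels


# The movie example from Step 3: A and B watched the same four movies
# but like opposite halves; C is a further hypothetical user.
likes = {"A": {"m1", "m2"}, "B": {"m3", "m4"}, "C": {"m1", "m2"}}
dislikes = {"A": {"m3", "m4"}, "B": {"m1", "m2"}, "C": {"m3"}}
labels = label_links("A", ["B", "C"], likes, dislikes)
print(labels)  # {'B': 'distrust', 'C': 'trust'}
```

Under this stand-in, A and B disagree on all four common movies and are labeled distrust, which matches the intuition given at the start of Step 3; the actual values depend on the paper’s own formulas.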

5. A Case Study

In this section, we demonstrate the process of the TMLP method through a case study. As shown in Figure 4, there are nine users in the social network, forming a user set. One of them is the target user. In Figure 4, the connections between users represent social relationships, where a solid line represents an existing social relationship, a dashed line represents a potential social relationship, a blue line represents a trust relationship, and a red line represents a distrust relationship. We then selected 16 movies to form the movie set. The users’ viewing records are shown in Table 1, where a value of 1-5 is the rating given by the user and 0 indicates that the user has not watched the movie.

Step 1. Build a hash index for each user in the user set.

First, we calculate the hash value of each movie in the movie set, and the results are shown in Table 2. Then, according to Formula (1) and the user rating records, we obtain the matrix shown in Table 3.

Next, in this matrix, we delete the entries with a null value and replace “0” with “−1” to obtain a new matrix. Then, we calculate the sum of each column of the new matrix and obtain a column-sum vector. Finally, we replace the positive values with “1” and the remaining values with “0” to obtain a new binary vector, which is the simhash value of the user. Formulas (12) and (13) show how the column-sum vector is generated from the new matrix and then converted into the simhash value. Table 4 shows the simhash values of all users in the user set.

Step 2. Build the set of users who may establish a social relationship with the target user.

Now, we need to find the set of users (the PLU set) who may establish a social relationship with the target user according to the simhash values. We first calculate the Hamming distance between the simhash value of the target user and that of each other user. The Hamming distances between the target user and the other users, calculated according to Formula (2), are shown in Table 5.

We set the threshold of the Hamming distance to 3; that is, users whose Hamming distance to the target user is less than 3 belong to the PLU set of the target user.

Step 3. Calculate the similarity between the target user and other users to determine the type of social relationships.

Now that we know which users are able to establish social relationships with the target user, we next determine the types of these new social relationships. First, we use the half triangular membership function defined by Formula (5) to determine the continuity of the fuzzy set, as shown in Table 6.

Based on the fuzzy set and Formulas (6) and (7), we can classify the movies that users have watched as liked or disliked, as shown in Table 7.

According to the movies that users like and the movies they do not like, we can calculate the similarity of preferences between the target user and the users in the PLU set, as shown in Tables 8 and 9.

From the preference similarity, we can get the trust value and the distrust value between the two users and finally determine whether the two users trust or distrust each other, as shown in Tables 10 and 11.

6. Conclusions and Future Work

In this paper, we propose a novel link prediction method, trust-based missing link prediction (TMLP), to find missing social relationships in signed social networks and predict possible social relationships. In addition, we studied how to effectively protect user privacy during the link prediction process. Finally, through a case study, we demonstrated the feasibility of this link prediction method. However, our approach has some shortcomings. For example, our method is supported only by a case study, without actual experimental validation. Furthermore, our method does not consider the network delay and energy consumption of social platforms. In future work, we will carry out a series of experiments to verify our method and make it more convincing. Then, we will consider using edge computing algorithms [26, 27] to address the network delay and energy consumption of social platforms, so as to provide better services for users. We also note that there have been some studies [28] on local feature matching, and we will draw on these studies in future work.

Data Availability

The research was demonstrated through a case study and therefore did not use publicly available data sets.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper is partially supported by the Natural Science Foundation of China (No. 61872219), the Natural Science Foundation of Shandong Province (ZR2019MF001), and the Open Project of State Key Laboratory of Novel Software Technology (No. KFKT2020B08).