Abstract

Social networks are becoming popular, with people sharing information with their friends on social networking sites. On many of these sites, shared information can be read by all of the friends; however, not all information is suitable for mass distribution and access. Although people can form communities on some sites, this feature is not yet available on all sites. Additionally, it is inconvenient to set receivers for a message when the target community is large. One characteristic of social networks is that people who know each other tend to form densely connected clusters, and connections between clusters are relatively rare. Based on this feature, community-finding algorithms have been proposed to detect communities on social networks. However, it is difficult to apply community-finding algorithms to distributed social networks. In this paper, we propose a distributed privacy control protocol for distributed social networks. By selecting only a small portion of people from a community, our protocol can transmit information to the target community.

1. Introduction

Social networks are increasing in popularity, and people are sharing information with their friends on social networking sites (SNS). Most of these sites treat all contacts equally by default. For example, if a person does not sort his/her friends into groups, then all of the person’s friends can view the messages he/she posts on a wall. Even if SNS provide a grouping function, previous works have indicated that sorting friends is inconvenient [1, 2]. In the real world, individuals have distinct types of relationships with different people. The information a user wishes to share with a group of people may not be appropriate for people in other groups, even if they are all the user’s friends.

Hence, many privacy protection mechanisms have been proposed [3–6]. These mechanisms, however, require users to set access rights for all their friends in advance. Although this provides accurate solutions for deciding who should have access to certain information, it is inconvenient for a user to manage them, especially when a user has many friends. People may not maintain all the groups that they join in real life on SNS. In addition, even though many social networking sites provide group settings, well-known SNS such as Twitter do not have this feature yet. Jones and O’Neill [2] suggested providing group-based privacy using naturally organized groups, which reduces the burden of configurations.

A naturally organized group is a densely connected cluster on a social graph. People in real life tend to form groups. For example, you and your high school classmates form a group; you and your coworkers form another group; members of a club you belong to form yet another group. As indicated in previous studies, people who recognize each other in real life are likely to establish connections on SNS. Mayer and Puller [7] reported that only of connections were merely online interactions; therefore, it is safe to assume that the connections on SNS between you and your friends and those between your friends and your friends’ friends form clusters. While many ties exist inside a cluster, only a few ties exist across different clusters. These clusters become meaningful groups because connections in a cluster are established for the same reason.

Typical community-finding algorithms only function when a user has access to his/her own ego-network, which includes connections between the user and his/her friends (Level 1 friends) and between the user’s friends and their friends (Level 2 friends). However, on many SNS such as Facebook, a user does not have access to other people’s relationship paths. In theory, a user may acquire all his/her friends’ connections by asking every friend to use a Facebook application written to collect the data; in practice, this is almost impossible to achieve.

Another approach to protect a user’s privacy is to establish a decentralized social network, that is, a social network in which a user only knows his/her direct connections. Although this is not yet popular, it has been discussed in Safebook [8] and Helloworld [9]. Furthermore, several studies have presented decentralized social network schemes [10]. In this type of social network, it is impossible to learn other people’s connections in advance.

Herein, we first present previous studies regarding private information sharing in social networks; subsequently, we propose a new private-information sharing protocol used on decentralized social networks. Our protocol, which is based on secret sharing, utilizes characteristics of social networks. Our protocol exhibits the following properties. First, to utilize naturally organized groups, communities must be located using only information a user can acquire. We assume that the information a user can acquire is the list of his/her Level 1 friends. Next, this protocol does not leak the friendship connections of the source to any users. Furthermore, this protocol can be adapted to centralized social networks.

The remainder of this paper is organized as follows. We introduce the background and related studies in Section 2, present the model of our study in Section 3, introduce and analyze our protocol in Section 4, describe our experiments in Section 5, discuss the results in Section 6, and provide the conclusions in Section 7.

2.1. Privacy Control on SNS

Security and privacy [11–14] are two important topics that are often discussed in various kinds of applications and environments [15–20]. The privacy problem on SNS has been reported in previous studies. Persona [3] combined attribute-based encryption with the traditional public-key approach to provide user-defined access control on SNS. flybynight [4] supported secure one-to-one and one-to-many communications on Facebook by applying RSA and El Gamal to encrypt and decrypt information. NOYB [5] protects private data by partitioning data into atoms and substituting these atoms with another user’s atoms pseudorandomly. Lockr [21] provides access control on Flickr. However, these mechanisms require users to define the access ability for each of their friends in advance. By placing each of the user’s friends into a predefined community, a user can share private information with only those in the target groups. These mechanisms use cryptography to protect private information. This results in a complicated key exchange, and it is difficult to revoke the keys when connections on SNS are canceled.

2.2. Community-Finding Algorithms

Searching for communities on complex networks is a well-studied topic. Traditional methods based on graph partitioning, such as Kernighan–Lin’s algorithm [22], divide a graph into clusters. Modern methods, such as Newman–Girvan’s algorithm [23], utilize “modularity” to define the stop criterion. Many community-finding algorithms [24, 25] based on modularity demonstrate good partition results when modularity is maximized. CONGA [26] improved the original Newman–Girvan algorithm so that overlapping groups could be detected. Other algorithms such as those in [27, 28] have been introduced to detect overlapping groups. The algorithms introduced above require a user to know the entire network data. However, it is infeasible for a user to obtain full network data on SNS or the WWW. Additionally, local community-finding algorithms have been proposed. Clauset [29] and others [30] proposed the local modularity method. Bagrow [31] proposed the “outwardness” method.

The algorithms mentioned above require users to know their Level 1 and Level 2 friends. However, on mobile networks, a user cannot easily obtain other people’s connections. In addition, SNS such as Facebook restrict users from accessing other people’s contextual information, which renders it difficult to apply these methods.

2.3. Group Communication

Applications on social networks are often related to group communication. Some have already utilized the naturally organized community. Grob et al. [1] conducted a survey and concluded that group communication occurred frequently, but grouping functions were rarely used. In their survey, only of users used the built-in grouping functions on mobile phones. They implemented Cluestr and applied CONGA [26] to recommend friends within a community. Jones and O’Neill proposed using implicit communities that appeared on people’s social graph for privacy control [2]. They used the SCAN algorithm [32] to detect communities. Li et al. [33] proposed a provably secure group key agreement scheme with privacy preservation for online social networks using extended chaotic maps.

3. Problem Statement and System Model

We model an online social network as a simple graph , in which is a set of users and is a set of connections on that online social network. Furthermore, we model a real-life social network as a simple graph , where is a set of people and is a set of acquaintance links between each person. We assume that a bijection function exists. In other words, we ignore people who do not exist on the online social network. We model a real-life community . We assume that a corresponding naturally formed community exists for every community . That is, a bijection function exists.

We define a friend set as a set of nodes , in which for each , exists.

3.1. Problem Statement

The goal of our protocol is to enable to transmit a secret to , where a corresponding exists. For each , a corresponding node exists. Transmitting to the nodes in is equivalent to transmitting it to .

3.2. Desired Properties

Our protocol exhibits the following properties.

3.2.1. Decentralized

Our protocol can be applied to decentralized social networks.

3.2.2. Privacy

Our protocol should protect all nodes’ identities and link privacy. A node’s ’s link privacy is the knowledge of , . It is noteworthy that means the friends of . Its identity privacy is the knowledge of ’s existence.

3.2.3. Robustness

Our protocol should adapt to constantly changing social networks. The set of users who receive the private information should conform to the current social network topology.

3.3. Adversary Model

Herein, we define a semihonest adversary model. In this model, a node follows the protocol but may wish to discover , , where , , and is not the sender on .

For each , if the adversary can identify any , , then the link privacy of is leaked.

For who sends a secret in , if an adversary can identify the identity of without acquiring the full secret, then ’s identity is leaked. That is, a node should learn the source of a secret if and only if it receives the secret.

An adversary can be an intermediate node, receiver, or stranger that does not receive any tokens.

4. Protocol

In this section, we propose and analyze our privacy control protocol, known as the decentralized private information sharing protocol (DPISP). The DPISP allows a node to distribute private information on social networks to a group of nodes without setting community members in advance. Table 1 describes the notations.

In the DPISP, nodes in are divided into three parties:
(i) A source node sends a secret to .
(ii) A receiver receives any part of the secret.
(iii) An intermediate node forwards any part of the secret to his/her friends. An intermediate node is also a receiver.

Although a source node knows only , it can easily identify the role of each in . For example, a node knows who its classmates are and who its coworkers are. To send information to a particular community on , can select representative nodes that belong to the corresponding community in . The nodes selected are the intermediate nodes.

The set of receivers is controlled by two parameters, and , along with the intermediate nodes designated by the source node. is the number of intermediate nodes plus the source node, and indicates the number of connections a receiver has between and him/her to receive the full message. Refer to Figure 1 for an example: the diamond node indicates the source node, and the four square nodes are the intermediate nodes . If we set , only the squares will have access to the full information. If we set , both the squares and circles will have access to the full message. If we set , even the triangles will have access to the message.

4.1. Protocol Overview

To send a private message to a community , the source node first divides the private message into partial messages; subsequently, sends one token comprising partial information and a TTL tag (the TTL tag is used to indicate whether a token should be propagated further) along with an identity tag, and , to each of the intermediate nodes, keeping one for him/herself. The source node sends the token with to all his/her friends. Tokens with the same identity tag indicate that they are partial messages of the same private message. The intermediate nodes save a copy of the tokens received from , decrease the TTL by 1, and subsequently propagate the tokens to all their friends. Those who receive or more tokens with the same identity tag can recover the private message.

However, in the case above, the source node’s link privacy is leaked. If an intermediate node receives a token directly from and a token propagated by another intermediate node also sent from , then can acquire ’s connections with other people. Assume that sets ; although cannot recover the full message, he/she can still discover that has a connection with . Because the token sent directly from has , instantly knows that the source node of this token is . Furthermore, because is the origin of the token propagated from , the two tokens’ identity tag will be the same; therefore, knows that the token propagated from is also sent from and realizes that a connection exists between and .

To solve this privacy leakage problem, the identity of the source node cannot be identified by receivers, unless they can recover the private information.

4.2. DPISP

The DPISP is based on secret sharing. In secret sharing, a secret is divided into parts; anyone who receives of parts can recover the secret, while those who receive fewer than parts can neither recover the secret nor learn anything from the information they have received. We applied Shamir’s secret-sharing scheme [34] in our protocol. The DPISP contains two phases: the propagation and recovery phases. The detailed procedures of these two phases are shown in Figures 2 and 3.

4.2.1. Shamir’s Secret Sharing

Shamir’s secret sharing contains the following two schemes: the distribution scheme () and the reconstruction scheme (). Figure 4 shows the detailed functions.

A node runs to generate the shares from the secret . The input indicates the number of shares it creates, and indicates the number of shares it has to recover the secret.

In , a trusted dealer does the following:
(i) Randomly chooses coefficients, denoted by
(ii) Constructs a polynomial
(iii) Computes shares by evaluating in distinct points

A node runs to recover the secret . The input indicates the number of shares needed to recover the secret; the input is a set of different shares denoted by , where . In , a node applies Lagrange interpolation to the set to reconstruct the polynomial . If contains or more different shares, ; otherwise, and no information is revealed from .
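The two schemes can be sketched as follows. This is a minimal illustration of textbook Shamir secret sharing over a prime field, not the authors’ implementation: the names `distribute`, `reconstruct`, `n` (number of shares), `k` (threshold), and the field modulus `PRIME` are assumptions introduced here, and the secret is assumed to be encoded as an integer smaller than `PRIME`.

```python
import secrets

# Assumption: a fixed public prime modulus; 2**127 - 1 is a Mersenne prime.
PRIME = 2**127 - 1


def distribute(secret, n, k, prime=PRIME):
    """Split an integer secret into n shares; any k distinct shares recover it."""
    if not (1 <= k <= n < prime and 0 <= secret < prime):
        raise ValueError("need 1 <= k <= n < prime and 0 <= secret < prime")
    # f(x) = secret + a_1*x + ... + a_(k-1)*x^(k-1) with random coefficients a_i.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):  # evaluate f at n distinct nonzero points
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def reconstruct(shares, prime=PRIME):
    """Lagrange interpolation at x = 0 over the distinct shares supplied."""
    points = dict(shares)  # duplicates of the same share collapse to one point
    secret = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret
```

With fewer than `k` distinct shares, `reconstruct` still returns a number, but it is unrelated to the secret; a caller therefore needs a separate integrity check to decide whether recovery succeeded.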

4.2.2. Protocol Description

Figure 2 shows the propagation phase of our protocol. The source node first selects that it wishes to share the private information with. It selects members that it recognizes in real life from that group as its intermediate nodes, where is smaller than the group size. Next, applies to generate shares , where , , by evaluating the polynomial in and . For each , constructs the corresponding tokens with for , where and for . The elements of the token are described as follows: is the share that distributes to ; indicates the total number of different shares that distributes; represents the number of shares a node has to hold to reconstruct the message; and is a TTL tag. It sends token , , …, to the corresponding nodes that it selects earlier and sends to all its friends.

A node that receives any share first verifies . If , the node decreases by 1 and sends the token to all its friends. Additionally, the node maintains a copy of the token.
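The forwarding rule can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the protocol of Figure 2: the token sent to each intermediate node is assumed to carry a TTL of 1, the source’s own token is assumed to carry a TTL of 0, and the function names, the `friends` adjacency dictionary, and the integer share indices are hypothetical. Only share indices are tracked, not share values.

```python
from collections import defaultdict


def propagate(friends, source, intermediates):
    """Simulate the propagation phase on an undirected friendship graph.

    friends: dict mapping each node to the set of its Level 1 friends.
    Returns (held, sent): held[v] is the set of share indices node v holds,
    and sent is the total number of tokens transmitted in this sketch.
    """
    held = defaultdict(set)
    sent = 0

    def deliver(node, share, ttl):
        nonlocal sent
        sent += 1
        held[node].add(share)  # every receiver keeps a copy
        if ttl > 0:            # forward with the TTL decreased by 1
            for friend in friends[node]:
                deliver(friend, share, ttl - 1)

    held[source].add(0)  # the source keeps share 0 for itself
    for friend in friends[source]:
        deliver(friend, 0, 0)  # assumed TTL 0: not forwarded further
    for index, node in enumerate(intermediates, start=1):
        deliver(node, index, 1)  # assumed TTL 1: forwarded exactly once
    return held, sent


def receivers(held, k):
    """Nodes holding at least k distinct shares, i.e. able to recover the secret."""
    return {node for node, shares in held.items() if len(shares) >= k}
```

For example, with a source `s` whose friends `a`, `b`, and `c` all know each other, and a node `d` connected only to `a` and `b`, choosing `a` and `b` as intermediate nodes lets `d` collect two shares, so `d` can recover the secret when the threshold is 2 but not when it is 3.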

Figure 3 shows the recovery phase of the DPISP. To recover the private information from the tokens a node receives, he runs the recovery phase of the DPISP. To decrease the calculation cost, groups all the tokens by their , creating sets. Subsequently, he puts the tokens with the same in each into the same subsets, creating subsets , where . After grouping ’s tokens, he runs with all the combinations of tokens in each set . In other words, selects subsets from and runs for every possible combination of tokens among those subsets. If a secret is recovered successfully, removes the tokens belonging to that secret. After testing all the possible combinations, selects another subset and repeats the same procedure until all the possible combinations of subsets are tested.

To verify if is recovered successfully, a node calculates and verifies if .
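The exhaustive search of the recovery phase can be sketched as follows. This is a simplified illustration, not the procedure of Figure 3: tokens are grouped only by their share-count and threshold pair, the further sub-grouping described above is omitted, and the reconstruction routine and the integrity check are passed in as the hypothetical callables `reconstruct` and `is_valid` because their exact form is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def recover_secrets(tokens, reconstruct, is_valid):
    """Try every k-subset of tokens that carry the same (n, k) pair.

    tokens: list of (share, n, k) tuples; share must be hashable.
    reconstruct: callable mapping a list of k shares to a candidate secret.
    is_valid: callable returning True if a candidate passes the integrity check.
    Returns the list of distinct secrets recovered.
    """
    groups = defaultdict(list)
    for share, n, k in tokens:
        groups[(n, k)].append(share)

    recovered = []
    for (n, k), shares in groups.items():
        remaining = list(dict.fromkeys(shares))  # drop duplicate shares, keep order
        found = True
        while found and len(remaining) >= k:
            found = False
            for combo in combinations(remaining, k):
                candidate = reconstruct(list(combo))
                if is_valid(candidate):
                    if candidate not in recovered:
                        recovered.append(candidate)
                    # Remove the shares just used, then search the rest again.
                    remaining = [s for s in remaining if s not in combo]
                    found = True
                    break
    return recovered
```

In the worst case each group costs one reconstruction per k-subset of its tokens, which is the exhaustive search the grouping step is meant to keep small.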

4.2.3. Analysis

First, we examine the privacy of our protocol. As we have described earlier, a node on decentralized social networks only has knowledge of node set . By the DPISP, recovers the information if and only if , where is the set containing intermediate nodes.

The types of privacy involved in this study are as follows:

(1) The Source Node’s Privacy. Given that any receives one or more tokens, cannot distinguish its source . In addition, cannot distinguish if a connection exists between and any unless can read the secret.

(2) The Intermediate Node’s Privacy. Given that any receives one or more tokens, cannot distinguish if a connection exists between the intermediate node and any unless can recover the secret.

(3) The Receiver’s Privacy. Any cannot distinguish the receiver’s identity and its connections to other nodes in .

We discuss the privacy of the three roles. First, we show that DPISP protects the privacy of the source node by demonstrating that the identity of the source node is not revealed to those who cannot recover the secret . Assume that a receiver cannot reconstruct ; as the elements of the tokens do not reveal the identity of the source node, the origin of the tokens cannot be distinguished. The only exception is that the intermediate nodes know the origin because ; however, this is not a privacy leakage because the source node and intermediate nodes are already friends. In addition, knowing the information of one token does not reveal the source of other tokens.

Next, we demonstrate that the link privacy of the intermediate nodes is not revealed. Similar to the above, as the receivers do not know the origin of a token unless they can recover the private information, the identity of the source node is not revealed. Therefore, the receivers cannot acquire any knowledge regarding and, hence, the link privacy of the intermediate nodes is protected.

Finally, the privacy of the receivers is not revealed because the receivers do not provide any information to other nodes. The receivers can recover the private information by evaluating using the reconstruction method of Shamir’s secret-sharing protocol. With this information, they can identify shares that belong to the same . Because they know the identity of the intermediate nodes that sent these shares to them and they know because they can recover the private information, they know that connections exist between the intermediate nodes and the source node. However, we do not consider this a privacy leakage because we assume that nodes that can decode the message are in the same community as and the intermediate nodes; therefore, the receiver should know that and the intermediate nodes are Level 1 friends.

Next, we analyze the overhead of the DPISP. During the propagation phase of the DPISP, the source node sends tokens to all its friends; subsequently, all the intermediate nodes send tokens to their friends. Assume that the source node has friends; among its friends, it selects intermediate nodes, denoted by , …, , and each of them has friends. The total number of tokens transmitted during the propagation phase is .

During the recovery phase of the DPISP, a node places the tokens into the subsets according to its and . Assume that different pairs exist; therefore, it has to perform a maximum of for times to recover any secrets. Although a node has to perform times, the time cost is not as large as one might imagine.

4.3. Semidecentralized Protocol

The most pressing problem of the DPISP is that nodes may spend a significant amount of time recovering secrets if they receive many tokens. To avoid an exhaustive search, we propose a semidecentralized information sharing protocol, that is, the SDPISP.

The SDPISP utilizes a server to log tokens. The server provides two functions: register (, ) and query (), in which is a set of integers, and is the threshold. A source node registers a group of numbers to the server by calling the register function. The server records these numbers into a single entry via an in its database. Each entry contains the threshold . A receiver calls the query function to verify if any set of tokens that is valid for recovery exists.

To send private information using the SDPISP, decides a target community and divides the secret , where is the private information and , into shares by applying . Subsequently, instead of generating tokens without any identity tags, it generates tokens , with , where is a random number. Next, calls register (, ), , to send these random numbers to the server.

To recover secrets, a receiver calls query () and inputs all the random numbers he/she received. The server returns the sets of random numbers that get recorded under the same if the receiver holds or more tokens.

Figure 5 illustrates the concept of the SDPISP. and wish to send some data to their friends. They set and . They first create tokens with random numbers and , respectively. Subsequently, they register these numbers to a server and send the tokens to the intermediate users. Those who receive any tokens, for example, who receives tokens with , call the query function to the server; subsequently, the server returns to . The server does not return 4 because only obtains one token from , and cannot recover the data only by token 4. Take as another example: receives tokens and calls the query function; the server returns and to . Subsequently knows that he/she can recover two different sets of data from them.
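The two server functions can be sketched as an in-memory registry. This is an illustrative sketch, not the authors’ server: the class name `TokenRegistry`, the storage layout, and the choice to return only the tags the caller already holds are assumptions made here.

```python
class TokenRegistry:
    """Minimal in-memory stand-in for the SDPISP server."""

    def __init__(self):
        self._entries = []  # one (set of random tags, threshold k) per secret

    def register(self, tags, k):
        """Record the random tags of one secret's tokens with threshold k."""
        self._entries.append((frozenset(tags), k))

    def query(self, tags):
        """Return, per registered secret, the caller's tags that belong to it,
        but only if the caller holds at least k of them."""
        held = set(tags)
        matches = []
        for entry_tags, k in self._entries:
            common = held & entry_tags
            if len(common) >= k:
                matches.append(sorted(common))
        return matches
```

A receiver that holds two tags of one secret and a single tag of another (both registered with threshold 2) gets back only the first pair, so it runs reconstruction once instead of testing every combination of its tokens.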

4.3.1. Analysis

Using a server, receivers are not required to calculate all possible token combinations. By the SDPISP, they ask a server if any set of tokens that belongs to the same secret exists.

We examine the privacy of this semidecentralized protocol and ignore the chance of two random numbers colliding. In the SDPISP, the identity and link privacy of the participants are not leaked to each other. Additionally, the link privacy is not leaked to the server. If a node registers some random numbers to the server and a node queries the server with any random number that is registered by , then the server will not know whether is a Level 1 friend of or a Level 2 friend of .

5. Experiments

Assume that a community is an isolated, fully connected network, where all people belonging to the community are connected to each other, while no connections exist between people in different communities. In this case, it is sufficient for a user to set and . All the members in the community recover the information, but those who do not belong to the community will not receive enough shares to reconstruct the data.

However, on real social networks, two cases can occur. First, two users in the same community may not have a connection with each other. Second, one or more users may have connections to the people who are not in the same community as them. To improve the accuracy, we conducted experiments to obtain adequate settings for and for communities with different sizes and clustering coefficients.

5.1. Data Collection

Owing to the lack of ground truth, which is the information of the communities each user belongs to, existing social network graphs such as those in [35, 36] cannot be applied to our experiments. Many previous studies collected data from Facebook. However, querying other people’s contextual information is not allowed by the Facebook API unless they agree to provide the information. The only information we can easily acquire is the participants’ Level 1 friends. To collect the Level 2 friends, we can develop a Facebook application to collect the information and ask the participants’ friends to use it; however, it is unrealistic to expect all of them to use it. Because we require both the Level 1 and Level 2 connections of the participants to perform the experiments, we cannot collect data from Facebook. Therefore, we collected data from Plurk.

Plurk is a famous microblog in Taiwan. According to Alexa [37], on April 4, 2011, of Plurk traffic was from Taiwan, where it ranked 27th, as well as 1297th worldwide. The Plurk API allows us to collect other users’ data provided that the information is publicly available. Therefore, we asked students at both HIT.SZ and IIIRC to provide their friendship connections on Plurk.

In our experiments, we collected the friendship graph of eight students from HIT.SZ and two students from IIIRC whose Karma values were all higher than 60 (Karma is a value that evaluates the liveness of a user on Plurk). We extracted the links between these participants and their Level 1 and Level 2 friends. Subsequently, we placed their friends into one or more communities that were defined by the participants. For example, Table 2 shows the list of communities given by participant . He/she defined 4 communities to represent the social network and each of his/her friends can belong to 1 to 4 communities. Similarly, we collected 42 communities from them, in which the minimum community size was 3 and the maximum community size was 61; we denote the community size as in the following sections. Furthermore, we calculated the clustering coefficient () of each community, which measures the degree to which users in a community tend to cluster together. Equation (1) shows the definition of , where and indicate the actual and maximum number of links between users in a community, respectively. The maximum number of links between users in a community is , where is the community size. Table 3 shows the detailed data that we collected from Plurk.
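The clustering coefficient described here is the ratio of actual to maximum links inside a community. A minimal sketch follows, assuming a simple undirected graph so that a community of `size` members has at most `size * (size - 1) / 2` links; the function and parameter names are hypothetical.

```python
def clustering_coefficient(members, links):
    """Ratio of actual to maximum possible links among community members.

    members: iterable of user identifiers in the community.
    links: iterable of (u, v) pairs; pairs touching non-members are ignored.
    """
    members = set(members)
    size = len(members)
    if size < 2:
        return 0.0  # no link is possible, so define the ratio as 0
    actual = {
        frozenset((u, v))            # undirected: (u, v) and (v, u) count once
        for u, v in links
        if u != v and u in members and v in members
    }
    maximum = size * (size - 1) / 2
    return len(actual) / maximum
```

For a four-member community in which three members are mutually connected and the fourth is isolated, this gives 3 / 6 = 0.5.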

After collecting data from the participants, we performed experiments and recorded users who were not direct friends of the sender but had recovered the token. We sorted the list by the number of times each user appeared in the community during the experiments; subsequently, we asked the participants to confirm whether those who appeared on the lists were members of that community. Most participants indicated that they could not accurately determine whether a user on the list belonged to any of their defined communities. However, among the data we collected, we were confident that two participants could correctly identify their Level 2 friends who belonged to one of their communities.

Because it is easier for a user to select community members manually when the community is small, we ignored communities smaller than nine in our subsequent experiments. Hence, our test cases were formed by 31 communities that contain 9 to 61 members.

We only considered the neighbors of participants in a community because the participants cannot accurately tag those who are not their neighbors in the social network. Therefore, in our subsequent experiments, the results involve only the neighbors of each participant.

5.2. Accuracy of the DPISP

A feature of the DPISP is that the secret-sharing parameter can be dynamically adjusted. That is, people may decide if they want more or fewer users to receive the token. The secret-sharing parameter indicates the number of shares a user has to send to the intermediate users, and means the number of shares a user has to receive to recover the message. We measured the results of the DPISP by calculating the precision and recall.

The parameters used in this study are defined as follows:
(i) True Positive (TP): users retrieved by the DPISP who are in the community defined by the participant.
(ii) False Positive (FP): users retrieved by the DPISP who are not in the community defined by the participant.
(iii) False Negative (FN): users who are in the community defined by the participant but were not retrieved by the DPISP.
(iv) Precision: fraction of users retrieved by the DPISP that belonged to the community tagged by the participant.
(v) Recall: fraction of users in the community tagged by the participant that were retrieved by the DPISP.
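In terms of these counts, the standard definitions are precision = TP / (TP + FP) and recall = TP / (TP + FN). A minimal sketch of the computation follows; the function name and the convention of returning 0 when a denominator is empty are assumptions made here.

```python
def precision_recall(retrieved, community):
    """Precision and recall of the users who recovered the token.

    retrieved: users who recovered the token in one round.
    community: users the participant tagged as community members.
    """
    retrieved, community = set(retrieved), set(community)
    tp = len(retrieved & community)  # retrieved and in the community
    fp = len(retrieved - community)  # retrieved but not in the community
    fn = len(community - retrieved)  # in the community but not retrieved
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For instance, if four users recover the token and three of them belong to a five-member community, the precision is 3 / 4 and the recall is 3 / 5.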

In our experiments, we divided the test cases based on their sizes () and clustering coefficients. For each test case, we tested them by setting to 3 to 8. For each , we produced possible intermediate user sets. Subsequently, we tested the results for ranging from 2 to for each intermediate user set.

An intermediate user set is the set of intermediate users selected by the source user. Owing to the high time cost, if the number of possible combinations of is smaller than or equal to 5000, then we tested all the possible intermediate user sets; if the number of possible combinations was larger than 5000, then we randomly selected 5000 possible intermediate user sets to test.
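The selection rule can be sketched as follows. This is an assumption-laden illustration: the function name, the parameter `m` (the number of intermediate users per set), and the choice to sample distinct sets without repetition are introduced here and are not stated in the text.

```python
import math
import random
from itertools import combinations


def intermediate_user_sets(candidates, m, limit=5000, rng=None):
    """Enumerate all m-subsets of the candidates, or sample `limit` of them.

    candidates: the source's friends inside the target community (sortable).
    m: number of intermediate users per set.
    """
    candidates = sorted(candidates)
    total = math.comb(len(candidates), m)
    if total <= limit:
        return list(combinations(candidates, m))  # test every possible set
    rng = rng or random.Random()
    chosen = set()
    while len(chosen) < limit:  # assumed: sampled sets are distinct
        chosen.add(tuple(sorted(rng.sample(candidates, m))))
    return sorted(chosen)
```

With 10 candidates and `m = 3` there are only 120 sets, so all are returned; with 30 candidates and `m = 5` there are 142506 sets, so 5000 are sampled.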

5.2.1. Relation between and Group Size

We present the result by dividing the test cases into three categories according to their sizes. Categories , , and contain the test cases with community sizes between 9 and 16, between 17 and 32, and larger than 32, respectively. The clustering coefficients of these categories are 0.5878, 0.5126, and 0.5394, respectively.

Figures 6 and 7 show the average precision and recall of each category: in all three categories, when is fixed, the precision decreases and the recall increases as increases. This is because if more intermediate users exist, more people will receive the shares, thereby increasing the probability that people recover the token, whether or not they are community members.

According to these data, while increases, recall decreases quickly. For example, in category , recall is 0.7278 for and decreases to 0.1646 for when . This is because users must receive more shares to recover the token; hence, the number of users who can recover the token decreases. Consequently, we recommend that users select a small . Meanwhile, the precision decreases and the recall increases slightly when increases. The precision is 0.73 for and decreases to 0.7008 for when ; the recall increases from 0.57 to 0.66 for the same pairs. This implies that users do not have to select a large to obtain the best result. Instead, they can select a smaller and the result will still be acceptable.

Additionally, we observed that the average precision increased with the community size. In our opinion, this was caused by an overlap in these communities. For instance, a user’s “good friend” group might be a subset of his/her “classmates” group. Figure 8 illustrates an example of overlapping communities. Suppose a user wishes to send a message to his/her “good friends” and therefore sends shares to the intermediate users among them. However, some of his/her “classmates” can recover the token because the network is densely connected, thereby reducing the precision.

5.2.2. Relation between and the Clustering Coefficient

We divided the test cases into four categories according to their , where category contains test cases with , category contains test cases with , category contains test cases with , and category contains test cases with .

Figure 9 shows the average recalls of the four categories. We observed that the recalls reduced quickly while increased for communities with but decreased slightly while increased for communities with .

Even though users may not know the clustering coefficient of their desired communities in advance, they can estimate whether a community is densely or loosely connected. For example, if a user wishes to send a message to his/her laboratory, he/she can assume that the members of this community are familiar with each other; therefore, the community is highly clustered, and he/she can set the threshold to 4 or 5 to minimize the chances of outliers recovering the token. Meanwhile, if the user wishes to send a message to his/her friends in his/her department, he/she should set the threshold to 2 or 3 to maximize the chances that the members of the department recover the token.
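The effect of the threshold on a densely connected community can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions that are ours, not a description of the full protocol: every intermediate user holds one share and forwards one distinct share to each of his/her friends, and a user recovers the token once he/she holds at least `threshold` shares. The graph, the names, and the function are hypothetical.

```python
def users_who_recover(friends, intermediates, threshold):
    """Return the set of users holding at least `threshold` shares.

    friends: dict mapping each user to the set of his/her friends
    intermediates: users chosen by the source to relay shares
    """
    shares = {}
    for u in intermediates:
        # Assumption: an intermediate user keeps his/her own share ...
        shares[u] = shares.get(u, 0) + 1
        # ... and forwards one share to every friend.
        for v in friends[u]:
            shares[v] = shares.get(v, 0) + 1
    return {v for v, count in shares.items() if count >= threshold}


# A tightly knit "laboratory" (a, b, c, d) and one outlier x who only knows a.
friends = {
    "a": {"b", "c", "d", "x"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "x": {"a"},
}
intermediates = ["a", "b", "c"]

for k in (1, 2, 3, 4):
    print(k, sorted(users_who_recover(friends, intermediates, k)))
```

In this toy graph a threshold of 1 lets the outlier recover the token, a threshold of 2 or 3 limits recovery to the laboratory members, and a threshold of 4 is too strict for anyone, which mirrors the trade-off described above.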

5.2.3. Token Transmitted during DPISP

Each source or intermediate user must transmit as many tokens as he/she has friends. Figure 10 shows the average number of tokens transmitted during the protocol; the total number of tokens transmitted increases with the number of intermediate users.

For a given number of intermediate users, the number of tokens transmitted should be the same regardless of the threshold. However, as shown by the results, the number of tokens differs when the threshold changes. This is because not every round in the experiments ends with a user recovering the token. Occasionally, no user can recover the token because none has received enough shares. We only counted the rounds in which at least one user recovered the message.

5.3. Success Rate of the DPISP

Transmitting tokens through different intermediate user sets causes different groups of users to receive the tokens. While some intermediate user sets yield good results, occasionally no user can recover the information sent by the source user.

Herein, we present the success rate of the DPISP. For each test case, we measured the results for a maximum of 5000 rounds. We considered a test round successful if one or more users could recover the token. The results shown in Section 5.2 only incorporated the successful rounds.

We measured the success rate of our protocol by the following formula: where indicates the average success rate tested in for all test cases, indicates the number of successful rounds tested in for the test case, and indicates the number of total rounds tested in for the test case.
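One reading of these definitions can be sketched in code. We assume here that the average success rate is the mean, over all test cases, of the ratio of successful rounds to total rounds for a given parameter setting; the original formula may aggregate differently (for example, by dividing the summed successes by the summed totals). The function name and the sample counts are hypothetical.

```python
def average_success_rate(rounds_per_case):
    """Mean per-test-case success ratio.

    rounds_per_case: list of (successful_rounds, total_rounds) pairs,
    one pair per test case, all measured under the same parameter setting.
    """
    rates = [s / t for s, t in rounds_per_case if t > 0]
    return sum(rates) / len(rates) if rates else 0.0


# Illustrative counts only (not experimental data); 5000 is the round cap
# mentioned in the text.
cases = [(4500, 5000), (5000, 5000), (2500, 5000)]
rate = average_success_rate(cases)
print(f"average success rate: {rate:.3f}")
```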

Figure 11 depicts the success rates, average number of failure rounds, and average number of total rounds of each calculated by the test cases. Although the success rate decreases when increases, it is near when is small, which means that even if a user selects intermediate users randomly, the information can still be propagated to someone.

5.4. Choosing Intermediate Users

In the DPISP, a user selects intermediate users from those who belong to the target community. If, unfortunately, he/she selects “bad” intermediate users (i.e., users who have only a few links to the community members), the precision or the recall will be low.

To help users find “good” intermediate users, a user can send a link to all of his/her friends and ask them to register on that link to prove in advance that sufficient connections exist between them. Hence, a user first generates shares from the link and distributes these shares individually to each of his/her friends. Anyone who receives the shares recovers the link by applying the reconstruction method of secret sharing. Subsequently, he/she registers him/herself on that link.
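The share generation and reconstruction steps mentioned above can be sketched with a textbook version of Shamir's secret sharing over a prime field. This is an illustration only, not the SecretSharp-based implementation used in the experiments; the prime, the example link, and the function names are assumptions. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import secrets

# Assumption: a Mersenne prime large enough to hold a short link (up to 65 bytes).
PRIME = 2**521 - 1


def make_shares(secret, threshold, count):
    """Split `secret` (an int below PRIME) into `count` shares."""
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, count + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover(shares):
    """Lagrange interpolation at x = 0 from at least `threshold` shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


link = b"https://example.org/register"  # hypothetical registration link
secret = int.from_bytes(link, "big")
shares = make_shares(secret, threshold=3, count=5)

# Any three shares are enough to rebuild the link.
rebuilt = recover(shares[1:4]).to_bytes(len(link), "big")
print(rebuilt == link)
```

A friend who collects enough shares from mutual connections can rebuild the link and register, which is the evidence of sufficient connectivity that the source is looking for.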

Figure 12 shows the results of selecting those who have many connections with the community members. As shown in previous data, setting yields a low recall and only a few people can recover the private information. The experimental result shows an example of selecting good intermediate users. If a user selects 6 intermediate users who are tightly connected with the community members, even if he/she sets , the recall will be approximately 0.9. Conversely, if he/she chooses intermediate users randomly, the recall will only be 0.6. According to previous results, the precision increases with ; therefore, selecting good intermediate users yields better precisions and recalls.

5.5. Computation Cost of the Recovery Phase

In this section, we discuss the cost of the DPISP’s recovery phase. At first glance, it seems that a user must spend a large amount of time to reconstruct the secrets. In theory, a user must perform times of to construct all possible secrets, where is the number of tokens with different pairs.

To examine the efficiency of the DPISP, we analyzed the number of a user has to perform with respect to and the number of users who have sent secrets. Furthermore, we analyzed the time required by . Simulations were performed on a PC with a 4-core 3.2 GHz Intel CPU and 4 GB of RAM. We implemented the secret-sharing functions using the C# SecretSharp library [38] on Microsoft Visual Studio 2010.

In our experiments, we simulated users simultaneously and sent their secrets with the settings in the ranges of to and to . In each case, a user can receive a maximum of tokens decentralized in bins, and each bin contains a maximum of tokens. Therefore, a user has to perform a maximum of secret sharing to recover all the secrets. Table 4 shows the results of the simulations, with . The size of the coefficients of a polynomial is 1024 bits. In other words, the maximum secret size is 128 bytes. As shown by the results, a user can perform times of per second. If a user sets , 448,000 possible combinations exist, which requires slightly more than one minute to recover all the secrets.

6. Discussion

In this section, we discuss the causes of the inaccuracy of the DPISP and a method to improve the efficiency of the DPISP recovery phase. Additionally, we discuss the method of sharing large data.

6.1. Improving DPISP Accuracy

We discuss possible reasons for the inaccuracy of the protocol in this section.

First, inaccuracy can be caused by users who do not publish their friendship connections. Many social networks allow users to set whether their information can be accessed by other people. If an intermediate user did not share his/her connections during the experiments, the share sent to him/her could not be transmitted further, thereby decreasing the probability that users connected to that intermediate user recover the token.

Next, inaccuracy may be caused by robots. Many “robots” exist on Plurk. These robots were developed to broadcast messages automatically or to act as an “oracle” that answers users’ questions. Almost all users on Plurk have connections with the default account “plurk buddy” and many other robots. Therefore, although these robots are not in user-defined communities, they have a high chance of recovering the token.

This problem can be mitigated by creating a list of ignored users. When we execute the protocol, we can ignore users who are not normal users.

6.2. Sharing Large Data

With Shamir’s secret sharing, the maximum size of a message is restricted by the coefficient size of the polynomial. For example, if the coefficient size is 1024 bits, the maximum message size is 128 bytes. If the private information is larger than 128 bytes, the user has to partition the message into 128-byte blocks, and a receiver has to spend more time recovering the private information.
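The partitioning step can be sketched as follows. This is a minimal illustration, assuming each block is converted to an integer that fits in a 1024-bit coefficient and that the block length is kept alongside it so that leading zero bytes survive the round trip; the helper names are hypothetical. In a real field the integer must also be smaller than the chosen prime.

```python
BLOCK_BYTES = 128  # 1024-bit coefficient size, as in the example above


def to_blocks(message):
    """Split a byte string into (length, integer) pairs of at most 128 bytes."""
    blocks = []
    for i in range(0, len(message), BLOCK_BYTES):
        chunk = message[i:i + BLOCK_BYTES]
        blocks.append((len(chunk), int.from_bytes(chunk, "big")))
    return blocks


def from_blocks(blocks):
    """Reassemble the original byte string from (length, integer) pairs."""
    return b"".join(value.to_bytes(length, "big") for length, value in blocks)


message = b"x" * 300  # a 300-byte message needs three blocks
blocks = to_blocks(message)
print(len(blocks), from_blocks(blocks) == message)
```

Each integer would then be shared separately, so the recovery work grows with the number of blocks.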

The constant term of the polynomial is too small to hold large data such as files and photographs. Researchers have proposed several multi-secret-sharing schemes [39–42]. For example, Yang et al. [39] proposed a scheme that shares multiple secrets instead of one secret in a polynomial; however, the threshold of that polynomial is extremely high according to their experiments. The DPISP performs well only when the threshold is small. Hence, it is difficult to apply these algorithms unless a user selects a large threshold.

Instead of sharing data directly, a user can encrypt the data with a session key and then share the key and a path to the data with his/her friends using the DPISP.

7. Conclusion

In this paper, we present DPISP, an information sharing protocol for social networks. On decentralized social networks or on SNS like Facebook, where users cannot directly access other people’s contextual information, our method provides a more realistic way to implement group communication functions using naturally organized communities. We also demonstrate that our method protects users’ link privacy. In addition, DPISP runs without using any keys or passwords, so it adapts to changes in the network. A user does not have to redistribute keys to all of his/her friends when he/she adds or removes friends.

By tuning the parameters , information can be sent to different subsets of community members. Our results show that among the users who can recover secrets, about to belong to the target communities; about to of a community can recover the secret correctly.

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by national funding from Fundação para a Ciência e a Tecnologia (FCT) through the UID/EEA/50008/2019 Project and by the Brazilian National Council for Research and Development (CNPq) via Grant no. 309335/2017-5.