Matching user identities across social networks with limited profile data

Nurgaliev, Ildar; Qu, Qiang; Bamakan, Seyed Mojtaba Hosseini; Muzammal, Muhammad

doi:10.1007/s11704-019-8235-9

Research Article
Published: 19 April 2020

Frontiers of Computer Science, Volume 14, article number 146809 (2020)

Ildar Nurgaliev^1,2, Qiang Qu^1, Seyed Mojtaba Hosseini Bamakan^1,3 & Muhammad Muzammal^1,4

Abstract

Privacy preservation is a primary concern in social networks, which employ a variety of privacy-preservation mechanisms to protect sensitive user information such as age, location, education, and interests. Matching user identities across different social networks is considered a challenging task. In this work, we propose an approach that reveals user identities as sets of linked accounts from different social networks using limited user profile data, i.e., user name and friendship. We propose a framework, ExpandUIL, that includes three standalone algorithms: (i) ExpandFullName, which is based on percolation graph matching, (ii) a supervised machine learning algorithm that works with graph embeddings, and (iii) ExpandUserLinkage, a combination of the two. The proposed framework is significant because (i) it is based on the network topology and requires only the name feature of the nodes, (ii) it needs very few initial seeds, with a single seed pair sufficing, (iii) it is iterative and scalable, with applicability to online incoming stream graphs, and (iv) its stability is demonstrated experimentally on a real ground-truth dataset. Experiments on real datasets from the Instagram and VK social networks show up to 75% recall for linked accounts with 96% accuracy using only one given seed pair.
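
The seed-based percolation idea mentioned in item (i) can be illustrated with a toy sketch. This is not the authors' ExpandFullName algorithm, whose details are not given in the abstract; it is a generic percolation graph matching loop with a name-similarity gate added as an assumption. The names `percolate` and `name_similarity`, the parameters `r` and `min_sim`, and the sample data are all hypothetical.

```python
from collections import defaultdict, deque
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; a stand-in for a real name matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def percolate(g1, g2, names1, names2, seeds, r=1, min_sim=0.8):
    """Grow a matching between two graphs from known seed pairs.

    g1, g2:   dict mapping node -> set of neighbouring nodes
    names1/2: dict mapping node -> display name
    seeds:    iterable of (node_in_g1, node_in_g2) pairs known to match
    r:        marks a candidate pair needs before it is accepted (assumed)
    min_sim:  minimum name similarity for a pair to receive marks (assumed)
    """
    matched1, matched2 = {}, {}
    marks = defaultdict(int)
    queue = deque()

    for u, v in seeds:
        matched1[u], matched2[v] = v, u
        queue.append((u, v))

    while queue:
        a, b = queue.popleft()
        # A matched pair gives one mark to each pair of its unmatched neighbours.
        for u in g1.get(a, ()):
            if u in matched1:
                continue
            for v in g2.get(b, ()):
                if v in matched2:
                    continue
                if name_similarity(names1[u], names2[v]) < min_sim:
                    continue
                marks[(u, v)] += 1
                if marks[(u, v)] >= r:
                    matched1[u], matched2[v] = v, u
                    queue.append((u, v))
                    break  # u is matched now; continue with the next neighbour of a
    return matched1


# Toy data: two small networks with the same shape and near-identical names.
g1 = {"a1": {"a2", "a3"}, "a2": {"a1", "a4"}, "a3": {"a1"}, "a4": {"a2"}}
g2 = {"b1": {"b2", "b3"}, "b2": {"b1", "b4"}, "b3": {"b1"}, "b4": {"b2"}}
names1 = {"a1": "Anna Ivanova", "a2": "Boris Petrov",
          "a3": "Clara Smirnova", "a4": "Dmitry Orlov"}
names2 = {"b1": "anna ivanova", "b2": "BORIS PETROV",
          "b3": "clara smirnova", "b4": "dmitry orlov"}

links = percolate(g1, g2, names1, names2, seeds=[("a1", "b1")])
print(sorted(links.items()))
```

With a single seed pair the acceptance threshold has to be `r=1`, since no candidate can collect more than one mark at the start; in this sketch the name gate is what keeps such a low threshold from accepting arbitrary neighbour pairs.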

Acknowledgements

We are especially grateful to Sadegh Nobari for his fruitful comments and inspiration, and also to the anonymous FCS reviewers for their constructive feedback and helpful discussions.

Author information

Authors and Affiliations

Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
Ildar Nurgaliev, Qiang Qu, Seyed Mojtaba Hosseini Bamakan & Muhammad Muzammal
Sberbank, Moscow, 121170, Russia
Ildar Nurgaliev
Department of Management, Yazd University, Yazd, 89195-741, Iran
Seyed Mojtaba Hosseini Bamakan
Department of Computer Science, Bahria University, Islamabad, 44000, Pakistan
Muhammad Muzammal

Corresponding author

Correspondence to Qiang Qu.

Additional information

Ildar Nurgaliev received the MSc degree in Data Science from Innopolis University, Tatarstan, Russia in 2017. Currently, he works in R&D at Sberbank in the field of natural language understanding and knowledge graphs. Previously, he was a research engineer at the Huawei Moscow research center and at Ozon.ru. In 2016, he was a student at CERN OpenLab in Geneva, Switzerland. His current research interests include natural language understanding, image enhancement, and blockchain.

Qiang Qu is an associate professor at Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (CAS), and the director of Guangdong Provincial R&D Center of Blockchain and Distributed IoT Security, China. He is a candidate for the CAS Pioneer Hundred Talents Program. He received an MSc degree in computer science from Peking University, China and a PhD degree from Aarhus University, Denmark. His current research interests are in data-intensive applications and systems, focusing on efficient and scalable algorithm design, blockchain, data sense-making, and mobility intelligence. His recent research has been published in leading journals and international conferences, including ACM SIGMOD, VLDB, AAAI, the IEEE Transactions on Data Engineering, the IEEE Transactions on Intelligent Transportation Systems, and Information Sciences. He was a TPC member of several prestigious conferences, and he chaired workshops on mobility analysis at VLDB 2018, VLDB 2017, ICDM 2015, and APWEB-WAIM 2017.

Seyed Mojtaba Hosseini Bamakan is an assistant professor at the Department of Management, Yazd University, Iran, and a postdoctoral researcher at Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS), China. He received his PhD degree in Data Science from the University of Chinese Academy of Sciences (UCAS), China in 2017, and his master’s degree in IT management from Allameh Tabataba’i University (ATU), Iran in 2009. His current research interests include business intelligence, data mining, and intelligent optimization techniques.

Muhammad Muzammal is an associate professor at the Department of Computer Science, Bahria University, Pakistan, a visiting associate professor under the CAS President’s International Fellowship Initiative (PIFI) at the Centre of Big Mobile Intelligence, Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS), Shenzhen, and a Vice Director of Guangdong Provincial R&D Centre of Blockchain and Distributed IoT Security, Guangdong, China. He received a PhD degree from the University of Leicester, UK in 2012. Before that, he was a software analyst at LMKR. He received his master’s and bachelor’s degrees from FAST-NU, Pakistan, and IIUI, Pakistan, in 2007 and 2005, respectively. His research interests are in large-scale data mining, including algorithm design and mobility data mining. Recently, he has become interested in blockchain technology, with a focus on decentralized systems and mining.

Electronic supplementary material