ORCA: Outlier detection and Robust Clustering for Attributed graphs

Eswar, Srinivas; Kannan, Ramakrishnan; Vuduc, Richard; Park, Haesun

doi:10.1007/s10898-021-01024-z

Published: 03 May 2021

Journal of Global Optimization, Volume 81, pages 967–989 (2021)

Srinivas Eswar ORCID: orcid.org/0000-0002-3418-7796¹,
Ramakrishnan Kannan²,
Richard Vuduc¹ &
Haesun Park¹

Abstract

We propose a framework to simultaneously cluster objects and detect anomalies in attributed graph data. Our objective function, together with carefully constructed constraints, promotes interpretability of both the clustering and anomaly detection components, as well as scalability of our method. Within this framework, we develop an algorithm called Outlier detection and Robust Clustering for Attributed graphs (ORCA). ORCA is fast and convergent under mild conditions, produces high-quality clustering results, and discovers anomalies that can be mapped back naturally to the features of the input data. The efficacy and efficiency of ORCA are demonstrated on real-world datasets against multiple state-of-the-art techniques.

Notes

http://www.patentsview.org.
https://github.com/smallk/.
https://aminer.org/
https://www.cs.cmu.edu/~enron/
Our code and datasets are publicly available at https://gitlab.com/seswar3/orca.

Acknowledgements

This material is based in part upon work supported by the U.S. National Science Foundation (NSF) under Grant Nos. OAC-1642410, CCF-1533768, and OAC-1710371. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF or DOE.

Author information

Authors and Affiliations

School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30308, USA
Srinivas Eswar, Richard Vuduc & Haesun Park
Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37830, USA
Ramakrishnan Kannan

Corresponding author

Correspondence to Srinivas Eswar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan)

About this article

Cite this article

Eswar, S., Kannan, R., Vuduc, R. et al. ORCA: Outlier detection and Robust Clustering for Attributed graphs. J Glob Optim 81, 967–989 (2021). https://doi.org/10.1007/s10898-021-01024-z

Received: 04 September 2020
Accepted: 03 April 2021
Published: 03 May 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s10898-021-01024-z
