Leveraging citation influences for Modeling scientific documents

Qian, Yue; Liu, Yu; Xu, Xiujuan; Sheng, Quan Z.

doi:10.1007/s11280-020-00796-w

Published: 11 March 2020

World Wide Web, Volume 23, pages 2281–2302 (2020)

Yue Qian ORCID: orcid.org/0000-0002-4020-0766¹,
Yu Liu¹,
Xiujuan Xu¹ &
Quan Z. Sheng²

287 Accesses
4 Citations

Abstract

This paper studies a link-text algorithm that models scientific documents through citation influences and applies it to document clustering and influence prediction. Most existing link-text algorithms ignore the fact that cited documents influence the citing document to different degrees. Yet citation influences reveal the latent structure of a citation network, which describes knowledge flow more accurately than the original citation structure. In this study, a citation influence is modeled as a weight in a linear combination that approximates the text of a document by the content of the documents it cites. We then present a novel matrix factorization algorithm, called Citation-Influences-Text Nonnegative Matrix Factorization (CIT-NMF), which incorporates text and citations and learns the influence weights to obtain better document representations. In addition, an efficient method is derived to solve the resulting optimization problem. Experimental results on several real datasets show satisfactory improvements over the baseline models.
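
The idea of an influence weight as a coefficient in a linear combination can be illustrated with a small sketch. This is not the CIT-NMF algorithm from the paper, whose objective and update rules are not given in the abstract. It only shows, under stated assumptions, how nonnegative weights could be fitted so that a citing document's term vector is approximated by the term vectors of the documents it cites. The function name `influence_weights`, the toy term counts, and the choice of a multiplicative-update solver for nonnegative least squares are all assumptions made for illustration.

```python
import numpy as np

def influence_weights(doc, cited, n_iter=500, eps=1e-12):
    """Fit nonnegative weights w that minimize ||doc - cited @ w||^2.

    doc:   (n_terms,) nonnegative term vector of the citing document
    cited: (n_terms, n_cited) nonnegative term vectors, one column per cited document
    Hypothetical helper for illustration; not the method from the paper.
    """
    doc = np.asarray(doc, dtype=float)
    cited = np.asarray(cited, dtype=float)
    # Start from uniform weights so every cited document can contribute.
    w = np.full(cited.shape[1], 1.0 / cited.shape[1])
    ctx = cited.T @ doc      # correlation of each cited document with the citing one
    ctc = cited.T @ cited    # Gram matrix of the cited documents
    for _ in range(n_iter):
        # Multiplicative update: keeps w nonnegative because all factors are nonnegative.
        w *= ctx / (ctc @ w + eps)
    return w

# Toy data (assumed): two cited documents over a four-term vocabulary.
cited = np.array([[2.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 3.0],
                  [0.0, 1.0]])
# The citing document is built as 80% of the first and 20% of the second.
doc = 0.8 * cited[:, 0] + 0.2 * cited[:, 1]

weights = influence_weights(doc, cited)
print(np.round(weights, 3))
```

On this toy input the fitted weights come out close to 0.8 and 0.2, so the first cited document is identified as the stronger influence. A full model in the spirit of the abstract would learn such weights jointly with the document representations rather than one document at a time.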

Acknowledgments

This work is supported by the National Natural Science Foundation of China (61672128).

Author information

Authors and Affiliations

School of Software, Dalian University of Technology, Dalian, China
Yue Qian, Yu Liu & Xiujuan Xu
Department of Computing, Macquarie University, Macquarie Park, NSW, 2109, Australia
Quan Z. Sheng

Corresponding author

Correspondence to Yu Liu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Qian, Y., Liu, Y., Xu, X. et al. Leveraging citation influences for Modeling scientific documents. World Wide Web 23, 2281–2302 (2020). https://doi.org/10.1007/s11280-020-00796-w

Received: 29 March 2019
Revised: 04 January 2020
Accepted: 04 February 2020
Published: 11 March 2020
Issue Date: July 2020
DOI: https://doi.org/10.1007/s11280-020-00796-w
