Leveraging citation influences for Modeling scientific documents

World Wide Web

Abstract

This paper studies a link-text algorithm that models scientific documents through citation influences and applies it to document clustering and influence prediction. Most existing link-text algorithms ignore the fact that the documents cited by a paper influence it to different degrees. Yet these citation influences reveal the latent structure of a citation network, which describes knowledge flow more accurately than the raw citation structure does. In this study, a citation influence is modeled as a weight in a linear combination that approximates the text of a document by the content of the documents it cites. We then present a novel matrix factorization algorithm, called Citation-Influences-Text Nonnegative Matrix Factorization (CIT-NMF), which incorporates text and citations to obtain better document representations by learning the influence weights. In addition, an efficient method is derived to solve the resulting optimization problem. Experimental results on several real datasets show improvements over the baseline models.
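The linear-combination idea in the abstract can be illustrated with a small sketch. This is not the CIT-NMF algorithm itself, whose objective and update rules are not given on this page. It only shows, under stated assumptions, how nonnegative influence weights could be fitted for a single citing document. The function name `influence_weights`, the toy term vectors, and the choice of a Lee-Seung style multiplicative update are all assumptions made for illustration.

```python
import numpy as np


def influence_weights(doc, cited, n_iter=500, eps=1e-12):
    """Fit nonnegative weights w so that cited @ w approximates doc.

    doc:   (n_terms,) nonnegative term vector of the citing document
    cited: (n_terms, n_cited) nonnegative term vectors of the cited documents

    Hypothetical helper: it minimizes ||doc - cited @ w||^2 with a
    multiplicative update, which keeps w nonnegative. The paper's own
    optimizer may differ.
    """
    doc = np.asarray(doc, dtype=float)
    cited = np.asarray(cited, dtype=float)
    n_cited = cited.shape[1]
    w = np.full(n_cited, 1.0 / n_cited)  # uniform starting weights
    ctd = cited.T @ doc                  # constant numerator term
    ctc = cited.T @ cited                # Gram matrix of the cited documents
    for _ in range(n_iter):
        w *= ctd / (ctc @ w + eps)       # multiplicative update step
    return w


# Toy data (assumed): three terms, two cited documents.
cited = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
# The citing document is built as an exact 0.8 / 0.2 mix of its citations.
doc = 0.8 * cited[:, 0] + 0.2 * cited[:, 1]
weights = influence_weights(doc, cited)
```

On this toy input the fitted weights come out close to 0.8 and 0.2, so the first cited document would be read as the stronger influence. In a real setting the term vectors would typically be TF-IDF or similar nonnegative features, and the weights would be learned jointly with the factorization rather than one document at a time.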

Notes

  1. https://people.cs.umass.edu/mccallum/data.html

  2. https://linqs.soe.ucsc.edu/data

  3. https://www.aminer.cn/heterinf

Acknowledgments

This work is supported by the National Natural Science Foundation of China (61672128).

Author information

Corresponding author

Correspondence to Yu Liu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Qian, Y., Liu, Y., Xu, X. et al. Leveraging citation influences for Modeling scientific documents. World Wide Web 23, 2281–2302 (2020). https://doi.org/10.1007/s11280-020-00796-w

  • DOI: https://doi.org/10.1007/s11280-020-00796-w
