Deep-profiling: a deep neural network model for scholarly Web user profiling

Cluster Computing

Abstract

Scholarly big data refers to the rapidly growing body of scholarly information, including large numbers of authors and papers and massive scholarly networks. Extracting profile attributes for Web users is an important step in Web user analysis. For scholarly Web users, profile attribute extraction must integrate multi-source, heterogeneous information resources. However, traditional extraction models have two main drawbacks: (1) they require manual feature selection based on domain-specific knowledge; and (2) they cannot adapt to the diversity of scholarly Web pages, nor can they discover relationships between target entities that lie far apart in different domains. To address these issues, we propose PAE-NN, a profile attribute extraction model based on a Bi-LSTM-CRF neural network. Trained end to end, the model uses a recurrent neural network to automatically extract the features and contextual representation of each entity to be extracted. It exploits the long-term memory of the LSTM network to capture long-range dependencies among those entities. Experimental results on public datasets from the SMPCUP2017 Open Academic Competition and AMiner show that, with large-scale training data, PAE-NN outperforms existing models in extraction precision, recall, and F1-score.
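
To make the role of the CRF layer in a Bi-LSTM-CRF tagger more concrete, the following is a minimal sketch of Viterbi decoding, the standard way such a layer selects the best tag sequence. It is not the authors' PAE-NN implementation: the tag set, the function name `viterbi_decode`, and every score below are hypothetical values chosen for illustration. In a real model the emission scores would come from the Bi-LSTM output and the transition scores would be learned.

```python
# Illustrative sketch only: tag names and all scores are made-up assumptions,
# not values or code from the PAE-NN paper.
from typing import List, Sequence

TAGS = ["O", "B-NAME", "B-EMAIL"]  # hypothetical profile-attribute tag set


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at position t (in a Bi-LSTM-CRF these
                        would be produced by the Bi-LSTM layer).
    transitions[i][j] : score of moving from tag i to tag j (the CRF part).
    """
    n_tags = len(transitions)
    # best[j] = best score of any path that ends in tag j at the current step
    best = list(emissions[0])
    backpointers: List[List[int]] = []

    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # Choose the previous tag that maximises the path score into tag j.
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)

    # Trace the best path backwards from the best final tag.
    path = [max(range(n_tags), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Three tokens, e.g. a name, a filler word, and an e-mail address.
emissions = [[0.1, 2.0, 0.0],
             [1.5, 0.2, 0.1],
             [0.0, 0.1, 1.8]]
# Hypothetical transitions that penalise repeating the same B- tag.
transitions = [[0.5, 0.0, 0.0],
               [0.3, -2.0, 0.0],
               [0.3, 0.0, -2.0]]

decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)  # ['B-NAME', 'O', 'B-EMAIL']
```

The transition matrix is what lets the tagger score a whole sequence jointly rather than each token in isolation, which is the usual motivation for placing a CRF on top of a Bi-LSTM.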

Acknowledgements

This work is supported by the National Natural Science Foundation of China (62072187, 61872084), the Guangzhou Science and Technology Program key projects (202007040002), and the Guangdong Major Project of Basic and Applied Basic Research (2019B030302002). James Z. Wang’s work is partially supported by NSF DBI grant #1759856 and NIH grant #2R01HD069374-06A1.

Author information

Corresponding author

Correspondence to Weiwei Lin.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

About this article

Cite this article

Lin, W., Xu, H., Li, J. et al. Deep-profiling: a deep neural network model for scholarly Web user profiling. Cluster Comput 26, 1753–1766 (2023). https://doi.org/10.1007/s10586-021-03315-2

  • DOI: https://doi.org/10.1007/s10586-021-03315-2
