Deep-profiling: a deep neural network model for scholarly Web user profiling

Lin, Weiwei; Xu, Haojun; Li, Jianzhuo; Wu, Ziming; Hu, Zhengyang; Chang, Victor; Wang, James Z.

doi:10.1007/s10586-021-03315-2

Published: 09 June 2021

Cluster Computing, Volume 26, pages 1753–1766 (2023)

Weiwei Lin¹,² (ORCID: orcid.org/0000-0001-6876-1795),
Haojun Xu²,
Jianzhuo Li¹,
Ziming Wu¹,
Zhengyang Hu¹,
Victor Chang³ &
James Z. Wang⁴

Abstract

Scholarly big data refers to the rapidly growing body of scholarly information, including large numbers of authors and papers and massive-scale scholarly networks. Extracting profile attributes for Web users is an important step in Web user analysis. For scholarly Web users, profile attribute extraction must integrate multi-source, heterogeneous information resources. However, traditional extraction models have two main drawbacks: (1) they require manual feature selection based on domain-specific knowledge; (2) they cannot adapt to the diversity of scholarly Web pages and cannot discover relationships between target entities that lie far apart in different domains. To address these issues, we propose a profile attribute extraction model, PAE-NN, based on a Bi-LSTM-CRF neural network. The model automatically extracts the features and contextual representation of each target entity through a recurrent neural network trained end to end. It exploits the long-term memory of the LSTM network to capture long-range dependencies between the entities to be extracted. Experimental results on published datasets from the SMPCUP2017 Open Academic Competition and AMiner show that, with large-scale training data, the proposed PAE-NN model outperforms existing models in extraction precision, recall, and F1-score.
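
To illustrate what the CRF layer of a Bi-LSTM-CRF tagger contributes, the following is a minimal sketch of Viterbi decoding over BIO tags. It is not the authors' PAE-NN implementation: the tag set, the transition scores, and the per-token emission scores (which in a real model would come from the Bi-LSTM) are all hand-picked, hypothetical values chosen only to show how transition scores can repair a tag sequence that a per-token argmax gets wrong.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO tag set for a single profile attribute (affiliation).
TAGS = ["O", "B-ORG", "I-ORG"]

# Hypothetical CRF transition scores; pairs not listed score 0.0.
# "START" is a virtual tag placed before the first token.
TRANSITIONS: Dict[Tuple[str, str], float] = {
    ("START", "I-ORG"): -10.0,  # an entity cannot begin with an inside tag
    ("O", "I-ORG"): -10.0,
    ("B-ORG", "I-ORG"): 1.0,
    ("I-ORG", "I-ORG"): 0.5,
}

TOKENS = ["works", "at", "Clemson", "University"]

# Hand-picked stand-ins for the per-token scores a Bi-LSTM would emit.
EMISSIONS: List[Dict[str, float]] = [
    {"O": 2.0, "B-ORG": 0.1, "I-ORG": 0.0},
    {"O": 2.0, "B-ORG": 0.0, "I-ORG": 0.1},
    {"O": 1.0, "B-ORG": 0.9, "I-ORG": 1.2},
    {"O": 0.5, "B-ORG": 0.3, "I-ORG": 2.0},
]


def viterbi(emissions: List[Dict[str, float]]) -> List[str]:
    """Return the tag sequence with the highest emission + transition score."""
    # best[t][tag] is the score of the best path ending in `tag` at position t.
    best = [{t: TRANSITIONS.get(("START", t), 0.0) + emissions[0][t] for t in TAGS}]
    back: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + TRANSITIONS.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + TRANSITIONS.get((prev, cur), 0.0) + emit[cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(TAGS, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (label, text) spans from a BIO sequence, ignoring orphan I- tags."""
    spans: List[Tuple[str, str]] = []
    label, current = "", []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((label, " ".join(current)))
            label, current = tag[2:], [token]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                spans.append((label, " ".join(current)))
            current = []
    if current:
        spans.append((label, " ".join(current)))
    return spans


greedy_tags = [max(TAGS, key=e.get) for e in EMISSIONS]  # per-token argmax, no CRF
crf_tags = viterbi(EMISSIONS)
print("greedy:", greedy_tags, extract_spans(TOKENS, greedy_tags))
print("viterbi:", crf_tags, extract_spans(TOKENS, crf_tags))
```

With these toy scores, the per-token argmax tags "Clemson" as I-ORG directly after an O tag, so no valid span is recovered, whereas Viterbi decoding pays the transition penalty into account and tags "Clemson University" as a single ORG span. In a trained model both the emission and transition scores would be learned jointly rather than set by hand.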

Acknowledgements

This work is supported by the National Natural Science Foundation of China (62072187, 61872084), Guangzhou Science and Technology Program key projects (202007040002), and the Guangdong Major Project of Basic and Applied Basic Research (2019B030302002). James Z. Wang’s work is partially supported by NSF DBI grant #1759856 and NIH grant #2R01HD069374-06A1.

Author information

Authors and Affiliations

School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Weiwei Lin, Jianzhuo Li, Ziming Wu & Zhengyang Hu
School of Software Engineering, South China University of Technology, Guangzhou, China
Weiwei Lin & Haojun Xu
School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK
Victor Chang
School of Computing, Clemson University, Clemson, SC, USA
James Z. Wang

Corresponding author

Correspondence to Weiwei Lin.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lin, W., Xu, H., Li, J. et al. Deep-profiling: a deep neural network model for scholarly Web user profiling. Cluster Comput 26, 1753–1766 (2023). https://doi.org/10.1007/s10586-021-03315-2

Received: 23 April 2020
Revised: 23 May 2021
Accepted: 27 May 2021
Published: 09 June 2021
Issue Date: June 2023
DOI: https://doi.org/10.1007/s10586-021-03315-2
