
Identifying Structural Holes for Sentiment Classification

Information Systems Frontiers

Abstract

The prevalence of online user-generated content has attracted great interest in textual sentiment analysis, which provides a low-cost yet effective way to understand consumers and markets. A mainstream approach to sentiment analysis is to construct a classification model on Bag-of-Words (BoW) features, but the large vocabulary and the skewed distribution of term frequencies pose persistent challenges, made worse by the scarcity of valid sentiment labels. In light of this, we propose a novel method called the Structural Holes based Sentiment Classifier (SHSC) for BoW-based sentiment classification. The key to SHSC is to reinforce the classification contribution of semantically rich words with clear-cut sentiment polarity. To this end, a word co-occurrence network is constructed to represent both high- and low-frequency words. The task of finding classification-inefficient words is then transformed into the identification of so-called bridge nodes that occupy the positions of structural holes in the network. Two measures, information advantage rank and control advantage weight, are designed for this purpose; they are based on the proposed sentiment-label propagation and short-path computation algorithms, respectively. Finally, SHSC feeds this information as key regularizers into a simple regression model to guide parameter learning. Extensive experiments on real-world text datasets demonstrate the advantage of SHSC over competitive benchmarks, particularly when sentiment labels are scarce. The effectiveness of uncovering structural holes for sentiment classification is further verified with robustness checks and demonstration cases.
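To make the idea of a bridge node concrete, the following is a minimal sketch that scores nodes of a small weighted word network with Burt's constraint, a standard measure from structural-holes theory in which a lower value indicates a node that spans otherwise disconnected groups. This is a swapped-in illustration, not the paper's method: the abstract does not define the information advantage rank or the control advantage weight, and the function name `burt_constraint`, the toy words, and the edge weights are all assumptions.

```python
from collections import defaultdict


def burt_constraint(edge_weights):
    """Return Burt's constraint for every node of an undirected weighted graph.

    edge_weights maps (u, v) pairs to a positive weight. The toy input used
    below is hypothetical and not taken from the paper.
    """
    adj = defaultdict(dict)
    for (u, v), w in edge_weights.items():
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    adj = dict(adj)

    def p(i, j):
        # Share of node i's total tie strength that is invested in node j.
        total = sum(adj[i].values())
        return adj[i].get(j, 0.0) / total if total else 0.0

    scores = {}
    for i in adj:
        total = 0.0
        for j in adj[i]:
            # Indirect investment in j through i's other neighbours q.
            indirect = sum(p(i, q) * p(q, j) for q in adj[i] if q != j)
            total += (p(i, j) + indirect) ** 2
        scores[i] = total
    return scores


# Two tightly knit groups of polar words joined only through "hotel".
edges = {
    ("good", "great"): 1, ("good", "nice"): 1, ("great", "nice"): 1,
    ("bad", "awful"): 1, ("bad", "poor"): 1, ("awful", "poor"): 1,
    ("hotel", "good"): 1, ("hotel", "bad"): 1,
}
scores = burt_constraint(edges)
bridge = min(scores, key=scores.get)  # least constrained node
```

In this toy network the neutral word that links the positive and negative clusters receives the lowest constraint, which mirrors the intuition that bridge nodes carry little polarity of their own.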

Notes

  1. In the experiment, we set the window size to 2. That is, if two words are adjacent to each other, their co-occurrence count is increased by one.

  2. COAE2013, https://download.csdn.net/download/u013906860/9776509.

  3. Hotel, https://download.csdn.net/download/lssc4205/9903298.

  4. NLPCC2014, http://tcci.ccf.org.cn/conference/2014/pages/page04_sam.html.

  5. The ratio ranges from 10% to 80%. In the subsequent subsections, the results are at a ratio of 40% unless otherwise stated.

  6. Gensim, https://radimrehurek.com/gensim/.

  7. HowNet, http://www.keenage.com/html/c_index.html.

  8. SWOL, http://ir.dlut.edu.cn/news/detail/215.

  9. NTUSD, http://nlg18.csie.ntu.edu.tw:8080/opinion/pub1.html.

  10. Jieba, https://github.com/fxsjy/jieba.
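The window rule in Note 1 can be sketched as follows: with a window size of 2, only adjacent tokens form a pair, and each adjacency adds one to that pair's count. This is a minimal sketch under stated assumptions: the input is already tokenized (for Chinese text a segmenter such as Jieba would be applied first), pairs are treated as unordered, a word repeated next to itself is skipped, and the function name `cooccurrence_counts` and the sample tokens are hypothetical.

```python
from collections import Counter


def cooccurrence_counts(documents, window=2):
    """Count unordered word pairs that fall inside a sliding window.

    documents is an iterable of token lists. With window=2 only adjacent
    tokens are paired, as described in Note 1.
    """
    counts = Counter()
    for tokens in documents:
        for i, left in enumerate(tokens):
            # The window covers the current token plus the next window-1 tokens.
            for right in tokens[i + 1:i + window]:
                if left != right:  # assumption: ignore self-pairs
                    counts[tuple(sorted((left, right)))] += 1
    return counts


# Hypothetical pre-tokenized reviews.
docs = [
    ["room", "very", "clean"],
    ["room", "very", "noisy"],
]
pairs = cooccurrence_counts(docs)
```

The resulting counts can serve as edge weights of a word co-occurrence network, with each distinct word as a node.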

Author information

Corresponding author

Correspondence to Guannan Liu.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Dr. Guannan Liu was supported by the National Natural Science Foundation of China (NSFC) (71701007, 92046025). Dr. Junjie Wu was supported by the National Natural Science Foundation of China (NSFC) (71725002, 72031001, 72021001). Dr. Zheng Xie was supported by the Science and Technology Innovation 2030 Major Project of China (2020AAA0108405, 2020AAA0108400).

About this article

Cite this article

Xie, Z., Liu, G., Qu, J. et al. Identifying Structural Holes for Sentiment Classification. Inf Syst Front 24, 1735–1751 (2022). https://doi.org/10.1007/s10796-021-10185-x

  • DOI: https://doi.org/10.1007/s10796-021-10185-x
