Investigating Gender Bias in BERT


Abstract

In this work, we analyze the gender bias induced by BERT in downstream tasks, and we propose solutions to reduce it. Contextual language models (CLMs) have pushed NLP benchmarks to new heights, and it has become the norm to utilize CLM-provided word embeddings in downstream tasks such as text classification. However, unless addressed, CLMs are prone to learning the intrinsic gender bias present in their training data. As a result, the predictions of downstream NLP models can vary noticeably when gender words are varied, e.g., when “he” is replaced with “she”, or even when gender-neutral words are changed. In this paper, we focus our analysis on a popular CLM, i.e., \(\text{BERT}\). We analyze the gender bias it induces in five downstream tasks related to emotion and sentiment intensity prediction. For each task, we train a simple regressor on \(\text{BERT}\)’s word embeddings and then evaluate the regressor’s gender bias using an equity evaluation corpus. Ideally, by design, the models should discard gender-informative features from the input. However, the results show a significant dependence of the system’s predictions on gender-particular words and phrases. We claim that such biases can be reduced by removing gender-specific features from the word embeddings. Hence, for each layer in BERT, we identify directions that primarily encode gender information; the space spanned by such directions is referred to as the gender subspace in the semantic space of word embeddings. We propose an algorithm that finds fine-grained gender directions, i.e., one primary direction for each BERT layer. This obviates the need to realize the gender subspace in multiple dimensions and prevents other crucial information from being discarded. Experiments show that removing embedding components along the gender directions substantially reduces the BERT-induced bias in the downstream tasks. Overall, the investigation reveals the significant gender bias that a contextualized language model (i.e., \(\text{BERT}\)) induces in downstream tasks, and the proposed solution appears promising in reducing such bias.
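
The debiasing step can be made concrete with a short sketch. The Python/NumPy code below assumes the per-layer BERT embeddings of gendered word pairs (e.g., “he”/“she”) have already been extracted; the principal-direction estimate follows the general projection-removal idea of Bolukbasi et al. [7] and is only a stand-in for the paper’s fine-grained per-layer algorithm, and the function names are illustrative rather than the authors’ implementation.

```python
import numpy as np

def gender_direction(male_embs, female_embs):
    """Estimate one gender direction for a single BERT layer.

    male_embs / female_embs: arrays of shape (n_pairs, hidden_dim) holding
    the layer's embeddings of gendered word pairs such as ("he", "she").
    The dominant direction of the mean-centered pair differences (first
    right singular vector) is taken as the layer's gender direction.
    """
    diffs = male_embs - female_embs
    diffs = diffs - diffs.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    direction = vt[0]
    return direction / np.linalg.norm(direction)

def remove_gender_component(embedding, direction):
    """Project out the embedding's component along the gender direction."""
    return embedding - np.dot(embedding, direction) * direction

# Toy usage with random vectors standing in for per-layer BERT embeddings.
rng = np.random.default_rng(0)
male = rng.normal(size=(10, 768))    # e.g., embeddings of "he", "man", ...
female = rng.normal(size=(10, 768))  # e.g., embeddings of "she", "woman", ...
d = gender_direction(male, female)
debiased = remove_gender_component(rng.normal(size=768), d)
print(np.dot(debiased, d))           # ~0: no residual component along d
```

In practice, one such direction would be estimated for each BERT layer, and the projection applied to the token embeddings before they are passed to the downstream regressor.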


Notes

  1. e.g., {angry, enraged} represents a common emotion, i.e., anger.

  2. \(W_k^{CLS}\) and \(t_k^{cls}\) represent the same vector.

  3. Following https://competitions.codalab.org/competitions/17751#results

  4. Delta values can be compared with those of the models studied in [23].

  5. We focus on words with low word-sense ambiguity.

  6. https://github.com/uclanlp/gn_glove/tree/master/wordlist

  7. https://github.com/ecmonsen/gendered_words

  8. The cosine of the angle between two random vectors in high dimensions is close to zero with high probability; a quick numerical check is sketched below.
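
A rough numerical check of this footnote (a sketch, with the dimension set to BERT’s hidden size of 768; the exact figures are illustrative):

```python
import numpy as np

# Cosine similarity of i.i.d. Gaussian vector pairs concentrates near zero
# as the dimension grows; its standard deviation is roughly 1/sqrt(dim).
rng = np.random.default_rng(0)
dim, trials = 768, 10_000
a = rng.normal(size=(trials, dim))
b = rng.normal(size=(trials, dim))
cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
print(round(cos.mean(), 4), round(cos.std(), 4))  # mean ~ 0.0, std ~ 0.036
```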

References

  1. Hussain A, Tahir A, Hussain Z, Sheikh Z, Gogate M, Dashtipour K, Ali A, Sheikh A. Artificial intelligence-enabled analysis of public attitudes on Facebook and Twitter toward COVID-19 vaccines in the United Kingdom and the United States: observational study. J Med Internet Res. 2021;23(4):e26627.

  2. Basta C, Costa-jussà MR, Casas N. Evaluating the underlying gender bias in contextualized word embeddings. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing. Association for Computational Linguistics. 2019:33–39.

  3. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In Advances in neural information processing systems. 2017:5998–6008.

  4. Caliskan A, Bryson JJ, Narayanan A. Semantics derived automatically from language corpora contain human-like biases. Science. 2017;356(6334):183–186.

  5. Dashtipour K, Gogate M, Cambria E, Hussain A. A novel context-aware multimodal framework for Persian sentiment analysis. 2021. https://arxiv.org/abs/2103.02636.

  6. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. 2018. https://arxiv.org/abs/1810.04805.

  7. Bolukbasi T, Chang KW, Zou JY, Saligrama V, Kalai AT. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in neural information processing systems. 2016:4349–4357.

  8. Gonen H, Goldberg Y. Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics. 2019:609–614.

  9. Caliskan A, Bryson JJ, Narayanan A. Semantics derived automatically from language corpora contain human-like biases. Science. 2017;356(6334):183–6.

  10. Hussain A, Tahir A, Hussain Z, Sheikh Z, Gogate M, Dashtipour K, Ali A, Sheikh A. Artificial intelligence-enabled analysis of public attitudes on Facebook and Twitter toward COVID-19 vaccines in the United Kingdom and the United States: observational study. J Med Internet Res. 2021;23(4):e26627.

  11. Kim Y. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics. 2014:1746–1751.

  12. Kiritchenko S, Mohammad S. Examining gender and race bias in two hundred sentiment analysis systems. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics. 2018:43–53.

  13. Kurita K, Vyas N, Pareek A, Black AW, Tsvetkov Y. Measuring bias in contextualized word representations. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing. Association for Computational Linguistics. 2019:166–172.

  14. Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V. Roberta: A robustly optimized bert pretraining approach. 2019. https://arxiv.org/abs/1907.11692.

  15. Lu J, Batra D, Parikh D, Lee S. Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. In Advances in Neural Information Processing Systems. 2019:13–23.

  16. Mohammad S, Bravo-Marquez F, Salameh M, Kiritchenko S. SemEval-2018 task 1: Affect in tweets. In Proceedings of The 12th International Workshop on Semantic Evaluation. Association for Computational Linguistics. 2018:1–17.

  17. Morcos A, Raghu M, Bengio S. Insights on representational similarity in neural networks with canonical correlation. In Advances in Neural Information Processing Systems. 2018:5727–5736.

  18. Peters M, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics. 2018:2227–2237.

  19. Elazar Y, Ravfogel S, Jacovi A, Goldberg Y. Amnesic probing: Behavioral explanation with amnesic counterfactuals. Transactions of the Association for Computational Linguistics. 2021;9:160–75.

  20. Morcos A, Raghu M, Bengio S. Insights on representational similarity in neural networks with canonical correlation. In Advances in Neural Information Processing Systems. 2018:5727–5736.

  21. Tenney I, Xia P, Chen B, Wang A, Poliak A, McCoy RT, Kim N, Van Durme B, Bowman SR, Das D, et al. What do you learn from context? probing for sentence structure in contextualized word representations. 2019. https://arxiv.org/abs/1905.06316.

  22. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In Advances in neural information processing systems. 2017:5998–6008.

  23. Voita E, Titov I. Information-theoretic probing with minimum description length. 2020. https://arxiv.org/abs/2003.12298.

  24. Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, Brew J. Huggingface’s transformers: State-of-the-art natural language processing. 2019. https://arxiv.org/abs/1910.03771.

  25. Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov R, Le QV. Xlnet: Generalized autoregressive pretraining for language understanding. 2019. https://arxiv.org/abs/1906.08237.

  26. Zhao J, Wang T, Yatskar M, Cotterell R, Ordonez V, Chang KW. Gender bias in contextualized word embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics. 2019:629–634.

  27. Zhao J, Wang T, Yatskar M, Ordonez V, Chang KW. Gender bias in coreference resolution: Evaluation and debiasing methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Association for Computational Linguistics. 2018:15–20.

  28. Zhao J, Zhou Y, Li Z, Wang W, Chang KW. Learning gender-neutral word embeddings. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. 2018:4847–4853.

  29. Zhao J, Zhou Y, Li Z, Wang W, Chang KW. Learning gender-neutral word embeddings. 2018. https://arxiv.org/abs/1809.01496.

Author information

Corresponding author

Correspondence to Soujanya Poria.

Ethics declarations

Funding

The authors did not receive support from any organization for the submitted work.

Ethical Standard

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflicts of Interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

About this article

Cite this article

Bhardwaj, R., Majumder, N. & Poria, S. Investigating Gender Bias in BERT. Cogn Comput 13, 1008–1018 (2021). https://doi.org/10.1007/s12559-021-09881-2
