Leveraging the alignment between machine learning and intersectionality: Using word embeddings to measure intersectional experiences of the nineteenth century U.S. South
Introduction
Academia exists within disciplinary silos, yet several powerful interdisciplinary ideas have managed to bridge disciplines. Intersectionality is one of these ideas. A theory about how social categories intersect with each other and with systems of power to produce unequal lived experiences, intersectionality has provided a language to connect multiple disciplines and subjects under the same theoretical and epistemological framework (Collins & Bilge, 2016). Inferential statistics, or the use of statistics and quantitative data to deduce properties of an underlying population distribution, similarly connects multiple disciplines and subjects under the same methodological and epistemological framework. These two unifying frameworks, however, represent nearly mutually exclusive research agendas. This chasm is largely driven by epistemology: as a radically deductive method, the epistemology of inferential statistics is, by and large, inconsistent with the epistemology of intersectionality. Machine learning, or the use of computers to do complex tasks without explicit instructions, has recently become another discipline-spanning framework. As a radically inductive method, machine learning is foundationally compatible with the epistemology of intersectionality.
To the detriment of all, machine learning is currently developing alongside inferential statistics but largely orthogonally to the intersectionality paradigm and other cultural and qualitative research. Intersectional and culture scholars should not abandon computational methods to inferential statistics. Instead, I argue we can better realize the full potential of both machine learning and intersectionality by leveraging the alignment between the two.
Using a diverse collection of nineteenth-century first-person narratives from the U.S. South, I empirically demonstrate one way scholars can leverage this alignment. Using word embeddings, one machine learning method, I quantitatively and visually mapped the relative positions of four social categories (Black and white men and women) within five social institutions (the polity, the economy, culture, the domestic, and authority), formally modeling intersectionality while staying true to its epistemology. Through this inductive computational analysis, I found that in this corpus, identities based on race were most distinguished by cultural discourse, identities based on gender were most distinguished by discourse about the domestic, and the economy vector revealed the differing gender schemas ascribed Black and white women. Black men, additionally, were afforded more discursive authority compared to white women, even as white women had real authority over Black men. A complementary close reading of several texts contextualized and confirmed these findings. Even in a corpus composed largely of abolitionist and anti-racist sentiments, white identities were afforded a social status via culture not allowed Black identities, establishing a deep discursive divide between the races.
Intersectionality
Intersectionality is a theoretical framework for understanding how social identities and categories combine and interact with systems of social, cultural, economic, and political power to create distinct, and unequal, lived experiences. While the specific concept of intersectionality is most often traced to Kimberlé Crenshaw's legal treatise (1989), the attention to multiple intersecting identities reaches back centuries. In an early articulation of intersectionality, Anna Julia Cooper
Machine learning and high dimensional data
Machine learning is rapidly supplanting both traditional inferential statistics and dimensional and clustering techniques in the social sciences. While machine learning is currently being developed to augment inferential statistics, the mathematical assumptions of machine learning—both unsupervised and supervised approaches—are, I claim, better equipped for use in the type of inductive, exploratory, and contextual research traditionally conducted using qualitative methods.
Researchers working
Intersectionality and the nineteenth century United States
The nineteenth century U.S. is an ideal case study to demonstrate the capability for machine learning to capture and illuminate intersectionality. As detailed above, the first articulations of the theory of intersectionality in the United States appeared in the writings of Black women who experienced slavery, reconstruction, and post-reconstruction in the Americas. Slave narratives, such as Harriet Jacobs's Incidents in the Life of a Slave Girl, the writings of activists such as Sojourner
Word embeddings to model social categories and institutions
I used word embeddings to capture discursive representations of intersectional subjectivities as conveyed in this corpus. Word embeddings are a machine learning technique that takes a corpus as input and outputs a high-dimensional vector space model of the corpus. A vector is an object that contains components (typically numbers) that represent data within a set space (for example, x,y coordinates on a two-dimensional plot). Word vectors are simply sets of numbers that represent the meaning of
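To make the idea of word vectors as sets of numbers concrete, the minimal Python sketch below compares toy three-dimensional vectors using cosine similarity, a common measure of closeness between word vectors. The words, the values, and the choice of cosine similarity are assumptions made for illustration; none of the numbers come from the corpus, and real embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors (invented values, not taken from the corpus).
toy_vectors = {
    "kitchen": [0.9, 0.1, 0.0],
    "household": [0.8, 0.2, 0.1],
    "senate": [0.0, 0.2, 0.9],
}

# Words that point in similar directions score close to 1;
# words that point in unrelated directions score close to 0.
sim_domestic = cosine_similarity(toy_vectors["kitchen"], toy_vectors["household"])
sim_mixed = cosine_similarity(toy_vectors["kitchen"], toy_vectors["senate"])
print(f"kitchen vs household: {sim_domestic:.3f}")
print(f"kitchen vs senate:    {sim_mixed:.3f}")
```

In this toy space the two domestic words sit much closer together than the domestic and political words do, which is the kind of relative positioning a vector space model makes measurable.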
The discursive field
Table 1 shows the ten vectors most similar to each of the social institution vectors, indicating the semantic meaning of these institutions in this corpus. The polity vector captures political entities (country, commonwealth, municipalities) and national economic phenomena (graingrowing, bankruptcy). The word afroamericans is closely related to the polity vector, capturing this new social and political identity. The economy vector is related more to the practical economic context, including
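A lookup of the words most similar to an institution vector, of the kind summarized in Table 1, can be sketched as follows. This is a hedged illustration rather than the procedure used in the paper: it assumes an institution vector is built by averaging the vectors of a few seed words and that similarity is measured by cosine similarity. The helper names `mean_vector` and `most_similar`, the seed words, and all vector values are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def most_similar(target, vocab, k=10, exclude=()):
    """Return the k (word, similarity) pairs closest to target, best first."""
    scored = [
        (word, cosine_similarity(target, vec))
        for word, vec in vocab.items()
        if word not in exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vocabulary (invented values, not taken from the corpus).
vocab = {
    "nation": [0.1, 0.9, 0.1],
    "state": [0.2, 0.8, 0.0],
    "country": [0.1, 0.85, 0.2],
    "commonwealth": [0.15, 0.8, 0.1],
    "wages": [0.9, 0.1, 0.1],
    "kitchen": [0.1, 0.1, 0.9],
}

# Assumption: the institution vector is the mean of a few seed words.
seeds = ["nation", "state"]
polity = mean_vector([vocab[word] for word in seeds])

neighbors = most_similar(polity, vocab, k=2, exclude=seeds)
for word, score in neighbors:
    print(f"{word}: {score:.3f}")
```

Excluding the seed words from the results keeps the list from simply echoing the terms used to define the institution; with a real model, `k=10` would correspond to the ten nearest vectors per institution.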
Visualizing intersectionality
Figs. 2 and 3 visualize the results from two different perspectives. Based on the two dimensions found in the PCA above, Fig. 2 is organized into four scatter plots, showing the relative placement of the four social categories relative to culture vs. the economy (lower left), culture vs. the polity (upper left), domestic vs. the economy (lower right), and domestic vs. the polity (upper right). The 200 vectors capturing the institutional space are also plotted along these dimensions, providing
Vectors in context
Figs. 1–3 visually represent the intersectional discursive space emerging from this corpus. Discourse, of course, is much more complex than can be represented in two-dimensional space, even with the multiple dimensions represented in Fig. 2. In a final step I placed the patterns represented above back into their full discursive context via a close reading of the text. To move from vector representations of a corpus back to the text I calculated raw counts of the fifty closest words to the
The complex relationship between race, gender, and authority
Recent scholarship has complicated the relationships between gender, race, and authority in the nineteenth century. White women used enslaved people as a source of real social and economic empowerment, providing themselves a measure of supremacy within an otherwise patriarchal society (Glymph, 2003; Jones-Rogers, 2020). In this case, we might expect white women to be discursively closer to authority compared to Black men and women in this corpus. At the same time, Black men were afforded a
Discussion
Using word embeddings as an example of one machine learning method, I analyzed intersectional experiences and discursive associations as conveyed via a diverse collection of first-person narratives from the nineteenth century U.S. South. Empirically, I found that, even in a corpus composed largely of abolitionist or abolitionist-friendly narratives, social status via culture discursively distinguished social identities based on race. Additionally, Black women, while not using more economic
Acknowledgments
The author extends deep gratitude to Elizabeth Maddock Dillon, Jane Nelson, and Leslie McCall for comments and suggestions on drafts of this paper. Two reviewers and the special issue editors provided thoughtful guidance in shaping the final version. I am grateful that Sarah Connell and Julia Flanders asked me to participate in the workshop Word Vectors for the Thoughtful Humanist, out of which this paper emerged.
References (50)
Extracting culture through textual analysis. Poetics (1994).
What do animals do all day?: The division of labor, class bodies, and totemic thinking in the popular imagination. Poetics (2000).
Graphing the grammar of motives in national security strategies: Cultural interpretation, automated text analysis and the drama of global politics. Poetics (2013).
Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics (2018).
#transform(Ing)DH Writing and research: An autoethnography of digital humanities and feminist ethics. Digital Humanities Quarterly (2015).
Benjamin, Ruha. Race after technology: Abolitionist tools for the new Jim code (2019).
The great regression. Revue Francaise de Sociologie (2018).
Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., and Kalai, A. (2016). “Man is to computer programmer as woman is...
Toiling in the fields: Valuing female slaves in Jamaica, 1674-1788.
Closer to freedom: Enslaved women and everyday resistance in the plantation south (2005).
Black feminist thought: Knowledge, consciousness, and the politics of empowerment.
Intersectionality's definitional dilemmas. Annual Review of Sociology.
Intersectionality.
A voice from the South: By a Black woman of the South.
Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory, and antiracist politics. University of Chicago Legal Forum.
Data feminism.
Automating inequality: How high-tech tools profile, police, and punish the poor.
Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences.
Euclidean embedding of co-occurrence data. The Journal of Machine Learning Research.
Out of the house of bondage: The transformation of the plantation household.
When multiplication doesn't equal quick addition: Examining intersectionality as a research paradigm. Perspectives on Politics.
The second shift.
‘Us colored women had to go through a plenty’: Sexual exploitation of African-American slave women. Journal of Women's History.
An end to the neglect of the problems of negro women. Political Affairs.
They were her property: White women as slave owners in the American south.
Cited by (35)
Three families of automated text analysis
2022, Social Science Research
Citation excerpt: Researchers also use word embeddings to uncover dimensions of meaning we might not be able to otherwise study quantitatively. Nelson (2021b), for instance, inductively analyzes associations between intersectional identities and different institutions (e.g., the domestic sphere, politics, the economy) in first-person narratives originating from the nineteenth-century American South. She finds that in this cultural context, women (and especially Black women) are more associated with the domestic sphere than are men and that White folks are more associated with the cultural domain than are Black folks.
“Do your part: Stay apart”: Collective intentionality and collective (in)action in US governor's COVID-19 press conferences
2022, Poetics
Citation excerpt: Then, a word embedding model (Skip Gram Model) analyzed the closest words around the identified terms (e.g., we, Marylanders, people) per state, creating standardized values of closeness. Word embedding models are recognized as being particularly well-suited for text analysis focused on meaning (Nelson, 2021; Stoltz & Taylor, 2021). Thus, the composite measure describes a set of words that revolve around context specific plural words that represent the construction of collective intentionality in the press conferences.
Using Topic-Modeling in Legal History, with an Application to Pre-Industrial English Case Law on Finance
2022, Law and History Review
Word Embedding Models and the Hybridity of Newspaper Genres
2024, American Historical Review
Laura K. Nelson is an assistant professor of sociology at Northeastern University where she is core faculty at the NULab for Texts, Maps, and Networks, is affiliated faculty at the Network Science Institute, and is on the executive committee for Women's, Gender, and Sexuality Studies. She was previously a postdoctoral fellow at the Berkeley Institute for Data Science and Digital Humanities @ Berkeley at the University of California, Berkeley, and for the Management and Organizations Department at Northwestern University, where she was also affiliated with the Northwestern Institute on Complex Systems (NICO). She uses computational tools, principally automated text analysis, to study social movements, culture, gender, institutions, and organizations. She has published in Sociological Methods and Research, Sociological Methodology, Mobilization, Gender & Society, and Oxford University Press, among other outlets, and has given talks and workshops on computational social science throughout the U.S. and internationally.