Leveraging the alignment between machine learning and intersectionality: Using word embeddings to measure intersectional experiences of the nineteenth century U.S. South

doi:10.1016/j.poetic.2021.101539

Poetics

Volume 88, October 2021, 101539

https://doi.org/10.1016/j.poetic.2021.101539

Highlights

• Machine learning is aligned with inductive, cultural, and intersectionality research.
• Word embeddings can visualize intersectional experiences of slavery and the Civil War.
• Culture distinguished racial identities in the nineteenth century U.S.
• The domestic category distinguished gender identities.
• Black men were closer to discursive authority than white women.

Abstract

Machine learning is a rapidly growing research paradigm. Despite its foundationally inductive mathematical assumptions, machine learning is currently developing alongside traditionally deductive inferential statistics but largely orthogonally to inductive, qualitative, cultural, and intersectional research, to its detriment. I argue that we can better realize the full potential of machine learning by leveraging the epistemological alignment between machine learning and inductive research. I empirically demonstrate this alignment through a word embedding model of first-person narratives of the nineteenth-century U.S. South. Situating social categories in relation to social institutions via an inductive computational analysis, I find that in these narratives the cultural and economic spheres discursively distinguished identities by race, the domestic sphere distinguished identities by gender, and Black men were afforded more discursive authority than white women. Even in a corpus over-representing abolitionist sentiment, I find that white identities were afforded a status via culture that was not afforded to Black identities.

Introduction

Academia exists within disciplinary silos, yet several powerful interdisciplinary ideas have managed to bridge disciplines. Intersectionality is one of these ideas. A theory about how social categories intersect with each other and with systems of power to produce unequal lived experiences, intersectionality has provided a language to connect multiple disciplines and subjects under the same theoretical and epistemological framework (Collins & Bilge, 2016). Inferential statistics, or the use of statistics and quantitative data to deduce properties of an underlying population distribution, similarly connects multiple disciplines and subjects under the same methodological and epistemological framework. These two unifying frameworks, however, represent nearly mutually exclusive research agendas. This chasm is largely driven by epistemology: as a radically deductive method, inferential statistics has an epistemology that is, by and large, inconsistent with the epistemology of intersectionality. Machine learning, or the use of computers to do complex tasks without explicit instructions, has recently become another discipline-spanning framework. As a radically inductive method, machine learning is foundationally compatible with the epistemology of intersectionality.

To the detriment of all, machine learning is currently developing alongside inferential statistics but largely orthogonally to the intersectionality paradigm and other cultural and qualitative research. Intersectional and culture scholars should not abandon computational methods to inferential statistics. Instead, I argue we can better realize the full potential of both machine learning and intersectionality by leveraging the alignment between the two.

Using a diverse collection of nineteenth-century first-person narratives from the U.S. South, I empirically demonstrate one way scholars can leverage this alignment. Using word embeddings, one machine learning method, I quantitatively and visually mapped the relative positions of four social categories (Black and white men and women) within five social institutions (the polity, the economy, culture, the domestic, and authority), formally modeling intersectionality while staying true to its epistemology. Through this inductive computational analysis, I found that in this corpus, identities based on race were most distinguished by cultural discourse, identities based on gender were most distinguished by discourse about the domestic, and the economy vector revealed the differing gender schemas ascribed to Black and white women. Black men, additionally, were afforded more discursive authority than white women, even as white women had real authority over Black men. A complementary close reading of several texts contextualized and confirmed these findings. Even in a corpus composed largely of abolitionist and anti-racist sentiments, white identities were afforded a social status via culture that was not afforded to Black identities, establishing a deep discursive divide between the races.

Section snippets

Intersectionality

Intersectionality is a theoretical framework for understanding how social identities and categories combine and interact with systems of social, cultural, economic, and political power to create distinct, and unequal, lived experiences. While the specific concept of intersectionality is most often traced to Kimberlé Crenshaw's legal treatise (1989), the attention to multiple intersecting identities reaches back centuries. In an early articulation of intersectionality, Anna Julia Cooper

Machine learning and high dimensional data

Machine learning is rapidly supplanting both traditional inferential statistics and dimension-reduction and clustering techniques in the social sciences. While machine learning is currently being developed to augment inferential statistics, the mathematical assumptions of machine learning, in both unsupervised and supervised approaches, are, I claim, better suited to the type of inductive, exploratory, and contextual research traditionally conducted using qualitative methods.

Researchers working

Intersectionality and the nineteenth century United States

The nineteenth century U.S. is an ideal case study to demonstrate the capability of machine learning to capture and illuminate intersectionality. As detailed above, the first articulations of the theory of intersectionality in the United States appeared in the writings of Black women who experienced slavery, Reconstruction, and post-Reconstruction in the Americas. Slave narratives, such as Harriet Jacobs's Incidents in the Life of a Slave Girl, the writings of activists such as Sojourner

Word embeddings to model social categories and institutions

I used word embeddings to capture discursive representations of intersectional subjectivities as conveyed in this corpus. Word embeddings are a machine learning technique that takes a corpus as input and outputs a high-dimensional vector space model of the corpus. A vector is an object that contains components (typically numbers) that represent data within a set space (for example, x,y coordinates on a two-dimensional plot). Word vectors are simply sets of numbers that represent the meaning of
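The idea of a word vector can be made concrete with a short sketch in plain Python. The three-dimensional vectors below are invented toy values, not taken from the corpus (trained embeddings usually have a few hundred dimensions), and cosine similarity is assumed as the measure of closeness between words; it is common practice with word embeddings, but this excerpt does not specify it.

```python
import math

# Hypothetical toy embeddings: invented 3-dimensional vectors for illustration only.
TOY_VECTORS = {
    "house": [0.9, 0.1, 0.0],
    "kitchen": [0.8, 0.2, 0.1],
    "senate": [0.0, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    close = cosine_similarity(TOY_VECTORS["house"], TOY_VECTORS["kitchen"])
    far = cosine_similarity(TOY_VECTORS["house"], TOY_VECTORS["senate"])
    print(f"house~kitchen: {close:.3f}")
    print(f"house~senate: {far:.3f}")
```

With these toy values, "house" is much closer to "kitchen" (about 0.98) than to "senate" (about 0.01), which is the sense in which nearby vectors are read as words used in similar contexts.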

The discursive field

Table 1 shows the ten vectors most similar to each of the social institution vectors, indicating the semantic meaning of these institutions in this corpus. The polity vector captures political entities (country, commonwealth, municipalities) and national economic phenomena (graingrowing, bankruptcy). The word afroamericans is closely related to the polity vector, capturing this new social and political identity. The economy vector is related more to the practical economic context, including
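The excerpt does not state how each institution vector was constructed. A minimal sketch, assuming the common approach of averaging the vectors of a few seed words and then ranking every other word by cosine similarity to that average, might look like the following; the seed words, the toy vectors, and the helper names (`mean_vector`, `most_similar`) are hypothetical. With `top_k=10`, the output has the shape of the per-institution lists reported in Table 1.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_vector(vectors):
    """Component-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def most_similar(target, embeddings, top_k=10, exclude=()):
    """Return the top_k (word, similarity) pairs closest to target, skipping excluded words."""
    scored = [
        (word, cosine_similarity(target, vec))
        for word, vec in embeddings.items()
        if word not in exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings: invented values, not drawn from the corpus.
EMBEDDINGS = {
    "state": [0.1, 0.9, 0.0],
    "government": [0.2, 0.8, 0.1],
    "country": [0.1, 0.7, 0.2],
    "wages": [0.9, 0.1, 0.1],
    "cotton": [0.8, 0.2, 0.3],
    "kitchen": [0.1, 0.0, 0.9],
}

# Hypothetical seed words standing in for a "polity" institution vector.
polity_seeds = ["state", "government"]
polity_vector = mean_vector([EMBEDDINGS[w] for w in polity_seeds])

if __name__ == "__main__":
    for word, sim in most_similar(polity_vector, EMBEDDINGS, top_k=3, exclude=polity_seeds):
        print(f"{word}: {sim:.3f}")
```

With a trained model, gensim's `KeyedVectors.most_similar` offers an equivalent query: given a list of `positive` words, it averages their normalized vectors, excludes the input words, and returns the `topn` nearest words by cosine similarity.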

Visualizing intersectionality

Figs. 2 and 3 visualize the results from two different perspectives. Based on the two dimensions found in the PCA above, Fig. 2 is organized into four scatter plots, showing the placement of the four social categories relative to culture vs. the economy (lower left), culture vs. the polity (upper left), domestic vs. the economy (lower right), and domestic vs. the polity (upper right). The 200 vectors capturing the institutional space are also plotted along these dimensions, providing
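One simple way to place a social-category vector in a two-axis panel such as culture vs. the economy is to use its cosine similarity with each of the two institution vectors as its x and y coordinates. This is an assumption for illustration, not necessarily the projection used for Fig. 2; the vectors and the generic category names below are hypothetical toy values.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def panel_coordinates(categories, x_vector, y_vector):
    """Map each category to (x, y) = (similarity to x_vector, similarity to y_vector)."""
    return {
        name: (cosine_similarity(vec, x_vector), cosine_similarity(vec, y_vector))
        for name, vec in categories.items()
    }


# Hypothetical toy vectors: invented values, not results from the paper.
CULTURE = [1.0, 0.0, 0.2]
ECONOMY = [0.0, 1.0, 0.2]
CATEGORIES = {
    "category_a": [0.9, 0.2, 0.1],
    "category_b": [0.2, 0.9, 0.1],
}

if __name__ == "__main__":
    for name, (x, y) in panel_coordinates(CATEGORIES, CULTURE, ECONOMY).items():
        print(f"{name}: culture={x:.3f}, economy={y:.3f}")
```

The resulting coordinate pairs could be passed to any plotting library (for example, matplotlib's `scatter`) to draw one panel; repeating the call with other pairs of institution vectors yields the remaining panels.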

Vectors in context

Figs. 1–3 visually represent the intersectional discursive space emerging from this corpus. Discourse, of course, is much more complex than can be represented in two-dimensional space, even with the multiple dimensions represented in Fig. 2. In a final step, I placed the patterns represented above back into their full discursive context via a close reading of the text. To move from vector representations of a corpus back to the text, I calculated raw counts of the fifty closest words to the
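The step of moving from vectors back to the texts can be sketched as counting, in each document, how often any word from a target set appears, then ranking documents by that raw count to choose candidates for close reading. In this hypothetical sketch the target words and documents are placeholders; in the analysis described above, the target set would be the fifty words closest to a vector of interest. The simple lowercase, letters-only tokenizer is also an assumption.

```python
import re
from collections import Counter


def count_target_words(text, target_words):
    """Raw count of occurrences of any target word in text (case-insensitive)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return sum(counts[word] for word in target_words)


def rank_documents(documents, target_words):
    """Return (title, count) pairs sorted from most to fewest target-word hits."""
    scored = [(title, count_target_words(text, target_words)) for title, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical placeholders: not words or texts from the actual corpus.
TARGET_WORDS = {"kitchen", "cooking", "washing"}
DOCUMENTS = {
    "narrative_1": "She spent the morning in the kitchen, cooking and washing.",
    "narrative_2": "He rode to the courthouse to hear the election returns.",
}

if __name__ == "__main__":
    for title, count in rank_documents(DOCUMENTS, TARGET_WORDS):
        print(f"{title}: {count}")
```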

The complex relationship between race, gender, and authority

Recent scholarship has complicated the relationships between gender, race, and authority in the nineteenth century. White women used enslaved people as a source of real social and economic empowerment, providing themselves a measure of supremacy within an otherwise patriarchal society (Glymph, 2003; Jones-Rogers, 2020). In this case, we might expect white women to be discursively closer to authority compared to Black men and women in this corpus. At the same time, Black men were afforded a
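Comparing how close different social categories sit to the authority vector can be sketched in the same way: build each category vector (here, hypothetically, as the average of a few terms denoting that category) and rank the categories by cosine similarity to the authority vector. The group names, terms, and vector values are placeholders invented for illustration, so the resulting order says nothing about the corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_vector(vectors):
    """Component-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def rank_categories(category_terms, term_vectors, institution_vector):
    """Average each category's term vectors, then sort categories by similarity to the institution."""
    scores = []
    for name, terms in category_terms.items():
        category_vector = mean_vector([term_vectors[t] for t in terms])
        scores.append((name, cosine_similarity(category_vector, institution_vector)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Placeholder data: invented values with no empirical meaning.
AUTHORITY = [0.1, 0.9, 0.3]
TERM_VECTORS = {
    "term_a1": [0.2, 0.8, 0.1],
    "term_a2": [0.0, 0.6, 0.4],
    "term_b1": [0.9, 0.1, 0.2],
    "term_b2": [0.7, 0.3, 0.0],
}
CATEGORY_TERMS = {
    "group_1": ["term_a1", "term_a2"],
    "group_2": ["term_b1", "term_b2"],
}

if __name__ == "__main__":
    for name, score in rank_categories(CATEGORY_TERMS, TERM_VECTORS, AUTHORITY):
        print(f"{name}: {score:.3f}")
```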

Discussion

Using word embeddings as an example of one machine learning method, I analyzed intersectional experiences and discursive associations as conveyed via a diverse collection of first-person narratives from the nineteenth century U.S. South. Empirically, I found that, even in a corpus composed largely of abolitionist or abolitionist-friendly narratives, social status via culture discursively distinguished social identities based on race. Additionally, Black women, while not using more economic

Acknowledgments

The author extends deep gratitude to Elizabeth Maddock Dillon, Jane Nelson, and Leslie McCall for comments and suggestions on drafts of this paper. Two reviewers and the special issue editors provided thoughtful guidance in shaping the final version. I am grateful that Sarah Connell and Julia Flanders asked me to participate in the workshop Word Vectors for the Thoughtful Humanist, out of which this paper emerged.

References (50)

K. Carley
Extracting culture through textual analysis
Poetics
(1994)
J.L. Martin
What do animals do all day?: The division of labor, class bodies, and totemic thinking in the popular imagination
Poetics
(2000)
J.W. Mohr et al.
Graphing the grammar of motives in national security strategies: Cultural interpretation, automated text analysis and the drama of global politics
Poetics
(2013)
M. Antoniak et al.
Evaluating the stability of embedding-based word similarities
Transactions of the Association for Computational Linguistics
(2018)
M. Bailey
#transform(Ing)DH Writing and research: An autoethnography of digital humanities and feminist ethics
Digital Humanities Quarterly
(2015)
R. Benjamin
Race after technology: Abolitionist tools for the new Jim code
(2019)
J. Boelaert et al.
The great regression
Revue Francaise de Sociologie
(2018)
T. Bolukbasi et al.
Man is to computer programmer as woman is...
(2016)
T. Burnard
Toiling in the fields: Valuing female slaves in Jamaica, 1674-1788
S.M.H. Camp
Closer to freedom: Enslaved women and everyday resistance in the plantation south
(2005)
P.H. Collins
Black feminist thought: Knowledge, consciousness, and the politics of empowerment
(2009)
P.H. Collins
Intersectionality's definitional dilemmas
Annual Review of Sociology
(2015)
P.H. Collins et al.
Intersectionality
(2016)
A.J. Cooper
A voice from the South: By a Black woman of the South
(1892)
K. Crenshaw
Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory, and antiracist politics
University of Chicago Legal Forum
(1989)
C. D'Ignazio et al.
Data feminism
(2020)
V. Eubanks
Automating inequality: How high-tech tools profile, police, and punish the poor
(2018)
N. Garg et al.
Word embeddings quantify 100 years of gender and ethnic stereotypes
Proceedings of the National Academy of Sciences
(2018)
A. Globerson et al.
Euclidean embedding of co-occurrence data
The Journal of Machine Learning Research
(2007)
T. Glymph
Out of the house of bondage: The transformation of the plantation household
(2003)
A.-M. Hancock
When multiplication doesn't equal quick addition: Examining intersectionality as a research paradigm
Perspectives on Politics
(2007)
A. Hochschild
The second shift
(1989)
T. Jennings
‘Us colored women had to go through a plenty’: Sexual exploitation of African-American slave women
Journal of Women's History
(1990)
C. Jones
An end to the neglect of the problems of negro women
Political Affairs
(1949)
S.E. Jones-Rogers
They were her property: White women as slave owners in the American south
(2020)

Cited by (35)

Uncovering how Black and Latinx Communities perceive environmental justice: Integrating a public deliberation quasi-experiment and computational methods
2024, Public Relations Review
This research aims to address gaps in the knowledge and practice of engaging underrepresented communities in public communication and policymaking on environmental justice issues. Through a quasi-field experiment conducted in Madison, Wisconsin, we investigate how Black and Latinx communities perceive environmental justice issues and how these perceptions are associated with intersectional identities such as education level, income, and gender. Additionally, we explore the impact of different information material designs on fostering inclusive and diverse discussions. Our results highlight the nuances of environmental justice perspectives within communities and underscore the importance of tailored approaches in community engagement and public relations. Furthermore, the study reveals the effectiveness of visual information materials in stimulating diverse discussions, emphasizing the significance of audience accessibility and comprehension in communication design. These findings provide both theoretical implications for the field of public relations and practical applications for science communicators, public relations professionals, and community-engaged scholars to promote more inclusive and personalized strategies in community engagement.
A Symbolic Hierarchy of Places: Global Inequalities in Tourism Narratives of the New York Times Travel Section
2024, Poetics
We study the symbolic value of places using the case of global tourism where places are explicitly objectified for valorization. Unlike most prior research that uses tangible measurements like UNESCO's World Heritage Sites for global comparison of place-based symbolic values, we harness the power of computational text analysis to measure the symbolic value of places based on travel writings of the New York Times Travel Section. Our results demonstrate that there is a symbolic hierarchy among places depending on the various meanings of culture and nature and the degree of engagement with either topic. NYT travel writers valorize European regions for cultural tourism according to the broadest meanings of culture, and often engage with the region's history as a main topic of the travel article. However, other regions – particularly the ones with legacies of past colonization – are valorized for their nature's scenic beauty while obscuring the significance of their “cultural” values. Even when a place's “cultural” values are recognized, the meanings of culture tend to be limited in non-European regions. Our findings have implications for the enduring symbolic inequality of places at the global level.
Three families of automated text analysis
2022, Social Science Research
Citation Excerpt :
Researchers also use word embeddings to uncover dimensions of meaning we might not be able to otherwise study quantitatively. Nelson (2021b), for instance, inductively analyzes associations between intersectional identities and different institutions (e.g., the domestic sphere, politics, the economy) in first-person narratives originating from the nineteenth-century American South. She finds that in this cultural context, women (and especially Black women) are more associated with the domestic sphere than are men and that White folks are more associated with the cultural domain than are Black folks.
Since the beginning of this millennium, data in the form of human-generated text in a machine-readable format has become increasingly available to social scientists, presenting a unique window into social life. However, harnessing vast quantities of this highly unstructured data in a systematic way presents a unique combination of analytical and methodological challenges. Luckily, our understanding of how to overcome these challenges has also developed greatly over this same period. In this article, I present a novel typology of the methods social scientists have used to analyze text data at scale in the interest of testing and developing social theory. I describe three “families” of methods: analyses of (1) term frequency, (2) document structure, and (3) semantic similarity. For each family of methods, I discuss their logical and statistical foundations, analytical strengths and weaknesses, as well as prominent variants and applications.
“Do your part: Stay apart”: Collective intentionality and collective (in)action in US governor's COVID-19 press conferences
2022, Poetics
Citation Excerpt :
Then, a word embedding model (Skip Gram Model) analyzed the closest words around the identified terms (e.g., we, Marylanders, people) per state, creating standardized values of closeness. Word embedding models are recognized as being particularly well-suited for text analysis focused on meaning (Nelson, 2021; Stoltz & Taylor, 2021). Thus, the composite measure describes a set of words that revolve around context specific plural words that represent the construction of collective intentionality in the press conferences.
This mixed-methods study examines how political leaders mobilize collective intentionality during the COVID-19 pandemic in nine US States, and how collective intentionality differs across republican and democratic administrations. The results of our computational and qualitative analyses show that i) political leaders establish collective intentionality by emphasizing unity, vulnerability, action, and community boundaries; ii) political leaders’ call to collective action clashes with the inaction required by health guidelines; iii) social inequalities received little attention across all states compared to other themes; and iv) collective intentionality in democratic administrations is linked to individuals’ agency and actions, suggesting a bottom-up approach. Conversely, in republican administrations individuals’ contributions are downplayed compared to work and state-level action, indicating a top-down approach. This study demonstrates the theoretical and empirical value of collective intentionality in sociological research, and contributes to a better understanding of leadership and prosociality in times of crisis.
Using Topic-Modeling in Legal History, with an Application to Pre-Industrial English Case Law on Finance
2022, Law and History Review
Word Embedding Models and the Hybridity of Newspaper Genres
2024, American Historical Review


Laura K. Nelson is an assistant professor of sociology at Northeastern University where she is core faculty at the NULab for Texts, Maps, and Networks, is affiliated faculty at the Network Science Institute, and is on the executive committee for Women's, Gender, and Sexuality Studies. She was previously a postdoctoral fellow at the Berkeley Institute for Data Science and Digital Humanities @ Berkeley at the University of California, Berkeley, and for the Management and Organizations Department at Northwestern University, where she was also affiliated with the Northwestern Institute on Complex Systems (NICO). She uses computational tools, principally automated text analysis, to study social movements, culture, gender, institutions, and organizations. She has published in Sociological Methods and Research, Sociological Methodology, Mobilization, Gender & Society, and Oxford University Press, among other outlets, and has given talks and workshops on computational social science throughout the U.S. and internationally.
