Elsevier

Poetics

Volume 88, October 2021, 101539

Leveraging the alignment between machine learning and intersectionality: Using word embeddings to measure intersectional experiences of the nineteenth century U.S. South

https://doi.org/10.1016/j.poetic.2021.101539

Highlights

  • Machine learning is aligned with inductive, cultural, and intersectionality research.

  • Word embeddings can visualize intersectional experiences of slavery and the Civil War.

  • Culture distinguished racial identities in the nineteenth century U.S.

  • The domestic category distinguished gender identities.

  • Black men were closer to discursive authority compared to white women.

Abstract

Machine learning is a rapidly growing research paradigm. Despite its foundationally inductive mathematical assumptions, machine learning is currently developing alongside traditionally deductive inferential statistics but largely orthogonally to inductive, qualitative, cultural, and intersectional research, to its detriment. I argue that we can better realize the full potential of machine learning by leveraging the epistemological alignment between machine learning and inductive research. I empirically demonstrate this alignment through a word embedding model of first-person narratives of the nineteenth-century U.S. South. Situating social categories in relation to social institutions via an inductive computational analysis, I find that the cultural and economic spheres discursively distinguished identities by race in these narratives, the domestic sphere distinguished identities by gender, and Black men were afforded more discursive authority than white women. Even in a corpus over-representing abolitionist sentiment, I find white identities were afforded a status via culture not allowed Black identities.

Introduction

Academia exists within disciplinary silos, yet several powerful interdisciplinary ideas have managed to bridge disciplines. Intersectionality is one of these ideas. A theory about how social categories intersect with each other and with systems of power to produce unequal lived experiences, intersectionality has provided a language to connect multiple disciplines and subjects under the same theoretical and epistemological framework (Collins & Bilge, 2016). Inferential statistics, or the use of statistics and quantitative data to deduce properties of an underlying population distribution, similarly connects multiple disciplines and subjects under the same methodological and epistemological framework. These two unifying frameworks, however, represent nearly mutually exclusive research agendas. This chasm is largely driven by epistemology: as a radically deductive method, the epistemology of inferential statistics is, by and large, inconsistent with the epistemology of intersectionality. Machine learning, or the use of computers to do complex tasks without explicit instructions, has recently become another discipline-spanning framework. As a radically inductive method, machine learning is foundationally compatible with the epistemology of intersectionality.

To the detriment of all, machine learning is currently developing alongside inferential statistics but largely orthogonally to the intersectionality paradigm and other cultural and qualitative research. Intersectional and culture scholars should not abandon computational methods to inferential statistics. Instead, I argue we can better realize the full potential of both machine learning and intersectionality by leveraging the alignment between the two.

Using a diverse collection of nineteenth-century first-person narratives from the U.S. South, I empirically demonstrate one way scholars can leverage this alignment. Using word embeddings, one machine learning method, I quantitatively and visually mapped the relative positions of four social categories (Black and white men and women) within five social institutions (the polity, the economy, culture, the domestic, and authority), formally modeling intersectionality while staying true to its epistemology. Through this inductive computational analysis, I found that in this corpus, identities based on race were most distinguished by cultural discourse, identities based on gender were most distinguished by discourse about the domestic, and the economy vector revealed the differing gender schemas ascribed Black and white women. Black men, additionally, were afforded more discursive authority compared to white women, even as white women had real authority over Black men. A complementary close reading of several texts contextualized and confirmed these findings. Even in a corpus composed largely of abolitionist and anti-racist sentiments, white identities were afforded a social status via culture not allowed Black identities, establishing a deep discursive divide between the races.

Intersectionality

Intersectionality is a theoretical framework for understanding how social identities and categories combine and interact with systems of social, cultural, economic, and political power to create distinct, and unequal, lived experiences. While the specific concept of intersectionality is most often traced to Kimberlé Crenshaw's legal treatise (1989), the attention to multiple intersecting identities reaches back centuries. In an early articulation of intersectionality, Anna Julia Cooper

Machine learning and high dimensional data

Machine learning is rapidly supplanting both traditional inferential statistics and dimensional and clustering techniques in the social sciences. While machine learning is currently being developed to augment inferential statistics, the mathematical assumptions of machine learning—both unsupervised and supervised approaches—are, I claim, better equipped for use in the type of inductive, exploratory, and contextual research traditionally conducted using qualitative methods.

Researchers working

Intersectionality and the nineteenth century United States

The nineteenth-century U.S. is an ideal case study to demonstrate the capability for machine learning to capture and illuminate intersectionality. As detailed above, the first articulations of the theory of intersectionality in the United States appeared in the writings of Black women who experienced slavery, Reconstruction, and post-Reconstruction in the Americas. Slave narratives, such as Harriet Jacobs's Incidents in the Life of a Slave Girl, the writings of activists such as Sojourner

Word embeddings to model social categories and institutions

I used word embeddings to capture discursive representations of intersectional subjectivities as conveyed in this corpus. Word embeddings are a machine learning technique that takes a corpus as input and outputs a high-dimensional vector space model of the corpus. A vector is an object that contains components (typically numbers) that represent data within a set space (for example, x,y coordinates on a two-dimensional plot). Word vectors are simply sets of numbers that represent the meaning of
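
As a rough illustration of the idea that a corpus goes in and one numeric vector per word comes out, the sketch below builds simple co-occurrence count vectors. This is a deliberately simplified stand-in, not the embedding model used in this study; the function name `cooccurrence_vectors`, the window size, and the two toy sentences are all assumptions made for the example.

```python
from collections import Counter


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a vector of co-occurrence counts.

    A toy stand-in for a trained word embedding: each dimension is one
    vocabulary word, and each value counts how often that word appears
    within `window` tokens of the target word.
    """
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for sent in tokenized for w in sent})
    counts = {w: Counter() for w in vocab}
    for sent in tokenized:
        for i, word in enumerate(sent):
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][sent[j]] += 1
    # One dimension per vocabulary word, in a fixed (sorted) order.
    vectors = {w: [counts[w][c] for c in vocab] for w in vocab}
    return vocab, vectors


# Invented sentences, not drawn from the corpus analyzed here.
toy_corpus = [
    "she kept the house",
    "he kept the farm",
]
vocab, vectors = cooccurrence_vectors(toy_corpus)
print(vocab)            # the dimensions of the vector space
print(vectors["kept"])  # the numeric representation of one word
```

Trained embedding models learn dense vectors with far fewer dimensions than the vocabulary, but the input and output have the same shape as in this sketch: text in, one vector per word out.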

The discursive field

Table 1 shows the ten vectors most similar to each of the social institution vectors, indicating the semantic meaning of these institutions in this corpus. The polity vector captures political entities (country, commonwealth, municipalities) and national economic phenomena (graingrowing, bankruptcy). The word afroamericans is closely related to the polity vector, capturing this new social and political identity. The economy vector is related more to the practical economic context, including
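
A "most similar vectors" list of this kind is commonly produced by ranking every word vector by its cosine similarity to a target vector. The sketch below assumes that approach; the three-dimensional vectors, the `polity` target, and the word `kitchen` are made-up placeholders, and the helper names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(target, vectors, k=10):
    """Return the k (word, similarity) pairs closest to the target vector."""
    scored = [(word, cosine(target, vec)) for word, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors for illustration only; real embeddings have many more dimensions.
toy_vectors = {
    "country": [0.9, 0.1, 0.0],
    "commonwealth": [0.8, 0.2, 0.1],
    "kitchen": [0.0, 0.1, 0.9],
}
polity = [1.0, 0.0, 0.0]  # hypothetical institution vector

top = most_similar(polity, toy_vectors, k=2)
for word, score in top:
    print(word, round(score, 3))
```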

Visualizing intersectionality

Figs. 2 and 3 visualize the results from two different perspectives. Based on the two dimensions found in the PCA above, Fig. 2 is organized into four scatter plots, showing the relative placement of the four social categories relative to culture vs. the economy (lower left), culture vs. the polity (upper left), domestic vs. the economy (lower right), and domestic vs. the polity (upper right). The 200 vectors capturing the institutional space are also plotted along these dimensions, providing
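
One way to obtain scatter-plot coordinates like these is to treat the similarity to one institution vector as the x value and the similarity to another as the y value. The sketch below assumes cosine similarity as the measure of placement, which the text here does not confirm; the two-dimensional vectors and the neutral labels `category_a` and `category_b` are placeholders, not results.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def placement(category_vectors, x_axis, y_axis):
    """Give each category an (x, y) point: its similarity to two axis vectors."""
    return {
        name: (cosine(vec, x_axis), cosine(vec, y_axis))
        for name, vec in category_vectors.items()
    }


# Placeholder vectors; the values carry no empirical meaning.
culture = [1.0, 0.0]
economy = [0.0, 1.0]
categories = {
    "category_a": [0.8, 0.6],
    "category_b": [0.6, 0.8],
}

points = placement(categories, culture, economy)
for name, (x, y) in points.items():
    print(name, round(x, 2), round(y, 2))
```

Each panel of a four-panel figure would then reuse `placement` with a different pair of axis vectors.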

Vectors in context

Figs. 1–3 visually represent the intersectional discursive space emerging from this corpus. Discourse, of course, is much more complex than can be represented in two-dimensional space, even with the multiple dimensions represented in Fig. 2. In a final step I placed the patterns represented above back into their full discursive context via a close reading of the text. To move from vector representations of a corpus back to the text I calculated raw counts of the fifty closest words to the
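
Moving from a list of closest words back to the texts can be sketched as a raw count of those words in each document. The example below is an assumption about how such counts might be computed and used to rank texts for close reading; the function name, the two short texts, and the word list are hypothetical, and the list is far shorter than fifty words.

```python
import re
from collections import Counter


def count_words_per_text(texts, words):
    """Return, for each text, the raw number of tokens found in `words`."""
    targets = set(words)
    totals = {}
    for title, body in texts.items():
        tokens = re.findall(r"[a-z]+", body.lower())
        counts = Counter(t for t in tokens if t in targets)
        totals[title] = sum(counts.values())
    return totals


# Invented texts and word list, for illustration only.
texts = {
    "text_a": "The country and the commonwealth.",
    "text_b": "A quiet kitchen.",
}
closest_words = ["country", "commonwealth"]

totals = count_words_per_text(texts, closest_words)
# Texts with the highest counts are candidates for close reading.
ranked = sorted(totals, key=totals.get, reverse=True)
print(totals)
print(ranked)
```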

The complex relationship between race, gender, and authority

Recent scholarship has complicated the relationships between gender, race, and authority in the nineteenth century. White women used enslaved people as a source of real social and economic empowerment, providing themselves a measure of supremacy within an otherwise patriarchal society (Glymph, 2003; Jones-Rogers, 2020). In this case, we might expect white women to be discursively closer to authority compared to Black men and women in this corpus. At the same time, Black men were afforded a

Discussion

Using word embeddings as an example of one machine learning method, I analyzed intersectional experiences and discursive associations as conveyed via a diverse collection of first-person narratives from the nineteenth century U.S. South. Empirically, I found that, even in a corpus composed largely of abolitionist or abolitionist-friendly narratives, social status via culture discursively distinguished social identities based on race. Additionally, Black women, while not using more economic

Acknowledgments

The author extends deep gratitude to Elizabeth Maddock Dillon, Jane Nelson, and Leslie McCall for comments and suggestions on drafts of this paper. Two reviewers and the special issue editors provided thoughtful guidance in shaping the final version. I am grateful that Sarah Connell and Julia Flanders asked me to participate in the workshop Word Vectors for the Thoughtful Humanist, out of which this paper emerged.

References (50)

  • P.H. Collins (2009). Black feminist thought: Knowledge, consciousness, and the politics of empowerment.

  • P.H. Collins (2015). Intersectionality's definitional dilemmas. Annual Review of Sociology.

  • P.H. Collins et al. (2016). Intersectionality.

  • A.J. Cooper (1892). A voice from the South: By a Black woman of the South.

  • K. Crenshaw (1989). Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory, and antiracist politics. University of Chicago Legal Forum.

  • C. D'Ignazio et al. (2020). Data feminism.

  • V. Eubanks (2018). Automating inequality: How high-tech tools profile, police, and punish the poor.

  • N. Garg et al. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences.

  • A. Globerson et al. (2007). Euclidean embedding of co-occurrence data. The Journal of Machine Learning Research.

  • T. Glymph (2003). Out of the house of bondage: The transformation of the plantation household.

  • A.-M. Hancock (2007). When multiplication doesn't equal quick addition: Examining intersectionality as a research paradigm. Perspectives on Politics.

  • A. Hochschild (1989). The second shift.

  • T. Jennings (1990). ‘Us colored women had to go though a plenty’: Sexual exploitation of African-American slave women. Journal of Women's History.

  • C. Jones (1949). An end to the neglect of the problems of negro women. Political Affairs.

  • S.E. Jones-Rogers (2020). They were her property: White women as slave owners in the American south.

Laura K. Nelson is an assistant professor of sociology at Northeastern University where she is core faculty at the NULab for Texts, Maps, and Networks, is affiliated faculty at the Network Science Institute, and is on the executive committee for Women's, Gender, and Sexuality Studies. She was previously a postdoctoral fellow at the Berkeley Institute for Data Science and Digital Humanities @ Berkeley at the University of California, Berkeley, and for the Management and Organizations Department at Northwestern University, where she was also affiliated with the Northwestern Institute on Complex Systems (NICO). She uses computational tools, principally automated text analysis, to study social movements, culture, gender, institutions, and organizations. She has published in Sociological Methods and Research, Sociological Methodology, Mobilization, Gender & Society, and Oxford University Press, among other outlets, and has given talks and workshops on computational social science throughout the U.S. and internationally.
