Is a Picture Worth A Thousand Words? An Experiment Comparing Observer-Based Skin Tone Measures

Race and Social Problems

Abstract

Several different measures of skin color are popular in social science surveys, yet there is little evidence about which method is most valid or reliable for designing new studies. In this experiment, we compare three ways of asking raters to evaluate skin tone, testing whether common methods designed to reduce variation across raters from different social groups are effective. We compare two popular scales: a simple text-based 5-point skin tone scale (which asks raters to classify pictures on a scale from very light to very dark) and a newer 10-point palette-based skin tone scale (which asks raters to choose a number from 1 to 10, with a picture associated with each number). We also ask raters to use a more complex two-axis color grid that we created, to test whether addressing common criticisms of palette-based scales improves rating reliability. Experiment participants rated a randomly selected subset of pictures spanning a wide range of skin tones. We find that raters' demographic characteristics, such as gender, race, amount of contact with diverse racial groups, and immigration status, affect the skin tone ratings they assign no matter which measure is used, and that the three measures have statistically similar reliability. We discuss the implications of the differences between the measures for designing social science surveys and interview studies.

Notes

  1. Latinxs are among the groups most often perceived as members of a different racial category by observers (Campbell and Troyer 2011; Vargas and Kingsbury 2016; Vargas and Stainback 2016). To stay true to the models’ self-concepts and to explore the full range of appearances associated with Latinx identification, we used self-identification as the sole criterion for Latinx identity.

  2. We did not recruit any graduate students who had taught classes, so that undergraduate participants would be unlikely to recognize the graduate students from class. After the experiment was completed, we asked each rater whether they recognized anyone in the pictures; no participants reported recognizing anyone in the photographs.

  3. https://web.archive.org/web/20200417054253/https://www.loreal.com/research-and-innovation/when-the-diversity-of-types-of-beauty-inspires-science/expert-in-skin-and-hair-types-around-the-world, supplemented with shades from https://web.archive.org/web/20200107024915/https://www.lorealparisusa.com/products/makeup/face/foundation-makeup/true-match-super-blendable-makeup.aspx?shade=w0-5-cream-ivory

  4. We also tested the results without combining columns 11 and 12 of the grid scale; the results did not differ.

  5. Seven respondents did not enter their zip code but did enter the name of their high school and city at age 15, so we used the zip code of their high school.

  6. https://www.census.gov/programs-surveys/geography/guidance/geo-areas/zctas.html

  7. We asked raters to think about the “person or people who raised you” and select the highest level of education attained by these individuals.

  8. A Hausman specification test showed that a fixed-effects model was not appropriate for these data. Quadrature checks (with the Stata command quadchk) showed that the results were not sensitive to the number of integration points used in estimation. Although random-effects models require larger samples than a simple OLS model, our sample of 60 level-2 units (photos) is large enough to limit the risk of bias in our estimates and standard errors, especially given the high ICC of our dependent variables, as studies that tested the impact of sample size on similar models indicate (Ali et al. 2016, 2019).

  9. https://www.census.gov/quickfacts/fact/table/tx/PST045217

  10. Krippendorff’s alpha has important advantages over other measures of agreement for this comparison. For example, it generalizes across levels of measurement, and it discounts the agreement across raters that occurs simply by chance, which is greater for scales with few response options (Hayes and Krippendorff 2007).

  11. We used a one-way random-effects model to measure reliability for a single rater, because photos are not all rated by the same group of raters (Koo and Li 2016).

  12. We assigned the photographs to darkness tertiles using the L* score of the pixel sampled from each picture's forehead in Photoshop.
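Footnotes 10 and 11 name the two agreement statistics. As a rough illustration only (not the authors' code), the Python sketch below computes Krippendorff's alpha and a one-way, single-rater ICC for data in which each photo is rated by a different set of raters. It assumes numeric scale codes grouped by photo and uses the interval (squared-difference) metric for alpha; an ordinal metric, arguably closer to the 5- and 10-point scales, would weight disagreements differently. The ICC uses the standard one-way ANOVA estimator with an adjusted average group size for unequal numbers of ratings per photo. The function names and example ratings are hypothetical.

```python
from statistics import fmean


def _ss(values):
    """Sum of squared deviations of `values` from their mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    `units` holds one list of numeric ratings per photo; different photos
    may be rated by different raters. Photos with fewer than two ratings
    are not pairable and are dropped.
    """
    pairable = [u for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    # Over ordered pairs i != j, sum (x_i - x_j)^2 equals 2 * m * SS, so the
    # observed disagreement weights each photo's pairs by 1 / (m - 1).
    d_obs = sum(2 * len(u) * _ss(u) / (len(u) - 1) for u in pairable) / n
    # Expected (chance) disagreement: the same pairing over all values,
    # ignoring which photo they came from.
    d_exp = 2 * _ss(values) / (n - 1)
    return 1.0 - d_obs / d_exp


def icc_oneway_single(units):
    """ICC(1,1): one-way random effects, single rater.

    Uses one-way ANOVA mean squares; n0 is the standard adjusted
    average number of ratings per photo for unbalanced designs.
    """
    a = len(units)
    big_n = sum(len(u) for u in units)
    grand = fmean([v for u in units for v in u])
    ms_between = sum(len(u) * (fmean(u) - grand) ** 2 for u in units) / (a - 1)
    ms_within = sum(_ss(u) for u in units) / (big_n - a)
    n0 = (big_n - sum(len(u) ** 2 for u in units) / big_n) / (a - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical ratings: three photos, two ratings each.
ratings = [[1, 2], [3, 4], [5, 6]]
print(round(krippendorff_alpha_interval(ratings), 3))  # 0.857
print(round(icc_oneway_single(ratings), 3))  # 0.882
```

Both statistics equal 1 under perfect agreement; alpha can fall below 0 when raters disagree more than chance would predict.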
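Footnote 12 does not say how the tertile cut points were set. The minimal Python sketch below assumes the photos were simply ranked by L* and split into three equal-sized groups (20 each for 60 photos); percentile cut points would be an equally plausible choice. The function name and the L* readings are hypothetical.

```python
def darkness_tertiles(lstar):
    """Assign each photo a darkness tertile from its forehead L* value.

    L* is CIELAB lightness (0 = black, 100 = white), so a lower L* is a
    darker skin sample. Tertile 1 is the lightest third and tertile 3
    the darkest. Photos are ranked by L* and split into three groups of
    (nearly) equal size; ties keep their input order.
    """
    n = len(lstar)
    # Indices sorted from highest L* (lightest) to lowest L* (darkest).
    order = sorted(range(n), key=lambda i: -lstar[i])
    tertile = [0] * n
    for rank, i in enumerate(order):
        tertile[i] = rank * 3 // n + 1
    return tertile


# Hypothetical L* readings for six photos.
print(darkness_tertiles([70.0, 45.5, 60.2, 30.1, 55.0, 65.3]))
# [1, 3, 2, 3, 2, 1]
```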

References

  • Abascal, M. (2020). Contraction as a response to group threat: Demographic decline and whites’ classification of people who are ambiguously white. American Sociological Review, 85(2), 298–322.

  • Ali, A., Ali, S., Khan, S. A., Khan, D. M., Abbas, K., Khalil, A., et al. (2019). Sample size issues in multilevel logistic regression models. PLoS ONE, 14(11), e0225427.

  • Ali, S., Ali, A., Khan, S. A., & Hussain, S. (2016). Sufficient sample size and power in multilevel ordinal logistic regression models. Computational and Mathematical Methods in Medicine. https://doi.org/10.1155/2016/7329158.

  • Allen, W., Telles, E., & Hunter, M. (2000). Skin color, income and education: A comparison of African Americans and Mexican Americans. National Journal of Sociology, 12(1), 129–180.

  • Bailey, S., Saperstein, A., & Penner, A. (2014). Race, color, and income inequality across the Americas. Demographic Research, 31, 735–756.

  • Bond, S., & Cash, T. F. (1992). Black beauty: Skin color and body images among African-American College Women. Journal of Applied Social Psychology, 22(11), 874–888.

  • Borrell, L. N., Kiefe, C. I., Diez-Roux, A. V., Williams, D. R., & Gordon-Larsen, P. (2013). Racial discrimination, racial/ethnic segregation, and health behaviors in the CARDIA Study. Ethnicity & Health, 18(3), 227–243.

  • Campbell, M. E., Bratter, J. L., & Roth, W. D. (2016). Measuring the diverging components of race: An introduction. American Behavioral Scientist, 60(4), 381–389.

  • Campbell, M. E., & Troyer, L. (2011). Further data on misclassification: A reply to Cheng and Powell. American Sociological Review, 76(2), 356–364.

  • Caruso, E. M., Mead, N. L., & Balcetis, E. (2009). Political partisanship influences perception of biracial candidates’ skin tone. Proceedings of the National Academy of Sciences, 106(48), 20168–20173.

  • Chavez-Dueñas, N. Y., Adames, H. Y., & Organista, K. C. (2014). Skin-color prejudice and within-group racial discrimination: Historical and current impact on Latino/a populations. Hispanic Journal of Behavioral Sciences, 36(1), 3–26.

  • Dixon, T. L., & Maddox, K. B. (2005). Skin Tone, Crime News, and Social Reality Judgments: Priming the Stereotype of the Dark and Dangerous Black Criminal. Journal of Applied Social Psychology, 35(8), 1555–1570.

  • Dressler, W. W. (1991). Social class, skin color, and arterial blood pressure in two societies. Ethnicity & Disease, 1(1), 60–77.

  • Dressler, W. W. (1993). Health in the African American Community: Accounting for health inequalities. Medical Anthropology Quarterly, 7(4), 325–345.

  • Espino, R., & Franz, M. M. (2002). Latino phenotypic discrimination revisited: The impact of skin color on occupational status. Social Science Quarterly, 83(2), 612–623.

  • Feliciano, C. (2016). Shades of race: How phenotype and observer characteristics shape racial classification. American Behavioral Scientist, 60(4), 390–419.

  • Frank, R., Akresh, I. R., & Lu, B. (2010). Latino immigrants and the U.S. racial order: How and where do they fit in? American Sociological Review, 75(3), 378–401.

  • Freeman, J. B., Penner, A. M., Saperstein, A., Scheutz, M., & Ambady, N. (2011). Looking the part: Social status cues shape race perception. PLoS ONE, 6(9), e25107.

  • Garcia, D., & Abascal, M. (2015). Colored perceptions: Racially distinctive names and assessments of skin color. American Behavioral Scientist, 60(4), 420–441.

  • Goldsmith, A. H., Hamilton, D., & Darity, W. (2006). Shades of discrimination: Skin tone and wages. The American Economic Review, 96(2), 242–245.

  • Goldsmith, A. H., Hamilton, D., & Darity, W. (2007). From dark to light: Skin color and wages among African-Americans. Journal of Human Resources, 42(4), 701–738.

  • Gravlee, C. C., Dressler, W. W., & Bernard, H. R. (2005). Skin color, social classification, and blood pressure in Southeastern Puerto Rico. American Journal of Public Health, 95(12), 2191–2197.

  • Hagiwara, N., Kashy, D. A., & Cesario, J. (2012). The independent effects of skin tone and facial features on Whites’ affective reactions to blacks. Journal of Experimental Social Psychology, 48(4), 892–898.

  • Hamilton, D., Goldsmith, A. H., & Darity, W. (2009). Shedding ‘light’ on marriage: The influence of skin shade on marriage for Black females. Journal of Economic Behavior and Organization, 72(1), 30–50.

  • Hannon, L. (2014). Hispanic respondent intelligence level and skin tone. Hispanic Journal of Behavioral Sciences, 36(3), 265–283.

  • Hannon, L., & DeFina, R. (2014). Just Skin Deep? The impact of interviewer race on the assessment of African American Respondent Skin Tone. Race and Social Problems, 6(4), 356–364.

  • Hannon, L., & DeFina, R. (2016). Reliability concerns in measuring respondent skin tone by interviewer observation. Public Opinion Quarterly, 80(2), 534–541.

  • Harris, K. M., Halpern, C. T., Whitsel, E., Hussey, J., Tabor, J., Entzel, P., & Udry, J. R. (2009). The National Longitudinal Study of Adolescent to Adult Health: Research Design [WWW Document]. Retrieved March 17, 2020, from https://www.cpc.unc.edu/projects/addhealth/design.

  • Hayes, A. F., & Krippendorff, K. (2007). Answering the call for a standard reliability measure for coding data. Communication Methods and Measures, 1(1), 77–89.

  • Hebl, M. R., Williams, M. J., Sundermann, J. M., Kell, H. J., & Davies, P. G. (2012). Selectively friending: Racial stereotypicality and social rejection. Journal of Experimental Social Psychology, 48(6), 1329–1335.

  • Herman, M. R. (2010). Do you see what I am? How observers’ backgrounds affect their perceptions of multiracial faces. Social Psychology Quarterly, 73(1), 58–78.

  • Hersch, J. (2006). Skin-tone effects among African Americans: Perceptions and reality. American Economic Review, 96(2), 251–255.

  • Hersch, J. (2008). Profiling the new immigrant worker: The effects of skin color and height. Journal of Labor Economics, 26(2), 345–386.

  • Hersch, J. (2011). The persistence of skin color discrimination for immigrants. Social Science Research, 40(5), 1337–1349.

  • Hill, M. E. (2002a). Race of the interviewer and perception of skin color: Evidence from the multi-city study of urban inequality. American Sociological Review, 67(1), 99–108.

  • Hill, M. E. (2002b). Skin color and the perception of attractiveness among African Americans: Does gender make a difference? Social Psychology Quarterly, 65(1), 77–91.

  • Hughes, M., Kiecolt, K. J., Keith, V. M., & Demo, D. H. (2015). Racial identity and well-being among African Americans. Social Psychology Quarterly, 78(1), 25–48.

  • Hunter, M. L. (2002). ‘If You’re Light You’re Alright’ Light Skin Color as Social Capital for Women of Color. Gender & Society, 16(2), 175–193.

  • Hunter, M. L. (2005). Race, gender, and the politics of skin tone. New York: Routledge.

  • Hunter, M. L. (2007). The persistent problem of colorism: Skin tone, status, and inequality. Sociology Compass, 1(1), 237–254.

  • Jackson, J. S., Torres, M., Caldwell, C. H., Neighbors, H. W., Nesse, R. M., Taylor, R. J., et al. (2004). The National Survey of American Life: A study of racial, ethnic and cultural influences on mental disorders and mental health. International Journal of Methods in Psychiatric Research, 13(4), 196–207.

  • Keith, V. M., & Campbell, M. E. (2015). Texas diversity survey.

  • Keith, V. M., & Herring, C. (1991). Skin tone and stratification in the black community. American Journal of Sociology, 97(3), 760–778.

  • Keith, V. M., & Thompson, M. S. (2003). Color matters: The importance of skin tone for African American Women’s Self-Concept in Black and White America. In D. R. Brown & V. M. Keith (Eds.), In and out of our right minds: The mental health of African American Women (pp. 116–135). New York: Columbia University Press.

  • Klonoff, E. A., & Landrine, H. (2000). Is skin color a marker for racial discrimination? Explaining the skin color-hypertension relationship. Journal of Behavioral Medicine, 23(4), 329–338.

  • Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163.

  • Krieger, N., Sidney, S., & Coakley, E. (1998). Racial discrimination and skin color in the CARDIA Study: Implications for Public Health Research. Coronary Artery Risk Development in Young Adults. American Journal of Public Health, 88(9), 1308–1313.

  • López, N., Vargas, E., Juarez, M., Cacari-Stone, L., & Bettez, S. (2018). What’s your ‘Street Race’? Leveraging multidimensional measures of race and intersectionality for examining physical and mental health status among Latinxs. Sociology of Race and Ethnicity, 4(1), 49–66.

  • Maddox, K. B. (2004). Perspectives on racial phenotypicality bias. Personality and Social Psychology Review, 8(4), 383–401.

  • Martin, L. L., Horton, H. D., Herring, C., Keith, V., & Thomas, M. (2017). Color Struck: How Race and Complexion Matter in the “Color-Blind” Era. Boston, MA: Sense.

  • Massey, D. S., & Martin, J. A. (2003). The NIS Skin Color Scale. Retrieved August 3, 2015, from https://nis.princeton.edu/downloads/NIS-Skin-Color-Scale.pdf.

  • Monk, E. P. (2014). Skin tone stratification among Black Americans, 2001–2003. Social Forces, 92(4), 1313–1337.

  • Monk, E. P. (2015). The cost of color: Skin color, discrimination, and health among African-Americans. American Journal of Sociology, 121(2), 396–444.

  • Penner, A. M., & Saperstein, A. (2013). Engendering Racial Perceptions: An intersectional analysis of how social status shapes race. Gender & Society, 27(3), 319–344.

  • Rondilla, J. L., & Spickard, P. (2007). Is lighter better? Skin-tone discrimination among Asian Americans. Toronto: Rowman & Littlefield Publishers Inc.

  • Ross, L. E. (1997). Mate selection preferences among African American college students. Journal of Black Studies, 27(4), 554–569.

  • Roth, W. D. (2016). The multiple dimensions of race. Ethnic and Racial Studies, 39(8), 1310–1338.

  • Ryabov, I. (2016a). Colorism and educational outcomes of Asian Americans: Evidence from the National Longitudinal Study of Adolescent Health. Social Psychology of Education, 19(2), 303–324.

  • Ryabov, I. (2016b). Educational outcomes of Asian and Hispanic Americans: The significance of skin color. Research in Social Stratification and Mobility, 44, 1–9.

  • Saperstein, A. (2012). Capturing complexity in the United States: Which aspects of race matter and when? Ethnic and Racial Studies, 35(8), 1484–1502.

  • Saperstein, A., Kizer, J. M., & Penner, A. M. (2015). Making the most of multiple measures: Disentangling the effects of different dimensions of race in survey research. American Behavioral Scientist, 60(4), 519–537.

  • Saperstein, A., & Penner, A. M. (2016). Still Searching for a True Race? Reply to Kramer et al. and Alba et al. American Journal of Sociology, 122(1), 263–285.

  • Stepanova, E. V., & Strube, M. J. (2012). The role of skin color and facial physiognomy in racial categorization: Moderation by implicit racial attitudes. Journal of Experimental Social Psychology, 48(4), 867–878.

  • Stewart, Q. T., Cobb, R. Y., & Keith, V. M. (2018). The color of death: Race, observed skin tone, and all-cause mortality in the United States. Ethnicity & Health. https://doi.org/10.1080/13557858.2018.1469735.

  • Telles, E., Flores, R. D., & Urrea-Giraldo, F. (2015). Pigmentocracies: Educational inequality, skin color and census ethnoracial identification in eight Latin American Countries. Research in Social Stratification and Mobility, 40, 39–58.

  • Uzogara, E. E., Lee, H., Abdou, C. M., & Jackson, J. S. (2014). A comparison of skin tone discrimination among African American Men: 1995 and 2003. Psychology of Men & Masculinity, 15(2), 201–212.

  • Vargas, N. (2015). Latina/o Whitening? Which Latina/Os Self-Classify as White and Report Being Perceived as White by Other Americans? Du Bois Review: Social Science Research on Race, 12(1), 119–136.

  • Vargas, N., & Kingsbury, J. (2016). Racial identity contestation: Mapping and measuring racial boundaries. Sociology Compass, 10(8), 718–729.

  • Vargas, N., & Stainback, K. (2016). Documenting contested racial identities among self-identified Latina/Os, Asians, Blacks, and Whites. American Behavioral Scientist, 60(4), 442–464.

  • Villarreal, A. (2010). Stratification by skin color in contemporary Mexico. American Sociological Review, 75(5), 652–678.

  • Weatherall, I. L., & Coombs, B. D. (1992). Skin color measurements in terms of CIELAB color space values. Journal of Investigative Dermatology, 99(4), 468–473.

  • Weaver, V. M. (2012). The electoral consequences of skin color: The ‘hidden’ side of race in politics. Political Behavior, 34, 159–192.

  • Young, D. M., Sanchez, D. T., & Wilton, L. S. (2017). Biracial perception in Black and White: How Black and White perceivers respond to phenotype and racial identity cues. Cultural Diversity and Ethnic Minority Psychology, 23(1), 154–164.

  • Zopf, B. J. (2018). A different kind of Brown: Arabs and Middle Easterners as Anti-American Muslims. Sociology of Race and Ethnicity, 4(2), 178–191.

Acknowledgements

The authors would like to thank the College of Liberal Arts at Texas A&M University for funding the data collection, Lance Hannon for the loan of the spectrometer, Phia Salter for her help with the project development, Aline Piacun for photo manipulation, Mary K. Campbell for her help with photo skin tone measures, and Katie Constantin, Emily Knox, Gabe Miller, David Orta, and Jesus Smith for their help with data collection or project discussions.

Author information

Correspondence to Mary E. Campbell.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Skin tone measures

Text-Based 5-Point Scale

Probably the most commonly used scale in the social sciences is the text-based scale, which asks interviewers to rate the respondent’s skin tone on a scale from very light to very dark. We asked participants to rate the photographs on this scale:

The subject’s skin color is:

  • Very Light

  • Light

  • Medium

  • Dark

  • Very Dark

Graphic-Based 10-Point Scale

Another widely used scale is the 10-point Massey and Martin (2003) scale, which asks interviewers to classify the skin tone of every respondent using the graphic below as a guide; interviewers were to memorize the graphic and never show it to the people they were classifying:

[Figure: Massey and Martin (2003) 10-point skin color scale]

Graphic-Based Grid (69-Point) Scale

This scale asked participants to classify each person using a graphic of skin tones. The measure was adapted from L’Oréal’s grid of skin tones and supplemented with additional shades to broaden the range of darker skin tones available (see footnote 3). The lettered rows indicate the individual’s undertone, which varies from red (A) to yellow (F). The numbered columns vary in skin tone darkness, from the lightest tones on the left (1) to the darkest on the right (12). Participants were instructed to choose the cell they believed best matched the skin color of the person photographed (e.g., “E5”).

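[Figure: skin tone grid, rows A (red undertone) to F (yellow undertone) by columns 1 (lightest) to 12 (darkest)]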
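For analysis, a grid response such as “E5” has to be split into its two axes. The Python sketch below shows one way to do this, assuming rows A to F are coded 1 (red) to 6 (yellow) and columns 1 to 12 keep their numbers. The `combine_11_12` option is a hypothetical reading of footnote 4, which mentions combining columns 11 and 12 without specifying how; here column 12 is simply recoded as 11. A full 6-by-12 grid would have 72 cells rather than 69, so some row and column combinations presumably do not appear on the printed grid; the sketch does not check for them. All names are hypothetical.

```python
import re

ROWS = "ABCDEF"  # undertone rows: A = red through F = yellow
_CELL = re.compile(r"\s*([A-Fa-f])\s*(\d{1,2})\s*")


def parse_grid_cell(label, combine_11_12=True):
    """Split a grid response such as "E5" into (undertone, darkness) codes.

    undertone: 1 (row A, red) to 6 (row F, yellow)
    darkness:  1 (lightest column) to 12 (darkest column); when
               combine_11_12 is True, column 12 is recoded as 11.
    """
    m = _CELL.fullmatch(label)
    if m is None:
        raise ValueError(f"not a grid cell: {label!r}")
    undertone = ROWS.index(m.group(1).upper()) + 1
    darkness = int(m.group(2))
    if not 1 <= darkness <= 12:
        raise ValueError(f"column out of range: {label!r}")
    if combine_11_12 and darkness == 12:
        darkness = 11
    return undertone, darkness


print(parse_grid_cell("E5"))  # (5, 5)
print(parse_grid_cell("a12"))  # (1, 11)
print(parse_grid_cell("a12", combine_11_12=False))  # (1, 12)
```

Keeping the two axes as separate codes lets undertone and darkness enter a model as distinct variables rather than as a single 69-category label.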

Appendix B: Photograph Selection

We began with photographs of 16 different Latinx models with a range of skin tones. For each photograph, we asked the artist to produce a set of lighter and darker versions. A panel of five researchers (Mary Campbell, Verna Keith, Vanessa Gonlin, Emily Knox, and David Orta) then examined each picture in turn; if more than one of the five researchers felt a picture was not convincing (that is, that it looked digitally altered), the picture was discarded. The researchers viewed each picture on its own, because digital alteration is often obvious when all the pictures of a single model are viewed together, as they are displayed below, but not when a picture is viewed in isolation.

One model had only two pictures selected (the original and one lightened version). Two models had five pictures selected (the original plus some lightened and some darkened versions). Most models had three or four pictures selected. The final total was 60 photos.

Each respondent saw each model only once, so each respondent saw 16 pictures. The picture they saw was randomly selected from all of the photos of that model. For example, there were three pictures of model A, so each respondent was randomly assigned to see one of those three photographs, then one of the five pictures of model B, and so on.
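The screening rule and the per-model randomization described above can be sketched in Python as follows. The photo counts come from the text, with one inference: if the 13 remaining models each had three or four photos, reaching 60 requires nine models with four photos and four with three. The model labels, photo IDs, ordering, and function names are hypothetical, and drawing each rater's photos independently is our reading of the description rather than a documented detail of the survey software.

```python
import random

# Photo counts per model, in a hypothetical order matching the text's
# example (model A: 3 photos, model B: 5). One model has 2 photos and two
# have 5; the other 13 split into nine with 4 and four with 3 (total 60).
COUNTS = [3, 5, 2, 5] + [4] * 9 + [3] * 3
MODELS = {
    chr(ord("A") + k): [f"{chr(ord('A') + k)}{j + 1}" for j in range(c)]
    for k, c in enumerate(COUNTS)
}


def keep_photo(unconvincing_votes, panel_size=5):
    """Screening rule: discard a picture if more than one reviewer on the
    panel judged it unconvincing (i.e., visibly digitally altered)."""
    if not 0 <= unconvincing_votes <= panel_size:
        raise ValueError("vote count must be between 0 and panel_size")
    return unconvincing_votes <= 1


def assign_photos(models, rng):
    """Draw one photo per model, uniformly at random, for one rater."""
    return {model: rng.choice(photos) for model, photos in models.items()}


rng = random.Random(2020)  # seeded only so the sketch is reproducible
shown = assign_photos(MODELS, rng)
print(sum(COUNTS), len(shown))  # 60 16
```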

Below is one example of a set of model photographs (included with permission from the model):

[Figure: example set of photographs of one model]

About this article

Cite this article

Campbell, M.E., Keith, V.M., Gonlin, V. et al. Is a Picture Worth A Thousand Words? An Experiment Comparing Observer-Based Skin Tone Measures. Race Soc Probl 12, 266–278 (2020). https://doi.org/10.1007/s12552-020-09294-0

  • DOI: https://doi.org/10.1007/s12552-020-09294-0
