Is a Picture Worth A Thousand Words? An Experiment Comparing Observer-Based Skin Tone Measures


Race and Social Problems (IF 2.8), Pub Date: 2020-05-28, DOI: 10.1007/s12552-020-09294-0
Mary E. Campbell, Verna M. Keith, Vanessa Gonlin, Adrienne R. Carter-Sowell

Several different measures of skin color are popular in social science surveys, yet we have little evidence to suggest which method is the most valid or reliable when we design new studies. In this experiment, we compare three different ways of asking raters to evaluate skin tone, testing whether common methods designed to reduce variation across raters from different social groups are effective. We compare two popular scales: a simple text-based 5-point skin tone scale (which asks raters to classify pictures on a scale from very light to very dark) and a newer 10-point palette-based skin tone scale (which asks raters to choose a number from 1 to 10, with pictures associated with each number). We also ask raters to use a more complex two-axis color grid that we created, in order to test whether addressing common criticisms of the palette-based scales improves rating reliability. Experiment participants rated a randomly selected subset of pictures with a wide range of skin tones. We find that demographic characteristics of the raters such as gender, race, their amount of contact with diverse racial groups, and immigration status affect skin tone ratings that observers assign, no matter what type of measure is used, and the three measures have reliability ratings that are statistically similar. We discuss the implications of the differences between the measures for designing social science surveys and interview studies.


Updated: 2020-05-28
