Noise Pollution: A Multi-Step Approach to Assessing the Consequences of (Not) Validating Search Terms on Automated Content Analyses
Digital Journalism (IF 6.847), Pub Date: 2022-09-23, DOI: 10.1080/21670811.2022.2114920
Daniela Mahl, Gerret von Nordheim, Lars Guenther

Abstract

Advances in analytical methodologies and an avalanche of digitized data have opened new avenues for (digital) journalism research—and with it, new challenges. One of these challenges concerns the sampling and evaluation of data using (non-validated) search terms in combination with automated content analyses. This challenge has largely been neglected by research, which is surprising, considering that noise slipping in during the process of data collection can generate great methodological concerns. To address this gap, we first offer a systematic interdisciplinary literature review, revealing that the validation of search terms is far from acknowledged as a required standard procedure, both in and beyond journalism research. Second, we assess the consequences of validating search terms, using a multi-step approach and investigating common research topics from the field of (digital) journalism research. Our findings show that careless application of non-validated search terms has its pitfalls: while scattershot search terms can make sense in initial data exploration, final inferences based on insufficiently validated search terms are at higher risk of being obscured by noise. Consequently, we provide a step-by-step recommendation for developing and validating search terms.
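As a rough illustration of what validating a search term can involve, the sketch below compares a broad and a narrower search term against a small hand-coded sample and reports precision and recall. This is a generic approach and not necessarily the procedure the authors recommend; the function name `precision_recall`, the regular expressions, and the four sample documents are all hypothetical.

```python
import re


def precision_recall(pattern, labeled_docs):
    """Score a search term against manually coded documents.

    labeled_docs is a list of (text, is_relevant) pairs, where
    is_relevant is the human coder's judgment (an assumed input format).
    """
    tp = fp = fn = 0
    for text, relevant in labeled_docs:
        hit = re.search(pattern, text, flags=re.IGNORECASE) is not None
        if hit and relevant:
            tp += 1  # retrieved and relevant
        elif hit and not relevant:
            fp += 1  # retrieved but off-topic: noise
        elif not hit and relevant:
            fn += 1  # relevant but missed by the search term
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical hand-coded sample for the topic "climate change".
sample = [
    ("Climate change threatens coastal cities", True),
    ("The political climate in parliament is tense", False),
    ("Global warming accelerates glacier melt", True),
    ("Local team wins the championship", False),
]

broad = r"\bclimate\b"  # scattershot term
narrow = r"\bclimate change\b|\bglobal warming\b"  # refined term

for name, pattern in [("broad", broad), ("narrow", narrow)]:
    p, r = precision_recall(pattern, sample)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

On this toy sample the broad term retrieves an off-topic document and misses a relevant one, which is the kind of noise the abstract warns about; a real validation would need a much larger, systematically drawn coded sample.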




Updated: 2022-09-23