Closed- and open-vocabulary approaches to text analysis: A review, quantitative comparison, and recommendations.
Psychological Methods (IF 10.929), Pub Date: 2021-08-01, DOI: 10.1037/met0000349
Johannes C Eichstaedt, Margaret L Kern, David B Yaden, H A Schwartz, Salvatore Giorgi, Gregory Park, Courtney A Hagan, Victoria A Tobolsky, Laura K Smith, Anneke Buffone, Jonathan Iwry, Martin E P Seligman, Lyle H Ungar
Technology now makes it possible to understand efficiently and at large scale how people use language to reveal their everyday thoughts, behaviors, and emotions. Written text has been analyzed through both theory-based, closed-vocabulary methods from the social sciences as well as data-driven, open-vocabulary methods from computer science, but these approaches have not been comprehensively compared. To provide guidance on best practices for automatically analyzing written text, this narrative review and quantitative synthesis compares five predominant closed- and open-vocabulary methods: Linguistic Inquiry and Word Count (LIWC), the General Inquirer, DICTION, Latent Dirichlet Allocation, and Differential Language Analysis. We compare the linguistic features associated with gender, age, and personality across the five methods using an existing dataset of Facebook status updates and self-reported survey data from 65,896 users. Results are fairly consistent across methods. The closed-vocabulary approaches efficiently summarize concepts and are helpful for understanding how people think, with LIWC2015 yielding the strongest, most parsimonious results. Open-vocabulary approaches reveal more specific and concrete patterns across a broad range of content domains, better address ambiguous word senses, and are less prone to misinterpretation, suggesting that they are well-suited for capturing the nuances of everyday psychological processes. We detail several errors that can occur in closed-vocabulary analyses, the impact of sample size, number of words per user and number of topics included in open-vocabulary analyses, and implications of different analytical decisions. We conclude with recommendations for researchers, advocating for a complementary approach that combines closed- and open-vocabulary methods. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
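The distinction between the two families of methods can be illustrated with a minimal sketch. It assumes a toy, hypothetical two-category dictionary (not the actual LIWC, General Inquirer, or DICTION lexicons) and a simple regex tokenizer; the function names are illustrative only and are not taken from the article or from any of the reviewed tools. A closed-vocabulary feature is the share of a text's words that fall into a predefined category, whereas an open-vocabulary feature set is derived from whatever words actually occur in the data.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration only; these are NOT the
# categories or word lists of LIWC, the General Inquirer, or DICTION.
TOY_DICTIONARY = {
    "positive_emotion": {"happy", "love", "great"},
    "negative_emotion": {"sad", "hate", "angry"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def closed_vocabulary_features(text, dictionary=TOY_DICTIONARY):
    """Share of tokens that belong to each predefined category."""
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {
        category: sum(token in words for token in tokens) / total
        for category, words in dictionary.items()
    }


def open_vocabulary_features(text):
    """Relative frequency of every word observed in the text."""
    tokens = tokenize(text)
    total = len(tokens) or 1
    return {word: count / total for word, count in Counter(tokens).items()}


if __name__ == "__main__":
    status = "I love my happy dog"
    print(closed_vocabulary_features(status))
    print(open_vocabulary_features(status))
```

In an analysis of the kind the abstract describes, either feature set would then be related to outcomes such as gender, age, or personality scores across many users; that step, and topic modeling with Latent Dirichlet Allocation, are deliberately left out of this sketch.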
Updated: 2021-08-01