Using anticlustering to partition data sets into equivalent parts.
Psychological Methods (IF 7.6), Pub Date: 2020-06-22, DOI: 10.1037/met0000301
Martin Papenberg, Gunnar W. Klau

Numerous applications in psychological research require that a pool of elements is partitioned into multiple parts. While many applications seek groups that are well-separated, that is, dissimilar from each other, others require the different groups to be as similar as possible. Examples include the assignment of students to parallel courses, assembling stimulus sets in experimental psychology, splitting achievement tests into parts of equal difficulty, and dividing a data set for cross-validation. We present anticlust, an easy-to-use and free software package for solving these problems fast and in an automated manner. The package anticlust is an open source extension to the R programming language and implements the methodology of anticlustering. Anticlustering divides elements into similar parts, ensuring similarity between groups by enforcing heterogeneity within groups. Thus, anticlustering is the direct reversal of cluster analysis that aims to maximize homogeneity within groups and dissimilarity between groups. Our package anticlust implements 2 anticlustering criteria, reversing the clustering methods k-means and cluster editing, respectively. In a simulation study, we show that anticlustering returns excellent results and outperforms alternative approaches like random assignment and matching. In 3 example applications, we illustrate how to apply anticlust on real data sets. We demonstrate how to assign experimental stimuli to equivalent sets based on norming data, how to divide a large data set for cross-validation, and how to split a test into parts of equal item difficulty and discrimination. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
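To make the idea of reversing k-means concrete, here is a minimal sketch in plain Python. It is not the anticlust package or its API. The function names `within_group_ss` and `anticluster`, the pairwise-swap search, and the random starting assignment are all assumptions chosen for illustration. The sketch maximizes the within-group sum of squared distances to the group centroids, which is the quantity k-means minimizes. Because the total sum of squares is fixed, raising the within-group part pushes the group centroids toward each other, so the groups become more alike.

```python
import random


def within_group_ss(points, labels, k):
    """Sum of squared distances between each point and its group centroid."""
    dims = len(points[0])
    total = 0.0
    for g in range(k):
        members = [p for p, lab in zip(points, labels) if lab == g]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dims))
    return total


def anticluster(points, k, seed=0):
    """Hypothetical helper: swap pairs across groups while the objective grows.

    Assumes len(points) >= k so that no group is ever empty.
    """
    rng = random.Random(seed)
    n = len(points)
    labels = [i % k for i in range(n)]  # group sizes differ by at most one
    rng.shuffle(labels)
    best = within_group_ss(points, labels, k)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for j in range(i + 1, n):
                if labels[i] == labels[j]:
                    continue  # swapping within one group changes nothing
                labels[i], labels[j] = labels[j], labels[i]
                score = within_group_ss(points, labels, k)
                if score > best + 1e-12:
                    best = score  # keep the swap
                    improved = True
                else:
                    labels[i], labels[j] = labels[j], labels[i]  # undo the swap
    return labels


# Illustrative data only: 30 random two-dimensional points, split into 3 groups.
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
groups = anticluster(data, k=3)
print([groups.count(g) for g in range(3)])
```

Swapping pairs rather than moving single elements keeps the group sizes fixed, which suits use cases such as parallel courses or cross-validation folds. The search stops at a local optimum, so it should be read as a heuristic under these assumptions rather than an exact method.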

Updated: 2020-06-22
