Using synthetic data to improve the reproducibility of statistical results in psychological research.


Psychological Methods (IF 10.929), Pub Date: 2022-08-04, DOI: 10.1037/met0000526
Simon Grund, Oliver Lüdtke, Alexander Robitzsch

In recent years, psychological research has faced a credibility crisis, and open data are often regarded as an important step toward a more reproducible psychological science. However, privacy concerns are among the main reasons that prevent data sharing. Synthetic data procedures, which are based on the multiple imputation (MI) approach to missing data, can be used to replace sensitive data with simulated values, which can be analyzed in place of the original data. One crucial requirement of this approach is that the synthesis model is correctly specified. In this article, we investigated the statistical properties of synthetic data with a particular emphasis on the reproducibility of statistical results. To this end, we compared conventional approaches to synthetic data based on MI with a data-augmented approach (DA-MI) that attempts to combine the advantages of masking methods and synthetic data, thus making the procedure more robust to misspecification. In multiple simulation studies, we found that the good properties of the MI approach strongly depend on the correct specification of the synthesis model, whereas the DA-MI approach can provide useful results even under various types of misspecification. This suggests that the DA-MI approach to synthetic data can provide an important tool that can be used to facilitate data sharing and improve reproducibility in psychological research. In a working example, we also demonstrate the implementation of these approaches in widely available software, and we provide recommendations for practice.
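The basic idea of replacing sensitive values with draws from a synthesis model can be illustrated with a minimal sketch. This is not the authors' implementation and it does not cover the DA-MI approach. It assumes a single sensitive outcome `y`, a non-sensitive predictor `x`, and a normal linear synthesis model, and it plugs in point estimates rather than drawing the model parameters from their posterior distribution, as a proper MI-based synthesizer would. All names (`synthesize`, `slope`, the simulated data) are hypothetical.

```python
import numpy as np


def synthesize(x, y, m, rng):
    """Draw m synthetic copies of y from a linear synthesis model y ~ x.

    Simplification (assumption): parameters are fixed at their least-squares
    estimates instead of being drawn from a posterior distribution.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma = np.sqrt(resid @ resid / (len(y) - X.shape[1]))
    return [X @ beta + rng.normal(0.0, sigma, size=len(y)) for _ in range(m)]


def slope(x, y):
    """Analysis model: slope of the simple regression of y on x."""
    return np.polyfit(x, y, 1)[0]


rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)              # non-sensitive predictor, shared as is
y = 0.5 * x + rng.normal(size=n)    # "sensitive" outcome, to be replaced

# Only y is replaced, so each data set is partially synthetic.
synthetic_ys = synthesize(x, y, m=20, rng=rng)

# Analyze each synthetic data set and average the point estimates.
estimates = np.array([slope(x, y_syn) for y_syn in synthetic_ys])
pooled_slope = estimates.mean()
original_slope = slope(x, y)
```

In this toy setting the synthesis model matches the data-generating model, so the pooled slope stays close to the slope from the original data. If the synthesis model omitted a relationship that the analysis targets (for example, a nonlinear term), the synthetic data would not reproduce it, which is the misspecification problem the abstract describes.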


Updated: 2022-08-05
