ei.Datasets: Real Data Sets for Assessing Ecological Inference Algorithms, Social Science Computer Review

Social Science Computer Review (IF 3.0), Pub Date: 2021-09-06, DOI: 10.1177/08944393211040808
Jose M. Pavía

Ecological inference models aim to infer individual-level relationships from aggregate data. They are routinely used to estimate voter transitions between elections, reveal split-ticket voting behavior, or infer racial voting patterns in U.S. elections. A large number of procedures have been proposed in the literature to solve these problems, so an assessment and comparison of them is overdue. The secret ballot, however, makes this a difficult endeavor, since real individual-level data are usually not accessible. The most recent work on ecological inference has assessed methods using a very small number of data sets with ground truth, combined with artificial, simulated data. This article dramatically increases the number of real instances by presenting a unique database (available in the R package ei.Datasets) composed of data from more than 550 elections for which the true inner-cell values of the global cross-classification tables are known. The article describes how the data sets are organized, details the data curation and data wrangling processes performed, and analyzes the main features characterizing the different data sets.
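The setting the abstract describes can be sketched in a few lines. In each unit (for example, a polling station) only the row and column margins are observed, while the inner cells of the cross-classification table are what an ecological inference method tries to recover. Data sets with known inner cells let an estimate be scored against the truth. This is a minimal Python illustration with hypothetical counts. It does not use the ei.Datasets R package or reproduce any method from the article. The within-unit independence estimator is a deliberately naive baseline, and `error_index` is one simple discrepancy measure assumed here for illustration.

```python
"""Toy illustration of an ecological inference (EI) evaluation.

Assumption: each unit has a true R x C cross-classification table that is
hidden in practice; only its row and column margins are observed. The
counts and the naive estimator below are hypothetical, for illustration.
"""

# Hypothetical true inner-cell tables for two units.
# Rows: choices in election 1 (A, B); columns: choices in election 2 (X, Y).
TRUE_TABLES = [
    [[30, 10],
     [5, 55]],
    [[20, 5],
     [25, 50]],
]


def margins(table):
    """Return (row totals, column totals) of one unit's table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols


def independence_estimate(rows, cols):
    """Naive baseline: assume independence within the unit, n_ij = r_i * c_j / N."""
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]


def global_table(tables):
    """Sum unit-level tables cell by cell into the global cross-classification."""
    n_rows, n_cols = len(tables[0]), len(tables[0][0])
    return [[sum(t[i][j] for t in tables) for j in range(n_cols)]
            for i in range(n_rows)]


def error_index(estimated, true):
    """Half the total absolute cell error, as a percentage of all counts.

    Halving is used because, when margins match, every misallocated count
    appears twice (once where it is missing, once where it is added).
    """
    abs_err = sum(abs(e - t)
                  for er, tr in zip(estimated, true)
                  for e, t in zip(er, tr))
    total = sum(sum(r) for r in true)
    return 100 * abs_err / (2 * total)


if __name__ == "__main__":
    # Only the margins feed the estimate, as in a real EI problem.
    estimates = [independence_estimate(*margins(t)) for t in TRUE_TABLES]
    est_global = global_table(estimates)
    true_global = global_table(TRUE_TABLES)
    print("true global table:", true_global)
    print("estimated global table:", est_global)
    print(f"error index: {error_index(est_global, true_global):.2f}%")
```

With these toy counts, the independence baseline misallocates about a quarter of the voters (an error index of 24.75%). A genuine ecological inference method would aim to do better, and real data with known inner cells are what make such a comparison possible.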

Updated: 2021-09-06
