Identifying Linear Models in Multi-Resolution Population Data Using Minimum Description Length Principle to Predict Household Income, ACM Transactions on Knowledge Discovery from Data

ACM Transactions on Knowledge Discovery from Data (IF 4.0), Pub Date: 2021-01-04, DOI: 10.1145/3424670
Chainarong Amornbunchornvej, Navaporn Surasvadi, Anon Plangprasopchok, Suttipong Thajchayapong

One shirt size cannot fit everybody, yet resource limitations prevent us from tailoring a unique shirt for each person. The same holds for policy making. Policy makers cannot design a single policy that solves every problem in every region, because each region has its own issues. At the other extreme, they cannot design a separate policy for each small village, again because of resource limitations. Would it be better to find a set of largest regions such that the population of each region shares common issues, so that a single policy can be made for each of them? In this work, we propose a framework that uses regression analysis and the Minimum Description Length (MDL) principle to find a set of largest areas that have common indicators, which can be used to predict household income efficiently. Given a set of household features and a multi-resolution partition that represents administrative divisions, our framework reports a set C* of largest subdivisions that each have a common predictive model for population income. We formalize the problem of finding C* and propose an algorithm that finds C* correctly. We use both simulated datasets and a real-world dataset of household information on Thailand's population to demonstrate the framework's performance and application. The results show that our framework outperforms the baseline methods. Moreover, we demonstrate that the results of our method can be used to find indicators for income prediction in many areas of Thailand. By adjusting these indicator values through policy, we expect people in these areas to earn higher incomes. Policy makers can therefore use the indicators in our results as a guideline for policies that address low-income issues. The framework can also support policy making for dependent variables other than income, in order to combat poverty and other issues.
We provide the R package MRReg, an implementation of our framework in the R language. The package comes with documentation for anyone interested in analyzing linear regression on multi-resolution population data.
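
The core idea of the abstract, deciding whether several subdivisions can share one linear model, can be illustrated with a small sketch. This is not the paper's algorithm and not the MRReg API: it is a toy Python example that assumes a single feature, a BIC-style approximation of a two-part MDL code length, and hypothetical helper names (`fit_line`, `description_length`, `share_one_model`, `make_region`). The toy data uses a deterministic wiggle in place of random noise so that the outcome is reproducible.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, rss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss

def description_length(xs, ys, n_params=2):
    """Approximate code length in bits (assumption: BIC-style two-part code).

    Residual cost plus parameter cost. The value is relative (it can be
    negative); it is only meaningful when comparing models on the same data.
    """
    n = len(xs)
    _, _, rss = fit_line(xs, ys)
    rss = max(rss, 1e-12)  # guard against log of zero on a perfect fit
    data_bits = 0.5 * n * math.log2(rss / n)
    model_bits = 0.5 * n_params * math.log2(n)
    return data_bits + model_bits

def share_one_model(children):
    """children maps a region name to (xs, ys).

    Returns True when one pooled line describes all child regions at least
    as compactly as one separate line per child region.
    """
    pooled_x = [x for xs, _ in children.values() for x in xs]
    pooled_y = [y for _, ys in children.values() for y in ys]
    pooled = description_length(pooled_x, pooled_y)
    separate = sum(description_length(xs, ys) for xs, ys in children.values())
    return pooled <= separate

def make_region(intercept, slope, n_blocks):
    """Toy data: a line plus a fixed +,-,-,+ wiggle of size 0.5 (not random)."""
    pattern = [0.5, -0.5, -0.5, 0.5]
    xs = [float(i) for i in range(4 * n_blocks)]
    ys = [intercept + slope * x + pattern[i % 4] for i, x in enumerate(xs)]
    return xs, ys

# Two subdivisions that follow the same income model: merging is cheaper.
same = {"north": make_region(1.0, 2.0, 10), "south": make_region(1.0, 2.0, 15)}
# Two subdivisions with opposite slopes: keeping them separate is cheaper.
different = {"north": make_region(1.0, 2.0, 10), "south": make_region(1.0, -1.0, 15)}

print(share_one_model(same))       # True
print(share_one_model(different))  # False
```

Applied bottom-up over a hierarchy of administrative divisions, a test of this kind would merge child regions into their parent whenever the pooled model is the shorter description, which is one way to read "largest subdivisions that have a common predictive model".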

Updated: 2021-01-04