Feature Selection on Lyme Disease Patient Survey Data, arXiv - CS - Computers and Society


Pub Date: 2020-08-24, DOI: arxiv-2009.09087
Joshua Vendrow, Jamie Haddock, Deanna Needell, and Lorraine Johnson

Lyme disease is a rapidly growing illness that remains poorly understood within the medical community. Critical questions about when and why patients respond to treatment or stay ill, what kinds of treatments are effective, and even how to properly diagnose the disease remain largely unanswered. We investigate these questions by applying machine learning techniques to a large-scale Lyme disease patient registry, MyLymeData, developed by the nonprofit LymeDisease.org. We apply various machine learning methods to measure the effect of individual features in predicting participants' answers to the Global Rating of Change (GROC) survey questions, which assess the self-reported degree to which their condition improved, worsened, or remained unchanged following antibiotic treatment. We use basic linear regression, support vector machines, neural networks, entropy-based decision tree models, and $k$-nearest neighbors approaches. We first analyze the general performance of the models and then identify the most important features for predicting participant answers to GROC. After we identify the "key" features, we separate them from the dataset and demonstrate the effectiveness of these features at predicting GROC responses. In doing so, we highlight possible directions for future study both mathematically and clinically.
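
The workflow described in the abstract (rank features by how well they predict an outcome, then check how well the top-ranked features predict on their own) can be illustrated with a small sketch. This is not the authors' code and does not use MyLymeData: the survey rows are synthetic, the feature names (`symptom_severity`, `treatment_duration`, `noise_question`) and the labeling rule are hypothetical, and the ranking uses information gain only, which is the splitting criterion behind entropy-based decision trees. The other models named in the abstract are not shown.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Drop in label entropy after splitting on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Feature names sorted from most to least informative."""
    return sorted(rows[0].keys(),
                  key=lambda f: information_gain(rows, labels, f),
                  reverse=True)


def lookup_accuracy(train, test, features):
    """Held-out accuracy of a majority-vote lookup table on the chosen features."""
    (train_rows, train_labels), (test_rows, test_labels) = train, test
    votes = {}
    for row, label in zip(train_rows, train_labels):
        key = tuple(row[f] for f in features)
        votes.setdefault(key, Counter())[label] += 1
    default = Counter(train_labels).most_common(1)[0][0]
    correct = 0
    for row, label in zip(test_rows, test_labels):
        key = tuple(row[f] for f in features)
        predicted = votes[key].most_common(1)[0][0] if key in votes else default
        correct += predicted == label
    return correct / len(test_labels)


def make_synthetic_survey(n=500, seed=0):
    """Synthetic stand-in for survey data; the labeling rule is an assumption."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        row = {
            "symptom_severity": rng.choice(["mild", "moderate", "severe"]),
            "treatment_duration": rng.choice(["short", "long"]),
            "noise_question": rng.choice(["a", "b", "c"]),
        }
        p_improved = 0.2
        if row["treatment_duration"] == "long":
            p_improved += 0.5
        if row["symptom_severity"] == "mild":
            p_improved += 0.2
        rows.append(row)
        labels.append("improved" if rng.random() < p_improved else "not_improved")
    return rows, labels


rows, labels = make_synthetic_survey()
ranking = rank_features(rows, labels)

# Train on the first half, evaluate on the second half.
half = len(rows) // 2
train = (rows[:half], labels[:half])
test = (rows[half:], labels[half:])
top_feature_accuracy = lookup_accuracy(train, test, ranking[:1])
all_feature_accuracy = lookup_accuracy(train, test, ranking)

print("ranking:", ranking)
print("accuracy, top feature only:", round(top_feature_accuracy, 3))
print("accuracy, all features:", round(all_feature_accuracy, 3))
```

Comparing the two accuracies gives a rough sense of how much predictive signal the top-ranked feature carries on its own, which is the kind of check the abstract describes for the "key" features. On real survey data, a held-out split or cross-validation with one of the models named in the abstract would be the more appropriate evaluation.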


Updated: 2020-09-22
