Succinct contrast sets via false positive controlling with an application in clinical process redesign
Expert Systems with Applications (IF 8.5), Pub Date: 2020-06-29, DOI: 10.1016/j.eswa.2020.113670
Dang Nguyen , Wei Luo , Bay Vo , Witold Pedrycz

Many applications of intelligent systems involve understanding a group with a contrastively different outcome (e.g., all survivors of a deadly cancer, or a top-performing team in a large corporation). The intelligent system needs to identify the attributes (features) that best describe or explain the group versus its alternatives. In data mining, this problem is studied under the framework of contrast set mining (CSM). Although CSM is not new, the era of big data has produced new computational and statistical challenges. In particular, existing algorithms fail (1) to run efficiently on large-scale datasets and (2) to accommodate simultaneous inference on an overwhelming array of features, which are often repetitive and collinear. In this paper, we develop a CSM algorithm that addresses both challenges. The computational challenge is addressed with a tree structure and two theorems, while the statistical challenge is addressed by controlling the false discovery rate across the multiple tests. Comprehensive experiments demonstrate the computational and statistical advantages of the proposed algorithm over three state-of-the-art algorithms. In addition, we show the effectiveness of the proposed method in an intelligent-system application involving hospital process redesign. The proposed method not only improves the performance of machine learning systems but also generates succinct and insightful patterns directly relevant to clinical decision-making.
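The abstract says the statistical challenge is handled by controlling the false discovery rate over many simultaneous tests, but it does not name the test statistic or the FDR procedure. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm: each candidate contrast set (a conjunction of `attribute=value` items) is tested with a Pearson chi-square test on a 2x2 table (group vs. rest, contains vs. lacks the itemset), in the style of classic contrast set mining, and the resulting p-values are filtered with the Benjamini–Hochberg step-up procedure. Brute-force enumeration stands in for the paper's tree structure and pruning theorems, which are not described here. All function names and parameters (`contrast_sets`, `max_len`, `min_diff`, `q`) are hypothetical.

```python
import math
from itertools import combinations


def chi2_pvalue_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for the 2x2
    table [[a, b], [c, d]]; returns the upper-tail p-value."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def benjamini_hochberg(pvalues, q=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up
    procedure, which controls the false discovery rate at level q
    (assuming independent or positively dependent tests)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank  # step-up: keep the largest rank meeting its threshold
    return set(order[:k])


def contrast_sets(group, rest, max_len=2, min_diff=0.1, q=0.05):
    """Hypothetical brute-force contrast set miner.

    group, rest: non-empty lists of frozensets of "attribute=value" items.
    Tests every itemset of up to max_len items, applies BH at level q over
    all tests, and keeps rejections whose absolute support difference is at
    least min_diff. Returns (itemset, support_difference) pairs.
    """
    items = sorted(set().union(*group, *rest))
    candidates, pvals = [], []
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            s = frozenset(combo)
            a = sum(s <= r for r in group)  # group records containing s
            c = sum(s <= r for r in rest)   # rest records containing s
            if a + c == 0:
                continue  # itemset never occurs; nothing to test
            diff = a / len(group) - c / len(rest)
            candidates.append((s, diff))
            pvals.append(chi2_pvalue_2x2(a, len(group) - a, c, len(rest) - c))
    rejected = benjamini_hochberg(pvals, q)
    return [candidates[i] for i in sorted(rejected)
            if abs(candidates[i][1]) >= min_diff]


if __name__ == "__main__":
    # Toy data: smoking status differs sharply between groups; age does not.
    group = ([frozenset({"smoker=yes", "age=old"})] * 12
             + [frozenset({"smoker=yes", "age=young"})] * 12
             + [frozenset({"smoker=no", "age=old"})] * 3
             + [frozenset({"smoker=no", "age=young"})] * 3)
    rest = ([frozenset({"smoker=yes", "age=old"})] * 3
            + [frozenset({"smoker=yes", "age=young"})] * 3
            + [frozenset({"smoker=no", "age=old"})] * 12
            + [frozenset({"smoker=no", "age=young"})] * 12)
    for itemset, diff in contrast_sets(group, rest, max_len=1):
        print(sorted(itemset), round(diff, 2))
```

On this toy data, with `max_len=1`, only the two smoking items survive (support differences of about -0.6 and +0.6), while the age items are not rejected. Applying BH over every tested itemset, rather than a per-test cutoff, is what keeps the expected share of spurious contrast sets near `q` as the number of candidate itemsets grows.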



Updated: 2020-06-29