Intelligent User Assistance for Automated Data Mining Method Selection
Business & Information Systems Engineering (IF 7.9), Pub Date: 2020-03-18, DOI: 10.1007/s12599-020-00642-3
Patrick Zschech, Richard Horn, Daniel Höschele, Christian Janiesch, Kai Heinrich

In any data science and analytics project, the task of mapping a domain-specific problem to an adequate set of data mining methods by experts of the field is a crucial step. However, these experts are not always available and data mining novices may be required to perform the task. While there are several research efforts for automated method selection as a means of support, only a few approaches consider the particularities of problems expressed in the natural and domain-specific language of the novice. The study proposes the design of an intelligent assistance system that takes problem descriptions articulated in natural language as an input and offers advice regarding the most suitable class of data mining methods. Following a design science research approach, the paper (i) outlines the problem setting with an exemplary scenario from industrial practice, (ii) derives design requirements, (iii) develops design principles and proposes design features, (iv) develops and implements the IT artifact using several methods such as embeddings, keyword extractions, topic models, and text classifiers, (v) demonstrates and evaluates the implemented prototype based on different classification pipelines, and (vi) discusses the results’ practical and theoretical contributions. The best performing classification pipelines show high accuracies when applied to validation data and are capable of creating a suitable mapping that exceeds the performance of joint novice assessments and simpler means of text mining. The research provides a promising foundation for further enhancements, either as a stand-alone intelligent assistance system or as an add-on to already existing data science and analytics platforms.
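
To make the core idea concrete (a free-text problem description goes in, a suggested class of data mining methods comes out), the following is a minimal sketch. It is not the authors' pipeline: instead of embeddings, topic models, or trained text classifiers, it uses a plain bag-of-words nearest-centroid match with cosine similarity. The training sentences, the three class labels, and all function names are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: problem descriptions labeled with a method class.
# These examples are invented for illustration and do not come from the paper.
TRAINING = [
    ("predict which machines will fail next month", "classification"),
    ("decide whether a customer will cancel the contract", "classification"),
    ("estimate the energy consumption of the plant in kWh", "regression"),
    ("forecast the sales volume for the next quarter", "regression"),
    ("group customers with similar purchasing behavior", "clustering"),
    ("find segments of similar sensor profiles", "clustering"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_centroids(examples):
    """Sum the word counts of all descriptions that share a label."""
    centroids = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(tokenize(text))
    return centroids


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_method_class(description, centroids):
    """Return the label whose centroid is most similar to the description."""
    query = Counter(tokenize(description))
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


centroids = build_centroids(TRAINING)
print(suggest_method_class("group machines with similar failure behavior", centroids))
```

A word-overlap baseline like this is closer to the "simpler means of text mining" that the abstract reports being outperformed; it only illustrates the input and output of the mapping task, not the quality of the evaluated pipelines.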

Updated: 2020-03-18