Adapting CRISP-DM for Idea Mining: A Data Mining Process for Generating Ideas Using a Textual Dataset
arXiv - CS - Information Retrieval Pub Date : 2021-05-02 , DOI: arxiv-2105.00574
W. Y. Ayele

Data mining project managers can benefit from using standard data mining process models. Standard process models, such as the de facto and most popular Cross-Industry Standard Process for Data Mining (CRISP-DM), reduce cost and time. They also facilitate knowledge transfer and the reuse of best practices, and they minimize knowledge requirements. At the same time, digital innovation is increasingly needed to unlock the potential of ever-growing textual data such as publications, patents, social media data, and documents of various forms. Furthermore, the introduction of cutting-edge machine learning tools and techniques enables the elicitation of ideas. The processing of unstructured textual data to generate new and useful ideas is referred to as idea mining. The existing literature on idea mining largely overlooks the use of standard data mining process models. Therefore, the purpose of this paper is to propose a reusable model for generating ideas, CRISP-DM for Idea Mining (CRISP-IM). The CRISP-IM was designed and developed following the design science approach. It facilitates idea generation through the use of Dynamic Topic Modeling (DTM), unsupervised machine learning, and subsequent statistical analysis on a dataset of scholarly articles. The adapted CRISP-IM can be used to guide the process of identifying trends in scholarly literature datasets, temporally organized patent datasets, or any other textual dataset from any domain in order to elicit ideas. The ex-post evaluation of the CRISP-IM is left for future study.

Updated: 2021-05-04