Enhancing data quality to mine credible patterns, Journal of Information Science

Enhancing data quality to mine credible patterns
Journal of Information Science (IF 2.4), Pub Date: 2021-06-06, DOI: 10.1177/01655515211013693
Muhammad Imran, Adnan Ahmad

The importance of big data is widely accepted across many fields. Organisations spend considerable money to collect, process and mine data to identify patterns. These patterns inform future decision-making aimed at improving organisational performance and profitability. However, some discovered patterns are meaningless or misleading, which limits the effectiveness of the decision-making process. Data discrepancies, noise and outliers also degrade the quality of discovered patterns and can lead to missed strategic goals and objectives. Inspecting the quality of discovered patterns is therefore vital before using them for prediction, decision-making or strategic planning. Mining useful and credible patterns from social media is particularly challenging, because people often spread targeted content for character assassination or to defame brands. Recent studies have evaluated the credibility of information on social media using user surveys, expert judgement and manually annotated tweets to predict credibility. Unfortunately, given the large volume and exponential growth of data, these survey- and annotation-based credibility techniques cannot be applied efficiently. This article presents a data quality and credibility evaluation framework that determines the quality of individual data instances. The framework uses credibility indicators to discover useful and credible patterns. Moreover, a new Twitter bot detection algorithm is proposed to classify tweets as generated either by Twitter bots or by real users. The results of the conducted experiments showed that the proposed model improves both classification accuracy and the quality of discovered patterns.

Updated: 2021-06-07
