Understanding a bag of words by conceptual labeling with prior weights
World Wide Web (IF 2.7), Pub Date: 2020-04-14, DOI: 10.1007/s11280-020-00806-x
Haiyun Jiang, Deqing Yang, Yanghua Xiao, Wei Wang

In many natural language processing tasks, e.g., text classification or information extraction, the weighted bag-of-words model is widely used to represent the semantics of text, where the importance of each word is quantified by its weight. However, it is still difficult for machines to understand a weighted bag of words (WBoW) without explicit explanations, which seriously limits its application in downstream tasks. To help a machine better understand a WBoW, we introduce the task of conceptual labeling, which aims to generate the minimum number of concepts as labels that explicitly represent and explain the semantics of a WBoW. Specifically, we first propose three principles for label generation and then model each principle as an objective function. To satisfy the three principles simultaneously, we solve a multi-objective optimization problem. In our framework, a taxonomy (i.e., Microsoft Concept Graph) provides high-quality candidate concepts, and a corresponding search algorithm derives the optimal solution (i.e., a small set of proper concepts as labels). We also propose two pruning strategies that reduce the search space and improve performance. Our experiments show that the proposed method is capable of generating proper labels for WBoWs. We also apply the generated labels to the task of text classification and observe an increase in performance, which further supports the effectiveness of our conceptual labeling framework.
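
To make the idea of conceptual labeling concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not name the three principles, so the sketch assumes three illustrative ones (cover as much word weight as possible, use few labels, prefer specific concepts) and folds them into a single weighted score. The taxonomy is a hypothetical toy dictionary standing in for Microsoft Concept Graph, and the names `TAXONOMY`, `score`, `label_wbow`, and both penalty parameters are invented for illustration. The search is exhaustive over small label sets, with one simple pruning step that discards concepts covering no word in the bag.

```python
from itertools import combinations

# Hypothetical toy taxonomy (concept -> instance words); a stand-in for a
# real isA taxonomy such as Microsoft Concept Graph.
TAXONOMY = {
    "fruit": {"apple", "banana", "pear"},
    "company": {"apple", "microsoft", "google"},
    "food": {"apple", "banana", "pear", "bread"},
    "thing": {"apple", "banana", "pear", "bread", "microsoft", "google"},
}


def score(labels, wbow, size_penalty=0.1, generality_penalty=0.02):
    """Combine three assumed objectives into one number (higher is better)."""
    covered = set().union(*(TAXONOMY[c] for c in labels))
    # Assumed objective 1: total weight of the words explained by the labels.
    coverage = sum(w for word, w in wbow.items() if word in covered)
    # Assumed objective 3: broad concepts (many instances) are less specific.
    generality = sum(len(TAXONOMY[c]) for c in labels)
    # Assumed objective 2: each extra label costs size_penalty.
    return coverage - size_penalty * len(labels) - generality_penalty * generality


def label_wbow(wbow, max_labels=2):
    """Exhaustively search label sets of size 1..max_labels for the best score."""
    words = set(wbow)
    # Pruning: drop concepts that cover none of the words in the bag.
    candidates = [c for c, instances in TAXONOMY.items() if instances & words]
    best, best_score = [], float("-inf")
    for k in range(1, max_labels + 1):
        for labels in combinations(candidates, k):
            s = score(labels, wbow)
            if s > best_score:
                best, best_score = list(labels), s
    return best


fruit_bag = {"apple": 0.5, "banana": 0.3, "pear": 0.2}
tech_bag = {"apple": 0.4, "microsoft": 0.3, "google": 0.3}
print(label_wbow(fruit_bag))
print(label_wbow(tech_bag))
```

With these toy weights, the ambiguous word "apple" is labeled by whichever concept also explains the rest of the bag. A real system would replace the single weighted sum with a proper multi-objective formulation and the exhaustive loop with a pruned search over a much larger taxonomy.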

Updated: 2020-04-14