Enhancing web service clustering using Length Feature Weight Method for service description document vector space representation, Expert Systems with Applications


Expert Systems with Applications (IF 7.5), Pub Date: 2020-07-04, DOI: 10.1016/j.eswa.2020.113682
Neha Agarwal, Geeta Sikka, Lalit Kumar Awasthi

Due to the rapid growth of web services in repositories, discovering the required web service is becoming an increasingly cumbersome task. This has raised the demand for efficient web service clustering algorithms. When related web services are stored in clusters in a service repository, the discovery process improves because the search space and search time are reduced. Many researchers in this field have used the Term Frequency – Inverse Document Frequency (TF-IDF) method to represent web services in vector space. In general, the TF-IDF approach has several limitations: (1) it is not efficient for large documents; (2) it ignores the position of a term and its co-occurrences; (3) it cannot analyze how terms are dispersed across different documents. In the web service scenario, services are represented in short text form. TF-IDF does not work well for web service representation because it cannot effectively determine the importance of a term with respect to its occurrence in other documents. If we compare two service documents, ‘s1’ with a large number of terms and ‘s2’ with a small number of terms, TF-IDF does not show the terms in ‘s1’ to be less important than those in ‘s2’. Therefore, it is not possible to assign effective weights to the terms. Without an effective vector space representation, the performance of the clustering algorithm also degrades. In this paper, we propose a new approach, LFW+K, which uses Length Feature Weight (LFW) for the vectorized representation of services, followed by K-Means clustering. The proposed approach helps to find the informative terms in a web service and assigns term weights accordingly by considering parameters such as the dimension of the web service document, the maximum frequency of a term in the document, and the occurrences of a term in other documents.
LFW+K is applied to datasets of real-world web services, and its performance is measured using standard criteria (precision, recall, F1-score, and accuracy). The results are compared with those of K-Means clustering on the TF-IDF representation, i.e. TF-IDF+K. They show that the proposed method outperforms clustering that uses TF-IDF for the vector space representation of web services.
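
To make the TF-IDF+K baseline mentioned in the abstract concrete, the following is a minimal sketch of TF-IDF weighting followed by K-Means clustering. It is not the authors' implementation. The whitespace tokenization, the exact TF and IDF formulas (length-normalized count and log(N / df)), the fixed initial centroids, and the four toy service descriptions are all assumptions made for illustration. The LFW weighting itself is not shown because the abstract does not give its formula.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with tf = count / doc length, idf = log(N / df)."""
    tokenized = [d.lower().split() for d in docs]  # assumed: whitespace tokenization
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append(
            [(counts[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, iterations=10):
    """Plain Lloyd's K-Means; centroids start at the vectors named by init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical short service descriptions (not from the paper's datasets).
services = [
    "get weather forecast city",
    "weather forecast temperature city",
    "convert currency exchange rate",
    "currency exchange rate lookup",
]
vocab, vectors = tfidf_vectors(services)
labels = kmeans(vectors, init_indices=[0, 2])
print(labels)
```

In this sketch the two weather services and the two currency services fall into separate clusters. Swapping `tfidf_vectors` for a different weighting function while keeping `kmeans` unchanged mirrors the comparison the abstract describes between TF-IDF+K and LFW+K.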


Updated: 2020-07-04
