Discovering web services in social web service repositories using deep variational autoencoders

https://doi.org/10.1016/j.ipm.2020.102231

Highlights

  • We explore the use of Variational Autoencoders for syntactic Web Service discovery.

  • We evaluate our approach using a 17 113-service dataset, the largest used in the research community.

  • Our approach outperforms service engines based on traditional dimensionality reduction techniques (LSA, LDA).

  • Our approach outperforms service engines based on Word Embeddings.

  • Average query processing times and VAE training times confirm that our approach is viable in practice.

Abstract

Web Service registries have progressively evolved to social network-like software repositories. Users cooperate to produce an ever-growing, rich source of Web APIs upon which new value-added Web applications can be built. Such users often interact in order to follow, comment on, consume and compose services published by other users. In this context, Web Service discovery is a core functionality of modern registries as needed Web Services must be discovered before being consumed or composed. Many efforts to provide effective keyword-based service discovery mechanisms are based on Information Retrieval techniques as services are described using structured or unstructured text documents that specify the provided functionality. However, traditional techniques suffer from term-mismatch, which means that only the terms that are contained in both user queries and descriptions are exploited to perform service retrieval. Early feature learning techniques such as LSA or LDA tried to solve this problem by finding hidden or latent features in text documents. Recently, alternative feature learning based techniques such as Word Embeddings achieved state-of-the-art results for Web Service discovery. In this paper, we propose to learn features from service descriptions by using Variational Autoencoders, a special kind of autoencoder which restricts the encoded representation to model latent variables. Autoencoders in turn are deep neural networks used for unsupervised learning of efficient codings. We train our autoencoder using a real 17 113-service dataset extracted from the ProgrammableWeb.com API social repository. We measure discovery efficacy by using both Recall and Precision metrics, achieving significant gains compared to both Word Embeddings and classic latent features modelling techniques. Also, performance-oriented experiments show that the proposed approach can be readily exploited in practice.

Introduction

The SOC (Service Oriented Computing) paradigm has become essential for developing Web 2.0 applications. SOC promotes assembling Internet-accessible components, called services, to create new applications. Applications can be developed using existing services as basic software components, potentially decreasing the cost of developing new software due to increased code reuse. Web Services, the most common technological materialization of SOC, are common in the industry because they expose functionality and data that can be seamlessly accessed remotely. Furthermore, as social networks and Web Service-powered computing paradigms such as Cloud Computing became more popular, new applications, which combine Web Services from different sources –or mashups– emerged (Garriga, Mateos, Flores, Cechich, & Zunino, 2016).

Web Service descriptions enable consumers to utilize services without having to know how they are implemented because each description acts as an API documentation. These descriptions not only define service data-types and operations, but also support communication protocols, such as HTTP and SOAP, and data formats, such as JSON and XML. Web Service descriptions are produced by using markup-based Web Service description languages (Chinnici, Moreau, Ryman, & Weerawarana, 2007), which are built upon standard markup languages –mainly XML– and textual content, or semantic description languages (David, Mark, Drew, Sheila, Massimo, Katia, Deborah, Evren, Naveen, 2007, Roman, Keller, Lausen, de Bruijn, Lara, 2005) that exploit ontologies. Additionally, these descriptions can be SOAP-oriented or REST. SOAP-oriented Web Services are described through markup-based description languages such as WSDL or semantic description languages such as OWL or SAWSDL. In contrast, REST Web Services use newer, yet less widespread, descriptions such as WADL and OpenAPI Specification for markup-based descriptions or SA-REST (Gomadam, Ranabahu, & Sheth, Lathem, Gomadam, Sheth, 2007) for semantic descriptions.

Service providers create and publish Web Service descriptions to make their services available, after which they have to be discovered. Moreover, as producing semantic services requires annotating Web Service descriptions (i.e. data-types, operations, messages, and so on) with semantic concepts from ontologies, which in turn has been recognized as a rather difficult task (Corbellini, Godoy, Mateos, Zunino, & Lizarralde, 2017), researchers have concentrated on the so-called syntactics-based approaches for service discovery. In this way, earlier works have addressed discovery of markup-based services for SOAP-oriented (Crasso, Zunino, Campo, 2011, Wu, 2012) and REST services (Lizarralde, Rodriguez, Mateos, Zunino, 2017, Rodriguez, Zunino, Mateos, Segura, Rodriguez, 2015). Despite these advances, application developers still struggle to find relevant services (Maamar, Hacid, & Huhns, 2011), a problem that is nowadays even more prevalent in light of social Web Service repositories such as RapidAPI.com and ProgrammableWeb.com (Corbellini et al., 2017). These repositories encourage service clients and providers not only to reach out and cooperate on the task of single service refinement (e.g. bug fixes, enhancement suggestions), but also to relentlessly publish new value-added composed services (mashups). This makes the registry grow further and hence accurate service discovery becomes more challenging. For example, ProgrammableWeb.com grew from hundreds of services in 2005 to more than 17 000 in 2017. Fig. 1 illustrates this growth.

Most existing syntactics-based Web Service discovery approaches adapt traditional Information Retrieval (IR) techniques to match keyword-based queries against a stored database of markup-based Web Service descriptions (a.k.a. documents), which may contain such keywords (Crasso et al., 2011). When a user’s query contains multiple topic-specific keywords that are (partially) contained in the service descriptions, traditional Web Service registries are likely to return good matches. However, users often employ short natural language sentences, thereby reducing the potential usefulness and number of input keywords. This is problematic when retrieving service descriptions because only words in the query can be exploited for the search, leading to term mismatch. Term mismatch is in turn caused by the vocabulary problem (Furnas, Landauer, Gomez, & Dumais, 1987), which stems from polysemy (same word with different meanings, e.g. ’Java’), synonymy (different words with the same or similar meanings, e.g. ’tv’ and ’television’) and quasi-synonyms (words that are not synonyms per se but can be used as synonyms in particular contexts, e.g. ’diseases’ and ’disorders’).
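The effect of term mismatch can be illustrated with a minimal Python sketch. It is not part of the evaluated system: the tokenizer, the query and the service description below are invented for illustration, and matching is reduced to plain set intersection.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return set(text.lower().split())


def shared_terms(query, description):
    """Return the terms that appear in both the query and the description."""
    return tokenize(query) & tokenize(description)


# Hypothetical query and service description (not taken from the dataset).
query = "tv listings"
description = "Returns the television schedule for a given channel"

# 'tv' and 'television' are synonyms, but they never match literally,
# so a purely keyword-based engine finds nothing to score.
overlap = shared_terms(query, description)
print(overlap)
```

Although the description is clearly relevant to the query, the two share no literal term, which is exactly the situation that latent feature techniques try to address.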

An early attempt to reduce term mismatch was query expansion (Carpineto & Romano, 2012), which tries to solve this issue by finding features correlated with the query terms. Usually, query expansion is performed by using lexical databases, notably WordNet (Miller, 1995), which is a very common strategy to improve service discoverability (Carpineto, Romano, 2012, Vechtomova, Karamuftuoglu, 2007). However, query expansion does not take into consideration the service description side, which contains most of the service functionality declaration. To deal with this problem, the community started using dimensionality reduction techniques such as Latent Semantic Analysis (LSA) (Kontostathis & Pottenger, 2006) or Latent Dirichlet Allocation (LDA) (Blei, Ng, & Jordan, 2003) which aim to reduce the size of the corpus’ vocabulary, mainly by grouping related terms from service descriptions into concepts. These concepts can then help to group related services, thus improving the probability of retrieving relevant services.
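As a rough illustration of query expansion, the following Python sketch enlarges a query with synonyms before matching. The synonym table is a hand-written, hypothetical stand-in for a lexical database such as WordNet, and the query and description are invented; a real system would look terms up in the lexical database instead.

```python
# Hypothetical, hand-written synonym table standing in for a lexical
# database such as WordNet; the entries are illustrative only.
SYNONYMS = {
    "tv": {"television"},
    "listings": {"schedule", "guide"},
}


def expand_query(terms, synonyms=SYNONYMS):
    """Return the original terms plus every synonym listed for them."""
    expanded = set(terms)
    for term in terms:
        expanded |= synonyms.get(term, set())
    return expanded


def shared_terms(query_terms, description):
    """Terms of the (possibly expanded) query that occur in the description."""
    return set(query_terms) & set(description.lower().split())


description = "Returns the television schedule for a given channel"
query_terms = ["tv", "listings"]

before = shared_terms(query_terms, description)               # no literal match
after = shared_terms(expand_query(query_terms), description)  # synonyms match
```

Note that only the query is modified here; the description is matched as published, which is the limitation discussed above.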

LSA and LDA are indeed successful techniques, yet they have been outperformed by newer machine learning-based techniques for certain feature learning tasks. In particular, Autoencoders (Salakhutdinov & Hinton, 2009) are a special type of neural network that attempts to copy the input to its output. Rather than copying it directly, however, an autoencoder first reduces the input feature set to a smaller number of features and reconstructs the input from them. This allows the autoencoder to find hidden features of the data, similarly to LSA or LDA, and also represent more complex relationships due to the non-linear capacity of its internal network (Kingma & Welling, 2013).
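In symbols, and using notation that is ours rather than the paper's, an autoencoder can be sketched as an encoder f_φ that maps an input x ∈ R^n to a code z ∈ R^d with d < n, and a decoder g_θ that maps the code back to R^n. The squared reconstruction error below is one common training objective and is assumed here only for illustration. The second expression is the standard variational objective of Kingma & Welling (2013), shown as textbook background and not as the cost function used in this work.

```latex
% Plain autoencoder: encode, decode, minimise reconstruction error
z = f_\phi(x), \qquad \hat{x} = g_\theta(z), \qquad
\mathcal{L}_{\mathrm{AE}}(\theta, \phi)
  = \frac{1}{N} \sum_{i=1}^{N}
    \bigl\lVert x_i - g_\theta\bigl(f_\phi(x_i)\bigr) \bigr\rVert_2^2

% Standard VAE objective (negative evidence lower bound) for one input x,
% with approximate posterior q_\phi(z \mid x) and prior p(z)
\mathcal{L}_{\mathrm{VAE}}(\theta, \phi; x)
  = -\,\mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
    + D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\Vert\, p(z)\bigr)
```

The first term of the variational objective plays the role of the reconstruction error, while the Kullback-Leibler term keeps the encoded representation close to the prior over the latent variables.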

After the success that autoencoders had for image preprocessing (Kingma, Mohamed, Rezende, & Welling, 2014), the research community began to apply them in other domains such as document hashing (Salakhutdinov & Hinton, 2009), and more recently text classification (Xu, Sun, Deng, & Tan, 2017) and movie recommendation (Li & She, 2017). Motivated by these facts, in this paper we focus on studying autoencoders to represent text extracted from service descriptions. We specifically investigate how to improve Web Service discoverability by using a generative autoencoder called Variational Autoencoder (VAE) (Kingma & Welling, 2013) to create a latent space that represents the registry content, i.e. the set of service descriptions. Our approach transforms each user’s query to the latent space and performs cosine similarity over the latent vector space to find relevant Web Services, for which we propose to modify the autoencoder cost function. The main hypothesis behind our work is that this approach improves discoverability by reducing term mismatch, since autoencoders would be able to find complex, latent term relationships among service descriptions, and between queries and service descriptions.
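The retrieval step described above, ranking services by cosine similarity in the latent space, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the latent vectors are made-up three-dimensional values, the service names are hypothetical, and the trained encoder that would produce such vectors from a query or description is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_services(query_vector, service_vectors, top_k=10):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in service_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up latent vectors; in the approach these would come from the encoder.
latent_index = {
    "translation-api": [0.9, 0.1, 0.0],
    "movie-reviews-api": [0.1, 0.8, 0.3],
    "weather-api": [0.0, 0.2, 0.9],
}
query_latent = [0.8, 0.2, 0.1]  # hypothetical encoding of a user query

ranking = rank_services(query_latent, latent_index, top_k=2)
```

Because both queries and descriptions live in the same latent space, ranking does not depend on the two sharing literal terms.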

Methodologically, we trained the VAE component of our approach using a dataset of 17 113 service descriptions crawled from ProgrammableWeb.com. We preprocessed each service description to derive its bag of words, and each bag of words was then transformed into a vector using TF-IDF to train the VAE. Then, we used the trained model along with a subset of the main dataset (2772 services) and service queries to assess the performance of the approach. We compared the results of our approach in terms of common IR metrics, namely Precision, Recall, F-Measure and NDCG (Normalized Discounted Cumulative Gain). As baselines, we considered VSM (Vector Space Model) (Salton, Wong, & Yang, 1975), as it is the canonical IR model to retrieve documents, LSA, which is one of the first models that tried to estimate continuous representations of words (Lee, Lee, Hwang, Lee, 2007, Paliwal, Adam, Bornhövd, 2007, Platzer, Dustdar, 2005, Sajjanhar, Hou, Zhang, 2004), and Word Embeddings, which showed promising results in previous work (Lizarralde et al., 2017). The results show an improvement over these techniques of up to 14% in Precision, 12% in Recall, 25% in F-Measure and up to 10% in NDCG.
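For readers unfamiliar with these metrics, the following Python sketch computes them for a single query. It makes several assumptions that the text does not confirm: F-Measure is taken to be the balanced F1 score, NDCG uses binary relevance with a base-2 logarithmic discount, and the service identifiers and relevance judgments are invented.

```python
import math


def precision_recall_f1(retrieved, relevant):
    """Set-based Precision, Recall and balanced F-Measure (F1) for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(retrieved, relevant):
    """Binary-relevance NDCG over the ranked list `retrieved`."""
    relevant_set = set(relevant)
    gains = [1.0 if item in relevant_set else 0.0 for item in retrieved]
    # Ideal ranking: all relevant items first, truncated to the list length.
    ideal = dcg([1.0] * min(len(relevant_set), len(retrieved)))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical ranked result list and relevance judgments for one query.
retrieved = ["s1", "s4", "s2", "s7"]
relevant = ["s1", "s2", "s3"]

p, r, f = precision_recall_f1(retrieved, relevant)
score = ndcg(retrieved, relevant)
```

Precision, Recall and F-Measure ignore the order of the results, whereas NDCG rewards placing relevant services near the top of the ranking.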

The rest of the paper is organized as follows. The next Section explains the concepts that underpin the problem and our approach. Section 3 revisits prominent dimensionality reduction techniques used so far to address the same problem. Then, Section 4 outlines our proposed approach, and explains the added value of using VAE. Section 5 presents the above-mentioned experimental evaluation. Section 6 analyses previous works in service discovery, covering both syntactics-based and semantics-based approaches. Finally, Section 7 concludes the paper and outlines future research opportunities.

Section snippets

Background

In the context of this paper, discovering Web Services means finding those that fulfill client side application needs. For example, one might want to develop a new application to recommend movies and read or store user opinions on specific movies, while delegating the task of translating text between different languages to an external service such as Google Translate or Microsoft Translator. Therefore, the application developer has to search the service registry for relevant translation

Traditional dimensionality reduction techniques for web service discovery

Given the publishing process explained in the previous section, we now concentrate on the indexing step, which is where our contribution lies. To this end, we step into the main alternatives that have been explored to represent service descriptions as vectors. Particularly, we focus on the use of plain VSM and variants (Section 3.1) and Word Embeddings (Section 3.2). Indeed, VSM+LDA, VSM+LSA and Word Embeddings are regarded as competitors of our approach in our experiments (Section 5).
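To make the idea of representing service descriptions as vectors concrete, here is a minimal Python sketch of a TF-IDF weighted Vector Space Model. It is illustrative only: the three descriptions are invented, tokenization is a plain whitespace split, and the weighting uses relative term frequency with the unsmoothed idf = log(N / df), which is just one of several common variants and not necessarily the one used in the experiments.

```python
import math
from collections import Counter


def vectorize(tokens, vocabulary, idf):
    """TF-IDF vector of `tokens` over `vocabulary`; unknown tokens are ignored."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [(counts[term] / total) * idf[term] for term in vocabulary]


def build_tfidf(documents):
    """Return (vocabulary, idf, vectors) for a list of token lists."""
    vocabulary = sorted({term for doc in documents for term in doc})
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}
    vectors = [vectorize(doc, vocabulary, idf) for doc in documents]
    return vocabulary, idf, vectors


# Hypothetical service descriptions, already lowercased.
descriptions = [
    "translate text between languages",
    "store movie reviews and ratings",
    "translate speech to text",
]
docs = [d.split() for d in descriptions]

vocabulary, idf, vectors = build_tfidf(docs)

# A query is mapped into the same space with the same vocabulary and idf.
query_vector = vectorize("translate text".split(), vocabulary, idf)
```

Each description becomes a vector with one dimension per vocabulary term, so the dimensionality grows with the corpus vocabulary; this is the representation that dimensionality reduction techniques then compress.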

Variational autoencoders applied to web service discovery

Finding the underlying meaning of service descriptions or queries is key to improve discoverability. Accurately doing so without requiring client and service providers to exploit pure semantic-based approaches is in the agenda of various researchers. We will now discuss how we apply autoencoders to index Web Service descriptions, as an alternative to VSM-based approaches and Word Embeddings.

Autoencoders are neural networks that try to model the input as a new set of features in a lower

Validation

The proposed approach aims to improve syntactics-based Web Service discoverability by introducing VAE in the standard Web Service indexing and searching process. The rationale is that VAE enables the modelling of more compact and precise representations of service descriptions. As each description is modelled in a continuous vector space, by calculating the distance between vectors we can obtain the similarity between them. Our main experimental hypothesis, which is assessed in this Section, is

Related work

Web Services are essential building blocks of modern Web 2.0 applications, and nowadays Web Services are present in almost any Web, mobile, and even desktop application. Along with this, the number of available Web Services is also increasing heavily (Corbellini et al., 2017), making Web Service discovery essential to find services that effectively fulfill users’ needs when developing service-oriented client applications. Consequently, many researchers have focused on improving service

Conclusions

Motivated by the past successful application of autoencoders for image processing (Kingma, Mohamed, Rezende, Welling, 2014, Vincent, Larochelle, Bengio, Manzagol, 2008) and more recently text feature extraction (Chen & Zaki, 2017), we have proposed to exploit autoencoders for the task of Web Service retrieval. Autoencoders are neural networks that reduce the input dimensionality and then try to reconstruct the input from the new encoded representation. This allows autoencoders to extract

CRediT authorship contribution statement

Ignacio Lizarralde: Software, Investigation, Data curation, Writing - original draft. Cristian Mateos: Conceptualization, Methodology, Investigation, Writing - review & editing, Funding acquisition, Writing - original draft. Alejandro Zunino: Conceptualization, Supervision, Writing - review & editing, Funding acquisition. Tim A. Majchrzak: Writing - review & editing. Tor-Morten Grønli: Writing - review & editing.

Acknowledgements

We acknowledge funding by CONICET through grant code 11220170100490CO – Convocatoria PIP 2017-2019 GI.

References (57)

  • E. Agichtein et al.

    Learning user interaction models for predicting web search result preferences

    29th annual international ACM SIGIR conference on research and development in information retrieval

    (2006)
  • T. Amaral et al.

    Using different cost functions to train stacked auto-encoders

    12th mexican international conference on artificial intelligence

    (2013)
  • A.R. Baskara et al.

    Web service discovery using combined bi-term topic model and wdag similarity

    Information & communication technology and system (ICTS), 2017 11th international conference on

    (2017)
  • D.M. Blei et al.

    Latent dirichlet allocation

    Journal of machine Learning research

    (2003)
  • P. Bojanowski et al.

    Enriching word vectors with subword information

    Transactions of the Association for Computational Linguistics

    (2017)
  • A. Bukhari et al.

    A web service search engine for large-scale web service discovery based on the probabilistic topic modeling and clustering

    Service Oriented Computing and Applications

    (2018)
  • C. Carpineto et al.

    A survey of automatic query expansion in information retrieval

    ACM Computing Surveys

    (2012)
  • G.-B. Chen et al.

    Word co-occurrence augmented topic model in short text

    International Journal of Computational Linguistics & Chinese Language Processing, Volume 20, Number 2, December 2015-Special Issue on Selected Papers from ROCLING XXVII

    (2015)
  • L. Chen et al.

    Wordnet-powered web services discovery using kernel-based similarity matching mechanism

    Service oriented system engineering (SOSE), 2010 fifth IEEE international symposium on

    (2010)
  • Y. Chen et al.

    Kate: K-competitive autoencoder for text

    Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining

    (2017)
  • R. Chinnici et al.

    Web services description language (WSDL) version 2.0 part 1: Core language

    W3C recommendation

    (2007)
  • A. Corbellini et al.

    Mining social web service repositories for social relationships to aid service discovery

    Proceedings of the 2017 IEEE/ACM 4th international conference on mobile software engineering and systems

    (2017)
  • M. Crasso et al.

    Combining query-by-example and query expansion for simplifying web service discovery

    Information Systems Frontiers

    (2011)
  • A. Czyszczoń et al.

    Latent semantic indexing for web service retrieval

    Computational collective intelligence. technologies and applications

    (2014)
  • M. David et al.

    Bringing semantics to web services with OWL-s

    World Wide Web

    (2007)
  • A. De Renzis et al.

    A domain independent readability metric for web service descriptions

    Computer Standards & Interfaces

    (2017)
  • Y. Elshater et al.

    godiscovery: Web service discovery made efficient

    IEEE international conference on web services

    (2015)
  • G.W. Furnas et al.

    The vocabulary problem in human-system communication

    Communications of the ACM

    (1987)
  • M. Garriga et al.

    Restful service composition at a glance: A survey

    Journal of Network and Computer Applications

    (2016)
  • W.H. Gomaa et al.

    A survey of text similarity approaches

    International Journal of Computer Applications

    (2013)
  • Gomadam, K., Ranabahu, A., & Sheth, A. (2010). Sa-rest: Semantic annotation of web resources (w3c member...
  • D.P. Kingma et al.

    Semi-supervised learning with deep generative models

    Advances in neural information processing systems

    (2014)
  • Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. ArXiv preprint...
  • Kolouri, S., Martin, C. E., & Rohde, G. K. (2018). Sliced-wasserstein autoencoder: An embarrassingly simple generative...
  • A. Kontostathis et al.

    A framework for understanding latent semantic indexing (lsi) performance

    Information Processing & Management

    (2006)
  • B.T. Kumara et al.

    Web service clustering using a hybrid term-similarity measure with ontology learning

    International Journal of Web Services Research (IJWSR)

    (2014)
  • B.T. Kumara et al.

    Cluster-based web service recommendation

    Services computing (SCC), 2016 IEEE international conference on

    (2016)
  • M. Kusner et al.

    From word embeddings to document distances

    International conference on machine learning

    (2015)

Cited by (22)

    • A systematic literature review on web service clustering approaches to enhance service discovery, selection and recommendation

      2022, Computer Science Review
      Citation Excerpt :

      After the preprocessing steps, it is required to represent features in vector space to perform clustering. Generally feature representation techniques for web services can be categorized into three parts which are as follows [44–46]: Weighted word representation: In these types of methods, features are represented on the basis of their occurrences.

    • An intermediary utility-based service search and structure organization approach in service-oriented MAS

      2022, Knowledge-Based Systems
      Citation Excerpt :

      Distributed networks in real systems provide numerous web services (e.g., peer-to-peer networks) [4–8]. In these distributed systems, each node hosts a set of web services that can be accessed by other nodes via standard invocation protocols of web services [9,10]. The service-oriented systems are similar to multiagent systems where the nodes and agents are both autonomous and interconnected through networks [6].

    • A deep recommendation model of cross-grained sentiments of user reviews and ratings

      2022, Information Processing and Management
      Citation Excerpt :

      Research on recommendation systems with deep learning has brought new breakthroughs. Deep learning techniques used in recommendation systems include autoencoder (AE) (Lizarralde et al., 2020), convolutional neural networks (CNNs), recurrent neural networks (RNNs) (Hammou et al., 2020), and the restricted Boltzmann machine (RBM) (Liu et al., 2014). Kim et al. (2016) proposed convolutional matrix factorization (ConvMF), which used CNNs to produce deeper latent expressions from an item's description, taking into account local word order from the text to produce more accurate latent factors.

    • A Web service clustering method based on topic enhanced Gibbs sampling algorithm for the Dirichlet Multinomial Mixture model and service collaboration graph

      2022, Information Sciences
      Citation Excerpt :

      Das investigated a method of using the Gaussian LDA model to process text word embedding [17]. On this basis, Lizzaralde added service description to the Gaussian LDA model to obtain service description representation, and finally ranked services by the correlation between the user query and service description representation [18]. A Sen-LDA that learns topics of words, sentences and descriptions in service description was presented by Shi.

    • Enhancing web service clustering using Length Feature Weight Method for service description document vector space representation

      2020, Expert Systems with Applications
      Citation Excerpt :

      Due to enhanced vector space representation, the performance of web service clustering is also improved. In the future, the proposed method can be enhanced by using word embedding techniques and other methods to find the semantic relations among the features (Lizarralde, Mateos, Zunino, Majchrzak, & Grønli, 2020). The proposed method can be exploited in clustering the web pages, twitter text, etc.

    • An Enhancing Recovery Links between Two Artifacts Using Variational Autoencoder

      2024, International Journal of Intelligent Engineering and Systems