Discovering web services in social web service repositories using deep variational autoencoders
Introduction
The SOC (Service Oriented Computing) paradigm has become essential for developing Web 2.0 applications. SOC promotes assembling Internet-accessible components, called services, to create new applications. Applications can be developed using existing services as basic software components, potentially decreasing the cost of developing new software due to increased code reuse. Web Services, the most common technological materialization of SOC, are widely adopted in industry because they expose functionality and data that can be seamlessly accessed remotely. Furthermore, as social networks and Web Service-powered computing paradigms such as Cloud Computing became more popular, new applications that combine Web Services from different sources, known as mashups, emerged (Garriga, Mateos, Flores, Cechich, & Zunino, 2016).
Web Service descriptions enable consumers to use services without knowing how they are implemented, because each description acts as API documentation. These descriptions not only define service data-types and operations, but also specify the supported communication protocols, such as HTTP and SOAP, and data formats, such as JSON and XML. Web Service descriptions are produced using either markup-based Web Service description languages (Chinnici, Moreau, Ryman, & Weerawarana, 2007), which are built upon standard markup languages (mainly XML) and textual content, or semantic description languages (Martin, Burstein, McDermott, McIlraith, Paolucci, Sycara, McGuinness, Sirin, Srinivasan, 2007, Roman, Keller, Lausen, de Bruijn, Lara, 2005) that exploit ontologies. Additionally, these descriptions can be SOAP-oriented or RESTful. SOAP-oriented Web Services are described through markup-based description languages such as WSDL or semantic description languages such as OWL-S or SAWSDL. In contrast, RESTful Web Services use newer, yet less widespread, descriptions such as WADL and the OpenAPI Specification for markup-based descriptions, or SA-REST (Gomadam, Ranabahu, & Sheth; Lathem, Gomadam, & Sheth, 2007) for semantic descriptions.
Service providers create and publish Web Service descriptions to make their services available, after which consumers must discover them. Moreover, producing semantic services requires annotating Web Service descriptions (i.e. data-types, operations, messages, and so on) with semantic concepts from ontologies, which has been recognized as a rather difficult task (Corbellini, Godoy, Mateos, Zunino, & Lizarralde, 2017); hence, researchers have concentrated on the so-called syntactics-based approaches to service discovery. Accordingly, earlier works have addressed the discovery of markup-based SOAP-oriented services (Crasso, Zunino, Campo, 2011, Wu, 2012) and RESTful services (Lizarralde, Rodriguez, Mateos, Zunino, 2017, Rodriguez, Zunino, Mateos, Segura, Rodriguez, 2015). Despite these advances, application developers still struggle to find relevant services (Maamar, Hacid, & Huhns, 2011), a problem that is even more prevalent nowadays in social Web Service repositories such as RapidAPI.com and ProgrammableWeb.com (Corbellini et al., 2017). These repositories encourage service clients and providers not only to reach out and cooperate in refining individual services (e.g. bug fixes, enhancement suggestions), but also to relentlessly publish new value-added composed services (mashups). As a result, the registry keeps growing and accurate service discovery becomes more challenging. For example, ProgrammableWeb.com grew from hundreds of services in 2005 to more than 17 000 in 2017. Fig. 1 illustrates this growth.
Most existing syntactics-based Web Service discovery approaches adapt traditional Information Retrieval (IR) techniques to match keyword-based queries against a stored database of markup-based Web Service descriptions (a.k.a. documents), which may contain such keywords (Crasso et al., 2011). When a user’s query contains multiple topic-specific keywords that also appear (at least partially) in the service descriptions, traditional Web Service registries are likely to return good matches. However, users often write short natural language sentences, which reduces both the number and the usefulness of the input keywords. This is problematic when retrieving service descriptions because only the words in the query can be exploited for the search, leading to term mismatch. Term mismatch, in turn, is caused by the vocabulary problem (Furnas, Landauer, Gomez, & Dumais, 1987), which stems from polysemy (the same word with different meanings, e.g. ‘Java’), synonymy (different words with the same or similar meanings, e.g. ‘tv’ and ‘television’) and quasi-synonyms (words that are not synonyms per se but can be used as synonyms in particular contexts, e.g. ‘diseases’ and ‘disorders’).
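To make term mismatch concrete, the following minimal Python sketch scores a query against TF-IDF vectors with cosine similarity, the classic VSM setup. The three descriptions, the whitespace tokenizer, and the smoothed IDF formula (log(N/df) + 1) are illustrative assumptions, not the preprocessing used in this paper.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; real pipelines also drop stop words, punctuation, and stem.
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # assumed smoothed IDF variant
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors, idf

def vectorize_query(query, idf):
    tf = Counter(tokenize(query))
    # Query terms unseen in the indexed descriptions get no weight: this is where term mismatch bites.
    return {t: tf[t] * idf[t] for t in tf if t in idf}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical service descriptions
descriptions = [
    "television listings api for channels and schedules",
    "translate text between languages",
    "weather forecast api by city",
]
doc_vecs, idf = tfidf_vectors(descriptions)
scores = [cosine(vectorize_query("tv guide", idf), d) for d in doc_vecs]
```

The query "tv guide" scores 0.0 against every description, including the television one, because neither query term occurs in the indexed vocabulary; the query "television schedules" matches only the first description.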
An early attempt to reduce term mismatch was query expansion (Carpineto & Romano, 2012), which tackles the issue by adding terms correlated with the original query terms. Query expansion is usually performed with lexical databases, notably WordNet (Miller, 1995), a very common strategy to improve service discoverability (Carpineto, Romano, 2012, Vechtomova, Karamuftuoglu, 2007). However, query expansion ignores the service description side, which holds most of the declared service functionality. To deal with this problem, the community started using dimensionality reduction techniques such as Latent Semantic Analysis (LSA) (Kontostathis & Pottenger, 2006) or Latent Dirichlet Allocation (LDA) (Blei, Ng, & Jordan, 2003), which aim to reduce the size of the corpus’ vocabulary, mainly by grouping related terms from service descriptions into concepts. These concepts can then help to group related services, thus improving the probability of retrieving relevant services.
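The idea behind LSA can be sketched with a truncated singular value decomposition of a term-document matrix. In this toy example (the terms, the documents, and k = 2 are assumptions chosen for illustration), ‘tv’ and ‘television’ never appear in the same description, but both co-occur with ‘channels’. The query is folded into the latent space with the usual LSA formula q_hat = inverse(Sigma_k) U_k^T q.

```python
import numpy as np

TERMS = ["tv", "television", "channels", "schedules", "translate", "languages"]
DOCS = [
    "television channels schedules",
    "tv channels",
    "translate languages",
]

def term_doc_matrix(texts, terms):
    """Raw term counts, one row per term and one column per text (unknown tokens raise KeyError)."""
    index = {t: i for i, t in enumerate(terms)}
    A = np.zeros((len(terms), len(texts)))
    for j, text in enumerate(texts):
        for tok in text.split():
            A[index[tok], j] += 1.0
    return A

def cosine(u, v):
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu > 0 and nv > 0 else 0.0

A = term_doc_matrix(DOCS, TERMS)
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # singular values come sorted in descending order
U_k, s_k = U[:, :k], s[:k]
doc_latent = Vt[:k, :].T                           # one k-dimensional row per document

def fold_in(query):
    """Project a query into the latent space: q_hat = inverse(Sigma_k) @ U_k.T @ q."""
    q = term_doc_matrix([query], TERMS)[:, 0]
    return (U_k.T @ q) / s_k

q_raw = term_doc_matrix(["tv"], TERMS)[:, 0]
raw_scores = [cosine(q_raw, A[:, j]) for j in range(len(DOCS))]
lsa_scores = [cosine(fold_in("tv"), d) for d in doc_latent]
```

In raw term space the query ‘tv’ has zero cosine similarity with the ‘television’ description, whereas in the two-dimensional latent space both media descriptions score close to 1 and the translation description scores close to 0.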
LSA and LDA are indeed successful techniques, yet they have been outperformed by newer machine learning-based techniques in certain feature learning tasks. In particular, autoencoders (Salakhutdinov & Hinton, 2009) are a special type of neural network that attempts to copy its input to its output. Rather than merely doing so, however, autoencoders reduce the input feature set to a smaller number of features from which the input is then reconstructed. This allows an autoencoder to find hidden features of the data, similarly to LSA or LDA, while also representing more complex relationships thanks to the non-linear capacity of its internal network (Kingma & Welling, 2013).
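As an illustration of the encode-then-reconstruct idea (a plain, deterministic autoencoder rather than the VAE used later), the following NumPy sketch trains a single tanh hidden layer by full-batch gradient descent on the mean squared reconstruction error. The layer sizes, learning rate, epoch count, and toy bag-of-words matrix are assumptions; practical implementations would rely on an automatic differentiation framework.

```python
import numpy as np

def train_autoencoder(X, code_size, epochs=3000, lr=0.2, seed=0):
    """Fit x -> tanh(x W1 + b1) -> h W2 + b2 by gradient descent on mean squared error."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (d, code_size))
    b1 = np.zeros(code_size)
    W2 = rng.normal(0.0, 0.1, (code_size, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)   # encoder: d input features -> code_size hidden features
        X_hat = H @ W2 + b2        # decoder: reconstruct the input from the code
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through decoder and encoder
        g_out = 2.0 * err / err.size
        gW2 = H.T @ g_out
        gb2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1.0 - H ** 2)  # derivative of tanh
        gW1 = X.T @ g_h
        gb1 = g_h.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2

    def encode(Z):
        return np.tanh(Z @ W1 + b1)

    return encode, losses

# Hypothetical binary bag-of-words matrix: 4 descriptions x 6 terms
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
], dtype=float)
encode, losses = train_autoencoder(X, code_size=2)
codes = encode(X)
```

After training, `encode` maps each six-term bag of words to a two-dimensional code; the tanh non-linearity is what distinguishes it from a purely linear projection such as the truncated SVD of LSA.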
After the success of autoencoders in image processing (Kingma, Mohamed, Rezende, & Welling, 2014), the research community began applying them in other domains such as document hashing (Salakhutdinov & Hinton, 2009), and more recently text classification (Xu, Sun, Deng, & Tan, 2017) and movie recommendation (Li & She, 2017). Motivated by these facts, in this paper we study autoencoders for representing text extracted from service descriptions. Specifically, we investigate how to improve Web Service discoverability by using a generative autoencoder called the Variational Autoencoder (VAE) (Kingma & Welling, 2013) to create a latent space that represents the registry content, i.e. the set of service descriptions. Our approach maps each user query into the latent space and ranks Web Services by cosine similarity in that space; to this end, we propose a modification to the autoencoder cost function. The main hypothesis behind our work is that this approach improves discoverability by reducing term mismatch, since autoencoders should be able to find complex, latent term relationships both among service descriptions and between queries and service descriptions.
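The following sketch shows the generic building blocks of a VAE and how a query could be ranked against service descriptions in its latent space. It is a hedged illustration only: it uses the standard objective of Kingma and Welling (2013), namely a reconstruction term (squared error here, an assumption) plus the closed-form KL divergence to a standard normal prior, not the modified cost function proposed in this paper, whose form is not given in this section. The weights are random and untrained, so the ranking only demonstrates the data flow; all function names and layer sizes are hypothetical.

```python
import numpy as np

def init_params(vocab_size, hidden, latent, seed=0):
    """Randomly initialised encoder/decoder weights (hypothetical sizes; training is omitted)."""
    rng = np.random.default_rng(seed)

    def w(rows, cols):
        return rng.normal(0.0, 0.1, (rows, cols))

    return {
        "W_h": w(vocab_size, hidden), "b_h": np.zeros(hidden),
        "W_mu": w(hidden, latent), "b_mu": np.zeros(latent),
        "W_lv": w(hidden, latent), "b_lv": np.zeros(latent),
        "W_dec": w(latent, vocab_size), "b_dec": np.zeros(vocab_size),
    }

def encode(p, X):
    """Return the mean and log-variance of the approximate posterior q(z | x)."""
    H = np.tanh(X @ p["W_h"] + p["b_h"])
    return H @ p["W_mu"] + p["b_mu"], H @ p["W_lv"] + p["b_lv"]

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping z differentiable in mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), one value per row."""
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)

def negative_elbo(p, X, rng):
    """Standard VAE loss: reconstruction error plus KL term, averaged over rows."""
    mu, logvar = encode(p, X)
    z = reparameterize(mu, logvar, rng)
    X_hat = z @ p["W_dec"] + p["b_dec"]
    recon = np.sum((X_hat - X) ** 2, axis=1)  # assumed squared-error reconstruction term
    return float(np.mean(recon + kl_to_standard_normal(mu, logvar)))

def rank_services(p, service_matrix, query_vector):
    """Rank services by cosine similarity between latent means (no sampling at query time)."""
    mu_s, _ = encode(p, service_matrix)
    mu_q = encode(p, query_vector[None, :])[0][0]
    norms = np.linalg.norm(mu_s, axis=1) * np.linalg.norm(mu_q) + 1e-12
    sims = (mu_s @ mu_q) / norms
    return np.argsort(-sims), sims

# Hypothetical TF-IDF-like inputs: 3 services and 1 query over a 6-term vocabulary
params = init_params(vocab_size=6, hidden=4, latent=2)
services = np.array([[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
query = np.array([1, 0, 0, 0, 0, 0], dtype=float)
order, sims = rank_services(params, services, query)
loss = negative_elbo(params, services, np.random.default_rng(1))
```

Encoding queries and services with the posterior mean, rather than a random sample, keeps the ranking deterministic; whether the approach in this paper does the same is not stated here.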
Methodologically, we trained the VAE component of our approach using a dataset of 17,113 service descriptions crawled from ProgrammableWeb.com. We preprocessed each service description to derive its bag of words, and then transformed each bag of words into a TF-IDF vector to train the VAE. Then, we used the trained model, along with a subset of the main dataset (2772 services) and a set of service queries, to assess the performance of the approach. We compared our approach in terms of common IR metrics, namely Precision, Recall, F-Measure and NDCG (Normalized Discounted Cumulative Gain). As baselines, we considered VSM (Vector Space Model) (Salton, Wong, & Yang, 1975), as it is the canonical IR model to retrieve documents; LSA, which is one of the first models that tried to estimate continuous representations of words (Lee, Lee, Hwang, Lee, 2007, Paliwal, Adam, Bornhövd, 2007, Platzer, Dustdar, 2005, Sajjanhar, Hou, Zhang, 2004); and Word Embeddings, which showed promising results in previous work (Lizarralde et al., 2017). The results show improvements over these techniques of up to 14% in Precision, 12% in Recall, 25% in F-Measure and 10% in NDCG.
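For reference, the evaluation metrics named above can be computed as follows for a single query. This sketch uses the textbook definitions, with Precision and Recall computed over the retrieved list, F-Measure as their harmonic mean, and NDCG with a log2 rank discount; the cutoff, relevance grades, and example judgments are assumptions, since the exact evaluation protocol is described in Section 5.

```python
import math

def precision_recall_f(retrieved, relevant):
    """Set-based Precision, Recall and F-Measure for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount, ranks starting at 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg_at_k(ranking, relevance, k):
    """NDCG@k; `relevance` maps service ids to graded (or binary) relevance."""
    gains = [relevance.get(s, 0) for s in ranking[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical judgments for one query
ranking = ["s3", "s1", "s7", "s2"]
relevance = {"s1": 1, "s2": 1, "s5": 1}
p, r, f = precision_recall_f(ranking, relevance)
ndcg = ndcg_at_k(ranking, relevance, k=4)
```

Here two of the four retrieved services are relevant, so Precision is 0.5, Recall is 2/3 and F-Measure is 4/7; NDCG additionally penalizes the fact that the relevant services appear at ranks 2 and 4 rather than at the top.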
The rest of the paper is organized as follows. Section 2 explains the concepts that underpin the problem and our approach. Section 3 revisits prominent dimensionality reduction techniques used so far to address the same problem. Then, Section 4 outlines our proposed approach and explains the added value of using VAE. Section 5 presents the above-mentioned experimental evaluation. Section 6 analyses previous works in service discovery, including both syntactics-based and semantics-based approaches. Finally, Section 7 concludes the paper and outlines future research opportunities.
Section snippets
Background
In the context of this paper, discovering Web Services means finding those that fulfill client-side application needs. For example, one might want to develop a new application to recommend movies and read or store user opinions on specific movies, while delegating the task of translating text between different languages to an external service such as Google Translate or Microsoft Translator. Therefore, the application developer has to search the service registry for relevant translation
Traditional dimensionality reduction techniques for web service discovery
Given the publishing process explained in the previous section, we now concentrate on the indexing step, which is where our contribution lies. To this end, we review the main alternatives that have been explored to represent service descriptions as vectors. In particular, we focus on plain VSM and its variants (Section 3.1) and Word Embeddings (Section 3.2). Indeed, VSM+LDA, VSM+LSA and Word Embeddings serve as competitors of our approach in our experiments (Section 5).
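One common way to turn Word Embeddings into a service representation is to average the vectors of the words in a description and compare the result with the averaged query by cosine similarity. The sketch below assumes this averaging scheme and uses hypothetical three-dimensional vectors; this section does not state how the Word Embeddings competitor aggregates word vectors, and alternatives such as Word Mover's Distance (see ‘From word embeddings to document distances’ in the reference list) exist.

```python
import numpy as np

# Hypothetical 3-d embeddings; real ones would come from a model trained on a large corpus.
EMBEDDINGS = {
    "tv": np.array([0.9, 0.1, 0.0]),
    "television": np.array([0.85, 0.15, 0.05]),
    "schedules": np.array([0.6, 0.4, 0.0]),
    "translate": np.array([0.0, 0.2, 0.95]),
    "languages": np.array([0.05, 0.1, 0.9]),
}

def embed(text):
    """Average the embeddings of known words; out-of-vocabulary words are skipped."""
    vecs = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vecs:
        return np.zeros(3)
    return np.mean(vecs, axis=0)

def cosine(u, v):
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu > 0 and nv > 0 else 0.0

services = ["television schedules", "translate languages"]
scores = [cosine(embed("tv"), embed(s)) for s in services]
```

Here the query ‘tv’ scores above 0.9 against ‘television schedules’ despite sharing no term with it, because the two words have nearby vectors.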
Variational autoencoders applied to web service discovery
Finding the underlying meaning of service descriptions or queries is key to improving discoverability. Doing so accurately, without requiring clients and service providers to adopt purely semantic approaches, is on the agenda of several researchers. We now discuss how we apply autoencoders to index Web Service descriptions, as an alternative to VSM-based approaches and Word Embeddings.
Autoencoders are neural networks that try to model the input as a new set of features in a lower
Validation
The proposed approach aims to improve syntactics-based Web Service discoverability by introducing VAE in the standard Web Service indexing and searching process. The rationale is that VAE enables the modelling of more compact and precise representations of service descriptions. As each description is modelled in a continuous vector space, by calculating the distance between vectors we can obtain the similarity between them. Our main experimental hypothesis, which is assessed in this Section, is
Related work
Web Services are essential building blocks of modern Web 2.0 applications, and nowadays they are present in almost every Web, mobile, and even desktop application. Along with this, the number of available Web Services is also increasing rapidly (Corbellini et al., 2017), making Web Service discovery essential to find services that effectively fulfill users’ needs when developing service-oriented client applications. Consequently, many researchers have focused on improving service
Conclusions
Motivated by the past successful application of autoencoders for image processing (Kingma, Mohamed, Rezende, Welling, 2014, Vincent, Larochelle, Bengio, Manzagol, 2008) and more recently text feature extraction (Chen & Zaki, 2017), we have proposed to exploit autoencoders for the task of Web Service retrieval. Autoencoders are neural networks that reduce the input dimensionality and then try to reconstruct the input from the new encoded representation. This allows autoencoders to extract
CRediT authorship contribution statement
Ignacio Lizarralde: Software, Investigation, Data curation, Writing - original draft. Cristian Mateos: Conceptualization, Methodology, Investigation, Writing - review & editing, Funding acquisition, Writing - original draft. Alejandro Zunino: Conceptualization, Supervision, Writing - review & editing, Funding acquisition. Tim A. Majchrzak: Writing - review & editing. Tor-Morten Grønli: Writing - review & editing.
Acknowledgements
We acknowledge funding by CONICET through grant code 11220170100490CO – Convocatoria PIP 2017-2019 GI.
References (57)
- Learning user interaction models for predicting web search result preferences. 29th annual international ACM SIGIR conference on research and development in information retrieval (2006).
- Using different cost functions to train stacked auto-encoders. 12th Mexican international conference on artificial intelligence (2013).
- Web service discovery using combined bi-term topic model and WDAG similarity. Information & communication technology and system (ICTS), 2017 11th international conference on (2017).
- Latent Dirichlet allocation. Journal of Machine Learning Research (2003).
- Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics (2017).
- A web service search engine for large-scale web service discovery based on the probabilistic topic modeling and clustering. Service Oriented Computing and Applications (2018).
- A survey of automatic query expansion in information retrieval. ACM Computing Surveys (2012).
- Word co-occurrence augmented topic model in short text. International Journal of Computational Linguistics & Chinese Language Processing, Volume 20, Number 2, December 2015, Special Issue on Selected Papers from ROCLING XXVII (2015).
- WordNet-powered web services discovery using kernel-based similarity matching mechanism. Service oriented system engineering (SOSE), 2010 fifth IEEE international symposium on (2010).
- KATE: K-competitive autoencoder for text. Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining (2017).
- Web services description language (WSDL) version 2.0 part 1: Core language. W3C recommendation.
- Mining social web service repositories for social relationships to aid service discovery. Proceedings of the 2017 IEEE/ACM 4th international conference on mobile software engineering and systems.
- Combining query-by-example and query expansion for simplifying web service discovery. Information Systems Frontiers.
- Latent semantic indexing for web service retrieval. Computational collective intelligence. Technologies and applications.
- Bringing semantics to web services with OWL-S. World Wide Web.
- A domain independent readability metric for web service descriptions. Computer Standards & Interfaces.
- godiscovery: Web service discovery made efficient. IEEE international conference on web services.
- The vocabulary problem in human-system communication. Communications of the ACM.
- RESTful service composition at a glance: A survey. Journal of Network and Computer Applications.
- A survey of text similarity approaches. International Journal of Computer Applications.
- Semi-supervised learning with deep generative models. Advances in neural information processing systems.
- A framework for understanding latent semantic indexing (LSI) performance. Information Processing & Management.
- Web service clustering using a hybrid term-similarity measure with ontology learning. International Journal of Web Services Research (IJWSR).
- Cluster-based web service recommendation. Services computing (SCC), 2016 IEEE international conference on.
- From word embeddings to document distances. International conference on machine learning.
Cited by (22)
A systematic literature review on web service clustering approaches to enhance service discovery, selection and recommendation
2022, Computer Science Review. Citation excerpt: After the preprocessing steps, it is required to represent features in vector space to perform clustering. Generally feature representation techniques for web services can be categorized into three parts which are as follows [44–46]: Weighted word representation: In these types of methods, features are represented on the basis of their occurrences.
An intermediary utility-based service search and structure organization approach in service-oriented MAS
2022, Knowledge-Based Systems. Citation excerpt: Distributed networks in real systems provide numerous web services (e.g., peer-to-peer networks) [4–8]. In these distributed systems, each node hosts a set of web services that can be accessed by other nodes via standard invocation protocols of web services [9,10]. The service-oriented systems are similar to multiagent systems where the nodes and agents are both autonomous and interconnected through networks [6].
A deep recommendation model of cross-grained sentiments of user reviews and ratings
2022, Information Processing and Management. Citation excerpt: Research on recommendation systems with deep learning has brought new breakthroughs. Deep learning techniques used in recommendation systems include autoencoder (AE) (Lizarralde et al., 2020), convolutional neural networks (CNNs), recurrent neural networks (RNNs) (Hammou et al., 2020), and the restricted Boltzmann machine (RBM) (Liu et al., 2014). Kim et al. (2016) proposed convolutional matrix factorization (ConvMF), which used CNNs to produce deeper latent expressions from an item's description, taking into account local word order from the text to produce more accurate latent factors.
A Web service clustering method based on topic enhanced Gibbs sampling algorithm for the Dirichlet Multinomial Mixture model and service collaboration graph
2022, Information Sciences. Citation excerpt: Das investigated a method of using the Gaussian LDA model to process text word embedding [17]. On this basis, Lizzaralde added service description to the Gaussian LDA model to obtain service description representation, and finally ranked services by the correlation between the user query and service description representation [18]. A Sen-LDA that learns topics of words, sentences and descriptions in service description was presented by Shi.
Enhancing web service clustering using Length Feature Weight Method for service description document vector space representation
2020, Expert Systems with Applications. Citation excerpt: Due to enhanced vector space representation, the performance of web service clustering is also improved. In the future, the proposed method can be enhanced by using word embedding techniques and other methods to find the semantic relations among the features (Lizarralde, Mateos, Zunino, Majchrzak, & Grønli, 2020). The proposed method can be exploited in clustering the web pages, twitter text, etc.
An Enhancing Recovery Links between Two Artifacts Using Variational Autoencoder
2024, International Journal of Intelligent Engineering and Systems