Introduction

Social media has become an essential tool in everyday life. People share their ideas, information, and plans on different social media forums, and they express their feelings, suggestions, attitudes, likes, and dislikes in online social communities. As a result, a bulk of data, comprising textual documents, images, videos, and sound, is extracted for different user queries [1]. Data mining techniques are used to find user patterns in large amounts of data. Intention mining is one such technique, used to determine a user's implicit or explicit intention from a given set of data. An intent can be defined as an activity, wish, or aim that a user wants to carry out in the future [2, 3]. Intention mining is a major research area in data mining and is applied in web search environments to expose the future goals of users [3]. Figure 1 shows the process of detecting intention from a given dataset. People mostly use natural language (written or oral) to express their intentions. As an example, a few expressions are categorized into different intentions in Table 1. When a user submits a query to a search engine to retrieve related content, the query depicts his search query intention [4]. A comment or query expressing the need or wish to buy a product is called a purchase intention [5, 6]. Behavioral intentions depict user behavior, such as hiding and unfriending contacts on Facebook or using a digital smartwatch and free-trial-based software services on the internet [7]. Continuance intention is a special type of intention that describes a user's willingness to continue using an e-service, for example, predicting whether a user is willing to continue commenting on Facebook posts [8, 9]. An implicit intention is expressed indirectly, unlike an explicit intention. For example, if a user says, “my phone is disturbing me badly, and the new mobile model of this company is amazing”, the statement implies that the user wants to purchase the latest mobile [10].

Fig. 1 Process of intention mining

Table 1 Sample intention phrases

Datasets are the source from which intended user goals are extracted. Most intentions are derived from real-time datasets collected from multiple sources. The most commonly used datasets for search intentions are weblogs, tweets, Facebook comments, and blogs [11, 12]. For detecting query intention, the vital dataset sources are search engine query logs, click graphs, and model-based experiments. Mobile activity logs are used to find the user’s behavioral intentions related to mobile usage [9].

In this study, a comprehensive analysis is performed to discuss the frameworks and methods used to mine intentions. Most of the included studies use machine learning algorithms, including supervised learning, unsupervised learning, rule-based frameworks, and neural networks [13,14,15,16]. Statistical methods also play a vital role in intention detection [16,17,18,19]. According to the papers included in this study, supervised learning is one of the most widely used approaches for intention mining.

This systematic literature review aims to gather the knowledge about intention mining in one place to help future researchers. It will also help vendors and manufacturers gain insight into user intentions related to their products. The main motivation of this study is to examine how user intentions regarding social media and mobile usage are detected, for example, whether users want to keep using social services (Facebook, Twitter, search engines, online shopping forums). A further aim is to bring all intention mining techniques and approaches together to find out which types of algorithms and frameworks are used to extract users’ intentions. The dataset is one of the most important components of intention mining. A good dataset is key to accurate and promising results, and another motivation of this study is to identify the datasets from different forums (social media, questionnaire surveys, and mobile logs) used to detect users’ intentions.

This systematic literature review (SLR) presents comprehensive knowledge of intention mining in terms of intention categories, datasets used, and applied approaches and techniques. This study followed the methodology of [8] for an unbiased collection of articles. Multiple search strings were formulated according to the search syntax of each digital library to extract relevant studies. Based on the screening criteria, 109 research papers were selected out of 4362. The retrieved papers were evaluated qualitatively and empirically from multiple aspects. The primary contribution of this study is a systematic literature review of existing intention mining studies that proposes eight intention categories and six dataset categories and presents a taxonomy of the techniques and approaches used to detect user intentions. To the best of our knowledge, no systematic literature review exists on this domain, which constitutes the novelty of this study. This study differs from related studies in that other SLRs focused on detecting users' goals using process mining. The studies [2,3,4,5] classified intentions into categories, but they make no significant contribution on the techniques and approaches used to infer intentions, and no comprehensive literature is available on the classification of the datasets used to extract user intentions.

This manuscript's primary focus is to discover and define the intention mining categories, present a taxonomy of approaches, and highlight research challenges and gaps in the intention mining domain.

  • We have proposed eight intention categories: purchase intention, behavioral intention, search intention, continuance intention, implicit human intention, query intention, mobile usage intention, and general intention. These categories can help vendors and manufacturers enhance their products according to user requirements. Furthermore, they can help search engines and mobile phone companies serve users better.

  • We have proposed a taxonomy of state-of-the-art approaches and techniques to mine the user's intention. The researchers can benefit from this taxonomy to select the techniques and approaches according to their dataset and context of use.

  • This study also proposes a classification of datasets into six categories, including search engine logs, social media data, model-based generated data, questionnaire survey data, and generic datasets. It can be beneficial for training models and helps in choosing a suitable type of dataset for research.

  • Lastly, research challenges and gaps have been identified to help future researchers.

The rest of the paper is structured as follows: “Related work” reviews related work. “Methodology” presents the methodology of the paper and the selection process of relevant studies. “Results” provides detailed answers to the research questions. “Discussion” covers the discussion and future challenges. “Threats to validity” presents threats to validity, and lastly, the article concludes by summarizing the literature.

Related work

According to the Oxford dictionary, the word intention is defined as “A thing intended, an aim or plan”. Several frameworks and methods have been proposed and developed to detect user intention from multiple types of datasets [6]. Formal research efforts in the domain of intention mining have been few and far between, and most of the evidence gathered explores intention mining from the perspective of categorization, datasets, and techniques.

Epure et al. [1] argued that the foundation of each process is intentional and that processes should be modeled from an intentional point of view. The authors claimed that earlier research neglected event logs, which are the basis of intention mining research. Khodabandelou et al. [2] described intention mining as an emerging research area of data mining. Intentional process models are the key models of intention mining, used to capture the reasoning behind user activities. A hidden Markov model was used to extract user intentions from traces of activities stored in event logs.

Bags et al. [3] surveyed the literature on purchase intention for durable goods at a surface level. E-commerce forum datasets were used to identify intentions to help retailers. Social network mining and sentiment analysis were used to predict user intention and brand perception scores. A suitable regression model was identified to predict product attributes. One of their key findings from the mobile dataset is that users are most interested in camera attributes, sensors, and image stabilization. Huang et al. [4] classified intentions into four categories using a convolutional neural network (CNN). The dataset was created manually and consisted of 5408 sentences from GitHub-generated reports. The proposed approach was also used to improve an automated software engineering task that rectifies misclassified reports. In essence, the authors automated the classification of professional developers' intentions.

Papadimitriou et al. [5] reviewed the approaches and techniques used to infer the intention behind a user's task. Goal-aware systems were discussed thoroughly rather than any specific area. Di Sorbo et al. [6] investigated intention mining from the perspective of software developers. Intentions in the content of developers’ emails were classified into six categories. The proposed approach required a lot of manual effort to detect intention. Moreover, the authors did not present any taxonomy of datasets to help researchers and developers. In contrast, we classified user intentions by generalizing the domain, covering social media data, online survey data, mobile usage patterns, and implicit human actions. Ghasemi et al. [7] presented goal-oriented mining from the perspective of process mining. A rigorous study was conducted on 24 articles selected from popular search engines. The research questions revealed that process mining in association with user goals lacks a coherent line of research, whereas intention mining has more significant and mature goal-oriented models. The results indicated that a combination of process mining and intention mining might bring more opportunities to system stakeholders.

Table 2 compares related articles with our study with respect to three parameters: categorization of intentions, dataset classification, and techniques used to infer intentions. The authors of [1] examined goal-oriented mining with respect to process mining. They determined that intention mining has a coherent line of research for detecting users’ goals. Still, intentions were not classified to indicate which types of initiatives a user may take in the future, and there is no significant discussion of the datasets used to infer user intention. The study [2] comprehensively discussed intention mining along with its processes. Article [3] discussed purchase intentions for durable goods, especially mobile phones. Still, it did not focus on other types of user intentions, such as search intention and query intention, while purchasing a product. Four intention categories were analyzed and discussed in [4] on a GitHub dataset; however, discussion of dataset classification and of the techniques and approaches used to deduce intention was missing. The article [5] addressed techniques and approaches to deduce user intention from goal-aware systems but did not discuss intention categories or datasets. In [6], six intention categories were discussed purely in relation to developers’ email content; the intentions of ordinary email users were ignored, and datasets and techniques were not analyzed. The study [7] presented intention mining as the foundation of process modeling and determined that event logs are the key dataset for inferring the user’s intention.

Table 2 Comparison to other studies

Methodology

The main purpose of a systematic literature review (SLR) is to identify the challenges and gaps that need more investigation and research. An SLR covers the quantity, quality, and type of research on the addressed topic [8]. This systematic literature review was conducted by following the guidelines of [8] for unbiased data collection and presentation of the obtained results. Figure 2 presents the research methodology of this study. The first step, which is the core of an SLR, is to define the research objectives and questions and then select the digital libraries from which to extract the relevant literature. The second phase is to design a search string and then carry out the screening phase to filter the retrieved articles. Finally, the articles are extracted and mapped, and the systematic review is reported.

Fig. 2 Systematic mapping process

Research objectives

This study aims to elaborate on intention mining in the context of research problems, solutions, and significance. The objectives of conducting an SLR on intention mining are as follows:

O1: To build up a knowledge catalog that will facilitate the other researchers in the context of intention mining.

O2: To classify user intentions into different categories to shape research in the intent mining domain.

O3: To characterize the datasets used to detect the implicit or explicit intentions.

O4: To depict existing solutions in the field of Intention Mining (IM), and clarify the similarities and differences between them using a characterization framework.

O5: To develop a taxonomy to highlight the adopted state-of-the-art approaches and methods.

Research questions (RQ)

To achieve the primary aim of the SLR, this study defines three research questions as follows:

RQ1: What intention categories have been addressed in the last decade on intention mining?

RQ2: What data mining/machine learning approaches currently exist to support the effectiveness of different intention mining techniques?

RQ3: What types of datasets are used to detect multiple types of human intentions?

Design search string

Intention mining is an emerging research area of data mining. It has substantial effects on the world of social media, helping users and organizations in multiple ways [1, 2]. To develop an authentic knowledge base of IM, this study used the digital libraries of four major publishers: IEEE, ACM, ScienceDirect, and SpringerLink.

This study's intended search phrase was "intention mining," but this query was too restrictive and retrieved only a few results from scientific research databases. Therefore, synonyms of intention and mining were considered for building the search strings. Synonyms of intention did not prove very beneficial for accessing relevant publications because almost all authors used the term intent or intention to describe the user’s future goals. A chain of secondary keywords was designed to complete the phrase with intention. The main secondary word was mining, and alternate words such as mine, extract*, and discover* were chosen.

The search string was designed as a combination of primary keywords (Kp) and secondary keywords (Ks), as shown in Figure 3. Intention and intent are used as primary keywords, while mining, mine, discover*, and extract* are used as secondary keywords. The wildcard * is used with discover and extract to cover similar terms such as discovery, discovered, discovering, extracted, extracting, and extracts. Each Kp was used with any of the Ks, i.e., ∀ Kp ∧ ∀ Ks.
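The keyword pairing and the effect of the trailing wildcard can be illustrated with a minimal sketch. The keyword lists come from the text; the candidate words, the helper name `matches`, and the use of shell-style pattern matching are assumptions made for illustration and do not reproduce the matching rules of any particular digital library.

```python
from fnmatch import fnmatchcase
from itertools import product

# Keywords taken from the text above.
PRIMARY = ["intention", "intent"]
SECONDARY = ["mining", "mine", "discover*", "extract*"]

def matches(pattern: str, word: str) -> bool:
    """Return True if `word` matches a keyword pattern (hypothetical helper).

    A trailing * stands for any sequence of characters, including none.
    """
    return fnmatchcase(word.lower(), pattern.lower())

# Every primary keyword is paired with every secondary keyword.
pairs = list(product(PRIMARY, SECONDARY))

# Illustrative candidate words; only those matching a secondary keyword are kept.
candidates = ["discovery", "discovered", "extracting", "extracts", "mined"]
covered = [w for w in candidates if any(matches(p, w) for p in SECONDARY)]

print(len(pairs))
print(covered)
```

In this sketch, "mined" is not covered because the keyword mine carries no wildcard, which is why the wildcard is attached only to discover and extract.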

Fig. 3 Search queries used to extract related publications

The search string format for each forum was designed following the guidelines of [8]. Search strategies were checked and revised until the final strings were obtained. The final search strings, with their specific keywords, for the four selected forums are given in Table 3. The format of the search string for IEEE Xplore in terms of primary and secondary keywords is as follows: (PK1 OR PK2) AND (SK1 OR SK2 OR SK3 OR SK4 OR SK5 OR SK6 OR SK7).
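The general Boolean format can be sketched as a small string builder. This is an illustration only: the function name `build_query` is hypothetical, the keywords are those named earlier in this section, and the exact strings submitted to each library are the ones listed in Table 3, whose syntax may differ from this output.

```python
def build_query(primary, secondary):
    """Join each keyword group with OR, then combine the two groups with AND."""
    def group(terms):
        return "(" + " OR ".join(terms) + ")"
    return f"{group(primary)} AND {group(secondary)}"

# Keywords named in the text; the resulting string is a generic illustration.
query = build_query(
    ["intention", "intent"],
    ["mining", "mine", "discover*", "extract*"],
)
print(query)
```

Keeping the two groups as separate lists makes it straightforward to revise a keyword and regenerate the string while the search strategy is being refined.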

Table 3 Search string for academic

Screening phase

The screening step was carried out to select, from the articles retrieved with the proposed search strings, those most relevant to this study's objective. The screening phase filters articles based on inclusion/exclusion criteria, journal and conference repute, and quality assessment of the article contents. This study focused strictly on intention mining, so articles that did not directly address the IM problem were excluded. All duplicate studies were also eliminated by screening based on title and abstract.
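Removing repeated studies by title can be sketched as follows. The sketch assumes that two records are duplicates when their titles agree after lowercasing and stripping punctuation; the record fields and the sample titles are hypothetical, and the abstract-based check described above is not modeled.

```python
import re

def normalize(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def drop_duplicates(records):
    """Keep the first record for each normalized title, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        key = normalize(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Hypothetical records standing in for results from different libraries.
records = [
    {"title": "Intention Mining: A Survey", "source": "IEEE"},
    {"title": "intention mining - a survey", "source": "ACM"},
    {"title": "Query Intent Discovery", "source": "Springer"},
]
unique = drop_duplicates(records)
print([r["source"] for r in unique])
```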

Screening (inclusion/exclusion criteria)

This SLR examined multiple studies covering quantitative and qualitative research methods, but some publications were not of sufficient quality to be included in the review. Therefore, inclusion/exclusion criteria were defined to select the most relevant papers.

A study was chosen for systematic review if it met the following inclusion criteria.

  • As data mining is a vast field consisting of several mining strategies, the article must focus mainly on intention mining.

  • Literature must be published in a computing journal or conference.

  • Dataset is one of the essential components of intention mining. Articles that focused on social datasets to detect human intention have been included.

  • The language should be English

An article was excluded from the analysis if it met any of the following exclusion criteria.

  • Book chapters, posters, magazines, courses, and early access articles were excluded.

  • Papers addressing data mining fields other than intention mining.

  • Articles presenting models that teach robots to detect human intentions.

  • Articles with a general focus on data mining.
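The inclusion and exclusion criteria above can be read as a single predicate over article metadata. The following is a minimal sketch under stated assumptions: the `Article` fields and their values are hypothetical, and in practice each criterion was judged by reading the article rather than by checking stored fields.

```python
from dataclasses import dataclass

# Publication types named in the exclusion criteria.
EXCLUDED_TYPES = {"book chapter", "poster", "magazine", "course", "early access"}

@dataclass
class Article:
    topic: str                 # main focus, e.g. "intention mining"
    venue_type: str            # e.g. "journal", "conference", "poster"
    language: str
    uses_social_dataset: bool
    targets_robots: bool = False

def is_included(a: Article) -> bool:
    """Apply the exclusion criteria first, then require every inclusion criterion."""
    if a.venue_type in EXCLUDED_TYPES or a.targets_robots:
        return False
    if a.topic != "intention mining":
        return False
    return (
        a.venue_type in {"journal", "conference"}
        and a.language == "English"
        and a.uses_social_dataset
    )

kept = is_included(Article("intention mining", "journal", "English", True))
dropped = is_included(Article("intention mining", "poster", "English", True))
print(kept, dropped)
```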

Screening (journal citation report/ranked conferences)

To ensure the quality of the review, we mostly considered CORE-ranked conferences and journals included in the Journal Citation Reports (JCR). Conference rankings were checked at http://www.conferenceranks.com/ and http://portal.core.edu.au/conf-ranks/6784, whereas https://www.scimagojr.com/ was used to check the quartile of each journal.

Screening (quality assessment criteria)

Quality assessment criteria (QAC) form one of the important screening phases of a literature review. They are used to assure the quality of the selected studies [8]. Assessments were performed on multiple parameters of each selected paper. The crucial aspects included the background, methodology, dataset, dataset analysis, implementation techniques, results, and conclusion.

The QAC of study [8] was adopted to ensure the quality of the included studies. Each article was assessed on the contents of its sections: whether the background and literature review sections are adequate, whether the methodology is clear to understand, whether the dataset is valid and supports a quality descriptive analysis for IM, whether the implementation of techniques is systematic or partial, and whether the results are clearly defined. Finally, the article's conclusion was checked to see whether it was supported by the contents of the paper, as listed in Table 4. A quality score was assigned for each criterion to decide on paper inclusion or exclusion. Binary values (0, 1) were used as quality assessment scores for each criterion. As eight assessment parameters were used in the study, the maximum methodological score was eight. Article quality was considered high if score ≥ 7, moderate if 5 ≤ score < 7, and low if score ≤ 4, in which case the article was excluded from the selected list. The three screening phases resulted in a refined, quality-filtered set of articles. Figure 4 presents the number of articles extracted from each forum. A total of 4362 research articles were retrieved by submitting the search strings to the mentioned online digital libraries. Duplicate articles were then identified and removed from the selected list; a total of 797 repeated articles had been retrieved from all repositories.
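The scoring scheme can be sketched as a short function. It assumes eight binary criterion scores per article and reads the moderate band as scores of 5 and 6, which is an interpretation of the thresholds given above; the function name `quality_level` is hypothetical, and the individual criteria of Table 4 are not modeled.

```python
def quality_level(scores):
    """Sum eight binary criterion scores and map the total to a quality level."""
    if len(scores) != 8 or any(s not in (0, 1) for s in scores):
        raise ValueError("expected eight scores, each 0 or 1")
    total = sum(scores)
    if total >= 7:
        return total, "high"
    if total >= 5:          # assumed moderate band: 5 or 6
        return total, "moderate"
    return total, "low"     # low-quality articles are excluded

print(quality_level([1, 1, 1, 1, 1, 1, 1, 0]))
print(quality_level([1, 1, 1, 1, 1, 0, 0, 0]))
print(quality_level([1, 1, 1, 1, 0, 0, 0, 0]))
```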

Table 4 Methodological quality assessment criteria

Fig. 4 Flow chart representing the screening phase

Phase I screening omitted a total of 1334 research papers. Only research papers published in ranked conferences or journals were included in the study; therefore, phase II removed 656 articles that did not meet the JCR or CORE conference ranking criteria. Phase III screened the articles for quality according to the criteria in Table 4 and removed 1466 articles from the selected paper list. In the end, the remaining 109 articles were used to conduct the systematic literature review.
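The reported counts can be checked with a few lines of arithmetic. The sketch assumes that the 797 duplicates are counted separately from the 1334 papers omitted in phase I, an assumption under which the figures reconcile with the final 109 articles.

```python
# Counts reported in the text.
retrieved = 4362
removed = [
    ("duplicates", 797),
    ("phase I", 1334),
    ("phase II", 656),
    ("phase III", 1466),
]

remaining = retrieved
for stage, count in removed:
    remaining -= count
    print(f"after {stage}: {remaining}")
```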

Results

Most of the selected articles indicate an increasing interest in applying intention mining to social media platforms, search engine logs, robot-generated data, mobile usage data, and many other areas to extract users' real-time intentions. Researchers have developed many efficient frameworks based on supervised learning, unsupervised learning, neural networks, image processing, and statistical methods to determine users’ intentions. However, the importance of this topic suggests that more research is required in this area. Overall, this study covers 109 topic-related articles that helped answer the three research questions described in the methodology section.

Intention categories

RQ1: What intention categories have been addressed in the last decade on intention mining?

Social media users perform multiple activities in their daily routine to achieve their goals. Each action reveals an intention about what the user is willing to do in the future. Based on current activities and past behaviors, this work grouped the selected intention studies published during the last decade into eight categories: purchase intention, behavioral intention, search intention, continuance intention, implicit human intention, query intention, mobile usage intention, and general intention. Table 5 shows the eight intention categories along with the frequency of selected articles.

Table 5 Frequency of intention mining categories

Purchase intentions

Online shopping allows users to purchase almost anything with a single click. The first steps of e-shopping are to browse the online store and then select and purchase a product. Purchase intentions (PI) are used to detect whether a user really wants to buy a product or is just surfing the web pages. PI helps manufacturers and vendors improve their products according to user requirements [3]. Wah et al. [9] studied users' intention to purchase a car. Logistic regression, decision trees, and neural networks were used to develop a model called “Intention of purchase (IOP)”. The proposed model achieved an accuracy of 91.79% in predicting whether a user will purchase a car or cancel the order after booking. Guo et al. [13] proposed a new search behavior model to detect whether a searcher has an immediate purchase intention for the searched product or only a research intention. Loyola et al. [14] presented an encoder-decoder neural architecture to detect browse and purchase intentions from e-commerce data. Guo et al. [15] proposed a deep intent neural network to predict users' real-time purchase intent. The touch interface of handheld devices has been used to capture interactive behaviors to refine purchase intentions. Studies [16,17,18,19,20] discussed purchase intentions in the context of cell phone app interfaces and web interfaces during the product selection phase.

Behavioral intentions

Social media encapsulates multiple user behaviors in purchase histories or logs. Although user behavior is hard to log and time-consuming to capture and predict, numerous studies have tried to extract behavioral intentions using classification, clustering, and statistical techniques. Li et al. [21] presented a novel interactive framework to facilitate communication between humans and assistive devices. It was used to reduce the effort that elderly and disabled people need to interact with a machine, based on gaze movements. Chen et al. [22] introduced an AIR recommender based on an attentional recurrent neural network to predict the user's behavioral intention. Sun et al. and Wang et al. [23, 24] built classifiers based on user feedback data retrieved from click sequences and query logs, using neural networks, statistical techniques, and image processing methods to mine behavioral intentions.

Li et al. [25] presented a graph intention network-based model to detect behavioral intentions in the context of click-through rate (CTR). Real-world data from an e-commerce platform was used to assess the proposed model, which delivered promising results. Giannopoulos et al. [26] proposed a client-centered, intent-aware query framework to protect user data privacy in personalized web search [25]. Hashemi et al. [27] proposed a multiple-intent model to infer users' behavioral intentions from the America Online (AOL) search query log. Peng et al. [28] proposed a structural equation-based model to discover the factors of discontinuance intention towards social network sites (SNS) with respect to autonomous and controlled motivations. In [10], Fan et al. addressed the factors influencing decision making in the context of SNS. Models based on statistical techniques were proposed to discover behavioral intentions towards adopting e-learning systems, as well as suggestion intention, participation intention, and switching intention towards social media [29,30,31,32,33,34,35]. Peña et al. [36] detected user intentions to hide and unfriend Facebook contacts from user logs and found that people prefer the hide option over the unfriend option. Zhu et al. [37] detected behavioral intentions towards a cloud-based virtual learning environment. A framework based on the statistical technology acceptance model has been used to infer the intention to use free-trial-based technology services. Peng et al. examined users' behavioral intention to switch from one IT service provider to another. The intention to use smart TVs and primary school teachers' behavioral intentions towards mobile usage have also been investigated [28, 38, 39]. Kim et al. [40] classified user intentions into multiple categories according to their domain. Ren et al. [41] presented a model that worked on real-time commercial search engine log data. Statistical techniques applied to questionnaire survey datasets are robust for detecting multiple types of human behavioral intentions, such as those towards learning management systems (LMS) [42,43,44,45].

Implicit intentions

Implicit intentions are goals that users do not mention directly but that are hidden in their explicit activities. Liu et al. [46] revealed users' implicit intention to adopt a pension insured program. The fuzzy comprehensive evaluation method was combined with the analytic hierarchy process (AHP) to assess the insured wishes index system [11, 12]. Luong et al. [47] designed a Bayesian network-based, context-specific implicit intention recognition model to mine the user's implicit intention. Chen et al. [48] used a semi-supervised question-asking framework to detect the user's implicit intention from the Yahoo! Answers community question answering (CQA) dataset. Zhuang et al. [49] proposed an easy life app to perceive implicit intent without any explicit user input on the phone.

Search intentions

Search intentions are the result returned by the query given to the search engines [50]. Children are active users of digital devices, but due to their limited vocabulary, they often cannot retrieve the desired results from the internet. Dragovic et al. [51] presented a search intent module to meet children's social media requirements. Murata et al. [52] detected search intentions from a Japanese commercial search engine log. Qian et al. [53,54,55] presented efficient and effective machine learning-based models to predict dynamic query and generic search intents, to search for a product on an e-commerce portal, and to explore search intentions on touch-enabled digital devices. Aljouma et al. [56] proposed a verb ontology and a domain ontology model representing semantic concepts related to verbs and the business vocabulary of all business domains. These services help connect low-level service description languages and web services to achieve business-related goals.

The studies [57,58,59,60,61] developed a system named SciNet, applied on top of scientific databases of more than 60 million articles, for interactive user modeling. A comprehensive search behavioral model has been proposed to classify search intentions from Google logs, along with search intent annotation for image datasets. The studies [62,63,64,65,66] used statistical methods to find the user satisfaction level along with search intention from dynamic behaviors. Search intentions have been classified into three categories, namely target finding, decision making, and exploration, to discover different user interaction patterns with perceived user satisfaction.

Continuance intention

Continuance intentions represent a user's willingness to continue or discontinue a service. This type of intention is detected from real-time user data available on social media to indicate whether a user is ready to use a specific service regularly or wants to discontinue it in the future. Basak et al. [66] used a structural equation modeling approach to detect the continuance intention of Facebook users. They concluded that attitude and satisfaction were the reasons for continued Facebook usage. Hong et al. [67] described continuance intentions towards using Facebook in unethical groups called dangerous virtual communities. Statistical results revealed that online and general social anxiety were negatively correlated with continuance intentions. Lu et al. [68] integrated the Technology Acceptance Model (TAM), the Task-Technology Fit (TTF) model, Massive Open Online Course (MOOC) features, and social motivation to investigate continuance intentions to use MOOCs.

Wu et al. [69] worked to identify the continuance use of current mobile services. The dataset was collected from 512 customers of the Kuwait communication market. Abbas et al. [70] aimed to determine potential reasons and examined the moderating role of information overload and social overload. Other authors determined users' continuance intention to use SoLoMo services and government e-learning services and predicted students' intention to use mobile cloud storage services [71,72,73]. Li et al. [74] used the fuzzy-set qualitative comparative analysis (fsQCA) technique to predict continuance intention towards social media use. The results revealed enjoyment as the most significant factor in the continued use of social media. De Oliveira et al. [75] used statistical techniques to detect the continuance intention to use Facebook. Ifinedo et al. [20] used three theoretical frameworks, namely social-cognitive theory, the technology acceptance model, and motivation theory, to detect students' continuance intentions to use blogs. Cao et al. [76] described users' discontinuance intentions regarding social media services. Boakye et al. [77] used the characteristics of mobile location-based services (LBS) to investigate continuance intentions towards LBS. Compatibility and perceived interactivity were identified as two influential LBS parameters for downloading and using location-based services on mobile devices. Swar et al. [78] used a model based on information processing theory and the theory of planned behavior to extract the factors that cause continuance intention towards online health services. Hong et al. [72], Kang et al. [79], Wu et al. [80], and Hur et al. [81] discussed continuance intentions regarding mobile usage, online health services, and the use of e-learning systems.

Mobile usage intentions

Ling et al. [82] used a classification method to detect the usage intention of mobile users. Statistical frameworks have been proposed to determine primary school teachers' intention to adopt mobile devices as a learning technology, intention towards mobile payment adoption, and mobile shopping continuance intention [83,84,85,86].

Query intentions

Query intentions play a vital role in helping users search web content on the internet. Search engines suggest queries by matching the keywords given by users. Zhao et al. [87] used statistical techniques to detect user switching intention towards mobile cloud storage using a push-pull-mooring framework. A user question-asking framework was developed to detect implicit user intention in Community Question Answering (CQA) from the Yahoo question-answer dataset. Query intents help inexperienced users search for related content on the internet. Lee et al. [88] proposed the Similarity-Aware Query Intent Discovery (SQUID) system to mine the internet surfing patterns of such users and detect their query intents. Fariha et al. [89] and Trabelsi et al. [90] described how search engines help internet surfers find information from a question-answer session by predicting the optimal answer to the user’s question. Jiang et al. [91] proposed a search query log structure to infer query intent from social media search logs using statistical techniques. Yu et al. [92] proposed the Search-Coexistence Knowledge Evolution (SCKE) framework to find search intents for query patterns from search engine click-through log data. The public demands a detailed storyboard right after any news breaks.

General intentions

Gu et al. [93] proposed Cross-Domain Random Walk (CDRW) to extract search query patterns from search click-through logs. Zan et al. [94] observed that intent recommenders have become a trend in social media, suggesting services when the user opens an app without any input. Heterogeneous networks were used to mine complex and rich content to recommend intents. Liu et al. [95] proposed an intent recommender to analyze user behavior towards mobile usability. Advertisement has become an essential part of search engines and social media. Advertisement intention aims to adjust the relevance of advertisements to the user query [96, 97]. Izquierdo et al. [98] addressed the issue of relating advertisements to the user search query using a feature extraction mechanism, and automated the TV ad schedule through mathematical programming. Social communities share information, ideas, opinions, and attitudes to make suggestions or propose solutions to others' problems [99,100,101,102,103,104,105,106,107]. Statistical structural models used to detect tourist intentions, intention to use transportation, and intention to participate in online travel companies are addressed in [45, 84, 108]. Intention to mine quality event logs and intention to recommend using a smartwatch are discussed in [109, 110]. Mishael et al. [111] showed that temporal intentions indicate the periodic change in the page status of a Twitter post. Stable intention, intention changing from current to past, and undefined intention were detected from Twitter log data to predict the periodic change from Twitter post to Twitter share. Habib et al. [112] used deep neural network techniques to extract social media intention from social media logs.

Approaches and techniques used in intention mining

RQ2: What data mining/machine learning approaches currently exist to support the effectiveness of different intention mining techniques?

As illustrated under RQ1, multiple types of intention have been detected from the datasets. Researchers used various approaches and techniques for intention detection, such as machine learning, statistical techniques, heuristics, image processing methods, fuzzy logic, and deep learning. Machine learning has been considered the most effective and efficient approach to data mining due to its high accuracy. Many machine learning techniques, such as supervised, unsupervised, and semi-supervised learning, natural language processing, and neural networks, have been used to infer intention. The techniques and approaches used in the 109 selected research articles are discussed in the following sections.

Supervised learning

Supervised learning is a simple approach to machine learning. It trains a model on training data with known output labels. Classification is one of the most commonly used supervised learning techniques. In [18], the author used logistic regression, decision trees, and neural networks to propose an intent of purchase (IOP) model that detects the user's car purchase intention. In [20], the authors proposed a supervised query intended for kids (QuIK) model to help children aged 6–15 formulate their queries in search engines, which leads to more concise and relevant search results. In [21], a novel semi-supervised sequence clustering approach was presented to extract and group interaction sequences of users, then assign the predefined task and visualize it intuitively. Recommendation (MEIR) was proposed to automatically recommend user intention according to previous history. In [22], a convolutional neural network and maximum entropy-based model was developed to perceive suggestion intentions from Vietnamese text data. An encoder–decoder neural architecture was proposed to mine users' browse or purchase intentions.

A convolutional attention model [35] was used to compare touch and click interactions with mouse and keyboard log data to find the similarities and differences in user interaction. Multiple intent modeling has been used to collect candidate intent features efficiently without human supervision [36]. A Bayesian network-based, context-specific implicit intention recognition model has been developed to mine context-based implicit user intentions [37]. The mobile touch interaction model [28] was used to detect the user's search intentions by analyzing the touch data of the mobile device.

In [84], a novel software engine, EUI, with a robust Open Directory Project (ODP) classifier, was developed to mine user intentions by analyzing mobile usage data. In [85], the authors used the support vector machine classification method to detect commercial intents relevant to the user's research and purchase behavior. A classification-based system, semantic similarity-aware query intent discovery (SQuID) [88], was proposed to detect the user's query intent. It takes examples from users as input and consults the database to discover deeper associations. These associations reveal the semantic context of the given input and allow the query intent of the user to be inferred. Linear discriminant analysis (LDA) [89] was used to classify event-related synchronization (ERS) and desynchronization (ERD) patterns while users were lifting different weights. LDA-CRC and LDA-SRC were used to develop an early warning system to detect criminal activity [99]. In [92], a framework for query feature extraction was developed to mine advertisers' intentions.
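To make the supervised setting concrete, the following is a minimal sketch of a bag-of-words logistic regression classifier for purchase intent. It is not the IOP model of [18] or any other reviewed system. The toy sentences, labels, function names, and hyperparameters are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data: 1 = purchase intent, 0 = no purchase intent.
TRAIN = [
    ("i want to buy a new car", 1),
    ("planning to purchase this phone next week", 1),
    ("where can i order this laptop", 1),
    ("the weather is nice today", 0),
    ("i watched a good movie last night", 0),
    ("my friend likes long walks", 0),
]


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5):
    """Fit one weight per word plus a bias with stochastic gradient ascent."""
    weights = Counter()  # missing words read as weight 0
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            words = set(tokenize(text))
            p = sigmoid(bias + sum(weights[w] for w in words))
            err = label - p  # gradient of the log-likelihood
            bias += lr * err
            for w in words:
                weights[w] += lr * err
    return weights, bias


def predict_proba(model, text):
    """Estimated probability that the text expresses purchase intent."""
    weights, bias = model
    return sigmoid(bias + sum(weights[w] for w in set(tokenize(text))))


model = train(TRAIN)
print(round(predict_proba(model, "i want to buy this phone"), 3))
print(round(predict_proba(model, "the movie was good last night"), 3))
```

A probability above 0.5 is read here as purchase intent. Real systems would use far larger labeled corpora and richer features than raw word presence.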

Unsupervised learning

Unsupervised learning is a significant and complex approach to machine learning used to detect hidden patterns in datasets with unknown output labels. Clustering is the most commonly used technique of unsupervised learning. A data-driven approach to automated TV ad scheduling [31] used intention learning on top of mathematical optimization and clustering to imitate scheduling experts' decision-making process. A dynamic intent mining method [52] was used to mine query intent from search query logs. The ranking model for user intent [53] exploits user feedback in the form of click data to cluster ranking models for historic queries according to user intent. Another cluster-based model, heterogeneous graph-based soft clustering [54], was developed to combine Wikipedia, web, and query data to learn search intents. A novel approach, the map miner method (MMM) [55], was used to construct an intentional process model from the process log. MMM used a hidden Markov model to cluster user activities. The study [106] proposed intent-sensitive word embedding to learn user satisfaction intention. In [107], unsupervised sequence query clustering groups queries with the same interest to produce a pattern consisting of a sequence of semantic concepts and/or lexical items for each intent.
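As an illustration of clustering queries without labels, the sketch below groups search queries by word overlap using a greedy single-pass ("leader") clustering with Jaccard similarity. This technique is swapped in for brevity and is not the method of any reviewed study. The queries, the threshold, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(queries, threshold=0.25):
    """Assign each query to the most similar existing cluster, or open a new one.

    Similarity to a cluster is the best Jaccard score against any member
    (single linkage). The threshold is an assumed tuning parameter.
    """
    clusters = []  # each cluster is a list of (query, token_set)
    for query in queries:
        tokens = set(query.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = max(jaccard(tokens, member) for _, member in cluster)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best.append((query, tokens))
        else:
            clusters.append([(query, tokens)])
    return [[q for q, _ in cluster] for cluster in clusters]


# Hypothetical query log.
QUERIES = [
    "cheap flights to paris",
    "paris flights deals",
    "python list sort",
    "sort a list in python",
    "flights to paris from london",
]

for group in cluster_queries(QUERIES):
    print(group)
```

Each resulting group can be read as one candidate search intent. The result depends on the order of the queries, which is a known limitation of single-pass clustering.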

Semi-supervised learning

Semi-supervised learning is a combination of supervised and unsupervised learning techniques. It combines a small amount of labeled data with a large amount of unlabeled data. Discovering query intent patterns from general search has been a challenging task; therefore, cross-domain random walk (CDRW) [93] is used to extract query patterns from search engine click-through log data. In [29], a convolutional neural network and maximum entropy-based model was developed to perceive suggestion intentions in Vietnamese text data. In [30], an encoder-decoder neural architecture was proposed to mine users' browse or purchase intents and behaviors from large-scale e-commerce datasets.

The study [31] proposed a predictive model based on semi-supervised learning to detect the intent of new user questions in a question-answer community. In [32], a novel semi-supervised sequence clustering approach was used to extract and group interaction sequences of users, then assign the predefined task and visualize it intuitively. Recommendation (MEIR) was proposed to automatically recommend user intention according to previous history.
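The idea of combining a few labeled examples with many unlabeled ones can be sketched with self-training, one common semi-supervised scheme. The sketch below is an assumption-laden illustration, not the method of [29,30,31,32] or [93]: the word-vote classifier, the confidence threshold, and all example texts are hypothetical.

```python
from collections import Counter


def train_counts(labeled):
    """Count word occurrences per intent label."""
    counts = {}
    for text, label in labeled:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def score(counts, text):
    """Return (best_label, confidence); confidence is the winner's share of word votes."""
    words = text.lower().split()
    votes = {label: sum(c[w] for w in words) for label, c in counts.items()}
    total = sum(votes.values())
    if total == 0:
        return None, 0.0  # no known word: the model abstains
    best = max(votes, key=votes.get)
    return best, votes[best] / total


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Repeatedly pseudo-label confident unlabeled texts and retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        counts = train_counts(labeled)
        confident, rest = [], []
        for text in pool:
            label, conf = score(counts, text)
            (confident if conf >= threshold else rest).append((text, label))
        if not confident:
            break  # nothing new can be labeled with enough confidence
        labeled.extend(confident)
        pool = [text for text, _ in rest]
    return train_counts(labeled), pool


# Hypothetical data: two labeled seeds and four unlabeled texts.
LABELED = [("buy cheap laptop", "purchase"), ("weather forecast today", "info")]
UNLABELED = [
    "buy laptop online",
    "today weather report",
    "online order discount",
    "report news",
]

model, leftover = self_train(LABELED, UNLABELED)
print(score(train_counts(LABELED), "order online"))  # seeds only
print(score(model, "order online"))                  # after self-training
```

With the seeds alone the classifier abstains on "order online", because neither word appears in the labeled data. After self-training, vocabulary picked up from pseudo-labeled texts lets it assign a label. Pseudo-labels can also propagate errors, which is why a confidence threshold is used.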

Natural language processing

In [87], a lightweight service composition framework was designed to mine the user's intended goal from natural text using natural language processing techniques. End users usually have to perform many sub-tasks to achieve a goal; therefore, the system mines the tasks with non-functional constraints to guide the selection of services.

Fuzzy logic

In [100], the authors revealed the user's implicit intention to adopt the pension insured program. The fuzzy comprehensive evaluation method was used along with the analytical hierarchy process (AHP) to assess the insured wishes index system.

Statistical methods

In [81], the authors presented the HMM-AIP model, based on a hidden Markov model, to track and predict attack intentions. In [82], the authors extracted consumers' mobile reading intention using the technology acceptance model. Human behavior and attitude were found to be the major factors that influenced these intentions. In [9], the authors developed the IOP model using logistic regression, decision trees, and neural networks to predict whether a user will purchase a car after booking it or cancel the order. In [10], a collaborative intent nowcasting model based on statistical techniques was developed to extract the complex relation between intent and context. In [91,92,93,94,95,96,97,98,99,100,101,102,103,104,105], multiple statistical techniques were used to propose intention detection frameworks, such as the S-O-R model-based framework, decomposed theory of planned behavior, self-determination theory, TPB, TAM, CTO, regression, correlation, composite reliability, variance extracted, AVE, t value, social cognitive theory and many more.
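The way a hidden Markov model tracks a latent intention over time can be sketched with the standard forward (filtering) recursion. The sketch below does not reproduce HMM-AIP [81]. The two hidden intents, the observed actions, and every probability are hypothetical values chosen for illustration.

```python
def normalize(dist):
    """Scale a dict of non-negative scores so that they sum to 1."""
    total = sum(dist.values())
    return {state: value / total for state, value in dist.items()}


def forward_filter(observations, start, trans, emit):
    """Return P(hidden state at the last step | observations so far)."""
    states = list(start)
    # Initial belief: prior times likelihood of the first observation.
    belief = normalize({s: start[s] * emit[s][observations[0]] for s in states})
    for obs in observations[1:]:
        # Predict: push the belief through the transition model.
        predicted = {s: sum(belief[p] * trans[p][s] for p in states) for s in states}
        # Update: weight by the likelihood of the new observation.
        belief = normalize({s: predicted[s] * emit[s][obs] for s in states})
    return belief


# Hypothetical model: hidden intents and observable shop actions.
START = {"browse": 0.8, "buy": 0.2}
TRANS = {
    "browse": {"browse": 0.7, "buy": 0.3},
    "buy": {"browse": 0.2, "buy": 0.8},
}
EMIT = {
    "browse": {"view": 0.9, "cart": 0.1},
    "buy": {"view": 0.3, "cart": 0.7},
}

posterior = forward_filter(["view", "cart", "cart"], START, TRANS, EMIT)
print({state: round(p, 3) for state, p in posterior.items()})
```

After one page view followed by two add-to-cart actions, the belief shifts strongly towards the hypothetical "buy" intent. In practice the transition and emission probabilities would be estimated from log data rather than set by hand.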

Image processing

In [83], a gaze-based intention inference framework was developed to infer the intentions of elderly and disabled people from gaze movements. The study [84] addressed user query intention when seeking information on search engines. A heuristic-based interactive user intention understanding model was developed to help web surfers reach their search goals.

Deep learning

In [98], a deep-learning-based methodology was proposed to predict vehicles' lane change intentions on the road. The deep intent prediction network (DIPN), a model based on touch-interactive behavior, was presented to predict real-time user-product purchasing intentions. A comprehensive hierarchical attention interactive method was used to combine real-time user behaviors more effectively and efficiently. Experiments performed on large-scale commercial datasets revealed that DIPN significantly outperforms the baseline methods.

Dataset used in intention mining

RQ3: What types of datasets are used to detect multiple types of human intentions?

A dataset is a collection of related information constructed from social media, weblogs, user logs, and questionnaire surveys. As the dataset is a basic element of intention mining, its accuracy and efficiency matter greatly for obtaining good results from the proposed method/framework. The studies selected in this SLR were classified by dataset into six types: search engine logs, social media data, model-based generated data, questionnaire survey method, mobile usage dataset and generic datasets. Figure 5 presents the complete layout of the classified datasets included in this study.

Fig. 5 Taxonomy of approaches and techniques used in intention mining

Search engine log data

Search engine logs are collections of user queries. Such data can be used to detect user search patterns and intentions. Before applying techniques, researchers preprocess the dataset to remove errors and prepare it according to the requirements of the method/framework. In [41], an offline and online click-through rate dataset was retrieved from search engines to deduce behavioral intention. Japanese commercial search engine log data was used to detect search and web advertisement intentions. In [42], search engine click-through logs were used to detect query intent patterns. In [44], the authors described intent recommendations using two real-life datasets, MovieLens and Tmall. The articles [45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61] used commercial, personal assistant click, and view logs as datasets to predict intention with its real-time context.

In [62], search engine log data was used to determine the relevance of advertisements to user search query intention. In [64], the Yahoo! query log was used to retrieve the web results most related to the user's personalized needs. In [65], the authors used Google and Firefox log data of 440 undergraduate and graduate users who had adequate knowledge of search engines. In [102], SogouQ Chinese query log data was used to perceive user role-explicit query intent with the simplified word n-gram role model (SWNR) framework. In [104, 105], an image search dataset was used to mine the intention of disabled people using gaze cues and to study the influence of image search on human behavior. In [60], the authors presented a probabilistic method to reveal the most optimized searches resulting from an unspecified user query. Multiple users were asked to enter alternate queries with different keywords to form a dataset, which was then used for suggestions in search query intent. In [106], a user question-answer framework was proposed to deduce query intention. The Yahoo question-answer dataset was used to deduce upcoming user intentions.

Model-based generated data

Another approach to obtaining a dataset is to construct a model according to research requirements. Researchers develop a model to get scenario-based data and apply tools and techniques to it to obtain useful results. In [58], a task-based model was constructed to develop a dataset. Users were given 30 min to solve the research task over 60 million documents. The gathered dataset was used in interactive intent modeling for information discovery. In [53], the authors developed a model of multiple users to capture their online shopping behavior data and then used the dataset in a product search taxonomy to determine whether a user will purchase a product or is just passing time by browsing.

Questionnaire survey method (QSM)

A questionnaire survey is another method to form datasets for statistical analysis. In QSM, researchers design a questionnaire consisting of relevant questions and have it verified by experts. The questionnaire is distributed to the relevant people, and their feedback is recorded as the dataset for further use. In [70], Pearson’s r and Kendall’s τ were used to model purchase intention through online questionnaire surveys. In [82], the questionnaire survey method was used to collect datasets. The sample consisted of students and office workers who are active users of mobile reading. In [100], the questionnaire method was used to collect data to improve the index system's scientific credibility. In [71], 60 questionnaires were filled in by participants aged 19–60 to collect datasets used to infer strongly related context features.
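For readers unfamiliar with the two correlation measures named above, the following sketch computes Pearson's r and Kendall's τ (the tau-a variant, which assumes no tied values) on a small set of questionnaire-style scores. The numbers are hypothetical, tie-free values invented for illustration and are not taken from [70].

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical survey scores for five respondents.
satisfaction = [1, 2, 3, 4, 5]
purchase_intention = [2, 1, 4, 3, 5]

print(round(pearson_r(satisfaction, purchase_intention), 3))
print(round(kendall_tau_a(satisfaction, purchase_intention), 3))
```

Pearson's r measures linear association on the raw scores, while Kendall's τ depends only on rank order, which makes it a common choice for ordinal questionnaire items. Data with many ties would call for the tau-b variant instead.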

Another set of questionnaires was used to measure the effectiveness of the proposed model. In [72], online questionnaires were collected to model the differences between the social networking service (SNS) model and the online technologies usage model. In [73], the sample consisted of university students, their friends, and family members who used mobile PCSS. In [74], an online survey (with 402 participants) was used to detect the reasons for reposting a marketing message on social media. Facebook is one of the most used social media forums on the internet. In [20, 72, 75,76,77,78,79], online surveys were conducted to detect user intentions towards Facebook usage. The topics addressed were the use of Facebook, continuance intention towards Facebook in dangerous virtual communities, and the influence of life satisfaction on Facebook use.

In [36], the author proposed a model to measure the factors behind hiding or unfriending a Facebook contact. Data was collected from the real-time model. Studies [72, 80,81,82,83] proposed models to perceive user continuance intention towards mobile usage. The proposed models examined whether cultural values affect continuance intention to shop on mobile, use mobile data services, and adopt mobile payment services for different activities. Continuance intention to use MOOCs, continuance intention to continue communication in the Kuwait market, and continuance intention towards SoLoMo services are addressed in [84,85,86,87,88]. Datasets collected by online questionnaire surveys consisted of two parts. The first contained demographic questions about the participants, whereas the second featured questions measuring the research model's constructs [89]. Studies [90,91,92,93,94] addressed users' intention towards the use of smartwatches, behavioral intentions regarding IT services, continuance intention of pre-service teachers to use the mobile phone, and mobile gamers’ epistemic curiosity.

Generic datasets

Many other datasets are collected from real-life activities and are referred to here as generic datasets. In [11, 12], a car dealer company dataset was used to mine car purchasing intentions. In [10], the authors described intent recommendations using two real-life datasets, MovieLens and Tmall. Real-world EEG data was used in [30, 97] to predict patient intention level and human movement intention, respectively, and in [14], Japanese TV network data was used for scheduling TV advertisements.

Mobile usage dataset

The mobile phone is one of the most used devices of this era. Data generated by mobile devices through touch interaction has attracted researchers' attention as a means of detecting hidden mobile usage patterns. In [42,43,44,45,46], the authors used mobile usage patterns to understand user intent for cell phone use and online shopping on mobile, and to predict implicit user intention without any explicit input on the mobile phone. In [54], the authors used a mobile touch interaction dataset to detect relevant web search in the mobile touch environment.

Social media dataset

Social media datasets are the most robust source for detecting user intentions from daily life activities on different social media forums such as Facebook, Twitter, and e-commerce websites. This study selected many articles that used social media datasets to deduce social media intentions. The studies [10, 87, 95] investigated WikiHow and Wikipedia documents to deduce users' daily living intentions. In [88], the Yahoo! query click daily log was used to identify user intention to get precise knowledge soon after information was announced.

E-commerce is one of the robust social media services used to facilitate online shopping [46]. In [54], three datasets, Arnetminer, Patent, and Random, were used to predict interactive user intention. The study [56] used the eBay dataset to accurately recommend the product the user searched for, and also used nine million sessions of a well-known e-commerce portal to deduce purchase intents. In [57], the authors used Sina Weibo, an e-commerce product database, to detect purchase intention; [66] used the Yahoo! search log to detect search intention; [67] mined turnover intention from IT job learners’ data; and [68] used the Chinese NTCIR dataset, a spatiotemporal dataset, and biomedical microblogs to find behavioral intention. Figure 6 depicts the classification of the 109 selected studies into the proposed six types of datasets.

Fig. 6 Classification of articles by dataset

Discussion

This section summarizes the results deduced from the above discussion. Intention mining is a decade-old research field. This study investigates three main aspects of intention mining: categories, approaches and techniques, and datasets.

Proposed intention categories

Eight intention categories are proposed to classify user intentive activities: purchase intention, behavioral intention, search intention, continuance intention, human implicit intention, query intention, mobile usage, and general intention. This classification was made by reviewing each article rigorously to understand the intention type discussed. The most discussed intention category is generic intention, at 31% of the selected papers. Behavioral intentions were presented in 13% of the papers and human implicit intention in 13%, whereas search intentions accounted for 12% and continuance intention for 10%. Query intention, mobile usage intention, and purchase intention were reported in about 7, 5, and 3% of the papers, respectively.

Proposed taxonomy

Regarding approaches and techniques, this paper identified machine learning, deep learning, image processing, statistical, and heuristic techniques as those frequently used for intention mining. Machine learning is significant for extracting hidden patterns from huge datasets. Each study included in this review used whichever ML approach best fit its framework or method. The ML approaches discussed are classification, clustering, neural networks, NLP, and semi-supervised learning. The authors found that almost 70% of the included studies used machine learning approaches to detect user intentions. The technology acceptance model (TAM) is a robust statistical technique for data analysis in intention mining. Many other statistical techniques, such as regression, correlation, hidden Markov models, qualitative and quantitative analysis, and search coexistence, are used in almost 40% of the papers included in this article. Fuzzy logic techniques and image processing techniques were each used in 1% of the research articles to mine user intentions. Figure 5 presents a state-of-the-art taxonomy of approaches and techniques used in intention mining.

Proposed types of datasets

The dataset is a core element of the intention mining process. The questionnaire survey method is one of the major sources of datasets on multiple topics. This method is used to collect data for frameworks based on statistical techniques. About 38% of the included articles used online questionnaire survey methods to collect data from users as required, while 18% used search engine log data and 22% used social media datasets to extract user intention. Only 4% of the studies used model-based generated data, and 9% used mobile-based data. About 8% of the articles used generic datasets such as email composing data, EEG signal data, images, and PREVENTION datasets.

Table 6 presents the overall classification results of the selected papers against the quality assessment parameters.

Table 6 Classification and quality assessment score

Issues and challenges

Based on the reviewed literature, this article identified the following issues and challenges related to intention mining:

  • Predict the behavioral intention of software users, i.e., whether they will purchase and install the new version of the software after using its beta version.

  • Significant research can be performed to identify a chef's behavioral intentions while cooking, i.e., whether he will maintain or change the taste.

  • Many approaches identified purchase or buy intentions, but only a few discussed sell intentions; therefore, sell intentions need to be addressed.

  • Investigate the behavioral intention of property dealers, i.e., whether or not they intend to purchase or sell a property.

  • Automate the cluster assignment of TV advertisement intentions using natural language processing and deploy such a system on the TV network.

  • Facebook and Twitter data have been used from different perspectives to mine user intention. Alongside these forums, WhatsApp and Instagram have also attracted user attention, and a vast number of users rely on these apps for social connectivity. Therefore, a large amount of data is available to analyze user intention to use WhatsApp and Instagram and intention to share data (images, videos). Chat data from both forums can also be used to infer user intention.

Deep learning is a modern approach to machine learning. It is used for prediction because it removes much of the manual work of extracting features and related artifacts. However, deep learning techniques are used far less in intention mining than other machine learning techniques; therefore, researchers are encouraged to use deep learning to extract intention patterns.

Threats to validity

Recognizing threats to validity is important for making a study robust [113,114,115,116]. Three kinds of threats to validity are identified in this section.

Construct validity

In the context of an SLR, construct validity refers to the classification of the selected articles [114]. In this study, two authors identified primary and secondary search keywords. Six terms related to intention mining were used to construct search strings. The search strings were run on well-reputed digital libraries such as IEEE Xplore, ACM, ScienceDirect, and SpringerLink. We found most of the research articles related to intention mining. We searched for relevant papers in data mining and machine learning research venues to reduce the risk of missing related publications. We selected JCR journals and ranked conference articles, which indicates the good quality of the included articles.

Internal validity

Internal validity concerns the data extraction and analysis process, in which two authors worked on the classification of selected studies and the data extraction, whereas one author reviewed the results [114]. The Kappa coefficient between the two authors who worked on the collection and classification of related articles is 0.92. This Kappa value indicates a high level of agreement and confidence among the authors regarding the selected studies.
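Assuming the reported coefficient is Cohen's kappa for two raters (the text does not name the variant), it can be computed as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The sketch below applies this formula to hypothetical include/exclude decisions. It does not reproduce the 0.92 reported above, since the underlying decisions are not given.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for ten candidate articles.
author_1 = ["include"] * 5 + ["exclude"] * 5
author_2 = ["include"] * 6 + ["exclude"] * 4

print(round(cohen_kappa(author_1, author_2), 3))
```

In this toy case the raters agree on nine of ten articles (p_o = 0.9) and chance agreement is p_e = 0.5, giving kappa = 0.8. Values above roughly 0.8 are conventionally read as very strong agreement.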

Conclusion validity

Conclusion validity in an SLR relates to recognizing improper relationships that may lead to incorrect conclusions. To reduce this threat, a proper data extraction and selection process was followed, as discussed under internal validity.

Conclusions

Intention mining is a promising field of research that aims to detect the future actions of users. This article has presented a systematic literature review on intention mining by comprehensively reviewing 109 high-quality articles from well-reputed forums, selected through a systematic methodology. The primary focus of this study is to discuss intention mining in terms of its categories, approaches, techniques, and datasets. The contribution of this study is to classify user intention into eight categories: purchase intention, behavioral intention, search intention, continuance intention, implicit human intention, query intention, mobile usage, and general intention. Reviewing the included studies showed that behavioral intention is the most discussed intention type, which aims to detect human behavior to purchase a product, use Facebook, or search daily news. A comprehensive review was then performed of the approaches and techniques proposed to deduce intention from past human activities. In contrast to other studies, this SLR presented a taxonomy that maps state-of-the-art techniques such as machine learning, deep learning, image processing, and statistical techniques for intention mining. Statistical and machine learning techniques were observed to be the most frequently used for intention detection. Furthermore, the datasets used to infer user intention were discussed in detail and classified into six major categories: search engine logs, social media data, model-based generated data, questionnaire survey method, mobile usage dataset, and generic datasets. The questionnaire survey method was observed to be a major source of data collection. Finally, promising future directions have been discussed for researchers working in the domain of intention mining.
This study is an effort to gather intention mining knowledge on intention categories, techniques, approaches, and datasets.