
-
High-Order Structure Exploration on Massive Graphs: A Local Graph Clustering Perspective ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2021-01-09 Dawei Zhou; Si Zhang; Mehmet Yigit Yildirim; Scott Alcorn; Hanghang Tong; Hasan Davulcu; Jingrui He
Modeling and exploring high-order connectivity patterns, also called network motifs, are essential for understanding the fundamental structures that control and mediate the behavior of many complex systems. For example, in social networks, triangles have been proven to play the fundamental role in understanding social network communities; in online transaction networks, detecting directed looped transactions
-
The 8M Algorithm from Today’s Perspective ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2021-01-09 Radim Belohlavek; Martin Trnecka
We provide a detailed analysis and a first complete description of 8M—an old but virtually unknown algorithm for Boolean matrix factorization. Even though the algorithm uses a rather limited insight into the factorization problem from today’s perspective, we demonstrate that its performance is reasonably good compared to the currently available algorithms. Our analysis reveals that this is due to certain
-
Knowledge Graph Embedding for Link Prediction: A Comparative Analysis ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2021-01-04 Andrea Rossi; Denilson Barbosa; Donatella Firmani; Antonio Matinata; Paolo Merialdo
Knowledge Graphs (KGs) have found many applications in industrial and in academic settings, which in turn, have motivated considerable research efforts towards large-scale information extraction from a variety of sources. Despite such efforts, it is well known that even the largest KGs suffer from incompleteness; Link Prediction (LP) techniques address this issue by identifying missing facts among
-
Identifying Linear Models in Multi-Resolution Population Data Using Minimum Description Length Principle to Predict Household Income ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2021-01-04 Chainarong Amornbunchornvej; Navaporn Surasvadi; Anon Plangprasopchok; Suttipong Thajchayapong
One shirt size cannot fit everybody, while we cannot make a unique shirt that fits perfectly for everyone because of resource limitations. This analogy is true for policy making as well. Policy makers cannot make a single policy to solve all problems for all regions because each region has its own unique issue. At the other extreme, policy makers also cannot make a policy for each small village due
-
Recommending Statutes: A Portable Method Based on Neural Networks ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2021-01-04 Yi Feng; Chuanyi Li; Jidong Ge; Bin Luo; Vincent Ng
Legal judgment prediction, which aims at predicting judgment results such as penalty, charges, and statutes for cases, has attracted much attention recently. In this article, we focus on building a recommender system to predict the associated statutes for a case given the facts of the case as input. For this purpose, we propose a two-step neural network-based machine learning framework to assist judges
-
HARP: A Novel Hierarchical Attention Model for Relation Prediction ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2021-01-04 Yashen Wang; Huanhuan Zhang
Recent years have witnessed great advancement of representation learning (RL)-based models for the knowledge graph relation prediction task. However, they generally rely on structure information embedded in the encyclopedic knowledge graph, while the beneficial semantic information provided by the lexical knowledge graph is ignored, leading to the problem of shallow understanding and coarse-grained analysis
-
An Exponential Factorization Machine with Percentage Error Minimization to Retail Sales Forecasting ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2021-01-04 Chongshou Li; Brenda Cheang; Zhixing Luo; Andrew Lim
This article proposes a new approach to sales forecasting for new products (stock-keeping units [SKUs]) with long lead time but short product life cycle. These SKUs are usually sold for one season only, without any replenishments. An exponential factorization machine (EFM) sales forecast model is developed to solve this problem which not only takes into account SKU attributes, but also pairwise interactions
-
Core Interest Network for Click-Through Rate Prediction ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2021-01-04 En Xu; Zhiwen Yu; Bin Guo; Helei Cui
In modern online advertising systems, the click-through rate (CTR) is an important index to measure the popularity of an item. It refers to the ratio of users who click on a specific advertisement to the number of total users who view it. Predicting the CTR of an item in advance can improve the accuracy of the advertisement recommendation. And it is commonly calculated based on users’ interests. Thus
-
Context-Based Evaluation of Dimensionality Reduction Algorithms—Experiments and Statistical Significance Analysis ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2021-01-04 Aindrila Ghosh; Mona Nashaat; James Miller; Shaikh Quader
Dimensionality reduction is a commonly used technique in data analytics. Reducing the dimensionality of datasets helps not only with managing their analytical complexity but also with removing redundancy. Over the years, several such algorithms have been proposed with their aims ranging from generating simple linear projections to complex non-linear transformations of the input data. Subsequently,
-
Unbiased Measurement of Feature Importance in Tree-Based Methods ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2021-01-04 Zhengze Zhou; Giles Hooker
We propose a modification that corrects for split-improvement variable importance measures in Random Forests and other tree-based methods. These methods have been shown to be biased towards increasing the importance of features with more potential splits. We show that by appropriately incorporating split-improvement as measured on out of sample data, this bias can be corrected yielding better summaries
-
Robust Tensor Recovery with Fiber Outliers for Traffic Events ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-12-29 Yue Hu; Daniel B. Work
Event detection is gaining increasing attention in smart cities research. Large-scale mobility data serves as an important tool to uncover the dynamics of urban transportation systems, and more often than not the dataset is incomplete. In this article, we develop a method to detect extreme events in large traffic datasets, and to impute missing data during regular conditions. Specifically, we propose
-
Hierarchical Physician Recommendation via Diversity-enhanced Matrix Factorization ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-12-07 Hao Wang; Shuai Ding; Yeqing Li; Xiaojian Li; Youtao Zhang
Recent studies have shown that medical resource allocation across public hospitals is significantly imbalanced. Patients, regardless of their diseases, tend to choose hospitals and physicians with a better reputation, which often overloads major hospitals while leaving others underutilized. Guiding patients to hospitals that can serve their treatment needs both timely and with good quality
-
Span-core Decomposition for Temporal Networks: Algorithms and Applications ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-12-07 Edoardo Galimberti; Martino Ciaperoni; Alain Barrat; Francesco Bonchi; Ciro Cattuto; Francesco Gullo
When analyzing temporal networks, a fundamental task is the identification of dense structures (i.e., groups of vertices that exhibit a large number of links), together with their temporal span (i.e., the period of time for which the high density holds). In this article, we tackle this task by introducing a notion of temporal core decomposition where each core is associated with two quantities, its
-
Dynamic Graph Mining for Multi-weight Multi-destination Route Planning with Deadlines Constraints ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-12-07 Yu Huang; Josh Jia-Ching Ying; Philip S. Yu; Vincent S. Tseng
Route planning that satisfies multiple requests is an emerging branch in the route planning field and has attracted significant attention from the research community in recent years. The prevailing studies focus only on seeking a route by minimizing a single kind of Travel Cost, such as trip time or distance, among others. In reality, most users would like to choose an appropriate route, neither fastest
-
Class Imbalance and Cost-Sensitive Decision Trees: A Unified Survey Based on a Core Similarity ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-12-07 Michael J. Siers; Md Zahidul Islam
Class imbalance treatment methods and cost-sensitive classification algorithms are typically treated as two independent research areas. However, many of these techniques have properties in common. After providing a background to the two fields of research, this article identifies the fundamental mechanism which is common to both. Using this mechanism, a taxonomy is created which encompasses approaches
-
Multi-Stage Network Embedding for Exploring Heterogeneous Edges ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-12-07 Hong Huang; Yu Song; Fanghua Ye; Xing Xie; Xuanhua Shi; Hai Jin
The relationships between objects in a network are typically diverse and complex, leading to the heterogeneous edges with different semantic information. In this article, we focus on exploring the heterogeneous edges for network representation learning. By considering each relationship as a view that depicts a specific type of proximity between nodes, we propose a multi-stage non-negative matrix factorization
-
Automatic Recommendation of a Distance Measure for Clustering Algorithms ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-12-07 Xiaoyan Zhu; Yingbin Li; Jiayin Wang; Tian Zheng; Jingwen Fu
With a large number of distance measures, the appropriate choice for clustering a given data set with a specified clustering algorithm becomes an important problem. In this article, an automatic distance measure recommendation method for clustering algorithms is proposed. The recommendation method consists of the following steps: (1) metadata extraction, including meta-feature collection and meta-target
-
Combinatorial Algorithms for String Sanitization ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-12-07 Giulia Bernardini; Huiping Chen; Alessio Conte; Roberto Grossi; Grigorios Loukides; Nadia Pisanti; Solon P. Pissis; Giovanna Rosone; Michelle Sweering
String data are often disseminated to support applications such as location-based service provision or DNA sequence analysis. This dissemination, however, may expose sensitive patterns that model confidential knowledge (e.g., trips to mental health clinics from a string representing a user’s location history). In this article, we consider the problem of sanitizing a string by concealing the occurrences
-
Heterogeneous Graphlets ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-12-07 Ryan A. Rossi; Nesreen K. Ahmed; Aldo Carranza; David Arbour; Anup Rao; Sungchul Kim; Eunyee Koh
In this article, we introduce a generalization of graphlets to heterogeneous networks called typed graphlets. Informally, typed graphlets are small typed induced subgraphs. Typed graphlets generalize graphlets to rich heterogeneous networks as they explicitly capture the higher-order typed connectivity patterns in such networks. To address this problem, we describe a general framework for counting
-
Accelerating Large-Scale Heterogeneous Interaction Graph Embedding Learning via Importance Sampling ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-12-07 Yugang Ji; Mingyang Yin; Hongxia Yang; Jingren Zhou; Vincent W. Zheng; Chuan Shi; Yuan Fang
In real-world problems, heterogeneous entities are often related to each other through multiple interactions, forming a Heterogeneous Interaction Graph (HIG). While modeling HIGs to deal with fundamental tasks, graph neural networks present an attractive opportunity that can make full use of the heterogeneity and rich semantic information by aggregating and propagating information from different types
-
MeSHProbeNet-P: Improving Large-scale MeSH Indexing with Personalizable MeSH Probes ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-12-07 Guangxu Xun; Kishlay Jha; Aidong Zhang
Indexing biomedical research articles with Medical Subject Headings (MeSH) can greatly facilitate biomedical research and information retrieval. Currently MeSH indexing is performed by human experts. To alleviate the time consumption and monetary cost caused by manual indexing, many automatic MeSH indexing models have been developed, such as MeSHProbeNet, DeepMeSH, and NLM’s official model Medical
-
CrowdWT: Crowdsourcing via Joint Modeling of Workers and Tasks ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-12-07 Jinzheng Tu; Guoxian Yu; Jun Wang; Carlotta Domeniconi; Maozu Guo; Xiangliang Zhang
Crowdsourcing is a relatively inexpensive and efficient mechanism to collect annotations of data from the open Internet. Crowdsourcing workers are paid for the provided annotations, but the task requester usually has a limited budget. It is desirable to wisely assign the appropriate task to the right workers, so the overall annotation quality is maximized while the cost is reduced. In this article
-
A Reduced Network Traffic Method for IoT Data Clustering ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-12-07 Ricardo De Azevedo; Gabriel Resende Machado; Ronaldo Ribeiro Goldschmidt; Ricardo Choren
Internet of Things (IoT) systems usually involve interconnected, low processing capacity, and low memory sensor nodes (devices) that collect data in several sorts of applications that interconnect people and things. In this scenario, mining tasks, such as clustering, have been commonly deployed to detect behavioral patterns from the collected data. The centralized clustering of IoT data demands high
-
Efficient Outlier Detection in Text Corpus Using Rare Frequency and Ranking ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-10-02 Wathsala Anupama Mohotti; Richi Nayak
Outlier detection in text data collections has become significant due to the need of finding anomalies in the myriad of text data sources. High feature dimensionality, together with the larger size of these document collections, presents a need for developing accurate outlier detection methods with high efficiency. Traditional outlier detection methods face several challenges including data sparseness
-
Boosting Item-based Collaborative Filtering via Nearly Uncoupled Random Walks ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-09-28 Athanasios N. Nikolakopoulos; George Karypis
Item-based models are among the most popular collaborative filtering approaches for building recommender systems. Random walks can provide a powerful tool for harvesting the rich network of interactions captured within these models. They can exploit indirect relations between the items, mitigate the effects of sparsity, ensure wider itemspace coverage, as well as increase the diversity of recommendation
-
NGUARD+: An Attention-based Game Bot Detection Framework via Player Behavior Sequences ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-09-28 Jiarong Xu; Yifan Luo; Jianrong Tao; Changjie Fan; Zhou Zhao; Jiangang Lu
Game bots are automated programs that assist cheating users, leading to an imbalance in the game ecosystem and the collapse of user interest. Online games provide immersive gaming experience and attract many loyal fans. However, game bots have proliferated in volume and method, evolving with the real-world detection methods and showing strong diversity, leaving game bot detection efforts extremely
-
Influence Maximization: Seeding Based on Community Structure ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-09-28 Jianxiong Guo; Weili Wu
The influence maximization problem attempts to find a small subset of nodes in a social network that maximizes the expected influence, and it has been researched intensively. Most of the existing literature focuses only on maximizing total influence but ignores whether the influence distribution is balanced across the network. Even though the total influence is maximized, but gathered
-
Exploiting User Preference and Mobile Peer Influence for Human Mobility Annotation ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-09-28 Renjun Hu; Yanchi Liu; Yanyan Li; Jingbo Zhou; Shuai Ma; Hui Xiong
Human mobility annotation aims to assign mobility records to their corresponding visited Points-of-Interest (POIs). It is one of the most fundamental problems for understanding human mobile behaviors. In the literature, many efforts have been devoted to annotating mobility records in a pointwise or trajectory-wise manner. However, the user preference factor is not fully explored and, worse still, the mobile
-
Heterogeneous Univariate Outlier Ensembles in Multidimensional Data ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-09-28 Guansong Pang; Longbing Cao
In outlier detection, recent major research has shifted from developing univariate methods to multivariate methods due to the rapid growth of multidimensional data. However, one typical issue of this paradigm shift is that multidimensional data often mainly contain univariate outliers, in which many features are actually irrelevant. In such cases, multivariate methods are ineffective in identifying
-
Probabilistic Modeling for Frequency Vectors Using a Flexible Shifted-Scaled Dirichlet Distribution Prior ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-09-28 Nuha Zamzami; Nizar Bouguila
Burstiness and overdispersion phenomena of count vectors pose significant challenges in modeling such data accurately. While the dependency assumption of the multinomial distribution causes its failure to model frequency vectors in several machine learning and data mining applications, researchers found that by extending the multinomial distribution to the Dirichlet Compound multinomial (DCM), both
-
An Approach For Concept Drift Detection in a Graph Stream Using Discriminative Subgraphs ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-09-28 Ramesh Paudel; William Eberle
The emergence of mining complex networks like social media, sensor networks, and the world-wide-web has attracted considerable research interest. In a streaming scenario, the concept to be learned can change over time. However, while there has been some research done for detecting concept drift in traditional data streams, little work has been done on addressing concept drift in data represented as
-
Time-Warped Sparse Non-negative Factorization for Functional Data Analysis ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-09-28 Chen Zhang; Steven C. H. Hoi; Fugee Tsung
This article proposes a novel time-warped sparse non-negative factorization method for functional data analysis. The proposed method on the one hand guarantees the extracted basis functions and their coefficients to be positive and interpretable, and on the other hand is able to handle weakly correlated functions with different features. Furthermore, the method incorporates time warping into factorization
-
Scalable Spatial Scan Statistics for Trajectories ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-09-28 Michael Matheny; Dong Xie; Jeff M. Phillips
We define several new models for how to define anomalous regions among enormous sets of trajectories. These are based on spatial scan statistics, and identify a geometric region which captures a subset of trajectories which are significantly different in a measured characteristic from the background population. The model definition depends on how much a geometric region is contributed to by some overlapping
-
Bi-Directional Recurrent Attentional Topic Model ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-09-28 Shuangyin Li; Yu Zhang; Rong Pan
In a document, the topic distribution of a sentence depends on both the topics of its neighbored sentences and its own content, and it is usually affected by the topics of the neighbored sentences with different weights. The neighbored sentences of a sentence include the preceding sentences and the subsequent sentences. Meanwhile, it is natural that a document can be treated as a sequence of sentences
-
Robust Adaptive Linear Discriminant Analysis with Bidirectional Reconstruction Constraint ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-09-28 Jipeng Guo; Yanfeng Sun; Junbin Gao; Yongli Hu; Baocai Yin
Linear discriminant analysis (LDA) is a well-known supervised method for dimensionality reduction in which the global structure of data can be preserved. The classical LDA is sensitive to noise, and the projection direction of LDA cannot preserve the main energy. This article proposes a novel feature extraction model with an ℓ2,1-norm constraint based on LDA, termed RALDA. This model preserves
-
Large-scale Data Exploration Using Explanatory Regression Functions ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-09-28 Fotis Savva; Christos Anagnostopoulos; Peter Triantafillou; Kostas Kolomvatsos
Analysts wishing to explore multivariate data spaces, typically issue queries involving selection operators, i.e., range or equality predicates, which define data subspaces of potential interest. Then, they use aggregation functions, the results of which determine a subspace’s interestingness for further exploration and deeper analysis. However, Aggregate Query (AQ) results are scalars and convey limited
-
REMIAN: Real-Time and Error-Tolerant Missing Value Imputation ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-09-28 Qian Ma; Yu Gu; Wang-Chien Lee; Ge Yu; Hongbo Liu; Xindong Wu
Missing value (MV) imputation is a critical preprocessing means for data mining. Nevertheless, existing MV imputation methods are mostly designed for batch processing, and thus are not applicable to streaming data, especially those with poor quality. In this article, we propose a framework, called Real-time and Error-tolerant Missing vAlue ImputatioN (REMAIN), to impute MVs in poor-quality streaming
-
Introduction to the Special Issue on the Best Papers from KDD 2018 ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-08-17 Hui Xiong; Chih-Jen Lin
No abstract available.
-
Towards an Optimal Outdoor Advertising Placement: When a Budget Constraint Meets Moving Trajectories ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-07-06 Ping Zhang; Zhifeng Bao; Yuchen Li; Guoliang Li; Yipeng Zhang; Zhiyong Peng
In this article, we propose and study the problem of trajectory-driven influential billboard placement: given a set of billboards U (each with a location and a cost), a database of trajectories T, and a budget L, we find a set of billboards within the budget to influence the largest number of trajectories. One core challenge is to identify and reduce the overlap of the influence from different billboards
-
Multi-User Mobile Sequential Recommendation for Route Optimization ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-07-06 Keli Xiao; Zeyang Ye; Lihao Zhang; Wenjun Zhou; Yong Ge; Yuefan Deng
We enhance the mobile sequential recommendation (MSR) model and address some critical issues in existing formulations by proposing three new forms of the MSR from a multi-user perspective. The multi-user MSR (MMSR) model searches optimal routes for multiple drivers at different locations while disallowing overlapping routes to be recommended. To enrich the properties of pick-up points in the problem
-
Learning Distance Metrics from Probabilistic Information ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-07-06 Mengdi Huai; Chenglin Miao; Yaliang Li; Qiuling Suo; Lu Su; Aidong Zhang
The goal of metric learning is to learn a good distance metric that can capture the relationships among instances, and its importance has long been recognized in many fields. An implicit assumption in the traditional settings of metric learning is that the associated labels of the instances are deterministic. However, in many real-world applications, the associated labels come naturally with probabilities
-
Pop Music Generation: From Melody to Multi-style Arrangement ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-07-06 Hongyuan Zhu; Qi Liu; Nicholas Jing Yuan; Kun Zhang; Guang Zhou; Enhong Chen
Music plays an important role in our daily life. With the development of deep learning and modern generation techniques, researchers have done plenty of work on automatic music generation. However, due to the special requirements of both melody and arrangement, most of these methods have limitations when applied to multi-track music generation. Some critical factors related to the quality of music
-
Non-Redundant Subspace Clusterings with Nr-Kmeans and Nr-DipMeans ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-06-19 Dominik Mautz; Wei Ye; Claudia Plant; Christian Böhm
A huge object collection in high-dimensional space can often be clustered in more than one way, for instance, objects could be clustered by their shape or alternatively by their color. Each grouping represents a different view of the dataset. The new research field of non-redundant clustering addresses this class of problems. In this article, we follow the approach that different, non-redundant k-means-like
-
MiSoSouP: Mining Interesting Subgroups with Sampling and Pseudodimension ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-06-19 Matteo Riondato; Fabio Vandin
We present MiSoSouP, a suite of algorithms for extracting high-quality approximations of the most interesting subgroups, according to different popular interestingness measures, from a random sample of a transactional dataset. We describe a new formulation of these measures as functions of averages, that makes it possible to approximate them using sampling. We then discuss how pseudodimension, a key
-
Adversarial Attacks on Graph Neural Networks: Perturbations and their Patterns ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-06-19 Daniel Zügner; Oliver Borchert; Amir Akbarnejad; Stephan Günnemann
Deep learning models for graphs have achieved strong performance for the task of node classification. Despite their proliferation, little is known about their robustness to adversarial attacks. Yet, in domains where they are likely to be used, e.g., the web, adversaries are common. Can deep learning models for graphs be easily fooled? In this work, we present a study of adversarial attacks on attributed
-
Efficient Approaches to k Representative G-Skyline Queries ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-07-06 Xu Zhou; Kenli Li; Zhibang Yang; Yunjun Gao; Keqin Li
The G-Skyline (GSky) query is a powerful tool to analyze optimal groups in decision support. Compared with other group skyline queries, it frees users from providing an aggregate function. Besides, it can return much more comprehensive results without overlooking some important results containing non-skylines. However, it is hard for the users to make sensible choices when facing so many results the GSky
-
A Unified Framework for Sparse Online Learning ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-08-17 Peilin Zhao; Dayong Wang; Pengcheng Wu; Steven C. H. Hoi
The amount of data in our society has been exploding in the era of big data. This article aims to address several open challenges in big data stream classification. Many existing studies in data mining literature follow the batch learning setting, which suffers from low efficiency and poor scalability. To tackle these challenges, we investigate a unified online learning framework for the big data stream
-
A General Coreset-Based Approach to Diversity Maximization under Matroid Constraints ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-08-05 Matteo Ceccarello; Andrea Pietracaprina; Geppino Pucci
Diversity maximization is a fundamental problem in web search and data mining. For a given dataset S of n elements, the problem requires determining a subset of S containing k≪n “representatives” which maximize some diversity function expressed in terms of pairwise distances, where distance models dissimilarity. An important variant of the problem prescribes that the solution satisfy an additional
-
End-to-End Continual Rare-Class Recognition with Emerging Novel Subclasses ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-08-05 Hung Nguyen; Xuejian Wang; Leman Akoglu
Given a labeled dataset that contains a rare (or minority) class containing of-interest instances, as well as a large class of instances that are not of interest, how can we learn to recognize future of-interest instances over a continuous stream? The setting is different from traditional classification in that instances from novel minority subclasses might continually emerge over time—and hence is
-
Efficient Mining of Outlying Sequence Patterns for Analyzing Outlierness of Sequence Data ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-08-05 Tingting Wang; Lei Duan; Guozhu Dong; Zhifeng Bao
Recently, a lot of research work has been proposed in different domains to detect outliers and analyze the outlierness of outliers for relational data. However, while sequence data is ubiquitous in real life, analyzing the outlierness for sequence data has not received enough attention. In this article, we study the problem of mining outlying sequence patterns in sequence data addressing the question:
-
On Proximity and Structural Role-based Embeddings in Networks: Misconceptions, Techniques, and Applications ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-08-17 Ryan A. Rossi; Di Jin; Sungchul Kim; Nesreen K. Ahmed; Danai Koutra; John Boaz Lee
Structural roles define sets of structurally similar nodes that are more similar to nodes inside the set than outside, whereas communities define sets of nodes with more connections inside the set than outside. Roles based on structural similarity and communities based on proximity are fundamentally different but important complementary notions. Recently, the notion of structural roles has become increasingly
-
Learning Bayesian Networks with the Saiyan Algorithm ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-06-22 Anthony C. Constantinou
Some structure learning algorithms have proven to be effective in reconstructing hypothetical Bayesian Network graphs from synthetic data. However, in their mission to maximise a scoring function, many become conservative and minimise edges discovered. While simplicity is desired, the output is often a graph that consists of multiple independent subgraphs that do not enable full propagation of evidence
-
Sparse Graph Connectivity for Image Segmentation ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-06-15 Xiaofeng Zhu; Shichao Zhang; Jilian Zhang; Yonggang Li; Guangquan Lu; Yang Yang
It has been demonstrated that the segmentation performance is highly dependent on both subspace preservation and graph connectivity. In the literature, the full connectivity method linearly represents each data point (e.g., a pixel in one image) by all data points for achieving subspace preservation, while the sparse connectivity method was designed to linearly represent each data point by a set of
-
Internal Evaluation of Unsupervised Outlier Detection ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-06-26 Henrique O. Marques; Ricardo J. G. B. Campello; Jürg Sander; Arthur Zimek
Although there is a large and growing literature that tackles the unsupervised outlier detection problem, the unsupervised evaluation of outlier detection results is still virtually untouched in the literature. The so-called internal evaluation, based solely on the data and the assessed solutions themselves, is required if one wants to statistically validate (in absolute terms) or just compare (in
-
Self-weighted Multi-view Fuzzy Clustering ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-06-22 Xiaofeng Zhu; Shichao Zhang; Yonghua Zhu; Wei Zheng; Yang Yang
In multi-view learning, the data in each view may contain distinct information that differs from other views as well as information common to all views, so many multi-view clustering methods have been designed to use this information (including the distinct information for each view and the common information for all views) to improve the clustering performance. However, previous multi-view clustering
-
Discovering Anomalies by Incorporating Feedback from an Expert ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-06-22 Shubhomoy Das; Weng-Keen Wong; Thomas Dietterich; Alan Fern; Andrew Emmott
Unsupervised anomaly detection algorithms search for outliers and then predict that these outliers are the anomalies. When deployed, however, these algorithms are often criticized for high false-positive and high false-negative rates. One main cause of poor performance is that not all outliers are anomalies and not all anomalies are outliers. In this article, we describe the Active Anomaly Discovery
-
Neural Serendipity Recommendation: Exploring the Balance between Accuracy and Novelty with Sparse Explicit Feedback ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-06-15 Yuanbo Xu; Yongjian Yang; En Wang; Jiayu Han; Fuzhen Zhuang; Zhiwen Yu; Hui Xiong
Recommender systems have been playing an important role in providing personalized information to users. However, there is always a trade-off between accuracy and novelty in recommender systems. Usually, many users are suffering from redundant or inaccurate recommendation results. To this end, in this article, we put efforts into exploring the hidden knowledge of observed ratings to alleviate this recommendation
-
Incomplete Network Alignment ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-05-30 Si Zhang; Hanghang Tong; Jie Tang; Jiejun Xu; Wei Fan
Networks are prevalent in many areas and are often collected from multiple sources. However, due to the veracity characteristics, more often than not, networks are incomplete. Network alignment and network completion have become two fundamental cornerstones behind a wealth of high-impact graph mining applications. State-of-the-art methods have been addressing these two tasks in parallel. That is, most
-
Fully Dynamic Approximate k-Core Decomposition in Hypergraphs ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-05-30 Bintao Sun; T.-H. Hubert Chan; Mauro Sozio
In this article, we design algorithms to maintain approximate core values in dynamic hypergraphs. This notion has been well studied for normal graphs in both static and dynamic settings. We generalize the problem to hypergraphs when edges can be inserted or deleted by an adversary.
-
Efficient Nonnegative Tensor Factorization via Saturating Coordinate Descent ACM Trans. Knowl. Discov. Data (IF 2.01) Pub Date : 2020-05-30 Thirunavukarasu Balasubramaniam; Richi Nayak; Chau Yuen
With the advancements in computing technology and web-based applications, data are increasingly generated in multi-dimensional form. These data are usually sparse due to the presence of a large number of users and fewer user interactions. To deal with this, the Nonnegative Tensor Factorization (NTF) based methods have been widely used. However, existing factorization algorithms are not suitable to process