Current journal: Data Mining and Knowledge Discovery
  • Predictive modeling of infant mortality
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2021-01-18
    Antonia Saravanou, Clemens Noelke, Nicholas Huntington, Dolores Acevedo-Garcia, Dimitrios Gunopulos

    The Infant Mortality Rate (IMR) is defined as the number of infants per thousand that do not survive until their first birthday. IMR is an important metric not only because it provides information about infant births in an area but also because it measures general societal health status. In the United States of America, the IMR is higher than in many other developed countries, despite the high

    Updated: 2021-01-19
  • A framework for deep constrained clustering
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2021-01-17
    Hongjing Zhang, Tianyang Zhan, Sugato Basu, Ian Davidson

    The area of constrained clustering has been extensively explored by researchers and used by practitioners. Constrained clustering formulations exist for popular algorithms such as k-means, mixture models, and spectral clustering but have several limitations. A fundamental strength of deep learning is its flexibility, and here we explore a deep learning framework for constrained clustering and in particular

    Updated: 2021-01-18
  • User preference and embedding learning with implicit feedback for recommender systems
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2021-01-16
    Sumit Sidana, Mikhail Trofimov, Oleh Horodnytskyi, Charlotte Laclau, Yury Maximov, Massih-Reza Amini

    In this paper, we propose a novel ranking framework for collaborative filtering with the overall aim of learning user preferences over items by minimizing a pairwise ranking loss. We show the minimization problem involves dependent random variables and provide a theoretical analysis by proving the consistency of the empirical risk minimization in the worst case where all users choose a minimal number

    Updated: 2021-01-18
  • Social explorative attention based recommendation for content distribution platforms
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2021-01-07
    Wenyi Xiao, Huan Zhao, Haojie Pan, Yangqiu Song, Vincent W. Zheng, Qiang Yang

    In modern social media platforms, an effective content recommendation should benefit both creators to bring genuine benefits to them and consumers to help them get really interesting content. To address the limitations of existing methods for social recommendation, we propose Social Explorative Attention Network (SEAN), a social recommendation framework that uses a personalized content recommendation

    Updated: 2021-01-07
  • Feature extraction from unequal length heterogeneous EHR time series via dynamic time warping and tensor decomposition
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2021-01-04
    Chi Zhang, Hadi Fanaee-T, Magne Thoresen

    Electronic Health Records (EHR) data is routinely generated patient data that can provide useful information for analytical tasks such as disease detection and clinical event prediction. However, temporal EHR data such as physiological vital signs and lab test results are particularly challenging. Temporal EHR features typically have different sampling frequencies; such examples include heart rate

    Updated: 2021-01-04
  • Detecting singleton spams in reviews via learning deep anomalous temporal aspect-sentiment patterns
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2021-01-02
    Yassien Shaalan, Xiuzhen Zhang, Jeffrey Chan, Mahsa Salehi

    Customer reviews are an essential source of information to consumers. Meanwhile, opinion spams spread widely and the detection of spam reviews becomes critically important for ensuring the integrity of the ecosystem of online reviews. Singleton spam reviews (one-time reviews) have spread widely of late as spammers can create multiple accounts to purposefully cheat the system. Most available techniques

    Updated: 2021-01-03
  • The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-12-18
    Alejandro Pasos Ruiz, Michael Flynn, James Large, Matthew Middlehurst, Anthony Bagnall

    Time Series Classification (TSC) involves building predictive models for a discrete target variable from ordered, real-valued attributes. Over recent years, a new set of TSC algorithms has been developed that has significantly improved on the previous state of the art. The main focus has been on univariate TSC, i.e. the problem where each case has a single series and a class label. In reality

    Updated: 2020-12-18
  • Variational auto-encoder based Bayesian Poisson tensor factorization for sparse and imbalanced count data
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-12-10
    Yuan Jin, Ming Liu, Yunfeng Li, Ruohua Xu, Lan Du, Longxiang Gao, Yong Xiang

    Non-negative tensor factorization models enable predictive analysis on count data. Among them, Bayesian Poisson–Gamma models can derive full posterior distributions of latent factors and are less sensitive to sparse count data. However, current inference methods for these Bayesian models adopt restricted update rules for the posterior parameters. They also fail to share the update information to better

    Updated: 2020-12-10
  • SMILE: a feature-based temporal abstraction framework for event-interval sequence classification
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-11-23
    Jonathan Rebane, Isak Karlsson, Leon Bornemann, Panagiotis Papapetrou

    In this paper, we study the problem of classification of sequences of temporal intervals. Our main contribution is a novel framework, which we call SMILE, for extracting relevant features from interval sequences to construct classifiers. SMILE introduces the notion of utilizing random temporal abstraction features, which we define as e-lets, as a means to capture information pertaining to class-discriminatory

    Updated: 2020-11-23
  • A survey of deep network techniques all classifiers can adopt
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-11-17
    Alireza Ghods, Diane J. Cook

    Deep neural networks (DNNs) have introduced novel and useful tools to the machine learning community. Other types of classifiers can potentially make use of these tools as well to improve their performance and generality. This paper reviews the current state of the art for deep learning classifier technologies that are being used outside of deep neural networks. Non-neural network classifiers can employ

    Updated: 2020-11-18
  • Mining explainable local and global subgraph patterns with surprising densities
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-11-10
    Junning Deng, Bo Kang, Jefrey Lijffijt, Tijl De Bie

    The connectivity structure of graphs is typically related to the attributes of the vertices. In social networks for example, the probability of a friendship between any pair of people depends on a range of attributes, such as their age, residence location, workplace, and hobbies. The high-level structure of a graph can thus possibly be described well by means of patterns of the form ‘the subgroup of

    Updated: 2020-11-12
  • Natural language techniques supporting decision modelers
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-11-06
    Leticia Arco, Gonzalo Nápoles, Frank Vanhoenshoven, Ana Laura Lara, Gladys Casas, Koen Vanhoof

    Decision Model and Notation (DMN) has become a relevant topic for organizations since it allows users to control their processes and organizational decisions. The increasing use of DMN decision tables to capture critical business knowledge raises the need for supporting analysis tasks such as the extraction of inputs, outputs and their relations from natural language descriptions. In this paper, we

    Updated: 2020-11-06
  • An exemplar-based clustering using efficient variational message passing
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-10-28
    Mohamed Hamza Ibrahim, Rokia Missaoui

    Clustering is a crucial step in scientific data analysis and engineering systems. Thus, an efficient cluster analysis method often remains a key challenge. In this paper, we introduce a general purpose exemplar-based clustering method called MEGA, which performs a novel message-passing strategy based on variational expectation–maximization and generalized arc-consistency techniques. Unlike message

    Updated: 2020-10-30
  • A survey of community detection methods in multilayer networks
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-10-13
    Xinyu Huang, Dongming Chen, Tao Ren, Dongqi Wang

    Community detection is one of the most popular research topics in a variety of complex systems, ranging from biology to sociology. In recent years, there has been an increasing focus on the rapid development of more complicated networks, namely multilayer networks. Communities in a single-layer network are groups of nodes that are more strongly connected among themselves than to the others, while in multilayer networks

    Updated: 2020-10-13
  • The network-untangling problem: from interactions to activity timelines
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-10-03
    Polina Rozenshtein, Nikolaj Tatti, Aristides Gionis

    In this paper we study a problem of determining when entities are active based on their interactions with each other. We consider a set of entities V and a sequence of time-stamped edges E among the entities. Each edge \((u,v,t)\in E\) denotes an interaction between entities u and v at time t. We assume an activity model where each entity is active during at most k time intervals. An interaction (u

    Updated: 2020-10-04
  • For real: a thorough look at numeric attributes in subgroup discovery
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-09-21
    Marvin Meeng, Arno Knobbe

    Subgroup discovery (SD) is an exploratory pattern mining paradigm that comes into its own when dealing with large real-world data, which typically involves many attributes, of a mixture of data types. Essential is the ability to deal with numeric attributes, whether they concern the target (a regression setting) or the description attributes (by which subgroups are identified). Various specific algorithms

    Updated: 2020-09-22
  • Recency-based sequential pattern mining in multiple event sequences
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-09-20
    Hakkyu Kim, Dong-Wan Choi

    The standard sequential pattern mining scheme hardly considers the positions of events in a sequence, and therefore it is difficult to focus on more interesting patterns that represent better the causal relationships between events. Without quantifying how close two events are in a sequence, we may fail to evaluate how likely an event is caused by the others from the pattern, which is a severe drawback

    Updated: 2020-09-20
  • Online summarization of dynamic graphs using subjective interestingness for sequential data
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-09-09
    Sarang Kapoor, Dhish Kumar Saxena, Matthijs van Leeuwen

    Many real-world phenomena can be represented as dynamic graphs, i.e., networks that change over time. The problem of dynamic graph summarization, i.e., to succinctly describe the evolution of a dynamic graph, has been widely studied. Existing methods typically use objective measures to find fixed structures such as cliques, stars, and cores. Most of the methods, however, do not consider the problem

    Updated: 2020-09-10
  • DeepTable: a permutation invariant neural network for table orientation classification
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-09-08
    Maryam Habibi, Johannes Starlinger, Ulf Leser

    Tables are a common way to present information in an intuitive and concise manner. They are used extensively in media such as scientific articles or web pages. Automatically analyzing the content of tables bears special challenges. One of the most basic tasks is determination of the orientation of a table: In column tables, columns represent one entity with the different attribute values present in

    Updated: 2020-09-08
  • InceptionTime: Finding AlexNet for time series classification
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-09-07
    Hassan Ismail Fawaz, Benjamin Lucas, Germain Forestier, Charlotte Pelletier, Daniel F. Schmidt, Jonathan Weber, Geoffrey I. Webb, Lhassane Idoumghar, Pierre-Alain Muller, François Petitjean

    This paper brings deep learning to the forefront of research into time series classification (TSC). TSC is the area of machine learning tasked with the categorization (or labelling) of time series. The last few decades of work in this area have led to significant progress in the accuracy of classifiers, with the state of the art now represented by the HIVE-COTE algorithm. While extremely accurate,

    Updated: 2020-09-08
  • CrawlSN: community-aware data acquisition with maximum willingness in online social networks
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-09-08
    Bay-Yuan Hsu; Chia-Lin Tu; Ming-Yi Chang; Chih-Ya Shen

    Real social network datasets with community structures are critical for evaluating various algorithms in Online Social Networks (OSNs). However, obtaining such community data from OSNs has recently become increasingly challenging due to privacy issues and government regulations. In this paper, we thus make our first attempt to address two important factors, i.e., user willingness and existence of community

    Updated: 2020-09-08
  • Bayesian mean-parameterized nonnegative binary matrix factorization
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-08-30
    Alberto Lumbreras, Louis Filstroff, Cédric Févotte

    Binary data matrices can represent many types of data such as social networks, votes, or gene expression. In some cases, the analysis of binary matrices can be tackled with nonnegative matrix factorization (NMF), where the observed data matrix is approximated by the product of two smaller nonnegative matrices. In this context, probabilistic NMF assumes a generative model where the data is usually

    Updated: 2020-08-30
  • Correction to: A unified view of density-based methods for semi-supervised clustering and classification
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-08-03
    Jadson Castro Gertrudes, Arthur Zimek, Jörg Sander, Ricardo J. G. B. Campello

    The article, A unified view of density-based methods for semi-supervised.

    Updated: 2020-08-03
  • Simple and effective neural-free soft-cluster embeddings for item cold-start recommendations
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-08-03
    Shameem A. Puthiya Parambath; Sanjay Chawla

    Recommender systems are widely used in online platforms for easy exploration of personalized content. The best available recommendation algorithms are based on using the observed preference information among collaborating entities. A significant challenge in recommender system continues to be item cold-start recommendation: how to effectively recommend items with no observed or past preference information

    Updated: 2020-08-03
  • A unified view of density-based methods for semi-supervised clustering and classification
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-07-27
    Jadson Castro Gertrudes; Arthur Zimek; Jörg Sander; Ricardo J. G. B. Campello

    Semi-supervised learning is drawing increasing attention in the era of big data, as the gap between the abundance of cheap, automatically collected unlabeled data and the scarcity of labeled data that are laborious and expensive to obtain is dramatically increasing. In this paper, we first introduce a unified view of density-based clustering algorithms. We then build upon this view and bridge the areas

    Updated: 2020-07-27
  • MIDIA: exploring denoising autoencoders for missing data imputation
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-07-25
    Qian Ma, Wang-Chien Lee, Tao-Yang Fu, Yu Gu, Ge Yu

    Due to the ubiquitous presence of missing values (MVs) in real-world datasets, the MV imputation problem, aiming to recover MVs, is an important and fundamental data preprocessing step for various data analytics and mining tasks to effectively achieve good performance. To impute MVs, a typical idea is to explore the correlations amongst the attributes of the data. However, those correlations are usually

    Updated: 2020-07-25
  • Deep soccer analytics: learning an action-value function for evaluating soccer players
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-07-21
    Guiliang Liu; Yudong Luo; Oliver Schulte; Tarak Kharrat

    Given the large pitch, numerous players, limited player turnovers, and sparse scoring, soccer is arguably the most challenging to analyze of all the major team sports. In this work, we develop a new approach to evaluating all types of soccer actions from play-by-play event data. Our approach utilizes a Deep Reinforcement Learning (DRL) model to learn an action-value Q-function. To our knowledge, this

    Updated: 2020-07-21
  • Active learning for hierarchical multi-label classification
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-07-17
    Felipe Kenji Nakano; Ricardo Cerri; Celine Vens

    Due to technological advances, a massive amount of data is produced daily, presenting challenges for application areas where data needs to be labelled by a domain specialist or by expensive procedures, in order to be useful for supervised machine learning purposes. In order to select which data points will provide more information when labelled, one can make use of active learning methods. Active learning

    Updated: 2020-07-17
  • An efficient K-means clustering algorithm for tall data
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-07-15
    Marco Capó; Aritz Pérez; Jose A. Lozano

    The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. Therefore, the development of efficient and parallel algorithms to perform such an analysis is a crucial topic in unsupervised learning. Cluster analysis algorithms are a key element of exploratory data analysis and, among them, the K-means algorithm stands out as the most popular approach

    Updated: 2020-07-15
  • ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-07-13
    Angus Dempster; François Petitjean; Geoffrey I. Webb

    Most methods for time series classification that attain state-of-the-art accuracy have high computational complexity, requiring significant training time even for smaller datasets, and are intractable for larger datasets. Additionally, many existing methods focus on a single type of feature such as shape or frequency. Building on the recent success of convolutional neural networks for time series classification

    Updated: 2020-07-13
  • Challenges in benchmarking stream learning algorithms with real-world data
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-07-07
    Vinicius M. A. Souza, Denis M. dos Reis, André G. Maletzke, Gustavo E. A. P. A. Batista

    Streaming data are increasingly present in real-world applications such as sensor measurements, satellite data feed, stock market, and financial data. The main characteristics of these applications are the online arrival of data observations at high speed and the susceptibility to changes in the data distributions due to the dynamic nature of real environments. The data stream mining community still

    Updated: 2020-07-07
  • Visualizing image content to explain novel image discovery
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-07-06
    Jake H. Lee, Kiri L. Wagstaff

    The initial analysis of any large data set can be divided into two phases: (1) the identification of common trends or patterns and (2) the identification of anomalies or outliers that deviate from those trends. We focus on the goal of detecting observations with novel content, which can alert us to artifacts in the data set or, potentially, the discovery of previously unknown phenomena. To aid in interpreting

    Updated: 2020-07-07
  • Credible seed identification for large-scale structural network alignment
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-07-03
    Chenxu Wang, Yang Wang, Zhiyuan Zhao, Dong Qin, Xiapu Luo, Tao Qin

    Structural network alignment utilizes the topological structure information to find correspondences between nodes of two networks. Researchers have proposed a line of useful algorithms which usually require a prior mapping of seeds acting as landmark points to align the remaining nodes. Several seed-free algorithms have been developed to solve the cold-start problem. However, existing approaches suffer high computational

    Updated: 2020-07-03
  • Introducing time series snippets: a new primitive for summarizing long time series
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-07-02
    Shima Imani, Frank Madrid, Wei Ding, Scott E. Crouter, Eamonn Keogh

    The first question a data analyst asks when confronting a new dataset is often, “Show me some representative/typical data.” Answering this question is simple in many domains, with random samples or aggregate statistics of some kind. Surprisingly, it is difficult for large time series datasets. The major difficulty is not time or space complexity, but defining what it means to be representative data

    Updated: 2020-07-03
  • Gaussian bandwidth selection for manifold learning and classification
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-07-02
    Ofir Lindenbaum, Moshe Salhov, Arie Yeredor, Amir Averbuch

    Kernel methods play a critical role in many machine learning algorithms. They are useful in manifold learning, classification, clustering and other data analysis tasks. Setting the kernel’s scale parameter, also referred to as the kernel’s bandwidth, strongly affects the performance of the task at hand. We propose to set a scale parameter that is tailored to one of two types of tasks: classification

    Updated: 2020-07-02
  • Large-scale network motif analysis using compression
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-06-23
    Peter Bloem; Steven de Rooij

    We introduce a new method for finding network motifs. Subgraphs are motifs when their frequency in the data is high compared to the expected frequency under a null model. To compute this expectation, a full or approximate count of the occurrences of a motif is normally repeated on as many as 1000 random graphs sampled from the null model; a prohibitively expensive step. We use ideas from the minimum

    Updated: 2020-06-23
  • Treant: training evasion-aware decision trees
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-06-21
    Stefano Calzavara; Claudio Lucchese; Gabriele Tolomei; Seyum Assefa Abebe; Salvatore Orlando

    Despite its success and popularity, machine learning is now recognized as vulnerable to evasion attacks, i.e., carefully crafted perturbations of test inputs designed to force prediction errors. In this paper we focus on evasion attacks against decision tree ensembles, which are among the most successful predictive models for dealing with non-perceptual problems. Even though they are powerful and interpretable

    Updated: 2020-06-21
  • Scalable attack on graph data by injecting vicious nodes
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-06-17
    Jihong Wang; Minnan Luo; Fnu Suya; Jundong Li; Zijiang Yang; Qinghua Zheng

    Recent studies have shown that graph convolution networks (GCNs) are vulnerable to carefully designed attacks, which aim to cause misclassification of a specific node on the graph with unnoticeable perturbations. However, a vast majority of existing works cannot handle large-scale graphs because of their high time complexity. Additionally, existing works mainly focus on manipulating existing nodes

    Updated: 2020-06-17
  • Comparison of novelty detection methods for multispectral images in rover-based planetary exploration missions
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-06-16
    Hannah R. Kerner, Kiri L. Wagstaff, Brian D. Bue, Danika F. Wellington, Samantha Jacob, Paul Horton, James F. Bell, Chiman Kwan, Heni Ben Amor

    Science teams for rover-based planetary exploration missions like the Mars Science Laboratory Curiosity rover have limited time for analyzing new data before making decisions about follow-up observations. There is a need for systems that can rapidly and intelligently extract information from planetary instrument datasets and focus attention on the most promising or novel observations. Several novelty

    Updated: 2020-06-16
  • TEASER: early and accurate time series classification
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-06-16
    Patrick Schäfer; Ulf Leser

    Early time series classification (eTSC) is the problem of classifying a time series after as few measurements as possible with the highest possible accuracy. The most critical issue of any eTSC method is to decide when enough data of a time series has been seen to take a decision: Waiting for more data points usually makes the classification problem easier but delays the time in which a classification

    Updated: 2020-06-16
  • Efficient mining of the most significant patterns with permutation testing
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-06-09
    Leonardo Pellegrina; Fabio Vandin

    The extraction of patterns displaying significant association with a class label is a key data mining task with wide application in many domains. We introduce and study a variant of the problem that requires mining the top-k statistically significant patterns, thus providing tight control on the number of patterns reported in output. We develop TopKWY, the first algorithm to mine the top-k significant

    Updated: 2020-06-09
  • ColluEagle: collusive review spammer detection using Markov random fields
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-06-06
    Zhuo Wang, Runlong Hu, Qian Chen, Pei Gao, Xiaowei Xu

    Product reviews are extremely valuable for online shoppers in making purchase decisions. Driven by immense profit incentives, fraudsters deliberately fabricate untruthful reviews to distort the reputation of online products. As online reviews become more and more important, group spamming, i.e., a team of fraudsters working collaboratively to attack a set of target products, becomes a new fashion

    Updated: 2020-06-06
  • TEAGS: time-aware text embedding approach to generate subgraphs
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-06-03
    Saeid Hosseini; Saeed Najafipour; Ngai-Man Cheung; Hongzhi Yin; Mohammad Reza Kangavari; Xiaofang Zhou

    Contagions (e.g. virus and gossip) spread over the nodes in propagation graphs. We can use temporal-textual contents of nodes to compute the edge weights and generate subgraphs with highly relevant nodes. This is beneficial to many applications. Yet, challenges abound. First, the propagation pattern between each pair of nodes may change over time. Second, the same contagion does not always propagate. Hence

    Updated: 2020-06-03
  • ABBA: adaptive Brownian bridge-based symbolic aggregation of time series
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-06-03
    Steven Elsworth; Stefan Güttel

    A new symbolic representation of time series, called ABBA, is introduced. It is based on an adaptive polygonal chain approximation of the time series into a sequence of tuples, followed by a mean-based clustering to obtain the symbolic representation. We show that the reconstruction error of this representation can be modelled as a random walk with pinned start and end points, a so-called Brownian

    Updated: 2020-06-03
  • An ultra-fast time series distance measure to allow data mining in more complex real-world deployments
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-05-30
    Shaghayegh Gharghabi; Shima Imani; Anthony Bagnall; Amirali Darvishzadeh; Eamonn Keogh

    At their core, many time series data mining algorithms reduce to reasoning about the shapes of time series subsequences. This requires an effective distance measure, and for the last two decades most algorithms have used Euclidean distance or DTW as their core subroutine. We argue that these distance measures are not as robust as the community seems to believe. The undue faith in these measures perhaps derives

    Updated: 2020-05-30
  • struc2gauss: Structural role preserving network embedding via Gaussian embedding
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-05-12
    Yulong Pei; Xin Du; Jianpeng Zhang; George Fletcher; Mykola Pechenizkiy

    Network embedding (NE) is playing a principal role in network mining, due to its ability to map nodes into efficient low-dimensional embedding vectors. However, two major limitations exist in state-of-the-art NE methods: role preservation and uncertainty modeling. Almost all previous methods represent a node as a point in space and focus on local structural information, i.e., neighborhood information

    Updated: 2020-05-12
  • Matrix profile goes MAD: variable-length motif and discord discovery in data series
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-05-07
    Michele Linardi; Yan Zhu; Themis Palpanas; Eamonn Keogh

    In the last 15 years, data series motif and discord discovery have emerged as two useful and well-used primitives for data series mining, with applications to many domains, including robotics, entomology, seismology, medicine, and climatology. Nevertheless, the state-of-the-art motif and discord discovery tools still require the user to provide the relative length. Yet, in several cases, the choice

    Updated: 2020-05-07
  • Counting frequent patterns in large labeled graphs: a hypergraph-based approach
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-05-05
    Jinghan Meng; Napath Pitaksirianan; Yi-Cheng Tu

    In recent years, the popularity of graph databases has grown rapidly. This paper focuses on single-graph as an effective model to represent information and its related graph mining techniques. In frequent pattern mining in a single-graph setting, there are two main problems: support measure and search scheme. In this paper, we propose a novel framework for designing support measures that brings together

    Updated: 2020-05-05
  • The Swiss army knife of time series data mining: ten useful things you can do with the matrix profile and ten lines of code
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-03-31
    Yan Zhu; Shaghayegh Gharghabi; Diego Furtado Silva; Hoang Anh Dau; Chin-Chia Michael Yeh; Nader Shakibay Senobari; Abdulaziz Almaslukh; Kaveh Kamgar; Zachary Zimmerman; Gareth Funning; Abdullah Mueen; Eamonn Keogh

    The recently introduced data structure, the Matrix Profile, annotates a time series by recording the location of and distance to the nearest neighbor of every subsequence. This information trivially provides answers to queries for both time series motifs and time series discords, perhaps two of the most frequently used primitives in time series data mining. One attractive feature of the Matrix Profile

    Updated: 2020-03-31
  • Guided sampling for large graphs
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-03-18
    Muhammad Irfan Yousuf; Suhyun Kim

    Large real-world graphs claim lots of resources in terms of memory and computational power to study them and this makes their full analysis extremely challenging. In order to understand the structure and properties of these graphs, we intend to extract a small representative subgraph from a big graph while preserving its topology and characteristics. In this work, we aim at producing good samples with

    Updated: 2020-03-18
  • ptype: probabilistic type inference
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-03-16
    Taha Ceritli; Christopher K. I. Williams; James Geddes

    Type inference refers to the task of inferring the data type of a given column of data. Current approaches often fail when data contains missing data and anomalies, which are found commonly in real-world data sets. In this paper, we propose ptype, a probabilistic robust type inference method that allows us to detect such entries, and infer data types. We further show that the proposed method outperforms

    Updated: 2020-03-16
  • Computing exact P-values for community detection
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-03-16
    Zengyou He; Hao Liang; Zheng Chen; Can Zhao; Yan Liu

    Community detection is one of the most important issues in modern network science. Although numerous community detection algorithms have been proposed during the past decades, how to assess the statistical significance of one single community analytically and exactly still remains an open problem. In this paper, we present an analytical solution to calculate the exact p-value of a single community

    Updated: 2020-03-16
  • Discrete-time survival forests with Hellinger distance decision trees
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-03-14
    Matthias Schmid; Thomas Welchowski; Marvin N. Wright; Moritz Berger

    Random survival forests (RSF) are a powerful nonparametric method for building prediction models with a time-to-event outcome. RSF do not rely on the proportional hazards assumption and can be readily applied to both low- and higher-dimensional data. A remaining limitation of RSF, however, arises from the fact that the method is almost entirely focussed on continuously measured event times. This issue

    Updated: 2020-03-14
  • TS-CHIEF: a scalable and accurate forest algorithm for time series classification
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-03-05
    Ahmed Shifaz; Charlotte Pelletier; François Petitjean; Geoffrey I. Webb

    Time Series Classification (TSC) has seen enormous progress over the last two decades. HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles) is the current state of the art in terms of classification accuracy. HIVE-COTE recognizes that time series data are a specific data type for which the traditional attribute-value representation, used predominantly in machine learning, fails

    Updated: 2020-03-05
  • Robust and sparse multigroup classification by the optimal scoring approach
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-02-20
    Irene Ortner; Peter Filzmoser; Christophe Croux

    We propose a robust and sparse classification method based on the optimal scoring approach. It is also applicable if the number of variables exceeds the number of observations. The data are first projected into a low dimensional subspace according to an optimal scoring criterion. The projection only includes a subset of the original variables (sparse modeling) and is not distorted by outliers (robust

    Updated: 2020-02-20
  • Model-based exception mining for object-relational data
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-02-19
    Fatemeh Riahi; Oliver Schulte

    This paper develops model-based exception mining and outlier detection for the case of object-relational data. Object-relational data represent a complex heterogeneous network, which comprises objects of different types, links among these objects, also of different types, and attributes of these links. We follow the well-established exceptional model mining (EMM) framework, which has been previously

    Updated: 2020-02-19
  • Fair-by-design matching
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-02-04
    David García-Soriano; Francesco Bonchi

    Matching algorithms are used routinely to match donors to recipients for solid organs transplantation, for the assignment of medical residents to hospitals, record linkage in databases, scheduling jobs on machines, network switching, online advertising, and image recognition, among others. Although many optimal solutions may exist to a given matching problem, when the elements that shall or not be

    Updated: 2020-02-04
  • MasterMovelets: discovering heterogeneous movelets for multiple aspect trajectory classification
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-01-30
    Carlos Andres Ferrero; Lucas May Petry; Luis Otavio Alvares; Camila Leite da Silva; Willian Zalewski; Vania Bogorny

    In the last few years trajectory classification has been applied to many real problems, basically considering the dimensions of space and time or attributes inferred from these dimensions. However, with the explosion of social media data and the advances in the semantic enrichment of mobility data, a new type of trajectory data has emerged, and the trajectory spatio-temporal points have now multiple

    Updated: 2020-01-30
  • Exceptional spatio-temporal behavior mining through Bayesian non-parametric modeling
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-01-29
    Xin Du; Yulong Pei; Wouter Duivesteijn; Mykola Pechenizkiy

    Collective social media provides a vast amount of geo-tagged social posts, which contain various records on spatio-temporal behavior. Modeling spatio-temporal behavior on collective social media is an important task for applications like tourism recommendation, location prediction and urban planning. Properly accomplishing this task requires a model that allows for diverse behavioral patterns on each

    Updated: 2020-01-29
  • NegPSpan: efficient extraction of negative sequential patterns with embedding constraints
    Data Min. Knowl. Discov. (IF 2.629) Pub Date : 2020-01-21
    Thomas Guyet; René Quiniou

    Sequential pattern mining is concerned with the extraction of frequent or recurrent behaviors, modeled as subsequences, from a sequence dataset. Such patterns inform about which events are frequently observed in sequences, i.e. events that really happen. Sometimes, knowing that some specific event does not happen is more informative than extracting observed events. Negative sequential patterns (NSPs)

    Updated: 2020-01-21
Contents have been reproduced by permission of the publishers.