Current journal: Data Mining and Knowledge Discovery
  • The Swiss army knife of time series data mining: ten useful things you can do with the matrix profile and ten lines of code
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-03-31
    Yan Zhu, Shaghayegh Gharghabi, Diego Furtado Silva, Hoang Anh Dau, Chin-Chia Michael Yeh, Nader Shakibay Senobari, Abdulaziz Almaslukh, Kaveh Kamgar, Zachary Zimmerman, Gareth Funning, Abdullah Mueen, Eamonn Keogh

    Abstract The recently introduced data structure, the Matrix Profile, annotates a time series by recording the location of and distance to the nearest neighbor of every subsequence. This information trivially provides answers to queries for both time series motifs and time series discords, perhaps two of the most frequently used primitives in time series data mining. One attractive feature of the Matrix

    Updated: 2020-03-31
  • Guided sampling for large graphs
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-03-18
    Muhammad Irfan Yousuf, Suhyun Kim

    Abstract Large real-world graphs claim lots of resources in terms of memory and computational power to study them and this makes their full analysis extremely challenging. In order to understand the structure and properties of these graphs, we intend to extract a small representative subgraph from a big graph while preserving its topology and characteristics. In this work, we aim at producing good

    Updated: 2020-03-19
  • ptype: probabilistic type inference
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-03-16
    Taha Ceritli, Christopher K. I. Williams, James Geddes

    Abstract Type inference refers to the task of inferring the data type of a given column of data. Current approaches often fail when data contains missing data and anomalies, which are found commonly in real-world data sets. In this paper, we propose ptype, a probabilistic robust type inference method that allows us to detect such entries, and infer data types. We further show that the proposed method

    Updated: 2020-03-16
  • Computing exact P-values for community detection
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-03-16
    Zengyou He, Hao Liang, Zheng Chen, Can Zhao, Yan Liu

    Abstract Community detection is one of the most important issues in modern network science. Although numerous community detection algorithms have been proposed during the past decades, how to assess the statistical significance of one single community analytically and exactly still remains an open problem. In this paper, we present an analytical solution to calculate the exact p-value of a single community

    Updated: 2020-03-16
  • Discrete-time survival forests with Hellinger distance decision trees
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-03-14
    Matthias Schmid, Thomas Welchowski, Marvin N. Wright, Moritz Berger

    Abstract Random survival forests (RSF) are a powerful nonparametric method for building prediction models with a time-to-event outcome. RSF do not rely on the proportional hazards assumption and can be readily applied to both low- and higher-dimensional data. A remaining limitation of RSF, however, arises from the fact that the method is almost entirely focussed on continuously measured event times

    Updated: 2020-03-16
  • An efficient K-means clustering algorithm for tall data
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-03-10
    Marco Capó, Aritz Pérez, Jose A. Lozano

    Abstract The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. Therefore, the development of efficient and parallel algorithms to perform such an analysis is a crucial topic in unsupervised learning. Cluster analysis algorithms are a key element of exploratory data analysis and, among them, the K-means algorithm stands out as the most popular

    Updated: 2020-03-10
  • TS-CHIEF: a scalable and accurate forest algorithm for time series classification
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-03-05
    Ahmed Shifaz, Charlotte Pelletier, François Petitjean, Geoffrey I. Webb

    Abstract Time Series Classification (TSC) has seen enormous progress over the last two decades. HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles) is the current state of the art in terms of classification accuracy. HIVE-COTE recognizes that time series data are a specific data type for which the traditional attribute-value representation, used predominantly in machine learning

    Updated: 2020-03-06
  • Robust and sparse multigroup classification by the optimal scoring approach
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-02-20
    Irene Ortner, Peter Filzmoser, Christophe Croux

    Abstract We propose a robust and sparse classification method based on the optimal scoring approach. It is also applicable if the number of variables exceeds the number of observations. The data are first projected into a low dimensional subspace according to an optimal scoring criterion. The projection only includes a subset of the original variables (sparse modeling) and is not distorted by outliers

    Updated: 2020-02-20
  • Model-based exception mining for object-relational data
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-02-19
    Fatemeh Riahi, Oliver Schulte

    Abstract This paper develops model-based exception mining and outlier detection for the case of object-relational data. Object-relational data represent a complex heterogeneous network, which comprises objects of different types, links among these objects, also of different types, and attributes of these links. We follow the well-established exceptional model mining (EMM) framework, which has been

    Updated: 2020-02-19
  • Fair-by-design matching
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-02-04
    David García-Soriano, Francesco Bonchi

    Abstract Matching algorithms are used routinely to match donors to recipients for solid organs transplantation, for the assignment of medical residents to hospitals, record linkage in databases, scheduling jobs on machines, network switching, online advertising, and image recognition, among others. Although many optimal solutions may exist to a given matching problem, when the elements that shall or

    Updated: 2020-02-04
  • MasterMovelets: discovering heterogeneous movelets for multiple aspect trajectory classification
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-01-30
    Carlos Andres Ferrero, Lucas May Petry, Luis Otavio Alvares, Camila Leite da Silva, Willian Zalewski, Vania Bogorny

    Abstract In the last few years trajectory classification has been applied to many real problems, basically considering the dimensions of space and time or attributes inferred from these dimensions. However, with the explosion of social media data and the advances in the semantic enrichment of mobility data, a new type of trajectory data has emerged, and the trajectory spatio-temporal points have now

    Updated: 2020-01-31
  • Exceptional spatio-temporal behavior mining through Bayesian non-parametric modeling
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-01-29
    Xin Du, Yulong Pei, Wouter Duivesteijn, Mykola Pechenizkiy

    Abstract Collective social media provides a vast amount of geo-tagged social posts, which contain various records on spatio-temporal behavior. Modeling spatio-temporal behavior on collective social media is an important task for applications like tourism recommendation, location prediction and urban planning. Properly accomplishing this task requires a model that allows for diverse behavioral patterns

    Updated: 2020-01-30
  • NegPSpan: efficient extraction of negative sequential patterns with embedding constraints
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-01-21
    Thomas Guyet, René Quiniou

    Abstract Sequential pattern mining is concerned with the extraction of frequent or recurrent behaviors, modeled as subsequences, from a sequence dataset. Such patterns inform about which events are frequently observed in sequences, i.e. events that really happen. Sometimes, knowing that some specific event does not happen is more informative than extracting observed events. Negative sequential patterns

    Updated: 2020-01-22
  • Relaxing the strong triadic closure problem for edge strength inference
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-01-17
    Florian Adriaens, Tijl De Bie, Aristides Gionis, Jefrey Lijffijt, Antonis Matakos, Polina Rozenshtein

    Abstract Social networks often provide only a binary perspective on social ties: two individuals are either connected or not. While sometimes external information can be used to infer the strength of social ties, access to such information may be restricted or impractical to obtain. Sintos and Tsaparas (KDD 2014) first suggested to infer the strength of social ties from the topology of the network

    Updated: 2020-01-17
  • A survey and benchmarking study of multitreatment uplift modeling
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-01-13
    Diego Olaya, Kristof Coussement, Wouter Verbeke

    Uplift modeling is an instrument used to estimate the change in outcome due to a treatment at the individual entity level. Uplift models assist decision-makers in optimally allocating scarce resources. This allows the selection of the subset of entities for which the effect of a treatment will be largest and, as such, the maximization of the overall returns. The literature on uplift modeling mostly

    Updated: 2020-01-13
  • Topical network embedding
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-10-24
    Min Shi, Yufei Tang, Xingquan Zhu, Jianxun Liu, Haibo He

    Networked data involve complex information from multifaceted channels, including topology structures, node content, and/or node labels etc., where structure and content are often correlated but are not always consistent. A typical scenario is the citation relationships in scholarly publications where a paper is cited by others not because they have the same content, but because they share one or multiple

    Updated: 2020-01-08
  • Grafting for combinatorial binary model using frequent itemset mining
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-10-28
    Taito Lee, Shin Matsushima, Kenji Yamanishi

    Abstract We consider the class of linear predictors over all logical conjunctions of binary attributes, which we refer to as the class of combinatorial binary models (CBMs) in this paper. CBMs are of high knowledge interpretability but naïve learning of them from labeled data requires exponentially high computational cost with respect to the length of the conjunctions. On the other hand, in the case

    Updated: 2020-01-08
  • Interactive visual data exploration with subjective feedback: an information-theoretic approach
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-10-03
    Kai Puolamäki, Emilia Oikarinen, Bo Kang, Jefrey Lijffijt, Tijl De Bie

    Visual exploration of high-dimensional real-valued datasets is a fundamental task in exploratory data analysis (EDA). Existing projection methods for data visualization use predefined criteria to choose the representation of data. There is a lack of methods that (i) use information on what the user has learned from the data and (ii) show patterns that she does not know yet. We construct a theoretical

    Updated: 2020-01-08
  • A comparative study of data-dependent approaches without learning in measuring similarities of data objects
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-10-30
    Sunil Aryal, Kai Ming Ting, Takashi Washio, Gholamreza Haffari

    Abstract Conventional general-purpose distance-based similarity measures, such as Minkowski distance (also known as \(\ell _p\)-norm with \(p>0\)), are data-independent and sensitive to units or scales of measurement. There are existing general-purpose data-dependent measures, such as rank difference, Lin’s probabilistic measure and \(m_p\)-dissimilarity (\(p>0\)), which are not sensitive to units

    Updated: 2020-01-08
  • A semi-supervised model for knowledge graph embedding
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-09-24
    Jia Zhu, Zetao Zheng, Min Yang, Gabriel Pui Cheong Fung, Yong Tang

    Knowledge graphs have shown increasing importance in broad applications such as question answering, web search, and recommendation systems. The objective of knowledge graph embedding is to encode both entities and relations of knowledge graphs into continuous low-dimensional vector spaces to perform various machine learning tasks. Most of the existing works only focused on the local structure of knowledge

    Updated: 2020-01-08
  • Matching code and law: achieving algorithmic fairness with optimal transport
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-11-01
    Meike Zehlike, Philipp Hacker, Emil Wiedemann

    Increasingly, discrimination by algorithms is perceived as a societal and legal problem. As a response, a number of criteria for implementing algorithmic fairness in machine learning have been developed in the literature. This paper proposes the continuous fairness algorithm (\(\hbox{CFA}\theta\)) which enables a continuous interpolation between different fairness definitions. More specifically,

    Updated: 2020-01-08
  • A drift detection method based on dynamic classifier selection
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-10-11
    Felipe Pinagé, Eulanda M. dos Santos, João Gama

    Abstract Machine learning algorithms can be applied to several practical problems, such as spam, fraud and intrusion detection, and customer preferences, among others. In most of these problems, data come in streams, which means that data distribution may change over time, leading to concept drift. The literature is abundant on providing supervised methods based on error monitoring for explicit drift

    Updated: 2020-01-08
  • Parameterized low-rank binary matrix approximation
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-01-02
    Fedor V. Fomin, Petr A. Golovach, Fahad Panolan

    Low-rank binary matrix approximation is a generic problem where one seeks a good approximation of a binary matrix by another binary matrix with some specific properties. A good approximation means that the difference between the two matrices in some matrix norm is small. The properties of the approximation binary matrix could be: a small number of different columns, a small binary rank or a small Boolean

    Updated: 2020-01-04
  • Integer programming ensemble of temporal relations classifiers
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-01-02
    Catherine Kerr, Terri Hoare, Paula Carroll, Jakub Mareček

    The extraction of temporal events from text and the classification of temporal relations among both temporal events and time expressions are major challenges for the interface of data mining and natural language processing. We present an ensemble method, which reconciles the outputs of multiple heterogeneous classifiers of temporal expressions. We use integer programming, a constrained optimisation

    Updated: 2020-01-04
  • Mining relaxed functional dependencies from data
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-12-23
    Loredana Caruccio, Vincenzo Deufemia, Giuseppe Polese

    Relaxed functional dependencies (rfds) are properties expressing important relationships among data. Thanks to the introduction of approximations in data comparison and/or validity, they can capture constraints useful for several purposes, such as the identification of data inconsistencies or patterns of semantically related data. Nevertheless, rfds can provide benefits only if they can be automatically

    Updated: 2020-01-04
  • Identifying exceptional (dis)agreement between groups
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-11-26
    Adnene Belfodil, Sylvie Cazalens, Philippe Lamarre, Marc Plantevit

    Under the term behavioral data, we consider any type of data featuring individuals performing observable actions on entities. For instance, voting data depict parliamentarians who express their votes w.r.t. legislative procedures. In this work, we address the problem of discovering exceptional (dis)agreement patterns in such data, i.e., groups of individuals that exhibit an unexpected (dis)agreement

    Updated: 2020-01-04
  • SIAS-miner: mining subjectively interesting attributed subgraphs
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-11-22
    Anes Bendimerad, Ahmad Mel, Jefrey Lijffijt, Marc Plantevit, Céline Robardet, Tijl De Bie

    Abstract Data clustering, local pattern mining, and community detection in graphs are three mature areas of data mining and machine learning. In recent years, attributed subgraph mining has emerged as a new powerful data mining task in the intersection of these areas. Given a graph and a set of attributes for each vertex, attributed subgraph mining aims to find cohesive subgraphs for which (some of)

    Updated: 2020-01-04
  • On normalization and algorithm selection for unsupervised outlier detection
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-11-21
    Sevvandi Kandanaarachchi, Mario A. Muñoz, Rob J. Hyndman, Kate Smith-Miles

    This paper demonstrates that the performance of various outlier detection methods is sensitive to both the characteristics of the dataset, and the data normalization scheme employed. To understand these dependencies, we formally prove that normalization affects the nearest neighbor structure, and density of the dataset; hence, affecting which observations could be considered outliers. Then, we perform

    Updated: 2020-01-04
  • FastEE: Fast Ensembles of Elastic Distances for time series classification
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-11-18
    Chang Wei Tan, François Petitjean, Geoffrey I. Webb

    Abstract In recent years, many new ensemble-based time series classification (TSC) algorithms have been proposed. Each of them is significantly more accurate than their predecessors. The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is currently the most accurate TSC algorithm when assessed on the UCR repository. It is a meta-ensemble of 5 state-of-the-art ensemble-based

    Updated: 2020-01-04
  • Delayed labelling evaluation for data streams
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-11-16
    Maciej Grzenda, Heitor Murilo Gomes, Albert Bifet

    Abstract A large portion of the stream mining studies on classification rely on the availability of true labels immediately after making predictions. This approach is well exemplified by the test-then-train evaluation, where predictions immediately precede true label arrival. However, in many real scenarios, labels arrive with non-negligible latency. This raises the question of how to evaluate classifiers

    Updated: 2020-01-04
  • Deep multi-task learning for individuals origin–destination matrices estimation from census data
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-11-12
    Mehdi Katranji, Sami Kraiem, Laurent Moalic, Guilhem Sanmarty, Ghazaleh Khodabandelou, Alexandre Caminada, Fouad Hadj Selem

    Abstract Rapid urbanization has made the estimation of the human mobility flows a substantial task for transportation and urban planners. Worker and student mobility flows are among the most weekly regular displacements and consequently generate road congestion issues. With urge of demands on efficient transport planning policies, estimating their commuting facilitates the decision-making processes

    Updated: 2020-01-04
  • Correction to: Domain agnostic online semantic segmentation for multi-dimensional time series
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-02-14
    Shaghayegh Gharghabi, Chin-Chia Michael Yeh, Yifei Ding, Wei Ding, Paul Hibbing, Samuel LaMunion, Andrew Kaplan, Scott E. Crouter, Eamonn Keogh

    The article Domain agnostic online semantic segmentation for multi-dimensional time series, written by Shaghayegh Gharghabi, Chin-Chia Michael Yeh, Yifei Ding, Wei Ding, Paul Hibbing, Samuel LaMunion, Andrew Kaplan, Scott E. Crouter, Eamonn Keogh was originally published electronically on the publisher’s internet portal (currently SpringerLink) on 25 September 2018 without open access.

    Updated: 2020-01-04
  • Efficient mixture model for clustering of sparse high dimensional binary data
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-06-01
    Marek Śmieja, Krzysztof Hajto, Jacek Tabor

    Clustering is one of the fundamental tools for preliminary analysis of data. While most of the clustering methods are designed for continuous data, sparse high-dimensional binary representations became very popular in various domains such as text mining or cheminformatics. The application of classical clustering tools to this type of data usually proves to be very inefficient, both in terms of computational

    Updated: 2020-01-04
  • catch22: CAnonical Time-series CHaracteristics
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-08-09
    Carl H. Lubba, Sarab S. Sethi, Philip Knaute, Simon R. Schultz, Ben D. Fulcher, Nick S. Jones

    Abstract Capturing the dynamical properties of time series concisely as interpretable feature vectors can enable efficient clustering and classification for time-series applications across science and industry. Selecting an appropriate feature-based representation of time series for a given application can be achieved through systematic comparison across a comprehensive time-series feature library

    Updated: 2020-01-04
  • A unifying view of explicit and implicit feature maps of graph kernels
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-09-17
    Nils M. Kriege, Marion Neumann, Christopher Morris, Kristian Kersting, Petra Mutzel

    Abstract Non-linear kernel methods can be approximated by fast linear ones using suitable explicit feature maps allowing their application to large scale problems. We investigate how convolution kernels for structured data are composed from base kernels and construct corresponding feature maps. On this basis we propose exact and approximative feature maps for widely used graph kernels based on the

    Updated: 2020-01-04
  • A probabilistic classifier ensemble weighting scheme based on cross-validated accuracy estimates
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-06-17
    James Large, Jason Lines, Anthony Bagnall

    Abstract Our hypothesis is that building ensembles of small sets of strong classifiers constructed with different learning algorithms is, on average, the best approach to classification for real-world problems. We propose a simple mechanism for building small heterogeneous ensembles based on exponentially weighting the probability estimates of the base classifiers with an estimate of the accuracy formed

    Updated: 2020-01-04
  • SAZED: parameter-free domain-agnostic season length estimation in time series data
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-07-26
    Maximilian Toller, Tiago Santos, Roman Kern

    Abstract Season length estimation is the task of identifying the number of observations in the dominant repeating pattern of seasonal time series data. As such, it is a common pre-processing task crucial for various downstream applications. Inferring season length from a real-world time series is often challenging due to phenomena such as slightly varying period lengths and noise. These issues may

    Updated: 2020-01-04
  • Extending inverse frequent itemsets mining to generate realistic datasets: complexity, accuracy and emerging applications
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-07-20
    Domenico Saccá, Edoardo Serra, Antonino Rullo

    Abstract The development of novel platforms and techniques for emerging “Big Data” applications requires the availability of real-life datasets for data-driven experiments, which are however not accessible in most cases for various reasons, e.g., confidentiality, privacy or simply insufficient availability. An interesting solution to ensure high quality experimental findings is to synthesize datasets

    Updated: 2020-01-04
  • Contextual bandits with hidden contexts: a focused data capture from social media streams
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-08-10
    Sylvain Lamprier, Thibault Gisselbrecht, Patrick Gallinari

    This paper addresses the problem of real time data capture from social media. Due to different limitations, it is not possible to collect all the data produced by social networks such as Twitter. Therefore, to be able to gather enough relevant information related to a predefined need, it is necessary to focus on a subset of the information sources. In this work, we focus on user-centered data capture

    Updated: 2020-01-04
  • Attributed network embedding via subspace discovery
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-08-26
    Daokun Zhang, Jie Yin, Xingquan Zhu, Chengqi Zhang

    Network embedding aims to learn latent, low-dimensional vector representations of network nodes, effective in supporting various network analytic tasks. While prior arts on network embedding focus primarily on preserving network topology structure to learn node representations, recently proposed attributed network embedding algorithms attempt to integrate rich node content information with network

    Updated: 2020-01-04
  • Dynamics reconstruction and classification via Koopman features
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-06-24
    Wei Zhang, Yao-Chi Yu, Jr-Shin Li

    Abstract Knowledge discovery and information extraction of large and complex datasets has attracted great attention in wide-ranging areas from statistics and biology to medicine. Tools from machine learning, data mining, and neurocomputing have been extensively explored and utilized to accomplish such compelling data analytics tasks. However, for time-series data presenting active dynamic characteristics

    Updated: 2020-01-04
  • Wrangling messy CSV files by detecting row and type patterns
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-07-26
    G. J. J. van den Burg, A. Nazábal, C. Sutton

    Abstract Data scientists spend the majority of their time on preparing data for analysis. One of the first steps in this preparation phase is to load the data from the raw storage format. Comma-separated value (CSV) files are a popular format for tabular data due to their simplicity and ostensible ease of use. However, formatting standards for CSV files are not followed consistently, so each file requires

    Updated: 2020-01-04
  • A probabilistic classifier ensemble weighting scheme based on cross-validated accuracy estimates.
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-10-22
    James Large, Jason Lines, Anthony Bagnall

    Our hypothesis is that building ensembles of small sets of strong classifiers constructed with different learning algorithms is, on average, the best approach to classification for real-world problems. We propose a simple mechanism for building small heterogeneous ensembles based on exponentially weighting the probability estimates of the base classifiers with an estimate of the accuracy formed through

    Updated: 2019-11-01
  • Domain agnostic online semantic segmentation for multi-dimensional time series.
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2019-03-05
    Shaghayegh Gharghabi, Chin-Chia Michael Yeh, Yifei Ding, Wei Ding, Paul Hibbing, Samuel LaMunion, Andrew Kaplan, Scott E Crouter, Eamonn Keogh

    Unsupervised semantic segmentation in the time series domain is a much studied problem due to its potential to detect unexpected regularities and regimes in poorly understood data. However, the current techniques have several shortcomings, which have limited the adoption of time series semantic segmentation beyond academic settings for four primary reasons. First, most methods require setting/learning

    Updated: 2019-11-01
  • Data-driven generation of spatio-temporal routines in human mobility.
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2018-01-01
    Luca Pappalardo, Filippo Simini

    The generation of realistic spatio-temporal trajectories of human mobility is of fundamental importance in a wide range of applications, such as the developing of protocols for mobile ad-hoc networks or what-if analysis in urban ecosystems. Current generative algorithms fail in accurately reproducing the individuals' recurrent schedules and at the same time in accounting for the possibility that individuals

    Updated: 2019-11-01
  • Generalizing DTW to the multi-dimensional case requires an adaptive approach.
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2017-11-07
    Mohammad Shokoohi-Yekta, Bing Hu, Hongxia Jin, Jun Wang, Eamonn Keogh

    In recent years Dynamic Time Warping (DTW) has emerged as the distance measure of choice for virtually all time series data mining applications. For example, virtually all applications that process data from wearable devices use DTW as a core sub-routine. This is the result of significant progress in improving DTW's efficiency, together with multiple empirical studies showing that DTW-based classifiers

    Updated: 2019-11-01
  • The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances.
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2017-01-01
    Anthony Bagnall, Jason Lines, Aaron Bostrom, James Large, Eamonn Keogh

    In the last 5 years there have been a large number of new time series classification algorithms proposed in the literature. These algorithms have been evaluated on subsets of the 47 data sets in the University of California, Riverside time series classification archive. The archive has recently been expanded to 85 data sets, over half of which have been donated by researchers at the University of East

    Updated: 2019-11-01
  • Visual Semantic Based 3D Video Retrieval System Using HDFS.
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2016-12-23
    C Ranjith Kumar, S Suguna

    This paper brings out a neoteric frame of reference for visual semantic based 3D video search and retrieval applications. Newfangled 3D retrieval application spotlight on shape analysis like object matching, classification and retrieval not only sticking up entirely with video retrieval. In this ambit, we delve into 3D-CBVR (Content Based Video Retrieval) concept for the first time. For this purpose

    Updated: 2019-11-01
  • Inhibiting diffusion of complex contagions in social networks: theoretical and experimental results.
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2015-03-01
    Chris J Kuhlman, V S Anil Kumar, Madhav V Marathe, S S Ravi, Daniel J Rosenkrantz

    We consider the problem of inhibiting undesirable contagions (e.g. rumors, spread of mob behavior) in social networks. Much of the work in this context has been carried out under the 1-threshold model, where diffusion occurs when a node has just one neighbor with the contagion. We study the problem of inhibiting more complex contagions in social networks where nodes may have thresholds larger than

    Updated: 2019-11-01
  • FRaC: a feature-modeling approach for semi-supervised and unsupervised anomaly detection.
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2012-05-29
    Keith Noto, Carla Brodley, Donna Slonim

    Anomaly detection involves identifying rare data instances (anomalies) that come from a different class or distribution than the majority (which are simply called "normal" instances). Given a training set of only normal data, the semi-supervised anomaly detection task is to identify anomalies in the future. Good solutions to this task have applications in fraud and intrusion detection. The unsupervised

    Updated: 2019-11-01
  • Sensor Selection to Support Practical Use of Health-Monitoring Smart Environments.
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2011-07-16
    Diane J Cook, Lawrence B Holder

    The data mining and pervasive sensing technologies found in smart homes offer unprecedented opportunities for providing health monitoring and assistance to individuals experiencing difficulties living independently at home. In order to monitor the functional health of smart home residents, we need to design technologies that recognize and track activities that people normally perform as part of their

    Updated: 2019-11-01
  • ECM-Aware Cell-Graph Mining for Bone Tissue Modeling and Classification.
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2010-06-15
    Cemal Cagatay Bilgin, Peter Bullough, George E Plopper, Bülent Yener

    Pathological examination of a biopsy is the most reliable and widely used technique to diagnose bone cancer. However, it suffers from both inter- and intra-observer subjectivity. Techniques for automated tissue modeling and classification can reduce this subjectivity and increase the accuracy of bone cancer diagnosis. This paper presents a graph theoretical method, called extracellular matrix (ECM)-aware

    Updated: 2019-11-01
  • A Sequential Monte Carlo Method for Bayesian Analysis of Massive Datasets.
    Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2003-07-01
    Greg Ridgeway, David Madigan

    Markov chain Monte Carlo (MCMC) techniques revolutionized statistical practice in the 1990s by providing an essential toolkit for making the rigor and flexibility of Bayesian analysis computationally practical. At the same time the increasing prevalence of massive datasets and the expansion of the field of data mining has created the need for statistically sound methods that scale to these large problems

    Updated: 2019-11-01
Contents have been reproduced by permission of the publishers.