-
Wavelet Multidimensional Scaling Analysis of European Economic Sentiment Indicators J. Classif. (IF 1.156) Pub Date : 2021-01-09 Antonis A. Michis
We propose the use of wavelet coefficients, which are generated from nondecimated discrete wavelet transforms, to form a correlation-based dissimilarity measure in metric multidimensional scaling. This measure enables the construction of configurations depicting the associations between objects across different timescales. The proposed method is used to examine the similarities between the economic
-
Clustering Brain Signals: a Robust Approach Using Functional Data Ranking J. Classif. (IF 1.156) Pub Date : 2020-11-18 Tianbo Chen, Ying Sun, Carolina Euan, Hernando Ombao
In this paper, we analyze electroencephalograms (EEGs) which are recordings of brain electrical activity. We develop new clustering methods for identifying synchronized brain regions, where the EEGs show similar oscillations or waveforms according to their spectral densities. We treat the estimated spectral densities from many epochs or trials as functional data and develop clustering algorithms based
-
Comparing High-Dimensional Partitions with the Co-clustering Adjusted Rand Index J. Classif. (IF 1.156) Pub Date : 2020-11-14 Valerie Robert, Yann Vasseur, Vincent Brault
We consider the simultaneous clustering of rows and columns of a matrix and more particularly the ability to measure the agreement between two co-clustering partitions. The new criterion we developed is based on the Adjusted Rand Index and is called the Co-clustering Adjusted Rand Index (CARI). We also suggest new improvements to existing criteria such as the classification error which counts the
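As background for readers unfamiliar with the baseline criterion, the pair-counting Rand and Adjusted Rand indices that CARI generalizes can be sketched in a few lines (this is the standard ARI, not the paper's co-clustering extension):

```python
from itertools import combinations

def rand_indices(labels_a, labels_b):
    """Rand index and Adjusted Rand index between two partitions of the same items."""
    n = len(labels_a)
    a = b = c = d = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            a += 1        # pair together in both partitions
        elif same_a:
            b += 1        # together in A only
        elif same_b:
            c += 1        # together in B only
        else:
            d += 1        # apart in both
    total = a + b + c + d
    ri = (a + d) / total
    # Chance correction under the hypergeometric null model.
    expected = (a + b) * (a + c) / total
    max_index = ((a + b) + (a + c)) / 2
    ari = (a - expected) / (max_index - expected) if max_index != expected else 1.0
    return ri, ari
```

Two identical partitions (up to label renaming) score ARI = 1; partitions that agree no better than chance score near 0.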
-
A Unified Treatment of Agreement Coefficients and their Asymptotic Results: the Formula of the Weighted Mean of Weighted Ratios J. Classif. (IF 1.156) Pub Date : 2020-10-07 Haruhiko Ogasawara
A unified treatment of agreement coefficients for multiple raters is shown, where the chance-expected proportions of the Bennett et al.-type, Scott-type, its new variation, and Cohen-type are dealt with using full or lower-order agreement among raters. When only pairwise agreement is used for multiple raters, chance corrections of the Gwet-type and its new variation are also considered. For the unified
-
Applicability and Interpretability of Ward’s Hierarchical Agglomerative Clustering With or Without Contiguity Constraints J. Classif. (IF 1.156) Pub Date : 2020-09-30 Nathanaël Randriamihamison, Nathalie Vialaneix, Pierre Neuvial
Hierarchical agglomerative clustering (HAC) with Ward’s linkage has been widely used since its introduction by Ward (Journal of the American Statistical Association, 58(301), 236–244, 1963). This article reviews extensions of HAC to various input data and contiguity-constrained HAC, and provides applicability conditions. In addition, different versions of the graphical representation of the results
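Ward's criterion itself is standard: each merge is the one that least increases the within-cluster sum of squares. Below is a naive, unconstrained sketch (no contiguity constraint; merge costs are recomputed from centroids at every step rather than via the Lance–Williams recurrence used by practical implementations):

```python
def ward_hac(points):
    """Naive Ward agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, merge_cost) triples,
    where each cluster is the frozenset of its point indices.
    """
    def centroid(idx):
        dim = len(points[0])
        return [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]

    def ward_cost(a, b):
        # Increase in within-cluster sum of squares caused by merging a and b.
        ca, cb = centroid(a), centroid(b)
        sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq

    clusters = [frozenset([i]) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest Ward cost.
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda ab: ward_cost(*ab),
        )
        history.append((a, b, ward_cost(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history
```

A contiguity-constrained variant, as reviewed in the article, would simply restrict the candidate pairs in the `min` to neighbouring clusters.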
-
Alternative Axioms in Group Identification Problems J. Classif. (IF 1.156) Pub Date : 2020-09-12 Federico Fioravanti, Fernando Tohmé
Group identification problems are introduced as the issue of classifying the members of a group in terms of the opinions of their potential members. This involves a finite set of agents N = {1, 2, … , n}, each one having an opinion about which agents should be classified as belonging to a specific subgroup J. A Collective Identity Function (CIF) aggregates those opinions yielding the class of members
-
k-Means, Ward and Probabilistic Distance-Based Clustering Methods with Contiguity Constraint J. Classif. (IF 1.156) Pub Date : 2020-08-26 Andrzej Młodak
We analyze some possibilities of using a contiguity (neighbourhood) matrix as a constraint in the clustering performed by the k-means and Ward methods, as well as by an approach based on distances and probabilistic assignments aimed at obtaining a solution of the multi-facility location problem (MFLP). That is, special two-stage algorithms, which are forms of clustering with a relational constraint, are proposed
-
Using Projection-Based Clustering to Find Distance- and Density-Based Clusters in High-Dimensional Data J. Classif. (IF 1.156) Pub Date : 2020-08-20 Michael C. Thrun, Alfred Ultsch
For high-dimensional datasets in which clusters are formed by both distance and density structures (DDS), many clustering algorithms fail to identify these clusters correctly. This is demonstrated for 32 clustering algorithms using a suite of datasets which deliberately pose complex DDS challenges for clustering. In order to improve the structure finding and clustering in high-dimensional DDS datasets
-
An Evolutionary Algorithm with Crossover and Mutation for Model-Based Clustering J. Classif. (IF 1.156) Pub Date : 2020-08-12 Sharon M. McNicholas, Paul D. McNicholas, Daniel A. Ashlock
An evolutionary algorithm (EA) is developed as an alternative to the EM algorithm for parameter estimation in model-based clustering. This EA facilitates a different search of the fitness landscape, i.e., the likelihood surface, utilizing both crossover and mutation. Furthermore, this EA represents an efficient approach to “hard” model-based clustering and so it can be viewed as a sort of generalization
-
Explicit Agreement Extremes for a 2 × 2 Table with Given Marginals J. Classif. (IF 1.156) Pub Date : 2020-08-07 José E. Chacón
The problem of maximizing (or minimizing) the agreement between clusterings, subject to given marginals, can be formally posed under a common framework for several agreement measures. Until now, it was possible to find its solution only through numerical algorithms. Here, an explicit solution is shown for the case where the two clusterings have two clusters each.
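Before the paper's explicit solution, such extremes could only be found numerically. In the 2 × 2 case a brute-force check is easy, because with both marginals fixed the contingency table is determined by a single free cell. The sketch below enumerates that cell to find the extremes of the Rand index (the paper's closed-form solution is not reproduced here):

```python
from math import comb

def rand_from_table(n11, r1, c1, n):
    """Rand index between two 2-cluster partitions from their 2x2 contingency table."""
    n12, n21 = r1 - n11, c1 - n11
    n22 = n - n11 - n12 - n21
    a = sum(comb(x, 2) for x in (n11, n12, n21, n22))  # pairs together in both
    rows = comb(r1, 2) + comb(n - r1, 2)               # pairs together in partition 1
    cols = comb(c1, 2) + comb(n - c1, 2)               # pairs together in partition 2
    total = comb(n, 2)
    d = total - rows - cols + a                        # pairs apart in both
    return (a + d) / total

def rand_extremes(r1, c1, n):
    """Enumerate all 2x2 tables with the given marginals; return (min RI, max RI)."""
    lo, hi = max(0, r1 + c1 - n), min(r1, c1)
    values = [rand_from_table(k, r1, c1, n) for k in range(lo, hi + 1)]
    return min(values), max(values)
```

With balanced marginals (e.g. 5/5 of 10 in each partition), the maximum agreement is perfect (RI = 1, reached when the clusters coincide up to relabelling) and the minimum occurs at the most crossed table.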
-
Initializing k-means Clustering by Bootstrap and Data Depth J. Classif. (IF 1.156) Pub Date : 2020-07-24 Aurora Torrente, Juan Romo
The k-means algorithm is widely used in various research fields because of its fast convergence to the cost function minima; however, it frequently gets stuck in local optima as it is sensitive to initial conditions. This paper explores a simple, computationally feasible method, which provides k-means with a set of initial seeds to cluster datasets of arbitrary dimensions. Our technique consists of
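The bootstrap/data-depth seeding procedure itself is not reproduced here; the sketch below is a plain Lloyd's k-means that takes its initial seeds as an explicit argument, which is exactly the quantity such initialization methods are designed to supply well:

```python
def kmeans(points, seeds, iters=100):
    """Lloyd's k-means in pure Python; `seeds` supplies the initial centroids.

    Returns (centroids, labels, within-cluster sum of squares).
    """
    centroids = [list(s) for s in seeds]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p[d] - centroids[k][d]) ** 2
                                  for d in range(len(p))))
            for p in points
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centroids[k] = [sum(m[d] for m in members) / len(members)
                                for d in range(len(members[0]))]
    wss = sum(sum((p[d] - centroids[l][d]) ** 2 for d in range(len(p)))
              for p, l in zip(points, labels))
    return centroids, labels, wss
```

Running the same loop from different seed sets and comparing the final within-cluster sum of squares makes the sensitivity to initial conditions directly visible.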
-
Gaussian-Based Visualization of Gaussian and Non-Gaussian-Based Clustering J. Classif. (IF 1.156) Pub Date : 2020-07-11 Christophe Biernacki, Matthieu Marbac, Vincent Vandewalle
A generic method is introduced to visualize in a “Gaussian-like way,” and onto \(\mathbb {R}^{2}\), results of Gaussian or non-Gaussian–based clustering. The key point is to explicitly force a visualization based on a spherical Gaussian mixture to inherit from the within cluster overlap that is present in the initial clustering mixture. The result is a particularly user-friendly drawing of the clusters
-
Improved Outcome Prediction Across Data Sources Through Robust Parameter Tuning J. Classif. (IF 1.156) Pub Date : 2020-07-06 Nicole Ellenbach, Anne-Laure Boulesteix, Bernd Bischl, Kristian Unger, Roman Hornung
In many application areas, prediction rules trained on high-dimensional data are subsequently applied to make predictions for observations from other sources, but they do not always perform well in this setting. This is because data sets from different sources can feature (slightly) differing distributions, even if they come from similar populations. In the context of high-dimensional data and
-
Model-based Clustering of Count Processes J. Classif. (IF 1.156) Pub Date : 2020-07-02 Tin Lok James Ng, Thomas Brendan Murphy
A model-based clustering method based on a Gaussian Cox process is proposed to address the problem of clustering count process data. The model allows for nonparametric estimation of the intensity functions of Poisson processes, while simultaneously clustering the count process observations. A logistic Gaussian process transformation is imposed on the intensity functions to enforce smoothness. Maximum likelihood
-
Adjusted Concordance Index: an Extension of the Adjusted Rand Index to Fuzzy Partitions J. Classif. (IF 1.156) Pub Date : 2020-06-15 Antonio D’Ambrosio, Sonia Amodio, Carmela Iorio, Giuseppe Pandolfo, Roberta Siciliano
In comparing clustering partitions, the Rand index (RI) and the adjusted Rand index (ARI) are commonly used for measuring the agreement between partitions. Such external validation indexes can be used to quantify how close the clusters are to a reference partition (or to prior knowledge about the data) by counting classified pairs of elements. To evaluate the solution of a fuzzy clustering algorithm
-
“Compositional Data Analysis in Practice” by Michael Greenacre Universitat Pompeu Fabra (Barcelona, Spain), Chapman and Hall/CRC, 2018 J. Classif. (IF 1.156) Pub Date : 2020-05-18 J. A. Martín-Fernández
This 122-page book is intended to be a practical guide to CoDa analysis, and its easy-to-read format and didactic layout are designed for students and researchers alike from different fields. For more insight, the interested reader can find other books that present the subject in a more up-to-date manner and cover more multivariate techniques, with applications and examples from geochemistry.
-
A Variational Approximations-DIC Rubric for Parameter Estimation and Mixture Model Selection Within a Family Setting J. Classif. (IF 1.156) Pub Date : 2020-03-04 Sanjeena Subedi, Paul D. McNicholas
Mixture model-based clustering has become an increasingly popular data analysis technique since its introduction over fifty years ago, and is now commonly utilized within a family setting. Families of mixture models arise when the component parameters, usually the component covariance (or scale) matrices, are decomposed and a number of constraints are imposed. Within the family setting, model selection
-
Consumer Segmentation Based on Use Patterns J. Classif. (IF 1.156) Pub Date : 2020-02-19 Juan José Fernández-Durán, María Mercedes Gregorio-Domínguez
Recent technological advances have enabled the easy collection of consumer behavior data in real time. Typically, these data contain the time at which a consumer engages in a particular activity such as entering a store, buying a product, or making a call. The occurrence time of certain events must be analyzed as circular random variables, with 24:00 corresponding to 0:00. To effectively implement
-
Spherical Classification of Data, a New Rule-Based Learning Method J. Classif. (IF 1.156) Pub Date : 2020-02-18 Zhengyu Ma, Hong Seo Ryoo
This paper presents a new rule-based classification method that partitions data under analysis into spherical patterns. The method has two strengths. First, it exploits the efficiency of distance metric-based clustering to quickly collect similar data into spherical patterns. Second, each spherical pattern is a trait shared by only one type of data, and hence the patterns are built for classification of new
-
Modified Subspace Constrained Mean Shift Algorithm J. Classif. (IF 1.156) Pub Date : 2020-02-11 Youness Aliyari Ghassabeh, Frank Rudzicz
A subspace constrained mean shift (SCMS) algorithm is a non-parametric iterative technique to estimate principal curves. Principal curves, as a nonlinear generalization of principal components analysis (PCA), are smooth curves (or surfaces) that pass through the middle of a data set and provide a compact low-dimensional representation of data. The SCMS algorithm combines the mean shift (MS) algorithm
-
A New Performance Evaluation Metric for Classifiers: Polygon Area Metric J. Classif. (IF 1.156) Pub Date : 2020-01-25 Onder Aydemir
Classifier performance assessment (CPA) is a challenging task in pattern recognition. In recent years, various CPA metrics have been developed to help assess the performance of classifiers. Although classification accuracy (CA), the most popular metric in the pattern recognition area, works well if the classes have an equal number of samples, it fails to evaluate the recognition performance
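A hedged sketch of the radar-chart construction behind such a polygon metric, assuming k unit-scaled metrics placed on equally spaced spokes, with the polygon's area normalized by the area of the regular k-gon with unit radius (the specific set of metrics used in the paper is not reproduced here):

```python
from math import sin, pi

def polygon_area_metric(metrics):
    """Area of the radar-chart polygon spanned by k metrics in [0, 1],
    normalized by the area of the regular k-gon with unit radius."""
    k = len(metrics)
    theta = 2 * pi / k
    # Polygon area = sum of the triangles between consecutive spokes.
    area = sum(0.5 * metrics[i] * metrics[(i + 1) % k] * sin(theta)
               for i in range(k))
    full = k * 0.5 * sin(theta)  # regular k-gon with unit radius
    return area / full
```

A perfect classifier (all metrics at 1) scores 1; uniformly halved metrics score 0.25, so the measure penalizes any single weak metric more than an arithmetic mean would.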
-
A Membership Probability–Based Undersampling Algorithm for Imbalanced Data J. Classif. (IF 1.156) Pub Date : 2020-01-14 Gilseung Ahn, You-Jin Park, Sun Hur
Classifiers for a highly imbalanced dataset tend to bias in majority classes and, as a result, the minority class samples are usually misclassified as majority class. To overcome this, a proper undersampling technique that removes some majority samples can be an alternative. We propose an efficient and simple undersampling method for imbalanced datasets and show that the proposed method outperforms
-
A Note on the Formal Implementation of the K-means Algorithm with Hard Positive and Negative Constraints J. Classif. (IF 1.156) Pub Date : 2020-01-10 Igor Melnykov, Volodymyr Melnykov
The paper discusses a new approach for incorporating hard constraints into the K-means algorithm for semi-supervised clustering. An analytic modification of the objective function of K-means is proposed that has not been previously considered in the literature.
-
An Impartial Trimming Approach for Joint Dimension and Sample Reduction J. Classif. (IF 1.156) Pub Date : 2020-01-09 Luca Greco, Antonio Lucadamo, Pietro Amenta
A robust version of reduced and factorial k-means is proposed that is based on the idea of trimming. Reduced and factorial k-means are data reduction techniques well suited for simultaneous dimension and sample reduction through PCA and clustering. The occurrence of data inadequacies can invalidate standard analyses. Indeed, contamination in the data at hand can hide the underlying clustered structure
-
Lorenz Model Selection J. Classif. (IF 1.156) Pub Date : 2020-01-08 Paolo Giudici, Emanuela Raffinetti
In this paper, we introduce novel model selection measures based on Lorenz zonoids which, unlike correlation-based measures, rely on a mutual notion of variability and are more robust to the presence of outlying observations. By means of Lorenz zonoids, which in the univariate case correspond to the Gini coefficient, the contribution of each explanatory variable to the predictive
-
C443: a Methodology to See a Forest for the Trees J. Classif. (IF 1.156) Pub Date : 2020-01-07 Aniek Sies, Iven Van Mechelen
Often tree-based accounts of statistical learning problems yield multiple decision trees which together constitute a forest. Reasons for this include examining tree instability, improving prediction accuracy, accounting for missingness in the data, and taking into account multiple outcome variables. A key disadvantage of forests, unlike individual decision trees, is their lack of transparency. Hence
-
Cognitive Diagnostic Computerized Adaptive Testing for Polytomously Scored Items J. Classif. (IF 1.156) Pub Date : 2019-12-24 Xuliang Gao, Daxun Wang, Yan Cai, Dongbo Tu
Cognitive diagnostic computerized adaptive testing (CD-CAT) purports to combine the strengths of both CAT and cognitive diagnosis. Currently, a large body of CD-CAT research focuses on dichotomous data. To our knowledge, there is no research on CD-CAT for polytomously scored items or data. However, polytomously scored items have been broadly used in a variety of tests for their advantages of
-
ROC and AUC with a Binary Predictor: a Potentially Misleading Metric J. Classif. (IF 1.156) Pub Date : 2019-12-23 John Muschelli
In the analysis of binary outcomes, the receiver operating characteristic (ROC) curve is heavily used to show the performance of a model or algorithm. The ROC curve is informative about performance over a series of thresholds and can be summarized by the area under the curve (AUC), a single number. When a predictor is categorical, the ROC curve has one fewer potential threshold than the number of categories;
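The point can be checked numerically: a binary predictor admits a single effective threshold, and its AUC collapses to the average of sensitivity and specificity. A sketch using the Mann–Whitney form of the AUC:

```python
def auc(scores, outcomes):
    """AUC via the Mann-Whitney U statistic (ties counted as 1/2)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(pred, outcomes):
    """Sensitivity and specificity of a 0/1 predictor."""
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, outcomes))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, outcomes))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, outcomes))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, outcomes))
    return tp / (tp + fn), tn / (tn + fp)
```

For any 0/1 predictor, `auc(pred, y)` equals `(sens + spec) / 2`, which is why reporting the AUC of an already-dichotomized predictor adds nothing beyond the confusion matrix.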
-
Proximity Curves for Potential-Based Clustering J. Classif. (IF 1.156) Pub Date : 2019-12-18 Attila Csenki, Daniel Neagu, Denis Torgunov, Natasha Micic
The concept of proximity curve and a new algorithm are proposed for obtaining clusters in a finite set of data points in the finite dimensional Euclidean space. Each point is endowed with a potential constructed by means of a multi-dimensional Cauchy density, contributing to an overall anisotropic potential function. Guided by the steepest descent algorithm, the data points are successively visited
-
An Optimal Weight Semi-Supervised Learning Machine for Neural Networks with Time Delay J. Classif. (IF 1.156) Pub Date : 2019-12-10 Chengbo Lu, Ying Mei
In this paper, an optimal weight semi-supervised learning machine for a single-hidden layer feedforward network (SLFN) with time delay is developed. Both input weights and output weights of the SLFN are globally optimized with manifold regularization. By feature mapping, input vectors can be placed at the prescribed positions in the feature space in the sense that the separability of all nonlinearly
-
Using an Iterative Reallocation Partitioning Algorithm to Verify Test Multidimensionality J. Classif. (IF 1.156) Pub Date : 2019-11-21 Douglas L. Steinley; M. J. Brusco
This article addresses the issue of assigning items to different test dimensions (e.g., determining which dimension an item belongs to) with cluster analysis. Previously, hierarchical methods have been used (Roussos et al. 1997); however, the findings here suggest that an iterative reallocation partitioning (IRP) algorithm provides interpretively similar solutions and statistically better solutions
-
Are We Underestimating Food Insecurity? Partial Identification with a Bayesian 4-Parameter IRT Model J. Classif. (IF 1.156) Pub Date : 2019-10-02 Christian A. Gregory
This paper addresses measurement error in food security in the USA. In particular, it uses a Bayesian 4-parameter IRT model to look at the likelihood of over- or under-reporting of the conditions that comprise the food security module (FSM), the data collection administered in many US surveys to assess and monitor food insecurity. While this model’s parameters are only partially identified, we learn
-
A General Framework for Dimensionality Reduction of K-Means Clustering J. Classif. (IF 1.156) Pub Date : 2019-08-23 Tong Wu, Yanni Xiao, Muhan Guo, Feiping Nie
Dimensionality reduction plays an important role in many machine learning and pattern recognition applications. Linear discriminant analysis (LDA) is the most popular supervised dimensionality reduction technique; it searches for the projection matrix that makes data points of different classes far from each other while requiring data points of the same class to be close to each other.
-
The δ-Machine: Classification Based on Distances Towards Prototypes J. Classif. (IF 1.156) Pub Date : 2019-08-22 Beibei Yuan; Willem Heiser; Mark de Rooij
We introduce the δ-machine, a statistical learning tool for classification based on (dis)similarities between profiles of the observations to profiles of a representation set consisting of prototypes. In this article, we discuss the properties of the δ-machine, propose an automatic decision rule for deciding on the number of clusters for the K-means method on the predictive perspective, and derive
-
A Short Note on Improvement of Agreement Rate J. Classif. (IF 1.156) Pub Date : 2019-08-21 Doyeob Kim, Sung-Ho Kim
Consider a rank-ordering problem, ranking a group of subjects by the conditional probability from a Bayesian network (BN) model of binary variables. The conditional probability is the probability that a subject is in a certain state given an outcome of some other variables. The classification is based on the rank order and the class levels are assigned with equal proportions. Two BN models are said
-
Erratum to: A Framework for Quantifying Qualitative Responses in Pairwise Experiments J. Classif. (IF 1.156) Pub Date : 2019-08-14 A. H. Al-Ibrahim
The original version of this article unfortunately contained a mistake in the title and in the reference Thurstone, L. L. (1927).
-
MCC: a Multiple Consensus Clustering Framework J. Classif. (IF 1.156) Pub Date : 2019-08-09 Tao Li; Yi Zhang; Dingding Wang; Jian Xu
Consensus clustering has emerged as an important extension of the classical clustering problem. Given a set of input clusterings of a given dataset, consensus clustering aims to find a single final clustering which is a better fit in some sense than the existing clusterings. There is a significant drawback in generating a single consensus clustering since different input clusterings could differ significantly
-
Unequal Priors in Linear Discriminant Analysis J. Classif. (IF 1.156) Pub Date : 2019-07-24 Carmen van Meegen, Sarah Schnackenberg, Uwe Ligges
Dealing with unequal priors in both linear discriminant analysis (LDA) based on the Gaussian distribution (GDA) and in Fisher’s linear discriminant analysis (FDA) is frequent in practice but is hardly described in any textbook or paper. This is one of the first papers to show that GDA and FDA yield the same classification results for any number of classes and features. We discuss in which
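In the one-dimensional Gaussian case with a shared variance, the effect of unequal priors is easy to see: the prior enters the discriminant score only as an additive log term, which shifts the decision boundary toward the less likely class. A minimal illustrative sketch (not the paper's general treatment):

```python
from math import log

def gda_1d(x, means, var, priors):
    """1-D Gaussian discriminant with shared variance and class priors.

    Returns the index of the class maximizing the linear discriminant score
    log(prior_k) + mean_k * x / var - mean_k**2 / (2 * var).
    """
    scores = [log(p) + m * x / var - m * m / (2 * var)
              for m, p in zip(means, priors)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With means 0 and 2 and equal priors the boundary sits at the midpoint x = 1; raising the prior of class 0 to 0.9 moves the boundary to 1 + log(9)/2 ≈ 2.1, so points just past the midpoint are still assigned to class 0.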
-
A Framework for Quantifying Qualitative Responses in Pairwise Experiments J. Classif. (IF 1.156) Pub Date : 2019-07-22 A. H. Al-Ibrahim
Suppose an experiment is conducted on pairs of objects with outcome response a continuous variable measuring the interactions among the pairs. Furthermore, assume the response variable is hard to measure numerically but we may code its values into low and high levels of interaction (and possibly a third category in between if neither label applies). In this paper, we estimate the interaction values
-
Versatile Linkage: a Family of Space-Conserving Strategies for Agglomerative Hierarchical Clustering J. Classif. (IF 1.156) Pub Date : 2019-07-16 Alberto Fernández, Sergio Gómez
Agglomerative hierarchical clustering can be implemented with several strategies that differ in the way elements of a collection are grouped together to build a hierarchy of clusters. Here we introduce versatile linkage, a new infinite system of agglomerative hierarchical clustering strategies based on generalized means, which go from single linkage to complete linkage, passing through arithmetic average
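The generalized-mean idea can be sketched directly: the distance between two clusters is a power mean of the pairwise distances between their elements, with p = 1 giving average linkage and p → ±∞ approaching complete and single linkage (a sketch of the idea, not the authors' implementation):

```python
from math import exp, log

def power_mean_linkage(dists, p):
    """Generalized-mean linkage over the pairwise distances between two clusters.

    p = 1 is average linkage; p -> +inf approaches complete linkage and
    p -> -inf approaches single linkage.
    """
    n = len(dists)
    if p == 0:  # geometric mean, the limit as p -> 0
        return exp(sum(log(d) for d in dists) / n)
    return (sum(d ** p for d in dists) / n) ** (1 / p)
```

Sweeping p therefore interpolates continuously through the whole family of space-conserving strategies, which is the paper's organizing idea.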
-
Effects of Resampling in Determining the Number of Clusters in a Data Set J. Classif. (IF 1.156) Pub Date : 2019-07-16 Rainer Dangl, Friedrich Leisch
Using cluster validation indices is a widely applied method in order to detect the number of groups in a data set and as such a crucial step in the model validation process in clustering. The study presented in this paper demonstrates how the accuracy of certain indices can be significantly improved when calculated numerous times on data sets resampled from the original data. There are obviously many
-
Cluster Validation for Mixtures of Regressions via the Total Sum of Squares Decomposition J. Classif. (IF 1.156) Pub Date : 2019-07-16 Salvatore Ingrassia; Antonio Punzo
One of the challenges in cluster analysis is the evaluation of the obtained clustering results without using auxiliary information. To this end, a common approach is to use internal validity criteria. For mixtures of linear regressions whose parameters are estimated by maximum likelihood, we propose a three-term decomposition of the total sum of squares as a starting point to define some internal validity
-
A Modified k-Means Clustering Procedure for Obtaining a Cardinality-Constrained Centroid Matrix J. Classif. (IF 1.156) Pub Date : 2019-07-16 Naoto Yamashita; Kohei Adachi
k-means clustering is a well-known procedure for classifying multivariate observations. The resulting centroid matrix of clusters by variables is noted for interpreting which variables characterize clusters. However, between-clusters differences are not always clearly captured in the centroid matrix. We address this problem by proposing a new procedure for obtaining a centroid matrix, so that it has
-
Note: t for Two (Clusters) J. Classif. (IF 1.156) Pub Date : 2019-07-11 Stanley L. Sclove
The computation for cluster analysis is usually done by iterative algorithms. Here, however, a straightforward, non-iterative procedure is presented for clustering in the special case of one variable and two groups. The method is univariate but may reasonably be applied to multivariate datasets when the first principal component or a single factor explains much of the variation in the data. The t method is motivated
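The non-iterative idea can be sketched as a scan over the cut points of the sorted data, choosing the split that maximizes the two-sample t statistic (a sketch of the idea, not the paper's exact procedure):

```python
from statistics import mean

def best_two_cluster_split(xs):
    """Split 1-D data into two groups at the cut maximizing the two-sample
    t statistic (equivalently, minimizing the within-group sum of squares)."""
    xs = sorted(xs)
    best_cut, best_t = None, float("-inf")
    for cut in range(1, len(xs)):
        left, right = xs[:cut], xs[cut:]
        n1, n2 = len(left), len(right)
        m1, m2 = mean(left), mean(right)
        ss = (sum((x - m1) ** 2 for x in left)
              + sum((x - m2) ** 2 for x in right))
        if n1 + n2 > 2 and ss > 0:
            sp2 = ss / (n1 + n2 - 2)  # pooled variance
            t = (m2 - m1) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5
        else:
            t = float("inf") if m2 != m1 else 0.0
        if t > best_t:
            best_t, best_cut = t, cut
    return xs[:best_cut], xs[best_cut:]
```

Because only n − 1 cuts of the sorted data need to be examined, the whole procedure is a single pass with no iterative refinement.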
-
An Ensemble Feature Ranking Algorithm for Clustering Analysis J. Classif. (IF 1.156) Pub Date : 2019-07-11 Jaehong Yu; Hua Zhong; Seoung Bum Kim
Feature ranking is a widely used feature selection method. It uses importance scores to evaluate features and selects those with high scores. Conventional unsupervised feature ranking methods do not consider the information on cluster structures; therefore, these methods may be unable to select the relevant features for clustering analysis. To address this limitation, we propose a feature ranking algorithm
-
Suboptimal Comparison of Partitions J. Classif. (IF 1.156) Pub Date : 2019-07-11 Jonathon J. O’Brien; Michael T. Lawson; Devin K. Schweppe; Bahjat F. Qaqish
The distinction between classification and clustering is often based on a priori knowledge of classification labels. However, in the purely theoretical situation where a data-generating model is known, the optimal solutions for clustering do not necessarily correspond to optimal solutions for classification. Exploring this divergence leads us to conclude that no standard measures of either internal
-
Where Should I Submit My Work for Publication? An Asymmetrical Classification Model to Optimize Choice J. Classif. (IF 1.156) Pub Date : 2019-07-11 A. Ferrer-Sapena; J. M. Calabuig; L. M. García Raffi; E. A. Sánchez Pérez
Choosing a journal to publish a work is a task that involves many variables. Usually, the authors’ experience allows them to classify journals into categories, according to their suitability and the characteristics of the article. However, there are certain aspects in the choice that are probabilistic in nature, whose modelling may provide some help. Suppose an author has to choose a journal from a
-
A Theoretical Analysis of the Peaking Phenomenon in Classification J. Classif. (IF 1.156) Pub Date : 2019-07-11 Amin Zollanvari; Alex Pappachen James; Reza Sameni
In this work, we analytically study the peaking phenomenon in the context of linear discriminant analysis in the multivariate Gaussian model under the assumption of a common known covariance matrix. The focus is finite-sample setting where the sample size and observation dimension are comparable. Therefore, in order to study the phenomenon in such a setting, we use an asymptotic technique whereby the
-
Adjusting Person Fit Index for Skewness in Cognitive Diagnosis Modeling J. Classif. (IF 1.156) Pub Date : 2019-07-11 Kevin Carl P. Santos; Jimmy de la Torre; Matthias von Davier
Because the validity of diagnostic information generated by cognitive diagnosis models (CDMs) depends on the appropriateness of the estimated attribute profiles, it is imperative to ensure the accurate measurement of students’ test performance by conducting person fit (PF) evaluation to avoid flawed remediation measures. The standardized log-likelihood statistic l_z has been extended to the CDM framework
-
Erratum to: Effects of Distance and Shape on the Estimation of the Piecewise Growth Mixture Model J. Classif. (IF 1.156) Pub Date : 2019-05-31 Yuan Liu, Hongyun Liu
The authors missed an important reference, “Liu, Luo, & Liu, 2014”, in the original version of this article.
-
Classification for Time Series Data. An Unsupervised Approach Based on Reduction of Dimensionality J. Classif. (IF 1.156) Pub Date : 2019-05-11 M. Isabel Landaluce-Calvo; Juan I. Modroño-Herrán
In this work we use a novel methodology for the classification of time series data, through a natural, unsupervised data learning process. This strategy is based on the sequential use of Multiple Factor Analysis and an ascending Hierarchical Classification Analysis. These two exploratory techniques complement each other and allow for a clustering of the series based on their time paths and on the reduction
-
Comparing the Utility of Different Classification Schemes for Emotive Language Analysis J. Classif. (IF 1.156) Pub Date : 2019-05-10 Lowri Williams; Michael Arribas-Ayllon; Andreas Artemiou; Irena Spasić
In this paper we investigated the utility of different classification schemes for emotive language analysis with the aim of providing experimental justification for the choice of scheme for classifying emotions in free text. We compared six schemes: (1) Ekman's six basic emotions, (2) Plutchik's wheel of emotion, (3) Watson and Tellegen's Circumplex theory of affect, (4) the Emotion Annotation Representation
-
Mixtures of Hidden Truncation Hyperbolic Factor Analyzers J. Classif. (IF 1.156) Pub Date : 2019-05-02 Paula M. Murray; Ryan P. Browne; Paul D. McNicholas
The mixture of factor analyzers model was first introduced over 20 years ago and, in the meantime, has been extended to several non-Gaussian analogs. In general, these analogs account for situations with heavy tailed and/or skewed clusters. An approach is introduced that unifies many of these approaches into one very general model: the mixture of hidden truncation hyperbolic factor analyzers (MHTHFA)
-
Distance and Consensus for Preference Relations Corresponding to Ordered Partitions J. Classif. (IF 1.156) Pub Date : 2019-04-30 Boris Mirkin; Trevor I. Fenner
Ranking is an important part of several areas of contemporary research, including social sciences, decision theory, data analysis, and information retrieval. The goal of this paper is to align developments in quantitative social sciences and decision theory with the current thought in Computer Science, including a few novel results. Specifically, we consider binary preference relations, the so-called
-
Effects of Distance and Shape on the Estimation of the Piecewise Growth Mixture Model J. Classif. (IF 1.156) Pub Date : 2019-04-30 Yuan Liu; Hongyun Liu
The piecewise growth mixture model is used in longitudinal studies to tackle non-continuous trajectories and unobserved heterogeneity in a compound way. This study investigated how factors such as latent distance and shape influence the model. Two simulation studies were used exploring the 2- and 3-class situation with sample size, latent distance (Mahalanobis distance), and shape being considered
-
Robustification of Gaussian Bayes Classifier by the Minimum β-Divergence Method J. Classif. (IF 1.156) Pub Date : 2019-04-26 Md. Matiur Rahaman; Md. Nurul Haque Mollah
The goal of classification is to classify new objects into one of several known populations. A common problem with most existing classifiers is that they are very sensitive to outliers. To overcome this problem, several authors have attempted to robustify classifiers, including Gaussian Bayes classifiers, based on robust estimation of mean vectors and covariance matrices. However, these
-
Improving a Centroid-Based Clustering by Using Suitable Centroids from Another Clustering J. Classif. (IF 1.156) Pub Date : 2019-04-24 Mohammad Rezaei
Fast centroid-based clustering algorithms such as k-means usually converge to a local optimum. In this work, we propose a method for constructing a better clustering from two such suboptimal clustering solutions, based on the fact that each suboptimal clustering typically contains some of the correct clusters. We develop the new method COTCLUS to find two centroids from one clustering
-
MDCGen: Multidimensional Dataset Generator for Clustering J. Classif. (IF 1.156) Pub Date : 2019-04-23 Félix Iglesias; Tanja Zseby; Daniel Ferreira; Arthur Zimek
We present a tool for generating multidimensional synthetic datasets for testing, evaluating, and benchmarking unsupervised classification algorithms. Our proposal fills a gap observed in previous approaches with regard to underlying distributions for the creation of multidimensional clusters. As a novelty, normal and non-normal distributions can be combined for either independently defining values
-
A Partial Mastery, Higher-Order Latent Structural Model for Polytomous Attributes in Cognitive Diagnostic Assessments J. Classif. (IF 1.156) Pub Date : 2019-04-22 Peida Zhan; Wen-Chung Wang; Xiaomin Li
The latent attribute space in cognitive diagnosis models (CDMs) is often assumed to be unstructured or saturated. In recent years, the number of latent attributes in real tests has often been found to be large, and polytomous latent attributes have been advocated. Therefore, it is preferable to adopt substantive theories to connect seemingly unrelated latent attributes, to replace the unstructured
-
A Mixture of Coalesced Generalized Hyperbolic Distributions J. Classif. (IF 1.156) Pub Date : 2019-04-22 Cristina Tortora; Brian C. Franczak; Ryan P. Browne; Paul D. McNicholas
A mixture of multiple scaled generalized hyperbolic distributions (MMSGHDs) is introduced. Then, a coalesced generalized hyperbolic distribution (CGHD) is developed by joining a generalized hyperbolic distribution with a multiple scaled generalized hyperbolic distribution. After detailing the development of the MMSGHDs, which arises via implementation of a multi-dimensional weight function, the density
Contents have been reproduced by permission of the publishers.