Distribution Optimization: An evolutionary algorithm to separate Gaussian mixtures.
Scientific Reports (IF 4.6), Pub Date: 2020-01-20, DOI: 10.1038/s41598-020-57432-w
Florian Lerch 1 , Alfred Ultsch 1 , Jörn Lötsch 2, 3

Finding subgroups in biomedical data is a key task in biomedical research and precision medicine. Even one-dimensional data, such as readouts from cell experiments, preclinical or human laboratory experiments, or clinical signs, often show a distribution more complex than a single mode. Gaussian mixtures play an important role in modeling multimodal distributions of one-dimensional data. However, although Gaussian mixture models (GMM) are often fitted in order to obtain the separate modes composing the mixture, current technical implementations, which often use the Expectation Maximization (EM) algorithm, are not optimized for this task. This occasionally results in poorly separated modes that are unsuitable for determining a distinguishable group structure in the data. Here, we introduce "Distribution Optimization", an evolutionary algorithm for GMM fitting that uses an adjustable error function based on chi-square statistics and the probability density. The algorithm can be directly targeted at the separation of the modes of the mixture by employing an additional criterion for the degree to which single modes overlap. The obtained GMM fits were comparable with those obtained with classical EM-based fits, except for data sets where the EM algorithm produced unsatisfactory results with overlapping Gaussian modes. There, the proposed algorithm successfully separated the modes, providing a basis for meaningful group separation while fitting the data satisfactorily. Through its optimization toward mode separation, the evolutionary algorithm proved a particularly suitable basis for group separation in multimodally distributed data, outperforming alternative EM-based methods.
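
The idea of fitting a one-dimensional GMM with an evolutionary search over a chi-square-type error plus an overlap criterion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`decode`, `error`, `fit_gmm_evolutionary`), the histogram binning, the mutation and selection scheme, the way overlap is measured (shared area of two weighted component densities relative to the smaller weight) and the additive `overlap_weight` penalty are all hypothetical choices made for this example.

```python
import numpy as np


def decode(theta, k):
    """Split a flat parameter vector into valid weights, means and SDs."""
    w = np.abs(theta[:k]) + 1e-9
    mu = theta[k:2 * k]
    sd = np.abs(theta[2 * k:]) + 1e-3
    return w / w.sum(), mu, sd


def error(theta, k, centers, counts, width, overlap_weight):
    """Chi-square-type misfit on histogram bins plus an overlap penalty."""
    w, mu, sd = decode(theta, k)
    z = (centers[None, :] - mu[:, None]) / sd[:, None]
    # Weighted component densities at the bin centers, shape (k, n_bins).
    dens = w[:, None] * np.exp(-0.5 * z ** 2) / (sd[:, None] * np.sqrt(2 * np.pi))
    # Expected counts approximated as N * bin width * mixture density.
    expected = counts.sum() * width * dens.sum(axis=0)
    # Expected counts are floored at 1 to avoid division by near-zero values.
    chi2 = np.sum((counts - expected) ** 2 / np.maximum(expected, 1.0))
    # Assumed overlap measure: largest shared area between any two components,
    # relative to the smaller of the two weights.
    overlap = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            shared = np.minimum(dens[i], dens[j]).sum() * width
            overlap = max(overlap, shared / min(w[i], w[j]))
    return chi2 + overlap_weight * overlap


def fit_gmm_evolutionary(data, k=2, bins=30, pop_size=60, generations=200,
                         overlap_weight=50.0, seed=0):
    """Fit a 1D GMM by truncation selection and Gaussian mutation."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    counts, edges = np.histogram(data, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    width = edges[1] - edges[0]
    lo, hi, spread = data.min(), data.max(), data.std()

    # Random initial population: [weights | means | standard deviations].
    pop = np.column_stack([
        rng.uniform(0.1, 1.0, (pop_size, k)),
        rng.uniform(lo, hi, (pop_size, k)),
        rng.uniform(0.1 * spread, spread, (pop_size, k)),
    ])
    scale = np.concatenate([np.full(k, 0.1),
                            np.full(k, 0.2 * spread),
                            np.full(k, 0.1 * spread)])
    n_keep = pop_size // 4

    def score(population):
        return np.array([error(ind, k, centers, counts, width, overlap_weight)
                         for ind in population])

    for g in range(generations):
        # Keep the best quarter unchanged (elitism) and refill the rest
        # with mutated copies; the mutation step shrinks over time.
        parents = pop[np.argsort(score(pop))[:n_keep]]
        step = scale * max(0.05, 1.0 - g / generations)
        picks = rng.integers(0, n_keep, size=pop_size - n_keep)
        children = parents[picks] + rng.normal(0.0, 1.0, (pop_size - n_keep, 3 * k)) * step
        pop = np.vstack([parents, children])

    best = pop[np.argmin(score(pop))]
    w, mu, sd = decode(best, k)
    order = np.argsort(mu)
    return w[order], mu[order], sd[order]


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    sample = np.concatenate([demo_rng.normal(0, 1, 1000), demo_rng.normal(6, 1, 1000)])
    print(fit_gmm_evolutionary(sample, k=2))
```

In this sketch, setting `overlap_weight` to 0 reduces the error to the chi-square-type term alone, while larger values push the search toward solutions whose modes overlap less, at some cost in goodness of fit.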

Updated: 2020-01-21