Adaptation of Autoencoder for Sparsity Reduction From Clinical Notes Representation Learning
arXiv - EE - Signal Processing. Pub Date: 2022-09-26, DOI: arxiv-2209.12831
Thanh-Dung Le, Rita Noumeir, Jerome Rambaud, Guillaume Sans, Philippe Jouvet

When dealing with clinical text classification on a small dataset, recent studies have confirmed that a well-tuned multilayer perceptron outperforms other generative classifiers, including deep learning ones. To increase the performance of the neural network classifier, feature selection can be applied effectively to the learning representation. However, most feature selection methods only estimate the degree of linear dependency between variables and select the best features based on univariate statistical tests. Furthermore, they ignore the sparsity of the feature space involved in the learning representation. Goal: Our aim is therefore to assess an alternative approach that tackles sparsity by compressing the clinical representation feature space, so that a limited set of French clinical notes can also be handled effectively. Methods: This study proposes an autoencoder learning algorithm that exploits sparsity reduction in the clinical note representation. The motivation was to determine how to compress sparse, high-dimensional data by reducing the dimension of the clinical note representation feature space. The classification performance of the classifiers was then evaluated in the trained and compressed feature space. Results: The proposed approach provided overall performance gains of up to 3% for each evaluation. Finally, the classifier achieved 92% accuracy, 91% recall, 91% precision, and a 91% F1-score in detecting the patient's condition. Furthermore, the compression mechanism and the autoencoder prediction process were demonstrated by applying the information bottleneck framework from information theory.
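
The compression step described under Methods can be illustrated with a minimal sketch. The abstract does not specify the architecture, loss, or training setup, so everything below is an assumption made for illustration: a purely linear autoencoder with no biases, a mean squared reconstruction loss, full-batch gradient descent, and a tiny hand-made binary term-presence matrix standing in for the sparse clinical note features. All names (`train_autoencoder`, `W_enc`, `W_dec`, `codes`) are hypothetical and do not come from the paper.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def mse(a, b):
    """Mean squared error over all entries of two equally shaped matrices."""
    count = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / count

def step(w, grad, lr):
    """One gradient-descent update: w - lr * grad, entry by entry."""
    return [[wi - lr * gi for wi, gi in zip(wr, gr)] for wr, gr in zip(w, grad)]

def train_autoencoder(X, code_dim, epochs=1000, lr=0.5, seed=0):
    """Fit a linear autoencoder X -> X @ W_enc -> (X @ W_enc) @ W_dec.

    Assumption: linear layers without biases; the paper's model may differ.
    Returns the encoder weights, decoder weights, and per-epoch losses.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    W_enc = [[rng.uniform(-0.1, 0.1) for _ in range(code_dim)] for _ in range(d)]
    W_dec = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(code_dim)]
    losses = []
    for _ in range(epochs):
        Z = matmul(X, W_enc)          # compressed codes, shape n x code_dim
        X_hat = matmul(Z, W_dec)      # reconstruction, shape n x d
        losses.append(mse(X_hat, X))
        # Gradient of the mean squared error with respect to X_hat.
        G = [[2.0 * (xh - x) / (n * d) for xh, x in zip(rh, r)]
             for rh, r in zip(X_hat, X)]
        grad_dec = matmul(transpose(Z), G)
        grad_enc = matmul(transpose(X), matmul(G, transpose(W_dec)))
        W_enc = step(W_enc, grad_enc, lr)
        W_dec = step(W_dec, grad_dec, lr)
    return W_enc, W_dec, losses

# Toy stand-in for a sparse note-by-term matrix (6 notes, 8 terms); not real data.
X = [
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]

W_enc, W_dec, losses = train_autoencoder(X, code_dim=2)
codes = matmul(X, W_enc)  # dense 2-dimensional representation of each note
```

In a pipeline along the lines the abstract describes, the dense `codes` would replace the sparse input as features for the downstream classifier. A nonlinear encoder, regularization, and a held-out evaluation would be natural next steps, but they are outside what this sketch shows.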

Updated: 2022-09-27