On the analytical properties of category encodings in logistic regression
Communications in Statistics - Theory and Methods (IF 0.6), Pub Date: 2021-06-21, DOI: 10.1080/03610926.2021.1939382
Guoping Zeng

Abstract

Categorical variables cannot be handled directly by logistic regression; they must first be encoded, that is, converted into continuous variables. Numerous category encodings have been proposed and used in logistic regression, but these encodings have not been studied analytically. In this paper, we study the analytical properties of eight commonly used category encodings in logistic regression: one-hot encoding, Weight of Evidence encoding, flag encoding, label encoding, ordinal encoding, count encoding, frequency encoding, and target encoding. Numerical examples are provided to demonstrate our analysis.
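To make a few of the encodings named in the abstract concrete, here is a minimal sketch on hypothetical toy data. The data, the variable names, and the Weight of Evidence sign convention (log of the non-event share over the event share) are assumptions for illustration only; the paper may define these encodings differently.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: one categorical feature x and a binary target y.
x = ["a", "a", "a", "b", "b", "b", "b", "c", "c", "c"]
y = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0]
n = len(x)

# Count encoding: each category is replaced by its number of occurrences.
counts = Counter(x)

# Frequency encoding: each category is replaced by its relative frequency.
freq = {c: k / n for c, k in counts.items()}

# Target encoding: each category is replaced by the mean of y within it.
events = defaultdict(int)
for c, t in zip(x, y):
    events[c] += t
target = {c: events[c] / counts[c] for c in counts}

# Weight of Evidence under one common convention (an assumption here):
# ln(share of all non-events in the category / share of all events in it).
# A category with zero events or zero non-events would need smoothing.
total_events = sum(y)
total_non_events = n - total_events
woe = {
    c: math.log(
        ((counts[c] - events[c]) / total_non_events)
        / (events[c] / total_events)
    )
    for c in counts
}

# One-hot encoding: each category becomes an indicator vector over the levels.
levels = sorted(counts)
one_hot = {c: [int(c == level) for level in levels] for c in levels}

for c in levels:
    print(c, counts[c], freq[c], target[c], woe[c], one_hot[c])
```

Each dictionary maps a category to the value (or vector) that would replace it as a regressor. In practice, one-hot encoding is usually fitted with one level dropped so that the indicator columns are not collinear with the intercept.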




Updated: 2021-06-21