dSubSign: Classification of Instance-Feature Data Using Discriminative Subgraphs as Class Signatures
International Journal of Software Engineering and Knowledge Engineering (IF 0.6), Pub Date: 2021-07-23, DOI: 10.1142/s0218194021500285
Parnika N. Paranjape 1 , Meera M. Dhabu 1 , Parag S. Deshpande 1

Applications such as identifying customers from their distinctive purchase patterns require class-wise discriminative feature subsets, called class signatures, for classification. Classifiers such as KNN and SVM must work with the complete feature set, and when they are applied to such problems the full feature set may introduce classification errors. A decision tree classifier generates class-wise prominent feature subsets and can therefore be employed for such applications. However, all of these classifiers fail to model the relationships between the features present in vector data. Thus, we propose to model the features and their interrelationships as graphs. Graphs occur naturally in protein molecules, chemical compounds, etc., for which several graph classifiers exist. Multivariate data, however, do not naturally exhibit a graph structure. The proposed work therefore focuses on (1) modeling multivariate data as graphs and (2) obtaining class-wise prominent subgraph signatures, which are then used to train classifiers such as SVM for decision making. The proposed method, dSubSign, can also classify multivariate data with missing values without performing imputation or case deletion. Performance analysis on both real-world and synthetic datasets shows that the accuracy of dSubSign is either higher than or comparable to that of existing methods.
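The abstract gives no implementation details, so the following is only a minimal sketch of the two steps it names, under stated assumptions rather than the authors' algorithm. It assumes categorical features, uses (feature, value) pairs as nodes, joins every pair of nodes present in an instance, and treats single edges as the simplest possible subgraphs. The function names, the toy purchase data, and the `min_support` and `max_other` thresholds are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def instance_to_edges(row):
    """Model one instance as a graph (assumed encoding, not from the paper).

    Nodes are (feature, value) pairs; an edge joins every pair of nodes.
    A missing value (None) contributes no node, so nothing is imputed.
    """
    nodes = sorted(
        ((f, v) for f, v in row.items() if v is not None),
        key=lambda node: node[0],  # feature names are unique per row
    )
    return set(combinations(nodes, 2))


def class_signatures(rows, labels, min_support=0.6, max_other=0.2):
    """Keep edges that are frequent in one class and rare in every other."""
    graphs_by_class = {}
    for row, label in zip(rows, labels):
        graphs_by_class.setdefault(label, []).append(instance_to_edges(row))

    # Fraction of each class's graphs that contain a given edge.
    support = {}
    for label, graphs in graphs_by_class.items():
        counts = Counter(edge for graph in graphs for edge in graph)
        support[label] = {e: n / len(graphs) for e, n in counts.items()}

    signatures = {}
    for label, edge_support in support.items():
        signatures[label] = sorted(
            e for e, s in edge_support.items()
            if s >= min_support
            and all(support[o].get(e, 0.0) <= max_other
                    for o in support if o != label)
        )
    return signatures


def signature_vector(row, signatures):
    """Binary vector: one entry per signature edge, ordered by class label."""
    edges = instance_to_edges(row)
    return [int(e in edges) for c in sorted(signatures) for e in signatures[c]]


# Hypothetical toy data: two customers, one record with a missing value.
rows = [
    {"item": "tea", "time": "morning", "pay": "cash"},
    {"item": "tea", "time": "morning", "pay": None},
    {"item": "coffee", "time": "evening", "pay": "card"},
    {"item": "coffee", "time": "evening", "pay": "cash"},
]
labels = ["A", "A", "B", "B"]

sigs = class_signatures(rows, labels)
vectors = [signature_vector(r, sigs) for r in rows]
```

The resulting binary vectors could then be passed to any standard classifier, such as an SVM, in place of the raw feature set. A fuller treatment would mine multi-edge subgraphs and handle numeric features, which this sketch does not attempt.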

Updated: 2021-07-23