Improving malicious URLs detection via feature engineering: Linear and nonlinear space transformation methods
Information Systems (IF 3.7), Pub Date: 2020-01-15, DOI: 10.1016/j.is.2020.101494
Tie Li, Gang Kou, Yi Peng

In malicious URL detection, traditional classifiers are challenged because the data volume is huge, patterns change over time, and the correlations among features are complicated. Feature engineering plays an important role in addressing these problems. To better represent the underlying problem and improve the performance of classifiers in identifying malicious URLs, this paper proposes a combination of linear and nonlinear space transformation methods. For the linear transformation, a two-stage distance metric learning approach was developed: first, singular value decomposition is performed to obtain an orthogonal space, and then a linear program is solved to find an optimal distance metric. For the nonlinear transformation, we introduce the Nyström method for kernel approximation and use the revised distance metric in its radial basis function, so that the merits of both linear and nonlinear transformations can be exploited. A dataset of 331,622 URLs with 62 features was collected to validate the proposed feature engineering methods. The results showed that the proposed methods significantly improved the efficiency and performance of certain classifiers, such as k-Nearest Neighbor, Support Vector Machine, and neural networks. The malicious URL identification rate of k-Nearest Neighbor increased from 68% to 86%, that of the linear Support Vector Machine from 58% to 81%, and that of the Multi-Layer Perceptron from 63% to 82%. We also developed a website to demonstrate a malicious URL detection system that uses the methods proposed in this paper. The system can be accessed at: http://url.jspfans.com.
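The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the toy random data stands in for the 62 URL features, the weight vector `w` is a placeholder for the distance metric that the paper obtains by linear programming (that step is not reproduced here), and the function names, the number of landmarks, and `gamma` are hypothetical. The sketch only shows how an SVD rotation, a weighted distance inside a radial basis function, and a Nyström feature map could fit together.

```python
import numpy as np


def svd_orthogonal_space(X):
    """Center X and rotate it onto its right singular vectors."""
    mean = X.mean(axis=0)
    # Thin SVD: (X - mean) = U @ diag(s) @ Vt; rows of Vt are orthonormal.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return (X - mean) @ Vt.T, mean, Vt


def weighted_rbf(A, B, w, gamma):
    """RBF kernel exp(-gamma * d^2) with d^2 = sum_j w_j * (a_j - b_j)^2."""
    # Scaling by sqrt(w) turns the weighted distance into a Euclidean one.
    As, Bs = A * np.sqrt(w), B * np.sqrt(w)
    sq = (As ** 2).sum(1)[:, None] + (Bs ** 2).sum(1)[None, :] - 2.0 * As @ Bs.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystroem_features(Z, landmarks, w, gamma):
    """Map rows of Z to features whose inner products approximate the kernel."""
    K_mm = weighted_rbf(landmarks, landmarks, w, gamma)
    K_nm = weighted_rbf(Z, landmarks, w, gamma)
    vals, vecs = np.linalg.eigh(K_mm)
    keep = vals > 1e-10  # drop numerically null directions
    # F @ F.T equals K_nm @ pinv(K_mm) @ K_nm.T, the Nystroem approximation.
    return K_nm @ vecs[:, keep] / np.sqrt(vals[keep])


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))   # toy data, NOT the paper's URL features
Z, mean, Vt = svd_orthogonal_space(X)
w = np.ones(Z.shape[1])         # placeholder for the learned metric (assumption)
landmarks = Z[rng.choice(len(Z), size=30, replace=False)]
F = nystroem_features(Z, landmarks, w, gamma=0.1)
```

The rows of `F` could then be passed to a linear classifier in place of the raw features. With `w` set to all ones the kernel reduces to an ordinary RBF kernel, so any benefit from the revised metric would have to come from weights learned on real data.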




Updated: 2020-01-15