Computers & Electrical Engineering (IF 4.0), Pub Date: 2021-08-21, DOI: 10.1016/j.compeleceng.2021.107379
Nasir Ali Khan, Abid Khan, Mansoor Ahmad, Munam Ali Shah, Gwanggil Jeon
Future-generation networking technologies such as 5G and 6G will provide tremendous performance, network capacity, quality of service and connectivity. The convergence of these technologies with big data analytics in today's smart ecosystem will therefore open up significant opportunities. Existing URL filtering techniques do not filter in real time, and they lack fault tolerance and scalability. We address these issues with a real-time, fault-tolerant and scalable machine-learning-based binary classification model that handles streams of URL traffic and classifies each URL as obscene or clean. Using only URL-based features for classification, we still achieve a good accuracy of 93% with a logistic regression classifier, alongside a second reported figure of 88%. The model can filter 2 million URLs in 55 seconds (roughly 36,000 URLs per second). It achieves precision, recall and F1-score values of 0.92, 0.95 and 0.93, respectively.
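As a quick sanity check on the reported figures, the sketch below recomputes the F1-score from the stated precision and recall, and converts the stated filtering time into a per-second rate. It assumes the standard definition of F1 as the harmonic mean of precision and recall. The function names `f1_score` and `throughput` are illustrative helpers and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def throughput(n_items: int, seconds: float) -> float:
    """Average number of items processed per second."""
    return n_items / seconds


# Figures taken from the abstract: precision 0.92, recall 0.95,
# and 2 million URLs filtered in 55 seconds.
f1 = f1_score(0.92, 0.95)
rate = throughput(2_000_000, 55)

print(f"F1 = {f1:.2f}")
print(f"Throughput = {rate:,.0f} URLs/s")
```

Under these assumptions the recomputed F1 rounds to the reported 0.93, and the stated filtering time corresponds to an average of about 36,000 URLs per second.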
Title (translated from Chinese): URL filtering using big data analytics in 5G networks