Efficient Approach of Analyzing and Generating Intrinsic Information from Weblog
National Academy Science Letters (IF 1.1), Pub Date: 2021-01-16, DOI: 10.1007/s40009-020-01042-7
Brijesh Bakariya

Weblog data are unstructured, which makes extracting the desired information from them a very challenging task. This kind of data is also growing rapidly in volume. In this paper, the weblog analysis through PySpark (WAP) algorithm is proposed to analyze the complete weblog dataset and extract useful information from it. The proposed approach uses Resilient Distributed Datasets (RDDs) to run operations across multiple nodes for parallel processing on a cluster; it is built with Apache PySpark and implemented in a Jupyter notebook. WAP extracts results such as host count, path count, and status count. The approach also generates patterns and tracks user behavior, and recommendations based on that behavior are also possible with the proposed algorithm.
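
The kind of per-key counting named in the abstract (host count, path count, status count) can be illustrated with a minimal single-machine sketch in plain Python. This is not the paper's WAP algorithm and does not use Spark. It assumes the logs follow the Apache Common Log Format, which the abstract does not specify, and the names `LOG_PATTERN`, `parse_line`, `summarize`, and `SAMPLE_LINES` are hypothetical, as are the sample log lines. On an RDD, the same counts are commonly expressed by mapping each record to a `(key, 1)` pair and combining the pairs with `reduceByKey`.

```python
import re
from collections import Counter

# Assumption: Apache Common Log Format; the abstract does not name a format.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)$'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count requests per host, per path, and per status code."""
    hosts, paths, statuses = Counter(), Counter(), Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip malformed lines instead of failing
        hosts[record["host"]] += 1
        paths[record["path"]] += 1
        statuses[record["status"]] += 1
    return {"host_count": hosts, "path_count": paths, "status_count": statuses}


# Hypothetical sample data, made up for illustration only.
SAMPLE_LINES = [
    '10.0.0.1 - - [01/Jan/2021:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [01/Jan/2021:00:00:02 +0000] "GET /about.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2021:00:00:03 +0000] "GET /missing HTTP/1.1" 404 0',
    'not a log line',
]

if __name__ == "__main__":
    for name, counter in summarize(SAMPLE_LINES).items():
        print(name, counter.most_common(3))
```

Keeping the parser separate from the counting step means the same `parse_line` function could be passed to a distributed `map` without changes, while malformed lines are filtered out rather than aborting the run.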




Updated: 2021-01-18