Analysis of Enterprise Social Media Intelligence Acquisition Based on Data Crawler Technology
Entrepreneurship Research Journal (IF 2.0), Pub Date: 2021-04-01, DOI: 10.1515/erj-2020-0267
Lehe Yu, Zhengxiu Gui

Social media platforms typically contain hundreds of millions of nodes, linked into a vast social network through follow and fan relationships, and news spreads through this network. This paper studies techniques for acquiring social media topic data and enterprise data. It introduces a topic-locating technique based on Sina meta search and topic-related keywords, and analyzes the crawling efficiency of topic crawlers. Because webpage structures on the Internet are diverse and variable, the paper studies the general regularities in webpage structure and proposes a new Web information extraction algorithm that combines the DOM (Document Object Model) tree with the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. The main stages of the algorithm are described in detail: Web page processing, DOM tree construction, segmented text content acquisition, and DBSCAN-based Web content extraction. The simulation results show that, with information extracted via the DOM tree and DBSCAN, collaborative strategies spanning intelligence culture, intelligence system, technology platform, and intelligence organization ecology can raise the level of intelligence participation among all employees. There is a significant positive correlation between employees' level of participation and the level of their intelligence environment. According to the results, the proposed DOM tree and DBSCAN extraction method can extract an enterprise's employee intelligence and support the effective implementation of the related collaborative strategies, which provides guidance for putting all-employee intelligence into practice.
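The extraction pipeline named in the abstract (page processing, DOM tree traversal, text segment acquisition, DBSCAN-based extraction) can be illustrated with a minimal Python sketch. The abstract does not state which features, distance measure, or parameters the authors use, so everything below is an assumption for illustration: each text segment is described by its DOM depth and the base-2 logarithm of its length, the values `eps=1.5` and `min_pts=2` are arbitrary, and the names `SegmentCollector`, `dbscan`, `extract_main_content`, and `SAMPLE_HTML` are hypothetical. DBSCAN is written out in plain Python here rather than taken from a library.

```python
import math
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style"}                      # text here is never content
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # no closing tag


class SegmentCollector(HTMLParser):
    """Walk the page and record each text segment with its DOM depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.skip = 0
        self.segments = []  # list of (depth, text)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.depth += 1
        if tag in SKIP_TAGS:
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self.depth -= 1
        if tag in SKIP_TAGS:
            self.skip -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and not self.skip:
            self.segments.append((self.depth, text))


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


def extract_main_content(html, eps=1.5, min_pts=2):
    """Return the text of the cluster that holds the most characters."""
    parser = SegmentCollector()
    parser.feed(html)
    segments = parser.segments
    # Assumed feature space: (DOM depth, log2 of text length).
    points = [(float(depth), math.log2(len(text))) for depth, text in segments]
    labels = dbscan(points, eps, min_pts)
    totals = {}
    for label, (_, text) in zip(labels, segments):
        if label != -1:
            totals[label] = totals.get(label, 0) + len(text)
    if not totals:
        return ""
    best = max(totals, key=totals.get)
    return "\n".join(t for lab, (_, t) in zip(labels, segments) if lab == best)


SAMPLE_HTML = """
<html><body>
<div id="nav"><ul><li><a href="/">Home</a></li><li><a href="/n">News</a></li>
<li><a href="/a">About</a></li></ul></div>
<div id="main">
<p>Social media platforms connect hundreds of millions of users in one network.</p>
<p>A topic crawler collects the posts that match a set of topic keywords.</p>
<p>Extracted text blocks are then grouped so that the main content stands out.</p>
</div>
<div id="footer"><span>Copyright</span></div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_main_content(SAMPLE_HTML))
```

In this sketch the short navigation links form one dense cluster, the long paragraphs form another, and the isolated footer text is labelled noise. Keeping the cluster with the most characters is one simple heuristic for picking the main content, not necessarily the selection rule used in the paper.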

Updated: 2021-04-29