System Design of Cloud Search Engine Based on Rich Text Content
Mobile Networks and Applications (IF 2.3), Pub Date: 2020-10-31, DOI: 10.1007/s11036-020-01676-3
Hao-peng Chan, Liang Xu, Hui-hui Liu, Run-tian Zhang, Arun Kumar Sangaiah

To improve search performance over rich text content, this paper designs a cloud search engine system based on rich text content. The system extends a traditional search engine hardware system with several hardware devices, including a Solr index server, a collector, a Chinese word segmenter, and a searcher, and adjusts the data interfaces accordingly. With this hardware and database support in place, the paper uses the open-source Apache Tika framework to extract the metadata of rich text documents, segments the text into words according to its content and semantics, and computes a weight for each keyword. Given the input search keywords, the system builds a text index, uses the BM25 algorithm to compute the similarity between the keywords and each text, and outputs the rich text search results according to that similarity. Experimental results show that the designed system achieves a high recall rate and high throughput, and that the index construction time for each data item across different files is short, which improves search efficiency and search accuracy.
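
The abstract names BM25 as the similarity measure but does not give the exact variant, parameters, or tokenization. As an illustration only, the following is a minimal sketch of one common BM25 formulation in Python. The function names `bm25_scores` and `rank`, the parameter values `k1=1.2` and `b=0.75`, the smoothed IDF form, and the toy documents are all assumptions for this example and are not taken from the paper. Documents are assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Return one BM25 score per tokenized document in `docs`.

    k1 and b are commonly used default values (an assumption here,
    not values reported in the paper).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for frequent terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(query_terms, docs, **kwargs):
    """Return document indices ordered from highest to lowest score."""
    scores = bm25_scores(query_terms, docs, **kwargs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Hypothetical pre-segmented documents, for illustration only.
docs = [
    ["cloud", "search", "engine"],
    ["rich", "text", "search", "search"],
    ["word", "segmentation"],
]
scores = bm25_scores(["search"], docs)
order = rank(["search"], docs)
```

In this toy example the second document ranks first because it contains the query term twice, and the third document scores zero because it does not contain the term at all. A production system built on Solr would rely on the engine's own scoring rather than a hand-written function like this one.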




Updated: 2020-11-02