Figure search by text in large scale digital document collections, Concurrency and Computation: Practice and Experience

Concurrency and Computation: Practice and Experience (IF 2), Pub Date: 2021-07-24, DOI: 10.1002/cpe.6529
M. Mücahit Enes Yurtsever₁, Muhammet Özcan₂, Zübeyir Taruz₂, Süleyman Eken₁, Ahmet Sayar₂

Digital document collections have been created by transferring large numbers of documents to digital media, and these digital archives have brought many benefits to users. As the diversity and size of digital image collections have grown exponentially, obtaining the desired image from them has become both more important and more difficult. The images in a document may contain critical information about its subject. In this study, an architecture is developed that can work on large-scale data by combining regular expressions with full-text search approaches. The performance of the system has been tested on different academic documents, and the insert times of Elasticsearch and Apache Solr are compared. Compared to Elasticsearch, Apache Solr achieved faster and more successful results.
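The abstract does not give the regular expressions or index layout used in the study, so the following is only a minimal sketch of the general idea: pull figure captions out of extracted document text with a regular expression, then make them searchable by text. The caption pattern, the function names, and the small in-memory inverted index are all assumptions made for illustration; in the architecture described above, the indexing and search role is played by Elasticsearch or Apache Solr.

```python
import re
from collections import defaultdict

# Hypothetical caption pattern (not taken from the paper): a line that starts
# with "Figure N" or "Fig. N", followed by ".", ":" or "-" and the caption text.
CAPTION_RE = re.compile(
    r"^\s*(?:Figure|Fig\.?)\s+(\d+)\s*[.:\-]\s*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_captions(doc_id, text):
    """Return (doc_id, figure_number, caption) tuples found in extracted text."""
    return [
        (doc_id, int(m.group(1)), m.group(2).strip())
        for m in CAPTION_RE.finditer(text)
    ]


def tokenize(s):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", s.lower())


def build_index(captions):
    """Map each token to the positions of the captions that contain it."""
    index = defaultdict(set)
    for i, (_, _, caption) in enumerate(captions):
        for tok in tokenize(caption):
            index[tok].add(i)
    return index


def search(index, captions, query):
    """Return the captions that contain every token of the query."""
    toks = tokenize(query)
    if not toks:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in toks))
    return [captions[i] for i in sorted(hits)]
```

A query such as `search(index, captions, "insert time")` returns the caption records whose text contains both words. Only lines that begin with a figure label are treated as captions, so in-text mentions like "as shown in Figure 1" are skipped. A real deployment would send each caption record to the search engine instead of keeping it in a Python dictionary.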

Updated: 2021-07-24
