Towards an automated repository for indexing, analysis and characterization of municipal e-government websites in Mexico
arXiv - CS - Digital Libraries. Pub Date: 2020-06-26, DOI: arxiv-2006.14746
Sergio R. Coria, Leonardo Marcos-Santiago, Christian A. Cruz-Melendez, Juan M. Jimenez-Canseco

This article addresses a problem in the electronic government discipline, with special interest in Mexico: the need for a centralized, up-to-date information source about municipal e-government websites. One reason is the lack of a complete and current database of the web domain names of municipal governments that have a website. For diverse reasons, not all Mexican municipalities have one, and a number of those that do present information about previous administrations rather than the current one. The few official lists of municipal websites are not updated frequently enough, and manually determining which municipalities have an operating and valid website at a given moment is time-consuming. In addition, website contents do not always comply with legal requirements and are considerably heterogeneous. The level of development of municipal websites is, in turn, valuable information that can be harnessed for diverse theoretical and practical purposes in the public administration field. Obtaining all of this information requires website content analysis. This article therefore investigates the need for, and the feasibility of, automating the implementation and updating of a digital repository that supports diverse analyses of these websites. Technological feasibility is addressed through a literature review on web scraping and a proposed preliminary manual methodology, which takes into account known, proven techniques and software tools for web crawling and scraping. No new crawling or scraping techniques are proposed because the existing ones satisfy current needs. Finally, software requirements are specified for automating the creation, updating, indexing, and analysis of the repository.
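The abstract does not include code, so the following is only a minimal sketch of the kind of automated check it motivates: deciding whether a candidate municipal website is operating and whether it appears to describe the current administration. Everything named here is an assumption made for illustration and does not come from the paper: the names `check_site`, `http_fetch`, and `SiteStatus`, and in particular the heuristic that a current site mentions the administration's term years (for example "2019-2021") somewhere on its home page.

```python
import re
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# A fetcher returns (HTTP status, page text); status is None if the host is unreachable.
Fetcher = Callable[[str], Tuple[Optional[int], str]]


def http_fetch(url: str, timeout: float = 10.0) -> Tuple[Optional[int], str]:
    """Fetch a page using only the standard library."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (404, 500, ...).
        return err.code, ""
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None, ""


@dataclass
class SiteStatus:
    url: str
    reachable: bool
    mentions_current_term: bool


def check_site(url: str, term_years: Tuple[int, int],
               fetch: Fetcher = http_fetch) -> SiteStatus:
    """Classify one candidate municipal website.

    Assumed heuristic (not from the paper): a site is treated as current
    if its text contains the term years, e.g. "2019-2021" or "2019 - 2021".
    """
    status, text = fetch(url)
    reachable = status is not None and 200 <= status < 400
    start, end = term_years
    # Accept a hyphen or an en dash (U+2013) between the two years.
    pattern = re.compile(rf"{start}\s*[-\u2013]\s*{end}")
    current = reachable and bool(pattern.search(text))
    return SiteStatus(url=url, reachable=reachable, mentions_current_term=current)
```

Calling `check_site("http://example.org", (2019, 2021))` would perform a real HTTP request; passing a custom `fetch` function instead makes the check usable offline, for instance against pages already stored in a repository. A real system would likely need stronger signals than a year pattern, since the abstract notes that site contents are considerably heterogeneous.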

Updated: 2020-06-29