Infer XPath
arXiv - CS - Programming Languages. Pub Date: 2020-11-05, DOI: arxiv-2011.03538
Michał J. Gajda, Hai Nguyen Quang, Do Ngoc Khanh, and Vuong Hai Thanh

We propose reformulating the discovery of data structure within a web page as relations between sets of document nodes. We start by reformulating web page analysis as finding expressions in an extension of XPath. We then propose discovering these XPath expressions automatically with the InferXPath meta-language. Our goal is to automate and speed up the laborious conversion of manually created web pages that serve as software documentation, wikis, and reference documents into tabular data that can be fed directly into a data pipeline.
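
To make the idea of "relations between sets of document nodes" concrete, here is a minimal sketch in Python. It is not InferXPath and does not infer anything: the two XPath expressions are written by hand, the page fragment is hypothetical, and only the XPath subset supported by the standard library's `xml.etree.ElementTree` is used (not the extension of XPath described in the abstract). One expression selects the set of record nodes, and a second, relative expression relates each record to its field nodes, which yields rows of a table.

```python
import xml.etree.ElementTree as ET

# Hypothetical documentation page fragment (assumed well-formed XHTML,
# because the standard-library parser only accepts well-formed XML).
PAGE = """
<html><body>
  <table id="opts">
    <tr><th>Flag</th><th>Meaning</th></tr>
    <tr><td>-v</td><td>verbose output</td></tr>
    <tr><td>-q</td><td>quiet mode</td></tr>
  </table>
</body></html>
"""

# Hand-written expressions; an inference tool would have to discover these.
ROW_PATH = ".//table[@id='opts']/tr"  # set of record nodes
CELL_PATH = "./td"                    # field nodes, relative to each record


def extract_rows(page):
    """Return one list of cell strings per record that has <td> cells."""
    root = ET.fromstring(page.strip())
    rows = []
    for record in root.findall(ROW_PATH):
        cells = [(cell.text or "").strip() for cell in record.findall(CELL_PATH)]
        if cells:  # the header row contains only <th>, so it is skipped
            rows.append(cells)
    return rows


table = extract_rows(PAGE)
for row in table:
    print("\t".join(row))
```

The pairing of an absolute record path with a relative field path is only one possible way to express such a relation; it is chosen here because it maps directly onto rows and columns.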

Updated: 2020-11-10