Infer XPath
arXiv - CS - Programming Languages, Pub Date: 2020-11-05, DOI: arxiv-2011.03538, Michał J. Gajda, Hai Nguyen Quang, Do Ngoc Khanh, and Vuong Hai Thanh
We propose reformulating the discovery of data structure within a web page as
relations between sets of document nodes. We start by reformulating web page
analysis as finding expressions in an extension of XPath. We then propose to
discover these XPath expressions automatically with the InferXPath
meta-language. Our goal is to automate the laborious conversion of manually
created web pages that serve as software documentation, wikis, and reference
documents, and to speed up their conversion into tabular data that can be fed
directly into a data pipeline.
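
To make the idea of "relations between sets of document nodes" concrete, here is a minimal sketch of the manual step that the abstract aims to automate. It is not InferXPath and does not use the paper's XPath extension. It assumes Python's standard xml.etree.ElementTree module (which supports only a subset of XPath), a hypothetical well-formed documentation fragment, and a hypothetical helper named extract_rows. Two hand-written path expressions each select a set of nodes, and pairing the two sets in document order yields table rows.

```python
import xml.etree.ElementTree as ET

# Hypothetical documentation fragment; it is well-formed XML so that
# ElementTree can parse it (real-world HTML often needs a lenient parser).
PAGE = """
<html><body>
  <dl>
    <dt>--verbose</dt><dd>Print extra diagnostics.</dd>
    <dt>--output</dt><dd>Write results to a file.</dd>
  </dl>
</body></html>
"""

def extract_rows(xhtml, key_path, value_path):
    """Pair two node sets, selected by path expressions, into table rows.

    Assumption for this sketch: the i-th key node corresponds to the
    i-th value node in document order.
    """
    root = ET.fromstring(xhtml.strip())
    keys = root.findall(key_path)      # first set of document nodes
    values = root.findall(value_path)  # second set of document nodes
    return [
        ((k.text or "").strip(), (v.text or "").strip())
        for k, v in zip(keys, values)
    ]

# The path expressions are written by hand here; discovering such
# expressions automatically is the goal stated in the abstract.
rows = extract_rows(PAGE, ".//dl/dt", ".//dl/dd")

for option, description in rows:
    print(option, "|", description)
```

Pairing by position is the simplest possible relation between the two node sets and breaks as soon as a key has no value or several values, which is one reason hand-written extraction of this kind is laborious to maintain.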
Updated: 2020-11-10