A survey on semi-structured web data manipulations by non-expert users
Computer Science Review ( IF 12.9 ) Pub Date : 2021-02-03 , DOI: 10.1016/j.cosrev.2021.100367
Gilbert Tekli

Since the emergence of Web 2.0, data has spread all over the web through online and offline applications, across all application domains. Web data (semi-structured data loaded by web browsers, or exchanged by applications communicating over internet protocols such as HTTP), in particular XML-based data, is used for simple commercial information display (e.g., XHTML/HTML on commercial websites), instant messaging (e.g., XMPP for messaging in WhatsApp, Skype, Gtalk, etc.), financial transactions (e.g., CDF3 in e-commerce), medical record processing and storage (e.g., HL7 for electronic medical records), social media (e.g., XHTML/HTML on Facebook, LinkedIn, Google Plus, etc.), and more. Given its exponential growth in volume and diversity, this phenomenon has made web data manipulation (i.e., monitoring, modifying, controlling, etc.) extremely difficult for IT (information technology) experts, computer technicians, and engineers. The difficulty is compounded by the data's dynamicity, since it changes continuously around the clock, and by its heterogeneity (e.g., HTML/HTML5, XML, XHTML, RDF, OWL, etc.).

Consequently, the manipulation of web data, in particular XML data (since XML has become one of the most essential data formats used in computer communications), has shifted from computer scientists and programmers to ordinary computer users in all application domains.

This has introduced a new topic into the web data manipulation research field: web data manipulation by non-experts. In this paper, we study and analyze existing techniques for manipulating semi-structured web data, particularly XML data, from a non-expert point of view, while relating them to traditional manipulation techniques defined in the literature (e.g., filtering, adaptation, data extraction, transformation, access control, encryption, etc.). Web data manipulation techniques for non-experts are categorized under three major headings: (i) XML-oriented visual languages dealing with XML data extraction and transformation, (ii) mashups, mainly tackling XML restructuring with value manipulations, and (iii) dataflow visual programming languages targeting non-expert manipulations and providing means to visually manipulate scientific data. A full analysis was conducted, allowing existing approaches and techniques to be compared and evaluated and providing an overview of the current requirements on this subject.
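
To make three of the traditional manipulation categories concrete (data extraction, filtering, and transformation), here is a minimal Python sketch, not taken from the survey, of the kind of operator pipeline that dataflow-style visual tools let non-experts assemble graphically. The operator names (`extract`, `filter_by`, `rename`, `run_pipeline`) and the sample XML are hypothetical illustrations; only the standard-library `xml.etree.ElementTree` module is assumed.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Hypothetical operator type: maps a list of XML elements to a new list of elements.
Operator = Callable[[List[ET.Element]], List[ET.Element]]


def extract(path: str) -> Operator:
    """Data extraction: select sub-elements matching ElementTree's limited XPath syntax."""
    def op(elems: List[ET.Element]) -> List[ET.Element]:
        return [found for e in elems for found in e.findall(path)]
    return op


def filter_by(attr: str, value: str) -> Operator:
    """Filtering: keep only elements whose attribute `attr` equals `value`."""
    def op(elems: List[ET.Element]) -> List[ET.Element]:
        return [e for e in elems if e.get(attr) == value]
    return op


def rename(new_tag: str) -> Operator:
    """Transformation (restructuring): copy each element's attributes and text under a new tag."""
    def op(elems: List[ET.Element]) -> List[ET.Element]:
        out = []
        for e in elems:
            copy = ET.Element(new_tag, dict(e.attrib))
            copy.text = e.text
            out.append(copy)
        return out
    return op


def run_pipeline(root: ET.Element, operators: List[Operator]) -> List[ET.Element]:
    """Run the operators in order, feeding each one's output into the next (a linear dataflow)."""
    elems = [root]
    for op in operators:
        elems = op(elems)
    return elems


# Hypothetical sample document mixing medical and financial records.
doc = ET.fromstring(
    "<records>"
    "<record type='medical'>Patient A</record>"
    "<record type='finance'>Invoice 17</record>"
    "<record type='medical'>Patient B</record>"
    "</records>"
)

result = run_pipeline(doc, [extract("record"), filter_by("type", "medical"), rename("entry")])
print([ET.tostring(e, encoding="unicode") for e in result])
# → ['<entry type="medical">Patient A</entry>', '<entry type="medical">Patient B</entry>']
```

Because every operator shares the same list-in, list-out signature, operators can be chained in any order. A visual tool could show each operator as a box and each intermediate list as a connecting edge. This is only one simple design choice; actual visual languages and mashup tools may use richer data models.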

Updated: 2021-02-03