WARDER: Towards Effective Spreadsheet Defect Detection by Validity-based Cell Cluster Refinements
Journal of Systems and Software (IF 3.5), Pub Date: 2020-09-01, DOI: 10.1016/j.jss.2020.110615
Yicheng Huang, Chang Xu, Yanyan Jiang, Huiyan Wang, Da Li

Abstract Spreadsheets are popular and widely used. However, they are prone to various defects, which can cause severe consequences when end users maintain them poorly. Research communities have proposed various techniques for automated detection of spreadsheet defects, but these commonly fall short in effectiveness, either because of their limited scope or because they rely on strict patterns. In this article, we discuss and improve one state-of-the-art technique, CUSTODES, which exploits spreadsheet cell clustering and defect detection to extend its scope and make its detection patterns adaptive to varying spreadsheet styles. Still, CUSTODES can be prone to problematic clustering when it accidentally involves irrelevant cells, which largely reduces detection precision. To address this, we present WARDER, which refines CUSTODES’s spreadsheet cell clustering based on three extensible validity-based properties. Experimental results show that WARDER improved the precision of spreadsheet cell clustering by 19.1%, which contributed to a precision improvement of 23.3% to 24.3% for spreadsheet defect detection compared to CUSTODES (F-measure increased from 0.71 to between 0.79 and 0.82). WARDER also exhibited satisfactory results on another practical large-scale spreadsheet corpus, VEnron2, improving the defect detection precision by 10.7% to 21.2% over CUSTODES.
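The abstract reports its results as precision and F-measure without defining them. As a rough illustration only, the sketch below uses the standard definitions and assumes the F-measure is the balanced F1 score; the paper's exact evaluation procedure is not given here. The function name `detection_metrics` and the cell addresses are hypothetical and do not come from WARDER or CUSTODES.

```python
def detection_metrics(reported, actual):
    """Return (precision, recall, f_measure) for two sets of cell addresses.

    reported: cells a detector flagged as defective (hypothetical input)
    actual:   cells that are truly defective (hypothetical ground truth)
    """
    reported, actual = set(reported), set(actual)
    true_positives = len(reported & actual)

    # Precision: fraction of flagged cells that are truly defective.
    precision = true_positives / len(reported) if reported else 0.0
    # Recall: fraction of truly defective cells that were flagged.
    recall = true_positives / len(actual) if actual else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure (F1): harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up example: four cells flagged, three truly defective, two in common.
reported = {"B2", "B3", "C7", "D4"}
actual = {"B2", "C7", "E9"}
precision, recall, f_measure = detection_metrics(reported, actual)
```

In this made-up case, flagging fewer irrelevant cells (for example, dropping B3 and D4) would raise precision without changing recall, which is the kind of gain the abstract attributes to refining the cell clusters.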

Updated: 2020-09-01