ECHR-DB: On building an integrated open repository of legal documents for machine learning applications
Information Systems ( IF 3.0 ) Pub Date : 2021-06-07 , DOI: 10.1016/j.is.2021.101822
Alexandre Quemy , Robert Wrembel

This paper presents an exhaustive and unified repository of judgment documents from the European Court of Human Rights, called ECHR-DB. The need for such a repository is explained from the perspectives of the researcher, the data scientist, the citizen, and the legal practitioner. Contrary to many open data repositories, the full creation process of ECHR-DB, from the collection of raw data to the feature transformation, is provided as a collection of fully automated, open-source scripts. This ensures reproducibility and a high level of confidence in the processed data, which is one of the most important issues in data governance today. An experimental evaluation was performed to study the problem of predicting the outcome of a case and to establish baseline results for popular machine learning algorithms. The obtained results are consistently good across the binary datasets, with accuracy between 75.86% and 98.32% and an average accuracy of 96.45%, which is 14 percentage points higher than the best known result with similar methods. We achieved an F1-score of 82%, in line with recent results using BERT. We show that in a multilabel setting, the features available prior to a judgment are good predictors of the outcome, opening the road to practical applications.
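To illustrate the kind of case-outcome prediction baseline the abstract describes, the sketch below trains a toy binary classifier on bag-of-words features of short case texts. Everything here is illustrative: the snippets, labels, and nearest-centroid classifier are hypothetical stand-ins, not the paper's actual ECHR-DB data or pipeline.

```python
# Minimal sketch of a binary outcome-prediction baseline over textual
# case features (bag-of-words + nearest-centroid). Toy data only.
from collections import Counter
import math

def bag_of_words(text):
    """Tokenize on whitespace and count term frequencies."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict(train, case_text):
    """Pick the label whose training documents are, on average,
    most similar to the new case text."""
    vec = bag_of_words(case_text)
    scores = {}
    for label, texts in train.items():
        sims = [cosine(vec, bag_of_words(t)) for t in texts]
        scores[label] = sum(sims) / len(sims)
    return max(scores, key=scores.get)

# Hypothetical training snippets labeled with a binary case outcome.
train = {
    "violation": [
        "applicant detained without judicial review violation of article 5",
        "excessive length of proceedings violation found",
    ],
    "no-violation": [
        "complaint manifestly ill-founded no violation of the convention",
        "domestic remedies effective no violation found",
    ],
}

print(predict(train, "applicant held in detention without review"))  # → violation
```

A real baseline of the kind the paper evaluates would replace the toy snippets with the repository's extracted features and the nearest-centroid rule with a standard learner, but the structure — features available before judgment mapped to a binary outcome — is the same.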




Updated: 2021-06-07