POCASUM: policy categorizer and summarizer based on text mining and machine learning


Soft Computing (IF 3.1), Pub Date: 2021-06-11, DOI: 10.1007/s00500-021-05916-w
Rushikesh Deotale₁, Shreyash Rawat₁, V Vijayarajan₁, V B Surya Prasath₂


Having control over your data is a right and a duty of every citizen in our digital society. Users often skip the entire policies of applications or websites to save time and energy, without realizing the potential sticking points in these policies. Because of obscure language and verbose explanations, the majority of hypermedia users do not bother to read them. Further, digital media companies sometimes do not spend enough effort on stating their policies clearly, and the policies can often be incomplete. A summarized version of these privacy policies, categorized into useful information, can help users. To solve this problem, in this work we propose machine learning-based models for a policy categorizer that classifies policy paragraphs under proposed attributes such as security and contact. By benchmarking different machine learning-based classifier models, we show that an artificial neural network model achieves higher accuracy on a challenging dataset of textual privacy policies. We thus show that machine learning can help summarize the relevant paragraphs under the various attributes so that the user can get the gist of each topic within a few lines.
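The two steps the abstract describes, assigning each policy paragraph to an attribute and then condensing it, can be illustrated with a small sketch. This is not the authors' implementation: the abstract reports an artificial neural network, whereas the sketch below substitutes a plain multinomial Naive Bayes classifier and a word-frequency extractive summarizer so that it runs with the Python standard library only. The training sentences, the two labels, and all function names are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training data; the paper's dataset is not reproduced here.
TRAIN = [
    ("We use encryption and firewalls to protect your data from unauthorized access.", "security"),
    ("Your data is stored on secure servers protected by encryption.", "security"),
    ("If you have questions, contact us by email or write to our office address.", "contact"),
    ("You can contact our support team by email or phone.", "contact"),
]


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per label for a multinomial Naive Bayes model
    (a simple stand-in for the neural network mentioned in the abstract)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(paragraph, model):
    """Return the label with the highest log-probability under the model."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word in tokenize(paragraph):
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids log(0).
                score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


def summarize(paragraph, max_sentences=1):
    """Keep the sentences whose words are most frequent in the paragraph,
    returned in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
    freq = Counter(tokenize(paragraph))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / max(len(words), 1)

    top = set(sorted(sentences, key=score, reverse=True)[:max_sentences])
    return " ".join(s for s in sentences if s in top)


model = train(TRAIN)
print(classify("Please email our privacy office with any questions.", model))
print(summarize("We collect data. We use encryption to protect the data we collect."))
```

In a fuller pipeline, each paragraph of a policy would first be passed to `classify`, and the paragraphs grouped under one attribute would then be passed to `summarize`, giving the reader a few lines per topic.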


Updated: 2021-06-11
