


EmailDetective: An Email Authorship Identification And Verification Model
The Computer Journal (IF 1.5), Pub Date: 2020-07-13, DOI: 10.1093/comjnl/bxaa059
Yong Fang, Yue Yang, Cheng Huang


Email is frequently used to commit cybercrime, so verifying the identity of an email's author is important. This paper proposes a general model for anonymous email author attribution that covers two tasks: email authorship identification and email authorship verification. Identification finds the author of an anonymous email among a set of suspected candidates; verification checks whether an email was actually written by its claimed sender. The model extracts features from both the email header and the email body to analyze the author's writing style and other behaviors. Behavioral features are extracted from email headers with a statistical algorithm, and the author's writing style in the email body is extracted with a sequence-to-sequence bidirectional long short-term memory (BiLSTM) algorithm. The model combines these factors to attribute anonymous emails. Experiments showed that the proposed model outperforms other methods in accuracy and other metrics. In the email authorship verification experiment, the average accuracy, average recall and average F1-score reached 89.9%. In the email authorship identification experiment, the model's accuracy is 98.9% for 10 authors, 92.9% for 25 authors and 89.5% for 50 authors.
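The difference between the two tasks in the abstract can be illustrated with a minimal sketch. It assumes some model has already produced a per-author score for one email; the author names, the scores and the 0.5 threshold are hypothetical and are not taken from the paper, which does not describe its decision rule here.

```python
from typing import Dict


def identify(scores: Dict[str, float]) -> str:
    """Identification: return the candidate author with the highest score."""
    return max(scores, key=scores.get)


def verify(scores: Dict[str, float], claimed_sender: str,
           threshold: float = 0.5) -> bool:
    """Verification: accept the claimed sender only if their score
    reaches the (assumed) threshold."""
    return scores.get(claimed_sender, 0.0) >= threshold


# Hypothetical scores for one anonymous email over three suspected authors.
scores = {"alice": 0.72, "bob": 0.21, "carol": 0.07}

print(identify(scores))          # most likely author among the candidates
print(verify(scores, "alice"))   # does the email match its claimed sender?
print(verify(scores, "bob"))
```

Identification is a choice among many candidates, so its accuracy tends to fall as the candidate set grows, which is consistent with the 10, 25 and 50 author results reported above. Verification is a yes/no decision about a single claimed sender.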


Updated: 2020-07-16
