Detecting child sexual abuse images: Traits of child sexual exploitation hosting and displaying websites
Child Abuse & Neglect (IF 3.4), Pub Date: 2021-09-21, DOI: 10.1016/j.chiabu.2021.105336
Enrique Guerra, Bryce G Westlake

Background

Automated detection of child sexual abuse images (CSAI) often relies on image attributes, such as hash values. However, electronic service providers and others without access to hash value databases are limited in their ability to detect CSAI. Additionally, the increasing amount of CSA content being distributed means that a large percentage of images are not yet cataloged in hash value databases. Therefore, additional detection criteria need to be determined to improve identification of non-hashed CSAI.

Objective

We aim to identify patterns in the locations and folder/file naming practices of websites hosting and displaying CSAI, to use as additional detection criteria for non-hashed CSAI.

Methods

Using a custom-designed web crawler and snowball sampling, we analyzed the locations and naming practices of 103 Surface Web websites hosting and/or displaying 8108 known CSAI hash values.

Results

Websites specialize in either hosting or displaying CSAI, with only 20% doing both. Neither hosting nor displaying websites appear to fear repercussions. Over 27% of CSAI were displayed in the home directory (i.e., main page), with only 6% located in a fourth-level or deeper sub-folder. Websites focused more on organizing images than on hiding them, with 68% of hosted and 54% of displayed CSAI found in folders formatted year/month. Qualitatively, hosting websites were likely to use alphanumeric or disguised folder and file names to conceal images, while displaying websites were more explicit.

Conclusion

File and folder naming patterns can be combined with existing criteria to improve automated detection of websites and website locations likely hosting and/or displaying CSAI.
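As a rough illustration of the location criteria summarized above (folder depth, home-directory placement, and year/month folder formatting), the following minimal sketch derives those three features from a URL path. It is not the authors' crawler: the function name `path_features`, the feature names, the year/month regular expression, and the rule that a final segment containing a dot is a file name are all assumptions made for this example, and the URLs are placeholders.

```python
import re
from urllib.parse import urlparse

# Assumed pattern: a four-digit year folder followed directly by a
# one- or two-digit month folder (e.g. /2019/07/). Hypothetical, not from the paper.
YEAR_MONTH = re.compile(r"/(?:19|20)\d{2}/(?:0?[1-9]|1[0-2])(?:/|$)")


def path_features(url: str) -> dict:
    """Return simple location features for a URL path (illustrative only)."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    # Assumption: the last segment is a file name when it contains a dot.
    has_file = bool(segments) and "." in segments[-1]
    folders = segments[:-1] if has_file else segments
    folder_path = "/" + "/".join(folders)
    return {
        "folder_depth": len(folders),            # 0 means the home directory
        "in_home_directory": len(folders) == 0,
        "year_month_folder": bool(YEAR_MONTH.search(folder_path)),
    }


if __name__ == "__main__":
    print(path_features("https://example.org/uploads/2019/07/photo.jpg"))
    print(path_features("https://example.org/index.html"))
```

Features like these would only be one input among others; as the conclusion notes, naming and location patterns are meant to be combined with existing criteria such as hash matching rather than used alone.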



Updated: 2021-09-21