A comparison among keyframe extraction techniques for CNN classification based on video periocular images
Multimedia Tools and Applications ( IF 3.6 ) Pub Date : 2021-01-13 , DOI: 10.1007/s11042-020-10384-9
Carolina Toledo Ferraz , William Barcellos , Osmando Pereira Junior , Tamiris Trevisan Negri Borges , Marcelo Garcia Manzato , Adilson Gonzaga , José Hiroki Saito

Training and validation sets of labeled data are important components of supervised learning used to build a classification model. During training, most learning algorithms use all images from the given training set to estimate the model's parameters. For video classification in particular, a keyframe extraction technique is required to select representative frames for training; this selection is commonly based on simple heuristics such as low-level frame differences. Because some learning algorithms are noise-sensitive, it is important to select training frames carefully so that the model's optimization proceeds more accurately and faster. In this paper we analyze four methodologies for selecting representative frames from a periocular video database: the first is based on threshold calculation (T), the second is a modified Kennard-Stone (KS) model, the third is based on the sum of absolute differences in the LUV colorspace, and the last is random sampling. To evaluate the selected image sets we use two deep-network methodologies: feature extraction (FE) and fine-tuning (FT). The results show that, with a reduced number of training images, we can achieve the same accuracy as the complete database by using the modified KS refinement methodology together with the FT evaluation method.
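To illustrate the kind of representative-sample selection the abstract refers to, the sketch below implements the classic Kennard-Stone algorithm on frame feature vectors: seed the selection with the two mutually most distant samples, then repeatedly add the sample farthest from the already-selected set. This is a generic, minimal sketch of standard Kennard-Stone, not the paper's modified KS variant, and the feature representation of each frame is an assumption.

```python
import numpy as np

def kennard_stone(X, k):
    """Select k representative rows from X (n_samples x n_features) with the
    classic Kennard-Stone algorithm: start from the two most distant samples,
    then greedily add the sample whose minimum distance to the selected set
    is largest (max-min criterion)."""
    n = X.shape[0]
    # Pairwise Euclidean distance matrix via broadcasting.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Seed: the pair of mutually farthest samples.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [r for r in range(n) if r not in selected]
    while len(selected) < k and remaining:
        # For each remaining sample, its distance to the nearest selected one.
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        # Add the remaining sample that maximizes that minimum distance.
        selected.append(remaining.pop(int(np.argmax(min_d))))
    return selected
```

In a keyframe-extraction setting, each row of `X` would be a feature vector for one video frame (for example, a color histogram or a CNN descriptor), and the returned indices would be the frames kept for training.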




Updated: 2021-01-13