Counting Bites and Recognizing Consumed Food from Videos for Passive Dietary Monitoring
IEEE Journal of Biomedical and Health Informatics (IF 6.7) Pub Date: 2020-09-08, DOI: 10.1109/jbhi.2020.3022815
Jianing Qiu, Frank Po Wen Lo, Shuo Jiang, Charlie Tsai, Yingnan Sun, Benny Lo

Dietary intake assessment in epidemiological studies is predominantly based on self-reports, which are subjective, inefficient, and prone to error. Technological approaches are therefore emerging to provide objective dietary assessments. Using only egocentric dietary intake videos, this work aims to provide an accurate estimate of individual dietary intake by recognizing consumed food items and counting the number of bites taken. This differs from previous studies that rely on inertial sensing to count bites, and from those that only recognize visible food items rather than consumed ones. As a subject may not consume all food items visible in a meal, recognizing the consumed food items is more valuable. A new dataset of 1,022 dietary intake video clips was constructed to validate our concept of bite counting and consumed food item recognition from egocentric videos. Twelve subjects participated and 52 meals were captured. A total of 66 unique food items, including food ingredients and drinks, were labelled in the dataset, along with a total of 2,039 labelled bites. Deep neural networks were used to perform bite counting and food item recognition in an end-to-end manner. Experiments have shown that counting bites directly from video clips can reach 74.15% top-1 accuracy (classifying between 0-4 bites in 20-second clips) and an MSE of 0.312 (when using regression). Our experiments on video-based food recognition also show that recognizing consumed food items is indeed harder than recognizing visible ones, with a drop of 25% in F1 score.
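The paper does not publish its implementation, but a minimal sketch can illustrate how clip-level bite counting might be framed as a 5-way classification problem (0-4 bites per 20-second clip), as reported in the abstract. The small 3D-CNN below is an assumption for illustration only; the authors' actual architecture, input resolution, and frame sampling are not specified here.

```python
# Illustrative sketch only, not the authors' code: a tiny 3D CNN that maps an
# egocentric video clip to a bite-count class. All layer sizes and shapes are
# assumptions made for this example.
import torch
import torch.nn as nn

NUM_BITE_CLASSES = 5  # 0, 1, 2, 3, or 4 bites per 20-second clip (per the abstract)

class BiteCountClassifier(nn.Module):
    """Minimal clip-level classifier for bite counting."""
    def __init__(self, num_classes: int = NUM_BITE_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),   # input: (batch, 3, frames, H, W)
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),                       # global spatio-temporal pooling
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        x = self.features(clip).flatten(1)
        return self.classifier(x)

if __name__ == "__main__":
    model = BiteCountClassifier()
    dummy_clip = torch.randn(2, 3, 16, 112, 112)  # 2 clips, 16 sampled frames each
    logits = model(dummy_clip)
    print(logits.shape)  # torch.Size([2, 5]) -> one score per bite-count class
    # The regression variant reported in the abstract (MSE 0.312) would instead
    # end in nn.Linear(32, 1) and be trained with nn.MSELoss().
```

Consumed-food recognition would be a separate multi-label head over the 66 food item classes, evaluated with F1 score as in the abstract; its details are likewise not reproduced here.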

Updated: 2020-09-08