Multi-modal estimation of the properties of containers and their content: survey and evaluation
arXiv - CS - Multimedia, Pub Date: 2021-07-27, DOI: arxiv-2107.12719
Alessio Xompero, Santiago Donaher, Vladimir Iashin, Francesca Palermo, Gökhan Solak, Claudio Coppola, Reina Ishikawa, Yuichi Nagao, Ryo Hachiuma, Qi Liu, Fan Feng, Chuanlin Lan, Rosa H. M. Chan, Guilherme Christmann, Jyun-Ting Song, Gonuguntla Neeharika, Chinnakotla Krishna Teja Reddy, Dinesh Jain, Bakhtawar Ur Rehman, Andrea Cavallaro

Acoustic and visual sensing can support the contactless estimation of the weight of a container and the amount of its content when the container is manipulated by a person. However, transparencies (both of the container and of the content) and the variability of materials, shapes and sizes make this problem challenging. In this paper, we present an open benchmarking framework and an in-depth comparative analysis of recent methods that estimate the capacity of a container, as well as the type, mass, and amount of its content. These methods use learned and handcrafted features, such as mel-frequency cepstrum coefficients, zero-crossing rate, and spectrograms, with different types of classifiers to estimate the type and amount of the content from acoustic data, and geometric approaches with visual data to determine the capacity of the container. Results on a newly distributed dataset show that audio alone is a strong modality: methods achieve weighted average F1-scores of up to 81% and 97% for content type and level classification, respectively. Estimating the container capacity with vision-only approaches and the filling mass with multi-modal, multi-stage algorithms reaches up to 65% weighted average capacity and mass scores.
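
To make two of the terms in the abstract concrete, the following is a minimal sketch of a zero-crossing rate feature and a support-weighted average F1-score. It is an illustration only, not the benchmark's evaluation code: the function names `zero_crossing_rate` and `weighted_f1` and the example labels are assumptions, and the paper's exact score definitions may differ.

```python
from collections import Counter


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (illustrative helper)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)  # number of true samples per class
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Hypothetical content-type labels, chosen only for illustration.
y_true = ["empty", "pasta", "pasta", "rice"]
y_pred = ["empty", "pasta", "rice", "rice"]
score = weighted_f1(y_true, y_pred)
```

Weighting each class's F1 by its support keeps frequent classes from being under-represented in the average, which matters when content types or filling levels are unevenly distributed in a dataset.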

Updated: 2021-07-28