Leveraging Hand-Object Interactions in Assistive Egocentric Vision.
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6). Pub Date: 2023-05-05. DOI: 10.1109/tpami.2021.3123303
Kyungjun Lee, Abhinav Shrivastava, Hernisa Kacorri

Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for blind people. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to the challenges of aiming a camera without visual feedback. Moreover, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users tend to include their hand in the frame, either interacting with the object they wish to recognize or simply holding it in proximity for better camera aiming. We propose a method that leverages the hand as contextual information for recognizing an object of interest. In our method, the output of a pre-trained hand segmentation model is infused into later convolutional layers of our object recognition network, which has separate output layers for localization and classification. Using egocentric datasets from sighted and blind individuals, we show that hand-priming achieves more accurate localization than other approaches that encode hand information. Given only object centers along with labels, our method achieves comparable classification performance to the state-of-the-art method that uses bounding boxes with labels.
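
Below is a minimal PyTorch-style sketch of the hand-priming idea as the abstract describes it. It is an illustration under stated assumptions, not the authors' implementation: the backbone layers, the concatenation-based fusion, the output head shapes, and all names (HandPrimedRecognizer, loc_head, cls_head) are hypothetical, and the paper's actual fusion layer and operations may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HandPrimedRecognizer(nn.Module):
    # Sketch: a hand-segmentation map from a pre-trained segmenter is
    # downsampled and infused (here: channel concatenation) into a later
    # convolutional stage, followed by separate localization and
    # classification output layers.
    def __init__(self, num_classes: int):
        super().__init__()
        # Early backbone stages (placeholder convolutional stack).
        self.early = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Later stage consumes image features plus a 1-channel hand mask.
        self.late = nn.Sequential(
            nn.Conv2d(128 + 1, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.loc_head = nn.Linear(256, 2)            # object center (x, y)
        self.cls_head = nn.Linear(256, num_classes)  # object label

    def forward(self, image: torch.Tensor, hand_mask: torch.Tensor):
        feats = self.early(image)
        # Resize the hand mask to the feature resolution and fuse it.
        mask = F.interpolate(hand_mask, size=feats.shape[-2:])
        fused = torch.cat([feats, mask], dim=1)
        pooled = self.late(fused).flatten(1)
        return self.loc_head(pooled), self.cls_head(pooled)

# Usage: an image batch plus a hand probability map standing in for the
# output of the pre-trained hand segmentation model.
model = HandPrimedRecognizer(num_classes=20)
image = torch.randn(2, 3, 224, 224)
hand_mask = torch.rand(2, 1, 224, 224)
center, logits = model(image, hand_mask)

Fusing the mask at a later stage, rather than at the input, lets generic recognition features form before they are primed by hand location; training such a model would only require object centers with labels, matching the supervision described above.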

Updated: 2021-10-27