Interpretability in Contact-Rich Manipulation via Kinodynamic Images,arXiv - CS - Robotics



arXiv - CS - Robotics, Pub Date: 2021-02-23, DOI: arxiv-2102.11825
Ioanna Mitsioni, Joonatan Mänttäri, Yiannis Karayiannidis, John Folkesson, Danica Kragic

Deep Neural Networks (NNs) have been widely used in contact-rich manipulation tasks to model complicated contact dynamics. However, NN-based models are often difficult to decipher, which can lead to seemingly inexplicable behaviors and unidentifiable failure cases. In this work, we address the interpretability of NN-based models by introducing kinodynamic images. We propose a methodology that creates images from the kinematic and dynamic data of a contact-rich manipulation task. Our formulation visually reflects the task's state by encoding its kinodynamic variations and temporal evolution. By using images as the state representation, we enable the application of interpretability modules that were previously limited to vision-based tasks. We use this representation to train convolution-based networks, and we extract interpretations of the model's decisions with Grad-CAM, a technique that produces visual explanations. Our method is versatile and can be applied to any classification problem in manipulation that uses synchronous features, to visually interpret which parts of the input drive the model's decisions and to distinguish its failure modes. We evaluate this approach on two examples of real-world contact-rich manipulation, pushing and cutting, with known and unknown objects. Finally, we demonstrate that our method enables both detailed visual inspections of sequences in a task and high-level evaluations of a model's behavior and tendencies. Data and code for this work are available at https://github.com/imitsioni/interpretable_manipulation.
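The abstract does not spell out how the images are built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a window of synchronous features (for example positions and forces) is laid out with one row per feature and one column per timestep, and that each row is min-max scaled to 0..255; both choices and the function names `to_kinodynamic_image` and `grad_cam` are hypothetical. The second function applies the standard Grad-CAM weighting (channel weights from spatially averaged gradients, followed by a ReLU) to activations and gradients that a trained network would supply.

```python
from typing import List, Sequence

Grid = List[List[float]]


def to_kinodynamic_image(window: Sequence[Sequence[float]]) -> List[List[int]]:
    """Turn a window of synchronous features into a grayscale grid.

    window[t][f] is the value of feature f at timestep t.
    The result is image[f][t] with intensities in 0..255.
    ASSUMPTION: rows = features, columns = time, per-row min-max scaling.
    """
    n_steps = len(window)
    n_feats = len(window[0])
    image = []
    for f in range(n_feats):
        series = [window[t][f] for t in range(n_steps)]
        lo, hi = min(series), max(series)
        span = hi - lo
        if span == 0:
            # A constant feature carries no variation; map it to black.
            row = [0] * n_steps
        else:
            row = [round(255 * (v - lo) / span) for v in series]
        image.append(row)
    return image


def grad_cam(activations: Sequence[Grid], gradients: Sequence[Grid]) -> Grid:
    """Grad-CAM heatmap from one conv layer's feature maps.

    activations[k][i][j]: feature map k at location (i, j).
    gradients[k][i][j]: gradient of the class score w.r.t. that activation.
    Each channel weight is the spatial mean of its gradients; the heatmap
    is the ReLU of the weighted sum of the feature maps.
    """
    rows = len(activations[0])
    cols = len(activations[0][0])
    heat = [[0.0] * cols for _ in range(rows)]
    for fmap, grad in zip(activations, gradients):
        weight = sum(sum(r) for r in grad) / (rows * cols)
        for i in range(rows):
            for j in range(cols):
                heat[i][j] += weight * fmap[i][j]
    return [[max(0.0, v) for v in row] for row in heat]


if __name__ == "__main__":
    # Three timesteps, two features (one varying, one constant).
    demo = [[0.0, 2.0], [1.0, 2.0], [5.0, 2.0]]
    print(to_kinodynamic_image(demo))
```

Because every row of such an image corresponds to a named feature and every column to a timestep, a Grad-CAM heatmap resized onto the image can be read as "which feature, at which moment, drove the decision", which is the kind of inspection the abstract describes.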


Updated: 2021-02-24
