Understanding Neural Code Intelligence Through Program Simplification, arXiv - CS - Software Engineering



Understanding Neural Code Intelligence Through Program Simplification
arXiv - CS - Software Engineering. Pub Date: 2021-06-07, DOI: arxiv-2106.03353
Md Rafiqul Islam Rabin, Vincent J. Hellendoorn, Mohammad Amin Alipour

A wide range of code intelligence (CI) tools, powered by deep neural networks, have been developed recently to improve programming productivity and perform program analysis. To reliably use such tools, developers often need to reason about the behavior of the underlying models and the factors that affect them. This is especially challenging for tools backed by deep neural networks. Various methods have tried to reduce this opacity in the vein of "transparent/interpretable-AI". However, these approaches are often specific to a particular set of network architectures, even requiring access to the network's parameters. This makes them difficult to use for the average programmer, which hinders the reliable adoption of neural CI systems. In this paper, we propose a simple, model-agnostic approach to identify critical input features for models in CI systems, by drawing on software debugging research, specifically delta debugging. Our approach, SIVAND, uses simplification techniques that reduce the size of input programs of a CI model while preserving the predictions of the model. We show that this approach yields remarkably small outputs and is broadly applicable across many model architectures and problem domains. We find that the models in our experiments often rely heavily on just a few syntactic features in input programs. We believe that SIVAND's extracted features may help understand neural CI systems' predictions and learned behavior.
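
The abstract does not spell out SIVAND's exact procedure, so the following is only a minimal sketch of the general idea it describes: a delta-debugging-style (ddmin) reduction that shrinks an input program's token list while a black-box model's prediction stays the same. The names `ddmin`, `keeps_prediction`, and `toy_predict` are hypothetical, and `toy_predict` is a hand-written stand-in for a neural CI model, not a real one.

```python
from typing import Callable, List, Sequence


def ddmin(tokens: Sequence[str],
          keeps_prediction: Callable[[List[str]], bool]) -> List[str]:
    """Generic ddmin-style reduction (an illustration, not SIVAND itself).

    Returns a token list for which keeps_prediction still holds and from
    which no single token can be dropped without breaking it.
    """
    current = list(tokens)
    assert keeps_prediction(current), "the full input must satisfy the predicate"
    n = 2  # current granularity: roughly how many chunks to split into
    while len(current) >= 2:
        chunk = max(1, len(current) // n)
        subsets = [current[i:i + chunk] for i in range(0, len(current), chunk)]
        reduced = False
        # Step 1: try to keep a single chunk on its own.
        for subset in subsets:
            if keeps_prediction(subset):
                current, n, reduced = subset, 2, True
                break
        # Step 2: otherwise try to drop a single chunk (keep its complement).
        if not reduced:
            for i in range(len(subsets)):
                complement = [t for j, s in enumerate(subsets) if j != i for t in s]
                if keeps_prediction(complement):
                    current, n, reduced = complement, max(n - 1, 2), True
                    break
        # Step 3: otherwise refine the granularity, or stop at single tokens.
        if not reduced:
            if n >= len(current):
                break
            n = min(len(current), n * 2)
    return current


def toy_predict(tokens: List[str]) -> str:
    """Hypothetical stand-in model: 'predicts' a method name from two tokens."""
    return "getName" if "return" in tokens and "name" in tokens else "other"


program = "public String f ( ) { return this . name ; }".split()
target = toy_predict(program)  # the prediction the reduction must preserve
reduced_program = ddmin(program, lambda cand: toy_predict(cand) == target)
print(reduced_program)  # ['return', 'name']
```

Because the reduction only calls the model through `keeps_prediction`, it needs no access to parameters or architecture, which is the model-agnostic property the abstract emphasizes. A real setup would presumably also need to decide whether reduced candidates must remain parseable programs; this sketch ignores that question.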


Updated: 2021-06-08
