Advanced Graph-Based Deep Learning for Probabilistic Type Inference
arXiv - CS - Programming Languages. Pub Date: 2020-09-13, DOI: arxiv-2009.05949
Fangke Ye, Jisheng Zhao, Vivek Sarkar

Dynamically typed languages such as JavaScript and Python have emerged as the most popular programming languages in use. Important benefits can accrue from including type annotations in dynamically typed programs. This approach to gradual typing is exemplified by the TypeScript programming system which allows programmers to specify partially typed programs, and then uses static analysis to infer the remaining types. However, in general, the effectiveness of static type inference is limited and depends on the complexity of the program's structure and the initial type annotations. As a result, there is a strong motivation for new approaches that can advance the state of the art in statically predicting types in dynamically typed programs, and that do so with acceptable performance for use in interactive programming environments. Previous work has demonstrated the promise of probabilistic type inference using deep learning. In this paper, we advance past work by introducing a range of graph neural network (GNN) models that operate on a novel type flow graph (TFG) representation. The TFG represents an input program's elements as graph nodes connected with syntax edges and data flow edges, and our GNN models are trained to predict the type labels in the TFG for a given input program. We study different design choices for our GNN models for the 100 most common types in our evaluation dataset, and show that our best two GNN configurations for accuracy achieve a top-1 accuracy of 87.76% and 86.89% respectively, outperforming the two most closely related deep learning type inference approaches from past work -- DeepTyper with a top-1 accuracy of 84.62% and LambdaNet with a top-1 accuracy of 79.45%. Further, the average inference throughputs of those two configurations are 353.8 and 1,303.9 files/second, compared to 186.7 files/second for DeepTyper and 1,050.3 files/second for LambdaNet.
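
To make the graph idea in the abstract concrete, here is a minimal Python sketch of a graph whose nodes are program elements joined by syntax and data flow edges, followed by two rounds of neighbor averaging and a top-1 accuracy check. It is not the paper's TFG construction or any of its GNN models: the class `TypeFlowGraph`, the functions `propagate` and `top1_accuracy`, the two-type label set, and the choice of edges for the snippet `let x = 1; let y = x;` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical label set; the paper evaluates on the 100 most common types.
TYPES = ["number", "string"]


@dataclass
class TypeFlowGraph:
    """Toy graph: nodes are program elements, edges are tagged by kind."""
    labels: List[str] = field(default_factory=list)
    edges: List[Tuple[int, int, str]] = field(default_factory=list)

    def add_node(self, label: str) -> int:
        self.labels.append(label)
        return len(self.labels) - 1

    def add_edge(self, src: int, dst: int, kind: str) -> None:
        if kind not in ("syntax", "dataflow"):
            raise ValueError(f"unknown edge kind: {kind}")
        self.edges.append((src, dst, kind))


def propagate(graph: TypeFlowGraph, states: List[List[float]]) -> List[List[float]]:
    """One round: each node averages its own state with its incoming neighbors'.

    This stands in for a learned GNN layer; nothing here is trained, and
    both edge kinds are treated alike.
    """
    n = len(graph.labels)
    sources = [[i] for i in range(n)]  # every node also sees itself
    for src, dst, _kind in graph.edges:
        sources[dst].append(src)
    dim = len(states[0])
    return [
        [sum(states[j][d] for j in sources[i]) / len(sources[i]) for d in range(dim)]
        for i in range(n)
    ]


def top1_accuracy(scores: List[List[float]], gold: List[int]) -> float:
    """Fraction of nodes whose highest-scoring type index equals the gold index."""
    hits = 0
    for row, answer in zip(scores, gold):
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += predicted == answer
    return hits / len(gold)


# Assumed encoding of `let x = 1; let y = x;`
graph = TypeFlowGraph()
lit = graph.add_node("1")
x = graph.add_node("x")
y = graph.add_node("y")
graph.add_edge(x, lit, "syntax")    # declaration -> its initializer expression
graph.add_edge(lit, x, "dataflow")  # the literal's value flows into x
graph.add_edge(x, y, "dataflow")    # x's value flows into y

# Only the literal starts with type evidence: all weight on "number".
states = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
for _ in range(2):
    states = propagate(graph, states)

for name, row in zip(graph.labels, states):
    print(name, dict(zip(TYPES, row)))
```

After two rounds, evidence from the literal has reached `y` through `x`, which is the intuition behind following data flow edges. A real model would replace the averaging with learned, edge-type-specific transformations and a trained classifier over the type vocabulary.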

Updated: 2020-09-15