Mechanical incrementalization of typing algorithms

https://doi.org/10.1016/j.scico.2021.102657

Highlights

  • A general algorithmic schema incrementalizing typing algorithms with implementation.

  • A general rule format driving the generation of standard and incremental type systems.

  • A coherence theorem between standard and incrementalized algorithms.

  • The instantiation of the schema on four well-known different programming paradigms.

  • Experimental results showing effectiveness and performance of the proposed schema.

Abstract

The ever-growing size of programs and their continuous evolution require building fast and efficient analyzers. Here we focus on the static ones, in particular on type systems, for both checking and inference. Just as programs change by incrementally changing or inserting pieces of code, called diffs, type systems should also be incremental and re-type only the diffs.

An algorithmic schema is proposed that mechanically derives an incremental version of existing, standard typing algorithms. Ours is a grey-box approach: just the shape of the typing rules, that of the types and some domain-specific knowledge are needed to instantiate our schema. Here, we present the foundations of our approach and the conditions for its correctness. Our schema is applied to derive four incremental typing and inference algorithms for languages in different programming paradigms. We implemented an OCaml module that inputs a type system and outputs its incrementalized version. Experimental results show that our approach is effective, and prove its usage beneficial.

Introduction

The size of software code bases is steadily growing, and developers need mechanical support for managing this complexity. Software systems are no longer monolithic pieces of code to which only new modules are compositionally added; rather, their components grow and change incrementally. These issues become even more demanding because many companies have recently adopted development methodologies that advocate a continuous evolution of software, e.g. the perpetual development model [8]. In such a model a shared code base is altered by many programmers submitting small code modifications (diffs). As recently observed by Harman and O'Hearn [18], it becomes crucial to define support tools whose amount of work depends on the size of the diffs instead of the whole codebase.

Typically, developers analyze their code to check certain properties at compile-time. Thus, the ever-growing size of programs requires building fast and efficient static analyzers. Here we focus on the widespread type systems, which permit early code verification, thus reducing errors, and also encourage a clean programming style. As a matter of fact, most modern programming languages are equipped with mechanisms for checking or inferring types, and for verifying specific properties through them.

Designing and implementing incremental type systems is not an easy task, especially when complex features are taken into account. To the best of our knowledge, the literature offers no general guidelines to design incremental typing systems and algorithms. The techniques that have been presented instead introduce new typing algorithms that work incrementally. Here, we propose a methodology and an algorithm to transform an existing type system (be it a checker or an inference one) into one that works incrementally and is correct by construction. More precisely, our proposal is to take an existing typing algorithm A and to use it incrementally, without re-doing work already done before the changes, but exploiting available results through caching and memoization. In this way, no effort is spent in designing and developing new incremental typing algorithms. An advantage of our proposal is that it consists of an algorithmic schema, namely a wrapper that is independent of any specific language and type system. In addition, we put forward mild conditions on the results and on the original type system that guarantee that the results of incremental typing match those of the original algorithm.

Roughly, our algorithmic schema works as follows. We start from the abstract syntax tree of a program, where each node is annotated with the result R provided by the original typing algorithm A. We then build a cache, including for each sub-term t both the result R and some relevant contextual information needed by A to type t (typically a typing environment binding the free variables of t). When the program changes, its annotated abstract syntax tree changes accordingly, and typing the sub-term associated with the changed node is done incrementally. This is done by reusing the results in the cache whenever possible and by suitably invoking A when needed. Clearly, the more local the changes, the more information is reused.
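The workflow just described can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy language (`Num`, `Var`, `Add`, `Let`), the function `type_of` standing in for the original algorithm A, and the cache layout (term mapped to the environment restricted to its free variables plus the result). It also populates the cache during a first incremental run rather than from an annotated tree, and it is not the authors' OCaml implementation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# Hypothetical toy language: integer literals, variables, addition, let.
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Let:
    name: str
    bound: "Expr"
    body: "Expr"

Expr = Union[Num, Var, Add, Let]
Env = Dict[str, str]                  # typing environment: variable -> type
Cache = Dict[Expr, Tuple[Env, str]]   # term -> (env on its free variables, result)

class TypingError(Exception):
    pass

def free_vars(e: Expr) -> frozenset:
    if isinstance(e, Num):
        return frozenset()
    if isinstance(e, Var):
        return frozenset({e.name})
    if isinstance(e, Add):
        return free_vars(e.left) | free_vars(e.right)
    # Let: the bound name is only in scope in the body.
    return free_vars(e.bound) | (free_vars(e.body) - {e.name})

def type_of(env: Env, e: Expr) -> str:
    """Stand-in for the original, non-incremental algorithm A."""
    if isinstance(e, Num):
        return "int"
    if isinstance(e, Var):
        if e.name not in env:
            raise TypingError(f"unbound variable {e.name}")
        return env[e.name]
    if isinstance(e, Add):
        if type_of(env, e.left) != "int" or type_of(env, e.right) != "int":
            raise TypingError("operands of + must be int")
        return "int"
    t1 = type_of(env, e.bound)
    return type_of({**env, e.name: t1}, e.body)

def inc_type_of(env: Env, e: Expr, cache: Cache, stats: Dict[str, int]) -> str:
    """Incremental wrapper: reuse a cached result when the environments agree."""
    fv = free_vars(e)
    hit = cache.get(e)
    if hit is not None:
        cached_env, cached_type = hit
        # Cache hit only if env agrees with the cached one on e's free variables.
        if all(env.get(x) == cached_env[x] for x in fv):
            stats["hits"] += 1
            return cached_type
    stats["misses"] += 1
    if isinstance(e, (Num, Var)):
        t = type_of(env, e)                       # leaves: invoke A directly
    elif isinstance(e, Add):
        tl = inc_type_of(env, e.left, cache, stats)
        tr_ = inc_type_of(env, e.right, cache, stats)
        if tl != "int" or tr_ != "int":
            raise TypingError("operands of + must be int")
        t = "int"
    else:
        t1 = inc_type_of(env, e.bound, cache, stats)
        t = inc_type_of({**env, e.name: t1}, e.body, cache, stats)
    cache[e] = ({x: env[x] for x in fv}, t)       # record result for later reuse
    return t

# Usage: type an old version, then a version where the literal 3 became 4.
def program(k: int) -> Expr:
    return Let("x", Add(Num(1), Num(2)),
               Let("y", Add(Var("x"), Num(k)), Add(Var("x"), Var("y"))))

cache: Cache = {}
stats_old = {"hits": 0, "misses": 0}
t_old = inc_type_of({}, program(3), cache, stats_old)
stats_new = {"hits": 0, "misses": 0}
t_new = inc_type_of({}, program(4), cache, stats_new)
print(t_old, stats_old)
print(t_new, stats_new)
```

In the second run only the nodes on the path from the root to the changed literal are re-typed; the unchanged sub-terms are answered from the cache, which is the sense in which more local changes lead to more reuse.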

Technically, our proposal consists of a set of rule schemata that drive the usage of the cache and of the original algorithm A, as sketched above. The user only has to define the shape of caches and to instantiate a well-confined part of the rule schemata. If the instantiation meets two easy-to-check criteria, the typing results of A and of the incremental algorithm are guaranteed to be coherent, i.e. the incrementalized algorithm behaves as the non-incremental one. Remarkably, the size of the incrementalized version is always the same as that of the original version, plus a couple of additional rules that take care of cache hits. The overhead in time and space is small; in particular, it decreases with the size of diffs.

All the above provides us with the guidelines to develop a framework that makes incremental the usage of a given typing algorithm. As a proof-of-concept, our approach has been fully mechanized and the OCaml implementation, along with some examples, is available online. This implementation consists of a parametric module that inputs a type system in a specific format, and outputs its incrementalized version.

Summing up, the main contributions of this paper include:

  • a parametric, language-independent algorithmic schema that builds a wrapper around an existing typing algorithm A, so as to allow using it incrementally (Section 3);

  • a formalization of the steps that instantiate the schema and yield the incrementalized version of A: the resulting typing algorithm only types the diffs and those parts of the code affected by them (Section 3);

  • a characterization of a rule format suitable to most typing algorithms, in terms of only two auxiliary functions, tr and checkJoin (Section 3), that, together with the syntax of the language at hand, suffice for automatically generating the code implementing its (non-incremental) type system;

  • a theorem that under two mild conditions guarantees the coherence of results between the original algorithm and its incremental version (Section 3);

  • the instantiation of the schema on four typing algorithms for languages in three different programming paradigms: imperative, functional and process calculus. The original type systems are taken from the literature, and cover four different aspects: checking confidentiality and integrity, dependent types, exceptions and protocol security (Section 4);

  • a module that inputs a type system, or rather the two auxiliary functions tr and checkJoin and the syntax of the language at hand, and that outputs its incrementalized version (Section 5); and

  • experimental results showing that the time and the space used by the above incrementalized type checker depend on the size of diffs, and its performance increases as these become smaller. This assessment is carried out on a prototype of the incremental version of the type checker for MinCaml [40] (Section 5).
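To give an intuition of how a type system might be driven by just two auxiliary functions, the following Python sketch builds a generic checker parameterized by `tr` and `check_join`. The signatures used here are assumptions made for illustration (the text above only names tr and checkJoin): `tr` is assumed to compute the environment for the i-th sub-term from the results of the previous ones, and `check_join` is assumed to validate and combine the sub-results. The `Term` encoding and the toy language are hypothetical as well.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Env = Dict[str, str]

class TypingError(Exception):
    pass

@dataclass(frozen=True)
class Term:
    op: str                         # constructor: "num", "bool", "var", "add", "let"
    data: object = None             # literal value or variable name, if any
    kids: Tuple["Term", ...] = ()   # sub-terms, in typing order

def check(env: Env, t: Term,
          tr: Callable[[Term, int, Env, List[str]], Env],
          check_join: Callable[[Term, Env, List[str]], str]) -> str:
    """Generic checker: language knowledge lives only in tr and check_join."""
    results: List[str] = []
    for i, kid in enumerate(t.kids):
        kid_env = tr(t, i, env, results)          # environment for the i-th sub-term
        results.append(check(kid_env, kid, tr, check_join))
    return check_join(t, env, results)            # combine sub-results or fail

# Instantiation for the toy language (assumed, for illustration only).
def tr(t: Term, i: int, env: Env, results: List[str]) -> Env:
    if t.op == "let" and i == 1:
        # The body of let sees the bound variable with the type of the first sub-term.
        return {**env, t.data: results[0]}
    return env

def check_join(t: Term, env: Env, results: List[str]) -> str:
    if t.op == "num":
        return "int"
    if t.op == "bool":
        return "bool"
    if t.op == "var":
        if t.data not in env:
            raise TypingError(f"unbound variable {t.data}")
        return env[t.data]
    if t.op == "add":
        if results != ["int", "int"]:
            raise TypingError("operands of + must be int")
        return "int"
    if t.op == "let":
        return results[1]
    raise TypingError(f"unknown construct {t.op}")

# Usage: let x = 1 in x + 2
prog = Term("let", "x", (Term("num", 1),
                         Term("add", None, (Term("var", "x"), Term("num", 2)))))
print(check({}, prog, tr, check_join))
```

Under these assumptions, adding a construct to the language means adding one case to each of the two functions, while the traversal in `check` stays unchanged; this separation is what makes a generic wrapper plausible.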

All the proofs of our theorems, some additional material and the full incremental type systems of our four case studies are in the Appendixes.

Preliminary portions of this paper appeared in two conference papers. Here we generalized and improved the schema of [7] in Section 3, providing the means for proving the correctness of the incremental type system obtained. Besides the case study in [6], we applied our algorithmic schema in Section 4 to languages taken from different programming paradigms not covered before: (i) the λ-calculus with dependent types; (ii) a functional language with inference of types and effects capturing exceptions; (iii) the SPI-calculus with a type system for secrecy. Also, we implemented our proposal in Section 5, showing that incrementalizing standard type systems is mechanically doable. As an example of feasibility, we considered an existing ray tracer and its different commits stored in GitHub. In addition, we devised and implemented a mechanism for mitigating the loss of performance in time that may occur in specific cases. Finally, we carried out more extensive experiments on the performance of our prototype, also on memory usage.

To the best of our knowledge, the literature has some proposals for incrementally typing programs. However, these approaches heavily differ from ours, because all of them propose a new incremental algorithm for typing, while we incrementally use existing algorithms as they are. Additionally, none of the approaches surveyed below use a uniform characterization of type judgements as we do through the metafunctions tr and checkJoin.

Meertens [28] proposes an incremental type checking algorithm for the language B. Johnson and Walz [20] treat incremental type inference, focusing on identifying precisely where type errors arise. Aditya and Nikhil [3] propose an incremental Hindley/Milner type system supporting incremental type checking of top-level definitions. Our approach instead supports incremental type checking for all kinds of expressions, not only the top-level ones. Miao and Siek [29] introduce an incremental type checker leveraging the fact that, in multi-staged programming, programs are successively refined. Wachsmuth et al. [42] propose a task engine for type checking and name resolution: when a file is modified, a task is generated and existing (cached) results are re-used where possible. The proposal by Erdweg et al. [12] is the most similar to ours, but, given a type checking algorithm, they describe how to obtain a new incremental algorithm. As in our case, they decorate an abstract syntax tree with types and typing environments, represented as sets of constraints, to be suitably propagated when typing. In this way there is no need to deal with top-down context propagation while types flow bottom-up. Recently, Facebook released Pyre [13], a scalable and incremental type checker specifically designed for Python. Pacak et al. [33] propose a systematic approach to derive incremental type checkers directly from the inference rules of a type checker, expressed in Datalog [9]. Despite sharing our goals, their benchmarking results are not directly comparable with ours since they adopt existing and very efficient Datalog interpreters whereas we use our own prototypical implementation. Further research is needed to compare the two approaches.

Incrementality has also been studied for static analyses other than typing. IncA [41] is a domain-specific language for the definition of incremental program analyses, which represents dependencies among the nodes of the abstract syntax tree of the target program as a graph. Infer [19] uses an approach similar to ours in which analysis results are cached to improve performance [5]. Designing incremental dataflow analyzers is a well-studied problem and many proposals are based on the technique of the restarting iteration [11], [16]. Intuitively, the idea of this technique is to start the fixpoint iteration to solve the data-flow equations from an already computed analysis where the entries corresponding to the changed program points are invalidated. Ryder and Paull [38] present two incremental update algorithms, ACINCB and ACINCF, that allow incremental data-flow analysis. Yur et al. [45] propose an algorithm for an incremental points-to analysis. McPeak et al. [27] describe a technique for incremental and parallel static analysis based on work units (self-contained atoms of analysis input). The solutions are computed by processes called analysis workers, all coordinated by an analysis master. Arzt and Bodden [4] presented Reviser, a tool for incremental interprocedural data-flow analysis based on the IDE/IFDS framework. Given the control-flow graphs of the two program versions, and a previous analysis, Reviser uses a graph-diff algorithm to determine a superset of the changed nodes. Then, the tool recomputes the analysis for them and for those nodes that transitively depend on them. More recently, Nichols et al. [32] present fixpoint reuse, an incremental static analysis technique based on fixpoint computations for JavaScript programs. This technique, given two versions of a program Po and Pn, and an analysis Ao for Po, computes a sound approximation An of the analysis for Pn. Seidel et al. [39] extend the local generic solver for computing fixpoint solutions, on which their tool Goblint relies, to support incremental static analysis of different versions of the same programs.

Also, there are papers that use memoization with a goal similar to the one of our cache, even if they consider different analysis techniques. In particular, Mudduluru et al. propose, implement, and test an incremental analysis algorithm based on memoization of (equivalent) boolean formulas used to encode paths on programs [30]. Leino et al. [23] extend the verification machinery of the Dafny language with a cache mechanism to record the results from earlier runs of the verifier. The cache works on the control-flow graph of the program and for each node stores its verification results. Thus, the verification efforts are focused on those parts of the program that were affected by the user's most recent modifications. Although their cache mechanism is similar to ours, they do not provide any formal condition for when it is safe to re-use cached data. Other authors also apply memoization techniques to incremental model-checking [22], [43] and incremental symbolic execution [44], [36].

A few general formats for specifying programming languages, program analysis tools and type systems exist in the literature, e.g., PLT Redex [14] and K [37]. However, these proposals are usually much wider in scope than the format we proposed in Definition 2. We feel that providing an incrementalization schema based on (restricted versions of) one of these existing formats could foster a wider adoption of our approach but at the same time we think that it would make our framework significantly more complex.

Focusing instead on formats that are specific for type systems, Marino and Millstein [26] propose a very general format with the aim to describe a large class of type and effect systems. More precisely, they provide a set of rule templates to be instantiated to the language at hand by specifying two functions: adjust and check. Very roughly, the first function works similarly to our tr and the second one is analogous to checkJoin. However, differently from us, they are interested in describing and studying the properties of various type (and effect) systems and not in providing a general format useful for making typing algorithms incremental.

In an approach analogous to ours, Cimini and Siek [10] introduce Gradualizer, an algorithm that inputs a type system expressed in λ-prolog [15] and outputs a gradually typed version of it. Similarly, Pennino developed a tool in his B.Sc. thesis [34], which parses inference rules written in Datalog and generates a type checker in our format.

Section snippets

The incremental schema in a nutshell

Here we overview our algorithmic schema instantiating it on a core of a functional language, with the standard syntax, types, and typing system T with typing judgements $\Gamma \vdash_T e : \tau$, where $\Gamma$ is the typing environment, $e$ is an expression and $\tau \in \mathit{Type}$ is its type. The upper part of Fig. 1 shows the typing rules for variables and the $\mathtt{let}\ x = e_1\ \mathtt{in}\ e_2$ expression (ignore for the moment the colored boxes).
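For orientation, the textbook forms of the two rules mentioned above can be written as follows. This is a sketch of the standard rules for variables and let, assuming the usual environment-update notation $\Gamma[x \mapsto \tau_1]$; it is not a reproduction of Fig. 1 and omits whatever the colored boxes add.

```latex
\[
\frac{x \in \mathrm{dom}(\Gamma) \qquad \Gamma(x) = \tau}
     {\Gamma \vdash_T x : \tau}
\qquad\qquad
\frac{\Gamma \vdash_T e_1 : \tau_1 \qquad \Gamma[x \mapsto \tau_1] \vdash_T e_2 : \tau_2}
     {\Gamma \vdash_T \mathtt{let}\ x = e_1\ \mathtt{in}\ e_2 : \tau_2}
\]
```

The let rule shows why a cache entry needs contextual information: the environment used for $e_2$ depends on the result obtained for $e_1$.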

Suppose you have typed the following factorial function (in λfact the index is the name of the recursive

Formalizing the incremental schema

In this section we formally present our algorithmic schema that, given a typing algorithm A, yields its incrementalized version IA. We also prove that IA is sound, in that it computes the same types as A, possibly up to some type manipulations used by A.

We remark that our proposal is independent of the paradigm of the programming language considered and of the kind of its type system, be it designed for inferring or for checking types, as exemplified in Section 4. We only assume in the formal

Making some existing typing algorithms incremental

In this section we illustrate the flexibility of our proposal and how our algorithmic schema can be easily instantiated to make incremental the usage of non-trivial type systems. We consider four case studies. The first is in Section 4.1 and takes an imperative language with a type system for robust declassification. The second, in Section 4.2, is about checking the λ-calculus with dependent types. The third case study in Section 4.3 considers inference of types for exception handling in a

Implementation and experiments

This section presents a prototypical implementation of our algorithmic schema as the OCaml module Incrementalizer. We have applied it to the type checker for MinCaml [40], a monomorphic higher-order core of ML. Finally, we report on some experiments that show that the time and space performance of the incrementalized type checker are almost always better than those of the original MinCaml type checker. In particular, the time depends on the size of diffs and decreases as these become smaller,

Conclusions

We have presented an algorithmic schema for incrementally using existing type checking and type inference algorithms. Since only the shape of the input, the output, and some domain-specific knowledge of the original algorithms are relevant, our schema is essentially a wrapper that uses the original typing algorithms as grey-boxes.

We have introduced the basic bricks of our approach and proved a theorem guaranteeing the coherence of any original algorithm with its incremental version, and vice

CRediT authorship contribution statement

All the authors contributed equally to this paper.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Matteo Busi has been partially supported by the research grant on Incremental type systems for secure compilation from the Department of Computer Science of the University of Pisa. Pierpaolo Degano and Letterio Galletta have been partially supported by the MIUR project PRIN 2017FTXR7S IT MATTERS (Methods and Tools for Trustworthy Smart Systems).

References (45)

  • M. Abadi et al. A calculus for cryptographic protocols: the spi calculus. Inf. Comput. (1999)

  • G. Rosu et al. An overview of the K semantic framework. J. Log. Algebraic Methods Program. (2010)

  • M. Abadi. Secrecy by typing in security protocols. J. ACM (1999)

  • S. Aditya et al. Incremental polymorphism

  • S. Arzt et al. Reviser: efficiently updating IDE-/IFDS-based data-flow analyses in response to incremental program changes

  • S. Blackshear et al. Finding inter-procedural bugs at scale with Infer static analyzer

  • M. Busi et al. Robust declassification by incremental typing

  • M. Busi et al. Using standard typing algorithms incrementally

  • C. Calcagno et al. Moving fast with software verification

  • S. Ceri et al. What you always wanted to know about Datalog (and never dared to ask). IEEE Trans. Knowl. Data Eng. (1989)

  • M. Cimini et al. The gradualizer: a methodology and algorithm for generating gradual type systems

  • K.D. Cooper. Interprocedural Data Flow Analysis in a Programming Environment (1983)

  • S. Erdweg et al. A co-contextual formulation of type rules and its application to incremental type checking

  • Facebook. Pyre - a performant type-checker for Python 3

  • M. Felleisen et al. Semantics Engineering with PLT Redex (2009)

  • A.P. Felty et al. Lambda-prolog: an extended logic programming language

  • V. Ghodssi. Incremental analysis of programs (1983)

  • S. Grewe et al. Type systems for the masses: deriving soundness proofs and efficient checkers

  • M. Harman et al. From start-ups to scale-ups: opportunities and open problems for static and dynamic program analysis

  • F. Infer. Infer static analyzer

  • G.F. Johnson et al. A maximum-flow approach to anomaly isolation in unification-based incremental type inference

  • P.C. Kanellakis et al. Polymorphic unification and ML typing
