Java decompiler diversity and its application to meta-decompilation, Journal of Systems and Software

Journal of Systems and Software (IF 3.7), Pub Date: 2020-10-01, DOI: 10.1016/j.jss.2020.110645
Nicolas Harrand, César Soto-Valero, Martin Monperrus, Benoit Baudry

During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code are not symmetric. Consequently, decompilation, which aims at producing source code from bytecode, relies on strategies to reconstruct the information that has been lost. Different Java decompilers use distinct strategies to achieve proper decompilation. We hypothesize that the diverse ways in which bytecode can be decompiled have a direct impact on the quality of the source code produced by decompilers. In this paper, we assess the strategies of eight Java decompilers with respect to three quality indicators: syntactic correctness, syntactic distortion, and semantic equivalence modulo inputs. Our results show that no single modern decompiler is able to correctly handle the variety of bytecode structures coming from real-world programs. The highest-ranking decompiler in this study produces syntactically correct output for 84% of the classes in our dataset, and semantically equivalent output for 78% of them. Our results also demonstrate that each decompiler correctly handles a different set of bytecode classes. We propose a new decompiler called Arlecchino that leverages the diversity of existing decompilers. To do so, it merges partial decompilations into a new one, guided by compilation errors. Arlecchino handles 37.6% of the bytecode classes that no existing decompiler handled. We publish the sources of this new bytecode decompiler.
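The abstract describes Arlecchino's merge only at a high level. The following Java sketch illustrates one way a compiler-guided merge of partial decompilations could work, under explicit assumptions: each decompiler's output for a class is modeled as a map from member signature to source fragment (the hypothetical `Decompilation` record), and the compiler is abstracted behind a `compiles` predicate instead of invoking `javac`. This is not Arlecchino's actual implementation; the names `MetaDecompilerSketch`, `Decompilation`, and `merge` are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of compiler-guided meta-decompilation (not Arlecchino's code).
 * Assumption: each decompiler output is split into per-member source fragments.
 */
public class MetaDecompilerSketch {

    /** One decompiler's output for a single class: member signature -> source fragment. */
    public record Decompilation(String decompiler, Map<String, String> fragments) {}

    /**
     * Starts from the first candidate (the list must be non-empty) and, for every member
     * whose fragment fails the compiles check, substitutes the first fragment from another
     * candidate that passes. Members no candidate gets right keep the first candidate's fragment.
     */
    public static Map<String, String> merge(List<Decompilation> candidates,
                                            Predicate<String> compiles) {
        Map<String, String> merged = new LinkedHashMap<>(candidates.get(0).fragments());
        for (Map.Entry<String, String> member : merged.entrySet()) {
            if (compiles.test(member.getValue())) {
                continue; // the base decompiler already handles this member
            }
            for (Decompilation alternative : candidates.subList(1, candidates.size())) {
                String fragment = alternative.fragments().get(member.getKey());
                if (fragment != null && compiles.test(fragment)) {
                    member.setValue(fragment); // replace only the failing member
                    break;
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Toy stand-in for a real compiler: fragments containing "ERROR" count as not compiling.
        Predicate<String> compiles = fragment -> !fragment.contains("ERROR");

        Map<String, String> fromA = new LinkedHashMap<>();
        fromA.put("int size()", "int size() { return n; }");
        fromA.put("void clear()", "void clear() { ERROR }");

        Map<String, String> fromB = new LinkedHashMap<>();
        fromB.put("int size()", "int size() { ERROR }");
        fromB.put("void clear()", "void clear() { n = 0; }");

        List<Decompilation> outputs = List.of(
                new Decompilation("decompilerA", fromA),
                new Decompilation("decompilerB", fromB));

        // Prints one merged fragment per line, in the first decompiler's member order.
        merge(outputs, compiles).values().forEach(System.out::println);
    }
}
```

Running `main` prints `int size() { return n; }` and `void clear() { n = 0; }`, each taken from whichever decompiler produced a compiling version. In a realistic setting the `compiles` check would be backed by an actual compiler (for example through `javax.tools.JavaCompiler`) applied to the whole reassembled class, since fragments rarely compile in isolation.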

Updated: 2020-10-01
