CoTexT: Multi-task Learning with Code-Text Transformer
arXiv - CS - Programming Languages. Pub Date: 2021-05-18, DOI: arxiv-2105.08645
Long Phan, Hieu Tran, Daniel Le, Hieu Nguyen, James Anibal, Alec Peltekian, Yanfang Ye

We present CoTexT, a pre-trained transformer-based encoder-decoder model that learns the representative context between natural language (NL) and programming language (PL) through multi-task learning. CoTexT is pre-trained in a self-supervised fashion on large programming-language corpora to learn general-purpose understanding and code-text generation, supporting downstream NL-PL tasks such as code summarization/documentation, code generation, defect detection, and code debugging. We train CoTexT on different combinations of available PL corpora, including both "bimodal" and "unimodal" data: the former combines natural-language text and its corresponding code snippet in a single input sequence, while the latter consists of code snippets only. We evaluate multi-task CoTexT on generation and classification tasks from CodeXGLUE, where it achieves state-of-the-art results on all downstream tasks.
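To make the bimodal/unimodal input format concrete, the sketch below shows how a bimodal sequence (NL text plus its code snippet concatenated into one input) and a unimodal sequence (code only) might be assembled with a task prefix and fed to a generic T5-style encoder-decoder, mirroring T5's multi-task convention. The checkpoint name, task prefixes, and `make_input` helper are illustrative assumptions, not the paper's released artifacts.

```python
# Minimal sketch of CoTexT-style bimodal/unimodal inputs for a
# T5-style encoder-decoder. All names below are assumptions for
# illustration, not the authors' released checkpoints or prefixes.
from typing import Optional

from transformers import T5Tokenizer, T5ForConditionalGeneration

MODEL_NAME = "t5-base"  # generic stand-in checkpoint
tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

def make_input(task: str, code: str, doc: Optional[str] = None) -> str:
    """Build one input sequence with a task prefix.

    Bimodal: natural-language text and its corresponding code snippet
    concatenated into a single sequence. Unimodal: the code snippet alone.
    """
    if doc is not None:
        return f"{task}: {doc} {code}"  # bimodal
    return f"{task}: {code}"            # unimodal

# Example: code summarization, an NL-PL generation task from CodeXGLUE.
source = make_input("summarize python", "def add(a, b): return a + b")
inputs = tokenizer(source, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

The same prefix scheme extends to the other downstream tasks (e.g. code generation or defect detection), with each task distinguished only by its prefix while sharing one set of model weights, which is what makes the setup multi-task.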

Last updated: 2021-05-19