Traceability Support for Multi-Lingual Software Projects, arXiv - CS - Software Engineering

Traceability Support for Multi-Lingual Software Projects
arXiv - CS - Software Engineering, Pub Date: 2020-06-30, DOI: arxiv-2006.16940
Yalin Liu, Jinfeng Lin, Jane Cleland-Huang

Software traceability establishes associations between diverse software artifacts such as requirements, design, code, and test cases. Due to the non-trivial costs of manually creating and maintaining links, many researchers have proposed automated approaches based on information retrieval techniques. However, many globally distributed software projects produce software artifacts written in two or more languages. The use of intermingled languages reduces the efficacy of automated tracing solutions. In this paper, we first analyze and discuss patterns of intermingled language use across multiple projects, and then evaluate several different tracing algorithms including the Vector Space Model (VSM), Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), and various models that combine mono- and cross-lingual word embeddings with the Generative Vector Space Model (GVSM). Based on an analysis of 14 Chinese-English projects, our results show that the best performance is achieved using mono-lingual word embeddings integrated into GVSM, with machine translation as a preprocessing step.
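The abstract names the tracing models without showing why mixed languages hurt them. The sketch below is a minimal, hypothetical illustration (not the authors' implementation) of how a term-matching model such as VSM misses a link between a Chinese requirement and English code, and how a GVSM-style score that weights term pairs by word-embedding similarity, combined with a translation step, can recover it. Everything here is assumed for illustration: the 3-dimensional vectors in `EMBEDDINGS` are made up rather than pretrained, the `TRANSLATIONS` dictionary and `translate` function stand in for a real machine-translation service, and the Chinese requirement is assumed to be already segmented into words.

```python
import math
from collections import Counter

# Hypothetical toy 3-d embeddings; a real tracer would load pretrained
# mono-lingual (or cross-lingual) word vectors instead.
EMBEDDINGS = {
    "user": [0.9, 0.1, 0.0],
    "login": [0.1, 0.9, 0.1],
    "signin": [0.2, 0.9, 0.0],
}

# Hypothetical word-level dictionary standing in for machine translation.
TRANSLATIONS = {"用户": "user", "登录": "login"}


def translate(tokens):
    """Map Chinese tokens to English; unknown tokens pass through unchanged."""
    return [TRANSLATIONS.get(tok, tok) for tok in tokens]


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def term_sim(a, b, emb):
    """Entry G[a][b] of the term-similarity matrix: 1.0 for identical terms,
    embedding cosine when both terms have vectors, else 0.0 (orthogonal)."""
    if a == b:
        return 1.0
    if a in emb and b in emb:
        return cosine(emb[a], emb[b])
    return 0.0


def vsm_score(source, target):
    """Classic VSM: cosine of raw term-frequency vectors, so only exact
    term matches contribute."""
    s, t = Counter(source), Counter(target)
    dot = sum(count * t[term] for term, count in s.items())
    norm = math.sqrt(sum(c * c for c in s.values()) * sum(c * c for c in t.values()))
    return dot / norm if norm else 0.0


def gvsm_score(source, target, emb):
    """GVSM-style score: s^T G t / sqrt((s^T G s) * (t^T G t)), where G holds
    term-term similarities, so related but different terms also contribute."""
    s, t = Counter(source), Counter(target)

    def bilinear(x, y):
        return sum(cx * cy * term_sim(a, b, emb)
                   for a, cx in x.items() for b, cy in y.items())

    denom = math.sqrt(max(bilinear(s, s), 0.0) * max(bilinear(t, t), 0.0))
    return bilinear(s, t) / denom if denom else 0.0


requirement = ["用户", "登录"]   # pre-segmented Chinese requirement ("user login")
code = ["user", "signin"]       # identifiers split from a source file

print(vsm_score(requirement, code))                          # 0.0: no shared terms
print(vsm_score(translate(requirement), code))               # 0.5: only "user" matches
print(gvsm_score(translate(requirement), code, EMBEDDINGS))  # higher: login ~ signin counts
```

With translation but plain VSM, only the shared term `user` contributes. The GVSM-style score additionally credits the near-synonyms `login` and `signin` through their embedding similarity. Without translation, the mono-lingual toy embeddings have no vectors for the Chinese terms, so the GVSM-style score also stays at 0.0.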

Updated: 2020-07-01