SpaceLDA: Topic distributions aggregation from a heterogeneous corpus for space systems
Engineering Applications of Artificial Intelligence (IF 7.5), Pub Date: 2021-05-05, DOI: 10.1016/j.engappai.2021.104273
Audrey Berquand, Yashar Moshfeghi, Annalisa Riccardi

The design of highly complex systems such as spacecraft entails large amounts of documentation. Tracking relevant information, including hundreds of requirements, throughout several design stages is a challenge. In this study, we propose a novel strategy based on Topic Modelling to facilitate the management of spacecraft design requirements. We introduce spaceLDA, a novel domain-specific semi-supervised Latent Dirichlet Allocation (LDA) model enriched with lexical priors and an optimised Weighted Sum (WS). We collect and curate the first large collection of unstructured data related to space systems, combining several sources: Wikipedia pages, books, and feasibility reports provided by the European Space Agency (ESA). We train the spaceLDA model on three subsets of our heterogeneous training corpus. To combine the resulting per-document topic distributions, we enrich our model with an aggregation method based on an optimised WS. We evaluate our model through a case study: the categorisation of spacecraft design requirements. Finally, we compare our model’s performance with that of an unsupervised LDA model and of an aggregation method from the literature. The results demonstrate that the spaceLDA model successfully identifies the topics of requirements and that our proposed approach outperforms both a classic LDA model and the state-of-the-art aggregation method.
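
The weighted-sum aggregation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each of the three models trained on a corpus subset yields a topic distribution for the same document over a shared set of topics, and that one non-negative weight per model is already available. The function name, the example distributions, and the weights are hypothetical; the abstract does not state how the weights are optimised, so that step is not shown.

```python
def aggregate_topic_distributions(distributions, weights):
    """Combine several topic distributions for one document by a weighted sum.

    distributions: list of lists, one per model, each summing to 1
                   over the same ordered set of topics (assumption).
    weights:       list of non-negative floats, one per model (assumption:
                   already chosen by some optimisation not shown here).
    """
    if not distributions or len(distributions) != len(weights):
        raise ValueError("expected one weight per distribution")
    n_topics = len(distributions[0])
    if any(len(d) != n_topics for d in distributions):
        raise ValueError("all distributions must cover the same topics")
    total_weight = sum(weights)
    if any(w < 0 for w in weights) or total_weight == 0:
        raise ValueError("weights must be non-negative and not all zero")

    # Weighted sum per topic, with weights normalised to sum to 1.
    combined = [
        sum(w * d[k] for w, d in zip(weights, distributions)) / total_weight
        for k in range(n_topics)
    ]
    # Renormalise to guard against rounding drift in the inputs.
    norm = sum(combined)
    return [p / norm for p in combined]


# Hypothetical per-document distributions from three models over four topics.
per_model = [
    [0.7, 0.1, 0.1, 0.1],
    [0.4, 0.4, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
]
model_weights = [0.5, 0.3, 0.2]  # illustrative values only

result = aggregate_topic_distributions(per_model, model_weights)
predicted_topic = max(range(len(result)), key=result.__getitem__)
```

In this illustration the first topic receives the largest aggregated probability, so a requirement with these distributions would be assigned to it.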




Updated: 2021-05-05