Domain-topic models with chained dimensions: charting the evolution of a major oncology conference (1995-2017)
arXiv - CS - Digital Libraries. Pub Date: 2019-12-31, DOI: arxiv-1912.13349
Alexandre Hannud Abdo, Jean-Philippe Cointet, Pascale Bourret, Alberto Cambrosio

This paper presents three main contributions to the computational study of science from bibliographic corpora. First, by combining hypergraphs and stochastic block models, it introduces a new approach to model corpora based on their substantive contents and integrating both temporal and other metadata dimensions. We call this simultaneous modeling of documents and words "domain-topic models", and their integration with metadata their "chained dimensions". Second, the paper introduces a new form of interactive map for the exploration of hypergraph data that enables the seamless navigation of the different dimensions, scales, and their relations, as expressed in the models, and describes the steps to accurately read these new science maps. Third, it introduces a new corpus that is both of great interest to current STS research and an exemplary case for the new methodology presented here: the 1995-2017 collection of abstracts presented at ASCO, the largest annual oncology research conference. It is shown that the new approach, named SASHIMI, is able to infer thematic clusters in the corpus, describe them as assemblages of topics, and detect the presence of significant temporal patterns, identifying the major thematic transformations of oncology during the period.

Updated: 2020-03-04