Measuring document similarity with weighted averages of word embeddings.
Explorations in Economic History (IF 2.6), Pub Date: 2022-12-15, DOI: 10.1016/j.eeh.2022.101494
Bryan Seegmiller, Dimitris Papanikolaou, Lawrence D.W. Schmidt

We detail a methodology for estimating the textual similarity between two documents while accounting for the possibility that two different words can have a similar meaning. We illustrate the method's usefulness in facilitating comparisons between documents with very different formats and vocabularies by textually linking occupation task and industry output descriptions with related technologies as described in patent texts; we also examine economic applications of the resulting document similarity measures. In a final application, we demonstrate that the method also performs well relative to alternatives when comparing documents within the same domain: pairwise textual similarity between occupations' task descriptions strongly predicts the probability that a given worker will transition from one occupation to another. We close with suggestions on other potential uses and guidance on implementing the method.
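As a rough illustration of the idea named in the title, the sketch below represents each document as a weighted average of its word embeddings and compares documents by cosine similarity. It is not the authors' implementation. The abstract does not state the weighting scheme, so the smoothed inverse-document-frequency weights used here are an assumption, and the three-dimensional vectors in `EMBEDDINGS` are hypothetical toy values standing in for pretrained embeddings. The function names `idf_weights`, `doc_vector`, and `cosine` are likewise made up for this example.

```python
import math
from collections import Counter

import numpy as np

# Hypothetical toy embeddings (illustration only); a real application
# would load pretrained word vectors instead.
EMBEDDINGS = {
    "repair": np.array([0.9, 0.1, 0.0]),
    "fix": np.array([0.8, 0.2, 0.1]),
    "engine": np.array([0.1, 0.9, 0.1]),
    "motor": np.array([0.2, 0.8, 0.0]),
    "poem": np.array([0.0, 0.1, 0.9]),
    "write": np.array([0.1, 0.0, 0.8]),
}


def idf_weights(corpus):
    """Smoothed inverse document frequency for every word in the corpus.

    Assumption: IDF-style weights; the abstract does not specify the scheme.
    """
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def doc_vector(tokens, idf):
    """Weighted average of the embeddings of the in-vocabulary tokens.

    Each word's weight is its term frequency times its IDF weight.
    Returns None when no token has an embedding.
    """
    counts = Counter(t for t in tokens if t in EMBEDDINGS)
    if not counts:
        return None
    total = sum(counts.values())
    weights = {w: (c / total) * idf.get(w, 1.0) for w, c in counts.items()}
    weighted_sum = sum(weight * EMBEDDINGS[w] for w, weight in weights.items())
    return weighted_sum / sum(weights.values())


def cosine(u, v):
    """Cosine similarity between two document vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


corpus = [
    ["repair", "engine"],  # task-like description
    ["fix", "motor"],      # same meaning, no shared words
    ["write", "poem"],     # unrelated description
]
idf = idf_weights(corpus)
vec_a, vec_b, vec_c = (doc_vector(doc, idf) for doc in corpus)

print("repair engine vs fix motor:", cosine(vec_a, vec_b))
print("repair engine vs write poem:", cosine(vec_a, vec_c))
```

The first two documents share no words, so a plain bag-of-words comparison would score them as unrelated. Averaging embeddings lets near-synonyms such as "repair" and "fix" pull the two document vectors together, which is the property the abstract highlights for comparing texts with very different vocabularies.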




Updated: 2022-12-15