Word2Vec: Optimal Hyper-Parameters and Their Impact on NLP Downstream Tasks
arXiv - CS - Computation and Language. Pub Date: 2020-03-23, DOI: arxiv-2003.11645
Tosin P. Adewumi, Foteini Liwicki and Marcus Liwicki

Word2Vec is a prominent model for natural language processing (NLP) tasks. Similar inspiration is found in the distributed embeddings of new state-of-the-art (SotA) deep neural networks. However, the wrong combination of hyper-parameters can produce poor-quality vectors. The objective of this work is to show empirically that an optimal combination of hyper-parameters exists, and to evaluate various combinations. We compare them with the released, pre-trained original word2vec model. Both intrinsic and extrinsic (downstream) evaluations were carried out, including named entity recognition (NER) and sentiment analysis (SA). The downstream tasks reveal that the best model is usually task-specific, that high analogy scores do not necessarily correlate positively with F1 scores, and that the same holds when focusing on the data alone. Increasing the vector dimension size beyond a certain point leads to poor quality or performance. If ethical considerations of saving time, energy and the environment are taken into account, then reasonably smaller corpora may do just as well, or even better in some cases. Moreover, using a small corpus, we obtain better human-assigned WordSim scores, corresponding Spearman correlations and better downstream performance (with significance tests) than the original model, which was trained on a 100-billion-word corpus.
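As a rough illustration of the kind of hyper-parameter sweep and intrinsic evaluation the abstract describes, the sketch below trains Word2Vec under several hyper-parameter combinations and scores each on the Google analogy set and WordSim-353. This is not the authors' code: it assumes gensim >= 4.0, uses the public text8 corpus as a stand-in for the paper's training data, and uses the test files bundled with gensim in place of the paper's evaluation setup.

```python
# Minimal sketch (not the paper's released code) of a Word2Vec
# hyper-parameter sweep with intrinsic evaluation, assuming gensim >= 4.0.
import itertools
import gensim.downloader as api
from gensim.models import Word2Vec
from gensim.test.utils import datapath

# Small public corpus as a stand-in for the paper's training data.
sentences = list(api.load("text8"))

# Hyper-parameters of the kind varied in the study: architecture
# (CBOW vs skip-gram), training algorithm (negative sampling vs
# hierarchical softmax), window size, and vector dimension.
grid = {
    "sg": [0, 1],             # 0 = CBOW, 1 = skip-gram
    "hs": [0, 1],             # 0 = negative sampling, 1 = hierarchical softmax
    "window": [4, 8],
    "vector_size": [100, 300],
}

results = []
for sg, hs, window, dim in itertools.product(*grid.values()):
    model = Word2Vec(
        sentences,
        sg=sg,
        hs=hs,
        negative=0 if hs else 5,  # disable negative sampling when hs is on
        window=window,
        vector_size=dim,
        min_count=5,
        epochs=5,
        workers=4,
    )
    # Intrinsic evaluation: analogy accuracy and WordSim-353
    # Spearman correlation, as in the paper's intrinsic tests.
    analogy_acc, _ = model.wv.evaluate_word_analogies(
        datapath("questions-words.txt"))
    _, (spearman, _), _ = model.wv.evaluate_word_pairs(
        datapath("wordsim353.tsv"))
    results.append(((sg, hs, window, dim), analogy_acc, spearman))

# Rank configurations by analogy score. The paper's point is that this
# intrinsic ranking need not match downstream (NER / SA) F1 rankings.
for cfg, acc, rho in sorted(results, key=lambda r: -r[1]):
    print(f"sg={cfg[0]} hs={cfg[1]} win={cfg[2]} dim={cfg[3]} "
          f"analogy={acc:.3f} spearman={rho:.3f}")
```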

Updated: 2020-05-13