Entity name recognition of cross-border e-commerce commodity titles based on TWs-LSTM
Electronic Commerce Research ( IF 3.462 ) Pub Date : 2019-09-03 , DOI: 10.1007/s10660-019-09371-6
Yongcong Luo , Jing Ma , Chi Li

To clear customs quickly for export, commodity information must be matched to an HSCode, so it is particularly important to identify the entity names in commodity titles on e-commerce platforms quickly and accurately. To address this problem, we propose an approach based on TWs-LSTM for recognizing commodity entity names. We apply the TF-IDF algorithm to the commodity text corpus to obtain a weight matrix for the commodity words, and we use the Word2Vec model to represent the semantics of the words extracted from the bag of words. The weight vector of each commodity title and the word vectors of that title are then combined into a new one-dimensional vector; we call this representation of commodity titles the TWs model. Finally, we feed the TWs vectors into an LSTM for commodity entity name recognition. In the experiments, we split the commodity entity name data into a training set and a test set and compare the TWs-LSTM model with other text processing models. On the commodity title corpus of the Tmall platform, the TWs-LSTM model reaches an F1-score of 64.58%, which is a state-of-the-art result in comparison with the baseline models and previous studies.
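
The abstract does not spell out how the TF-IDF weights and the word vectors are combined, so the following is only a minimal sketch under stated assumptions: each word vector is scaled by that word's TF-IDF weight, and the scaled vectors are concatenated and zero-padded into one flat vector per title. The function names `tfidf_weights` and `tws_vector`, the smoothed IDF formula, the toy titles, and the two-dimensional embeddings (standing in for trained Word2Vec vectors) are all hypothetical and not taken from the paper. The LSTM stage is left out.

```python
import math
from collections import Counter


def tfidf_weights(titles):
    """Return one {word: TF-IDF weight} dict per tokenized title.

    Assumption: term frequency is count / title length, and a smoothed
    IDF is used; the paper may define both differently.
    """
    n_docs = len(titles)
    doc_freq = Counter(word for title in titles for word in set(title))
    weights = []
    for title in titles:
        counts = Counter(title)
        weights.append({
            word: (count / len(title))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        })
    return weights


def tws_vector(title, weights, embeddings, dim, max_len):
    """Scale each word vector by its TF-IDF weight and flatten the result.

    Assumption: "combining" means weight * vector followed by
    concatenation, zero-padded to dim * max_len entries.
    """
    flat = []
    for word in title[:max_len]:
        vector = embeddings.get(word, [0.0] * dim)  # unknown words map to zeros
        flat.extend(weights[word] * x for x in vector)
    flat.extend([0.0] * (dim * max_len - len(flat)))
    return flat


# Toy tokenized commodity titles (hypothetical data).
titles = [
    ["women", "cotton", "dress"],
    ["men", "cotton", "shirt"],
    ["women", "leather", "handbag"],
]

# Toy 2-dimensional embeddings standing in for Word2Vec output (hypothetical values).
embeddings = {
    "women": [0.1, 0.3],
    "men": [0.2, 0.1],
    "cotton": [0.5, 0.4],
    "leather": [0.6, 0.2],
    "dress": [0.9, 0.7],
    "shirt": [0.8, 0.5],
    "handbag": [0.3, 0.9],
}

weights = tfidf_weights(titles)
tws = [
    tws_vector(title, w, embeddings, dim=2, max_len=4)
    for title, w in zip(titles, weights)
]
```

Each entry of `tws` is a fixed-length flat vector for one title. In a pipeline like the one described, such vectors (or their per-word slices, taken as time steps) would be passed to an LSTM tagger. Words that appear in fewer titles, such as "dress", receive a larger weight than common ones such as "cotton".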

Updated: 2019-09-03