Measuring semantic similarity of documents with weighted cosine and fuzzy logic
Journal of Intelligent & Fuzzy Systems (IF 1.7), Pub Date: 2020-06-11, DOI: 10.3233/jifs-179889
Juan Huetle-Figueroa, Fernando Perez-Tellez, David Pinto

Semantic analysis is currently used in several fields, such as information retrieval, the biomedical domain, and natural language processing. The primary focus of this research work is on using semantic methods to improve the cosine similarity algorithm and fuzzy logic. The algorithms were applied to plain texts, in this case CVs (resumes) and job descriptions. WordNet synsets were used to enrich the methods with semantic similarity measures: Wu-Palmer similarity (WUP), Leacock-Chodorow similarity (LCH), and path similarity (hypernym/hyponym). Additionally, keyword extraction was used to create a postings list in which keywords were weighted. The work addresses two reciprocal tasks: recruiting new personnel for companies that publish job descriptions, and finding a company for workers who publish their resumes. Comparing the proposed methods required the creation of a new gold standard, so a web application was designed to match the documents manually. The new gold standard confirmed the benefits of enriching the cosine algorithm semantically. Finally, the results were compared with the new gold standard to check the efficiency of the proposed methods. The measures used for the analysis were precision, recall, and F-measure, and the analysis concluded that semantically weighted cosine similarity can be used to obtain better similarity scores.
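
The abstract does not give the exact formulation, so the following is only a minimal sketch of how keyword weighting and term-level semantic similarity might be combined with cosine similarity. It uses the soft cosine measure as a stand-in technique, not necessarily the authors' method. The names `KEYWORD_WEIGHTS`, `TERM_SIMILARITY`, `term_sim`, and `semantic_weighted_cosine` are hypothetical, and the toy similarity table stands in for scores that could come from a WordNet measure such as WUP.

```python
import math
from collections import Counter

# Hypothetical values for illustration only (not taken from the paper).
# Keyword weights would come from keyword extraction; term similarities
# could come from a WordNet measure such as Wu-Palmer.
KEYWORD_WEIGHTS = {"python": 2.0, "developer": 1.5}
TERM_SIMILARITY = {frozenset(("developer", "programmer")): 0.9}


def term_sim(t1, t2):
    """Similarity between two terms: 1.0 if identical, else a table lookup."""
    if t1 == t2:
        return 1.0
    return TERM_SIMILARITY.get(frozenset((t1, t2)), 0.0)


def weighted_vector(tokens):
    """Term frequencies scaled by keyword weight (default weight 1.0)."""
    counts = Counter(tokens)
    return {t: c * KEYWORD_WEIGHTS.get(t, 1.0) for t, c in counts.items()}


def soft_dot(u, v):
    """Dot product that also credits pairs of different but similar terms."""
    return sum(term_sim(a, b) * x * y
               for a, x in u.items()
               for b, y in v.items())


def semantic_weighted_cosine(tokens_a, tokens_b):
    """Soft cosine similarity over keyword-weighted term vectors."""
    u, v = weighted_vector(tokens_a), weighted_vector(tokens_b)
    denom = math.sqrt(soft_dot(u, u)) * math.sqrt(soft_dot(v, v))
    return soft_dot(u, v) / denom if denom else 0.0


cv = ["python", "programmer"]
job = ["python", "developer"]
print(semantic_weighted_cosine(cv, job))
```

With an empty similarity table the function reduces to an ordinary weighted cosine, so the contribution of the semantic term can be checked by comparing the two scores for the same CV and job description.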

Updated: 2020-06-19