Erratum to Collaborative construction of a good quality, broad coverage and copyright free Japanese-French dictionary
International Journal of Lexicography ( IF 0.8 ) Pub Date : 2016-11-03 , DOI: 10.1093/ijl/ecw041
Mathieu Mangeot-Nagata

This research project belongs to the field of natural language processing (NLP), at the intersection of computer science and linguistics, and more specifically to multilingual lexicography and lexicology. On the Web, although French and Japanese are each well-resourced languages (Berment, 2004), this is not the case for the French-Japanese language pair:
- Electronic French-Japanese bilingual dictionaries (denshi jisho) cannot be copied to a computer or reused;
- There is a French-Japanese dictionary on the Web [1], but it contains only 40,000 entries, has no examples, and is not available for download.

Collaborative Web dictionaries do exist, such as the Japanese-English JMdict project led by Jim Breen (2004), which contains over 173,000 entries. These resources are freely downloadable, which shows that such projects are feasible. During a first stay in Japan from November 2001 to March 2004, we had already noticed the lack of French-Japanese bilingual resources on the Web. This observation gave rise to the Papillon project, which aims to build a multilingual lexical database with a pivot structure (Serasset et al., 2001). Since then, progress has been made in several areas (technical, theoretical, social) (Mangeot, 2006), but the actual production of data has advanced very little.

Meanwhile, there is a new trend towards reusing existing lexical resources (word sense disambiguation, use of open resources such as Wiktionary and DBpedia, merging with ontologies, etc.). Although these approaches make it possible to consolidate existing resources and expand their coverage, they still rely on data created by hand by professional lexicographers. Some printed French-Japanese dictionaries are of good quality and old enough to be royalty-free. It should be possible to reuse these resources in our project to build a good-quality, broad-coverage dictionary available on the Web.
Based on this observation, we defined the following project: to build a rich multilingual lexical system, with priority given to the French-Japanese language pair. Construction will proceed first by reusing existing resources (printed Japanese-French dictionaries, dictionaries pairing Japanese with other languages, Wikipedia) and applying automatic operations (scanning and correction, computation of translation links), and then through volunteer contributors working as a community on the Web. Contributors will work on dictionary articles according to their level of expertise and knowledge in lexicography or bilingual translation. The resulting resources will be royalty-free and intended for use both by humans, through conventional bilingual dictionaries, and by machines, in automatic language processing tools (analysis, machine translation, etc.).

[1] http://www.dictionnaire-japonais.com

Updated: 2016-11-03